nagios clusters
It would be nice to be able to cluster Nagios instances, perhaps with something built around MPI, allowing horizontal scaling without relying on passive checks.
Give Nagios a dashboard where alarms are easy to see, comments can be added (a ticket number, for example) and alarms can be acknowledged.
Everything on one page, or on custom filtered tabs.
Widgets: an alarm's appearance could change (e.g. blinking) based on criteria (more than 10 minutes, escalation, ...).
Nagios should be able to create reports on the fly in CSV, PDF or any other rich text format.
Nagios' dependency support is excellent, but it doesn't expose dependencies for view within the web interface. It would be great if the web interface would show dependencies.
This is similar to the benefit of showing parent/child host relationships in the web interface -- can you even imagine dealing with Nagios if your only view of the structure was a single flat table of all hosts and services?
I realize this view could get complicated quickly -- but how about a simple pop-up browser window next to hosts and services with dependencies? In that window, you'd show the dependency tree that Nagios already builds. This would also make it a great dependency-debugging tool.
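As a rough illustration of what such a pop-up could render, here is a minimal sketch that prints a dependency tree from a plain mapping of "object -> objects it depends on". The `host/service` naming and the `DEPS` data are hypothetical, not Nagios' internal structures. Cycles are flagged rather than expanded, which is the kind of hint that would make it useful for dependency debugging.

```python
def dependency_tree(deps, node, depth=0, path=()):
    """Return indented lines describing what `node` depends on.

    `deps` maps an object name to the names it depends on. A dependency that
    already appears on the current path is marked as a cycle instead of
    being expanded forever.
    """
    lines = ["  " * depth + node]
    seen = path + (node,)
    for parent in deps.get(node, []):
        if parent in seen:
            lines.append("  " * (depth + 1) + parent + "  <-- cycle")
        else:
            lines.extend(dependency_tree(deps, parent, depth + 1, seen))
    return lines


# Hypothetical dependency data for illustration only.
DEPS = {
    "web01/HTTP": ["web01/PING", "db01/MySQL"],
    "db01/MySQL": ["db01/PING"],
    "db01/PING": [],
    "web01/PING": [],
}

tree = dependency_tree(DEPS, "web01/HTTP")
print("\n".join(tree))
```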
It would be great to be able to easily acknowledge multiple alerts in Nagios by selecting them all from one screen and just entering the acknowledgement details once.
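Behind such a screen, one way to do it would be to generate one `ACKNOWLEDGE_SVC_PROBLEM` external command per selected service, reusing the details entered once. The sketch below only builds the command lines (format `[time] ACKNOWLEDGE_SVC_PROBLEM;host;service;sticky;notify;persistent;author;comment`); the host/service pairs, author and comment are made-up examples, and writing the lines to the Nagios command file (whose path depends on the installation) is left out.

```python
import time


def ack_commands(services, author, comment, sticky=2, notify=1, persistent=1, now=None):
    """Build one ACKNOWLEDGE_SVC_PROBLEM external command per (host, service) pair,
    all sharing the same acknowledgement details."""
    ts = int(time.time() if now is None else now)
    return [
        f"[{ts}] ACKNOWLEDGE_SVC_PROBLEM;{host};{svc};{sticky};{notify};{persistent};{author};{comment}"
        for host, svc in services
    ]


# Example selection; a fixed timestamp keeps the output reproducible.
cmds = ack_commands([("web01", "HTTP"), ("db01", "MySQL")], "jdoe", "Ticket 4711", now=1700000000)
for line in cmds:
    print(line)
```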
On the host information page, show the contacts associated with the host (helpful in large companies where the NOC does not always know who is responsible for a specific host).
Currently only one address is supported for host entries, but a lot of modern servers have more than one network interface (bonding, teaming, failover, etc.). On some operating systems, like Linux, a failover configuration has only one IP address. On other systems (Windows, Solaris) you have one address per interface and a virtual one added as an alias to the physical interface.
Additionally, we have hosts with interfaces in several networks. But it is still one host, so there should be only one configuration.
It would be very helpful if you could provide an API to communicate with the back end. Front-end developers would then have much more freedom to change the front end as they want.
Plug-ins should be able to be made aware of previous values, not just previous states.
This is the NUMBER ONE BIGGEST deficiency with the current monitoring architecture of Nagios.
Nagios can't monitor RATES, it can only monitor STATES. Plugins have worked around this by keeping their own separate cache of values, stored in databases or text files, but some mechanism for this should exist in the core.
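For reference, the workaround described above looks roughly like this: a plugin caches its last sample in a state file and derives a per-second rate on the next run. This is a minimal sketch; the JSON state file, its location and the thresholds are assumptions, and only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the usual Nagios plugin convention.

```python
import json
import os
import tempfile
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_rate(counter, state_file, warn, crit, now=None):
    """Turn a monotonically increasing counter into a per-second rate by
    comparing it with the sample cached on the previous run."""
    now = time.time() if now is None else now
    previous = None
    if os.path.exists(state_file):
        with open(state_file) as fh:
            previous = json.load(fh)
    # Always cache the current sample for the next run.
    with open(state_file, "w") as fh:
        json.dump({"value": counter, "time": now}, fh)
    if previous is None or now <= previous["time"] or counter < previous["value"]:
        # First run, clock going backwards, or counter reset: no usable rate.
        return UNKNOWN, "UNKNOWN - no usable previous sample"
    rate = (counter - previous["value"]) / (now - previous["time"])
    if rate >= crit:
        return CRITICAL, f"CRITICAL - {rate:.2f}/s"
    if rate >= warn:
        return WARNING, f"WARNING - {rate:.2f}/s"
    return OK, f"OK - {rate:.2f}/s"


state = os.path.join(tempfile.mkdtemp(), "counter_state.json")
first = check_rate(1000, state, warn=50, crit=100, now=0)
second = check_rate(4000, state, warn=50, crit=100, now=60)  # 3000 units in 60 s
print(first)
print(second)
```

A real plugin would print the message and exit with the code. Every plugin that needs rates has to repeat this state handling, which is the argument for doing it once in the core.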
The ability to apply a schedule (daily, weekly, monthly, yearly) to downtimes provides a valuable ability to add known exceptions to monitoring. It saves creating more and more timeperiods for known outages that occur every week.
E.g. on Sunday morning the APP A servers are all rebooted, which takes one hour. Instead of adding a new time period that excludes that hour, you could put in a recurring downtime for every Sunday.
Additionally, downtimes should be applicable to hostgroups and servicegroups, so that in the case above, once a server is added to the APP A host group it inherits the related downtime.
Additional benefits are:
* the downtime and its comment (which explains the reasoning) are easily viewable for the server.
* there are events in the logs showing that the server/service was in downtime.
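Here is a minimal sketch of how a weekly recurring downtime attached to a hostgroup could be evaluated. The `WeeklyDowntime` class, the group names and the lookup function are hypothetical and not part of Nagios' scheduled-downtime model. Because group membership is looked up when the check runs, a host added to the group inherits the downtime immediately, as the request describes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class WeeklyDowntime:
    weekday: int            # Monday == 0 ... Sunday == 6, as in datetime.weekday()
    start_hour: int
    duration: timedelta     # assumed shorter than one week
    comment: str

    def covers(self, when: datetime) -> bool:
        """True if `when` lies inside the most recent occurrence of the window."""
        days_back = (when.weekday() - self.weekday) % 7
        start = (when - timedelta(days=days_back)).replace(
            hour=self.start_hour, minute=0, second=0, microsecond=0)
        if start > when:  # today's window has not started yet
            start -= timedelta(days=7)
        return start <= when < start + self.duration


# Hypothetical configuration: downtimes attached to hostgroups, not hosts.
HOSTGROUP_DOWNTIMES = {
    "app-a": [WeeklyDowntime(6, 6, timedelta(hours=1), "Weekly APP A reboot")],
}
HOST_GROUPS = {"appa-srv1": ["app-a"], "db01": ["databases"]}


def in_downtime(host, when):
    """Return the comment of the downtime covering `host` at `when`, if any."""
    for group in HOST_GROUPS.get(host, []):
        for downtime in HOSTGROUP_DOWNTIMES.get(group, []):
            if downtime.covers(when):
                return downtime.comment
    return None


print(in_downtime("appa-srv1", datetime(2024, 6, 2, 6, 30)))  # a Sunday morning
```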
Escalations are currently based on the number of failed checks. In some situations, however (need I spell SLA?), you have fixed times after which notifications for failed services must be escalated. It would be nice to have some means of specifying this in Nagios itself instead of adding it via event handlers (calculating an iteration count from the specified check interval is insufficient, as checks might be delayed).
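A minimal sketch of time-based escalation: the target is chosen from the wall-clock time elapsed since the problem began, so a delayed check cannot shift the escalation. The step table and contact group names are hypothetical, and the steps are assumed to be sorted by threshold.

```python
from datetime import datetime, timedelta

# Hypothetical SLA-style steps: (time since the problem began, contact group).
ESCALATION_STEPS = [
    (timedelta(minutes=0), "noc"),
    (timedelta(minutes=30), "on-call-admins"),
    (timedelta(hours=2), "it-management"),
]


def escalation_target(problem_start, now, steps=ESCALATION_STEPS):
    """Pick the last step whose threshold has elapsed, using elapsed time
    rather than the number of failed checks."""
    elapsed = now - problem_start
    target = steps[0][1]
    for threshold, group in steps:
        if elapsed >= threshold:
            target = group
    return target


start = datetime(2024, 1, 1, 8, 0)
print(escalation_target(start, datetime(2024, 1, 1, 8, 45)))
```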
Bring back the DB backend, and provide a proper admin tool, preferably web-based or a GUI. This is critical for Nagios to flourish in enterprises.
Give Nagios a web service interface, such as JSON (e.g. Nagios2JSON, www.yannj.fr), XML, or even a complete SOAP implementation.
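To show how little is needed for a basic JSON view, here is a sketch that converts `status.dat`-style blocks (`hoststatus {`, `key=value` lines, a closing `}`) into JSON grouped by block type. It assumes the Nagios 3 `status.dat` layout, keeps all values as strings, and does not reproduce Nagios2JSON's actual output format.

```python
import json


def status_dat_to_json(text):
    """Convert status.dat-style blocks into a JSON document grouped by block type."""
    result = {}
    block_type, block = None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("{"):
            block_type, block = line[:-1].strip(), {}
        elif line == "}" and block is not None:
            result.setdefault(block_type, []).append(block)
            block_type, block = None, None
        elif block is not None and "=" in line:
            key, value = line.split("=", 1)  # values may themselves contain '='
            block[key] = value
    return json.dumps(result, indent=2)


SAMPLE = """
hoststatus {
\thost_name=web01
\tcurrent_state=0
\t}

servicestatus {
\thost_name=web01
\tservice_description=HTTP
\tcurrent_state=2
\tplugin_output=HTTP CRITICAL - a=b
\t}
"""

print(status_dat_to_json(SAMPLE))
```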
What about kicking out ndoutils? I think using the event broker
mechanism causes too much load in larger installations. I am running
nearly 1300 hosts with nearly 9000 services on one machine and it works
fine. Enabling ndoutils brings performance down to a point where
nothing really works.
So, looking at the database model, we have 3 areas:
a) actual status data
b) historical data
c) configuration data
We do not need the same mechanism for all.
a) According to my information (and I am not a really good C
programmer), there is one module for writing status data (statusdata.c). If we
patch this 575-line module to write to a database, we have all the
information needed for alternative web interfaces.
b) This could be done by patching the log writer mechanism, using some code
from the ndoutils.
c) Nearly the same as a.
I think splitting up the NDOutils into 3 directly integrated parts would
solve most problems.
Using a switch in nagios.cfg, the admin can decide to write
- to file
- to database
- to both.
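The file/database/both switch could behave roughly as in this sketch. It is written in Python rather than the C of statusdata.c, uses sqlite3 only so that it runs standalone, and the mode values and table layout are hypothetical.

```python
import os
import sqlite3
import tempfile


def write_status(records, mode, file_path, db):
    """Write (host, state) records to a flat file, a database, or both,
    depending on a hypothetical nagios.cfg switch value."""
    if mode not in ("file", "database", "both"):
        raise ValueError(f"unknown status output mode: {mode}")
    if mode in ("file", "both"):
        with open(file_path, "w") as fh:
            for host, state in records:
                fh.write(f"{host}={state}\n")
    if mode in ("database", "both"):
        db.execute("CREATE TABLE IF NOT EXISTS hoststatus "
                   "(host TEXT PRIMARY KEY, state INTEGER)")
        db.executemany("INSERT OR REPLACE INTO hoststatus (host, state) VALUES (?, ?)",
                       records)
        db.commit()


db = sqlite3.connect(":memory:")
path = os.path.join(tempfile.mkdtemp(), "status.txt")
write_status([("web01", 0), ("db01", 2)], "both", path, db)
with open(path) as fh:
    file_text = fh.read()
rows = db.execute("SELECT host, state FROM hoststatus ORDER BY host").fetchall()
print(file_text)
print(rows)
```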
What do you think about that little idea? Think about it. My idea
regarding the major change in timeperiods was not so bad. Perhaps
this one isn't either. :-))
Yours
Martin Fuerstenau
Setting a default (mandatory) expiration for acknowledgements would require the NOC to actively work on issues or be paged again. Currently the system allows acknowledged but unfixed problems to fester.
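A minimal sketch of the proposed rule: every acknowledgement carries a lifetime, and once it has passed an unresolved problem notifies again. The four-hour default and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

DEFAULT_ACK_LIFETIME = timedelta(hours=4)  # hypothetical mandatory default


@dataclass
class Acknowledgement:
    author: str
    comment: str
    created: datetime
    lifetime: timedelta = DEFAULT_ACK_LIFETIME

    def expired(self, now: datetime) -> bool:
        return now >= self.created + self.lifetime


def should_notify(problem_active, ack, now):
    """Notify for an active problem unless a still-valid acknowledgement exists."""
    if not problem_active:
        return False
    return ack is None or ack.expired(now)


ack = Acknowledgement("jdoe", "Ticket 4711", datetime(2024, 1, 1, 8, 0))
print(should_notify(True, ack, datetime(2024, 1, 1, 12, 0)))
```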
An addition to the status maps to view cabinet space and layout, maybe something in AJAX: drag and drop your server icons.
NRPE is great for server monitoring, but each time you want to add a new monitoring service on the managed node, you have to edit nrpe.cfg and restart the NRPE service. This is not a problem in small deployments, but with 100 managed nodes it is a time-consuming job.
I reckon it would be nice if NRPE could get remote configuration updates,
e.g. new commands, modified existing commands, etc.
On the Nagios server, a repository of all managed node configuration files should exist. This could be used to update a managed node's configuration and upload it to the node.
I use a number of shell scripts (ssh, scp, etc.) to automate this task, but they are time-consuming to set up and far from perfect.
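Such a repository could be as simple as the following sketch, which renders `command[...]=...` lines in nrpe.cfg syntax per node and compares a checksum to decide whether a node needs an update. The repository layout, node names and plugin paths are example assumptions, and pushing the file and restarting NRPE (the ssh/scp part) is left out.

```python
import hashlib

# Hypothetical central repository kept on the Nagios server:
# managed node -> {NRPE command name: command line}. Paths are examples only.
REPOSITORY = {
    "web01": {
        "check_load": "/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20",
        "check_disk_root": "/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /",
    },
}


def render_nrpe_commands(commands):
    """Render command[...] definitions in nrpe.cfg syntax, sorted for stable output."""
    return "".join(f"command[{name}]={line}\n" for name, line in sorted(commands.items()))


def config_digest(text):
    """SHA-256 of a config fragment, e.g. to compare with a checksum taken on the node."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_update(node, deployed_digest):
    """True if the node's deployed fragment differs from the repository version."""
    return config_digest(render_nrpe_commands(REPOSITORY[node])) != deployed_digest


fragment = render_nrpe_commands(REPOSITORY["web01"])
print(fragment, end="")
print(needs_update("web01", config_digest(fragment)))
```

Such a generated fragment could, for instance, be placed in a directory that nrpe.cfg pulls in via `include_dir`, keeping the hand-maintained part of nrpe.cfg unchanged.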
I think a plugin for IBM WebSphere Application Server would be very useful. It could set Nagios apart from the better-known packages for monitoring J2EE environments, which are also known for being heavyweight.
Currently the NAGIOS configuration for hosts supports the definition of a network hierarchy via the "parents" option. For example this would normally relate to the physical switch that a server is plugged into.
In most enterprise environments there is a difference between the physical (layer 2) and logical (layer 3) topologies. Improving NAGIOS to have the capability to support multiple (arbitrary) network topology layers would be a great advantage and help it in the enterprise space. This would allow more accurate identification of hosts that are down or unreachable in the event of certain network failures.
To implement this would require changes to the configuration definition to allow parents to be defined for different topologies. The map view should also provide the option to select the specific topology to view. The logic used to classify if a host is unreachable instead of down would also need to be modified to take into account the different topologies. Ideally notifications could be sent detailing the specific parents/topology layer causing a host to be unreachable.
e.g.
nagios.cfg:
    label_parents_2    Layer 2
    label_parents_3    Layer 3
Host definition:
    parents_2    switch1,switch2
    parents_3    router1,router2
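The modified reachability logic could look like this sketch. It applies the rule Nagios uses for its single `parents` list (a failed host is UNREACHABLE if all of its parents are not UP, otherwise DOWN) separately to each topology layer, and reports which layers cause unreachability so a notification could name them. Treating the host as UNREACHABLE if any layer blocks it is a design assumption, as is the data layout.

```python
UP, DOWN, UNREACHABLE = "UP", "DOWN", "UNREACHABLE"


def classify(host, parents_by_layer, states):
    """Classify a host whose own check failed, one topology layer at a time.

    The host is unreachable through a layer if all of its parents in that
    layer are not UP. Parents with no known state are treated as UP.
    Returns the overall state and the layers responsible for unreachability.
    """
    blocking = [
        layer
        for layer, parents in parents_by_layer.get(host, {}).items()
        if parents and all(states.get(p, UP) != UP for p in parents)
    ]
    return (UNREACHABLE if blocking else DOWN), blocking


# Hypothetical data mirroring the parents_2 / parents_3 example above.
PARENTS = {
    "server1": {
        "Layer 2": ["switch1", "switch2"],
        "Layer 3": ["router1", "router2"],
    },
}
STATES = {"switch1": DOWN, "switch2": UP, "router1": DOWN, "router2": UNREACHABLE}

print(classify("server1", PARENTS, STATES))
```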
I would appreciate being able to set the refresh interval of the Nagios web pages manually. Very often, while I am searching for the root cause, the Nagios page reloads and the browser jumps back to the beginning of the page.
Just having a link that stops the page reload would satisfy my needs.
The static page could then contain a link back to the self-refreshing page.
It would be nice if you could create groups in the web interface, assign servers to those groups, and assign users to those groups. The second part would be for the web interface to show only the servers in the groups a user has been assigned to, unless they are a top-level user.
update: 10-Sep-2009
I have found that if you add a user with the same name as a contact in the Nagios config, that user will only see the services/hosts that have them as a contact or where the user is in the contact group. It would be a good idea to make this more visible in the documentation.
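The behaviour described in the update can be modelled roughly as below: a user sees a host if they are one of its contacts directly or through a contact group, while a set of "see everything" users (comparable in spirit to cgi.cfg's `authorized_for_all_hosts`) sees all hosts. This is a hypothetical model for illustration, not the CGI code, and the host and group data are made up.

```python
def visible_hosts(user, hosts, contactgroups, all_hosts_users=()):
    """Return the hosts `user` may see in the web interface."""
    if user in all_hosts_users:
        return sorted(hosts)
    visible = []
    for name, host in hosts.items():
        direct = user in host.get("contacts", [])
        via_group = any(user in contactgroups.get(g, []) for g in host.get("contact_groups", []))
        if direct or via_group:
            visible.append(name)
    return sorted(visible)


# Hypothetical configuration data.
HOSTS = {
    "web01": {"contacts": ["alice"], "contact_groups": []},
    "db01": {"contacts": [], "contact_groups": ["dba"]},
    "fw01": {"contacts": ["nagiosadmin"], "contact_groups": []},
}
CONTACTGROUPS = {"dba": ["bob", "alice"]}

print(visible_hosts("alice", HOSTS, CONTACTGROUPS))
```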
Writing HTML in C not only makes it extremely difficult for web developers to customise the look and feel of the web interface, it also makes it difficult to follow exactly where a particular piece of HTML comes from. Separating presentation and code logic is one of the big goals of modern application development.
Using a library like ClearSilver, Nagios could provide fast, powerful templates in native C. ClearSilver is a templating language already proven in use by Trac, Yahoo, Google and others, and it's easy for web developers to read. Best of all, its Hierarchical Data Format (HDF) can be read and written as plain text, which makes testing and debugging easier and also decouples the web front end from the polling system. Instead of the CGI polling Nagios every time a page is requested, Nagios could write an HDF file each time the state changes, and the CGI would simply generate its content from those static HDF files.
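To make the HDF idea concrete, this sketch flattens nested status data into plain-text `dotted.name = value` lines, with list items becoming numbered children. It is an illustration in Python, not ClearSilver's C API; the `Nagios.Hosts` naming is hypothetical, and multi-line values (which HDF handles with a heredoc form) are not covered.

```python
def to_hdf(data, prefix=""):
    """Flatten nested dicts/lists into 'dotted.name = value' lines;
    list items become numbered children (0, 1, 2, ...)."""
    lines = []
    items = data.items() if isinstance(data, dict) else enumerate(data)
    for key, value in items:
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, (dict, list)):
            lines.extend(to_hdf(value, name))
        else:
            lines.append(f"{name} = {value}")
    return lines


# Hypothetical snapshot Nagios could write whenever a state changes.
STATUS = {"Nagios": {"Hosts": [{"Name": "web01", "State": "UP"},
                               {"Name": "db01", "State": "DOWN"}]}}

print("\n".join(to_hdf(STATUS)))
```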
Social Web