Nagios: Advanced Topics with John Sellens

System administrators manage IT resources, and part of that task involves monitoring those resources under management. Over the years, we've developed hundreds of solutions for monitoring the status of our IT resources, and one of the most popular of those is Nagios.

Personally, I have several years' experience running and managing Nagios, and only about half of them have been spent doing it 'right', which is using host templates and inheritance to simplify my configuration. When I heard that John Sellens was going to be teaching Nagios: Advanced Topics, I jumped at the chance to learn more about the tool that helps me sleep well at night (incidentally, it also occasionally wakes me up at night, but it's better than not knowing the status of my infrastructure).

As John explained, Nagios monitors hosts and the services that are provided by them. A prototypical situation would be a Nagios server monitoring a web server (the host) using ping, and verifying that the web service is available by connecting to port 80 on the host. This may be typical, but Nagios is, by no means, only able to monitor machines and the

Nagios is well known for its extreme flexibility regarding its configuration. The relatively simple syntax belies the power and flexibility that come from the Nagios scheduling core being completely separate from the plug-in architecture that actually checks services.
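To make the split between the scheduling core and the plug-ins concrete, here is a minimal sketch of what a plug-in can look like. It relies on the conventional plug-in contract (print one status line, then exit with 0, 1, 2 or 3 for OK, WARNING, CRITICAL or UNKNOWN). The function names and the script name check_tcp_sketch.py are made up for illustration, and this is not the code of any plug-in that ships with Nagios.

```python
"""Minimal Nagios-style plug-in sketch: does a TCP port accept connections?"""
import socket

# Conventional plug-in exit codes that Nagios maps to service states.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=5.0):
    """Return (exit_code, status_line) for one TCP connect attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} accepted a connection"
    except OSError as exc:
        # Refused connections, timeouts and DNS failures are all OSError subclasses.
        return CRITICAL, f"TCP CRITICAL - {host}:{port} unreachable ({exc})"


def main(argv):
    """Parse HOST and PORT, print the status line, return the exit code."""
    if len(argv) != 3 or not argv[2].isdigit():
        print("TCP UNKNOWN - usage: check_tcp_sketch.py HOST PORT")
        return UNKNOWN
    code, line = check_tcp(argv[1], int(argv[2]))
    print(line)  # the text the scheduler records as the check's output
    return code
```

A real plug-in script would end by calling sys.exit(main(sys.argv)), so that the exit code reaches the scheduler. Because the core only looks at the exit code and the output text, the check itself can be written in any language.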

It was obvious that John has spent several years administering complex Nagios infrastructures. As he walked us through a quick primer reviewing the objects in the Nagios configuration, he dropped informative tidbits about the progression of Nagios from its roots as 'NetSaint' through the current 3.2.x iteration.

Because of how Nagios operates, it tends to be very resource heavy at large scale. This comes primarily from the external plug-in structure: each check spawns a new process, and each spawned process consumes resources on the Nagios host.
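The cost of one process per check is easy to see in a small sketch. This is not how Nagios itself is implemented (the core is written in C); it only illustrates the pattern of spawning a child process, reading its first line of output, and mapping its exit code to a state. The run_check helper and the fake plug-in command are hypothetical stand-ins.

```python
import subprocess
import sys
import time

# Conventional plug-in exit codes and the states they represent.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(command, timeout=10.0):
    """Spawn one plug-in process; return (state, first_output_line, seconds)."""
    started = time.monotonic()
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
        code, output = proc.returncode, proc.stdout
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A missing binary or a hung check is reported as UNKNOWN.
        code, output = 3, f"could not run check: {exc}"
    elapsed = time.monotonic() - started
    lines = output.strip().splitlines()
    first_line = lines[0] if lines else ""
    return STATES.get(code, "UNKNOWN"), first_line, elapsed


# Hypothetical stand-in for a real plug-in: a child Python process that
# prints a status line and exits with the WARNING code.
fake_plugin = [sys.executable, "-c",
               "print('DISK WARNING - 85% used'); raise SystemExit(1)"]
state, line, seconds = run_check(fake_plugin)
print(state, line, f"{seconds:.3f}s")
```

Every call to run_check pays for process creation and teardown, so thousands of checks per interval add up quickly on the monitoring host.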

To combat this issue, numerous solutions have been implemented, but as John explained, there is no panacea. The most obvious example, a configuration directive called use_large_installation_tweaks, is designed specifically to fork less frequently, as well as utilize a different memory cleanup method.

Other methods available involve using the embedded Perl interpreter, which requires configuring with --enable-embedded-perl at compilation, plus enabling the enable_embedded_perl directive in the configuration file.

The class covered a lot of ground, and ended up with John discussing several add-ons, most notably the Nagios Remote Plugin Executor (NRPE), which allows a Nagios administrator to configure passive checks which are reported upon by a remote process. This allows Nagios to provide distributed checks as well as respond to externally created events, such as an SNMP trap.
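For readers who have not used passive checks: results generated outside the scheduler are handed to Nagios as external commands, one per line, in the PROCESS_SERVICE_CHECK_RESULT format. The sketch below only builds such a line. The passive_result helper and the host and service names are hypothetical, and the add-ons from the class use their own transport to move results between machines.

```python
import time


def passive_result(host, service, return_code, output, now=None):
    """Format one passive service check result as an external command line."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN)")
    timestamp = int(time.time() if now is None else now)
    # The command is line oriented, so fold any newlines in the output into spaces.
    clean = " ".join(output.splitlines())
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{clean}\n"


# Hypothetical example: an SNMP trap handler reporting a failed link as CRITICAL.
print(passive_result("web01", "snmp-trap", 2, "linkDown on eth0"), end="")
```

On the Nagios server the line would be written to the external command file, whose location depends on the installation. The host and service named in it must already be defined and must accept passive checks.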

I’m looking forward to heading back to work to test some of the ideas I got during this class. I think that some of the implementation ideas will be worth trying, particularly the features which allow for scaling up Nagios checks.
