Nagios to Icinga 2 - possible migration

  • Hi folks! I may have the opportunity to migrate away from Nagios, and of all the other monitoring applications I've sampled, I'm inclined to go with Icinga 2. I know that NSClient++ is supported in Icinga 2, and that's what we currently use with Nagios. I'd rather not go through the entire process of moving to the Icinga Agent (and yes, I'm definitely aware of the security issues that exist with that client) if I can reuse that client and the same port 5666, so that I don't have to bug my network guy to open up ACLs on our firewall. Our setup is not that big (600+ hosts and 4000+ services), but it could be a lengthy process to set this all up from scratch without config management tools like Chef and Puppet, which we don't currently have. I'm trying to see if there's a way to configure the Icinga 2 master server to listen on that port instead of 5665, but I haven't been able to find any documentation or forum post that covers it. I was hoping someone here might've done something similar and can share the knowledge. Thanks in advance!

  • The clients do listen on port 5666, and the firewall should be opened up already. You don't need that port on the Icinga master then, as there won't be connections to it, right?

  • I think there's some confusion on my side. Is it possible to monitor a Windows client, for example, with only the NSClient++ and not have to install the Icinga Agent?

  • If NSClient++ is running on that machine already, you could go for a two-step migration. First, reuse the plugin the master currently uses to query NSClient++ on the clients. I'd assume that this is check_nrpe or check_nt currently.

    The ITL provides the "nrpe" CheckCommand definition, and check_nrpe is available from EPEL for RHEL 7, for example. The soft migration would cover all services and their command parameters, which should be translated into Icinga 2 custom attributes. It may also be check_nt, which the "nscp" CheckCommand uses too.

    The second, long-term step should be a full migration to the Icinga 2 client, used as a command bridge.

    Hopefully the NSClient++ versions are at least 0.4.x already; an update to 0.5.x wouldn't hurt, though. It would open up more possibilities and security options.
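
    A minimal sketch of that first step, assuming NSClient++ answers NRPE requests on port 5666. The host name, address and the NRPE command name "check_cpu" are hypothetical; only the "nrpe" CheckCommand and its nrpe_* attributes come from the ITL:

    ```icinga2
    object Host "win-example" {
      check_command = "hostalive"
      address = "192.0.2.10"           // documentation address, replace with the real one
      vars.os = "Windows"
    }

    apply Service "nrpe-cpu" {
      check_command = "nrpe"           // ITL CheckCommand wrapping check_nrpe
      vars.nrpe_command = "check_cpu"  // hypothetical command configured in NSClient++
      vars.nrpe_port = 5666            // spelled out only to make the port visible
      assign where host.vars.os == "Windows"
    }
    ```

    The master connects out to the client in this mode, so nothing has to listen on 5666 on the master itself.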

  • I think the biggest issue I'm having here is the translation between the syntax in Nagios and Icinga. I'm trying to configure a host... something like this:

    But I'm getting errors on the GUI saying that the remote Icinga instance is not connected to the master.

  • No, just strictly using NSClient++. I basically tried to recreate what this member did here. I'm definitely missing something that I'm not understanding. The host itself shows up in the web GUI and the ping4 service is working, so the server can reach this endpoint. It seems the issue is with the way the service is being defined. I'm not really seeing any examples of how to define services and commands for use with NSClient++.

  • That URL describes the Icinga 2 client which locally queries NSClient++. You don't want that then.

    If you prefer to use the "old" mode of querying NSClient++ remotely from your master, I'd investigate how you are doing it in your current setup and apply the same method. It's either check_nrpe or check_nt; can you share some insights from your "old" configuration?

  • Sure for example in Nagios we have this command defined:

    define command{
        command_name check_nt
        command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
    }

    and then we assign a service based on that command to a host. This one here calculates the uptime:

    Basically, I'd like to just be able to translate that and all other similar commands. I have a bunch of other Nagios plugins that run Python and Perl scripts that would need to run in more or less the same fashion. I think the biggest issue that I'm having is just the translation: how do I define this command, and how do I reference any and all hosts I want to apply this check to?

  • OK then, make sure to read the monitoring basics chapter (even if you say it must work, it helps to understand the basic concepts, especially those which differ from the old configuration format): …doc/03-monitoring-basics/

    Something else that helps is the differences and migration guides at the end of the TOC. Not necessarily to migrate everything 1:1, but to get an idea: …migrating-from-icinga-1x/

    My general advice when asked: do an inventory and start fresh.

    So, your example tells us that you're using check_nt. This is good, as there already is a CheckCommand for that.

    Note for you - if there isn't a CheckCommand, you need to create one for your plugin: …-monitoring/#requirements

    Going further, you really need to understand how command parameters are passed in Icinga 2. The old world used $ARG1$ and so on, with the service check_command attribute like commandname!arg1!arg2 and so on.

    This is different here: the check_command is really only the name of the CheckCommand object. Command parameters are defined as custom attributes.

    Note for you - read about custom attributes and command parameters here: …basics/#custom-attributes …ters-from-host-or-service

    check_command = "nscp"

    (I am repeating what the official Icinga training does)

    Now, how about some parameters for check_nt? There's documentation for that: …ga-template-library/#nscp

    The docs tell me that I don't need to specify an address; it's resolved automatically from the host's address attribute. Good. The port in your example is hardcoded, whereas the "nscp" CheckCommand sets a default value for it. I could override it (didn't you say you are using 5666 as port?). That's a major difference from the old world: commands also have custom attributes and allow for default values.

    The general idea is to have only one CheckCommand for different purposes, with optional and conditional arguments, and not 10 CheckCommand definitions like check_nt_1arg, check_nt_3_args, etc.
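
    For a plugin without an ITL definition, that idea could look roughly like the following. The command name, the script path and all my_script_* attributes are made up for illustration; PluginDir is the constant usually defined in constants.conf:

    ```icinga2
    object CheckCommand "my-script" {
      command = [ PluginDir + "/check_my_script.py" ]  // hypothetical plugin

      arguments = {
        "-H" = "$my_script_address$"
        "-w" = "$my_script_warn$"          // skipped when the attribute is not set
        "-c" = "$my_script_crit$"          // skipped when the attribute is not set
        "--verbose" = {
          set_if = "$my_script_verbose$"   // flag is only added when this is true
        }
      }

      vars.my_script_address = "$address$" // default, can be overridden per host or service
    }
    ```

    A service then only sets the attributes it needs, for example vars.my_script_warn, and everything else falls back to the defaults or is left out of the command line.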

    What else would I need ... oh yes, there are two arguments defined in your example, but only one is used. Let's just forget about $ARG2$; it seems to be a leftover.

    -v $ARG1$

    seems to be the query (or "variable" in check_nt --help). This maps to "nscp_variable" in the "nscp" CheckCommand docs.

    Easy going, the whole service looks like this

    apply Service "uptime" {
        check_command = "nscp"
        vars.nscp_variable = "UPTIME"
        // this requires that all hosts which should get this service object generated have this custom attribute set
        assign where host.vars.os == "Windows"
    }

    Last but not least, get familiar with apply rules and their assign where expressions. They'll save you a lot of time if you only need a service defined once but applied to *all* Windows hosts: …oring-basics/#apply-rules
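
    For the apply rule above to match, the host has to carry the custom attribute it filters on. A sketch with a hypothetical host name and address:

    ```icinga2
    object Host "win-srv01" {
      check_command = "hostalive"
      address = "192.0.2.20"      // documentation address, replace with the real one
      vars.os = "Windows"         // matched by: assign where host.vars.os == "Windows"
      vars.nscp_port = 12489      // optional; a host-level value wins over the CheckCommand default
    }
    ```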

    The other service object attributes from your example:

    • max_check_attempts -> the same
    • normal_check_interval -> that's Nagios 2 syntax, deprecated in 3.x. The Icinga 2 one is called "check_interval".
    • retry_check_interval -> also deprecated in old versions. Icinga 2 just uses "retry_interval"
    • check_period -> the same

    host_name is not needed; it's set automatically by the apply rule. service_description becomes the string identifier that follows "apply Service".
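
    Put together, the mapped attributes could sit in the apply rule like this. The numbers and the "24x7" TimePeriod name are assumptions, since the original Nagios service is not shown here:

    ```icinga2
    apply Service "uptime" {
      check_command = "nscp"
      vars.nscp_variable = "UPTIME"

      max_check_attempts = 3      // same name as in Nagios
      check_interval = 5m         // was normal_check_interval
      retry_interval = 1m         // was retry_check_interval
      check_period = "24x7"       // must name an existing TimePeriod object

      assign where host.vars.os == "Windows"
    }
    ```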

    Contacts and notification settings are handled differently in Icinga 2.

    Notifications are real objects, and they relate to Host/Service objects. Read on here: …ing-basics/#notifications

    Notification objects also specify the notified users or user_groups (previously the contacts). The notification_options become readable states and types settings.

    The differences are explained here: …n-hints-for-notifications (better to read them there than have me repeat them here)

    Please note that notification objects also require a NotificationCommand. The one you previously had in your contact definition.
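
    As a rough, untested illustration of those pieces: the user, its e-mail address and the notification name are hypothetical, and "mail-service-notification" is assumed to be the NotificationCommand from the sample configuration:

    ```icinga2
    object User "jdoe" {
      email = "jdoe@example.com"
    }

    apply Notification "mail-jdoe" to Service {
      command = "mail-service-notification"        // NotificationCommand, assumed to exist
      users = [ "jdoe" ]                           // previously the contacts
      states = [ Warning, Critical, Unknown, OK ]  // readable states
      types = [ Problem, Recovery ]                // readable notification types
      period = "24x7"                              // assumed TimePeriod name
      assign where host.vars.os == "Windows"
    }
    ```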

    Your example could translate into this. Again, read about Apply rules beforehand.

    More to read: value types. …cs/#attribute-value-types

    One thing to note: Don't copy paste my examples. There may be typos or errors, I did not test them. Try to find your own way through it, and iteratively test them. Start with a simple Host object, then create the Service Apply. Once everything works and is live in your Icinga Web 2, dig deeper into the new Notification sphere.

    Your homework: Find out how to assign users to user groups. That's documented and got examples too.

    Second to that, tell me your best pattern for applying services to your hosts with check_nt by posting your final solution here :)

  • Thanks for the guidance. This is starting to make sense and I'm starting to like just how much more powerful this tool is than what I'm used to. I've only added two hosts for now but I've been able to define the services that I most commonly monitor on Windows servers...below are those services:

    One thing I've been trying to figure out is to make the services a little more generic and then use certain attributes to trigger a check...take for example these two:

    These two differ only in name and parameter, but it's really the same check. Would I be able to combine them into just one check and then perhaps just assign attributes for each drive letter? This would also help for certain Windows Services that I need to monitor. I'd like to know if I can avoid having to write up a Service for every single Windows Service.

    EDIT #1:

    So in hopes to actually find a way to do this, I've modified the following (probably not the best thing to do but just tinkering around to see if I can get it to work):

    and then I added the following to one of the hosts to test:

    vars.disks["c"] = {
    }
    vars.disks["d"] = {
    }

    but now I'm getting a message saying that I'm missing the "-l" parameter, which this command doesn't ... It seems like this is doable, but I'm definitely missing something.

    EDIT #2:

    After a little bit more of trial and error, I got it :-)

    Going to try to do the same thing for the Windows Services/processes
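
    For reference, check_nt wants the drive letter passed via -l for USEDDISKSPACE, and the "nscp" CheckCommand maps that flag to nscp_params. One way the dictionary from EDIT #1 could be wired up, as an untested sketch and not necessarily the rule that ended up working here (the thresholds are assumptions):

    ```icinga2
    apply Service "Disk " for (disk => config in host.vars.disks) {
      check_command = "nscp"
      vars.nscp_variable = "USEDDISKSPACE"
      vars.nscp_params = disk     // passed to check_nt as -l <drive letter>
      vars.nscp_warn = 90         // assumed default thresholds
      vars.nscp_crit = 95
      vars += config              // per-disk overrides from the host dictionary, may be empty
    }
    ```

    With this shape, an entry such as vars.disks["c"] = { nscp_crit = 98 } on a host would change the threshold for that one disk only.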

  • Oh wow, you are doing great! :) Especially for the apply-for with conditions, that's really advanced stuff.

    One thing you can also do - put more vars into the host level, but the decision making on the "os" attribute is totally fine inside the apply rule. I've done that inside the Icinga 2 training lately too :D

  • Thank you! Can you give me an example of what exactly you're referring to by "putting more vars into the host level"?

  • Hmmm, I thought of the CMDB way, like

    vars.os_type = "Linux"
    vars.os_distribution = "Debian"
    vars.os_distribution_version = 9
    vars.app_types = [ "web", "db" ]
    = {
        "Windows Update" = "ws..."
    }

    Everything you can reliably set in templates and then import into the hosts themselves. That way you can build a) service apply rules, b) notification apply rules, and c) specific "notes" and other attributes that are visible or usable in scripts. You can also filter by these attributes, e.g. inside Icinga Web 2 or the REST API.
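
    A small sketch of that pattern, with hypothetical template and host names; the "http" CheckCommand comes from the ITL:

    ```icinga2
    template Host "debian-web-host" {
      vars.os_type = "Linux"
      vars.os_distribution = "Debian"
      vars.app_types = [ "web" ]
    }

    object Host "web01" {
      import "debian-web-host"    // inherits all vars from the template
      check_command = "hostalive"
      address = "192.0.2.30"      // documentation address, replace with the real one
    }

    apply Service "http" {
      check_command = "http"
      // applied purely from the inventory-style attributes set in the template
      assign where host.vars.os_type == "Linux" && "web" in host.vars.app_types
    }
    ```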

  • That's kind of the approach I was aiming for. For example, I have this entry for one of my hosts:

    I'm sure that this can be consolidated even further. I.e., to have something like this for example

    vars.disks = [ "c", "d" ]
    {
        "NSClient++ Service" = "nscp"
        "Windows Update" = "wuauserv"
        ...
    }

    But I'm definitely going to have to revise the structure of the apply-for service in order to reduce the config that far.

  • I'd suggest keeping a local git repository for your configuration files. That way you can track, commit, and revert changes in case of config refactoring "accidents".

  • The more I play with this application and the syntax, the more I like it. It's challenging at first, but I'd rather put in the work for all of the prep and then just add the rest. Here's what I've been able to do so far:

    For the disks, I've done the following:

    apply Service "Disk " for (disk in host.vars.partitions) {
        import "generic-service"
        check_command = "nscp"
        vars.nscp_variable = "USEDDISKSPACE"
        vars.nscp_params = disk
        vars.nscp_warn = 90
        vars.nscp_crit = 95
        notes = "Check for used disk space on disk " + disk
        assign where host.vars.os == "Windows"
    }

    As a result, I've reduced the per-host config for disks to just this:

    vars.partitions = [ "C", "D" ]

    For services that I need to monitor in Windows, I've done the following:

    apply Service for (win_service => config in host.vars.win_services) {
        import "generic-service"
        check_command = "nscp"
        vars.nscp_variable = "SERVICESTATE"
        vars.nscp_showall = true
        vars.nscp_params = config
        notes = "Service check for " + win_service
        assign where host.vars.os == "Windows"
    }

    As a result, I've been able to reduce the per-host config for Windows services to the following:

    vars.win_services = {
        "Service #1" = "srv1"
        "Service #2" = "srv2"
        "Service #3" = "srv3"
        ...
        "Service #N" = "srvN"
    }

    This is super convenient and time-saving compared to all of the repetitive definitions that I am used to in Nagios. Long live Icinga! :-)
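
    For completeness, a host that feeds both apply-for rules above might look like this. The host name and address are hypothetical, "generic-host" is assumed to be the template from the sample configuration, and the two Windows service entries are borrowed from earlier in the thread:

    ```icinga2
    object Host "win-srv01" {
      import "generic-host"
      address = "192.0.2.20"             // documentation address, replace with the real one
      vars.os = "Windows"

      vars.partitions = [ "C", "D" ]     // yields the services "Disk C" and "Disk D"

      vars.win_services = {
        "NSClient++ Service" = "nscp"    // key: Icinga service name, value: Windows service name
        "Windows Update" = "wuauserv"
      }
    }
    ```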