Icinga2 monitoring issue - host goes down but checks don't fail

  • Hello - having some issues getting Icinga working the way I'd like - had a few questions about my configuration.


    The setup is one Icinga master server hosting IcingaWeb and the database, using top-down config sync, with about 30 Endpoints/Hosts, each in its own Zone, monitored by the master. I currently have only a few checks/services monitored on 3 Hosts, but given how it behaves I believe I did something wrong somewhere.


    First, the configs. Sorry in advance if this is overkill but I figured the more information the better.


    zones.conf :



    zones.d/global-templates/services.conf




    zones.d/master/cluster.conf


    Code
    object Service "cluster" {
      check_command = "cluster"
      check_interval = 5s
      retry_interval = 1s
      host_name = "icinga2.domain.com"
    }

    zones.d/master/host.conf



    zones.d/apps-nas-test01.domain.com/host.conf


    Code
    object Host "apps-nas-test01.domain.com" {
      check_command = "hostalive"
      vars.cluster_zone = name
      address = "172.17.18.204"
      vars.client_endpoint = name
      import "generic-host"
      vars.os = "Linux"
    }


    So - to test the monitoring, I SSH'd into apps-nas-test01 and turned off networking. In IcingaWeb, the host still appears as "Up", as if there wasn't an issue.





    Which is odd - and with the networking left down it stays like that.


    However - I'm instantly notified by the cluster check, and e-mailed very quickly (within 30 seconds). I think this is really cool and responsive.





    So what am I doing wrong with my host objects that they still show as "UP" when I power them down or kill networking? One thing I suspect is the "Check Source". Should I be using "ping" or "cluster-zone" as the host check in "host.conf" instead of "hostalive" (which is imported from generic-host)? It just seems odd that the cluster service check on the master node works so well, yet if I relied only on IcingaWeb I'd have no idea that a host was down. Is there a "state change" setting I need to configure so that when a host is unreachable for X amount of time, it switches to "down" in IcingaWeb?
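
    On the "ping" vs "cluster-zone" question: the two checks answer different things. "hostalive" is a ping-based reachability check, while "cluster-zone" reports whether the node running the check is currently connected to a given zone. A minimal sketch of a per-agent connectivity service, assuming it lives in the master zone, that every agent zone is named after its host object, and that hosts set "vars.client_endpoint" as in the host.conf above (the service name "agent-health" is made up for illustration):

    ```icinga2
    // Assumed location: somewhere under zones.d/master/ so the master evaluates it.
    apply Service "agent-health" {
      check_command = "cluster-zone"

      // Assumption: the agent's zone name equals the host object name (FQDN).
      vars.cluster_zone = host.name

      // Only applies to hosts that declare an agent endpoint.
      assign where host.vars.client_endpoint
    }
    ```

    This complements a host check rather than replacing it: the service goes critical when the agent connection drops, independently of what the host's own check reports.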


    An ideal setup for us would be the host showing as "Down" when networking is off or the host is shut down/rebooted. Any advice would be really appreciated as I've hit a wall. I eventually want to add 30 or so more hosts, monitor their disks and networking, and be notified by e-mail and in IcingaWeb when any of them go down - so I'm open to any suggestions that follow Icinga2 best practices, as I'm setting this up with no prior Icinga/Nagios experience.

  • The "icinga2 object list" output for services and hosts on the master and on apps-nas-test01 pushed the post over the character limit, but I can provide it if needed.

  • Sigh! - so I see one of my mistakes. I re-adjusted the "max_check_attempts" to 2 - but still not getting a state change when a host is down. It appears that it doesn't correctly re-check upon failure. Check attempt stays at 1 for the hostalive check and the ping check. So I believe subsequent checks never happen, or the counter on checks isn't growing, so I never get a "state change". Here is the current config for the generic templates. And attached are screenshots of the host with the networking turned off.
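
    For reference, this is how the attempt counter is meant to behave: each non-UP result that gets processed advances the attempt number, rechecks are scheduled at "retry_interval" while the problem is still soft, and once "max_check_attempts" is reached the state becomes hard and notifications can fire. If no new check results arrive at all, there is nothing to count. A sketch of a host template with these settings; the values are illustrative sample-style defaults and are not the template referred to in the post:

    ```icinga2
    // Illustrative template only; the values are assumptions.
    template Host "generic-host" {
      check_command = "hostalive"

      max_check_attempts = 3   // failed results needed before a hard DOWN
      check_interval = 1m      // normal interval while the host is UP
      retry_interval = 30s     // faster rechecks while the state is soft
    }
    ```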








  • Ended up throwing in the towel and starting from scratch with a new master server, following the installation guide from:


    https://www.icinga.com/docs/ic…t/doc/02-getting-started/


    and from:

    https://admin-docs.com/monitor…ent/setup-icinga2-master/

    https://admin-docs.com/monitor…ents-with-icinga2-client/

    https://admin-docs.com/monitor…-icinga2-client-on-linux/

    https://admin-docs.com/monitoring/icinga/install-icinga2/


    Comparing my configs from master to master, I'm not sure where I went wrong, except that on the "non-working" setup the hostalive "Check Source" was the host I wanted to monitor, and not my Icinga master. With my new setup, the check source is my Icinga master, and when a host I'm monitoring goes down, the state change happens and I'm alerted that it's down in IcingaWeb.
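
    A minimal sketch of what that difference can look like in config, assuming the host object is defined in the master zone (so the master runs "hostalive" and becomes the check source) while local checks are sent to the agent via "command_endpoint". The file paths, the "disk" service, and the "generic-service" template are assumptions for illustration, not taken from the working setup:

    ```icinga2
    // Assumed file: zones.d/master/hosts.conf
    // The host lives in the master zone, so the master pings it.
    object Host "apps-nas-test01.domain.com" {
      import "generic-host"          // assumed to set check_command = "hostalive"
      address = "172.17.18.204"
      vars.os = "Linux"

      // Assumption: the agent's Endpoint object has the same name as the host.
      vars.client_endpoint = name
    }

    // Assumed file: zones.d/master/services.conf
    // Checks that must run locally are executed on the agent endpoint.
    apply Service "disk" {
      import "generic-service"
      check_command = "disk"
      command_endpoint = host.vars.client_endpoint
      assign where host.vars.client_endpoint
    }
    ```

    With this layout a dead agent cannot keep reporting itself as UP, because the reachability check no longer depends on the agent delivering results. It still assumes the agent's Zone and Endpoint objects are defined in zones.conf on the master and that the agent accepts commands from its parent.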