Host is DOWN but Ping is OK

  • Hello and thanks for approving my account registration for this forum. This is my first post here.

    I'm quite new to Icinga2, so maybe my problem is trivial to someone who has already reached guru level ;-) Anyway, I've been searching the internet for a while without success, and the forum search didn't turn up any post addressing my problem either. So I thought I'd better ask for a hint:


    To get familiar with the software, I'm running a fresh Icinga2 (r2.6.2-1) on Ubuntu 16.04.2 LTS. This machine is a satellite installation and receives its config from a master (v2.6.2 on CentOS 7). The only configuration that applies to its zone is a single Linux host, monitored with nothing more than hostalive.


    Code
    object Host "fqdn.of.host" {
        address = "fqdn.of.host"
        vars.os = "Linux"
        check_command = "hostalive"
    }


    Right out of the box, with no further configuration, this results in two checks being carried out for this host:

    • PING OK - Packet loss = 0%, RTA = 40.67 ms
    • SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)

    What looks very strange to me is the host status, which is:

    DOWN

    CRITICAL - Network Unreachable (fqdn.of.host)


    As far as I understand the documentation, hostalive uses ping to determine whether a host is up or down.

    So why is this host displayed as down while ping is successful?


    Since my setup is exactly as it came out of the box, I hope someone here can easily explain what I did wrong or how I can track this issue down to its root cause.

  • I bet that because the host is down, no fresh results are coming up from your satellite to the master.

    In other words, your ping test seems to show the last known state and will stick with that information.


    Your problem then is to recognize such a situation, so that you can decide whether to trust the status or not.

    Implement the cluster check for that: it will tell you if an endpoint (here: the satellite) goes down.

    https://docs.icinga.com/icinga…ibrary#itl-icinga-cluster
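    A minimal sketch of such a check, assuming the "cluster-zone" CheckCommand from the ITL described at the link above. "master.example.com" and "satellite-zone" are hypothetical placeholders: the first must be an existing Host object on the master, the second the name of your satellite's Zone. The service is meant to live in the master's own configuration so that the master executes it and reports a problem when the satellite zone is not connected.

```
// Hypothetical names: "master.example.com" must be an existing Host object
// on the master; "satellite-zone" must match your satellite's Zone name.
object Service "satellite-zone-health" {
  host_name = "master.example.com"
  check_command = "cluster-zone"   // built-in ITL cluster zone check
  vars.cluster_zone = "satellite-zone"
}
```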

  • Hello and thanks a lot for your quick reply, sru.


    Thanks for pointing out the cluster check feature to me. I hadn't read about it in detail yet.


    Unfortunately, your hint did not solve my problem. Both checks are carried out regularly and the data shown is up to date. Additionally, I added the same host object to the master's configuration, except that I gave it a different name to tell the two apart. This object shows exactly the same behaviour, so I guess this has nothing to do with the master/satellite setup.


    I'd be glad for any other hints.

  • I think I figured it out myself, at least to some degree.


    If someone could show me where to find out exactly what the hostalive command does, I'd be very happy, because this would help me verify my guess.


    This is what I came up with:


    • the hostalive check command uses the check_ping plugin, installed in /usr/lib/nagios/plugins or /usr/lib64/nagios/plugins depending on the distribution
    • check_ping can be invoked with either -4 or -6, or with neither
    • fqdn.of.host has both a valid A and a valid AAAA record (IPv4 and IPv6 address)
    • when neither -4 nor -6 is given, check_ping seems to attempt an IPv6 ping, which fails in my case because the Icinga2 VMs don't have working IPv6 connectivity
    • this results in a failed hostalive check
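    This would also explain why the separate PING check is OK while hostalive fails. Assuming that service comes from the sample conf.d/services.conf shipped with Icinga 2, it is created by an apply rule roughly like the sketch below (compare with your own file), and its "ping4" command passes -4 to check_ping, so the FQDN is pinged over IPv4 regardless of the AAAA record.

```
// Sketch of the sample apply rule; compare with your own conf.d/services.conf.
// "ping4" adds -4 to check_ping, so only the host's IPv4 address is pinged.
apply Service "ping4" {
  import "generic-service"
  check_command = "ping4"
  assign where host.address
}
```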


    What I did to solve this issue:


    To work around this issue, I replaced the hostname with the IPv4 address, which unfortunately would break the host configuration if the IP address ever changes.
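    The workaround amounts to a host definition like the following sketch. 192.0.2.10 is a placeholder from the documentation address range, not the real host's address; with an IPv4 literal there is nothing for check_ping to resolve to IPv6, so it pings over IPv4.

```
// Workaround: use the IPv4 address instead of the FQDN.
// 192.0.2.10 is a placeholder; it must be updated by hand if the address changes.
object Host "fqdn.of.host" {
  address = "192.0.2.10"
  vars.os = "Linux"
  check_command = "hostalive"
}
```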

    Turning off IPv6 on the Icinga2 node does not help either, because hostalive still tries to perform an IPv6 ping, resulting in a different error message:


    DOWN

    CRITICAL - Could not interpret output from ping command

    (/bin/ping6 -n -U -w 30 -c 5 fqdn.of.host)


    So it seems the hostalive check is unable to determine whether trying IPv6 even makes sense.
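    An alternative that keeps the FQDN is to force IPv4 in the check command rather than in the address. A minimal sketch, assuming the ITL shipped with your version provides the "hostalive4" CheckCommand (the hostalive thresholds, but with -4 passed to check_ping); "icinga2 object list --type CheckCommand --name hostalive4" should show whether it is available before you rely on it.

```
// Assumes the ITL provides "hostalive4" (hostalive thresholds, check_ping -4).
// The FQDN is kept, so a changed IPv4 address is picked up via DNS.
object Host "fqdn.of.host" {
  address = "fqdn.of.host"
  vars.os = "Linux"
  check_command = "hostalive4"
}
```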


    I hope this will help if someone else should run into the same problem.

  • Maybe just because I hadn't thought of that ;-) As I said, I'm still quite new to Icinga2.


    Anyway, the most important part was to track down why the hostalive check failed in the first place. The error message doesn't point to the IPv6 issue.