Duplicate disk checks happening for certain clients in top-down configuration

  • I have a "master with clients" configuration (https://www.icinga.com/docs/ic…uted-monitoring/#top-down) that's working very well for me, with the exception of the three newest hosts. On each of those three hosts, the disk check is being called twice, and both calls use the wrong parameters. One of the calls returns 0 and the other returns 2, so the state is constantly flapping between OK and Critical. When I look in the debug log on a problem host, I see the following:


    You can see two different check_disk calls, with slightly different parameters. Neither call is correct, as my service definition should exclude FUSE. Here is the service definition from the problematic client (as found in /var/lib/icinga2/api/zones/global-templates/_etc/services.conf):



    Here is part of the debug log on a host that's working correctly:

    Code
    [2017-12-21 22:45:41 +0000] notice/JsonRpcConnection: Received 'log::SetLogPosition' message from 'ueb-centmon-01.c.subdomain.internal'
    [2017-12-21 22:45:42 +0000] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 0; Checks/s: 0
    [2017-12-21 22:45:42 +0000] notice/JsonRpcConnection: Received 'event::ExecuteCommand' message from 'ueb-centmon-01.c.subdomain.internal'
    [2017-12-21 22:45:42 +0000] notice/Process: Running command '/usr/lib64/nagios/plugins/check_disk' '-c' '10%' '-w' '20%' '-X' 'none' '-X' 'tmpfs' '-X' 'sysfs' '-X' 'proc' '-X' 'devtmpfs' '-X' 'devfs' '-X' 'mtmfs' '-X' 'cifs' '-X' 'nfs' '-X' 'fuse' '-X' 'fuse.gvfsd-fuse' '-X' 'fuseblk' '-X' 'tracefs' '-X' 'configfs' '-X' 'overlay' '-m': PID 10676
    [2017-12-21 22:45:42 +0000] notice/Process: PID 10676 ('/usr/lib64/nagios/plugins/check_disk' '-c' '10%' '-w' '20%' '-X' 'none' '-X' 'tmpfs' '-X' 'sysfs' '-X' 'proc' '-X' 'devtmpfs' '-X' 'devfs' '-X' 'mtmfs' '-X' 'cifs' '-X' 'nfs' '-X' 'fuse' '-X' 'fuse.gvfsd-fuse' '-X' 'fuseblk' '-X' 'tracefs' '-X' 'configfs' '-X' 'overlay' '-m') terminated with exit code 0
    [2017-12-21 22:45:42 +0000] notice/ApiListener: Sending message 'event::CheckResult' to 'ueb-centmon-01.c.subdomain.internal'
    [2017-12-21 22:45:46 +0000] notice/JsonRpcConnection: Received 'log::SetLogPosition' message from 'ueb-centmon-01.c.subdomain.internal'
    [2017-12-21 22:45:47 +0000] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 0; Checks/s: 0

    Note that check_disk is called only once, and with the correct flag to exclude FUSE.
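To compare a problem host against a working one without reading the logs by eye, a small script along these lines could pull out the check_disk invocations and list which filesystem types each one excludes. This is only a sketch: it assumes the "Running command" line format shown in the excerpt above, with each log entry on a single line, and the function name `summarize_check_disk` is made up for illustration.

```python
import re
import shlex
from collections import Counter

# Matches "Running command" entries in the debug log format shown above.
# Assumption: each entry sits on one line (no wrapping as in a forum paste).
RUNNING = re.compile(r"notice/Process: Running command (?P<cmd>.+?): PID (?P<pid>\d+)$")


def summarize_check_disk(lines):
    """Group check_disk invocations by argument list.

    Returns one dict per distinct argument list, in order of first
    appearance, with the number of times it ran and the filesystem
    types passed via -X.
    """
    seen = Counter()
    for line in lines:
        match = RUNNING.search(line.strip())
        if not match:
            continue
        # The log quotes every argument with single quotes, which shlex understands.
        argv = shlex.split(match.group("cmd"))
        if not argv or not argv[0].endswith("check_disk"):
            continue
        seen[tuple(argv)] += 1

    report = []
    for argv, count in seen.items():
        # Collect the value that follows each -X flag.
        excluded = [argv[i + 1] for i, arg in enumerate(argv[:-1]) if arg == "-X"]
        report.append(
            {"count": count, "excluded": excluded, "fuse_excluded": "fuse" in excluded}
        )
    return report
```

Feeding it the lines of the debug log from each host (for example via `open(path)`) should return one entry per distinct argument list. On a healthy host I would expect a single entry with `fuse_excluded` set to `True`; more than one entry would point at two different service definitions being applied to the same host.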


    I have one master (ueb-centmon-01) and about a dozen hosts. Only the three newest hosts have this issue, so I suspect it's a problem with the host configuration, rather than the check configuration. The zones are set up as follows:



    All hosts run CentOS Linux release 7.4.1708 (Core) with Icinga2 2.8.0-1 installed from icinga-stable-release with Yum.


    Any suggestions? Is there anything else I can post to help troubleshoot this issue?

  • The following wouldn't fit in my first post.


    Config for the problem host:


    Config for a working host: