Icinga2 monitored host stops receiving most check events after some time

Hello,

I have encountered a strange issue: recently, checks triggered from the icinga2 master have stopped working.

The only thing still arriving is the 'event::Heartbeat' cluster message. All other checks have stopped producing output, and manually triggering a check does nothing: it does not even show up on the client node as having been received.
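To take the web UI out of the picture, a manual check can also be requested directly through the Icinga 2 REST API on the master (`POST /v1/actions/reschedule-check`). The sketch below assumes an ApiUser with permission for that action exists; the base URL, host name, service name and credentials are placeholders, not values from this setup. It only builds the request, and `send()` is a thin helper around `urllib`.

```python
import base64
import json
import ssl
import urllib.request
from typing import Optional


def build_reschedule_request(base_url: str, host: str, service: str,
                             user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/actions/reschedule-check request."""
    body = {
        "type": "Service",
        # filter_vars avoids hand-quoting host/service names inside the filter
        "filter": "host.name == h && service.name == s",
        "filter_vars": {"h": host, "s": service},
        # force: run the check regardless of time period / active-check setting
        "force": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/actions/reschedule-check",
        data=json.dumps(body).encode(),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )


def send(req: urllib.request.Request, cafile: Optional[str] = None) -> dict:
    """Send the request, verifying TLS against the given CA file (e.g. the Icinga CA)."""
    ctx = ssl.create_default_context(cafile=cafile)
    with urllib.request.urlopen(req, context=ctx, timeout=30) as resp:
        return json.load(resp)
```

For example, `send(build_reschedule_request("https://master.example.com:5665", "client.example.com", "ipa-status", "root", "icinga"), cafile="/var/lib/icinga2/certs/ca.crt")` (all names hypothetical; the CA path is the usual location on 2.8+ installs). If the master reports success but nothing appears in `/var/log/icinga2/icinga2.log` on the client, the problem is more likely between the two endpoints than in the check itself.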

The first interesting part: if I restart the icinga2 service, everything works properly again and all configured checks are received for an unpredictable amount of time, anywhere from 2 hours up to 4 or 5 hours, before it fails again.
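To pin down when exactly checks stop arriving (and correlate that with the icinga2 logs on both nodes), one option is to poll the master's REST API periodically and flag services whose last check is too old. This is a sketch that only evaluates an already decoded reply of `GET /v1/objects/services` requested with `attrs=last_check` and `attrs=check_interval`; the threshold `factor` is an arbitrary assumption and fetching the reply is left out.

```python
import time
from typing import Any, Dict, List, Optional


def stale_services(api_reply: Dict[str, Any], factor: float = 3.0,
                   now: Optional[float] = None) -> List[str]:
    """Names of services whose last check is older than factor * check_interval.

    api_reply is the decoded JSON of GET /v1/objects/services requested with
    attrs=last_check and attrs=check_interval (both in seconds).
    """
    now = time.time() if now is None else now
    stale = []
    for obj in api_reply.get("results", []):
        attrs = obj.get("attrs", {})
        last = attrs.get("last_check")
        interval = attrs.get("check_interval", 60.0)
        # A missing or non-positive timestamp is treated as "never checked".
        if last is None or last <= 0 or now - last > factor * interval:
            stale.append(obj["name"])
    return stale
```

Logging the first time this list becomes non-empty gives a timestamp to search for in both nodes' logs.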

The second interesting part: if I run the check locally on the client node, it finishes successfully.

Setup details:

CentOS Linux release 7.7.1908 (Client/Master Node)

Version:
icinga2 --version (Same version on client and master nodes)
icinga2 - The Icinga 2 network monitoring daemon (version: 2.11.2-1)

Port check:
ss -tlpn | grep icinga2
LISTEN 0 128 *:5665 *:* users:(("icinga2",pid=28779,fd=21))
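`ss -tlpn` only shows that icinga2 is listening locally; it does not show whether the other node can actually reach port 5665. A minimal TCP reachability probe, run from the master against the client (and vice versa) while checks are failing, could look like this. The host name is a placeholder, and a successful TCP connect says nothing yet about the TLS handshake or the Icinga endpoint connection on top of it.

```python
import socket


def can_connect(host: str, port: int = 5665, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        # create_connection resolves the name and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, unreachable, DNS failure or timeout
        return False
```

For example, `print(can_connect("client.example.com"))` from the master (hypothetical FQDN).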

Object definition of one configured check command (client node):

icinga2 object list -n check_ipa_status
Object 'check_ipa_status' of type 'CheckCommand':
% declared in '/var/lib/icinga2/api/zones/director-global/director/commands.conf', lines 848:1-848:38

  * __name = "check_ipa_status"
  * arguments = null
  * command = [ "/usr/bin/sudo", "/usr/lib64/nagios/plugins/check_ipa_status", "-n", "all" ]
    % = modified in '/var/lib/icinga2/api/zones/director-global/director/commands.conf', lines 850:5-855:5
  * env = null
  * execute
    % = modified in 'methods-itl.conf', lines 19:3-19:23
    % = modified in 'methods-itl.conf', lines 19:3-19:23
    * arguments = [ "checkable", "cr", "resolvedMacros", "useResolvedMacros" ]
    * deprecated = false
    * name = "Internal#PluginCheck"
    * side_effect_free = false
    * type = "Function"
  * name = "check_ipa_status"
  * package = "_cluster"
  * source_location
    * first_column = 1
    * first_line = 848
    * last_column = 38
    * last_line = 848
    * path = "/var/lib/icinga2/api/zones/director-global/director/commands.conf"
  * templates = [ "check_ipa_status", "plugin-check-command", "plugin-check-command" ]
    % = modified in '/var/lib/icinga2/api/zones/director-global/director/commands.conf', lines 848:1-848:38
    % = modified in 'methods-itl.conf', lines 18:2-18:94
    % = modified in 'methods-itl.conf', lines 18:2-18:94
  * timeout = 60
  * type = "CheckCommand"
  * vars = null
  * zone = "director-global"

Object definition of the same check command (master node):
icinga2 object list -n check_ipa_status
Object 'check_ipa_status' of type 'CheckCommand':
% declared in '/var/lib/icinga2/api/packages/director/2dfe8f05-97e0-4724-90d9-f3d936bb303f/zones.d/director-global/commands.conf', lines 848:1-848:38

  * __name = "check_ipa_status"
  * arguments = null
  * command = [ "/usr/bin/sudo", "/usr/lib64/nagios/plugins/check_ipa_status", "-n", "all" ]
    % = modified in '/var/lib/icinga2/api/packages/director/2dfe8f05-97e0-4724-90d9-f3d936bb303f/zones.d/director-global/commands.conf', lines 850:5-855:5
  * env = null
  * execute
    % = modified in 'methods-itl.conf', lines 19:3-19:23
    % = modified in 'methods-itl.conf', lines 19:3-19:23
    * arguments = [ "checkable", "cr", "resolvedMacros", "useResolvedMacros" ]
    * deprecated = false
    * name = "Internal#PluginCheck"
    * side_effect_free = false
    * type = "Function"
  * name = "check_ipa_status"
  * package = "director"
  * source_location
    * first_column = 1
    * first_line = 848
    * last_column = 38
    * last_line = 848
    * path = "/var/lib/icinga2/api/packages/director/2dfe8f05-97e0-4724-90d9-f3d936bb303f/zones.d/director-global/commands.conf"
  * templates = [ "check_ipa_status", "plugin-check-command", "plugin-check-command" ]
    % = modified in '/var/lib/icinga2/api/packages/director/2dfe8f05-97e0-4724-90d9-f3d936bb303f/zones.d/director-global/commands.conf', lines 848:1-848:38
    % = modified in 'methods-itl.conf', lines 18:2-18:94
    % = modified in 'methods-itl.conf', lines 18:2-18:94
  * timeout = 60
  * type = "CheckCommand"
  * vars = null
  * zone = "director-global"
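The two dumps appear to differ only in `package` and the source path, which is what one would expect when the client received the definition through the director-global zone sync. To compare the two sides mechanically (for example, the `attrs` of `GET /v1/objects/checkcommands/check_ipa_status` fetched from each node), a small diff that ignores those location-related keys is enough. The ignored key set is an assumption and may need extending.

```python
from typing import Any, Dict, Iterable, Tuple

# Keys that are expected to differ between a config master and a synced client.
IGNORED = ("package", "source_location")


def diff_attrs(a: Dict[str, Any], b: Dict[str, Any],
               ignore: Iterable[str] = IGNORED) -> Dict[str, Tuple[Any, Any]]:
    """Map each differing key (outside `ignore`) to its (a, b) values."""
    skip = set(ignore)
    keys = (set(a) | set(b)) - skip
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}
```

An empty result means both nodes see the same effective definition.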

I am also looking for help with the icinga2 CLI. I would like to perform the above calls from the command line (on both the client and the master node) to debug further, but my executions always end with "null". I therefore suspect that I am not properly connecting to the running service and/or not running the checks from there.
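If those executions go through `icinga2 console` without `--connect`, my understanding is that the console evaluates expressions in a local sandbox that does not see the running daemon's objects, so a lookup such as `get_object(CheckCommand, "check_ipa_status")` would return `null`; `icinga2 console --connect 'https://<user>:<password>@localhost:5665/'` talks to the running daemon over the REST API instead. Querying the REST API directly is another route. The sketch below only builds the `GET /v1/objects/<type>/<name>` URL and extracts the attributes from a reply; the base URL is a placeholder and the request still needs HTTP basic auth for an ApiUser.

```python
import json
import urllib.parse
from typing import Any, Dict, Sequence


def object_url(base_url: str, type_plural: str, name: str,
               attrs: Sequence[str] = ()) -> str:
    """URL for GET /v1/objects/<type>/<name>, optionally limited to some attrs."""
    # quote with safe='' so names like "host!service" are fully escaped
    path = f"/v1/objects/{type_plural}/{urllib.parse.quote(name, safe='')}"
    query = urllib.parse.urlencode([("attrs", a) for a in attrs])
    return base_url.rstrip("/") + path + (f"?{query}" if query else "")


def first_attrs(reply_body: str) -> Dict[str, Any]:
    """Return the attrs of the first result, or {} when the reply has none."""
    results = json.loads(reply_body).get("results", [])
    return results[0]["attrs"] if results else {}
```

Running the same query against the API on the client and on the master shows what each running daemon actually has loaded, rather than what is on disk.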