Flood of alerts

  • Lately we have been seeing Icinga throw a flood of alerts every now and then. We are running a single instance for now.

    The error logs from around the time the alerts were thrown contain messages similar to "exited with error code 128". At the same time, we received a check_load CRITICAL on the Icinga server itself:

    Load : Additional Info: CRITICAL - load average: 18.85, 12.01, 5.95

    Version and Platform details:

    icinga2 - The Icinga 2 network monitoring daemon (version: r2.6.0-1)

    System information:

    Platform: Ubuntu

    Platform version: 14.04.5 LTS, Trusty Tahr

    Kernel: Linux

    Kernel version: 3.13.0-53-generic

    Architecture: x86_64

    Build information:

    Compiler: GNU 4.8.4

    Build host: lgw01-16

  • We are also seeing these errors:

    ERROR: Executed command exits with return code '7'

    NPCD[1300]: ERROR: Command line was '/usr/lib/pnp4nagios/libexec/process_perfdata.pl -n -b /var/spool/icinga2/perfdata/service-perfdata

  • it seems there are error messages similar to "exited with error code 128"

    Are these errors limited to the call of a single plugin, a type of plugins (e.g. SNMP), or "general"?

    NPCD[1300]: ERROR: Command line was '/usr/lib/pnp4nagios/libexec/process_perfdata.pl -n -b /var/spool/icinga2/perfdata/service-perfdata

    Have you tried executing this call on the command line (as the Icinga 2 user)?

  • General. We don't use any SNMP plugins.

    My main question is why the host keeps getting overloaded after a few days. It has enough processing power, memory, and disk space, and syslog doesn't show any errors.

  • I'd suggest upgrading to 2.6.3, which fixed a couple of stability bugs. Other than that, I'd analyse the load using performance graphs, htop and so on.

    More hints here: https://docs.icinga.com/icinga…oting-analyze-environment
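
As a rough illustration of the load analysis suggested above, here is a minimal Python sketch that parses the check_load output quoted in the first post and compares the 1, 5 and 15 minute averages. The function names and the core count are hypothetical (the thread never states how many cores the server has), so treat it as a starting point rather than a diagnosis.

```python
import re


def parse_load_average(plugin_output):
    """Extract the 1, 5 and 15 minute load averages from check_load style output."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", plugin_output
    )
    if match is None:
        raise ValueError("no load average found in: %r" % plugin_output)
    return tuple(float(value) for value in match.groups())


def load_trend(load1, load15):
    """Compare the 1 minute average with the 15 minute average."""
    if load1 > load15:
        return "rising"
    if load1 < load15:
        return "falling"
    return "steady"


def load_per_core(load, cores):
    """Normalise a load average by the number of CPU cores."""
    return load / cores


# The output line quoted in the first post of this thread.
line = "CRITICAL - load average: 18.85, 12.01, 5.95"
load1, load5, load15 = parse_load_average(line)
trend = load_trend(load1, load15)

# ASSUMPTION: 4 cores is a made-up value for illustration only.
assumed_cores = 4
print("trend:", trend)
print("1 min load per core:", load_per_core(load1, assumed_cores))
```

With the values from the post, the 1 minute average is well above the 15 minute average, which points to a sudden spike rather than a sustained high load. That fits alerts arriving in bursts, although it does not say what causes the spike.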