Icinga not executing checks and using all available memory

  • Over the last few days I noticed that Icinga sometimes stops executing checks for a long time while consuming all available memory, until it is killed by the OOM killer.

    What could be the reason for that?

    I'm using the Icinga Debian package (version 1.13.4-2) from the Debian repository on Debian Stretch.

  • I recently configured Nagflux together with InfluxDB and Grafana as a replacement for PNP4Nagios. The issues started a bit later.

    Previously the VM running Icinga had 2 GB of RAM. I increased the memory to 4 GB, but the issue persisted.

    Yesterday I moved InfluxDB and Grafana to another VM, and the issue seems to be gone. I'm not sure whether that was just coincidence, because Nagflux is still running on the same server as Icinga. And why would another process (InfluxDB/Grafana) increase the memory usage of the Icinga process?

    Currently the memory usage of Icinga is at 32% (about 1.3 GB). At the time of the issue it rose to 73% (about 2.9 GB), so I don't think the issue is caused by InfluxDB or Grafana.
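
To tell steady growth of the Icinga process apart from a one-off spike, its resident memory can be sampled from `/proc` over time. The following is a minimal sketch, assuming a Linux host and a known Icinga PID; the function names are hypothetical and not part of any Icinga tooling.

```python
import re
from pathlib import Path


def parse_vmrss_kib(status_text):
    """Return the VmRSS value in KiB from the text of /proc/<pid>/status.

    Returns None if the field is missing (for example, for kernel threads).
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def rss_percent(rss_kib, total_kib):
    """Express resident memory as a percentage of total memory."""
    return 100.0 * rss_kib / total_kib


def read_rss_kib(pid):
    """Read the current resident set size of a running process (Linux only)."""
    return parse_vmrss_kib(Path(f"/proc/{pid}/status").read_text())
```

Calling `read_rss_kib` at a fixed interval (for example from cron) and logging the result would show whether the figure climbs continuously or only while checks are stalled.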

  • InfluxDB eats quite a lot of resources when many metrics are flowing into it. I've seen that in my Vagrant boxes with Icinga 2 as well. The same applies to an RDBMS: give it more resources and it will take them.

  • I'd guess the core is "hanging" in an event: it cannot execute a command due to heavy I/O, no resources being available, etc. Icinga 1.x runs single-threaded, meaning that once processes are forked for execution, the core waits for them to exit. Look into check execution time and latency; the icingastats CLI command comes in handy there.

    In the long run, I'd suggest migrating to Icinga 2. Performance in particular was one reason to start fresh after 1.x.
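
The latency and execution-time check suggested above can be scripted. This is a minimal sketch, assuming that `icingastats` keeps the MRTG mode of the `nagiostats` tool it derives from (`-m` plus `-d` with a comma-separated variable list, printing one value per line) and that the variable names `AVGACTSVCLAT` and `AVGACTSVCEXT` are available; confirm both with `icingastats --help` before relying on it. The helper names are hypothetical.

```python
import subprocess

# Assumed variable names (average active service check latency and
# execution time); verify them against `icingastats --help`.
VARIABLES = ["AVGACTSVCLAT", "AVGACTSVCEXT"]


def parse_mrtg_output(output, variables):
    """Pair each requested variable with the value on the matching output line.

    Values are kept as strings because not every variable is numeric.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if len(lines) != len(variables):
        raise ValueError(
            f"expected {len(variables)} values, got {len(lines)}"
        )
    return dict(zip(variables, lines))


def query_icingastats(variables=VARIABLES, binary="icingastats"):
    """Run icingastats in (assumed) MRTG mode and return the parsed values."""
    result = subprocess.run(
        [binary, "-m", "-d", ",".join(variables)],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_mrtg_output(result.stdout, variables)
```

Logging these values alongside memory usage would show whether latency climbs before the process starts to grow.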

  • I wanted to use Icinga 2, but Check_MK (which I like because a single check collects all the check results from a host) does not work with Icinga 2.

    Since I moved InfluxDB and Grafana to a separate VM, the issues are gone.

  • That's fine. I'm just saying that Icinga 1.x is no longer being developed, and you may want to look into possible upgrades in the coming years.