Icinga 2: "Backend icinga2 is not running" every morning

Hi,

I’ve been using Icinga 2 for monitoring at my company for almost a year. Everything works well, alongside Graphite for visualisation.

The only abnormal thing: the icinga2 service stops at around 7:45-8:00 AM every morning. It used to stop for about an hour, then 2 hours, and now it stops for more than 5 hours (it resumes at around 12-1 PM).

While it is stopped, Icinga Web still works, but the Graphite dashboard shows a gap starting at the moment the last data point was received.

Some info:

[root@31a6be75ba99 icingaweb2]# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.7.2-1)

Copyright (c) 2012-2017 Icinga Development Team (https://www.icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Application information:
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

System information:
  Platform: CentOS Linux
  Platform version: 7 (Core)
  Kernel: Linux
  Kernel version: 4.13.0-36-generic
  Architecture: x86_64

Build information:
  Compiler: GNU 4.8.5
  Build host: unknown

[root@31a6be75ba99 icingaweb2]# icinga2 feature list
Disabled features: api compatlog debuglog gelf influxdb livestatus logstash opentsdb redis statusdata syslog
Enabled features: checker command graphite ido-mysql mainlog notification perfdata

Some log output from supervisor:

[2018-04-02 07:35:12 +0700] information/WorkQueue: #6 (IdoMysqlConnection, ido-mysql) items: 2, rate: 3.43333/s (206/min 1042/5min 3116/15min);
[2018-04-02 07:35:32 +0700] information/WorkQueue: #6 (IdoMysqlConnection, ido-mysql) items: 2, rate: 3.43333/s (206/min 1032/5min 3116/15min);
[2018-04-02 07:36:08 +0700] information/WorkQueue: #5 (GraphiteWriter, graphite) items: 1, rate: 0.433333/s (26/min 143/5min 430/15min);
[2018-04-02 07:36:50 +0700] information/WorkQueue: #5 (GraphiteWriter, graphite) items: 1, rate: 0.5/s (30/min 141/5min 433/15min);
[2018-04-02 07:36:59 +0700] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2018-04-02 07:37:02 +0700] information/WorkQueue: #6 (IdoMysqlConnection, ido-mysql) items: 0, rate: 2.73333/s (164/min 950/5min 3028/15min);
[2018-04-02 07:37:10 +0700] information/WorkQueue: #5 (GraphiteWriter, graphite) items: 0, rate: 0.466667/s (28/min 144/5min 431/15min);
[2018-04-02 07:41:59 +0700] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2018-04-02 07:46:59 +0700] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2018-04-02 07:51:59 +0700] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2018-04-02 07:56:59 +0700] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
>>> Something happens here <<<<
[2018-04-02 13:14:13 +0700] information/WorkQueue: #5 (GraphiteWriter, graphite) items: 1, rate: 0.0166667/s (1/min 19/5min 305/15min);
[2018-04-02 13:14:13 +0700] information/WorkQueue: #6 (IdoMysqlConnection, ido-mysql) items: 5, rate: 0.0166667/s (1/min 2/5min 1699/15min); empty in 5 hours, 36 minutes and 20 seconds
# ^ Why does the line above report "empty in 5 hours, 36 minutes and 20 seconds"?
[2018-04-02 13:14:13 +0700] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2018-04-02 13:14:33 +0700] information/WorkQueue: #6 (IdoMysqlConnection, ido-mysql) items: 2, rate: 2.93333/s (176/min 177/5min 1798/15min);
[2018-04-02 13:14:43 +0700] information/WorkQueue: #6 (IdoMysqlConnection, ido-mysql) items: 4, rate: 3.38333/s (203/min 204/5min 1798/15min);
[2018-04-02 13:15:03 +0700] information/WorkQueue: #6 (IdoMysqlConnection, ido-mysql) items: 2, rate: 4.28333/s (257/min 258/5min 1794/15min);
[2018-04-02 13:15:43 +0700] information/WorkQueue: #6 (IdoMysqlConnection, ido-mysql) items: 2, rate: 3.43333/s (206/min 410/5min 1798/15min);
2018-04-02 13:24:23,090 DEBG 'icinga2' stdout output:
[2018-04-02 13:16:03 +0700] information/WorkQueue: #6 (IdoMysqlConnection, ido-mysql) items: 6, rate: 3.3/s (198/min 465/5min 1787/15min);
[2018-04-02 13:16:23 +0700] information/WorkQueue: #6 (IdoMysqlConnection, ido-mysql) items: 2, rate: 3.26667/s (196/min 552/5min 1796/15min);
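A gap like the one between 07:56:59 and 13:14:13 above can be located automatically instead of by eye. The following is only a minimal sketch: the function name `find_log_gaps` and the 600-second default threshold are made up for illustration, and it assumes every log line starts with a `[YYYY-MM-DD HH:MM:SS +ZZZZ]` timestamp in one constant timezone, as in the excerpt.

```shell
# find_log_gaps FILE [MIN_GAP_SECONDS]
# Hypothetical helper: prints every pause between two consecutive
# timestamped log lines that lasts at least MIN_GAP_SECONDS (default 600).
find_log_gaps() {
    awk -v min_gap="${2:-600}" '
    # Map a calendar date and time to seconds on a continuous scale
    # (Julian-day style formula; the constant offset is irrelevant
    # because only differences are used).
    function to_seconds(y, m, d, hh, mm, ss,    a, yy, mo, days) {
        a = int((14 - m) / 12)
        yy = y + 4800 - a
        mo = m + 12 * a - 3
        days = d + int((153 * mo + 2) / 5) + 365 * yy \
             + int(yy / 4) - int(yy / 100) + int(yy / 400)
        return days * 86400 + hh * 3600 + mm * 60 + ss
    }
    # Only consider lines starting with "[YYYY-MM-DD HH:MM:SS ".
    /^\[[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] / {
        stamp = substr($0, 2, 19)
        split(stamp, p, /[- :]/)
        now = to_seconds(p[1], p[2], p[3], p[4], p[5], p[6])
        if (prev != "" && now - prev >= min_gap)
            printf "gap of %d s between %s and %s\n", now - prev, prev_stamp, stamp
        prev = now
        prev_stamp = stamp
    }' "$1"
}
```

Called as `find_log_gaps /path/to/icinga2.log 600` (the path is a placeholder), it prints one line per pause. The timezone offset is ignored, so logs that mix offsets would need extra handling.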

Supervisor shows that all services are running fine, except for crond, since I run all of this inside a Docker container:

[root@31a6be75ba99 icingaweb2]# supervisorctl status
crond                            FATAL     Exited too quickly (process log may have details)
httpd                            RUNNING   pid 26108, uptime 21 days, 23:53:49
icinga2                          RUNNING   pid 54, uptime 25 days, 15:56:50
mariadb                          RUNNING   pid 6163, uptime 0:25:00
sshd                             RUNNING   pid 51, uptime 25 days, 15:56:50

Graphite dashboard

Any clues please?

Memory and I/O usage would also be interesting, as would any additional applications running on the monitoring host itself that may cause problems.

Additional hints can be found in the troubleshooting docs.
https://www.icinga.com/docs/icinga2/latest/doc/15-troubleshooting/#analyze-your-environment

It turned out to be an issue with mlocate on the host, which runs daily as a cron job to rebuild its file search index.

The process used all available CPU for its reindexing, which starved the icinga2 process so that it effectively stopped working (although it was still running).

I didn’t have enough time to exclude directories from reindexing, so I just temporarily disabled the cron job with:

chmod -x /etc/cron.daily/mlocate
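For the alternative mentioned above (excluding directories instead of disabling the job), mlocate reads a space-separated `PRUNEPATHS` list from `/etc/updatedb.conf`. The following is only a sketch: the helper name `add_prunepath` is made up, and `/var/lib/docker` is an assumed example, since the thread does not say which directories made the reindexing so expensive.

```shell
# add_prunepath CONF DIR
# Hypothetical helper: appends DIR to the PRUNEPATHS="..." line of an
# updatedb.conf-style file, unless it is already listed.
add_prunepath() {
    conf=$1
    dir=$2
    # Nothing to do if DIR already appears as an entry in PRUNEPATHS.
    if grep -Eq "^PRUNEPATHS.*[\" ]$dir[\" ]" "$conf"; then
        return 0
    fi
    # Insert DIR just before the closing quote of the PRUNEPATHS value,
    # writing to a temporary file first so a failed run leaves CONF intact.
    sed "s|^\(PRUNEPATHS *= *\"[^\"]*\)\"|\1 $dir\"|" "$conf" > "$conf.new" &&
        mv "$conf.new" "$conf"
}

# Example (assumed directory; run as root on the Docker host):
#   add_prunepath /etc/updatedb.conf /var/lib/docker
```

After that, the cron job could be re-enabled with `chmod +x /etc/cron.daily/mlocate`. Whether pruning is enough to keep icinga2 responsive depends on where the index time is actually spent.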

Thanks.
