Downtime issue

#1

Hi all,

Every server I put into downtime seems to leave downtime on its own, and notifications are sent. Here is one of them:

image

I had set the downtime for 24 hours

image

Notifications were sent and the host didn’t appear to be in DOWNTIME any more.

Will 2.9 help with this? I have to ask because we are not able to upgrade as often as we would like, and as this is quite an important topic, I could try to push for an upgrade soon if it would help.

Current setup

Icinga r2.8.4-1
Icingaweb 2.5.3

HA master zone (2 masters) and two zones in HA, so 4 pollers.

#2

Is no one else having this issue or something similar?

(Michael Friedrich) #3

I’d suggest upgrading to 2.10 where similar problems have been fixed.

#4

Ok, thanks dnsmichi, will get this done then.

#5

Unfortunately, the upgrade did not help. I have upgraded Icinga 2, Icinga Web 2, and Director to the latest builds, and I put a machine into downtime last night:

image

Today it’s not listed in the downtimes and there is no ‘DOWNTIME ENDS’ entry in the history of that host.

How could I go about troubleshooting this? Has anyone encountered it?
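One possible starting point is to ask the Icinga 2 REST API on each master which downtime objects it currently holds, then compare the answers. This is only a rough sketch: it assumes the API feature is enabled on the default port 5665 and that an API user exists. The host name, credentials, and the helper names `fetch_downtimes`, `summarize`, and `missing_on` are placeholders for illustration, not anything from this setup.

```python
import base64
import json
import ssl
import urllib.request


def fetch_downtimes(master, user, password, port=5665):
    """Return the 'results' list from GET /v1/objects/downtimes on one master."""
    url = f"https://{master}:{port}/v1/objects/downtimes"
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    # Assumption: the API uses a certificate from the private Icinga CA, so
    # verification is skipped for this quick check. Load your ca.crt instead
    # for anything more permanent.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return json.load(response)["results"]


def summarize(results):
    """Map each downtime object name to (host_name, start_time, end_time)."""
    return {
        item["name"]: (
            item["attrs"]["host_name"],
            item["attrs"]["start_time"],
            item["attrs"]["end_time"],
        )
        for item in results
    }


def missing_on(other, reference):
    """Names of downtimes present in 'reference' but absent from 'other'."""
    return sorted(set(reference) - set(other))
```

Running `summarize(fetch_downtimes(...))` against both masters and passing the two results to `missing_on` would show whether one node has lost downtimes that the other still knows about.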

Thanks

#6

I’m still facing this issue and have to start instructing my colleagues to use Downtimes all the time. Has anyone encountered this problem?

#7

Something to add: I scheduled a downtime (hosts + services) for about 10 hosts for Thursday. All the hosts and services appeared in the Overview - Downtimes list. When I got back to the office an hour later, the list (Overview - Downtimes) was empty. I then checked Monitoring Health and saw this:

image

“Icinga has been up and running since…” - and I guess this is when the list emptied itself.

Icingaweb2: 2.6.1

Could that restart be related?

I also saw this:

but it seems our version is later than the one in which that was fixed.

edit: I also just put another system into downtime for 10 minutes. It shows the downtime start and end and works fine, and no program restart was shown in the Monitoring Health section.
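To test the restart theory against more than one case, the check can be written down as a comparison of three timestamps: downtime start, downtime end, and the "up and running since" time. A minimal sketch with made-up example values; `restarted_during` is a hypothetical helper name:

```python
from datetime import datetime


def restarted_during(downtime_start, downtime_end, program_start):
    """True if the daemon (re)started while the downtime window was open."""
    return downtime_start <= program_start <= downtime_end


# Hypothetical values for illustration only: a 24-hour downtime and a
# program start one hour after the downtime began.
start = datetime(2019, 1, 10, 9, 0)
end = datetime(2019, 1, 11, 9, 0)
program_start = datetime(2019, 1, 10, 10, 0)
print(restarted_during(start, end, program_start))
```

If every lost downtime has a program start inside its window, and the 10-minute downtimes that work never do, that would support the idea that the restart is involved.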

#8

Another point: the active endpoint switches from one node to the other (causing the downtime to drop from the list), and this seems to happen every time a reload occurs. The reload is invoked as follows: --reload-internal 8464

full command being: /usr/lib64/icinga2/sbin/icinga2 --no-stack-rlimit daemon --close-stdio -e /var/log/icinga2/error.log --reload-internal 8464

Could I increase that? Seems to have been a default iirc

edit: Also, the scheduled downtime periods do seem to go into DOWNTIME START (despite not being in the Overview - Downtimes list) and no alerts are sent for that period, which is how we want this to behave. However, if I check the history for that host, the DOWNTIME END is never displayed.
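The missing DOWNTIME END entries could be counted by pairing start and end events from a host's history. This is just a sketch: it assumes the history has already been exported as `(timestamp, label)` tuples, and both that format and the function name `unmatched_starts` are hypothetical. The labels are the ones quoted in this thread.

```python
def unmatched_starts(events, start_label="DOWNTIME START", end_label="DOWNTIME END"):
    """Return timestamps of start events that have no later end event.

    'events' is a list of (timestamp, label) tuples for a single host.
    Each end event closes the oldest start that is still open.
    """
    still_open = []
    for timestamp, label in sorted(events):
        if label == start_label:
            still_open.append(timestamp)
        elif label == end_label and still_open:
            still_open.pop(0)
    return still_open


# Hypothetical history: the first downtime ends normally, the second never does.
history = [
    (1, "DOWNTIME START"),
    (2, "DOWNTIME END"),
    (3, "DOWNTIME START"),
]
print(unmatched_starts(history))
```

Any timestamp that comes back marks a downtime that started but never logged an end, which can then be compared with the reload times.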

#9

Resync of libs solved this.