I am experiencing strange behaviour with my Icinga 2 installation.
I am running Icinga 2 v2.6.3 with Icinga Web 2 v2.4.1.
Here is the architecture:
I have two master nodes in an HA setup, both in the master zone: icinga-master01, which also holds all the configuration, and icinga-master02.
There is also one satellite that sits in the same network as the masters but has a public address. (We had a very limited number of public IPs, and I did not expect this setup to cause any problems.) This satellite is connected to a single host outside the environment, which executes HTTP checks against our applications.
The satellite is named icinga01 (zone icinga01), and its client is hops-icinga2-us-master, the host dedicated to running the HTTP checks (about 30 web pages).
The rest of the monitored clients (about 100 hosts with 1100 services) are in the same isolated network as the masters and connect directly to the master zone.
Both masters are connected to the same IDO database.
Another host runs all our UI services: Icinga Web 2, Grafana and Kibana. Icinga Web 2 reads from the IDO database and provides the UI for our Icinga environment.
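A quick way to check whether each master really sees its HA peer and the satellite is to query the REST API status endpoint on each master (for example `curl -k -u <apiuser>:<password> https://<master>:5665/v1/status/ApiListener > status.json`) and compare the connected endpoints with the expected ones. This is a minimal sketch, assuming the API feature is enabled with an ApiUser and that the response lists connected endpoint names under `results[0].status.api.conn_endpoints` (verify this against the actual output of 2.6.3). The endpoint names in `EXPECTED_ON_MASTER01` are hypothetical; the real ones are whatever Endpoint objects are defined in zones.conf.

```python
import json

# Hypothetical endpoint names: replace with the Endpoint objects from zones.conf.
# A node does not list itself, so on icinga-master01 we expect its HA peer and the satellite.
EXPECTED_ON_MASTER01 = ["icinga-master02", "icinga01"]


def load_status(path):
    """Load the saved JSON output of GET /v1/status/ApiListener."""
    with open(path) as fh:
        return json.load(fh)


def missing_endpoints(status_body, expected):
    """Return the expected endpoints that are NOT in this node's connected list.

    Assumes the response shape results[0].status.api.conn_endpoints,
    i.e. a list of endpoint names.
    """
    api = status_body["results"][0]["status"]["api"]
    connected = set(api.get("conn_endpoints", []))
    return sorted(set(expected) - connected)
```

Usage: `missing_endpoints(load_status("status.json"), EXPECTED_ON_MASTER01)`. A non-empty list names endpoints the queried node does not currently consider connected; running the same check on icinga-master02 (with its own expected list) gives the other side's view.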
Now the odd behaviour: when the icinga2 service is running on both masters, some services do not seem to work and no notifications are sent for them. Everything looks OK and there are no alerts, but I have confirmed experimentally that services are randomly "registered" at either master01 or master02. The problem is that some services do not seem to run their checks at all: although the service appears OK, it will not notify if there is a problem, because it never receives new check results. In the Icinga Web 2 interface, the "last check" and "next check" values just keep growing.
When I shut down the icinga2 service on master02, the system is operational again and all checks are executed correctly, but then there is no HA.
Another weird behaviour: checks executed via the satellite and the HTTP checker machine sometimes send two notifications, both when a web page goes down and when it comes back up. Since only master01 is online at that point, both notifications come from master01.
Also, when I reconfigure master01 and restart the icinga2 service on it, an alert fires for about 10 minutes saying that the CPU on both master01 and master02 is maxed out. The strange part is that the icinga2 service on master02 is stopped at the time.
I can provide any logs or configuration files. I have not been able to find a fix, although I have looked for every possible cause I could think of. I am not sure whether both Icinga 2 masters may write to the same IDO database, even though I have not found any article saying otherwise. There may also be a problem with the PKI handling on the machines. I used the community Chef cookbook to manage the Icinga configuration, plus some custom code to configure the clients and satellites in the distributed environment. It seemed to work until I added the satellite with the HTTP checker node.
I would be grateful for any hints here.
Thanks in advance.