I’ve been running an Icinga cluster in an HA configuration, monitoring 9k servers and around 250k services.
As time went by, the icingaweb2 interface started to slow down, taking a considerable amount of time to render information, especially when querying the details of a service.
Looking at DB performance data, the number of slow queries increases every time a configuration item gets updated (this happens all day long as people request changes).
Our DBA team recommends either optimizing the slow queries or splitting read queries between the MySQL master and slave nodes.
The slave node shows no CPU, load or DB utilization at all.
Looking at the logs, I can see the “Your database is not able to keep up” message once in a while.
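To put a number on "once in a while", the message could be counted per hour. Below is a minimal sketch, not something taken from the Icinga tooling: it assumes each log line starts with a bracketed timestamp such as `[2016-05-01 12:34:56 +0200]`, and the function name `count_per_hour` is made up for illustration.

```python
import re
from collections import Counter

# Assumption: log lines begin with "[YYYY-MM-DD HH:MM:SS +ZZZZ]".
# Adjust the pattern if your log layout differs.
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2}")

# The message text as it appears in the log.
NEEDLE = "Your database is not able to keep up"


def count_per_hour(lines):
    """Return a Counter mapping 'YYYY-MM-DD HH' to the number of matching lines."""
    counts = Counter()
    for line in lines:
        if NEEDLE not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            # Bucket by date and hour only.
            counts[f"{match.group(1)} {match.group(2)}"] += 1
    return counts
```

It can be fed an open log file directly, for example `count_per_hour(open(path))`, where `path` points at wherever the icinga2 log lives on your system. Comparing the busy hours with the times configuration changes are deployed would show whether the two are related.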
DB specs, both Master and Slave:
8 core CPU.
32 GB RAM.
icinga2 and ido-mysql version 2.4.
A few questions before moving forward with building a MySQL Galera solution, where multiple “masters” are available for read and write operations:
Has anyone encountered this situation before?
Is it possible to configure the ido-mysql plugin to split reads and writes onto different MySQL nodes?
Have any query optimization fixes been added in the latest versions of Icinga? I may look into upgrading, but it will cause me trouble if legacy support for the Graphite plugin has been dropped (we use that schema in our prod environment).
Any recommendations on how to keep up with this increasing load? We’re planning to add another 250k checks into the mix.
Thank you all for reading!