A colleague and I are currently working on a monitoring platform and we have run into a problem.
System: RHEL / CentOS 7
Icinga2: 2.6.3
Our current architecture is a master zone (a cluster of masters) that holds the configuration, plus three child zones, each with its own cluster of checkers.
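To make the layout concrete, here is a rough sketch of what such a zones.conf looks like. All hostnames and zone names are placeholders, not our real ones, and two endpoints per zone is assumed. Only one child zone is shown; the other two follow the same pattern.

```
// Master zone: a cluster of masters holding the configuration (placeholder names)
object Endpoint "master1.example.org" { }
object Endpoint "master2.example.org" { }

object Zone "master" {
  endpoints = [ "master1.example.org", "master2.example.org" ]
}

// One of the three child zones, with its own cluster of checkers (placeholder names)
object Endpoint "checker-a1.example.org" { }
object Endpoint "checker-a2.example.org" { }

object Zone "zone-a" {
  endpoints = [ "checker-a1.example.org", "checker-a2.example.org" ]
  parent = "master"
}
```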
The problem is that one of these zones already has too many checks to perform, and its checkers are already at almost 100% CPU load.
The context:
- we can't scale those checkers up any further (already 24 vCPUs per checker)
- we can't put 3 checkers in a zone, because Icinga2 itself emits a warning about poor CPU performance in that case
- we can't increase the interval between checks
- we tried a three-level architecture (with satellites and clients), but satellites aren't supposed to perform checks
How can we deal with this issue?
Why can't Icinga2 balance the load between more than two endpoints?
Any help would be much appreciated.