When I wrote "2 VMs with low load", "low load" meant that not all services were running yet, which kept the VMs under low load.
That is exactly what I understood. Let us call this constant number of checks "work", not "load".
So, if I look at the screenshot, I understand that we have a constant amount of work to be done between 7/16 16:00 and 7/19 8:00.
Before 7/17 12:00 that work was done by 2 VMs, after that by 3 VMs. In the chart, I see lots of colored graphs without
a legend. I take these graphs to be the individual CPU loads of the VMs, in %.
Some of these CPU loads are high, some are low.
Looking at the topmost orange and cyan graphs, I would expect from a bug-free icinga2 that this CPU load decreases with a clear edge once 3 VMs instead of 2 are doing that constant work (I would have expected something like the bottommost grey graph, but I take that one to be "behind" the others).
But I do not see that decrease.
Instead, I see a big jump, which then settles down to a smaller, but still present, increase of the CPU load.
That is what I interpret to be the "currently not more than 2 endpoints suggested" bug.
Maybe I am still completely on the wrong track here, but you must admit that, from looking at the screenshot alone,
there is some room for interpretation.
None of that addresses your problem that arises *after* 7/19 8:00, when the system is given all the work to be done.
I read that you have about 14,000 checks per minute, each running snmpget from a Python interpreter, on a 24-core VM. That is roughly 233 interpreter/process startups per second.
Can you tell how much of the CPU load is attributable to the python and snmpget processes, and how much to the icinga2 processes?
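If it helps, here is a rough sketch for getting that breakdown, assuming a Linux VM where procps `ps` is available; it sums the %CPU column per command name, so you can compare the python/snmpget share against the icinga2 share. The process names in the output are whatever runs on your machine.

```python
# Sketch: sum %CPU per command name from `ps` output (assumes Linux/procps).
import subprocess
from collections import defaultdict

def cpu_by_command(ps_output=None):
    """Return {command name: summed %CPU}; runs `ps` itself if no text is given."""
    if ps_output is None:
        ps_output = subprocess.run(
            ["ps", "-eo", "comm,pcpu", "--no-headers"],
            capture_output=True, text=True, check=True,
        ).stdout
    totals = defaultdict(float)
    for line in ps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            # first token is the command name, last token the %CPU value
            totals[parts[0]] += float(parts[-1])
    return dict(totals)

if __name__ == "__main__":
    for name, pct in sorted(cpu_by_command().items(), key=lambda kv: -kv[1])[:10]:
        print(f"{name:20s} {pct:6.1f}%")
```

Note that %CPU from `ps` is averaged over the process lifetime; for a momentary view, a few samples from `top` or `pidstat` give a better picture.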
If the major fraction of the load is attributable to the python/snmpget processes, it might be worth rewriting these checks in C, or
finding a better library that runs more efficiently.
Also, it might be worth having a daemon in place that accepts the requests and queues them up, so that the work can be done by a fixed number of pre-created worker threads.
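A minimal sketch of what I mean, in Python since that is what your checks use. The `snmp_check` function, the host/OID arguments, and the community string are placeholders; the point is the fixed pool pulling work from a queue instead of spawning a fresh process per check.

```python
# Sketch: fixed pool of pre-created worker threads consuming a task queue.
import queue
import subprocess
import threading

NUM_WORKERS = 24  # e.g. one per core; tune for the machine

def snmp_check(host, oid):
    """Illustrative check: shell out to snmpget once, return (rc, output)."""
    proc = subprocess.run(
        ["snmpget", "-v2c", "-c", "public", host, oid],
        capture_output=True, text=True,
    )
    return proc.returncode, proc.stdout.strip()

def worker(tasks, results):
    """Run queued (func, args) tasks until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:
            tasks.task_done()
            break
        func, args = item
        try:
            results.put((args, func(*args)))
        except Exception as exc:   # report the failure, keep the worker alive
            results.put((args, exc))
        tasks.task_done()

def run_pool(requests):
    """Feed (func, args) requests to a fixed worker pool and collect results."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results), daemon=True)
        for _ in range(NUM_WORKERS)
    ]
    for t in threads:
        t.start()
    for req in requests:
        tasks.put(req)
    for _ in threads:
        tasks.put(None)            # one shutdown sentinel per worker
    tasks.join()
    return [results.get() for _ in range(results.qsize())]
```

A real daemon would keep the pool alive permanently and accept requests over a socket or pipe; `run_pool` only shows the queue/worker part. Since snmpget is I/O-bound, threads are fine here despite the GIL.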