Icinga2 - Distributed Mode - Architecture with more than two nodes per zone

  • When I wrote 2 VMs with low load, low load means that not all services are running yet, keeping the VMs under low load

    That is exactly what I understood. Let us call this constant number of checks "work", not "load".


    So, if I look at the screenshot, I understand that we have a constant amount of work to be done between 7/16 16:00 and 7/19 8:00.

    Before 7/17 12:00 that work was done by 2 VMs, after that by 3 VMs. In the chart, I see lots of colored graphs without a legend. I consider these graphs to be the individual CPU loads of each VM in %.


    So there are both high and low CPU loads.


    Looking at the topmost orange and cyan graphs, I would expect from a bug-free Icinga2 that this CPU load would decrease with a noticeable edge once 3 instead of 2 VMs are doing that constant work (I would have expected something like the bottom-most grey graph, but I consider that one to be "behind" the others).


    But I do not see that decrease.

    Instead, I see a big increase, which then lowers to a still small increase of the CPU load.

    Which is what I interpret to be the "currently not more than 2 endpoints suggested" bug.


    Maybe I am still on the wrong track, but you must admit that from looking at the screenshot alone
    there is some room for interpretation.


    None of that addresses your problem, which arises *after* 7/19 8:00 when the system is given all of the work to be done.


    I read that you have about 14000 checks per minute running snmpget in a Python interpreter on a 24-core VM.
    Can you tell how much of the CPU load is attributable to the Python and snmpget processes and how much to the Icinga processes?
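
    One quick way to get that breakdown is to aggregate CPU usage by process name over a short window, for example with a small script like the sketch below. This is only a sketch and assumes the psutil package is installed on the VM; very short-lived snmpget processes that start and exit inside the sampling window will be missed, so repeated sampling (or looking at accumulated CPU time) would be more accurate.

        import time
        from collections import defaultdict

        import psutil

        def cpu_by_name(interval=5.0):
            # prime the per-process CPU counters, then measure over 'interval' seconds
            procs = list(psutil.process_iter(['name']))
            for p in procs:
                try:
                    p.cpu_percent(None)
                except psutil.Error:
                    pass
            time.sleep(interval)
            totals = defaultdict(float)
            for p in procs:
                try:
                    totals[p.info['name']] += p.cpu_percent(None)
                except psutil.Error:
                    pass  # process exited during the window
            return totals

        if __name__ == '__main__':
            # print the busiest process names, e.g. icinga2, python, snmpget
            for name, pct in sorted(cpu_by_name().items(), key=lambda x: -x[1]):
                if pct > 0.5:
                    print('{:<20} {:6.1f} %'.format(name, pct))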

    If the major fraction of the load is attributable to the python/snmpget processes, it might be worth rewriting these in C or finding a better library that runs more efficiently.


    It might also be worth having a daemon in place that accepts the requests and queues them up, so that the work can be done by a fixed number of pre-created worker threads, as sketched below.
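
    For illustration, a minimal Python 3 sketch of that idea; the pool size, the snmpget arguments and the submit/done interface are made up for this example and not taken from your setup:

        import queue
        import subprocess
        import threading

        NUM_WORKERS = 8                     # fixed pool size, tune to the VM
        requests = queue.Queue()

        def worker():
            while True:
                host, oid, done = requests.get()
                try:
                    # one snmpget per queued request; the daemon itself stays resident
                    out = subprocess.run(
                        ['snmpget', '-v2c', '-c', 'public', host, oid],
                        capture_output=True, text=True, timeout=10)
                    done(out.returncode, out.stdout)
                except subprocess.TimeoutExpired:
                    done(3, 'UNKNOWN - snmpget timed out')
                finally:
                    requests.task_done()

        def submit(host, oid, done):
            """Queue a check request; 'done' is called with (exit_code, output)."""
            requests.put((host, oid, done))

        # pre-create the workers once, at daemon start-up
        for _ in range(NUM_WORKERS):
            threading.Thread(target=worker, daemon=True).start()

    How Icinga2 would hand requests to such a daemon (local socket, named pipe, batching) is left open here; the point is only that the fork/exec cost per check disappears.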


  • Ok, so we're on the same page here :)


    From what we can see, the Icinga2 process and the checks generate a peak CPU usage of about 85-90% under full work load. These 85-90% can be split like this:

    • 10-15%: Icinga2 process
    • 70-75%: Python + snmpget for the checks

    Our VMs in our production environment also have the same work load, but run in a 2-VM cluster (against the 3 VMs here). The Icinga2 part of the CPU usage there is about 10%.


    Icinga2 does indeed require a little more CPU time while running in a 3-VM cluster, but it looks like nothing we should worry about.
    I just ordered another VM from our system team; I'll submit the results for a 4-VM cluster as soon as I have them.

  • When writing our first scripts, we compared standard Python with system calls, PySNMP, and Perl. Execution times were almost identical for standard Python and Python with the PySNMP library. We decided to go for standard Python.
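
    For reference, the "standard Python with a system call" variant presumably looks roughly like the sketch below; the host, OID, community string and plugin exit codes are illustrative and not taken from this thread.

        import subprocess
        import sys

        def snmp_get(host, oid, community='public'):
            # -Oqv: print only the value, which keeps parsing trivial
            out = subprocess.check_output(
                ['snmpget', '-v2c', '-Oqv', '-c', community, host, oid],
                universal_newlines=True, timeout=10)
            return out.strip()

        if __name__ == '__main__':
            host, oid = sys.argv[1], sys.argv[2]
            try:
                value = snmp_get(host, oid)
            except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
                print('UNKNOWN - snmpget failed for %s %s' % (host, oid))
                sys.exit(3)             # Nagios/Icinga plugin code for UNKNOWN
            print('OK - %s = %s' % (oid, value))
            sys.exit(0)

    Per check, most of the cost is starting a fresh Python interpreter plus an snmpget process rather than the SNMP query itself, which is why the worker-pool idea mentioned above could help.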

  • We decided to go for standard Python

    which would have been my decision as well then, because that decreases the dependencies of the Python scripts.

    Anyhow, the 70-80% on the actual checking side makes clear that this is the bottleneck to eliminate.