Icinga2 Bandwidth usage

  • Hi,


    I am going to implement a monitoring solution with Icinga 2 for my firm and I need to know how much bandwidth it is going to use. The solution has to monitor about 10000 hosts with 10 services per host, and I am worried about the bandwidth utilization. The traffic has to cross some firewalls, so it is important not to saturate the bandwidth currently available, or to increase it if necessary.


    With that in mind, how much bandwidth does a check command use (an estimate will be fine)? I will use the following types of checks:

    ping, CPU, RAM, SNMP, and some Windows and Linux services.


    Many thanks for your help.


    JM

  • Hi,


    Here is an example from my current proof of concept. I don't know if it will be relevant to you because my setup is far smaller than yours.

    However, it could give you a rough idea.


    I have 13 hosts and around 65 services that are checked every 30 seconds.

    All the checks are scheduled on the client side and only the results are sent back to the master.


    Here is a graph of the bandwidth measured on the master.

    As you can see, it sends more than it receives.

    The amount is quite small (a few kbit/s).




    As I said, it has to be compared with your needs.

    But it could give you some idea.

  • Hi Pcasis,


    Thank you for your answer. With this information, 65 services and 87 Kbps of aggregate traffic, that means 1.33 Kbps per service. In my case (10000 hosts x 10 services = 100000 services, split across 4 satellites plus one master, 24000 for each satellite) it would be 24000 * 1.33 = 31920 Kbps, which means about 31 Mbps per satellite and 31.92 * 4 = about 127 Mbps for the master. 31 Mbps seems like a lot of bandwidth, and I think I am making a mistake or missing something. Any idea?


    How did you configure your PoC to obtain this information, and which check do you use? check_itraffic via SNMP? I am trying to build my own PoC to test this.


    In my case I will run each check every 5 minutes.


    Any advice will be appreciated.


    Thanks!
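
The extrapolation in this post can be written out as a short Python sketch. It uses only the figures quoted in the thread (87 Kbps, 65 services, 30-second checks, 24000 services per satellite, 4 satellites, a planned 5-minute interval). The function names are made up for illustration, and the rescaling step rests on a loud assumption: that traffic grows linearly with check frequency and with the number of services, which ignores keepalives, replication, and differences in check output size.

```python
# Figures quoted in the thread.
POC_AGGREGATE_KBPS = 87.0        # aggregate traffic measured on the PoC master
POC_SERVICES = 65                # services in the PoC
POC_CHECK_INTERVAL_S = 30        # PoC check interval
SERVICES_PER_SATELLITE = 24_000  # figure used in the post
SATELLITES = 4
PLANNED_CHECK_INTERVAL_S = 300   # "each check every 5 minutes"


def per_service_kbps(aggregate_kbps: float, services: int) -> float:
    """Average traffic per service, assuming every service costs the same."""
    return aggregate_kbps / services


def rescale_for_interval(kbps: float, measured_interval_s: int,
                         planned_interval_s: int) -> float:
    """ASSUMPTION: traffic is proportional to how often checks run."""
    return kbps * measured_interval_s / planned_interval_s


def satellite_mbps(check_interval_s: int) -> float:
    """Naive linear estimate of the traffic for one satellite, in Mbps."""
    rate = per_service_kbps(POC_AGGREGATE_KBPS, POC_SERVICES)
    rate = rescale_for_interval(rate, POC_CHECK_INTERVAL_S, check_interval_s)
    return rate * SERVICES_PER_SATELLITE / 1000


for interval in (POC_CHECK_INTERVAL_S, PLANNED_CHECK_INTERVAL_S):
    per_sat = satellite_mbps(interval)
    print(f"{interval:>3}s interval: {per_sat:.1f} Mbps per satellite, "
          f"{per_sat * SATELLITES:.1f} Mbps at the master")
```

Under these assumptions the 30-second figure comes out at roughly 32 Mbps per satellite, in line with the estimate in the post, while a 5-minute interval would give about a tenth of that. Treat both numbers as order-of-magnitude guesses, not as a sizing result.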

  • How do you configure your PoC to obtain this information,

    I'm using Observium (http://www.observium.org/) to get those graphs; it uses SNMP to fetch the data. I don't know which counter it uses.


    Beyond that, I don't know if we can draw a conclusion about how many Kbps are used per service. Keep in mind that my architecture is a distributed one.


    Which means, I guess, that the master and the clients are exchanging data to keep the connection between them alive (can someone confirm this?).
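
For anyone who wants to reproduce such a measurement without a graphing tool, the sketch below shows how a bandwidth figure is usually derived from two samples of an interface octet counter. The thread does not say which counter Observium reads, so the use of a 32-bit octet counter (such as IF-MIB ifInOctets or ifOutOctets) is an assumption, and the function names are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # 32-bit SNMP counters wrap around at this value


def counter_delta(previous: int, current: int,
                  modulus: int = COUNTER32_MODULUS) -> int:
    """Octets transferred between two samples, tolerating one counter wrap."""
    return (current - previous) % modulus


def rate_kbps(previous_octets: int, current_octets: int,
              seconds: float, modulus: int = COUNTER32_MODULUS) -> float:
    """Average rate in kilobits per second between two counter samples."""
    if seconds <= 0:
        raise ValueError("the sampling interval must be positive")
    octets = counter_delta(previous_octets, current_octets, modulus)
    return octets * 8 / seconds / 1000


# Hypothetical samples taken 30 seconds apart (values invented for the example).
print(rate_kbps(1_000_000, 1_326_250, 30))
```

Sampling the send and receive counters separately on the master is what would show the asymmetry described earlier in the thread.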

  • Check output and performance data normally determine the size of those messages, and large ones cause a lot of bandwidth usage. It isn't a reasonable comparison to scale a 65-service setup up to 10000 hosts; sizes vary and so does the data. If, for example, you are running 10000 service checks every minute with a check output of 4 KB each, do the math. Satellite zones will replicate check result messages and other things needed to stay 1:1 the same, and of course send such events back to the parent zone. Otherwise you won't achieve parity and might lag behind on recent data.


    IMHO you should set up a test lab: build 2 HA masters and 2 satellites, and load them with some dummy checks which produce a fair amount of check output and perfdata. Then do your measurements. Plan wisely and adjust check_interval and retry_interval to real-world values too.


    If you're looking for examples, the Icinga Vagrant boxes contain some configurations and examples of how this could be done in a simple fashion.
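
As a starting point for the "do the math" example and the dummy checks suggested in this reply, here is a minimal Python sketch. It builds a dummy check result in the usual plugin format (text, a pipe, then perfdata) and estimates the payload traffic for a given number of checks per minute. The helper names and the metric labels are hypothetical, 4 KB is taken as 4096 bytes, and the estimate counts only the check output itself: protocol framing, encryption overhead, and replication between endpoints are not included, so real traffic will be higher.

```python
def build_check_output(text_bytes: int, perfdata_count: int) -> str:
    """Build a dummy plugin output line: 'TEXT | label=value label=value ...'."""
    prefix = "OK - dummy check "
    # Pad the text part with filler so its length is about text_bytes.
    text = prefix + "x" * max(0, text_bytes - len(prefix))
    perfdata = " ".join(f"metric{i}={i}" for i in range(perfdata_count))
    return f"{text} | {perfdata}" if perfdata else text


def payload_kbps(checks_per_minute: int, bytes_per_result: int) -> float:
    """Payload-only traffic in kilobits per second (no protocol overhead)."""
    return checks_per_minute * bytes_per_result * 8 / 60 / 1000


# The example from the reply: 10000 checks per minute, 4 KB of output each.
print(f"{payload_kbps(10_000, 4096) / 1000:.2f} Mbps of check output alone")

# Size of one generated dummy result, fed back into the estimate.
sample = build_check_output(text_bytes=200, perfdata_count=10)
print(len(sample.encode()), "bytes per result ->",
      f"{payload_kbps(10_000, len(sample.encode())):.1f} Kbps")
```

To use this in a test lab, a plugin script would print the generated string and exit with status 0 (OK); varying the text size and the number of perfdata entries then shows how strongly the output size drives the measured traffic.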