Icinga2 virtual machine 100% CPU

icinga2
icingaweb2

(Andrei Muntean) #1

Hi Everyone,

I have migrated an Icinga 1 configuration to Icinga 2.
But for some reason the CPU usage stays above 90% most of the time, and often hits 100%.
We monitor ~3000 hosts and ~6000 services. The ~3000 active checks run every 30 minutes and the ~3000 passive checks come in every 10 minutes.
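
To put those numbers in perspective, here is a rough back-of-the-envelope estimate of the average check rate. It is only a sketch: it assumes the checks are spread evenly over their intervals and uses the approximate counts quoted above; the names are made up for illustration.

```python
# Rough load estimate from the approximate figures in the post.
ACTIVE_CHECKS = 3000
ACTIVE_INTERVAL_S = 30 * 60   # active checks run every 30 minutes
PASSIVE_CHECKS = 3000
PASSIVE_INTERVAL_S = 10 * 60  # passive results arrive every 10 minutes


def checks_per_second(count: int, interval_s: int) -> float:
    """Average rate, assuming checks are spread evenly over the interval."""
    return count / interval_s


active_rate = checks_per_second(ACTIVE_CHECKS, ACTIVE_INTERVAL_S)
passive_rate = checks_per_second(PASSIVE_CHECKS, PASSIVE_INTERVAL_S)
total_rate = active_rate + passive_rate

print(f"active:  {active_rate:.2f} checks/s")
print(f"passive: {passive_rate:.2f} checks/s")
print(f"total:   {total_rate:.2f} checks/s")
```

Under these assumptions that is fewer than seven check results per second on average.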

The following run on this machine:
icinga2, icingaweb2, mysql, apache2 and jasperserver (jasperserver is not the cause; even if I stop it, I get the same high CPU usage)

The virtual machine has 12 vCPUs and 32 GB of RAM.
Are 12 vCPUs too few for a configuration like this?
If I stop the icinga2 service, the CPU usage goes down to 1%.

Can someone advise?

Thank you in advance!


#2

Using top or something similar to narrow down which process(es) are consuming the CPU would be my first step.
Perhaps your MySQL database is part of the problem.


(Rafael Voss) #3

Hi,

On my master, the more complex checks use the most CPU. So I moved those checks to a separate “checker node” to get better performance for the web interface and for Icinga itself.

You can use

accton on

wait some time, e.g. 24 hours, then

sa --percentages --separate-times

Gives something like:

 38610  100.00%  274747.72re  100.00%      55.23u  100.00%       7.03s  100.00%         0avio      6774k
      19    0.05%   15484.65re    5.64%      11.69u   21.17%       2.69s   38.22%         0avio     71644k   apache2*
    2327    6.03%      34.79re    0.01%      10.62u   19.23%       1.15s   16.37%         0avio     16274k   check_iftraffic
     984    2.55%       9.44re    0.00%       7.90u   14.31%       0.29s    4.14%         0avio     10576k   check_apc.pl
    1404    3.64%       9.13re    0.00%       6.33u   11.47%       0.68s    9.70%         0avio      6638k   snmpget
     480    1.24%       5.90re    0.00%       4.44u    8.04%       0.48s    6.83%         0avio     23856k   check_printer
      83    0.21%      13.30re    0.00%       3.82u    6.92%       0.27s    3.78%         0avio     25934k   check_esxi_hard
     888    2.30%       7.00re    0.00%       3.68u    6.66%       0.36s    5.17%         0avio     15351k   check_fortigate
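
If the list is long, a small script can rank the commands by their share of user CPU time. This is only a sketch: it assumes the column layout shown above (11 whitespace-separated fields per command row, 10 on the totals row), and the names SaRow, parse_sa_line and top_by_user_cpu are made up for illustration.

```python
from typing import List, NamedTuple, Optional


class SaRow(NamedTuple):
    command: str
    calls: int
    user_time: float  # value of the "u" column
    user_pct: float   # percentage following the "u" column
    sys_time: float   # value of the "s" column
    sys_pct: float    # percentage following the "s" column


def parse_sa_line(line: str) -> Optional[SaRow]:
    """Parse one row; return None for the totals row and blank lines."""
    fields = line.split()
    if len(fields) != 11:  # the totals row has no command name (10 fields)
        return None
    return SaRow(
        command=fields[10],
        calls=int(fields[0]),
        user_time=float(fields[4].rstrip("u")),
        user_pct=float(fields[5].rstrip("%")),
        sys_time=float(fields[6].rstrip("s")),
        sys_pct=float(fields[7].rstrip("%")),
    )


def top_by_user_cpu(text: str) -> List[SaRow]:
    """Return all command rows, highest user CPU share first."""
    rows = [r for r in map(parse_sa_line, text.splitlines()) if r is not None]
    return sorted(rows, key=lambda r: r.user_pct, reverse=True)


# First rows of the output above, pasted in as sample input.
SAMPLE = """\
 38610  100.00%  274747.72re  100.00%      55.23u  100.00%       7.03s  100.00%         0avio      6774k
    19    0.05%   15484.65re    5.64%      11.69u   21.17%       2.69s   38.22%         0avio     71644k   apache2*
  2327    6.03%      34.79re    0.01%      10.62u   19.23%       1.15s   16.37%         0avio     16274k   check_iftraffic
   984    2.55%       9.44re    0.00%       7.90u   14.31%       0.29s    4.14%         0avio     10576k   check_apc.pl
"""

rows = top_by_user_cpu(SAMPLE)
for row in rows:
    print(f"{row.user_pct:6.2f}% user  {row.sys_pct:6.2f}% sys  {row.command}")
```

To use it on real data, redirect the sa output to a file and pass the file contents to top_by_user_cpu instead of SAMPLE.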

More details here:


(Andrei Muntean) #5

Hi,

I have checked with the top command, but I don’t get much out of it.
There are a lot of apache2 processes; can this be a cause?
Also, icinga2 uses the most CPU: 600-1000%.

I have tried different optimisation settings for apache2, but the result is the same.

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2017 nagios 20 0 6146104 272464 18648 S 1124 0.8 51:20.60 icinga2
929 mysql 20 0 10.171g 1.179g 15380 S 9.3 3.8 0:12.66 mysqld
21957 icinga 20 0 76220 34104 4664 S 5.3 0.1 0:00.16 check_wmi_plus.
1651 www-data 20 0 496148 68088 14804 S 1.7 0.2 0:02.25 apache2
1624 www-data 20 0 452908 28596 18792 S 0.7 0.1 0:00.95 apache2
1649 www-data 20 0 477692 49448 14644 S 0.7 0.2 0:02.54 apache2
5437 www-data 20 0 459216 30692 14548 S 0.7 0.1 0:00.53 apache2
342 root 20 0 54416 12260 11660 S 0.3 0.0 0:02.28 systemd-journal
701 root 20 0 9810.4m 1.010g 16676 S 0.3 3.2 1:21.72 java
1563 www-data 20 0 475624 47380 14656 S 0.3 0.1 0:02.74 apache2
1564 www-data 20 0 459216 33904 17684 S 0.3 0.1 0:01.27 apache2
1587 www-data 20 0 477728 51756 16908 S 0.3 0.2 0:01.72 apache2
1647 www-data 20 0 475664 47300 14536 S 0.3 0.1 0:02.55 apache2
1655 www-data 20 0 455056 26312 14472 S 0.3 0.1 0:00.50 apache2
1730 www-data 20 0 475804 53404 20540 S 0.3 0.2 0:02.05 apache2
1834 www-data 20 0 453172 26280 16100 S 0.3 0.1 0:00.76 apache2
2045 nagios 20 0 994028 6592 4760 S 0.3 0.0 0:04.94 icinga2
5440 www-data 20 0 459248 31028 14852 S 0.3 0.1 0:00.62 apache2
8288 www-data 20 0 453084 27604 17572 S 0.3 0.1 0:00.43 apache2
12190 www-data 20 0 455056 26520 14684 S 0.3 0.1 0:00.38 apache2
12897 www-data 20 0 453200 32892 22556 S 0.3 0.1 0:00.66 apache2
15141 www-data 20 0 455124 26752 14700 S 0.3 0.1 0:00.30 apache2
20346 www-data 20 0 455244 33152 21024 S 0.3 0.1 0:00.10 apache2
20347 www-data 20 0 455100 32712 20700 S 0.3 0.1 0:00.07 apache2
20378 www-data 20 0 494248 66032 14896 S 0.3 0.2 0:00.81 apache2
20476 www-data 20 0 459176 30624 14628 S 0.3 0.1 0:00.23 apache2
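
For reference, the %CPU figures can be converted into a share of the whole machine. This is a small sketch that assumes top is running in its default mode, where 100% corresponds to one fully used core, and that all 12 vCPUs are available to the guest; the function name is made up for illustration.

```python
N_VCPUS = 12  # from the VM description earlier in the thread


def share_of_machine(top_cpu_pct: float, n_cpus: int) -> float:
    """Convert a per-core %CPU value from top into % of total CPU capacity."""
    return top_cpu_pct / n_cpus


# %CPU values of the three busiest processes in the listing above.
samples = {"icinga2": 1124.0, "mysqld": 9.3, "check_wmi_plus.": 5.3}

for name, pct in samples.items():
    print(f"{name:16s} {share_of_machine(pct, N_VCPUS):5.1f}% of the machine")
```

On those assumptions, the icinga2 process alone accounts for roughly 94% of the available CPU in this snapshot, while the apache2 workers are negligible.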


(Andrei Muntean) #6

Hi,

Using the command you gave me, I get the following values:

175855 100.00% 21216.83re 100.00% 2999.08u 100.00% 34.09s 100.00% 0avio 5101k
10 0.01% 1099.92re 5.18% 2938.01u 97.96% 22.45s 65.87% 0avio 707622k icinga2
13207 7.51% 760.90re 3.59% 34.38u 1.15% 3.10s 9.08% 0avio 19041k check_wmi_plus.
1529 0.87% 6408.44re 30.20% 20.16u 0.67% 2.19s 6.42% 0avio 41186k apache2*
10 0.01% 1099.92re 5.18% 1.34u 0.04% 5.85s 17.15% 0avio 241894k icinga2*
12592 7.16% 227.59re 1.07% 4.19u 0.14% 0.03s 0.09% 0avio 4290k snmpget
116 0.07% 0.87re 0.00% 0.45u 0.02% 0.01s 0.04% 0avio 9120k apt-get
13210 7.51% 933.25re 4.40% 0.24u 0.01% 0.05s 0.16% 0avio 5372k wmic
41653 23.69% 1214.89re 5.73% 0.01u 0.00% 0.26s 0.76% 0avio 3596k check_icmp
587 0.33% 40.68re 0.19% 0.22u 0.01% 0.01s 0.03% 0avio 4626k snmpwalk

Could the check_wmi_plus or apache2 processes be causing this?


(Rafael Voss) #7

The 6th column is the CPU usage (the share of total user CPU time). From a quick look, icinga2 accounts for 97.96% of your CPU time. That’s far too much, I think. As you can see, icinga2 isn’t even in my list. So the bottleneck is icinga2 in my opinion, but it looks more like a configuration error or a bug to me.

But I have to say that my master has a lot fewer checks than yours, as I am using satellites, so I have never tried that many services on one host. It has maybe around 500 checks, running at intervals of at most 5 minutes.


(Andrei Muntean) #9

Thanks for the answer.
I was planning to split the checks across satellites anyway, so I will do it as soon as possible; maybe it will help.