Performance Issue


(Kevin) #1

Hello everyone,

As the admin of an Icinga2 server with around 750 hosts and 10.5k services, I noticed that my service view is getting slower and slower over time (more precisely: every view that includes services).
The system is fully up to date, including PHP, Icinga2 itself and CentOS 7.

We have migrated 31 satellites so far (the number is still increasing), each in its own zone. The satellites are connected over WAN.
The master is a VMware VM with 4 cores, 8 GB RAM and 100 GB of SSD storage.

How can I stop the server from getting slower and slower? Are there any system requirements or recommendations?

All cores idle at 5-15%, but when I open the service view, they jump to 90-95% for 1-2 seconds and the page doesn’t show until then.
Additionally, I’m getting performance peaks every 45 minutes and I don’t know why.

Will this behaviour improve with the upcoming “icinga2 database”?

Do you have any tips to keep the server as clean and fast as possible?

Best regards,
Kevin


(Brian LaVallee) #2

Can you provide more details about your icinga2 environment?
Are you using a localhost database? That could be contributing to the load.
Is the ‘service view’ part of icingaweb2, or a ‘page’ provided by an add-on module?


(Kevin) #3

The server hosts the following:

  • Icinga2 Master
  • MariaDB for Icinga
  • InfluxDB for Grafana, and Grafana

The “service view” is the embedded one, no custom view or module :frowning:

As you can see in my screenshots, the load increased over time, but we changed nothing except the number of satellites.


(Brian LaVallee) #4

Can you share the output of icinga2 feature list?

It’s still not clear what you mean by the ‘service view’. icinga2 only has a CLI. Are you referring to Overview -> Services in the icingaweb2 interface? Or are you viewing a single service in the icingaweb2 interface?

Assuming you’re using Graphite with Grafana, graph generation could be the cause of your system load.


(Kevin) #5

Enabled features: api checker command ido-mysql influxdb mainlog notification

Icingaweb2… My bad…

Nope, sadly not. No graph is getting rendered while listing the services in the web UI.


(Brian LaVallee) #6

So… I have never been too concerned about the load caused by monitoring itself, as long as issues are detected (and reported) in a timely manner.

While the cyclical load could come from any number of cleanup routines, icingaweb2 loading slowly would surely be a concern. I have 2000+ services, and icingaweb2 loads with no issues.

I don’t have the DB on the same server; I use a multi-master configuration with a central DB. Can you offload the DB to a dedicated instance to see if that improves things?


(Michael Friedrich) #7

Try to find out which process is causing these load peaks. My guess is that it’s MySQL, not Icinga.
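A minimal sketch of that check, assuming a Linux host with /proc (as on CentOS 7). It samples each process’s accumulated CPU time (utime + stime from /proc/<pid>/stat) twice and reports which commands used the most CPU in between. Unlike the %CPU column of ps, which is averaged over a process’s whole lifetime, this shows who is busy during a short window, so it can be run while a peak is happening (for example during the 45-minute spikes or while opening the service list). The function names are hypothetical and not part of any Icinga tool; top, htop or pidstat give a similar picture interactively.

```python
import os
import time
from collections import defaultdict


def parse_stat(text):
    """Return (comm, cpu_ticks) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis.
    lparen = text.index("(")
    rparen = text.rindex(")")
    comm = text[lparen + 1:rparen]
    fields = text[rparen + 2:].split()
    # fields[0] is field 3 (state); utime is field 14, stime is field 15.
    utime, stime = int(fields[11]), int(fields[12])
    return comm, utime + stime


def snapshot():
    """Map pid -> (comm, ticks) for every process readable right now."""
    snap = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                snap[int(entry)] = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited between listdir() and open()
    return snap


def cpu_by_command(before, after, interval, hz):
    """Percent of one core used per command name during the interval."""
    totals = defaultdict(float)
    for pid, (comm, ticks) in after.items():
        # Skip processes that started during the interval or whose pid
        # was reused by a different command.
        if pid in before and before[pid][0] == comm:
            delta = ticks - before[pid][1]
            totals[comm] += 100.0 * delta / hz / interval
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second, usually 100
    interval = 2.0
    first = snapshot()
    time.sleep(interval)
    second = snapshot()
    for comm, pct in cpu_by_command(first, second, interval, hz)[:5]:
        print(f"{comm:20s} {pct:6.1f}%")
```

Because /proc/<pid>/stat sums all threads of a process, a multi-threaded daemon such as mysqld can show more than 100% here.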


(Carsten Köbke) #8

It could also be caused by InfluxDB, i.e. the standard retention job that runs periodically.
Also, I would move InfluxDB + Grafana to a separate host (InfluxDB should have 16 GB+ of memory), and check whether you have configured “tsi1” or “inmem” as the index. Since your database will keep growing without a retention policy (99.9% of people don’t configure one), it’s better to switch to “tsi1” for InfluxDB. If you switch to tsi1, don’t forget to build the index files for old data, because InfluxDB creates them only for new data and leaves the old data on “inmem”. See https://docs.influxdata.com/influxdb/v1.7/tools/influx_inspect/#buildtsi for more details about building TSI indexes.
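A small sketch of the “tsi1” vs “inmem” check, assuming InfluxDB 1.x with the package default paths (/etc/influxdb/influxdb.conf, /var/lib/influxdb/data and /var/lib/influxdb/wal; adjust if yours differ). It reads the index-version key from the [data] section, falling back to inmem (the 1.x default) when the key is absent, and prints the influx_inspect buildtsi command for converting existing shards. The usual sequence is: set index-version = "tsi1", stop influxd, run buildtsi as the user that owns the data files (typically influxdb, assumed here), then start influxd again. The helper names are hypothetical, and the config parsing is a simple line scan, not a full TOML parser.

```python
import shlex

CONF_PATH = "/etc/influxdb/influxdb.conf"  # assumed package default


def index_version(conf_text):
    """Return [data] index-version from influxdb.conf text (default: inmem)."""
    section = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("["):
            # "[data]" -> "data"; "[[udp]]" -> "udp"
            section = line.strip("[]").strip()
            continue
        if section == "data" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "index-version":
                return value.strip("\"'")
    return "inmem"  # InfluxDB 1.x default when the key is not set


def buildtsi_command(datadir="/var/lib/influxdb/data",
                     waldir="/var/lib/influxdb/wal"):
    """influx_inspect call that rebuilds TSI index files for existing shards."""
    return ["influx_inspect", "buildtsi", "-datadir", datadir, "-waldir", waldir]


if __name__ == "__main__":
    with open(CONF_PATH) as f:
        version = index_version(f.read())
    print(f"index-version: {version}")
    if version == "tsi1":
        # Stop influxd first, then run this as the data-owning user.
        print("sudo -u influxdb " + shlex.join(buildtsi_command()))
    else:
        print('Set index-version = "tsi1" under [data] to switch indexes.')
```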


(Kevin) #9

@poing We use a single server on a VMware cluster for simplicity.

@dnsmichi Looks like it’s the MySQL service.

@Carsten I added 8 GB of RAM, for a total of 16 GB now. Additionally, I added 4 cores, for a total of 8.
Looks like I misread the CPU usage in htop… the icinga2 service is still at 30%, unchanged.

I think the RAM did the trick for now.

The instance has been running for 10 months now, and I want to delete performance data older than 12 months.


(Carsten Köbke) #10

Did you create a retention policy for your database?


(Kevin) #11

Not for now, but I already pinned your post in my favorites :smiley:

Is there a simple way to cut off old data?


(Carsten Köbke) #12

Just create or modify a retention policy to delete all data older than 1 year :slight_smile:
No need for CQs (continuous queries).
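A sketch of the one-year cut-off, assuming InfluxDB 1.x reachable at localhost:8086 without authentication, the Icinga 2 InfluxdbWriter’s default database name icinga2, and that the data sits in the default autogen policy (all assumptions; check with SHOW RETENTION POLICIES ON "icinga2"). InfluxQL durations have no month or year unit, so twelve months is written as 365d (52w would also work). Altering the policy that already holds the data is what removes old points; creating a new DEFAULT policy only affects new writes and leaves the existing data in autogen on disk. Expired data is dropped a whole shard group at a time by the retention enforcement service, so disk usage may take a while to shrink. The helper functions are hypothetical.

```python
import urllib.parse
import urllib.request


def quote_ident(name):
    """Double-quote an InfluxQL identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def retention_statement(database, policy, days, create=False):
    """Build an InfluxQL statement that keeps `days` days of data.

    InfluxQL durations have no month or year unit, so "12 months" is
    expressed in days here.
    """
    if days < 1:
        raise ValueError("retention must be at least one day")
    head = "CREATE" if create else "ALTER"
    stmt = (f"{head} RETENTION POLICY {quote_ident(policy)} "
            f"ON {quote_ident(database)} DURATION {days}d")
    if create:
        # A newly created policy only receives new writes once it is DEFAULT.
        stmt += " REPLICATION 1 DEFAULT"
    return stmt


if __name__ == "__main__":
    # Assumptions: local InfluxDB 1.x without auth, database "icinga2",
    # existing data in the default "autogen" policy.
    q = retention_statement("icinga2", "autogen", 365)
    data = urllib.parse.urlencode({"q": q}).encode()
    # Passing data makes urlopen send a POST, which /query requires for ALTER.
    with urllib.request.urlopen("http://localhost:8086/query", data=data) as resp:
        print(resp.read().decode())
```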