Icinga 2 API Question

Hi Icinga Community,

First, thank you for all the hard work that has gone into the Icinga project, and for the excellent work on the recent update.

Let me first describe my environment: I’m running two master nodes (HA) backed by a separate Galera cluster. Each master host has 32 vCPUs and 64 GB RAM. The satellites (SATs) have 8 vCPUs and 64 GB RAM. We are currently running Icinga 2 version 2.11.2 on the masters and 2.10.5 on the SATs, with an average of ~20k services per SAT zone (7 zones total).

When users set a downtime via the GUI, it appears to be non-responsive and the downtime doesn’t take effect. If the downtime is set via the API, the call appears to succeed, but notifications aren’t suppressed and a page-out still occurs.

I’ve also noticed that when a downtime is set for a large group of services, it may be set successfully, but when the downtime is removed again some services are left untouched.
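
To narrow down which services are affected, something along these lines might help. It is only a rough sketch: it builds the JSON body for the Icinga 2 API action `/v1/actions/schedule-downtime` and compares a list of expected services against the `results` list returned by `GET /v1/objects/downtimes`. The function names and the filter in the usage note are made up for illustration, and sending the requests (host, port, credentials) is left out on purpose.

```python
def build_schedule_downtime(service_filter, author, comment, start, duration_s):
    """Build the JSON body for POST /v1/actions/schedule-downtime.

    service_filter is an Icinga 2 filter expression (hypothetical example
    in the usage note); start is a Unix timestamp.
    """
    start = int(start)
    return {
        "type": "Service",
        "filter": service_filter,
        "author": author,
        "comment": comment,
        "start_time": start,
        "end_time": start + int(duration_s),
        "fixed": True,
    }


def services_without_downtime(expected_services, downtime_results):
    """Return expected services ("host!service") that have no downtime object.

    downtime_results is the "results" list from GET /v1/objects/downtimes;
    each entry's attrs carry host_name and service_name.
    """
    covered = {
        f'{r["attrs"]["host_name"]}!{r["attrs"]["service_name"]}'
        for r in downtime_results
    }
    return sorted(set(expected_services) - covered)
```

For example, `build_schedule_downtime('host.name=="example-host"', "api-user", "maintenance", time.time(), 3600)` gives a one-hour fixed downtime body, and running `services_without_downtime` before and after a removal would show which services were skipped.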

Any ideas or thoughts as to what may be occurring? Could this be a resource limit based on the number of services we are running? We haven’t found anything in the logs that could explain the behavior. Any help would be greatly appreciated.

Hi @bennco1

From what you describe, I’d think there is an issue with reads/writes to the Galera DB (though in the implementations I have done of such a cluster, I find that highly unlikely). Since the web interface reads and updates the DB for presentation, could it be that you have a delay in determining the master for writes?

The API talks directly to the masters, and the change is pushed to the DB only after that, so the read the UI is doing can find the definition faster.

Further, the official forum has moved to https://community.icinga.com/ and the core developers respond to questions there, so I’d recommend trying there if you do not get the answers you seek here.

Regards

Hi,

first of all, welcome to the community. :slight_smile:

I agree with Aflatto; it seems to be a sync issue at the database. Does the following query show any unexpected results or deviations between the nodes in the cluster?

show global status where variable_name in ('wsrep_cluster_size', 'wsrep_cluster_status', 'wsrep_connected', 'wsrep_ready');
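
If you want to compare the output from all nodes in one go, a small helper like this could do it. It is just a sketch: it assumes you have already collected the four values from each node into a dict (how you fetch them is up to you), and the node names and the `find_deviations` function are made up for illustration.

```python
# Values a healthy, synced Galera node is expected to report.
EXPECTED = {
    "wsrep_cluster_status": "Primary",
    "wsrep_connected": "ON",
    "wsrep_ready": "ON",
}


def find_deviations(status_by_node):
    """Return a list of human-readable problems across all nodes.

    status_by_node maps a node name to a dict of wsrep status values,
    e.g. {"db1": {"wsrep_cluster_size": "3", "wsrep_ready": "ON", ...}}.
    """
    problems = []

    # Every node should report the same cluster size.
    sizes = {node: s.get("wsrep_cluster_size") for node, s in status_by_node.items()}
    if len(set(sizes.values())) > 1:
        problems.append(f"wsrep_cluster_size differs between nodes: {sizes}")

    # Every node should match the expected state values.
    for node, status in sorted(status_by_node.items()):
        for name, want in EXPECTED.items():
            got = status.get(name)
            if got != want:
                problems.append(f"{node}: {name} is {got!r}, expected {want!r}")
    return problems
```

An empty list means all nodes agree and report the expected state.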

Cheers,
Marcel

Nope, not at all:

MariaDB [(none)]> show global status where variable_name in ('wsrep_cluster_size', 'wsrep_cluster_status', 'wsrep_connected', 'wsrep_ready');
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_size   | 3       |
| wsrep_cluster_status | Primary |
| wsrep_connected      | ON      |
| wsrep_ready          | ON      |
+----------------------+---------+
4 rows in set (0.00 sec)