Hi, I’m running Icinga2 in my infrastructure, and we normally run with about 3% of services failing. That accounts for nearly 2700 notifications that need to be sent every X minutes, because my integration with an alert collector system requires them as a keep-alive (say X is 5 minutes).
When I enable the notification object, my Icinga master comes under extremely high load, to the point of bringing down the icinga2 process. I can only assume this is Icinga trying to send all the (new) notifications at the same time. The cycle repeats every notification period.
Note that Icinga has been delivering notifications to a chat solution since the beginning, and I don’t see the load peak there. I believe the notification timestamps have spread out over time, so the triggers are less abrupt (is there a way to test this?).
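One way to test the "timestamps have spread out" theory is to look at when each notification is next due and count how many land in the same few seconds. The sketch below assumes the Icinga 2 REST API is enabled on the master (default port 5665), that an API user may read Notification objects, and that the `next_notification` attribute is returned by `/v1/objects/notifications`. The function names are hypothetical, and the CA path in the usage note is only the typical location.

```python
import base64
import json
import ssl
import urllib.request
from collections import Counter


def fetch_next_notifications(base_url, user, password, cafile=None):
    """Return (name, next_notification) pairs from the Icinga 2 REST API.

    Assumes base_url looks like "https://icinga-master:5665" and that the
    API user is permitted to read Notification objects.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/objects/notifications?attrs=next_notification",
        headers={"Accept": "application/json"},
    )
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    # Verify the server certificate against the Icinga CA if a path is given.
    context = ssl.create_default_context(cafile=cafile)
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        data = json.load(response)
    return [(r["name"], r["attrs"]["next_notification"]) for r in data["results"]]


def busiest_windows(timestamps, window_seconds=10, top=5):
    """Count scheduled notifications per window_seconds bucket, busiest first.

    A timestamp of 0 means nothing is scheduled, so it is ignored.
    """
    counts = Counter(
        int(ts // window_seconds) * window_seconds for ts in timestamps if ts > 0
    )
    return counts.most_common(top)
```

Usage would be something like `pairs = fetch_next_notifications("https://icinga-master:5665", "apiuser", "secret", cafile="/var/lib/icinga2/certs/ca.crt")` followed by `busiest_windows(ts for _, ts in pairs)`. If most of the ~2700 collector notifications fall into one or two windows while the chat ones are spread out, that would support the theory. Filtering `pairs` by name lets you compare the two sets.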
That brings me to the following questions:
- Is there a way I can throttle these notifications, so that they are spread across the notification period instead of all being queued at the moment they fire? (e.g. I have 2000 notifications and 5 minutes, so let’s try sending 7 per second.)
I was thinking of something along the lines of the checker configuration on satellites.
- Is there a way I can tell my Icinga infrastructure something like "all notifications must be sent from this single host (or these hosts)", leaving the masters out of it? (Hey, get this off your shoulders; I can give you a couple of slaves to do that.)
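The arithmetic behind the throttling question: 2700 notifications in 300 seconds is 9 per second, and 2000 is about 6.7 per second, so "7 per second" is in the right range for the smaller figure. If Icinga cannot be told to pace notifications itself, one workaround is to pace delivery outside Icinga. The notification script only hands its payload to a queue, and a separate worker sends queued items at a fixed rate. The following is a minimal sketch of that pacing loop under those assumptions. `required_rate` and `drain_paced` are hypothetical helpers, and `items` stands for whatever queue you choose.

```python
import time
from typing import Callable, Iterable


def required_rate(notifications: int, period_seconds: float) -> float:
    """Sends per second needed so that every notification fits into one period."""
    return notifications / period_seconds


def drain_paced(
    items: Iterable[object],
    send: Callable[[object], None],
    per_second: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send(item) for each item, aiming for per_second calls per second.

    Every deadline is measured from the same start time, so a slow send()
    does not push later deadlines back. If sending falls behind, the loop
    skips sleeping until it has caught up. Returns how many items were sent.
    """
    interval = 1.0 / per_second
    start = clock()
    sent = 0
    for item in items:
        # Time left until this item's slot; <= 0 means we are late already.
        delay = start + sent * interval - clock()
        if delay > 0:
            sleep(delay)
        send(item)
        sent += 1
    return sent
```

Pacing at exactly `required_rate(2700, 300)` would use the whole 5-minute period for one batch. A little headroom, such as `drain_paced(queued, post_to_collector, per_second=10)` with a hypothetical `post_to_collector` function, leaves slack before the next keep-alive round. The `clock` and `sleep` parameters exist only so that the loop can be tested without waiting.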
The logs show nothing remarkable, apart from occasional failures where Icinga fails to deliver a notification. Those affect only a handful of notifications, not the whole set.
Any ideas on how best to handle this situation? I have had to roll back the integration with my event collector twice because of this performance bottleneck.
- The integration with the chat solution is a Python script
- The integration with my event collector is a Python script (the same one with a few minor modifications)
- Interconnection is over a LAN (this really isn’t network-related)
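Since both integrations are Python scripts, one option is to move the actual delivery out of the process that Icinga spawns. The notification command would only record its payload and exit, and a separate long-running process (not Icinga) would forward payloads later at a controlled rate. This is a minimal sketch under those assumptions. The spool directory `/var/spool/icinga-collector` and the helpers `enqueue`/`dequeue_all` are hypothetical. This approach still starts one Python process per notification, so it makes each process short-lived rather than reducing how many Icinga spawns.

```python
import json
import os
import tempfile
import time
import uuid
from pathlib import Path

# Hypothetical spool location; any directory writable by the icinga user works.
SPOOL_DIR = Path("/var/spool/icinga-collector")


def enqueue(payload: dict, spool_dir: Path = SPOOL_DIR) -> Path:
    """Atomically drop one notification payload into spool_dir."""
    spool_dir.mkdir(parents=True, exist_ok=True)
    # Write to a *.tmp file first so readers never see a half-written payload.
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(payload, handle)
    # Zero-padded nanosecond prefix keeps name order close to arrival order.
    final_path = spool_dir / f"{time.time_ns():020d}-{uuid.uuid4().hex}.json"
    os.replace(tmp_path, final_path)
    return final_path


def dequeue_all(spool_dir: Path = SPOOL_DIR):
    """Yield queued payloads oldest first, deleting each file once it is read."""
    for path in sorted(spool_dir.glob("*.json")):
        with path.open() as handle:
            payload = json.load(handle)
        path.unlink()
        yield payload
```

In the notification script, the entry point would call `enqueue(...)` with whatever values the NotificationCommand passes as arguments, then exit. The forwarding process would loop over `dequeue_all()` and send each payload at the desired rate. Because the `.tmp` file is renamed into place only after it is fully written, the `*.json` glob never picks up partial files.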