I'm brand new to Icinga2 and trying to design a replacement for Nagios. Today Nagios runs as a single instance on AWS EC2, monitoring ~700 physical hosts and ~300 AWS EC2 instances with ~25K checks. The config is fully managed by Chef, no manual edits anywhere. We're fully into Infrastructure as Code, using AWS CloudFormation to build all our AWS assets. The ~700 physical hosts are part of our product; I expect that to increase to >3000 in a year and who knows what after that. Whatever I build needs to scale out horizontally really well and be easy to upgrade, with full automation of course.
We're moving a lot of stuff to Docker on ECS (AWS's Docker infrastructure). I have a design in mind but no real idea if it'll work, since my Icinga knowledge comes only from reading the (very detailed!) docs and playing with a tiny local instance, so I'm looking for feedback on whether this is sane.
* Use RDS (Amazon hosted database) as the DB. Possibly look at Aurora.
* One master Docker instance. It will manage the DB creation/update scripts. If it dies, ECS will create a new instance. If a single instance is a bad idea I can create two.
* A set of satellite instances, all in the same zone, that will do the actual polling. The exact number will be decided by autoscale parameters; they can come and go.
* A scheduled Docker instance that will do discovery and manipulate the config using Director.
What I don't know is how the certs are handled. I clearly need auto-signing and can share a secret ticket between the Docker instances. Do I need a shared filesystem between the Docker instances to store the certs?
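To make the cert question more concrete, here is a rough Python sketch of how I understand the auto-signing ticket to be derived: a PBKDF2-HMAC-SHA1 hash of the node's common name, salted with the master's `TicketSalt`. The iteration count (50000) and the hex encoding are my assumptions about what `icinga2 pki ticket --cn <name>` does, and the function name and sample values are made up for illustration, so please correct me if this is wrong.

```python
import hashlib


def icinga2_ticket(cn: str, ticket_salt: str, iterations: int = 50000) -> str:
    """Derive a per-node ticket from a common name and a shared salt.

    ASSUMPTION: mirrors what `icinga2 pki ticket --cn <cn>` computes
    (PBKDF2-HMAC-SHA1, 50000 iterations, hex-encoded 20-byte digest).
    Verify against a real master before relying on it.
    """
    digest = hashlib.pbkdf2_hmac(
        "sha1",
        cn.encode("utf-8"),           # the node's common name acts as the password
        ticket_salt.encode("utf-8"),  # the shared secret (TicketSalt on the master)
        iterations,
    )
    return digest.hex()


if __name__ == "__main__":
    # Hypothetical values; a real container would read these from its environment.
    print(icinga2_ticket("satellite-01.example.internal", "not-a-real-salt"))
```

If that's right, each satellite container would only need the salt (or a precomputed ticket) passed in as a secret to request its cert at startup, which is why I'm unsure whether a shared filesystem is needed at all.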
Is this sane? Has anyone done something like this before? Any gotchas I should look out for?