Deploying with Docker on AWS ECS and RDS

  • I'm brand new to Icinga2, trying to design a replacement for Nagios. Today it's a single instance on AWS EC2 with ~700 physical hosts and ~300 AWS EC2 instances, with ~25K checks running. The config is fully managed by Chef, no manual edits anywhere. We're fully into Infrastructure as Code, using AWS CloudFormation to build all our AWS assets. The ~700 physical hosts are part of our product; I expect that to increase to >3000 in a year, and who knows what after that. Whatever I build needs to scale out horizontally really well and be easy to upgrade. And with full automation, of course.

    We're moving a lot of stuff to Docker on ECS (AWS's docker infrastructure), and I have a design in mind but no real idea if it'll work since my Icinga knowledge is only from reading the (very detailed!) docs and playing with a tiny local instance, so I'm looking for feedback to see if this is sane.

    Proposed design:

    * Use RDS (Amazon hosted database) as the DB. Possibly look at Aurora.

    * One master Docker instance. It will manage the DB creation/update scripts. If it dies, ECS will create a new instance. If a single instance is a bad idea I can create two.

    * A set of satellite instances all in the same zone that will do the actual polling. Exact number will be decided by autoscale parameters, they can come and go.

    * A scheduled docker instance that will do discovery and manipulate the config using Director.

    What I don't know is how the certs are handled. I clearly need auto-signing and can share a secret ticket between the docker instances. Do I need a shared filesystem between the docker instances to store the certs?

    Is this sane? Has anyone done something like this before? Any gotchas I should look out for?



  • You can use CSR auto-signing in combination with CLI commands like node setup. Still, you are not bound to it. If you prefer to deploy your certificates "manually", accompanied by the required configuration bits, from your own CA and signing mechanism, that's also a possible way to go — especially for containers, which can map in volumes.
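    To answer the shared-filesystem question directly: with CSR auto-signing, each node requests and stores its own certificate using a ticket derived from the master's shared TicketSalt, so no shared filesystem between containers should be needed. A sketch of how a satellite container entrypoint might bootstrap itself — hostnames, paths and the env variable are placeholders, and exact flags vary between Icinga 2 versions:

    ```shell
    #!/bin/sh
    # Provisioning sketch for a satellite container (all values are placeholders).
    CN="$(hostname -f)"             # this satellite's certificate common name
    MASTER="master-1.example.com"   # assumed master endpoint
    TICKET="$ICINGA2_TICKET"        # e.g. injected via the ECS task definition;
                                    # generated beforehand on the master with:
                                    #   icinga2 pki ticket --cn "$CN"

    # Fetch the master's certificate so the node can trust it for signing.
    icinga2 pki save-cert --host "$MASTER" --port 5665 \
        --trustedcert /var/lib/icinga2/certs/trusted-master.crt

    # Request a signed certificate via the ticket and write the node's
    # local zone/endpoint configuration.
    icinga2 node setup --ticket "$TICKET" --cn "$CN" \
        --zone "$CN" --endpoint "$MASTER" --parent_host "$MASTER" \
        --trustedcert /var/lib/icinga2/certs/trusted-master.crt \
        --accept-commands --accept-config

    exec icinga2 daemon
    ```

    The certificate ends up in the container's local filesystem; if the container is replaced, it simply re-enrolls with a fresh ticket.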

    Still, I'm not sure it is a good idea to run icinga2 inside a Docker container side by side with the actual application. From what I understand of containers, only one application should run per container, and you inspect its state from the outside, through Docker itself. I've experimented with supervisorctl and multiple applications inside one container, but that doesn't seem to be best practice.

    In terms of the database — DB IDO needs MySQL or PostgreSQL. Dunno what Aurora or RDS are exactly.
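    (For context: RDS is Amazon's managed database service and offers MySQL and PostgreSQL engines, so DB IDO can simply point at the RDS endpoint. A minimal ido-mysql sketch, where the endpoint and credentials are placeholders:)

    ```
    # /etc/icinga2/features-enabled/ido-mysql.conf (sketch; all values are placeholders)
    library "db_ido_mysql"

    object IdoMysqlConnection "ido-mysql" {
      host = "icinga.abc123.us-east-1.rds.amazonaws.com"   // RDS endpoint (placeholder)
      port = 3306
      user = "icinga"
      password = "secret"
      database = "icinga"
    }
    ```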

    The rest looks good — from the point of view of someone who doesn't have much experience with container monitoring.

  • Thanks for the reply! I wasn't terribly clear in the original question: I'm not planning on running Icinga2 in the same container as the app, I'm only looking at deployment options for Icinga2 itself.

    A follow up question: What's the impact if satellite instances come and go? For example during an upgrade I'll do a rolling replacement of all the satellite instances. What will happen to all the checks?

    And do I even need a master in this scenario?



  • If you have satellites doing the polling, you should have a master. If you have, for instance, an HA master zone with 2 instances, that should be fairly enough. It depends on your check volume, but such things can be helped with more CPU/RAM. Meaning to say, if you run an instance with 1 core and 1 GB RAM, it won't get you very far. Assign resources as needed; one or two instances in a zone are fine.

    Masters and satellites make sense if, e.g., the satellites sit in a DMZ where the master is not able to directly connect to the monitored services. The master then does nothing in terms of checks; it only distributes configuration and receives check results. It also processes and dumps states and metrics to DB IDO and Graphite/InfluxDB, and you have Icinga Web 2, Director and maybe Grafana on top. The satellites don't need any of that: they just receive configuration, run checks, and return check results to the parent zone.
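    That split can be expressed in zones.conf roughly like this (hostnames are placeholders; two endpoints in the "master" zone give you HA, and each satellite zone points at it via `parent`):

    ```
    // zones.conf sketch: HA master zone plus one satellite zone
    object Endpoint "master-1.example.com" { }
    object Endpoint "master-2.example.com" { }

    object Zone "master" {
      endpoints = [ "master-1.example.com", "master-2.example.com" ]
    }

    // No 'host' attribute here: the satellite connects inward to the masters,
    // which suits containers that come and go with autoscaling.
    object Endpoint "satellite-1.example.com" { }

    object Zone "dmz-satellites" {
      endpoints = [ "satellite-1.example.com" ]
      parent = "master"   // receives config from, and reports results to, the master zone
    }
    ```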