Icinga2 distributed monitoring security considerations

I have some questions on the security implications of some Icinga2 cluster features. The security section of the distributed monitoring chapter in the Icinga2 documentation lists config sync and remote command endpoint execution as disabled by default for security reasons.

So first of all, there are two basic modes of operation (for simplicity I’m not considering satellites or multiple masters for now):

  • In command endpoint mode, all the check scheduling happens on the master. The Icinga2 node on the clients is only executing checks as instructed to by the master. As the docs mention that the CheckCommands have to be defined on the client, I assume that the master sends a reference to a specific CheckCommand together with a list of variables to the client, which then assembles a command line according to the CheckCommand definition, executes it and reports back the result. The client itself does not know of the services actually defined (apart from what can be observed from the commands executed).
  • In config sync mode, all the host and service definitions relevant for a client get synced to it. The client itself then schedules the check executions and reports the results back to the master.

The exact behavior of the clients is configured by the accept_commands and accept_config options of the ApiListener:

  • If accept_commands = true, the master can execute any command that matches a CheckCommand definition on the client, as the user Icinga2 is running as, including things like probing hosts using check_ping, making arbitrary HTTP requests around the network with check_http, querying network equipment with check_snmp, etc.
  • If accept_config = true, the master can push arbitrary configuration to the client, including arbitrary new CheckCommand and Service objects, thus allowing the execution of arbitrary commands on the client as the Icinga2 user. This leads to every admin’s favorite nightmare: compromising the monitoring server opens the door to compromising the complete infrastructure.
  • If neither of these two options is set to true, the master has little control over the client. It only receives results from the client for hosts/services defined locally on the client. In this case, is there some command channel from the master to the client at all, e.g. can it still trigger the execution of checks on the client (only for the predefined hosts/services)?
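For reference, both switches live in the ApiListener object on the client; a minimal sketch (certificates and zone setup omitted) with the restrictive defaults spelled out:

object ApiListener "api" {
    // do not let the parent zone trigger command execution on this node
    accept_commands = false
    // do not accept configuration synced from the parent zone
    accept_config = false
}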

Is my understanding as described above correct? What are my options if I don’t want the clients to fully trust the master? I don’t think I can use either of accept_commands/accept_config. Could I use some external tooling to sync the configuration between the master and clients, which then applies additional sanitization? Are there other options?


Security considerations always have two sides:

  • one side wants applications and hosts to “do nothing harmful”, not even if one of them is compromised
  • the other side wants to reliably check everything, and ideally only manage one side (the master)

Both accept_commands and accept_config allow clients to take one step towards the management direction. In the extreme case, they fully trust their parent node (“the master”).

In terms of syncing configuration to a child endpoint, there are of course harmful things one could do, e.g. lower the check interval and overload the box, to give a simple example.
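To make that concrete: a synced service definition along these lines (hypothetical names; a ping4 CheckCommand assumed present on the client) would keep the box busy with little more than rescheduled checks:

object Service "flood" {
    host_name = "icinga2-client.localdomain"
    check_command = "ping4"
    check_interval = 1s    // aggressive rescheduling
    retry_interval = 1s
}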

Shell Injection

If you are thinking about shell injection through commands, Icinga 2 escapes command parameters on execution to prevent exactly this attack vector. It is safe to assume that whatever parameters are sent, they won’t do harm on the shell itself.
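For illustration (a sketch, not an ITL definition): the command is an array, and runtime macros are resolved into separate process arguments, so a parameter value containing quotes or semicolons reaches the plugin as one literal string and never touches a shell:

object CheckCommand "my_http" {
    import "plugin-check-command"
    command = [ PluginDir + "/check_http" ]
    arguments = {
        // the resolved value becomes its own argv entry, not part of a shell string
        "-H" = "$http_vhost$"
    }
}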

If your plugins themselves allow for parameter exploits and the like, that is a separate security consideration, apart from Icinga itself.

Hardening

If you have compromised a monitoring master, you are root of many things. Users tend to make it even easier for attackers, with default passwords, “ALL” as grant on databases, SSH keys and sudoers entries with full root access, and more.

Things you should always have in mind, apart from any application or distributed environment:

  • use and configure “sudo” wisely, and limit command execution
  • harden SSH access, especially your master monitoring instance
  • never expose the master instance to the internet, not even if you want to have a public (Icinga Web 2) dashboard on the same box.

Conclusion

If you’re going the route where clients shouldn’t trust the master’s commands and parameters, don’t let the command objects be synced, and disable the ITL include statements in icinga2.conf. Instead, use your own mechanism of managing those command objects on the clients. That can be config management tools, or whatever else comes to mind. Keep in mind though: if someone compromises those sync tools, you are in a bad situation too.
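On the client that boils down to commenting out the ITL includes in icinga2.conf, and letting your own tooling ship the command definitions (including base templates such as plugin-check-command, which normally come from the ITL):

/* icinga2.conf on the client: don't load the shipped command definitions */
// include <itl>
// include <plugins>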

Monitoring is a matter of trust, and the clients have to trust their parent endpoints in order to allow the full functionality. Compared to other agents, you have several layers to ensure security here. The CN == hostname == endpoint name requirement is an extended chain of trust you won’t find with other agents (NRPE and variants).

There is also a common feature request for Icinga 2 to sync plugin binaries through the cluster config sync. Apart from the argument of not being able to manage binary or script dependencies, this won’t be added to Icinga 2 for security reasons either. It must not be possible to deploy scripts, change the check_command configuration and do nasty things from a compromised master.

TL;DR - keep your master safe in a dedicated network VLAN, put satellites in front as command schedulers, and harden command execution everywhere. Then you can even use common functionality such as config sync and client commands. This is the kind of security I would expect in enterprise environments, though I know this sometimes is not reality.
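Structurally that’s the usual three-level zone hierarchy; a minimal zones.conf sketch with made-up host names:

object Endpoint "icinga2-master1.localdomain" { }
object Endpoint "icinga2-satellite1.localdomain" { }
object Endpoint "icinga2-client.localdomain" { }

object Zone "master" {
    endpoints = [ "icinga2-master1.localdomain" ]
}

object Zone "satellite" {
    // schedules commands for its clients and shields the master
    endpoints = [ "icinga2-satellite1.localdomain" ]
    parent = "master"
}

object Zone "icinga2-client.localdomain" {
    endpoints = [ "icinga2-client.localdomain" ]
    parent = "satellite"
}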

Bonus: Manage the configuration and plugins with a different tool and ensure that no-one can manually override/manipulate it.


Thanks for the answer. Since you didn’t object to my post, I assume that my understanding is reasonably correct as summarized in my post.

Is there anything in config sync mode that prevents the master from creating a new CheckCommand with a malicious command string?

By the master, you mean port 5665? Is there anything concrete that would be a problem, or is this just an additional layer of security?

Of course, but some config management tool is likely in use anyways. It’s always good to reduce the number of components you have to worry about. So if it’s possible to use Icinga2 in a way that requires less trust, why shouldn’t you? (I know, the answer is easier setup and comfort, so it depends on your priorities and resources.)

IMHO I’d consider this more of an out-of-scope thing than a matter of security.

Even with accept_config = true? (see my comment on the first quoted block)

From my understanding, clients would trust a satellite as much as they would trust a master, so throwing in a layer of satellites doesn’t sound like it would help that much, except maybe limiting the impact of a satellite compromise to a subset of clients. At least I didn’t find anything about Icinga2 doing something fancy there, like passing signed messages from the master.

So what I’d like to have is a master that merely collects and processes information, but has no control over the clients, except maybe forcing a recheck, pausing checks, etc. (roughly the stuff you can do from Icinga Web 2 without Director). Maybe that’s something bottom-up sync mode would have given me, but well, that’s gone now, so there’s little point in thinking about it.

That needs an example to get a sensible answer.

Anything beyond Icinga 2 itself. Do not open port 22 to the world, do not expose port 80 but enforce 443 everywhere. Don’t even expose 443 to the world, but use a proxy up front, or create a custom dashboard for your “looking glass” on your website (if you are an ISP for example). If you are putting the MTA on the same host, ensure that it uses a network local mail relay.

And so on. Icinga just uses port 5665, one of many, that’s correct.

I think differently these days. There are many tools which do a good job on their own, and integrate well enough that you don’t have to worry about the so-called “moving parts” sysadmins fear the most.

If you don’t have some sort of config management system or lifecycle environment, you’re probably either a friend of doing things manually (and a bad mouth would say “no-one can fire you, they need you to make the network work”), or your company has so many constraints and different departments that such a change is impossible or would take years from now.

Either way, if you offload certain parts onto these tools, you can take management duties away from Icinga 2 as well. There’s a different topic somewhere here on managing the configuration only with Puppet and not using the cluster config sync. You can do that, but rest assured that not many others do; it gets complicated, and it is not what Icinga is built for. The focus lies on functionality such as syncing configuration and executing commands, which is key functionality for a distributed monitoring system.

If you want to go the “security over functionality at all cost” route, there are ways to do that. I don’t exactly remember why accept_* are false by default, as this makes the initial experience harder, but IIRC the general idea is to raise awareness: users should ask themselves why they would need this step, and which implications such a feature has, like you do.

Still, the most reliable way to use Icinga is to use the config sync and the commands in a distributed environment. Those are the things we have known for many years and can support in a reliable manner (though don’t expect answers within minutes or even days here, we’re not support workhorses). If you want to go the self-management route and keep your nodes in their jail or silo, you are free to do so, but answers to questions may become harder to get as the audience is not so big.

Still, it is a matter of security too. Believe me, you need strong arguments against such a feature; the “out of scope” argument doesn’t work for those who want “the one and only tool which does everything for me and costs me nothing”.

The benefit here is that the master and the satellites may communicate over different networks, VLANs, VPNs or whatever comes to mind to secure this on a network layer. The master itself does not execute any command, it just processes check results. Such data is hard to exploit in memory.

Still, it may not solve your problem entirely, but rather adds another layer of defense. Common use cases show that many companies build it that way, and enforce satellites in DMZs for example.

No, bottom-up would not solve that, as the client would always send its check results, coming from locally configured objects, to a (possibly compromised) parent node. If an attacker reads the traffic, it is fairly easy to configure these objects on the parent and read what’s going on on the client, potentially exposing sensitive data. That’s one reason I am glad that this mode is gone.

On the other hand, you’d want an intelligent endpoint that is configured “standalone” and only hands out data once the parent node asks for it. The same question applies: how do you control which data is exposed to the parent node?

The above also has a problem: the user is forced to configure the client with all its settings, some of which have to be known in advance. There’s not much automation possible in this regard, and as such this won’t really work in large environments. For a small (smart) home, it would do.

I don’t think that we’re going to fully solve your idea with such a client in the near future. You’ll have to go with what’s already there, or consider a different solution.

Thanks for the ideas and open discussion though :+1:

I was thinking of CheckCommands + Services like this one, which gets synced and executes just as one would expect (just tested with Icinga 2.8.1):

object CheckCommand "test" {
    import "plugin-check-command"
    command = ["/bin/cat", "/etc/passwd"]
    // command = ["/bin/touch", "/tmp/master-executing-some-command-on-the-client"]
    // command = ["/bin/sh", "-c", "echo Hi, executing some command on `hostname`"]
}

object Service "test" {
    import "generic-service"
    host_name = "icinga2-client.localdomain"
    check_command = "test"
}

So what I’m taking away from this:

If you use Icinga2 cluster mode with accept_config = true and accept_commands = true, act as if you are granting the master (or any node in a parent zone) SSH access to the icinga/nagios user on the client system. This might require additionally restricting that user, which can be difficult and error-prone (it must not be able to read sensitive information the master shouldn’t be aware of from the file system, it must not be able to make network connections to other systems which might restrict access based on the source address, etc.). You are also not limited to programs you find on the client; there’s likely some Python or Perl interpreter installed that can be used to execute pretty much arbitrary programs. It’s also possible to download additional binaries, or if network access is restricted, they can be passed base64 encoded in the config file.
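For completeness, the base64 variant from the last sentence needs nothing more than another synced command object like the one above (payload elided):

object CheckCommand "dropper" {
    import "plugin-check-command"
    // decode the embedded payload, make it executable and run it
    command = [ "/bin/sh", "-c",
        "echo '<base64-payload>' | base64 -d > /tmp/p && chmod +x /tmp/p && /tmp/p" ]
}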

Unfortunately there is also no documentation on the internals of the cluster API, except the statement “The message protocol uses an internal API, and as such message types and names may change internally and are not documented.” That’s probably the case because you don’t want third parties to implement clients for it and rely on it being stable. But it also makes it way harder to understand what’s going on in an Icinga2 cluster, which is especially important when considering the security implications. Yes, you can read the source, but that takes a huge amount of time if you really just need a rough overview of the control and data flows.

I came across this conversation just after I wrote Distributed monitoring with top-down setup: security considerations. Julian, you seem to be concerned about security and you had the same ideas as me. I wanted to dump the old, unsafe NRPE stuff for a modern, secure solution, but the only current option for distributed monitoring (top-down) is a complete security nightmare.

Are there any chances that bottom-up may still be supported in the future? And in a secure way?

I don’t really have a good solution for this. The setup I was asking about still predates the migration to Icinga2, with passive checks submitted via nsca-ng. The existing services are synced from the nodes to the Icinga2 instance using salt. I wouldn’t recommend building a new setup that way though. If you have some configuration management anyway, a more promising approach could be to build the bottom-up config sync yourself within that, though I never got around to trying this.