Testing the icinga2 agent connection to the clients

I would like to check the status of the connection to the agents on the clients in order to determine if I need to (re)install or fix something.

Here is some pseudo-code:

root@icinga_master:~# icinga2 --check "check_dummy 0" -H my_client_ip --use_agent

So, a sort of quick “agent status check”.

Any ideas?

Hi,

you can use the built-in check commands icinga, cluster and cluster-zone.

https://icinga.com/docs/icinga2/latest/doc/10-icinga-template-library/#check-commands

Also have a look at the Health Checks topic in the documentation.

https://icinga.com/docs/icinga2/latest/doc/06-distributed-monitoring/#health-checks
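For example, a cluster-zone check lets the master alert when the connection to an agent zone is down. A minimal sketch in the spirit of the linked docs, assuming the agent zone is named after the host object and that agent hosts carry a custom variable agent_endpoint (both are assumptions, adjust to your setup):

    apply Service "agent-health" {
      check_command = "cluster-zone"
      // Convention: the agent zone name equals the host object name (FQDN).
      vars.cluster_zone = host.name
      // Only apply to hosts that are marked as agents via a custom variable.
      assign where host.vars.agent_endpoint
    }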

Best regards
Michael

Thanks @mcktr!
Sure… but from what I see, those are the normal checks triggered by Icinga.

Sorry, maybe I didn’t explain it clearly enough. I am looking for something more “real-time” or interactive.

Let’s say I have a configuration manager set up for the master and the clients. I would like that the first time I run a new client, the master checks:

  • Can I connect to that client right now? (a client that is supposed to run the agent; see the probe sketch after this list)
  • If yes: we are good to go
  • If not:
    • Ensure icinga2 is installed (as agent) and running on the client, with firewall, SELinux, … properly set.
    • Issue a certificate request from the client to the master. Or get a ticket from the master to sign the certificate on the client.
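As a cheap first pass for the connectivity question, a plain TCP probe against the agent’s API port would do; a minimal bash sketch, assuming the default port 5665 and the my_client_ip placeholder from above (it only proves the port is open, not that TLS and certificates work):

    # Probe TCP reachability of the agent API port (default 5665).
    if timeout 5 bash -c 'exec 3<>/dev/tcp/my_client_ip/5665'; then
        echo "agent port reachable"
    else
        echo "agent port unreachable"
    fi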

That doesn’t exist, since the connection may not be actively established by the node whose CLI you’re on. Both connection directions are valid, and creating a command for just one of them would lead to false positives and raise more questions than it would answer.

The troubleshooting docs list more system tools that can be used to debug real connection issues once you’re past the configuration stage.

@dnsmichi: I think I still didn’t explain it clearly enough… sorry about that.

I’ll try to explain the whole thing as briefly as I can:

  • I am trying to build my own Ansible installer, which sets everything up in one run and keeps it running idempotently. By everything, I mean icinga2, icingaweb2, graphite, director and (Windows/Linux) clients with or without agents.

  • At some point, I add new hosts if the “inventory hosts” do not appear in the output of:
    icingacli monitoring list hosts | grep ' UP ' | awk '{print $2}' | tr -d ':'

  • I set an agent_hosts group for hosts that should run the agent.
    I used to check for hosts with signed certificates with:
    icinga2 ca list | grep 'CN = ' | grep '* ' | awk '{print $NF}'
    and compare the result to the agent_hosts list.
    If any host in agent_hosts is not in the output of the previous command:

    • ensure icinga2 is installed and running
    • remove old certificates
    • run the agent configuration script
    • … and sign the certificate on the master
  • Now I have noticed that if we remove files in /var/lib/icinga2/certificate-requests/, this whole system breaks. The same happens if there are multiple CSRs for a client.

  • So, how should I decide whether to run the “agent (re)configuration script” on a client or not?

I could also remove the certificates and generate new ones every time I run the configuration, but that is quite a dirty thing to do.

Maybe I could use the /var/lib/icinga2/icinga2.state file on the client and compare timestamps.

I might be trying to reinvent the wheel, or doing something completely unnecessary. But in that case, please, some hints on how to get myself on the right path.

I strongly advise against file manipulation or manual parsing of anything underneath /var/lib/icinga2: this directory is owned by the daemon, and users must not modify it. Use the CLI commands and the API endpoints to interact with the daemon.

In terms of Ansible, I would always use the ticket approach, since it allows the client being set up to fetch its certificate automatically. The asynchronous alternative, waiting for the CSR to appear in icinga2 ca list, can take minutes until the request is synced to the master.
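For illustration, the ticket flow in a nutshell; hostnames are placeholders, and the exact pki save-cert / node setup options differ slightly between Icinga 2 versions, so double-check with icinga2 node setup --help:

    # On the master: generate a ticket for the agent's FQDN
    # (requires the TicketSalt constant to be configured).
    icinga2 pki ticket --cn my-client.example.com

    # On the client: fetch and store the master's certificate as trusted,
    # then set up the node with the ticket generated above.
    icinga2 pki save-cert --host icinga2-master.example.com \
        --trustedcert /var/lib/icinga2/certs/trusted-parent.crt
    icinga2 node setup --ticket <TICKET> \
        --cn my-client.example.com \
        --zone my-client.example.com \
        --parent_zone master \
        --parent_host icinga2-master.example.com \
        --endpoint icinga2-master.example.com \
        --trustedcert /var/lib/icinga2/certs/trusted-parent.crt \
        --accept-commands --accept-config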

Nor does ca list act as an inventory: old signing requests, and even signed ones, are purged after one week.

Thanks @dnsmichi, I will follow your advice and use the ticket approach. It is definitely cleaner.

To determine whether a client is already connected to the master, I just check whether the certificates already exist in /etc/icinga2/pki on the client. Not the best solution, but it will work as long as I don’t keep reinstalling the master. And in that case, I could always explicitly request regenerating the certificates on the clients…
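That existence check is a one-liner; a sketch assuming the certificate file is named after the client’s FQDN (note that newer Icinga 2 versions store certificates in /var/lib/icinga2/certs rather than /etc/icinga2/pki):

    # Does a non-empty certificate for this host exist yet?
    test -s "/etc/icinga2/pki/$(hostname -f).crt" && echo "cert present" || echo "cert missing"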

My original question was about testing which clients have a working connection from the master.
So here is the answer I seem to have overlooked in the documentation:
https://icinga.com/docs/icinga2/latest/doc/15-troubleshooting/#cluster-troubleshooting-ssl-errors

As easy as:

 openssl s_client -CAfile /var/lib/icinga2/certs/ca.crt \
    -cert /var/lib/icinga2/certs/icinga2-master.example.com.crt \
    -key /var/lib/icinga2/certs/icinga2-master.example.com.key -connect my-client.example.com:5665

If its return code is 0, the client can be reached from the master; otherwise we need to fix (or install) something.

It would still be good to have it in an “Icinga” way, so that to check the agent on a client we could just do something like:

icinga2 agent test my-client

which could return 0/1 and stdout like “Connection to ‘my-client’ is properly working”.
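Until something like that exists, the openssl check can be wrapped to behave this way; a sketch with a hypothetical agent_test function (the </dev/null matters, since s_client otherwise waits on stdin):

    agent_test() {
        # Hypothetical helper mimicking "icinga2 agent test <client>".
        local client="$1"
        if openssl s_client -CAfile /var/lib/icinga2/certs/ca.crt \
            -cert /var/lib/icinga2/certs/icinga2-master.example.com.crt \
            -key /var/lib/icinga2/certs/icinga2-master.example.com.key \
            -connect "${client}:5665" </dev/null >/dev/null 2>&1; then
            echo "Connection to '${client}' is properly working"
            return 0
        fi
        echo "Connection to '${client}' failed"
        return 1
    }

    agent_test my-client.example.com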
