icinga2 host not monitored from satellite

  • I have an issue with a host not being correctly monitored from its satellite. The config snippets below are taken from the "config previews" generated by the Director.


    Satellite 1


    Zone config:


    zones.d/icinga-master/zones.conf

    Code
    object Zone "satellite1" {
      parent = "icinga-master"
      endpoints = [ "satellite1" ]
    }


    Endpoint template:


    zones.d/director-global/endpoint_templates.conf

    Code
    template Endpoint "icinga-poller-template" {
    }

    Endpoint config:


    zones.d/satellite1/endpoints.conf

    Code
    object Endpoint "satellite1" {
      import "icinga-poller-template"
      host = "satellite1"
    }


    host template:


    zones.d/satellite1/host_templates.conf

    Code
    template Host "non-pingable Linux host (satellite1 zone)" {
      check_command = "ssh"
      enable_notifications = true
      enable_active_checks = true
      enable_event_handler = true
      enable_perfdata = true
    }

    Note: "Cluster zone" property is set to "satellite1" - doesn't show up in config preview.


    Host config:

    Code
    object Host "satellite1-host" {
      import "non-pingable Linux host (satellite1 zone)"
      display_name = "satellite1-host"
      address = "satellite1-host"
    }



    Satellite 2


    Zone config:


    zones.d/icinga-master/zones.conf

    Code
    object Zone "satellite2" {
      parent = "icinga-master"
      endpoints = [ "satellite2" ]
    }

    zones.d/satellite2/endpoints.conf


    Code
    object Endpoint "satellite2" {
      import "icinga-poller-template"
      host = "xx.xx.xx.xx"
      port = "5665"
    }

    zones.d/satellite2/host_templates.conf


    Code
    template Host "Pingable host (satellite2)" {
      check_command = "hostalive"
      enable_notifications = true
      enable_active_checks = true
      enable_passive_checks = true
      enable_event_handler = true
      enable_perfdata = true
      volatile = true
    }


    Again "Cluster Zone" is set to "satellite2"


    zones.d/satellite2/hosts.conf

    Code
    object Host "satellite2-host" {
      import "Pingable host (satellite2)"
      address = "xx.xx.xx.xx"
    }


    The satellite1 host is monitored correctly: its "check source" is satellite1 and the host check works ("OK" status). The satellite2 host, however, has a "check source" of icinga-master, and its host check fails because the master is in a different network, of course. I have also tried setting the "command endpoint" parameter, which didn't seem to make any difference.


    Any ideas on this? Hopefully I've included all the necessary configs. Let me know if not.

  • What do you mean by:


    Quote

    Again "Cluster Zone" is set to "satellite2"

    Could you send the config file that shows it?

  • The zone isn't set in a file (as this was all configured via director), though the files exist on disk:


    /var/lib/icinga2/api/zones/satellite2/director/host_templates.conf:


    Code
    template Host "Pingable host (satellite2)" {
      check_command = "hostalive"
      enable_notifications = true
      enable_active_checks = true
      enable_passive_checks = true
      enable_event_handler = true
      enable_perfdata = true
      volatile = true
      command_endpoint = "satellite2"
    }


    /var/lib/icinga2/api/zones/satellite2/director/hosts.conf:

    Code
    object Host "satellite2-host" {
      import "Pingable host (satellite2)"
      address = "192.168.1.222"
    }


    Thanks!

  • At least zones.d/satellite1/endpoints.conf is not supposed to work that way, as this would cause the configuration file to be synced to all endpoints belonging to the zone satellite1. That way you'll end up with a) the local endpoint definition for satellite1 used for the initial connection attempt and b) the config synced from the master through zones.d - a reload will then fail with duplicated objects.


    This config snippet for the satellite endpoints belongs to the master zone only.
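

    For illustration only, a minimal sketch of how that could be rendered into the master zone instead (object names taken from the snippets above; whether this lives in zones.d/icinga-master via the Director or in the master's zones.conf depends on your setup):

    zones.d/icinga-master/endpoints.conf

    Code
    object Endpoint "satellite1" {
      import "icinga-poller-template"
      host = "satellite1"
    }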


    volatile = true should also be avoided unless you know what this setting does when enabled (see the docs).


    Where is this endpoint being defined: command_endpoint = "icinga-poller-3gi"?

  • Thanks for your reply.


    Re: icinga-poller-3gi - that was my attempt to obfuscate the name of the actual poller - I've corrected it in the original post.


    OK, I've added the endpoint to the "icinga-master" zone. This seems to have worked (as far as the Director is concerned), but I got a warning during the deployment that "satellite1" and "satellite2" didn't belong to any zones(?). I think this may just have happened while moving them between zones, because the config history activity log doesn't show any errors afterwards.


    I have also changed volatile to "false" for now.


    At this point, both host checks on the satellites still appear to be failing, though I currently have no SSH access to restart icinga2, etc. (if that's even necessary).

  • The satellite* endpoints should belong to a zone, otherwise the connection handling will not work. If Icinga 2 stays running, you don't need to SSH into the box; deployments via the API trigger a reload automatically.

  • I assume the following rendered config means that the endpoints do belong to the correct "icinga-master" zone:


    zones.d/icinga-master/endpoints.conf:

    Code
    object Endpoint "satellite1" {
      import "icinga-poller-template"
      host = "satellite1"
    }

    object Endpoint "satellite2" {
      import "icinga-poller-template"
      host = "xx.xx.xx.xx"
      port = "5665"
    }

    ..however, I can still see the warnings about the endpoints not belonging to a zone, and my remote host is still checked from the master, so the check fails.

  • No, the zone membership is managed directly inside the Zone objects. Please check the corresponding object type documentation: https://docs.icinga.com/icinga…ect-types#objecttype-zone


    And likewise the distributed monitoring docs: https://docs.icinga.com/icinga…tributed-monitoring-zones
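

    For reference, a minimal sketch of how membership and hierarchy fit together in the Zone objects (names taken from this thread, not a rendered preview):

    Code
    object Zone "icinga-master" {
      endpoints = [ "icinga-master" ]
    }

    object Zone "satellite2" {
      parent = "icinga-master"
      endpoints = [ "satellite2" ]
    }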

  • OK, I have checked the docs, but I'm just trying to make it make sense in my head :D


    So the "icinga-master" zone has already been defined as an "external object" in director. As a result of setting the "Cluster Zone" parameter in Director (I guess), the config preview for the icinga-master zone object looks like this:

    Code
    object Zone "icinga-master" {
      endpoints = [ "icinga-master", "satellite1", "satellite2" ]
    }

    So, as far as the Director is concerned, it does look like the endpoints are within the correct zone, but the Director doesn't seem to be able to deploy them. As this is "hardcoded", do the satellites need to be put into the zones.conf file on the master?

    Hmm, just tried that and it didn't work.


    Is it possible to define the master zone in director? As I recall, I had to set up zones.conf in order to get the icinga2 daemon to start up in the first place...


    Sorry if I'm missing something...

  • OK, I've re-jigged the endpoint template so that the endpoint appears in the correct zone:


    zones.d/satellite2/endpoints.conf:

    Code
    object Endpoint "satellite2" {
      import "satellite2-template"
      host = "xx.xx.xx.xx"
      port = "5665"
    }


    ...but it still doesn't work, and I get this message in the debug.log:


    Code
    [2017-02-25 23:23:48 +0000] warning/ApiListener: No data received on new API connection for identity 'icinga-master'. Ensure that the remote endpoints are properly configured in a cluster setup.
    Context:
    (0) Handling new API client connection

    I guess this looks like it might be wrong:


    zones.d/icinga-master/zones.conf:

    Code
    object Zone "satellite2" {
      parent = "icinga-master"
      endpoints = [ "satellite2" ]
    }
  • You can't sync zone and endpoint details from the master to a satellite. This information must be statically configured on both nodes and is required for establishing the connection. Once the connection is there, the SSL handshake kicks in. Then the zone hierarchy is validated (e.g. only parents can send config to child zones). Once that's settled, the config sync takes place.


    In your scenario, the satellite doesn't seem to be configured, and as such the master tries to connect, the satellite ignores it because it doesn't know the endpoint/CN presented in the SSL certificate, and the master terminates the connection after a short timeout. Then the reconnect kicks in, but will continuously fail.


    1) Configure zones and endpoints on the satellite (see the sketch below).

    2) Move the zone/endpoint config on the master from zones.d into either zones.conf, or keep it only in the master zone shared among the endpoints in that zone. If you only have one master, prefer your zones.conf file.
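

    A minimal sketch of what the satellite's /etc/icinga2/zones.conf could look like under these assumptions (the master's address is a placeholder, and the host attribute on the master endpoint is only needed if the satellite should actively connect to the master):

    Code
    object Endpoint "icinga-master" {
      host = "master.example.com"   // placeholder address; only needed if the satellite connects out
    }

    object Zone "icinga-master" {
      endpoints = [ "icinga-master" ]
    }

    object Endpoint "satellite2" {
    }

    object Zone "satellite2" {
      parent = "icinga-master"
      endpoints = [ "satellite2" ]
    }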

  • Thanks for replying on a Sunday! Much appreciated :D


    When you say to statically configure details, do you mean the local zone + endpoint details? On the satellite, I have this configured (I have just added the "host=" attribute to the "master" endpoint):


    /etc/icinga2/zones.conf:



    Perhaps the zone "master" should be "icinga-master"?


    However, I seem to be seeing successful connections in the log:


    Code
    [2017-02-26 11:48:57 +0000] debug/ApiListener: Not connecting to Endpoint 'icinga-master' because we're already connected to it.
    [2017-02-26 11:48:57 +0000] notice/ApiListener: Current zone master: satellite2
    [2017-02-26 11:48:57 +0000] notice/ApiListener: Connected endpoints: icinga-master (1)
    [2017-02-26 11:48:57 +0000] notice/WorkQueue: #8 (JsonRpcConnection, #0) tasks: 0
    [2017-02-26 11:48:57 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.246903%
    [2017-02-26 11:49:01 +0000] notice/JsonRpcConnection: Received 'log::SetLogPosition' message from 'icinga-master'
    [2017-02-26 11:49:02 +0000] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 0; Checks/s: 0
    [2017-02-26 11:49:04 +0000] notice/JsonRpcConnection: Received 'event::Heartbeat' message from 'icinga-master'
    [2017-02-26 11:49:06 +0000] notice/JsonRpcConnection: Received 'log::SetLogPosition' message from 'icinga-master'
    [2017-02-26 11:49:07 +0000] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 0; Checks/s: 0


    UPDATE:


    I've added the "director-global" zone to "zones.conf" after seeing some log messages pointing to that:


    Code
    object Zone "director-global" {
      global = true
    }


    My host is still not monitored from the correct endpoint, however. The message states that it has been down since Feb 14th (when it was first created, BTW), even though it has been deleted and re-created many times since then. Does this mean it's cached somewhere?


    Thanks again :)

  • I'd use the same zone names, and also avoid the ZoneName and NodeName constants, so it's easier to see what's what when looking into the configs.
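

    For illustration only, assuming the wizard-generated constants are still in place on the satellite, that would mean replacing the constant-based definitions with explicit names, e.g.:

    Code
    // instead of the wizard-generated constants ...
    object Zone ZoneName {
      endpoints = [ NodeName ]
    }

    // ... spell the zone and its endpoint out explicitly:
    object Zone "satellite2" {
      parent = "icinga-master"
      endpoints = [ "satellite2" ]
    }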


    After switching over to the connection problems, I've lost track of which service isn't being executed in the right location. Can you add a screenshot and summarize that again?

  • OK, I've rationalised the zone names and used the names instead of the constants:


    I haven't actually got round to adding any services yet - as the host check wasn't working, I thought I'd concentrate on that.


    Here's a screenshot - I deleted the original host and added a new one with the same address + template:

    monitoring-portal.org/woltlab/cms/index.php?attachment/8939/

    I tried restarting icinga2 on the satellite, and I see this in the logs:

    ..which seems to suggest that the zones/endpoints defined in the Director are conflicting with those defined statically in the config files. I know you said that these could conflict if there are duplicates, and also something about zones/endpoints not being able to be shared via the API. Does this mean that I should remove the zone/endpoint config from the Director?

  • That's clearly an error from your previous steps - /var/lib/icinga2/api/zones/director-global/director/zones.conf must be purged, as it generates duplicate objects. Best is to safely remove /var/lib/icinga2/api/zones/director-global and restart the satellite once again.


    And of course you should take care that the Director config does **not** publish Endpoints and Zones via a global zone.


    It seems that your client is running an old config version and therefore is not sending in any check results for the host (PENDING).

  • I don't have SSH access right at this moment to delete the necessary files, but is this:


    Quote

    And of course you should take care that the Director config does **not** publish Endpoints and Zones via a global zone.

    ..simply accomplished by removing the "director-global" zone from Director?


    If that's the case, do I then have to add a "global-templates" zone statically to the zones.conf file on each node as they are connected, as per the distributed-monitoring docs?
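

    I'm guessing that would just mean something like this in every node's zones.conf (going by the docs; not verified yet):

    Code
    object Zone "global-templates" {
      global = true
    }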

  • OK, now my zones.conf looks like this:


    /etc/icinga2/zones.conf:

    I've removed the "director-global" zone from director


    I've manually added a "global-templates" global zone to master's zones.conf.


    ..but I get the following in the logs (or actually when I do a checkconfig):


    Code
    information/cli: Icinga application loader (version: r2.6.2-1)
    information/cli: Loading configuration file(s).
    critical/config: Error: Object 'satellite2' of type 'Endpoint' re-defined: in /var/lib/icinga2/api/zones/satellite2/director/endpoints.conf: 1:0-1:34; previous definition: in /etc/icinga2/zones.conf: 21:1-21:35
    Location: in /var/lib/icinga2/api/zones/satellite2/director/endpoints.conf: 1:0-1:34
    /var/lib/icinga2/api/zones/satellite2/director/endpoints.conf(1): object Endpoint "satellite2" {
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    /var/lib/icinga2/api/zones/satellite2/director/endpoints.conf(2): import "satellite2-template"
    /var/lib/icinga2/api/zones/satellite2/director/endpoints.conf(3):
    * checking Icinga2 configuration. Check '/var/log/icinga2/startup.log' for details.

    ..this looks to me like I need to remove this zone from director. Is this correct?


    This being the case, why are zones and endpoints configurable in the Director config at all, if it's necessary to do this work in the config files? Also, how do you then specify which zone a host object belongs to in the Director? This will be necessary in the future when defining hosts in certain zones, or, in the real world, on different physical sites...