Distributed monitoring not working

Hi guys, I am trying to add distributed monitoring to an existing Icinga2 master/client setup.

However, I am getting the error: debug/ApiListener: Not connecting to Zone 'db1.datacentre.example.com' because it's not in the same zone, a parent or a child zone.

My current scenario is: Icinga2 master -> Icinga2 satellite -> client
What am I missing here? Thanks!

icinga2-master zones.conf:

object Endpoint "icinga.datacentre.example.com" {
}

object Endpoint "icinga-au.datacentre.example.com" {
	host = "192.168.99.22"
}

object Zone "master" {
	endpoints = [ "icinga.datacentre.example.com" ]
}

object Zone "satellite" {
        endpoints = [ "icinga-au.datacentre.example.com" ]
	parent = "master"
}

object Zone "global-templates" {
	global = true
}

object Zone "director-global" {
	global = true
}

icinga2-satellite zones.conf:

object Endpoint "icinga.datacentre.example.com" {
	host = "192.168.199.9"
	port = "5665"
}

object Zone "master" {
	endpoints = [ "icinga.datacentre.example.com" ]
}

object Endpoint "icinga-au.datacentre.example.com" {
}

object Zone "satellite" {
	endpoints = [ "icinga-au.datacentre.example.com" ]
	parent = "master"
}

object Zone "global-templates" {
	global = true
}

object Zone "director-global" {
	global = true
}

icinga2 client zones.conf:

object Endpoint "icinga-au.datacentre.example.com" {
	host = "192.168.99.22"
	port = "5665"
}

object Zone "satellite" {
	endpoints = [ "icinga-au.datacentre.example.com" ]
}

object Endpoint "db1.datacentre.example.com" {
}

object Zone "db1.datacentre.example.com" {
	endpoints = [ "db1.datacentre.example.com" ]
	parent = "satellite"
}

object Zone "global-templates" {
	global = true
}

object Zone "director-global" {
	global = true
}

icinga2-master /etc/icinga2/zones.d/satellite/db1.conf

// Endpoints & Zones
object Endpoint "db1.datacentre.example.com" {
//	host = "192.168.99.11"
}

object Zone "db1.datacentre.example.com" {
     endpoints = [ "db1.datacentre.example.com" ]
     parent = "satellite"
}

// Host Objects
object Host "db1.datacentre.example.com" {
    import "generic-host"
    check_command = "hostalive"
    address = "192.168.99.11"
    vars.kernel = "centos"
    vars.os = "Linux"
    vars.client_endpoint = name //follows the convention that host name == endpoint name


    // Custom optional checks - START
    vars.local_disks["/data"] = {
        disk_partitions = "/data"
    }

    vars.local_disks["/pgsql"] = {
        disk_partitions = "/pgsql"
    }

    // Custom memory (RAM) thresholds
    vars.mem_warning = 10
    vars.mem_critical = 5
    vars.mem_free = "true"

    vars.processes["Postgres"] = {
        argument = "/usr/pgsql-10/bin/postmaster"
    }

    // Custom optional checks - END
}

You’ll want to make sure ido-mysql is not running on the Satellite.
Share the output of: icinga2 feature list

[root@master ~]# icinga2 feature list
Disabled features: command compatlog debuglog elasticsearch gelf influxdb livestatus opentsdb perfdata statusdata
Enabled features: api checker graphite ido-mysql mainlog notification syslog
[root@satellite ~]# icinga2 feature list
Disabled features: command compatlog debuglog elasticsearch gelf graphite ido-mysql influxdb livestatus notification opentsdb perfdata statusdata syslog
Enabled features: api checker mainlog

Are you able to monitor the Satellite? It acts just like a client!

Hey! Yup… I can monitor the satellite.
Here is the output you requested.

master:

root@icinga:/etc/icinga2/zones.d/satellite# icinga2 feature list
Disabled features: compatlog elasticsearch gelf influxdb livestatus opentsdb statusdata syslog
Enabled features: api checker command debuglog graphite ido-pgsql mainlog notification perfdata

satellite:

[root@icinga2-satellite ~]# icinga2 feature list
Disabled features: command compatlog elasticsearch gelf graphite influxdb livestatus notification opentsdb perfdata statusdata syslog
Enabled features: api checker debuglog mainlog

Did you setup the Client using icinga2 node wizard?
Did you run icinga2 pki ticket --cn 'client.domain.tld', from the wizard, on the Master?

Yes. I set up the client using icinga2 node wizard.

I’ve also run icinga2 pki ticket --cn 'client.domain.tld' on the master.

Looks like you renamed the master zone for the client. Its master should be the Satellite.

// client/zones.conf
object Endpoint "satellite01.domain.tld" {
	host = "172.16.0.3"
	port = "5665"
}

object Zone "master" {
	endpoints = [ "satellite01.domain.tld" ]
}

object Endpoint "client.domain.tld" {
}

object Zone "client.domain.tld" {
	endpoints = [ "client.domain.tld" ]
	parent = "master"
}

object Zone "global-templates" {
	global = true
}

object Zone "director-global" {
	global = true
}

hmm, not sure I know what you mean… my zones.conf on the client is:

object Endpoint "icinga-au.datacentre.example.com" {
	host = "192.168.99.22"
	port = "5665"
}

object Zone "satellite" {
	endpoints = [ "icinga-au.datacentre.example.com" ]
}

object Endpoint "db1.datacentre.example.com" {
}

object Zone "db1.datacentre.example.com" {
	endpoints = [ "db1.datacentre.example.com" ]
	parent = "satellite"
}

object Zone "global-templates" {
	global = true
}

object Zone "director-global" {
	global = true
}

Point the client to the master.

...
// object Zone "satellite" {
object Zone "master" {
	endpoints = [ "icinga-au.datacentre.example.com" ]
        // This should be the satellite
}

object Endpoint "db1.datacentre.example.com" {
}

object Zone "db1.datacentre.example.com" {
	endpoints = [ "db1.datacentre.example.com" ]
	// parent = "satellite"
	parent = "master"
	// Point this to the master
}

hmm… Done that. Restarted icinga2 on the client. Same error: debug/ApiListener: Not connecting to Zone 'db1.datacentre.example.com' because it's not in the same zone, a parent or a child zone

hmmm… That should make the Client look to the Satellite as the Master.
The Satellite will be the connection to the zone, the Master will control the zone.


Try stopping icinga2 everywhere and starting it from the top down, in the following order.

  1. Master
  2. Satellite
  3. Client

Done. Same problem.

Here is the icinga2.log from the client, where it shows it is clearly connecting to the satellite using the master zone:

[2019-05-13 11:17:00 +1000] information/ApiListener: Finished sending replay log for endpoint 'icinga-au.datacentre.example.com' in zone 'master'.
[2019-05-13 11:17:00 +1000] information/ApiListener: Finished syncing endpoint 'icinga-au.datacentre.example.com' in zone 'master'.

icinga2.log file from the satellite:

[2019-05-13 11:19:35 +1000] information/ApiListener: New client connection for identity 'db1.datacentre.example.com' from [192.168.99.11]:47642 (no Endpoint object found for identity)
[2019-05-13 11:19:35 +1000] information/JsonRpcConnection: Received certificate request for CN 'db1.datacentre.example.com' signed by our CA.
[2019-05-13 11:19:35 +1000] information/JsonRpcConnection: The certificate for CN 'db1.datacentre.example.com' is valid and uptodate. Skipping automated renewal.
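
The "(no Endpoint object found for identity)" note in the satellite log suggests the satellite has no Endpoint object for db1 at that moment. The satellite's own zones.conf does not define db1; the db1 Endpoint and Zone are defined on the master under zones.d/satellite/, so they can only reach the satellite through the config sync. One thing that may be worth checking (an assumption, not a confirmed diagnosis for this thread) is whether the ApiListener on the satellite accepts synced configuration. A minimal sketch of that setting:

```
// /etc/icinga2/features-enabled/api.conf on the satellite
// (the same two attributes are commonly set on the client as well)
object ApiListener "api" {
	// Accept configuration synced from the parent zone (zones.d on the master)
	accept_config = true

	// Accept check commands sent from the parent zone (needed for command_endpoint)
	accept_commands = true
}
```

After changing this, icinga2 needs a restart on that node, and icinga2 daemon -C can be used first to validate the config.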

Any errors in the configs? Run icinga2 daemon -C on all the servers, top-down.

no errors… just some warnings due to some ApplyRule not being used.

hmmm… That should work.

I would suggest re-running icinga2 node wizard on the Client again.
Make sure you consider the Satellite as the Master for the Client.
Reissue the pki ticket too.

Re-ran icinga2 node wizard on the client.
Have also reissued the pki ticket.

The problem persists.

Maybe it is a cache issue or something left over from the previous wrong configs? Is there a way I can clean it up?

When I end up propagating a configuration error from the config-master, I use rm -Rf /var/lib/icinga2/api/zones. But not sure how that would help in this case.

It took me two or three attempts, starting from scratch each time, to get distributed monitoring where I wanted it. I learned the ins and outs along the way.

hmm yeah… I might need to rebuild everything from scratch and try again.

One question though. Do I need to have services.conf in the /etc/icinga2/zones.d/satellite/ directory on the master, if I want a service/host check to be executed by the satellite?

Thanks!

It is finally working. Starting from scratch solved the issue. I must’ve done something wrong the first time.

Most of your shared configuration should be under global-templates. Here’s my directory structure:

.
├── global-templates
│   ├── centos_int.conf
│   ├── cisco_snmp.conf
│   ├── commands.conf
│   ├── deps.conf
│   ├── holidays_japan.conf
│   ├── HostGroup.conf
│   ├── maintenance.conf
│   ├── notifications.conf
│   ├── services
│   │   ├── asterisk.conf
│   │   ├── clock.conf
│   │   ├── cpu.conf
│   │   ├── disks.conf
│   │   ├── dns.conf
│   │   ├── load.conf
│   │   ├── nrpe.conf
│   │   ├── ping.conf
│   │   ├── processes.conf
│   │   ├── read-only.conf
│   │   ├── redis.conf
│   │   ├── sip.conf
│   │   ├── users.conf
│   │   └── zombie.conf
│   ├── services.conf
│   ├── templates.conf
│   ├── timeperiods.conf
│   └── users.conf
├── master
│   ├── app.conf
│   ├── downtimes.conf
│   ├── groups.conf
│   ├── holiday_file.conf
│   ├── hosts.conf
│   ├── other.conf
│   ├── services.conf
│   └── svc_templates.conf
└── satellite
    ├── commands.conf
    ├── invttmtjagi1.conf
    ├── invttmtjagi2.conf
    ├── invttmtjagi3.conf
    ├── invttmtjagi4.conf
    ├── invttmtjagi5.conf
    ├── invttmtjagi6.conf
    ├── invttmtjagi7.conf
    ├── invttmtjagix.conf
    ├── invttmtjgc06.conf
    ├── invttmtjgclb.conf
    ├── invttmtjgcma.conf
    ├── invttmtjmcat.conf
    ├── invttmtjnms1.conf
    ├── INVTTMTJSW01.conf
    ├── invttmtjweb4.conf
    ├── invttmtjwombat.conf
    ├── nrpe.conf
    └── other.conf
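
On the services.conf question above: here is a rough sketch of how a shared service could be defined once in global-templates rather than per zone. It assumes the host sets vars.client_endpoint (as the db1 host object earlier in the thread does) and that a generic-service template is available in the same global zone. The file path and the choice of the load check are illustrative only.

```
// Hypothetical file: zones.d/global-templates/services/load.conf on the master
apply Service "load" {
	// Assumes "generic-service" is also defined under global-templates
	import "generic-service"

	// "load" is an ITL check command; the plugin must be installed on the client
	check_command = "load"

	// Execute the check on the client's endpoint instead of on the scheduling node
	command_endpoint = host.vars.client_endpoint

	// Only apply to hosts that name a client endpoint
	assign where host.vars.client_endpoint
}
```

Because the db1 host object lives under zones.d/satellite/, the host and the services applied to it should belong to the satellite zone, so the satellite schedules the checks and command_endpoint hands their execution to the client.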

Glad to hear you got it working!