Enable Icinga HA feature: issue syncing configuration between master nodes

I am trying to enable the high-availability feature in Icinga and am having issues setting it up. I went through the documentation but it is still somewhat unclear. Hoping someone here can help.
Icinga version: 2.8
Setup: Used to have 1 master and 1 satellite node
Components installed on the master:
icinga2-server
icinga2-web2ui
icinga2-notification (for sending custom notifications)

Components installed on the satellite:
icinga2-client

I am trying to add another master node (two overall). To start off, the documentation says to configure the second Icinga 2 master as a client. Does that mean I need to install the icinga2-client package on the second master node?

The configuration on the first master node “a” looks like this:

root@ossipemasteradc0101a:~# cat /etc/icinga2/zones.conf

object Endpoint "ossipemasteradc0101a.softlayer.local" {
  host = "172.16.0.150"
}

object Endpoint "ossipemasteradc01b.softlayer.local" {
  host = "172.16.0.152"
}

object Zone "local-dev-vergil-adc01-master" {
  endpoints = ["ossipemasteradc0101a.softlayer.local","ossipemasteradc01b.softlayer.local"]
}

object Endpoint "ossipesatadc0101a.softlayer.local" {
  host = "172.16.0.151"
}
object Zone "local-dev-vergil-adc01" {
  endpoints = ["ossipesatadc0101a.softlayer.local"]
  parent = "local-dev-vergil-adc01-master"
}
# The icinga2 web2 director module uses this to manage the whole configuration
object Zone "director-global" {
      global = true
}

object Zone "global-templates" {
  global = true
}

Should the second master node “b” have a similar zones.conf file? I assume the hosts and services for Zone “local-dev-vergil-adc01-master” will only be on node “a”.

My belief is that if the packages and configuration are set up right, then if the Icinga process on master node “a” crashes, all the checks should move to master “b”. But I am confused about which packages and configuration go on master “b”. Apologies for the naive question, but I would really appreciate guidance here.

Thanks,
Gangadhar

By chance, are these the Chef cookbooks?

Yes, they are Ansible playbooks. Do you need any specific details regarding the configuration?

Ok, I expected some sort of automation involved here. It would be helpful to know which documentation you are referring to exactly, so that others may chime in and help from their (Ansible) experience. I don’t have any, unfortunately.

I don’t think the issue I am facing is with the Ansible automation. To be honest, I don’t quite understand how the HA configuration is set up in master zones.

This is an overview of what each role does:

icinga2-server
  • Installs icinga 2.8.2-1.stretch and other dependencies
  • Does the zone configuration by creating zone folders, host objects, etc.
icinga2-web2ui
  • Does the icingaweb package installation
icinga2-client
  • Installs the icinga2 package
  • Does CA setup and registration with the master using icingacli tools

I was referring to this doc https://www.icinga.com/docs/icinga2/latest/doc/06-distributed-monitoring/, section “High-Availability Master with Clients”. The document says to set up the second master node as a client. Does that mean I need to run the icinga2-client configuration, which does client registration with master node “a” and so on? Because the Icinga service has startup issues if I do that; it complains about the api feature-enabled configuration file.

I can post zones.conf and other config files from the Icinga master or satellite nodes if it helps further.

This documentation targets a manual setup and instructs the reader to use the ‘node wizard’ command to simulate a satellite/client in order to fetch the certificates via CSR auto-signing. This is a “workaround”; the docs then show how to modify the zones.conf configuration later on.

Since you’re using Ansible playbooks, such a thing can be implemented in a different fashion, e.g. by creating the signing request on the secondary satellite, letting the primary one sign the certificates and copy them over to the satellite.
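
A rough sketch of that manual signing flow with the plain CLI (hostnames and the pre-2.8 /etc/icinga2/pki paths are placeholders, not taken from your playbooks):

# on the secondary node: create a key pair and a certificate signing request
icinga2 pki new-cert --cn icinga2-master2.localdomain \
  --key /etc/icinga2/pki/icinga2-master2.localdomain.key \
  --csr /etc/icinga2/pki/icinga2-master2.localdomain.csr

# copy the CSR to the primary master (which holds the CA) and sign it there
icinga2 pki sign-csr --csr icinga2-master2.localdomain.csr \
  --cert icinga2-master2.localdomain.crt

# copy the signed certificate plus the CA certificate (/var/lib/icinga2/ca/ca.crt)
# back to the secondary node, next to its key

Each of those steps maps onto a simple Ansible task (command, delegate_to, copy).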

Still, I don’t know which sources you’re using for the Ansible playbooks … something on GitHub, Galaxy, or a custom attempt of yours? Can you share the code which generates the above zones.conf?

Generally speaking, a secondary master needs the full setup of the first master, except for the private CA and config deployment in zones.d. Anything else with zones.conf and certificate signing must be handled within such a playbook.

I would start without Ansible in a test lab and two VMs. Learn and understand how to configure and deploy it properly, then adopt that into Ansible playbooks and automate it. Without knowing how the components work, it’ll be hard to learn and debug.

Thanks for the response. Most of our playbook automation is an enhancement of https://github.com/SoneraCloud/ansible-role-icinga2-no-ui/tree/master/tasks

For example, our zones are created via:

- name: create config directories for the zone monitors
  file:
    path: '{{icinga2_zones_d}}{{ outer_item.zone }}' # icinga2_zones_d is /etc/icinga2/zones.d and outer_item.zone is satellite name, for e.g. Dallas-satellite, WDC-satellite etc
    state: directory
    mode: 0777

- name: create config sub-directories for each object type in the zone
  file:
    path: '{{icinga2_zones_d}}{{ outer_item.zone }}/{{ item }}'
    state: directory
    mode: 0777
  with_items:
    - hosts
    - services
    - checkcommands
    - users
    - usergroups
    - hostgroups
    - servicegroups
    - dependency
    - notifications
    - notificationcommands
    - timeperiods
    - apply-rules
    - templates

Since you’re using Ansible playbooks, such a thing can be implemented in a different fashion, e.g. by creating the signing request on the secondary satellite, letting the primary one sign the certificates and copy them over to the satellite.

Usually we have 2 master and 2 satellite nodes. Both satellites do CSR signing with master node “a” alone, which has the zone configuration etc. Are you saying the 2 satellites should do CSR signing with master node “b” too, but that node “b” shouldn’t have the satellite zone directories?

For example, our zones.conf looks like this:

root@master node a:~# cat /etc/icinga2/zones.conf

object Endpoint "<master node a hostname with fqdn>" {
  host = "master node a IP"
}

object Zone "master zone" {
  endpoints = ["master node a hostname with fqdn"]
}

object Endpoint "satellite node a hostname with fqdn" {
  host = "satellite node a IP"
}
object Endpoint "master node b hostname with fqdn" {
  host = "satellite node b IP"
}
object Zone "satellite zone 1" {
  endpoints = ["satellite node a hostname with fqdn","satellite node b hostname with fqdn"]
  parent = "master zone"
}
# The icinga2 web2 director module uses this to manage the whole configuration
object Zone "director-global" {
      global = true
}

object Zone "global-templates" {
  global = true
}

Based on the documentation, I believe there should be an endpoint for master node “b”, and the master zone should include that detail too.
So it looks like all the roles that run on master node “a” should run on master node “b” too (except creating the satellite zones and the master zone directory), and all the satellites should also register with master node “b”, similar to how they do with node “a”?

Thanks,
Gangadhar

Please format your posts with Markdown; I’ve edited the above already. Instructions for better readability can be found here.

I don’t know exactly whether someone else has already built such an HA setup with Ansible playbooks; maybe @KevinHonka has.

Cutting things down from what I do with Puppet:

  • The CA needs to be set up and installed only on the primary master (that’s what’s generated with pki new-ca)
  • /etc/icinga2/zones.d can be used for config deployments on the first master only; the secondary needs to ensure that this directory stays empty
  • zones.conf must be the same on both masters; they need to know a) that they belong to the same zone and b) which child zones and endpoints will connect
  • satellite endpoints in child zones need the “master” zone deployed, which then contains two endpoints, so that the cluster connection and trust relationship stay intact
  • If an additional node is installed, it needs certificates for TLS connections. Icinga 2 provides a built-in mechanism where you generate a ticket string on the master holding the CA and use it on such a client to run node setup, passing the ticket string as trusted. These things can be automated with Ansible tasks (run on the master first, then on the client).

An example of certificate requests is shown here; do note though that the certificate paths changed with Icinga 2 v2.8, and the new paths are described here.
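
To make the last bullet point concrete, here is a minimal sketch of the ticket mechanism (CN, zone name and IP are taken from the zones.conf you posted above; the /etc/icinga2/pki paths are the pre-2.8 ones):

# on the master holding the CA: generate a ticket for the new node's CN
icinga2 pki ticket --cn ossipesatadc0101a.softlayer.local

# on the new node: create its local key/certificate, fetch and verify the
# master's certificate, then run node setup with the ticket from above
icinga2 pki new-cert --cn ossipesatadc0101a.softlayer.local \
  --key /etc/icinga2/pki/ossipesatadc0101a.softlayer.local.key \
  --cert /etc/icinga2/pki/ossipesatadc0101a.softlayer.local.crt
icinga2 pki save-cert --key /etc/icinga2/pki/ossipesatadc0101a.softlayer.local.key \
  --cert /etc/icinga2/pki/ossipesatadc0101a.softlayer.local.crt \
  --trustedcert /etc/icinga2/pki/trusted-master.crt \
  --host 172.16.0.150
icinga2 node setup --ticket <ticket printed by the first command> \
  --cn ossipesatadc0101a.softlayer.local \
  --endpoint ossipemasteradc0101a.softlayer.local \
  --zone local-dev-vergil-adc01 \
  --master_host 172.16.0.150 \
  --trustedcert /etc/icinga2/pki/trusted-master.crt \
  --accept-config --accept-commands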

Haven’t had the pleasure of doing it with those Ansible playbooks.
Ours is quite rudimentary.
We query the Director for the icinga2_ticket and then pass it as an argument to a modified kickstart script. It looks something like this:

- name: "Generating ticket"
  uri:
    url: "http://icinga.example.de/icingaweb2/director/host/ticket?name={{ ansible_fqdn }}"
    user: icingauser
    password: xXPasswordXx
    return_content: yes
    validate_certs: False
  register: icinga2_ticket

- name: "Copying setup script to remote server"
  copy:
    src: "{{ setup.kickstart_file }}"
    dest: "/tmp/"
    mode: 0777
  when: setup is True

- name: "Running configuration script"
  command: "bash /tmp/{{ setup.kickstart_file }} {{ ansible_fqdn }} {{ icinga2_ticket.msg }}"
  when: setup is True

This works great for a general setup that is orchestrated by the Director.

Thanks both for all the input. I was on vacation for a week and didn’t get a chance to try these out. I will be actively working on the HA setup next week and will update the thread on where I stand. Thanks again!

Hi @KevinHonka @dnsmichi ,

This is what I tried in my dev Vagrant boxes, but it doesn’t seem to be working.

Env setup:

  1. 2 Icinga masters and 2 Icinga satellites
  2. Icinga server components and dependent packages installed on the master
  3. Icinga client components and dependent packages installed on the satellites
  4. The zones.d directory has files/sub-directories only on primary master “a”
  5. Satellite/client registration happened only with primary master “a”
  6. Icinga server installed on “b”; zones.conf is the same on primary master “a” and secondary master “b”:
    object Endpoint "ossipemasteradc0101a.<domain>" {
      host = "<master IP 1>"
    }

    object Endpoint "ossipemasteradc0101b..<domain>" {
      host = "<master IP 2>"
    }

    object Zone "local-dev-vergil-adc01-master" {
      endpoints = ["ossipemasteradc0101a..<domain>","ossipemasteradc0101b..<domain>"]
    }

    object Endpoint "ossipesatadc0101a..<domain>" {
      host = "<sat IP 1>"
    }
    object Endpoint "ossipesatadc0101b.<domain>" {
      host = "<sat IP 2>"
    }
    object Zone "local-dev-vergil-adc01" {
      endpoints = ["ossipesatadc0101a.<domain>","ossipesatadc0101b.<domain>"]
      parent = "local-dev-vergil-adc01-master"
    }
    # The icinga2 web2 director module uses this to manage the whole configuration
    object Zone "director-global" {
          global = true
    }

    object Zone "global-templates" {
      global = true
    }

This config is the same on secondary master “b”. When I test this setup in dev by shutting down the Icinga process on master node “a” and checking the monitors, the checks are not moving to secondary master “b”. Instead they fail because the Icinga process is not running on “a”, which indicates the checks are still happening against “a” alone. Based on this I have 2 questions.

Q1: “/etc/icinga2/zones.d can be used for config deployments on the first master only, secondary needs to ensure that this directory stays empty” -> You mentioned the icinga2 zones.d directory should be empty on master “b”. In the event of a failure or an Icinga crash on master “a”, if zones.d is empty on master “b”, how will the satellites know to move the checks from master “a” to “b”? FYI, we don’t have a central DB for Icinga and Icinga Web; the Icinga and Icinga Web DBs reside locally in the MySQL DB on primary master “a”. When I run the MySQL and Icinga playbooks (which set up the Icinga DB) on master node “b”, I see the DBs and tables created on “b” similar to “a”, but the tables on master “b” are empty sets, while the icinga_hosts and icinga_services tables on master node “a” have all the host and service details. I also tried deploying some of the monitors that reside only on master node “a” onto “b” too, but I only see the directories created and no Icinga object/conf files on node “b”.

Q2: If we leave the zones.d and satellite directories only on master node “a”, and leave master “b” with just icinga-server installed and the process up and running, I think the satellites should do certificate signing with “b” too, similar to how they do it against “a”. Since masters “a” and “b” are in the same zone and the satellites would have certs signed with both, in situations where the Icinga process on master “a” crashes the checks would be moved/balanced to master “b”. But I don’t know whether Icinga satellites support certificates with 2 primary nodes (since as of now the DB resides locally on node “a”). The satellite-to-primary certificate setup is done via the Ansible steps below:

- name: create /etc/icinga2/pki directory
  file: path=/etc/icinga2/pki state=directory owner=nagios group=nagios

- name: Generate a new local self-signed certificate.
  command: >
    icinga2 pki new-cert --cn {{ ansible_fqdn }}
    --key /etc/icinga2/pki/{{ ansible_fqdn }}.key
    --cert /etc/icinga2/pki/{{ ansible_fqdn }}.crt
  args:
    creates: "/etc/icinga2/pki/{{ ansible_fqdn }}.key"

- name: Request the master certificate from the master host and store it as trusted-master.crt.
  command: >
    icinga2 pki save-cert --key /etc/icinga2/pki/{{ ansible_fqdn }}.key
    --cert /etc/icinga2/pki/{{ ansible_fqdn }}.crt
    --trustedcert /etc/icinga2/pki/trusted-master.crt
    --host {{ icinga2_primary_master_ip }}
  args:
    creates: /etc/icinga2/pki/trusted-master.crt

- name: check if node setup needs to happen
  command: grep "^const NodeName = \"{{ ansible_fqdn }}\"$" /etc/icinga2/constants.conf
  register: node_setup
  changed_when: false
  failed_when: false

- debug: var=ansible_fqdn
- debug: var=icinga2_primary_master_ip

- name: Get the client ticket from the server
  command: icinga2 pki ticket --cn {{ ansible_fqdn }}
  register: client_ticket
  delegate_to: "{{ icinga2_primary_master_host }}" # Runs/delegate this command to icinga master to get the client ticket
  become: "yes"
  when: node_setup.rc != 0

- name: Continue with the additional node setup steps.
  shell: >
    icinga2 node setup --ticket {{ client_ticket.stdout }}
    --endpoint {{ icinga2_primary_master_ip }}
    --zone {{ ansible_fqdn }}
    --master_host {{ icinga2_primary_master_ip }}
    --trustedcert /etc/icinga2/pki/trusted-master.crt
  when: node_setup.rc != 0
  notify: restart icinga2

It’s quite possible my thought process is not 100% correct and we need to enable Icinga HA in a different way, but I wanted to share what I tried so we can pinpoint the problem with my current approach.

Note: Currently I am not trying to achieve Icinga master HA support via automation. Trying things out manually is absolutely fine with me; once I know the right steps, I can automate them for our internal purposes.

  1. You designate a configuration Master; it does not matter which one. Add config files to /etc/icinga2/zones.d/ on the configuration Master and leave the other empty. On reload, Icinga 2 will read the configuration into /var/lib/icinga2/api/zones. This will be sent to the other Master and is also stored there in /var/lib/icinga2/api/zones.

  2. Satellites should be registered against both masters. Use icinga2 node wizard on the Satellite and specify both of the Master nodes.

For HA you do not want independent databases on each Master node. You should use a shared DB, such as an external DB cluster. Only one Master will use the DB (active-standby when using ido-mysql). The other DB will stay empty until that Master becomes active.
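
For reference, a minimal sketch of /etc/icinga2/features-enabled/ido-mysql.conf when both Masters point at the same shared database (the DB host and credentials are placeholders, not taken from this thread):

object IdoMysqlConnection "ido-mysql" {
  host = "db.example.local"   # shared DB reachable from both Masters (placeholder)
  user = "icinga"
  password = "icinga"
  database = "icinga"
  enable_ha = true            # default; only the active Master writes to the IDO DB
}

With identical settings on both Masters and enable_ha left on, only one Master writes at a time and the other takes over when the active one goes away.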


With my first build, I cloned the first Master. This avoided some of the pitfalls when trying to get HA working, since the certificates and TicketSalt were already identical. They need to match on both Master instances.

When not using identical cloned instances, notice how the HA instructions state:

  • Set up icinga2-master2.localdomain as client (we will modify the generated configuration).

This is important to sync the certificates!
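
For reference, the TicketSalt is a constant in /etc/icinga2/constants.conf; a minimal sketch of what has to be identical on both Masters (the value is just a placeholder):

/* /etc/icinga2/constants.conf on both Masters */
const TicketSalt = "<the same random secret on both Masters>"

Tickets are derived from the requested CN and this salt, so Masters with different salts will hand out different tickets for the same CN.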

Hi,

Thanks for the response. A couple of questions:

  1. “For HA you do not want independent databases on each Master node. You should use a shared DB, such as an external DB cluster. Only one Master will use the DB, active-standby when using ido-mysql. The other DB will be empty, until the Master is active.” -> In our case the DB resides on primary master node “a”. Will secondary master “b” know about/access the DB on master node “a”, since they are in the same region? Will having the DB locally on Icinga master “a” work for an HA setup? We are planning to have a central DB for Icinga, but that won’t happen any time soon in our infrastructure.

  2. Also, we install the same icinga-server 2.8 stretch package on both masters. I believe that counts as “the same identical cloned instance”. Our 2 satellites set up certificates with primary master “a” via the above-mentioned icinga2 pki steps. Do the 2 satellites need to set up certificates with master “b” via the same method, or are you saying master “b” should act as a satellite node and set up certificates against master node “a” via the icinga2 pki module, similar to how the satellite-to-master certificate setup happens?

  1. You have to configure the DB on each instance. Icinga 2 does not share setup configuration details with other instances. If you want other instances to use the same DB, the settings in /etc/icinga2/features-enabled/ido-mysql.conf should point to the same DB.
  • The zones.d contents are the only thing shared with other instances, unless specific commands are run, such as icinga2 node wizard.

  • You could point the secondary Master to the DB running on the primary Master. A DB cluster is not required, but without one the DB running on the Master instance is a single point of failure.

  2. By cloned instance, I mean I did a full copy of the entire system, including a base Icinga 2 installation, and changed the hostname and IP addresses on the cloned virtual machine.
  • Because of CSR auto-signing, I did not use any icinga2 pki commands. The contents of /var/lib/icinga2/ca were identical on the cloned instance.

When connecting an instance to Masters or Satellites, you should specify all of the instances when using icinga2 node wizard:

Add more master/satellite endpoints? [y/N]:

Ok, this is my current dev environment setup:

  1. 2 Icinga master nodes
  2. 2 Icinga satellite nodes

Master node “a” -> Icinga installed and set up via icinga2 node wizard to configure it as a master (https://www.icinga.com/docs/icinga2/latest/doc/06-distributed-monitoring/). I saw the cert files generated under master “a”, zones.conf updated, etc. The generated zones.conf is below. The CA directory was the same on both masters “a” and “b”.

cat /etc/icinga2/zones.conf
/*
 * Generated by Icinga 2 node setup commands
 * on 2018-08-17 15:49:52 -0500
 */

object Endpoint NodeName {
}

object Zone ZoneName {
        endpoints = [ NodeName ]
}

object Zone "global-templates" {
        global = true
}

object Zone "director-global" {
        global = true
}

Master “b” has the same certs and CA files as master node “a”. The Icinga process is up and running on both masters.

Moving on to the first satellite node. Icinga installed; configuration done via the icinga2 node wizard CLI:

icinga2 node wizard
Welcome to the Icinga 2 Setup Wizard!

We will guide you through all required configuration details.

Please specify if this is a satellite/client setup ('n' installs a master setup) [Y/n]: Y

Starting the Client/Satellite setup routine...

Please specify the common name (CN) [ossipesatadc0101a.softlayer.local]:

Please specify the parent endpoint(s) (master or satellite) where this node should connect to:
Master/Satellite Common Name (CN from your master/satellite node): ossipemasteradc0101a.fqdn

Do you want to establish a connection to the parent node from this node? [Y/n]: Y
Please specify the master/satellite connection information:
Master/Satellite endpoint host (IP address or FQDN): 172.16.aa.aa
Master/Satellite endpoint port [5665]:

Add more master/satellite endpoints? [y/N]: y
Master/Satellite Common Name (CN from your master/satellite node): ossipemasteradc0101b.fqdn

Do you want to establish a connection to the parent node from this node? [Y/n]: Y
Please specify the master/satellite connection information:
Master/Satellite endpoint host (IP address or FQDN): 172.16.bb.bb
Master/Satellite endpoint port [5665]:

Add more master/satellite endpoints? [y/N]: N
critical/TcpSocket: Invalid socket: Connection refused
critical/pki: Cannot connect to host '172.16.bb.bb' on port '5665'
critical/cli: Peer did not present a valid certificate.

What’s missing here? Another question: when I generate Icinga satellite tickets on the master nodes via icinga2 pki ticket --cn ossipesatadc0101a.fqdn, it generates 2 unique tickets, one on each master. One of the master nodes gave the ticket d4adcb264825d4f7237e5505f061b3207122db30 and the other one a458c1b47902fa2ef5643a84859a4838652857a1. My assumption was that if the certs on the masters are the same, the ticket for a satellite would be the same. Please let me know if you need additional configuration details.

Based on the docs, I am trying this manually:

* Set up  `icinga2-master1.localdomain`  as [master](https://www.icinga.com/docs/icinga2/latest/doc/06-distributed-monitoring/#distributed-monitoring-setup-master).
* Set up  `icinga2-master2.localdomain`  as [client](https://www.icinga.com/docs/icinga2/latest/doc/06-distributed-monitoring/#distributed-monitoring-setup-satellite-client) (we will modify the generated configuration).
* Set up  `icinga2-client1.localdomain`  and  `icinga2-client2.localdomain`  as [clients](https://www.icinga.com/docs/icinga2/latest/doc/06-distributed-monitoring/#distributed-monitoring-setup-satellite-client) (when asked for adding multiple masters, set to  `y`  and add the secondary master  `icinga2-master2.localdomain` ).

Hi,

Configuring the second master “b” as a client to master node “a” caused the Icinga process on master “b” not to start; it complained about api.conf not being correct. I disabled the api feature to get the Icinga process started on master “b”. Then I tried to add both masters “a” and “b” as masters for the satellite nodes via icinga2 node wizard, which also resulted in the same error:

critical/TcpSocket: Invalid socket: Connection refused
critical/pki: Cannot connect to host '172.16.0.152' on port '5665'
critical/cli: Peer did not present a valid certificate.

Please advise on what else I can try; I am kind of blocked on our HA setup, having tried both ways. I can provide config info and other details if needed. @poing @KevinHonka @dnsmichi

I got Icinga HA working after trying the option of setting up the secondary master as a satellite, but in the master zone. I have some questions around deploying monitors when the primary master is down and HA is enabled; I will open a new thread for that. Thanks for the help!
