Cluster sync problem

  • Hello,

    I'm trying to run Icinga2 in HA cluster mode and I'm having some problems with syncing API-created objects between the nodes.

    There are 2 servers in the cluster:
    icinga-srv1 - config master
    icinga-srv2 - second node
    Both of them are in a zone called 'master'. MySQL IDO is running on a third server, icinga-ido.

    The intention is to create all objects via the API. At first everything was OK: configs were placed in the correct /var/lib/icinga2 folders on both machines and everything was working fine. After a while the Icinga2 daemon on icinga-srv2 had to be restarted, and then the strange thing happened: the daemon complained that there was no host object for one of its services! You cannot even create a service object until you have a host object!

    Anyway, icinga-srv1 had everything in place and the configs were correct, so, assuming that icinga-srv2 would resync configs from the 'config master', I deleted all API-created object config files from /var/lib/icinga2/api/packages/_api/icinga-srv2/conf.d/ and restarted the icinga2 daemon again. Once it was up, it completely ignored the configs on icinga-srv1. It didn't resync, and when icinga-srv2 became the active endpoint it cleared all the hosts/services from the IDO server (visible in the debug.log). That's probably correct, because it didn't have any configs in /var/lib/icinga2/api/packages/_api/icinga-srv2/conf.d/. If icinga2 is stopped on icinga-srv2, everything goes back to normal, as icinga-srv1 has all the necessary configs in place.

    Any ideas how to make the second node sync configs from config master again? Has anyone encountered such problems?

    I tried clearing everything in /var/lib/icinga2 and /var/cache/icinga2, but it still only syncs zone configs and global-templates from /etc/icinga2/zones.d on icinga-srv1 to /var/lib/icinga2/api/zones on icinga-srv2, skipping the API-created objects that are in /var/lib/icinga2/api/packages/_api/icinga-srv1/conf.d/.

    I can't really tell how to reproduce this issue. Everything works if you start from scratch and gets messed up later, especially if you restart icinga2 enough times on the second node :) Basically, I always end up in a situation where icinga-srv2 is the active endpoint and ignores the configs on icinga-srv1...

    Some info about the setup:



    The post was edited 4 times, last by geds ().

  • I can see that the files in the /var/lib/icinga2/api/repository folder on icinga-srv2 contain all the hosts and their services from icinga-srv1, but they are missing from /var/lib/icinga2/api/packages/_api/.

  • Hi,

    thanks for looking into this problem.

    The hosts were created like this:
    curl -k -s -u "$USERNAME:$PASSWORD" -H 'Accept: application/json' -X PUT "https://icinga-srv1:5665/v1/objects/hosts/$HOSTNAME" -d "{ \"templates\": [ \"generic-host\" ], \"attrs\": { \"address\": \"$IP\", \"check_command\": \"hostalive\", \"vars.os\": \"$OS\" } }"

    and services:
    curl -k -s -u "$USERNAME:$PASSWORD" -H 'Accept: application/json' -X PUT "https://icinga-srv1:5665/v1/objects/services/$HOSTNAME!$SERVICE_NAME" -d "{ \"attrs\": { \"check_command\": \"passive\", \"enable_active_checks\": \"0\" } }"

    Everything looks OK if the active endpoint is icinga-srv1; however, this is a rare occasion. Once the active endpoint switches to icinga-srv2, everything from icinga-srv1 is lost.

  • Does the second node have "accept_config = true" set in its api.conf?

    One further issue might be the missing "zone" attribute. Explicitly assigning the zone when creating such objects would certainly help to work around such sync issues.
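
The accept_config question above can be answered with a quick check on each node. This is only a minimal sketch: `has_accept_config` is a hypothetical helper, and the default path in `API_CONF` is an assumption (the thread only says "api.conf"), so adjust it to your installation.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given ApiListener config file
# contains an uncommented "accept_config = true" line.
has_accept_config() {
    conf_file="$1"
    grep -Eq '^[[:space:]]*accept_config[[:space:]]*=[[:space:]]*true' "$conf_file"
}

# Example usage. The default path is an assumption, not taken from the thread.
API_CONF="${API_CONF:-/etc/icinga2/features-enabled/api.conf}"
if [ -r "$API_CONF" ]; then
    if has_accept_config "$API_CONF"; then
        echo "accept_config is enabled in $API_CONF"
    else
        echo "accept_config is NOT enabled in $API_CONF"
    fi
fi
```

Run it on both icinga-srv1 and icinga-srv2; a commented-out setting is reported as not enabled.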

  • Yes, both nodes have "accept_config = true" in api.conf. I have tried setting the "zone" attribute, but only after the hosts were created, and it didn't help. The documentation says:

    "Objects without a zone attribute are only synced in the same zone the Icinga instance belongs to." Since both nodes are in the same zone, I thought that the "zone" attribute was optional and I didn't pay too much attention to it.

    I'll try to create "fresh" hosts with "zone" attribute included and see if it helps.

  • So I did some experimenting with the "zone" attribute using this scenario:

    1. Start from scratch - clear all the API-created objects on both servers
    2. Take down icinga-srv2
    3. Create some hosts and services on icinga-srv1
    4. Bring up icinga-srv2 to make sure the sync works.
    5. Take down icinga-srv2 again and 'rm /var/lib/icinga2/api/packages/_api/icinga-srv2-1460986546-0/conf.d/*/*'
    6. Update host attributes on icinga-srv1
    7. Bring up icinga-srv2 again to see if the sync worked.
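
One way to check step 7 is to compare the object files under conf.d on both nodes. The following is a rough sketch only: `list_object_files` and `missing_on_second` are hypothetical helpers, and it assumes the conf.d tree from icinga-srv1 has been copied to (or mounted on) the machine where the comparison runs.

```shell
#!/bin/sh
# List all files below a directory as sorted relative paths.
list_object_files() {
    ( cd "$1" && find . -type f | sort )
}

# Print files that exist in the first tree but are missing from the second.
missing_on_second() {
    a=$(mktemp)
    b=$(mktemp)
    list_object_files "$1" > "$a"
    list_object_files "$2" > "$b"
    comm -23 "$a" "$b"   # lines unique to the first listing
    rm -f "$a" "$b"
}
```

For example, `missing_on_second /tmp/srv1-conf.d /var/lib/icinga2/api/packages/_api/icinga-srv2-1460986546-0/conf.d` would list the objects that did not reach icinga-srv2; the first path is a made-up location for the copied tree.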

    I've used this scenario in several ways:

    a) In step 3 hosts/services were created WITHOUT "zone" attribute.
    "zone" attribute was added only in step 6.
    End result - sync didn't work and '/var/lib/icinga2/api/packages/_api/icinga-srv2-1460986546-0/conf.d/' was not updated.

    b) In step 3 hosts/services were created WITHOUT "zone" attribute.
    In step 6 updated exactly the same attributes as when creating hosts WITHOUT "zone" attribute.
    End result - sync didn't work and '/var/lib/icinga2/api/packages/_api/icinga-srv2-1460986546-0/conf.d/' was not updated.

    c) In step 3 hosts/services were created WITH "zone" attribute.
    In step 6 updated exactly the same attributes as when creating hosts WITH "zone" attribute.
    End result - sync did work but only for those hosts which were updated in step 6. Other hosts were not synced.

    So in a way the "zone" attribute "helped", but only if the objects were initially created with it and only if you update these objects after step 5. If you add the "zone" attribute later, it doesn't help.

    To summarize: API objects get synced to icinga-srv2 ONLY if they have the "zone" attribute and the timestamps for these objects are updated on icinga-srv1.

    There was one time that sync to icinga-srv2 didn't work properly on step 4. Hosts were synced, but not their services. I guess that might be the initial cause of this problem.

    Any ideas? Should I file a bug report in Icinga2 issue tracker?

    The post was edited 3 times, last by geds ().

  • Updating the zone attribute after an initial PUT request will do nothing. That's by design: you're required to add the "zone" attribute upon object creation.
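
A minimal sketch of a creation request that carries "zone" from the start, based on the curl call posted earlier in the thread. `build_host_payload` and `create_host` are hypothetical helper names, the zone name "master" is the one from this setup, and the payload builder assumes its arguments contain no characters that need JSON escaping.

```shell
#!/bin/sh
# Build the JSON body for a host PUT request, with "zone" set at creation time.
# Assumption: the arguments contain no quotes or backslashes.
build_host_payload() {
    ip="$1"
    os="$2"
    zone="$3"
    printf '{ "templates": [ "generic-host" ], "attrs": { "address": "%s", "check_command": "hostalive", "vars.os": "%s", "zone": "%s" } }' \
        "$ip" "$os" "$zone"
}

# Hypothetical wrapper around the curl call from the thread.
# USERNAME and PASSWORD must be set by the caller.
create_host() {
    host="$1"
    ip="$2"
    os="$3"
    zone="$4"
    curl -k -s -u "$USERNAME:$PASSWORD" -H 'Accept: application/json' \
        -X PUT "https://icinga-srv1:5665/v1/objects/hosts/$host" \
        -d "$(build_host_payload "$ip" "$os" "$zone")"
}

# Print the payload only; nothing is sent until create_host is called.
build_host_payload "192.0.2.10" "Linux" "master"
echo
```

Calling `create_host myhost 192.0.2.10 Linux master` would then send the request; the host name and address here are placeholders.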

    One thing that is a bug is the original problem, since the documentation states "Objects without a zone attribute are only synced in the same zone the Icinga instance belongs to." This is currently not working, as the API is required to virtually add the current zone name, which results in the zones.d config directory being synced but not conf.d, as your problem description shows. You should open an issue for that; AFAIK there is none at the time of writing.