Continuous reload of satellites within the same zone when new configuration is applied

  • Hi,

    we have a distributed setup with a master and multiple satellites per zone.

    Whenever we apply a new configuration on the master, it synchronizes the new configuration to all zones; the satellites receive it and apply it / reload the service properly.
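
    For context, a minimal zones.conf sketch for this kind of top-down layout could look roughly like the following. The master zone and endpoint names are placeholders; only the two satellite names and the zone names "test" and "global" are the ones from this thread.

    Code
    // shared zones.conf sketch for a top-down setup where the master
    // actively connects to the satellites (master names are placeholders)
    object Endpoint "monitoring-master.test" { }

    object Zone "master" {
      endpoints = [ "monitoring-master.test" ]
    }

    object Endpoint "monitoring-icinga-1.test" {
      host = "monitoring-icinga-1.test"   // master connects out to this satellite
    }

    object Endpoint "monitoring-icinga-2.test" {
      host = "monitoring-icinga-2.test"
    }

    object Zone "test" {
      endpoints = [ "monitoring-icinga-1.test", "monitoring-icinga-2.test" ]
      parent = "master"
    }

    object Zone "global" {
      global = true
    }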


    The problem we are facing is that the different satellites in a zone somehow detect each other and cause the other satellites in the zone to reload/restart continuously. The following is the output log of node monitoring-icinga-2.test in the test zone, where you can see that it detects new configuration on node monitoring-icinga-1.test and restarts continuously (exactly the same happens on the other node):




    Following is the configuration of one of the satellites:




    Thanks! /moises

  • Hi, is no one else facing this issue? There must be something in the configuration that I'm missing, or maybe it is a real bug, but I'd be surprised if no one had faced it before.

    We have reviewed the configuration again and again and cannot tell what it is. It is very painful, because every time we add configuration to the master we need to fix practically all satellites across the different zones :(


    best /moix

  • I am by far no expert here, but it looks like your satellites are connecting to each other and not to the master. How did you run the node wizard on the master and the satellites? It could be that you got your endpoints wrong there; that would explain things.


    Shot in the dark but who knows.

  • Hi Hactar, well, the master is actively connecting to the satellites and sending the configuration. It is a top-down setup and I followed the configuration recommendations, so the satellites do not need to connect actively to the master.

    The thing is that it is working fine; it is only when new configuration is applied that this starts happening.

  • Ah, that is the difference between our setups. I also run the configuration top-down (the master sending out config to the clients on change and reload), but in my setup the satellites are actively connecting to the master, since most of them are behind firewalls. So they initiate the TLS connection; once it is established the master sees them as online and sends out the config. The master itself never actively initiates a connection, since it would hit a firewall in virtually every case. Maybe that is the way to go for you too? (See the sketch at the end of this post.)


    If that does not work, I am missing a key point somewhere in your config, as I too cannot see where that might come from.
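
    For reference, making the satellite the one that dials out usually just means putting the host attribute on the master's Endpoint object in the satellite's zones.conf (and omitting host on the satellite endpoints on the master side, so it never tries to connect down). A minimal sketch with a placeholder master name:

    Code
    // zones.conf fragment on a satellite: the endpoint that carries a "host"
    // attribute is the one this instance actively opens the TLS connection to
    object Endpoint "monitoring-master.test" {
      host = "monitoring-master.test"   // placeholder master FQDN
      port = "5665"
    }

    object Zone "master" {
      endpoints = [ "monitoring-master.test" ]
    }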

  • Your initial post should provide more logs from all involved instances. Best would be to look into the debug log for the startup and reload/config validation triggers.
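
    For reference, the debug log can be switched on per instance with "icinga2 feature enable debuglog" followed by a reload; the feature is just a FileLogger along these lines (the exact path constant differs between versions):

    Code
    // features-available/debuglog.conf (sketch)
    object FileLogger "debug-file" {
      severity = "debug"
      path = LocalStateDir + "/log/icinga2/debug.log"
    }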


    One thing which is suspicious: this isn't a configuration file, but seems to be a precompiled Python bytecode file. Syncing such files is NOT supported, and might be the root cause of the configuration files changing all the time.


    Code
    [2017-11-02 04:12:09 -0400] information/ApiListener: Updating configuration file: /var/lib/icinga2/api/zones/global//_etc/scripts/icinga2/icinga2_status.pyc
  • Hello Hactar/dnsmichi, thanks for your replies.


    Well, I posted the part of the logs where I found something that highlighted the problem; the rest looks fine, so I didn't include it, but sure, next time I can include more info.


    I think you are right, dnsmichi. Actually, this morning I managed to reproduce it in a lower environment and was investigating this exact case, as it seems the root cause is the bytecode files of a Python check we have: it creates the .pyc files locally on every server, which has the effect of syncing them from one node to the other and back again.


    I'll try to disable the generation of this file and give it a go; I will update as soon as I verify it.
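
    One way to do that could be to set PYTHONDONTWRITEBYTECODE in the check command's environment, so the interpreter never writes .pyc files next to a script that lives inside a synced zone directory. A rough sketch; the command name and script path are only guesses based on the log entry above:

    Code
    object CheckCommand "icinga2_status" {
      // placeholder path, adjust to where the check script really lives
      command = [ "/usr/bin/python", "/opt/checks/icinga2/icinga2_status.py" ]

      // ask Python not to write .pyc bytecode next to the script, so no
      // binary file ever shows up in the synced zone directory again
      env.PYTHONDONTWRITEBYTECODE = "1"
    }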

    Thanks

  • Thought so. It is called "configuration sync" and relies on text files. Binary syncing isn't supported, and even if it worked, I remember a patch for 2.8 which relies on the content being a string. Anything else will break the cluster.

  • Hi, even after avoiding the generation of the Python .pyc files, I still have the satellites restarting each other when new configuration is sent by the master.

    Logs from one of the nodes when the file zones/global/services.d/service-snmp.conf changes:



    I also noticed the .timestamp files; I assume those are internal files generated by the Icinga processes, correct?

    Thanks

  • Hi, yes, I already checked NTP and the servers are synced. I left it running over the last weeks and tried a couple of times to remove as much config as possible; every file is just a text file, but the behavior persists.

    When new config is added, I see in the logs of both nodes the file update entries as reported before, plus additional .timestamp files:


    Code
    [2017-12-13 11:42:46 -0500] information/ApiListener: Updating configuration file: /var/lib/icinga2/api/zones/test//.timestamp
    [2017-12-13 11:42:46 -0500] information/ApiListener: Updating configuration file: /var/lib/icinga2/api/zones/test//_etc/hosts.d/product-test.conf
    [2017-12-13 11:42:46 -0500] information/ApiListener: Updating configuration file: /var/lib/icinga2/api/zones/global//.timestamp


    If I check the content of these files on both nodes, it is changing continuously:

    * server 1 detects a file change and tells server 2 that the file content has changed ==> resync and reload

    * later, server 2 tells server 1 that the file has changed ==> resync and reload

    ..


    This happens in a loop until I stop the service, remove the api/zones folder, and restart it again.


    Thanks

  • Hi, is no one facing the same or a similar issue? We have up to 38 clients in prod, 2 satellites per zone --> approx. 15 zones. So every time we add configuration it is a problem, because the whole system becomes unstable and we need to manually restart several Icinga services.


    Is there any workaround we could apply to mitigate it? Something we could try in order to investigate where the issue is?


    Thanks