Secondary master in HA crashes randomly

  • Hi, we recently configured a secondary master in HA, following the documentation at https://www.icinga.com/docs/ic…ility-master-with-clients

    As explained in the documentation, the secondary master is installed like an additional node, specifying the master zone and then enabling the same features as on the primary master.


    Configuration in both masters is:




    We start the services on both servers and they communicate fine: checks run in the master zone, and the logs show that the rest of the nodes connect to both masters.


    The problem is that after a while the secondary master starts consuming more and more memory until it is killed for running out of memory. In dmesg:

    Code
    [Thu Dec 28 10:20:35 2017] Out of memory: Kill process 28079 (icinga2) score 924 or sacrifice child
    [Thu Dec 28 10:20:35 2017] Killed process 28079 (icinga2) total-vm:8242300kB, anon-rss:3518992kB, file-rss:0kB
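
To put the kernel figures in perspective, the kill line quoted above can be parsed and converted to GiB. This is only a minimal sketch: the regular expression is written against the exact line format shown in this post, and the helper name `parse_oom_kill` is hypothetical, not part of Icinga 2 or any kernel tooling.

```python
import re

# Pattern written against the "Killed process" line format quoted in this thread.
# Other kernel versions may print additional fields (assumption: they are absent here).
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)

KB_PER_GIB = 1024 ** 2


def parse_oom_kill(line):
    """Return pid, process name and memory sizes in GiB, or None if the line does not match."""
    match = KILLED_RE.search(line)
    if match is None:
        return None
    return {
        "pid": int(match.group("pid")),
        "name": match.group("name"),
        "total_vm_gib": int(match.group("total_vm")) / KB_PER_GIB,
        "anon_rss_gib": int(match.group("anon_rss")) / KB_PER_GIB,
    }


line = (
    "[Thu Dec 28 10:20:35 2017] Killed process 28079 (icinga2) "
    "total-vm:8242300kB, anon-rss:3518992kB, file-rss:0kB"
)
info = parse_oom_kill(line)
print(info)
```

For the line in this post, the process held roughly 3.4 GiB of anonymous resident memory and about 7.9 GiB of virtual memory when it was killed.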


    The following figures show CPU, memory, and swap usage of the node, where we can see 2 crashes:

    [IMG:]



    Any idea why this can happen? We cannot find a crash report or anything else in the logs that indicates the cause.

    Thanks in advance

  • Adding some more information that could help.


    Today a master in production crashed again, this time the main master, monitoring-icinga-master, without any apparent reason (no new configuration or restart happened). The server kept consuming memory until the main icinga2 process ran out of memory:


    [IMG:]
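
When the logs give no hint, it can help to record the resident memory of the icinga2 process at intervals and correlate the growth with cluster events. The following is a rough sketch under several assumptions: it reads the `VmRSS` field in the format Linux exposes in `/proc/<pid>/status`, the helper names are hypothetical, and the sample values are made up for illustration, not measurements from this system.

```python
def vmrss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB"
            return int(line.split()[1])
    return None


def growth_kb_per_min(samples):
    """Average growth between the first and last (unix_seconds, rss_kb) sample."""
    (t0, rss0), (t1, rss1) = samples[0], samples[-1]
    if t1 == t0:
        return 0.0
    return (rss1 - rss0) / ((t1 - t0) / 60.0)


# Illustrative input only; on a real host the text would come from
# open(f"/proc/{pid}/status").read() for the icinga2 pid.
example_status = "Name:\ticinga2\nVmPeak:\t  900000 kB\nVmRSS:\t  512000 kB\n"
print(vmrss_kb(example_status))

# Made-up samples taken ten minutes apart.
example_samples = [(0, 512000), (600, 812000)]
print(growth_kb_per_min(example_samples))
```

A steady positive growth rate while no configuration is being deployed would point at a leak or a backlog building up rather than at a one-off spike.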



    The following is the output from icinga2 troubleshoot: