missing performance data and wrong cluster zone log lag

  • Today I ran a short test with Icinga 2 to evaluate the replay log in Icinga 2.6.3. I have a master with a satellite which has two endpoints. I disconnected the satellite zone by disabling the corresponding router interface. After ~30 minutes I reconnected the system. This is the result:



    As one can see, there are datapoints missing. On one host 2 datapoints are missing, on the other 1 datapoint is missing. The log lag of the zone jumps to 50k immediately after a disconnect. The zone check is executed on the master with the cluster-zone check; the cluster name is set to the satellite zone. Can somebody explain this behaviour?


    regards

    mobro


  • A few other questions:


    • Does the size of the file /var/lib/icinga2/api/log/current grow?
    • Where are your checks scheduled from? Master? Satellite?


  • - The endpoints on the satellite are also running on Debian Jessie.

    - The size of /var/lib/icinga2/api/log/current grows while the zone is disconnected.

    - The satellite schedules the checks. In the image I posted in the initial post one can see that data is replayed, but one or two datapoints are missing and the log lag jumps to 50k.


    Edit:

    Another pic:


  • It would be interesting to see whether the missing data is present in the current file.

    If yes, then something happens when the logs are replayed.

    If not, then something happens on the satellite side.
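
One way to check this without reading the file by eye is a small script like the sketch below. It rests on assumptions that are not confirmed in this thread: that the replay log is a sequence of netstrings (`<length>:<payload>,`), that each payload is a JSON object with a `timestamp` field and a `message` field holding a JSON-encoded string, and that check results use the method name `event::CheckResult` with a `host` entry under `params`. The function names are made up for illustration. If the file on your system looks different, only the netstring reader and the gap finder are likely to be reusable.

```python
import json
from typing import Iterator, List, Tuple


def iter_netstrings(data: bytes) -> Iterator[bytes]:
    """Yield the payloads of concatenated netstrings: <length>:<payload>,"""
    pos = 0
    while pos < len(data):
        colon = data.index(b":", pos)          # raises ValueError if truncated
        length = int(data[pos:colon])
        start, end = colon + 1, colon + 1 + length
        if data[end:end + 1] != b",":
            raise ValueError(f"malformed netstring at offset {pos}")
        yield data[start:end]
        pos = end + 1


def check_result_times(data: bytes, host: str) -> List[float]:
    """Timestamps of replay-log records that carry a check result for `host`.

    ASSUMPTION: record layout and field names as described in the lead-in.
    """
    times = []
    for payload in iter_netstrings(data):
        record = json.loads(payload)
        message = json.loads(record["message"])
        if message.get("method") != "event::CheckResult":
            continue
        if message.get("params", {}).get("host") == host:
            times.append(float(record["timestamp"]))
    return times


def find_gaps(times: List[float], max_interval: float) -> List[Tuple[float, float]]:
    """Return (previous, next) pairs that are further apart than max_interval."""
    ordered = sorted(times)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_interval]
```

Usage would be roughly `find_gaps(check_result_times(open(path, "rb").read(), "my-host"), 60)` for a 1-minute check interval, where `path` points at the current file and `my-host` is a placeholder. A non-empty result means the gap already exists in the log on the satellite side.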



  • The data is missing in the current file. In the current file I found a check result dated 15:18, but no check result for 15:17. Interestingly, Icinga 2 on the master seems to notice the connection loss of the satellite at 15:18; maybe this is the cause of this behaviour. Icinga2 is thinking it has a connection to the parent, even though it has not, and therefore does not store the data for a later replay. The slave log lag jumps to 6 hours, 9 minutes immediately, which is annoying and not true:




  • Quote

    Icinga2 is thinking it has a connection to the parent, even though it has not

    Any proof for that? (netstat, logs, /v1/status/ApiListener)
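
For the /v1/status/ApiListener part, a minimal sketch of how one might collect that proof is below. It queries the REST API on the satellite and lists which endpoints the node believes are connected. The port 5665, the API user, and the field names `conn_endpoints` and `not_conn_endpoints` are assumptions about a typical setup and should be checked against the actual response. The function names are hypothetical.

```python
import base64
import json
import ssl
import urllib.request


def fetch_api_listener_status(host, user, password, port=5665, verify_tls=True):
    """GET /v1/status/ApiListener from an Icinga 2 node and return the parsed JSON."""
    url = f"https://{host}:{port}/v1/status/ApiListener"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        headers={"Accept": "application/json", "Authorization": f"Basic {token}"},
    )
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for test setups with self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return json.load(response)


def endpoint_connectivity(status):
    """Extract connected / not connected endpoint names from the status document.

    ASSUMPTION: the lists live under results[0].status.api as
    conn_endpoints and not_conn_endpoints.
    """
    api = status["results"][0]["status"]["api"]
    return {
        "connected": sorted(api.get("conn_endpoints", [])),
        "not_connected": sorted(api.get("not_conn_endpoints", [])),
    }
```

Calling `endpoint_connectivity(fetch_api_listener_status(...))` on the satellite once per second while the router interface is down would show how long the parent endpoint stays in the connected list after the link is actually gone.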