Icinga2 - Calculate availability percentages / Wrong values in "last_hard_state" column in statehistory table

  • Hello all,


    I'm running Icinga 2.7.0-1.trusty (on Ubuntu 14.04) with PostgreSQL IDO Backend, and I ran into an issue involving the last_hard_state column of the icinga_statehistory table.

    My goal was to calculate availability percentages for hosts and services during a specific timeframe using the icinga_statehistory table by writing my own plpgsql script.


    As of now, a reliable calculation seems impossible, because the last_hard_state column always contains the current state whenever the current state is a hard state.


    I found one other topic in this forum which describes the same problem:

    state and last_state

    I think there is also a corresponding issue on GitHub: #5441


    My suspicion is that this issue has been carried along since Icinga 1.x, since there are also several remarks regarding wrong values of last_hard_state in the SQL files of the Icinga Reporting package.


    Is there something I could do to assist in tracking down the cause of this issue? Are there any plans to correct it in the near future?


    Or does someone maybe have an alternative way of calculating availability percentages?

    From what I have gathered from the Icinga Reporting package, every record in icinga_statehistory (even for soft state changes) is fetched there, so that all state changes can be processed.


    However, from what I found out, this leaves one big issue:

    When the first record of icinga_statehistory within the specific timeframe is a recovery (state = 0), it is impossible to distinguish between a previous soft and a previous hard problem state.

    In this case I would have to count the time between the start of my specific timeframe and the recovery entry as a problem timeframe, even though the previous problem state could have been a soft one, which is not relevant to any SLAs.
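A minimal sketch of the calculation described above, written in Python rather than plpgsql for brevity. It assumes the rows of icinga_statehistory have already been fetched as (timestamp, state, state_type) tuples sorted by time, that state_type 1 means hard and 0 means soft, and that only state 0 counts as available. The function name and the initial_hard_state parameter are hypothetical; determining that initial value reliably is exactly the open problem mentioned above.

```python
from datetime import datetime, timedelta

OK = 0      # state code treated as "available"
HARD = 1    # assumed encoding of state_type (0 = soft, 1 = hard)


def availability_percent(events, start, end, initial_hard_state=OK):
    """Return the percentage of [start, end) spent in a hard OK state.

    events: iterable of (timestamp, state, state_type), sorted by timestamp.
    initial_hard_state: hard state in effect at `start` (an assumption the
    caller has to supply, e.g. from the last hard record before the window).
    """
    if end <= start:
        raise ValueError("end must be after start")

    current = initial_hard_state
    cursor = start
    up = timedelta(0)

    for ts, state, state_type in events:
        if state_type != HARD:
            continue              # soft states are not relevant to the SLA
        if ts <= start:
            current = state       # only establishes the state at window start
            continue
        if ts >= end:
            break
        if current == OK:
            up += ts - cursor     # close the OK interval that just ended
        cursor = ts
        current = state

    if current == OK:
        up += end - cursor        # OK interval running until the window end

    return up.total_seconds() * 100 / (end - start).total_seconds()


start = datetime(2017, 8, 1, 0, 0)
end = start + timedelta(hours=10)
events = [
    (start + timedelta(hours=1), 2, 0),  # soft CRITICAL, ignored
    (start + timedelta(hours=2), 2, 1),  # hard CRITICAL, downtime starts
    (start + timedelta(hours=4), 0, 1),  # hard OK, downtime ends
]
percent = availability_percent(events, start, end)
```

With these sample rows, two of the ten hours fall between the hard CRITICAL and the hard recovery. If the window instead started in the middle of a problem, the result would depend entirely on the value passed as initial_hard_state.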


    Any input would be greatly appreciated!


    Regards,

    Markus

  • Do you have InfluxDB or Graphite running? If so, use them; they record the state of the check that is sent to them.

    Linux is dead, long live Linux


    Remember to NEVER EVER use git repositories in a productive environment if you CAN NOT control them

  • Thanks a lot for this suggestion!

    I am indeed sending data to Graphite, though for completely different reasons (charting of perfdata).

    I assume this would also only work for checks which produce perfdata.

  • Yeah, it only works for those that send perfdata, but it shouldn't be any problem to set up the other checks.


  • There are many issues, and one or the other may be fixed. I cannot guarantee anything here, but I would really appreciate it if anyone steps up, looks into the code, tests a fix and sends a PR. Some basic instructions on a dev environment can be found in the INSTALL.md. Some day I'll write better development docs; that is also on my list.

  • Michi, thanks a lot for your answer as well.

    I don't even want to imagine what else is on your list regarding the Icinga universe :-)


    I will have a look at how the value of last_hard_state is written to the DB. Maybe I can figure it out and propose a patch.

    No promises though, I'm no software engineer and my C++ is very rusty.


  • The state change logic hides in ProcessCheckResult(), which you've probably already found. If you are willing to attempt a fix on your own, we can guide you more easily as well. That doesn't take too much time, and normally results in a happy user ;)


    Still, if I see state change logic and 10 lines of output, I already know that I need at least half a day to fully analyze and understand the problem. That's time I don't have at the moment, since the pressure comes from other projects. In a perfect world I would fix things instantly, but lessons learned. 340 issues and 3 core developers. Guess what :)

  • I managed to resolve this issue and sent a PR:

    https://github.com/Icinga/icinga2/pull/5533


    Previously, LastHardState was set to the current state whenever a hard state change occurred.

    But since the last state is also available in that context, I modified the function so that LastHardState is updated whenever the last state was a hard state.
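The difference between the two behaviours can be illustrated with a small, simplified model. This is a sketch in Python, not the actual C++ code from ProcessCheckResult() or from the PR; the Tracker class, the function names and the state_type encoding (0 = soft, 1 = hard) are assumptions made only for illustration.

```python
from dataclasses import dataclass

OK, WARNING, CRITICAL = 0, 1, 2   # state codes
SOFT, HARD = 0, 1                 # assumed state_type encoding


@dataclass
class Tracker:
    """Hypothetical stand-in for the per-object state that is tracked."""
    state: int = OK
    state_type: int = HARD
    last_hard_state: int = OK


def update_old(t, new_state, new_type):
    # Behaviour described as the problem: on a hard change,
    # last_hard_state is overwritten with the NEW (current) state.
    if new_type == HARD:
        t.last_hard_state = new_state
    t.state, t.state_type = new_state, new_type


def update_fixed(t, new_state, new_type):
    # Behaviour described as the fix: if the state being left was a
    # hard state, remember that one as last_hard_state.
    if t.state_type == HARD:
        t.last_hard_state = t.state
    t.state, t.state_type = new_state, new_type


def replay(update, transitions):
    """Apply transitions; return the (state, last_hard_state) pairs recorded."""
    t = Tracker()
    rows = []
    for new_state, new_type in transitions:
        update(t, new_state, new_type)
        rows.append((t.state, t.last_hard_state))
    return rows


# OK (hard) -> CRITICAL (soft) -> CRITICAL (hard) -> OK (hard)
transitions = [(CRITICAL, SOFT), (CRITICAL, HARD), (OK, HARD)]
old_rows = replay(update_old, transitions)
fixed_rows = replay(update_fixed, transitions)
```

In this model, the old variant records a last_hard_state equal to the current state for both hard rows, while the fixed variant records OK for the hard CRITICAL row and CRITICAL for the recovery row.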