Performance data issue with check_disk

  • Hi all!


    We have a problem: our disk service checks correctly go CRITICAL, but their performance data, and therefore their representation in Icinga Web 2, does not reflect that state.


    Today, one of our "disk" services went critical =O

    Code
    DISK CRITICAL - free space: / 9 MB (0% inode=38%):

    When I log into that host and look at the disk usage, I see that it really is full ;) df reports 9 blocks of 1M as available, which matches the CRITICAL service output (free space 9 MB). But 2328 - 2187 is 141, not 9... :/

    Code
    # df -m
    ...
    /dev/dm-1 2328 2187 9 100% /
    ...

    When running the service check, this is its output:

    Code
    output: "DISK CRITICAL - free space: / 9 MB (0% inode=38%);",
    performance_data: [
      "/=2187MB;2233;2280;0;2327",
      "...",
    ],

    As you can see, used space is 2187 MB, and here's where my problem starts: the performance data shows an actual value of 2.14 GiB, which is below both thresholds (warning at 2233, critical at 2280). That's why the performance data still shows the service as "green" while the service check itself is CRITICAL. I have no clue why.




    This behaviour is new for us, and more than one machine is affected (still investigating). Up to now, our disk check performance data definitely went red whenever the service check result was CRITICAL. Do you have any idea what's going wrong there? Thank you very much <3


    Cheers,

    Marianne



    PS: I defined the check_disk service as follows:

  • Icinga Web 2 just parses the performance data string, which follows the format 'label'=value;warn;crit.


    If you look at the warning and critical values, you can see that the performance data value never reaches those thresholds. This is exactly what the plugin returns.


    The actual service state is determined separately from the performance data. The plugin calculates it itself.


    I would investigate why "warn" and "crit" differ from what you've configured as command parameters (in percent).
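
    To make the point above concrete, here is a minimal Python sketch of how a performance data item such as "/=2187MB;2233;2280;0;2327" breaks down under the 'label'=value[UOM];warn;crit;min;max convention. The names parse_perfdata and graph_state are made up for illustration, the parser only handles plain numeric thresholds (no range syntax), and this is not Icinga Web 2's actual code.

```python
import re

# One perfdata item: 'label'=value[UOM];warn;crit;min;max
# Assumption: thresholds are plain numbers, not range expressions.
PERF_RE = re.compile(
    r"^'?(?P<label>[^'=]+)'?=(?P<value>-?[\d.]+)(?P<uom>[^;\d]*)"
    r"(?:;(?P<warn>[^;]*))?(?:;(?P<crit>[^;]*))?"
    r"(?:;(?P<min>[^;]*))?(?:;(?P<max>[^;]*))?$"
)


def parse_perfdata(item):
    """Split a single perfdata item into its fields."""
    m = PERF_RE.match(item)
    if m is None:
        raise ValueError(f"not a perfdata item: {item!r}")

    def num(text):
        # Empty or missing fields become None.
        return float(text) if text else None

    return {
        "label": m["label"],
        "value": float(m["value"]),
        "uom": m["uom"],
        "warn": num(m["warn"]),
        "crit": num(m["crit"]),
        "min": num(m["min"]),
        "max": num(m["max"]),
    }


def graph_state(perf):
    """Classify the value purely from the perfdata thresholds."""
    if perf["crit"] is not None and perf["value"] > perf["crit"]:
        return "critical"
    if perf["warn"] is not None and perf["value"] > perf["warn"]:
        return "warning"
    return "ok"
```

    With the item from the first post, graph_state(parse_perfdata("/=2187MB;2233;2280;0;2327")) gives "ok", because 2187 is below both 2233 and 2280, even though the plugin itself reported CRITICAL.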

  • I just did some checking on one of my servers, and it seems the performance data generated by the check_disk plugin is kind of strange.


    Example:

    Code
    output: "DISK OK - free space: / 2119 MB (32% inode=66%);",
    performance_data: [
      "/=4432MB;6233;6579;0;6926"
    ],

    But when you actually subtract 4432 from 6926, the free space is 2494, not the 2119 MB shown in the output.


    I'm not sure where the error comes from.

    Maybe I'm just stupid and there is some underlying stuff happening that causes this, not sure though.

    Linux is dead, long live Linux


    Remember to NEVER EVER use git repositories in a productive environment if you CAN NOT control them

  • I have seen this too.


    It's because the check plugin reads the free space from some source (unknown to me, at least :) ) but calculates the free space for the performance data itself. So these two values are derived independently and will not necessarily match.


    The plugin output uses the same value for free space as df, while the perfdata works out free space as "size - used". I think the "free" value takes some other things into account, like free blocks or other filesystem usage such as the inode table, so it's always smaller than size - used.


    Be warned, there is a lot of guessing in this post :)


    Edit: some more information on why the results could differ: https://www.cyberciti.biz/tips…rts-different-output.html
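
    As a rough illustration of how "size - used" and "available" can diverge, here is a small Python sketch built on the fields that statvfs exposes (f_blocks, f_bfree, f_bavail, f_frsize). The function name space_breakdown is invented for this example. One common cause of such a gap is blocks reserved for root on ext filesystems; whether that is what happens on the hosts in this thread is an assumption, not something the thread confirms.

```python
def space_breakdown(f_blocks, f_bfree, f_bavail, f_frsize):
    """Return sizes in MiB derived from statvfs-style counters.

    f_bfree counts all free blocks; f_bavail counts only the free
    blocks an unprivileged user may use. The difference is space
    that is free but not available (for example, reserved blocks).
    """
    mib = 1024 * 1024
    size = f_blocks * f_frsize // mib
    used = (f_blocks - f_bfree) * f_frsize // mib
    avail = f_bavail * f_frsize // mib
    reserved = (f_bfree - f_bavail) * f_frsize // mib
    return {
        "size": size,
        "used": used,
        "size_minus_used": size - used,
        "avail": avail,
        "reserved": reserved,
    }


# Numbers from the df -m line in the first post (2328 / 2187 / 9).
# Assumptions: a 1 MiB fragment size for readability, and
# f_bfree = size - used = 141, which df does not print directly.
example = space_breakdown(2328, 141, 9, 1024 * 1024)
```

    Here example["size_minus_used"] is 141 while example["avail"] is 9, leaving 132 MiB that is free but not available. On a real host the same function can be fed from os.statvfs("/") on Unix-like systems.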

  • Well, seeing as one of those is almost 5 years old, I do not think it will get fixed. Time to write my own disk check then.


  • I thought about using Python for it, as I really dislike Bash and don't know anything about Go either :D Maybe worth a shot though.
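
    For what such a Python check might look like, here is a minimal sketch that bases both the state and the performance data on the same "available" figure, so the two cannot disagree. Everything here is hypothetical: the function names, the default thresholds (10% / 5% free), and the choice to report free rather than used space. The "N:" threshold form follows the plugin guidelines' range syntax ("alert when below N"); whether a given frontend renders that form well is not something this sketch checks.

```python
import os

STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def evaluate(path, total_mb, avail_mb, warn_pct, crit_pct):
    """Return (exit_code, plugin_line) from one shared free-space value."""
    warn_mb = total_mb * warn_pct // 100
    crit_mb = total_mb * crit_pct // 100
    if avail_mb < crit_mb:
        code = 2
    elif avail_mb < warn_mb:
        code = 1
    else:
        code = 0
    free_pct = avail_mb * 100 // total_mb if total_mb else 0
    text = f"DISK {STATES[code]} - free space: {path} {avail_mb} MB ({free_pct}%)"
    # Perfdata carries the same value the state was computed from.
    perf = f"{path}={avail_mb}MB;{warn_mb}:;{crit_mb}:;0;{total_mb}"
    return code, f"{text} | {perf}"


def main(path="/", warn_pct=10, crit_pct=5):
    """Check one mount point; the return value is the plugin exit code."""
    st = os.statvfs(path)  # Unix-only
    mib = 1024 * 1024
    total_mb = st.f_blocks * st.f_frsize // mib
    # f_bavail: blocks usable by unprivileged users, as df's last column.
    avail_mb = st.f_bavail * st.f_frsize // mib
    code, line = evaluate(path, total_mb, avail_mb, warn_pct, crit_pct)
    print(line)
    return code
```

    A script would finish with sys.exit(main()) so the monitoring system sees the exit code. Fed with the numbers from the first post, evaluate("/", 2328, 9, 4, 2) returns exit code 2 and a perfdata item of /=9MB;93:;46:;0;2328.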


  • Looking at the source code, this might be the problem:

    Code
    /* What a mess of units. The output shows free space, the perf data shows used space. Yikes!
       Hack here. Trying to get warn/crit levels from freespace_(units|percent) for perf
       data. Assumption that start=0. Roll on new syntax...
    */
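
    The conversion that comment describes, from "percent free" thresholds to "used space" thresholds, can be sketched in Python roughly as follows. This is not the plugin's code: the function name used_thresholds is invented, the integer truncation is an assumption, and the percentages (4% / 2% for the first host, 10% / 5% for the second) are inferred from the perfdata in this thread rather than taken from anyone's configuration.

```python
def used_thresholds(total_mb, warn_free_pct, crit_free_pct):
    """Turn free-space percent thresholds into used-space thresholds.

    Assumes the range starts at 0, so "at least N% free" becomes
    "at most (100 - N)% of total used". Results are truncated to
    whole MB (an assumption about rounding).
    """
    warn_used = total_mb * (100 - warn_free_pct) // 100
    crit_used = total_mb * (100 - crit_free_pct) // 100
    return warn_used, crit_used


# Inferred percentages reproduce the thresholds seen in the perfdata.
first_host = used_thresholds(2327, 4, 2)
second_host = used_thresholds(6926, 10, 5)
```

    Here first_host is (2233, 2280) and second_host is (6233, 6579), matching the warn and crit fields posted above. Because these thresholds are compared against "used" while the state comes from "available", a filesystem with a lot of free-but-unavailable space can be CRITICAL and still sit below both perfdata thresholds.
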
  • Isn't it maybe an issue with the different units?

    So, the plugin checks the space in bytes(?), and due to the configuration the size/space is displayed/calculated in GB?


    Hoping for some thoughts, thanks ;)

  • Already checked that; the unit calculation seems sound. As wolfgang and mcktr already said, this seems to be a bug caused by using different methods to get the data.
