Posts by withanHdammit

This forum was archived to /woltlab and is now in read-only mode.

    I have a vLAN that I want to monitor some devices on but when I add a second NIC to my Icinga2 host (Debian 8/Jessie) all of the networking stops working.


    I have configured them on separate vLANs but nothing seems to matter. I add the second NIC, all networking goes offline. I remove the second NIC, all networking works correctly.


    Why is all networking going offline when I add a second NIC and what can I do differently to get the second NIC to connect?


    I know it's a little vague, I'm happy to share configs, let me know what you want to see and I'll update the post with the info.


    Thanks!!


    h

    Sorry for not being more verbose. It means that it appeared to still be graphing the perfdata. I moved the command to the service definition template and that worked.


    Code
    template Service "wmi-service" {
      import "generic-service"
      check_command = "check_wmi"
      if (vars.wmi_mode == "checkservice") {
        enable_perfdata = false
      }
    }

    I have a service template that, as part of the template, enables perfdata processing:

    Code
    template Service "wmi-service" {
      import "generic-service"
      check_command = "check_wmi"
    }
    Code
    apply Service "svc - Spooler" {
      import "wmi-service"
      assign where match("*Spooler*", host.vars.services)
      vars.wmi_mode = "checkservice"
      vars.wmi_arga = "Spooler"
      vars.wmi_warn = "1"
      vars.wmi_crit = "1"
    }

    I attempted to disable the perfdata in the service by this:

    Code
    apply Service "svc - Spooler" {
      import "wmi-service"
      assign where match("*Spooler*", host.vars.services)
      enable_perfdata = false
      vars.wmi_mode = "checkservice"
      vars.wmi_arga = "Spooler"
      vars.wmi_warn = "1"
      vars.wmi_crit = "1"
    }

    Is this all it takes? It looks like the perfdata is still being processed...

    No 64-bit counter is available, but the first one is workable. That's kind of where I was leaning, so thanks for confirming. I actually reduced the calculation down a bit: if the bandwidth used is less than 0, I add 4294967296 (2^32) to the value, and the graphs have been looking good since!


    Code
    def get_elapsed(current_value, last_value):
        # A smaller current value means the 32-bit counter wrapped, so add 2^32.
        if current_value >= last_value:
            elapsed_value = current_value - last_value
        else:
            elapsed_value = current_value - last_value + 4294967296
        return elapsed_value

    I'm writing a check to monitor bits per second on a given interface. I've got the check working, but I'm finding an interesting anomaly with the SNMP values. The values I'm checking are the standard SNMP octet records:
    sysUpTime: .1.3.6.1.2.1.1.3.0
    ifInOctets: .1.3.6.1.2.1.2.2.1.10
    ifOutOctets: .1.3.6.1.2.1.2.2.1.16



    Essentially, I grab the current time (sysUpTime in timeticks, converted to seconds), the current ifInOctets, and the current ifOutOctets, and write them to a state file. On the next check I read the state file into temporary variables and subtract them from the current values to get the elapsed time, in-bytes, and out-bytes. Simple division then gives in-bytes and out-bytes per second, which I convert to bits per second and report in the output.
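    The steps described above can be sketched roughly as follows. This is a minimal illustration, not the original check: the JSON state-file layout, the dictionary keys, and the function names are all assumptions made for the example, and it deliberately ignores counter wrap-around.

```python
import json
import os


def load_state(path):
    """Return the previous sample as a dict, or None on the first run."""
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return json.load(handle)


def save_state(path, sample):
    """Persist the current sample so the next check can diff against it."""
    with open(path, "w") as handle:
        json.dump(sample, handle)


def bits_per_second(previous, current):
    """Compute (in_bps, out_bps) between two samples.

    Each sample is a dict with hypothetical keys 'uptime_ticks' (sysUpTime,
    in hundredths of a second), 'in_octets' and 'out_octets'.
    Returns None if no time has elapsed (or the device restarted).
    """
    elapsed = (current["uptime_ticks"] - previous["uptime_ticks"]) / 100.0
    if elapsed <= 0:
        return None
    in_bps = (current["in_octets"] - previous["in_octets"]) * 8 / elapsed
    out_bps = (current["out_octets"] - previous["out_octets"]) * 8 / elapsed
    return in_bps, out_bps
```

    A real check would fetch the three OIDs via SNMP, call load_state(), compute the rates, and then call save_state() with the new sample.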



    The issue is that ifInOctets and ifOutOctets appear to reset after about 4GB. The value says it's Counter32, which I think means it's a 32-bit counter (not sure if that's relevant). When the counter resets to 0 and starts counting over, my check says that negative 4GB(ish) was used, which then throws the graphs off. I know I can tell it to count any value less than zero as zero, which prevents the negative value but still gives me an invalid reading every now and then. I'm not sure of the exact value at which it resets, but again, it appears to be around the 4GB mark (based on what my graphs are showing).



    My question is: how do I account for the counter resetting, especially if I don't know its actual reset value? Do I just assume it resets at 4GB (I hate assuming, I'd rather KNOW!)?
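    For reference, an SNMP Counter32 holds values from 0 to 2^32 - 1 and wraps back to 0 after that, so the modulus is 4294967296. A minimal sketch of a wrap-aware difference using modular arithmetic is below. The function name is hypothetical, and the result is only correct if the counter wrapped at most once between two polls and the device did not restart in between (a restart also zeroes the counters, and shows up as sysUpTime going backwards).

```python
COUNTER32_MODULUS = 2 ** 32  # 4294967296, one past the largest Counter32 value


def counter_delta(current, previous, modulus=COUNTER32_MODULUS):
    """Return the increase between two counter readings, allowing one wrap.

    Python's % always returns a non-negative result for a positive modulus,
    so a wrapped (negative) difference is folded back into range.
    """
    return (current - previous) % modulus


# Worked example: the counter was 296 below the wrap point, then read 704.
print(counter_delta(704, 4294967000))  # 1000
```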


    The check code is (Python):

    Inbound graphs:
    [Blocked Image: https://c2.staticflickr.com/6/5481/29544448704_8b88c2844b_o.png]
    Outbound graphs:
    [Blocked Image: https://c1.staticflickr.com/9/8624/29877703070_bebdfdcafe_b.jpg]

    It seems to me Icinga may be perfectly suited to that scenario. In fact, you can have Icinga send you email notifications when problems arise so you can call your customer and say "hey, there's a problem with such and such device". Give you a more proactive approach to the large home networks. Sell it as an additional service :-)

    What's interesting on this output power check is one of the values uses a non-standard label on the perfdata, yet it shows in IcingaWeb2 as expected.
    <plugin_output>
    Critical - UPS is on battery (3), Output power: 119.3 VAC, Output frequency: 59.9 Hz, Output load: 24.7%|'Output voltage'=119.3VoltsAC 'Output frequency'=60.0Hz 'Output load'=24.7% 'Output current'=2.1Amps 'Output status'=3 'Energy Used'=3079.89kWa
    </plugin_output>

    The negative value thing isn't it. I just added a new check today that has the negative values. I'm sure it's the non-compliant labels.
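    One way to check a plugin's output for this is to parse the perfdata and compare each unit of measurement against the list in the Nagios plugin development guidelines (no unit, s, us, ms, %, B, KB, MB, TB, c). The sketch below is only an illustration: the regular expression handles the simple 'label'=valueUOM form seen in this thread, and whether a given IcingaWeb2 version actually rejects other units is an assumption, not something verified here.

```python
import re

# Units named in the Nagios plugin development guidelines.
STANDARD_UNITS = {"", "s", "us", "ms", "%", "B", "KB", "MB", "TB", "c"}

# Matches 'quoted label'=valueUOM or bare_label=valueUOM.
PERF_RE = re.compile(r"(?:'([^']+)'|(\S+?))=(-?[0-9.]+)([^;\s]*)")


def parse_perfdata(plugin_output):
    """Return a list of (label, value, unit, is_standard_unit) tuples."""
    _, _, perf = plugin_output.partition("|")
    results = []
    for match in PERF_RE.finditer(perf):
        label = match.group(1) or match.group(2)
        value = float(match.group(3))
        unit = match.group(4)
        results.append((label, value, unit, unit in STANDARD_UNITS))
    return results


output = ("OK - Battery Status is normal|Status=2 'Time on battery'=0Seconds "
          "Capacity=100.0% 'Battery temp'=83.8deg_F "
          "'Runtime remaining'=59Minutes Volts=27.1Volts")
for label, value, unit, ok in parse_perfdata(output):
    print(label, value, unit or "(none)", "standard" if ok else "non-standard")
```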


    Looks like that's the deal, bummer. Would be nice to use my own labels on the perf charts. I also noticed I'm running IcingaWeb2 v2.3.4 (and Icinga2 v2.4.1) so maybe I'll upgrade this weekend.


    That doesn't appear to be the cause. On another of my service checks, I'm looking at wifi bridge statistics. My command line output looks like this:
    OK - SSID 6283185N Signal Strength: -52dBm|'SSID 6283185N Signal Strength'=-52dBm 'SSID 6283185N Signal to Noise Ratio'=44dB 'SSID 6283185N Client Connection Quality'=33% 'SSID 6283185N Noise Floor'=-92dB 'SSID 6283185N TX rate'=866700 'SSID 6283185N RX rate'=866700 'SSID 6283185N Channel Width'=80MHz


    And the webview shows this (I can grab a screenshot if necessary but gotta leave for a meeting right now):
    Label                      Value
    SSID 6283...hannel Width   0.00
    SSID 6283...nal Strength   0.00
    SSID 6283... Noise Floor   0.00
    SSID 6283...tion Quality   33%
    SSID 6283... Noise Ratio   39.00
    SSID 6283185S TX rate      866,700.00
    SSID 6283185S RX rate      866,700.00



    Note the third line (noise floor) shows 0 and the perfdata label is "dB"
    Note the fifth line (noise ratio) shows a value and the perfdata label is "dB"


    Same label, different result... Although it is fair to say that the noise ratio is a positive value and the noise floor is a negative value. And both are graphing correctly in pnp4nagios.

    I have a framework script written in Python that I have adapted and use for SNMP queries. I haven't uploaded it to any of the plugin exchanges yet, but I'd be happy to share it if you'd like. It can do temperature conversions (from C to F or F to C), byte conversions (to KB, MB, etc.), and math on retrieved values (like converting timeticks to seconds), lots of stuff...
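    To give an idea of the kind of helpers meant here, a few such conversions might look like the sketch below. This is not the actual framework script: the function names are made up for illustration, and using 1024 as the byte-scaling base is an assumption.

```python
def celsius_to_fahrenheit(celsius):
    return celsius * 9.0 / 5.0 + 32.0


def fahrenheit_to_celsius(fahrenheit):
    return (fahrenheit - 32.0) * 5.0 / 9.0


def timeticks_to_seconds(timeticks):
    """SNMP TimeTicks count hundredths of a second."""
    return timeticks / 100.0


def scale_bytes(value, unit):
    """Convert a byte count to KB, MB, GB or TB (base 1024, an assumption)."""
    exponents = {"B": 0, "KB": 1, "MB": 2, "GB": 3, "TB": 4}
    return value / float(1024 ** exponents[unit])
```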


    I am troubleshooting a bug that only sometimes happens with the perfdata output but the plugin still works well and creates graphs (at least with pnp4nagios).

    I wrote a check to monitor an APC UPS via SNMP. For the perfdata, on some of the data points, the text view shows 0 but the graphs are populating correctly. When running the check at the command line it also shows values for the perfdata.


    Command line output:
    OK - Battery Status is normal|Status=2 'Time on battery'=0Seconds Capacity=100.0% 'Battery temp'=83.8deg_F 'Runtime remaining'=59Minutes Volts=27.1Volts


    The webview of the same check shows this:
    [Blocked Image: https://drive.google.com/open?…1OGc-gkmRmajI4bEZZekdDQkE]


    I don't understand why the Runtime Remaining, Battery Temp, and Battery voltage checks show 0, but they clearly have values and the graphs are populating properly:
    [Blocked Image: https://drive.google.com/open?…1OGc-gkmRmblNYT1hvdVZQSms]

    Is there a way to expand a command so I can see exactly what Icinga2 is sending for the check?


    I have an SNMP printer check that works fine when I run it from the command line, but when run from Icinga2 it tells me there is an invalid version for the "-v" option.
    Check Command:


    Host Definition:


    Icinga2 Output:


    Command line check:


    Command line result:

    Is there a way to change the default for sticky acknowledgements? I do not want my acknowledgements to be sticky. If a state goes from warning to critical after it has been acknowledged, I want to know about the state change, but if the sticky bit is set, I don't get the new alert. So I would like the default to be unchecked.
    [Blocked Image: https://db.tt/xNgGOunt]