Iftraffic64 CRITICAL

metrics
director
icinga2
icingaweb2

(Nikita) #1

Hi. I have the following problem: I monitor ports on switches and collect traffic from them. I created a host named after the switch and configured each port as a separate service using the iftraffic64 plugin. Everything works fine, but for some reason a few ports return values like the one below, even though I use 64-bit counters. For example, out of 28 ports, 25 show normal values, while 3 ports report an error every 8 minutes, and after another 8 minutes they are fine again.
CRITICAL - IN bandwidth (7378697629486.55%) too high

I used the plugin https://exchange.icinga.com/exchange/check_iftraffic64

How can I fix this problem?


(Christoph Niemann) #2

Hi,

can you provide the complete command line?

Is it possible that you are using 32-bit counters (--32bit)? That would explain this behavior: some ports would work only as long as the counter stays inside the 32-bit range.
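
For reference, a 32-bit octet counter can only hold about 4.29 GB before it wraps; on a fully loaded 1 Gbit/s link that happens within seconds (just a rough sketch with bc):

    echo '2^32 - 1' | bc                           # -> 4294967295, max value of a 32-bit octet counter
    echo 'scale=1; 2^32 / (1000000000 / 8)' | bc   # -> 34.3 seconds to wrap at 1 Gbit/s line rate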


(Nikita) #3

At the moment port 8 is critical again.
--64bit (default)
CRITICAL - IN bandwidth (307445734567.48%) too high|inUsage=307445734567.48%;85;98 outUsage=1.75%;85;98 inBandwidth=384307168209348800.00B outBandwidth=2183465.54B inAbsolut=18446743903200122396c outAbsolut=312458554451c

--32bit
CRITICAL - IN bandwidth (12498.62%) too high|inUsage=12498.62%;85;98 outUsage=0.00%;85;98 inBandwidth=15623277779.91B outBandwidth=0.00B inAbsolut=1409272136c outAbsolut=312483901191


(Chris) #4

For the bandwidth check a temp file is usually used.
Please check it and see whether it is owned by the wrong user or has the wrong permissions.

The file location can be found in the script.
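
Something along these lines should show where the script keeps its state file and who owns it (the /tmp location and the traffic_* file name are only an assumption here, check the actual path in your copy of the script):

    # locate the temp/state file path hard-coded in the plugin
    grep -n -i 'tmp' check_iftraffic64.pl
    # then verify owner and permissions of that file, e.g.
    ls -l /tmp/traffic_*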


(Nikita) #5

Yes, the file exists and the permissions are correct. As I wrote above, for 8 minutes there is an error, then for 8 minutes everything is fine.


#6

The inAbsolut value is close to 2^64, so a kind of overflow might occur shortly after. I’m not sure how these overflows are handled in check_iftraffic64. On the other hand you’re surely not the first to use this plugin, and I can’t remember anyone else reporting such problems.

Could these 8-minute intervals be related to check intervals / regular value overflows / other obvious issues?
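
Just to illustrate that suspicion: subtracting the inAbsolut value from post #3 from 2^64 leaves a comparatively small remainder, which is exactly what a negative counter difference looks like once it is interpreted as an unsigned 64-bit number (whether the plugin really computes its delta that way would have to be checked in the script):

    # 2^64 minus the inAbsolut value reported in post #3
    echo '2^64 - 18446743903200122396' | bc   # -> 170509429220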


(Chris) #7

Please share your service definition and the CheckCommand.
Did you define the link speed? The plugin might sometimes fail to detect it on the network device.


(Nikita) #8

Unfortunately I do not know. It is possible that the values overflow, but then the load graph would show the load incorrectly.


(Nikita) #9

The port on the switch is 100mb.
./check_iftraffic64.pl -H 192.168.1.1 -C Comm -i 8 -b 1000 -u m -v 2 -I 1000 -O 1000
CRITICAL - IN bandwidth (508875698585.31%) too high|inUsage=508875698585.31%;85;98 outUsage=0.02%;85;98 inBandwidth=636094623231640064.00B outBandwidth=25200.52B inAbsolut=18446744040124567496c outAbsolut=351318022313c


#10

Bandwidth values like “636094623231640064” (636,094,623,231,640,064) would imply an extremely fast interface (far more than a 100mb interface can handle).
It’d be helpful to see a history of these values during a period of 15 (?) minutes to get a feeling if these values grow over time or jump to irrelevant values at some time.

Edit: please check whether the counters are actually 64bit.
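
One way to collect such a history is to poll the 64-bit IF-MIB counters for that port directly, for example once a minute. The community string and ifIndex 8 below are taken from the command line posted above; if these OIDs return “No Such Instance”, the device does not expose 64-bit counters for that port at all:

    # sample ifHCInOctets.8 and ifHCOutOctets.8 (IF-MIB 64-bit counters) every minute
    while true; do
        date
        snmpget -v2c -c Comm 192.168.1.1 1.3.6.1.2.1.31.1.1.1.6.8 1.3.6.1.2.1.31.1.1.1.10.8
        sleep 60
    done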


(Nikita) #11

Sorry, it is 1000M, I missed a zero.

$snmpIfSpeed iso.3.6.1.2.1.31.1.1.1.15.8 = Gauge32: 1000

$snmpIfSpeed32 iso.3.6.1.2.1.2.2.1.5.8 = Gauge32: 1000000000
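
For clarity: those two OIDs are ifHighSpeed, reported in Mbit/s, and ifSpeed, reported in bit/s, which is why one shows 1000 and the other 1000000000. snmptranslate resolves the names if the IF-MIB is installed:

    snmptranslate 1.3.6.1.2.1.31.1.1.1.15   # -> IF-MIB::ifHighSpeed
    snmptranslate 1.3.6.1.2.1.2.2.1.5       # -> IF-MIB::ifSpeed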

Judging by the plugin’s help output:

--32bit FLAG
Set to use 32 bit counters instead of 64 bit (default: 64 bit). default: 64 bit


(Nikita) #12

Please tell me the solution.


(Michael Friedrich) #13

Please read the FAQ and do not stress it. Thanks.


#14

It seems you have already found it yourself, haven’t you?


(Nikita) #15

No =( I have not found it.


#16

What about testing using this option?


(Nikita) #17

I tested, and provided the result above.


#18

Indeed. Sorry.

So once more:

It’d be helpful to see a history of these values during a period of 15 (?) minutes to get a feeling if these values grow over time or jump to irrelevant values at some time.

BTW: Please format code and similar snippets pressing the “</>” icon at the top of your current post.


(Nikita) #19

The latest log entries for the port, reading from the bottom up.

 -->"OK - Average IN: 461.25KB (0.37%), Average OUT: 270.99KB (0.22%)Total RX: 292.01GBytes, Total TX: 779.64GBytes","hard_state"
    -->"CRITICAL - OUT bandwidth (7378697629484.01%) too high","hard_state"
    -->"[ 1/2 ] CRITICAL - OUT bandwidth (245956572989.53%) too high","soft_state"
    -->"0","OK - Average IN: 316.33KB (0.25%), Average OUT: 286.61KB (0.23%)Total RX: 287.51GBytes, Total TX: 775.26GBytes","hard_state"
    -->"CRITICAL - OUT bandwidth (7378697629487.56%) too high","hard_state"
    -->"[ 1/2 ] CRITICAL - OUT bandwidth (245956572993.09%) too high","soft_state"
    -->"OK - Average IN: 4.05MB (3.24%), Average OUT: 5.02MB (4.02%)Total RX: 282.47GBytes, Total TX: 771.48GBytes","hard_state"
    -->"CRITICAL - OUT bandwidth (7378697629484.13%) too high","hard_state"
    -->"[ 1/2 ] CRITICAL - OUT bandwidth (245956572989.57%) too high","soft_state"
    -->"OK - Average IN: 446.89KB (0.36%), Average OUT: 3.17MB (2.54%)Total RX: 277.70GBytes, Total TX: 767.03GBytes","hard_state"
    -->"CRITICAL - OUT bandwidth (7378697629484.69%) too high","hard_state"
    -->"[ 1/2 ] CRITICAL - OUT bandwidth (245956572990.66%) too high","soft_state"
    -->"OK - Average IN: 12.72MB (10.18%), Average OUT: 708.11KB (0.57%)Total RX: 257.35GBytes, Total TX: 762.55GBytes","hard_state"
    -->"CRITICAL - OUT bandwidth (7378697629485.02%) too high","hard_state"
    -->"[ 1/2 ] CRITICAL - OUT bandwidth (245956572992.88%) too high","soft_state"
    -->"OK - Average IN: 2.72MB (2.18%), Average OUT: 3.03MB (2.42%)Total RX: 205.72GBytes, Total TX: 758.42GBytes","hard_state"
    -->"CRITICAL - OUT bandwidth (7378697629484.44%) too high","hard_state"
    -->"[ 1/2 ] CRITICAL - OUT bandwidth (245956572989.93%) too high","soft_state"
    -->"OK - Average IN: 4.96MB (3.97%), Average OUT: 4.68MB (3.74%)Total RX: 187.66GBytes, Total TX: 754.11GBytes","hard_state"
    -->"2","CRITICAL - OUT bandwidth (14757395258972.17%) too high","hard_state"
    -->"2","[ 1/2 ] CRITICAL - OUT bandwidth (245956572991.45%) too high","soft_state"
    -->"0","OK - Average IN: 2.80MB (2.24%), Average OUT: 3.18MB (2.54%)Total RX: 182.91GBytes, Total TX: 749.77GBytes","hard_state"
    -->,"CRITICAL - OUT bandwidth (14757395258972.94%) too high","hard_state"
    -->"[ 1/2 ] CRITICAL - OUT bandwidth (245956572991.84%) too high","soft_state"
    -->"OK - Average IN: 2.64MB (2.11%), Average OUT: 2.71MB (2.17%)Total RX: 178.98GBytes, Total TX: 745.44GBytes","hard_state"
    -->"2","CRITICAL - OUT bandwidth (7378697629483.91%) too high","hard_state"
    -->"2","[ 1/2 ] CRITICAL - OUT bandwidth (245956572989.41%) too high","soft_state"
    -->"OK - Average IN: 1.23MB (0.99%), Average OUT: 1.51MB (1.21%)Total RX: 171.67GBytes, Total TX: 740.98GBytes","hard_state"
    -->"CRITICAL - OUT bandwidth (7378697629483.87%) too high","hard_state"


#20

The output differs from the earlier snippets shown and seems to be incomplete in structure (sometimes with state values, sometimes without) and in length (I’d have expected a period including a switch from OK to non-OK).

Edit: as the values are way too high, I’d check whether the specified counters are really the right ones for your hardware.
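
A quick way to verify that would be to check which interface ifIndex 8 actually is and whether its 64-bit counter exists at all (again assuming the community string from the earlier command line):

    snmpget -v2c -c Comm 192.168.1.1 1.3.6.1.2.1.2.2.1.2.8      # ifDescr.8
    snmpget -v2c -c Comm 192.168.1.1 1.3.6.1.2.1.31.1.1.1.1.8   # ifName.8
    snmpget -v2c -c Comm 192.168.1.1 1.3.6.1.2.1.31.1.1.1.6.8   # ifHCInOctets.8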