SNMP Switch discovery getting Service discovery failed for this host: Your request timed out after 110 seconds


#21

No, getting the same error (due to timeout)

Proxy Error

The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request GET /verint/check_mk/wato.py.

Reason: Error reading from remote server


(Philipp Näther) #22

Can you check whether this helps to get rid of the proxy error?

https://lists.mathias-kettner.de/pipermail/checkmk-en/2017-May/022708.html


#23

Changed the timeout to 600, but it did not work.


(Philipp Näther) #24

I am sorry, but I am out of ideas right now. Maybe try the mailing list to get an answer that helps you.


#25

Hi Philipp,
The Proxy Error was solved after a reboot. I wonder which service I missed restarting for the change to take effect.
Any suggestions?

In any case, I'm now getting the error below, so we managed to increase the timeout to 130 seconds, but we need more :slight_smile:

Service discovery failed for this host : Your request timed out after 130 seconds. This issue may be related to a local configuration problem or a request which works with a too large number of objects. But if you think this issue is a bug, please send a crash report.

[Retry discovery while ignoring this error (Result might be incomplete).]


(Philipp Näther) #26

So after having the proxy error resolved, could you try to simulate the snmpwalk like I suggested:


#27

Done, but I am getting the error below. How can I verify it is using the file?

Service discovery failed for this host : Your request timed out after 130 seconds. This issue may be related to a local configuration problem or a request which works with a too large number of objects. But if you think this issue is a bug, please send a crash report.

[Retry discovery while ignoring this error (Result might be incomplete).]


(Philipp Näther) #28

You could verify it by

a) simply disconnecting the switch (might not be applicable in your case)
b) using another SNMP community in the host configuration that has no right to read from the actual device
c) creating a new host with a dummy IP address and selecting the stored walk for the new host with the WATO rule

Using the stored walk, it definitely should not run into a timeout imho.


#29

OK, I created a dummy SNMP-only host 10.10.10.10 with the same community.
I created a copy of one of the SNMP walk files in ~/var/check_mk/snmpwalks/10.10.10.10
It took some time, but I got the 2715 rows.
Then I switched to a 4000-line file and got the timeout error, so I believe the timeout also happens with file-based walks.

Service discovery failed for this host : Your request timed out after 130 seconds. This issue may be related to a local configuration problem or a request which works with a too large number of objects. But if you think this issue is a bug, please send a crash report.

[Retry discovery while ignoring this error (Result might be incomplete).]
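To compare a walk file that works (2715 rows) with one that times out (4000 lines), it can help to see which OID subtree contributes most of the lines. The following is only a minimal sketch: it assumes each line of the stored walk starts with a dotted OID followed by whitespace and the value, and the sample lines, the function name `count_by_prefix`, and the prefix depth are invented for illustration. For a real file, the lines would be read from the copy under ~/var/check_mk/snmpwalks/.

```python
from collections import Counter


def count_by_prefix(lines, depth=7):
    """Count walk lines per OID prefix made of the first `depth` components."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Assumption: data lines begin with a dotted OID such as ".1.3.6.1..."
        if not line.startswith("."):
            continue
        oid = line.split(None, 1)[0]
        parts = oid.strip(".").split(".")
        counts["." + ".".join(parts[:depth])] += 1
    return counts


# Invented sample lines, not taken from the switch in this thread.
sample = [
    ".1.3.6.1.2.1.1.1.0 Example system description",
    ".1.3.6.1.2.1.2.2.1.2.1 GigabitEthernet0/1",
    ".1.3.6.1.2.1.2.2.1.2.2 GigabitEthernet0/2",
    ".1.3.6.1.4.1.9.9.166.1.1.1.1.2.10 1",
    ".1.3.6.1.4.1.9.9.166.1.1.1.1.2.11 1",
    ".1.3.6.1.4.1.9.9.166.1.1.1.1.2.12 1",
]

counts = count_by_prefix(sample, depth=7)
for prefix, n in counts.most_common():
    print(f"{n:6d}  {prefix}")
```

A subtree that dominates the count is a good candidate for the data that makes discovery slow.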


(Philipp Näther) #30

And if you run cmk -vv --debug -II dummyhost on command line now, does that work?


#31

After a few seconds I get the following (just the tail is shown below; I also did another GUI test and can confirm it is still timing out):

[piggyback] Execute data source
[piggyback] No persisted sections loaded

  • EXECUTING DISCOVERY PLUGINS (25)
    Trying discovery with: cisco_cpu, cisco_temperature, cisco_hsrp, rmon_stats, if64, cisco_cpu_multiitem, cisco_temp_sensor, cisco_power, cisco_fantray, if64adm, cisco_qos, cisco_mem_asa64, cisco_fan, cisco_redundancy, cisco_mem_asa, snmp_uptime, if64_tplink, cisco_mem, cisco_temperature.dom, fsc_if64, cisco_fru_module_status, hr_mem, cisco_secure, cisco_fru_powerusage, snmp_info
    1 cisco_cpu
    24 cisco_fan
    2 cisco_mem
    16 cisco_power
    4037 cisco_qos
    1 cisco_redundancy
    24 cisco_temperature
    209 if64
    1 snmp_info
    1 snmp_uptime
    SUCCESS - Found 4316 services

(Philipp Näther) #32

Well, there you see what is probably causing the timeouts.

Your dummy host now has over 4k services inventoried, so I wouldn’t recommend restarting the core right now. You probably do not want over 4k services just because of the QoS plugin. So you have two options now:

a) If possible, disable QoS on the switch or configure the SNMP access so that it no longer provides this QoS data over SNMP.
b) Go to the WATO rule “monitoring configuration -> Inventory and check_mk settings: disabled checks”, add “cisco_qos” to the list, and apply the rule to an explicit host or a host tag.

After that cmk should skip those 4k checks and so shouldn’t run into a timeout anymore.
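For the remaining hosts, the per-plugin counts at the end of the `cmk -vv --debug -II` output can be tallied to spot a plugin that dominates the service count. This is only a rough sketch: it assumes the summary lines have the form "count plugin_name" as in the output posted above, and the helper names `parse_summary` and `heavy_plugins` as well as the 50% threshold are made up for illustration.

```python
import re


def parse_summary(text):
    """Collect {plugin: count} from lines that look like '  4037 cisco_qos'."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"^\s*(\d+)\s+([A-Za-z0-9_.]+)\s*$", line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts


def heavy_plugins(counts, share=0.5):
    """Return plugins that account for at least `share` of all services."""
    total = sum(counts.values())
    if total == 0:
        return []
    ranked = sorted(counts.items(), key=lambda kv: -kv[1])
    return [name for name, n in ranked if n / total >= share]


# Tail of the discovery output quoted earlier in this thread.
output = """
    1 cisco_cpu
    24 cisco_fan
    2 cisco_mem
    16 cisco_power
    4037 cisco_qos
    1 cisco_redundancy
    24 cisco_temperature
    209 if64
    1 snmp_info
    1 snmp_uptime
    SUCCESS - Found 4316 services
"""

counts = parse_summary(output)
print(sum(counts.values()), heavy_plugins(counts))
```

Any plugin reported by `heavy_plugins` is a candidate for the "disabled checks" rule described in option b).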


#33

Option b) worked great on two of the hosts that had too many QoS services; I will check the rest.
Thanks man :slight_smile:


(Philipp Näther) #34

Nice. If the rest runs OK as well, please mark a post here as the solution, thx.


#35

Great stuff, thank you so much for your help!