CHECK_NRPE: Socket timeout after 10 seconds.

  • Good morning,


    I have the problem that, at irregular intervals, I get the message "CHECK_NRPE: Socket timeout after 10 seconds." (for example when the client is under high system load).

    Is there a way to set the timeout centrally for all checks, e.g. to 30 or 40 seconds, without having to edit every single check?

    I would also like to know whether Icinga2 can be told to send a mail only after X failed check attempts.


    Thanks in advance!

  • You will probably have to change the timeout on every check; there is no way around that.


    As for your second question: yes, that is possible without problems if you pass the number of failures to the notification script.

    There is a nice example somewhere in the docs, but unfortunately I can't find it right now.

    Linux is dead, long live Linux

  • Thanks for the quick reply. Do you mean this configuration?

    Code
    // 3.7.2. Notification Delay
    apply Notification "mail" to Service {
      import "generic-notification"
      command = "mail-notification"
      users = [ "icingaadmin" ]
      interval = 5m
      times.begin = 15m // delay notification window
      assign where service.name == "ping4"
    }
  • I don't know NRPE particularly well, but I suspect that the timeout has to be set in the configuration on the servers.


  • Ah, a communication problem.


    I was referring to the NRPE timeout.


    As for delaying the mail, you will have to adapt the script that your notification command calls.

    You can, for example, pass the variable last_state to the script and check whether it is greater than 0, and at the same time check whether the variable check_attempt is greater than 2 if you only want to be notified after the second failure.


    Unfortunately I can't tell you exactly what the best way to do this is.

    If you are using the Director, just take a look at the variables of a service; you should find what you need there.
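
    Passing those values to a notification script could look roughly like the sketch below. The command name "mail-service-after-retries" and the script path are hypothetical; the runtime macros $service.state_id$, $service.check_attempt$ and $service.max_check_attempts$ are the ones I would expect to carry the state and the attempt counter. Keep in mind that Icinga 2 normally notifies only once a problem has reached a hard state, i.e. after max_check_attempts, so a script change may not be needed at all.

    ```
    object NotificationCommand "mail-service-after-retries" {
      // On older Icinga 2 releases an import "plugin-notification-command"
      // line may be needed here.

      // Hypothetical wrapper script; it has to compare the values itself.
      command = [ SysconfDir + "/icinga2/scripts/mail-after-retries.sh" ]

      // Exported to the script as environment variables.
      env = {
        SERVICESTATEID = "$service.state_id$"
        CHECKATTEMPT = "$service.check_attempt$"
        MAXCHECKATTEMPTS = "$service.max_check_attempts$"
      }
    }
    ```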


  • "Is there a way to set the timeout centrally for all checks, e.g. to 30 or 40 seconds, without having to edit every single check?"

    If you use the standard nrpe command, you can raise the NRPE timeout with the parameter "vars.nrpe_timeout = 60".

    Can you show an example of your configuration?


    Depending on the case, you can also increase max_check_attempts.


    "times.begin = 15m // delay notification window"

    In principle, this should delay the notification.
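
    To avoid editing every check, the timeout could be placed in a shared template, roughly as sketched here. The template name "nrpe-service" and the NRPE command name "check_load" are assumptions for illustration; check_command = "nrpe" refers to the standard nrpe command mentioned above.

    ```
    template Service "nrpe-service" {
      import "generic-service"

      check_command = "nrpe"
      vars.nrpe_timeout = 30     // handed to check_nrpe as its -t value
      max_check_attempts = 3     // retries before the problem becomes a hard state
      retry_interval = 30s
    }

    apply Service "load" {
      import "nrpe-service"

      // Assumed name of a command defined in the client's NRPE configuration.
      vars.nrpe_command = "check_load"

      assign where host.vars.os == "Linux"
    }
    ```

    Every service that imports the template then inherits the same timeout, and a single service can still override vars.nrpe_timeout if it needs more time.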

  • Do you mean like this?


    apply Service "CPU-Load" {
      import "generic-service"
      max_check_attempts = 3
      check_interval = 2m
      retry_interval = 30s
      check_command = "XY"
      vars.nrpe_timeout = "120"
      assign where host.vars.os == "Linux"
    }


    apply Notification "mail-icingaadmin" to Service {
      import "mail-service-notification"
      times = {
        begin = 10m
      }

      user_groups = host.vars.notification.mail.groups
      users = host.vars.notification.mail.users

      times.begin = 15m // delay notification window
      interval = 0 // disable re-notification

      assign where host.vars.notification.mail
    }




  • Hi,


    edit the conf file that defines "generic-service" and add the following:


    Code
    check_timeout = 120

    Then all checks have a timeout of 2 minutes. Icinga2 has a default value of 10s.
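
    In context, the template might then look like the sketch below. The other attribute values are assumptions modelled on a typical sample configuration, not taken from this thread. One caveat: the "10 seconds" in the error text looks like check_nrpe's own default for -t, so check_timeout alone may not change that message and probably has to be combined with nrpe_timeout.

    ```
    template Service "generic-service" {
      max_check_attempts = 5
      check_interval = 1m
      retry_interval = 30s

      // Equivalent to 120: upper limit for how long Icinga 2 lets the plugin run.
      check_timeout = 2m
    }
    ```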

  • Hi there,


    I have a similar problem, but with a variation.

    When I run the check manually from Linux against Windows:


    check_nrpe -H x.x.x.x -c hostcpuusage


    it works fine, including with -t 5,

    but in the Icinga2 dashboard I see:


    CHECK_NRPE: Socket timeout after 10 seconds.


    Any ideas?

  • Run icinga2 feature enable debuglog, then restart icinga2, rerun that check, and search the debug log for the command line of your service.

    The debug log should be at /var/log/icinga2/debug.log.
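
    For reference, enabling the feature roughly amounts to activating a FileLogger object like the one below. This is a sketch from memory of the shipped debuglog.conf; the object name and the path constant are assumptions and may differ between versions.

    ```
    object FileLogger "debug-file" {
      severity = "debug"
      // Resolves to /var/log/icinga2/debug.log on a typical package install.
      path = LocalStateDir + "/log/icinga2/debug.log"
    }
    ```

    The debug log grows quickly, so it is worth disabling the feature again once the command line has been found.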


  • Sorry for the delay...

    I enabled the debug log, but it is huge. I see chunks like this in the log:


    [2017-07-17 13:19:40 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_nrpe' '-H' '10.192.137.23' '-c' 'processmetrics' '-a' '-n IFAB-TEST-DIVAEssenceMover_3@PFTHD04 -w @75:80 -c @80: -u http://10.192.137.22:10101/Dal…IVAEssenceMover_3@PFTHD04 -a 10.192.137.37:10102': PID 21325

    [2017-07-17 13:19:40 +0200] debug/CheckerComponent: Check finished for object 'PFTHD04!DIVAEssenceMover_3'

    [2017-07-17 13:19:40 +0200] notice/ApiListener: Relaying 'event::SetNextCheck' message

    [2017-07-17 13:19:40 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_servicestatus SET next_check = FROM_UNIXTIME(1500290440) WHERE instance_id = 1 AND service_object_id = 4483

    [2017-07-17 13:19:40 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_servicestatus SET acknowledgement_type = '0', active_checks_enabled = '1', check_command = 'check_dalet_storage', check_source = 'daletdasboard.local', check_type = '0', current_check_attempt = '1', current_notification_number = '0', current_state = '0', endpoint_object_id = 3898, event_handler = '', event_handler_enabled = '1', execution_time = '0.39321708679199219', flap_detection_enabled = '0', has_been_checked = '1', instance_id = 1, is_flapping = '0', is_reachable = '1', last_check = FROM_UNIXTIME(1500290380), last_hard_state = '0', last_hard_state_change = FROM_UNIXTIME(1500280321), last_notification = FROM_UNIXTIME(1499767672), last_state_change = FROM_UNIXTIME(1500280261), last_time_critical = FROM_UNIXTIME(1500279961), last_time_ok = FROM_UNIXTIME(1500290380), last_time_warning = FROM_UNIXTIME(1499697860), latency = '0', long_output = '', max_check_attempts = '2', next_check = FROM_UNIXTIME(1500290679), next_notification = FROM_UNIXTIME(1500290593), normal_check_interval = '5', notifications_enabled = '1', original_attributes = 'null', output = 'UNCPATHCHEKER OK - OK ', passive_checks_enabled = '1', percent_state_change = '0', perfdata = '\'Number Of Files\'=0 \'Unreferenced File Count\'=0.0 \'Volume Available HD Space\'=142805.8101851852s \'Volume Available SD Space\'=550822.4107142857s \'Volume Free Space\'=2021527060480B \'Volume Total Space\'=4649979478016B \'Volume Use Percent\'=56.52610790999615% \'Volume Used HD Space\'=185680.5601851852s \'Volume Used SD Space\'=716196.4464285715s \'Volume Used Space\'=2628452417536B Availability=1;;1: Availability=1;;1:', problem_has_been_acknowledged = '0', process_performance_data = '1', retry_check_interval = '1', scheduled_downtime_depth = '0', service_object_id = 4438, should_be_scheduled = '1', state_type = '1', status_update_time = FROM_UNIXTIME(1500290380) WHERE service_object_id = 4438

    [2017-07-17 13:19:40 +0200] debug/IdoMysqlConnection: Query: INSERT INTO icinga_logentries (endpoint_object_id, entry_time, entry_time_usec, instance_id, logentry_data, logentry_time, logentry_type, object_id) VALUES (3898, FROM_UNIXTIME(1500290380), '491137', 1, 'SERVICE ALERT: PFTHD08;Network;OK;SOFT;1;CHECKNETWORK OK - Bytes Received/sec is 0 ', FROM_UNIXTIME(1500290380), '8192', 4298)

    [2017-07-17 13:19:40 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_servicestatus SET acknowledgement_type = '0', active_checks_enabled = '1', check_command = 'nrpe', check_source = 'daletdasboard.local', check_type = '0', current_check_attempt = '1', current_notification_number = '0', current_state = '0', endpoint_object_id = 3898, event_handler = '', event_handler_enabled = '1', execution_time = '0.60030984878540039', flap_detection_enabled = '0', has_been_checked = '1', instance_id = 1, is_flapping = '0', is_reachable = '1', last_check = FROM_UNIXTIME(1500290380), last_hard_state = '0', last_hard_state_change = FROM_UNIXTIME(1500290140), last_notification = FROM_UNIXTIME(1499810705), last_state_change = FROM_UNIXTIME(1500290380), last_time_critical = FROM_UNIXTIME(1500290329), last_time_ok = FROM_UNIXTIME(1500290380), last_time_unknown = FROM_UNIXTIME(1499707464), latency = '0', long_output = '', max_check_attempts = '2', next_check = FROM_UNIXTIME(1500290439), next_notification = FROM_UNIXTIME(1500290593), normal_check_interval = '1', notifications_enabled = '1', original_attributes = 'null', output = 'CHECKNETWORK OK - Bytes Received/sec is 0 ', passive_checks_enabled = '1', percent_state_change = '0', perfdata = '\'Bytes Received/sec\'=0 \'Bytes Sent/sec\'=0 \'Bytes Total/sec\'=0 \'Network current bandwidth\'=1000000000 \'Output Queue Length\'=0 \'Packets Outbound Discarded\'=0 \'Packets Outbound Errors\'=0 \'Packets Received Discarded\'=0 \'Packets Received Errors\'=0', problem_has_been_acknowledged = '0', process_performance_data = '1', retry_check_interval = '1', scheduled_downtime_depth = '0', service_object_id = 4298, should_be_scheduled = '1', state_type = '0', status_update_time = FROM_UNIXTIME(1500290380) WHERE service_object_id = 4298

    [2017-07-17 13:19:40 +0200] debug/IdoMysqlConnection: Query: INSERT INTO icinga_statehistory (check_source, current_check_attempt, endpoint_object_id, instance_id, last_hard_state, last_state, long_output, max_check_attempts, object_id, output, state, state_change, state_time, state_time_usec, state_type) VALUES ('daletdasboard.local', '1', 3898, 1, '0', '2', '', '2', 4298, 'CHECKNETWORK OK - Bytes Received/sec is 0 ', '0', '1', FROM_UNIXTIME(1500290380), '490911', '0')

    [2017-07-17 13:19:40 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_servicestatus SET acknowledgement_type = '0', active_checks_enabled = '1', check_command = 'nrpe', check_source = 'daletdasboard.local', check_type = '0', current_check_attempt = '1', current_notification_number = '0', current_state = '0', endpoint_object_id = 3898, event_handler = '', event_handler_enabled = '1', execution_time = '0.67568111419677734', flap_detection_enabled = '0', has_been_checked = '1', instance_id = 1, is_flapping = '0', is_reachable = '1', last_check = FROM_UNIXTIME(1500290380), last_hard_state = '0', last_hard_state_change = FROM_UNIXTIME(1500068831), last_notification = FROM_UNIXTIME(1499767578), last_state_change = FROM_UNIXTIME(1500068780), last_time_critical = FROM_UNIXTIME(1500068729), last_time_ok = FROM_UNIXTIME(1500290380), last_time_unknown = FROM_UNIXTIME(1499897022), latency = '0', long_output = '', max_check_attempts = '2', next_check = FROM_UNIXTIME(1500290439), next_notification = FROM_UNIXTIME(1500290593), normal_check_interval = '1', notifications_enabled = '1', original_attributes = 'null', output = 'PROCESSLOAD OK - OK ', passive_checks_enabled = '1', percent_state_change = '0', perfdata = '\'#Notifications sent per minute\'=0 \'Average Queue Duration\'=0s \'Average Queued Calls\'=0 \'Average Transaction Duration\'=1s \'Client Communication Errors per Minute\'=0 \'Connected Clients\'=5 \'Peak Queue Duration\'=1s \'Peak Queued Calls\'=0 \'Peak Transaction Duration per Minute\'=1 \'Transactions per Minute\'=1 Availability=1;;1;0;1 ElapsedTime=1121722s;;;0;2147483647 HandleCount=792;;;0;16744434 PageFaultsPersec=0;;;0;2000000 PercentPrivilegedTime=0;;;0;100 PercentProcessorTime=0%;;;0;100 PercentUserTime=0%;;;0;100 ThreadCount=61;;;0;3200 VirtualBytes=4095.99999kB;;;0;7812.5 WorkingSet=134912kB;;0;0;4000000', problem_has_been_acknowledged = '0', process_performance_data = '1', retry_check_interval = '1', scheduled_downtime_depth = '0', 
service_object_id = 4400, should_be_scheduled = '1', state_type = '1', status_update_time = FROM_UNIXTIME(1500290380) WHERE service_object_id = 4400

    [2017-07-17 13:19:40 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_servicestatus SET next_check = FROM_UNIXTIME(1500290440) WHERE instance_id = 1 AND service_object_id = 4542

    [2017-07-17 13:19:40 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_servicestatus SET next_check = FROM_UNIXTIME(1500290440) WHERE instance_id = 1 AND service_object_id = 4519

    [2017-07-17 13:19:40 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_servicestatus SET next_check = FROM_UNIXTIME(1500290440) WHERE instance_id = 1 AND service_object_id = 4651

    [2017-07-17 13:19:40 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_servicestatus SET next_check = FROM_UNIXTIME(1500290440) WHERE instance_id = 1 AND service_object_id = 4305

    [2017-07-17 13:19:40 +0200] notice/Process: PID 21325 ('/usr/lib/nagios/plugins/check_nrpe' '-H' '10.192.137.23' '-c' 'processmetrics' '-a' '-n IFAB-TEST-DIVAEssenceMover_3@PFTHD04 -w @75:80 -c @80: -u http://10.192.137.22:10101/Dal…IVAEssenceMover_3@PFTHD04 -a 10.192.137.37:10102') terminated with exit code 255

    [2017-07-17 13:19:40 +0200] warning/PluginCheckTask: Check command for object 'PFTHD04!DIVAEssenceMover_3' (PID: 21325, arguments: '/usr/lib/nagios/plugins/check_nrpe' '-H' '10.192.137.23' '-c' 'processmetrics' '-a' '-n IFAB-TEST-DIVAEssenceMover_3@PFTHD04 -w @75:80 -c @80: -u http://10.192.137.22:10101/Dal…IVAEssenceMover_3@PFTHD04 -a 10.192.137.37:10102') terminated with exit code 255, output: connect to address 10.192.137.23 port 5666: Connection refused

    connect to host 10.192.137.23 port 5666: Connection refused


    Port 5666 (nscp.exe) sometimes refuses the connection. Why?

    If you need anything else specific, please tell me.