Pinned [SOLVED] illegal attempt to update using time

  • [SOLVED] illegal attempt to update using time

    Hallo Monitorer,
    ich habe nun das gesamte Internet nach einer Lösung durchsucht, jedoch nicht wirklich was gefunden.

    PNP funktionierte wunderbar, bis ich letzte Woche auf Icinga 1.7.1 geupdatet hab.

    perfdata.log

    Source Code

    1. RRDs::update ERROR /usr/local/pnp4nagios/var/perfdata/azs-dt100-02/Network_Ping.rrd: illegal attempt to update using time 1342521488 when last update time is 1342521488 (minimum one second step)


    Soll ja glaub ich bedeuten, 2 Checks zur gleichen Zeit.

    Von diesen Kandidaten gibt es immer so 1-2 pro Minute, mal Ping mal CPU Load.

    Nur habe ich nichts geändert, bis auf das Icinga Update. Aus Verzweiflung habe ich dann noch RRD, sowie PNP aktualisiert.

    Hier das Ergebnis vom Verify-Script

    Source Code

    1. [INFO] ========== Starting Environment Checks ============
    2. [INFO] My version is: pnp4nagios-head
    3. [INFO] Reading /usr/local/icinga/etc/icinga.cfg
    4. [OK ] Running product is 'icinga'
    5. [OK ] object_cache_file is defined
    6. [OK ] object_cache_file=/usr/local/icinga/var/objects.cache
    7. [INFO] Reading /usr/local/icinga/var/objects.cache
    8. [OK ] resource_file is defined
    9. [OK ] resource_file=/usr/local/icinga/etc/resource.cfg
    10. [INFO] Reading /usr/local/icinga/etc/resource.cfg
    11. [INFO] Reading /usr/local/pnp4nagios/etc//process_perfdata.cfg
    12. [INFO] Reading /usr/local/pnp4nagios/etc//pnp4nagios_release
    13. [OK ] Found PNP4Nagios version "0.6.18"
    14. [OK ] ./configure Options '--with-nagios-user=icinga' '--with-nagios-group=icinga' '--with-rrdtool=/opt/rrdtool-1.4.7/bin/rrdtool'
    15. [OK ] Effective User is 'icinga'
    16. [OK ] User icinga exists with ID '1001'
    17. [OK ] Effective group is 'icinga'
    18. [OK ] Group icinga exists with ID '1002'
    19. [INFO] ========== Checking Bulk Mode + NPCD Config ============
    20. [OK ] process_performance_data is 1 compared with '/1/'
    21. [OK ] service_perfdata_file is defined
    22. [OK ] service_perfdata_file=/usr/local/pnp4nagios/var/service-perfdata
    23. [OK ] service_perfdata_file_template is defined
    24. [OK ] service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
    25. [OK ] PERFDATA template looks good
    26. [OK ] service_perfdata_file_mode is defined
    27. [OK ] service_perfdata_file_mode=a
    28. [OK ] service_perfdata_file_processing_interval is defined
    29. [OK ] service_perfdata_file_processing_interval=15
    30. [OK ] service_perfdata_file_processing_command is defined
    31. [OK ] service_perfdata_file_processing_command=process-service-perfdata-file
    32. [OK ] host_perfdata_file is defined
    33. [OK ] host_perfdata_file=/usr/local/pnp4nagios/var/host-perfdata
    34. [OK ] host_perfdata_file_template is defined
    35. [OK ] host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$
    36. [OK ] PERFDATA template looks good
    37. [OK ] host_perfdata_file_mode is defined
    38. [OK ] host_perfdata_file_mode=a
    39. [OK ] host_perfdata_file_processing_interval is defined
    40. [OK ] host_perfdata_file_processing_interval=15
    41. [OK ] host_perfdata_file_processing_command is defined
    42. [OK ] host_perfdata_file_processing_command=process-host-perfdata-file
    43. [INFO] Icinga config looks good so far
    44. [INFO] ========== Checking config values ============
    45. [OK ] npcd daemon is running
    46. [OK ] /usr/local/pnp4nagios/etc/npcd.cfg is used by npcd and readable
    47. [INFO] Reading /usr/local/pnp4nagios/etc/npcd.cfg
    48. [OK ] perfdata_spool_dir is defined
    49. [OK ] perfdata_spool_dir=/usr/local/pnp4nagios/var/spool/
    50. [OK ] Command process-service-perfdata-file is defined
    51. [OK ] '/bin/mv /usr/local/pnp4nagios/var/service-perfdata /usr/local/pnp4nagios/var/spool/service-perfdata.$TIMET$'
    52. [OK ] Command looks good
    53. [OK ] Command process-host-perfdata-file is defined
    54. [OK ] '/bin/mv /usr/local/pnp4nagios/var/host-perfdata /usr/local/pnp4nagios/var/spool/host-perfdata.$TIMET$'
    55. [OK ] Command looks good
    56. [OK ] Script /usr/local/pnp4nagios/libexec/process_perfdata.pl is executable
    57. [INFO] ========== Starting global checks ============
    58. [OK ] status_file is defined
    59. [OK ] status_file=/usr/local/icinga/var/status.dat
    60. [INFO] Reading /usr/local/icinga/var/status.dat
    61. [INFO] ==== Starting rrdtool checks ====
    62. [OK ] RRDTOOL is defined
    63. [OK ] RRDTOOL=/opt/rrdtool-1.4.7/bin/rrdtool
    64. [OK ] /opt/rrdtool-1.4.7/bin/rrdtool is executable
    65. [OK ] RRDtool 1.4.7 Copyright 1997-2012 by Tobias Oetiker <tobi@oetiker.ch>
    66. [OK ] USE_RRDs is defined
    67. [OK ] USE_RRDs=1
    68. [OK ] Perl RRDs modules are loadable
    69. [INFO] ==== Starting directory checks ====
    70. [OK ] RRDPATH is defined
    71. [OK ] RRDPATH=/usr/local/pnp4nagios/var/perfdata
    72. [OK ] Perfdata directory '/usr/local/pnp4nagios/var/perfdata' exists
    73. [WARN] 1298 hosts/services are not providing performance data
    74. [WARN] 'process_perf_data 1' is set for 1299 hosts/services which are not providing performance data!
    75. [OK ] 'process_perf_data 1' is set for 2380 of your hosts/services
    76. [INFO] ==== System sizing ====
    77. [OK ] 2379 hosts/service objects defined
    78. [INFO] ==== Check statistics ====
    79. [WARN] Warning: 2, Critical: 0
    80. [WARN] Checks finished...
    Show All


    Gibt es Lösungsvorschläge?
    ./stiftmaster

    The post was edited 1 time, last by stiftmaster ().

  • icinga.debug

    Source Code

    1. [1342529970.063387] [016.0] [pid=30139] ** Handling check result for service 'Processes -Total Processes' on host 'srv-vc01'...
    2. [1342529970.063390] [016.1] [pid=30139] HOST: srv-vc01, SERVICE: Processes -Total Processes, CHECK TYPE: Active, OPTIONS: 0, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 0, OUTPUT: Processes = 62,00 Processes | Processes=62,000000Processes;0,000000;0,000000;\n
    3. [1342529970.063400] [016.1] [pid=30139] Service is OK.
    4. [1342529970.063404] [016.1] [pid=30139] Service did not change state.
    5. [1342529970.063410] [016.1] [pid=30139] Rescheduling next check of service at Tue Jul 17 15:00:27 2012
    6. [1342529970.063417] [016.0] [pid=30139] Scheduling a non-forced, active check of service 'Processes -Total Processes' on host 'srv-vc01' @ Tue Jul 17 15:00:27 2012
    7. [1342529970.063466] [016.1] [pid=30139] Checking service 'Processes -Total Processes' on host 'srv-vc01' for flapping...
    8. [1342529970.063472] [016.1] [pid=30139] Service is not flapping (0.00% state change).
    9. [1342529970.063476] [016.1] [pid=30139] Checking host 'srv-vc01' for flapping...
    10. [1342529970.063479] [016.1] [pid=30139] Host is not flapping (0.00% state change).
    11. [1342529970.063500] [016.1] [pid=30139] Deleted check result file '/usr/local/icinga/var/spool/checkresults/c6Eg2qz'
    Show All


    perfdata.log

    Source Code

    1. 2012-07-17 14:59:42 [30179] [0] RRDs::update /usr/local/pnp4nagios/var/perfdata/srv-vc01/Processes_-Total_Processes.rrd 1342529970:62.000000
    2. 2012-07-17 14:59:42 [30179] [0] RRDs::update ERROR /usr/local/pnp4nagios/var/perfdata/srv-vc01/Processes_-Total_Processes.rrd: illegal attempt to update using time 1342529970 when last update time is 1342529970 (minimum one second step)


    Oder welches Log meinst Du
  • ich meinte icinga.debug, aber ueber die zeit gesehen - wenn du sagst, dass da checks innerhalb der selben sekunde stattfinden, dann moechte ich das in eingem groben interval sehen - sagen wir mal 5 bis 10 minuten. um ggf ein muster zu erkennen.
  • Das sind die Kandidaten in diesem Zeitraum


    [1342534530.076669] srv-icinga01/Network_Netstat_-_SSH_Connections
    [1342534530.085126] pabx-101/Network_Ping
    [1342534530.181864] srv-vc01/Processes_-Total_Processes
    [1342534561.268499] srv-ns01/Memory_Free


    editiert:
    Files
    • icinga.zip

      (99.03 kB, downloaded 176 times, last: )

    The post was edited 2 times, last by stiftmaster ().

  • ich hab mal ein .zip draus gemacht und angehaengt. ich mag externe urls auf logs nicht, die halten meist nicht ewig...selbiges wuerde ich dir auch fuer die zukunft empfehlen.
    jetzt mal in ruhe lesen.
    Files
    • icinga.zip

      (99.03 kB, downloaded 194 times, last: )
  • stiftmaster wrote:


    Source Code

    1. 2012-07-17 14:59:42 [30179] [0] RRDs::update /usr/local/pnp4nagios/var/perfdata/srv-vc01/Processes_-Total_Processes.rrd 1342529970:62.000000 2012-07-17 14:59:42 [30179] [0] RRDs::update ERROR /usr/local/pnp4nagios/var/perfdata/srv-vc01/Processes_-Total_Processes.rrd: illegal attempt to update using time 1342529970 when last update time is 1342529970 (minimum one second step)



    Wenn sowas passiert würde ich im Log erst mal nach dem Timestamp suchen und analysieren ob die Daten wirklich zweimal reingekommen sind.

    Source Code

    1. grep 1342529970 perfdata.log
    +++ PNP Developer +++ PNP 0.6.25 ist online ! +++
    +++ Threema ID NBDA3UU8 +++
    OMD - Open Monitoring Distribution
  • grep 1342529970 /usr/local/pnp4nagios/var/perfdata.log

    Source Code

    1. 2012-07-17 14:59:42 [30180] [0] RRDs::update ERROR /usr/local/pnp4nagios/var/perfdata/srv-ns01/Memory_Free.rrd: illegal attempt to update using time 1342529950 when last update time is 1342529970 (minimum one second step)
    2. 2012-07-17 14:59:42 [30180] [0] RRDs::update ERROR /usr/local/pnp4nagios/var/perfdata/srv-prn01/CPU_Load.rrd: illegal attempt to update using time 1342529950 when last update time is 1342529970 (minimum one second step)
    3. 2012-07-17 14:59:42 [30179] [0] RRDs::update /usr/local/pnp4nagios/var/perfdata/srv-vc01/Processes_-Total_Processes.rrd 1342529970:62.000000
    4. 2012-07-17 14:59:42 [30179] [0] RRDs::update ERROR /usr/local/pnp4nagios/var/perfdata/srv-vc01/Processes_-Total_Processes.rrd: illegal attempt to update using time 1342529970 when last update time is 1342529970 (minimum one second step)
  • darf ich die service definitionen zu den checks die du da auflistest, mal sehen? irgendwelche addons installiert, die in diesen checkmechanismus eingreifen?
  • Für den srv-icinga01 z.B.
    (also Addons sind keine weiter konfiguriert)

    Source Code

    1. define service{
    2. use 24x7-service-check-one-times-per-minute,pnp-service
    3. host_name srv-icinga01
    4. service_description Network Netstat - SSH Connections
    5. check_command check_local_netstat!22!4!6!22
    6. }


    Source Code

    1. define service{
    2. name pnp-service
    3. register 0
    4. action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$' class='tips' rel='/pnp4nagios/index.php/popup?host=$HOSTNAME$&srv=$SERVICEDESC$
    5. }


    Source Code

    1. define service{
    2. name 24x7-service-check-one-times-per-minute
    3. use generic-service
    4. max_check_attempts 3
    5. normal_check_interva 1
    6. retry_check_interval 2
    7. register 0
    8. }


    Source Code

    1. define service{
    2. name generic-service
    3. active_checks_enabled 1
    4. passive_checks_enabled 1
    5. parallelize_check 1
    6. obsess_over_service 1
    7. check_freshness 0
    8. notifications_enabled 1
    9. event_handler_enabled 1
    10. flap_detection_enabled 1
    11. failure_prediction_enabled 1
    12. retain_status_information 1
    13. retain_nonstatus_information 1
    14. is_volatile 0
    15. check_period 24x7
    16. max_check_attempts 3
    17. normal_check_interval 5
    18. retry_check_interval 2
    19. contact_groups admins
    20. notification_options w,c,r,s
    21. notification_interval 1440
    22. notification_period 24x7
    23. process_perf_data 1
    24. register 0
    25. }
    Show All
  • stiftmaster wrote:


    define service{
    name 24x7-service-check-one-times-per-minute
    use generic-service
    max_check_attempts 3
    normal_check_interva 1
    retry_check_interval 2
    register 0
    }


    is das ein copypaste fehler, oder tatsaechlich so?

    andere frage - wieso checkt man einen service jede minute, um dann im fehlerfall alle 2 minuten zu rechecken?
    ansonsten hat sich mein verdacht, es hier mit passiven checks only zu tun zu haben, wohl in luft aufgeloest.

    zum problem an sich - ich sehe um 1342534550 massig checkresults, die eingelesen werden. dazu wuerde mich die icinga.cfg interessieren, insbesondere das reaping interval.

    Source Code

    1. # egrep -v "^#|^$" icinga.cfg


    und gemessen an der ausfuehrung, und der relativ geringen latenz - der check wird scheduled und ist 5sec spaeter schon wieder in der queue. ich wuerde mal annehmen, dass das reaping interval bei etwa 5sec liegt.

    Source Code

    1. [1342534555.318764] [024.1] [pid=30139] Run a few checks before executing a service check for 'Network Netstat - SSH Connections'.
    2. [1342534555.318797] [016.0] [pid=30139] Attempting to run scheduled check of service 'Network Netstat - SSH Connections' on host 'srv-san12': check options=0, latency=0.318000
    3. [1342534555.318819] [016.0] [pid=30139] Checking service 'Network Netstat - SSH Connections' on host 'srv-san12'...
    4. [1342534555.318883] [016.1] [pid=30139] Check result output will be written to '/tmp/checkNmVFHY' (fd=13)
    5. [1342534560.105679] [016.1] [pid=30139] Processing check result file: '/usr/local/icinga/var/spool/checkresults/cHCx47l'
    6. [1342534560.119571] [016.0] [pid=30139] ** Handling check result for service 'Network Netstat - SSH Connections' on host 'srv-san12'...
    7. [1342534560.119577] [016.1] [pid=30139] HOST: srv-san12, SERVICE: Network Netstat - SSH Connections, CHECK TYPE: Active, OPTIONS: 0, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 0, OUTPUT: OK - tcp22_in is 0 | tcp22_in=0\n
    8. [1342534560.119590] [016.1] [pid=30139] Service is OK.
    9. [1342534560.119594] [016.1] [pid=30139] Service did not change state.
    10. [1342534560.119602] [016.1] [pid=30139] Rescheduling next check of service at Tue Jul 17 16:16:55 2012
    11. [1342534560.119610] [016.0] [pid=30139] Scheduling a non-forced, active check of service 'Network Netstat - SSH Connections' on host 'srv-san12' @ Tue Jul 17 16:16:55 2012
    12. [1342534560.119675] [016.1] [pid=30139] Checking service 'Network Netstat - SSH Connections' on host 'srv-san12' for flapping...
    13. [1342534560.119681] [016.1] [pid=30139] Service is not flapping (0.00% state change).
    14. [1342534560.119685] [016.1] [pid=30139] Checking host 'srv-san12' for flapping...
    15. [1342534560.119689] [016.1] [pid=30139] Host is not flapping (0.00% state change).
    16. [1342534560.119718] [016.1] [pid=30139] Deleted check result file '/usr/local/icinga/var/spool/checkresults/cHCx47l'
    Show All


    sieht fuer mich also sehr normal aus - den service gibts natuerlich massig, weil er vielen verschiedenen hosts zugeordnet ist, jede relation fuer sich unique.
  • @Michael

    lass uns erst mal auf PNP Seite analysieren.

    @stiftmaster

    Ich finde es komisch das rrdtool anmäckert das es bereits einen eintrob für einen gewissen Timestamp gibt, dieser aber nicht zwei mal im log auftaucht.
    Weiterhin ist es komisch das es im Log Einträge zur gleichen Zeit von zwei process_perfdata.pl Prozessen gibt.

    Daraufhin solltest du deine PNP Logs mal genauer untersuchen.
    +++ PNP Developer +++ PNP 0.6.25 ist online ! +++
    +++ Threema ID NBDA3UU8 +++
    OMD - Open Monitoring Distribution
  • Anmerkung: Hier ist nicht überall exakt derselbe Zeitstempel unterwegs:

    stiftmaster wrote:

    14:59:42 [30180] using time 1342529950 when last update time is 1342529970
    14:59:42 [30180] using time 1342529950 when last update time is 1342529970
    14:59:42 [30179] [0] RRDs::update 1342529970:62.000000
    14:59:42 [30179] using time 1342529970 when last update time is 1342529970
    (14:59:42 = 1342529982)

    Also ein Geschehen, das über mehr als 30 Sekunden verteilt ist. Wenn da wirklich Intervalle von 5s u.ä. eingestellt sind, klemmt platten-/filesystemseitig 'was.
  • Source Code

    1. [07-17-2012 14:59:42] NPCD: A thread was started on thread_counter = 3
    2. [07-17-2012 14:59:42] NPCD: Have to wait: Filecounter = 4 - thread_counter = 4
    3. [07-17-2012 14:59:42] NPCD: Processing file host-perfdata.1342529980 with ID 140098671748864 - going to exec /usr/local/pnp4nagios/libexec/process_perfdata.pl -n -b /usr/local/pnp4nagios/var/spool//host-perfdata.1342529980
    4. [07-17-2012 14:59:42] NPCD: Processing file 'host-perfdata.1342529980'
    5. [07-17-2012 14:59:42] NPCD: Processing file service-perfdata.1342529980 with ID 140098652776192 - going to exec /usr/local/pnp4nagios/libexec/process_perfdata.pl -n -b /usr/local/pnp4nagios/var/spool//service-perfdata.1342529980
    6. [07-17-2012 14:59:42] NPCD: Processing file 'service-perfdata.1342529980'
    7. [07-17-2012 14:59:42] NPCD: Processing file service-perfdata.1342529965 with ID 140098661168896 - going to exec /usr/local/pnp4nagios/libexec/process_perfdata.pl -n -b /usr/local/pnp4nagios/var/spool//service-perfdata.1342529965
    8. [07-17-2012 14:59:42] NPCD: Processing file 'service-perfdata.1342529965'


    [07-17-2012 14:59:42] NPCD: Have to wait: Filecounter = 4 - thread_counter = 4

    IMHO musst du mal den Thread count hochdrehen.
    ~~ Never touch a running system ~~
    ~~ Never run a touchy system ~~
  • Wir reden aber andeuernd vom perfdata.log das von process_perfdata.pl erzeugt wird.

    Zum Hntergrund:

    rrdtool kann nur neue Werte an ein RRD File anhängen.

    using time 1342529950 when last update time is 1342529970


    hier sollen also werte für 1342529950 geschriebn werden, es sind aber schon werte für 1342529970 geschriebn worden.

    Daher auch Jochens und Stefans Vermutung das process_perfdata.pl auf deinem System etwas aus dem Tritt kommt.
    Möglichwerweise zu hoher IO Wait oder schlicht schlechtes Timing
    +++ PNP Developer +++ PNP 0.6.25 ist online ! +++
    +++ Threema ID NBDA3UU8 +++
    OMD - Open Monitoring Distribution