Issues with Check_mk - rrdcached/perfdata/pnp4nagios.

  • Issue:

    rrdcached/npcd is not removing or flushing the old journal files. If left unchecked, the rrdcached process gets killed by the Red Hat kernel and the OMD site is reported as "Partially started". When this occurs, viewing graph data in the GUI takes forever, the site crashes, or graph data is not updated.

    This issue can be kept in check if I manually stop/start the site and remove the rrd.journal* files about twice a week from the following path: /opt/omd/sites/master/var/rrdcached
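
    The manual workaround above could be scripted roughly as follows. This is only a sketch: the site name and journal path are taken from this post, the function names are made up for illustration, and it assumes that "omd stop" and "omd start" are run by a user allowed to control the site. Removing journal files may discard updates that rrdcached has not yet written to the RRD files, so the default is a dry run that only lists what would be deleted.

```python
import glob
import os
import subprocess

# Assumed values, taken from the post; adjust for other sites.
SITE = "master"
JOURNAL_DIR = "/opt/omd/sites/master/var/rrdcached"


def find_journal_files(journal_dir):
    """Return the rrd.journal* files in journal_dir, sorted by name."""
    return sorted(glob.glob(os.path.join(journal_dir, "rrd.journal*")))


def clean_journals(site, journal_dir, dry_run=True):
    """Stop the site, delete its journal files, then start the site again.

    With dry_run=True nothing is stopped or deleted; the function only
    returns the files it would remove.
    """
    if dry_run:
        return find_journal_files(journal_dir)

    subprocess.run(["omd", "stop", site], check=True)
    try:
        # List the files only after the stop, so the set no longer changes.
        files = find_journal_files(journal_dir)
        for path in files:
            os.remove(path)
    finally:
        # Start the site again even if a removal failed.
        subprocess.run(["omd", "start", site], check=True)
    return files


if __name__ == "__main__":
    for path in clean_journals(SITE, JOURNAL_DIR, dry_run=True):
        print("would remove", path)
```

    Calling clean_journals(SITE, JOURNAL_DIR, dry_run=False) performs the actual stop, removal, and start.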


    It seems that only the sites with a larger number of hosts/services have this issue. I am showing logs from a working and a non-working server:

    Server master - Check_mk RAW 1.2.8p18 (Environment: Production) - Not working

    Server gmon - Check_mk RAW 1.2.8p18 (Environment: Test) - Working


    We have been troubleshooting this issue for quite a while, so any help is greatly appreciated!

  • Seems to have been resolved by lowering the following values for rrdcached:

    Previous values - /opt/omd/sites/master/etc/rrdcached.conf

    1. TIMEOUT=3600
    2. RANDOM_DELAY=1800
    3. FLUSH_TIMEOUT=7200

    New values - /opt/omd/sites/master/etc/rrdcached.conf

    1. TIMEOUT=600
    2. RANDOM_DELAY=300
    3. FLUSH_TIMEOUT=1200
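
    For reference, the change can be checked with a small script. This is a sketch that assumes rrdcached.conf uses plain KEY=VALUE lines; the helper names are made up for illustration. It shows that all three values were reduced by the same factor, and it checks that RANDOM_DELAY does not exceed TIMEOUT, which is what the rrdcached documentation recommends for its write delay relative to its write timeout (assuming these settings map to those two options).

```python
# Values copied from the post.
OLD_CONF = """\
TIMEOUT=3600
RANDOM_DELAY=1800
FLUSH_TIMEOUT=7200
"""

NEW_CONF = """\
TIMEOUT=600
RANDOM_DELAY=300
FLUSH_TIMEOUT=1200
"""


def parse_conf(text):
    """Parse KEY=VALUE lines with integer values; skip everything else."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if value.isdigit():
            values[key.strip()] = int(value)
    return values


def reduction_factors(old, new):
    """Return old/new for every key present in both configurations."""
    return {key: old[key] / new[key] for key in old if key in new}


old = parse_conf(OLD_CONF)
new = parse_conf(NEW_CONF)
factors = reduction_factors(old, new)

for key, factor in factors.items():
    print(f"{key}: {old[key]} -> {new[key]} (divided by {factor:g})")

# Sanity check: the random write delay should not exceed the write timeout.
delay_ok = new["RANDOM_DELAY"] <= new["TIMEOUT"]
print("RANDOM_DELAY <= TIMEOUT:", delay_ok)
```

    Each value ends up at one sixth of its previous setting.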