Need assistance after debugging NPCD + PNP + Icinga2 issue

  • Hello. :) Let me describe my saga to you, as I still have unprocessed data that I'd like to process.


    I'm using icinga2 2.4.4 with PNP 0.6.25 on FreeBSD 10.2, all installed out of ports. I've been working with this system for a week, so please excuse me if I do not know intimate details of these systems. I got this all set up and working, but sometime last night random graphs stopped working. After figuring out that NPCD is hardcoded to syslog facility local0, I was finally able to get the log entries involved:


    Code
    [08-09-2016 11:52:03] NPCD: ERROR: Executed command exits with return code '13'
    [08-09-2016 11:52:03] NPCD: ERROR: Command line was '/usr/local/libexec/process_perfdata.pl -n --bulk /var/spool/icinga2/perfdata/service-perfdata.1470768711'
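
One hedged reading of the return code in the log above: Perl's die exits with the current value of $! (errno) when it is non-zero. So, assuming process_perfdata.pl ended through a die after a failed system call, the status 13 can be looked up as an errno. A minimal Python sketch of that lookup follows; the helper name describe_exit_code is made up for illustration.

```python
import errno
import os


def describe_exit_code(code):
    """Return (symbolic errno name, message) for an exit code.

    Assumption: the child exited with an errno value, as Perl's die
    does when $! is set. Exit codes with other meanings will be
    mislabeled by this lookup.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = describe_exit_code(13)
    print(f"13 -> {name}: {message}")
```

On FreeBSD and Linux, errno 13 is EACCES, which would hint at a permissions problem. This is only a hint and no substitute for the script's actual error output.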


    Googling revealed the existence of a verify script. However, the verify script does not support icinga2. So I dug deeper.


    After checking your GitHub repository and realizing that NPCD does not save the command's stdout or stderr anywhere (saving them would be really useful, but you are using popen()/pclose()), I modified process_perfdata.pl like this:


    Perl
    # Debug aid: send this run's STDOUT and STDERR to a per-process dump file in /tmp.
    open(STDOUT, '>', "/tmp/perf.$$." . time() . ".dump") or die "Can't open perf file: $!\n";
    open(STDERR, '>&', \*STDOUT) or die "Can't redirect STDERR: $!\n";
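
The same diagnostic idea, capturing what a child process prints along with its exit status, can be sketched outside Perl. The Python snippet below is only an illustration under assumptions: run_and_capture is a hypothetical helper, and the child command is a stand-in, not process_perfdata.pl or anything NPCD actually runs.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run argv, merging stderr into stdout, and return (exit code, output)."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep error text next to normal output
        text=True,
        check=False,  # a non-zero exit is data here, not an exception
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Stand-in child: writes to stderr and exits with status 13.
    child = [sys.executable, "-c",
             "import sys; sys.stderr.write('cannot open log\\n'); sys.exit(13)"]
    code, output = run_and_capture(child)
    print(f"exit code {code}, output: {output!r}")
```

Merging stderr into stdout keeps the error message in the same stream as any normal output, which is the effect the two open() calls above achieve from inside the script.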


    This revealed the issue. The Perl script was trying to write to /var/log/perfdata.log, but it runs as user icinga, which does not have permission to write to the log directory. I fixed that by giving PNP its own log directory. So now I have tons (around 2000) of leftover files that NPCD reports like this:


    Code
    [08-09-2016 12:47:03] NPCD: File 'service-perfdata.1470751938-PID-91601' is an already in process PNP file. Leaving it untouched.
    [08-09-2016 12:47:03] NPCD: ThreadCounter 1/5 File is service-perfdata.1470751968-PID-91632
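
To get an overview of what is stranded, the file names themselves carry enough information. The following Python sketch assumes the naming visible in the log lines above (a prefix, an epoch timestamp, then -PID- and a process ID) and the spool directory from the earlier NPCD command line. parse_stranded and summarize are hypothetical helper names. The sketch only reads the directory listing and changes nothing.

```python
import os
import re
from datetime import datetime, timezone

# Name pattern taken from the log lines: <prefix>.<epoch>-PID-<pid>
STRANDED = re.compile(r"^(?P<prefix>.+)\.(?P<epoch>\d+)-PID-(?P<pid>\d+)$")


def parse_stranded(name):
    """Return (name without the -PID- suffix, epoch, pid), or None if no match."""
    match = STRANDED.match(name)
    if match is None:
        return None
    original = f"{match.group('prefix')}.{match.group('epoch')}"
    return original, int(match.group("epoch")), int(match.group("pid"))


def summarize(names):
    """Return (count, oldest, newest) over the stranded names, as UTC datetimes."""
    epochs = sorted(p[1] for p in map(parse_stranded, names) if p is not None)
    if not epochs:
        return 0, None, None

    def to_utc(ts):
        return datetime.fromtimestamp(ts, tz=timezone.utc)

    return len(epochs), to_utc(epochs[0]), to_utc(epochs[-1])


if __name__ == "__main__":
    spool = "/var/spool/icinga2/perfdata"  # path as it appears in the NPCD log
    if os.path.isdir(spool):
        print(summarize(os.listdir(spool)))
```

The oldest and newest timestamps give a rough idea of how wide the gap in the graphs is. Files without the -PID- suffix are ignored, on the assumption that NPCD will still pick those up by itself.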

    This finally brings me to my question.


    How do I get these back in the PNP system so their data can be graphed?


    Thanks in advance for any cogent replies. :)

  • Icinga 2 writes the files as its daemon user (icinga in your case; older Debian packages still use nagios). I guess NPCD is configured to run as a different user and might not be able to read or remove those files. Both npcd.cfg and your startup script should allow you to define the daemon user. Change that to the same user Icinga 2 is running as.

  • Both daemons are already running as the same user:



    Code
    icinga 25505 0.3 0.0 23560 2800 - S 11:39AM 3:47.73 /usr/local/bin/npcd -d -f /usr/local/etc/pnp/npcd.cfg
    icinga 22396 0.0 0.7 214668 52232 - Ss 11:17AM 1:33.90 /usr/local/lib/icinga2/sbin/icinga2 daemon -d -e /var/log/icinga2/error.log -c /usr/local/etc/icinga2/icinga2.conf

    There is a big gap in my data, but everything is graphing normally again. In /var/spool/icinga2/perfdata there are tons of files (1621 to be exact) of the form "service-perfdata.<ctime>-PID-<pid>", where <ctime> is the Unix ctime value and <pid> is a process ID.


    Does icinga2 have a knob I can turn to get it to resubmit this data to pnp? Maybe that's the wrong question too?

  • Icinga 2 doesn't store the performance data itself; it just pushes the data to your configured features.


    Since RRDtool doesn't like it when the heartbeat is not met, it will be quite a problem to re-insert the data into the round-robin database anyway. (That is one of the reasons I like Graphite even more these days: it can process data from the past.)
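
A related constraint can be made concrete. RRDtool only accepts updates whose timestamp is newer than the last update already stored in the RRD, so once fresh data has been written, older samples are refused. The Python sketch below is an illustration under assumptions: still_insertable is a hypothetical helper, the sample timestamps are taken from the file names quoted earlier in the thread, and the last-update value is made up (in practice it could be read with `rrdtool last`). A spool file's name timestamp is also only a proxy for the sample times inside the file.

```python
def still_insertable(sample_times, last_update):
    """Return the sample timestamps that are strictly newer than last_update.

    Models only the "must be newer than the last update" rule; it does
    not model heartbeat or step handling.
    """
    return sorted(t for t in sample_times if t > last_update)


if __name__ == "__main__":
    stranded = [1470751938, 1470751968]  # from the file names in the thread
    last_update = 1470768711  # made-up value for illustration
    print(still_insertable(stranded, last_update))
```

With these example values the result is an empty list, because both stranded timestamps are older than the assumed last update. That is the situation described above: graphing has resumed, so the RRDs have already moved past the stranded samples.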