Check_oracle_health segfault, exit code 128 despite returning values


(RunningWithScissors) #1

I installed check_oracle_health version 3.1.2.2 from labs.consol.de without issue. I see there are already checks defined for that plugin in /usr/share/icinga2/include/plugins-contrib.d/databases.conf (Thank you!)

I have created a check to be run via the Icinga 2 agent on the DB server. When I run the check on the CLI as user icinga, it returns without any issue.

When Icinga 2 runs the check, it returns the correct output, but the check ends up in UNKNOWN status.

For example: --mode=connect-time

returns OK - rman had 0 problems during the last 3 days

<Terminated by signal 11 (Segmentation fault).>

In the log I see this: [2018-12-17 16:33:29 -0500] warning/PluginCheckTask: Check command for object 'dbserver.domain.com' (PID: 22122, arguments: '/usr/lib64/nagios/plugins/check_oracle_health' '--connect' 'dbname' '--mode' 'connection-time' '--password' 'dbpasswd' '--username' 'nagios') terminated with exit code 128, output: OK - 0.08 seconds to connect as NAGIOS | connection_time=0.0833;1;5

Does anyone know why this would be happening? I used this plugin without issue for years with Icinga 1 and I’m trying to migrate to Icinga 2.


Check_oracle_health: Terminated by Signal 11
#2

As described in this thread and elsewhere, the environment of the monitoring user and that of the monitoring process are probably different, which leads to the results you got.
I’d guess that the Oracle-related environment variables are incomplete for the daemon, so binaries can’t be found, connection information isn’t available, or something similar (platform differences, for example) gets in the way.


(RunningWithScissors) #3

Thanks for the reply. That was indeed the original reason my checks did not run. I didn’t know that Icinga already had CheckCommands for check_oracle_health defined, so I created my own. While troubleshooting I discovered that I was duplicating effort. I verified that I am using the latest version from the GitHub repository, which was last updated one year ago.

I used a check that calls printenv to see the environment of the icinga2 daemon and adjusted it until it matched what I wanted. I added the environment variables to the systemd startup scripts. The plugin now executes and returns the correct values, but the check still goes to UNKNOWN with a segfault. I can see the correct output in the Icinga 2 web interface. It is the same output I get on the command line, with the segfault error appended.

I’ll add some more output and logs tomorrow when I’m back on site.
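
The printenv comparison described above can be scripted. The following is only a minimal sketch, assuming you have saved one printenv dump from the icinga user's shell and one from a check run by the daemon; the function names and the sample values are hypothetical, and values that span several lines are not handled.

```python
def parse_env(dump: str) -> dict:
    """Turn printenv output (one NAME=value per line) into a dict."""
    env = {}
    for line in dump.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            env[name] = value
    return env


def diff_env(cli: dict, daemon: dict) -> dict:
    """List variables the daemon lacks or sees with a different value."""
    missing = sorted(set(cli) - set(daemon))
    changed = sorted(k for k in cli.keys() & daemon.keys() if cli[k] != daemon[k])
    return {"missing": missing, "changed": changed}


# Illustrative sample dumps (hypothetical, not taken from a real system)
cli_dump = "ORACLE_SID=dbname\nPERL5LIB=/home/icinga/perl5/lib/perl5:\nPATH=/usr/bin"
daemon_dump = "PATH=/bin:/usr/bin"

print(diff_env(parse_env(cli_dump), parse_env(daemon_dump)))
# {'missing': ['ORACLE_SID', 'PERL5LIB'], 'changed': ['PATH']}
```

Anything listed under "missing" is a candidate for the daemon's environment file.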


#4

The Perl documentation (perlvar, in the entry for $?) says:

…and $? & 128 reports whether there was a core dump.

… which is the reason why you don’t get an expected plugin return code.
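
To make the quoted rule concrete, here is a small sketch that splits a traditional Unix wait() status word the way perlvar describes it for $?. The function name is made up for illustration, and how exactly Icinga 2 arrives at the reported "exit code 128" is an assumption that this thread does not confirm.

```python
def decode_wait_status(status: int) -> dict:
    """Split a 16-bit wait() status word as described in perlvar for $?."""
    return {
        "exit_code": status >> 8,           # the plugin's own return code
        "signal": status & 127,             # signal that ended the process, 0 if none
        "core_dumped": bool(status & 128),  # core-dump flag
    }


# A process killed by signal 11 (SIGSEGV) that also dumped core:
print(decode_wait_status(11 | 128))
# {'exit_code': 0, 'signal': 11, 'core_dumped': True}

# A plugin that exited normally with return code 2 (CRITICAL):
print(decode_wait_status(2 << 8))
# {'exit_code': 2, 'signal': 0, 'core_dumped': False}
```

When the process dies from a signal, the high byte that would normally carry the plugin return code (0 to 3) is not set, so the monitoring core has no valid plugin state to use.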


(RunningWithScissors) #6

I reinstalled the check_oracle_health plugins and got the same errors.

I added all variables to the service in /etc/sysconfig/icinga2 and now it’s working. Maybe it was missing the Perl variables. Thanks for the lead!

DAEMON=/usr/sbin/icinga2
ICINGA2_CONFIG_FILE=/etc/icinga2/icinga2.conf
ICINGA2_INIT_RUN_DIR=/run/icinga2
ICINGA2_PID_FILE=/run/icinga2/icinga2.pid
ICINGA2_LOG_DIR=/var/log/icinga2
ICINGA2_ERROR_LOG=/var/log/icinga2/error.log
ICINGA2_STARTUP_LOG=/var/log/icinga2/startup.log
ICINGA2_LOG=/var/log/icinga2/icinga2.log
ICINGA2_CACHE_DIR=/var/cache/icinga2
ICINGA2_USER=icinga
ICINGA2_GROUP=icinga
ICINGA2_COMMAND_GROUP=icingacmd
PERL_MM_OPT=/home/icinga/perl5
PERL5LIB=/home/icinga/perl5/lib/perl5:
PERL_MB_OPT=/home/icinga/perl5
PERL_LOCAL_LIB_ROOT=:/home/icinga/perl5
TMP=/tmp
TMPDIR=$TMP
ORACLE_HOSTNAME=dbserver.domain.com
ORACLE_UNQNAME=dbname
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/oracle/product/12.2.0.1/db_1
ORACLE_SID=dbname
PATH=/u01/app/oracle/product/12.2.0.1/db_1/bin:/u01/app/oracle/product/12.2.0.1/db_1/OPatch:/usr/sbin:/usr/local/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/var/spool/icinga2/.local/bin:/var/spool/icinga2/bin
LD_LIBRARY_PATH=/u01/app/oracle/product/12.2.0.1/db_1/lib:/lib:/usr/lib
CLASSPATH=/u01/app/oracle/product/12.2.0.1/db_1/jlib:/u01/app/oracle/product/12.2.0.1/db_1/rdbms/jlib


#7

If you comment out the environment variables one category at a time (TMP, Perl, Icinga) and rerun the check after each change, you can isolate the variables that actually matter instead of bloating your configuration.
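
The same idea can be tried outside the daemon. This is a rough sketch only, assuming a Unix system: it runs a command once per category with that category's variables removed and records the return code. The prefix grouping in CATEGORIES is a guess based on the variable names posted above, and the demo command is a stand-in child process, not the real plugin.

```python
import subprocess
import sys

# Hypothetical grouping of variable-name prefixes (an assumption, adjust as needed)
CATEGORIES = {
    "tmp": ("TMP",),
    "perl": ("PERL",),
    "icinga": ("ICINGA2_", "DAEMON"),
}


def env_without(base: dict, prefixes: tuple) -> dict:
    """Copy of base without the variables whose names start with a prefix."""
    return {k: v for k, v in base.items() if not k.startswith(prefixes)}


def bisect_env(cmd: list, base: dict) -> dict:
    """Run cmd once per category with that category removed; collect return codes."""
    results = {}
    for name, prefixes in CATEGORIES.items():
        proc = subprocess.run(cmd, env=env_without(base, prefixes),
                              capture_output=True, text=True)
        # On POSIX a negative return code -N means "terminated by signal N",
        # so a segfault would show up as -11.
        results[name] = proc.returncode
    return results


# Stand-in for the plugin: exits 3 (UNKNOWN) when PERL5LIB is missing.
demo_cmd = [sys.executable, "-c",
            "import os, sys; sys.exit(0 if 'PERL5LIB' in os.environ else 3)"]
demo_env = {"PERL5LIB": "/home/icinga/perl5/lib/perl5:", "TMP": "/tmp",
            "ICINGA2_USER": "icinga"}
print(bisect_env(demo_cmd, demo_env))
```

To use it against the real check, replace demo_cmd with the plugin command line and demo_env with the full working environment. A category whose removal changes the return code is one the plugin depends on.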