check_procs: alert if a service is not running

Hello,

I’m sure this is very simple… but I can’t seem to find what I’m looking for. I really have 2 problems.

  1. My check always finds at least 2 matching processes when I run it, because the check itself matches one and the service matches another.

  2. I can’t seem to figure out how vars.procs_warning and vars.procs_critical actually work under the hood. My intention is binary: if the service is running, we’re good, and if it is not running, Icinga2 should alert. I’ve been trying to figure out the right formula for a day and a half and finally decided it’s best to ask :)

Here is my check…

apply Service "kibana" {
  import "generic-service"

  check_command = "procs"
  vars.procs_warning = "2:2"
  vars.procs_critical = "2:2"
  vars.procs_argument = "/usr/share/kibana"  
  vars.command = "kibana"
  vars.procs_traditional = true
  //vars.procs_user = "kibana"
  
  assign where host.name == "elastic01"
}

I was under the impression that “vars.procs_traditional” was intended to exclude my own process from the count, but that doesn’t seem to work. Also, if I uncomment the //vars.procs_user line, I get an unknown error in Icingaweb2 indicating it can’t find the kibana user, even though the process is definitely running under that user.

Any help is tremendously appreciated. I’m pretty new to Icinga2 and have read the docs, but I’m still pretty stuck. Thanks!

Hello
The thresholds are “ranges”, which means you need to specify the range in which the logic is applied; in your case the critical should be “0:2”.
This will tell the check that if there are 0 processes or 2 processes of the command “kibana”, it should report CRITICAL.
The warning is usually best set as “1:1” (if you need exactly one).

It is best explained in the help of the plugin itself:

check_procs -w 2:2 -c 2:1024 -C portsentry
Warning if not two processes with command name portsentry.
Critical if < 2 or > 1024 processes
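The help text above matches the general range syntax used by the Monitoring Plugins: a range start:end describes the values that are OK, and the threshold fires when the value falls outside it (an @ prefix inverts that). The following Python sketch encodes that rule, assuming the documented forms N (meaning 0:N), N:, ~:N and N:M; parse_range and alerts are hypothetical helpers for illustration, not part of check_procs.

```python
import math


def parse_range(spec: str) -> tuple[float, float, bool]:
    """Parse a plugin threshold range into (start, end, alert_inside).

    Assumed forms (Monitoring Plugins range syntax):
      "10"     -> 0..10     (alert if < 0 or > 10)
      "10:"    -> 10..inf   (alert if < 10)
      "~:10"   -> -inf..10  (alert if > 10)
      "10:20"  -> 10..20    (alert if < 10 or > 20)
      "@10:20" -> 10..20    (alert if 10 <= value <= 20)
    """
    alert_inside = spec.startswith("@")
    if alert_inside:
        spec = spec[1:]
    if ":" in spec:
        start_str, end_str = spec.split(":", 1)
    else:
        start_str, end_str = "0", spec
    start = -math.inf if start_str == "~" else float(start_str or 0)
    end = float(end_str) if end_str else math.inf
    if start > end:
        raise ValueError(f"invalid range {spec!r}: start > end")
    return start, end, alert_inside


def alerts(value: float, spec: str) -> bool:
    """Return True if `value` trips the threshold described by `spec`."""
    start, end, alert_inside = parse_range(spec)
    inside = start <= value <= end
    return inside if alert_inside else not inside


# The help example: -w 2:2 -c 2:1024
for count in (0, 1, 2, 3, 1025):
    print(count, "warn" if alerts(count, "2:2") else "-",
          "crit" if alerts(count, "2:1024") else "-")
```

Read the same way, a critical range of 0:2 does not fire when no process is found (alerts(0, "0:2") is False), whereas a range such as 1: fires whenever fewer than one process is found.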

Hey Aflatto,

Thanks so much for the reply… I think I’m a little bit closer now. I’m using

'/usr/lib/nagios/plugins/check_procs' '-a' '/usr/share/kibana' '-c' '0:2' '-w' '1;1'

as my check, which I think makes sense, but I’m not getting the result I’d expect. Here is the ps -aux output on that server:

root@elastic01:~# ps -aux | grep '/usr/share/kibana'
kibana    28193  9.7  2.6 1779036 431208 ?      Ssl  17:16   0:26 /usr/share/kibana/bin/../node/bin/node /usr/share/kibana/bin/../src/cli -c /etc/kibana/kibana.yml
root      28342  0.0  0.0   6428   924 pts/0    S+   17:21   0:00 grep --color=auto /usr/share/kibana
root@elastic01:~# 

This shows only one running process matching the ‘/usr/share/kibana’ string (the second line is the grep itself), so when I run the check manually I get the following output:

root@elastic01:~# '/usr/lib/nagios/plugins/check_procs' '-a' '/usr/share/kibana' '-c' '0:2' '-w' '1;1'
PROCS OK: 1 process with args '/usr/share/kibana' | procs=1;1;1;0:2;0;
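The two lines in the ps output above are not two Kibana processes: the second one is the grep command itself, whose arguments contain the search string. check_procs reports 1 process because it does not count its own process. A minimal Python sketch of that self-exclusion, using the snapshot from the ps output; Proc and count_matching are hypothetical names, and the plugin's actual matching logic may differ.

```python
from typing import NamedTuple


class Proc(NamedTuple):
    pid: int
    user: str
    args: str


def count_matching(procs: list[Proc], needle: str, exclude_pid: int) -> int:
    """Count processes whose args contain `needle`, skipping `exclude_pid`.

    The process doing the search also has `needle` in its own args,
    so it must be excluded to avoid counting itself.
    """
    return sum(1 for p in procs if needle in p.args and p.pid != exclude_pid)


# Snapshot copied from the `ps -aux | grep` output above.
snapshot = [
    Proc(28193, "kibana",
         "/usr/share/kibana/bin/../node/bin/node /usr/share/kibana/bin/../src/cli "
         "-c /etc/kibana/kibana.yml"),
    Proc(28342, "root", "grep --color=auto /usr/share/kibana"),
]

# Excluding the searcher (grep, PID 28342) leaves only Kibana.
print(count_matching(snapshot, "/usr/share/kibana", exclude_pid=28342))
```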

as expected. HOWEVER, if I then stop the process and run the command again:

root@elastic01:~# systemctl stop kibana
root@elastic01:~# '/usr/lib/nagios/plugins/check_procs' '-a' '/usr/share/kibana' '-c' '0:2' '-w' '1;1'
PROCS OK: 0 processes with args '/usr/share/kibana' | procs=0;1;1;0:2;0;

The check still returns OK ??

root@elastic01:~# ps -aux | grep '/usr/share/kibana'
root      28405  0.0  0.0   6428   924 pts/0    S+   17:24   0:00 grep --color=auto /usr/share/kibana

So, I’m not 100% sure what’s going on. Also, I’m not sure whether this is the culprit, but when I run the command on the remote server I get the following output:

root@elastic01:/# '/usr/lib/nagios/plugins/check_procs' '-T' '-a' '/usr/share/kibana' '-c' '0:2' '-w' '0:255'
PROCS OK: 1 process with args '/usr/share/kibana' | procs=1;0:255;0:2;0;

However, in the Icingaweb2 interface for that same command it shows the following

[Screenshot: Icinga Web 2 result for the same check]

I’ve eliminated everything I can think of to reduce any confusion I might be having… this is my only host, my only service check, and my only Icinga instance in my lab…

Remember you are dealing with ranges, so if the warning range is 0:255, a single process will always return OK.
Second, the range bounds need to be separated with ‘:’, not ‘;’:

root@elastic01:~# systemctl stop kibana
root@elastic01:~# ‘/usr/lib/nagios/plugins/check_procs’ ‘-a’ ‘/usr/share/kibana’ ‘-c’ ‘0:2’ ‘-w’ '1;1’
PROCS OK: 0 processes with args ‘/usr/share/kibana’ | procs=0;1;1;0:2;0;

If you fix that, I believe you will get the result you are looking for.
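The second point is a pure syntax issue: the two bounds of a range are separated by a colon. A small pre-flight check like the one below could catch a typo such as 1;1 before it reaches the command line. is_valid_range is a hypothetical helper, and its pattern is an assumption that covers only the common forms of the Monitoring Plugins range syntax (N, N:, N:M, ~:M, each optionally prefixed with @).

```python
import re

# One number: optional sign, digits, optional fractional part.
_NUM = r"[+-]?\d+(?:\.\d+)?"
# Optional '@', then either "start:" / "start:end" / "~:end", or a bare "end".
_RANGE = re.compile(rf"@?(?:(?:{_NUM}|~):(?:{_NUM})?|{_NUM})")


def is_valid_range(spec: str) -> bool:
    """Return True if `spec` looks like a well-formed threshold range."""
    return _RANGE.fullmatch(spec) is not None


for spec in ("1:1", "0:2", "1;1"):
    print(spec, is_valid_range(spec))
```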


Thanks again Aflatto,

I did catch the ; and changed it, and as far as the 0:255 goes, that was just me messing around to see if I could get it to work; then I took a screenshot at the wrong time. Here are my current configurations:

template Host "generic-host" {
  max_check_attempts = 3
  check_interval = 1m
  retry_interval = 30s
  
  check_command = "hostalive"

  vars.notification["mail"] = {
    groups = [ "icingaadmins" ]
  }
}

template Service "generic-service" {
  max_check_attempts = 5
  check_interval = 1m
  retry_interval = 30s
}

object Host "elastic01.fqdn.domain.net" {
  import "generic-host"
  address = "10.0.0.5"
  vars.os = "Ubuntu"
  vars.os_type = "Linux" 

  vars.client_endpoint = name
}

apply Service "kibana" {
  import "generic-service"
  
  check_command = "procs"
  vars.procs_warning = "1:1"
  vars.procs_critical = "0:2"
  vars.procs_argument = "/usr/share/kibana"
  host_name = "elastic01.fqdn.domain.net"  
 
  assign where host.name == "elastic01.fqdn.domain.net"
}

Which seems to yield the following check command line:

nagios@elastic01:/root$ '/usr/lib/nagios/plugins/check_procs' '-a' '/usr/share/kibana' '-c' '0:2' '-w' '1:1'
PROCS OK: 1 process with args '/usr/share/kibana' | procs=1;1:1;0:2;0;

However, when Icinga2 runs it, it always reports 0 processes, which SHOULD be CRITICAL, not WARNING.
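With the configured ranges, the outcome for 0 processes follows from the rule in the plugin help (alert when the value is outside the range): 0 lies inside the critical range 0:2, so CRITICAL is not raised, but outside the warning range 1:1, so the result is WARNING. The sketch below walks through that, assuming the plugin tests the critical range before the warning range; state and _bounds are hypothetical names, and the parser is deliberately simplified (no @ or ~).

```python
import math


def _bounds(spec: str) -> tuple[float, float]:
    # Simplified parser: supports "N" (0..N), "N:" (N..inf) and "N:M" only.
    if ":" not in spec:
        return 0.0, float(spec)
    start, end = spec.split(":", 1)
    return float(start), (float(end) if end else math.inf)


def state(value: float, warning: str, critical: str) -> str:
    """Return the check state, testing the critical range before the warning range."""
    low, high = _bounds(critical)
    if not low <= value <= high:
        return "CRITICAL"
    low, high = _bounds(warning)
    if not low <= value <= high:
        return "WARNING"
    return "OK"


# Thresholds from the apply rule above: procs_warning = "1:1", procs_critical = "0:2".
for procs in (0, 1, 2, 3):
    print(procs, state(procs, warning="1:1", critical="0:2"))
```

Under the same assumptions, state(0, "1:1", "1:") returns "CRITICAL", because 0 lies outside the range 1: (one or more processes).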