Opinion Poll: Twenty Years of Plugin Return Codes

plugins
icinga
nagios-plugins
icinga2
(Brian LaVallee) #1

We are approaching ten years of Icinga (2009-05-15), which started out as a fork of Nagios, and it is almost twenty years since the original NetSaint (1999), which was renamed Nagios in 2002. We have been using the same Plugin Return Codes that whole time.

Numeric Value | Service Status | Status Description
0 | OK | The plugin was able to check the service and it appeared to be functioning properly
1 | Warning | The plugin was able to check the service, but it appeared to be above some “warning” threshold or did not appear to be working properly
2 | Critical | The plugin detected that either the service was not running or it was above some “critical” threshold
3 | Unknown | Invalid command line arguments were supplied to the plugin or low-level failures internal to the plugin (such as unable to fork, or open a tcp socket) that prevent it from performing the specified operation. Higher-level errors (such as name resolution errors, socket timeouts, etc) are outside of the control of plugins and should generally NOT be reported as UNKNOWN states.
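
To make the table concrete, here is a minimal Python sketch of how a plugin might map a measured value to one of the four codes. The helper name `status_for` and the "strictly greater than" comparison are assumptions for illustration; real plugins support richer threshold ranges.

```python
# The four return codes from the table above.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def status_for(value, warn, crit):
    """Hypothetical helper: map a measured value to a plugin return code."""
    if value is None or warn > crit:
        # No measurement or nonsensical thresholds: the plugin cannot decide.
        return UNKNOWN
    if value > crit:
        return CRITICAL
    if value > warn:
        return WARNING
    return OK


def format_output(name, value, warn, crit):
    """Build the (code, first output line) pair a plugin would report."""
    code = status_for(value, warn, crit)
    return code, f"{name} {LABELS[code]} - value is {value}"
```

A real plugin would print the message and then exit with the code, for example via `sys.exit(code)`.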

POSIX exit codes range from 0 to 255, and we only use four of them.


Do you think it’s time to support additional status codes?

I’m just interested in the opinion of the community.

  • What’s a Plugin?
  • It’s long overdue
  • I could use additional status codes
  • If it ain’t broke, don’t fix it
  • Who’s going to change all of the Plugins?
  • I don’t have an opinion
  • Other (reply to topic)


For example: I use check_users to see the number of users logged in to a system. Warning and Critical could be considered too severe when a single user is logged in. Information or Minor status codes would be nice to have.

(Matthias) #2

How would you incorporate additional exit codes into Icinga 2 though?
Or are you proposing to add additional states to OK/Warn/Crit/Unknown?

(Rafael Voss) #3

Maybe an option to define custom return code interpretation for yourself.
I would like an information color for myself, e.g. for maintenance things: there are new updates or firmware updates available, but it's just the info about it. Logged-on users as info would be nice too, so you can see that someone is working on the server, but it's not a warning or a critical.

Existing plugins don’t need to be changed for this.

#4

Having an additional code for objects being in maintenance/downtime would sometimes clarify the “actual” state as they aren’t operational (UP/OK) nor faulty (DOWN/UNREACHABLE/CRITICAL), so at least for hosts there isn’t a “correct” state at all and WARNING doesn’t seem to fit either.

(Brian LaVallee) #5

I’m not proposing anything at this point. Just broaching the subject.

While non-trivial, icinga2 is the easy part, assuming it reads the full byte (00 to FF). But there’s a whole ecosystem of modules, plugins, and even other monitoring software that could be affected.

Development of a standard would be the first step. There are also a handful of POSIX exit codes with special meanings to avoid.
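
As a rough illustration of the codes to avoid, here is a Python sketch. It assumes the usual shell conventions (126 for "found but not executable", 127 for "command not found", 128 and above for termination by a signal); the function names are hypothetical, and which codes a standard would really exclude is an open question.

```python
# Codes already defined by the plugin API.
PLUGIN_CODES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def shell_meaning(code):
    """Return the conventional shell meaning of an exit code, or None."""
    if code == 126:
        return "command found but not executable"
    if code == 127:
        return "command not found"
    if 128 <= code <= 255:
        # Shells report death by signal N as 128 + N; 128 itself is treated
        # as reserved here to stay on the safe side (an assumption).
        return "signal range (128 + signal number)"
    return None


def free_codes():
    """Codes that are neither plugin states nor shell-reserved."""
    return [c for c in range(256)
            if c not in PLUGIN_CODES and shell_meaning(c) is None]
```

Under these assumptions, `free_codes()` leaves the codes 4 through 125 for any new states.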


I like this idea, could look something like this:

object State 2 {
  display_name = "Godzilla" // Override the Default 'Critical'
}
object State 100 {
  display_name = "Gamera"
  CheckCommands = [ "ping", "ping4", "ping6" ] // Limit to specific check commands.
  color = "#8ACEDB" // colors are NOT handled by icinga2 / icingaweb2 assigns the colors
}

(Matthias) #6

I like the proposal for custom return code interpretation.
Maybe there could be a range of return codes which are considered OK by monitoring tools like Icinga 2 but stored in the object as a variable.
That way the notification engine would not be affected but we could interpret custom status codes with special Icingaweb 2 themes and use them when retrieving objects via the API.
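
A minimal Python sketch of that idea: codes in a reserved range count as OK for notifications, while the raw code is kept alongside the state. The range 100 to 119 and the names `INFO_RANGE`, `CheckResult` and `interpret` are assumptions for illustration; nothing like this exists in Icinga 2 today.

```python
from typing import NamedTuple, Optional

# Hypothetical range of "informational" codes that should still count as OK.
INFO_RANGE = range(100, 120)


class CheckResult(NamedTuple):
    state: int                  # one of the classic states 0..3
    custom_code: Optional[int]  # raw exit code, kept as a variable


def interpret(exit_code):
    """Map a raw exit code to a classic state plus an optional custom code."""
    if exit_code in (0, 1, 2, 3):
        return CheckResult(exit_code, None)
    if exit_code in INFO_RANGE:
        # Treated as OK, so notifications are unaffected; a web theme or
        # API client could still read custom_code.
        return CheckResult(0, exit_code)
    # Anything else is reported as UNKNOWN, with the raw code preserved.
    return CheckResult(3, exit_code)
```

With this mapping, existing plugins behave exactly as before, and only tools that look at `custom_code` see anything new.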

(Bård Dahlmo Lerbæk) #7

Hosts can be UP or DOWN; they could also use an UNKNOWN status.

#8

What other statuses are there? A service is either Ok, kinda not okay, really not okay, or we don’t know. What else do you need?

Warning and Critical could be considered too severe when a single user is logged in.

Okay, so set the warn threshold > 1?

Not really trying to be critical (pun not intended), just trying to understand the use case.

(Brian LaVallee) #9

My use case might be flimsy; it was just a general question. The plugin documentation shows support for 255 return codes and only a few have been used for the last 20 years.

#10

It’s a fair question, for sure :)

I’m having trouble imagining uses for the other 251 possible codes as far as directly displaying information to the user in the front end, but I could see there being some use for e.g. Notification/EventCommands. So let's say 2 is the standard Critical, and it triggers the normal NotificationCommand and EventCommand as always. But maybe the plugin returns a 13, a different Critical code, which triggers a different Notification or EventCommand. So maybe instead of pinging the L1 on-call, it directly pings the L2 supervisor. Or, rather than run a traceroute to the host, the EC sends a command via SSH to restart the webserver.

I think this is a pretty flimsy idea, too, to be honest. That kind of conditional logic could be written directly into the scripts the *Command objects call. But, if it leads to some productive spit-balling, who knows? Maybe someone more creative can come up with something worth the effort of integrating.
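
For the sake of spit-balling, the routing could be sketched in Python like this. Code 13, the recipient names and the action names are all made up for illustration; the only fixed point is that 2 is the standard Critical.

```python
# Hypothetical routing tables keyed by plugin exit code.
NOTIFY_ROUTES = {2: "l1-oncall", 13: "l2-supervisor"}
EVENT_ACTIONS = {2: "traceroute", 13: "restart-webserver"}


def pick_recipient(code, default="l1-oncall"):
    """Choose who gets notified for a given exit code."""
    return NOTIFY_ROUTES.get(code, default)


def pick_action(code):
    """Choose the event handler action, or None if nothing should run."""
    return EVENT_ACTIONS.get(code)
```

The same two lookups could just as well live inside the scripts that the *Command objects call, which is the simpler option mentioned above.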

(Aflatto) #11

I agree that having 255 codes available is nice, but when the original plugins were written, the main driver for the minimal number of return codes was KIS(S). It stood the test of time and became widespread mainly because it did just what it set out to do: provide a simple way for people to write new plugins that will be compatible with any other system that uses them.
We are talking Icinga and Nagios here, but let me remind you that there are plenty of other monitoring systems that use the same principle: Sensu, Thruk, Shinken & OpenNMS to name a few (and most of them also use the same core plugins).

To add new return codes you will need to ensure that all these products support the same codes; otherwise it becomes an “Icinga/Nagios”-only plugin, which is counter to the whole idea.

So as much as I like the idea of another return code that reports “Frak if I know” or “CKO Error detected”, I find that there really isn’t a need for expanding the return codes beyond these 4 simple ones.
