Windows PowerShell Checks with Icinga2

(Gordon) #1

Author: @GordonCole

Revision: v0.1

Tested with:

  • Icinga 2 v2.6.3-1
  • Icinga Web 2 v2.4.1
  • Windows Server 2012 R2

Introduction

A vanilla Windows Icinga2 installation provides access to a number of standard server health and performance metrics, for example hard disk space, CPU, free RAM, or the value of a Windows Performance Counter. Each of these metrics is measured using a “check”, a program called by the main Icinga2 service.

Users may write their own “checks”, as long as they return a result in the expected format: a status (the exit code), a line of text, and optional performance data.
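As a minimal sketch of that contract, the hypothetical helper below builds the single output line (status text, optionally followed by "|" and performance data) and pairs it with the exit code that Icinga2 reads as the status. The function name and the Code/Text object shape are illustrative only; they are not part of Icinga2.

```powershell
# Hypothetical helper: builds a plugin result object.
# Icinga2 takes the status from the exit code and displays the text;
# anything after "|" is parsed as performance data.
function New-PluginResult {
    param(
        [Parameter(Mandatory)][ValidateSet('OK', 'WARNING', 'CRITICAL', 'UNKNOWN')][string]$State,
        [Parameter(Mandatory)][string]$Message,
        [string]$PerfData = ''
    )
    # Exit codes as defined in the Monitoring Plugins guidelines
    $codes = @{ OK = 0; WARNING = 1; CRITICAL = 2; UNKNOWN = 3 }
    $text = "$State - $Message"
    if ($PerfData) { $text = "$text|$PerfData" }
    [pscustomobject]@{ Code = $codes[$State]; Text = $text }
}
```

A plugin script would then end with something like: $r = New-PluginResult -State WARNING -Message 'Reboot required'; Write-Output $r.Text; exit $r.Code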

PowerShell has established itself as a powerful way of automating tasks and accessing information on a Windows machine. We can use PowerShell to access server metrics and return data to Icinga2 by writing a suitable PowerShell script.

Configuration

Create a Check Command

Here we define a CheckCommand so that Icinga2 knows the path of the executable to call, in this case the powershell.exe interpreter (a PowerShell session).

This should be defined in commands.conf.

object CheckCommand "powershell_check" {
  import "plugin-check-command"
  command = [ "C:\\Windows\\system32\\WindowsPowerShell\\v1.0\\powershell.exe" ]
  arguments = {
    "-command" = {
      value = "$ps_command$"
      order = -1
    }
    "-warn" = {
      value = "$ps_warn$"
    }
    "-crit" = {
      value = "$ps_crit$"
    }
    ";exit" = {
      value = "$$LastExitCode"
    }
  }
}

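If the Icinga2 service runs as a 32-bit process, Windows file system redirection silently maps system32 to SysWOW64, so this runs the 32-bit version of PowerShell. If you want the 64-bit version, use the sysnative alias instead (it is only visible to 32-bit processes):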

  command = [ "C:\\Windows\\sysnative\\WindowsPowerShell\\v1.0\\powershell.exe" ]
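If you are unsure which version the agent actually starts, a throwaway check like the one below can tell you. It is an illustrative sketch (the function name is made up) based on the .NET properties [Environment]::Is64BitProcess and [Environment]::Is64BitOperatingSystem:

```powershell
# Illustrative check: reports the bitness of the PowerShell process and the OS.
# Attach it to a service temporarily to see which powershell.exe Icinga2 runs.
function Get-BitnessCheckText {
    $proc = if ([Environment]::Is64BitProcess) { 64 } else { 32 }
    $os = if ([Environment]::Is64BitOperatingSystem) { 64 } else { 32 }
    "OK - PowerShell is running as a $proc-bit process on a $os-bit OS"
}
```

In a plugin file you would finish with: Write-Output (Get-BitnessCheckText); exit 0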

Let’s consider the arguments:

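  • -command - this contains a variable which will contain the call to our PowerShell script (our check), for example & 'C:\Program Files\ICINGA2\sbin\check_reboot.ps1'. "order = -1" makes sure it is passed before the other arguments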

  • -warn - this contains a variable which may be set elsewhere (for example as vars.ps_warn in the service definition) to contain a warning threshold value (optional - your script needs to be written to accept arguments)

  • -crit - this contains a variable which may be set elsewhere (for example as vars.ps_crit in the service definition) to contain a critical threshold value (optional - your script needs to be written to accept arguments)

  • ;exit - we always pass this argument the value “$LastExitCode”. This is not an Icinga2 variable but PowerShell's automatic variable holding the exit code of the last script or program that ran. This argument tells the powershell.exe session to exit with the exit code generated by your PowerShell script. This exit code is important because the plug-in's service “status” is taken from it (0=OK, 1=Warning, 2=Critical, 3=Unknown). Note also the “;” in the above code, which separates the exit statement from the script call. If you omit it, “exit” is passed to your script as just another argument, powershell.exe will not return your script's exit code, and your check won't have the correct status.

  • In Icinga2 configuration strings “\” is used to escape “\”, so we see “\\” in the path name

  • “$” is used to escape “$” (Icinga2 treats $...$ as a macro), so we write “$$LastExitCode” to get the literal string “$LastExitCode”

If you find the above confusing, just remember that Icinga2 needs to construct a command that you could yourself enter at the command prompt in order to run a PowerShell script. For example:

C:\Windows\system32\WindowsPowerShell\v1.0\powershell.exe -command "&'C:\Program Files\ICINGA2\sbin\check_reboot.ps1' ;exit $LastExitCode"

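This is very useful as you can manually test your scripts (“plug-ins”) from the command line without involving Icinga2. The exit code is not printed, but you can display it afterwards with echo %ERRORLEVEL% in cmd.exe (or $LASTEXITCODE in a PowerShell session).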
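As a sketch of such a manual test from a PowerShell prompt, the hypothetical function below starts a fresh PowerShell process with a -Command string, the same way Icinga2 does, and returns both the text output and the process exit code. It reuses whichever PowerShell executable is currently running ((Get-Process -Id $PID).Path); that choice is an assumption, and you may prefer to hard-code the system32 or sysnative path instead.

```powershell
# Hypothetical test helper: runs a -Command string in a child PowerShell
# process and reports its output and exit code, as Icinga2 would see them.
function Invoke-PluginCommand {
    param([Parameter(Mandatory)][string]$Command)
    # Path of the PowerShell executable hosting this session
    $exe = (Get-Process -Id $PID).Path
    $output = & $exe -NoProfile -NonInteractive -Command $Command
    # $LASTEXITCODE holds the exit code of the last native process that ran
    [pscustomobject]@{ Output = ($output -join "`n"); ExitCode = $LASTEXITCODE }
}
```

For example: Invoke-PluginCommand -Command '& ''C:\Program Files\ICINGA2\sbin\check_reboot.ps1''; exit $LastExitCode'. The outer single quotes keep $LastExitCode from being expanded before the child session sees it.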

The above puts in place the underlying ability to call PowerShell from Icinga2.

Define a Service

We can define an Icinga2 “service” that references a PowerShell script (“plug-in”). It relies on the check command we defined in the previous step.

For example:

apply Service "reboot_status_check" {
  import "generic-service"
  display_name = "Reboot Check"
  check_command = "powershell_check"
  vars.ps_command = "& 'C:\\Program Files\\ICINGA2\\sbin\\check_reboot.ps1'"
  command_endpoint = host.address
  assign where host.vars.os == "windows"
}

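In this case we are using the variable vars.os in the host object definition to apply this service to all our “windows” hosts. command_endpoint makes Icinga2 run the check on the agent itself; its value must match the name of the agent's Endpoint object, so host.address only works if your endpoints are named after their addresses (otherwise host.name is the usual choice).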

Write a PowerShell Script (Plug-in)

We now require a suitable PowerShell script to act as our “plug-in”. Note that monitoring plug-ins should adhere to guidelines regarding what you can pass to them and what they should return. This example may not be fully compliant in this regard.

You should consult https://www.monitoring-plugins.org/doc/guidelines.html for detailed information on writing plugins.

	# Checks if the RebootRequired key exists; if so, returns a warning.
	# This key is deleted upon a successful reboot.
	# Its presence may indicate that Windows patching has taken place without a reboot.

	$keyPath = "HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate\Auto Update\RebootRequired"

	# Test-Path returns $true or $false
	if (-not (Test-Path -Path $keyPath)) {
		# Key does not exist: return OK status
		Write-Output "OK - No reboot required"
		$returnCode = 0
	}
	else {
		# Key exists: return WARNING status
		Write-Output "WARNING - Reboot required"
		$returnCode = 1
	}

	exit $returnCode

To note:

  • The return code is passed as the “exit” code. This in turn is passed back to Icinga2 and becomes the “status” of the service check
  • This is a simple example and does not include any error handling, which a production check should have
  • We need to save this script on our Windows server in the path used in our service definition. It is best placed in the same location as the other Icinga2 Windows checks:

C:\Program Files\ICINGA2\sbin\

The exit codes from the plug-in relate to Icinga2 service status as follows:

  • 0 = OK
  • 1 = Warning
  • 2 = Critical
  • 3 = Unknown
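Since the script above has no error handling, here is a hedged variant of the same logic wrapped in a function with try/catch, so that an unexpected failure reports UNKNOWN (3) rather than a misleading status. The function name Get-RebootCheckResult and its -KeyPath parameter are illustrative; the default is the same registry path as above.

```powershell
# Variant of the reboot check with basic error handling.
# Returns an object with the exit code and the text line for Icinga2.
function Get-RebootCheckResult {
    param(
        [string]$KeyPath = 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate\Auto Update\RebootRequired'
    )
    try {
        # -ErrorAction Stop turns non-terminating errors into catchable exceptions
        if (Test-Path -LiteralPath $KeyPath -ErrorAction Stop) {
            return [pscustomobject]@{ Code = 1; Text = 'WARNING - Reboot required' }
        }
        return [pscustomobject]@{ Code = 0; Text = 'OK - No reboot required' }
    }
    catch {
        # Any unexpected failure maps to UNKNOWN instead of a false OK
        return [pscustomobject]@{ Code = 3; Text = "UNKNOWN - $($_.Exception.Message)" }
    }
}
```

In the plugin file itself you would finish with: $result = Get-RebootCheckResult; Write-Output $result.Text; exit $result.Code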

Permissions

Note that running PowerShell scripts may be restricted by the PowerShell execution policy, which can be enforced through Group Policy.

You should consider which user will run these scripts and what rights it will need. For example, if the Icinga2 service runs as “Local System”, the scripts also run as Local System. This account can read the registry but likely won't have rights on an SQL database. Care must be taken when assigning rights.
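To see which account a check really runs as, and which execution policy applies to it, a hypothetical diagnostic plugin like the one below can be attached to a service temporarily. It uses [Environment]::UserName, [Environment]::UserDomainName and the built-in Get-ExecutionPolicy cmdlet; it only reports information and always returns OK.

```powershell
# Hypothetical diagnostic check: reports the account and the effective
# execution policy of the PowerShell session that Icinga2 starts.
function Get-CheckContextText {
    $user = [Environment]::UserName
    $domain = [Environment]::UserDomainName
    # Effective policy after the Group Policy, machine and user scopes are applied
    $policy = Get-ExecutionPolicy
    "OK - checks run as $domain\$user, execution policy $policy"
}
```

Run it once through Icinga2 (ending the plugin file with Write-Output (Get-CheckContextText); exit 0), compare the result with the account configured for the Icinga 2 service, and remove it afterwards.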

Conclusion

Once you can harness PowerShell, your ability to check Windows server health and vital statistics is limited only by your scripting knowledge. You may, for example:

  • Check registry keys (some applications store their status, for example “primary” or “stand-by” in the registry)
  • Check files are present and updated in particular folders (useful in file processing systems)
  • Parse application log files for error codes and generate Icinga2 warnings
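As one example of the last idea, here is a sketch of a log check that counts lines matching an error pattern and maps the count to WARNING/CRITICAL thresholds. The function name, its parameters, the default pattern ERROR and the thresholds are assumptions for illustration; a real check would typically also remember how far it has already read the file.

```powershell
# Sketch: count error lines in a log file and map the count to a status.
function Get-LogErrorCheckResult {
    param(
        [Parameter(Mandatory)][string]$Path,
        [string]$Pattern = 'ERROR',
        [int]$Warn = 1,
        [int]$Crit = 10
    )
    if (-not (Test-Path -LiteralPath $Path -PathType Leaf)) {
        return [pscustomobject]@{ Code = 3; Text = "UNKNOWN - log file $Path not found" }
    }
    # Select-String returns one MatchInfo object per matching line (case-insensitive)
    $count = @(Select-String -LiteralPath $Path -Pattern $Pattern).Count
    $perf = "'errors'=$count;$Warn;$Crit;0"
    if ($count -ge $Crit) {
        $code = 2; $state = 'CRITICAL'
    } elseif ($count -ge $Warn) {
        $code = 1; $state = 'WARNING'
    } else {
        $code = 0; $state = 'OK'
    }
    [pscustomobject]@{ Code = $code; Text = "$state - $count line(s) matching '$Pattern'|$perf" }
}
```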

You may be able to use a combination of Icinga2 and PowerShell and remove the requirement for other agents such as NSClient++.

(Duffkess) #2

I think there are a few aspects that have been missed here.

My configuration looks like this:

object CheckCommand "check_icinga2_powershell" {
    command = [ "powershell.exe" ]
    arguments = {
        "-command" = {
            value = "\"try{ $service.vars.powershell.scriptpath$ $service.vars.powershell.args$ } catch {Write-Host $$_.Exception.Message; exit 3}; exit($$lastexitcode)\""
            description = "Powershell Wrapping Command"
        }
    }
}

The try/catch block catches anything that goes wrong, such as the script not being found or an unhandled exception that the script developer did not anticipate, and reports it as UNKNOWN (exit 3).
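To see what the wrapper buys you, the sketch below builds a command string of the same shape (with a ";" before the final exit) and runs it in a child PowerShell process, the way Icinga2 would. Test-WrappedCommand is a made-up name and the script calls are placeholders. A missing script should come back as 3 (UNKNOWN) instead of the generic failure code 1 that powershell.exe would otherwise typically report.

```powershell
# Sketch: run a wrapped command string, shaped like the check_icinga2_powershell
# value, in a child PowerShell process and return the process exit code.
function Test-WrappedCommand {
    param([Parameter(Mandatory)][string]$ScriptCall)
    # Single quotes keep $_ and $lastexitcode literal for the child session
    $wrapper = 'try{ ' + $ScriptCall + ' } catch {Write-Host $_.Exception.Message; exit 3}; exit($lastexitcode)'
    $exe = (Get-Process -Id $PID).Path
    # Discard the child's text output; only the exit code matters here
    $null = & $exe -NoProfile -NonInteractive -Command $wrapper
    $LASTEXITCODE
}
```

For example, Test-WrappedCommand -ScriptCall "& 'C:\does\not\exist.ps1'" should return 3, while a script that ends with exit 2 should return 2.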

My service template looks like this:

template Service "service-customer-icinga2-pstemplate" {
    import "service-customer-icinga2-template"
    check_command = "check_icinga2_powershell"
    vars.powershell.scriptpath = "C:\\ProgramData\\icinga2\\var\\lib\\icinga2\\api\\zones\\global_windows_plugins\\_etc\\$service.vars.powershell.script$.ps1"
}

(I deploy my PS scripts with the Icinga2 agent config push.)

and every script has its own template definition:

template Service "service-customer-icinga2-hyperv_cpu" {
    import "service-customer-icinga2-pstemplate"
    vars.powershell.script = "check_hyperv_cpu"
    vars.warn = "50"
    vars.crit = "70"	
    vars.powershell.args = "-warn $service.vars.warn$ -crit $service.vars.crit$"
}

So in my actual service configuration I can easily add the service:

object Service "Hyper-V CPU" {
    host_name = "myhostname"
    import "service-customer-icinga2-hyperv_cpu"
}

Other thresholds can optionally be added if needed.

For myself I wrote a small guideline for programming a PowerShell monitoring script. Of course, this can be enhanced, so feel free to comment.

These are:

Declare the script's parameters and set default values (always include a debug parameter):

param(
    $debug = "false", 
    $instance = "name",
    $warn = 15,
    $crit = 30
)

Set some important variables:

$ErrorActionPreference = "Stop"   # turn every failure (WMI queries, API calls, whatever) into a terminating script error
$global:exitcode = 3 # whatever happens, start with result code 3 - unknown. The global scope lets functions in the script modify it
$output = "" # reset the variable (needed when re-running the script in the ISE)
$perfdata = ""  # reset the variable (needed when re-running the script in the ISE)

A small debugging function helps to troubleshoot when something goes wrong (debug output can then be enabled from the monitoring system):

Function Debug($Text){
    if($debug -eq "true"){
        Write-Host $Text
    }
} 
# use like: Debug "debugging info $value"

Don't reset the exit code to 0 if there were errors before.
This matters when you loop through several values and need to check whether any of them is critical (e.g. going through each core of a multi-core CPU):

function Set-ExitCode($Code){
    if($Code -gt $global:exitcode -and $code -ne 3){
        $global:exitcode = $Code
        Debug "set exitcode $code"
    }elseif($global:exitcode -eq 3){
        $global:exitcode = $Code
        Debug "set exitcode $code"
    }
} 

This way an OK value from one CPU core cannot override a warning or critical value from a previous core in the foreach loop.
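The same rule can also be written without global state. The hypothetical function Merge-ExitCode below takes the current and the new code and returns the combined one, following the rules of Set-ExitCode above (the initial 3 is replaced by the first real result; after that only a worse non-3 code wins). Get-CoreCheckCode shows it folding made-up per-core values.

```powershell
# Pure-function version of the Set-ExitCode rules (illustrative name).
function Merge-ExitCode {
    param([int]$Current, [int]$New)
    if ($New -gt $Current -and $New -ne 3) { return $New }   # a worse real result wins
    if ($Current -eq 3) { return $New }                      # replace the initial UNKNOWN
    return $Current
}

# Example: hypothetical per-core CPU values checked against warn/crit
function Get-CoreCheckCode {
    param([double[]]$Values, [double]$Warn, [double]$Crit)
    $code = 3   # start as UNKNOWN, as in the guideline above
    foreach ($v in $Values) {
        $coreCode = if ($v -ge $Crit) { 2 } elseif ($v -ge $Warn) { 1 } else { 0 }
        $code = Merge-ExitCode -Current $code -New $coreCode
    }
    $code
}
```

With no values at all the result stays 3 (UNKNOWN), which is usually what you want when nothing could be measured.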

Try/Catch on the target information
It's very important to use a try/catch block whenever you call something that could fail. Most of the information you want to monitor comes from other sources such as WMI, files, folders, logs, etc. Any of these could fail, so every time you need information from somewhere:

try{
    # Get-Information is a placeholder for whatever cmdlet or method you query
    $check_variable = Get-Information -Target $instance
    if($null -eq $check_variable){
        throw "variable is empty"
    } # only if "no value" is an error, of course
}catch{
    Write-Host "Unknown - Could not get $instance"
    Write-Host $_.Exception.Message
    Exit 3
}

Checking the variable and creating the output of course:

Debug "variable is $check_variable"
if($check_variable -ge $crit){
    Set-ExitCode -Code 2
    $output += "Critical - $instance is $check_variable"
    Debug "critical, >= $crit"
}elseif($check_variable -ge $warn){
    Set-ExitCode -Code 1
    $output += "Warning - $instance is $check_variable"
    Debug "warning, >= $warn"
}else{
    Set-ExitCode -Code 0
    Debug "ok, < $warn"
}

# perfdata format: 'label'=value;warn;crit;min;max ($max must be defined, e.g. as a parameter)
$perfdata += " '$instance'=" + $check_variable + ";" + $warn + ";" + $crit + ";;" + $max

if($global:exitcode -eq 0){
    $output = "OK - working fine"
}
Write-Host ($output+"|"+$perfdata)
Exit $global:exitcode
Duffkess
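As a sketch of the perfdata format described in the Monitoring Plugins guidelines ('label'=value[UOM];[warn];[crit];[min];[max], with empty fields allowed), a small hypothetical helper can replace the hand-built string concatenation. The function and parameter names are illustrative.

```powershell
# Hypothetical helper: formats one perfdata item as
# 'label'=value[UOM];[warn];[crit];[min];[max]
function Format-PerfData {
    param(
        [Parameter(Mandatory)][string]$Label,
        [Parameter(Mandatory)][double]$Value,
        [string]$Unit = '',
        $Warn = '', $Crit = '', $Min = '', $Max = ''
    )
    # A single quote inside a label is written as two single quotes
    $safeLabel = $Label -replace "'", "''"
    # Invariant culture keeps "." as the decimal separator on non-English systems
    $v = $Value.ToString([System.Globalization.CultureInfo]::InvariantCulture)
    "'{0}'={1}{2};{3};{4};{5};{6}" -f $safeLabel, $v, $Unit, $Warn, $Crit, $Min, $Max
}
```

In the guideline script this would become: $perfdata += ' ' + (Format-PerfData -Label $instance -Value $check_variable -Warn $warn -Crit $crit -Max $max)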

(Gordon) #3

Thanks for your input. If anything is incorrect in the main document, or you want to add in your sections to expand on what I started please do edit it directly. That would be good so users can rely on the main document without reading comments. It was intended as a starting point, to be expanded. Many Thanks!


(Rafael Voss) #4

That's absolutely correct and a real problem. If you don't want to change all your checks, or if you use a completely different approach for some reason, there is a quick way to detect these checks:

I have a small check against the Icinga database (IDO) that looks for “broken” checks. This also helps when some checks do not catch all errors correctly.

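#!/usr/bin/pwsh
#check_powershell_returncodes.ps1 - V0.1
# Finds service checks from the last 8 hours whose long output contains a
# PowerShell error record (FullyQualifiedErrorId), i.e. checks that may have
# failed internally but still reported a status such as OK.

$sqluser = "icinga2"
$sqlpassword = "SecurePW"   # consider a MySQL option file instead of a password on the command line
$sqlserver = "172.16.0.1"
$sqldb = "icinga2"

$query = "SELECT service_object_id, check_source, check_command
FROM $sqldb.icinga_servicestatus
WHERE long_output LIKE '%FullyQualifiedErrorId%'
AND status_update_time BETWEEN DATE_SUB(NOW(), INTERVAL 8 HOUR) AND NOW();"

# Call the mysql client directly instead of building a string for Invoke-Expression
$queryresult = & mysql -h $sqlserver -u $sqluser "-p$sqlpassword" -D $sqldb -e $query

# mysql prints a header row first, so more than one line means at least one match
if ($queryresult.Count -gt 1) {
    Write-Host "CRITICAL - Some checks have errors but could still return OK"
    $queryresult
    exit 2
} else {
    Write-Host "OK - No faulty PowerShell scripts found"
    exit 0
}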

An even better way would be to use the Icinga 2 API for that.
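A hedged sketch of that idea for PowerShell 6+: the Icinga 2 REST API endpoint /v1/objects/services (port 5665 by default) can return each service's last_check_result, so the search for FullyQualifiedErrorId can be done on the client side. The host name, the API user (which needs the objects/query/Service permission) and skipping certificate validation for a self-signed certificate are assumptions. The filtering is a separate function so it can be tried without a live API; Get-IcingaServices needs -Authentication and -SkipCertificateCheck, which Windows PowerShell 5.1 does not have.

```powershell
# Sketch: find services whose last output contains a PowerShell error record,
# using the Icinga 2 REST API instead of the IDO database.

# Returns "host!service" names whose last check output contains the marker.
function Get-FaultyServiceName {
    param([Parameter(Mandatory)] $ApiResponse)
    foreach ($svc in $ApiResponse.results) {
        $output = $svc.attrs.last_check_result.output
        if ($output -and $output -match 'FullyQualifiedErrorId') { $svc.name }
    }
}

# Fetches all services with their last check result (needs a live API).
function Get-IcingaServices {
    param(
        [Parameter(Mandatory)][string]$ApiHost,          # assumption: e.g. the master's FQDN
        [Parameter(Mandatory)][pscredential]$Credential  # an ApiUser with objects/query/Service
    )
    $uri = "https://${ApiHost}:5665/v1/objects/services?attrs=last_check_result"
    Invoke-RestMethod -Uri $uri -Method Get -Authentication Basic -Credential $Credential `
        -SkipCertificateCheck -Headers @{ Accept = 'application/json' }
}
```

A check could then collect $faulty = @(Get-FaultyServiceName -ApiResponse (Get-IcingaServices -ApiHost 'icinga-master.example.com' -Credential $cred)) and exit 2 when $faulty.Count is greater than 0, otherwise 0. How $cred is stored is up to you.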