Dynamically format check_cluster Service Attributes

I have a use case where I’d like to use check_cluster to evaluate all service checks within a given host.

The idea is to first set the $host.name$ macro to a variable.

Then use the get_services function, or even get_objects, to pull all services for that host into an array.

Next, I’d like to loop through the array of services and reformat each entry to be compatible with check_cluster, i.e. in the format “$host.name$!service1”, “$host.name$!service2” and so on.

Finally, return the array as a single comma-separated string.

I’ve looked over the documentation and the provided examples countless times, and have rewritten the service to try a different approach almost as often. So, any insight would be very helpful.

Here is my code as it sits right now. Also, I’m unsure how to test that my functions are working outside of the console. Attempts to use the log function do not seem to push messages anywhere, regardless of which facility I set it to use. When the validation passes, the variables just show up as “Object as type ‘Function’”.

apply Service "overall" {
        check_command = "check_cluster"
        max_check_attempts = 1

        vars.check_cluster_service = true
        vars.check_cluster_warning = 1
        vars.check_cluster_critical = 2

        vars.check_cluster_objects = {{
                # Set node
                var nodeid = macro("$host.name$")

                # Get Services for the Host (hoping the node will substitute as $host.name$ macro)
#                var host_services = {{
#                        get_objects(Service).filter(s => match("*nodeid*", s.host_name)).map(s => s.name)
#                }}

                var host_services = {{
                        get_services("$host.name$").map(s => s.name)
                }}

                        ##var host_services = macro("$service_tags$")


                # Declare new and empty array to house a combination of nodeid and host_services
                var result = []

                # Format services pulled to prefix with '$host.name$!'
                #for (item in host_services) {
                #  result.add("nodeid!item")
                #}

                for (item in host_services) {
                        var bind = [ "$host.name$" + "!" + item ]
                        result.add(bind)
                }

                # Now we should have an array of ["node!service1", "node!service2", "node!service3"]
                return result.join(",")
        }}
        assign where host.address
}

So the final idea here would be to have check_cluster_objects be formatted as the combination mentioned above, where it will then use the check_cluster command I have defined in commands.conf.

Note that I have the $host.name$ macro in some of those loops, I’ve also tried to use the variable set in the first line of the function.
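
On the logging question: as far as I understand it, a function assigned to a custom variable is only evaluated when the check actually runs, which would explain why validation just reports a Function object and why nothing is logged at that point. The sketch below shows the three-argument form of log (severity, facility, message). The facility name "overall-check" and the returned text are made up for this example, and where the message ends up depends on which logging features are enabled.

```
vars.dummy_text = {{
        var myhost = macro("$host.name$")

        // Three-argument form: severity, facility, message.
        // "overall-check" is an arbitrary facility name chosen for this example.
        log(LogWarning, "overall-check", "Evaluating services for host: " + myhost)

        // Hypothetical return value, only here so the function returns something
        return "Checked " + myhost
}}
```

The message should appear once the service has been scheduled and executed, not during config validation.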

Idk if I’m doing something wrong with my posts, but they never seem to get responded to. Anyways, I figured this one out.

Using some of the logic from the docs, I managed to create an overall check that evaluates all of the service checks. I simplified the eval logic for this post, but you can change it to consider situations where, for example, 3 different services result in a critical, or really any other combination. The purpose of this was to provide an absolute top-level view for people who are only interested in the overall state of any given host or hostgroup (since the hostgroups list can be filtered by servicegroup).
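
As a rough illustration of that kind of variation, the final evaluation could be swapped for a threshold-based one. This is only a sketch: it reuses the crit_count and warn_count counters from the check further down, and the two threshold variables are hypothetical names that are not part of the original check.

```
// Hypothetical thresholds, not part of the original check
var crit_threshold = 3
var warn_threshold = 1

if (crit_count >= crit_threshold) {
        return 2        // enough CRITICAL services -> CRITICAL
} else if (crit_count + warn_count >= warn_threshold) {
        return 1        // some trouble, but below the CRITICAL threshold -> WARNING
}

return 0                // everything else counts as OK
```

The fragment is meant to replace the last if/else block inside vars.dummy_state, after the counting loop has finished.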

Anyways, if anyone finds this useful, feel free to improve. I’m more than certain there is room for it.

apply Service "overall" {
        check_command = "dummy"
        check_interval = 10m
        max_check_attempts = 2
        retry_interval = 1m
        enable_notifications = false
        vars.dummy_state = {{

                // Pull variables
                var myhost = macro("$host.name$")       // Hostname pulled from macro function
                var services = macro("$serviceobj$")    // Pulled from host object
                var myservices = get_services(get_host(myhost)).map(s => s.name)        // All services - Pulled from function
                //var servicestates = get_services(get_host(myhost)).map(s => s.state)    // All service states - Pulled from function

                // Set some control parameters
                var mywarning = 1
                var mycritical = 2
                var myunknown = 3

                // More control parameters
                var up_count = 0
                var down_count = 0
                var warn_count = 0
                var crit_count = 0
                var unknown_count = 0


                for (service in myservices) {

                        // Skip the overall check itself; other checks (e.g. node_online) can be excluded the same way
                        if (service == "overall"){
                                continue                                // Skip loop evaluation
                        } else {
                                // Get the service state. 
                                // Had to hack this one together after numerous attempts otherwise. 
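                                // Build the full object name fresh on every iteration;
                                // appending with += would keep growing the string across services
                                var mystring = myhost + "!" + service
                                var servicestate = get_object(Service, mystring).state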

                                if (servicestate > 0){
                                    down_count += 1

                                        if (servicestate == mywarning){                 // state == 1
                                            warn_count += 1
                                        } else if (servicestate == mycritical){         // state == 2
                                            crit_count += 1
                                        } else if (servicestate == myunknown){          // state == 3
                                            unknown_count +=1
                                        }

                                  } else {
                                      up_count += 1
                                  }

                                // *May need to set mystring and servicestate to NULL here*
                                //servicestate = null
                                //mystring = null
                        }
                }

                // The overall check has been excluded from the count, so we can now use real values
                // Insert your own eval logic here...
                //
                if (down_count >= 1){
                        if (crit_count >= 1){
                          return 2                                      // 1(++) CRITICAL states -> Critical
                        } else if (warn_count >= 1){
                          return 1                                      // 1(++) WARNING states -> Warn/Degraded
                        } else if (unknown_count >= 3){
                          return 1                                      // Multiple UNKNOWN states counted -> Yellow/Degraded
                        }
                }

                // Fallback so the function always returns a state:
                // everything is OK, or only one or two services are UNKNOWN
                return 0

        }}



        vars.dummy_text = {{
                var myhost = macro("$host.name$")
                var services = macro("$serviceobj$")
                var myservices = get_services(get_host(myhost)).map(s => s.name)
                var up_count = 0
                var down_count = 0

                // Set empty array -- May need to use a string instead due to previous array limitations encountered
                var myarray = []

                // Filter for only core services
                var output = "Degraded Services:" + "\n"
                for (service in myservices) {
                        if (service == "overall"){
                                continue                                // Skip loop evaluation
                        } else {

                                // Build the full object name fresh on every iteration
                                var mystring = myhost + "!" + service
                                var servicestate = get_object(Service, mystring).state
                                if (servicestate > 0){
                                        if (servicestate == 2){
                                          down_count += 1               // Move down_count above condition if we want to count unknowns in output
                                          output += "[CRITICAL]" + "\t" + service + "\n"
                                        } else if (servicestate == 1){
                                          down_count += 1
                                          output += "[WARNING]" + "\t" + service + "\n"
                                        } else if (servicestate == 3){
                                          // No up/down changes - state is unknown
                                          output += "[UNKNOWN]" + "\t" + service + "\n"
                                        }
                                } else {
                                        up_count += 1
                                }
                        }

                        servicestate = null
                        mystring = null
                }


                if (down_count == 0){
                        // Nothing is degraded, so replace the list header with a plain OK message
                        output = "Overall state is [OK]"
                }

                return output
        }}

        // An apply rule needs at least one assign expression to pass validation
        assign where host.name

}
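
For completeness, the comma-separated “host!service” list from the original question could probably be built in a similar way. The following is a minimal sketch that I have not run against check_cluster itself; it assumes the plugin accepts the list in exactly that format and that the "overall" service should be left out. Because get_services returns Service objects, their attributes can be read directly in the loop without a separate get_object lookup.

```
vars.check_cluster_objects = {{
        var myhost = macro("$host.name$")
        var result = []

        // get_services() accepts a host name and returns that host's Service objects
        for (svc in get_services(myhost)) {
                if (svc.name == "overall") {
                        continue        // do not include the overall check itself
                }
                result.add(myhost + "!" + svc.name)
        }

        // e.g. "node!service1,node!service2,node!service3"
        return result.join(",")
}}
```

The same loop could also read svc.state directly if the states are needed instead of the names.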

Hi,

Most users here have moved to the new platform; check here and here for more info.

Greetz