Some service alerts not sending alerts

  • Hi,


    I have a strange issue where some checks generate alerts and others do not. Host and swap alerts work; disk alerts do not.


    My disk service command is:


    Code
    apply Service for (disk => config in host.vars.disks) {
      import "generic-service"
      check_command = "disk"
      assign where host.vars.os == "Linux"
      assign where host.vars.os == "Centos"
      command_endpoint = host.vars.client_endpoint
      vars += config
      vars.disk_wfree = "10%"
      vars.disk_cfree = "5%"
    }



    which I know is ugly, but I have not been able to get other assign where expressions to work. X/


    Does anyone have any leads on how I should go about resolving this issue?


    Thanks!

  • I noticed you have this line in your service definition:


    command_endpoint = host.vars.client_endpoint


    Do you have any hosts defined with vars.client_endpoint? If so, what is it pointing to?

  • Quote

    Do you have any hosts defined with vars.client_endpoint? If so, what is it pointing to?


    Yes, I do; I think it is needed for the http check. It points to the hostname. Note that I have removed this line for testing.

  • More checks: all the manual mail sending tests work, and host checks work. Do apply Service for rules require an entry like this?


    Code
    vars.notification["mail"] = {
      groups = [ "icingaadmins" ]
    }


    I thought the answer was no but I am unsure now.

  • Quote

    Do apply service for rules require an entry

    Hmm, no, that should be entered on the host object that you want mail notifications for.


    For your command_endpoint part, that is for distributed monitoring. Do you have multiple icinga2 servers running in a master-client configuration? If not, then command_endpoint is unnecessary.


    You also don't need your assign where statements, because the for loop in your apply Service for rule only matches hosts that have vars.disks defined. By default, your master node has this in its host definition:


    Code
    ...
    /* Define disks and attributes for service apply rules in `services.conf`. */
    vars.disks["disk"] = {
      /* No parameters. */
    }
    vars.disks["disk /"] = {
      disk_partition = "/"
    }
    ...


    Look familiar? Read this excerpt from the docs on how apply Service for works:


    Using apply Service for omits the service name, it will take the key stored in the disk variable in key => config as new service object name.

    The for keyword expects a loop definition, for example key => value in dictionary as known from Perl and other scripting languages.

    Once defined like this, the apply rule defined below will do the following:

    • only match hosts with host.vars.disks defined through the assign where condition
    • loop through all entries in the host.vars.disks dictionary. That’s disk and disk / as keys.
    • call apply on each, and set the service object name from the provided key
    • inside apply, the generic-service template is imported
    • defining the disk check command requiring command arguments like disk_partition
    • adding the config dictionary items to vars. Simply said, there's now vars.disk_partition defined for the generated service
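
    To make the steps above concrete, here is a minimal sketch. The host name and address are placeholders of mine, generic-host and generic-service are assumed to be the templates from the sample configuration, and I use disk_partitions (the spelling that newer versions of the disk check command use; the excerpt above says disk_partition):

    Code
    /* Hypothetical host; only the vars.disks dictionary matters here. */
    object Host "example-host" {
      import "generic-host"
      address = "192.0.2.10"
      vars.disks["disk /"] = {
        disk_partitions = "/"
      }
    }

    /* No assign where needed: hosts without vars.disks produce no services. */
    apply Service for (disk => config in host.vars.disks) {
      import "generic-service"
      check_command = "disk"
      vars += config
    }

    With this in place, the rule should generate one service per dictionary key, so here a single service named "disk /" on example-host.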


    This all boils down to one question: what do you want to achieve? From your initial question, all I know is that your disk check is not generating alerts. My guess is that you want disk checks on all of your hosts. Form your questions into goals, and we can take it from there. I think the best course of action is to read up on the basics of Icinga 2 first; the docs are an incredibly useful resource. Read this chapter and try to understand as much as you can: https://www.icinga.com/docs/ic…doc/03-monitoring-basics/ then go on to this chapter: https://www.icinga.com/docs/ic…/04-configuring-icinga-2/

  • Thanks watermelon. I just want to receive email alerts on services when thresholds are triggered. From my first post the service check


    Code
    apply Service for (disk => config in host.vars.disks) {
      import "generic-service"
      check_command = "disk"
      assign where host.vars.os == "Linux"
      assign where host.vars.os == "Centos"
      vars += config
      vars.disk_wfree = "10%"
      vars.disk_cfree = "5%"
    }


    does not issue an alert when the disk is full.

  • Can you show what it looks like for you? Have you defined any hosts with vars.os = "Linux" and vars.os = "Centos", as well as vars.disks? If not, then your disk check will not apply to any hosts.


    I just tested it and it works for me.

  • Quote

    My disk service command is:

    To help with debugging:

    Apply rules create objects.

    In the above case, run

    Code
    icinga2 object list --type service
    icinga2 object list --type notification

    and carefully verify that all objects really have been created.

    For the notifications, check the types and states properties; these determine for which combinations of type and state a notification is sent.

    https://www.icinga.com/docs/ic…filters-by-state-and-type
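
    As a rough illustration of those two properties (a sketch only: the rule name mail-critical-only is made up, and the other attributes assume the stock mail-service-notification template and the host.vars.notification.mail convention from the sample configuration), a notification restricted to critical problems and their recoveries might look like this:

    Code
    /* Hypothetical rule name; filters narrowed for illustration. */
    apply Notification "mail-critical-only" to Service {
      import "mail-service-notification"
      user_groups = host.vars.notification.mail.groups
      /* Recovery notifications are sent in the OK state, so OK is listed too. */
      states = [ Critical, OK ]
      types = [ Problem, Recovery ]
      assign where host.vars.notification.mail
    }

    With filters like these, a service going WARNING would not send anything, which is the kind of mismatch worth ruling out.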

  • Thanks a lot guys, very good feedback. Everything looks good with the commands you suggested I look at:


    icinga2 object list --type notification |grep disk

    Code
    Object 'staging!disk /!mail-icingaadmin' of type 'Notification':
      * __name = "staging!disk /!mail-icingaadmin"
      * service_name = "disk /"

    for example. I'm going to keep looking at the mail setup. I was intending to use a Google Apps service account to forward emails to a Google group that gets sent to admins, but it looks like the Google Apps email account only receives the host-related email notifications for some reason. I'll be back with an update later.

  • When running icinga2 object list --type notification | grep -n5 service_name, I notice that service_name is blank for all definitions.


  • So on the web interface, do you see the disk check for your 'staging' host? And it goes CRITICAL? And you have verified that your email alerts are set up correctly?


    EDIT: Show your host definition. If service_name is blank, that means you're assigning the service (disk) incorrectly. But how did you get the following if you say that service_name is blank for all definitions?


    Quote

    icinga2 object list --type notification |grep disk


    Code

    Object 'staging!disk /!mail-icingaadmin' of type 'Notification':
      * __name = "staging!disk /!mail-icingaadmin"
      * service_name = "disk /"
  • "So on the web interface, do you see the disk check for your 'staging' host?"

    Yes.

    "And it goes CRITICAL? And you have verified that your email alerts are set up correctly?"


    Yes, and it appears to be set up correctly. Host email alerts do work, but service emails do not (for any service; I checked several and cannot get an email or a notification, even with a forced notification).


    In answer to your last question by running "icinga2 object list --type notification |grep disk" all values return a blank entry for service_name.

  • Quote

    In answer to your last question by running "icinga2 object list --type notification |grep disk" all values return a blank entry for service_name.


    This doesn't make sense because you show that service_name = "disk /" in your previous post:


    icinga2 object list --type notification |grep disk

    Code
    Object 'staging!disk /!mail-icingaadmin' of type 'Notification':
      * __name = "staging!disk /!mail-icingaadmin"
      * service_name = "disk /"


    I get the feeling that you have changed something without telling us. In your initial post, you were saying how SOME service alerts sent email notifications, while others didn't. Now you're saying that NONE of the service alerts sent email notifications.


    Please post the following configs:


    - Host definition for 'staging' (omit information not pertinent to problem)

    - Current Service definition for 'disk' (assuming you've changed it)

    - mail-host-notification.sh and mail-service-notification.sh


    Additionally, have you edited the default notifications.conf at all? It should look like this:
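
    For reference, this is the stock notifications.conf as I remember it from the sample configuration. Treat it as a sketch from memory; the exact contents may differ between Icinga 2 versions:

    Code
    /* Host notifications, applied where the host defines vars.notification.mail. */
    apply Notification "mail-icingaadmin" to Host {
      import "mail-host-notification"
      user_groups = host.vars.notification.mail.groups
      users = host.vars.notification.mail.users
      assign where host.vars.notification.mail
    }

    /* Service notifications use the same host variable, so both come from one entry. */
    apply Notification "mail-icingaadmin" to Service {
      import "mail-service-notification"
      user_groups = host.vars.notification.mail.groups
      users = host.vars.notification.mail.users
      assign where host.vars.notification.mail
    }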




  • Code
    apply Service for (disk => config in host.vars.disks) {
      import "generic-service"
      check_command = "disk"
      assign where host.vars.os == "Linux"
      assign where host.vars.os == "Centos"
      vars += config
      vars.disk_wfree = "10%"
      vars.disk_cfree = "5%"
    }


    cat ../scripts/mail-host-notification.sh


    cat ../scripts/mail-service-notification.sh


    I have only edited notifications.conf to include additional user groups such as


    Code
    apply Notification "mail-builderadmin" to Service {
      import "mail-service-notification"
      user_groups = host.vars.notification.mail.groups
      users = host.vars.notification.mail.users
      assign where host.vars.notification.mail
    }


    and it contains the matching host definitions.

  • I'm super confused at this point - I exactly replicated your setup minus the mail scripts (assuming your email works because you say it works for host alerts) and I am able to receive alerts.


    My icinga2 object list --type service | grep disk shows this:

    Code
    Object 'test!disk /' of type 'Service':
      * __name = "test!disk /"
      * check_command = "disk"
      * display_name = "disk /"
      * name = "disk /"
      * templates = [ "disk /", "generic-service" ]
      * disk_cfree = "90%"
      * disk_partitions = "/"
      * disk_wfree = "95%"

    (I set the thresholds high so I could generate alerts)


    And here's icinga2 object list --type notification | grep disk

    Code
    Object 'test!disk /!mail-icingaadmin' of type 'Notification':
      * __name = "test!disk /!mail-icingaadmin"
      * service_name = "disk /"

    Have you set up your templates.conf to look something like this (at least for "Problem" type)?
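
    Roughly, the service notification template in the sample templates.conf looks like the following. This is written from memory, so take it as an approximation; newer versions may carry extra attributes:

    Code
    template Notification "mail-service-notification" {
      /* Name of the NotificationCommand object defined in commands.conf. */
      command = "mail-service-notification"

      /* Problem must be in types, and the service states in states. */
      states = [ OK, Warning, Critical, Unknown ]
      types = [ Problem, Acknowledgement, Recovery, Custom,
                FlappingStart, FlappingEnd,
                DowntimeStart, DowntimeEnd, DowntimeRemoved ]

      period = "24x7"
    }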


    And commands.conf contains something like this?



    Otherwise, I really don't see why your configs wouldn't work.

  • Quote

    I solved this by reverting to an earlier vm snapshot, sorry for the hassle

    Hmm, I wonder what would've caused your issue then. Anyways, glad to help if I was of any help.