Mail notification problem

notifications
mail
postfix

(Leon) #1

Hello everyone,

I finally got my monitoring on green and wanted to set up the mail notifications. The postfix server is running fine, and the master site user is allowed to send mails via our SMTP server; those mails arrive immediately.

In check_mk I have now granted permission to the contact group “Everything” on the main folder and enabled the following options:

  • Give these groups also permission on all subfolders,
  • Add these groups as contacts to all hosts in this folder,
  • Add these groups as contacts in all subfolders,
  • Always add host contact groups also to its services.

If I now proceed as described in the documentation and create a fake check result with the status “Critical”, I see under Host Notifications that my user should have received a notification with the command “mail”.

However, OMD[master]:~$ tail -f var/log/notify.log does not show any entry for this notification, and neither does # tail -f /var/log/mail.log

Also, in WATO under Notifications it says “Currently there are no unsent notification bulks pending.”

I just don’t get this to work.


(Philipp Näther) #2

Have you tried forcing a host or service into a warning or critical state, e.g. by shutting down a host? I am currently not sure whether the “fake check result” feature works correctly, since another user here described similar behaviour.


(Leon) #3

Warnings and critical events occur again and again, so the notification should be triggered automatically and fill up my mailbox… but it doesn’t. :tired_face:

Here is proof that check_mk seems to use the “mail” command for sending out notifications:


(Philipp Näther) #4

Switch the notification log level to full dump in the global settings and show the content of the ~/var/log/notify.log here.


(Leon) #5

So, it’s a bit strange.

First of all, I stored my e-mail address as the fallback address and checked that it is also stored in the check_mk user, which it was. Then I triggered a fake check result, but the log showed a completely different entry, about a host that no longer exists in monitoring.

/var/log/notify.log
2018-11-05 12:49:07 ----------------------------------------------------------------------
2018-11-05 12:49:07 Got raw notification (nu-esx-mon01.sig.int;Check_MK) context with 52 variables
2018-11-05 12:49:07 Raw context:
                    CONTACTEMAIL=
                    CONTACTNAME=check-mk-notify
                    CONTACTPAGER=
                    DATE=2018-10-31
                    HOSTACKAUTHOR=
                    HOSTACKCOMMENT=
                    HOSTADDRESS=x.x.x.x
                    HOSTALIAS=nu-esx-mon01.sig.int
                    HOSTATTEMPT=1
                    HOSTCHECKCOMMAND=check-mk-host-ping!-w 200.00,80.00% -c 500.00,100.00%
                    HOSTDOWNTIME=0
                    HOSTGROUPNAMES=check_mk
                    HOSTNAME=nu-esx-mon01.sig.int
                    HOSTNOTIFICATIONNUMBER=0
                    HOSTOUTPUT=OK - x.x.x.x: rta 0.650ms, lost 0%
                    HOSTPERFDATA=rta=0.650ms;200.000;500.000;0; pl=0%;80;100;; rtmax=1.308ms;;;; rtmin=0.324ms;;;;
                    HOSTPROBLEMID=0
                    HOSTSTATE=UP
                    HOSTSTATEID=0
                    HOSTTAGS=/wato/sig/server/vmware_infrastructure/ cmk-agent ip-v4 ip-v4-only lan no-snmp prod site:master tcp wato
                    HOST_ADDRESS_4=x.x.x.x
                    HOST_ADDRESS_6=
                    HOST_ADDRESS_FAMILY=4
                    LASTHOSTSTATE=UP
                    LASTHOSTSTATECHANGE=1540969406
                    LASTHOSTSTATEID=0
                    LASTHOSTUP=1540970233
                    LASTSERVICEOK=0
                    LASTSERVICESTATE=OK
                    LASTSERVICESTATECHANGE=1540970205
                    LASTSERVICESTATEID=0
                    LONGDATETIME=Wed Oct 31 08:17:45 CET 2018
                    LONGHOSTOUTPUT=
                    LONGSERVICEOUTPUT=
                    NOTIFICATIONAUTHOR=
                    NOTIFICATIONAUTHORALIAS=
                    NOTIFICATIONAUTHORNAME=
                    NOTIFICATIONCOMMENT=
                    NOTIFICATIONTYPE=PROBLEM
                    SERVICEACKAUTHOR=
                    SERVICEACKCOMMENT=
                    SERVICEATTEMPT=1
                    SERVICECHECKCOMMAND=check-mk
                    SERVICEDESC=Check_MK
                    SERVICEGROUPNAMES=
                    SERVICENOTIFICATIONNUMBER=1
                    SERVICEOUTPUT=(Service Check Timed Out)
                    SERVICEPERFDATA=
                    SERVICEPROBLEMID=1001
                    SERVICESTATE=CRITICAL
                    SERVICESTATEID=2
                    SHORTDATETIME=2018-10-31 08:17:45
2018-11-05 12:49:07 Computed variables:
                    CONTACTS=?
                    HOSTFORURL=nu-esx-mon01.sig.int
                    HOSTOUTPUT_HTML=OK - x.x.x.x: rta 0.650ms, lost 0%
                    HOSTSHORTSTATE=UP
                    HOSTURL=/check_mk/index.py?start_url=view.py%3Fview_name%3Dhoststatus%26host%3Dnu-esx-mon01.sig.int
                    LASTHOSTSHORTSTATE=UP
                    LASTHOSTSTATECHANGE_REL=5d 04:45:41
                    LASTHOSTUP_REL=5d 04:31:54
                    LASTSERVICEOK_REL=17840d 11:49:07
                    LASTSERVICESHORTSTATE=OK
                    LASTSERVICESTATECHANGE_REL=5d 04:32:22
                    LONGSERVICEOUTPUT_HTML=
                    MICROTIME=1541418547101716
                    MONITORING_HOST=sig-mon01
                    OMD_ROOT=/omd/sites/master
                    OMD_SITE=master
                    PREVIOUSHOSTHARDSHORTSTATE=UP
                    PREVIOUSHOSTHARDSTATE=UP
                    PREVIOUSSERVICEHARDSHORTSTATE=OK
                    PREVIOUSSERVICEHARDSTATE=OK
                    SERVICEFORURL=Check_MK
                    SERVICEOUTPUT_HTML=(Service Check Timed Out)
                    SERVICESHORTSTATE=CRIT
                    SERVICEURL=/check_mk/index.py?start_url=view.py%3Fview_name%3Dservice%26host%3Dnu-esx-mon01.sig.int%26service%3DCheck_MK
                    WHAT=SERVICE
2018-11-05 12:49:07 Preparing rule based notifications
2018-11-05 12:49:07 Found 0 user specific rules
2018-11-05 12:49:07 Global rule 'Notify all contacts of a host/service via HTML email'...
2018-11-05 12:49:07  -> matches!
2018-11-05 12:49:07 Warning: Contacts of nu-esx-mon01.sig.int;Check_MK cannot be determined. Using fallback contacts
2018-11-05 12:49:07 Warning: cannot get information about contact mailto:leon.xxx@xxx.de: ignoring restrictions
2018-11-05 12:49:07    - adding notification of mailto:leon.xxx@xxx.de via mail
2018-11-05 12:49:07 Executing 1 notifications:
2018-11-05 12:49:07   * notifying mailto:leon.xxx@xxx.de via mail, parameters: (no parameters), bulk: no
2018-11-05 12:49:07      executing /omd/sites/master/share/check_mk/notifications/mail
2018-11-05 12:49:07      Output: Unable to fetch number of graphs: Unable to fetch graph infos: <p>XML file &quot;/omd/sites/master/var/pnp4nagios/perfdata/nu-esx-mon01.sig.int/Check_MK.xml&quot; not found. &lt;a href=&quot;http://docs.pnp4nagios.org/faq/6&quot;&gt;Read FAQ online&lt;/a&gt;</p>
2018-11-05 12:49:07      Output: Spooled mail to local mail transmission agent

--- after triggered resend in notification analysis ---

2018-11-05 12:56:55 Preparing rule based notifications
2018-11-05 12:56:55 Found 0 user specific rules
2018-11-05 12:56:55 Global rule 'Notify all contacts of a host/service via HTML email'...
2018-11-05 12:56:55  -> matches!
2018-11-05 12:56:55 Warning: Contacts of nu-esx-mon01.sig.int;Check_MK Discovery cannot be determined. Using fallback contacts
2018-11-05 12:56:55 Warning: cannot get information about contact mailto:leon.xxx@xxx.de: ignoring restrictions
2018-11-05 12:56:55    - adding notification of mailto:leon.xxx@xxx.de via mail
2018-11-05 12:56:55 Executing 1 notifications:
2018-11-05 12:56:55   * would notify mailto:leon.xxx@xxx.de via mail, parameters: (no parameters), bulk: no

I just shut down a non-critical host, and nothing was written to notify.log! If I click on “Host Notifications”, I can see that I should have been notified.
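The two log excerpts above differ in one detail that is easy to miss: the first run logs “notifying … via mail” (a real send), while the resend from the notification analysis only logs “would notify …” (a dry run). A minimal Python sketch that pulls these entries out of notify.log lines, assuming the line layout shown in this post; the function name summarize_notify_log is made up for illustration and is not part of check_mk:

```python
import re

# Timestamped notify.log lines look like "2018-11-05 12:49:07 <message>".
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)$")
# "* notifying X via mail," is a real send; "* would notify X via mail," is a dry run.
NOTIFY_RE = re.compile(r"^\s*\* (notifying|would notify) (\S+) via (\S+),")


def summarize_notify_log(lines):
    """Return (sent, dry_run, fallback_warnings) found in notify.log lines.

    Hypothetical helper: it only understands the line formats quoted in
    this thread and ignores everything else.
    """
    sent, dry_run, fallback = [], [], []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # indented context variables carry no timestamp
        timestamp, message = match.groups()
        notify = NOTIFY_RE.match(message)
        if notify:
            kind, contact, plugin = notify.groups()
            target = sent if kind == "notifying" else dry_run
            target.append((timestamp, contact, plugin))
        elif "Using fallback contacts" in message:
            fallback.append((timestamp, message.strip()))
    return sent, dry_run, fallback


# Two lines copied from the log excerpts in this post.
sample = [
    "2018-11-05 12:49:07   * notifying mailto:leon.xxx@xxx.de via mail, parameters: (no parameters), bulk: no",
    "2018-11-05 12:56:55   * would notify mailto:leon.xxx@xxx.de via mail, parameters: (no parameters), bulk: no",
]
sent, dry_run, fallback = summarize_notify_log(sample)
print("sent:", sent)
print("dry run:", dry_run)
```

To run it against the real file, the lines could be read with open("var/log/notify.log") from the site user's home directory. If a host goes down and nothing at all shows up here, the notification was probably never handed to this site's notification module in the first place.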


(Philipp Näther) #6

Can you please undo all of this again?
You do not have to add hosts and services to the contact group “Everything”, because check_mk internally uses this group as the default anyway if no contact is assigned.
Try to get back to the default state, with no contact assignments and no permissions on folders etc.
Then add only the admin user with the desired email address to the contact group “Everything”.


(Leon) #7

Okay, I’ve achieved something. :thinking:

The reason I don’t get notifications is that the servers are checked by the slaves, and the slaves don’t forward notifications to the master. The warnings and crits are displayed on the master, but the notifications never reach the master’s notification module. A component called mknotifyd (?) seems to be missing on the slaves… The docs say that it only exists in the enterprise version :weary:

When I had a machine checked not by the slave but by the master and then clicked on “Fake Check Result”, I immediately received notifications.

How do I proceed in the RAW version, so that the slaves forward their alerts to my master and this then sends the notification to the users?

Kind regards
Leon


(Philipp Näther) #8

Ohhh… I always forget to ask whether it is a distributed monitoring environment. That has changed my view of the problem in a couple of past topics where I tried to help out.

So if you are running on the RAW edition you have to set up your slaves the same way you did with your master site. That means the slave needs:

  • a running MTA
  • a smart host that functions as a relay for mails
  • if the relay is not your main mail host, a connection from slave relay to main mail host
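Once a slave has an MTA and a relay, a quick test mail from the slave's site user shows whether the chain works before waiting for a real alert. A rough Python sketch, assuming the MTA provides the usual sendmail-compatible binary at /usr/sbin/sendmail (postfix does) and that handing the message to it resembles what the notification script does when it reports “Spooled mail to local mail transmission agent”. The function names and the addresses are placeholders for illustration:

```python
import subprocess
from email.message import EmailMessage


def build_test_message(sender, recipient, site="master"):
    """Build a small plain-text test mail (addresses are placeholders)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Notification test from site {site}"
    msg.set_content("If this arrives, the local MTA relays mail for this site.")
    return msg


def send_via_local_mta(msg, sendmail="/usr/sbin/sendmail"):
    """Hand the message to the local MTA through its sendmail interface.

    -t takes the recipients from the message headers, -i stops a single
    dot on a line from ending the input. The path is an assumption.
    """
    subprocess.run([sendmail, "-t", "-i"], input=msg.as_bytes(), check=True)


# Example (run as the site user on the slave):
# send_via_local_mta(build_test_message("master@example.net", "you@example.net"))
```

If the mail shows up in the slave's /var/log/mail.log but never reaches the mailbox, the problem is in the relay configuration rather than in check_mk.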

(Leon) #9

Somehow the solution doesn’t quite satisfy me. :joy:
Is there another way, without having to set up an MTA on each slave? It’s rather sad that the manufacturer restricts such an important function in the RAW version.


(Leon) #10

I did set up an MTA on one slave. This is the output of /var/log/mail.log when a critical event was triggered:

Nov  7 11:15:05 checkmk-slave-fch01 postfix/pickup[16422]: CFBF5A1725: uid=999 from=<master@xxx.net>
Nov  7 11:15:05 checkmk-slave-fch01 postfix/cleanup[17131]: CFBF5A1725: message-id=<20181107101505.CFBF5A1725@checkmk-slave-fch01.xxx.local>
Nov  7 11:15:05 checkmk-slave-fch01 postfix/qmgr[16423]: CFBF5A1725: from=<master@xxx.net>, size=78121, nrcpt=1 (queue active)
Nov  7 11:15:05 checkmk-slave-fch01 postfix/local[17133]: warning: required alias not found: mailer-daemon
Nov  7 11:15:05 checkmk-slave-fch01 postfix/local[17133]: CFBF5A1725: to=<MAILER-DAEMON@checkmk-slave-fch01.xxx.local>, relay=local, delay=0.03, delays=0.03/0/0/0, dsn=2.0.0, status=sent (discarded)
Nov  7 11:15:05 checkmk-slave-fch01 postfix/qmgr[16423]: CFBF5A1725: removed

master@xxx.net is the address of my check_mk master and this seems to be used by the slave MTA.

What is going on with MAILER-DAEMON?


(Philipp Näther) #11

mailer-daemon is the bounce address of your slave postfix. I guess the mail bounces because postfix cannot send it correctly. It looks like your postfix config is not correct.
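
When checking the postfix config, it can help to extract from mail.log where each message actually ended up. A small Python sketch that parses delivery lines in the format quoted in post #10; the regular expression is only fitted to those lines (short queue IDs, a single recipient per line) and the helper name parse_deliveries is made up for illustration:

```python
import re

# Matches postfix delivery lines such as:
# "... postfix/local[17133]: CFBF5A1725: to=<...>, relay=local, ..., status=sent (discarded)"
DELIVERY_RE = re.compile(
    r"postfix/(?P<service>[\w-]+)\[\d+\]: (?P<queue_id>[0-9A-Za-z]+): "
    r"to=<(?P<recipient>[^>]*)>, relay=(?P<relay>[^,]+),.*"
    r"status=(?P<status>\w+)(?: \((?P<detail>.*)\))?"
)


def parse_deliveries(lines):
    """Return one dict per delivery attempt found in the given log lines."""
    deliveries = []
    for line in lines:
        match = DELIVERY_RE.search(line)
        if match:
            deliveries.append(match.groupdict())
    return deliveries


# Delivery line copied from the mail.log excerpt in post #10.
sample = [
    "Nov  7 11:15:05 checkmk-slave-fch01 postfix/local[17133]: CFBF5A1725: "
    "to=<MAILER-DAEMON@checkmk-slave-fch01.xxx.local>, relay=local, delay=0.03, "
    "delays=0.03/0/0/0, dsn=2.0.0, status=sent (discarded)",
]
for delivery in parse_deliveries(sample):
    print(delivery["queue_id"], delivery["recipient"], delivery["relay"],
          delivery["status"], delivery["detail"])
```

For the excerpt above this shows relay=local and a MAILER-DAEMON recipient on the slave itself, so the message was handled locally and discarded instead of being passed on to a mail relay.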