Monitoring dropouts of a machine


#1

I have the following problem: sometimes one of my VMs “freezes” for 1-2 minutes. During that time SSH logins are not possible, and commands typed on the command line are only executed once the freeze is over.
I would like to monitor these dropouts, i.e. have a flag set whenever the problem occurs again.

How can I solve this? I had thought of a cron job that writes a timestamp to a file every minute, so that an entry is missing whenever a freeze occurs. But the evaluation of the file would run on the same machine, and if the machine is frozen, neither the evaluation nor the transfer to CHECK_MK happens…
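To make the cron job idea concrete, here is a minimal Python sketch of the evaluation step. It assumes (none of this is from my actual setup) that the cron job appends one ISO 8601 timestamp per line, and that a gap of more than 90 seconds between two entries counts as a dropout. The function name `find_gaps`, the threshold and the sample timestamps are made up for illustration. It can only report a freeze after the fact, once the machine is running again.

```python
from datetime import datetime, timedelta


def find_gaps(lines, max_gap=timedelta(seconds=90)):
    """Return (previous, current) timestamp pairs further apart than max_gap.

    Assumption: each non-empty line holds one ISO 8601 timestamp,
    written once per minute by a cron job.
    """
    stamps = [datetime.fromisoformat(line.strip()) for line in lines if line.strip()]
    # Compare every entry with its successor and keep the oversized gaps.
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > max_gap]


if __name__ == "__main__":
    # Made-up sample data: the entries for 12:02 and 12:03 are missing.
    sample = [
        "2024-01-01T12:00:00",
        "2024-01-01T12:01:00",
        "2024-01-01T12:04:00",
    ]
    for before, after in find_gaps(sample):
        print(f"no heartbeat between {before} and {after}")
```

Since this still runs on the affected VM, it only gives a history of past dropouts and not a live alarm.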

Anybody got an idea?

Many greetings
Thuranga


(Philipp Näther) #2

Is it only SSH that stops working?
You could query the host’s check_mk_agent over SSH (datasource programs -> individual program call).
That way the SSH connection is monitored automatically. The drawback is that you cannot see what is going on with the other services while the SSH issue is occurring.


#3

No, everything freezes. SSH is only an example.


(Philipp Näther) #4

Then I would proceed as suggested. Query the host’s agent via SSH so you can monitor the freezes. You could even set a check interval lower than 60s if needed. But check whether querying over SSH puts a high load on the server when it freezes.
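
If you want to see the principle of probing from outside the VM without touching the Checkmk configuration, a rough Python sketch could look like the one below. It is not the Checkmk datasource mechanism itself, just an illustration to run from a second machine. It opens a TCP connection to the SSH port and waits for the server’s “SSH-” identification string, so a frozen host shows up as a timeout. The function name `probe_ssh`, the port and the timeout are assumptions.

```python
import socket
import time


def probe_ssh(host, port=22, timeout=5.0):
    """Return (ok, seconds); ok is True if an SSH banner arrived in time.

    The timeout applies to the connect and to the read separately,
    so a call can take up to roughly twice the timeout.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            # An SSH server sends its identification string right after connect.
            banner = sock.recv(64)
        ok = banner.startswith(b"SSH-")
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        ok = False
    return ok, time.monotonic() - start
```

Calling `probe_ssh("my-vm.example")` (a placeholder host name) every few seconds and logging each failed probe with a timestamp would give a record of the freezes that does not depend on the frozen machine.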