could you visually show an example? Your description is a little vague
Using my method of check_by_ssh (there are alternatives), you can execute something like the following from your master server (in your case, .100):
so what does this do?
The command /usr/lib64/nagios/plugins/check_by_ssh is included in the nagios plugins suite that you most likely installed when you configured Icinga. The /usr/lib64/nagios/plugins/ directory is the default installation directory for nagios plugins on CentOS. In this directory, you'll find a bunch of other check plugins that you may also find useful.
check_by_ssh will open an ssh session into your host (specified by the -H parameter) and execute a command (specified by the -C parameter) on that host. As you can see, the command I chose was in that same directory I mentioned above and is the check_disk plugin. You can also specify different arguments, like the -w (warning threshold) and -c (critical threshold) that you see. Once the command is executed, the ssh session is closed.
This way, the disk check runs locally on the remote host over SSH. You can do the same for your load check, and both checks can be easily translated into Director.
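If you'd rather see it as plain Icinga 2 config, here is a minimal sketch of the same disk check as a service using the by_ssh check command from the Icinga Template Library. The thresholds, the service name, the "generic-service" template, and the host.vars.os rule are assumptions for illustration, so adjust them to your environment.

```
apply Service "disk-by-ssh" {
  import "generic-service"              // assumed template from the default conf.d/templates.conf

  check_command = "by_ssh"              // ITL wrapper around check_by_ssh
  // Command executed on the remote host; the thresholds are placeholder values
  vars.by_ssh_command = "/usr/lib64/nagios/plugins/check_disk -w 20% -c 10%"

  assign where host.vars.os == "Linux"  // hypothetical custom variable
}
```

By default, by_ssh connects to the host's check address, so no extra address variable should be needed here.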
One thing you will need to do if you'd like to implement this is set up SSH keys from your master to your remote hosts. Otherwise it won't work, because SSH would prompt for a password. There's a guide on DigitalOcean for this: https://www.digitalocean.com/c…how-to-set-up-ssh-keys--2
I can remove Icinga on all hosts?
You should be able to remove Icinga completely, unless you want those hosts to be satellites for some type of distributed monitoring setup (which I don't think you need). All you need is that plugin directory that I told you about in the beginning.
Let me know how it turns out!
Hm, I don't know much about Microsoft failover clusters, but is your issue as simple as monitoring services on a Windows server?
XXX.XXX.XXX.[105-110] are my hosts I would like to monitor
what operating system are you running on these hosts? This matters because there are multiple ways of approaching your problem.
You shouldn't need to install Icinga on every host that you want to monitor. That would be a scaling nightmare.
You also need to ask yourself what you want to monitor on these hosts. For example, I use check_wmi for Windows machines because it lets me see disk space, critical services, and CPU usage, and a variation of check_by_ssh to check the same things on Linux machines. You could also use check_vmware_esx for checking virtual machines. There are many possibilities.
Before you move forward, I'd recommend redoing your setup to use remote checks with one of the methods mentioned above rather than local checks on every single host you have.
I can provide specific examples but you'll need to answer my questions first.
Please, when you have a question about Grafana/InfluxDB, there is a different section on our forums. Post there instead.
Regardless, I don't think this is an InfluxDB problem. You most likely have a problem with your templating settings, since you imported a dashboard template. You'll need to do some further configuration if you want it to work for your environment.
Go into your "Icinga2 with InfluxDB" dashboard, go to Settings (the gear icon), then Templating (the </> icon). From there, you will see how the hostname and services show up. Change this to be something more dynamic.
For example, my templating settings look like this:
These queries select all hosts and the services associated with them. I recommend reading up on the documentation before going any further if you don't understand this yet.
Hmm why not just check if port 3389 is open on the remote server?
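A minimal sketch of that idea, using the tcp check command from the Icinga Template Library. The service name, the "generic-service" template, and the host.vars.os rule are assumptions, not something from your setup.

```
apply Service "rdp-port" {
  import "generic-service"                // assumed template

  check_command = "tcp"                   // ITL wrapper around check_tcp
  vars.tcp_port = 3389                    // the RDP port mentioned above

  assign where host.vars.os == "Windows"  // hypothetical custom variable
}
```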
I looked through your configuration and compared it to my own and everything looks fine. The only strange thing that I found was that in your "homi" object, the following is shown:
..."DowntimeStart", "DowntimeEnd" "DowntimeRemoved" ...
(see line 30 of the "homi" object you pasted)
As you can see, there's a comma missing in between "DowntimeEnd" and "DowntimeRemoved". Theoretically, this shouldn't work then (I tested it myself and it doesn't). However, this doesn't make sense because 1) your configuration shouldn't deploy, 2) the "homi" object is the one that works anyways, and 3) both objects use the same notification templates.
Is there any explanation for this?
If I put the rules in the client-nodes folder, it works. But still no global-templates.
By rules, what do you mean?
But I saw a post where it was without global = true.
Pretty sure you have to define global = true in the global-templates zone, even on the clients.
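For reference, a global zone definition in zones.conf usually looks something like this, and as far as I know it needs to be present on the master as well as on each client:

```
object Zone "global-templates" {
  global = true   // marks the zone as global so its config is synced to all nodes
}
```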
I assume you have a service named some variation of "W3SVC" then? Because if not, of course your servicegroup would not be populated.
I just tried this out in my environment because I was curious:
so there must either be a problem with 1) the naming convention you use for the W3SVC or perhaps 2) your service is being assigned incorrectly.
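As a rough sketch of what a working servicegroup can look like (not my exact test config), assuming the group name "iis" and that your service names contain "W3SVC":

```
object ServiceGroup "iis" {
  display_name = "IIS"
  // Adds every service whose name matches the pattern; it stays empty if nothing matches
  assign where match("*W3SVC*", service.name)
}
```

If the service is named differently on your hosts, the pattern will not match and the group will not be populated.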
I mean how are you assigning the services themselves to the hosts? not the servicegroup
How are you assigning the W3SVC to your hosts? Apply rules or object based?
Whenever you get an error like this, always remember to read the error.
critical/config: Error: Object 'vps4.szerverpark.eu' of type 'Host' re-defined: in /var/lib/icinga2/api/zones/vps4.szerverpark.eu/_etc/hosts.conf: 1:0-1:32; previous definition: in /etc/icinga2/conf.d/hosts.conf:
As you can see, you have a previous definition in /etc/icinga2/conf.d/hosts.conf, meaning that you either want to remove that Host definition in that config file or completely remove the conf.d from being loaded by commenting out the following line in /etc/icinga2/icinga2.conf on the client:
This will eliminate all local configuration on the client, but only do this if you have your config sync set up correctly to include anything that you might be missing for check commands, services, etc.
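In a default installation, the line in question typically looks like the one below. This assumes your icinga2.conf has not been customized, so check the file before changing anything.

```
/* /etc/icinga2/icinga2.conf on the client */
// include_recursive "conf.d"   // commented out so local conf.d objects are no longer loaded
```

Remember to validate the config and restart Icinga on the client afterwards.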
MaBau_MeBad has a good point. If you create roles for each user, you can separate the view (based on filters) for each team by having each team set to a certain role. However, I'm not sure of how you would separate the HA/LB for each team (if there is a way, this method may be feasible).
I think your "second thought", which is to completely separate each Icinga2 master/client setup for each team, would be better, as it would be easier to scale, as you said. I hope you're also using Ansible or Puppet to automatically deploy your configurations! That would be a ton of work otherwise.
Could you show where you define the template 'check_esxi_hardware'? Are you sure you defined it in commands.conf and not templates.conf? templates.conf is also reset when you upgrade Icinga.
Yeah, that's what I thought. I was just wondering if there was maybe a way around this? I guess not though. Thanks for confirming!
Yeah, I'm using check_snmp that gives me the "normal(1)" output. Thanks for your comment though, I realized that my query was incorrect. I was asking for the value of the field rather than the state, which you can specify in the Metrics tab. I'm also just going to use Singlestat instead of a graph.
Are you sure you've added the "groups" var to a host or assigned a hostgroup to a group of hosts? I just tested this and it works for me.
and I hope you didn't manually edit roles.ini haha
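To illustrate the two options I mentioned, here is a minimal sketch. The host name, address, group name, and host.vars.os rule are all placeholders, not values from your config.

```
// Option 1: set the group membership directly on the host
object Host "example-host" {
  address = "192.0.2.10"              // placeholder address
  check_command = "hostalive"
  groups = [ "linux-servers" ]
}

// Option 2: let the hostgroup pick up its members through an assign rule
object HostGroup "linux-servers" {
  display_name = "Linux Servers"
  assign where host.vars.os == "Linux"  // hypothetical custom variable
}
```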
This is normal behavior. However, now that I think about it, maybe there should be a feature to indicate in the tactical overview that the alert is handled, just as services are shown as 'handled' when they are acknowledged.
I've recently started doing some SNMP checks and wanted to start graphing them. Getting the data into Grafana is no problem, however, representing it is my dilemma. Here is my situation:
I have different types of switches and firewalls and they all have different SNMP information to give me (for instance, my Dell switch gives me a CPU Usage value, 2 fan statuses, and 1 temperature sensor value while my Cisco switch gives me a CPU Usage value, 4 temperature sensor values, and a fan status). I want all of that into one dashboard, which I can do, but how do I make it so each graph has different y-axis values?
Here's an example of what I'm talking about:
As you can see, the services are correctly separated so that they can be dynamically repeated, but the y-axis for the temperature graphs are percentiles because of that. If I try to fix that in the Temperature graph metrics manually (to change it to Celsius), it will just inherit the parent graph configurations (which is percentile) upon loading the dashboard again. Is there a way around this? If more information is needed, I can provide. I know this is a very specific question. By the way, I'm grabbing the services from the database through the following query (in the Templating tab):
SHOW TAG VALUES WITH KEY = "service" WHERE hostdisplayname =~ /^$monhost$/ AND "service" =~ /SNMP.*/
where monhost is
SHOW TAG VALUES FROM "check_hw_health" WITH KEY = "hostdisplayname"
Side note: I am also having trouble converting an SNMP query that returns "normal (1)" (which is what I get for fan status) into performance data that I can graph or perhaps use a Singlestat for.
Did you have issues with the default Icinga configs? Icinga should work fine out of the box. I'm not sure why you resorted to editing zones.conf or using command_endpoint, since you shouldn't really touch those unless you are working with distributed monitoring, which I don't think you are in your case (meaning you shouldn't have to create a zone/endpoint definition for every host).
What changed from your old hosts.conf and zones.conf to your new ones?