Icinga 2 Diagnostics script / data collector

diagnostics
support
debug

(Thomas Widhalm) #1

Hi,

Since I work as a support engineer at one of the Icinga partners, I often have to play the guessing game/FAQ game. A user reports a problem, but first of all we have to figure out the basics: what OS, what Icinga 2 version, which addons and so on.

So I decided to start a project to create a script which collects all of this data in one easy-to-read output. The last part is important because I could always ask a user to send me their full configuration or even a clone of their VM. But what I need most is a thorough yet not overwhelming overview of the setup. The new script should provide us with exactly that: the most useful information for debugging common problems.
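To give an idea of what such a short overview could look like, here is a minimal sketch. It is not taken from the actual script: the helper names `report`, `os_name_from` and `collect_basics` are made up for illustration, and it assumes a system that either ships `/etc/os-release` or at least has `uname`.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not from the real script.

# Print one aligned "key: value" line of the overview.
report() {
    printf '%-12s %s\n' "$1:" "$2"
}

# Read PRETTY_NAME from an os-release style file given as $1.
# The file is sourced in a subshell so its variables do not leak out.
os_name_from() (
    . "$1" && printf '%s\n' "$PRETTY_NAME"
)

collect_basics() {
    if [ -r /etc/os-release ]; then
        report "OS" "$(os_name_from /etc/os-release)"
    else
        # Fallback for systems without os-release: kernel name and release.
        report "OS" "$(uname -sr)"
    fi

    if command -v icinga2 >/dev/null 2>&1; then
        # The first line of `icinga2 --version` carries the version string.
        report "Icinga 2" "$(icinga2 --version | head -n 1)"
    else
        report "Icinga 2" "not found in PATH"
    fi
}

collect_basics
```

Plain `printf` lines like these stay easy to audit, which matters more here than clever formatting.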

There is an option for a more thorough collection of information, though. So if we need log output, the full configuration, etc., the full mode should provide that as well.

I started the script after I analyzed a customer’s setup for writing documentation, so some of the checks are still a “quick hack”. But while I want to improve the code quality and “elegance”, I think easy-to-understand bash commands are superior to super sophisticated ones in this case. I want the script to be easily maintainable (and auditable) and won’t bother with super fast execution times. It will be run once in a very long while, so speed is not important.

So have a look at the script and feel free to contribute or open issues. Please note that I will focus on making the output as easy to read as possible. So while adding extra modes for special cases is always an option, I might not add every idea to the default output. Furthermore, the script should run on every supported platform and more, so keeping it simple is another thing to keep in mind.


#2

Nice! As I saw on GitHub, you added an issue to add FreeBSD support. I’ll look into this!


(Thomas Widhalm) #3

I couldn’t help but plan on adding FreeBSD support since you infected me with FreeBSD :heart: when we last met. :wink: In fact, I thought of your speeches about the benefits of FreeBSD when I opened that issue, @lme


(Dmytro Prokhorenkov) #4

It’s not a bug actually.
Your script shows agents as zones, basically because they are configured as zones. From a quick check of the object list output I can’t say whether it’s possible to remove them from the results. But maybe it’s worth working on this in the future. I can try to work on it if needed.


(Thomas Widhalm) #5

Hm, from a support view it doesn’t matter whether they are satellites or agents. I normally need to know roughly how many zone objects there are, regardless of them being satellites or agents. So I think in this case the script works correctly.
A more thorough way of showing the “tree” of masters, satellites and agents would be a very nice addition in an extra section, even though I can’t imagine how to achieve that for now. Maybe by dissecting the configuration on the master? Seems to be rather complicated, but a really nice-to-have feature. Sounds like a wishlist feature request to me. :wink:
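For the rough zone count mentioned above, a summary line could replace the full list. The following is only a sketch: `count_zones` is a hypothetical helper, and it assumes that `icinga2 object list --type Zone` prints one header line of the form `Object '<name>' of type 'Zone':` per object.

```shell
#!/bin/sh
# Sketch only: count_zones is a hypothetical helper name.

# Count the "Object '...' of type 'Zone'" header lines read from stdin.
# Note: grep -c prints 0 but returns a non-zero status when nothing matches.
count_zones() {
    grep -c "^Object '.*' of type 'Zone'"
}

# Only query Icinga 2 if the CLI is actually installed.
if command -v icinga2 >/dev/null 2>&1; then
    echo "Zone objects: $(icinga2 object list --type Zone | count_zones)"
fi
```

This deliberately does not tell satellites and agents apart; it only answers "roughly how many zones are there".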


(Dmytro Prokhorenkov) #6

Yes, from a support view it’s correct. But you can reach the post size limit if you have too many hosts defined as agents.
Speaking about tree-view - could be difficult if you want to stay with bash.


(Thomas Widhalm) #7

This seems to be related to your issue on GitHub

It’s another reason to have the script use temporary files.

It doesn’t have to look like a tree. :slight_smile: I just meant to provide an overview of dependencies between zones. Or did I misunderstand?


(Dmytro Prokhorenkov) #8

Then I misunderstood you. Saw “tree” and started thinking about tree view :slight_smile:
Yeah, it’s related. Then there should be a rule to post the output of this script somewhere external (pastebin etc.).


(Thomas Widhalm) #9

The “treeview” from Icinga Web 1 / Icinga Classic is a thing many users ask for. But since it seems to be very complicated to implement and not much more useful than eye candy, the developers might not implement it in Icinga Web. It didn’t work with all configuration options in the past anyway (think of multiple parents). It would be cool to have a zoomable map, but I’m very sure this won’t happen soon, and I think it’s far too complicated to be part of an “as simple as possible” data collecting script.

I don’t get the last sentence about posting the output to pastebin. You mean for debugging the script?


(Dmytro Prokhorenkov) #10

As I understood your idea, this should be a tool to collect as much data as possible for debugging and, in some situations, to post it in a topic. If so, for people like me the output would be too large and could easily reach the post size limit.


(Michael Friedrich) #11

(Thomas Widhalm) #12

Ah, now I get it! Yes, you’re totally right.
I was a bit blind in the community support eye because I created it as an aid for Icinga partners to provide support, and they normally have ways to receive bigger chunks of data from their clients. But this was only the very first intention. Now it should definitely be a tool for the community just as much as for commercial partners.
Giving a hint about pastebin might be a good idea for the Readme.
Could any of the other team members chip in on whether there is a preferred way to paste bigger chunks of data when talking about Icinga?
There will be a “full” mode producing a compressed tarball but that’s not to be used for publicly available boards because it will contain the whole configuration including passwords etc.


(Michael Friedrich) #13

Don’t use pastebin btw, that’s an advertising hell. Propose github gists, they work reliably forever.

Readable output is a challenge. You’ll see that with the icinga2 troubleshoot cli command, which was and is far from perfect.

I’d suggest keeping the details as short and as interesting as possible. I have no interest in scrolling over logs and configs when reading about the problem where the user adds a sentence with “foo does not work”.

Also keep in mind that users will collect data and just throw it in here, without reflecting on and analysing the problem themselves. I’m not sure whether such a script really helps this community (my personal opinion) where we encourage users to do their homework. This is different when you’re doing enterprise grade support though, where customers pay you for that.

Imho such a script should also give indications which problem it may have detected, and already provide solutions and hints. It should encourage the user to fix the problem him/herself, before even posting a question somewhere.

jm2c,
Michael


(Thomas Widhalm) #14

Yes, Gists look like a better way to go. I might put this into the Readme.

Keeping the output short and readable was always the whole idea. I have a sample output in the Readme but it still seems to be too long. I already got an issue about finding anomalies: https://github.com/Icinga/icinga2-diagnostics/issues/6 . I might replace the output of the zones with a summary and shorten the package output even more.

@dnsmichi you’re right when you say there’s a difference between the needs of enterprise support and community support but I still believe that having the basics in a short overview is something both can use.

Having a “suggestion mode” like MySQL Tuner could be an option for further development when the basics are done.

Cheers,
Thomas


(Dmytro Prokhorenkov) #15

@dnsmichi sorry, I posted the name of the first tool I remembered. Didn’t think about gists.


(Michael Friedrich) #16

No worries. I just remembered this from IRC where pastebin/pastie/etc. failed at some point. Users tend to use such with Github or here too - and then after one month (or day) everything is gone and you can close the issue/thread.

I think, formatted text, like Markdown, which allows for copy paste, is good for short configs and logs. Longer things should be put inside a tarball and uploaded as “diag package”. This could also include coredumps for example.

Idea from NSClient++ - this one also builds a package which can be uploaded to nsclient.org itself. I don’t remember if it does that on crashes or invoked manually, but it could be a starting point. E.g. with a metadata.json file which holds the files and checksums included, to verify the integrity of the tarball later on (think of jar files).
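The integrity check described here could be sketched roughly as follows. This uses a plain-text checksum manifest rather than a `metadata.json`, simply because that is easier to produce and audit from a shell script. The function names and the file name `MANIFEST.sha256` are hypothetical, and the sketch assumes `sha256sum` from GNU coreutils is available; other platforms may ship a different tool.

```shell
#!/bin/sh
# Sketch only: function and file names are hypothetical.

# write_manifest DIR
# Record a SHA-256 checksum for every regular file below DIR.
# Runs in a subshell so the caller's working directory is untouched.
write_manifest() (
    cd "$1" || exit 1
    find . -type f ! -name MANIFEST.sha256 -exec sha256sum {} + > MANIFEST.sha256
)

# verify_manifest DIR
# Re-check all files against the manifest; fails if any file changed.
verify_manifest() (
    cd "$1" || exit 1
    sha256sum -c --quiet MANIFEST.sha256
)

# Possible usage before and after transferring the package:
#   write_manifest /tmp/diag && tar -czf diag.tar.gz -C /tmp/diag .
#   ... unpack on the receiving side ...
#   verify_manifest /path/to/unpacked/diag
```

The manifest travels inside the tarball, so whoever receives the package can verify it without any extra tooling.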


(Thomas Widhalm) #17

@dnsmichi : This might be one of the rare moments where I’m ahead of your thinking. :smile:

I thought of a way to upload and automatically analyze the full-mode tarballs as well, but not everyone might like that. I wanted to leave uploading the tarball to Nextcloud or something like that to the users themselves, so they see they have full control over what they are uploading.


(Dmytro Prokhorenkov) #18

@widhalmt maybe it makes sense to use uname -a? It should give more info to detect the OS.


(Thomas Widhalm) #19

@l13t thanks for the hint, but while this gives a lot of information, it is information we don’t need in most debugging cases. We have a rather small range of OSes/distributions to support, and most of them seem to have a very distinct way to determine whether we are running on them and, if so, which version.


(Dmytro Prokhorenkov) #20

it’s more about “how to detect os and skip not needed os checks”
uname -o could be an option. just to detect os: linux, bsd etc.
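A rough sketch of that idea, detecting the OS family first and then skipping checks that don't apply. It uses `uname -s` rather than `uname -o`, since `-s` is part of POSIX while `-o` is an extension that not every platform provides. The function name `os_family` and the family labels are made up for illustration.

```shell
#!/bin/sh
# Sketch only: os_family and its labels are hypothetical.

# Map a kernel name (as printed by `uname -s`) to a coarse OS family.
os_family() {
    case "$1" in
        Linux) echo linux ;;
        FreeBSD|OpenBSD|NetBSD|DragonFly) echo bsd ;;
        *) echo unknown ;;
    esac
}

family=$(os_family "$(uname -s)")

# Run only the checks that make sense for the detected family.
case "$family" in
    linux) echo "running Linux checks, skipping BSD ones" ;;
    bsd) echo "running BSD checks, skipping Linux ones" ;;
    *) echo "unknown OS, running generic checks only" ;;
esac
```

Distribution and version detection would still need the per-distribution checks, this only decides which group of checks to try at all.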