Issues with Livestatus LMD and Thruk

  • Good Day,


    I have been setting up a new monitoring system at work, built around Thruk. We have several remote locations with Nagios/Thruk servers, so I used autossh to create reverse SSH tunnels and configured each Thruk backend over Livestatus (the connection parameter in the backend configuration looks like this: localhost:portnumber). This works great. However, one remote site is literally halfway around the world from our Master Thruk server. After I added that remote Nagios/Thruk server to the backend configuration, any Thruk page showing host data took about 6 to 10 seconds to load, where previously pages loaded almost instantly. I researched how to improve responsiveness, and the best option seemed to be installing LMD.

    I have been working with LMD for several days and have not been able to get it working properly. I decided to try again using OMD to set up the Thruk site, so I created another VM and set up Thruk with OMD. Once everything was working, I tried to enable LMD and ran into the same issues. The symptoms are as follows:

    -All connections under Thruk Backend Configuration pass the "test" button with a green check, but none of the backends are available in Thruk EXCEPT the Thruk Master backend (hosts do not show up at all on the site).

    -These errors show up in the LMD log: site went offline: [*hostname*] bad response: 400


    The LMD setup seemed pretty straightforward, so I guess I am either doing something wrong, or my use of reverse SSH tunnels to connect the slaves to the master Thruk server does not work with LMD. One of the many things I tried was forwarding the Livestatus port on the slaves to a Unix socket on the Thruk Master, but I could not get the forwarding itself to work. I tried this because the Thruk Master backend uses a local Unix socket (connection parameter: /omd/sites/prod/tmp/run/live), and it is the only backend that works with LMD enabled.


    I would appreciate any help or troubleshooting advice, as I am out of ideas.


    Thanks
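
    To check a tunnelled backend independently of Thruk and LMD, one option is to send a Livestatus query straight to the forwarded port (or to the local Unix socket) and time the round trip. The following is only a minimal sketch: the helper names build_query, open_backend and timed_query are made up for illustration, and it assumes the remote Livestatus accepts the OutputFormat and ResponseHeader headers and answers once the client closes its sending side.

```python
import socket
import time


def build_query(table, columns):
    """Build a Livestatus query asking for JSON output and a fixed16 header."""
    lines = [
        "GET " + table,
        "Columns: " + " ".join(columns),
        "OutputFormat: json",
        "ResponseHeader: fixed16",
    ]
    # A Livestatus query is terminated by an empty line.
    return "\n".join(lines) + "\n\n"


def open_backend(address, timeout=15.0):
    """Connect to 'host:port' (TCP) or to a path starting with '/' (Unix socket)."""
    if address.startswith("/"):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        sock.connect(address)
        return sock
    host, _, port = address.rpartition(":")
    return socket.create_connection((host, int(port)), timeout=timeout)


def timed_query(address, query):
    """Send one query and return (seconds_elapsed, raw_response_bytes)."""
    start = time.monotonic()
    with open_backend(address) as sock:
        sock.sendall(query.encode("utf-8"))
        # Assumption: closing the write side tells the server the request is complete.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
    return time.monotonic() - start, b"".join(chunks)
```

    Calling timed_query("localhost:portnumber", build_query("hosts", ["name", "state"])) with the real tunnel port, and then again with /omd/sites/prod/tmp/run/live as the address, would show whether the delay and the errors come from the tunnelled backend itself rather than from Thruk.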

  • Have a look at the LMD logfile or increase logging. Also check the Livestatus version on the remote end. Maybe that's the reason for the bad response 400.

    Thanks for the suggestions. The error messages I posted above are actually from the LMD log file. How do I increase logging? Good idea about the Livestatus version. I am out of the office until Monday, but I will try your suggestions then. Thanks a lot!
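
    For anyone chasing the same "bad response: 400": when a query includes ResponseHeader: fixed16, Livestatus prefixes its answer with a 16-byte header holding a three-digit status code and the body length. The sketch below decodes that header so the status can be read directly from a raw response. The function names are hypothetical, and the status descriptions are quoted from memory of the Livestatus documentation, so treat them as assumptions and check them against the version installed on the remote end.

```python
# Assumed meanings of a few status codes; verify against your Livestatus docs.
STATUS_HINTS = {
    200: "OK",
    400: "request contains an invalid header",
    404: "target of the GET not found (e.g. unknown table)",
}


def parse_fixed16_header(header):
    """Return (status_code, body_length) from a 16-byte fixed16 header."""
    if len(header) != 16 or header[15:16] != b"\n":
        raise ValueError("expected 16 bytes ending in a newline")
    text = header.decode("ascii")
    status = int(text[0:3])    # bytes 1-3: status code
    length = int(text[4:15])   # bytes 5-15: body length, space padded
    return status, length


def describe_status(status):
    """Map a status code to a short hint, falling back to a generic label."""
    return STATUS_HINTS.get(status, "unrecognised status code")
```

    If a backend answers 400 to a query that the master's own Livestatus accepts, that would be consistent with the remote end not understanding one of the request headers, which is why comparing Livestatus versions is a sensible first step.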

  • I shut down my original Nagios/Thruk sites on the slaves, reinstalled the slaves with OMD, and set up sites with Nagios and Thruk. After that, LMD worked. I guess it was a version issue. Thanks for the help!


    Performance for the remote slave has improved drastically because of LMD.


    One last question:

    Is there any way to use Nagios 4 with the OMD installation from the Yum repo?


    Thanks

  • This is my first post in the Portal, so please let me know if I should have started a separate thread...


    We have set up Icinga2 <-> Thruk 2.14-2, using LiveStatus & LMD, successfully on our CentOS-7-based monitoring server. However, there is still a small problem with the Thruk event log: it always lists older entries first.


    When, on the Event Log page, one checks the top right box "Older Entries First" and then clicks on Update, the list appears with newer entries first - the way we would like it by default.


    Is this a bug (check box designation at least seems to be wrong)? If not, how can we configure the Event Log page to permanently show newer entries first?


    P.S. sni : Thanks for some great utilities!

    P.P.S.: I posted in this thread because I had been reading it while we struggled for several days to get the Icinga log data through to Thruk. It turned out that the compatibility logging described on the Icinga2 documentation page about LiveStatus was essential for our configuration! (Just to contribute something...)

    The post was edited 4 times, last by bkai.

  • Hi sni - sorry for the slow reply - we weren't using the Logcache then, but are using it now. We have since also updated Thruk to a more current version (2.16.02).


    Since then the ordering problem mentioned in my previous post has disappeared. :)


    Unfortunately, a new* problem came up later, and after several days of searching we still do not know the cause. When running thruk -a logcacheupdate ..., all further (compat. log file) arguments are ignored, so no data arrives for the event log display in the GUI, and the run ends with an error:

    Code
    [15:16:59,959][INFO][Thruk] logcache update failed: Can't use string ("1")
    as an ARRAY ref while "strict refs" in use at
    /usr/share/thruk/lib/Thruk/Backend/Provider/Mysql.pm line 1109.

    I had a look at Mysql.pm and determined that the $files variable does not seem to point at anything meaningful, even though the supplied command-line arguments are registered by thruk ... --verbose ....


    For the moment we have applied a "brutal fix": I temporarily set the variable directly in Mysql.pm (line 998) for the few seconds that the logcacheupdate run takes as a half-hourly cron job. This works (here and now, and only if no one is using the module for other things at the exact same moment). But a correct, elegant way of getting that variable populated would be great! (If allowed, I can also forward a tarball with the relevant /etc config files, e.g. as an attachment here.)


    (* off-topic for this thread, sorry, moderators)

  • Thanks, sni! I assume it will be in the next Thruk release; until then our workaround will do for us.