Nagios 4.3.1 Crashes when receiving data through broker_module

  • I have upgraded Nagios from 4.2.4 to 4.3.1 (luckily only on my development box) and now it crashes with a SIGSEGV / SIGTERM repeatedly (about once a minute).

    To me it looks like a problem that occurs when a broker_module sends data "back" to Nagios.

    I base this on the following facts.

    1) If I disable both (mod_gearman & mk-livestatus) in nagios.cfg, everything works OK.

    2) If I enable mk-livestatus, but do not feed data back to Nagios (Thruk - business processes) through mk-livestatus, it still works OK.

    3) If I enable mod_gearman, or let Thruk feed data back to Nagios through mk-livestatus, it starts crashing.

    Sadly, the only things I can see in the Nagios log are:

    Caught SIGSEGV, shutting down...

    Caught SIGTERM, shutting down...

    In the debug-log I do not see anything strange.

    Here are my SW releases:

    OS: RHEL 7.3

    Nagios 4.3.1 (built from source)

    mod_gearman 3.0.1-1

    gearmand 0.33-5

    mk-livestatus 1.2.8p18 (built from source)

    Is anybody out there using Nagios 4.3.1 with mod_gearman and/or mk-livestatus?

    Other suggestions?


  • It might be worth comparing the header files included in mod_gearman with those of Nagios itself. Maybe Nagios broke binary compatibility with NEB modules by changing the structs defined in those header files. …ee/master/include/nagios4 vs …score/tree/master/include
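
To illustrate why a changed struct definition can break an already compiled NEB module, here is a minimal sketch in C. The struct names, members, and the helper `module_read_custom_vars` are hypothetical and are not taken from the Nagios or mod_gearman headers; the sketch only assumes that a member was inserted ahead of the pointer the module reads.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout the module was compiled against (older headers). */
struct check_old {
    int   id;
    char *name;
    void *custom_vars;
};

/* Hypothetical layout the core uses after an upgrade: one member inserted. */
struct check_new {
    int   id;
    char *name;
    char *timestamp;    /* new member shifts everything after it */
    void *custom_vars;
};

/*
 * Mimics a module built with the old headers: it fetches custom_vars
 * from the offset that the OLD struct definition dictates, no matter
 * which layout the object it receives really has.
 */
void *module_read_custom_vars(const void *core_object) {
    void *value;
    assert(core_object != NULL);
    memcpy(&value,
           (const unsigned char *)core_object
               + offsetof(struct check_old, custom_vars),
           sizeof value);
    return value;
}
```

If the core hands such a module a `struct check_new`, the module gets the `timestamp` pointer where it expects the custom variable list, and walking that "list" reads unrelated memory. Rebuilding the module against the headers of the running core removes the mismatch.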

  • A quick (yes, really quick) check reveals that some header files have changed.

    Here are the changes that I found doing a diff (#ifndef/#define skipped):

    I am no programmer (any more - that was a long time ago), but some of those diffs do not look good.

  • You could try running nagios in foreground with gdb and then generate a full backtrace. I've never installed Nagios4, but I know some coding and debugging foo.

    If you're looking for gdb instructions, you may borrow some from icinga2 and adjust the paths / runtime arguments (ignore the pretty printer section). …ent#development-debug-gdb

  • If you run

    (gdb) p this_customvariablesmember

    this probably returns 0x0, right?

    Frame #4 does the NEB callback and jumps into mod_gearman's code in frame #3 with handle_svc_check().

    This calls clear_volatile_macros_r() and clear_contact_macros_r() back inside the Nagios code. …ter/common/macros.c#L2843

    That code is really old, so I would guess that the memory of the linked list is somehow corrupted. The list is passed via its start pointer in

    customvariablesmember **vars

    Also, the code lacks any null-pointer checks, so a null pointer there leads directly to a SIGSEGV.

    I'd navigate inside gdb and use "up" to step up until frame #3 is reached, and then print the local variables in that scope to debug further. …module/mod_gearman.c#L849

    This code snippet calls clear_volatile_macros_r(&mac). Maybe mod_gearman sets its own custom vars, or the structs have changed in this region.

    This is far beyond my knowledge, but I would collect that information and open an issue over at mod_gearman's issue tracker on GitHub.
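
As a rough illustration of the null-pointer checks mentioned above, here is a minimal sketch in C of a list cleanup that tolerates a NULL start pointer and an empty list. The type `custom_var` and the functions `push_custom_var` and `free_custom_vars` are hypothetical stand-ins, not the actual Nagios code in macros.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type, loosely modelled on a name/value linked list. */
typedef struct custom_var {
    char *name;
    char *value;
    struct custom_var *next;
} custom_var;

/* Returns a heap copy of s, or NULL if allocation fails. */
static char *copy_string(const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Prepends a node to the list; returns 0 on success, -1 on failure. */
int push_custom_var(custom_var **head, const char *name, const char *value) {
    assert(head != NULL && name != NULL && value != NULL);
    custom_var *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->name = copy_string(name);
    node->value = copy_string(value);
    if (node->name == NULL || node->value == NULL) {
        free(node->name);
        free(node->value);
        free(node);
        return -1;
    }
    node->next = *head;
    *head = node;
    return 0;
}

/* Frees the whole list and returns how many nodes were released.
 * A NULL start pointer or an empty list is treated as "nothing to do". */
size_t free_custom_vars(custom_var **head) {
    size_t freed = 0;
    if (head == NULL)
        return 0;
    custom_var *current = *head;
    while (current != NULL) {
        custom_var *next = current->next;  /* save before freeing the node */
        free(current->name);
        free(current->value);
        free(current);
        current = next;
        freed++;
    }
    *head = NULL;  /* leave no dangling start pointer behind */
    return freed;
}
```

Checks like these only catch NULL. They cannot detect a non-NULL pointer that points into corrupted or misinterpreted memory, so they would not replace finding the actual cause of the corruption.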

  • Program received signal SIGSEGV, Segmentation fault.
    clear_custom_vars (vars=vars@entry=0x7ffffffed940) at ../common/macros.c:2851
    2851 my_free(this_customvariablesmember->variable_name);
    Missing separate debuginfos, use: debuginfo-install boost-system-1.53.0-26.el7.x86_64 gearmand-0.33-5.x86_64 glibc-2.17-157.el7_3.1.x86_64 libgcc-4.8.5-11.el7.x86_64 libstdc++-4.8.5-11.el7.x86_64 libuuid-2.23.2-33.el7.x86_64 sssd-client-1.14.0-43.el7_3.11.x86_64
    (gdb) p this_customvariablesmember
    $1 = (customvariablesmember *) 0x2d33302d37313032

    So no, it does not return 0x0. :(

    OK, I'll head over to GitHub and open an issue.

    Thanks again for your support.
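
A side note on the value 0x2d33302d37313032 printed by gdb in the post above: every byte of it is a printable ASCII character. The following C sketch decodes such a 64-bit value byte by byte, lowest byte first, which is the order the bytes have in memory on x86-64. The function name `pointer_bytes_as_text` is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Writes the eight bytes of value into out, lowest byte first (the
 * in-memory order on little-endian machines such as x86-64), followed
 * by a terminating NUL. Non-printable bytes are shown as '.'.
 */
void pointer_bytes_as_text(uint64_t value, char out[9]) {
    assert(out != NULL);
    for (size_t i = 0; i < 8; i++) {
        unsigned char byte = (unsigned char)((value >> (8 * i)) & 0xff);
        out[i] = (byte >= 0x20 && byte < 0x7f) ? (char)byte : '.';
    }
    out[8] = '\0';
}
```

For 0x2d33302d37313032 this yields the text "2017-03-", which looks like the start of a date string. That suggests string data is being read as a pointer, which would fit the guess that the module and the core disagree about a struct layout, although it does not prove it.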


  • F.Y.I.

    Compiling mod_gearman with the Nagios-4.3.2 headers (replacing all headers in include/ and include/lib/, except epn_utils.h, with the ones from the Nagios sources) seems to fix the issue for me. I will let it run on my test rig for a few days, then I will update my production rig.