Strange problem with 2.8.2 under Alpine Linux

icinga2
docker
alpine

(Bodo Schulz) #1

Well, I have a problem.

I have been building Icinga 2 from source in my Docker containers for some time, so that I always have a current version. (Alpine currently offers 2.8.0 as the latest stable version.)
As a basis I use Alpine 3.7.

I’ve been working like this since version 2.8.1 and it’s going pretty well.
When I released my Ruby gem last week as a final 1.0 release, I ran the spec tests against both 2.8.1 and 2.8.2.
With 2.8.1 everything works as intended, but with 2.8.2 Icinga dies with a segmentation fault as soon as I perform a lot of API calls in quick succession. :frowning:
The spec suite has 106 tests in total, and I never get past number 34.

Now I wanted to deliver at least something tangible and yesterday I dealt very intensively with gdb, but unfortunately I did not understand the whole thing. :frowning:

Maybe someone can help me with using gdb?

Ideally, gdb would start Icinga automatically and print a stack trace (if possible). Alternatively, I could modify the container so that Icinga is started under gdb.
Before someone points me to this: I read it and tried it, but I didn’t succeed.

I can’t and don’t want to open a bug report just because of some obscure suspicion.


(Michael Friedrich) #2

Run Icinga 2 in foreground with gdb

https://www.icinga.com/docs/icinga2/latest/doc/21-development/#gdb-run

Trigger the error from the outside, or from within an executed plugin depending on the problem.
Once the debugger has stopped, generate a full backtrace and attach it to the new issue.
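
The steps above can also be scripted. The following is a minimal sketch, not the documented procedure: the helper name icinga2_backtrace is hypothetical, and the binary path and daemon options are assumptions that have to match the local install. gdb's -batch, -ex and --args options are standard. Note that in batch mode gdb prints the backtrace at the first stop it sees, which is not necessarily the crash itself.

```shell
#!/bin/sh
# Sketch: run Icinga 2 under gdb non-interactively and dump a full
# backtrace of all threads once the process stops on a signal.
# ICINGA2_BIN is an assumption; override it to match your install.
ICINGA2_BIN="${ICINGA2_BIN:-/usr/lib/icinga2/sbin/icinga2}"

icinga2_backtrace() {
  # -batch                     exit after the -ex commands have finished
  # run                        start the daemon; gdb regains control on a stop
  # thread apply all bt full   backtrace of every thread, including locals
  # --args                     everything after it is the program and its arguments
  gdb -batch \
      -ex "run" \
      -ex "thread apply all bt full" \
      --args "$ICINGA2_BIN" daemon -x debug --no-stack-rlimit
}

# Usage (the output file name is hypothetical):
#   icinga2_backtrace > /tmp/icinga2-gdb.log 2>&1
```

Trigger the error from a second shell while the daemon is running; the redirected log can then be attached to the issue.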


(Bodo Schulz) #3

Yes, I tried this.

gdb --args /usr/lib/icinga2/sbin/icinga2 daemon -x debug --no-stack-rlimit
GNU gdb (GDB) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/icinga2/sbin/icinga2...done.
(gdb) r
Starting program: /usr/lib/icinga2/sbin/icinga2 daemon -x debug --no-stack-rlimit
[New LWP 676]
[New LWP 677]
[New LWP 678]
[New LWP 679]
[New LWP 680]
[New LWP 681]

Thread 6 "icinga2" received signal ?, Unknown signal.
[Switching to LWP 680]
0x00007fbd520baf33 in __clone () from /lib/ld-musl-x86_64.so.1
(gdb)

That’s it.
No open port, but the gdb process is running.

/etc/icinga2 # ps ax
PID   USER     TIME   COMMAND
    1 root       0:00 {run.sh} /bin/bash /init/run.sh
  541 root       0:00 tail -f /dev/null
  595 root       0:00 sh
  670 root       0:00 gdb --args /usr/lib/icinga2/sbin/icinga2 daemon -x debug --no-stack-rlimit
  672 root       0:00 /usr/lib/icinga2/sbin/icinga2 daemon -x debug --no-stack-rlimit
  736 root       0:00 sh
  889 root       0:00 ps ax

/etc/icinga2 # netstat -tulnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 127.0.0.11:43163        0.0.0.0:*               LISTEN      -
udp        0      0 127.0.0.11:47431        0.0.0.0:*                           -
/etc/icinga2 #

With your enhancements (e.g. Boost Pretty Printers) the result is the same.
(And I have no access to an SVN service here … firewall rules)

And this is where I fail spectacularly. :frowning:

I suspect that there is a problem with the musl library, but that’s not really tangible right now.
I’ll check with the Alpine package manager.


(Michael Friedrich) #4

That’s the nature of a debugger - it stops on a specific signal or breakpoint. At this point, you need to analyse the scope and generate a backtrace.

https://www.icinga.com/docs/icinga2/latest/doc/21-development/#gdb-backtrace


(Bodo Schulz) #5

Thank you, Michael!

I have pasted the complete output in this gist: https://gist.github.com/bodsch/16d6c96b8dfa6aa48c59eb9599edacef

And I don’t understand the output :frowning:
I suspect the musl lib has no debugging information.


(Michael Friedrich) #6

No idea either. ld-musl seems to be the Alpine version of (g)libc, http://www.musl-libc.org/faq.html

Retry after installing the debug package for musl; it might provide more insight.

https://pkgs.alpinelinux.org/package/edge/main/x86/musl-dbg
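
A minimal sketch of that retry inside the Alpine container, assuming the package names gdb and musl-dbg are available in the configured repositories. The helper name install_debug_tools is hypothetical.

```shell
#!/bin/sh
# Sketch: install gdb plus the musl debug symbols on Alpine.
# Package availability depends on the Alpine release and repositories in use.
install_debug_tools() {
  # --no-cache: use a fresh package index and keep no local cache,
  # which avoids bloating the container image
  apk add --no-cache gdb musl-dbg
}

# Usage (as root inside the container):
#   install_debug_tools
```

With the symbols installed, frames inside /lib/ld-musl-x86_64.so.1 should carry source-level detail rather than only a symbol name.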


(Bodo Schulz) #7

I think I have an update.
Alpine Linux currently has version 2.8.4 in its repository.
When I use this version together with the Director module, the Icinga process crashes.

I use the automation tasks to configure the Director:

icingacli director migration run
icingacli director kickstart run

The migration run works properly, but the kickstart run fails:

icingacli director kickstart run --verbose --debug
ERROR: Exception in /usr/share/webapps/icingaweb2-2.5.3/modules/director/library/Director/Core/RestApiClient.php:177 with message: CURL ERROR: Failed to connect to icinga2-master.matrix.lan port 5665: Connection refused

I only have a guess, because I haven’t fully understood the internals yet.
This is probably a network / packet problem.
I want to test a little more with the Alpine configuration.


(Michael Friedrich) #8

Hm, the Director is not able to connect to the API, what’s the real error on the Icinga 2 side (log)?


(Bodo Schulz) #9

The Icinga log file is clean (as far as I can see):

icinga2-master    | [2018-07-05 14:06:11 +0200] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 0.0666667/s (4/min 4/5min 4/15min);
icinga2-master    | [2018-07-05 14:06:11 +0200] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
icinga2-master    | [2018-07-05 14:06:11 +0200] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 0, rate: 1773.25/s (106395/min 106395/5min 106395/15min);
icinga2-master    | [2018-07-05 14:06:11 +0200] warning/TlsStream: TLS stream was disconnected.
icinga2-master    | [2018-07-05 14:06:11 +0200] critical/ApiListener: Client TLS handshake failed (from [172.21.0.4]:49374)
icinga2-master    | Context:
icinga2-master    |     (0) Handling new API client connection
icinga2-master    | 
icinga2-master    | [2018-07-05 14:06:21 +0200] warning/TlsStream: TLS stream was disconnected.
icinga2-master    | [2018-07-05 14:06:21 +0200] critical/ApiListener: Client TLS handshake failed (from [172.21.0.4]:49400)
icinga2-master    | Context:
icinga2-master    |     (0) Handling new API client connection
icinga2-master    | 
icinga2-master    | [2018-07-05 14:06:26 +0200/init/run.sh: line 67:   674 Segmentation fault      /usr/sbin/icinga2 daemon --config /etc/icinga2/icinga2.conf --errorlog /dev/stdout
icinga2-master    | exit with signal '139'
icingaweb2        | ERROR: Exception in /usr/share/webapps/icingaweb2-2.5.3/modules/director/library/Director/Core/RestApiClient.php:177 with message: CURL ERROR: Failed to connect to icinga2-master.matrix.lan port 5665: Connection refused
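
The value 139 printed by run.sh is presumably the shell exit status of the icinga2 child. For a process killed by a signal, the shell reports 128 plus the signal number, so 139 - 128 = 11, which is SIGSEGV on Linux and matches the "Segmentation fault" message. A small sketch of that arithmetic (the function name signal_from_status is hypothetical):

```shell
#!/bin/sh
# Sketch: translate a shell exit status into a signal name when the
# status encodes "terminated by signal" (128 + signal number).
signal_from_status() {
  status="$1"
  if [ "$status" -gt 128 ]; then
    # kill -l N prints the name of signal N without the SIG prefix
    kill -l "$((status - 128))"
  else
    echo "exited normally with status $status"
  fi
}

signal_from_status 139
```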

(Carsten Köbke) #10

Did you create an API user? Tried to connect via curl with the API user credentials? Firewall? Is the port open on the server? Can the server name be resolved?
Forget it, I didn’t read everything, just the line with the failed connect :frowning:


(Bodo Schulz) #11

Yes, an API user is present and functional.
No firewall between these 2 Docker containers.
All required ports are open and available.
And the server name can be resolved, yes.
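
A manual check of this kind could look roughly like the sketch below. It is not the request the Director sends: the helper name icinga2_api_status and the credentials in the usage line are hypothetical, the host name and port 5665 come from the error message above, and /v1/status is the status endpoint of the Icinga 2 REST API. curl exits with code 7 when it cannot connect to the host, which corresponds to "Connection refused".

```shell
#!/bin/sh
# Sketch: query the Icinga 2 API status endpoint with basic auth.
# Arguments: host, API user, API password (all supplied by the caller).
icinga2_api_status() {
  host="$1"
  user="$2"
  pass="$3"
  # -k  skip certificate verification (acceptable for a quick test only)
  # -s  no progress meter, -S still print errors
  # -u  HTTP basic auth credentials
  curl -k -s -S -u "$user:$pass" \
       -H 'Accept: application/json' \
       "https://$host:5665/v1/status"
}

# Usage (credentials are placeholders):
#   icinga2_api_status icinga2-master.matrix.lan root icinga; echo "curl exit code: $?"
```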


(Carsten Köbke) #12

Maybe the OpenSSL lib is causing trouble?


(Bodo Schulz) #13

No.
All other connections are fine.
I think it is possibly a networking problem (e.g. packet sizing).


(Carsten Köbke) #14

Wireshark is your friend :slight_smile:


(Michael Friedrich) #15

That sounds really weird. Why is that config validation triggered from run.sh when Icinga 2 already is running from the logs above?


(Bodo Schulz) #16

You mean the --config /etc/icinga2/icinga2.conf part?
I will remove it (possibly a copy/paste or layer 8 error) - but this works well :wink:


(Bodo Schulz) #17

So, I replaced the wrong CLI parameters with /usr/sbin/icinga2 daemon --log-level debug
and started the provisioning again.
The log files from the master are too big, so I will create a gist for them:


The log files from icingaweb2 are smaller, but I put them in the same gist.


(Michael Friedrich) #18

No, I mean the log file lines one by one. It doesn’t make sense to me that at first glance Icinga 2 is running, then it says there’s an API connection, and for some reason an external caller runs run.sh which then crashes.


(Bodo Schulz) #19

run.sh starts the Icinga process.

When Icinga dies, we can see the parent that started the process.


(Michael Friedrich) #20

If this is reproducible, a crash log would help. Alternatively, exec into the container, start icinga2 in the foreground with gdb, run it, trigger the crash, and then create a full backtrace.