LMD is reporting i/o timeout in lmd.log

icinga2
(Florin Tiucra Popa) #1

Hi all,

After upgrading to OMD 2.70-labs-edition, I have noticed strange messages in lmd.log, and Thruk is also displaying an internal server error.

In lmd.log we see something like:

[2018-12-27 13:57:42][Warn][response.go:432] write error: write unix //tmp/thruk/lmd/live.sock->@: i/o timeout
[2018-12-27 13:57:42][Info][listener.go:134] incoming services request from @ to //tmp/thruk/lmd/live.sock finished in 23.600012618s, size: 93268.943 kB
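
To spot which requests are large or slow, the "finished in" lines can be pulled out of lmd.log with a small script. The following is only a sketch: the regular expression is modeled on the single Info line quoted above, so other LMD versions or durations not printed in plain seconds (for example milliseconds) would need a different pattern, and the function name is made up for illustration.

```python
import re

# Pattern modeled on the one "finished in" line quoted above (assumption:
# the duration is printed in plain seconds, e.g. "23.600012618s").
REQUEST_RE = re.compile(
    r"incoming (?P<table>\S+) request from .* finished in "
    r"(?P<seconds>[\d.]+)s, size: (?P<size_kb>[\d.]+) kB"
)


def parse_request_line(line):
    """Return (table, seconds, size_kb) for a matching log line, else None."""
    match = REQUEST_RE.search(line)
    if match is None:
        return None
    return match["table"], float(match["seconds"]), float(match["size_kb"])


sample = (
    "[2018-12-27 13:57:42][Info][listener.go:134] incoming services request "
    "from @ to //tmp/thruk/lmd/live.sock finished in 23.600012618s, "
    "size: 93268.943 kB"
)
table, seconds, size_kb = parse_request_line(sample)
# Report table, response size, duration and the resulting average throughput.
print(f"{table}: {size_kb:.0f} kB in {seconds:.1f}s ({size_kb / seconds:.0f} kB/s)")
```

Running this over the whole log and sorting by size would show whether only the services table produces responses of this size.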

On the Thruk page we see something like:


message $VAR1 = 'reading header from socket failed, check your livestatus logfile: ';
at //sites/<masked_user>/share/thruk/lib/Monitoring/Livestatus/Class/Lite.pm line 386
at //sites/<masked_user>/share/thruk/lib/Monitoring/Livestatus/Class/Lite.pm line 386
at //sites/<masked_user>/share/thruk/lib/Thruk.pm line 326
Thruk::_dispatcher('HASH(0x2215868)') called at //sites/<masked_user>/lib/perl5/lib/perl5/Plack/Util.pm line 145
eval {...} called at //sites/<masked_user>/lib/perl5/lib/perl5/Plack/Util.pm line 145
Plack::Util::run_app('CODE(0x12c43f8)', 'HASH(0x2215868)') called at //sites/<masked_user>/lib/perl5/lib/perl5/Plack/Handler/FCGI.pm line 145
Plack::Handler::FCGI::run('Plack::Handler::FCGI=HASH(0x792e88)', 'CODE(0x12c43f8)') called at //sites/<masked_user>/share/thruk/script/thruk_fastcgi.pl line 27

LMD version used:

~/etc/thruk$ lmd -vvv --version
lmd - version 1.3.0 (Build: 2.70-labs-edition_())

OMD version used:

~/etc/thruk$ omd version
OMD - Open Monitoring Distribution Version 2.70-labs-edition

LMD.ini

~/etc/thruk$ grep -v '^#' lmd.ini

LogLevel = "Debug"
StaleBackendTimeout = 900
UpdateInterval = 60
FullUpdateInterval = 600
IdleTimeout = 0
IdleInterval = 0
NetTimeout = 180

Thruk_local.config:

connection_pool_size = 30
perf_bar_mode = off
<apache_status>
Site http://127.0.0.1:5000/server-status
System http://127.0.0.1/server-status
</apache_status>


name = <masked_user>
id = <masked_id>
type = livestatus

peer = localhost:<masked_live_port>


I am fully aware that this is not the current stable version, but I will not be able to move to OMD 2.90-labs-edition in the near future.

What can I do to find the root cause of the i/o timeout when writing to the socket file?
Is there a timeout option, or a way to dig deeper?

Thank you in advance,
Florin
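
One way to dig deeper is to send a query to the socket by hand and measure how many bytes come back, for example comparing a narrow, limited query against the full services request. This is a rough sketch and not an official tool: `build_query` and `query_size` are hypothetical helpers, the headers used (`Columns`, `Limit`, `OutputFormat`) are standard Livestatus query headers, and it assumes the server closes the connection once the response is complete.

```python
import socket


def build_query(table, columns, limit=None):
    """Build a Livestatus query string; the blank line at the end terminates it."""
    lines = [f"GET {table}", "Columns: " + " ".join(columns)]
    if limit is not None:
        lines.append(f"Limit: {limit}")
    lines.append("OutputFormat: json")
    return "\n".join(lines) + "\n\n"


def query_size(socket_path, query, timeout=30.0):
    """Send the query to a unix socket and return the number of bytes received."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(query.encode("utf-8"))
        # Signal end of request; assumes the server then answers and closes.
        sock.shutdown(socket.SHUT_WR)
        total = 0
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break
            total += len(chunk)
        return total


# A narrow query: three columns, at most 100 rows.
query = build_query("services", ["host_name", "description", "state"], limit=100)
# Example call (not executed here), using the socket path shown in lmd.log:
# print(query_size("/tmp/thruk/lmd/live.sock", query))
```

If the narrow query returns quickly and the unrestricted one does not, the problem is the response volume rather than the socket itself.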

(Sven Nierlein) #2

The response size is close to 100 MB. Livestatus caps the result at 100 MB by default, but it just closes the data stream, which makes the JSON data invalid.
But what prevents you from updating to the latest release?
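
Why a stream that is simply cut off breaks the consumer can be shown with a few lines of Python. This is only an illustration with made-up rows and a hypothetical helper name; it does not reproduce the actual size limit, just the effect of losing the tail of a JSON document.

```python
import json

# Made-up sample rows in the shape of a services result.
rows = [["host1", "PING", 0], ["host2", "HTTP", 2]]
full = json.dumps(rows)
# Simulate a data stream that was closed before the last bytes were sent.
truncated = full[: len(full) - 10]


def is_complete_json(payload):
    """Return True if the payload parses as JSON, False if it is cut off or invalid."""
    try:
        json.loads(payload)
    except json.JSONDecodeError:
        return False
    return True


print(is_complete_json(full), is_complete_json(truncated))
```

A client that receives the truncated form cannot recover the rows that did arrive without special handling, which matches the failure seen on the consuming side.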

(Florin Tiucra Popa) #3

Hi Sven,

This is our last milestone before deploying OMD-2.70 to production. We recently jumped from 2.40 to 2.70, with a couple of issues (already solved), but only on QA and DEV.

Production is still running OK with OMD-2.40.
Unfortunately, for these production systems we cannot catch up to the latest release.
We need first to roll out OMD-2.70 on all our monitoring systems.

Is there any way to increase the maximum response size or otherwise tweak it?

Many thanks,
Florin

(Sven Nierlein) #4

Which core do you use? The maximum response size is a configuration option for the livestatus module.

(Florin Tiucra Popa) #5

Hi again,

I am using ICINGA2 core with THRUK + LMD + PNP4NAGIOS.

cat /etc/SuSE-release
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 4

ldd livestatus.o
ldd /usr/omd/versions/2.70-labs-edition/lib/mk-livestatus/livestatus.o
linux-vdso.so.1 => (0x00007ffdf75f7000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6dfbe9c000)
libm.so.6 => /lib64/libm.so.6 (0x00007f6dfbc23000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f6dfba0c000)
libc.so.6 => /lib64/libc.so.6 (0x00007f6dfb690000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6dfc380000)

omd config show
ADMIN_MAIL: <masked_user>
APACHE_MODE: own
APACHE_TCP_ADDR: 127.0.0.1
APACHE_TCP_PORT: 5001
AUTOSTART: on
CORE: icinga2
CRONTAB: on
DATASCRYER: off
DEFAULT_GUI: welcome
DOKUWIKI_AUTH: off
DOWNTIMEAPI: off
GRAFANA: off
INFLUXDB: off
LIVESTATUS_TCP: off
MKEVENTD: off
MOD_GEARMAN: off
MULTISITE_AUTHORISATION: off
MULTISITE_COOKIE_AUTH: off
MYSQL: off
NAGFLUX: off
NAGVIS_URLS: thruk
NSCA: off
PNP4NAGIOS: on
PROMETHEUS: off
SNMPTRAPD: off
THRUK_COOKIE_AUTH: off
TMPFS: on

(Sven Nierlein) #6

If you cannot update OMD for some reason, you could try replacing bin/lmd with a newer release. This is not really the recommended way, but since it's a single binary it has no side effects.
Btw, livestatus.o is not related to icinga2; they have their own implementation, so I have no idea whether the result limit applies to icinga2 as well.

(Florin Tiucra Popa) #7

Hi Sven,

Many thanks for the help and for the out-of-the-box workaround.

I am not sure how hard it will be to implement this at all 3 layers (DEV/QA/PROD); it may not pass our compliance rules. Apart from that, is there any other parameter or switch we could change in the shell or in the config files?

Thank you,
Florin