Backend icinga2 is not running, but everything works

  • Hi,


    I've got a strange problem that I hope someone can help me with.


    I'm running icinga2 (2.6.3-1), icingaweb2 (2.4.1-1), and icinga2-ido-pgsql (2.6.3-1), all from the official Icinga repos, on Ubuntu 14.04.5 LTS.

    I'm deploying identical setups in three environments, all completely automated via Ansible.


    Now the issue is that in two of the three environments, IcingaWeb2 -> System Monitoring -> Health shows the error "Backend icinga2 is not running". However, it is running and working perfectly: icingaweb2 is able to query the database and show all the statuses, notifications, users, etc. The error isn't causing any real problems, it's just bugging the hell out of me.


    Cheers,

    Andy

  • Hi,


    Can you please have a look in /var/log/icinga2/icinga2.log and search for something like this:


    [2017-05-11 10:18:37 +0200] information/IdoMysqlConnection: Query queue items: 0, query rate: 1683/s (100980/min 510647/5min 1533807/15min);


    I get this problem when the number of query queue items keeps increasing, which means there are more queries than the MySQL server can handle.

  • Hi,


    I've had a look and the query rate seems pretty low:


    [2017-05-11 09:46:40 +0100] information/IdoPgsqlConnection: Query queue items: 0, query rate: 4.21667/s (253/min 1266/5min 3794/15min);
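
To watch the queue over time rather than eyeballing single lines, something like the following could help. It is only a sketch: the function names and the threshold of 1000 are made up for illustration, and it assumes the log lines keep the "Query queue items: N," format shown above.

```shell
#!/bin/sh
# Print the "Query queue items" count from every IDO status line on stdin.
# Matches both IdoMysqlConnection and IdoPgsqlConnection lines, because it
# only looks at the "Query queue items: N," part of the message.
ido_queue_items() {
    sed -n 's/.*Query queue items: \([0-9][0-9]*\),.*/\1/p'
}

# Succeed if the most recent count on stdin is at or below the limit.
# The default limit of 1000 is an arbitrary illustration value.
ido_queue_ok() {
    queue_limit=${1:-1000}
    queue_last=$(ido_queue_items | tail -n 1)
    [ -n "$queue_last" ] && [ "$queue_last" -le "$queue_limit" ]
}
```

For example, `ido_queue_items < /var/log/icinga2/icinga2.log | tail -n 5` would print the last five counts. A number that keeps growing would match the overloaded-database case described above, while a steady 0 would point elsewhere.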

  • Could you please post any modifications to your features-enabled/ido-pgsql.conf?


    Further, please add the output of the programstatus query shown here: https://docs.icinga.com/icinga…r/icinga2-features#db-ido

  • The ido-pgsql.conf is as configured by the ido app.


    Pointing me at that query has, however, unearthed the cause: the data type of the column in Postgres differs between the working and non-working instances, as shown below.


    Working instance

    status_update_time timestamp without time zone


    Instance showing the error

    status_update_time timestamp with time zone


    Predictably, the instance showing the error returns nothing, whilst the working instance returns a value.


    As the working version is downstream, is it possible it got built with an older schema that didn't include timezones, and the newer instances do, and are failing?
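
For anyone wanting to run the same comparison across several environments, the column type can be read from the PostgreSQL catalog. This is a rough sketch: the table name `icinga_programstatus`, the `IDO_USER`/`IDO_DB` variables and their `icinga` defaults are assumptions for illustration, so adjust them to whatever your ido-pgsql.conf uses.

```shell
#!/bin/sh
# Build the catalog query that reports a column's data type.
# The default table name icinga_programstatus is an assumption about the
# IDO schema; the column name comes from the output posted above.
column_type_sql() {
    ido_table=${1:-icinga_programstatus}
    ido_column=${2:-status_update_time}
    printf "SELECT data_type FROM information_schema.columns WHERE table_name = '%s' AND column_name = '%s';\n" \
        "$ido_table" "$ido_column"
}

# Run the query through psql (-A unaligned, -t tuples only).
# IDO_USER and IDO_DB are hypothetical placeholders, not Icinga settings.
show_column_type() {
    column_type_sql "$@" | psql -At -U "${IDO_USER:-icinga}" -d "${IDO_DB:-icinga}"
}
```

Running `show_column_type` on each host should print either `timestamp without time zone` or `timestamp with time zone`, which makes the difference easy to spot from an Ansible ad-hoc command.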

  • We've got a winner :) The change was introduced in the 2.6 schema update, and as such Icinga Web 2 needs 2.4.1 (2.4.0 has a bug which also deals with timezone issues). In other words, timestamp without time zone is correct, as all timestamps are stored in UTC now. The application itself manages the time zone in the presentation layer (e.g. a check executed in the US is displayed in your local browser's time zone).

  • I am actually having the same issue with the MySQL IDO. Is this because I commented out 'include recursive conf.d' in the icinga2.conf file? I have checked the versions of icinga2 and ido and they are both the latest from debmon (I am using Debian).

  • Just had a play for a few minutes and I think this is correct behaviour. Probably mon02's icinga2 started before mon01's. I stopped the icinga2 service on mon02 and now icingaweb2 shows mon01's icinga2 on both.

  • Sorry for the delayed reply. I've manually applied the database upgrade scripts to the other two environments and they are now working and not showing this error. I was under the impression that the icinga2-ido-pgsql app handled all the database upgrades and kept the schema in check?

  • Sorry for the delayed reply. I've manually applied the database upgrade scripts to the other two environments and they are now working and not showing this error. I was under the impression that the icinga2-ido-pgsql app handled all the database upgrades and kept the schema in check?

    Hi,


    Thanks for your info. I am using MySQL, but there are upgrade scripts for it too, so maybe I can give it a crack. However, I installed the latest version from the debmon repository.


    root@RCSTMON01:~# icinga2 --version
    icinga2 - The Icinga 2 network monitoring daemon (version: r2.6.3-1)

    Copyright (c) 2012-2017 Icinga Development Team (https://www.icinga.com/)
    License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.

    Application information:
      Installation root: /usr
      Sysconf directory: /etc
      Run directory: /run
      Local state directory: /var
      Package data directory: /usr/share/icinga2
      State path: /var/lib/icinga2/icinga2.state
      Modified attributes path: /var/lib/icinga2/modified-attributes.conf
      Objects path: /var/cache/icinga2/icinga2.debug
      Vars path: /var/cache/icinga2/icinga2.vars
      PID path: /run/icinga2/icinga2.pid

    System information:
      Platform: Debian GNU/Linux
      Platform version: 8 (jessie)
      Kernel: Linux
      Kernel version: 3.16.0-4-amd64
      Architecture: x86_64

    Build information:
      Compiler: GNU 4.9.2
      Build host: smithers

    root@RCSTMON01:~# dpkg --list | grep ido
    ii icinga2-ido-mysql 2.6.3-1~debmon8+1 amd64 host and network monitoring system - MySQL support


    Do I still have to run the database upgrade script?

  • Hey all,


    I know this is an older thread but I didn't want to create a new one because there are so many like this one.


    I'm having the same problem with the web frontend showing "Backend icinga2 is not running" but in the backend, I see that the icinga2 service is running fine. When I try to reload the service, the error goes away for about 2-3 minutes and then comes back.


    The error stems from having to migrate my Icinga VM from one vCenter to another.


    At one point, when I ran service icinga2 status (I'm running CentOS 7), the output included the line

    "Supervising process <number> which is not our child. We'll most likely not notice when it exits." I found some help on another thread that said to just kill <processnumber>, and that seemed to get rid of the message, but it did not fix my overall issue.


    The error.log is empty and icinga.log did not show anything of use to my knowledge (low query rates like the ones posted above me in this thread).


    I ran mysql schema updates and tried updating icinga2, icinga2-ido-mysql, mysql anyway. That didn't help.


    The only thing that I think that might be going on is that the CPU is being overloaded from all the checks since I added a bunch of new ones, but I removed them just to see if that was the root cause and I STILL have the issue.


    I have tried analyzing this with htop (like top, but with more information), and it does seem to show a huge spike after reloading the process. I've attached an image of the output.


    If I need to, I'll make my own thread, but I'd love some input on this asap.


    Thanks in advance!!


    EDIT 1:

    I think this is related to some kind of process that needs to be running in order for Icinga2 to work, because I had to shut down the Icinga server before migrating it to another vCenter. Since I had to shut it down, some process that was not configured to start automatically on boot may now be missing for Icinga. Any ideas?
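
One way to test both the "not our child" message and the missing-process theory is to check whether the PID recorded in the PID file still belongs to a live process. This is only a sketch: `pidfile_alive` is a made-up helper name, and `/run/icinga2/icinga2.pid` is the PID path reported by `icinga2 --version` on the Debian host earlier in this thread, so it may differ on CentOS 7.

```shell
#!/bin/sh
# Succeed if the PID stored in the given file belongs to a live process.
# Run as root (or as the owner of the process): "kill -0" sends no signal,
# it only tests whether the process exists, and it fails without permission.
pidfile_alive() {
    pid_file=$1
    [ -r "$pid_file" ] || return 1
    pid_value=$(cat "$pid_file")
    # Reject empty or non-numeric content before handing it to kill.
    case $pid_value in
        ''|*[!0-9]*) return 1 ;;
    esac
    kill -0 "$pid_value" 2>/dev/null
}
```

For example, `pidfile_alive /run/icinga2/icinga2.pid && echo alive || echo stale` shows whether the file points at a running daemon. For the boot question, `systemctl is-enabled icinga2` reports whether the unit starts automatically; the same check applies to the database unit, whose name depends on the installation.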

  • Hi Watermelon,


    please post the output of


    Code
    cat /etc/os-release

    Code
    icinga2 feature list

    Code
    icinga2 --version


    Replace USERNAME, PASSWORD and ICINGADBNAME with the ones you used in /etc/icinga2/features-enabled/ido-mysql.conf

    Code
    mysql -h localhost -u USERNAME --password=PASSWORD -e "show databases; use ICINGADBNAME; show tables ; select * from icinga_dbversion"


    Code
    systemctl stop icinga2 && rm -f /var/log/icinga2/icinga2.log && systemctl start icinga2 && sleep 10 && fgrep -i ido /var/log/icinga2/icinga2.log

  • Hey Mikesch,


    Here's what I have:



    Code: icinga2 feature list
    icinga2 feature list
    Disabled features: compatlog debuglog graphite livestatus opentsdb statusdata syslog
    Enabled features: api checker command gelf ido-mysql influxdb mainlog notification perfdata




    Code: systemctl stop icinga2 && rm -f /var/log/icinga2/icinga2.log && systemctl start icinga2 && sleep 10 && fgrep -i ido /var/log/icinga2/icinga2.log
    systemctl stop icinga2 && rm -f /var/log/icinga2/icinga2.log && systemctl start icinga2 && sleep 10 && fgrep -i ido /var/log/icinga2/icinga2.log
    [2017-07-24 10:30:53 -1000] information/DbConnection: Resuming IDO connection: ido-mysql
    [2017-07-24 10:30:53 -1000] information/IdoMysqlConnection: MySQL IDO instance id: 1 (schema version: '1.14.2')
    [2017-07-24 10:30:53 -1000] information/IdoMysqlConnection: Finished reconnecting to MySQL IDO database in 0.316027 second(s).


    After the last command, everything seems to start up okay... but that doesn't quite make sense to me. Could you explain? It might also be because I let the server sit for 5 days without any activity.


    Thanks!
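
The schema version in that log output is worth comparing against what the installed icinga2-ido-mysql package expects, since a schema mismatch was the cause earlier in this thread. Below is a small sketch for pulling the version out of the log and comparing it. Both function names are hypothetical, the required version has to come from your package's upgrade notes, and `sort -V` assumes GNU coreutils.

```shell
#!/bin/sh
# Print the schema version from IDO connection lines on stdin, based on
# the "(schema version: 'X.Y.Z')" format shown in the log output above.
ido_schema_version() {
    sed -n "s/.*IDO instance id: [0-9][0-9]* (schema version: '\([^']*\)').*/\1/p"
}

# Succeed if version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `fgrep -i ido /var/log/icinga2/icinga2.log | ido_schema_version | tail -n 1` prints the version currently in the database, and `version_at_least "$have" "$want"` tells you whether an upgrade script still needs to be applied.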

  • Sorry for the late response.


    I still have not solved the problem. Last time, that command Mikesch gave me to run seemed to work:


    systemctl stop icinga2 && rm -f /var/log/icinga2/icinga2.log && systemctl start icinga2 && sleep 10 && fgrep -i ido /var/log/icinga2/icinga2.log


    However, it didn't last. A couple of weeks later I'm having the same problem as before. This time, I ran @dnsmichi's programstatus query from here and found that while the web interface shows that the backend is not running, the query gives no output. During the few minutes right after I restart the service, when the backend does show as running, the query does give output. What exactly do I do with this information?


    As a last resort, I reinstalled the icinga2-ido-mysql and icingaweb2 packages and also applied the latest schema upgrade, but that didn't seem to help either.
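
An empty programstatus result usually means the stored status_update_time is not considered recent. The check can be reproduced by hand to see how far off the timestamp is. This is a sketch under assumptions: the 60-second window is recalled from the Icinga documentation query rather than taken from this thread, and `status_is_fresh` is a made-up helper.

```shell
#!/bin/sh
# Succeed if the status timestamp lies within the freshness window.
#   $1  status_update_time as a Unix epoch
#   $2  current time as a Unix epoch (defaults to the local clock)
#   $3  window in seconds (60 is an assumption, see the note above)
status_is_fresh() {
    status_epoch=$1
    now_epoch=${2:-$(date +%s)}
    fresh_window=${3:-60}
    [ $((now_epoch - status_epoch)) -lt "$fresh_window" ]
}
```

The epoch value could come from something like `mysql -N -e "SELECT UNIX_TIMESTAMP(status_update_time) FROM icinga_programstatus" ICINGADBNAME`, where the table name is again an assumption about the IDO schema. If the difference to `date +%s` is a whole number of hours, a time zone mismatch between Icinga 2 and the database is a likely suspect. If it simply keeps growing, Icinga 2 has stopped writing status updates.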

  • Hey Mikesch,


    I appreciate the offer for the private session, but I would really rather not have to resort to that and also I would like to have all possible solutions/discussion on the forums for others to see. Isn't there something else I can do on my end?


    Thanks.