Posts by andyb2000

This forum was archived to /woltlab and is now in read-only mode.

    SOLVED!

    Sorry, just after posting this I went back to checking the basics and looked at the files themselves. Some of them didn't have world read set and didn't have the webserver group assigned, so the webserver couldn't access them (I'm not sure why some had the right permissions and some didn't). As a quick test I ran:

    Code
    chmod 664 /var/log/icinga2/compat/archives/*.log

    Sure enough, the web interface (classic-ui) took longer to process (this should have been my clue, as it had been returning far too quickly) and came back with valid data, with nothing in "Insufficient Data".


    I've resolved this by adding the web server user to the "adm" group, which should prevent this in future.
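
    For anyone wanting to check the same thing before changing permissions, here is a minimal Python sketch that lists files lacking the world-read bit. It runs against two temporary files standing in for the archive logs; the file names and the helper name `unreadable_by_others` are made up for illustration. It only checks world read, so a web server that gets access through group membership (the "adm" fix above) would need the group bit checked instead.

```python
import os
import stat
import tempfile


def unreadable_by_others(paths):
    """Return the paths whose mode lacks the world-read bit (S_IROTH)."""
    return [p for p in paths if not os.stat(p).st_mode & stat.S_IROTH]


# Demonstration on two temporary files standing in for archive logs.
with tempfile.TemporaryDirectory() as tmp:
    open_log = os.path.join(tmp, "open.log")      # hypothetical name
    closed_log = os.path.join(tmp, "closed.log")  # hypothetical name
    for path in (open_log, closed_log):
        with open(path, "w") as handle:
            handle.write("sample\n")
    os.chmod(open_log, 0o664)    # rw-rw-r--, as after the chmod above
    os.chmod(closed_log, 0o660)  # rw-rw----, as in the failing files
    flagged = [os.path.basename(p)
               for p in unreadable_by_others([open_log, closed_log])]
    print(flagged)
```

    Pointing the helper at the real archive directory (for example via `glob.glob`) would give the list of files the web server may be skipping.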

    Still no luck. After upgrading to r2.6.3-1 and letting it run, I'm still getting "Insufficient Data" on hosts and services when I query the classic-ui for data.


    The data is in the log files, but there seem to be discrepancies in the filenames, and I don't know whether that is causing the issue.


    Can anyone give any pointers on what to check next or how to diagnose the problem? Lacking our availability reports is going to cause us major pain here.


    Thank you.

    Andy

    OK, I switched to DAILY in my compatlog.conf but something still doesn't look right:


    Code
    -rw-rw---- 1 nagios adm 13568195 May 4 23:59 icinga-05-05-2017-00.log
    -rw-rw---- 1 nagios adm 23710996 May 5 23:59 icinga-05-05-2017-23.log
    -rw-rw---- 1 nagios adm 23477303 May 6 23:59 icinga-05-07-2017-00.log
    -rw-rw---- 1 nagios adm 45980766 May 8 23:59 icinga-05-08-2017-23.log
    -rw-rw---- 1 nagios adm 26160246 May 9 23:59 icinga-05-10-2017-00.log
    -rw-rw---- 1 nagios adm 23935665 May 10 23:59 icinga-05-11-2017-00.log


    Something happened to the 7th for some reason, and why did it rotate randomly on the 5th and 8th?
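
    To make the naming gaps easier to see, here is a rough Python sketch that reads the date out of each archive file name and reports which calendar dates never appear. It assumes the `icinga-MM-DD-YYYY-HH.log` pattern seen in the listing; the function names are made up for illustration. It is only a diagnostic for the names and says nothing about whether the data inside the files is complete.

```python
import re
from datetime import date, timedelta

# Pattern assumed from the listing above: icinga-MM-DD-YYYY-HH.log
NAME_RE = re.compile(r"icinga-(\d{2})-(\d{2})-(\d{4})-(\d{2})\.log$")


def dates_in_names(names):
    """Collect the calendar date encoded in each archive file name."""
    found = set()
    for name in names:
        m = NAME_RE.search(name)
        if m:
            month, day, year, _hour = map(int, m.groups())
            found.add(date(year, month, day))
    return found


def missing_dates(names):
    """Return dates between the first and last name that have no file."""
    found = dates_in_names(names)
    if not found:
        return []
    day, last = min(found), max(found)
    missing = []
    while day <= last:
        if day not in found:
            missing.append(day)
        day += timedelta(days=1)
    return missing


names = [
    "icinga-05-05-2017-00.log",
    "icinga-05-05-2017-23.log",
    "icinga-05-07-2017-00.log",
    "icinga-05-08-2017-23.log",
    "icinga-05-10-2017-00.log",
    "icinga-05-11-2017-00.log",
]
gaps = missing_dates(names)
print([d.isoformat() for d in gaps])
```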

    At the end of the 5th log file (icinga-05-05-2017-23.log):


    Code
    [Fri May 5 23:59:59 2017] EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;xxx.xxx.com;0;PING OK - Packet loss = 0%, RTA = 8.44 ms

    But it continues correctly into the file icinga-05-07-2017-00.log:


    Code
    [Sat May 6 00:00:00 2017] CURRENT HOST STATE: xx.xx.xx.xx;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 7.45 ms


    I'm still getting the same errors in the classic-ui when I go into availability though:


    Undetermined   r2.6.2-1 Not Running   0d 0h 0m 0s
                   Insufficient Data      6d 0h 0m 0s   81.146%
                   Total                  6d 0h 0m 0s   81.146%

    Thanks, yes, good idea. I've changed that and reloaded. That also made me look at the archives, and something isn't right:


    Code
    -rw-rw---- 1 nagios adm 43831626 Apr 30 19:59 icinga-04-30-2017-19.log
    -rw-rw---- 1 nagios adm 11948162 May 1 07:59 icinga-05-01-2017-07.log
    -rw-rw---- 1 nagios adm 22772170 May 2 07:59 icinga-05-02-2017-07.log
    -rw-rw---- 1 nagios adm 5829541 May 2 09:59 icinga-05-02-2017-09.log
    -rw-rw---- 1 nagios adm 4024045 May 2 12:59 icinga-05-02-2017-12.log
    -rw-r--r-- 1 nagios adm 43553027 May 4 10:59 icinga-05-04-2017-10.log
    -rw-rw---- 1 nagios adm 2909711 May 4 11:59 icinga-05-04-2017-12.log


    I'd expect to see a file every hour, but I'm seeing quite large gaps which could explain the issue.

    The content also looks suspect:

    icinga-05-04-2017-10.log

    Code
    [Tue May 2 13:00:00 2017] LOG ROTATION: HOURLY
    [Tue May 2 13:00:00 2017] LOG VERSION: 2.0
    [Tue May 2 13:00:00 2017] CURRENT HOST STATE: xx.xx.xx.xx;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 8.95 ms
    [Tue May 2 13:00:00 2017] CURRENT HOST STATE: yy.yy.yy.yy;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 22.68 ms


    So the file with the timestamp of May 4 10:59 contains data from May 2 at 13:00, so something is not right!
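
    The same comparison can be sketched in Python: parse the bracketed timestamp on a file's first line and compare it with the time encoded in the file name. This assumes the human-readable timestamp format shown in the excerpt and an English locale for the day and month names; the function names are made up for illustration.

```python
import re
from datetime import datetime

STAMP_RE = re.compile(r"^\[([^\]]+)\]")
# Pattern assumed from the listing above: icinga-MM-DD-YYYY-HH.log
NAME_RE = re.compile(r"icinga-(\d{2})-(\d{2})-(\d{4})-(\d{2})\.log$")


def entry_time(line):
    """Parse the bracketed timestamp at the start of a log line."""
    m = STAMP_RE.match(line)
    if m is None:
        raise ValueError("no bracketed timestamp in line: %r" % line)
    return datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")


def name_time(filename):
    """Return the date and hour encoded in an archive file name."""
    m = NAME_RE.search(filename)
    if m is None:
        raise ValueError("unexpected file name: %r" % filename)
    month, day, year, hour = map(int, m.groups())
    return datetime(year, month, day, hour)


def hours_before_name(filename, first_line):
    """Hours between the first entry and the time in the file name."""
    delta = name_time(filename) - entry_time(first_line)
    return delta.total_seconds() / 3600


first_line = "[Tue May 2 13:00:00 2017] LOG ROTATION: HOURLY"
span = hours_before_name("icinga-05-04-2017-10.log", first_line)
print(span)
```

    With HOURLY rotation one would expect roughly one hour here, so a much larger value flags a file that covers far more than its name suggests.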


    Any idea how or where to debug this in icinga2, since that's what generates the files?

    (We have quite a few passive checks coming in from a remote location, so there is constant data. It can't be that there was no activity taking place.)


    I've changed to daily rotation anyway, so I will watch over the next 24 hours to see if this resolves the issue.


    Thank you,


    Andy

    Just to add how we've implemented this since early Nagios/Icinga: we use gammu-smsd on the server with an SMS gateway plugged into it.

    In the gammu configuration file (/etc/gammu-smsdrc) we define the MySQL database connection for the daemon and a script to run on inbound sms:

    Code
    [smsd]
    service = SQL
    driver = native_mysql
    host = localhost
    user = sms_username
    password = sms_password
    database = sms_database
    checksecurity = 0
    RunOnReceive = /etc/nagios-scripts/sms_gammu_nagios_scanner.php

    The sms_gammu_nagios_scanner.php script connects to the MySQL database and checks for messages in the inbound table. It searches the reply for specific commands and then sends them to Icinga2 to carry out the ACK or DISABLE functions as needed. (We've used the older method of sending the raw command to /var/run/icinga2/cmd/icinga2.cmd, but are also moving most of these to the newer API.)
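
    As a rough illustration of the reply-to-command step (in Python rather than the PHP the real script uses), the sketch below maps a reply to a host and service and formats an external command line for the command pipe. The reply format "ACK <id>", the `sent` lookup table, the host/service names and both function names are assumptions made up for this example, not the actual script. The field order follows the ACKNOWLEDGE_SVC_PROBLEM external command (host;service;sticky;notify;persistent;author;comment).

```python
import time


def parse_reply(text, sent):
    """Map a reply such as 'ACK 1234' to the (host, service) it refers to.

    `sent` is a hypothetical dict of our own IDs, recorded when the SMS
    went out, mapping ID -> (host, service).
    """
    parts = text.strip().split()
    if len(parts) != 2 or parts[0].upper() != "ACK":
        return None
    return sent.get(parts[1])


def build_ack_command(host, service, author, comment, now=None):
    """Format an ACKNOWLEDGE_SVC_PROBLEM external command line."""
    timestamp = int(time.time() if now is None else now)
    # Fields: host;service;sticky;notify;persistent;author;comment
    fields = [host, service, "2", "1", "0", author, comment]
    return "[%d] ACKNOWLEDGE_SVC_PROBLEM;%s" % (timestamp, ";".join(fields))


sent = {"1234": ("myhost.net", "ping4")}  # hypothetical lookup table
target = parse_reply("ack 1234", sent)
line = build_ack_command(target[0], target[1], "sms",
                         "Acknowledged by SMS", now=1480675182)
print(line)
```

    The resulting line, followed by a newline, is what would be written to the icinga2.cmd pipe mentioned above; the sketch stops short of writing it.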


    One little gotcha since moving to icinga2 is the lack of the old NOTIFICATIONID variable that we used to use to identify the SMS replies (See thread NotificationCommand env variables available for info)

    We solved this simply by creating our own ID in the script that sends the SMS out.


    Hope that helps. If you need any more info, please do send me a message.

    Hi,

    I'm running Icinga2 (r2.6.2-1) and mainly use the new dashboard. However, we need to provide availability reports periodically (to customers, for internal use, etc), so we still use the classicui to produce them, since availability reporting isn't in the new dashboard yet.


    However I've started to notice a problem appearing on the availability output, I'm getting a large amount of "undetermined" periods, and when looking closer at the details it shows this:



    So in the Undetermined section it shows "r2.6.2-1 Not Running Insufficient Data"


    Can anyone advise how to correct this problem? The compatlog is turned on; features-enabled contains a file with:

    Code
    library "compat"
    object CompatLogger "compatlog" {
      rotation_method = "HOURLY"
    }

    The logs are created and have content in them.


    Can anyone offer any ideas on what to check to get it working again?


    Thank you!

    Andy

    That's where the symlink appears for them.


    Ah, I just realised my error. I was under the assumption that each user had to enable modules as they wished. They don't: I've enabled the module I want (as an admin) and it's enabled for all users!


    Thank you PsiTrax, that gave me the nudge to think it through :-)

    Hi all,


    We've moved our users over to icingaweb2, but because of the reporting situation a few have to stay with the classicui for reporting purposes (availability reports, alert history for specific periods, etc). However, this doesn't appear to be working on our setup.


    Going to the icinga2-classicui, the normal areas work (Status, problems, system, etc) and find hosts, current states, etc. But when you try to view any history, either by viewing a host and clicking "View alert history for this host" or by using any of the items in the "Reporting" submenu, it always comes back with "No history information was found for this host in log files for selected date."


    It seems like it cannot access the historical information for some reason.
    The event log also shows completely blank.
    I've tried querying the CGIs directly from the shell and I can get basic information back, but again no historical information (and no errors).


    The icinga2-classicui cgi.cfg looks like this:


    "program_version": "r2.5.4-1",
    (This is running on Ubuntu 16.04.1 LTS all installed from packages from http://packages.icinga.org/ubuntu icinga-trusty)

    No errors are logged to the cgi-log-file when performing the queries and no errors in apache logs either.



    Anyone got any suggestions where I can look to figure out why it's not reporting on history please?


    Thank you,
    Andy

    Hm, perhaps that's part of the problem. The service has now been deleted, so querying icinga2 object list comes back blank for that service (even though it's still in the objects.cache file). Could that be the issue: the cache file isn't being 'cleaned' after a service is deleted through the API?


    Checking my submit, it sets max_check_attempts to 0, so should that be set to 1 (i.e. non 0)?
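
    A hedged Python sketch of what the adjusted request body might look like, with max_check_attempts set to 1 and the numeric and boolean attributes sent as JSON numbers and booleans rather than quoted strings. Whether a value of 1 actually satisfies the classicui is an untested assumption here, and the helper name `service_payload` is made up for illustration; the template and check command are the ones from the original curl call.

```python
import json


def service_payload(display_name, max_check_attempts=1):
    """Build the JSON body for creating the service via the API."""
    # Assumption: the classicui rejects 0, so require at least 1 here.
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts should be at least 1")
    return {
        "templates": ["linux-nonessential-server"],
        "attrs": {
            "check_command": "dummy",
            "display_name": display_name,
            "enable_active_checks": False,  # boolean, not the string "0"
            "max_check_attempts": max_check_attempts,  # number, not "0"
        },
    }


body = json.dumps(service_payload("SVC-ID1234"))
print(body)
```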

    Yes no problem:


    Good point on the floats, as it looks like they've been converted for the check_interval and retry_interval.

    Hi all,
    (versions, etc, at base of message)


    I think this one is a bug but wanted to ask around first before submitting a bug report on it (I found https://dev.icinga.com/issues/11618 but it's not quite the issue I have).


    I'm creating 'dynamic' services using the API for a specific host alert, e.g.


    Code
    /usr/bin/curl -k -s -u myuser:mypass -H 'Accept: application/json' -X PUT "https://localhost:5665/v1/objects/services/HOSTNAME!SVC-ID1234" -d '{ "templates" : [ "linux-nonessential-server" ] , "attrs" : { "check_command" : "dummy" , "display_name" : "$in_service", "enable_active_checks" : "0", "max_check_attempts" : "0" }}'


    Which creates the service, I then submit a passive check result and it works great. After the alert I send a passive check result of OK and then later on a reaper script deletes the service. This fits our purpose exactly.
    The new UI (icingaweb2) is fine with it, but the classicui and JSON/cgi status queries seem to fail, and the error I get logged from the cgi debug is:



    Code
    [1480675182] Error: Invalid max_check_attempts, check_interval, retry_interval, or notification_interval value for service 'SVC-ID1234' on host 'HOSTNAME'
    [1480675182] Error: Could not register service (config file '/var/cache/icinga2/objects.cache', starting on line 315660)

    Is this a bug that I should log with the bug tracker, or has anybody else found this and got a known workaround?


    Thanks in advance!
    Andy


    Icinga version:



    Icingaweb2:


    Code
    Version: 2.3.4+fix-1~ppa1604+1

    icinga2-classicui:



    Code
    Version: 2.5.4-1~ppa1~xenial1

    Thank you, that is excellent, I'm implementing it now, and for sure I owe you a few drinks!
    (I'm pleased to say the dashboard will be released open source once it's complete so will be available for others)


    That makes a lot more sense, as you say. I got the host checks working great, but I'm not sure my filter for hostgroup under the service query is working right. I'm using:



    And it always matches 0 results (when there are down services that would match the above statement).
    Using the Icinga Studio browser I've checked that under the host the groups array is there and contains the "networks" entry, and under the service the associated host is working correctly. Have I missed something on how to perform the match correctly? (If I remove the match it returns all the services that are not OK, so the query, class, etc are working great.)


    Thanks again.
    Andy

    Thanks again for replying dnsmichi, appreciate your time.


    Ahh, I see what you mean: use && so that the & is passed as part of the filter parameter rather than being treated as the URL parsing delimiter.


    The code I've written is very basic and uses php curl, it's below:


    So I can simply call it using
    $return=api_connector("/v1/objects/services?joins=host&filter=service.state!=ServiceOK");
    or similar.
    So I'm thinking using the original query string I could use:



    Code
    /v1/objects/services?joins=host&filter=service.state!=ServiceOK&&match("networks",host.groups)&&host.last_hard_state!=1&&host.last_check!=-1&&host.acknowledgement!=0

    If I'm understanding correctly? Or like you say change my method to POST?


    I'm looking at the curl examples using POST but can't see how to chain multiple queries along such as I'm doing above, can you point me in the right direction?
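
    One way the POST variant could look, sketched in Python with only the standard library: the whole chained expression goes into a single "filter" string in the JSON body, and the X-HTTP-Method-Override header tells the API to treat the POST as a GET query. The conditions are copied unchanged from the query string above. The helper name `build_query` is made up for illustration, and the request is only built, not sent, so authentication and certificate handling are left out.

```python
import json
import urllib.request


def build_query(base_url, filter_expr, joins):
    """Build (but do not send) a POST request carrying the filter in JSON."""
    body = json.dumps({"filter": filter_expr, "joins": joins}).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/v1/objects/services", data=body, method="POST")
    request.add_header("Accept", "application/json")
    request.add_header("Content-Type", "application/json")
    # Ask the API to handle this POST as a GET-style query.
    request.add_header("X-HTTP-Method-Override", "GET")
    return request


# All conditions live in one string, chained with && inside the filter.
filter_expr = (
    'service.state!=ServiceOK && match("networks",host.groups)'
    " && host.last_hard_state!=1 && host.last_check!=-1"
    " && host.acknowledgement!=0"
)
request = build_query("https://localhost:5665", filter_expr, ["host"])
print(request.get_method(), request.full_url)
print(json.loads(request.data)["filter"])
```

    Because the filter travels in the body, the & characters never pass through URL parsing at all, which sidesteps the escaping question.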


    Thank you,
    Andy

    Thanks,
    Yes, I've been tracking the bugs (I have one open at present on systemd start script issues), but I cannot see one where nothing is logged in icinga2.err about the issue listed in the crash log.


    I suspect the key part of this is the following error:
    libpthread.so.0: <unknown function> (+0x113e0) [0x2b9d9556b3e0]


    I'm trying to track down what caused it in the code so the devs can investigate, and that's where I need a bit more info on how to debug it further.


    Unfortunately this is on our live system :-(

    Me again!


    I've written a dashboard for our staff to use on large screens (unfortunately none of the available ones fitted what we needed exactly), and I'm using the Icinga2 API to make the queries and display them using PHP.


    The display I'm struggling with is this logic:
    Services that are not OK, whose host is in a specific group, whose host is not down, whose host has not been acknowledged.


    So the query I built looked like this:


    Code
    /v1/objects/services?joins=host&filter=service.state!=ServiceOK&match('networks',host.groups)&host.last_hard_state!=1&host.last_check!=-1&host.acknowledgement!=0


    Which looks correct for the conditions I'm after, but this seems to be returning services where the parent host is down but has been acknowledged, so the final filter I believe is the one failing me. Can anyone spot the bug in the query I'm trying to make please?
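
    A quick way to see what a URL parser makes of that query string is to split it with Python's standard library. This is only an illustration with a generic parser, on the assumption that the API splits parameters on & in a similar way; it does not call Icinga at all.

```python
from urllib.parse import parse_qsl

query = (
    "joins=host&filter=service.state!=ServiceOK"
    "&match('networks',host.groups)&host.last_hard_state!=1"
    "&host.last_check!=-1&host.acknowledgement!=0"
)

# Split on & the way a generic URL parser does, keeping valueless keys.
pairs = parse_qsl(query, keep_blank_values=True)
for name, value in pairs:
    print(repr(name), "=", repr(value))

# Only the first condition ends up in the "filter" parameter.
filter_value = dict(pairs)["filter"]
print(filter_value)
```

    Seen this way, everything after the first & becomes a separate, unrelated URL parameter, so only the service state condition would be applied. Separately, and as a guess rather than something confirmed in this thread: host.acknowledgement!=0 as written selects hosts that have been acknowledged, so for "has not been acknowledged" the comparison may need to be ==0.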


    Thank you.

    Hi folks,


    Strange one: I'm trying to track down why our icinga2 instance keeps dying on us. So far nothing has come up. Checking the config returns all OK, but we get random crashes, with crash output like this:

    I'm also not sure why the GDB "no such file or directory" message is appearing, as gdb is installed. Is there somewhere I need to define it?


    Thank you.

    Hi,


    I'm using an API query to request the services and want to filter on two parameters. The first, service not in OK state, works fine; the second needs to filter on the host group that the service's host belongs to.
    (Hosts have an array of groups attached to them. Services are then assigned to these hosts, so the service itself doesn't have a group attached)


    The host (I've cut out the other stuff) is defined as:

    Code
    object Host "myhost.net" {
      groups = [ "networks" ]
    }


    My base query (without host group) is:

    Code
    https://localhost:5665/v1/objects/services?filter=service.state!=ServiceOK


    Which outputs all services not in OK state. So that's fine.


    I then tried to expand this using this:


    Code
    https://localhost:5665/v1/objects/services?filter=service.state!=ServiceOK&host.group=networks

    And this still just returns all services, no filter on host.group applied.
    I've tried a few different ways of querying this:



    Code
    https://localhost:5665/v1/objects/services?filter=service.state!=ServiceOK&filter=host.group=networks


    I've tried these and others and cannot get this to work. Can anyone spot what I'm doing wrong here please?
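
    One possible direction, sketched in Python: put both conditions into a single filter expression joined with &&, test group membership against the host.groups array with the "in" operator, and percent-encode the whole expression so the & characters are not treated as URL parameter separators. The helper name `services_url` is made up for illustration, and the exact filter is a guess that has not been run against this setup.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def services_url(base_url, filter_expr):
    """Return a services query URL with the filter percent-encoded."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode({"filter": filter_expr}, quote_via=quote)
    return base_url + "/v1/objects/services?" + query


# Single expression: both conditions chained with && inside the filter.
filter_expr = 'service.state!=ServiceOK && "networks" in host.groups'
url = services_url("https://localhost:5665", filter_expr)
print(url)

# Decoding the URL again gives back exactly one filter parameter.
decoded = parse_qs(urlsplit(url).query)["filter"]
print(decoded)
```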