Posts by mobro

This forum was archived to /woltlab and is now in read-only mode.

    You were right - I removed the colon and then the script worked. The error in the Python script was caused by a wrong usage of the response object. For everyone who is interested, here is a code snippet that worked for me to create a host from a Python script using the Director REST API:
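
    A minimal sketch of what such a script can look like, using the requests library against the Director's director/host endpoint - the base URL, the credentials and the certificate handling are placeholders and have to be adapted:

    Code
    import requests

    # Placeholders - adapt to your icingaweb2 / Director installation
    ICINGAWEB_URL = "https://icingaweb.example.com/icingaweb2"
    API_USER = "director"
    API_PASSWORD = "secret"

    def create_host(hostname, template_import, ip_address):
        # Field names as used elsewhere in this thread; depending on the Director
        # version the template may need to be passed as "imports": [template_import]
        payload = {
            "object_name": hostname,
            "import": template_import,
            "address": ip_address,
        }
        response = requests.post(
            ICINGAWEB_URL + "/director/host",
            auth=(API_USER, API_PASSWORD),
            headers={"Accept": "application/json"},
            json=payload,   # requests serializes the dict and sets the Content-Type
            verify=False,   # adjust to your certificate setup
        )
        response.raise_for_status()
        return response.json()

    print(create_host("test", "generic-host", "127.0.0.1"))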


    Thanks for the fast and helpful support



    Data is missing in the current file. In the current file I found a check result dated 15:18, but no check result for 15:17. Interestingly, icinga2 on the master only seems to realise the connection loss of the satellite at 15:18 - maybe this is the cause of the behaviour: icinga2 thinks it still has a connection to the parent, even though it does not, and therefore does not store the data for a later replay. The slave log lag immediately jumps to 6 hours, 9 minutes - which is annoying and not true:



    Hello,


    I have an issue with the icinga2 Director module. When I try to use the REST API to create hosts, I receive the following error:


    I used the example code from https://github.com/Icinga/icin…master/doc/70-REST-API.md with the proposed director-curl script. I first made an implementation in Python where I used the json module to create the JSON string:


    Code
    prepped.body = json.dumps(
        OrderedDict([("object_name", hostname), ('import', template_import), ('address', ip_address)]),
        indent=2, separators=(',', ': '))

    resulting JSON:

    Code
    {
      "object_name": "test",
      "import": "generic-host",
      "address": "127.0.0.1"
    }

    Response from the director API:

    Code
    1. "error": "Invalid JSON: An error occured when parsing a JSON string"

    I am using the latest version of icinga2 director (Git hash 6234648a1ffe74a7239b2ab30a3bb3b7afafc6dc), icingaweb2 2.4.1, icinga2 2.6.3-1, PHP 5.6.30, Debian Jessie.


    What am I doing wrong?

    - The endpoints on the satellite are also running on Debian Jessie.

    - The size of /var/lib/icinga2/api/log/current grows while the zone is disconnected.

    - The satellite does schedule the checks. In the image I posted in the initial post one can see that data is replayed - one, respectively two, data points are missing and the log lag jumps to 50k....


    Edit:

    Another pic:


    Today I ran a short test with icinga2 to evaluate the replay log in icinga 2.6.3. I have a master with a satellite zone which has two endpoints. I disconnected the satellite zone by disconnecting the corresponding router interface. Then, after ~30 minutes, I reconnected the system. This is the result:



    As one can see, data points are missing. On one host 2 data points are missing, on the other 1 data point is missing. The log lag of the zone jumps to 50k immediately after a disconnect. The zone check is executed on the master with the cluster-zone check - the cluster zone is set to the satellite zone. Can somebody explain this behaviour?


    regards

    mobro

    Hello,


    Mikesch: I am sorry, but I don't see how this approach could help us.


    I think we have found a solution now: we create a zone that contains both the main and the backup system as endpoints. For the hosts (and their services) in the main system we define the main system's icinga2 instance as command_endpoint, and for the backup system the backup icinga2 instance.


    For the KPI-relevant data we create a "virtual" host which only has passive services, and the system which is currently active supplies this host with information (a sketch of this follows below). That way we should achieve continuous monitoring of the KPI-relevant data, even when operation is switched from the backup to the main system or the other way around. Thanks for your help, best regards
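
    A rough sketch of how the currently active system could feed such a passive KPI service, using the process-check-result action of the icinga2 API - the master address, the credentials and the host/service names here are only examples:

    Code
    import requests

    # Example values - adapt to the actual environment
    ICINGA_API = "https://monitoring-master.example.com:5665/v1"
    API_USER = "kpi-writer"
    API_PASSWORD = "secret"

    def submit_kpi_result(exit_status, plugin_output):
        # Push a passive check result to the "virtual" KPI host/service
        response = requests.post(
            ICINGA_API + "/actions/process-check-result",
            auth=(API_USER, API_PASSWORD),
            headers={"Accept": "application/json"},
            json={
                "type": "Service",
                "filter": 'host.name=="kpi-virtual-host" && service.name=="kpi-availability"',
                "exit_status": exit_status,
                "plugin_output": plugin_output,
            },
            verify=False,  # adjust to your certificate setup
        )
        response.raise_for_status()
        return response.json()

    # The currently active system periodically reports that it is up
    submit_kpi_result(0, "OK - main system active")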

    Thanks for your reply. The networks are not able to communicate with each other - they use the same subnet. There is only one server in each system (main/backup) which can run an icinga2 instance. Every other device (Host A...Z) is a "dumb" network device, such as a camera or a switch ...


    Therefore it does not seem possible to me to create a configuration as you propose.

    Hello,


    I have a question about how to solve the following monitoring task with icinga2. We have a monitoring master and a couple of remote systems which have specialized hardware attached, and every component that is essential for operating the system is redundant. If the main system fails, the backup system must take over:




    Our customer expects that we measure KPI-relevant data, such as uptime/availability of the system. The system counts as unavailable only when both the main and the backup system are unavailable. The question is now how I can measure the KPI-relevant data across the main and backup system without a loss of data - even if the network fails and the connection to the monitoring master is broken. There will be a network connection between the main and backup system, but the backup and main hosts are in distinct networks - therefore I think I can't span an HA zone over the main and backup system.


    I was thinking about a zone, in addition to the main and backup zones, to which I add the main and backup systems and in which I create a host that holds the KPI-relevant data - which is not possible because an endpoint can only be in one zone.


    Does anybody else have an idea how I can solve this problem with icinga?


    Thanks for your help in advance

    We use icinga 2.4.1-1 and see the following unexpected behaviour. Even though the certificate on the monitoring backoffice is expired, the server-client communication still works normally. After a restart of a client icinga2 instance, the client complains that the backoffice is unauthenticated:



    The problem is that the zone check is still in OK state, but the system does not send check results to the backoffice.
    So failures will not show up in the backoffice and will not trigger notifications....


    Is this a Bug?

    Hello,


    I want to set a filter in icingaweb2 to get the following result:


    All hosts with hostname "System" where Service1 is OK and Service2, Service3 or Service4 is CRITICAL


    I played around a little with the filters but didn't find a solution.


    Does anybody have a clue how I can achieve this?


    Edit: using icingaweb 2.1.0


    regards
    mobro

    dnsmichi wrote:

    I thought both hosts are checked inside the satellite zone, but given that the health check works I'm wondering what's missing? You cannot trigger a downtime if the client doesn't send any active check results on its own. That's by design.

    What is missing is that the scheduled downtime is not triggered when the zone is offline. I understand now that this behaviour is intended, even though I would rethink this design decision :)


    Thanks for investigating this problem with patience,


    regards,
    Mobro

    We have a master-satellite setup. The satellite endpoint in turn has another satellite connected to it. The master is RHEL 6.7, the satellites are Debian Wheezy (7) 32 bit, all running icinga2 2.4.1-1:


    Master
    |
    v
    Satellite/Master 2 (connected with a mobile internet connection)
    |
    v
    Satellite 2 (connected with LAN - no connection loss between this icinga2 instance and the Master 2 icinga2 instance)


    The satellite is connected to the master via a mobile internet connection. So it happens every day that we have a reconnect on this system and the satellite zone is not connected for a couple of minutes. From time to time the connection is lost for ~10-15 minutes. The following error occurred:

    • The connection was lost for ~15 minutes - everything was OK during this time
    • The zone was reconnected, messages got replayed, log_duration is set to 0 on the endpoint
    • Passive-active services on Satellite 2 become UNKNOWN on the master, even though a service check result was received in time on the Satellite 2 client. The master then propagates the UNKNOWN result back to the Satellite 2 client.

    This seems like a bug to me.


    The Compatlog entries for this situation on the Satellite 2:

    A service check result is received; 25 seconds later the master propagates an "UNKNOWN" state to Satellite 2; 60 seconds later another check result is received and everything is fine again.


    Service definition of one passive active service:

    Unfortunately I don't have any log files left from the master where you could see the reconnect of the system and how the state change occurred on the master.

    I have two hosts in a Satellite. One is the Host itself, the other is a Zone Host. When I talk about "zone", I mean this Zone Host - not the Zone object. The Zone Host is automatically created by the "icinga2 node update-config" command.
    Host Definition for both Satellite Hosts:

    • Satellite Host Object:
    • Satellite-Zone Host Object:


    I schedule a downtime for both hosts and their services with the command posted in the initial post. The problem is that the downtime only gets triggered for the "Satellite-Zone" host.


    regards and thanks,
    Mobro

    My issue is that systems in downtime are not shown as being in downtime. That is because the zone is not reachable and the scheduled downtime only gets triggered for the zone, not for the "subsystems".


    dnsmichi wrote:

    Why don't you schedule multiple downtimes then and let them trigger from the endpoint host's downtime (trigger_id)?

    I already schedule the downtime for each host of a system. I also see these scheduled downtimes in the object list (I changed the FQDN of the master/client):

    • Zone Downtimes, this downtime gets triggered if the Zone goes offline:
    • Server Downtime, this downtime is not triggered when the zone is offline:

    If the zone is in downtime, the downtime for the server and its services does not get triggered. I am not sure at the moment whether notifications get sent for services/hosts that are in a critical state before the zone becomes unavailable, but they are definitely displayed wrongly in icingaweb2 and in NagVis. I hope I have clarified my issue, thanks for your help,


    regards
    MoBro

    Hi!


    We have a master-satellite setup. The OS is RHEL 6.7 (64 bit) on the master and Debian 7 (32 bit)/8 (64 bit) on the satellite hosts. The icinga version in use is 2.4.1-1. There is a (PHP) script involved to schedule downtimes. It is not run by a cronjob; the script is accessible via Apache. When a system comes online, it sends an HTTP message to the master, which takes the system out of the current downtime and schedules a downtime starting in 10 hours. This is done for every host in the satellite system with the COMMAND posted in my first post.


    When the system loses its connection within these 10 hours and does not come back online, the scheduled downtime gets applied to the endpoint zone. The problem is that it does not get applied to the host (of the zone), its services, and the hosts/services "below" the endpoint host.


    Today I cannot fetch the requested logs and downtime states; I will do so on Monday. I don't know how to query either the API or the IDO, so I will give you the output of the object list.


    regards and thanks
    mobro

    Hello!


    I have a problem with downtimes. We have systems which are only online for a couple of hours a day. When they come online, they send a message to an HTTP API. This API deletes the downtimes set for the system and schedules a downtime starting in 10 hours in icinga2 via the livestatus interface. If the system is shut down correctly, it sends another message to the HTTP API which puts the system into a downtime. The scheduled downtime also works if the system is online for more than 10 hours. A problem occurs when the zone is not reachable any more and the scheduled downtime should start: the zone gets set into the scheduled downtime, as do the services applied to the zone. But the server and its services do not start the scheduled downtime, and their services are displayed as CRITICAL (when they were CRITICAL before the zone lost its connection) in icingaweb2 and also on the NagVis map.


    The downtime is set as follows:


    Shell-Script
    COMMAND [Unix timestamp] SCHEDULE_HOST_SVC_DOWNTIME;<hostname>;36000;473040000;1;0;0;LiveStatus;"blah";\n\n
    # Description:
    #COMMAND [Unix timestamp] SCHEDULE_HOST_SVC_DOWNTIME;<hostname>;<start_time>;<end_time>;<fixed>;<trigger_id>;<duration>;<author>;<comment>;\n\n
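
    For illustration, a small Python sketch of how such an external command could be sent over the livestatus socket, assuming icinga2's livestatus feature is enabled - the socket path and the host name are placeholders, not the actual PHP implementation:

    Code
    import socket
    import time

    # Placeholders - adapt to the actual setup
    LIVESTATUS_SOCKET = "/var/run/icinga2/cmd/livestatus"
    HOSTNAME = "satellite-host.example.com"

    def schedule_host_svc_downtime(start_time, end_time, comment):
        # Build the external command in the same format as shown above
        now = int(time.time())
        cmd = ("COMMAND [{ts}] SCHEDULE_HOST_SVC_DOWNTIME;{host};{start};{end};"
               "1;0;0;LiveStatus;{comment}\n\n").format(
            ts=now, host=HOSTNAME, start=start_time, end=end_time, comment=comment)
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(LIVESTATUS_SOCKET)
        sock.sendall(cmd.encode("utf-8"))
        sock.close()

    # Example: downtime starting in 10 hours, lasting 2 hours
    now = int(time.time())
    schedule_host_svc_downtime(now + 10 * 3600, now + 12 * 3600, "scheduled via HTTP API")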

    To me it seems like this behaviour has something to do with the implicit dependency between the host and the zone - when the zone is not reachable, no action for the host gets triggered.


    Is it possible to configure icinga2 in such a way that, even though the zone is DOWN and in a downtime, the scheduled downtimes for the hosts and their services "below" this zone also get triggered?


    Thanks for any help in advance,
    regards
    MoBro

    I tried it with the global function, but it does not work. I get the following error message:


    Code
    critical/config: Error: Error while evaluating expression: Tried to access undefined script variable 'servicenames'
    Location: in /etc/icinga2/conf.d/test.conf: 13:55-13:66
    /etc/icinga2/conf.d/test.conf(11): display_name = "test"
    /etc/icinga2/conf.d/test.conf(12): var servicenames = [ "disk", "test*" ]
    /etc/icinga2/conf.d/test.conf(13): assign where sg_match_service_names(service.__name, servicenames)
                                                                                           ^^^^^^^^^^^^
    /etc/icinga2/conf.d/test.conf(14): }
    /etc/icinga2/conf.d/test.conf(15):