I had an orphaned host check for 3 days.
Now I'm looking for some information on how this might happen.
OS: Debian Jessie
(no explicit settings for uniq, so defaults apply)
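For reference, this is the relevant mod_gearman module option as I understand it (a sketch, not copied from the affected system; with no explicit setting, the documented default applies):

```
# mod_gearman module config (illustrative)
# use_uniq_jobs deduplicates identical jobs within a queue,
# so a rescheduled check with the same uniq key can be discarded
# while the original job is still in the queue.
use_uniq_jobs=on
```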
The host check was initially OK and became orphaned according to naemon.log.
The host was up the whole time. We have a separate ping check, and all other services were fine as well.
At this time, 2 different workers were working on the queue in question. Both were connected to the gearman-job-server and doing their job.
As you can see in the 2nd log entry, after log rotation the host was assumed to be DOWN in soft state 2/3.
After this the host stayed in that state for some days. It was never recognized as orphaned again and no results were submitted.
As I understand orphaned services, naemon should always do this whenever a check is removed from the scheduling queue without a result.
Regarding logs... no result was submitted for days. I only see the CURRENT HOST STATE log entries for days, and always soft state 2/3.
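For context, this orphan detection is controlled by the stock Naemon/Nagios options below (a sketch of my understanding; values are the usual defaults, not verified from the affected system):

```
# naemon.cfg (illustrative)
# Periodically look for checks that left the scheduling queue
# without ever submitting a result, and reschedule them.
check_for_orphaned_services=1
check_for_orphaned_hosts=1
```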
Restarting the gearman_worker solved the problem. (A colleague did that... so I am missing some information like the gearman_top output at that time.)
But queues and workers are monitored. The number of workers per queue is also monitored (to avoid having zombies running).
So I am pretty sure everything was connected fine at that time.
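To illustrate what that queue/worker monitoring checks: the gearmand admin protocol answers a `status` command (e.g. `echo status | nc <host> 4730`) with one tab-separated line per function queue. Below is a minimal sketch that parses such output and flags queues with pending jobs but no workers; the sample data and the `parse_status` helper are made up for illustration.

```python
# Hard-coded sample of gearmand "status" output:
# function<TAB>jobs_total<TAB>jobs_running<TAB>workers, terminated by ".".
SAMPLE = """\
hostgroup_local\t2\t1\t2
check_results\t0\t0\t1
.
"""

def parse_status(text):
    """Parse gearmand admin 'status' output into a dict per queue."""
    queues = {}
    for line in text.splitlines():
        if line == ".":
            break
        name, total, running, workers = line.split("\t")
        queues[name] = {
            "jobs_total": int(total),      # queued + running jobs
            "jobs_running": int(running),  # jobs handed to a worker
            "workers": int(workers),       # workers registered on this queue
        }
    return queues

queues = parse_status(SAMPLE)
# A queue with jobs but zero workers would indicate the zombie case
# the monitoring is supposed to catch.
stuck = [n for n, q in queues.items() if q["jobs_total"] > 0 and q["workers"] == 0]
print(stuck)  # → []
```

Note that a check like this would NOT catch the scenario below, where the job sits in `jobs_running` with workers still attached.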
Well... how could this happen?
None of the scenarios I can imagine explain this.
My only possible explanation would be:
t0: gearman-job-server sends the orphaned result to naemon but somehow leaves the job in its own queue (maybe because it is in jobs_running?)
t0: naemon gets the orphaned service result from gearman-job-server and reschedules the check.
t1: gearman-job-server removes the job from the naemon queue and discards the rescheduled job because use_uniq_jobs is active.
t2: naemon forgets about it; gearman-job-server has already sent its orphaned result...
t3: nobody cares.
It would be very nice if somebody could help or explain this behavior.