Thruk 2.32-3 - Reports2 - Reports not correct

Hi there,

today I upgraded our Naemon/Thruk installation to the latest available version, 2.32-3, from the repository.
The upgrade went well, but now I have found that the results of the reports are not correct.

I use the Reports2 and BusinessProcess plugins in combination: I have several BPs defined, and once a month I create reports for these BPs with the Reports2 plugin.

Since the upgrade, all reports show 100%, so the values taken from Naemon are not correct. In November we had some downtime due to maintenance work on several servers, but these downtimes are not shown in the reports.

I read the documentation, but I can’t find what changed between versions 2.30 and 2.32.

To check the basic functionality of the Reports2 plugin, I created a new report for a single host, but it shows the same issue.
I ran the report in &debug=1 mode and this is the result.

Uri: /thruk/cgi-bin/remote.cgi
*************************************
version: 2.32~3
user:    USERNAME
parameters:
$VAR1 = {
      'assumeinitialstates' => 'no',
      'attach_json' => 'no',
      'breakdown' => 'days',
      'dateformat' => '',
      'datetimeformat' => '',
      'debug' => 1,
      'decimals' => '2',
      'details_max_level' => '100',
      'graph_min_sla' => '90',
      'host' => 'HOSTNAME',
      'hostnameformat' => 'hostname',
      'hostnameformat_cust' => '',
      'includesoftstates' => 'no',
      'initialassumedhoststate' => '0',
      'initialassumedservicestate' => 0,
      'language' => 'en',
      'mail_max_level' => '-1',
      'max_outages_pages' => '-1',
      'max_pnp_sources' => '1',
      'max_worst_pages' => '1',
      'rpttimeperiod' => '',
      'service' => '',
      'show_log_entries' => 1,
      'sla' => '98',
      'timeperiod' => 'lastmonth',
      'unavailable' => [
                         'down',
                         'down_downtime',
                         'unreachable',
                         'unreachable_downtime'
                           ]
        };
debug info:
uname:      Linux MONITORING 4.4.0-170-generic #199-Ubuntu SMP Thu Nov 14 01:45:04 UTC 2019 x86_64
release:    Ubuntu 16.04.6 LTS
*************************************

$ma_options
$VAR1 = {
      'assumeinitialstates' => 'no',
      'assumestateretention' => 'yes',
      'assumestatesduringnotrunning' => 'yes',
      'backtrack' => 4,
      'breakdown' => 'days',
      'end' => 1575154800,
      'hosts' => [
                   'HOSTNAME'
                 ],
      'includesoftstates' => 'no',
      'initial_states' => {
                            'hosts' => {},
                            'services' => {}
                          },
      'initialassumedhoststate' => 'unspecified',
      'initialassumedservicestate' => 'unspecified',
      'log_file' => '/tmp/ThFJk8Gw5V',
      'rpttimeperiod' => '',
      'services' => [],
      'start' => 1572562800
    };
/tmp/ThFJk8Gw5V:

Is it normal that the temp file ( /tmp/ThFJk8Gw5V ) is always empty? The file size is zero.
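As a sanity check on the debug output: the 'start' and 'end' values in $ma_options are Unix timestamps. A minimal Python sketch to decode them (the helper names are made up for illustration) suggests the window is exactly 30 days and lines up with November 2019 if the server runs one hour ahead of UTC. The server timezone is an assumption, since it is not shown in the dump.

```python
from datetime import datetime, timezone

# 'start' and 'end' copied from the $ma_options dump above.
START = 1572562800
END = 1575154800


def to_utc(ts: int) -> str:
    """Render a Unix timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


def window_days(start: int, end: int) -> float:
    """Length of the report window in days."""
    return (end - start) / 86400


if __name__ == "__main__":
    print(to_utc(START))            # 2019-10-31T23:00:00+00:00
    print(to_utc(END))              # 2019-11-30T23:00:00+00:00
    print(window_days(START, END))  # 30.0
```

If the window is right, as it appears to be here, an empty temp file points at missing log data rather than at a wrong timeframe.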

If any other information is needed, let me know.

Regards

PS: The webpage does not show me any error messages.

If that file is empty, this probably means that no log entries were found for the given timeframe.

Hi Sven,

thanks for your answer. That’s a little hard to understand.
These reports were configured many months ago and worked fine until the end of November 2019. I upgraded to the latest stable version and now they no longer work.

Is there a way to debug the data collection from the report? Meaning, can I execute a command on the command line to get the expected values?

In the meantime I installed a test system with the latest Thruk/Naemon versions. There I configured a test host with some services. If I create a report there, the dbg.txt file is full of values.

I think the problem is located between the Naemon output and the input into the Thruk reporting engine. If I’m correct, then livestatus should be responsible for the data transfer. Right? Can we troubleshoot this?
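One way to look at the data independently of Thruk would be to ask livestatus for the log entries directly. The following Python sketch is only a rough idea of that: the socket path is an assumption (it depends on the livestatus broker_module line in naemon.cfg), the function names are made up, and it only applies if livestatus listens on a local Unix socket.

```python
import json
import socket

# Assumption: the socket path differs per installation; check the
# livestatus broker_module line in naemon.cfg for the real one.
SOCKET_PATH = "/var/cache/naemon/live"


def build_log_query(host: str, start: int, end: int) -> str:
    """Build a livestatus query for the log entries of one host in a time window."""
    lines = [
        "GET log",
        "Columns: time type host_name state",
        f"Filter: time >= {start}",
        f"Filter: time < {end}",
        f"Filter: host_name = {host}",
        "OutputFormat: json",
    ]
    # A livestatus request is terminated by an empty line.
    return "\n".join(lines) + "\n\n"


def run_query(query: str, path: str = SOCKET_PATH) -> list:
    """Send the query to the Unix socket and decode the JSON reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(query.encode())
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks) or b"[]")


if __name__ == "__main__":
    # Host name and timestamps taken from the debug output earlier in the thread.
    rows = run_query(build_log_query("HOSTNAME", 1572562800, 1575154800))
    print(len(rows), "log entries")
```

If this returns no rows for a month in which the host clearly had state changes, the gap is probably on the Naemon/livestatus side and not in the report itself.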

Would be nice if you have any ideas for troubleshooting. I do not like to install / migrate all the stuff from the existing system to a new one.

I found something during troubleshooting. If I create a new business process and click save changes after configuring it, I get the error “reload command succeeded, but services are missing” at the top of the page.

Could it be that the latest upgrade changed the BP plugin? It looks like I am no longer able to add or change any business processes. And since all reports without values are for BPs, the problem might not be in the reporting part but in the business process part.

You could check which file is used to export the BP objects and see if that file is used by Naemon.

I hope I have understood correctly. Are you talking about the file which is created while generating the report?
If I run the report for the business process in debug mode, I see a file at the end of the debug output, for example: /tmp/VGrLaDAMR6

If I check this file on the filesystem, I can see that it exists but has no content.

Another strange thing: if I want to generate a new report for a business process and choose the type “SLA Business Process”, I do not see any BP in the drop-down menu. I only get “no results found”. It looks like something is wrong between BP and Reporting.
This behaviour is exactly the same on my freshly installed test system. Are you able to reproduce it on your side?

I am not talking about reporting. Reporting is based on Naemon objects and it seems
like the BP objects are not loaded properly. You need to make sure, that the BP objects
are exported correctly and loaded by naemon. Each BP gets exported at least as a
Host and a Service object, named exactly like the BP itself. If those do not exist in
Naemon, reporting will not have any data.
Have a look at https://thruk.org/documentation/configuration.html#_component-thruk-plugin-bp
The objects_save_file needs to be loaded by naemon.
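To check the last point, one could scan the main Naemon config for a cfg_file or cfg_dir line that covers the exported file. This is a minimal Python sketch under some assumptions: all paths in it are hypothetical examples, the helper name is made up, and it only handles absolute paths.

```python
import os


def config_includes(naemon_cfg_text: str, objects_file: str) -> bool:
    """Return True if a cfg_file= or cfg_dir= line covers objects_file."""
    target = os.path.normpath(objects_file)
    for raw in naemon_cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        value = os.path.normpath(value)
        if key == "cfg_file" and value == target:
            return True
        # cfg_dir picks up files ending in .cfg below the given directory.
        if (key == "cfg_dir" and target.endswith(".cfg")
                and target.startswith(value + os.sep)):
            return True
    return False


# Hypothetical excerpt of a naemon.cfg, for illustration only.
SAMPLE = """
# object configuration
cfg_file=/etc/naemon/conf.d/commands.cfg
cfg_dir=/etc/naemon/conf.d
"""

if __name__ == "__main__":
    print(config_includes(SAMPLE, "/etc/naemon/conf.d/thruk_bp_generated.cfg"))  # True
    print(config_includes(SAMPLE, "/var/lib/thruk/bp.cfg"))                      # False
```

On a real system, pass the contents of naemon.cfg and the objects_save_file value from the Thruk configuration instead of the sample strings.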

Hi,

today I checked the configuration file (thruk_local.conf) and found that the backend was not correct. I guess the name of the backend is case sensitive. So I changed it, and I also noticed that the path to the BP templates was not correct. I don’t know whether it was like that from the beginning or whether someone else changed it. Nevertheless, I did a short test this morning after these changes and it was successful. I need to wait some more time to verify that everything is now working as expected, but I think we are on the right track.

But one thing is still an issue. If I choose SLA Business Process in the reporting menu, I’m not able to select any BP from the drop-down menu. I need to type it in myself. I don’t think it is a browser issue, because if I select host or service, I get a full list of available hosts/services in this drop-down menu.

Maybe you have also an idea regarding this little issue.
Regarding the thread topic, I will come back if the error is gone now.

Please open a new thread for a new issue.

I have tested some reports over the last few days, but I always get the same result.
I’m now able to get reports working again, but I only see the current day in the reports.
Example:
Report Type: SLA Host
Host: SERVER_XXX
Timeperiod: This Month
Breakdown by: Days
All other properties are default.

SERVER_XXX has been up and running for many months now, so there should be enough data available in Naemon/Thruk.

If I create a new report and do not change anything except the host, I’m able to create a report for this host for the last 12 months. This is the default timeperiod.
Btw. the temp file which is created during the execution of the report is the only file which is not empty. All other files, which are created by the report plugin are size zero!

Do you have any idea why this happened?

If you need any further information, please let me know.

There are some setups where the logs are archived and zipped. In that case they are no longer usable by Naemon and cannot be used in reports. Maybe something like that?

I found out in the meantime that the standard trends are also not working as expected. I only get one day in the graph and the rest is empty. I checked “log_archive_path” in the configuration (naemon.cfg): the naemon.log and .gz files there are owned by the user / group naemon. So it looks like the permissions are correct.
The files in the log_archive_path are filled with a ton of information from several services / hosts. So Naemon is still writing data to them, but for some reason it can no longer read them.

Do you have any idea how I could troubleshoot that? Should I close the case here and open another one in the Naemon thread? In the next few days, I will compare the settings of my test environment and the production one. I hope I can find some differences. But I’m happy to get any feedback from you / the community if anyone knows of such a problem.

Many thanks for your support. With your last post I figured out the root cause of the problem.
Last year in July, I or a colleague changed the logrotate configuration for the Naemon logfiles so that rotated files are compressed instead of left uncompressed. So all archived Naemon logfiles have been compressed since that date. The report generation is not able to work with compressed files, so that was the problem in my case.
Since I uncompressed all the historical logfiles again, the reports are running fine.
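For anyone running into the same thing, here is a minimal Python sketch of that cleanup. It lists the .gz files in a directory and writes an uncompressed copy next to each one. The directory comes from the command line, the file name in the comment is a hypothetical example, and the function names are made up. The copies still need to be readable by the naemon user.

```python
import gzip
import shutil
import sys
from pathlib import Path


def find_compressed(archive_dir: Path) -> list:
    """Return all gzip-compressed files directly inside archive_dir."""
    return sorted(archive_dir.glob("*.gz"))


def decompress(gz_path: Path) -> Path:
    """Write an uncompressed copy next to gz_path and return its path.

    The original .gz file is kept, so nothing is lost if something goes wrong.
    """
    # e.g. naemon.log-20190801.gz -> naemon.log-20190801 (hypothetical name)
    target = gz_path.with_suffix("")
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


if __name__ == "__main__":
    # Pass the log_archive_path from naemon.cfg as the first argument.
    for gz_file in find_compressed(Path(sys.argv[1])):
        print("decompressed", gz_file, "->", decompress(gz_file))
```

To keep this from coming back, the logrotate configuration for the Naemon logs also has to stop compressing rotated files.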

The case is finally solved now for me. Again, many thanks for your help.