Tools you are using in a distributed environment

  • Hi everyone,

    I'm currently looking for tools that support distributed monitoring. By distributed monitoring I mean tools that can run on multiple servers and send the collected data to a central point.

    I've already looked at several solutions such as Shinken, Sensu, Nagios, and many more.

    I would now like to know what you are actually using, how you use it, why you chose it, and so on.

    I'll be glad to get your answers.

    Regards :thumbsup:

  • Thank you for the link.

    I went through the docs and found this information under Top Down (6.9.1):



    • If the child node is not connected, no more checks are executed.
    • Requires additional configuration attribute specified in host/service objects.
    • Requires local CheckCommand object configuration. Best practice is to use a global config zone.

    A question about the first point: is there a solution for that? I mean, is there a way to schedule checks on the client side and avoid sending commands from the master?

    I saw Bottom Up just after that, but it seems to be deprecated.

    The thing is that it would be good to collect values locally (client side) even if the connection to the master is down. For example, that data would be useful for reporting (no gaps in the graphs, etc.).

    The post was edited 1 time, last by pcasis: forget sentences.

  • "Question about the first point: is there a solution for that? I mean, is there a way to schedule checks on the client side and avoid sending commands from the master?"


    In that case, the client is called a satellite.

    "I would now like to know what you are actually using, how you use it, why you chose it, and so on."

    As you might have guessed, I am using Icinga 2 because:

    • it is able to use check plugins written for Nagios.
    • it scales well with lots of different scenarios.
    • it features active development and maintenance.
  • Thanks for your answer. I'll read more about satellites.


    "it scales well with lots of different scenarios."

    Could you please give some examples of what you have in mind?

    For instance, let's imagine we have multiple satellite servers running Windows or Linux. I looked at the installation procedures and they seem quite long, since they rely on a wizard. How does that scale? How are updates rolled out to all the satellites when a new version of Icinga is released?

    Thanks a lot again for your answers.

  • There might be fewer clients/satellites than you think.


    A host or service is an object to be checked.

    An endpoint (master, satellite, client) is a device that runs checks against these checkable objects.

    Endpoints with 10,000 to 100,000 managed checks are not that rare.

    In the Windows world, you would "somehow" run a silent installer.

    In the Linux world, you would perhaps use a tool like Puppet.

    To me, "scales well" means:

    • is able to grow from a single master system to a multi-level hierarchy, including high availability for
      • the checkers (load sharing scenario)
      • the masters (election of active master that writes to the database)
    • available on a wide range of platforms (including ARM, x86-64 Linux, Windows, etc.)
    • able to run on small, resource-constrained boxes (Raspberry Pi)

    It also supports active checks, passive checks, event handler scripts, volatile services, configurable notifications with escalations to almost any target medium, and centrally managed configuration.

    That is at least something, isn't it?

    The post was edited 1 time, last by sru.

  • I understand the objects/services and endpoints part.

    Hmm, OK. I see...

    So, what you are saying is that there are typically a few satellites that handle many clients. Maybe 10-12 satellites?

    What about having 100 satellites, each with only 1 or 2 clients? Is it imaginable to have such a thing?

  • "few satellites that handle many clients"

    It would be best to omit clients and let the satellites do the checking because, as you already found out, clients stop working if the connection to the master drops, whereas satellites keep checking and write to a backlog in the meantime.
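The backlog behaviour described here can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not Icinga 2's actual implementation: the `Satellite` and `CheckResult` classes are hypothetical, and the `delivered` list merely stands in for the master.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    host: str
    service: str
    state: str      # e.g. "OK", "WARNING", "CRITICAL"
    timestamp: int  # seconds, supplied by the caller


@dataclass
class Satellite:
    """Hypothetical satellite: keeps checking locally, buffers while offline."""
    connected: bool = True
    backlog: deque = field(default_factory=deque)
    delivered: list = field(default_factory=list)  # stands in for the master

    def report(self, result: CheckResult) -> None:
        # The check has already run locally; only the delivery of its
        # result depends on the connection to the master.
        if self.connected:
            self.delivered.append(result)
        else:
            self.backlog.append(result)

    def reconnect(self) -> int:
        """Replay buffered results in order; return how many were replayed."""
        self.connected = True
        replayed = 0
        while self.backlog:
            self.delivered.append(self.backlog.popleft())
            replayed += 1
        return replayed


satellite = Satellite()
satellite.report(CheckResult("web01", "http", "OK", 0))
satellite.connected = False  # link to the master drops
satellite.report(CheckResult("web01", "http", "CRITICAL", 60))
satellite.report(CheckResult("web01", "http", "OK", 120))
replayed = satellite.reconnect()  # link comes back
print(replayed, [r.timestamp for r in satellite.delivered])
```

Because the results are replayed in their original order, the master ends up with a gap-free history, which is what the reporting use case mentioned earlier in the thread needs.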

    Put two satellites in each zone: they share the load under normal conditions and provide failover if one of them dies.

    The number of zones may depend on the number of locations/networks that are not allowed to see or talk to each other.
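To make the load sharing and failover idea concrete, here is a minimal Python sketch. It assumes a simple hash-based distribution, which is an illustrative stand-in and not necessarily how Icinga 2 assigns checks inside a zone; the function `assign_checks` and the endpoint names `sat-a` and `sat-b` are hypothetical.

```python
import zlib


def assign_checks(checks, endpoints):
    """Distribute checks across the endpoints of a zone that are alive.

    `endpoints` maps an endpoint name to True (up) or False (down).
    """
    alive = sorted(name for name, up in endpoints.items() if up)
    if not alive:
        raise RuntimeError("no endpoint available in this zone")
    assignment = {name: [] for name in alive}
    for check in checks:
        # crc32 is stable across runs, unlike the built-in hash() for str.
        index = zlib.crc32(check.encode("utf-8")) % len(alive)
        assignment[alive[index]].append(check)
    return assignment


checks = [f"host{i:02d}!ping" for i in range(10)]
zone = {"sat-a": True, "sat-b": True}

normal = assign_checks(checks, zone)    # both satellites share the work
zone["sat-b"] = False                   # one satellite dies
failover = assign_checks(checks, zone)  # the survivor takes everything

print({name: len(assigned) for name, assigned in normal.items()})
print({name: len(assigned) for name, assigned in failover.items()})
```

The point of the sketch is only that no check is lost when one of the two endpoints disappears: the remaining satellite ends up with the full list.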

    "Is it imaginable to have such a thing?"

    A satellite sending commands to be executed by clients is quite common.

    Clients, also known as command execution bridges, often exist because people have just learned how to configure a master and are happy that this master can "drive" the checks on other machines, even though the scheduling is then done on the master.

    After a short time, they upgrade the client to have a dedicated configuration (ideally received from the master).

    If there is a configuration, then a dedicated scheduler is run on that machine.

    The post was edited 1 time, last by sru.

  • Thanks for that information.

    What I meant by "Is it imaginable to have such a thing?" is more something like this:

    Is it possible/doable to have a lot of satellites (let's say 150) controlled by only one master?

    I would think that it's technically possible. But is it realistic from a configuration management point of view? Isn't it too difficult to manage?


  • There's a new Puppet module for Icinga 2 underway which will help you with that. Check it out here: (check the examples/ directory and the docs, this is really helpful).
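To get a feeling for why 150 satellite zones are manageable once the configuration is generated rather than written by hand, here is a rough Python sketch. It is not the Puppet module and does not reproduce its output. The `object Endpoint` and `object Zone` blocks with `host`, `endpoints` and `parent` follow the Icinga 2 configuration syntax as I understand it from the docs; the zone names, host names, the `example.org` domain and both helper functions are made up for illustration.

```python
def render_satellite_zone(zone, endpoints, parent="master"):
    """Render the Endpoint and Zone objects for one satellite zone."""
    lines = []
    for fqdn in endpoints:
        lines.append(f'object Endpoint "{fqdn}" {{')
        lines.append(f'  host = "{fqdn}"')
        lines.append("}")
        lines.append("")
    quoted = ", ".join(f'"{fqdn}"' for fqdn in endpoints)
    lines.append(f'object Zone "{zone}" {{')
    lines.append(f"  endpoints = [ {quoted} ]")
    lines.append(f'  parent = "{parent}"')
    lines.append("}")
    return "\n".join(lines) + "\n"


def render_all(count, domain="example.org"):
    """Render `count` zones, each with two satellites (hypothetical names)."""
    blocks = []
    for i in range(1, count + 1):
        zone = f"site-{i:03d}"
        endpoints = [f"sat-{i:03d}a.{domain}", f"sat-{i:03d}b.{domain}"]
        blocks.append(render_satellite_zone(zone, endpoints))
    return "\n".join(blocks)


config = render_all(150)
print(render_satellite_zone("site-001", ["sat-001a.example.org"]))
```

In a real setup the inventory would come from the configuration management tool rather than from a loop over numbers, but the principle is the same: one template, many zones.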

  • Thank you very much. I'll look at it for sure.

    What advice would you give before deploying a distributed monitoring system? In other words, which points could be problematic and should be considered beforehand?

  • If I were you, I would check the documentation including the different modes. That'll help when considering e.g. the Puppet module with either config sync or command endpoint checks ("top down"; "bottom up" has been deprecated). The docs also contain lots of hints and tips, and also propose several example setups. That in combination with the power of Puppet, Ansible, Chef and whatnot will certainly help.

    One thing I would also do - build a small setup (HA master, one satellite, one client) from scratch with the described manual setup from the docs. Once you understand the inner parts, dive into automation and config management.

    The reason why I suggest this - once you've learned about its functionality, it'll help to understand problems and updates later on.