H/A and Load-balancing on clients

api
icinga2
(V) #1

Hello guys,

I have read the following bug: https://github.com/Icinga/icinga2/issues/3533

Michael (@dnsmichi) says that the issue still exists. I'm trying to design an architecture for a large number of hosts: I currently have around 22k hosts monitored by Nagios and am thinking about migrating everything to Icinga.

The bug makes me wonder whether H/A and LB would work properly. My setup is as follows: 1 Icinga master, 1 satellite, and 6 clients within the same zone. Do H/A and LB work in this case? How can I verify it? What is the maximum number of clients under each satellite?

What would be the best solution for monitoring 22k hosts within one location?

Any help would be much appreciated!

(Marcel Fliegel) #2

Hey vkaf2,

in my opinion you should use different zones for your endpoints and not just one. That's much easier to maintain, and I think HA would not work with your setup.
Do you have several locations or data centers where your servers are located? If so, I think the best solution would be to deploy two satellite hosts at each location/data center and let them communicate with the two master nodes at the main site. This will also reduce the number of firewall rules.

Here is an example HA configuration for the first master node (demaster1.example.intern), the first satellite node (atsatellite1.example.intern) and a client (athost.example.intern) in the Austria satellite zone.

In these configs the master nodes connect to each other and sync the configuration. Because the parent zone of athost.example.intern is sat_at, the master nodes send the check commands to the Austria satellite zone, and the satellite nodes either execute the checks themselves or forward them to the client if the check is executed locally there.

I hope I could explain the facts reasonably well. :sweat_smile:

Greetings from Germany

demaster1.example.intern
# Master Zone
# Local endpoint: no host attribute needed
object Endpoint "demaster1.example.intern" {
}

# Second master: host tells this node where to connect
object Endpoint "demaster2.example.intern" {
    host = "demaster2.example.intern"
}

object Zone "master" {
    endpoints = [
        "demaster2.example.intern",
        "demaster1.example.intern"
    ]
}

# Satellite Zone Germany
object Endpoint "desatellite1.example.intern" {
    host = "desatellite1.example.intern"
}

object Endpoint "desatellite2.example.intern" {
    host = "desatellite2.example.intern"
}

object Zone "sat_de" {
    endpoints = [
        "desatellite1.example.intern",
        "desatellite2.example.intern"
    ]
    parent = "master"
}

# Satellite Zone Austria
object Endpoint "atsatellite1.example.intern" {
    host = "atsatellite1.example.intern"
}

object Endpoint "atsatellite2.example.intern" {
    host = "atsatellite2.example.intern"
}

object Zone "sat_at" {
    endpoints = [
        "atsatellite1.example.intern",
        "atsatellite2.example.intern"
    ]
    parent = "master"
}

atsatellite1.example.intern
# Master Zone DE
# No host attribute: this node does not actively connect to the masters
object Endpoint "demaster1.example.intern" {
}

object Endpoint "demaster2.example.intern" {
}

object Zone "master" {
    endpoints = [
        "demaster1.example.intern",
        "demaster2.example.intern",
    ]
}


# Satellite Zone Austria
object Endpoint "atsatellite1.example.intern" {
}

object Endpoint "atsatellite2.example.intern" {
    host = "atsatellite2.example.intern"
}

object Zone "sat_at" {
    endpoints = [
        "atsatellite2.example.intern",
        "atsatellite1.example.intern"
    ]
    parent = "master"
}

athost.example.intern
object Endpoint "athost.example.intern" {
}

object Zone "athost.example.intern" {
    endpoints = [ "athost.example.intern" ]

    parent = "sat_at"
}  
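
To illustrate how the zone layout above decides where a check runs, here is a minimal sketch of a host and a service for the Austria zone. It assumes the top-down command endpoint mode and a file placed on the master under zones.d/sat_at/ (for example /etc/icinga2/zones.d/sat_at/hosts.conf). The custom variable name agent_endpoint and the choice of the "disk" check are assumptions for illustration only, not part of the configuration above.

```
// Synced from the master to the sat_at satellites because it lives in zones.d/sat_at/
object Host "athost.example.intern" {
    check_command = "hostalive"              // executed by one of the sat_at satellites
    address = "athost.example.intern"
    vars.agent_endpoint = name               // hypothetical variable: endpoint name equals host name
}

// Executed on the client itself through its endpoint
apply Service "disk" {
    check_command = "disk"
    command_endpoint = host.vars.agent_endpoint
    assign where host.vars.agent_endpoint
}
```

With two endpoints in sat_at, the satellites should share the checks for objects in that zone between them, which is the load-balancing part of the question.
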
(Alex) #3

Hello vkaf2,
Have you tested your Icinga 2 HA configuration yet?

In your test environment, stop the running Icinga 2 service on your master server. If HA is configured correctly, it should fail over to your second master server. From the details you provided, your Icinga 2 environment has only 1 master server configured, so HA will not work correctly. Please review the Icinga 2 documentation for HA setup.
Check out this link about HA setup & configuration. https://icinga.com/docs/icinga2/latest/doc/06-distributed-monitoring/#three-levels-with-master-satellites-and-clients
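
Besides the manual failover test, the connection to a child zone can be monitored continuously with the built-in cluster-zone check command. The following is only a sketch: it assumes a Host object named demaster1.example.intern already exists in the master zone and that the satellite zone is called sat_at, as in the example configuration earlier in this thread. The service name and intervals are arbitrary.

```
// Placed in the master zone, e.g. under zones.d/master/ (path is an assumption)
object Service "cluster-zone-sat_at" {
    host_name = "demaster1.example.intern"   // assumed existing Host object
    check_command = "cluster-zone"
    check_interval = 30s
    retry_interval = 10s
    vars.cluster_zone = "sat_at"             // zone whose connection is checked
}
```

This reports whether the zone is connected. It does not replace stopping one node and watching whether checks keep being executed.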

As for the bug you are asking about: there have been several version updates to Icinga 2 since this bug was posted, so the problem may no longer apply.

Alex