ZooKeeper is a widely used centralized service for distributed computing. It aims to handle many hard but common problems in distributed systems, such as leader election and configuration maintenance. Below is my understanding of how leader election works with ZooKeeper.

As the ZooKeeper homepage puts it:

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.

It can also provide leader election for its clients. Generally speaking, ZooKeeper clients are themselves services, for example Mesos masters or Chronos.

With ZooKeeper, it is quite easy to elect a leader among multiple clients. This post describes the simplest way to do so.

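In the simplest approach, every client creates a znode under a chosen path, for example /candidates, with the flags SEQUENCE|EPHEMERAL. SEQUENCE makes ZooKeeper append a monotonically increasing number to the znode name, and EPHEMERAL means the znode is removed automatically when the session of the client that created it ends.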
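To make the effect of the two flags concrete, here is a minimal in-memory sketch. It is not a real ZooKeeper client: the class `FakeCandidates`, its methods, and the session ids are hypothetical names invented for illustration. It only mimics two behaviours: the zero-padded 10-digit sequence suffix that ZooKeeper appends to sequential znodes, and the removal of ephemeral znodes when their session ends.

```python
class FakeCandidates:
    """In-memory stand-in for the children of /candidates (illustration only)."""

    def __init__(self):
        self._counter = 0   # sequence counter kept by the parent znode
        self.children = {}  # znode name -> id of the session that owns it

    def create_sequential_ephemeral(self, prefix, session_id):
        # SEQUENCE: append a zero-padded 10-digit counter to the requested name.
        name = f"{prefix}{self._counter:010d}"
        self._counter += 1
        self.children[name] = session_id
        return name

    def close_session(self, session_id):
        # EPHEMERAL: znodes disappear when the session that created them ends.
        owned = [n for n, s in self.children.items() if s == session_id]
        for name in owned:
            del self.children[name]


candidates = FakeCandidates()
names = [
    candidates.create_sequential_ephemeral("c_", f"session-{k}")
    for k in range(4)
]
print(names)
```

With a real client the same step is a single create call with both flags set; the sketch above only shows what the resulting names look like and why a dead client's znode goes away without any cleanup code.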

Assume four clients have started, e.g. four Chronos instances. See the diagram below.

diagram1

Diagram 1. Four clients connected to the ZooKeeper service: c_i, c_i+1, c_i+2, c_i+3

In the diagram above, four clients have created four znodes under the /candidates path: c_i, c_i+1, c_i+2, c_i+3. We assume the sequence numbers assigned by ZooKeeper are i, i+1, i+2, i+3.

The diagram above illustrates the leader election rules:

  • the client with the smallest sequence number is the leader
  • every other client watches the znode of the client immediately before it, e.g. c_i+1 watches c_i, and c_i+3 watches c_i+2
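The two rules can be sketched as a pure function over the list of child names under /candidates. This is an illustrative sketch, not code from ZooKeeper itself: `sequence_of` and `elect` are hypothetical helpers, and the names are assumed to end in the 10-digit sequence suffix. The list is sorted explicitly because the order in which children are returned should not be relied on.

```python
def sequence_of(name):
    # Assumes the znode name ends in ZooKeeper's 10-digit sequence suffix.
    return int(name[-10:])


def elect(children):
    """Return (leader, watches); watches maps each non-leader to its predecessor."""
    if not children:
        raise ValueError("no candidates registered")
    ordered = sorted(children, key=sequence_of)
    leader = ordered[0]  # rule 1: smallest sequence number leads
    # rule 2: everyone else watches the znode immediately before its own
    watches = {ordered[k]: ordered[k - 1] for k in range(1, len(ordered))}
    return leader, watches


children = ["c_0000000002", "c_0000000000", "c_0000000003", "c_0000000001"]
leader, watches = elect(children)
print(leader)
print(watches)
```

Watching only the immediate predecessor, rather than having every client watch the leader, means that a single znode deletion wakes up at most one client.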

## What if a non-leader client dies?

If a follower dies, say c_i+2 in the diagram below, no new leader is needed because c_i+2 was not the leader. Only c_i+3, which was watching c_i+2, is notified, and it now watches c_i+1 instead.

diagram2

Diagram 2. A non-leader client (c_i+2) dies

## What if the leader client dies?

If the current leader dies, that is, c_i dies, the situation looks like the diagram below.

diagram3

Diagram 3. The leader client (c_i) dies

Since c_i was the leader and has died, its ephemeral znode is removed. c_i+1, which was watching c_i and now has the smallest sequence number, becomes the new leader.
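Both failure cases reduce to the same re-check: when the watched znode disappears, the client re-reads the children of /candidates and either takes over as leader or watches its new predecessor. The sketch below simulates that decision with plain lists. The function `on_change` and its return values are hypothetical names chosen for illustration, and the names are again assumed to end in the 10-digit sequence suffix.

```python
def sequence_of(name):
    # Assumes the znode name ends in ZooKeeper's 10-digit sequence suffix.
    return int(name[-10:])


def on_change(my_name, children):
    """Decide what the owner of `my_name` does after re-reading /candidates."""
    ordered = sorted(children, key=sequence_of)
    position = ordered.index(my_name)
    if position == 0:
        return ("lead", None)  # smallest sequence number: become leader
    return ("watch", ordered[position - 1])  # otherwise watch the predecessor


c0, c1, c2, c3 = (f"c_{k:010d}" for k in range(4))

# A non-leader (c2) dies: c3 switches its watch from c2 to c1.
after_follower_death = [c0, c1, c3]
print(on_change(c3, after_follower_death))

# The leader (c0) dies next: c1 takes over, c3 keeps watching c1.
after_leader_death = [c1, c3]
print(on_change(c1, after_leader_death))
print(on_change(c3, after_leader_death))
```

In a real client this function would run inside the watch callback, and the client would set a new watch on the returned znode before going back to waiting.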