Chord (DHT Algorithm).


The elegance of the Chord algorithm, derives from its simplicity.

The keys of the DHT are l-bit identifiers, i.e., integers in the range [0, 2l-1].

They form a one-dimensional identifier circle modulo 2l wrapping around from (2l)-1 to 0.

Identifier Space

Chord Identifier Space. photo chord-identifier-space-transparent_zpsd25b2a34.png

Chord Address Space. Spheres represent nodes. Boxes represent keys. Arcs that do not belong to circle represent successor pointers.

Each data item and node is associated with and identifier. An identifier of a data item is referred to as a key, that of a node as an ID. Formally, the (key, value) pair (k,v) is hosted by the node whose ID is greater than or equal to k. Such node is called successor of key k.


Given a Chord identifier circle, all identifiers are well-ordered and keys and nodes are uniquely associated. Thus, each (key, value) pair is located and managed on a single, well-defined node. The DHT is formed by the set of all (key, value) pairs on all nodes of an identifier circle. The key to efficient lookup and modification operations on this data is to quickly locate the node responsible for a particuolar key.

For a very simple routing algorithm, only very little per-node state is required. Each node needs to store its successor node on the identifier circle. When a key is being looked up, each node forwards the query to its successor in the identifier circle. One of the nodes will determine that the key lies between itself and its successor. Thus, the key must be hosted by this successor.

This inefficient form of key location involves a number of messages linear to the number of nodes on the identifier circle. Chord utilizes additional per-node state for more scalable key lookups.

Chord Finger table. photo chord_finger_zps2e18cad3.png

Chord Finger Table. Nodes are connected at intervals increasing by powers of 2.

Each node maitains a routing table, the finger table, pointing to other nodes on the identifier circle. Given a circle with l-bit identifiers, a finger table has a maximum of l entries. On node n, the table entry at row i identifies the first node that succeeds n by at least 2i-1, i.e., successor(n+2i-1), where 1 <= i <= l. The first finger of a node is always its immediate successor on identifier circle.

As a finger table stores at most l entries, its size is independents of the number of keys or nodes forming the DHT. Each finger entry consits of a node ID, an IP address and port pair (or equivalent), and possibly some book-keeping information. Even for large identifiers, e.g., l=256, this is a relatively small amount of data per node which can be efficiently managed and searched. The routing information from finger tables provides information about nearby nodes and a coarse-grained view of long-distance links at intervals increasing by powers of two.

The Chord routing algorithm exploits the information stored in the finger table of each node. A node forwards queries for a key k to the closest predecessor of k on the identifier circle according to its finger table. When the query reaches a node n such that k lies between n and the successor of n on the identifier circle, node n reports its successor as the answer to the query.

Thus, for distant keys k, queries are routed over large distances on the identifier circle in a single hop. Furthermore, the closer the query gets to k, the more accurate the routing information of the intermediate nodes on the location of k becomes. Given the power-of-two intervals of finger IDs, each hop covers at least half of the remaining distance on the identifier circle between the current node and the target identifier. This results in an average of O(log N) routing hops for a Chord circle with N participating nodes. For example, a Chord network with 1000 nodes, forwards queries, on average, in roughly O(10) steps.


The Chord system described so far also needs to allow for nodes joining and leaving the system as well as to deal with node failures.

Node Arrivals

In order to join a Chord identifier circle, the new node first determines some identifier n. The original Chord protocol does not impose any restrictions on this choice. For example, n could be set at random assuming that the probability for collisions with existing node IDs is low in a identifier space large enough. There have been several proposals to restrict node IDs according to certain criteria, e.g., to exploit network locality or to avoid identity spoofing.

For the new node n, another node o must be known which already participates in the Chord system. By querying o for n's own ID, n retrieves its successor. It notifies its successor s of its presence leading to and update of the predecessor pointer of s to n. Node n then builds its finger by iteratively querying o for the successors of n+21, n+22, n+23, etc. At this stage, n has a valid successor pointer and a finger table. However n does not show up in the routing information of other nodes. In particular, it is not known to its predecessor as its new successor since the lookup algorithm is not apt to determine a node's predecessor.

Stabilization Protocol

Chord introduces a stabilization protocol to validate and update successor pointers as nodes join and leave the system. Stabilization requires an additional predecessor pointer and is performed periodically on every node. The stabilize method on a node k requests the successor of k to return its predecessor p. If p equals k, k and its successor agree on being each other's respective predecessor and successor. The fact that p lies between k and its successor indicates that p recentrly joined the identifier circle as k's successor. Thus, the node k updates its successor pointer to p and notifies p of being its predecessor.

With the stabilization protocol, the new node n does not actively determine its predecessor. Instead, the predecessor itself has to detect and fix inconsistencies of successor and predecessor pointers using stabilize method. After node n has thus learnt of its predecessor, it copies all keys it is responsible for, i.e., keys between predecessor(n) and n, while p the predecessor of n releases them.

At this stage, all successor pointers are up to date and queries can be routed correctly, albeit slowly. Since the new node n is not present in the finger tables of other nodes, they forward queries to the predecessor of n even if n would be more suitable. Node n's predecessor then needs to forward the query to n via its successor pointer. Multiple concurrent node arrivals may lead to several linear forwardings via successor pointers.

The number of nodes whose finger table needs to be updated is in the order of O(log N) in a system with N nodes. Based on the layout of a finger table, a new node n can identify, the nodes with outdated finger table as predecessor(n-2n-1) for 1 < i <= l. However, the impact of outdated finger tables on lookup performance is small, and in the face of multiple node arrivals, the finger table updates would be costly. Therefore, Chord prefers to update finger tables lazily. Similar to the stabilize method, each node n runs the fix_fingers method periodically. It picks a finger randomly from the finger table at index i (1 < i <= l) and looks it up to find the true current successor of n + 2i-1.

Node Failures

Chord adresses node failures on several levels. To detect node failures, all communication with other nodes needs to be checked for timeouts. When a node detects a failure of a finger during lookup, it chooses the next best preceeding node from its finger table. Since a short timeout is sufficient, lookup performace is not significantly affected in such a case. The fix_fingers method ensures that failed nodes are removed from the finger tables. To expedite this process, fix fingers can be invoked specifically on a failed finger.

It is particularly important to maintain the accuracy of the successor information as the correctness of lookups depends on it. If, for example, the first three nodes in the finger table of node n fail simultaneously, the next live finger f might not be the true live successor s. Thus, node n would assume that a certain key k is located at s and would accordingly send incorrect replies to queries for k. The stabilization protocol can fail in a similar fashion when multiple nodes fail, even if live fingers are used as backups for failed successors.

To maitain a valid succesor pointer in the presence of multiple simultaneous node failures, each node holds a successor list of length r. Instead of just a single successor pointer, it contains a node's first r successors. When a node detects the failure of its successor, it reverts to next live node in its successor list. During stabilize method call, a successor list with failed node is repaired by augmenting it with additional successors from a live node in the list. The Chord ring is affected only if all nodes from a successor list fail simultaneously.

The failure of a node not only means that it becomes unreachable but also that the data it managed is no longer available. Data loss from the failure of individual nodes can be prevented by replicating the data to other nodes. In Chord, the successor of a failed node becomes responsible for the keys and data of failed node. Thus, an application utilizing Chord ideally replicates data to successor nodes. Chord can use the successor list to communicate this information and possible changes to the application.

Node Departures

Treating nodes that voluntarily leave a Chord network like failed ones does not affect the stability of the network. Yet it is inefficient because the failure needs to be detected and rectified. Therefore, a leaving node should transfer its keys to its successor and notify its successor and predecessor. This ensures that data is not lost and that the routing information remains intact.

No comments:

Post a Comment