Two-phase Commit Protocol.

1. Introduction.

In transaction processing, databases, and computer networking, the two-phase commit protocol (2PC) is a type of atomic commitment protocol (ACP). It is a distributed algorithm that coordinates all the processes that participate in a distributed atomic transaction on whether to commit or abort (roll back) the transaction (it is a specialized type of consensus protocol).

The protocol achieves its goal even in many cases of temporary system failure (process, network node, or communication failures, among others), and is thus widely used.

However, it is not resilient to all possible failure configurations, and in rare cases manual intervention by a user (e.g., a system administrator) is needed to remedy an outcome.

To accommodate recovery from failure (automatic in most cases), the protocol's participants log the protocol's states. Log records, which are typically slow to generate but survive failures, are used by the protocol's recovery procedures.

2. Assumptions.

The protocol works in the following manner: one node is designated the coordinator, which is the master site, and the rest of the nodes in the network are designated the cohorts. The protocol assumes that there is stable storage at each node with a write-ahead log, that no node crashes forever, that the data in the write-ahead log is never lost or corrupted in a crash, and that any two nodes can communicate with each other. The last assumption is not too restrictive, as network communication can typically be rerouted. The first two assumptions are much stronger; if a node is totally destroyed then data can be lost.

The protocol is initiated by the coordinator after the last step of the transaction has been reached. The cohorts then respond with an agreement message or an abort message depending on whether the transaction has been processed successfully at the cohort.

3. Basic algorithm.

3.1. Commit request phase (or voting phase).

The coordinator sends a query to commit message to all cohorts and waits until it has received a reply from all cohorts.

The cohorts execute the transaction up to the point where they will be asked to commit. They each write an entry to their undo log and an entry to their redo log. Each cohort replies with an agreement message (cohort votes Yes to commit), if the cohort's actions succeeded, or an abort message (cohort votes No, not to commit), if the cohort experiences a failure that will make it impossible to commit.
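
The voting phase described above can be sketched in Python as follows. This is a minimal in-memory illustration, not a definitive implementation: the Cohort class, the prepare and collect_votes names, and the can_commit flag are all hypothetical, and plain lists stand in for the undo and redo logs that a real cohort would keep on stable storage.

```python
from enum import Enum


class Vote(Enum):
    YES = "agreement"
    NO = "abort"


class Cohort:
    """Hypothetical in-memory cohort; a real one would log to stable storage."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates whether the local work succeeds
        self.undo_log = []
        self.redo_log = []

    def prepare(self, txn_id, old_value, new_value):
        """Handle the query-to-commit message and return this cohort's vote."""
        if not self.can_commit:
            return Vote.NO
        # Record how to reverse and how to reapply the change before voting Yes.
        self.undo_log.append((txn_id, old_value))
        self.redo_log.append((txn_id, new_value))
        return Vote.YES


def collect_votes(cohorts, txn_id, old_value, new_value):
    """Coordinator side: query every cohort and gather all replies."""
    return {c.name: c.prepare(txn_id, old_value, new_value) for c in cohorts}
```

For example, calling collect_votes with one healthy cohort and one created with can_commit=False yields one Yes vote and one No vote, and only the healthy cohort ends up with log entries.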

3.2. Commit phase (or Completion phase).

3.2.1. Success.

If the coordinator received an agreement message from all cohorts during the commit-request phase:

The coordinator sends a commit message to all the cohorts.
Each cohort completes the operation, and releases all the locks and resources held during the transaction.
Each cohort sends an acknowledgment to the coordinator.
The coordinator completes the transaction when all acknowledgments have been received.

3.2.2. Failure.

If any cohort votes No during the commit-request phase (or the coordinator's timeout expires):

The coordinator sends a rollback message to all the cohorts.
Each cohort undoes the transaction using the undo log, and releases the resources and locks held during the transaction.
Each cohort sends an acknowledgment to the coordinator.
The coordinator undoes the transaction when all acknowledgments have been received.
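
The success and failure paths above can be put together in one short Python sketch. It assumes a hypothetical DemoCohort stub whose reply to the query to commit is True (Yes), False (No), or None (no reply, standing in for the coordinator's timeout expiring); these names and the in-process method calls are illustrative only, since real cohorts would be reached over the network.

```python
class DemoCohort:
    """Hypothetical cohort stub; vote is True (Yes), False (No), or None (no reply)."""

    def __init__(self, vote):
        self._vote = vote
        self.state = "working"

    def query_to_commit(self):
        if self._vote:
            self.state = "prepared"
        return self._vote

    def commit(self):
        self.state = "committed"
        return True  # acknowledgment

    def rollback(self):
        self.state = "rolled back"
        return True  # acknowledgment


def two_phase_commit(cohorts):
    """Drive both phases and return the coordinator's decision."""
    # Phase 1: a missing reply (None) models the coordinator's timeout expiring.
    votes = [c.query_to_commit() for c in cohorts]
    decision = "commit" if all(v is True for v in votes) else "rollback"

    # Phase 2: broadcast the decision and wait for every acknowledgment.
    if decision == "commit":
        acks = [c.commit() for c in cohorts]
    else:
        acks = [c.rollback() for c in cohorts]
    if not all(acks):
        raise RuntimeError("missing acknowledgment; transaction not complete")
    return decision
```

With two cohorts that both vote Yes, two_phase_commit returns "commit"; if any cohort votes No or does not reply, it returns "rollback" and every cohort is rolled back.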

4. Disadvantages.

The greatest disadvantage of the two-phase commit protocol is that it is a blocking protocol. If the coordinator fails permanently, some cohorts will never resolve their transactions: after a cohort has sent an agreement message to the coordinator, it blocks until a commit or rollback message is received.
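
The blocking behavior can be illustrated with a small Python sketch of the cohort side. The await_decision function and the use of a queue as the cohort's inbox are assumptions made for the example; the point is only that a local timeout does not let the cohort decide by itself.

```python
import queue


def await_decision(inbox, timeout_s):
    """Cohort side after voting Yes: wait for a commit or rollback message."""
    try:
        return inbox.get(timeout=timeout_s)
    except queue.Empty:
        # The cohort cannot decide alone: the coordinator may already have
        # told other cohorts to commit, so it stays prepared and keeps its locks.
        return "still-blocked"
```

If the coordinator never sends a decision, every call returns "still-blocked", however long the cohort keeps retrying.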

Source: Two-phase commit protocol on Wikipedia.

5. Other Considerations.

5.1. The coordinator and the cohorts keep log files with communication and transaction status data, which may prove useful in case of hardware, software, or communication failures.
5.2. Cohorts may hold the data required to reverse their part of the transaction in persistent memory, in case of unforeseen failures, for example after the voting phase.
5.3. There are timeout periods for message passing between the coordinator and the cohorts.

1 comment:

  1. (EN) While not perfect, this protocol is widespread. Without it Distributed Transactions would not be possible. There are administrators employed to check for errors & correct them manually where Distributed Transactions are used. It's an employment like any other, they are paid for their time doing it.