DDBMS - Replication Control
This chapter looks at the control of replication, which is required for all sites to maintain consistent data. We will review the techniques of replication control and the algorithms necessary for the control of replication.
As discussed earlier, replication is a technique used to store multiple copies of a data table at different sites in distributed databases. The issue of having multiple copies in multiple sites, especially during update operations, is the overhead of maintaining data consistency.
Replication control techniques need to be implemented in order to preserve mutually compatible data at all sites. There are two replication control methods, namely,
- Control for Synchronous Replication
- Control of Asynchronous Replication
Synchronous Replication Control
The database is synchronized with the synchronous replication approach so that all replications always have the same value. A transaction requesting a data item would have access to the same value across all the sites. A transaction that updates a data item is expanded to ensure this uniformity such that it allows the update in all copies of the data item. The two-phase commit protocol is typically used for this reason.
For example, let us consider PROJECT(PId, PName, PLocation) as a data table. If PLocation is 'Bombay', we need to run a T1 transaction that updates PLocation to 'Mumbai'. If there are no replications, the transaction T1 operations will be –
Begin T1:
Update PROJECT Set PLocation = 'Mumbai'
Where PLocation = 'Bombay';
End T1;
If the data table has two replicas in Site A and Site B, T1 needs to spawn two children T1A and T1B corresponding to the two sites. The expanded transaction T1 will be −
Begin T1:
Begin T1A :
Update PROJECT Set PLocation = 'Mumbai'
Where PLocation = 'Bombay';
End T1A;
Begin T2A :
Update PROJECT Set PLocation = 'Mumbai'
Where PLocation = 'Bombay';
End T2A;
End T1;
Asynchronous Replication Control
The replicas do not always maintain the same meaning in the asynchronous replication approach. An outdated value can be stored in one or more replicas, and different values can be seen in a transaction. The process of bringing all the replicas to the current value is called synchronization.
A common synchronization method is the method of storing and forward. One site is designated as the primary site in this method and the other sites are secondary sites. The primary site contains updated values at all times. The primary site enters all the transactions first. Those transactions are then queued in the secondary sitesfor application. Only when a transaction is scheduled to run on it are the secondary sites updated using the rollout method.
Replication Control Algorithms
Some of the replication control algorithms are −
- Master-slave replication control algorithm.
- Distributed voting algorithm.
- Majority consensus algorithm.
- Circulating token algorithm.
Master-Slave Replication Control Algorithm
There is one master site and ‘ N 'slave sites. To detect conflicts, a master algorithm runs at the master site. At each slave site, a copy of the slave algorithm runs. In the following 2 phases, the overall algorithm executes
- Transaction acceptance/rejection phase -The slave site sends a request to the master site when a transaction enters the transaction monitor of a slave site. Checks for conflicts from the master site. The master sends an "ACK+" message to the slave site if no conflicts occur, which then starts the transaction application phase. Otherwise, to the slave, the master sends an "ACK-" message that then rejects the transaction.
- On entering this phase, the slave site where the transaction was entered transmits a request for the execution of the transaction to all slaves. The peer slaves execute the transaction on receiving the requests and send an "ACK" to the requesting slave on completion. It sends a "DONE" message to the master site after the requesting slave has received "ACK" messages from all its peers. The master acknowledges that the transaction has been done and removes it from the queue that is pending.
Distributed Voting Algorithm
This includes 'N' peer sites, all of whom have to "OK" a transaction before it begins to execute. The two phases of this algorithm are as follows:
- Phase of distributed transaction acceptance:When a transaction reaches a site's transaction manager, it sends a request for a transaction to all other sites. A peer site resolves disputes using priority-based voting rules upon receiving a request. If the transaction is "OK" on all peer sites, the requesting site begins the application process. If a transaction is not 'OK' by either of the peer sites, the requesting site rejects the transaction.
- Distributed transaction application phase −Upon entering this phase, the site where the transaction has been entered transmits a request to all slaves for the execution of the transaction. The peer slaves execute the transaction on receiving the requests and send an "ACK" message to the requesting slave on completion. After receiving 'ACK' messages from all its peers from the requesting slave, it lets the transaction manager know that the transaction has been completed.
Majority Consensus Algorithm
This is a distinction from the distributed voting algorithm, where a transaction is allowed to be executed when a transaction is "OK" by a majority of peers. This is split into three phases:
- Voting phase − When a transaction reaches a site's transaction manager, it sends a request for a transaction to all other sites. A peer site checks for disputes using voting rules when receiving a request and keeps the conflicting transactions, if any, in the pending queue. Then, either an "OK" or a "NOT OK" message send.
- Transaction acceptance/rejection phase-If a majority "OK" on the transaction is received by the requesting site, it approves the transaction and broadcasts "ACCEPT" to all sites. Otherwise, it broadcasts "REJECT" and rejects the transaction to all the sites.
- Step of the transaction application -When a peer site receives a "REJECT" response, this transaction is removed from its pending list and all deferred transactions are reconsidered. When an "ACCEPT" message is received by a peer site, the transaction applies and rejects all the deferred transactions that are in conflict with this transaction in the pending queue. On completion, it sends an 'ACK' to the requesting slave.
Circulating Token Algorithm
In this approach, the system's transactions are serialized using a circulating token and executed against each replica of the database accordingly. All transactions are then accepted, i.e., none are rejected. It has two phases,
- Phase of transaction serialization -All transactions are scheduled to run in a serialization order in this phase. A unique ticket from a sequential series, indicating the order of transactions, is assigned to each transaction on each site. When a transaction has been given a ticket, it is broadcasted to all the sites.
- Phase of transaction application:When a site receives a transaction along with its ticket, it places the transaction according to its ticket for execution. After the transaction has finished execution, this site broadcasts an appropriate message. A transaction ends when all the sites have finished executing it.