Tuesday, January 22, 2013

Writes MVCC in HBase

In a post on the HBase community blog, Gregory Chanan describes how MVCC works in HBase, especially on the read side. Here is the figure he used to illustrate the idea.


Fig. 1  Write Steps with MVCC

In this figure, each write operation must obtain a RowLock before performing its actual WAL and Memstore writes. The serialization enforced by the lock is time-consuming and, in my opinion, unnecessary. We can guarantee that writes do not interfere with each other by giving each write a local sequence number, seqn:

1) If a thread tries to update a cell and finds that its write's seqn is less than the cell's largest "finished seqn", the update is discarded; otherwise the write is applied to the WAL and Memstore. Once every cell that a write operation needs to modify has been either applied or discarded, its seqn is declared finished.

2) Each read operation is assigned a read timestamp: the highest value x such that all writes with write number <= x have completed. The read for a given (row, column) then returns the data cell whose write number is the largest value less than or equal to the read timestamp.
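The two rules above can be sketched in a few lines of Python. This is only a single-threaded illustration of the bookkeeping, not HBase code: the names Cell, Store, largest_finished and read_point are made up for this example, the WAL and Memstore are collapsed into in-memory lists, and the atomic operations a real lock-free version would need are left out.

```python
import bisect


class Cell:
    """Versions of one (row, column) cell, kept sorted by write seqn (hypothetical)."""

    def __init__(self):
        self.seqns = []            # sorted write sequence numbers
        self.values = []           # values[i] was written by seqns[i]
        self.largest_finished = 0  # largest finished seqn that touched this cell

    def apply(self, seqn, value):
        """Rule 1: discard the update if a newer finished write already touched the cell."""
        if seqn < self.largest_finished:
            return False
        i = bisect.bisect_left(self.seqns, seqn)
        self.seqns.insert(i, seqn)
        self.values.insert(i, value)
        return True

    def read(self, read_point):
        """Rule 2: newest version whose seqn <= read_point, or None if there is none."""
        i = bisect.bisect_right(self.seqns, read_point)
        return self.values[i - 1] if i > 0 else None


class Store:
    """Stand-in for WAL + Memstore; assumes seqns are handed out as 1, 2, 3, ..."""

    def __init__(self):
        self.cells = {}
        self.finished = set()
        self.read_point = 0

    def write(self, seqn, updates):
        touched = []
        for key, value in updates.items():
            cell = self.cells.setdefault(key, Cell())
            cell.apply(seqn, value)  # applied or discarded, per rule 1
            touched.append(cell)
        # Every cell has been applied or discarded, so declare seqn finished.
        for cell in touched:
            cell.largest_finished = max(cell.largest_finished, seqn)
        self.finished.add(seqn)
        # Advance the read point over the contiguous run of finished seqns.
        while self.read_point + 1 in self.finished:
            self.read_point += 1

    def read(self, key, read_point):
        cell = self.cells.get(key)
        return cell.read(read_point) if cell is not None else None


store = Store()
store.write(1, {"cell-1": "a1", "cell-3": "c1"})
rp_early = store.read_point   # a read starting here only sees write 1
store.write(2, {"cell-2": "b2", "cell-3": "c2"})
rp_late = store.read_point    # a read starting here sees both writes
```

A read that took its timestamp at rp_early keeps returning "c1" for cell-3 even after the second write lands, while a read at rp_late returns "c2". A late update with seqn=1 to cell-3 would be discarded, because that cell's largest finished seqn is already 2.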

Figure 2 shows a typical lock-free MVCC write scenario. There are two write operations, with seqn=1 and seqn=2. The first write updates cell-1 and cell-3; the second updates cell-2 and cell-3. There are also two reads: read-1 begins just after the first write succeeds and before the second write finishes; read-2 begins after the second write finishes.

Fig. 2  MVCC Writes Use Cases.

From Fig. 2, we can conclude that each cell ends up with the value from the write operation with the highest seqn, and that read operations are guaranteed to see only values written by finished writes. For simplicity, however, Fig. 2 does not show concurrent writes.

Fig. 3 shows two concurrent write operations with seqn=1 and seqn=2. The first write updates three cells (1, 2, 3) and the second updates two cells (2, 3). When the read operation with read timestamp 1 begins, the first write has finished and the second is still in progress. The read therefore gets all the cell values written by seqn=1, without any partial updates from the second write.
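The concurrent case can be walked through with a small Python sketch. It is an illustration under assumptions, not HBase code: the helpers read_point, read and put are hypothetical, versions are kept in a plain dict, and the interleaving is simulated by hand rather than with real threads.

```python
def read_point(finished):
    """Largest x such that every write with seqn <= x has finished."""
    x = 0
    while x + 1 in finished:
        x += 1
    return x


def read(versions, cell, point):
    """Value of the newest version of `cell` with seqn <= point, or None."""
    visible = [s for s in versions.get(cell, {}) if s <= point]
    return versions[cell][max(visible)] if visible else None


versions = {}     # cell -> {seqn: value}
finished = set()  # seqns of writes declared finished


def put(seqn, cell, value):
    versions.setdefault(cell, {})[seqn] = value


# Write 1 updates cells 1, 2 and 3, then is declared finished.
for c in (1, 2, 3):
    put(1, c, f"w1-c{c}")
finished.add(1)

# Write 2 has updated cell 2 but not yet cell 3, so it is still unfinished.
put(2, 2, "w2-c2")

# A read that starts now gets read timestamp 1 and sees only write 1.
point_during = read_point(finished)
snapshot_during = {c: read(versions, c, point_during) for c in (1, 2, 3)}

# Write 2 completes; later reads get read timestamp 2.
put(2, 3, "w2-c3")
finished.add(2)
point_after = read_point(finished)
snapshot_after = {c: read(versions, c, point_after) for c in (1, 2, 3)}
```

Here snapshot_during contains only values from write 1, even though write 2's value for cell 2 is already stored, and snapshot_after picks up write 2's values for cells 2 and 3. If write 2 had finished before write 1, read_point({2}) would stay at 0 until write 1 finished, which is what keeps a read from seeing a gap.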


Fig. 3  MVCC Concurrent Writes

This MVCC write strategy improves HBase writes in two ways: 1) it allows different writes to execute in parallel; 2) it allows us to discard those parts of a write operation whose sequence number is smaller than the cell's current finished seqn.
