Wednesday, December 5, 2012

Talking about triggers in HBase


Most of us are pretty familiar with triggers in traditional SQL databases. Database administrators use triggers to maintain data integrity or to carry out useful follow-up actions. Most of these triggers are implemented on the ECA (event-condition-action) model. HBase does not currently provide this feature, which I believe is critical for lots of applications, so in the next few articles I would like to talk about how I designed and implemented it on top of the current HBase release (0.94.2).

1. Our goal

Coprocessors have been available in HBase since 0.92, so it makes no sense to give HBase yet another tool for maintaining data integrity or running pre- and post-actions. What we are building here is instead a general framework that lets programmers write applications that monitor some fields of an HBase table, check whether conditions are fulfilled, and finally run user-defined functions. So basically, we need a trigger-enabled HBase that:
  • allows programmers to submit "TRIGGERS"
  • supports "ACTIONS" that are executed automatically in a distributed fashion
There are lots of difficulties here:
  • we need to guarantee that once a programmer receives a successful return value from a submission, the "TRIGGER" has already begun to work.
  • we need to make sure that every relevant server holds the "TRIGGER" information. Partial deployment is not allowed.
  • we need an efficient way to detect trigger events quickly, with only a small degradation in 'PUT' performance.
  • user-defined functions are not restricted to particular types, so we must keep an eye on failures, inconsistency, and out-of-order execution.
  • we need a good load-balancing algorithm to redistribute "TRIGGERS" according to both disk load and CPU load.
  • we need a simple fault-tolerance approach to make the whole system workable in a practical environment.
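To make the event-condition-action idea concrete, here is a minimal sketch in plain Java. It is not the framework's actual API: the names TriggerSketch, CellWrite, Trigger, and fire are hypothetical, and CellWrite is only a simplified stand-in for a single-cell HBase Put. The event is a write to a monitored table and column, the condition is a predicate over the written value, and the action is a user-defined function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class TriggerSketch {

    /** Simplified stand-in for a single-cell HBase Put (hypothetical type). */
    static final class CellWrite {
        final String table;
        final String row;
        final String column;
        final String value;

        CellWrite(String table, String row, String column, String value) {
            this.table = table;
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    /** An event-condition-action trigger bound to one table and one column. */
    static final class Trigger {
        final String table;
        final String column;
        final Predicate<CellWrite> condition;
        final Consumer<CellWrite> action;

        Trigger(String table, String column,
                Predicate<CellWrite> condition, Consumer<CellWrite> action) {
            this.table = table;
            this.column = column;
            this.condition = condition;
            this.action = action;
        }

        /** Event check: is this write on the monitored table and column? */
        boolean matchesEvent(CellWrite w) {
            return table.equals(w.table) && column.equals(w.column);
        }

        /** Runs the action if the event matches and the condition holds. */
        boolean fire(CellWrite w) {
            if (!matchesEvent(w) || !condition.test(w)) {
                return false;
            }
            action.accept(w);
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> alerts = new ArrayList<>();
        Trigger bigOrder = new Trigger("orders", "cf:amount",
                w -> Integer.parseInt(w.value) > 1000,   // condition
                w -> alerts.add(w.row));                 // action

        bigOrder.fire(new CellWrite("orders", "row1", "cf:amount", "50"));
        bigOrder.fire(new CellWrite("orders", "row2", "cf:amount", "5000"));
        bigOrder.fire(new CellWrite("orders", "row3", "cf:status", "5000"));
        System.out.println(alerts); // prints [row2]
    }
}
```

Only the second write fires the trigger: the first fails the condition and the third touches a column that is not monitored. Keeping the event check as a cheap string comparison ahead of the condition is one way to keep the overhead on the 'PUT' path small.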
2. The Architecture

As the following figure shows, triggers are submitted by users to the TriggerMaster node, which is also the HMaster node in the HBase implementation. The TriggerMaster distributes the trigger structure to the TriggerWorkers, which take charge of trigger deployment and execution.


This architecture is quite simple, which makes scheduling and fault tolerance much easier to handle. Also, the TriggerMaster is elected from all the nodes in the data center through a ZooKeeper-like service, so even if this single node fails, the system can recover by electing a new TriggerMaster.
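As a rough illustration of the master/worker split, the sketch below shows one possible way a TriggerMaster could push a trigger to every TriggerWorker and report success only when all of them have acknowledged it, rolling back otherwise. This is an assumption about how the "no partial deployment" requirement might be met, not the implementation described in this series. The interface, the in-memory worker, and the deploy, undeploy, and submit methods are all hypothetical, and master election is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DistributionSketch {

    /** Hypothetical worker-side contract: deploy returns true on acknowledgment. */
    interface TriggerWorker {
        boolean deploy(String triggerId);
        void undeploy(String triggerId);
    }

    /** In-memory worker used only to exercise the sketch. */
    static final class InMemoryWorker implements TriggerWorker {
        final Set<String> deployed = new LinkedHashSet<>();
        final boolean healthy;

        InMemoryWorker(boolean healthy) {
            this.healthy = healthy;
        }

        @Override
        public boolean deploy(String triggerId) {
            if (!healthy) {
                return false; // simulate a worker that cannot accept the trigger
            }
            deployed.add(triggerId);
            return true;
        }

        @Override
        public void undeploy(String triggerId) {
            deployed.remove(triggerId);
        }
    }

    /** Distributes a trigger to all workers, all or nothing. */
    static final class TriggerMaster {
        private final List<TriggerWorker> workers;

        TriggerMaster(List<? extends TriggerWorker> workers) {
            this.workers = new ArrayList<>(workers);
        }

        /** Returns true only after every worker has acknowledged the trigger. */
        boolean submit(String triggerId) {
            List<TriggerWorker> acked = new ArrayList<>();
            for (TriggerWorker w : workers) {
                if (w.deploy(triggerId)) {
                    acked.add(w);
                } else {
                    // Roll back so that no worker is left with a partial deployment.
                    for (TriggerWorker a : acked) {
                        a.undeploy(triggerId);
                    }
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        List<InMemoryWorker> healthy =
                Arrays.asList(new InMemoryWorker(true), new InMemoryWorker(true));
        System.out.println(new TriggerMaster(healthy).submit("t1")); // prints true

        List<InMemoryWorker> mixed =
                Arrays.asList(new InMemoryWorker(true), new InMemoryWorker(false));
        System.out.println(new TriggerMaster(mixed).submit("t2"));   // prints false
        System.out.println(mixed.get(0).deployed);                   // prints []
    }
}
```

A real deployment would also have to cope with workers failing during the rollback itself, which this sketch ignores.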



2.1 The TriggerMaster

Trigger 
2.2 The TriggerWorker
2.3 The Communication

3. Fault-Tolerance
4. Scheduler
5. Use Cases
6. Analysis

