Monday, June 30, 2014

The Best Graph Processing Engine?

Graph processing is so hot these days. But what qualities should the best graph processing engine have? Here I list several naive ideas, which I believe are error-prone and easily beaten by yours, so please give me your feedback.

  1. Good support for both in-memory and out-of-core processing. This requirement comes from the sheer size of graphs: even with a huge cluster, we may still be unable to load the whole graph into memory.
  2. Good support for time-evolving graphs. It is wasteful to rerun a complex computation from scratch when only a small portion of the graph has changed. Using a reasonable amount of resources to speed up the processing of time-evolving graphs is essential here.
  3. Good support for graph traversal. MapReduce or scatter-gather style processing is not the only thing that matters; a traversal that starts from a given vertex and ends at a set of vertices is also critical in many use cases. However, efficient graph traversal may conflict with a divide-and-conquer graph partitioning strategy. Traversal is usually considered a feature of graph databases rather than of graph processing frameworks, but I still think it should be weighed carefully before making that call.
  4. Good support for rich-data graphs. The graph structure itself only contains vertices and the edges between them, but what really matters is the rich data attached to those vertices and edges. The processing model should be aware of this data and process it accordingly.
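
To make points 3 and 4 a bit more concrete, here is a minimal single-machine sketch in Python. It is only an illustration under my own assumptions, not how any real engine works: the toy graph, the `traverse` function, and the `type` edge attribute are all hypothetical names. The traversal starts from one vertex, follows only the edges whose data passes a filter, and stops after a fixed number of hops.

```python
from collections import deque

# Hypothetical toy graph: adjacency lists whose edges carry a data dict.
GRAPH = {
    "a": [("b", {"type": "follows"}), ("c", {"type": "blocks"})],
    "b": [("d", {"type": "follows"})],
    "c": [("d", {"type": "follows"})],
    "d": [],
}


def traverse(graph, start, edge_ok, max_hops):
    """Breadth-first traversal from `start`.

    Only edges whose data dict passes `edge_ok` are followed, and the
    search stops expanding after `max_hops` hops.
    Returns a dict mapping each reached vertex to its hop count.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if hops[vertex] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor, data in graph.get(vertex, []):
            if neighbor not in hops and edge_ok(data):
                hops[neighbor] = hops[vertex] + 1
                queue.append(neighbor)
    return hops


# Follow only "follows" edges, at most two hops away from "a".
reached = traverse(GRAPH, "a", lambda e: e["type"] == "follows", max_hops=2)
print(reached)
```

On a partitioned graph, every hop in this loop may cross a partition boundary, which is where I suspect the conflict with divide-and-conquer partitioning shows up.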
To be continued...

Friday, June 27, 2014

Google abandoned MapReduce?

Recently, big news spread from Google I/O 2014: Google has abandoned MapReduce, which was considered one of Google's most powerful weapons. Rumor says the newest execution engine and programming model are MillWheel and FlumeJava respectively.

It is easy to see that MapReduce would be abandoned sooner or later: it is inefficient, slow, and wasteful of resources in most use cases. From the programming model's perspective, it is simple and easy to use, but far from enough to express the wide range of real-world applications, such as iterative algorithms, multi-phase workflows, incremental processing, and real-time stream processing.
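
To show why iterative algorithms fit the model poorly, here is a small sketch in Python. It is an in-process simulation under my own assumptions, not Google's or Hadoop's API: `map_reduce`, `mapper`, `reducer`, and the `EDGES` input are all hypothetical. It finds connected components by letting every vertex repeatedly adopt the smallest label among itself and its neighbors, and each round of that is one complete map, shuffle, and reduce pass over the whole graph.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """One MapReduce pass, simulated in-process: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Hypothetical undirected graph; every vertex must appear as a key.
EDGES = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}


def mapper(record):
    vertex, (label, neighbors) = record
    # Re-emit the vertex state so the graph structure survives the pass.
    yield vertex, ("state", label, neighbors)
    # Propose this vertex's current label to each neighbor.
    for neighbor in neighbors:
        yield neighbor, ("label", label, None)


def reducer(vertex, values):
    neighbors = next(v[2] for v in values if v[0] == "state")
    # Keep the smallest label among the vertex's own and its neighbors'.
    label = min(v[1] for v in values)
    return vertex, (label, neighbors)


def connected_components(edges):
    """Run full passes until no label changes; return (labels, pass count)."""
    records = [(v, (v, ns)) for v, ns in sorted(edges.items())]
    passes = 0
    while True:
        new_records = map_reduce(records, mapper, reducer)
        passes += 1
        if new_records == records:
            return {v: label for v, (label, _) in new_records}, passes
        records = new_records


labels, passes = connected_components(EDGES)
print(labels, passes)
```

Even for this five-vertex graph, the loop needs several full passes, and the graph structure itself has to be shipped through the shuffle every time. As far as I understand, in a real MapReduce deployment each pass is also a separate job that writes its output to distributed storage before the next one can start, which is exactly the overhead an engine built for iteration tries to avoid.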

Luckily, in the open source camp we have Spark, which has proven useful in almost all of the applications listed above. Its in-memory computation also leaves a lot of room for imagination in the future (maybe relevant to the anti-caching topics?).

P.S.
 After writing this, I saw some interesting posts discussing the retirement of MapReduce at Google and the possibility that Hadoop is the next to decline. For example: The elephant was a Trojan Horse: On the Death of Map-Reduce at Google.

The taste of your research

Research taste is an important but often, sometimes deliberately, forgotten concept in both scientific and technological research. There are thousands of researchers all around the world trying to tackle problems. Some of them may not be capable, but most of them are really smart and hard-working. So what will distinguish you from them, the way we can distinguish Euler from the other mathematicians of his era? I guess the answer is your research taste.

This taste has several aspects. First, you should have good taste in choosing important problems. It makes no sense to solve only problems that no one cares about and still expect wide recognition from others. Second, even if you are solving an important problem, you should be able to judge whether you are the right person to do it. It is not a good idea to try to solve a physics puzzle if you are a mathematician. Third, with the right and important question in hand, you still need good taste for the possible solutions: whether they will work. Going too far in the wrong direction will waste your valuable time and destroy your confidence. Finally, once you have a good solution, the last piece of taste is how to present the results you got. In the history of science and technology, it is not rare for people to fail to be recognized as real contributors just because they failed to show what they had done and how it would impact humanity.

It may be a little late for me to start thinking about building scientific taste now. But at least it will not hurt. And do you remember that Einstein spent roughly the last three decades of his life on the unified field theory? Your taste can fade even if you once had it.