Wednesday, July 2, 2014

What's new in stream processing

The Lambda architecture seems to be popular in stream processing. It is an approach to building stream processing applications on top of MapReduce and Storm or similar systems. The idea is that you implement the transformation logic twice, once for the batch system and once for the stream processing system. The results from both systems are then combined at query time to produce a complete answer.
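To make the "combine at query time" idea concrete, here is a minimal sketch in plain Python. It does not use MapReduce or Storm. The page-view counting job and the names batch_view, SpeedView and query are all made up for illustration: the batch function stands in for the batch layer, the small class stands in for the stream layer, and the query simply adds the two results.

```python
from collections import Counter


def batch_view(events):
    """Batch layer stand-in: recompute page counts from the full historical log."""
    return Counter(event["page"] for event in events)


class SpeedView:
    """Stream layer stand-in: count events that arrived after the last batch run."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        # Same counting logic as batch_view, written a second time for the stream.
        self.counts[event["page"]] += 1


def query(page, batch_counts, speed_view):
    """Merge both views at query time to get a complete answer."""
    return batch_counts[page] + speed_view.counts[page]


# Hypothetical data: events already covered by the batch run, and newer ones.
historical = [{"page": "/home"}, {"page": "/home"}, {"page": "/about"}]
recent = [{"page": "/home"}, {"page": "/contact"}]

batch_counts = batch_view(historical)
speed = SpeedView()
for event in recent:
    speed.on_event(event)

print(query("/home", batch_counts, speed))  # 2 from batch + 1 from stream = 3
```

Even in this toy version the counting logic is written twice, once in batch_view and once in SpeedView.on_event, and the two copies have to be kept in agreement by hand.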

Jay Kreps (@jaykreps) points out some notable problems with the Lambda architecture.

"First, maintaining code that needs to produce the same result in two complex distributed systems is exactly as painful as it seems to be. One proposed approach to fix this is to have a language or framework that abstracts over both the real-time and batch framework. Summingbird is a framework that does this."

"Second, even if we can only code once, the operational burden of running and debugging two systems is going to be very high."

The reason we are still interested in the Lambda architecture would be:

"What they have at their disposal are two things that don’t quite solve their problem: a scalable high-latency batch system that can process historical data and a low-latency stream processing system that can’t reprocess results."

"In this sense, even though it can be painful, I think the Lambda Architecture solves an important problem that was otherwise generally ignored. But I don’t think this is a new paradigm or the future of big data. It is just a temporary state driven by the current limitation of off-the-shelf tools. I also think there are better alternatives."
 
Some references:

  • Website, Lambda Architecture. http://lambda-architecture.net/
  • Book, Big Data. http://www.manning.com/marz/
  • Framework, Kafka 
  • Framework, Samza
  • Website, How to beat the guys that say they beat CAP, Beating the CAP Theorem Checklist

Tuesday, July 1, 2014

Want to write an Operating System

To most computer science students, operating systems are the most mysterious pieces of software. Back in school, I was quite curious about how a piece of software could bootstrap itself from nothing. So you may have a good reason to try this website, OSDev:

"This website provides information about the creation of operating systems and serves as a community for those people interested in OS creation. The wiki contains articles about various OS developing subjects."


It basically provides all the details you need to build your own operating system. I mean, if you spend a semester studying the material listed on this site, you can probably learn much more than from listening to boring lectures on producer-consumer and semaphores.