Monday, March 26, 2012

Implementing Master Slave / Grid Computing Pattern in Akka

The Master-Slave pattern is a prime example of fault tolerance and parallel computation. The idea behind the pattern is to partition the work into identical sub-tasks, which are then delegated to slaves. These slave nodes or instances process the work task and send the result back to the master. The master then compiles the results received from all the slave nodes. The key here is that the slave nodes only know how to process the task; they are not aware of what happens to the output.

The Master-Slave pattern is analogous to the Grid Computing pattern, where a control node distributes the work to other nodes. The idea is to make use of the nodes on the network for their computing power. SETI@home was one of the earliest pioneers of this model.

I have built a similar example, the difference being that the workers are started on remote nodes, register with the master (WorkServer), and then start processing work packets. If no worker is registered with the master (WorkServer), the master waits for workers to register. Workers can register at any time and will start receiving work packets from then on.
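Below is a minimal sketch of that registration handshake, written against the Akka classic Java API (a recent version; the 2012-era API differed). The message types (RegisterWorker, WorkPacket, Result) and the queueing behaviour are my own illustration, not the original project's code:

```java
import akka.actor.AbstractActor;
import akka.actor.ActorRef;

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative message types. Serializable so they survive remoting.
class RegisterWorker implements java.io.Serializable {}
class WorkPacket implements java.io.Serializable {
  final String payload;
  WorkPacket(String payload) { this.payload = payload; }
}
class Result implements java.io.Serializable {
  final String value;
  Result(String value) { this.value = value; }
}

// Master (WorkServer): holds work until at least one worker has
// registered, then deals packets out round-robin and compiles results.
class MasterActor extends AbstractActor {
  private final List<ActorRef> workers = new ArrayList<>();
  private final Queue<WorkPacket> pending = new ArrayDeque<>();
  private int next = 0;

  @Override
  public Receive createReceive() {
    return receiveBuilder()
        .match(RegisterWorker.class, reg -> {
          workers.add(getSender()); // workers may register at any time
          drain();
        })
        .match(WorkPacket.class, packet -> {
          pending.add(packet);      // queued if no worker is registered yet
          drain();
        })
        .match(Result.class, result -> {
          // compile the results coming back from the slaves here
        })
        .build();
  }

  private void drain() {
    while (!workers.isEmpty() && !pending.isEmpty()) {
      workers.get(next++ % workers.size()).tell(pending.poll(), getSelf());
    }
  }
}
```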

Thursday, March 15, 2012

Processing 10 million messages with Akka

Akka actors promise concurrency. What better way to test that than to see how much time it takes to process 10 million messages on commodity hardware and software, without any low-level tuning? I wrote the entire 10-million-message program in Java, and the overall results astonished me.

When I ran the program on my iMac with an Intel i5 (4 cores), 4 GB of RAM, and the JVM heap set to 1024 MB, it processed 10 million messages in 23 seconds. I ran the program multiple times, and the average time was in the range of 25 seconds. So the throughput I achieved was roughly 400K messages per second, which is phenomenal.
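The post doesn't show the original program, but a much-reduced harness of the same shape might look like this (a single-actor sketch in the Akka classic Java API of a recent version, not the author's benchmark):

```java
import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

public class ThroughputTest {
  static final int TOTAL = 10_000_000;

  // Records the time of the first message and reports when the last arrives.
  // Note: all 10M messages queue in the actor's unbounded mailbox.
  static class CountingActor extends AbstractActor {
    private long start;
    private int count = 0;

    @Override
    public Receive createReceive() {
      return receiveBuilder()
          .matchEquals("msg", m -> {
            if (count == 0) start = System.nanoTime();
            if (++count == TOTAL) {
              long ms = (System.nanoTime() - start) / 1_000_000;
              System.out.println(TOTAL + " messages in " + ms + " ms");
              getContext().getSystem().terminate();
            }
          })
          .build();
    }
  }

  public static void main(String[] args) {
    ActorSystem system = ActorSystem.create("throughput");
    ActorRef counter = system.actorOf(Props.create(CountingActor.class), "counter");
    for (int i = 0; i < TOTAL; i++) {
      counter.tell("msg", ActorRef.noSender());
    }
  }
}
```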

Wednesday, March 14, 2012

Word Count MapReduce with Akka

In my ongoing work with Akka, I recently wrote a word-count MapReduce example. It implements the MapReduce model, which is a very good fit for a scale-out design approach.

Flow 

  1. The client system (FileReadActor) reads a text file and sends each line of text as a message to the ClientActor.
  2. The ClientActor holds a reference to the remote actor (WCMapReduceActor), and the message is passed on to that remote actor.
  3. The server (WCMapReduceActor) receives the message. The actor uses a priority mailbox to decide the priority of each message and orders its queue accordingly. In this case, the priority mailbox is used to let the DISPLAY_LIST message (the request from the aggregate actor for the list of results) jump ahead of the ordinary map-reduce requests (see the mailbox sketch after this list).
  4. The WCMapReduceActor sends the messages to the MapActor (behind a round-robin router) for mapping the words.
  5. After mapping the words, the message is sent to the ReduceActor (also behind a round-robin router) for reducing the words.
  6. The reduced result(s) are sent to the AggregateActor, which does an in-memory aggregation of the results (a simplified sketch of steps 4 to 6 also follows the list).
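As a sketch of how step 3 can be wired: in the Akka classic Java API (a recent version; the 2012-era API differed slightly), a priority mailbox is defined by extending UnboundedPriorityMailbox and binding it to a mailbox id in the configuration. The class and configuration names below are illustrative, not the original project's:

```java
import akka.actor.ActorSystem;
import akka.dispatch.PriorityGenerator;
import akka.dispatch.UnboundedPriorityMailbox;
import com.typesafe.config.Config;

// Lets the DISPLAY_LIST request jump ahead of ordinary map-reduce messages.
public class WCPriorityMailbox extends UnboundedPriorityMailbox {
  public WCPriorityMailbox(ActorSystem.Settings settings, Config config) {
    super(new PriorityGenerator() {
      @Override
      public int gen(Object message) {
        // Lower number = higher priority.
        return "DISPLAY_LIST".equals(message) ? 0 : 1;
      }
    });
  }
}

// In application.conf:
//   wc-priority-mailbox {
//     mailbox-type = "com.example.WCPriorityMailbox"
//   }
// And when creating the actor:
//   system.actorOf(Props.create(WCMapReduceActor.class)
//       .withMailbox("wc-priority-mailbox"));
```

And a much-simplified take on steps 4 to 6, again a sketch rather than the original code: lines are round-robined across MapActor instances, and a single ReduceActor here doubles as the in-memory aggregator (the original keeps reducing and aggregating in separate actors):

```java
import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.routing.RoundRobinPool;

import java.util.HashMap;
import java.util.Map;

// Map phase: split a line into words and forward each word onwards.
class MapActor extends AbstractActor {
  private final ActorRef reducer;
  MapActor(ActorRef reducer) { this.reducer = reducer; }

  @Override
  public Receive createReceive() {
    return receiveBuilder()
        .match(String.class, line -> {
          for (String word : line.toLowerCase().split("\\s+")) {
            reducer.tell(word, getSelf());
          }
        })
        .build();
  }
}

// Reduce/aggregate phase: keep a running word count in memory.
class ReduceActor extends AbstractActor {
  private final Map<String, Integer> counts = new HashMap<>();

  @Override
  public Receive createReceive() {
    return receiveBuilder()
        .matchEquals("DISPLAY_LIST",          // checked before the general case
            m -> getSender().tell(new HashMap<>(counts), getSelf()))
        .match(String.class, word -> counts.merge(word, 1, Integer::sum))
        .build();
  }
}

public class WordCountMain {
  public static void main(String[] args) {
    ActorSystem system = ActorSystem.create("wordcount");
    ActorRef reducer = system.actorOf(Props.create(ReduceActor.class), "reduce");
    // Step 4: a round-robin router in front of the map actors.
    ActorRef mappers = system.actorOf(
        new RoundRobinPool(4).props(Props.create(MapActor.class, reducer)), "map");
    mappers.tell("to be or not to be", ActorRef.noSender());
  }
}
```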

Friday, March 2, 2012

What is Akka?

Before I delve into what Akka is, let us take a step back to understand how the concept of concurrent programming has evolved in the application development world. Applications have evolved from large monolithic procedures to a more object-oriented model. With the advent of Java EE and the Spring framework, application design evolved into more of a process- or task-based model. EJBs or POJOs are designed to perform one single task. This model encouraged objects to be stateless (although stateful session beans were allowed) in order to handle increasing load (i.e. a scalable application). The overall business request gets broken down across multiple beans (EJBs or POJOs) that process the information; the results from the beans are aggregated and presented back to the requester. This model allowed applications to scale up.

Now, when the same model needed to be applied to Java applications that do not make use of EJBs or application server containers, the available technique was multi-threaded programming. Working with threads requires a much higher level of programming skill, since dealing with state, locks, mutexes, and the like is not easy. Java EE later added APIs for container-managed task execution, and from Java 5 onwards the concurrency utilities (the java.util.concurrent package, with its executors and concurrent data structures) were introduced. These allowed programmers to write programs that could be broken down into smaller tasks and run in parallel on the underlying threads.
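For instance, a minimal sketch of that task-based style using java.util.concurrent (the four-way chunked sum is purely illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<Future<Long>> partials = new ArrayList<>();

    // Break one big computation into smaller tasks...
    for (int i = 0; i < 4; i++) {
      final long lo = i * 2_500_000L, hi = (i + 1) * 2_500_000L;
      partials.add(pool.submit(() -> {
        long sum = 0;
        for (long n = lo; n < hi; n++) sum += n;
        return sum;
      }));
    }

    // ...then aggregate the partial results.
    long total = 0;
    for (Future<Long> f : partials) total += f.get();
    System.out.println("sum = " + total);
    pool.shutdown();
  }
}
```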

For an average Java programmer, writing multi-threaded programs that break a big process into smaller tasks and run those tasks in parallel to take advantage of multiple cores is not easy. The Akka team abstracted this whole concept onto another plane with the Actor model, where writing programs that process tasks in parallel becomes a slam dunk. The Akka abstraction allows programmers to write programs that take advantage of multiple cores and process hundreds of tasks in parallel. The Akka team borrowed concepts and techniques from Erlang to build a "Let it Crash" fault tolerance model, which allows applications to fail fast and recover from failure as soon as possible.
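To make the "Let it Crash" idea concrete, here is a minimal supervision sketch in the Akka classic Java API (a recent version; the actor and message choices are mine, not from this post):

```java
import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.OneForOneStrategy;
import akka.actor.Props;
import akka.actor.SupervisorStrategy;
import akka.japi.pf.DeciderBuilder;

import java.time.Duration;

// A worker that may fail on bad input. It does not defend against the
// failure itself -- it simply crashes and lets its supervisor decide.
class Worker extends AbstractActor {
  @Override
  public Receive createReceive() {
    return receiveBuilder()
        .match(String.class,
            s -> System.out.println("parsed: " + Integer.parseInt(s)))
        .build();
  }
}

// The supervisor restarts a crashed worker (up to 10 times per minute)
// instead of letting the failure take down the whole application.
class Supervisor extends AbstractActor {
  private final ActorRef worker =
      getContext().actorOf(Props.create(Worker.class), "worker");

  @Override
  public SupervisorStrategy supervisorStrategy() {
    return new OneForOneStrategy(
        10, Duration.ofMinutes(1),
        DeciderBuilder
            .match(NumberFormatException.class,
                e -> SupervisorStrategy.restart())
            .build());
  }

  @Override
  public Receive createReceive() {
    return receiveBuilder()
        .match(String.class, s -> worker.forward(s, getContext()))
        .build();
  }
}
```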

Akka provides a scalable, real-time transaction processing library that allows your application to scale up, scale out, and remain fault tolerant.