Akka Essentials: Word Count MapReduce with Akka

Wednesday, March 14, 2012

Word Count MapReduce with Akka

In my ongoing workings with Akka, i recently wrote an Word count map reduce example. This example implements the Map Reduce model, which is very good fit for a scale out design approach.

Flow

The client system (FileReadActor) reads a text file and sends each line of text as a message to the ClientActor.
The ClientActor has the reference to the RemoteActor ( WCMapReduceActor ) and the message is passed on to the remote actor
The server (WCMapReduceActor) gets the message. The Actor uses the PriorityMailBox to decide the priority of the message and filters the queue accordingly. In this case, the PriorityMailBox is used to segregate the message between the mapreduce requests and getting the list of results (DISPLAY_LIST)message from the aggregate actor.
The WCMapReduceActor sends across the messages to the MapActor (uses RoundRobinRouter dispatcher) for mapping the words
After mapping the words, the message is send across to the ReduceActor(uses RoundRobinRouter dispatcher) for reducing the words
The reduced result(s) are send to the Aggregate Actor that does an in-memory aggregation of the result

The following picture details how the program has been structured

The code base for the program is available at the following location - https://github.com/write2munish/Akka-Essentials

For more information on MapReduce, read the post MapReduce for dummies

2 comments:

bravegagJune 24, 2013 at 8:36 PM
This MapReduce is completely flawed. There is no guarantee that after receiving the message DISPLAY_LIST, all the MapActor instances are done processing e.g. if you have a lengthy running MapActor your DISPLAY_LIST will go through and some values will be left unprocessed.

If you want to break this MapReduce implementation simply add this snippet to the MapActor at line 95:

if ("Thieves! thieves!".equals(work)) {
try { System.out.println("*** sleeping!"); Thread.sleep(10000); System.out.println("*** back!");
}
catch (InterruptedException e) { e.printStackTrace();
}}

meaning if you make a MapActor wait long enough, the overall job is incomplete and the MapReduce solution falls apart.

ReplyDelete
Replies

Add comment

Akka Essentials

Wednesday, March 14, 2012

Word Count MapReduce with Akka

2 comments:

Trackback

Popular Posts

Additional Information

Total Pageviews

About Me