Flow
- The client system (FileReadActor) reads a text file and sends each line of text as a message to the ClientActor.
- The ClientActor holds a reference to the remote actor (WCMapReduceActor) and passes each message on to it.
- The server (WCMapReduceActor) receives the message. The actor uses a PriorityMailBox to assign a priority to each message and order its queue accordingly. In this case, the PriorityMailBox separates the MapReduce requests from the DISPLAY_LIST message, which requests the list of results from the aggregate actor.
- The WCMapReduceActor forwards the messages to the MapActor (routed through a RoundRobinRouter), which maps the words.
- After the words are mapped, the message is sent to the ReduceActor (also routed through a RoundRobinRouter), which reduces the words.
- The reduced results are sent to the Aggregate Actor, which aggregates them in memory.
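The steps above can be sketched in plain Java, without Akka, to make the data flow concrete. This is only an illustration under stated assumptions: the class name FlowSketch, the numeric priorities, and the word-splitting rule are made up for the example and are not taken from the repository. In particular, it assumes DISPLAY_LIST is given a lower priority than the text lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.PriorityBlockingQueue;

// Plain-Java stand-in for the actor flow; every name here is hypothetical.
final class FlowSketch {
    static final String DISPLAY_LIST = "DISPLAY_LIST";

    // Assumed ordering: text lines first (0), DISPLAY_LIST afterwards (1).
    static int priority(String message) {
        return DISPLAY_LIST.equals(message) ? 1 : 0;
    }

    // Map step: split one line into lower-cased words.
    static List<String> map(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Reduce step: count the words of one mapped line.
    static Map<String, Integer> reduce(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Aggregate step: fold one reduced result into the in-memory totals.
    static void aggregate(Map<String, Integer> totals, Map<String, Integer> partial) {
        partial.forEach((word, n) -> totals.merge(word, n, Integer::sum));
    }

    // Drains the mailbox in priority order; returns the totals once DISPLAY_LIST is reached.
    static Map<String, Integer> run(List<String> messages) {
        Comparator<String> byPriority = Comparator.comparingInt(FlowSketch::priority);
        PriorityBlockingQueue<String> mailbox = new PriorityBlockingQueue<>(11, byPriority);
        mailbox.addAll(messages);

        Map<String, Integer> totals = new TreeMap<>();
        String message;
        while ((message = mailbox.poll()) != null) {
            if (DISPLAY_LIST.equals(message)) {
                return totals;
            }
            aggregate(totals, reduce(map(message)));
        }
        return totals;
    }

    public static void main(String[] args) {
        // DISPLAY_LIST is enqueued in the middle but handled after both lines.
        System.out.println(run(Arrays.asList("to be or not", DISPLAY_LIST, "to be")));
        // prints {be=2, not=1, or=1, to=2}
    }
}
```

The sketch runs on a single thread, so it shows only the ordering and the three processing steps, not the concurrency of the routed actors.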
The following picture shows how the program is structured:
The code base for the program is available at the following location - https://github.com/write2munish/Akka-Essentials
For more information on MapReduce, read the post "MapReduce for dummies".
This MapReduce is completely flawed. There is no guarantee that all the MapActor instances have finished processing by the time the DISPLAY_LIST message is received. For example, if one MapActor runs for a long time, your DISPLAY_LIST will go through while some values are still unprocessed.
If you want to break this MapReduce implementation, simply add this snippet to the MapActor at line 95:
if ("Thieves! thieves!".equals(work)) {
    try {
        System.out.println("*** sleeping!");
        Thread.sleep(10000); // stall this MapActor for 10 seconds
        System.out.println("*** back!");
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
This means that if you make a MapActor wait long enough, the results are displayed while the overall job is still incomplete, and the MapReduce solution falls apart.
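The effect described in this comment can be reproduced deterministically outside Akka. The following is a rough plain-Java sketch, not the repository's code: a thread pool stands in for the routed MapActor instances, a CountDownLatch stands in for the Thread.sleep call, and the class name SnapshotRaceSketch and the sample lines are made up for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in: two pool threads play the role of two MapActor instances.
final class SnapshotRaceSketch {
    private final Map<String, Integer> totals = new ConcurrentHashMap<>();

    // Counts the words of one line into the shared totals.
    void count(String line) {
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                totals.merge(word, 1, Integer::sum);
            }
        }
    }

    // Returns two views of the totals: an early snapshot and the completed result.
    static List<Map<String, Integer>> run() {
        SnapshotRaceSketch job = new SnapshotRaceSketch();
        ExecutorService mappers = Executors.newFixedThreadPool(2);
        CountDownLatch slowMapperMayFinish = new CountDownLatch(1);
        try {
            Future<?> fast = mappers.submit(() -> job.count("awake awake"));
            Future<?> slow = mappers.submit(() -> {
                slowMapperMayFinish.await(); // stands in for the long sleep
                job.count("Thieves! thieves!");
                return null;
            });

            fast.get();
            // "DISPLAY_LIST" arrives here, while the slow mapper is still blocked.
            Map<String, Integer> early = new TreeMap<>(job.totals);

            slowMapperMayFinish.countDown();
            slow.get();
            Map<String, Integer> complete = new TreeMap<>(job.totals);
            return Arrays.asList(early, complete);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            mappers.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> views = run();
        System.out.println("early:    " + views.get(0)); // early:    {awake=2}
        System.out.println("complete: " + views.get(1)); // complete: {awake=2, thieves=2}
    }
}
```

The early snapshot misses the words of the stalled line, which is the incompleteness the comment points out.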
The idea is to provide a current snapshot of the values. There are other ways to get the results of the overall job: one is to wait, if you know how long the messages take to process; another is to keep a count of the lines sent (the number of messages) and compare it with the number of messages processed to verify that the job is complete.
The intention of this code is to show MapReduce working with actors.
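The count-based check suggested in the reply above could look roughly like the following plain-Java sketch. It is an assumption about how such a check might be built, not code from the repository: the class CompletionSketch and its methods are hypothetical, and it assumes the client reports the total number of lines once the file has been read.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical aggregator that knows when every sent line has been processed.
final class CompletionSketch {
    private final Map<String, Integer> totals = new TreeMap<>();
    private long expectedLines = -1; // unknown until the reader has finished the file
    private long processedLines = 0;

    // Called once by the client after the last line has been sent.
    public synchronized void setExpectedLines(long count) {
        expectedLines = count;
    }

    // Called once per line, with that line's reduced word counts.
    public synchronized void lineProcessed(Map<String, Integer> partial) {
        partial.forEach((word, n) -> totals.merge(word, n, Integer::sum));
        processedLines++;
    }

    // True only when the total is known and every line has been accounted for.
    public synchronized boolean isComplete() {
        return expectedLines >= 0 && processedLines == expectedLines;
    }

    // Current snapshot of the values, complete or not.
    public synchronized Map<String, Integer> snapshot() {
        return new TreeMap<>(totals);
    }

    public static void main(String[] args) {
        CompletionSketch aggregate = new CompletionSketch();
        aggregate.setExpectedLines(2);

        aggregate.lineProcessed(Collections.singletonMap("thieves", 2));
        System.out.println(aggregate.isComplete() + " " + aggregate.snapshot());
        // prints false {thieves=2}

        aggregate.lineProcessed(Collections.singletonMap("awake", 1));
        System.out.println(aggregate.isComplete() + " " + aggregate.snapshot());
        // prints true {awake=1, thieves=2}
    }
}
```

A caller can still ask for a snapshot at any time, as the original design intends, and use isComplete to tell a partial result from a finished one.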