CLOUD COMPUTING 2010, The First International Conference on Cloud Computing, GRIDs, and Virtualization


The Limitation of MapReduce: A Probing Case and a Lightweight Solution

Authors:
Zhiqiang Ma
Lin Gu

Keywords: Distributed computing; Parallel architectures

Abstract:
MapReduce is arguably the most successful parallelization framework, especially for processing large data sets in datacenters comprising commodity computers. However, difficulties are observed in porting sophisticated applications to MapReduce, despite the existence of numerous parallelization opportunities. Intrinsically, the MapReduce design allows a program to scale up to handle extremely large data sets, but it constrains the program's ability to process smaller data items and to exploit varying degrees of parallelism, which are likely to be the common case in general applications. In this paper, we analyze the limitations of MapReduce and present the design and implementation of a new lightweight parallelization framework, MRlite. MRlite can efficiently process moderate-size data with dependences among numerous computational steps. At the same time, the parallelization of each step emulates the MapReduce model. Hence, the MRlite framework can also scale up to large data sets when massive parallelism with minimal dependence exists. MRlite can significantly improve the flexibility and parallel execution performance of a number of typical programs. Our evaluation shows that MRlite is one order of magnitude faster than Hadoop on problems that MapReduce has difficulty handling.
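As an illustration of the step structure the abstract describes (several dependent computational steps, each parallelized in the MapReduce style), here is a minimal in-process sketch in Python. It is not MRlite's implementation or API, which the abstract does not describe. The names `map_reduce`, `word_count`, and `words_by_frequency` are hypothetical and chosen only for illustration. Each `map_reduce` call emulates one step (map, group by key, reduce), and the second step consumes the first step's output. The thread pool only marks where per-step parallelism would go; under CPython's GIL it does not speed up pure-Python work.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_reduce(records, mapper, reducer, workers=4):
    """Hypothetical helper: run one MapReduce-style step in memory.

    mapper(record) returns a list of (key, value) pairs;
    reducer(key, values) returns one output record per key.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map phase: apply the mapper to every input record in parallel.
        mapped = list(pool.map(mapper, records))

        # Shuffle phase: group all emitted values by key.
        groups = defaultdict(list)
        for pairs in mapped:
            for key, value in pairs:
                groups[key].append(value)

        # Reduce phase: one reducer call per key, in sorted key order.
        return list(pool.map(lambda kv: reducer(kv[0], kv[1]),
                             sorted(groups.items())))


def word_count(lines):
    """Step 1 (hypothetical example): count occurrences of each word."""
    return map_reduce(
        lines,
        mapper=lambda line: [(word, 1) for word in line.split()],
        reducer=lambda word, ones: (word, sum(ones)),
    )


def words_by_frequency(counts):
    """Step 2 (hypothetical example): depends on step 1's output.

    Groups words by how often they occurred.
    """
    return map_reduce(
        counts,
        mapper=lambda wc: [(wc[1], wc[0])],
        reducer=lambda count, words: (count, sorted(words)),
    )


if __name__ == "__main__":
    lines = ["the cat", "the dog", "a cat"]
    counts = word_count(lines)
    print(counts)
    print(words_by_frequency(counts))
```

In classic Hadoop MapReduce, chaining steps like these usually means running separate jobs whose intermediate output is written to HDFS; the sketch simply passes Python lists from one step to the next.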

Pages: 68 to 73

Copyright: Copyright (c) IARIA, 2010

Publication date: November 21, 2010

Published in: conference

ISSN: 2308-4294

ISBN: 978-1-61208-106-9

Location: Lisbon, Portugal

Dates: from November 21, 2010 to November 26, 2010