Please use this identifier to cite or link to this item: http://hdl.handle.net/11299/132485

Title: Improving MapReduce performance under widely distributed environments.
Authors: Wang, Chenyu
Keywords: Computer science
Issue Date: Jun-2012
Abstract: The need to run large data analysis jobs over distributed data sources and computation resources is increasing. MapReduce is the most popular approach, and Hadoop is currently its most widely used implementation. However, Hadoop sometimes performs poorly in widely distributed environments because its assumption of network homogeneity no longer holds, which creates challenges both in moving data to computation resources and in scheduling the intermediate data flow during the shuffle phase. We explored a good practice for executing MapReduce jobs with remote data, which improved performance by 30% over Dist-CP by taking advantage of pipelining opportunities. We also provide a shuffle-aware scheduler to optimize the data flow during the shuffle phase. With our scheduler, the performance of Hadoop increased by 20% for shuffle-heavy applications such as Inverted Index.
Description: University of Minnesota. M.S. thesis. June 2012. Major: Computer science. Advisors: Abhishek Chandra, Jon Weissman. 1 computer file (PDF); vii, 60 pages.
URI: http://purl.umn.edu/132485
Appears in Collections: Master's Theses (Plan A and Professional Engineering Design Projects)

Files in This Item:

File: Wang_Chenyu_June2012.pdf
Size: 837.17 kB
Format: PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.