Comparative Analysis of Map Reduce Scheduling Algorithms

Sonia Sharma and Dr. Parag Jain

MapReduce has attracted considerable attention from both industry and academia. With the rapid expansion of internet applications, commercial network services are increasingly deployed in cloud computing environments, where petabytes of data must be processed. MapReduce is one of the most prominent solutions for large-scale data processing. It parallelizes computation by running multiple map and reduce tasks over data distributed across multiple machines. Hadoop is an open source implementation of MapReduce. When Hadoop schedules reduce tasks, it neither exploits data locality nor addresses the partitioning skew present in some MapReduce applications. Scheduling a mix of real-time and non-real-time applications in a MapReduce environment is a challenging problem that has received only limited attention. First In First Out (FIFO) is the default job scheduling strategy of Hadoop, but it cannot guarantee that a job will be completed by a specific deadline. Size-based scheduling requires a priori job size information, which is not available in Hadoop. HFSP builds this knowledge by estimating job size online during job execution. In this work, we compare several scheduling algorithms with their MapReduce counterparts. We developed FIFO, Round Robin, ETF, and LATE schedulers, along with a MapReduce-compatible implementation of each, namely MR-FIFO, MR-RR, MR-ETF, and MR-LATE.

Volume 11 | 10-Special Issue

Pages: 20-31

DOI: 10.5373/JARDCS/V11SP10/20192773