#1
Hadoop Mapreduce
#3
Re: hadoop mapreduce
Yes, Tutorials Point offers tutorials for Hadoop MapReduce. Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.

The Algorithm

The MapReduce paradigm is generally based on sending the computation to where the data resides, rather than moving the data to the computation. A MapReduce program executes in three stages: the map stage, the shuffle stage, and the reduce stage.

Map stage: The mapper's job is to process the input data. Generally the input data is a file or directory stored in the Hadoop Distributed File System (HDFS). The input file is passed to the mapper function line by line. The mapper processes the data and creates several small chunks of data.

Reduce stage: This stage is the combination of the shuffle stage and the reduce stage. The reducer's job is to process the data that comes from the mapper. After processing, it produces a new set of output, which is stored in HDFS.

During a MapReduce job, Hadoop sends the map and reduce tasks to the appropriate servers in the cluster. The framework manages all the details of data passing, such as issuing tasks, verifying task completion, and copying data between the nodes of the cluster. Most of the computing takes place on nodes with the data on local disks, which reduces network traffic. After the tasks complete, the cluster collects and reduces the data to form the final result and sends it back to the Hadoop server.

Tutorials Point - Tutorials for Hadoop MapReduce
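The three stages described above can be sketched in plain Java with a word count, the classic MapReduce example. This is a minimal, dependency-free simulation of the paradigm, not the actual Hadoop API; the class and method names here are illustrative only.

```java
import java.util.*;
import java.util.stream.*;

public class WordCountSketch {

    // Map stage: process the input line by line, emitting (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(Map.entry(word, 1));
                }
            }
        }
        return pairs;
    }

    // Shuffle stage: group all emitted values by key, as the framework
    // does between the mappers and the reducers.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    // Reduce stage: combine the grouped values into one result per key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, ones) ->
                counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("deer bear river", "car car river", "deer car bear");
        Map<String, Integer> result = reduce(shuffle(map(input)));
        System.out.println(result); // {bear=2, car=3, deer=2, river=2}
    }
}
```

In real Hadoop, the map and reduce steps would instead extend `Mapper` and `Reducer`, and the shuffle stage (grouping by key and moving data between nodes) is performed by the framework itself rather than by user code.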