MapReduce-2
MR:
-------
Hadoop works on data locality:
code goes to the data and is processed there.
This is the principle of data locality.
-------
map -- a piece of code running on each block in parallel.
however many blocks there are, that many mappers will run.
mapper output goes to another node, where the reducer runs.
reducer -- aggregation to get the final output.
mapper -- most of the work
and reducer -- less work
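The map/reduce split above can be sketched in plain Java, without Hadoop's API -- the class and method names below are illustrative only, not Hadoop's:

```java
import java.util.*;

public class MiniWordCount {
    // "map": runs per block, emits (word, 1) pairs -- the heavy, parallel part
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            out.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return out;
    }

    // "reduce": aggregates all values for one key -- the lighter, final step
    static int reduce(List<Integer> values) {
        int count = 0;
        for (int v : values) count += v;
        return count;
    }

    public static void main(String[] args) {
        String[] blocks = {"big data big", "data big"}; // pretend each string is a block
        Map<String, List<Integer>> shuffled = new HashMap<>();
        for (String block : blocks) {                   // one mapper per block
            for (Map.Entry<String, Integer> kv : map(block)) {
                shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                        .add(kv.getValue());
            }
        }
        for (Map.Entry<String, List<Integer>> e : shuffled.entrySet()) {
            System.out.println(e.getKey() + "\t" + reduce(e.getValue()));
        }
    }
}
```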
to do more work on the mapper side, we can introduce a combiner.
combiner -- a mini reducer -- at the map end.
each mapper gets its own combiner:
m1--c1
m2--c2
m3--c3
advantages of a combiner
-------------
less data shuffling
more work done in parallel.
first requirement: the output must still be correct --
a combiner must not change the logic of the output.
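A minimal sketch of that idea in plain Java (names here are made up, not Hadoop's API): each mapper's combiner pre-aggregates locally, so fewer records are shuffled, while the final counts come out the same as without it:

```java
import java.util.*;

public class CombinerSketch {
    // local pre-aggregation at the map end: many (word, 1) records
    // collapse into one (word, partialCount) record per mapper
    static Map<String, Integer> combine(List<String> words) {
        Map<String, Integer> partial = new HashMap<>();
        for (String w : words) partial.merge(w, 1, Integer::sum);
        return partial;
    }

    public static void main(String[] args) {
        List<String> m1 = Arrays.asList("big", "data", "big"); // mapper 1 output words
        List<String> m2 = Arrays.asList("big", "data");        // mapper 2 output words

        // without a combiner, 5 (word, 1) records are shuffled; with one, only 4
        Map<String, Integer> c1 = combine(m1); // big->2, data->1
        Map<String, Integer> c2 = combine(m2); // big->1, data->1

        // the reducer now sums partial counts; the final result is unchanged
        Map<String, Integer> finalCounts = new HashMap<>();
        for (Map<String, Integer> c : Arrays.asList(c1, c2)) {
            c.forEach((k, v) -> finalCounts.merge(k, v, Integer::sum));
        }
        System.out.println(finalCounts);
    }
}
```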
when we have aggregation, we use a reducer; without aggregation,
no reducer is needed.
shuffle and sort happen only when a reducer is present.
Google used MapReduce for its existing web search.
=============
java
===========
public class -- the class name and the file name must be the same.
context -- the place where we write output so that it can be consumed
by later stages.
Iterable -- similar to a list.
input to the reduce method:
(Text key, Iterable<IntWritable> values, Context context)
initial count = 0
each 1 comes in as an IntWritable, so convert it from
IntWritable to int
with the get() method.
to convert back to a Hadoop-understandable type:
context.write(key, new LongWritable(count))
the mapper's output types should match the reducer's input types.
in main class:
--------------
output data types
for the key
and the value:
if the mapper's output data types differ from the reducer's output
data types, they must be mentioned explicitly in the main class.
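A typical driver fragment for this case might look as follows -- a sketch, assuming placeholder classes WordMapper (emitting Text/IntWritable) and WordReducer (emitting Text/LongWritable), with hadoop-client on the classpath:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordMapper.class);      // mapper class name
        job.setReducerClass(WordReducer.class);    // reducer class name

        // needed only because the mapper's output types (Text, IntWritable)
        // differ from the reducer's output types (Text, LongWritable)
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        job.setNumReduceTasks(2);                  // number of part files = number of reducers

        FileInputFormat.addInputPath(job, new Path(args[0]));   // must exist
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```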
the input path must exist,
and the output path must not exist.
mapper class name
reducer class name
output must be a directory -- one that does not already exist.
input can be a file or a directory.
the number of part files depends on the number of reducers.
hadoop jar <jar name> <input path> <output path>
to see the list of jobs:
localhost:8088
on the local Cloudera VM,
to browse the HDFS UI:
localhost:50070