MapReduce-2
MR:
-------
Hadoop works on data locality:
code goes to the data and is processed there.
This is the principle of data locality.
-------
map -- a piece of code running on each block in parallel.
however many blocks there are, that many mappers will run.
mapper output goes to another node, where the reducer runs.
reducer -- aggregation to get the final output.
mapper -- most of the work
and reducer -- less work
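The map/reduce split above can be sketched in plain Java, without Hadoop's API -- the class and method names below are illustrative only, not Hadoop's:

```java
import java.util.*;

public class MiniWordCount {
    // "map": runs per block, emits (word, 1) pairs -- the heavy, parallel part
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            out.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return out;
    }

    // "reduce": aggregates all values for one key -- the lighter, final step
    static int reduce(List<Integer> values) {
        int count = 0;
        for (int v : values) count += v;
        return count;
    }

    public static void main(String[] args) {
        String[] blocks = {"big data big", "data big"}; // pretend each string is a block
        Map<String, List<Integer>> shuffled = new HashMap<>();
        for (String block : blocks) {                   // one mapper per block
            for (Map.Entry<String, Integer> kv : map(block)) {
                shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                        .add(kv.getValue());
            }
        }
        for (Map.Entry<String, List<Integer>> e : shuffled.entrySet()) {
            System.out.println(e.getKey() + "\t" + reduce(e.getValue()));
        }
    }
}
```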
to do more work on the mapper side, we can introduce a combiner.
combiner -- a mini reducer -- at the map end.
each mapper gets its own combiner:
m1--c1
m2--c2
m3--c3
advantages of a combiner
-------------
less data shuffling
more work done in parallel.
first requirement: the output must still be correct --
a combiner must not change the logic of the output.
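A minimal sketch of that idea in plain Java (names here are made up, not Hadoop's API): each mapper's combiner pre-aggregates locally, so fewer records are shuffled, while the final counts come out the same as without it:

```java
import java.util.*;

public class CombinerSketch {
    // local pre-aggregation at the map end: many (word, 1) records
    // collapse into one (word, partialCount) record per mapper
    static Map<String, Integer> combine(List<String> words) {
        Map<String, Integer> partial = new HashMap<>();
        for (String w : words) partial.merge(w, 1, Integer::sum);
        return partial;
    }

    public static void main(String[] args) {
        List<String> m1 = Arrays.asList("big", "data", "big"); // mapper 1 output words
        List<String> m2 = Arrays.asList("big", "data");        // mapper 2 output words

        // without a combiner, 5 (word, 1) records are shuffled; with one, only 4
        Map<String, Integer> c1 = combine(m1); // big->2, data->1
        Map<String, Integer> c2 = combine(m2); // big->1, data->1

        // the reducer now sums partial counts; the final result is unchanged
        Map<String, Integer> finalCounts = new HashMap<>();
        for (Map<String, Integer> c : Arrays.asList(c1, c2)) {
            c.forEach((k, v) -> finalCounts.merge(k, v, Integer::sum));
        }
        System.out.println(finalCounts);
    }
}
```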
when we have aggregation, we use a reducer; without aggregation,
no reducer is needed.
shuffle and sort happen only when a reducer is present.
Google used MapReduce for its existing web search.
=============
java
===========
public class -- the class name and the file name must be the same.
context -- the place where we write output so that it can be consumed
by later stages.
Iterable -- similar to a list.
input to the reduce method:
(Text key, Iterable<IntWritable> values, Context context)
initial count = 0
each 1 comes in as an IntWritable, so convert it from
IntWritable to int
with the get() method.
to convert back to a Hadoop-understandable type:
context.write(key, new LongWritable(count))
the mapper's output types should match the reducer's input types.
in main class:
--------------
output data types
for the key
and the value:
if the mapper's output data types differ from the reducer's output
data types, they must be mentioned explicitly in the main class.
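A typical driver fragment for this case might look as follows -- a sketch, assuming placeholder classes WordMapper (emitting Text/IntWritable) and WordReducer (emitting Text/LongWritable), with hadoop-client on the classpath:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordMapper.class);      // mapper class name
        job.setReducerClass(WordReducer.class);    // reducer class name

        // needed only because the mapper's output types (Text, IntWritable)
        // differ from the reducer's output types (Text, LongWritable)
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        job.setNumReduceTasks(2);                  // number of part files = number of reducers

        FileInputFormat.addInputPath(job, new Path(args[0]));   // must exist
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```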
the input path must exist,
and the output path must not exist.
mapper class name
reducer class name
output must be a directory -- one that does not already exist.
input can be a file or a directory.
the number of part files depends on the number of reducers.
hadoop jar <jar name> <input path> <output path>
to see the list of jobs:
localhost:8088
on the local Cloudera VM,
to browse the HDFS UI:
localhost:50070