MapReduce
2 phases: map and reduce.
Works on key-value pairs.
(k,v) --map--> (k,v)    (k,v) --reduce--> (k,v)
The traditional programming model works when the data is kept on a single machine, so it won't work in Hadoop, where data is spread across machines.
RecordReader -- takes each line as input and converts it
to a key-value pair:
(line offset, value (the line as a string))
In the mapper:
ignore the key and concentrate on the value.
Mapper -- no transfer of data (it runs locally on the machine that holds the block).
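The map step can be sketched in plain Java. This is a simulation of the word-count mapper described above, not the real Hadoop Mapper API, so it runs without a cluster: the line-offset key is ignored and a (word, 1) pair is emitted for each word in the value.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class WordCountMap {
    // Simulated map(): ignore the offset key, emit (word, 1) per word.
    static List<Map.Entry<String, Integer>> map(long offsetKey, String value) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : value.trim().split("\\s+")) {
            if (!word.isEmpty()) out.add(new SimpleEntry<>(word, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Key 0L stands in for the line's offset; it is ignored.
        System.out.println(map(0L, "hello hello are")); // [hello=1, hello=1, are=1]
    }
}
```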
Movement of data from mapper to reducer -- shuffling.
After the mapper, the framework will do shuffle and sort.
Sorting -- on the reducer machine.
After shuffle and sort, each key is grouped with the list of its values:
(data, {1,1})
Input to reducer:
(are, {1})
(hello, {1,1,1})
Output from reducer:
(are, 1)
(hello, 3)
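The shuffle/sort grouping and the reduce step above can be simulated with a TreeMap (sorted by key, mirroring the framework's sort); again a plain-Java sketch, not the Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleReduceSim {
    // Shuffle + sort: group each key with the list of its values, key-sorted.
    static TreeMap<String, List<Integer>> shuffleSort(List<Map.Entry<String, Integer>> mapOut) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOut) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return grouped;
    }

    // Reduce: sum the value list for each key.
    static TreeMap<String, Integer> reduce(TreeMap<String, List<Integer>> grouped) {
        TreeMap<String, Integer> out = new TreeMap<>();
        grouped.forEach((k, vs) -> out.put(k, vs.stream().mapToInt(Integer::intValue).sum()));
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOut = List.of(
            Map.entry("hello", 1), Map.entry("are", 1),
            Map.entry("hello", 1), Map.entry("hello", 1));
        System.out.println(shuffleSort(mapOut));          // {are=[1], hello=[1, 1, 1]}
        System.out.println(reduce(shuffleSort(mapOut)));  // {are=1, hello=3}
    }
}
```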
No. of blocks = no. of mappers.
Default no. of reducers = 1; it can be increased or decreased.
If no aggregation is needed, we can remove the reducer (a map-only job).
Based on the number of reducers, we will have that many partitions.
After the mapper, the output data is partitioned, then shuffled and sorted --
these three steps are done by the framework.
By default the framework provides a hash function to divide the key-value pairs among the reducers.
The hash function is consistent: the same key always goes to the same reducer.
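Hadoop's default HashPartitioner computes the partition as (key.hashCode() & Integer.MAX_VALUE) % numReducers, which is exactly what makes it consistent: the same key always hashes to the same reducer id. A standalone sketch of that formula:

```java
public class HashPartitionSim {
    // Same formula as Hadoop's default HashPartitioner: mask off the sign
    // bit so the result is non-negative, then take the remainder by the
    // number of reducers.
    static int getPartition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 2;
        // Consistent: the same key always maps to the same reducer id.
        System.out.println(getPartition("hello", numReducers)
                == getPartition("hello", numReducers)); // true
    }
}
```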
Set two reducers
-----------------------
job.setNumReduceTasks(2);
Two output files (one per reducer).
Hash partitioning -- will divide the keys among the reducers.
Example custom rule: if key length < 4, go to the reducer with id 0, else to the reducer with id 1.
Own partitioner code:
mapper output -- input to the partitioner.
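The length-based rule above can be written as a custom partitioner. Sketched here as a plain Java method rather than a Hadoop Partitioner subclass so it runs standalone; in a real job the same logic would live in getPartition() of a Partitioner subclass registered on the Job.

```java
public class LengthPartitioner {
    // Custom rule from the notes: short keys (length < 4) go to reducer 0,
    // all other keys go to reducer 1.
    static int getPartition(String key) {
        return key.length() < 4 ? 0 : 1;
    }

    public static void main(String[] args) {
        System.out.println(getPartition("are"));   // 0
        System.out.println(getPartition("hello")); // 1
    }
}
```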
Output of the mapper -- input to the combiner.
Output of the combiner -- input to the reducer.
Combiner -- does local aggregation on the mapper machine (cuts down the data shuffled over the network).
A custom combiner extends the Reducer class.
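The combiner's local aggregation can be simulated like so (a sketch, not the Hadoop Reducer API): three (hello,1) pairs leave the mapper machine as a single (hello,3), shrinking the shuffle.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerSim {
    // Combiner: aggregate mapper output locally, before the shuffle, so
    // (hello,1)(hello,1)(hello,1) is sent onward as a single (hello,3).
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOut) {
        Map<String, Integer> local = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOut) {
            local.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        return local;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOut = List.of(
            Map.entry("hello", 1), Map.entry("hello", 1),
            Map.entry("hello", 1), Map.entry("are", 1));
        System.out.println(combine(mapOut)); // {are=1, hello=3}
    }
}
```

Word count's reduce logic (summing) is associative and commutative, which is why the same logic is safe to run as a combiner.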