MapReduce

Two phases: map and reduce.


MapReduce works on key-value pairs.


(k,v) --map--> (k,v)    (k,v) --reduce--> (k,v)


The traditional programming model works when data is kept on a single machine, so it won't work in Hadoop, where data is distributed across nodes.



RecordReader -- takes each line as input and converts it to a key-value pair.


(byte offset of the line, line contents as a string)


In the mapper (for word count), ignore the key and concentrate on the value.
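The map step can be sketched in plain Java (a simulation of word count, not the actual Hadoop Mapper API; the class and method names here are made up for illustration):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simulation of the map phase for word count: ignore the input key
// (the line's byte offset) and split the value (the line text) into
// words, emitting a (word, 1) pair for each word.
public class WordCountMap {
    public static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1)); // emit (word, 1)
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(map(0L, "hello are hello hello"));
    }
}
```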


Mapper -- no transfer of data over the network; the map task runs on the machine where the data block resides.


Movement of data from the mapper to the reducer -- shuffling.


After the map phase, the framework performs shuffle and sort.


Sorting happens on the reducer machine.


After shuffle and sort, the values are grouped per key:


(data, {1,1}) -- a key with its list of values.


Input to the reducer:


(are, {1})

(hello, {1,1,1})


Output from the reducer:


(are, 1)

(hello, 3)
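The shuffle+sort and reduce steps above can be sketched in plain Java (a simulation, not the Hadoop Reducer API; class and method names are made up):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Simulation of shuffle+sort and reduce for word count.
public class ShuffleReduce {
    // shuffle + sort: group the (word, 1) pairs by key; TreeMap keeps keys sorted
    public static SortedMap<String, List<Integer>> shuffleSort(
            List<? extends Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // reduce: sum the list of values for each key
    public static SortedMap<String, Integer> reduce(SortedMap<String, List<Integer>> grouped) {
        SortedMap<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) sum += v;
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = Arrays.asList(
                new SimpleEntry<>("hello", 1), new SimpleEntry<>("are", 1),
                new SimpleEntry<>("hello", 1), new SimpleEntry<>("hello", 1));
        // (are,{1}) and (hello,{1,1,1}) become (are,1) and (hello,3)
        System.out.println(reduce(shuffleSort(pairs)));
    }
}
```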


Number of blocks = number of mappers (one map task per input split; by default a split corresponds to a block).


Default number of reducers = 1; it can be increased or decreased.


If no aggregation is needed, the reducer can be removed (a map-only job, with the number of reducers set to 0).


Based on the number of reducers, we will have that many partitions.


After the map phase, the output data is partitioned, then shuffled and sorted.


These three steps (partition, shuffle, sort) are done by the framework.


By default, the framework provides a hash function to divide the key-value pairs among the reducers.


The hash function is consistent: the same key always goes to the same reducer.
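A minimal sketch of that consistency, using the same formula as Hadoop's default HashPartitioner (mask off the sign bit of hashCode(), then take it modulo the number of reducers):

```java
// The result for a given key never changes between calls, so every
// (key, value) pair with the same key lands on the same reducer.
public class HashPartitionDemo {
    public static int partition(String key, int numReducers) {
        // mask the sign bit so the result is non-negative, then mod
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }
}
```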


Set two reducers:

-----------------------

job.setNumReduceTasks(2);


This gives two output files (one per reducer, e.g. part-r-00000 and part-r-00001).


Hash partitioning will divide the keys among the reducers.


Example custom rule: if key length < 4, the pair goes to the reducer with id 0, else to the reducer with id 1.


For such a rule, write your own partitioner code.
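A standalone sketch of that rule (a real Hadoop partitioner would extend org.apache.hadoop.mapreduce.Partitioner and override getPartition; this plain class is just for illustration):

```java
// Keys shorter than 4 characters go to the reducer with id 0,
// all other keys to the reducer with id 1.
public class LengthPartitioner {
    public static int getPartition(String key) {
        return key.length() < 4 ? 0 : 1;
    }
}
```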


Mapper output -- input to the partitioner.


Output of the mapper -- input to the combiner.

Output of the combiner -- input to the reducer.


Combiner -- does local aggregation on the mapper machine, reducing the data sent over the network.
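Local aggregation by a combiner can be sketched like this (a plain-Java simulation, not the Hadoop combiner API; the class and method names are made up):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simulation of a combiner: the mapper's (word, 1) pairs are summed
// locally on the mapper machine, so only one pair per distinct word
// travels over the network to the reducer.
public class CombinerDemo {
    public static Map<String, Integer> combine(
            List<? extends Map.Entry<String, Integer>> mapperOutput) {
        Map<String, Integer> local = new HashMap<>();
        for (Map.Entry<String, Integer> p : mapperOutput) {
            local.merge(p.getKey(), p.getValue(), Integer::sum); // local aggregation
        }
        return local;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> out = Arrays.asList(
                new SimpleEntry<>("hello", 1), new SimpleEntry<>("hello", 1),
                new SimpleEntry<>("hello", 1), new SimpleEntry<>("are", 1));
        System.out.println(combine(out)); // 4 pairs shrink to 2
    }
}
```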


A custom combiner extends the Reducer class (set on the job with job.setCombinerClass(...)).

