MapReduce
2 phases: map and reduce.
Works on key-value pairs.
(k,v) --map--> (k,v)    (k,v) --reduce--> (k,v)
The traditional programming model works when the data is kept on a single machine, so it won't work in Hadoop, where data is spread across machines.
RecordReader -- takes each line as input and converts it
to a key-value pair:
(line offset, value (the line as a string))
In the mapper:
ignore the key and concentrate on the value.
Mapper -- no transfer of data (it runs locally on the machine that holds the block).
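The map step can be sketched in plain Java. This is a simulation of the word-count mapper described above, not the real Hadoop Mapper API, so it runs without a cluster: the line-offset key is ignored and a (word, 1) pair is emitted for each word in the value.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class WordCountMap {
    // Simulated map(): ignore the offset key, emit (word, 1) per word.
    static List<Map.Entry<String, Integer>> map(long offsetKey, String value) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : value.trim().split("\\s+")) {
            if (!word.isEmpty()) out.add(new SimpleEntry<>(word, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Key 0L stands in for the line's offset; it is ignored.
        System.out.println(map(0L, "hello hello are")); // [hello=1, hello=1, are=1]
    }
}
```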
Movement of data from mapper to reducer -- shuffling.
After the mapper, the framework will do shuffle and sort.
Sorting -- on the reducer machine.
After shuffle and sort, each key is grouped with the list of its values:
(data, {1,1})
Input to reducer:
(are, {1})
(hello, {1,1,1})
Output from reducer:
(are, 1)
(hello, 3)
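The shuffle/sort grouping and the reduce step above can be simulated with a TreeMap (sorted by key, mirroring the framework's sort); again a plain-Java sketch, not the Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleReduceSim {
    // Shuffle + sort: group each key with the list of its values, key-sorted.
    static TreeMap<String, List<Integer>> shuffleSort(List<Map.Entry<String, Integer>> mapOut) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOut) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return grouped;
    }

    // Reduce: sum the value list for each key.
    static TreeMap<String, Integer> reduce(TreeMap<String, List<Integer>> grouped) {
        TreeMap<String, Integer> out = new TreeMap<>();
        grouped.forEach((k, vs) -> out.put(k, vs.stream().mapToInt(Integer::intValue).sum()));
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOut = List.of(
            Map.entry("hello", 1), Map.entry("are", 1),
            Map.entry("hello", 1), Map.entry("hello", 1));
        System.out.println(shuffleSort(mapOut));          // {are=[1], hello=[1, 1, 1]}
        System.out.println(reduce(shuffleSort(mapOut)));  // {are=1, hello=3}
    }
}
```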
No. of blocks = no. of mappers.
Default no. of reducers = 1; it can be increased or decreased.
If no aggregation is needed, we can remove the reducer (a map-only job).
Based on the number of reducers, we will have that many partitions.
After the mapper, the output data is partitioned, then shuffled and sorted --
these three steps are done by the framework.
By default the framework provides a hash function to divide the key-value pairs among the reducers.
The hash function is consistent: the same key always goes to the same reducer.
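Hadoop's default HashPartitioner computes the partition as (key.hashCode() & Integer.MAX_VALUE) % numReducers, which is exactly what makes it consistent: the same key always hashes to the same reducer id. A standalone sketch of that formula:

```java
public class HashPartitionSim {
    // Same formula as Hadoop's default HashPartitioner: mask off the sign
    // bit so the result is non-negative, then take the remainder by the
    // number of reducers.
    static int getPartition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 2;
        // Consistent: the same key always maps to the same reducer id.
        System.out.println(getPartition("hello", numReducers)
                == getPartition("hello", numReducers)); // true
    }
}
```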
Set two reducers
-----------------------
job.setNumReduceTasks(2);
Two output files (one per reducer).
Hash partitioning -- will divide the keys among the reducers.
Example custom rule: if key length < 4, go to the reducer with id 0, else to the reducer with id 1.
Own partitioner code:
mapper output -- input to the partitioner.
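The length-based rule above can be written as a custom partitioner. Sketched here as a plain Java method rather than a Hadoop Partitioner subclass so it runs standalone; in a real job the same logic would live in getPartition() of a Partitioner subclass registered on the Job.

```java
public class LengthPartitioner {
    // Custom rule from the notes: short keys (length < 4) go to reducer 0,
    // all other keys go to reducer 1.
    static int getPartition(String key) {
        return key.length() < 4 ? 0 : 1;
    }

    public static void main(String[] args) {
        System.out.println(getPartition("are"));   // 0
        System.out.println(getPartition("hello")); // 1
    }
}
```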
Output of the mapper -- input to the combiner.
Output of the combiner -- input to the reducer.
Combiner -- does local aggregation on the mapper machine (cuts down the data shuffled over the network).
A custom combiner extends the Reducer class.
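The combiner's local aggregation can be simulated like so (a sketch, not the Hadoop Reducer API): three (hello,1) pairs leave the mapper machine as a single (hello,3), shrinking the shuffle.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerSim {
    // Combiner: aggregate mapper output locally, before the shuffle, so
    // (hello,1)(hello,1)(hello,1) is sent onward as a single (hello,3).
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOut) {
        Map<String, Integer> local = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOut) {
            local.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        return local;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOut = List.of(
            Map.entry("hello", 1), Map.entry("hello", 1),
            Map.entry("hello", 1), Map.entry("are", 1));
        System.out.println(combine(mapOut)); // {are=1, hello=3}
    }
}
```

Word count's reduce logic (summing) is associative and commutative, which is why the same logic is safe to run as a combiner.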