Spark 2
Whenever an RDD contains tuples of two elements, it is called a pair RDD.
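A minimal sketch of what a pair RDD's contents look like, using a plain Python list as a local stand-in (a real Spark RDD is distributed across the cluster; this is illustration only):

```python
# A pair "RDD" modeled locally as a list of 2-tuples: (key, value).
pair_rdd = [("apple", 3), ("banana", 1), ("apple", 2)]

# Each element is a (key, value) tuple -- that is what makes it a pair RDD,
# and what enables key-based operations like reduceByKey and mapValues.
keys = [k for k, _ in pair_rdd]
print(keys)  # ['apple', 'banana', 'apple']
```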
Last line: val result = sortedTotal.collect
Here result is a local variable on the local (driver) machine, not an RDD.
An RDD is distributed across the cluster.
Unix timestamp -- number of seconds elapsed since 1st Jan 1970 (UTC).
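A quick check of the definition in plain Python (the dates here are just example values):

```python
from datetime import datetime, timezone

# Unix timestamp = seconds since the epoch, 1 Jan 1970 00:00:00 UTC.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
moment = datetime(1970, 1, 2, tzinfo=timezone.utc)  # exactly one day later

seconds = int((moment - epoch).total_seconds())
print(seconds)  # 86400 -- one day's worth of seconds
```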
reduceByKey((x,y) => x+y)
Here x and y are two values that share the same key -- the function combines two rows at a time.
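A plain-Python sketch of reduceByKey's semantics (the function `reduce_by_key` is a hypothetical local stand-in -- real Spark runs this distributed):

```python
def reduce_by_key(pairs, f):
    """Combine the VALUES that share a key, two at a time; keys never reach f."""
    acc = {}
    for k, v in pairs:
        # First value for a key is kept as-is; later values are folded in via f.
        acc[k] = f(acc[k], v) if k in acc else v
    return sorted(acc.items())

pairs = [("a", 1), ("b", 5), ("a", 2), ("a", 4)]
print(reduce_by_key(pairs, lambda x, y: x + y))  # [('a', 7), ('b', 5)]
```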
Instead of using map to emit (x, 1) for each element and then doing reduceByKey:
map + reduceByKey -- result is an RDD -- it is a transformation
vs.
countByValue -- action -- result is a local variable
If counting is the final operation, you can use countByValue -- the result will be a local variable, and parallelism won't happen if you do any transformation on it afterwards.
Use map + reduceByKey if you want to do more transformations after this.
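The two word-count patterns above, sketched in plain Python (local stand-ins for the Spark operations; in real Spark the map + reduceByKey result would still be a distributed RDD):

```python
from collections import Counter

words = ["a", "b", "a", "a"]

# countByValue-style: an ACTION -- returns a local dict on the driver,
# so no further distributed transformations are possible on it.
by_value = dict(Counter(words))

# map + reduceByKey-style: map each word to (word, 1), then sum per key.
mapped = [(w, 1) for w in words]
by_key = {}
for k, v in mapped:
    by_key[k] = by_key.get(k, 0) + v

print(by_value == by_key)  # True -- both give {'a': 3, 'b': 1}
```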
reduceByKey -- always works on the values; we don't have to worry about the keys.
mapValues -- works on the values only (use it when the key is not changing and you are transforming only the values).
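A plain-Python sketch of mapValues (again a hypothetical local stand-in for the Spark operation):

```python
def map_values(pairs, f):
    """Apply f to each value, leaving the keys untouched."""
    return [(k, f(v)) for k, v in pairs]

pairs = [("a", 1), ("b", 2)]
doubled = map_values(pairs, lambda v: v * 2)
print(doubled)  # [('a', 2), ('b', 4)] -- keys unchanged, values transformed
```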