spark2

Whenever an RDD contains tuples of two elements, it is called a pair RDD.
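A minimal sketch of the pair-RDD shape. Spark is not loaded here, so a plain Scala List of two-element tuples stands in for what sc.parallelize(pairs) would distribute; the data is made up:

```scala
// A pair RDD is an RDD whose elements are two-element tuples: (key, value).
// Local stand-in; in Spark you would write sc.parallelize(pairs).
val pairs: List[(String, Int)] = List(("spark", 1), ("scala", 2), ("spark", 3))

// Each element destructures into a key and a value.
val (key, value) = pairs.head
println(s"$key -> $value") // spark -> 1
```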


Last line: val result = sortedTotal.collect

Here result is a local variable (an Array) on the local machine, not an RDD -- collect pulls the data back to the driver.

RDD -- distributed -- lives on the cluster.


Unix timestamp -- number of seconds since 1st Jan 1970 (UTC).
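A quick check of that definition with the standard java.time API (usable directly from Scala):

```scala
import java.time.Instant

// The Unix epoch: 1970-01-01T00:00:00Z has timestamp 0.
val epochStart = Instant.parse("1970-01-01T00:00:00Z")
println(epochStart.getEpochSecond) // 0

// One day later is 24 * 60 * 60 = 86400 seconds.
val oneDayLater = Instant.parse("1970-01-02T00:00:00Z")
println(oneDayLater.getEpochSecond) // 86400
```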


reduceByKey((x,y) => x+y)


Here x and y are two values that share the same key -- Spark combines them pairwise across rows.
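Spark's reduceByKey((x, y) => x + y) behaves like grouping by key and folding the values of each group. This collection-based sketch (no Spark needed) shows the same semantics:

```scala
val pairs = List(("a", 1), ("b", 2), ("a", 3), ("b", 4))

// Simulate reduceByKey((x, y) => x + y) locally:
// group rows by key, then merge each key's values pairwise.
val reduced: Map[String, Int] =
  pairs.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).reduce((x, y) => x + y)) }

println(reduced("a")) // 4 (1 + 3)
println(reduced("b")) // 6 (2 + 4)
```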


Instead of using map to emit (x, 1) for each element and then calling reduceByKey, you can use countByValue.

map + reduceByKey -- result is an RDD -- it is a transformation.

countByValue -- action -- result is a local variable (a Map on the driver).

If counting is the final operation, you can use countByValue -- but because the result is a local variable, any further transformation on it will not be parallelized.


map + reduceByKey -- use this if you want to apply further transformations afterwards.
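A local sketch of the two word-counting approaches (Spark is not loaded, so plain collections simulate the semantics): map to (word, 1) then reduce by key, versus counting occurrences directly the way countByValue does.

```scala
val words = List("spark", "scala", "spark")

// Approach 1: map each word to (word, 1), then sum per key
// (the local equivalent of map + reduceByKey; the result stays transformable).
val counts1: Map[String, Int] =
  words.map(w => (w, 1)).groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).sum) }

// Approach 2: count each distinct value in one step
// (what Spark's countByValue action returns as a local Map on the driver).
val counts2: Map[String, Int] =
  words.groupBy(identity).map { case (k, vs) => (k, vs.size) }

println(counts1("spark")) // 2
```

Both produce the same counts; in Spark the difference is that the first stays a distributed RDD while the second is already a local result.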


reduceByKey -- always works on the values -- we don't have to worry about the keys.


mapValues -- works on the values only (use it when the key is not changing and you transform only the values).
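Scala's Map has a mapValues with the same idea as Spark's RDD.mapValues: transform every value, leave every key untouched. A small local sketch:

```scala
val totals = Map("a" -> 10, "b" -> 20)

// mapValues applies the function to each value; keys pass through unchanged.
val doubled: Map[String, Int] = totals.mapValues(_ * 2).toMap

println(doubled("a")) // 20
println(doubled("b")) // 40
```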

