Spark 2
Whenever an RDD contains tuples of two elements, it is called a pair RDD.
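A minimal sketch of what a pair RDD's contents look like, using a plain Python list as a local stand-in (a real Spark RDD is distributed across the cluster; this is illustration only):

```python
# A pair "RDD" modeled locally as a list of 2-tuples: (key, value).
pair_rdd = [("apple", 3), ("banana", 1), ("apple", 2)]

# Each element is a (key, value) tuple -- that is what makes it a pair RDD,
# and what enables key-based operations like reduceByKey and mapValues.
keys = [k for k, _ in pair_rdd]
print(keys)  # ['apple', 'banana', 'apple']
```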
Last line: val result = sortedTotal.collect
Here result is a local variable on the local (driver) machine, not an RDD.
An RDD is distributed across the cluster.
Unix timestamp -- number of seconds elapsed since 1st Jan 1970 (UTC).
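A quick check of the definition in plain Python (the dates here are just example values):

```python
from datetime import datetime, timezone

# Unix timestamp = seconds since the epoch, 1 Jan 1970 00:00:00 UTC.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
moment = datetime(1970, 1, 2, tzinfo=timezone.utc)  # exactly one day later

seconds = int((moment - epoch).total_seconds())
print(seconds)  # 86400 -- one day's worth of seconds
```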
reduceByKey((x,y) => x+y)
Here x and y are two values that share the same key -- the function combines two rows at a time.
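A plain-Python sketch of reduceByKey's semantics (the function `reduce_by_key` is a hypothetical local stand-in -- real Spark runs this distributed):

```python
def reduce_by_key(pairs, f):
    """Combine the VALUES that share a key, two at a time; keys never reach f."""
    acc = {}
    for k, v in pairs:
        # First value for a key is kept as-is; later values are folded in via f.
        acc[k] = f(acc[k], v) if k in acc else v
    return sorted(acc.items())

pairs = [("a", 1), ("b", 5), ("a", 2), ("a", 4)]
print(reduce_by_key(pairs, lambda x, y: x + y))  # [('a', 7), ('b', 5)]
```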
Instead of using map to emit (x, 1) for each element and then doing reduceByKey:
map + reduceByKey -- result is an RDD -- it is a transformation
vs.
countByValue -- action -- result is a local variable
If counting is the final operation, you can use countByValue -- the result will be a local variable, and parallelism won't happen if you do any transformation on it afterwards.
Use map + reduceByKey if you want to do more transformations after this.
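The two word-count patterns above, sketched in plain Python (local stand-ins for the Spark operations; in real Spark the map + reduceByKey result would still be a distributed RDD):

```python
from collections import Counter

words = ["a", "b", "a", "a"]

# countByValue-style: an ACTION -- returns a local dict on the driver,
# so no further distributed transformations are possible on it.
by_value = dict(Counter(words))

# map + reduceByKey-style: map each word to (word, 1), then sum per key.
mapped = [(w, 1) for w in words]
by_key = {}
for k, v in mapped:
    by_key[k] = by_key.get(k, 0) + v

print(by_value == by_key)  # True -- both give {'a': 3, 'b': 1}
```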
reduceByKey -- always works on the values; we don't have to worry about the keys.
mapValues -- works on the values only (use it when the key is not changing and you are transforming only the values).
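A plain-Python sketch of mapValues (again a hypothetical local stand-in for the Spark operation):

```python
def map_values(pairs, f):
    """Apply f to each value, leaving the keys untouched."""
    return [(k, f(v)) for k, v in pairs]

pairs = [("a", 1), ("b", 2)]
doubled = map_values(pairs, lambda v: v * 2)
print(doubled)  # [('a', 2), ('b', 4)] -- keys unchanged, values transformed
```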