Running a Job on Apache Spark 2
# Upload the input file to HDFS
hdfs dfs -put sherlock.txt /user/training/

# Start the PySpark shell against YARN
pyspark --master yarn

# Load the file into an RDD and inspect it
avglens = sc.textFile("sherlock.txt")
avglens

# Split each line into words
avglensFM = avglens.flatMap(lambda line: line.split())
avglensFM

# Pair each word with its first letter and its length
avglensMap = avglensFM.map(lambda word: (word[0], len(word)))
avglensMap

# Group the lengths by first letter, using 2 partitions
avglensGrp = avglensMap.groupByKey(2)
avglensGrp

# Average the word lengths for each first letter
# (kv[0] is the letter, kv[1] the grouped lengths; tuple-unpacking
# lambdas such as "lambda (k, values):" only work in Python 2)
avglensGMap = avglensGrp.map(lambda kv: (kv[0], sum(kv[1]) / len(kv[1])))
avglensGMap
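To see what the transformation chain computes without a cluster, the same flatMap / map / groupByKey / map pipeline can be sketched in plain Python. This is a local illustration of the logic only, not Spark code; the sample line is an assumption for demonstration.

```python
from collections import defaultdict

# A stand-in for the lines of sherlock.txt (hypothetical sample input)
lines = ["To Sherlock Holmes she is always the woman"]

# flatMap: split every line into words
words = [w for line in lines for w in line.split()]

# map: pair each word with (first letter, word length)
pairs = [(w[0], len(w)) for w in words]

# groupByKey: collect all lengths under their first letter
groups = defaultdict(list)
for letter, length in pairs:
    groups[letter].append(length)

# map: average the lengths per first letter
avg = {letter: sum(lengths) / len(lengths) for letter, lengths in groups.items()}
print(avg)
```

Note that in real Spark jobs `groupByKey` shuffles every value across the network; aggregating a (sum, count) pair with `reduceByKey` is the usual cheaper alternative when only the average is needed.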