[hadoop@10 hadoop]$ hadoop fs -put $testfile cosn://$bucketname/
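To confirm that the file was uploaded, you can list the bucket contents; this assumes the COS plugin (cosn) is already configured on the cluster:
[hadoop@10 hadoop]$ hadoop fs -ls cosn://$bucketname/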
Switch to the Spark installation directory /usr/local/service/spark:
[root@172 ~]# su hadoop
[hadoop@172 root]$ cd /usr/local/service/spark
Create a file named wordcount.py with the following content:
from __future__ import print_function

import sys
from operator import add

from pyspark.sql import SparkSession

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: wordcount <input> <output>", file=sys.stderr)
        exit(-1)

    spark = SparkSession\
        .builder\
        .appName("PythonWordCount")\
        .getOrCreate()

    # Read the input file and split each line into words
    lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
    counts = lines.flatMap(lambda x: x.split(' ')) \
        .map(lambda x: (x, 1)) \
        .reduceByKey(add)

    # Write the (word, count) pairs to the output path
    counts.saveAsTextFile(sys.argv[2])

    spark.stop()
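Before submitting to YARN, the script can be sanity-checked locally; --master local[2] runs Spark with two worker threads on the current node, and the file:// paths below are placeholders for illustration:
[hadoop@172 spark]$ ./bin/spark-submit --master local[2] ./wordcount.py file:///tmp/test.txt file:///tmp/wordcount_output
Once it runs cleanly, submit it to YARN as shown next.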
[hadoop@10 spark]$ ./bin/spark-submit --master yarn ./wordcount.py cosn://$bucketname/$yourtestfile cosn://$bucketname/$output
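If the resources YARN allots by default are too small for your data set, standard spark-submit resource flags can be added; the values below are illustrative assumptions, not recommendations:
[hadoop@10 spark]$ ./bin/spark-submit --master yarn --num-executors 2 --executor-memory 2g --executor-cores 2 ./wordcount.py cosn://$bucketname/$yourtestfile cosn://$bucketname/$output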
[hadoop@172 spark]$ hadoop fs -ls cosn://$bucketname/$output
Found 2 items
-rw-rw-rw-   1 hadoop Hadoop    0 2018-06-29 15:35 cosn://$bucketname/$output/_SUCCESS
-rw-rw-rw-   1 hadoop Hadoop 2102 2018-06-29 15:34 cosn://$bucketname/$output/part-00000
[hadoop@172 spark]$ hadoop fs -cat cosn://$bucketname/$output/part-00000
(u'', 27)
(u'code', 1)
(u'both', 1)
(u'Hadoop', 1)
(u'Bureau', 1)
(u'Department', 1)
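Each line is the Python 2 repr of a (word, count) pair. To pull a result file out of COS for local inspection, hadoop fs -get works against cosn:// paths as well; the local file name here is just an example:
[hadoop@172 spark]$ hadoop fs -get cosn://$bucketname/$output/part-00000 ./wordcount_result.txt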
The output can also be written to HDFS instead of COS:
[hadoop@10 spark]$ ./bin/spark-submit ./wordcount.py cosn://$bucketname/$yourtestfile /user/hadoop/$output
/user/hadoop/ is a path in HDFS; if it does not exist, you can create it yourself. To view the logs of a submitted application, pass its application ID to yarn logs:
[hadoop@10 spark]$ /usr/local/service/hadoop/bin/yarn logs -applicationId $yourId
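$yourId is the application ID printed by spark-submit at submission time. If it was not noted then, it can be looked up afterwards; -appStates filters the listing to finished applications:
[hadoop@10 spark]$ /usr/local/service/hadoop/bin/yarn application -list -appStates FINISHED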