pom.xml file, eliminating the need to add them manually.
D://mavenWorkplace, and enter the following command to create it:
mvn archetype:generate -DgroupId=$yourgroupID -DartifactId=$yourartifactID -DarchetypeArtifactId=maven-archetype-quickstart
After the command completes, Maven creates a project folder named $yourartifactID in the D://mavenWorkplace directory. The folder has the following structure:
simple
    pom.xml            Core configuration, under the project root directory
    src
        main
            java       Java source code directory
            resources  Java configuration file directory
        test
            java       Test source code directory
            resources  Test configuration directory
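As a quick sanity check before packaging, the following hypothetical MavenLayoutCheck class (not part of Maven or of this tutorial) reports which entries of the structure listed above are missing from a project directory. The entry list is an assumption copied from that listing. Trim it if, for example, your project has no resources folders.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper: lists the entries of the expected Maven layout
 * that do not exist under a given project root.
 */
public class MavenLayoutCheck {

    // Paths relative to the project root (assumed, taken from the listed structure).
    public static final List<String> EXPECTED = Arrays.asList(
            "pom.xml",
            "src/main/java",
            "src/main/resources",
            "src/test/java",
            "src/test/resources");

    /** Returns the expected entries that are missing, in the order of EXPECTED. */
    public static List<String> missing(Path projectRoot) {
        List<String> result = new ArrayList<>();
        for (String entry : EXPECTED) {
            // Forward slashes are accepted by Path.resolve on Windows as well.
            if (!Files.exists(projectRoot.resolve(entry))) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> gaps = missing(root);
        System.out.println(gaps.isEmpty() ? "Layout OK" : "Missing: " + gaps);
    }
}
```

Run it with the project folder as the only argument, for example `java MavenLayoutCheck D:/mavenWorkplace/simple`.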
Add the following dependency and build configuration to pom.xml. maven-compiler-plugin compiles for Java 1.8, which the lambda expressions in the code require, and maven-assembly-plugin bundles the dependencies into a single jar during the package phase.
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.0.2</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
                <encoding>utf-8</encoding>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
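Create WordCountOnCos.java under src/main/java with the following code:
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

/**
 * Created by tencent on 2018/6/28.
 */
public class WordCountOnCos {
    public static void main(String[] args) {
        // args[0]: input file (a cosn:// URI); args[1]: output directory, which must not exist yet
        SparkConf sc = new SparkConf().setAppName("spark on cos");
        JavaSparkContext context = new JavaSparkContext(sc);
        JavaRDD<String> lines = context.textFile(args[0]);
        lines.flatMap(x -> Arrays.asList(x.split(" ")).iterator()) // split each line into words
             .mapToPair(x -> new Tuple2<String, Integer>(x, 1))    // pair each word with 1
             .reduceByKey((x, y) -> x + y)                          // sum the counts per word
             .saveAsTextFile(args[1]);                              // one part-NNNNN file per partition
        context.stop();
    }
}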
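To check what the pipeline computes without a cluster, the following hypothetical LocalWordCount class applies the same three steps (split on a single space, pair each word with 1, sum per word) to an in-memory list using only the JDK. It is a local sketch for testing the counting logic, not part of the Spark job.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical local helper: mirrors the flatMap / mapToPair / reduceByKey
 * steps of WordCountOnCos on plain Java collections.
 */
public class LocalWordCount {

    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            // Same tokenization as the Spark job (String.split(" ")), so
            // consecutive spaces yield empty-string tokens in both versions.
            for (String word : line.split(" ")) {
                // Equivalent of emitting (word, 1) and summing per key.
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("this is this", "under this");
        System.out.println(count(sample)); // prints {this=3, is=1, under=1}
    }
}
```

Spark produces the same counts but computes them in parallel and spreads the results across several part files, so their order there is not insertion order.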
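Run the following command in the project root directory to compile and package the project. Because of the assembly configuration, the target directory also contains a jar whose name ends in -jar-with-dependencies.jar and that bundles all dependencies.
mvn package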
Upload the jar to the EMR cluster:
scp $localfile root@$publicIP:$remotefolder
$localfile is the path plus name of your local file; root is the CVM instance username; $publicIP is the public IP address of the instance, which you can look up in the node information in the EMR console or in the CVM console. $remotefolder is the path where you want to store the file in the CVM instance. After the upload completes, you can check on the EMR command line that the file is in the corresponding folder.
Next, upload the file whose words you want to count to COS:
[hadoop@10 hadoop]$ hadoop fs -put $testfile cosn://$bucketname/
$testfile is the full path plus name of the file to count, and $bucketname is your bucket name. After the upload completes, you can check in the COS console that the file is present in COS.
Log in to the EMR cluster. The username is root, and the password is the one you set when creating the EMR cluster. Once your credentials are validated, you can enter the command line interface. Then switch to the hadoop user:
[root@172 ~]# su hadoop
Run the job with spark-submit:
[hadoop@10 spark]$ spark-submit --class $WordCountOnCOS --master yarn --deploy-mode cluster $packagename.jar cosn://$bucketname/$testfile cosn://$bucketname/output
$WordCountOnCOS is your Java class name, $packagename is the name of the .jar package generated in the Maven project you created, $bucketname is your bucket name plus path, and $testfile is the name of the file to count. The results are written to the output folder, which must not exist before the job runs; otherwise, the job fails.
After the job finishes, list and view the results:
[hadoop@172 /]$ hadoop fs -ls cosn://$bucketname/output
Found 3 items
-rw-rw-rw-   1 hadoop Hadoop          0 2018-06-28 19:20 cosn://$bucketname/output/_SUCCESS
-rw-rw-rw-   1 hadoop Hadoop        681 2018-06-28 19:20 cosn://$bucketname/output/part-00000
-rw-rw-rw-   1 hadoop Hadoop        893 2018-06-28 19:20 cosn://$bucketname/output/part-00001
[hadoop@172 demo]$ hadoop fs -cat cosn://$bucketname/output/part-00000
18/07/05 17:35:01 INFO cosnative.NativeCosFileSystem: Opening 'cosn://$bucketname/output/part-00000' for reading
(under,1)
(this,3)
(distribution,2)
(Technology,1)
(country,1)
(is,1)
(Jetty,1)
(currently,1)
(permitted.,1)
(Security,1)
(have,1)
(check,1)
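Each result line such as (this,3) is the string form of a scala.Tuple2 written by saveAsTextFile. If you want to load the results back into a program, the hypothetical WordCountOutputParser below is a minimal sketch that turns such lines into a word-to-count map. It assumes the lines were already read from the part files (for example, from hadoop fs -cat output), and it skips anything that is not a tuple, such as the INFO log line.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: parses "(word,count)" lines from the part-NNNNN
 * files of the word count job into a map.
 */
public class WordCountOutputParser {

    public static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and non-tuple lines (e.g. log output).
            if (!line.startsWith("(") || !line.endsWith(")")) {
                continue;
            }
            // Split on the LAST comma: words keep their punctuation and may contain commas.
            int comma = line.lastIndexOf(',');
            if (comma < 1) {
                continue;
            }
            String word = line.substring(1, comma);
            int count = Integer.parseInt(line.substring(comma + 1, line.length() - 1));
            // After reduceByKey each word appears in only one part file, so this
            // merge just unions the files; summing is a defensive choice.
            counts.merge(word, count, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("(under,1)", "(this,3)", "(permitted.,1)");
        System.out.println(parse(sample).get("this")); // prints 3
    }
}
```

Splitting on the last comma rather than the first keeps tokens such as a,b intact, because the job splits only on spaces and leaves punctuation inside words.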