root, and the password is the one you set when creating the EMR cluster.
Switch to the Hadoop user. The root user is logged in by default when you log in to the EMR node, so you need to switch to the Hadoop user. Run the following commands to switch users and go to the Hadoop folder:
[root@172 ~]# su hadoop
[hadoop@172 root]$ cd /usr/local/service/hadoop
[hadoop@172 hadoop]$
Create a file named test.txt locally and add the following sentences to the file:
Hello World.
this is a message.
this is another message.
Hello world, how are you?
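One way to create the file from a terminal is with a heredoc; this is a minimal sketch that writes the same sample sentences used in the steps above:

```shell
# Create test.txt containing the sample sentences from this walkthrough
cat > test.txt <<'EOF'
Hello World.
this is a message.
this is another message.
Hello world, how are you?
EOF

# Verify the contents before uploading
cat test.txt
```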
scp $localfile root@<public IP address>:$remotefolder
The file is uploaded to the /usr/local/service/hadoop path in the EMR cluster in this example. You can check that it arrived:
[hadoop@172 hadoop]$ ls -l
[hadoop@172 hadoop]$ hadoop fs -put /usr/local/service/hadoop/test.txt /user/hadoop/
[hadoop@172 hadoop]$ hadoop fs -ls /user/hadoop
Output:
-rw-r--r--   3 hadoop supergroup   85 2018-07-06 11:18 /user/hadoop/test.txt
If there is no /user/hadoop folder in Hadoop, you can create it on your own by running the following command:
[hadoop@172 hadoop]$ hadoop fs -mkdir /user/hadoop
[hadoop@10 hadoop]$ hadoop fs -ls cosn://$bucketname/test.txt
-rw-rw-rw-   1 hadoop hadoop   1366 2017-03-15 19:09 cosn://$bucketname/test.txt
Replace $bucketname with the name and path of your bucket.
[hadoop@10 hadoop]$ hadoop fs -put test.txt cosn://$bucketname/
[hadoop@10 hadoop]$ hadoop fs -ls cosn://$bucketname/test.txt
Output:
-rw-rw-rw-   1 hadoop hadoop   1366 2017-03-15 19:09 cosn://$bucketname/test.txt
Maven manages dependencies through the pom.xml file, eliminating the need to add them manually. Enter the directory where you want to create the project, such as D://mavenWorkplace, and create the project using the following command:
mvn archetype:generate -DgroupId=$yourgroupID -DartifactId=$yourartifactID -DarchetypeArtifactId=maven-archetype-quickstart
Here, $yourgroupID is your package name, $yourartifactID is your project name, and maven-archetype-quickstart indicates that a Maven Java project will be created. Some files need to be downloaded during project creation, so please keep the network connected.
After the project is created, you will find a folder named $yourartifactID in the D://mavenWorkplace directory. Files in the folder have the following structure:
simple
---pom.xml              Core configuration, under the project root directory
---src
------main
---------java           Java source code directory
---------resources      Java configuration file directory
------test
---------java           Test source code directory
---------resources      Test configuration directory
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.7.3</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
                <encoding>utf-8</encoding>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import java.io.IOException;
import java.util.StringTokenizer;

/**
 * Created by tencent on 2018/7/6.
 */
public class WordCount {
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Mapper<Object, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                this.word.set(itr.nextToken());
                context.write(this.word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Reducer<Text, IntWritable, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            this.result.set(sum);
            context.write(key, this.result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        for (int i = 0; i < otherArgs.length - 1; i++) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[(otherArgs.length - 1)]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
mvn package
scp $jarpackage root@<public IP address>:/usr/local/service/hadoop
Here, $jarpackage is the path plus name of your local .jar package; root is the CVM instance username; and the public IP address can be viewed in the node information in the EMR console or the CVM console. The file is uploaded to the /usr/local/service/hadoop folder of the EMR cluster.
Log in to the EMR cluster, go to the /usr/local/service/hadoop directory as described in data preparations, and submit the task by running the following command:
[hadoop@10 hadoop]$ bin/hadoop jar /usr/local/service/hadoop/WordCount-1.0-SNAPSHOT-jar-with-dependencies.jar WordCount /user/hadoop/test.txt /user/hadoop/WordCount_output
Here, /user/hadoop/test.txt is the input file and /user/hadoop/WordCount_output is the output folder. Do not create the WordCount_output folder before the command is submitted; otherwise, the submission will fail. After the task is completed, view the output folder:
[hadoop@172 hadoop]$ hadoop fs -ls /user/hadoop/WordCount_output
Found 2 items
-rw-r--r--   3 hadoop supergroup    0 2018-07-06 11:35 /user/hadoop/WordCount_output/_SUCCESS
-rw-r--r--   3 hadoop supergroup   82 2018-07-06 11:35 /user/hadoop/WordCount_output/part-r-00000
[hadoop@172 hadoop]$ hadoop fs -cat /user/hadoop/WordCount_output/part-r-00000
Hello   2
World.  1
a       1
another 1
are     1
how     1
is      2
message.        2
this    2
world,  1
you?    1
……
Go to the /usr/local/service/hadoop directory and submit the task by running the following command:
[hadoop@10 hadoop]$ hadoop jar /usr/local/service/hadoop/WordCount-1.0-SNAPSHOT-jar-with-dependencies.jar WordCount cosn://$bucketname/test.txt cosn://$bucketname/WordCount_output
The input file is cosn://$bucketname/test.txt, where $bucketname is your bucket name and path. The result is written to COS as well. Run the following command to view the output folder:
[hadoop@10 hadoop]$ hadoop fs -ls cosn://$bucketname/WordCount_output
Found 2 items
-rw-rw-rw-   1 hadoop Hadoop     0 2018-07-06 10:34 cosn://$bucketname/WordCount_output/_SUCCESS
-rw-rw-rw-   1 hadoop Hadoop  1306 2018-07-06 10:34 cosn://$bucketname/WordCount_output/part-r-00000
[hadoop@10 hadoop]$ hadoop fs -cat cosn://$bucketname/WordCount_output/part-r-00000
Hello   2
World.  1
a       1
another 1
are     1
how     1
is      2
message.        2
this    2
world,  1
you?    1