Switch to the hadoop user and go to the Spark installation directory /usr/local/service/spark:
[root@172 ~]# su hadoop
[hadoop@172 root]$ cd /usr/local/service/spark
Start an interactive spark-sql console on YARN:
[hadoop@10 spark]$ bin/spark-sql --master yarn --num-executors 64 --executor-memory 2g
You can also use sbin/start-thriftserver.sh or sbin/stop-thriftserver.sh to start or stop a Spark SQL Thrift server (a JDBC sketch for connecting to it follows the session below).
spark-sql> create database sparksql;
Time taken: 0.907 seconds
spark-sql> show databases;
default
sparksql
test
Time taken: 0.131 seconds, Fetched 5 row(s)
spark-sql> use sparksql;
Time taken: 0.076 seconds
spark-sql> create table sparksql_test(a int, b string);
Time taken: 0.374 seconds
spark-sql> show tables;
sparksql_test   false
Time taken: 0.12 seconds, Fetched 1 row(s)
spark-sql> insert into sparksql_test values (42,'hello'),(48,'world');
Time taken: 2.641 seconds
spark-sql> select * from sparksql_test;
42      hello
48      world
Time taken: 0.503 seconds, Fetched 2 row(s)
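If you prefer to query through the Thrift server instead of the spark-sql shell, the following is a minimal sketch of a JDBC client that runs the same select. It assumes the Thrift server was started with sbin/start-thriftserver.sh on its default HiveServer2 port 10000 on the local node and that the hive-jdbc dependency is on the classpath; the host, credentials, and class name ThriftServerQuery are illustrative, not part of the original example.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ThriftServerQuery {
    public static void main(String[] args) throws Exception {
        // Load the HiveServer2 JDBC driver (provided by the hive-jdbc artifact).
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // 10000 is the default Thrift server port; adjust host/port to your deployment.
        String url = "jdbc:hive2://localhost:10000/sparksql";
        try (Connection conn = DriverManager.getConnection(url, "hadoop", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select * from sparksql_test")) {
            while (rs.next()) {
                // Print column a (int) and column b (string) of each row.
                System.out.println(rs.getInt(1) + "\t" + rs.getString(2));
            }
        }
    }
}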
In the D://mavenWorkplace directory, run the following command to create a new Maven project:
mvn archetype:generate -DgroupId=$yourgroupID -DartifactId=$yourartifactID -DarchetypeArtifactId=maven-archetype-quickstart
A project folder named $yourartifactID is then generated under D://mavenWorkplace, with the following structure:
simple
---pom.xml               Core configuration, at the project root
---src
   ---main
      ---java            Java source directory
      ---resources       Java configuration file directory
   ---test
      ---java            Test source directory
      ---resources       Test configuration directory
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.0.2</version>
    </dependency>
    <!-- spark sql -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.0.2</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
                <encoding>utf-8</encoding>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>$yourgroupID</groupId>
    <artifactId>$yourartifactID</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.0.2</version>
        </dependency>
        <!-- spark sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.0.2</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>utf-8</encoding>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

/**
 * Created by tencent on 2018/6/28.
 */
public class Demo {
    public static void main(String[] args) {
        // Build a SparkSession with Hive support enabled.
        SparkSession spark = SparkSession
                .builder()
                .appName("Java Spark Hive Example")
                .enableHiveSupport()
                .getOrCreate();

        // Read the JSON file passed as the first argument into a DataFrame.
        Dataset<Row> df = spark.read().json(args[0]);

        // Convert the DataFrame to an RDD and save each Row's string form
        // as text to the output path passed as the second argument.
        RDD<Row> test = df.rdd();
        test.saveAsTextFile(args[1]);
    }
}
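Since this guide is about Spark SQL, you may instead want to query the JSON with SQL rather than saving the raw RDD. The sketch below is a hypothetical variant of Demo (the class name SqlDemo and the query are illustrative, not part of the original example): it registers the DataFrame as a temporary view, runs a SQL statement, and writes the result as JSON. It can be packaged and submitted the same way as Demo, only with --class SqlDemo.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SqlDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("Java Spark SQL Example")
                .enableHiveSupport()
                .getOrCreate();

        // Register the JSON file as a temporary view so it can be queried with SQL.
        Dataset<Row> df = spark.read().json(args[0]);
        df.createOrReplaceTempView("people");

        // Keep only the records that have an age value.
        Dataset<Row> adults = spark.sql("SELECT name, age FROM people WHERE age IS NOT NULL");

        // Write the query result as JSON to the output path given in args[1].
        adults.write().json(args[1]);
        spark.stop();
    }
}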
mvn package
scp $localfile root@<public IP address>:$remotefolder
The example data file people.json is located under /usr/local/service/spark/examples/src/main/resources/. Upload it to HDFS with the following command:
[hadoop@10 hadoop]$ hadoop fs -put /usr/local/service/spark/examples/src/main/resources/people.json /user/hadoop
/user/hadoop/ is a folder in HDFS; if it does not exist, you can create it yourself.
[hadoop@10 spark]$ bin/spark-submit --class Demo --master yarn-client $yourjarpackage /user/hadoop/people.json /user/hadoop/$output
View the result in /user/hadoop/$output:
[hadoop@172 spark]$ hadoop fs -cat /user/hadoop/$output/part-00000
[null,Michael]
[30,Andy]
[19,Justin]
Each line is the string form of one Row in (age, name) order; Michael's age is null because people.json does not provide one for him.
You can view more spark-submit options with:
[hadoop@10 spark]$ spark-submit -h