under the /data/ path (/data/Impala).
[root@10 ~]# su hadoop
[hadoop@10 root]$ cd /data/Impala/
#!/bin/bash
MAXROW=1000000  # number of data rows to generate
for ((i = 0; i < $MAXROW; i++))
do
    echo $RANDOM,\"$RANDOM\"
done
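Save the script and make it executable before running it (assuming it is saved as gen_data.sh in the current directory, the name used in the next step):
[hadoop@10 ~]$ chmod +x gen_data.sh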
[hadoop@10 ~]$ ./gen_data.sh > impala_test.data
into impala_test.data. Then upload the generated test data to HDFS with the following commands:
[hadoop@10 ~]$ hdfspath="/impala_test_dir"
[hadoop@10 ~]$ hdfs dfs -mkdir $hdfspath
[hadoop@10 ~]$ hdfs dfs -put ./impala_test.data $hdfspath
[hadoop@10 ~]$ hdfs dfs -ls $hdfspath
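To spot-check that the upload succeeded, you can optionally print the first few lines of the file from HDFS (an extra verification step, not part of the original walkthrough):
[hadoop@10 ~]$ hdfs dfs -cat $hdfspath/impala_test.data | head -n 5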
Impala version | impala-shell path | impala-shell default connection port |
4.1.0/4.0.0 | /data/Impala/shell | 27009 |
3.4.0 | /data/Impala/shell | 27001 |
2.10.0 | /data/Impala/bin | 27001 |
[root@10 Impala]# cd /data/Impala/shell;./impala-shell -i $core_ip:27001
Connected to $core_ip:27001
Server version: impalad version 3.4.1-RELEASE RELEASE (build Could not obtain git hash)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell 3.4.1-RELEASE (ebled66) built on Tue Nov 20 17:28:10 CST 2021)

The SET command shows the current value of all shell and query options.
***********************************************************************************
[$core_ip:27001] >
cd /data/Impala/shell;./impala-shell -i localhost:27001
[10.1.0.215:27001] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| default          | Default Hive database                        |
+------------------+----------------------------------------------+
Fetched 2 row(s) in 0.09s
Use the create statement to create a database:
[localhost:27001] > create database experiments;
Query: create database experiments
Fetched 0 row(s) in 0.41s
Use the use statement to switch to the newly created experiments database:
[localhost:27001] > use experiments;
Query: use experiments
You can confirm which database is currently in use:
select current_database();
Use the create statement to create a new internal table named t1 in the experiments database:
[localhost:27001] > create table t1 (a int, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Query: create table t1 (a int, b string)
Fetched 0 row(s) in 0.13s
[localhost:27001] > show tables;
Query: show tables
+------+
| name |
+------+
| t1   |
+------+
Fetched 1 row(s) in 0.01s
[localhost:27001] > desc t1;
Query: describe t1
+------+--------+---------+
| name | type   | comment |
+------+--------+---------+
| a    | int    |         |
| b    | string |         |
+------+--------+---------+
Fetched 2 row(s) in 0.01s
Load the test data from HDFS into the table (replace $hdfspath with the actual HDFS directory, /impala_test_dir in this example):
LOAD DATA INPATH '$hdfspath/impala_test.data' INTO TABLE t1;
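After the load completes, you can optionally spot-check a few rows (an extra verification step; the values will differ because the data is randomly generated):
[localhost:27001] > select * from t1 limit 5;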
After loading, the data file is moved into the internal table's directory /usr/hive/warehouse/experiments.db/t1. You can also create an external table instead, with the following statement:
CREATE EXTERNAL TABLE t2 (
  a INT,
  b string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/impala_test_dir';
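Note that LOAD DATA INPATH moves the file out of its original HDFS directory, so the external table only sees data that is still present under /impala_test_dir. If the file is there (for example, re-uploaded with hdfs dfs -put), the external table can read it in place without a load step; a quick check might look like this:
[localhost:27001] > select count(*) from t2;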
[localhost:27001] > select count(*) from experiments.t1;
Query: select count(*) from experiments.t1
Query submitted at: 2019-03-01 11:20:20 (Coordinator: http://10.1.0.215:20004)
Query progress can be monitored at: http://10.1.0.215:20004/query_plan?query_id=f1441478dba3a1c5:fa7a8eef00000000
+----------+
| count(*) |
+----------+
| 1000000  |
+----------+
Fetched 1 row(s) in 0.63s
[localhost:27001] > drop table experiments.t1;
Query: drop table experiments.t1
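If the test database itself is no longer needed, it can optionally be dropped as well once its tables have been removed (an extra cleanup step not shown in the original walkthrough):
[localhost:27001] > drop database experiments;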
$hs2host and $hsport, where $hs2host is the IP of any core node or task node in the EMR cluster, and $hsport can be found in the configuration file conf/impalad.flgs under the Impala directory on the corresponding node.
[root@10 ~]# su hadoop
[hadoop@10 root]$ cd /data/Impala/
[hadoop@10 Impala]$ grep hs2_port conf/impalad.flgs
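With those two values, a JDBC client can usually connect through Impala's HiveServer2-compatible endpoint. A minimal sketch of such a connection URL, assuming an unsecured cluster (adjust the authentication settings to match your environment):
jdbc:hive2://$hs2host:$hsport/;auth=noSasl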