
Accessing Hudi Data with Hive


Last updated: 2024-10-30 11:43:08

Development Preparation
Make sure you have activated Tencent Cloud and created an EMR cluster. For more details, see Creating a Cluster.
When creating the EMR cluster, select the Hive, Spark, and Hudi components on the software configuration page.
Reading and Writing Hudi with Spark
Log in to the master node, switch to the hadoop user, and use SparkSQL with the HoodieSparkSessionExtension extension to read and write data:
spark-sql --master yarn \
--num-executors 2 \
--executor-memory 1g \
--executor-cores 2 \
--jars /usr/local/service/hudi/hudi-bundle/hudi-spark3.3-bundle_2.12-0.13.0.jar \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
--conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
Note:
--master specifies the master URL, --num-executors the number of executors, --executor-memory the memory per executor, and --executor-cores the number of cores per executor. Adjust these values to your workload. The bundle version passed to --jars varies across EMR versions, so check the /usr/local/service/hudi/hudi-bundle directory and use the jar that is actually present there.
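Because the bundle file name changes with the EMR version, it can help to resolve the jar at run time instead of hard-coding it. The following is a minimal sketch, not an EMR-provided tool: the function name find_hudi_jar and the variable HUDI_BUNDLE_DIR are hypothetical, and it assumes a GNU userland (sort -V) as found on typical Linux nodes.

```shell
# Hypothetical helper: print the highest-versioned jar in a directory whose
# file name starts with the given prefix (for example "hudi-spark").
# Prints nothing if no jar matches or the directory does not exist.
find_hudi_jar() {
  local dir="$1" prefix="$2"
  # sort -V orders names by embedded version numbers; tail keeps the last one.
  find "$dir" -maxdepth 1 -type f -name "${prefix}*.jar" 2>/dev/null | sort -V | tail -n 1
}

# Bundle directory used in this guide; override it if your layout differs.
HUDI_BUNDLE_DIR="${HUDI_BUNDLE_DIR:-/usr/local/service/hudi/hudi-bundle}"

# Example use (commented out so the snippet has no side effects):
# spark-sql --master yarn --jars "$(find_hudi_jar "$HUDI_BUNDLE_DIR" hudi-spark)" ...
```

If several Spark bundles are installed side by side, check that the one selected matches your Spark version before relying on it.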
Create a table:
-- Create a non-partitioned COW table
spark-sql> create table hudi_cow_nonpcf_tbl (
 uuid int,
 name string,
 price double
) using hudi
tblproperties (
 primaryKey = 'uuid'
);

-- Create a partitioned COW table
spark-sql> create table hudi_cow_pt_tbl (
 id bigint,
 name string,
 ts bigint,
 dt string,
 hh string
) using hudi
tblproperties (
 type = 'cow',
 primaryKey = 'id',
 preCombineField = 'ts'
 )
partitioned by (dt, hh);

-- Create a partitioned MOR table
spark-sql> create table hudi_mor_tbl (
 id int,
 name string,
 price double,
 ts bigint,
 dt string
) using hudi
tblproperties (
 type = 'mor',
 primaryKey = 'id',
 preCombineField = 'ts'
)
partitioned by (dt);
Write data:
-- Insert into the non-partitioned table
spark-sql> insert into hudi_cow_nonpcf_tbl select 1, 'a1', 20;

-- Insert with dynamic partitioning
spark-sql> insert into hudi_cow_pt_tbl partition (dt, hh) select 1 as id, 'a1' as name, 1000 as ts, '2021-12-09' as dt, '10' as hh;

-- Insert with static partitioning
spark-sql> insert into hudi_cow_pt_tbl partition(dt = '2021-12-09', hh='11') select 2, 'a2', 1000;
spark-sql> insert into hudi_mor_tbl partition(dt = '2021-12-09') select 1, 'a1', 20, 1000;
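To read the rows back from Spark without opening an interactive session, a single statement can be passed with the spark-sql -e option. The wrapper below is only a sketch under stated assumptions: the function name hudi_spark_sql and the variables SPARK_SQL and HUDI_SPARK_JAR are hypothetical, and the default jar path is the one used in this guide, which may differ on your EMR version.

```shell
# Hypothetical wrapper: run one SQL statement through spark-sql with the
# Hudi settings shown earlier in this guide.
#   SPARK_SQL      - assumed override for the spark-sql binary
#   HUDI_SPARK_JAR - assumed override for the Hudi Spark bundle path
hudi_spark_sql() {
  "${SPARK_SQL:-spark-sql}" --master yarn \
    --jars "${HUDI_SPARK_JAR:-/usr/local/service/hudi/hudi-bundle/hudi-spark3.3-bundle_2.12-0.13.0.jar}" \
    --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
    --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
    --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
    -e "$1"
}

# Example use (commented out so the snippet has no side effects):
# hudi_spark_sql "select id, name, ts, dt, hh from hudi_cow_pt_tbl where dt = '2021-12-09'"
```

Setting SPARK_SQL=echo prints the assembled arguments instead of launching Spark, which is a cheap way to review them before running against a cluster.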
Using Hive to Query Hudi Table
Log in to the master node, switch to the hadoop user, and run the following command to start the Hive CLI:
hive
Add the Hudi dependency package:
hive> add jar /usr/local/service/hudi/hudi-bundle/hudi-hadoop-mr-bundle-0.13.0.jar;
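List the tables. For the MOR table, Hudi's Hive sync also registers hudi_mor_tbl_ro (read-optimized view) and hudi_mor_tbl_rt (real-time view):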
hive> show tables;
OK
hudi_cow_nonpcf_tbl
hudi_cow_pt_tbl
hudi_mor_tbl
hudi_mor_tbl_ro
hudi_mor_tbl_rt
Time taken:0.023 seconds, Fetched:5 row(s)
Query data:
hive> select * from hudi_cow_nonpcf_tbl;
OK
20230905170525412 20230905170525412_0_0 1 8d32a1cc-11f9-437f-9a7b-8ba9532223d3-0_0-17-15_20230905170525412.parquet 1 a1 20.0
Time taken:1.447 seconds, Fetched:1 row(s)

hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
hive> select * from hudi_mor_tbl_ro;
OK
20230808174602565	20230808174602565_0_1	id:1	dt=2021-12-09	af40667d-1dca-4163-89ca-2c48250985b2-0_0-34-1617_20230808174602565.parquet	1	a1	20.0	1000	2021-12-09
Time taken:0.159 seconds, Fetched:1 row(s)


hive> set hive.vectorized.execution.enabled=false;
hive> select name, count(*) from hudi_mor_tbl_rt group by name;
a1	1
Time taken:17.618 seconds, Fetched:1 row(s)
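The jar registration and the two session settings above have to be repeated in every new Hive session. One way to avoid retyping them is to generate the statements and pass them to hive -e. This is a minimal sketch rather than an EMR feature: the function name hudi_hive_script is hypothetical, and it applies both settings to every query, which is an assumption made for simplicity (the guide sets them only before the _ro and _rt queries).

```shell
# Hypothetical helper: emit a HiveQL script that registers the Hudi bundle,
# applies the session settings used in this guide, then runs one query.
hudi_hive_script() {
  local jar="$1" query="$2"
  printf 'add jar %s;\n' "$jar"
  printf 'set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;\n'
  printf 'set hive.vectorized.execution.enabled=false;\n'
  printf '%s;\n' "$query"
}

# Example use (commented out so the snippet has no side effects):
# hive -e "$(hudi_hive_script /usr/local/service/hudi/hudi-bundle/hudi-hadoop-mr-bundle-0.13.0.jar \
#   'select name, count(*) from hudi_mor_tbl_rt group by name')"
```

Printing the script first, by calling hudi_hive_script on its own, makes it easy to confirm the jar path before Hive runs anything.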
﻿
