Basic Hive Operations

Last updated: 2024-10-30 11:25:22
This document demonstrates how to use Hive on EMR to create databases and tables, import data, and perform basic queries.

Development Preparation

Make sure you have signed up for a Tencent Cloud account and created an EMR cluster. For details, see Creating a Cluster.
When creating the EMR cluster, select the Hive component on the software configuration page.
Some optional steps in this example require access to Tencent Cloud Object Storage (COS). To follow them, see Creating a Bucket to create a COS bucket, and enable COS authorization on the Instance Information page of the EMR console.
Note:
See Logging in to a Linux Instance for how to log in to an EMR node. On the cluster details page, select Cluster Resources > Resource Management, click the resource ID of the target node to open the CVM list, and then click Log In on the right to access the instance via WebShell.
The default username for logging in to a Linux instance is root, and the password is the one you set when creating the EMR cluster. After entering the correct credentials, you will land in the command-line interface.
All operations in this document are performed as the hadoop user; switch to that user after logging in.

Preparing Sample Data

Log in to the Master node and run the following commands in the EMR command line to switch to the hadoop user and go to the Hive directory:
su hadoop
cd /usr/local/service/hive
Create a bash script file named gen_data.sh and add the following code to it:
#!/bin/bash
MAXROW=1000000  # Number of data rows to generate.
for ((i = 0; i < $MAXROW; i++))
do
    echo "$RANDOM,\"$RANDOM\""
done
Make the script executable and run it as follows. The script generates 1,000,000 pairs of random numbers and saves them to the file hive_test.data:
chmod +x ./gen_data.sh
./gen_data.sh > hive_test.data
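You can spot-check the generated file before uploading it. The exact values will differ on every run because the script uses $RANDOM:
head -n 2 hive_test.data
# Example output (values vary):
# 18500,"27146"
# 4423,"9061"
wc -l hive_test.data
# 1000000 hive_test.data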
Use the following command to upload the generated test data to HDFS, where ${hdfspath} is the path on HDFS where you want to store the file:
hdfs dfs -put ./hive_test.data /${hdfspath}
You can also use data stored in COS. If the data is on your local machine, upload it through the COS console; if it is already in the EMR cluster, upload it with the following command, where ${bucketname} is the name of the COS bucket you created:
hdfs dfs -put ./hive_test.data cosn://${bucketname}/
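To confirm that the file landed where you expect, you can list the target path (the commands below assume the same ${hdfspath} and ${bucketname} placeholders used above):
hdfs dfs -ls /${hdfspath}
hdfs dfs -ls cosn://${bucketname}/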

Hive Basic Operations

Log in to the Master node of the EMR cluster, switch to the hadoop user, and enter the Hive command line using the Hive client:
hive

Creating Databases and Tables

Use the SHOW syntax to display all current databases:
hive> show databases;
OK
default
Time taken: 0.26 seconds, Fetched: 1 row(s)
Use the CREATE DATABASE syntax to create a database named test:
hive> create database if not exists test;
OK
Time taken: 0.176 seconds
Use the USE syntax to switch to the created test database:
hive> use test;
OK
Time taken: 0.176 seconds
Use the CREATE TABLE syntax to create an internal table named hive_test under the test database:
hive> create table hive_test (a int, b string)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
-- Creates the hive_test table and sets the column delimiter to a comma (,).
OK
Time taken: 0.204 seconds
Finally, use the SHOW TABLES syntax to check if the table has been created successfully:
hive> show tables;
OK
hive_test
Time taken: 0.176 seconds, Fetched: 1 row(s)
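You can also inspect the table definition to confirm the column names and types; the output of the standard DESC statement should list the two columns defined above:
hive> desc hive_test;
OK
a                       int
b                       string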

Importing Data

For data stored in HDFS, use the following command to import it into the table:
hive> load data inpath "/${hdfspath}/hive_test.data" into table hive_test;
For data stored in COS, use the following command to import it into the table:
hive> load data inpath "cosn://${bucketname}/hive_test.data" into table hive_test;

Data stored locally in the EMR cluster can also be imported into Hive using the following command:
hive> load data local inpath "/${localpath}/hive_test.data" into table hive_test;
Note:
${hdfspath} is the path where your file is stored on HDFS, ${bucketname} is your COS bucket name, and ${localpath} is the path where your data is stored locally on the EMR cluster. When loading from HDFS or COS (without the LOCAL keyword), Hive moves the source file into the table's warehouse directory, so it no longer exists at the original path; with LOCAL, the file is copied and the local source is left in place.
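If you need to keep the HDFS source file, one option is to copy it to a staging path first and load the copy instead; ${stagingpath} below is a hypothetical placeholder:
hdfs dfs -cp /${hdfspath}/hive_test.data /${stagingpath}/
hive> load data inpath "/${stagingpath}/hive_test.data" into table hive_test;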

Executing Queries

Query the first 10 rows of the table:
hive> select * from hive_test limit 10;
OK
30847 "31583"
14887 "32053"
19741 "16590"
8104 "20321"
29030 "32724"
27274 "5231"
10028 "22594"
924 "32569"
10603 "27927"
4018 "30518"
Time taken: 2.133 seconds, Fetched: 10 row(s)
Count the total number of rows in the table:
hive> select count(*) from hive_test;
OK
1000000
Time taken: 18.504 seconds, Fetched: 1 row(s)
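Other aggregate queries follow the same pattern; for example, to get the largest value in column a (the result will vary with the randomly generated data):
hive> select max(a) from hive_test;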

Deleting Databases and Tables

Use the DROP TABLE syntax to delete a Hive table:
hive> drop table if exists hive_test;
Moved: 'hdfs://HDFS/usr/hive/warehouse/hive_test' to trash at: hdfs://HDFS/user/hadoop/.Trash/Current
OK
Time taken: 2.327 seconds
Use the DROP DATABASE syntax to delete a Hive database:
hive> drop database if exists test;
OK
Time taken: 0.531 seconds
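When you are finished, exit the Hive command line:
hive> quit;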
