Basic Hive Operations

Last updated: 2024-10-30 11:25:22
This document demonstrates how to use Hive on EMR to create databases and tables, import data, and perform basic queries.

Development Preparation

Make sure you have signed up for a Tencent Cloud account and created an EMR cluster. For details, see Creating a Cluster.
When creating the EMR cluster, select the Hive component on the software configuration page.
Some optional steps in this example require access to Tencent Cloud Object Storage (COS). To follow them, see Creating a Bucket to create a COS bucket, and enable COS authorization on the Instance Information page of the EMR console.
Note:
See Logging in to a Linux Instance for how to log in to an EMR node. On the cluster details page, select Cluster Resources > Resource Management, click the resource ID of the target node to open the CVM list, and then click Log In on the right to access the instance via WebShell.
The default username for logging in to a Linux instance is root, and the password is the one you set when creating the EMR cluster. After entering the correct credentials, you will land in the command-line interface.
All operations in this document are performed as the hadoop user; switch to that user after logging in.

Preparing Sample Data

Log in to the Master node and run the following commands in the EMR command line to switch to the hadoop user and go to the Hive directory:
su hadoop
cd /usr/local/service/hive
Create a bash script file named gen_data.sh and add the following code to it:
#!/bin/bash
MAXROW=1000000  # Number of data rows to generate.
for ((i = 0; i < $MAXROW; i++))
do
    echo "$RANDOM,\"$RANDOM\""
done
Make the script executable and run it as follows. The script generates 1,000,000 pairs of random numbers and saves them to the file hive_test.data:
chmod +x ./gen_data.sh
./gen_data.sh > hive_test.data
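You can spot-check the generated file before uploading it. The exact values will differ on every run because the script uses $RANDOM:
head -n 2 hive_test.data
# Example output (values vary):
# 18500,"27146"
# 4423,"9061"
wc -l hive_test.data
# 1000000 hive_test.data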
Use the following command to upload the generated test data to HDFS, where ${hdfspath} is the path on HDFS where you want to store the file:
hdfs dfs -put ./hive_test.data /${hdfspath}
You can also use data stored in COS. If the data is on your local machine, upload it through the COS console; if it is already in the EMR cluster, upload it with the following command, where ${bucketname} is the name of the COS bucket you created:
hdfs dfs -put ./hive_test.data cosn://${bucketname}/
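To confirm that the file landed where you expect, you can list the target path (the commands below assume the same ${hdfspath} and ${bucketname} placeholders used above):
hdfs dfs -ls /${hdfspath}
hdfs dfs -ls cosn://${bucketname}/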

Hive Basic Operations

Log in to the Master node of the EMR cluster, switch to the hadoop user, and enter the Hive command line using the Hive client:
hive

Creating Databases and Tables

Use the SHOW syntax to display all current databases:
hive> show databases;
OK
default
Time taken: 0.26 seconds, Fetched: 1 row(s)
Use the CREATE DATABASE syntax to create a database named test:
hive> create database if not exists test;
OK
Time taken: 0.176 seconds
Use the USE syntax to switch to the created test database:
hive> use test;
OK
Time taken: 0.176 seconds
Use the CREATE TABLE syntax to create an internal table named hive_test under the test database:
hive> create table hive_test (a int, b string)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
-- Creates the hive_test table and sets the column delimiter to a comma (,).
OK
Time taken: 0.204 seconds
Finally, use the SHOW TABLES syntax to check if the table has been created successfully:
hive> show tables;
OK
hive_test
Time taken: 0.176 seconds, Fetched: 1 row(s)
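You can also inspect the table definition to confirm the column names and types; the output of the standard DESC statement should list the two columns defined above:
hive> desc hive_test;
OK
a                       int
b                       string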

Importing Data

For data stored in HDFS, use the following command to import it into the table:
hive> load data inpath "/${hdfspath}/hive_test.data" into table hive_test;
For data stored in COS, use the following command to import it into the table:
hive> load data inpath "cosn://${bucketname}/hive_test.data" into table hive_test;

Data stored locally in the EMR cluster can also be imported into Hive using the following command:
hive> load data local inpath "/${localpath}/hive_test.data" into table hive_test;
Note:
${hdfspath} is the path where your file is stored on HDFS, ${bucketname} is your COS bucket name, and ${localpath} is the path where your data is stored locally on the EMR cluster. When loading from HDFS or COS (without the LOCAL keyword), Hive moves the source file into the table's warehouse directory, so it no longer exists at the original path; with LOCAL, the file is copied and the local source is left in place.
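If you need to keep the HDFS source file, one option is to copy it to a staging path first and load the copy instead; ${stagingpath} below is a hypothetical placeholder:
hdfs dfs -cp /${hdfspath}/hive_test.data /${stagingpath}/
hive> load data inpath "/${stagingpath}/hive_test.data" into table hive_test;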

Executing Queries

Query the first 10 rows of the table:
hive> select * from hive_test limit 10;
OK
30847 "31583"
14887 "32053"
19741 "16590"
8104 "20321"
29030 "32724"
27274 "5231"
10028 "22594"
924 "32569"
10603 "27927"
4018 "30518"
Time taken: 2.133 seconds, Fetched: 10 row(s)
Count the total number of rows in the table:
hive> select count(*) from hive_test;
OK
1000000
Time taken: 18.504 seconds, Fetched: 1 row(s)
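Other aggregate queries follow the same pattern; for example, to get the largest value in column a (the result will vary with the randomly generated data):
hive> select max(a) from hive_test;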

Deleting Databases and Tables

Use the DROP TABLE syntax to delete a Hive table:
hive> drop table if exists hive_test;
Moved: 'hdfs://HDFS/usr/hive/warehouse/hive_test' to trash at: hdfs://HDFS/user/hadoop/.Trash/Current
OK
Time taken: 2.327 seconds
Use the DROP DATABASE syntax to delete a Hive database:
hive> drop database if exists test;
OK
Time taken: 0.531 seconds
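When you are finished, exit the Hive command line:
hive> quit;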
