Tencent Cloud

Elastic MapReduce

Hive Overview

Last updated: 2024-10-30 11:30:16
Hive is a data warehouse architecture built on the Hadoop file system. It offers a range of data warehouse management features, including ETL (Extract, Transform, Load) tools, data storage management, and the ability to query and analyze large datasets. Hive also defines a SQL-like query language that allows users to map structured data files to database tables and query them with simple SQL.
In EMR, Hive is installed in the /usr/local/service/hive path on EMR nodes.
For more details about Hive, see the Apache Hive official website.
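As a quick check, you can inspect the installation directory mentioned above directly on an EMR node. A sketch; the exact directory layout may vary by EMR version:

```shell
# Hive is installed under /usr/local/service/hive on EMR nodes (see above).
ls /usr/local/service/hive                    # typically bin/, conf/, lib/, ...
/usr/local/service/hive/bin/hive --version    # print the bundled Hive version
```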

Hive Service Roles

HiveServer2
Hive's Thrift server. It receives client query requests, compiles and parses SQL, and supports concurrent clients and authentication.
An EMR cluster can deploy multiple HiveServer2 instances, which support scaling out to Router nodes and load balancing.
Hive MetaStore
Hive's metadata service. It maintains metadata for Hive databases and tables, and its metadata management capability is also used by engines such as Spark and Trino.
An EMR cluster can deploy multiple Hive MetaStore instances, with support for scaling out to Router nodes.
Hive Client
The Hive client provides tools such as Beeline and JDBC drivers that let users submit SQL jobs to HiveServer2. The client is installed on every node where the Hive service is deployed.
Hive WebHCat
WebHCat is a service that provides a REST API for HCatalog, allowing Hive commands to be executed and MapReduce tasks to be submitted through REST calls.
Multiple WebHCat instances can be deployed within a cluster, with support for scaling out to Router nodes.
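The roles above can be exercised from the command line. A minimal sketch, assuming a node with the Hive client installed, the common HiveServer2 default port 10000, and the WebHCat default port 50111; hostnames are placeholders:

```shell
# Submit a query to HiveServer2 through Beeline (JDBC).
beeline -u "jdbc:hive2://<hiveserver2-host>:10000" -n hadoop -e "SHOW DATABASES;"

# Check the WebHCat REST endpoint (Templeton); a healthy service reports status "ok".
curl "http://<webhcat-host>:50111/templeton/v1/status"
```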

Internal Table and External Table in Hive

Internal Table: Hive manages both the metadata and the actual data of an internal table. When you use the DROP command to delete an internal table, both the metadata and the corresponding data are deleted. After an internal table is created, HDFS files are mapped into the table, and Hive's data warehouse generates a corresponding directory. The default warehouse path in EMR is /usr/hive/warehouse/${tablename} on HDFS, where ${tablename} is the name of the table you create.
External Table: External tables in Hive are similar to internal tables, but their data is not stored in the directory associated with the table itself; instead, it is stored elsewhere. The benefit is that if you delete the external table, only its metadata is removed; the data it points to is not deleted.
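The difference in DROP behavior can be sketched with HiveQL submitted through the Hive client. The table names and the /data/logs location are illustrative, not part of any default setup:

```shell
hive -e "
CREATE TABLE logs_managed (id INT, msg STRING);
-- data is written under the warehouse directory, e.g. /usr/hive/warehouse/logs_managed

CREATE EXTERNAL TABLE logs_external (id INT, msg STRING)
LOCATION '/data/logs';
-- data stays at /data/logs; Hive stores only the table metadata

DROP TABLE logs_managed;   -- removes both the metadata and the HDFS data
DROP TABLE logs_external;  -- removes the metadata only; files in /data/logs remain
"
```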

Hive Syntax

Hive in EMR is fully compatible with the open-source community syntax. For more details, see the HiveQL Community Syntax Manual.
