Tencent Cloud

Elastic MapReduce

Hive Overview

Last updated: 2024-10-30 11:30:16
Hive is a data warehouse architecture built on the Hadoop file system. It offers a range of data warehouse management features, including ETL (Extract, Transform, Load) tools, data storage management, and the ability to query and analyze large datasets. Hive also defines a SQL-like query language that allows users to map structured data files to database tables and query them with simple SQL.
In EMR, Hive is installed in the /usr/local/service/hive path on EMR nodes.
For more details about Hive, see the Apache Hive official website.
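As a quick check, you can inspect the installation directory mentioned above directly on an EMR node. A sketch; the exact directory layout may vary by EMR version:

```shell
# Hive is installed under /usr/local/service/hive on EMR nodes (see above).
ls /usr/local/service/hive                    # typically bin/, conf/, lib/, ...
/usr/local/service/hive/bin/hive --version    # print the bundled Hive version
```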

Hive Service Roles

HiveServer2
Hive's Thrift server. It receives client query requests, compiles and parses SQL, and supports concurrent clients and authentication.
An EMR cluster can deploy multiple HiveServer2 instances, which support scaling out to Router nodes and load balancing.
Hive MetaStore
Hive's metadata service. It maintains metadata for Hive databases and tables, and its metadata management capability is also used by engines such as Spark and Trino.
An EMR cluster can deploy multiple Hive MetaStore instances, with support for scaling out to Router nodes.
Hive Client
The Hive client provides tools such as Beeline and JDBC drivers that let users submit SQL jobs to HiveServer2. The client is installed on every node where the Hive service is deployed.
Hive WebHCat
WebHCat is a service that provides a REST API for HCatalog, allowing Hive commands to be executed and MapReduce tasks to be submitted through REST calls.
Multiple WebHCat instances can be deployed within a cluster, with support for scaling out to Router nodes.
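The roles above can be exercised from the command line. A minimal sketch, assuming a node with the Hive client installed, the common HiveServer2 default port 10000, and the WebHCat default port 50111; hostnames are placeholders:

```shell
# Submit a query to HiveServer2 through Beeline (JDBC).
beeline -u "jdbc:hive2://<hiveserver2-host>:10000" -n hadoop -e "SHOW DATABASES;"

# Check the WebHCat REST endpoint (Templeton); a healthy service reports status "ok".
curl "http://<webhcat-host>:50111/templeton/v1/status"
```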

Internal Table and External Table in Hive

Internal Table: Hive manages both the metadata and the actual data of an internal table. When you use the DROP command to delete an internal table, both the metadata and the corresponding data are deleted. After an internal table is created, HDFS files are mapped into the table, and Hive's data warehouse generates a corresponding directory. The default warehouse path in EMR is /usr/hive/warehouse/${tablename} on HDFS, where ${tablename} is the name of the table you create.
External Table: External tables in Hive are similar to internal tables, but their data is not stored in the directory associated with the table itself; instead, it is stored elsewhere. The benefit is that if you delete the external table, only its metadata is removed; the data it points to is not deleted.
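The difference in DROP behavior can be sketched with HiveQL submitted through the Hive client. The table names and the /data/logs location are illustrative, not part of any default setup:

```shell
hive -e "
CREATE TABLE logs_managed (id INT, msg STRING);
-- data is written under the warehouse directory, e.g. /usr/hive/warehouse/logs_managed

CREATE EXTERNAL TABLE logs_external (id INT, msg STRING)
LOCATION '/data/logs';
-- data stays at /data/logs; Hive stores only the table metadata

DROP TABLE logs_managed;   -- removes both the metadata and the HDFS data
DROP TABLE logs_external;  -- removes the metadata only; files in /data/logs remain
"
```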

Hive Syntax

Hive in EMR is fully compatible with the open-source community syntax. For more details, see the HiveQL Community Syntax Manual.
