Tencent Cloud TCHouse-D

Hive External Table

Last updated: 2024-07-04 10:13:34
Note:
The content in this document applies only to Tencent Cloud TCHouse-D v1.1 and earlier. For later versions, it is recommended to use the Multi-Catalog feature to connect to external data catalogs.
The Hive external table feature enables Doris to access Hive tables directly. This eliminates tedious data import work and lets you leverage the OLAP capabilities of Doris to analyze data in Hive tables:
1. Doris can access Hive data sources.
2. Tables in Doris can be queried jointly with Hive and Hudi data sources for more complex analysis.
3. Hive data sources with Kerberos authentication enabled are supported.
This document introduces the usage and considerations of this feature.

Usage

Creating a Hive External Table in Doris

-- Syntax
CREATE [EXTERNAL] TABLE table_name (
col_name col_type [NULL | NOT NULL] [COMMENT "comment"]
) ENGINE=HIVE
[COMMENT "comment"]
PROPERTIES (
'property_name'='property_value',
...
);

-- Example 1: Create the hive_table in hive_db of the Hive cluster
CREATE TABLE `t_hive` (
`k1` int NOT NULL COMMENT "",
`k2` char(10) NOT NULL COMMENT "",
`k3` datetime NOT NULL COMMENT "",
`k5` varchar(20) NOT NULL COMMENT "",
`k6` double NOT NULL COMMENT ""
) ENGINE=HIVE
COMMENT "HIVE"
PROPERTIES (
'hive.metastore.uris' = 'thrift://192.168.0.1:9083',
'database' = 'hive_db',
'table' = 'hive_table'
);

-- Example 2: Create the hive_table in hive_db of the Hive cluster, with HDFS using HA configuration
CREATE TABLE `t_hive` (
`k1` int NOT NULL COMMENT "",
`k2` char(10) NOT NULL COMMENT "",
`k3` datetime NOT NULL COMMENT "",
`k5` varchar(20) NOT NULL COMMENT "",
`k6` double NOT NULL COMMENT ""
) ENGINE=HIVE
COMMENT "HIVE"
PROPERTIES (
'hive.metastore.uris' = 'thrift://192.168.0.1:9083',
'database' = 'hive_db',
'table' = 'hive_table',
'dfs.nameservices'='hacluster',
'dfs.ha.namenodes.hacluster'='n1,n2',
'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020',
'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020',
'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);

-- Example 3: Create the hive_table in hive_db of the Hive cluster, with HDFS using HA configuration and Kerberos authentication enabled
CREATE TABLE `t_hive` (
`k1` int NOT NULL COMMENT "",
`k2` char(10) NOT NULL COMMENT "",
`k3` datetime NOT NULL COMMENT "",
`k5` varchar(20) NOT NULL COMMENT "",
`k6` double NOT NULL COMMENT ""
) ENGINE=HIVE
COMMENT "HIVE"
PROPERTIES (
'hive.metastore.uris' = 'thrift://192.168.0.1:9083',
'database' = 'hive_db',
'table' = 'hive_table',
'dfs.nameservices'='hacluster',
'dfs.ha.namenodes.hacluster'='n1,n2',
'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020',
'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020',
'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM',
'hadoop.security.authentication'='kerberos',
'hadoop.kerberos.principal'='doris_test@REALM.COM',
'hadoop.kerberos.keytab'='/path/to/doris_test.keytab'
);

-- Example 4: Create the hive_table in hive_db of the Hive cluster, with Hive data stored on S3
CREATE TABLE `t_hive` (
`k1` int NOT NULL COMMENT "",
`k2` char(10) NOT NULL COMMENT "",
`k3` datetime NOT NULL COMMENT "",
`k5` varchar(20) NOT NULL COMMENT "",
`k6` double NOT NULL COMMENT ""
) ENGINE=HIVE
COMMENT "HIVE"
PROPERTIES (
'hive.metastore.uris' = 'thrift://192.168.0.1:9083',
'database' = 'hive_db',
'table' = 'hive_table',
'AWS_ACCESS_KEY' = 'your_access_key',
'AWS_SECRET_KEY' = 'your_secret_key',
'AWS_ENDPOINT' = 's3.us-east-1.amazonaws.com',
'AWS_REGION' = 'us-east-1'
);

Parameter description:

External table columns:
The column names must correspond one-to-one with those of the Hive table.
The column order must be consistent with that of the Hive table.
All columns of the Hive table must be included.
Partition columns of the Hive table do not need special handling; define them like regular columns.
ENGINE needs to be specified as HIVE.
PROPERTIES attributes:
hive.metastore.uris: Hive Metastore service address.
database: The name of the corresponding database when Hive is mounted.
table: The name of the corresponding table when Hive is mounted.
dfs.nameservices: The name of the name service, consistent with hdfs-site.xml.
dfs.ha.namenodes.[nameservice ID]: The list of namenode IDs, consistent with hdfs-site.xml.
dfs.namenode.rpc-address.[nameservice ID].[namenode ID]: The RPC address of a namenode; there is one such entry per namenode, consistent with hdfs-site.xml.
dfs.client.failover.proxy.provider.[nameservice ID]: The Java class the HDFS client uses to connect to the active namenode, usually org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.
To access Hive data sources for which Kerberos is enabled, you need to add the following PROPERTIES for the Hive external table:
hadoop.security.authentication: Set the authentication method to kerberos (the default is simple).
dfs.namenode.kerberos.principal: The Kerberos entity of the HDFS namenode service.
hadoop.kerberos.principal: Setting the Kerberos entity used when Doris is connected to HDFS.
hadoop.kerberos.keytab: Setting the local path to the keytab file.
yarn.resourcemanager.principal: The Kerberos entity of the resource manager when Doris is connected to a Hadoop cluster.
If the data is stored in an S3-compatible system, such as Tencent Cloud COS, the following properties must also be set:
AWS_ACCESS_KEY: The SecretId of your Tencent Cloud account.
AWS_SECRET_KEY: The SecretKey of your Tencent Cloud account.
AWS_ENDPOINT: The endpoint of the COS bucket, for example: cos.ap-nanjing.myqcloud.com.
AWS_REGION: The region of the COS bucket, for example: ap-nanjing.
Note
To allow Doris to access a Hadoop cluster with Kerberos authentication enabled, deploy the Kerberos client (kinit) on all nodes running Doris and configure krb5.conf with the KDC service information.
The value of the hadoop.kerberos.keytab property must be an absolute path to the keytab file, and the Doris process must be able to access this local file.
The HDFS cluster configuration can instead be written into an hdfs-site.xml file in the conf directory of both FE and BE. In that case, users don't need to fill in the HDFS cluster configuration in PROPERTIES when creating a Hive external table.
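As an illustration of the last point, the HA properties from Example 2 can be moved into hdfs-site.xml instead of being repeated in every CREATE TABLE statement. The sketch below reuses the placeholder addresses from Example 2; substitute your own cluster's values:

```xml
<!-- Illustrative hdfs-site.xml fragment (values are placeholders from Example 2).
     Place this file in the conf directory of both FE and BE; the dfs.* entries
     can then be omitted from the PROPERTIES clause. -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>hacluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.hacluster</name>
    <value>n1,n2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hacluster.n1</name>
    <value>192.168.0.1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hacluster.n2</name>
    <value>192.168.0.2:8020</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.hacluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
```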

Type Matching

The correspondence between supported Hive column types and Doris column types is as follows:

Hive      | Doris    | Description
----------|----------|-------------------------------------------
BOOLEAN   | BOOLEAN  |
CHAR      | CHAR     | Only UTF8 encoding is currently supported.
VARCHAR   | VARCHAR  | Only UTF8 encoding is currently supported.
TINYINT   | TINYINT  |
SMALLINT  | SMALLINT |
INT       | INT      |
BIGINT    | BIGINT   |
FLOAT     | FLOAT    |
DOUBLE    | DOUBLE   |
DECIMAL   | DECIMAL  |
DATE      | DATE     |
TIMESTAMP | DATETIME | Converting TIMESTAMP to DATETIME may cause a loss of precision.
Note
The schema of the Hive table is not synchronized automatically. If it changes, you need to rebuild the Hive external table in Doris.
Currently, only the Text, Parquet, and ORC storage formats of Hive are supported.
Hive versions 2.3.7 and 3.1.2 are supported by default; other versions have not been tested. More versions will be supported in the future.
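Because the schema is not synchronized, picking up a change in the Hive table means dropping and recreating the external table. A minimal sketch, reusing the names from Example 1 (the added column k7 is hypothetical and must match whatever the new Hive schema actually contains):

```sql
-- Rebuild the external table after the Hive schema changes
-- (here assuming a column k7 was added on the Hive side)
DROP TABLE IF EXISTS `t_hive`;
CREATE TABLE `t_hive` (
  `k1` int NOT NULL COMMENT "",
  `k2` char(10) NOT NULL COMMENT "",
  `k3` datetime NOT NULL COMMENT "",
  `k5` varchar(20) NOT NULL COMMENT "",
  `k6` double NOT NULL COMMENT "",
  `k7` bigint NULL COMMENT ""  -- hypothetical new column, must match the Hive table
) ENGINE=HIVE
COMMENT "HIVE"
PROPERTIES (
  'hive.metastore.uris' = 'thrift://192.168.0.1:9083',
  'database' = 'hive_db',
  'table' = 'hive_table'
);
```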

Query Usage

You can use a Hive external table in Doris in the same way as a Doris OLAP table, except that the Doris data models (Rollup, Pre-Aggregation, Materialized View, etc.) are unavailable.
select * from t_hive where k1 > 1000 and k2 = 'term' or k5 like '%doris';
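A Hive external table can also be joined with a native Doris table in a single query, as mentioned in the introduction. A sketch, where the Doris table t_doris and its columns are hypothetical and t_hive is the external table from Example 1:

```sql
-- Hypothetical join between a native Doris table (t_doris) and the
-- Hive external table (t_hive), aggregating a Hive metric per user
SELECT d.user_id, SUM(h.k6) AS total
FROM t_doris d
JOIN t_hive h ON d.user_id = h.k1
WHERE h.k3 >= '2024-01-01 00:00:00'
GROUP BY d.user_id;
```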

