Tencent Cloud TCHouse-D

Hive External Table

Last updated: 2024-07-04 10:13:34
Note:
The content in this document applies only to Tencent Cloud TCHouse-D v1.1 and earlier. For later versions, it is recommended to use the Multi-Catalog feature to connect to external data catalogs.
The Hive external table feature enables Doris to access Hive tables directly. This eliminates tedious data import work and lets you leverage the OLAP capabilities of Doris to analyze data in Hive tables:
1. Doris can access Hive data sources.
2. Tables in Doris can be queried jointly with Hive and Hudi data sources for more complex analysis.
3. Hive data sources with Kerberos authentication enabled are supported.
This document introduces the usage and considerations of this feature.

Usage

Creating a Hive External Table in Doris

-- Syntax
CREATE [EXTERNAL] TABLE table_name (
col_name col_type [NULL | NOT NULL] [COMMENT "comment"]
) ENGINE=HIVE
[COMMENT "comment"]
PROPERTIES (
'property_name'='property_value',
...
);

-- Example 1: Create the hive_table in hive_db of the Hive cluster
CREATE TABLE `t_hive` (
`k1` int NOT NULL COMMENT "",
`k2` char(10) NOT NULL COMMENT "",
`k3` datetime NOT NULL COMMENT "",
`k5` varchar(20) NOT NULL COMMENT "",
`k6` double NOT NULL COMMENT ""
) ENGINE=HIVE
COMMENT "HIVE"
PROPERTIES (
'hive.metastore.uris' = 'thrift://192.168.0.1:9083',
'database' = 'hive_db',
'table' = 'hive_table'
);

-- Example 2: Create the hive_table in hive_db of the Hive cluster, with HDFS using HA configuration
CREATE TABLE `t_hive` (
`k1` int NOT NULL COMMENT "",
`k2` char(10) NOT NULL COMMENT "",
`k3` datetime NOT NULL COMMENT "",
`k5` varchar(20) NOT NULL COMMENT "",
`k6` double NOT NULL COMMENT ""
) ENGINE=HIVE
COMMENT "HIVE"
PROPERTIES (
'hive.metastore.uris' = 'thrift://192.168.0.1:9083',
'database' = 'hive_db',
'table' = 'hive_table',
'dfs.nameservices'='hacluster',
'dfs.ha.namenodes.hacluster'='n1,n2',
'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020',
'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020',
'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);

-- Example 3: Create the hive_table in hive_db of the Hive cluster, with HDFS using HA configuration and Kerberos authentication enabled
CREATE TABLE `t_hive` (
`k1` int NOT NULL COMMENT "",
`k2` char(10) NOT NULL COMMENT "",
`k3` datetime NOT NULL COMMENT "",
`k5` varchar(20) NOT NULL COMMENT "",
`k6` double NOT NULL COMMENT ""
) ENGINE=HIVE
COMMENT "HIVE"
PROPERTIES (
'hive.metastore.uris' = 'thrift://192.168.0.1:9083',
'database' = 'hive_db',
'table' = 'hive_table',
'dfs.nameservices'='hacluster',
'dfs.ha.namenodes.hacluster'='n1,n2',
'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020',
'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020',
'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM',
'hadoop.security.authentication'='kerberos',
'hadoop.kerberos.principal'='doris_test@REALM.COM',
'hadoop.kerberos.keytab'='/path/to/doris_test.keytab'
);

-- Example 4: Create the hive_table in hive_db of the Hive cluster, with Hive data stored on S3
CREATE TABLE `t_hive` (
`k1` int NOT NULL COMMENT "",
`k2` char(10) NOT NULL COMMENT "",
`k3` datetime NOT NULL COMMENT "",
`k5` varchar(20) NOT NULL COMMENT "",
`k6` double NOT NULL COMMENT ""
) ENGINE=HIVE
COMMENT "HIVE"
PROPERTIES (
'hive.metastore.uris' = 'thrift://192.168.0.1:9083',
'database' = 'hive_db',
'table' = 'hive_table',
'AWS_ACCESS_KEY' = 'your_access_key',
'AWS_SECRET_KEY' = 'your_secret_key',
'AWS_ENDPOINT' = 's3.us-east-1.amazonaws.com',
'AWS_REGION' = 'us-east-1'
);

Parameter description:

External table columns:
The column names must correspond one-to-one with those of the Hive table.
The column order must be consistent with that of the Hive table.
All columns of the Hive table must be included.
Partition columns of the Hive table do not need special handling; define them like regular columns.
ENGINE needs to be specified as HIVE.
PROPERTIES attributes:
hive.metastore.uris: Hive Metastore service address.
database: The name of the corresponding database when Hive is mounted.
table: The name of the corresponding table when Hive is mounted.
dfs.nameservices: The name of the name service, consistent with hdfs-site.xml.
dfs.ha.namenodes.[nameservice ID]: The list of namenode IDs, consistent with hdfs-site.xml.
dfs.namenode.rpc-address.[nameservice ID].[namenode ID]: The RPC address of a namenode; there is one such entry per namenode, consistent with hdfs-site.xml.
dfs.client.failover.proxy.provider.[nameservice ID]: The Java class the HDFS client uses to connect to the active namenode, usually org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.
To access Hive data sources for which Kerberos is enabled, you need to add the following PROPERTIES for the Hive external table:
hadoop.security.authentication: Set the authentication method to kerberos (the default is simple).
dfs.namenode.kerberos.principal: The Kerberos entity of the HDFS namenode service.
hadoop.kerberos.principal: Setting the Kerberos entity used when Doris is connected to HDFS.
hadoop.kerberos.keytab: Setting the local path to the keytab file.
yarn.resourcemanager.principal: The Kerberos entity of the resource manager when Doris is connected to a Hadoop cluster.
If the data is stored in an S3-compatible system, such as Tencent Cloud COS, the following properties must also be set:
AWS_ACCESS_KEY: The SecretId of your Tencent Cloud account.
AWS_SECRET_KEY: The SecretKey of your Tencent Cloud account.
AWS_ENDPOINT: The endpoint of the COS bucket, for example: cos.ap-nanjing.myqcloud.com.
AWS_REGION: The region of the COS bucket, for example: ap-nanjing.
Note
To allow Doris to access a Hadoop cluster with Kerberos authentication enabled, deploy the Kerberos client (kinit) on all nodes running Doris and configure krb5.conf with the KDC service information.
The value of the hadoop.kerberos.keytab property must be an absolute path to the keytab file, and the Doris process must be able to access this local file.
The HDFS cluster configuration can instead be written into an hdfs-site.xml file in the conf directory of both FE and BE. In that case, users don't need to fill in the HDFS cluster configuration in PROPERTIES when creating a Hive external table.
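As an illustration of the last point, the HA properties from Example 2 can be moved into hdfs-site.xml instead of being repeated in every CREATE TABLE statement. The sketch below reuses the placeholder addresses from Example 2; substitute your own cluster's values:

```xml
<!-- Illustrative hdfs-site.xml fragment (values are placeholders from Example 2).
     Place this file in the conf directory of both FE and BE; the dfs.* entries
     can then be omitted from the PROPERTIES clause. -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>hacluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.hacluster</name>
    <value>n1,n2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hacluster.n1</name>
    <value>192.168.0.1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hacluster.n2</name>
    <value>192.168.0.2:8020</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.hacluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
```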

Type Matching

The correspondence between supported Hive column types and Doris column types is as follows:

Hive      | Doris    | Description
----------|----------|-------------------------------------------
BOOLEAN   | BOOLEAN  |
CHAR      | CHAR     | Only UTF8 encoding is currently supported.
VARCHAR   | VARCHAR  | Only UTF8 encoding is currently supported.
TINYINT   | TINYINT  |
SMALLINT  | SMALLINT |
INT       | INT      |
BIGINT    | BIGINT   |
FLOAT     | FLOAT    |
DOUBLE    | DOUBLE   |
DECIMAL   | DECIMAL  |
DATE      | DATE     |
TIMESTAMP | DATETIME | Converting TIMESTAMP to DATETIME may cause a loss of precision.
Note
The schema of the Hive table is not synchronized automatically. If it changes, you need to rebuild the Hive external table in Doris.
Currently, only the Text, Parquet, and ORC storage formats of Hive are supported.
Hive versions 2.3.7 and 3.1.2 are supported by default; other versions have not been tested. More versions will be supported in the future.
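Because the schema is not synchronized, picking up a change in the Hive table means dropping and recreating the external table. A minimal sketch, reusing the names from Example 1 (the added column k7 is hypothetical and must match whatever the new Hive schema actually contains):

```sql
-- Rebuild the external table after the Hive schema changes
-- (here assuming a column k7 was added on the Hive side)
DROP TABLE IF EXISTS `t_hive`;
CREATE TABLE `t_hive` (
  `k1` int NOT NULL COMMENT "",
  `k2` char(10) NOT NULL COMMENT "",
  `k3` datetime NOT NULL COMMENT "",
  `k5` varchar(20) NOT NULL COMMENT "",
  `k6` double NOT NULL COMMENT "",
  `k7` bigint NULL COMMENT ""  -- hypothetical new column, must match the Hive table
) ENGINE=HIVE
COMMENT "HIVE"
PROPERTIES (
  'hive.metastore.uris' = 'thrift://192.168.0.1:9083',
  'database' = 'hive_db',
  'table' = 'hive_table'
);
```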

Query Usage

You can use a Hive external table in Doris in the same way as a Doris OLAP table, except that the Doris data models (Rollup, Pre-Aggregation, Materialized View, etc.) are unavailable.
select * from t_hive where k1 > 1000 and k2 = 'term' or k5 like '%doris';
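A Hive external table can also be joined with a native Doris table in a single query, as mentioned in the introduction. A sketch, where the Doris table t_doris and its columns are hypothetical and t_hive is the external table from Example 1:

```sql
-- Hypothetical join between a native Doris table (t_doris) and the
-- Hive external table (t_hive), aggregating a Hive metric per user
SELECT d.user_id, SUM(h.k6) AS total
FROM t_doris d
JOIN t_hive h ON d.user_id = h.k1
WHERE h.k3 >= '2024-01-01 00:00:00'
GROUP BY d.user_id;
```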

