tencent cloud

Iceberg External Table
Last updated: 2024-07-04 10:15:06
Iceberg External Table
Last updated: 2024-07-04 10:15:06
Note:
The content showcased in this document is only applicable to Tencent Cloud TCHouse-D 1.1 and below versions. It is recommended to use the Multi-Catalog feature to interface with external catalogs in subsequent versions.
The Iceberg External Table of Doris provides the capability for Doris to directly access Iceberg external tables. External tables eliminate the tedious data import work and leverage the OLAP capabilities of Doris for analyzing data from Iceberg table:
1. Support for connecting the Iceberg data source to Doris.
2. Support for joint queries between Doris and the tables in the Iceberg data source for more complex analysis operations.
This document mainly introduces the usage and considerations of this feature.

Usage

Creating an Iceberg External Table in Doris

There are two ways to create an Iceberg external table in Doris. You do not need to claim the column definition when creating an external table, as Doris can automatically convert it based on the column definition in the Iceberg table.
1. Creating an independent external table for mounting an Iceberg table. For the specific syntax, please see CREATE TABLE.
-- Syntax
CREATE [EXTERNAL] TABLE table_name
ENGINE = ICEBERG
[COMMENT "comment"]
PROPERTIES (
"iceberg.database" = "iceberg_db_name",
"iceberg.table" = "icberg_table_name",
"iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
);


-- Example 1: Mounting the 'iceberg_table' in 'iceberg_db' in Iceberg
CREATE TABLE `t_iceberg`
ENGINE = ICEBERG
PROPERTIES (
"iceberg.database" = "iceberg_db",
"iceberg.table" = "iceberg_table",
"iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
);


-- Example 2: Mounting the 'iceberg_table' in 'iceberg_db' in Iceberg with HDFS HA enabled
CREATE TABLE `t_iceberg`
ENGINE = ICEBERG
PROPERTIES (
"iceberg.database" = "iceberg_db",
"iceberg.table" = "iceberg_table",
"iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG",
"dfs.nameservices"="HDFS8000463",
"dfs.ha.namenodes.HDFS8000463"="nn2,nn1",
"dfs.namenode.rpc-address.HDFS8000463.nn2"="172.21.16.5:4007",
"dfs.namenode.rpc-address.HDFS8000463.nn1"="172.21.16.26:4007",
"dfs.client.failover.proxy.provider.HDFS8000463"="org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
);
2. Creating an Iceberg database for mounting the corresponding Iceberg database in the remote end, and all the tables under the database. For the specific syntax, please see CREATE DATABASE.
-- Syntax
CREATE DATABASE db_name
[COMMENT "comment"]
PROPERTIES (
"iceberg.database" = "iceberg_db_name",
"iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
);

-- Example: Mounting 'iceberg_db' in Iceberg and all tables under this db
CREATE DATABASE `iceberg_test_db`
PROPERTIES (
"iceberg.database" = "iceberg_db",
"iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
);
The progress of creating the table in iceberg_test_db can be viewed by HELP SHOW TABLE CREATION.
You can also create an Iceberg external table with a specific column definition according to your need.
1. Creating an Iceberg External Table
-- Syntax
CREATE [EXTERNAL] TABLE table_name (
col_name col_type [NULL | NOT NULL] [COMMENT "comment"]
) ENGINE = ICEBERG
[COMMENT "comment"]
PROPERTIES (
"iceberg.database" = "iceberg_db_name",
"iceberg.table" = "icberg_table_name",
"iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
);

-- Example 1: Mounting the 'iceberg_table' in 'iceberg_db' in Iceberg
CREATE TABLE `t_iceberg` (
`id` int NOT NULL COMMENT "id number",
`name` varchar(10) NOT NULL COMMENT "user name"
) ENGINE = ICEBERG
PROPERTIES (
"iceberg.database" = "iceberg_db",
"iceberg.table" = "iceberg_table",
"iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
);

-- Example 2: Mounting the 'iceberg_table' in 'iceberg_db' in Iceberg with HDFS HA enabled
CREATE TABLE `t_iceberg` (
`id` int NOT NULL COMMENT "id number",
`name` varchar(10) NOT NULL COMMENT "user name"
) ENGINE = ICEBERG
PROPERTIES (
"iceberg.database" = "iceberg_db",
"iceberg.table" = "iceberg_table",
"iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG",
"dfs.nameservices"="HDFS8000463",
"dfs.ha.namenodes.HDFS8000463"="nn2,nn1",
"dfs.namenode.rpc-address.HDFS8000463.nn2"="172.21.16.5:4007",
"dfs.namenode.rpc-address.HDFS8000463.nn1"="172.21.16.26:4007",
"dfs.client.failover.proxy.provider.HDFS8000463"="org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
);

Parameter description:

External Table Column
Column names should correspond to the Iceberg table one by one.
The order of the columns needs to match the Iceberg Table.
The ENGINE needs to be specified as ICEBERG.
PROPERTIES:
iceberg.hive.metastore.uris: Hive Metastore server address.
iceberg.database: The database name corresponding to the mounted Iceberg.
iceberg.table: The table name corresponding to the mounted Iceberg, no need to specify when mounting Iceberg database.
iceberg.catalog.type: The catalog method used in Iceberg, which is HIVE_CATALOG by default. At present, only this method is supported, and more Iceberg catalog methods will be supported later.

Display Table Structure

You can view the displayed table structure through SHOW CREATE TABLE.

Synchronous Mounting

When the Schema of Iceberg Table changes, you can manually synchronize it using the REFRESH command, which will delete and rebuild the Iceberg External Table in Doris. For detailed information, see HELP REFRESH.
-- Synchronizing Iceberg Table
REFRESH TABLE t_iceberg;

-- Synchronizing Iceberg Database
REFRESH DATABASE iceberg_test_db;

Type Matching

The Iceberg Column Type supported and the corresponding relationship with Doris are as follows:
Iceberg
Doris
Description
BOOLEAN
BOOLEAN
-
INTEGER
INT
-
LONG
BIGINT
-
FLOAT
FLOAT
-
DOUBLE
DOUBLE
-
DATE
DATE
-
TIMESTAMP
DATETIME
Converting timestamp to datetime may result in loss of precision
STRING
STRING
-
UUID
VARCHAR
Use VARCHAR as a substitute
DECIMAL
DECIMAL
-
TIME
-
Not supported
FIXED
-
Not supported
BINARY
-
Not supported
STRUCT
-
Not supported
LIST
-
Not supported
MAP
-
Not supported
Note
The Schema of Iceberg Table changes will not automatically synchronize, it is necessary to synchronize the Iceberg external table or database in Doris using the REFRESH command.
The current supported Iceberg versions are 0.12.0 and 0.13.2 by default. Other versions have not been tested. More versions will be supported in the future.

Query Usage

You can use the Iceberg external tables in Doris the same way as using Doris internal tables, except that the Doris data models (Rollup, Pre-Aggregation, and Materialized Views, etc.), it is no different from a regular Doris OLAP table.
select * from t_iceberg where k1 > 1000 and k3 ='term' or k4 like '%doris';

Related System Configuration

FE Configuration

The following configurations belong to the system-level configurations of the Iceberg external table. You could modify the fe.conf or use ADMIN SET CONFIG to configure.
iceberg_table_creation_strict_mode Create an Iceberg table, and the strict mode is enabled by default. The strict mode involves a strict filter on the column types of the Iceberg table. If there are data types that Doris does not currently support, the creation of the external table fails.
iceberg_table_creation_interval_second The interval between executing background tasks when Iceberg table is automatically created. The default value is 10s.
max_iceberg_table_creation_record_size The maximum value retained when an Iceberg table is created. The defaut value is 2000. This only applies to the creation of Iceberg database records.
Was this page helpful?
You can also Contact Sales or Submit a Ticket for help.
Yes
No

Feedback