tencent cloud

Data Lake Compute

Release Notes
Product Introduction
Overview
Strengths
Use Cases
Purchase Guide
Billing Overview
Refund
Payment Overdue
Configuration Adjustment Fees
Getting Started
Complete Process for New User Activation
DLC Data Import Guide
Quick Start with Data Analytics in Data Lake Compute
Quick Start with Permission Management in Data Lake Compute
Quick Start with Partition Table
Enabling Data Optimization
Cross-Source Analysis of EMR Hive Data
Standard Engine Configuration Guide
Configuring Data Access Policy
Operation Guide
Console Operation Introduction
Development Guide
Runtime Environment
SparkJar Job Development Guide
PySpark Job Development Guide
Query Performance Optimization Guide
UDF Function Development Guide
System Restraints
Client Access
JDBC Access
TDLC Command Line Interface Tool Access
Third-party Software Linkage
Python Access
Practical Tutorial
Accessing DLC Data with Power BI
Table Creation Practice
Using Apache Airflow to Schedule DLC Engine to Submit Tasks
Direct Query of DLC Internal Storage with StarRocks
Spark cost optimization practice
DATA + AI
Using DLC to Analyze CLS Logs
Using Role SSO to Access DLC
Resource-Level Authentication Guide
Implementing Tencent Cloud TCHouse-D Read and Write Operations in DLC
DLC Native Table
SQL Statement
SuperSQL Statement
Overview of Standard Spark Statement
Overview of Standard Presto Statement
Reserved Words
API Documentation
History
Introduction
API Category
Making API Requests
Data Table APIs
Task APIs
Metadata APIs
Service Configuration APIs
Permission Management APIs
Database APIs
Data Source Connection APIs
Data Optimization APIs
Data Engine APIs
Resource Group for the Standard Engine APIs
Data Types
Error Codes
General Reference
Error Codes
Quotas and limits
Operation Guide on Connecting Third-Party Software to DLC
FAQs
FAQs on Permissions
FAQs on Engines
FAQs on Features
FAQs on Spark Jobs
DLC Policy
Privacy Policy
Data Privacy And Security Agreement
Service Level Agreement
Contact Us

Querying Partition Table

PDF
Focus Mode
Font Size
Last updated: 2025-03-07 15:27:25
Storing data in partition catalogs can greatly reduce the scanned data volume of a computing task in Data Lake Compute and thereby significantly enhance the computing performance. The general practice of data partitioning is to store data in different catalogs by time. For example, data generated on the same day can be stored in the same catalog, and catalogs can be organized in a "year-month-day" structure. In Data Lake Compute, a table and its partitions must adopt the same data format.

Creating a Partition Table

To create a partition table, you need to specify the partition field in the table creation statement.

Adding Partitioned Data

Specifying a partition during data table creation is only to configure the partition field and doesn't allow running a query statement immediately to get data. You need to add partitioned data to a data table. If new partitioned data is added to the data catalog, you also need to add the partition information to the data table.

Manually adding a partition

Use the ALTER TABLE ADD PARTITION statement to add a specified partition catalog to a data table. If the partition catalog is compatible with the Hive partitioning rule (partition column name=partition column value), you don't need to specify the data path; otherwise, you need to refer SQL Syntax.
Sample 1: Adding a single partition catalog
ALTER TABLE tabel_demo ADD
PARTITION (dt = '2021-01-01');
Sample 2: Adding multi-level nested partition catalogs
ALTER TABLE tabel_demo ADD
PARTITION (year = '2021', month='01', day='01');
Sample 3: Displaying the specified partition path
ALTER TABLE tabel_demo ADD
PARTITION (year = '2021', month='01', day='01') LOCATION 'cosn://tablea_demo' ;

Automatically adding a partition

Use the MSCK REPAIR TABLE statement to scan the data catalog specified during table creation. If there is a new partition catalog, the system will automatically add the partitions to the metadata of the data table. Details can be found in the SQL Syntax.Below is a sample:
MSCK REPAIR TABLE table_demo

System Restraints

MSCK REPAIR TABLE only adds partitions to the metadata of the data table but does not delete them. To delete an added partition, run the ALTER TABLE table-name DROP PARTITION statement.Details can be found in the SQL Syntax.
MSCK REPAIR TABLE is not recommended if the data volume is large, as the system will scan all the data, which may take a long time, cause the task to time out, and make the partition information of the data table incomplete.
A partition catalog must be compatible with the Hive partitioning rule of partition column name=partition column value; otherwise, use ALTER TABLE ADD PARTITION to load a partition.Details can be found in the SQL Syntax.
Make sure that data of a table is stored in a separate folder. For example, if the cosn://tablea_a data in table A and the s3://table_a/table_b data in table B are stored in COS and both tables are partitioned by string, then MSCK REPAIR TABLE will add partitions of table B to table A. To avoid this, use separate folder structures, such as cosn://tablea_aand cosn://tablea_b.
The statement may incur data read/write fees charged by COS. For more information, see Billing Overview.

Help and Support

Was this page helpful?

Help us improve! Rate your documentation experience in 5 mins.

Feedback