A Catalog serves as the top-level logical entity for user-facing metadata within TCLake, organizing metadata resources into a hierarchical structure. It enables metadata isolation and fine-grained access control across different business units and users. This document provides a guide to basic catalog operations.
Data Catalog Hierarchy Model
Within TCLake, all metadata is registered and stored in an underlying metadata store (Metalake), which is completely transparent to the user. Metadata objects within a unified catalog are organized into three distinct levels, and tables, volumes, models, and functions are referenced using a three-level namespace (for example, catalog.schema.table).
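To make the three-level namespace concrete, here is a minimal Python sketch that splits a fully qualified name into its catalog, schema, and object parts. `QualifiedName` and `parse_qualified_name` are hypothetical helpers, not part of any TCLake API, and the sketch assumes no level contains a literal dot (quoted identifiers are not handled).

```python
from typing import NamedTuple


class QualifiedName(NamedTuple):
    """Hypothetical three-level name: catalog.schema.object."""

    catalog: str
    schema: str
    name: str

    def __str__(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.name}"


def parse_qualified_name(text: str) -> QualifiedName:
    """Split 'catalog.schema.object' into its three levels.

    Assumes no level contains a literal dot; quoted identifiers are not handled.
    """
    parts = text.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.object, got {text!r}")
    return QualifiedName(*parts)


qn = parse_qualified_name("MyCatalog.MySchema.MyTable")
print(qn.catalog, qn.schema, qn.name)  # MyCatalog MySchema MyTable
print(qn)                              # MyCatalog.MySchema.MyTable
```

Rejecting names with too few, too many, or empty parts keeps a two-part or four-part reference from silently resolving to the wrong level.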
Level 1: Catalog
A catalog organizes data assets across various formats. Currently, TCLake supports the following catalog categories:
| Category | Catalog Type | Description |
| --- | --- | --- |
| Built-in Data Catalog | Lakehouse Catalog | A structured data catalog that natively hosts open formats, including the TCIceberg batch-stream unified table format and the Lance multimodal format. |
| | Volume Catalog | An unstructured data catalog designed to directly manage files (e.g., images, video, and audio) stored in Cloud Object Storage (COS), enabling unified metadata governance for unstructured assets. |
| | Model Catalog | A built-in catalog for machine learning model artifacts. It allows you to register models trained in MLOps frameworks (e.g., MLflow) into TC-Catalog, delivering full-lifecycle management for ML models. (Currently under development) |
| External Data Catalog | e.g., MySQL, EMR, DLC, TCHouse | Connects to external data sources via JDBC or similar protocols to retrieve their metadata in real time. |
Level 2: Schema
A Schema (often synonymous with a database) is a second-level object within a catalog. Depending on the catalog type, a schema can contain concrete data resources such as tables, views, volumes, machine learning models, and functions. Schemas organize Data and AI assets into logical groupings, providing a more granular level of categorization than the catalog itself.
Level 3: Data Resources
The third level of the catalog hierarchy consists of concrete data entities, such as Tables, Volumes, Models, and Functions, depending on the specific catalog type.
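The three levels can be pictured as nested containers. The following sketch models them with hypothetical Python dataclasses (`Catalog`, `Schema`) and lists every level-3 resource under its fully qualified name. It only illustrates the hierarchy; it is not TCLake's internal data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Schema:
    """Level 2: a schema groups concrete resources by kind."""

    name: str
    # Resource kind (e.g. "table", "view", "volume") -> resource names.
    resources: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class Catalog:
    """Level 1: a catalog of a given type holds schemas."""

    name: str
    catalog_type: str  # e.g. "Lakehouse", "Volume", "Model"
    schemas: Dict[str, Schema] = field(default_factory=dict)


def iter_qualified_names(catalog: Catalog) -> Iterator[str]:
    """Yield every level-3 resource as catalog.schema.resource."""
    for schema in catalog.schemas.values():
        for names in schema.resources.values():
            for resource in names:
                yield f"{catalog.name}.{schema.name}.{resource}"


sales = Catalog("sales", "Lakehouse")
sales.schemas["orders"] = Schema(
    "orders", {"table": ["daily", "monthly"], "view": ["top_customers"]}
)
for qualified in iter_qualified_names(sales):
    print(qualified)
```

Running it prints `sales.orders.daily`, `sales.orders.monthly`, and `sales.orders.top_customers`, one per line.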
Tables / Views
A table is a concrete dataset hosted within the TCLake service, organizing data into rows and columns. A view is a saved query built on top of one or more tables.
Volumes
A volume is a logical entity used to manage unstructured data stored in systems such as COS or HDFS. For example, if the COS path examplebucket.cos.ap-guangzhou.myqcloud.com/folder/ (containing a.jpg and b.csv) is mapped to MyCatalog.MySchema.MyVolume, a compute engine can access the image directly through MyCatalog.MySchema.MyVolume/a.jpg.
Note:
A volume can only be created within a Volume-type catalog.
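The volume example maps a three-level volume name plus a relative path onto a COS location. A minimal sketch of that resolution, assuming a hypothetical in-memory registry (`VOLUME_LOCATIONS`) and a hypothetical `resolve_volume_path` helper; how a real compute engine performs the lookup is not described here.

```python
from typing import Dict

# Hypothetical registry: volume name -> COS location it is mapped to.
VOLUME_LOCATIONS: Dict[str, str] = {
    "MyCatalog.MySchema.MyVolume": "examplebucket.cos.ap-guangzhou.myqcloud.com/folder/",
}


def resolve_volume_path(path: str) -> str:
    """Translate 'catalog.schema.volume/relative/file' into its COS location."""
    volume, sep, relative = path.partition("/")
    if not sep or not relative:
        raise ValueError(f"expected volume/relative_path, got {path!r}")
    try:
        base = VOLUME_LOCATIONS[volume]
    except KeyError:
        raise KeyError(f"unknown volume {volume!r}") from None
    # Join so there is exactly one slash between the prefix and the file path.
    return base.rstrip("/") + "/" + relative.lstrip("/")


print(resolve_volume_path("MyCatalog.MySchema.MyVolume/a.jpg"))
# examplebucket.cos.ap-guangzhou.myqcloud.com/folder/a.jpg
```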
Models
A machine learning model registered into the Catalog from frameworks like MLflow. (Currently under development)
Note:
A model can only be created within a Model-type catalog.
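The volume and model notes amount to a simple placement rule. A hypothetical pre-check in Python might look like this; it encodes only the two rules stated here (volumes need a Volume-type catalog, models need a Model-type catalog) and does not check any other resource kind.

```python
# Catalog type required for each resource kind, per the notes in this section.
# Resource kinds not listed here are not checked by this sketch.
REQUIRED_CATALOG_TYPE = {
    "volume": "Volume",
    "model": "Model",
}


def check_can_create(resource_kind: str, catalog_type: str) -> None:
    """Raise ValueError if the resource kind is not allowed in this catalog type."""
    required = REQUIRED_CATALOG_TYPE.get(resource_kind)
    if required is not None and catalog_type != required:
        raise ValueError(
            f"a {resource_kind} can only be created in a {required}-type catalog, "
            f"not a {catalog_type}-type catalog"
        )


check_can_create("volume", "Volume")  # allowed, no exception
```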
Functions
A user-defined function (UDF) registered within the Catalog. It can return either a scalar value or a set of rows. (Planned)
Creating a Data Catalog
1. In the left navigation pane, select Data Catalog.
2. On the Data Catalog list page, make sure you are logged in as a user with the TCLake Admin role, and then click Create Data Catalog.
3. Configure the following parameters in the dialog:
| Parameter | Description |
| --- | --- |
| Catalog Name | Required. Must be globally unique. Must be 1 to 64 characters long and can contain only letters, digits, and underscores. |
| Description | Optional. |
| Storage Class | Currently, only the Standard storage class is supported. |
| Enable Multi-AZ Redundancy | Disabled by default. When enabled, data is stored redundantly across multiple availability zones (AZs) in the same region, providing intra-city disaster recovery. Once enabled, Multi-AZ redundancy cannot be disabled. It significantly improves data reliability but also incurs higher storage fees, so we recommend enabling it only if your workloads require elevated data reliability. Note: This feature is currently in limited release. To request access, submit a ticket. |
4. Review and agree to the TCLake Pricing Overview, then confirm to create the catalog.
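If you prepare catalog names ahead of time, the Catalog Name rule in step 3 can be checked before submitting. A minimal sketch, assuming "letters" means ASCII letters and that uniqueness is an exact, case-sensitive comparison against a list of existing names you already have; both are assumptions, and TCLake itself remains the authority on whether a name is accepted.

```python
import re
from typing import Iterable

# 1 to 64 characters, each an ASCII letter, digit, or underscore
# (assumption: "letters" means ASCII letters only).
CATALOG_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,64}")


def validate_catalog_name(name: str, existing_names: Iterable[str]) -> None:
    """Raise ValueError if the name breaks the format or uniqueness rule."""
    if CATALOG_NAME_PATTERN.fullmatch(name) is None:
        raise ValueError(
            f"{name!r}: must be 1-64 characters of letters, digits, or underscores"
        )
    # Assumption: uniqueness is an exact, case-sensitive match.
    if name in set(existing_names):
        raise ValueError(f"{name!r}: a catalog with this name already exists")


validate_catalog_name("sales_2024", existing_names=["marketing"])  # passes
```

An explicit character class is used instead of `\w`, because `\w` also matches non-ASCII letters in Python.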
Viewing a Data Catalog
In the left navigation pane, select Data Catalog. Using the catalog tree browser, you can select and explore specific catalogs along with their underlying levels, such as Schemas and Tables.
Editing a Data Catalog
1. On the Data Catalog list page, find the target catalog and click Edit in the Actions column.
2. In the configuration dialog, modify the catalog settings as needed.
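If you script or review catalog edits, two creation constraints are worth enforcing: only the Standard storage class is supported, and Multi-AZ redundancy cannot be disabled once enabled. The sketch below uses a hypothetical `CatalogSettings` record and `check_edit` function; which settings the Edit dialog actually exposes is not stated here, so treat the fields as assumptions.

```python
from dataclasses import dataclass

SUPPORTED_STORAGE_CLASSES = {"Standard"}  # only Standard is currently supported


@dataclass(frozen=True)
class CatalogSettings:
    # Hypothetical subset of catalog settings, for illustration only.
    description: str = ""
    storage_class: str = "Standard"
    multi_az_redundancy: bool = False


def check_edit(current: CatalogSettings, requested: CatalogSettings) -> CatalogSettings:
    """Return the requested settings, or raise ValueError if the edit is not allowed."""
    if requested.storage_class not in SUPPORTED_STORAGE_CLASSES:
        raise ValueError(f"unsupported storage class: {requested.storage_class}")
    if current.multi_az_redundancy and not requested.multi_az_redundancy:
        raise ValueError("Multi-AZ redundancy cannot be disabled once enabled")
    return requested


before = CatalogSettings(multi_az_redundancy=True)
after = check_edit(before, CatalogSettings(description="sales data", multi_az_redundancy=True))
```

Enabling Multi-AZ on a catalog that does not yet have it passes this check; only the reverse transition is rejected.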
Deleting a Data Catalog
On the Data Catalog list page, find the target catalog and click Delete in the Actions column.
Note:
To prevent accidental data loss, before you delete a built-in catalog (Lakehouse, Volume, or Model) hosted in TCLake, you must manually delete all of its underlying metadata resources except the default schema.
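The deletion precondition can be expressed as a check that lists what still has to go. A minimal sketch with a hypothetical `resources_blocking_delete` helper, under these assumptions: the default schema is named `default` (hypothetical), resources inside the default schema still count and only the schema object itself is exempt, and external catalogs are reported as unblocked only because the note states no precondition for them.

```python
from typing import Dict, List

BUILT_IN_CATALOG_TYPES = {"Lakehouse", "Volume", "Model"}
DEFAULT_SCHEMA = "default"  # hypothetical name of the default schema


def resources_blocking_delete(catalog_type: str, schemas: Dict[str, List[str]]) -> List[str]:
    """List what must be deleted, in order, before the catalog can be deleted.

    `schemas` maps each schema name to the names of the resources it contains.
    """
    if catalog_type not in BUILT_IN_CATALOG_TYPES:
        # The note only states this precondition for built-in catalogs.
        return []
    blockers: List[str] = []
    for schema, resources in schemas.items():
        # Resources first, so each schema is empty by the time it is deleted.
        blockers.extend(f"{schema}.{resource}" for resource in resources)
        if schema != DEFAULT_SCHEMA:
            blockers.append(schema)  # the default schema itself may remain
    return blockers


print(resources_blocking_delete("Lakehouse", {"default": [], "sales": ["orders"]}))
# ['sales.orders', 'sales']
```

An empty result means the catalog can be deleted from the Actions column.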