In an analysis cluster, databases and tables from multiple objects in multiple data sources can be merged. This document introduces how to configure the database/table merging in detail.
Prerequisites
The cluster status is Running.
Automatic Table Merging
Note:
Only versions 4.2509.x and later support the automatic mapping policy.
1. On the Data Object page, click Synchronization and Mapping Configuration to access the configuration interface.
2. In the third step Complete Advanced Mapping Configuration, you can define the policies for advanced mapping settings.
In the advanced strategy section, click Add Policy to create a new policy. This policy can utilize regular expressions to match tables with similar naming patterns and consolidate them into a designated table. For detailed instructions, please refer to the description of advanced strategies in Database/Table Mapping. Manual Database Merging
After objects are synchronized to the analysis cluster, if data from multiple source database instances should be placed into the same database in the analysis engine, the database merging feature can be used for implementation.
2. Click the target cluster in the cluster list on the left to enter the cluster management page.
3. On the cluster details page, click Details in Compute Node to access the instance details page.
4. In Object Information, select all the tables for database merging, and click Set Mapping at the top.
5. In the pop-up window, toggle on the Merge Database switch and click Verify. The mapping database names of all currently selected objects will be merged into the mapping database set in the object in the first row.
Manual Table Merging
After objects are synchronized to the analysis cluster, if multiple tables with the same structure should be merged into one table, the table merging feature of the analysis cluster can be used for implementation.
The table merging feature has the following usage requirements:
The structure of the tables to be merged should be consistent.
All tables to be merged should be synchronized to the analysis cluster.
When DDL operations are performed on the source table, the merged table cannot be queried in the analysis cluster by default. You should wait until the table structure of all original tables of the merged table remains consistent before query and use.
Note:
In an analysis cluster, if DDL changes are performed on the source table of a merged table, such as adding a column, modifying a column, or deleting a column, the operating system has two processing policies.
Pessimistic coordination mode: If the table structure of any source table differs from the primary table, the query fails. This processing policy is adopted by default.
Optimistic coordination mode: The merged table is always based on the structure of the primary table, and the query of the merged table will not be affected. This option is disabled by default. To enable it, submit a ticket for application. Operation Steps
2. Click the target cluster in the cluster list on the left to enter the cluster management page.
3. On the cluster details page, click Details in Compute Node to access the instance details page.
4. In Object Information, select all tables to be merged, and click Set Mapping at the top.
5. In the pop-up window, toggle on the Merge Table switch and click Verify. The mapping database names of all currently selected objects will be merged into the mapping table set in the object in the first row.
Note:
The mapping name of the analysis cluster can be modified at any time. Table merging can also be modified or canceled, with immediate effect after modification.
Merge into an existing table: If this option is selected and other database and table names are selected, it means that all currently selected tables need to be merged into another table.
Add virtual column: If this option is selected, it means to add a virtual column to the merged table. The virtual column will add an identifier to each row to identify the data source of this data. The filling value can be customized, and the default value is the instance ID.
Note:
When auto-increment primary keys exist in multiple source tables to be merged, primary key conflicts will not occur after the tables are merged into the analysis engine. The database itself can avoid primary key conflicts, but duplicate values may appear in the auto-increment primary key column after merging. If data uniqueness cannot be determined, use the virtual column feature.
The default type of the _libra_aggid field is varchar, supporting various expressions and functions compatible with varchar. However, only the "equal to", "not equal to", and "in" expressions achieve efficient pruning of table data.
select * from database.merge_table;
+
| id | c1 | _libra_aggid |
+
| 1 | 1 | cdb-adf3123 |
| 2 | 2 | cdb-adf3123 |
| 1 | 1 | cdb-bda41ag |
| 2 | 2 | cdb-bda41ag |
| 1 | 1 | cdb-zd3123d |
| 2 | 2 | cdb-zd3123d |
+
6 rows in set(0.01 sec)
Primary Table: In multi-table merging scenarios, one of the merged tables should be set as the primary table. The table structure for merged table queries in the analysis engine will be based on the primary table. The primary table can be modified at any time during use.