tencent cloud

MySQL Data Source
Last updated:2024-11-01 17:42:33
MySQL Data Source
Last updated: 2024-11-01 17:42:33

MySQL Environment Preparation and Database Configuration

If you want to perform offline data synchronization using MySQL, you need to first confirm the version support situation of the MySQL data source and the supported read data type conversions.

Supported Versions

Follow these version limitations:
Data Source Type
Edition
Driver
MySQL
5.6,5.7,8.0.x
JDBC Driver:8.0.21

Confirm MySQL Version

Data Integration has version requirements for MySQL. Check if the MySQL version to be synchronized meets the version requirements. You can check the current MySQL database version using the following statement in the MySQL database.
select version();

Set MySQL Server Permissions

You can define a MySQL user with appropriate permissions.
1. Create MySQL User (Optional):
mysql> CREATE USER 'user'@'localhost' IDENTIFIED BY 'password';
2. Grant Required Permissions to User:
For offline data synchronization, the account must have SELECT permission on the database. Refer to the following example to execute the command:
mysql> GRANT SELECT ON *.* TO 'user' IDENTIFIED BY 'password';
3. Refresh the user's permissions:
mysql>FLUSH PRIVILEGES;

Data source configuration

Enter the data source configuration interface, MySQL data sources support two connection methods: cloud instance and connection string.
click Project Management > Data Source Management > Create Data Source > Select MySQL Data Source.
Create a data source via connection string.



Parameter
Description
Data Source Name
The name of the newly created data source, defined by the user and cannot be empty. It should start with a letter and can include letters, numbers, and underscores. The length should be within 20 characters.
Description
Optional, description of this data source.
Data Source Permission
Project sharing means all members of the current data source project can use it; Individual Only and Administrator indicates that the data source is only available to the creator and project administrators.
Deployment Method
Supports two deployment modes: Self-Built Instance and Public Network Instance.Self-Built Instance is deployed on Tencent CVM, while Public Network Instanceis is in the client’s local IDC or other cloud resources, accessible via public network.
Region and Network
When selecting a self-built instance, you need to select the region and vpcID where the data source instance is located.
JDBC URL
Connection string information used to connect to the MySQL data source instance, including host IP, port, database name, and other information.
Username
Username for connecting to the database.
Password
Password for connecting to the database.
Data Connectivity
Test whether the configured database can be connected.
Note:
If the connectivity test fails, the data source can still be saved. However, if saved, the data source cannot be used until the connectivity test passes.
If the connectivity test fails, it might be because WeData is blocked by the network firewall where the database is located.

MySQL Offline Single Table Read Node Configuration




Parameter
Description
Data Source
Available MySQL Data Source.
Database
Supports selection or manual input of the library name to read from.
By default, the database bound to the data source is used as the default database. Other databases need to be manually entered.
If the data source network is not connected and the database information cannot be fetched directly, you can manually enter the database name. Data synchronization can still be performed when the Data Integration network is connected.
Table
Supports selecting or manually entering the table name to be read.
In the case of table partitioning, you can select or input multiple table names at the MySQL source end, and ensure that the structure of multiple tables is consistent.
In the case of table partitioning, the table index range can be configured. For instance, 'table_[0-99]' indicates reading 'table_0', 'table_1', 'table_2' up to 'table_99'; if the numeric suffix of your tables has a consistent length, like 'table_000','table_001','table_002' up to 'table_999', you can configure it as '"table": ["table_00[0-9]", "table_0[10-99]", "table_[100-999]"]'.
If the data source network is not connected and the table information cannot be fetched directly, you can manually enter the table name. Data synchronization can still be performed when the Data Integration network is connected.
Add Shared Database/Table
Applicable to sharding scenarios. Click to configure multiple data sources, databases, and table information. In sharding scenarios, ensure all table structures are consistent. The task configuration will default to displaying and using the structure of the first table to obtain data.
Split Key
Specify the field used for data sharding. After specifying, concurrent tasks will be initiated for data synchronization, improving data synchronization efficiency. You can use a column in the source data table as the partition key. It is recommended to use the primary key or indexed columns as the partition key. Only integer-type fields are supported.
Filter Conditions (Optional)
In actual business scenarios, data from the current day is usually chosen for synchronization. Specify the where condition as gmt_create>$bizdate. The where condition can effectively carry out business incremental synchronization.
If the WHERE clause is not provided, including missing the key or value, the data synchronization is treated as full data synchronization.
You cannot set the WHERE condition to LIMIT 10, as this does not comply with MySQL WHERE clause constraints.
Advanced Settings (Optional)
You can configure parameters according to business needs.

MySQL Offline Single Table Write Node Configuration




Parameter
Description
Data Destination
The MySQL data source to write to.
Write Type
Single table write: Supports single table writing
Write to Sharded Databases/Tables: Supports one-time writing to sharded MySQL tables within the same data source. Ensure all table schema structures are consistent and all target tables exist.
Database
Supports selection or manual input of the database name to write to
By default, the database bound to the data source is used as the default database. Other databases need to be manually entered.
If the data source network is not connected and the database information cannot be fetched directly, you can manually enter the database name. Data synchronization can still be performed when the Data Integration network is connected.
Table
Supports selection or manual input of the table name to write to
If the data source network is not connected and the table information cannot be fetched directly, you can manually enter the table name.
Data synchronization can still be performed when the Data Integration network is connected.
Whether to Clear Table
Before writing to the MySQL data table, you can manually choose whether to clear the data table.
Write Mode
MySQL Writing supports three modes:
Append: When a primary key/unique index conflict occurs, the conflicting rows cannot be written.
Overwrite: In case of primary key/unique index conflict, the original row will be deleted first, and then the new row will be inserted.
On duplicate key: When a primary key/uniqueness index conflict occurs, the new row will update the specified fields. Specified fields refer to the fields added in the field mapping configuration of the synchronization task.
Batch Submission Size
The number of records submitted in a single batch can greatly reduce the number of network interactions between the data synchronization system and MySQL, and improve overall throughput. If this value is set too high, it may lead to OOM exceptions in the data synchronization running process.
Pre-Executed SQL (Optional)
The SQL statement executed before the synchronization task. Fill in the correct SQL syntax according to the data source type, such as clearing the old data in the table before execution (truncate table tablename).
Post-Executed SQL (Optional)
The SQL statement executed after the synchronization task. Fill in the correct SQL syntax according to the data source type, such as adding a timestamp (alter table tablename add colname timestamp DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP).

Data type conversion support

Read

The supported field types and type conversion mappings for MySQL reading are as follows (when processing MySQL, the data types of the MySQL data source and the data processing engine will be mapped):
MySQL Data Types
Internal Types
tinyint, smallint, mediumint, int, bigint,year
Long
float, double, decimal
Double
varchar, char, tinytext, text, mediumtext, longtext,set,json
String
date, datetime, timestamp, time
Date
bit, bool
Boolean
tinyblob, mediumblob, blob, longblob, varbinary
Bytes

Write

The supported field types and type conversion mappings for MySQL writing are as follows:
Internal Types
MySQL Data Types
Long
tinyint, smallint, mediumint, int, bigint,year
Double
float, double, decimal
String
varchar, char, tinytext, text, mediumtext, longtext,set,json
Date
date, datetime, timestamp, time
Boolean
bit, bool
Bytes
tinyblob, mediumblob, blob, longblob, varbinary

FAQs

1. Using COLLATE to specify the collation may cause some data synchronization duplication

Cause:
COLLATE is used to specify the collation of a field. It affects the order of the ORDER BY statement, the results of greater than and less than filters in WHERE conditions, as well as the results of DISTINCT, GROUP BY, and HAVING queries. All character type comparisons or sorting are related to COLLATE.
Solution:
Choose non-COLLATE columns when setting split keys for integration tasks.


Was this page helpful?
You can also Contact Sales or Submit a Ticket for help.
Yes
No

Feedback