tencent cloud

Tencent Cloud TCHouse-D

Product Introduction
Overview
Concepts
Cluster Architecture
Strengths
Scenarios
Purchase Guide
Billing Overview
Renewal Instructions
Overdue Policy
Refund Instructions
Configuration Adjustment Billing Instructions
Getting Started
Using Tencent Cloud TCHouse-D Through the Console
Using Tencent Cloud TCHouse-D Through a Client
Operation Guide
Cluster Operation
Monitoring and Alarm Configuration
Account Privilege Management
Data Management
Query Management
Modify Configurations
Node Management
Log Analysis
SQL Studio
Enabling Resource Isolation
Development Guide
Design of Data Table
Importing Data
Exporting Data
Basic Feature
Query Optimization
Ecological Expansion Feature
API Documentation
History
Introduction
API Category
Making API Requests
Cluster Operation APIs
Database and Table APIs
Cluster Information Viewing APIs
Hot-Cold Data Layering APIs
Database and Operation Audit APIs
User and Permission APIs
Resource Group Management APIs
Data Types
Error Codes
Cloud Ecosystem
Granting CAM Policies to Sub-accounts
Query Acceleration for Tencent Cloud DLC
Practical Tutorial
Basic Feature Usage
Advanced Features Usage
Resource Specification Selection and Optimization Suggestions
Naming Specifications and Limits to the Database and Data Table
Table Design and Data Import
Query Optimization
Suggested Usage to Avoid
Accessing TCHouse-D via JDBC over the Public Network
Performance Testing
TPC-H Performance Testing
SSB Performance Testing
TPC-DS Performance Testing
FAQs
Common Operational Issues
Common Errors
Contact Us
Glossary
Product Policy
Service Level Agreement
Privacy Policy
Data Processing And Security Agreement

Importing with DataX

PDF
Mode fokus
Ukuran font
Terakhir diperbarui: 2024-06-27 11:05:02

About DataX

As a universal offline data import tool open-sourced by Alibaba, DataX is widely used in the industry. DataX enables efficient data synchronization features between various heterogeneous data sources such as MySQL, Oracle, SQLServer, Postgre, HDFS, Hive, ADS, HBase, TableStore(OTS), MaxCompute(ODPS), Hologres, and DRDS, etc. Importing data into Doris using DataX requires the DataX doriswriter plugin. This plugin uses the Stream Load feature of Doris for data import and works together with DataX service.

User Manual

For the code for the DataX doriswriter plugin, see here. This includes the plugin code and the development environment of the DataX project. The doriswriter plugin depends on some modules in the DataX code, which are not in the official Maven repository. Thus, when developing the doriswriter plugin, it is necessary to download the complete DataX code library to compile and develop the plugin.

Directory Structure

doriswriter/

This directory is the code directory of the doriswriter plugin. All the code in this directory is hosted in the Apache Doris repository. For the doriswriter plugin documentation, see:doriswriter/doc.

init-env.sh

This script is primarily used to set up the DataX development environment, which includes the following operations:
1. Clone the DataX code library to local.
2. Connect doriswriter/ directory to DataX/doriswriter directory through soft link.
3. Add the <module>doriswriter</module> module to the DataX/pom.xml file.
4. Change the httpclient version in the DataX/core/pom.xml file from 4.5 to 4.5.13.
Note
Httpclient v4.5 has a bug in handling 307 redirects.
After running this script, developers may start developing or compiling in the DataX/ directory. Due to the soft link, any modifications in the DataX/doriswriter directory will be reflected in the doriswriter/ directory, which facilitates code submission for developers.

Compilation

1. Run init-env.sh.
2. Modify the code in DataX/doriswriter as needed.
3. Compile doriswriter:
3.1 Compile the doriswriter plugin separately:
mvn clean install -pl plugin-rdbms-util,doriswriter -DskipTests
3.2 Compile the entire DataX project:
mvn package assembly:assembly -Dmaven.test.skip=true
The output is in target/datax/datax/.
Note
The hdfsreader, hdfswriter, and oscarwriter plugins require additional jar packages. If you don't need these plugins, you can remove their modules from DataX/pom.xml.
3.3 Compilation error If you encounter the following compilation error:
Could not find artifact com.alibaba.datax:datax-all:pom:0.0.1-SNAPSHOT ...
You can try the following ways to solve it:
3.3.2 After decompression, copy the directory alibaba/datax/ obtained to the corresponding .m2/repository/com/alibaba/ of the maven in use.
3.3.3 Try compiling again.
3.3.4 Submit modifications as needed.

Sample code

Stream reads data and imports it into Doris

For instructions on how to use this sample plugin, please see here.

Mysql reads and imports data into Doris

1. Mysql table structure
CREATE TABLE t_test(
idbigint(30) NOT NULL,
order_code varchar(30) DEFAULT NULL COMMENT '',
line_code varchar(30) DEFAULT NULL COMMENT '',
remark varchar(30) DEFAULT NULL COMMENT '',
unit_no varchar(30) DEFAULT NULL COMMENT '',
unit_name varchar(30) DEFAULT NULL COMMENT '',
price decimal(12,2) DEFAULT NULL COMMENT '',
PRIMARY KEY(id) USING BTREE
)ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 ROW_FORMAT=DYNAMIC COMMENT='';
2. Doris table structure
CREATE TABLE ods_t_test (
idbigint(30) NOT NULL,
order_code varchar(30) DEFAULT NULL COMMENT '',
line_code varchar(30) DEFAULT NULL COMMENT '',
remark varchar(30) DEFAULT NULL COMMENT '',
unit_no varchar(30) DEFAULT NULL COMMENT '',
unit_name varchar(30) DEFAULT NULL COMMENT '',
price decimal(12,2) DEFAULT NULL COMMENT ''
)ENGINE=OLAP
UNIQUE KEY(id, order_code)
DISTRIBUTED BY HASH(order_code) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 3",
"in_memory" = "false",
"storage_format" = "V2"
);
3. Create datax scriptimport_t_test.json
{
"job": {
"setting": {
"speed": {
"channel": 1
},
"errorLimit": {
"record": 0,
"percentage": 0
}
},
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "xxx",
"password": "xxx",
"column": ["id","order_code","line_code","remark","unit_no","unit_name","price"],
"connection": [ { "table": [ "t_test" ], "jdbcUrl": [ "jdbc:mysql://10.10.10.1:3306/demo" ] } ] }
},
"writer": {
"name": "doriswriter",
"parameter": {
"feLoadUrl": ["127.0.0.1:8030","127.0.0.2:8030"],
"beLoadUrl": ["127.0.0.3:8040","127.0.0.4:8040","127.0.0.5:8040"],
"jdbcUrl": "jdbc:mysql://127.0.0.1:9030/",
"database": "demo",
"table": "ods_t_test",
"column": ["id","order_code","line_code","remark","unit_no","unit_name","price"],
"username": "xxx",
"password": "xxx",
"postSql": [],
"preSql": [],
"loadProps": {
},
"maxBatchRows" : 300000,
"maxBatchByteSize" : 20971520
}
}
}
]
}
}
4. Execute datax task: python datax.py import_t_test.json. For specific usage, see DataX official website.

Bantuan dan Dukungan

Apakah halaman ini membantu?

masukan