Importing or Exporting CHDFS with DataX

Last updated: 2025-09-26 10:17:22

Environment Dependencies

DataX version: 3.0

Download and Installation

Getting CHDFS JAR

Download CHDFS JAR.

Getting DataX Package

Download DataX.

Installing CHDFS JAR

After downloading the CHDFS JAR, copy chdfs_hadoop_plugin_network-1.7.jar to both the plugin/reader/hdfsreader/libs/ and plugin/writer/hdfswriter/libs/ directories under the directory where DataX was extracted.
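For example, assuming DataX was extracted to /opt/datax and the JAR was downloaded to the current directory (both paths are illustrative), the copy can be done as follows:
# Illustrative paths; substitute your own DataX directory and JAR location.
DATAX_HOME=/opt/datax
cp chdfs_hadoop_plugin_network-1.7.jar ${DATAX_HOME}/plugin/reader/hdfsreader/libs/
cp chdfs_hadoop_plugin_network-1.7.jar ${DATAX_HOME}/plugin/writer/hdfswriter/libs/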

Directions

Configuring DataX

Modifying datax.py script

Open the bin/datax.py script in the directory where DataX was extracted and modify the CLASS_PATH variable in it as follows:
CLASS_PATH = ("%s/lib/*:%s/plugin/reader/hdfsreader/libs/*:%s/plugin/writer/hdfswriter/libs/*:.") % (DATAX_HOME, DATAX_HOME, DATAX_HOME)
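This change appends the hdfsreader and hdfswriter libs directories to the classpath that datax.py builds, so the CHDFS plugin JAR copied in the previous step is loaded when a job runs.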

Configuring hdfsreader and hdfswriter in the JSON configuration file

Below is a JSON sample:
{
    "job": {
        "setting": {
            "speed": {
                "byte": 10485760
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        },
        "content": [{
            "reader": {
                "name": "hdfsreader",
                "parameter": {
                    "path": "testfile",
                    "defaultFS": "ofs://f4xxxxxxxxx-hxT9.chdfs.ap-beijing.myqcloud.com/",
                    "column": ["*"],
                    "fileType": "text",
                    "encoding": "UTF-8",
                    "hadoopConfig": {
                        "fs.AbstractFileSystem.ofs.impl": "com.qcloud.chdfs.fs.CHDFSDelegateFSAdapter",
                        "fs.ofs.impl": "com.qcloud.chdfs.fs.CHDFSHadoopFileSystemAdapter",
                        "fs.ofs.tmp.cache.dir": "/data/chdfs_tmp_cache",
                        "fs.ofs.user.appid": "1250000000"
                    },
                    "fieldDelimiter": ","
                }
            },
            "writer": {
                "name": "hdfswriter",
                "parameter": {
                    "path": "/user/hadoop/",
                    "fileName": "testfile1",
                    "defaultFS": "ofs://f4xxxxxxxxx-hxT9.chdfs.ap-beijing.myqcloud.com/",
                    "column": [{
                            "name": "col",
                            "type": "string"
                        },
                        {
                            "name": "col1",
                            "type": "string"
                        },
                        {
                            "name": "col2",
                            "type": "string"
                        }
                    ],
                    "fileType": "text",
                    "encoding": "UTF-8",
                    "hadoopConfig": {
                        "fs.AbstractFileSystem.ofs.impl": "com.qcloud.chdfs.fs.CHDFSDelegateFSAdapter",
                        "fs.ofs.impl": "com.qcloud.chdfs.fs.CHDFSHadoopFileSystemAdapter",
                        "fs.ofs.tmp.cache.dir": "/data/chdfs_tmp_cache",
                        "fs.ofs.user.appid": "1250000000"
                    },
                    "fieldDelimiter": ":",
                    "writeMode": "append"
                }
            }
        }]
    }
}
Here, set hadoopConfig to the configuration required by the CHDFS instance: fs.ofs.user.appid is your Tencent Cloud account APPID, and fs.ofs.tmp.cache.dir is a local directory used as a temporary cache. Set defaultFS to the mount point address of the CHDFS instance, such as ofs://f4xxxxxxxxx-hxT9.chdfs.ap-beijing.myqcloud.com/. The other configuration items are the same as for native HDFS.
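Before running a DataX job, you can optionally verify that the CHDFS instance is reachable. This is a minimal sanity check, assuming a Hadoop client is installed with the CHDFS JAR on its classpath and the same fs.ofs.* settings in core-site.xml (DataX itself does not require this):
# Optional check with a configured Hadoop client; uses the sample mount point address above.
hadoop fs -ls ofs://f4xxxxxxxxx-hxT9.chdfs.ap-beijing.myqcloud.com/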

Performing data migration

Save the configuration file as hdfs_job.json, place it in the job directory, and run the following command:
bin/datax.py job/hdfs_job.json
If the job completes successfully, output similar to the following is printed:
2020-03-09 16:49:59.543 [job-0] INFO JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%


[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 1 | 1 | 1 | 0.024s | 0.024s | 0.024s
PS Scavenge | 1 | 1 | 1 | 0.014s | 0.014s | 0.014s

2020-03-09 16:49:59.543 [job-0] INFO JobContainer - PerfTrace not enable!
2020-03-09 16:49:59.543 [job-0] INFO StandAloneJobContainerCommunicator - Total 2 records, 33 bytes | Speed 3B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.033s | Percentage 100.00%
2020-03-09 16:49:59.544 [job-0] INFO JobContainer -
Task start time : 2020-03-09 16:49:48
Task end time : 2020-03-09 16:49:59
Total task duration : 11s
Average task traffic : 3 B/s
Record write speed : 0 rec/s
Read records : 2
Failed reads/writes : 0
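In the job summary, "Read records" is the number of records migrated, and a "Failed reads/writes" count of 0 indicates that all records were written to the target successfully.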

