Importing/Exporting COS Using DataX

Last updated: 2025-09-26 10:13:02

Environmental Dependencies

HADOOP-COS and the corresponding cos_api-bundle.
DataX version: DataX 3.0

Download and Installation

Downloading HADOOP-COS

Download HADOOP-COS and the corresponding cos_api-bundle from GitHub.

Downloading DataX package

Download DataX from GitHub.

Installing HADOOP-COS

After downloading HADOOP-COS, copy hadoop-cos-2.x.x-${version}.jar and cos_api-bundle-${version}.jar into both plugin/reader/hdfsreader/libs/ and plugin/writer/hdfswriter/libs/ under the directory where DataX was extracted.
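The copy step can also be scripted. The following is a minimal sketch, not part of DataX or HADOOP-COS: the function name install_jars and the example paths in the usage note are hypothetical, and it assumes both JARs sit in a single download directory.

```python
import shutil
from pathlib import Path

# Plugin library directories named in the step above, relative to the DataX root.
PLUGIN_LIB_DIRS = (
    "plugin/reader/hdfsreader/libs",
    "plugin/writer/hdfswriter/libs",
)


def install_jars(download_dir, datax_home):
    """Copy the hadoop-cos and cos_api-bundle JARs into both plugin lib directories.

    Hypothetical helper for illustration; returns the list of copied file paths.
    """
    download_dir = Path(download_dir)
    jars = sorted(download_dir.glob("hadoop-cos-*.jar")) + sorted(
        download_dir.glob("cos_api-bundle-*.jar")
    )
    if not jars:
        raise FileNotFoundError(f"no HADOOP-COS JARs found in {download_dir}")

    copied = []
    for lib_dir in PLUGIN_LIB_DIRS:
        target = Path(datax_home) / lib_dir
        # Create the directory only if it is missing; existing files are kept.
        target.mkdir(parents=True, exist_ok=True)
        for jar in jars:
            copied.append(Path(shutil.copy2(jar, target)))
    return copied
```

For example, install_jars("./downloads", "./datax") would copy the JARs from a hypothetical ./downloads directory into a DataX tree extracted at ./datax.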

How to Use

DataX configuration

Modifying datax.py script

Open the bin/datax.py script in the directory where DataX was extracted, and modify the CLASS_PATH variable in the script as follows:
CLASS_PATH = ("%s/lib/*:%s/plugin/reader/hdfsreader/libs/*:%s/plugin/writer/hdfswriter/libs/*:.") % (DATAX_HOME, DATAX_HOME, DATAX_HOME)
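To see what this assignment produces, the format string can be evaluated on its own. The sketch below is illustrative only: build_class_path is a hypothetical wrapper, and /opt/datax stands in for whatever DATAX_HOME resolves to on your machine. Note that the colon is the Java classpath separator on Linux and macOS.

```python
def build_class_path(datax_home):
    """Expand the CLASS_PATH template from datax.py for a given DataX home."""
    template = (
        "%s/lib/*:%s/plugin/reader/hdfsreader/libs/*:"
        "%s/plugin/writer/hdfswriter/libs/*:."
    )
    # The same home directory is substituted into all three placeholders.
    return template % (datax_home, datax_home, datax_home)


# /opt/datax is an assumed install location used only for this example.
for entry in build_class_path("/opt/datax").split(":"):
    print(entry)
```

Each printed entry is one classpath element: the core DataX libraries, the two plugin library directories that now hold the HADOOP-COS JARs, and the current directory.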

Configuring hdfsreader and hdfswriter in JSON configuration file

A sample JSON file is shown below:
{
    "job": {
        "setting": {
            "speed": {
                "byte": 10485760
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        },
        "content": [{
            "reader": {
                "name": "hdfsreader",
                "parameter": {
                    "path": "testfile",
                    "defaultFS": "cosn://examplebucket-1250000000/",
                    "column": ["*"],
                    "fileType": "text",
                    "encoding": "UTF-8",
                    "hadoopConfig": {
                        "fs.cosn.impl": "org.apache.hadoop.fs.CosFileSystem",
                        "fs.cosn.userinfo.region": "ap-beijing",
                        "fs.cosn.tmp.dir": "/tmp/hadoop_cos",
                        "fs.cosn.userinfo.secretId": "COS_SECRETID",
                        "fs.cosn.userinfo.secretKey": "COS_SECRETKEY"
                    },
                    "fieldDelimiter": ","
                }
            },
            "writer": {
                "name": "hdfswriter",
                "parameter": {
                    "path": "/user/hadoop/",
                    "fileName": "testfile1",
                    "defaultFS": "cosn://examplebucket-1250000000/",
                    "column": [
                        {
                            "name": "col",
                            "type": "string"
                        },
                        {
                            "name": "col1",
                            "type": "string"
                        },
                        {
                            "name": "col2",
                            "type": "string"
                        }
                    ],
                    "fileType": "text",
                    "encoding": "UTF-8",
                    "hadoopConfig": {
                        "fs.cosn.impl": "org.apache.hadoop.fs.CosFileSystem",
                        "fs.cosn.userinfo.region": "ap-beijing",
                        "fs.cosn.tmp.dir": "/tmp/hadoop_cos",
                        "fs.cosn.userinfo.secretId": "COS_SECRETID",
                        "fs.cosn.userinfo.secretKey": "COS_SECRETKEY"
                    },
                    "fieldDelimiter": ":",
                    "writeMode": "append"
                }
            }
        }]
    }
}
Notes:
Configure hadoopConfig with the settings that cosn requires.
Set defaultFS to the cosn path of your bucket, for example cosn://examplebucket-1250000000/.
Set fs.cosn.userinfo.region to the region where your bucket resides, for example ap-beijing. For more information, see Regions and Access Endpoints.
Replace COS_SECRETID and COS_SECRETKEY with your own COS key information.
Configure the remaining fields the same way as for HDFS.
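Because the reader and the writer share the same hadoopConfig block, the job file can be generated rather than edited by hand. The following is a rough sketch under stated assumptions: the helper names cosn_hadoop_config and build_job are hypothetical, and reading the keys from environment variables named COS_SECRETID and COS_SECRETKEY is an illustrative choice, not something DataX or HADOOP-COS requires.

```python
import json
import os


def cosn_hadoop_config(region, secret_id, secret_key, tmp_dir="/tmp/hadoop_cos"):
    """Return the hadoopConfig block shared by hdfsreader and hdfswriter."""
    return {
        "fs.cosn.impl": "org.apache.hadoop.fs.CosFileSystem",
        "fs.cosn.userinfo.region": region,
        "fs.cosn.tmp.dir": tmp_dir,
        "fs.cosn.userinfo.secretId": secret_id,
        "fs.cosn.userinfo.secretKey": secret_key,
    }


def build_job(bucket, region, secret_id, secret_key):
    """Assemble a job dict mirroring the sample configuration above."""
    default_fs = f"cosn://{bucket}/"
    hadoop_config = cosn_hadoop_config(region, secret_id, secret_key)
    reader = {
        "name": "hdfsreader",
        "parameter": {
            "path": "testfile",
            "defaultFS": default_fs,
            "column": ["*"],
            "fileType": "text",
            "encoding": "UTF-8",
            "hadoopConfig": hadoop_config,
            "fieldDelimiter": ",",
        },
    }
    writer = {
        "name": "hdfswriter",
        "parameter": {
            "path": "/user/hadoop/",
            "fileName": "testfile1",
            "defaultFS": default_fs,
            "column": [{"name": n, "type": "string"} for n in ("col", "col1", "col2")],
            "fileType": "text",
            "encoding": "UTF-8",
            "hadoopConfig": hadoop_config,
            "fieldDelimiter": ":",
            "writeMode": "append",
        },
    }
    setting = {
        "speed": {"byte": 10485760},
        "errorLimit": {"record": 0, "percentage": 0.02},
    }
    return {"job": {"setting": setting, "content": [{"reader": reader, "writer": writer}]}}


job = build_job(
    bucket="examplebucket-1250000000",
    region="ap-beijing",
    # Assumed environment variable names; the document's placeholders are used if unset.
    secret_id=os.environ.get("COS_SECRETID", "COS_SECRETID"),
    secret_key=os.environ.get("COS_SECRETKEY", "COS_SECRETKEY"),
)
# Serialized job definition; write this string to job/hdfs_job.json.
job_json = json.dumps(job, indent=4)
```

Keeping the keys out of the file on disk until generation time is one way to avoid committing them to version control by accident.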

Migrating data

Save the configuration file as hdfs_job.json in the job directory, then run the following command:
bin/datax.py job/hdfs_job.json
The output is similar to the following:
2020-03-09 16:49:59.543 [job-0] INFO JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%


[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 1 | 1 | 1 | 0.024s | 0.024s | 0.024s
PS Scavenge | 1 | 1 | 1 | 0.014s | 0.014s | 0.014s

2020-03-09 16:49:59.543 [job-0] INFO JobContainer - PerfTrace not enable!
2020-03-09 16:49:59.543 [job-0] INFO StandAloneJobContainerCommunicator - Total 2 records, 33 bytes | Speed 3B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.033s | Percentage 100.00%
2020-03-09 16:49:59.544 [job-0] INFO JobContainer -
Job start time : 2020-03-09 16:49:48
Job end time : 2020-03-09 16:49:48
Job duration : 11s
Average job traffic : 3B/s
Recorded write speed : 0rec/s
Recorded read count : 2
Read/Write failure count : 0
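When the job runs from a script, the summary line can be checked automatically. The sketch below is an assumption-laden example: the regular expression is derived only from the StandAloneJobContainerCommunicator line in the sample output above, so other DataX versions or locales may print a different format, and parse_summary is a hypothetical helper name.

```python
import re

# Pattern based solely on the sample summary line shown above.
SUMMARY_RE = re.compile(
    r"Total (?P<records>\d+) records, (?P<bytes>\d+) bytes \|.*?"
    r"Error (?P<error_records>\d+) records, (?P<error_bytes>\d+) bytes"
)


def parse_summary(line):
    """Extract record and byte counts from a summary line, or None if absent."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


sample = (
    "2020-03-09 16:49:59.543 [job-0] INFO StandAloneJobContainerCommunicator - "
    "Total 2 records, 33 bytes | Speed 3B/s, 0 records/s | "
    "Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | "
    "All Task WaitReaderTime 0.033s | Percentage 100.00%"
)
summary = parse_summary(sample)
if summary is not None and summary["error_records"] == 0:
    print("migrated", summary["records"], "records without errors")
```

A calling script could treat a missing summary line or a non-zero error_records value as a failed migration.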

