
Migrating Data Between HDFS and COS

Last updated: 2024-03-25 15:11:17

Overview

Hadoop DistCp (distributed copy) is a tool for large-scale inter- and intra-cluster copying. It uses MapReduce to handle distribution, error handling, recovery, and reporting. Leveraging the parallelism of MapReduce, Hadoop DistCp can migrate data at scale quickly: each map task copies a portion of the files specified under the source path.
Since Hadoop-COS implements the semantics of the Hadoop Distributed File System (HDFS), Hadoop DistCp can be used to easily migrate data between COS and HDFS. This document describes the migration process.

Prerequisites

1. The Hadoop-COS plugin has been installed on the Hadoop cluster, and the COS access key has been configured correctly (a minimal configuration sketch is shown after this list). You can run the following Hadoop command to check whether COS can be accessed:
hadoop fs -ls cosn://examplebucket-1250000000/
If the files in the COS bucket are listed correctly, Hadoop-COS has been installed and configured properly, and you can proceed with the following steps.
2. The COS access account must have read and write permission for the destination path of the COS bucket.
Note:
You can grant sub-accounts read/write permissions on resources in the COS bucket as needed. It is recommended that you grant permissions by referring to Notes on Principle of Least Privilege and Setting Sub-user Permissions. The common preset policies are as follows:
QcloudCOSDataFullControl: full control over data, including permissions to read, write, and delete files and to list files.
QcloudCOSDataReadOnly: read-only permission on data.
QcloudCOSDataWriteOnly: write-only permission on data.
The custom monitoring feature requires Cloud Monitoring to have permissions to report metrics and call read APIs. Grant the QcloudMonitorFullAccess policy with caution.
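If COS access has not been configured yet, the following is a minimal core-site.xml sketch that uses the same Hadoop-COS properties shown in the command-line example later in this document. The region and keys are placeholders you need to replace, and additional properties may be required depending on your environment (see the Hadoop-COS documentation):
<configuration>
  <!-- Implementation class of the COS file system -->
  <property>
    <name>fs.cosn.impl</name>
    <value>org.apache.hadoop.fs.CosFileSystem</value>
  </property>
  <!-- Region of the COS bucket, as shown on the bucket list page of the COS console -->
  <property>
    <name>fs.cosn.bucket.region</name>
    <value>ap-XXX</value>
  </property>
  <!-- SecretId and SecretKey of the account that owns the bucket -->
  <property>
    <name>fs.cosn.userinfo.secretId</name>
    <value>AK**XXX</value>
  </property>
  <property>
    <name>fs.cosn.userinfo.secretKey</name>
    <value>XXXX</value>
  </property>
</configuration>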

Directions

Copying files from HDFS to a COS bucket

The following example uses Hadoop DistCp to migrate the files in the /test directory of the local HDFS cluster to the COS bucket hdfs-test-1250000000.

1. Run the following command to start the migration:
hadoop distcp hdfs://10.0.0.3:9000/test cosn://hdfs-test-1250000000/
Hadoop DistCp starts MapReduce tasks to copy the files and prints a brief report of the copy statistics when the job completes.

2. Run the hadoop fs -ls -R cosn://hdfs-test-1250000000/ command to list the directories and files that have just been migrated to the bucket hdfs-test-1250000000.
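In addition to listing the destination, a simple optional sanity check (not part of the official procedure) is to compare the directory, file, and byte counts of the source and destination with hadoop fs -count, using the same example paths as above:
hadoop fs -count hdfs://10.0.0.3:9000/test
hadoop fs -count cosn://hdfs-test-1250000000/test
If the copy completed successfully, the two commands should report the same file count and content size.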


Copying files from a COS bucket to a local HDFS cluster

Hadoop DistCp is a tool that supports copying data between different clusters and file systems. To copy COS files to a local HDFS cluster, simply use the object path in the COS bucket as the source path and the HDFS file path as the destination path.
hadoop distcp cosn://hdfs-test-1250000000/test hdfs://10.0.0.3:9000/
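If the same copy needs to be re-run later (for example, after new objects are added to the bucket), DistCp's -update option copies only the files that differ between the source and the destination. A sketch using the same example paths:
hadoop distcp -update cosn://hdfs-test-1250000000/test hdfs://10.0.0.3:9000/test
Note that with -update the contents of the source directory are copied directly into the destination directory, so the destination here is /test rather than /. If checksum comparison between COS and HDFS causes files to be re-copied or the job to fail, the -skipcrccheck option described in the Apache Hadoop DistCp Guide may help; verify this against your Hadoop version.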

Using the DistCp command line to migrate data between HDFS and COS

Note:
With this command, you can migrate data between HDFS and COS in either direction.
Run the following command:
hadoop distcp -Dfs.cosn.impl=org.apache.hadoop.fs.CosFileSystem -Dfs.cosn.bucket.region=ap-XXX -Dfs.cosn.userinfo.secretId=AK**XXX -Dfs.cosn.userinfo.secretKey=XXXX -libjars /home/hadoop/hadoop-cos-2.6.5-shaded.jar cosn://bucketname-appid/test/ hdfs:///test/
Parameter description:
-Dfs.cosn.impl: Always set it to org.apache.hadoop.fs.CosFileSystem.
-Dfs.cosn.bucket.region: the region of the bucket. You can view it on the bucket list page of the COS console.
-Dfs.cosn.userinfo.secretId: the SecretId of the bucket owner's account, which can be obtained at Manage API Key.
-Dfs.cosn.userinfo.secretKey: the SecretKey of the bucket owner's account, which can be obtained at Manage API Key.
-libjars: specifies the location of the Hadoop-COS JAR package. You can download the package from the dep directory of the GitHub repository.
Note:
For other parameters, please see Hadoop-COS.
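The same options work in the opposite direction. For example, using the same placeholder values, the following copies the HDFS /test/ directory into the bucket:
hadoop distcp -Dfs.cosn.impl=org.apache.hadoop.fs.CosFileSystem -Dfs.cosn.bucket.region=ap-XXX -Dfs.cosn.userinfo.secretId=AK**XXX -Dfs.cosn.userinfo.secretKey=XXXX -libjars /home/hadoop/hadoop-cos-2.6.5-shaded.jar hdfs:///test/ cosn://bucketname-appid/test/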

Additional Hadoop DistCp Parameters

Hadoop DistCp supports a variety of parameters. For example, you can use -m to specify the maximum number of concurrent map tasks and -bandwidth to limit the maximum bandwidth (in MB/s) used by each map task. For more information, see the Apache Hadoop DistCp Guide.
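As an illustration (the values below are arbitrary and should be tuned for your cluster and network), the following repeats the earlier HDFS-to-COS copy with at most 10 concurrent map tasks, each limited to about 100 MB/s:
hadoop distcp -m 10 -bandwidth 100 hdfs://10.0.0.3:9000/test cosn://hdfs-test-1250000000/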
