Download the following JAR packages to a directory on the server, for example `/data01/jars`.

JAR Filename | Description | Download Address |
cos-distcp-1.12-3.1.0.jar | COSDistCp package, used to copy data to COSN. | |
chdfs_hadoop_plugin_network-2.8.jar | OFS plugin | |
Hadoop-COS | Version 8.1.5 or later | |
cos_api-bundle | The version needs to match the Hadoop-COS version. | |
Hadoop-COS supports the `cosn://bucketname-appid/` scheme starting from v8.1.5. Add the following configuration items to `core-site.xml` and distribute the configuration to all nodes. If only data needs to be migrated, you don't need to restart the big data components.

Key | Value | Configuration File | Description |
fs.cosn.trsf.fs.ofs.impl | com.qcloud.chdfs.fs.CHDFSHadoopFileSystemAdapter | core-site.xml | COSN implementation class, which is required. |
fs.cosn.trsf.fs.AbstractFileSystem.ofs.impl | com.qcloud.chdfs.fs.CHDFSDelegateFSAdapter | core-site.xml | COSN implementation class, which is required. |
fs.cosn.trsf.fs.ofs.tmp.cache.dir | In the format of `/data/emr/hdfs/tmp/` | core-site.xml | Temporary directory, which is required. It will be created on all MRS nodes. You need to ensure that it has sufficient space and the correct permissions. |
fs.cosn.trsf.fs.ofs.user.appid | `appid` of your COS bucket | core-site.xml | Required |
fs.cosn.trsf.fs.ofs.ranger.enable.flag | false | core-site.xml | This key is required. You need to check whether the value is `false`. |
fs.cosn.trsf.fs.ofs.bucket.region | Bucket region | core-site.xml | This key is required. Example values: eu-frankfurt (Frankfurt), ap-chengdu (Chengdu), and ap-singapore (Singapore). |
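Assembled from the table above, a `core-site.xml` fragment might look like the following sketch; the appid and region values are placeholders you must replace with your own:

```xml
<!-- Illustrative fragment only; substitute your own appid and bucket region. -->
<property>
  <name>fs.cosn.trsf.fs.ofs.impl</name>
  <value>com.qcloud.chdfs.fs.CHDFSHadoopFileSystemAdapter</value>
</property>
<property>
  <name>fs.cosn.trsf.fs.AbstractFileSystem.ofs.impl</name>
  <value>com.qcloud.chdfs.fs.CHDFSDelegateFSAdapter</value>
</property>
<property>
  <name>fs.cosn.trsf.fs.ofs.tmp.cache.dir</name>
  <value>/data/emr/hdfs/tmp/</value>
</property>
<property>
  <name>fs.cosn.trsf.fs.ofs.user.appid</name>
  <!-- placeholder appid -->
  <value>1250000000</value>
</property>
<property>
  <name>fs.cosn.trsf.fs.ofs.ranger.enable.flag</name>
  <value>false</value>
</property>
<property>
  <name>fs.cosn.trsf.fs.ofs.bucket.region</name>
  <!-- example region -->
  <value>ap-singapore</value>
</property>
```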
This example migrates `hdfs:///data/user/target` to `cosn://{bucketname-appid}/data/user/target`. First create a snapshot of the source directory so that the file list stays stable during the copy:

hdfs dfsadmin -disallowSnapshot hdfs:///data/user/
hdfs dfsadmin -allowSnapshot hdfs:///data/user/target
hdfs dfs -deleteSnapshot hdfs:///data/user/target {current date}
hdfs dfs -createSnapshot hdfs:///data/user/target {current date}

Create a temporary directory in the bucket, then launch the migration from the snapshot:

hadoop fs -libjars /data01/jars/chdfs_hadoop_plugin_network-2.8.jar -mkdir cosn://bucket-appid/distcp-tmp

nohup hadoop jar /data01/jars/cos-distcp-1.12-3.1.0.jar -libjars /data01/jars/chdfs_hadoop_plugin_network-2.8.jar --src=hdfs:///data/user/target/.snapshot/{current date} --dest=cosn://{bucket-appid}/data/user/target --temp=cosn://bucket-appid/distcp-tmp/ --preserveStatus=ugpt --skipMode=length-checksum --checkMode=length-checksum --cosChecksumType=CRC32C --taskNumber 6 --workerNumber 32 --bandWidth 200 >> ./distcp.log &
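As a rough, illustrative sizing check for the flags in the command above (this arithmetic is not part of COSDistCp): the total number of concurrent file copies is taskNumber × workerNumber, and the worst-case aggregate read rate is that concurrency times bandWidth.

```python
# Illustrative sizing math for the COSDistCp flags used above; the variable
# names mirror the CLI flags, but this script is a standalone sketch.
task_number = 6       # --taskNumber: number of copy processes
worker_number = 32    # --workerNumber: copy threads per process
band_width_mb = 200   # --bandWidth: per-file read limit in MB/s

concurrent_copies = task_number * worker_number
peak_read_mb_per_s = concurrent_copies * band_width_mb

print(concurrent_copies)   # 192 concurrent file copies
print(peak_read_mb_per_s)  # 38400 MB/s worst case; actual throughput is
                           # bounded by the cluster's disks and network
```

In practice the source cluster saturates long before this worst case, which is why the HDD guidance below lowers workerNumber and relies on taskNumber and bandWidth instead.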
Key parameters:

--taskNumber: number of copy processes. Example: --taskNumber=10.
--workerNumber: number of copy threads in each copy process. Example: --workerNumber=4.
--bandWidth: read bandwidth limit (in MB/s) for each copied file; -1 indicates no limit on the read bandwidth. Example: --bandWidth=10.
--cosChecksumType: checksum algorithm. To use COMPOSITE_CRC32, the Hadoop version must be 3.1.1 or later; otherwise, you need to change this parameter to --cosChecksumType=CRC64.

If the cluster uses mechanical disks (HDD), set workerNumber to 1, use the taskNumber parameter to control the number of concurrent migrations, and use the bandWidth parameter to control the bandwidth of a single concurrent migration.

After the migration completes, check the CosDistCp Counters in distcp.log. FILES_FAILED indicates the number of failed files; if there is no FILES_FAILED counter, all files have been migrated successfully. Example output:

CosDistCp Counters
	BYTES_EXPECTED=10198247
	BYTES_SKIPPED=10196880
	FILES_COPIED=1
	FILES_EXPECTED=7
	FILES_FAILED=1
	FILES_SKIPPED=5
Statistics Item | Description |
BYTES_EXPECTED | Total size (in bytes) to copy according to the source directory |
FILES_EXPECTED | Number of files to copy according to the source directory, including the directory itself |
BYTES_SKIPPED | Total size (in bytes) of files that can be skipped (same length or checksum value) |
FILES_SKIPPED | Number of source files that can be skipped (same length or checksum value) |
FILES_COPIED | Number of source files that are successfully copied |
FILES_FAILED | Number of source files that failed to be copied |
FOLDERS_COPIED | Number of directories that are successfully copied |
FOLDERS_SKIPPED | Number of directories that are skipped |
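The success check described above can be scripted; this is a minimal sketch (not shipped with COSDistCp) that scans the counter lines from distcp.log for a FILES_FAILED entry:

```python
import re

def migration_succeeded(log_text: str) -> bool:
    """True if the log has no FILES_FAILED counter, or the counter is zero."""
    match = re.search(r"FILES_FAILED=(\d+)", log_text)
    return match is None or int(match.group(1)) == 0

# Sample counters, matching the example output above.
sample = (
    "CosDistCp Counters\n"
    "\tBYTES_EXPECTED=10198247\n"
    "\tFILES_COPIED=1\n"
    "\tFILES_EXPECTED=7\n"
    "\tFILES_FAILED=1\n"
    "\tFILES_SKIPPED=5\n"
)
print(migration_succeeded(sample))  # False: one file failed to copy
```

A failed run like the sample should be retried; rerunning the same COSDistCp command with --skipMode=length-checksum copies only the files that are still missing or mismatched.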
To guarantee complete consistency between the HDFS and COS data, add the --delete parameter. When using the --delete parameter, you need to add the --deleteOutput=/xxx (custom) parameter but not the --diffMode parameter:

nohup hadoop jar /data01/jars/cos-distcp-1.12-3.1.0.jar -libjars /data01/jars/chdfs_hadoop_plugin_network-2.8.jar --src=hdfs:///data/user/target/.snapshot/{current date} --dest=cosn://{bucket-appid}/data/user/target --temp=cosn://bucket-appid/distcp-tmp/ --preserveStatus=ugpt --skipMode=length-checksum --checkMode=length-checksum --cosChecksumType=CRC32C --taskNumber 6 --workerNumber 32 --bandWidth 200 --delete --deleteOutput=/dele-xx >> ./distcp.log &

With --delete, files that exist in the destination but not in the source are moved to the trash directory, and the list of moved files is generated in the /xxx/failed directory. You can run hadoop fs -rm URL or hadoop fs -rmr URL to delete the data in the trash directory.