1. Copy hadoop-cos-2.x.x-${version}.jar, cos_api-bundle-${version}.jar, and chdfs_hadoop_plugin_network-${version}.jar to plugin/reader/hdfsreader/libs/ and plugin/writer/hdfswriter/libs/ in the extracted DataX path (a copy-command sketch follows step 2 below).
2. Modify the datax.py script. Open the bin/datax.py script in the DataX decompression directory and change the CLASS_PATH variable in the script to the following:
   ```python
   CLASS_PATH = ("%s/lib/*:%s/plugin/reader/hdfsreader/libs/*:%s/plugin/writer/hdfswriter/libs/*:.") % (DATAX_HOME, DATAX_HOME, DATAX_HOME)
   ```
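For reference, a minimal sketch of the jar copy in step 1, assuming the three jars have already been downloaded to the current directory and that DataX is extracted under /usr/local/service/datax (the path and version patterns are examples; adjust them to your environment):

```bash
# Assumption: DataX is extracted under /usr/local/service/datax.
DATAX_HOME=/usr/local/service/datax

# Copy the COSN jars into both the hdfsreader and hdfswriter plugin lib directories.
for dir in plugin/reader/hdfsreader/libs plugin/writer/hdfswriter/libs; do
  cp hadoop-cos-2.x.x-*.jar cos_api-bundle-*.jar chdfs_hadoop_plugin_network-*.jar "${DATAX_HOME}/${dir}/"
done
```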
3. Configure hdfsreader and hdfswriter in the configuration JSON file:
   ```json
   {
       "job": {
           "setting": {
               "speed": {
                   "byte": 10485760
               },
               "errorLimit": {
                   "record": 0,
                   "percentage": 0.02
               }
           },
           "content": [{
               "reader": {
                   "name": "hdfsreader",
                   "parameter": {
                       "path": "/test/",
                       "defaultFS": "cosn://examplebucket1-1250000000/",
                       "column": ["*"],
                       "fileType": "text",
                       "encoding": "UTF-8",
                       "hadoopConfig": {
                           "fs.cosn.impl": "org.apache.hadoop.fs.CosFileSystem",
                           "fs.cosn.trsf.fs.ofs.bucket.region": "ap-guangzhou",
                           "fs.cosn.bucket.region": "ap-guangzhou",
                           "fs.cosn.tmp.dir": "/tmp/hadoop_cos",
                           "fs.cosn.trsf.fs.ofs.tmp.cache.dir": "/tmp/",
                           "fs.cosn.userinfo.secretId": "COS_SECRETID",
                           "fs.cosn.userinfo.secretKey": "COS_SECRETKEY",
                           "fs.cosn.trsf.fs.ofs.user.appid": "1250000000"
                       },
                       "fieldDelimiter": ","
                   }
               },
               "writer": {
                   "name": "hdfswriter",
                   "parameter": {
                       "path": "/",
                       "fileName": "hive.test",
                       "defaultFS": "cosn://examplebucket2-1250000000/",
                       "column": [
                           {"name": "col1", "type": "int"},
                           {"name": "col2", "type": "string"}
                       ],
                       "fileType": "text",
                       "encoding": "UTF-8",
                       "hadoopConfig": {
                           "fs.cosn.impl": "org.apache.hadoop.fs.CosFileSystem",
                           "fs.cosn.trsf.fs.ofs.bucket.region": "ap-guangzhou",
                           "fs.cosn.bucket.region": "ap-guangzhou",
                           "fs.cosn.tmp.dir": "/tmp/hadoop_cos",
                           "fs.cosn.trsf.fs.ofs.tmp.cache.dir": "/tmp/",
                           "fs.cosn.userinfo.secretId": "COS_SECRETID",
                           "fs.cosn.userinfo.secretKey": "COS_SECRETKEY",
                           "fs.cosn.trsf.fs.ofs.user.appid": "1250000000"
                       },
                       "fieldDelimiter": ",",
                       "writeMode": "append"
                   }
               }
           }]
       }
   }
   ```
   Configuration description:
   - Configure hadoopConfig as required for COSN.
   - Set defaultFS to the COSN path, such as cosn://examplebucket-1250000000/.
   - Set fs.cosn.userinfo.region and fs.cosn.trsf.fs.ofs.bucket.region to the bucket region, such as ap-guangzhou. For more information, see Regions and Access Endpoints.
   - For COS_SECRETID and COS_SECRETKEY, use your own COS key information.
   - Set fs.ofs.user.appid and fs.cosn.trsf.fs.ofs.user.appid to your appid.
4. Save the configuration file as hdfs_job.json in the job directory and run the following command:
   ```bash
   [root@172 /usr/local/service/datax]# python bin/datax.py job/hdfs_job.json
   ```
   Output similar to the following indicates that the job has completed:
   ```
   2022-10-23 00:25:24.954 [job-0] INFO  JobContainer -
   [total cpu info] =>
   averageCpu      | maxDeltaCpu    | minDeltaCpu
   -1.00%          | -1.00%         | -1.00%

   [total gc info] =>
   NAME            | totalGCCount   | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
   PS MarkSweep    | 1              | 1               | 1               | 0.034s      | 0.034s         | 0.034s
   PS Scavenge     | 14             | 14              | 14              | 0.059s      | 0.059s         | 0.059s

   2022-10-23 00:25:24.954 [job-0] INFO  JobContainer - PerfTrace not enable!
   2022-10-23 00:25:24.954 [job-0] INFO  StandAloneJobContainerCommunicator - Total 1000003 records, 9322478 bytes | Speed 910.40KB/s, 100000 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 1.000s | All Task WaitReaderTime 6.259s | Percentage 100.00%
   2022-10-23 00:25:24.955 [job-0] INFO  JobContainer -
   Job start time               : 2022-10-23 00:25:12
   Job end time                 : 2022-10-23 00:25:24
   Job duration                 : 12s
   Average job traffic          : 910.40 KB/s
   Record write speed           : 100000 records/s
   Total number of read records : 1000003
   Read/Write failure count     : 0
   ```
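If the job fails with COSN access errors instead of producing output like the above, the COSN settings can be checked outside of DataX first. A minimal sketch, assuming a Hadoop client with the same hadoop-cos jars is available on the node; the property values mirror the hadoopConfig section of the job file:

```bash
# Hypothetical pre-check: list the source bucket with the same region, key, and impl class as the job.
hadoop fs \
  -Dfs.cosn.impl=org.apache.hadoop.fs.CosFileSystem \
  -Dfs.cosn.bucket.region=ap-guangzhou \
  -Dfs.cosn.userinfo.secretId=COS_SECRETID \
  -Dfs.cosn.userinfo.secretKey=COS_SECRETKEY \
  -ls cosn://examplebucket1-1250000000/test/
```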
To use COS Ranger for authentication and authorization, additionally perform the following steps:

1. Copy cosn-ranger-interface-1.x.x-${version}.jar and hadoop-ranger-client-for-hadoop-${version}.jar to plugin/reader/hdfsreader/libs/ and plugin/writer/hdfswriter/libs/ in the extracted DataX path. Click here to download them.
2. Configure hdfsreader and hdfswriter in the JSON configuration file:
   ```json
   {
       "job": {
           "setting": {
               "speed": {
                   "byte": 10485760
               },
               "errorLimit": {
                   "record": 0,
                   "percentage": 0.02
               }
           },
           "content": [{
               "reader": {
                   "name": "hdfsreader",
                   "parameter": {
                       "path": "/test/",
                       "defaultFS": "cosn://examplebucket1-1250000000/",
                       "column": ["*"],
                       "fileType": "text",
                       "encoding": "UTF-8",
                       "hadoopConfig": {
                           "fs.cosn.impl": "org.apache.hadoop.fs.CosFileSystem",
                           "fs.cosn.trsf.fs.ofs.bucket.region": "ap-guangzhou",
                           "fs.cosn.bucket.region": "ap-guangzhou",
                           "fs.cosn.tmp.dir": "/tmp/hadoop_cos",
                           "fs.cosn.trsf.fs.ofs.tmp.cache.dir": "/tmp/",
                           "fs.cosn.trsf.fs.ofs.user.appid": "1250000000",
                           "fs.cosn.credentials.provider": "org.apache.hadoop.fs.auth.RangerCredentialsProvider",
                           "qcloud.object.storage.zk.address": "172.16.0.30:2181",
                           "qcloud.object.storage.ranger.service.address": "172.16.0.30:9999",
                           "qcloud.object.storage.kerberos.principal": "hadoop/172.16.0.30@EMR-5IUR9VWW"
                       },
                       "haveKerberos": "true",
                       "kerberosKeytabFilePath": "/var/krb5kdc/emr.keytab",
                       "kerberosPrincipal": "hadoop/172.16.0.30@EMR-5IUR9VWW",
                       "fieldDelimiter": ","
                   }
               },
               "writer": {
                   "name": "hdfswriter",
                   "parameter": {
                       "path": "/",
                       "fileName": "hive.test",
                       "defaultFS": "cosn://examplebucket2-1250000000/",
                       "column": [
                           {"name": "col1", "type": "int"},
                           {"name": "col2", "type": "string"}
                       ],
                       "fileType": "text",
                       "encoding": "UTF-8",
                       "hadoopConfig": {
                           "fs.cosn.impl": "org.apache.hadoop.fs.CosFileSystem",
                           "fs.cosn.trsf.fs.ofs.bucket.region": "ap-guangzhou",
                           "fs.cosn.bucket.region": "ap-guangzhou",
                           "fs.cosn.tmp.dir": "/tmp/hadoop_cos",
                           "fs.cosn.trsf.fs.ofs.tmp.cache.dir": "/tmp/",
                           "fs.cosn.trsf.fs.ofs.user.appid": "1250000000",
                           "fs.cosn.credentials.provider": "org.apache.hadoop.fs.auth.RangerCredentialsProvider",
                           "qcloud.object.storage.zk.address": "172.16.0.30:2181",
                           "qcloud.object.storage.ranger.service.address": "172.16.0.30:9999",
                           "qcloud.object.storage.kerberos.principal": "hadoop/172.16.0.30@EMR-5IUR9VWW"
                       },
                       "haveKerberos": "true",
                       "kerberosKeytabFilePath": "/var/krb5kdc/emr.keytab",
                       "kerberosPrincipal": "hadoop/172.16.0.30@EMR-5IUR9VWW",
                       "fieldDelimiter": ",",
                       "writeMode": "append"
                   }
               }
           }]
       }
   }
   ```
   Configuration description:
   - Set fs.cosn.credentials.provider to org.apache.hadoop.fs.auth.RangerCredentialsProvider to use Ranger for authorization.
   - Set qcloud.object.storage.zk.address to the ZooKeeper address.
   - Set qcloud.object.storage.ranger.service.address to the COS Ranger address.
   - Set haveKerberos to true.
   - Set qcloud.object.storage.kerberos.principal and kerberosPrincipal to the Kerberos authentication principal name (which can be read from core-site.xml in the EMR environment with Kerberos enabled).
   - Set kerberosKeytabFilePath to the absolute path of the keytab authentication file (which can be read from ranger-admin-site.xml in the EMR environment with Kerberos enabled).

FAQs

What should I do if the java.io.IOException: Permission denied: no access groups bound to this mountPoint examplebucket2-1250000000, access denied or java.io.IOException: Permission denied: No access rules matched error is reported?
These errors indicate that no Ranger access rule matches the request. Check whether access rules for the bucket (examplebucket2-1250000000 in this example) have been configured and bound in Ranger.

What should I do if the java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.cosn.ranger.client.RangerQcloudObjectStorageClientImpl not found error is reported?
Check whether cosn-ranger-interface-1.x.x-${version}.jar and hadoop-ranger-client-for-hadoop-${version}.jar have been copied to plugin/reader/hdfsreader/libs/ and plugin/writer/hdfswriter/libs/ in the extracted DataX path (click here to download them).

What should I do if the java.io.IOException: Login failure for hadoop/_HOST@EMR-5IUR9VWW from keytab /var/krb5kdc/emr.keytab: javax.security.auth.login.LoginException: Unable to obtain password from user error is reported?
This error occurs when kerberosPrincipal and qcloud.object.storage.kerberos.principal are mistakenly set to hadoop/_HOST@EMR-5IUR9VWW instead of hadoop/172.16.0.30@EMR-5IUR9VWW. As DataX cannot resolve a _HOST domain name, you need to replace _HOST with an IP. You can run the klist -ket /var/krb5kdc/emr.keytab command to find an appropriate principal.

What should I do if the java.io.IOException: init fs.cosn.ranger.plugin.client.impl failed error is reported?
Check whether qcloud.object.storage.kerberos.principal is configured in hadoopConfig in the JSON file; if not, you need to configure it.
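To troubleshoot the Kerberos errors above, the principal and keytab can be checked on the DataX node before re-running the job. A minimal sketch, using the example keytab path and principal from the configuration above:

```bash
# List the principals stored in the keytab (-ket prints key versions, encryption types, and timestamps).
klist -ket /var/krb5kdc/emr.keytab

# Hypothetical check: obtain a ticket with the exact principal you intend to put in the JSON file;
# _HOST must be replaced with the node IP, e.g. hadoop/172.16.0.30@EMR-5IUR9VWW.
kinit -kt /var/krb5kdc/emr.keytab hadoop/172.16.0.30@EMR-5IUR9VWW
klist   # confirm that a ticket was granted
```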