COS offers the metadata acceleration feature to provide high-performance file system capabilities. Metadata acceleration builds on the metadata management capabilities of Cloud HDFS (CHDFS) at the underlying layer, so that COS can be accessed with file system semantics. The system is designed to reach up to 100 GB/s of bandwidth, more than 100,000 queries per second (QPS), and millisecond-level latency. Buckets with metadata acceleration enabled can be used in scenarios such as big data, high-performance computing, machine learning, and AI. For more information, see Metadata Acceleration Overview.
In the past, big data access to COS was mainly implemented through the Hadoop-COS tool, which adapts Hadoop Compatible File System (HCFS) APIs to COS RESTful APIs. Because COS organizes metadata differently from a file system, some metadata operations perform poorly, which affects big data analysis performance. Buckets with metadata acceleration enabled are fully compatible with the HCFS protocol and can be accessed directly through native HDFS APIs. This removes the overhead of converting the HDFS protocol to the COS protocol and provides native HDFS features such as efficient, atomic directory renames, file atime and mtime updates, efficient directory usage (DU) statistics, and POSIX ACL permission support.
If the system indicates that the corresponding HDFS file system is not found, submit a ticket.
The HDFS permission configuration is different from the native COS permission system. If you use HDFS to access a COS bucket, we recommend configuring HDFS permissions to authorize servers in a specified VPC to access the bucket, which gives you the same permission experience as native HDFS.
| Dependency | chdfs-hadoop-plugin | COSN (hadoop-cos) | cos_api-bundle |
|---|---|---|---|
| Version | ≥ v2.7 | ≥ v8.1.5 | Make sure that the version matches the COSN version as listed in tencentyun/hadoop-cos. |
The EMR environment is already integrated with COS, so you only need to complete the following steps:
```shell
find / -name "chdfs*"
find / -name "temrfs_hadoop*"
```
Make sure that the versions of the two JAR packages in the search results meet the environment dependency requirements above.
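Comparing versions by eye is error-prone when many nodes are involved. The following is a minimal Bash sketch that compares version strings with GNU `sort -V`. It assumes, as a hypothetical convention, that the version is the last dash-separated field of the JAR file name (for example, `hadoop-cos-2.7.5-8.1.5.jar`); actual file names on your nodes may differ, and the helper names `version_ge` and `jar_version` are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: check whether a JAR version meets a minimum requirement.
# ASSUMPTION: the version is the last "-"-separated field of the JAR name,
# e.g. "chdfs_hadoop_plugin_network-2.7.jar" -> "2.7" (illustrative names).

# version_ge A B: exit status 0 if version A >= version B.
version_ge() {
  # sort -V orders version strings; if B sorts first (or A equals B), A >= B.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# jar_version FILE: print the version field parsed from the JAR file name.
jar_version() {
  local name
  name=$(basename "$1" .jar)   # strip directory and ".jar" suffix
  printf '%s\n' "${name##*-}"  # keep text after the last "-"
}
```

For example, `version_ge "$(jar_version "$jar")" 2.7` could be run for each chdfs JAR returned by the `find` command. `sort -V` is used rather than a plain string comparison so that `8.1.10` is treated as newer than `8.1.5`.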
If the chdfs-hadoop-plugin needs to be updated, perform the following steps:
Download the script files of the updated JAR package at:
Put the two scripts in the /root directory of the server, add execution permission to the scripts, and run the following command:
```shell
sh update_cos_jar.sh https://hadoop-jar-beijing-1259378398.cos.ap-beijing.myqcloud.com/hadoop_plugin_network/2.7
```
Replace the parameter with the bucket in your region. For example, use https://hadoop-jar-guangzhou-1259378398.cos.ap-guangzhou.myqcloud.com/hadoop_plugin_network/2.7 for the Guangzhou region.
Perform the above steps on each EMR node until all JAR packages are replaced.
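When many EMR nodes need the update, building the region-specific URL and running the script on each node can be automated. The Bash sketch below assumes that every region's JAR bucket follows the same naming pattern as the Beijing and Guangzhou examples above (this is not confirmed for other regions), that the scripts are already in `/root` on each node, and that passwordless SSH as root is available. `plugin_url` and `update_all_nodes` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: build the region-specific URL for update_cos_jar.sh and run it per node.
# ASSUMPTION: all regions follow the pattern seen in the Beijing/Guangzhou examples:
#   https://hadoop-jar-<short>-1259378398.cos.<region>.myqcloud.com/hadoop_plugin_network/<ver>

# plugin_url REGION [VERSION]: REGION is a full COS region such as "ap-guangzhou".
plugin_url() {
  local region="$1"
  local short="${region#*-}"   # "ap-guangzhou" -> "guangzhou"
  local version="${2:-2.7}"    # path segment used in the examples above
  printf 'https://hadoop-jar-%s-1259378398.cos.%s.myqcloud.com/hadoop_plugin_network/%s\n' \
    "$short" "$region" "$version"
}

# update_all_nodes REGION NODE...: run the update script on every listed node.
# ASSUMPTION: passwordless SSH as root; scripts already copied to /root.
update_all_nodes() {
  local region="$1"; shift
  local url node
  url=$(plugin_url "$region")
  for node in "$@"; do
    ssh "root@${node}" "cd /root && sh update_cos_jar.sh '${url}'" || return 1
  done
}
```

A call such as `update_all_nodes ap-guangzhou 10.0.0.11 10.0.0.12` (hypothetical node IPs) stops at the first node that fails, so that no node is silently left with an old JAR.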
(Replace `2.7.5` with your Hadoop version and `8.1.5` with the version of the required hadoop-cos package, which must be 8.1.5 or later. The cos_api-bundle version is adapted automatically.)
Modify `core-site.xml` in the EMR console by adding the configuration item `fs.cosn.trsf.fs.ofs.bucket.region`, which is required to specify the COS region where the bucket resides.
In a self-built environment (for example, one set up as described in Install CDH), download the three JAR packages that meet the version requirements listed under environment dependencies.
Place the JAR packages in the classpath of each server in the Hadoop cluster, such as `/usr/local/service/hadoop/share/hadoop/common/lib/` (the path may vary by component).
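Copying the same JARs to every server is easy to get wrong by hand. The Bash sketch below copies JAR files into a classpath directory and makes them world-readable; it refuses to create the directory, since a missing path usually means the wrong component path was chosen. `install_jars` is a hypothetical helper, and `DEFAULT_LIB_DIR` is simply the example path from the step above.

```shell
#!/usr/bin/env bash
# Sketch: copy downloaded JARs into a Hadoop classpath directory on this server.
# DEFAULT_LIB_DIR is the example path from the text; it may differ per component.
DEFAULT_LIB_DIR=/usr/local/service/hadoop/share/hadoop/common/lib

# install_jars DEST JAR...: copy each JAR into DEST and make it readable by all users.
install_jars() {
  local dest="$1"; shift
  # Do not create DEST: a missing directory likely means a wrong classpath path.
  [ -d "$dest" ] || { echo "missing directory: $dest" >&2; return 1; }
  local jar
  for jar in "$@"; do
    cp "$jar" "$dest/" && chmod 644 "$dest/$(basename "$jar")" || return 1
  done
}
```

The function would be run on each server in the cluster (for example, `install_jars "$DEFAULT_LIB_DIR" ./*.jar` from the download directory); distributing the files to the servers is left to your usual tooling.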
Modify the `hadoop-env.sh` file in the `$HADOOP_HOME/etc/hadoop` directory to add the COSN JAR package to your Hadoop environment variables as follows:
```shell
# Append every JAR under tools/lib to HADOOP_CLASSPATH (colon-separated).
for f in $HADOOP_HOME/share/hadoop/tools/lib/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done
```
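To confirm that the loop actually put the expected JARs on the classpath, a small check can help. The Bash sketch below tests whether any entry of a colon-separated classpath has a file name matching a shell glob; `classpath_has` is a hypothetical helper, and the name patterns used in the usage line are assumptions based on the package names in this section.

```shell
#!/usr/bin/env bash
# Sketch: check whether a colon-separated classpath contains a JAR matching a glob.
# classpath_has PATTERN [CLASSPATH]: CLASSPATH defaults to $HADOOP_CLASSPATH.
classpath_has() {
  local pattern="$1" entry
  local -a entries
  # Split on ":" with read, which avoids glob expansion of entries such as "lib/*".
  IFS=':' read -r -a entries <<< "${2-$HADOOP_CLASSPATH}"
  for entry in "${entries[@]}"; do
    # $pattern is unquoted on purpose so that case performs glob matching.
    case "${entry##*/}" in
      $pattern) return 0 ;;
    esac
  done
  return 1
}
```

For example, `classpath_has 'hadoop-cos-*.jar' "$(hadoop classpath --glob)"` checks the classpath Hadoop resolves; `--glob` expands wildcard entries such as `lib/*` into individual JAR paths.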
Add the following configuration items to `core-site.xml` in the computing cluster:
```xml
<!--COSN implementation class-->
<property>
  <name>fs.cosn.impl</name>
  <value>org.apache.hadoop.fs.CosFileSystem</value>
</property>
<!-- Bucket region in the format of `ap-guangzhou` -->
<property>
  <name>fs.cosn.bucket.region</name>
  <value>ap-guangzhou</value>
</property>
<!-- Bucket region in the format of `ap-guangzhou` -->
<property>
  <name>fs.cosn.trsf.fs.ofs.bucket.region</name>
  <value>ap-guangzhou</value>
</property>
<!-- Configure how to get `SecretId` and `SecretKey` -->
<property>
  <name>fs.cosn.credentials.provider</name>
  <value>org.apache.hadoop.fs.auth.SimpleCredentialProvider</value>
</property>
<!-- API key information of the account, which can be viewed in the [CAM console](https://console.tencentcloud.com/capi). -->
<property>
  <name>fs.cosn.userinfo.secretId</name>
  <value>XXXXXXXXXXXXXXXXXXXXXXXX</value>
</property>
<!-- API key information of the account, which can be viewed in the [CAM console](https://console.tencentcloud.com/capi). -->
<property>
  <name>fs.cosn.userinfo.secretKey</name>
  <value>XXXXXXXXXXXXXXXXXXXXXXXX</value>
</property>
<!-- Configure the account's `appid` -->
<property>
  <name>fs.cosn.trsf.fs.ofs.user.appid</name>
  <value>125XXXXXX</value>
</property>
<!-- Local temporary directory, which is used to store temporary files generated during execution. -->
<property>
  <name>fs.cosn.trsf.fs.ofs.tmp.cache.dir</name>
  <value>/tmp</value>
</property>
```
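Because a missing property typically surfaces only at job run time, it can be worth checking the file before distributing it. The Bash sketch below reports which of the property names listed above do not appear in a given `core-site.xml`. It is a plain-text `grep`, not an XML parser: it assumes each `<name>` element sits on one line and does not notice properties that are commented out. `check_core_site` and `REQUIRED_PROPS` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: report properties from the configuration above that are missing in a file.
# Text-based check only: assumes one <name> element per line, ignores XML comments.
REQUIRED_PROPS=(
  fs.cosn.impl
  fs.cosn.bucket.region
  fs.cosn.trsf.fs.ofs.bucket.region
  fs.cosn.credentials.provider
  fs.cosn.userinfo.secretId
  fs.cosn.userinfo.secretKey
  fs.cosn.trsf.fs.ofs.user.appid
  fs.cosn.trsf.fs.ofs.tmp.cache.dir
)

# check_core_site FILE: print "missing: <name>" per absent property; fail if any.
check_core_site() {
  local file="$1" prop missing=0
  for prop in "${REQUIRED_PROPS[@]}"; do
    # -F matches literally, so dots in property names are not regex wildcards.
    if ! grep -qF "<name>${prop}</name>" "$file"; then
      echo "missing: $prop"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_core_site "$HADOOP_HOME/etc/hadoop/core-site.xml"` on each node prints nothing and succeeds when all listed names are present.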
After all environment configuration steps are completed, you can verify the environment in the following ways:
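One possible smoke test, offered as a sketch rather than the documented verification procedure, lists the bucket root and then writes, reads back, and deletes a small file through the `cosn://` scheme configured above. It assumes the `hadoop` CLI is on `PATH`; `examplebucket-1250000000` is a hypothetical bucket name in BucketName-APPID form, and `cosn_uri` and `smoke_test` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch of a smoke test for the configured environment.
# ASSUMPTIONS: `hadoop` is on PATH; "examplebucket-1250000000" is a placeholder
# bucket name (BucketName-APPID), not a real bucket.

# cosn_uri BUCKET [PATH]: build a cosn:// URI; PATH must start with "/".
cosn_uri() {
  local bucket="$1" path="${2:-/}"
  printf 'cosn://%s%s\n' "$bucket" "$path"
}

# smoke_test BUCKET: list the root, then put, cat, and remove a probe file.
smoke_test() {
  local bucket="$1" probe
  probe=$(cosn_uri "$bucket" "/hdfs-probe-$$.txt")
  hadoop fs -ls "$(cosn_uri "$bucket")" || return 1
  echo "hello" | hadoop fs -put - "$probe" || return 1   # "-" reads from stdin
  hadoop fs -cat "$probe" || return 1
  hadoop fs -rm "$probe"
}
```

Calling `smoke_test examplebucket-1250000000` with your own bucket name exercises listing, writing, and reading in one pass, so a failure points to the first step that broke.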
By default, access over the HDFS protocol is authenticated in native POSIX ACL mode. To use Ranger authentication instead, configure the following:
Modify `fs.cosn.credentials.provider` and set it to
In a big data scenario, you can access a bucket with metadata acceleration enabled over the HDFS protocol in the following steps:
Configure `core-site.xml` as instructed in Creating Bucket and Configuring HDFS Protocol.