Using HDFS to Access Metadata Acceleration-Enabled Bucket

Last updated: 2024-03-25 16:04:01

Metadata Acceleration Overview

COS offers the metadata acceleration feature to provide high-performance file system capabilities. At the underlying layer, metadata acceleration leverages the powerful metadata management of Cloud HDFS (CHDFS), allowing COS to be accessed with file system semantics. The system is designed to deliver a bandwidth of up to 100 GB/s, over 100,000 queries per second (QPS), and millisecond-level latency. Buckets with metadata acceleration enabled can be widely used in scenarios such as big data, high-performance computing, machine learning, and AI. For more information, see Metadata Acceleration Overview.

Benefits of HDFS Access

In the past, big data access to COS was implemented mainly through the Hadoop-COS tool, which internally adapts Hadoop Compatible File System (HCFS) APIs to COS RESTful APIs. Because COS organizes metadata differently from a file system, metadata operation performance varies, which affects big data analysis performance. Buckets with metadata acceleration enabled are fully compatible with the HCFS protocol and can be accessed directly through native HDFS APIs. This eliminates the overhead of converting the HDFS protocol to the COS protocol and provides native HDFS features such as efficient directory rename (as an atomic operation), file atime and mtime updates, efficient directory du statistics, and POSIX ACL permission support.
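As a sketch, the native HDFS semantics described above can be exercised with the standard Hadoop CLI against a cosn:// path. The bucket name examplebucket and APPID 1250000000 below are placeholders, and the hadoop commands are shown as comments since they require a configured cluster:

```shell
# Placeholder bucket name and APPID; replace with your own values.
BUCKET="examplebucket"
APPID="1250000000"
ROOT="cosn://${BUCKET}-${APPID}"

# With metadata acceleration, directory rename is a single atomic
# metadata operation rather than a per-object copy and delete:
#   hadoop fs -mv "${ROOT}/warehouse/stage" "${ROOT}/warehouse/prod"
# Directory usage statistics are served efficiently by the metadata layer:
#   hadoop fs -du -s "${ROOT}/warehouse"
# POSIX ACLs are supported natively:
#   hadoop fs -setfacl -m user:hive:rwx "${ROOT}/warehouse"

echo "${ROOT}/warehouse"
```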

Creating Bucket and Configuring the HDFS Protocol

1. Create a COS bucket with metadata acceleration enabled. After the bucket is created, you can upload and download files on its File List page.
2. On the left sidebar, click Performance Configuration > Metadata Acceleration to confirm that metadata acceleration is enabled. When you create a bucket with metadata acceleration enabled, you need to complete the corresponding authorization as prompted. After you click Authorize, the HDFS protocol is enabled automatically, and the default bucket mount target information is displayed.
Note:
If the system indicates that the corresponding HDFS file system is not found, submit a ticket.
3. In the HDFS Permission Configuration column, click Add Permission Configuration.
4. Select the VPC where the computing cluster resides in the VPC Name column, enter the IP address or range that needs to be opened in the Node IP column, select Can edit or Can view as the access type, and click Save.
Note:
The HDFS permission configuration is independent of the native COS permission system. If you access a COS bucket over HDFS, we recommend configuring HDFS permissions to authorize servers in a specified VPC, which provides the same permission experience as native HDFS.

Configuring Computing Cluster to Access COS

Environmental dependency

Dependency versions:
chdfs-hadoop-plugin: ≥ v2.7
COSN (hadoop-cos): ≥ v8.1.5
cos_api-bundle: must match the COSN version, as listed in tencentyun/hadoop-cos

EMR environment

The EMR environment is already seamlessly integrated with COS, and you only need to complete the following steps:
1. On an EMR server, run the following commands to check whether the package versions in the service folders of the EMR environment meet the environment dependency requirements:
find / -name "chdfs*"
find / -name "temrfs_hadoop*"
Make sure that the versions of the two JAR packages in the search results meet the above environment dependency requirements.
2. If the chdfs-hadoop-plugin needs to be updated, perform the following steps:
Download the script files for updating the JAR package at:
Place the two scripts in the /root directory of the server, grant execute permission to update_cos_jar.sh, and run the following command:
sh update_cos_jar.sh https://hadoop-jar-beijing-1259378398.cos.ap-beijing.myqcloud.com/hadoop_plugin_network/2.7
Replace the parameter with the bucket in the corresponding region, for example, https://hadoop-jar-guangzhou-1259378398.cos.ap-guangzhou.myqcloud.com/hadoop_plugin_network/2.7 for the Guangzhou region. Perform the above steps on each EMR node until all JAR packages are replaced.
3. If the hadoop-cos package or cos_api-bundle package needs to be updated, perform the following steps:
Replace /usr/local/service/hadoop/share/hadoop/common/lib/hadoop-temrfs-1.0.5.jar with temrfs_hadoop_plugin_network-1.1.jar.
Add the following configuration items to core-site.xml:
emr.temrfs.download.md5=822c4378e0366a5cc26c23c88b604b11
emr.temrfs.download.version=2.7.5-8.1.5-1.0.6
emr.temrfs.download.region=sh
emr.temrfs.tmp.cache.dir=/data/emr/hdfs/tmp/temrfs
In emr.temrfs.download.version, replace 2.7.5 with your Hadoop version and 8.1.5 with the version of the required hadoop-cos package (which must be 8.1.5 or later); the cos_api-bundle version is adapted automatically.
Modify the configuration item fs.cosn.impl=com.qcloud.emr.fs.TemrfsHadoopFileSystemAdapter in core-site.xml.
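For reference, the key=value items above take the following form as core-site.xml properties (a sketch; the version value must match your actual Hadoop and hadoop-cos versions):

```xml
<property>
  <name>emr.temrfs.download.md5</name>
  <value>822c4378e0366a5cc26c23c88b604b11</value>
</property>
<property>
  <name>emr.temrfs.download.version</name>
  <!-- Replace 2.7.5 with your Hadoop version, 8.1.5 with your hadoop-cos version -->
  <value>2.7.5-8.1.5-1.0.6</value>
</property>
<property>
  <name>emr.temrfs.download.region</name>
  <value>sh</value>
</property>
<property>
  <name>emr.temrfs.tmp.cache.dir</name>
  <value>/data/emr/hdfs/tmp/temrfs</value>
</property>
<property>
  <name>fs.cosn.impl</name>
  <value>com.qcloud.emr.fs.TemrfsHadoopFileSystemAdapter</value>
</property>
```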
4. In the EMR console, add the new configuration items fs.cosn.bucket.region and fs.cosn.trsf.fs.ofs.bucket.region to core-site.xml to specify the COS region where the bucket resides, such as ap-shanghai.
Note:
Both fs.cosn.bucket.region and fs.cosn.trsf.fs.ofs.bucket.region are required.
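In core-site.xml form, the two region items look like this (ap-shanghai is an example; use your bucket's actual region):

```xml
<property>
  <name>fs.cosn.bucket.region</name>
  <value>ap-shanghai</value>
</property>
<property>
  <name>fs.cosn.trsf.fs.ofs.bucket.region</name>
  <value>ap-shanghai</value>
</property>
```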
5. Restart resident services such as YARN, Hive, Presto, and Impala.

Self-built Hadoop/CDH environments

1. In a self-built environment, such as one deployed as described in Install CDH, download the three JAR packages that meet the version requirements in the environment dependencies.
2. Place the packages in the classpath of each server in the Hadoop cluster, for example, /usr/local/service/hadoop/share/hadoop/common/lib/ (the path may vary by component).
3. Modify the hadoop-env.sh file under the $HADOOP_HOME/etc/hadoop directory by adding the COSN JAR package to your Hadoop environment variables as follows:
for f in $HADOOP_HOME/share/hadoop/tools/lib/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done
4. Add the following configuration items to core-site.xml in the computing cluster:
<!-- COSN implementation class -->
<property>
  <name>fs.cosn.impl</name>
  <value>org.apache.hadoop.fs.CosFileSystem</value>
</property>

<!-- Bucket region in the format of `ap-guangzhou` -->
<property>
  <name>fs.cosn.bucket.region</name>
  <value>ap-guangzhou</value>
</property>

<!-- Bucket region in the format of `ap-guangzhou` -->
<property>
  <name>fs.cosn.trsf.fs.ofs.bucket.region</name>
  <value>ap-guangzhou</value>
</property>

<!-- Configure how to get `SecretId` and `SecretKey` -->
<property>
  <name>fs.cosn.credentials.provider</name>
  <value>org.apache.hadoop.fs.auth.SimpleCredentialProvider</value>
</property>

<!-- API key information of the account, which can be viewed in the CAM console (https://console.tencentcloud.com/capi) -->
<property>
  <name>fs.cosn.userinfo.secretId</name>
  <value>XXXXXXXXXXXXXXXXXXXXXXXX</value>
</property>

<property>
  <name>fs.cosn.userinfo.secretKey</name>
  <value>XXXXXXXXXXXXXXXXXXXXXXXX</value>
</property>

<!-- Configure the account's `appid` -->
<property>
  <name>fs.cosn.trsf.fs.ofs.user.appid</name>
  <value>125XXXXXX</value>
</property>

<!-- Local temporary directory, used to store temporary files generated during execution -->
<property>
  <name>fs.cosn.trsf.fs.ofs.tmp.cache.dir</name>
  <value>/tmp</value>
</property>
5. Restart resident services such as YARN, Hive, Presto, and Impala.

Verifying environment

After all environment configuration steps are completed, you can verify the environment in the following ways:
Use the Hadoop command line on the client to check whether the mount is working.
Log in to the COS console and check whether the file list of the bucket is consistent with what you see over HDFS.
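As a sketch of the command-line check (with a placeholder bucket name and APPID; the hadoop commands are commented out since they require a configured cluster):

```shell
# Placeholder bucket name and APPID; replace with your own values.
BUCKET="examplebucket"
APPID="1250000000"
ROOT="cosn://${BUCKET}-${APPID}"

# List the bucket root; a successful listing means the mount works:
#   hadoop fs -ls "${ROOT}/"
# Round-trip a small file, then compare with the COS console file list:
#   echo "hello cos" > /tmp/hello.txt
#   hadoop fs -put /tmp/hello.txt "${ROOT}/hello.txt"
#   hadoop fs -cat "${ROOT}/hello.txt"

echo "$ROOT"
```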

Ranger Permission Configuration

By default, the HDFS protocol uses the native POSIX ACL mode for authentication. If you need to use Ranger authentication, configure it as follows:

EMR environment

1. The EMR environment has the COS Ranger service integrated; you can select it when purchasing an EMR cluster.
2. For the HDFS protocol, select Ranger as the authentication mode and configure the corresponding Ranger address information.
Add the new configuration item fs.cosn.credentials.provider and set it to org.apache.hadoop.fs.auth.RangerCredentialsProvider.
If you have any questions about Ranger, see COS Ranger Permission System Solution.

Self-built Hadoop/CDH environments

1. Configure the Ranger service so that COS can be accessed over the HDFS protocol. For more information, see COS Ranger Permission System Solution.
2. For the HDFS protocol, select Ranger as the authentication mode and configure the corresponding Ranger address information.
Add the new configuration item fs.cosn.credentials.provider and set it to org.apache.hadoop.fs.auth.RangerCredentialsProvider.
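In core-site.xml form, the credentials provider item reads:

```xml
<property>
  <name>fs.cosn.credentials.provider</name>
  <value>org.apache.hadoop.fs.auth.RangerCredentialsProvider</value>
</property>
```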

Others

In a big data scenario, you can access a bucket with metadata acceleration enabled over the HDFS protocol as follows:
1. Configure the HDFS mount target information in core-site.xml as instructed in Creating Bucket and Configuring the HDFS Protocol.
2. Access the bucket using components such as Hive, MR, and Spark. For more information, see Mounting a COS Bucket in a Computing Cluster.
3. By default, the native POSIX ACL mode is used for authentication. If you need Ranger authentication, see COS Ranger Permission System Solution.
