
Transparent Acceleration

Last updated: 2024-03-25 16:04:01

Overview

Transparent acceleration speeds up CosN access to COS. CosN implements the standard Hadoop file system interface on top of COS, so that big data computing frameworks (such as Hadoop, Spark, and Tez) can read and write data stored in COS. If you already access COS through CosN, GooseFS's client-side path mapping lets you access GooseFS with the CosN schema without modifying your existing Hive table definitions, which makes it easy to run comparison tests of GooseFS features and performance. If you are a CHDFS user, you can adjust your configuration to access GooseFS with the OFS schema.
The mapping between the CosN schema and the GooseFS schema is described below.
Assuming the UFS path of the warehouse namespace is cosn://examplebucket-1250000000/data/warehouse/, CosN paths map to GooseFS paths as follows:
cosn://examplebucket-1250000000/data/warehouse -> /warehouse/
cosn://examplebucket-1250000000/data/warehouse/folder/test.txt -> /warehouse/folder/test.txt
GooseFS-to-CosN path mapping:
/warehouse -> cosn://examplebucket-1250000000/data/warehouse/
/warehouse/ -> cosn://examplebucket-1250000000/data/warehouse/
/warehouse/folder/test.txt -> cosn://examplebucket-1250000000/data/warehouse/folder/test.txt
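For example, once the warehouse namespace is mounted and the GooseFS-compatible CosN implementation configured in step 5 of the example below is in place, the two listings in this sketch return the same entries (<MASTER_IP> and port 9200 follow the conventions used later on this page):
hadoop fs -ls cosn://examplebucket-1250000000/data/warehouse/
hadoop fs -ls gfs://<MASTER_IP>:9200/warehouse/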
The CosN schema maintains the mapping between GooseFS and UFS CosN paths on the client, and converts CosN-path requests to GooseFS-path requests. The mapping is refreshed periodically. You can modify the refresh interval (default: 60s) using goosefs.user.client.namespace.refresh.interval in the GooseFS configuration file goosefs-site.properties.
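For example, to have clients pick up mapping changes every 30 seconds instead of every 60, you could set the following in goosefs-site.properties (a minimal sketch; the value format follows the 60s default noted above):
# Refresh the client-side CosN-to-GooseFS path mapping every 30 seconds
goosefs.user.client.namespace.refresh.interval=30s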
Note:
If the accessed CosN path cannot be converted into a GooseFS path, the corresponding Hadoop API call will throw an exception.

Example

This example shows how to access GooseFS with the gfs://, cosn://, and ofs:// schemas from the Hadoop command-line tool and Hive.

1. Prepare the data and computing cluster

Create a bucket for testing purposes.
Create a folder named ml-100k in the root directory of the bucket.
Download the ml-100k dataset from GroupLens and upload the u.user file to <Bucket root directory>/ml-100k (see the upload sketch after this list).
Purchase an EMR cluster and configure the Hive component by referring to the EMR documentation.
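If you prefer the command line over the console, the u.user upload above can be done with COSCMD (a minimal sketch, assuming COSCMD is already configured for examplebucket-1250000000 and the extracted dataset sits in the current directory):
# Upload u.user into the ml-100k folder at the bucket root
coscmd upload u.user ml-100k/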

2. Configure the environment

i. Put the GooseFS client package goosefs-1.0.0-client.jar in the share/hadoop/common/lib/ directory.
cp goosefs-1.0.0-client.jar hadoop/share/hadoop/common/lib/
Note:
The configuration update and JAR package should be synced to all nodes in the cluster.
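A minimal sync sketch, assuming a nodes.txt file that lists the other cluster nodes and an identical Hadoop installation path on each of them (both are assumptions):
# Push the GooseFS client JAR and the updated core-site.xml to every node
for host in $(cat nodes.txt); do
  scp hadoop/share/hadoop/common/lib/goosefs-1.0.0-client.jar "$host":hadoop/share/hadoop/common/lib/
  scp hadoop/etc/hadoop/core-site.xml "$host":hadoop/etc/hadoop/
done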
ii. Modify the Hadoop configuration file etc/hadoop/core-site.xml to specify the GooseFS implementation classes.
<property>
    <name>fs.AbstractFileSystem.gfs.impl</name>
    <value>com.qcloud.cos.goosefs.hadoop.GooseFileSystem</value>
</property>
<property>
    <name>fs.gfs.impl</name>
    <value>com.qcloud.cos.goosefs.hadoop.FileSystem</value>
</property>
iii. Run the following Hadoop command to check whether you can access GooseFS using the gfs:// schema, where <MASTER_IP> is the IP address of the GooseFS master node.
hadoop fs -ls gfs://<MASTER_IP>:9200/
iv. Put the GooseFS client JAR package into the hive/auxlib/ directory so that Hive can load it.
cp goosefs-1.0.0-client.jar hive/auxlib/
v. Run the following commands to create a namespace whose UFS schema is CosN and then list the namespaces. Replace examplebucket-1250000000 with your actual COS bucket, SecretId and SecretKey with your actual key information, and ap-guangzhou with your bucket's region:
goosefs ns create ml-100k cosn://examplebucket-1250000000/ml-100k --secret fs.cosn.userinfo.secretId=SecretId --secret fs.cosn.userinfo.secretKey=SecretKey --attribute fs.cosn.bucket.region=ap-guangzhou --attribute fs.cosn.credentials.provider=org.apache.hadoop.fs.auth.SimpleCredentialProvider
goosefs ns ls
vi. Run the following commands to create a namespace whose UFS schema is OFS and then list the namespaces. Replace instance-id with the actual ID of your CHDFS instance, and 1250000000 with your actual APPID:
goosefs ns create ofs-test ofs://instance-id.chdfs.ap-guangzhou.myqcloud.com/ofs-test --attribute fs.ofs.userinfo.appid=1250000000
goosefs ns ls
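To confirm that a namespace is mounted correctly, you can also list its contents through the GooseFS shell (a sketch; assumes the goosefs fs subcommand available in the same CLI as the goosefs ns commands above):
goosefs fs ls /ml-100k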

3. Create a table with the GooseFS schema and query data

Run the following commands:
CREATE DATABASE goosefs_test;

USE goosefs_test;

CREATE TABLE u_user_gfs (
    userid INT,
    age INT,
    gender CHAR(1),
    occupation STRING,
    zipcode STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'gfs://<MASTER_IP>:<MASTER_PORT>/ml-100k';

SELECT SUM(age) FROM u_user_gfs;

4. Create a table with the CosN schema and query data

Run the following commands:
CREATE TABLE u_user_cosn (
    userid INT,
    age INT,
    gender CHAR(1),
    occupation STRING,
    zipcode STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'cosn://examplebucket-1250000000/ml-100k';

SELECT SUM(age) FROM u_user_cosn;

5. Switch the CosN implementation to the GooseFS-compatible one

Modify hadoop/etc/hadoop/core-site.xml:
<property>
    <name>fs.AbstractFileSystem.cosn.impl</name>
    <value>com.qcloud.cos.goosefs.hadoop.CosN</value>
</property>
<property>
    <name>fs.cosn.impl</name>
    <value>com.qcloud.cos.goosefs.hadoop.CosNFileSystem</value>
</property>
Run the Hadoop commands below. If a path cannot be converted into a GooseFS path, an error message is displayed in the command output:
hadoop fs -ls cosn://examplebucket-1250000000/ml-100k/

Found 1 items
-rw-rw-rw- 0 hadoop hadoop 22628 2021-07-02 15:27 cosn://examplebucket-1250000000/ml-100k/u.user

hadoop fs -ls cosn://examplebucket-1250000000/unknown-path
ls: Failed to convert ufs path cosn://examplebucket-1250000000/unknown-path to GooseFs path, check if namespace mounted
Run the Hive query again:
select sum(age) from u_user_cosn;

6. Create a table with the OFS schema and query data

Run the following commands:
CREATE TABLE u_user_ofs (
    userid INT,
    age INT,
    gender CHAR(1),
    occupation STRING,
    zipcode STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'ofs://instance-id.chdfs.ap-guangzhou.myqcloud.com/ofs-test/';

SELECT SUM(age) FROM u_user_ofs;

7. Switch the OFS implementation to the GooseFS-compatible one

Modify hadoop/etc/hadoop/core-site.xml:
<property>
    <name>fs.AbstractFileSystem.ofs.impl</name>
    <value>com.qcloud.cos.goosefs.hadoop.CHDFSDelegateFS</value>
</property>
<property>
    <name>fs.ofs.impl</name>
    <value>com.qcloud.cos.goosefs.hadoop.CHDFSHadoopFileSystem</value>
</property>
Run the Hadoop commands below. If a path cannot be converted into a GooseFS path, an error message is included in the command output:
hadoop fs -ls ofs://instance-id.chdfs.ap-guangzhou.myqcloud.com/ofs-test/

Found 1 items
-rw-r--r-- 0 hadoop hadoop 22628 2021-07-15 15:56 ofs://instance-id.chdfs.ap-guangzhou.myqcloud.com/ofs-test/u.user

hadoop fs -ls ofs://instance-id.chdfs.ap-guangzhou.myqcloud.com/unknown-path
ls: Failed to convert ufs path ofs://instance-id.chdfs.ap-guangzhou.myqcloud.com/unknown-path to GooseFs path, check if namespace mounted
Run the Hive query again:
select sum(age) from u_user_ofs;
