Getting Started

Last updated: 2024-03-25 16:04:01

Overview
Cloud Native Datalake Storage helps you quickly deploy a COS-based data lake storage service on TKE. You can then deploy the big data and AI applications your business requires in a TKE or EKS cluster. You can also use GooseFS to connect to massive distributed storage services.
Concepts and terms
The following lists some basic concepts and terms of Cloud Native Datalake Storage:
Environment: It maintains the mappings between computing clusters and storage services. We recommend you uniformly manage computing clusters and storage services in an environment.
Note: 
 If you need to delete a computing cluster in the TKE console, we recommend you clear the data lake environment first.
Computing cluster: It is a container cluster for running various computing businesses. You can create a TKE or EKS cluster.
Storage service: It refers to COS, which stores different types of data for computing.
Application market: It houses the application components for diversified computing businesses, such as Flink and Spark. You can select an application as needed when creating an environment.
Note: 
 When a container cluster is terminated, applications deployed in it will also be terminated. Proceed with caution.
GooseFS: It manages different underlying buckets and caches frequently accessed data in your computing cluster to accelerate computing.
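The caching idea behind the GooseFS entry above can be illustrated with a toy read-through cache. This is only a conceptual sketch, not GooseFS's actual implementation: the ReadThroughCache class, the LRU eviction policy, and the in-memory fake_bucket are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Toy LRU read-through cache; not GooseFS's actual implementation."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int = 2) -> None:
        self._fetch = fetch          # hypothetical loader standing in for a bucket read
        self._capacity = capacity    # maximum number of cached objects
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> bytes:
        if key in self._items:
            self._items.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        data = self._fetch(key)              # slow path: read from the underlying store
        self._items[key] = data
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
        return data


# In-memory dict standing in for a bucket (illustrative data only).
fake_bucket = {"a.csv": b"1,2", "b.csv": b"3,4", "c.csv": b"5,6"}
cache = ReadThroughCache(fake_bucket.__getitem__, capacity=2)
for name in ["a.csv", "a.csv", "b.csv", "c.csv", "a.csv"]:
    cache.read(name)
print(cache.hits, cache.misses)
```

Repeated reads of the same object are served from the cache, while a read of an evicted or unseen object falls through to the underlying store. That is the general reason a cache close to the computing cluster can speed up jobs that reread the same data.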
You can get some basic information in the following documents:
COS: Getting Started with the Console describes how to create a bucket and upload/download files to/from it.
TKE: Quickly Creating a Standard Cluster describes how to create a TKE or EKS cluster.
Application market: Application Market describes how to create and deploy an application in a TKE cluster.
GooseFS: GooseFS on TKE Cloud Native Practices describes how to manage GooseFS in a cluster.
Prerequisites
Currently, Cloud Native Datalake Storage is available only to allowlisted accounts. To use it, contact us to apply for access.
Cloud Native Datalake Storage relies on TKE and COS and requires permissions to manipulate computing and storage services. If you log in with a sub-account, make sure that the sub-account has at least the following permissions:
Permissions to manipulate COS buckets and files.
Permission to manipulate buckets: If you need to manage bucket configurations, get the corresponding permission from the root account. Generally, this permission doesn't affect data read/write and doesn't require extra configurations. It is sufficient to grant the read permission, such as the QcloudCOSBucketConfigRead policy.
Permission to manipulate files: Generally, computing jobs require reading/writing files from/to buckets. You can get full access from the root account, such as the QcloudCOSDataFullControl policy. Alternatively, the root account can grant the permission based on the principle of least privilege.
Permissions to manage container clusters:
Permission to manipulate clusters: Generally, you need to grant permissions to create and manipulate clusters. For detailed directions, see Using TKE Preset Policy Authorization.
Permission to manage clusters: TKE provides an authorization mode that connects to Kubernetes RBAC, so that sub-account access can be controlled at a fine-grained level. Sub-account operations are also subject to TKE's Kubernetes object-level permission control.
Permission to manipulate the application market: The application market relies on the operations of the TCR service. For detailed directions on how to authorize a sub-account, see TKE Image Registry Resource-level Permission Settings.
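As a rough aid for the COS part of the list above, the following sketch checks whether a sub-account's attached policies include the two example policies named in the text. It assumes you already have the list of attached policy names from somewhere. The missing_policies helper and the sample input are hypothetical. TKE and TCR permissions are not covered because the text does not name specific policies for them.

```python
from typing import Dict, Set

# Policy names below are the two examples named in the text; a root account
# may grant narrower custom policies instead (principle of least privilege).
EXAMPLE_COS_POLICIES: Dict[str, str] = {
    "bucket configuration (read)": "QcloudCOSBucketConfigRead",
    "file read/write": "QcloudCOSDataFullControl",
}


def missing_policies(attached: Set[str]) -> Dict[str, str]:
    """Return the purposes whose example policy is not in `attached`."""
    return {
        purpose: policy
        for purpose, policy in EXAMPLE_COS_POLICIES.items()
        if policy not in attached
    }


# Hypothetical sub-account that only has the read policy attached.
gaps = missing_policies({"QcloudCOSBucketConfigRead"})
print(gaps)
```

A non-empty result only means that the example policy is absent. The sub-account may still hold an equivalent custom policy granted by the root account.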
Directions
The following details the steps, including environment creation, cluster association, computing application deployment, storage service association, and environment management:
1. Log in to the COS console.
2. On the left sidebar, click Cloud Native Datalake Storage.
3. On the Cloud Native Datalake Storage page, you can see the capability overview and deployment guide.
The deployment guide is displayed by default. You can click Collapse Guide in the top-right corner to hide it.
The environment list page allows for search. You can manipulate an existing environment as follows:
Click Environment Name to enter the environment details page and manage the environment.
Click Associated Cluster to enter the cluster details page in the TKE console.
Click Associated Bucket to enter the bucket page and view the file information.
4. Click Create Environment.
Before creating an environment, select the target container computing cluster and configure the following parameters:
Environment Name: It can contain up to 63 characters and must be globally unique.
Region: Select the region of the container cluster.
Cluster Type: It can be TKE or EKS. If there are no clusters in the current region, you can click Create Container Cluster to create one in the TKE console.
Cluster: Select the cluster used to deploy computing applications and run computing jobs. The available clusters depend on the region and cluster type specified above.
Computing application: It indicates the application service required for running a computing job. Currently, Flink, k8s-big-data-suite, colocation, Airflow, PyTorch, and spark-operator applications are supported by default. You can select one or multiple applications as needed. If you need to deploy a custom application, you can go to the TKE console for deployment on your own.
5. Click Next to enter the Bucket Configuration page.
You can configure different buckets for the computing cluster on this page. By default, GooseFS is available for managing buckets and caching data in the local nodes of the computing cluster for computing acceleration. You need to configure the following parameters:
Region: It is the region of the computing cluster by default and cannot be edited. If there are no available buckets for computing jobs in the region, you can click Create Bucket to create one.
Bucket: You can select multiple buckets in the specified region. You can also mount only a specified file directory of a bucket.
Note: 
 If you mount the entire bucket, you can ignore the second input box; if you need to specify a directory, enter the directory name in the format of prefix/*.
Enable GooseFS: GooseFS accelerates computing jobs. It is enabled by default and cannot be modified. No extra fees will be incurred.
6. Click Next to enter the GooseFS Application Configuration page.
In a data lake environment, all computing jobs access COS through GooseFS. Therefore, you need to provide GooseFS with a secretId and secretKey that have permission to access the specified buckets.
7. Click Next and confirm the information.
8. To modify the configuration items, click Modify. After confirming that everything is correct, click Create Environment. Then, go back to the environment list and refresh it, and you can see the newly created environment.
To delete an environment, click Delete in the environment list and confirm the deletion in the pop-up window.
9. Click the environment name in the list to enter the Basic Information page.
Three views are available to describe the environment, computing cluster, and bucket information.
Environment information: It displays the environment's name, region, associated computing cluster, storage service, and creation time.
Computing cluster information: It displays the computing cluster's name, number of nodes, and usage of CPU, memory, and GPU. You can click View details to enter the TKE console and view computing cluster details.
Bucket information: It displays the name, file URL, and GooseFS status of the bucket associated with the computing cluster. You can click View details to view the details of the storage service.
At this point, you have created a data lake environment.
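The parameter rules from steps 4 and 5 can be summarized in a small local check. This is a minimal sketch and not a COS or TKE API: the EnvironmentDraft and BucketMount classes are hypothetical, and the region, cluster, and bucket values are placeholders. Global uniqueness of the environment name cannot be checked locally, so it is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

CLUSTER_TYPES = {"TKE", "EKS"}
# Applications listed in step 4 as supported by default.
DEFAULT_APPS = {"Flink", "k8s-big-data-suite", "colocation",
                "Airflow", "PyTorch", "spark-operator"}


@dataclass
class BucketMount:
    bucket: str
    directory: Optional[str] = None   # None means mount the entire bucket

    def validate(self) -> List[str]:
        d = self.directory
        # A directory must look like "prefix/*" with a non-empty prefix.
        if d is not None and (not d.endswith("/*") or len(d) <= 2):
            return [f"{self.bucket}: directory must use the prefix/* format"]
        return []


@dataclass
class EnvironmentDraft:
    name: str
    region: str
    cluster_type: str
    cluster: str
    applications: List[str] = field(default_factory=list)
    mounts: List[BucketMount] = field(default_factory=list)

    def validate(self) -> List[str]:
        errors: List[str] = []
        if not 1 <= len(self.name) <= 63:
            errors.append("name must contain 1 to 63 characters")
        if self.cluster_type not in CLUSTER_TYPES:
            errors.append("cluster type must be TKE or EKS")
        for app in self.applications:
            if app not in DEFAULT_APPS:
                errors.append(f"{app}: not a default application; deploy it in the TKE console")
        for mount in self.mounts:
            errors.extend(mount.validate())
        return errors


# Placeholder values for illustration only.
draft = EnvironmentDraft(
    name="demo-env", region="ap-guangzhou", cluster_type="TKE", cluster="cls-example",
    applications=["Flink", "my-custom-app"],
    mounts=[BucketMount("examplebucket-1250000000", "logs/*"),
            BucketMount("examplebucket-1250000000", "logs")],
)
for problem in draft.validate():
    print(problem)
```

Running such a check before you fill in the console form can catch an over-long name or a directory that is not in the prefix/* format. The console remains the authority on what is accepted.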
