tencent cloud

Elastic MapReduce

Release Notes and Announcements
Release Notes
Announcements
Security Announcements
Product Introduction
Overview
Strengths
Architecture
Features
Use Cases
Constraints and Limits
Technical Support Scope
Product release
Purchase Guide
EMR on CVM Billing Instructions
EMR on TKE Billing Instructions
EMR Serverless HBase Billing Instructions
Getting Started
EMR on CVM Quick Start
EMR on TKE Quick Start
EMR on CVM Operation Guide
Planning Cluster
Administrative rights
Configuring Cluster
Managing Cluster
Managing Service
Monitoring and Alarms
TCInsight
EMR on TKE Operation Guide
Introduction to EMR on TKE
Configuring Cluster
Cluster Management
Service Management
Monitoring and Ops
Application Analysis
EMR Serverless HBase Operation Guide
EMR Serverless HBase Product Introduction
Quotas and Limits
Planning an Instance
Managing an Instance
Monitoring and Alarms
Development Guide
EMR Development Guide
Hadoop Development Guide
Spark Development Guide
Hbase Development Guide
Phoenix on Hbase Development Guide
Hive Development Guide
Presto Development Guide
Sqoop Development Guide
Hue Development Guide
Oozie Development Guide
Flume Development Guide
Kerberos Development Guide
Knox Development Guide
Alluxio Development Guide
Kylin Development Guide
Livy Development Guide
Kyuubi Development Guide
Zeppelin Development Guide
Hudi Development Guide
Superset Development Guide
Impala Development Guide
Druid Development Guide
TensorFlow Development Guide
Kudu Development Guide
Ranger Development Guide
Kafka Development Guide
Iceberg Development Guide
StarRocks Development Guide
Flink Development Guide
JupyterLab Development Guide
MLflow Development Guide
Practical Tutorial
Practice of EMR on CVM Ops
Data Migration
Practical Tutorial on Custom Scaling
API Documentation
History
Introduction
API Category
Cluster Resource Management APIs
Cluster Services APIs
User Management APIs
Data Inquiry APIs
Scaling APIs
Configuration APIs
Other APIs
Serverless HBase APIs
YARN Resource Scheduling APIs
Making API Requests
Data Types
Error Codes
FAQs
EMR on CVM
Service Level Agreement
Contact Us

HDFS Federation Management Development Guide

PDF
포커스 모드
폰트 크기
마지막 업데이트 시간: 2025-05-26 15:28:44

Type Selection for HDFS Federation Management

Note:
The HDFS federation management feature is currently made available through an allowlist. To use it, submit a ticket for application.

HDFS federation management architecture

In order to scale the name service horizontally, HDFS federation uses multiple independent NameNodes/nameSpaces. The DataNodes are used as common storage for blocks by all the NameNodes. Each DataNode registers with all the NameNodes in the cluster. DataNodes send periodic heartbeats and block reports. They also handle commands from the NameNodes.

For more information, see HDFS Federation.

How ViewFs federation works

To make it easier to manage multiple namespaces, HDFS ViewFs federation uses the classic client-side mount table (a feature of open-source ViewFs). Different paths of applications are mapped to specific NameServices on the client, so as to achieve storage separation or performance separation. This mechanism distributes files and loads but requires more human intervention (clear planning) to implement ideal load balancing.

How router-based federation works

Router-based federation provides a software layer to manage multiple namespaces. Compared with ViewFs which maintains the mount table information on the client, it is truly transparent to the client, because the mapping information will be additionally saved and persisted.

HDFS Federation Management Configuration

You need to consider two factors: the required number of NameServices and the mount method of the business data directory.

Planning principles for the number of NameServices

1. One NameService can store up to 100 million files securely. After this limit is exceeded, its access speed and read/write throughput will be greatly reduced. Therefore, if you expect to store more than 100 million files, you need to add an extra NameService.
2. For an application that reads/writes HDFS heavily, it is necessary to allocate a separate NameService to process its requests. This ensures that the data in the application can exclusively use all the processing capabilities of the NameService and avoids the impact on other applications.
3. For an application that demands high reliability, you can allocate an independent NameService to it, because it may be prevented from accessing HDFS and thus rendered unstable if the reads/writes of other applications are too frequent.

Planning principles for the mount method of business data directories

1. Mount the data directories of the services associated with the business data to the same NameService, because cross-NameService file reads/writes are slow and may reduce the application's file storage performance.
2. For a service that involves a large amount of data but is not associated with other services, you can directly use only one NameService.
3. For a small business, we recommend you mount its directories directly to the default NameService (HDFS${clusterid}), so as to eliminate the need of data migration and simplify the federation configuration.
4. We recommend you map only first-level directories to NameServices in order to reduce the configuration complexity.

Scheme Comparison

ViewFs federation configuration method

1. Directly use ViewFs Strengths: Unified views are provided, and different applications can be used in the same way. Shortcomings: Changes to ViewFs mount tables require all applications using the cluster to read the latest mount point synchronously.
2. Specify the NameService
Strengths: You don't need to modify the configurations of all applications. Shortcomings: You need to specify directories for different components and applications, which makes the coupling of component paths more complicated.

Router-based federation configuration method

1. Directly use router-based federation
Strengths: Unified views are provided, and different applications can be used in the same way. Changes to router-based federation mount tables take effect directly, so applications using the cluster don't need to update the mount tables synchronously. Shortcomings: The DFSRouter forwarding layer is added, which slightly affects the performance.
2. Specify the NameService
Strengths: You don't need to modify the configurations of all applications. Shortcomings: You need to specify directories for different components and applications, which makes the coupling of component paths more complicated.

도움말 및 지원

문제 해결에 도움이 되었나요?

피드백