Tencent Cloud

Elastic MapReduce


Cross-AZ High Availability

Last updated: 2024-10-30 10:34:18

Overview of Rack Awareness

Rack awareness in a Hadoop cluster refers to the technique where Hadoop organizes nodes according to the network topology and prioritizes task scheduling and data storage between nodes within the same rack. This improves cluster performance and reliability. It is supported by two components: HDFS and YARN. HDFS achieves high reliability and availability by distributing replicas of data blocks across different racks. YARN improves task execution efficiency and performance by assigning tasks to nodes or containers that are physically closer. Since Hadoop cannot automatically detect the network topology of nodes, it provides the following methods to enable rack awareness:
Implement the DNSToSwitchMapping interface in a custom Java class, and specify the class name in the core-site.xml configuration file through the net.topology.node.switch.mapping.impl parameter.
Map the topology with a script, and specify the script file through the net.topology.script.file.name parameter in the core-site.xml configuration file (see the configuration sketch right after this list).
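For reference, either method is enabled through core-site.xml. The snippet below is a minimal sketch rather than a configuration shipped with the product: the class name com.example.CustomTopologyMapping is a hypothetical placeholder, while the script path matches the one used in the steps below. Configure only one of the two properties.

<property>
    <name>net.topology.node.switch.mapping.impl</name>
    <value>com.example.CustomTopologyMapping</value>
</property>
<property>
    <name>net.topology.script.file.name</name>
    <value>/usr/local/service/hadoop/etc/hadoop/RackAware.py</value>
</property>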
Below is an example of configuring rack awareness policies based on a script. The basic method involves mapping AZ subnets to rack information.
Note
Rack awareness should be set up on a cluster with a cross-AZ deployment architecture (for cluster creation, see Cross-AZ Cluster Deployment); it is not applicable to single-AZ clusters.

Configuring Rack Awareness Policy Based on Scripts

1. Prepare a cross-AZ EMR cluster. Log in to the EMR Console, click the cluster ID/Name to enter the cluster details page, and under Instance Information > Deployment Information, confirm the VPC network information of the cluster and the subnets corresponding to different AZs. Then, in VPC > Subnet, obtain the CIDR and AZ mapping information for each subnet.
Note
Both the VPC name and subnet name may be duplicated, so you need to further verify the information in Instance Information under Cluster Resources.
2. Prepare the rack awareness script RackAware.py based on the subnet CIDR and AZ mapping information.
Note
This example script uses Python 2 at the /usr/bin/python path and depends on the third-party IPy module. Replace each #CIDR# placeholder in the script with an actual subnet CIDR.
#!/usr/bin/python

import re
import sys

import IPy  # third-party module; install with "pip install IPy" if missing

# Rack returned for any node that does not match a known subnet CIDR.
DEFAULT_RACK = "default-rack"

# Map each AZ subnet CIDR to a rack name.
# Replace #CIDR# with the actual subnet CIDRs obtained in step 1, e.g. 10.0.0.0/24.
cidrToRack = {
    '#CIDR#': 'rack-1',
    '#CIDR#': 'rack-2',
    '#CIDR#': 'rack-3'
}

# Hadoop passes one or more node names (IP addresses or hostnames) as arguments
# and expects one rack path per argument on standard output.
for name in sys.argv[1:]:
    rack = DEFAULT_RACK
    ips = re.findall(r'[0-9]+(?:\.[0-9]+){3}', name)
    if len(name) > 0 and len(ips) > 0:
        ip = ips[0]
        for cidr in cidrToRack.keys():
            if ip in IPy.IP(cidr):
                rack = cidrToRack[cidr]
                break
    print "/{0}".format(rack)
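Before wiring the script into the cluster, you can sanity-check it locally on the NameNode. The commands below are a minimal sketch: the IP address 10.0.0.8 is hypothetical, and the expected output assumes it falls inside a CIDR that is mapped to rack-1 in the script.

pip install IPy                   # install the third-party IPy dependency if missing
python RackAware.py 10.0.0.8      # expected output: /rack-1
python RackAware.py unknown-host  # expected output: /default-rack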
3. In Cluster Service > HDFS > Configuration Management, add the RackAware.py file and update the core-site.xml file on the NameNode nodes with the configuration item net.topology.script.file.name=/usr/local/service/hadoop/etc/hadoop/RackAware.py.
4. In the console, restart NameNode and ResourceManager; a quick check of the configuration item is sketched below.
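After the restart, you can confirm that core-site.xml on the node now carries the script path. The check below is a minimal sketch using the standard hdfs getconf command, run as the hadoop user on the NameNode; the expected output simply echoes the path configured in step 3.

hdfs getconf -confKey net.topology.script.file.name
# Expected output: /usr/local/service/hadoop/etc/hadoop/RackAware.py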

Viewing Cluster Rack Information

HDFS service: Log in to the NameNode and run the command hdfs dfsadmin -printTopology as the hadoop user to print the rack assignment of each DataNode.
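The output of hdfs dfsadmin -printTopology lists the DataNodes grouped under each rack. The sample below is only an illustration with hypothetical addresses and hostnames; the exact fields depend on the Hadoop version.

Rack: /rack-1
   10.0.0.8:9866 (10.0.0.8)
Rack: /rack-2
   10.0.16.8:9866 (10.0.16.8)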

YARN service: Log in to the ResourceManager WebUI; the node list shows the rack assigned to each NodeManager.