
Cluster Types

Last updated: 2026-01-14 15:26:39
Elastic MapReduce (EMR) supports multiple cluster types with corresponding application scenarios, and defines five node types: master, core, task, common, and router. Each cluster type and use case supports different node types, node counts, and deployed services. You can select the most appropriate cluster type and use case for your business needs when creating a cluster.

Cluster Type Description

Hadoop Cluster

Use case: Default
Description: Based on open-source Hadoop and the components that form the Hadoop ecosystem, it provides big data solutions for massive data storage, offline/real-time data analysis, streaming data computing, and machine learning.
Node deployment:
Master node: It is a management node that ensures the scheduling of the cluster works properly. Processes such as NameNode, ResourceManager, and HMaster are deployed here. The number of master nodes is 1 in non-HA mode and 2 in HA mode.
Note: If Kudu is deployed, the cluster supports only the HA mode, and there are 3 master nodes.
Core node: It is a compute and storage node. All your data in HDFS is stored in core nodes. Therefore, core nodes cannot be scaled in once scaled out to ensure data security. Processes such as DataNode, NodeManager, and RegionServer are deployed here. The number of core nodes is ≥ 2 in non-HA mode and ≥ 3 in HA mode.
Task node: It is a compute-only node and does not store any data. The data it computes on comes from a core node or COS. Therefore, task nodes are often elastic nodes and can be scaled in or out as needed. Processes such as NodeManager and the Presto worker are deployed here. The number of task nodes can be changed at any time to scale the cluster, with a minimum value of 0.
Common node: It provides data sharing and syncing and HA fault tolerance services for the master nodes in an HA cluster. Distributed coordinator components such as ZooKeeper and JournalNode are deployed here. The number of common nodes is 0 in non-HA mode and ≥ 3 in HA mode.
Router node: It is used to share the load of a master node or as the task submitter of the cluster. It can be scaled in or out at any time. Hadoop packages, including software programs and processes such as Hive, Hue, and Spark, are deployed here. The number of router nodes can be changed at any time, with a minimum value of 0.
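The node-count rules above can be summarized in a small validation sketch. The function below is illustrative only (it is not part of any EMR API); it encodes the Hadoop-type cluster constraints exactly as stated: 1 master in non-HA mode versus 2 in HA mode (3 when Kudu is deployed, which forces HA), ≥ 2 or ≥ 3 core nodes, 0 or ≥ 3 common nodes, and free-floating task/router counts with a minimum of 0.

```python
# Hypothetical helper that checks a node plan against the Hadoop cluster
# rules described above. Names and signature are illustrative, not an EMR API.

def validate_hadoop_plan(ha, master, core, task, common, router, kudu=False):
    """Return a list of rule violations for a Hadoop-type cluster plan."""
    errors = []
    if kudu:
        # With Kudu deployed, only HA mode is supported and 3 masters are required.
        if not ha or master != 3:
            errors.append("Kudu requires HA mode with exactly 3 master nodes")
    elif ha:
        if master != 2:
            errors.append("HA mode requires 2 master nodes")
    elif master != 1:
        errors.append("non-HA mode requires 1 master node")
    # Core nodes store HDFS data: >= 2 non-HA, >= 3 HA.
    if core < (3 if ha else 2):
        errors.append("too few core nodes")
    # Common nodes exist only in HA mode, and then at least 3 of them.
    if ha and common < 3:
        errors.append("HA mode requires at least 3 common nodes")
    if not ha and common != 0:
        errors.append("non-HA mode uses 0 common nodes")
    # Task and router nodes are elastic, minimum 0.
    if task < 0 or router < 0:
        errors.append("task and router node counts must be >= 0")
    return errors
```

For example, a non-HA plan with 1 master, 2 core, and no other nodes passes, while 2 masters without HA is flagged.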
Use case: ZooKeeper
Description: It is suitable for building a distributed, highly available coordination service for large clusters.
Node deployment:
Common node: Distributed coordinator components such as ZooKeeper are deployed here. The number of common nodes must be an odd number and at least three. This cluster type supports only HA mode.
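The odd-count rule can be expressed as a one-line check (an illustrative sketch, not an EMR API). ZooKeeper requires a majority quorum, so an ensemble of n nodes tolerates floor((n-1)/2) failures; an even count adds cost without improving fault tolerance, which is why an odd number of at least three is required.

```python
# Sketch of the ZooKeeper cluster rule above: the number of common nodes
# must be odd and at least three (this cluster type is HA-only).

def valid_zookeeper_common_count(n: int) -> bool:
    return n >= 3 and n % 2 == 1
```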
Use case: HBase
Description: It is suitable for storing massive amounts of unstructured or semi-structured data. It provides a high-reliability, high-performance, column-oriented, scalable distributed storage system that supports real-time data reads/writes.
Node deployment:
Master node: It is a management node that ensures the scheduling of the cluster works properly. Processes such as HMaster, HBaseThrift, NameNode, and ResourceManager are deployed here. The number of master nodes is 1 in non-HA mode and 2 in HA mode.
Core node: It is a compute and storage node. All your data in HDFS is stored in core nodes. Therefore, core nodes cannot be scaled in once scaled out to ensure data security. Processes such as DataNode, NodeManager, and RegionServer are deployed here. The number of core nodes is ≥ 2 in non-HA mode and ≥ 3 in HA mode.
Task node: It is a node for computing only and does not store any data. The computed data comes from a core node or COS. Therefore, task nodes are often elastic nodes and can be scaled in or out as needed. Processes such as NodeManager are deployed here. The number of task nodes can be changed at any time to scale the cluster, with a minimum value of 0.
Common node: It provides data sharing and syncing and HA fault tolerance services for the master nodes in an HA cluster. Distributed coordinator components such as ZooKeeper and JournalNode are deployed here. The number of common nodes is 0 in non-HA mode and ≥ 3 in HA mode.
Router node: It is used to share the load of a master node or as the task submitter of the cluster. It can be scaled in or out at any time. The number of router nodes can be changed at any time, with a minimum value of 0.
Use case: Trino (Presto)
Description: It provides an open-source distributed SQL query engine for fast querying and analysis of massive amounts of data. It is suitable for interactive analytical queries.
Node deployment:
Master node: It is a management node that ensures the scheduling of the cluster works properly. Processes such as Trino-Coordinator and NameNode are deployed here. The number of master nodes is 1 in non-HA mode and 2 in HA mode.
Core node: It is a compute and storage node. If the HDFS service is deployed, all data in HDFS is stored on core nodes. Therefore, core nodes cannot be scaled in once scaled out, to ensure data security. Processes such as DataNode and NodeManager are deployed here. The number of core nodes is ≥ 2 in non-HA mode and ≥ 3 in HA mode.
Task node: It is a compute-only node and does not store any data. The data it computes on comes from a core node or COS. Therefore, task nodes are often elastic nodes and can be scaled in or out as needed. Processes such as NodeManager and the Trino worker are deployed here. The number of task nodes can be changed at any time to scale the cluster, with a minimum value of 0.
Common node: Common nodes provide data sharing and syncing and HA fault tolerance services for the master nodes in HA mode of services including HDFS and YARN. Distributed coordinator components such as ZooKeeper and JournalNode are deployed here. The number of common nodes is 0 in non-HA mode and ≥ 3 in HA mode.
Router node: It is used to share the load of a master node or as the task submitter of the cluster. It can be scaled in or out at any time. The number of router nodes can be changed at any time, with a minimum value of 0.
Use case: Kudu
Description: It provides a distributed and scalable columnar storage manager that supports random reads/writes and OLAP analysis, for processing frequently updated data.
Node deployment:
Master node: It is a management node that ensures the scheduling of the cluster works properly. Processes such as NameNode and ResourceManager are deployed here. The number of master nodes is 1 in non-HA mode and 2 in HA mode.
Core node: It is a compute and storage node. All your data in HDFS is stored in core nodes. Therefore, core nodes cannot be scaled in once scaled out to ensure data security. The number of core nodes is ≥ 2 in non-HA mode and ≥ 3 in HA mode.
Task node: It is a node for computing only and does not store any data. The computed data comes from a core node or COS. Therefore, task nodes are often elastic nodes and can be scaled in or out as needed. The number of task nodes can be changed at any time to scale the cluster, with a minimum value of 0.
Common node: It provides data sharing and syncing and HA fault tolerance services for the master nodes in an HA cluster. Distributed coordinator components such as ZooKeeper and JournalNode are deployed here. The number of common nodes is 0 in non-HA mode and ≥ 3 in HA mode.
Router node: It is used to share the load of a master node or as the task submitter of the cluster. It can be scaled in or out at any time. The number of router nodes can be changed at any time, with a minimum value of 0.

Kafka Cluster

Use case: Default
Description: It is a distributed, partitioned, multi-replica, multi-subscriber message processing system based on ZooKeeper coordination. It is suitable for asynchronous processing, message communication, and streaming data receiving and distribution.
Node deployment:
Core node: It is the data storage node. Processes such as the Kafka Broker are deployed here. The number of core nodes is ≥ 1 in non-HA mode and ≥ 2 in HA mode.
Common node: It provides data sharing and syncing and HA fault tolerance services for the core nodes in an HA cluster. The number of common nodes is 0 in non-HA mode and ≥ 3 in HA mode.

RSS Cluster

Use case: Default
Description: It stores shuffle data on remote servers for Spark applications.
Node deployment:
Master node: It is the management node for the entire cluster and ensures cluster scheduling works properly. The Coordinator process is deployed here; it collects shuffle server load through a heartbeat mechanism and assigns suitable shuffle servers to jobs based on this information. The number of master nodes is 1 in non-HA mode and 2 in HA mode.
Core node: It is a compute and storage node. Shuffle server roles are deployed here; they receive shuffle data, merge it, and write it to storage, and later read the shuffle data back from disk. The number of core nodes is 1 in non-HA mode and 2 in HA mode.
Router node: It is used to share the load of a master node or serve as the task submitter of the cluster. It can be scaled out or in at any time. The number of router nodes can be changed at any time, with a minimum value of 0.
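If the RSS cluster runs Apache Uniffle (one common remote shuffle service implementation; an assumption, since the text does not name the software), a Spark job is typically pointed at the coordinators with client properties along these lines. The property names, hostnames, and coordinator port are illustrative and depend on the shuffle service actually deployed:

```properties
# Hypothetical Spark client settings for a Uniffle-style remote shuffle service.
# Replace the shuffle manager class and endpoints with your deployment's values.
spark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager
# Comma-separated coordinator quorum (the master nodes described above)
spark.rss.coordinator.quorum=master-1:19999,master-2:19999
```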

StarRocks Cluster

StarRocks is an extremely fast, unified OLAP database built on fully vectorized execution. It is suitable for many data analysis scenarios, such as multidimensional, real-time, and high-concurrency analysis.
Use case: Storage-compute integration
Description: Data is stored on the cluster's local core nodes. Cloud SSD or NVMe SSD local disks can be used as the storage medium, providing high data read/write efficiency. This is suitable for scenarios with high query performance requirements.
Node deployment:
Master node: It is a frontend module that provides the WebUI feature. Processes such as FE Follower and Broker are deployed here. The number of master nodes is no less than 1 in non-HA mode and no less than 3 in HA mode. It cannot be scaled in.
Core node: It is a backend module that mainly provides the data storage feature. Processes such as BE and Broker are deployed here. The number of core nodes is no less than 3.
Task node: It is a compute node. The computed data comes from a core node or Cloud Object Storage (COS). It can provide local data cache services and can be scaled out or in at any time. Processes such as Compute Node are deployed here. The number of task nodes can be changed at any time to scale the cluster. The minimum number of task nodes can be set to 0 when storage and computing are integrated.
Router node: It is a frontend module that helps achieve high read/write availability. Processes such as FE Observer and Broker are deployed here. Router nodes can be scaled out.
Use case: Storage-compute separation
Description: Data is stored in COS, and compute nodes can cache hot data locally. This is suitable for business scenarios that are sensitive to storage costs and have relatively relaxed query efficiency requirements.
Node deployment:
Master node: It is a frontend module that provides the WebUI feature. Processes such as FE Follower and Broker are deployed here. The number of master nodes is no less than 1 in non-HA mode and no less than 3 in HA mode. It cannot be scaled in.
Task node: It is a compute node. The computed data comes from COS. It can provide local data cache services and can be scaled out or in at any time. Processes such as Compute Node are deployed here. The number of task nodes can be changed at any time to scale the cluster. The minimum number of task nodes can be set to 3 when storage and computing are separated.
Router node: It is a frontend module that helps achieve high read/write availability. Processes such as FE Observer and Broker are deployed here. Router nodes can be scaled out.
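The two StarRocks deployment modes differ mainly in where the minimum node counts apply, which can be sketched as below (names are illustrative, not an EMR API): both modes need at least 1 master (FE Follower) in non-HA mode or 3 in HA mode; integrated mode requires at least 3 core (BE) nodes with task nodes optional, while separated mode stores data in COS and requires at least 3 task (Compute Node) nodes instead.

```python
# Hypothetical check of the StarRocks node rules described above.

def validate_starrocks_plan(mode, ha, master, core, task):
    """mode is 'integrated' (storage-compute integration) or 'separated'."""
    errors = []
    min_master = 3 if ha else 1
    if master < min_master:
        errors.append(f"need at least {min_master} master (FE Follower) nodes")
    if mode == "integrated":
        # Data lives on local core nodes (BE processes).
        if core < 3:
            errors.append("integrated mode needs at least 3 core (BE) nodes")
        if task < 0:
            errors.append("task node count must be >= 0")
    else:
        # Data lives in COS; task nodes (Compute Node) do the work.
        if task < 3:
            errors.append("separated mode needs at least 3 task (Compute Node) nodes")
    return errors
```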
