Tencent Cloud


Cluster Types

Last updated: 2022-08-12 14:59:18

    EMR supports six cluster types, each with its own use cases, and defines five node types. The supported node types, number of deployed nodes, and deployed services vary by cluster type and use case. When creating a cluster, select the cluster type and use case that best fit your business needs.

    Note:

    ClickHouse, Doris, and Kafka cluster types are not available by default. To use them, submit a ticket to apply for access.

    Cluster Type Description

    Hadoop cluster

    Use cases and node deployment
    Default use case: Built on open-source Hadoop and its ecosystem components, it provides big data solutions for massive data storage, offline and real-time data analysis, stream computing, and machine learning.
    • Master node: It is a management node that keeps cluster scheduling running properly. Processes such as NameNode, ResourceManager, and HMaster are deployed here. There is 1 master node in non-HA mode and there are 2 in HA mode.
      Note: If Kudu is among the deployed components, the cluster supports only HA mode and has 3 master nodes.
    • Core node: It is a compute and storage node. All of your HDFS data is stored on core nodes, so to keep that data safe, core nodes cannot be scaled in once they have been scaled out. Processes such as DataNode, NodeManager, and RegionServer are deployed here. At least 2 core nodes are required in non-HA mode and at least 3 in HA mode.
    • Task node: It is a pure compute node that stores no data; the data it processes comes from core nodes or COS. It is therefore commonly used as an elastic node, and task nodes can be scaled in or out at any time. Processes such as NodeManager and PrestoWork are deployed here. The minimum number of task nodes is 0.
    • Common node: It provides data sharing, synchronization, and HA fault tolerance for the master nodes in an HA cluster. Distributed coordination components such as ZooKeeper and JournalNode are deployed here. There are 0 common nodes in non-HA mode and at least 3 in HA mode.
    • Router node: It shares the load of the master nodes or serves as the node from which tasks are submitted to the cluster. Hadoop packages, including components such as Hive, Hue, and Spark, are deployed here. Router nodes can be scaled in or out at any time, and the minimum number is 0.
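    You can check the node-count rules for the Hadoop default use case before creating a cluster. The following minimal Python sketch is not an EMR API. `NodeCounts` and `validate_hadoop_default` are hypothetical names, and the rules come only from the list above, including the Kudu note.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeCounts:
    # Hypothetical container for a planned cluster's node counts by type.
    master: int
    core: int
    task: int = 0
    common: int = 0
    router: int = 0


def validate_hadoop_default(counts: NodeCounts, ha: bool, with_kudu: bool = False) -> List[str]:
    """Return rule violations for the Hadoop default use case; an empty list means valid."""
    errors = []
    if with_kudu and not ha:
        errors.append("Kudu requires HA mode")

    if ha:
        expected_master = 3 if with_kudu else 2  # Kudu raises the HA master count to 3
        min_core, min_common = 3, 3
    else:
        expected_master = 1
        min_core, min_common = 2, 0

    if counts.master != expected_master:
        errors.append(f"master nodes must be {expected_master}, got {counts.master}")
    if counts.core < min_core:
        errors.append(f"core nodes must be >= {min_core}, got {counts.core}")
    if ha and counts.common < min_common:
        errors.append(f"common nodes must be >= {min_common} in HA mode, got {counts.common}")
    if not ha and counts.common != 0:
        errors.append(f"common nodes must be 0 in non-HA mode, got {counts.common}")
    if counts.task < 0 or counts.router < 0:
        errors.append("task and router node counts cannot be negative")
    return errors


if __name__ == "__main__":
    # A typical HA layout: 2 master, 3 core, 3 common, no task or router nodes.
    print(validate_hadoop_default(NodeCounts(master=2, core=3, common=3), ha=True))  # prints []
```

    Returning a list of messages, rather than raising on the first problem, reports every violated rule at once.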
    ZooKeeper use case: It is suitable for building a distributed, highly available coordination service for large clusters.
    • Common node: Distributed coordination components such as ZooKeeper are deployed here. The number of common nodes must be odd and at least 3. Common nodes support only HA mode.
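    The odd-count requirement reflects how ZooKeeper works: an ensemble stays available only while a strict majority (quorum) of its servers is running. The sketch below uses hypothetical function names. It computes how many common-node failures an ensemble of a given size can tolerate, and it shows that a fourth or sixth node adds no fault tolerance.

```python
def quorum_size(servers: int) -> int:
    """Smallest strict majority of an ensemble of the given size."""
    if servers < 1:
        raise ValueError("an ensemble needs at least one server")
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """How many servers can fail while a majority quorum is still reachable."""
    return servers - quorum_size(servers)


def is_valid_common_node_count(servers: int) -> bool:
    # Rule stated on this page for the ZooKeeper use case: odd and at least 3.
    return servers >= 3 and servers % 2 == 1


if __name__ == "__main__":
    # Columns: ensemble size, quorum size, tolerated failures, allowed by the rule.
    for n in range(3, 8):
        print(n, quorum_size(n), tolerated_failures(n), is_valid_common_node_count(n))
```

    For 3, 4, 5, 6, and 7 nodes, the tolerated failures are 1, 1, 2, 2, and 3. An even count therefore adds cost without adding resilience.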
    HBase use case: It is suitable for storing massive amounts of unstructured or semi-structured data. It provides a highly reliable, high-performance, column-oriented, scalable distributed storage system that supports real-time reads and writes.
    • Master node: It is a management node that keeps cluster scheduling running properly. Processes such as NameNode, ResourceManager, and HMaster are deployed here. There is 1 master node in non-HA mode and there are 2 in HA mode.
    • Core node: It is a compute and storage node. All of your HDFS data is stored on core nodes, so to keep that data safe, core nodes cannot be scaled in once they have been scaled out. Processes such as DataNode, NodeManager, and RegionServer are deployed here. At least 2 core nodes are required in non-HA mode and at least 3 in HA mode.
    • Task node: It is a pure compute node that stores no data; the data it processes comes from core nodes or COS. It is therefore commonly used as an elastic node, and task nodes can be scaled in or out at any time. Processes such as NodeManager are deployed here. The minimum number of task nodes is 0.
    • Common node: It provides data sharing, synchronization, and HA fault tolerance for the master nodes in an HA cluster. Distributed coordination components such as ZooKeeper and JournalNode are deployed here. There are 0 common nodes in non-HA mode and at least 3 in HA mode.
    • Router node: It shares the load of the master nodes or serves as the node from which tasks are submitted to the cluster. Router nodes can be scaled in or out at any time, and the minimum number is 0.
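    The scaling rules above differ by node type. Core nodes can only be scaled out, while task and router nodes can be scaled in or out with a minimum of 0. The minimal sketch below expresses these rules as a lookup table. `SCALING_RULES`, `can_scale`, and `resize` are hypothetical helpers, not EMR APIs. Master and common nodes are deliberately left out because this page does not state scaling rules for them.

```python
from enum import Enum


class Direction(Enum):
    OUT = "out"
    IN = "in"


# Hypothetical rule table for the Hadoop-family use cases on this page.
# Master and common nodes are omitted because no scaling rule is stated for them.
SCALING_RULES = {
    "core": {Direction.OUT},  # HDFS data lives here, so no scale-in
    "task": {Direction.OUT, Direction.IN},  # stores no data, fully elastic
    "router": {Direction.OUT, Direction.IN},
}


def can_scale(node_type: str, direction: Direction) -> bool:
    if node_type not in SCALING_RULES:
        raise KeyError(f"no scaling rule documented for {node_type!r} nodes")
    return direction in SCALING_RULES[node_type]


def resize(node_type: str, current: int, delta: int) -> int:
    """Return the new node count, enforcing the allowed direction and a floor of 0."""
    if delta == 0:
        return current
    direction = Direction.OUT if delta > 0 else Direction.IN
    if not can_scale(node_type, direction):
        raise ValueError(f"{node_type} nodes cannot be scaled {direction.value}")
    new_count = current + delta
    if new_count < 0:
        raise ValueError(f"{node_type} node count cannot go below 0")
    return new_count


if __name__ == "__main__":
    print(resize("task", 4, -4))  # prints 0
    print(resize("core", 3, 2))   # prints 5
```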
    Presto use case: It provides an open-source distributed SQL query engine for fast queries and analysis over massive amounts of data. It is suitable for interactive analytical queries.
    • Master node: It is a management node that keeps cluster scheduling running properly. Processes such as NameNode and ResourceManager are deployed here. There is 1 master node in non-HA mode and there are 2 in HA mode.
    • Core node: It is a compute and storage node. All of your HDFS data is stored on core nodes, so to keep that data safe, core nodes cannot be scaled in once they have been scaled out. Processes such as DataNode and NodeManager are deployed here. At least 2 core nodes are required in non-HA mode and at least 3 in HA mode.
    • Task node: It is a pure compute node that stores no data; the data it processes comes from core nodes or COS. It is therefore commonly used as an elastic node, and task nodes can be scaled in or out at any time. Processes such as NodeManager and PrestoWork are deployed here. The minimum number of task nodes is 0.
    • Common node: It provides data sharing, synchronization, and HA fault tolerance for the master nodes in an HA cluster. Distributed coordination components such as ZooKeeper and JournalNode are deployed here. There are 0 common nodes in non-HA mode and at least 3 in HA mode.
    • Router node: It shares the load of the master nodes or serves as the node from which tasks are submitted to the cluster. Router nodes can be scaled in or out at any time, and the minimum number is 0.
    Kudu use case: It provides a distributed, scalable columnar storage manager that supports random reads and writes as well as OLAP analysis, making it suitable for frequently updated data.
    • Master node: It is a management node that keeps cluster scheduling running properly. Processes such as NameNode and ResourceManager are deployed here. There is 1 master node in non-HA mode and there are 2 in HA mode.
    • Core node: It is a compute and storage node. All of your HDFS data is stored on core nodes, so to keep that data safe, core nodes cannot be scaled in once they have been scaled out. At least 2 core nodes are required in non-HA mode and at least 3 in HA mode.
    • Task node: It is a pure compute node that stores no data; the data it processes comes from core nodes or COS. It is therefore commonly used as an elastic node, and task nodes can be scaled in or out at any time. The minimum number of task nodes is 0.
    • Common node: It provides data sharing, synchronization, and HA fault tolerance for the master nodes in an HA cluster. Distributed coordination components such as ZooKeeper and JournalNode are deployed here. There are 0 common nodes in non-HA mode and at least 3 in HA mode.
    • Router node: It shares the load of the master nodes or serves as the node from which tasks are submitted to the cluster. Router nodes can be scaled in or out at any time, and the minimum number is 0.

    Druid cluster

    Use cases and node deployment
    Default use case: It supports high-performance real-time analysis, millisecond-level queries on big data, and multiple data ingestion methods. It is suitable for real-time big data query scenarios.
    • Master node: It is a management node that keeps cluster scheduling running properly. Processes such as NameNode and ResourceManager are deployed here. There is 1 master node in non-HA mode and there are 2 in HA mode.
    • Core node: It is a compute and storage node. All of your HDFS data is stored on core nodes, so to keep that data safe, core nodes cannot be scaled in once they have been scaled out. Processes such as DataNode and NodeManager are deployed here. At least 2 core nodes are required in non-HA mode and at least 3 in HA mode.
    • Task node: It is a pure compute node that stores no data; the data it processes comes from core nodes or COS. It is therefore commonly used as an elastic node, and task nodes can be scaled in or out at any time. Processes such as NodeManager are deployed here. The minimum number of task nodes is 0.
    • Common node: It provides data sharing, synchronization, and HA fault tolerance for the master nodes in an HA cluster. Distributed coordination components such as ZooKeeper and JournalNode are deployed here. There are 0 common nodes in non-HA mode and at least 3 in HA mode.
    • Router node: It shares the load of the master nodes or serves as the node from which tasks are submitted to the cluster. Router nodes can be scaled in or out at any time, and the minimum number is 0.

    ClickHouse cluster

    Use cases and node deployment
    Default use case: It provides a column-oriented database management system suitable for data warehouse analysis scenarios such as real-time wide table analysis, real-time BI reporting, and user behavior analysis.
    • Core node: It is a compute and storage node. ClickHouseServer is deployed here.
    • Common node: It provides data sharing, synchronization, and HA fault tolerance for the core nodes in an HA cluster. Distributed coordination components such as ZooKeeper are deployed here. There are 0 common nodes in non-HA mode and at least 3 in HA mode.

    Doris cluster

    Use cases and node deployment
    Default use case: It provides an MPP analytical database that supports sub-second queries on PB-scale structured data. It is compatible with the MySQL protocol and uses standard SQL syntax. It is suitable for historical report analysis, real-time data analysis, interactive data analysis, and similar scenarios.
    • Master node: It runs the frontend module, which provides the web UI. Processes such as FE Follower and Broker are deployed here. At least 1 master node is required in non-HA mode and at least 3 in HA mode.
    • Core node: It runs the backend module, which provides data storage. Processes such as BE and Broker are deployed here. At least 3 core nodes are required.
    • Router node: It runs the frontend module and helps provide high read/write availability. Processes such as FE Observer and Broker are deployed here. Router nodes can be scaled out but not scaled in.
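    The Doris node rules can be checked the same way. This hypothetical Python sketch is not an EMR or Doris API. It encodes only the counts listed above: at least 1 master node in non-HA mode or 3 in HA mode, at least 3 core nodes, and router nodes that can only be scaled out. The StarRocks cluster on this page lists the same rules.

```python
from typing import List


def validate_doris(master: int, core: int, router: int, ha: bool) -> List[str]:
    """Check Doris node counts against the rules on this page; an empty list means valid."""
    errors = []
    min_master = 3 if ha else 1  # master nodes host FE Follower
    if master < min_master:
        errors.append(f"master nodes must be >= {min_master}, got {master}")
    if core < 3:  # core nodes host BE
        errors.append(f"core nodes must be >= 3, got {core}")
    if router < 0:  # router nodes host FE Observer
        errors.append(f"router nodes cannot be negative, got {router}")
    return errors


def check_router_change(current: int, requested: int) -> None:
    """Router (FE Observer) nodes can be scaled out but not scaled in."""
    if requested < current:
        raise ValueError("router nodes cannot be scaled in")


if __name__ == "__main__":
    print(validate_doris(master=1, core=3, router=0, ha=True))
    # prints ['master nodes must be >= 3, got 1']
    check_router_change(current=1, requested=2)  # allowed: scale-out
```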

    Kafka cluster

    Use cases and node deployment
    Default use case: It provides a distributed, partitioned, multi-replica, multi-subscriber message processing system coordinated by ZooKeeper. It is suitable for asynchronous processing, messaging, and receiving and distributing streaming data.
    • Core node: It runs the Kafka message processing service described in this use case.
    • Common node: It provides data sharing, synchronization, and HA fault tolerance for the core nodes in an HA cluster. There are 0 common nodes in non-HA mode and at least 3 in HA mode.

    StarRocks cluster

    Use cases and node deployment
    Default use case: StarRocks is an extremely fast, unified OLAP database built on fully vectorized technology. It is suitable for many data analysis scenarios, such as multidimensional, real-time, and high-concurrency analysis.
    • Master node: It runs the frontend module, which provides the web UI. Processes such as FE Follower and Broker are deployed here. At least 1 master node is required in non-HA mode and at least 3 in HA mode.
    • Core node: It runs the backend module, which provides data storage. Processes such as BE and Broker are deployed here. At least 3 core nodes are required.
    • Router node: It runs the frontend module and helps provide high read/write availability. Processes such as FE Observer and Broker are deployed here. Router nodes can be scaled out but not scaled in.