Tencent Cloud Elastic MapReduce

Druid Usage

Last updated: 2025-01-03 15:02:25
EMR allows you to deploy an E-MapReduce Druid cluster as an independent cluster based on the following considerations:
Use case: E-MapReduce Druid can be used without Hadoop to adapt to different business use cases.
Resource preemption: E-MapReduce Druid has high memory requirements, especially for the Broker and Historical nodes. Because its resource usage is not scheduled by Hadoop YARN, resource preemption tends to occur when it shares a cluster with other workloads.
Cluster size: As an infrastructure service, a Hadoop cluster is generally large, while an E-MapReduce Druid cluster is relatively small. Deploying them in the same cluster can waste resources because of this size mismatch, so separate deployment is more flexible.

Purchase suggestions

To purchase a Druid cluster, select Druid as the cluster type when creating the EMR cluster. The Druid cluster has built-in Hadoop HDFS and YARN services integrated with Druid, which are recommended for testing only. We strongly recommend you use a dedicated Hadoop cluster in the production environment. To disable the built-in Hadoop services for the Druid cluster, go to the EMR console, select the target service pane on the Cluster services page, and click Operation > Pause service to suspend the service.

Configuring connectivity between Hadoop and Druid clusters

This section describes how to configure the connectivity between the Hadoop and Druid clusters. If you use the built-in Hadoop cluster in the Druid cluster (which is not recommended for the production environment), they can be properly connected with no additional settings required, and you can skip this section.
If you want to store the index data in the HDFS of another independent Hadoop cluster (which is recommended for the production environment), you need to configure the connectivity between the two clusters in the following steps:
1. Make sure that the Druid and Hadoop clusters can properly communicate with each other. The two clusters should be in the same VPC. If they are in different VPCs, the two VPCs should be able to communicate with each other (through CCN or Peering Connection, for example).
2. Copy the core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml files in /usr/local/service/hadoop/etc/hadoop in the Hadoop cluster and paste them in /usr/local/service/druid/conf/druid/_common on each node in the E-MapReduce Druid cluster.
Note:
As the Druid cluster has a built-in Hadoop cluster, the relevant soft links to the files above already exist in the Druid path. You need to delete them first before copying the configuration files of another Hadoop cluster. In addition, you need to make sure that the file permissions are correct so that the files can be accessed by the hadoop user.
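The file copy in step 2 can be scripted from the Hadoop cluster. A minimal dry-run sketch, assuming SSH access from the Hadoop master to every Druid node; the node IPs are hypothetical placeholders, while the paths are the ones cited above:

```shell
# Sync Hadoop client configs to every Druid node (dry run: commands are
# printed, not executed; drop the echo to execute them).
HADOOP_CONF_DIR=/usr/local/service/hadoop/etc/hadoop
DRUID_CONF_DIR=/usr/local/service/druid/conf/druid/_common
DRUID_NODES="10.0.0.11 10.0.0.12"   # hypothetical Druid node IPs
CONF_FILES="core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml"

for node in $DRUID_NODES; do
  for f in $CONF_FILES; do
    # Delete the soft link left by the built-in Hadoop first, then copy the
    # real file and make sure the hadoop user can read it.
    echo "ssh root@$node rm -f $DRUID_CONF_DIR/$f"
    echo "scp $HADOOP_CONF_DIR/$f root@$node:$DRUID_CONF_DIR/$f"
    echo "ssh root@$node chown hadoop:hadoop $DRUID_CONF_DIR/$f"
  done
done
```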
3. Modify the common.runtime.properties configuration file in Druid configuration management, save the change, and restart the Druid cluster services.
druid.storage.type: defaults to hdfs and does not need to be modified.
druid.storage.storageDirectory: configure the full path, which can be found in the `fs.defaultFS` configuration item in the `core-site.xml` file of the target Hadoop cluster:
If the target Hadoop cluster is non-HA: hdfs://{namenode_ip}:4007
If the target Hadoop cluster is HA: hdfs://HDFSXXXXX
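Put together, the resulting common.runtime.properties excerpt might look like this for a non-HA target cluster (the address is the placeholder from above; take the real value from `fs.defaultFS`):

```
# common.runtime.properties (excerpt)
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://{namenode_ip}:4007
```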

Using COS

E-MapReduce Druid can use COS as the deep storage. This section describes how to configure COS as the deep storage of the Druid cluster.
First, you need to make sure that COS has been activated for both the Druid cluster and the target Hadoop cluster. You can activate COS when purchasing the clusters or configure COS in the EMR console after purchasing them.
1. Modify the common.runtime.properties configuration file in Druid configuration management:
druid.storage.type: hdfs
druid.storage.storageDirectory: cosn://{bucket_name}/druid/segments. You can create the segments directory on COS and set its permissions in advance.
2. Modify the core-site.xml configuration file in HDFS configuration management:
Set fs.cosn.impl to org.apache.hadoop.fs.CosFileSystem.
Add a new configuration item fs.AbstractFileSystem.cosn.impl and set it to org.apache.hadoop.fs.CosN.
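In XML form, the two items in step 2 correspond to the following core-site.xml excerpt:

```
<property>
  <name>fs.cosn.impl</name>
  <value>org.apache.hadoop.fs.CosFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.cosn.impl</name>
  <value>org.apache.hadoop.fs.CosN</value>
</property>
```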
3. Put the JAR packages related to hadoop-cos (such as cos_api-bundle-5.6.69.jar and hadoop-cos-2.8.5-8.1.6.jar) into the /usr/local/service/druid/extensions/druid-hdfs-storage, /usr/local/service/druid/hadoopdependencies/hadoop-client/2.8.5, and /usr/local/service/hadoop/share/hadoop/common/lib/ directories on each node of the cluster.
Save the configuration and restart the Druid cluster services.
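Step 3 can be sketched as a dry run. The target directories and JAR names are the ones cited above; the source directory is a placeholder for wherever you downloaded the JARs:

```shell
# Distribute the hadoop-cos JARs to the directories Druid and Hadoop load
# (dry run: commands are printed, not executed; run on each node).
JARS="cos_api-bundle-5.6.69.jar hadoop-cos-2.8.5-8.1.6.jar"
TARGET_DIRS="/usr/local/service/druid/extensions/druid-hdfs-storage
/usr/local/service/druid/hadoopdependencies/hadoop-client/2.8.5
/usr/local/service/hadoop/share/hadoop/common/lib"

for d in $TARGET_DIRS; do
  for j in $JARS; do
    # SRC_DIR is a placeholder for the download location of the JARs.
    echo "cp \$SRC_DIR/$j $d/"
  done
done
```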

Modifying Druid parameters

After you create the E-MapReduce Druid cluster, a set of configuration items is generated automatically. However, we recommend you modify the memory configuration as needed to achieve optimal performance. You can do so on the [Configurations](https://www.tencentcloud.com/document/product/1026/31109) page in the EMR console.
When modifying the configuration, make sure that the modification is correct:
MaxDirectMemorySize >= druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1)
Modification suggestion:
druid.processing.numMergeBuffers = max(2, druid.processing.numThreads / 4)
druid.processing.numThreads = Number of cores - 1 (or 1)
druid.server.http.numThreads = max(10, (Number of cores * 17) / 16 + 2) + 30
For more information on the configuration, see Configuration reference.
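The constraint and suggestions above can be checked numerically. A sketch assuming an 8-core node and a 500 MiB processing buffer (both example values, not EMR defaults):

```shell
# Derive the suggested thread/buffer settings for a node with a given core
# count, then compute the minimum MaxDirectMemorySize the constraint implies.
cores=8                            # assumption: an 8-core node
numThreads=$((cores - 1))          # druid.processing.numThreads = cores - 1
m=$((numThreads / 4))
numMergeBuffers=$(( m > 2 ? m : 2 ))            # max(2, numThreads / 4)
h=$(( (cores * 17) / 16 + 2 ))
httpThreads=$(( (h > 10 ? h : 10) + 30 ))       # max(10, cores*17/16 + 2) + 30
sizeBytes=$((500 * 1024 * 1024))   # druid.processing.buffer.sizeBytes (assumed)
minDirect=$((sizeBytes * (numMergeBuffers + numThreads + 1)))
echo "numThreads=$numThreads numMergeBuffers=$numMergeBuffers httpThreads=$httpThreads"
echo "MaxDirectMemorySize >= $((minDirect / 1024 / 1024)) MiB"
```

For this example, the buffers sum to 2 + 7 + 1 = 10, so direct memory must cover ten 500 MiB buffers.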

Using a router as a query node

Currently, a Druid cluster deploys the Broker process on the EMR master node by default. As there are many processes deployed on the master node, they may interfere with each other, which may lead to insufficient memory and compromise the query efficiency. In addition, many businesses require that the query nodes and core nodes be separately deployed. In this case, you can add one or more router nodes in the console and install the Broker processes so as to scale out the query nodes of the Druid cluster.

Accessing the web

You can access the Druid cluster through port 18888 on the master node; you need to configure a public IP for the node yourself. After opening port 18888 in the security group and setting the bandwidth, you can access the cluster at http://{masterIp}:18888.
