How do I fix failed operations on the EMR master node due to low configuration?
Symptoms
Because the master node is under-provisioned, Hive or Spark jobs submitted to it report errors or are killed outright.
Cause analysis
The master node does not have enough memory, so applications running on it are killed when it runs out of memory (OOM).
Solution
- Too many services are deployed on the EMR master node, so it often becomes the bottleneck of the entire cluster. The master node cannot be scaled out; it can only be upgraded, as described below:
- First, find the node where the standby NameNode resides in the cluster.
- Run the following command on the standby NameNode to enter the safe mode.
hdfs dfsadmin -fs 10.0.0.9:4007 -safemode enter    # 10.0.0.9 is the standby node IP
- Run the following command on the standby NameNode to save the metadata.
hdfs dfsadmin -fs 10.0.0.9:4007 -saveNamespace    # 10.0.0.9 is the standby node IP
- Run the following command on the standby NameNode to exit the safe mode.
hdfs dfsadmin -fs 10.0.0.9:4007 -safemode leave    # 10.0.0.9 is the standby node IP
- Then, in the EMR Console (or the CVM Console for a legacy cluster), upgrade the active node.
- Upgrade the standby node as well, so that the active and standby master nodes end up with the same configuration.
If your cluster is not a high-availability one, then it will become unavailable for a while during the upgrade.
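The three dfsadmin steps above can be collected into one small script. This is a minimal sketch, assuming the standby NameNode address (IP and port 4007, as in the commands above) is passed as an argument. The `DRY_RUN` variable and the `run` and `checkpoint_standby` functions are hypothetical helpers, not part of EMR or HDFS. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: save the standby NameNode's metadata before upgrading the master nodes.
# DRY_RUN, run and checkpoint_standby are illustrative names only.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

checkpoint_standby() {
    standby="$1"   # standby NameNode address, for example 10.0.0.9:4007
    # Stops at the first failing step. If saving the metadata fails, the
    # NameNode stays in safe mode until you leave it manually.
    run hdfs dfsadmin -fs "$standby" -safemode enter &&
    run hdfs dfsadmin -fs "$standby" -saveNamespace &&
    run hdfs dfsadmin -fs "$standby" -safemode leave
}
```

Calling `checkpoint_standby 10.0.0.9:4007` prints the three commands for review; setting `DRY_RUN=0` before the call executes them on the standby NameNode.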
- In Spark, jobs are submitted in client mode by default, so the driver runs on the master node. You can switch to cluster mode before submitting jobs, so that the driver runs on a node inside the cluster instead of on the master node.
- For the Hive component, enable the router node, migrate HiveServer2 to it, and then disable the Hive component on the master node. For detailed directions, please see Migrating HiveServer2 to Router.
- Disable components that are not commonly used on the master node or migrate Hue to the router node.
Directions for migrating Hue to the router node:
- Log in to the EMR Console, add a router node on the Cloud Hardware Management page, and select the Hue component.
- After the scale-out, disable the original Hue component on the master node and keep the one on the router node. Then bind a public EIP to the router node and allow the required source addresses and ports in the security group.
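For the Spark option above, the driver moves off the master node when the job is submitted with `--deploy-mode cluster` (Spark's two deploy modes are `client` and `cluster`). The sketch below assumes the cluster runs Spark on YARN. `APP_CLASS`, `APP_JAR`, `DRY_RUN`, `run`, and `submit_cluster_mode` are hypothetical placeholders for illustration, and the script only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: submit a Spark job in cluster mode so that the driver runs in a
# YARN container rather than on the master node.
# APP_CLASS and APP_JAR are placeholders for your own application.

APP_CLASS="${APP_CLASS:-com.example.MyApp}"
APP_JAR="${APP_JAR:-/path/to/my-app.jar}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command; 0 = execute it

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

submit_cluster_mode() {
    # Any extra arguments are passed through to the application.
    run spark-submit \
        --master yarn \
        --deploy-mode cluster \
        --class "$APP_CLASS" \
        "$APP_JAR" "$@"
}
```

In cluster mode the driver's memory comes from a YARN container, so `spark.driver.memory` no longer counts against the master node's memory.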
Default and recommended memory sizes for master node components in an EMR cluster
- Default heap memory of common components
| Component | Process | Configuration File | Configuration Item | Default Heap Memory (MB) |
| --- | --- | --- | --- | --- |
| HDFS | NameNode | hadoop-env.sh | NNHeapsize | 4,096 |
| YARN | ResourceManager | yarn-env.sh | Heapsize | 2,000 |
| Hive | HiveServer2 | hive-env.sh | HS2Heapsize | 4,096 |
| HBase | HMaster | hbase-env.sh | Heapsize | 1,024 |
| Presto | Coordinator | jvm.config | Maximum JVM | 3,072 |
| Spark | spark-driver | spark-defaults.conf | spark.driver.memory | 1,024 |
| Oozie | Oozie | - | - | 1,024 |
| Storm | Nimbus | - | - | 1,024 |
- Suggested preset values for components
| Component | Suggested Heap Memory Size |
| --- | --- |
| HDFS (NameNode) | Minimum heap memory = 250 × number of files + 290 × number of directories + 368 × number of blocks |
| YARN (ResourceManager) | It can be increased as needed |
| Hive (HiveServer2) | It can be increased as needed |
| HBase (HMaster) | HMaster only receives DDL requests and performs load balancing. The default size of 1 GB is generally sufficient |
| Presto (Coordinator) | Use the default value |
| Spark (spark-driver) | It can be increased as needed |
| Oozie (oozie) | Use the default value |
| Storm (Nimbus) | Use the default value |
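The NameNode formula in the table can be worked through with a short script. This is a minimal sketch that assumes the formula yields bytes (the table does not state the unit); the function names and the sample counts are hypothetical.

```shell
#!/bin/sh
# Sketch: estimate the minimum NameNode heap from the formula in the table.
# Assumption: the result of the formula is in bytes.

nn_min_heap_bytes() {
    # $1 = number of files, $2 = number of directories, $3 = number of blocks
    echo $(( 250 * $1 + 290 * $2 + 368 * $3 ))
}

nn_min_heap_mb() {
    bytes="$(nn_min_heap_bytes "$1" "$2" "$3")"
    # Round up to the next whole MB (1 MB = 1,048,576 bytes).
    echo $(( (bytes + 1048575) / 1048576 ))
}
```

For example, `nn_min_heap_mb 10000000 1000000 12000000` (10 million files, 1 million directories, 12 million blocks) prints 6873. Under the bytes assumption, that is above the 4,096 MB default, so a cluster of that size would need a larger NameNode heap.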
- Suggested idle memory size for servers: 10–20% of the total memory size.
- You can deploy EMR components in independent mode or hybrid mode as needed.
- Independent deployment: it is suitable for HDFS clusters for storage, HBase clusters for analysis of massive amounts of data, and Spark clusters for job computation.
- Hybrid deployment: multiple components can be deployed in a cluster in this mode, which is suitable for testing clusters or scenarios where the business volume is not high or resource preemption is negligible.
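The 10–20% idle memory suggestion above can be checked against a planned set of heap sizes. The sketch below is a rough estimate only: it adds up heap sizes in MB and ignores non-heap JVM overhead and the operating system, so real usage will be higher. `idle_percent` and `has_enough_idle` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: check whether the configured heaps leave enough idle memory.
# Usage: idle_percent TOTAL_MB HEAP_MB [HEAP_MB ...]

idle_percent() {
    total_mb="$1"; shift
    used_mb=0
    for heap_mb in "$@"; do
        used_mb=$(( $used_mb + $heap_mb ))
    done
    # Integer percentage of total memory not claimed by any heap.
    echo $(( ($total_mb - $used_mb) * 100 / $total_mb ))
}

has_enough_idle() {
    # Succeeds when at least 10% of the total memory stays idle.
    [ "$(idle_percent "$@")" -ge 10 ]
}
```

With the default heaps of NameNode, ResourceManager, HiveServer2, HMaster, Presto Coordinator, and spark-driver on a 16 GB master node, `idle_percent 16384 4096 2000 4096 1024 3072 1024` prints 6, which is below the suggested range. The same heaps on a 32 GB node leave 53% idle.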