Feature Introduction
Shuffle is a core operator in big data computing and is used by a large share of jobs, but it consumes significant resources and is a frequent source of stability problems. These problems have become more prominent as the number of EMR customers and Spark users continues to grow and as containerized applications become popular.
Against this background, Tencent Cloud Elastic MapReduce (EMR) has added a new RSS cluster type that supports the Uniffle component. Spark clusters of both the EMR on CVM and EMR on TKE versions can bind to an RSS cluster with one click, which provides a remote Shuffle service for Spark jobs. This is intended to improve the performance of tasks that shuffle large volumes of data and to enhance task stability in node recycling scenarios.
1. Log in to the EMR console, click Creating a Cluster on the EMR on CVM cluster list page, and select RSS as the cluster type when creating the cluster. Cluster creation has 4 steps: software configuration, region and hardware configuration, basic settings, and configuration confirmation.
2. In the software configuration step, carefully select the region, as Shuffle services do not support cross-region operations.
3. In the region and hardware configuration step, the minimum number of nodes for a high-availability (HA) cluster is 4, including 2 master nodes and at least 2 core nodes. Non-HA clusters store only one replica and are suitable for testing, but are not recommended for production environments.
Note:
1. Evaluate the memory configuration, disk IO bandwidth, etc., for the new RSS cluster based on the number of parallel Spark SQL tasks and the Shuffle data volume of the tasks.
2. An RSS cluster can be shared by all Spark clusters in the same region.
4. After completing the above 4 configuration steps, click Purchase to make the payment. Once the payment is successful, the EMR cluster will enter the creation process. You can find the newly created cluster in the EMR console in about 10 minutes.
Operation Steps
Using Uniffle in Spark Clusters
Using Uniffle in EMR on CVM Spark Clusters
1. In the EMR on CVM cluster list, select the cluster for which you want to enable the Shuffle service, and click Manage under Service.
2. Select Cluster Service, and in the Spark Service panel, click Operation and then Set Shuffle Service.
3. Enable the service and select the ID of the RSS cluster to bind.
By default, spark.shuffle.manager is disabled. Once it is enabled, all submitted Spark tasks use the RSS Shuffle service. We recommend enabling it at the task level. The steps are as follows:
1. Modify the spark-defaults.conf configuration file for the Spark component on the Spark configuration management page.
spark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager
spark.driver.extraClassPath=/usr/local/service/spark/rss-client/* (If this parameter already exists, separate multiple values with a colon :)
spark.executor.extraClassPath=/usr/local/service/spark/rss-client/* (If this parameter already exists, separate multiple values with a colon :)
2. Spark dynamic allocation.
2.1 If you use Spark dynamic resource allocation (spark.dynamicAllocation.enabled=true) together with Uniffle, you need to apply a patch to Spark that matches your specific Spark version. Submit a ticket for this.
2.2 Modify the spark-defaults.conf configuration file: spark.shuffle.service.enabled=false.
Note:
If the Spark version is 3.5 or later, you also need to configure the parameter spark.shuffle.sort.io.plugin.class=org.apache.spark.shuffle.RssShuffleDataIo.
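As an illustration of task-level enablement, the following is a minimal sketch that assembles the parameters listed above into `--conf` arguments for a single `spark-submit` call, so that only that job uses the RSS Shuffle service. The helper names (`merge_classpath`, `rss_confs`, `to_submit_args`) are hypothetical and not part of EMR or Uniffle; the parameter names and values are the ones given in this document. The sketch does not replace the Spark patch that dynamic allocation requires.

```python
from typing import Dict, List, Optional

# Client path taken from the configuration listed in this document.
RSS_CLIENT_CLASSPATH = "/usr/local/service/spark/rss-client/*"


def merge_classpath(existing: Optional[str], extra: str = RSS_CLIENT_CLASSPATH) -> str:
    """Append `extra` to an existing classpath value, separated by a colon."""
    if not existing:
        return extra
    if extra in existing.split(":"):
        return existing  # already present, nothing to add
    return existing + ":" + extra


def rss_confs(spark_version: str,
              existing: Optional[Dict[str, str]] = None,
              dynamic_allocation: bool = False) -> Dict[str, str]:
    """Build the Spark settings described above for one job (hypothetical helper)."""
    existing = existing or {}
    confs = {
        "spark.shuffle.manager": "org.apache.spark.shuffle.RssShuffleManager",
        "spark.driver.extraClassPath":
            merge_classpath(existing.get("spark.driver.extraClassPath")),
        "spark.executor.extraClassPath":
            merge_classpath(existing.get("spark.executor.extraClassPath")),
    }
    if dynamic_allocation:
        # Step 2.2: the external shuffle service is turned off.
        # The version-specific Spark patch from step 2.1 is still required.
        confs["spark.shuffle.service.enabled"] = "false"
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    if (major, minor) >= (3, 5):
        # Extra parameter from the note for Spark 3.5 or later.
        confs["spark.shuffle.sort.io.plugin.class"] = \
            "org.apache.spark.shuffle.RssShuffleDataIo"
    return confs


def to_submit_args(confs: Dict[str, str]) -> List[str]:
    """Turn a settings dict into a flat list of spark-submit --conf arguments."""
    args: List[str] = []
    for key, value in confs.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

For example, `["spark-submit", *to_submit_args(rss_confs("3.5.1")), "your_job.py"]` gives a command list for one job, where `your_job.py` is a placeholder for your own application.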
Disabling Uniffle in EMR on CVM Spark Clusters
1. In the Spark Service panel, click Operation and then Set Shuffle Service, and click the button to disable the service.
2. Remove the Spark parameter spark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager on the configuration management page and apply the change.
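If you keep a copy of spark-defaults.conf outside the console (for example, in a deployment script), the removal in step 2 can be sketched as follows. This is only an illustration under that assumption: `strip_rss_manager` is a hypothetical helper, and it removes only the spark.shuffle.manager line that points to the RSS Shuffle manager, leaving all other settings untouched. It accepts both `key=value` and whitespace-separated `key value` lines.

```python
import re

RSS_MANAGER = "org.apache.spark.shuffle.RssShuffleManager"

# Matches "spark.shuffle.manager=<RSS manager>" or "spark.shuffle.manager <RSS manager>".
_RSS_LINE = re.compile(
    r"^\s*spark\.shuffle\.manager\s*[=\s]\s*" + re.escape(RSS_MANAGER) + r"\s*$"
)


def strip_rss_manager(conf_text: str) -> str:
    """Return the configuration text without the RSS shuffle manager line."""
    kept = [line for line in conf_text.splitlines() if not _RSS_LINE.match(line)]
    return "\n".join(kept)
```

A spark.shuffle.manager line that points to any other shuffle manager is kept as-is.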
Using Uniffle in EMR on TKE Spark Clusters
The operations are the same as those for using Uniffle in EMR on CVM Spark clusters, except that the console paths for binding and unbinding the RSS cluster are different. The operation paths are as follows:
1. In the EMR on TKE cluster list, select the cluster for which you want to enable the Shuffle service, and click the cluster ID to enter the Cluster Information page.
2. Select Cluster Service. In the Spark Service panel, click Operation in the upper right corner, and then click Set Shuffle Service.
3. Enable the service and select the ID of the RSS cluster to bind.