Tencent Cloud Elastic MapReduce


Meson Engine

Last updated: 2025-09-26 16:41:47
Meson Engine is a high-performance vectorized query engine built into EMR Spark. It transparently accelerates Spark SQL workloads and DataFrame API calls, reducing overall workload cost. Compared with open-source Spark, it delivers a 2.7x performance improvement on the TPC-DS 1 TB benchmark. Meson is fully compatible with Apache Spark APIs, so existing business code requires no changes; in EMR product versions that support Meson Engine, only a small amount of configuration is needed to enable it.

Principle Introduction

With the widespread adoption of SSDs and major improvements in network interface card performance, the performance bottleneck of the Spark engine has shifted from I/O, as traditionally understood, to CPU-bound computation. However, JVM-centric CPU optimizations (such as Codegen) face many constraints, such as limits on bytecode length and the number of parameters, and it is difficult for developers to exploit some features of modern CPUs from the JVM.
Meson Engine transforms the Spark physical plan, executes computations with a vectorized acceleration library implemented in C++, and returns results in a columnar format, improving memory and bandwidth utilization. By breaking through these bottlenecks, it can significantly improve the efficiency of Spark jobs.
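Columnar execution improves memory locality because a scan over one field touches only that field's values. The toy sketch below is plain Python, not Meson or Gluten code; it only illustrates the difference between row-wise and column-wise access to the same data:

```python
# Toy illustration of row vs. columnar layout (not Meson/Gluten code).
rows = [{"id": i, "price": float(i), "qty": i % 5} for i in range(1000)]

# Row-wise: each record is visited as a whole, touching unrelated fields.
row_total = sum(r["price"] for r in rows)

# Columnar: values of one field are stored contiguously, so a scan over
# "price" reads only the data it needs -- the layout vectorized engines use.
columns = {
    "id": [r["id"] for r in rows],
    "price": [r["price"] for r in rows],
    "qty": [r["qty"] for r in rows],
}
col_total = sum(columns["price"])

assert row_total == col_total  # same result, different memory access pattern
```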

Usage Restrictions

Meson Engine currently has usage limits. In unsupported scenarios, it falls back to the native Spark engine for execution. Because each fallback requires converting data between formats, too many fallbacks can make the total running time longer than that of the native Spark engine.
Please review the main usage limits of Meson Engine in advance:
The Parquet data format is supported. ORC support is not yet optimized. Other data formats are not supported.
ANSI mode is not supported.
Applications based on RDDs are not supported.
Structured Streaming is not supported.
Custom Python code based on PySpark is not supported.
MEMORY_ONLY CacheTable is not supported.

Applicable Scenarios

Meson Engine is supported on Spark 3.5.3 and later.
Note:
Storage formats, data types, operators, and functions that Meson Engine does not support, or supports only partially, fall back to native Spark engine execution.

Storage Format

Data storage formats supported by Meson Engine:
Supported data formats: Parquet, ORC
Supported table formats: Iceberg, Hive

Data Types

Data types supported by Meson Engine:
Byte, Short, Int, Long
Boolean
String, Binary
Decimal
Float, Double
Date, Timestamp

Operators

Operators supported and unsupported by Meson Engine:

| Type | Supported Operators | Unsupported Operators |
| --- | --- | --- |
| Source | FileSourceScanExec, HiveTableScanExec, BatchScanExec, InMemoryTableScanExec | - |
| Sink | DataWritingCommandExec, InsertIntoHiveTable | - |
| Common | FilterExec, ProjectExec, SortExec, UnionExec | - |
| Aggregate | HashAggregateExec | SortAggregateExec, ObjectHashAggregateExec |
| Join | BroadcastHashJoinExec, ShuffledHashJoinExec, SortMergeJoinExec, BroadcastNestedLoopJoinExec, CartesianProductExec | - |
| Window | WindowExec | WindowGroupLimitExec |
| Exchange | ShuffleExchangeExec, ReusedExchangeExec, BroadcastExchangeExec, CoalesceExec | CustomShuffleReaderExec |
| Limit | GlobalLimitExec, LocalLimitExec, TakeOrderedAndProjectExec, CollectLimitExec | - |
| Subquery | SubqueryBroadcastExec | - |
| Other | ExpandExec, GenerateExec, CollectTailExec, RangeExec | RangeExec, SampleExec |
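As a rough illustration of how the supported-operator list above relates to fallback, the hypothetical helper below (plain Python, not a Meson or Gluten API) partitions the operator names of a physical plan into those Meson can run natively and those that would trigger a fallback:

```python
# Hypothetical helper (not a Meson/Gluten API): given the operator names of a
# Spark physical plan, report which ones Meson can run natively and which
# would trigger a fallback, based on the supported-operator table above.
MESON_SUPPORTED_OPERATORS = {
    "FileSourceScanExec", "HiveTableScanExec", "BatchScanExec",
    "InMemoryTableScanExec", "DataWritingCommandExec", "InsertIntoHiveTable",
    "FilterExec", "ProjectExec", "SortExec", "UnionExec",
    "HashAggregateExec",
    "BroadcastHashJoinExec", "ShuffledHashJoinExec", "SortMergeJoinExec",
    "BroadcastNestedLoopJoinExec", "CartesianProductExec",
    "WindowExec",
    "ShuffleExchangeExec", "ReusedExchangeExec", "BroadcastExchangeExec",
    "CoalesceExec",
    "GlobalLimitExec", "LocalLimitExec", "TakeOrderedAndProjectExec",
    "CollectLimitExec", "SubqueryBroadcastExec",
    "ExpandExec", "GenerateExec", "CollectTailExec", "RangeExec",
}

def split_plan_operators(operators):
    """Partition plan operator names into (native, fallback) lists."""
    native = [op for op in operators if op in MESON_SUPPORTED_OPERATORS]
    fallback = [op for op in operators if op not in MESON_SUPPORTED_OPERATORS]
    return native, fallback

# Example: a plan containing one unsupported aggregate operator.
native, fallback = split_plan_operators(
    ["FileSourceScanExec", "FilterExec", "SortAggregateExec"]
)
# fallback contains only "SortAggregateExec"
```

A plan with many entries in the fallback list is a candidate for the longer-than-native running times described under Usage Restrictions.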

Functions

Functions supported by Meson Engine, by type:

Generator Functions: explode,explode_outer,inline,inline_outer,posexplode,posexplode_outer,stack
Window Functions: cume_dist,dense_rank,lag,lead,nth_value,ntile,percent_rank,rank,row_number
Aggregate Functions: any,any_value,approx_count_distinct,approx_percentile,array_agg,avg,bit_and,bit_or,bit_xor,bool_and,bool_or,collect_list,collect_set,corr,count,count_if,covar_pop,covar_samp,every,first,first_value,grouping,grouping_id,kurtosis,last,last_value,max,max_by,mean,median,min,min_by,percentile,percentile_approx,regr_avgx,regr_avgy,regr_count,regr_intercept,regr_r2,regr_slope,regr_sxx,regr_sxy,regr_syy,skewness,some,std,stddev,stddev_pop,stddev_samp,sum,try_avg,try_sum,var_pop,var_samp,variance
Array Functions: array,array_append,array_compact,array_contains,array_distinct,array_except,array_insert,array_intersect,array_join,array_max,array_min,array_position,array_prepend,array_remove,array_repeat,array_union,arrays_overlap,arrays_zip,flatten,get,shuffle,slice,sort_array
Bitwise Functions: &,^,bit_count,bit_get,getbit,shiftright,|,~
Collection Functions: array_size,cardinality,concat,reverse,size
Conditional Functions: coalesce,if,ifnull,nanvl,nullif,nvl,nvl2,when
Conversion Functions: bigint,binary,boolean,cast,date,decimal,double,float,int,smallint,string,timestamp,tinyint
Date and Timestamp Functions: add_months,date_add,date_diff,date_format,date_from_unix_date,date_sub,date_trunc,dateadd,datediff,day,dayofmonth,dayofweek,dayofyear,extract,from_unixtime,from_utc_timestamp,hour,last_day,make_date,make_timestamp,make_ym_interval,minute,month,next_day,quarter,second,timestamp_micros,timestamp_millis,to_unix_timestamp,to_utc_timestamp,trunc,unix_date,unix_micros,unix_millis,unix_seconds,unix_timestamp,weekday,weekofyear,year
Hash Functions: crc32,hash,md5,sha,sha1,sha2,xxhash64
JSON Functions: from_json,get_json_object,json_array_length,json_object_keys,json_tuple,schema_of_json,to_json
Lambda Functions: aggregate,array_sort,exists,filter,forall,map_filter,map_zip_with,reduce,transform,transform_keys,transform_values,zip_with
Map Functions: element_at,map,map_concat,map_contains_key,map_entries,map_keys,map_values,str_to_map,try_element_at
Mathematical Functions: %,*,+,-,/,abs,acos,acosh,asin,asinh,atan,atan2,atanh,bin,cbrt,ceil,ceiling,conv,cos,cosh,cot,csc,degrees,e,exp,expm1,factorial,floor,greatest,hex,hypot,least,log,log10,log1p,log2,mod,negative,pi,pmod,positive,pow,power,rand,random,rint,round,sec,shiftleft,sign,signum,sinh,sqrt,try_add,unhex,width_bucket
Misc Functions: assert_true,equal_null,spark_partition_id,uuid,version,||
Predicate Functions: !,!=,<,<=,<=>,<>,=,==,>,>=,and,between,case,ilike,in,isnan,isnotnull,isnull,like,not,or,regexp,regexp_like
String Functions: ascii,base64,bit_length,btrim,char,char_length,character_length,chr,concat_ws,contains,endswith,find_in_set,format_number,format_string,initcap,instr,lcase,left,len,length,levenshtein,locate,lower,lpad,ltrim,luhn_check,mask,overlay,position,regexp_extract,regexp_extract_all,regexp_replace,repeat,replace,right,rpad,rtrim,soundex,split,split_part,startswith,substr,substring,substring_index,translate,trim,ucase,unbase64,upper
Struct Functions: named_struct,struct
URL Functions: url_decode,url_encode

Enabling Meson Acceleration

EMR-V3.7.0

For a cluster of version EMR-V3.7.0, use the configuration management feature in the EMR console to add the following settings to the spark-defaults.conf configuration file to enable this feature:

| Parameter | Description |
| --- | --- |
| spark.plugins | The plugin used by Spark. Set the value to org.apache.gluten.GlutenPlugin. If spark.plugins is already configured, append org.apache.gluten.GlutenPlugin to the existing value, separated by a comma (","). |
| spark.memory.offHeap.enabled | Set to true. Meson acceleration requires JVM off-heap memory. |
| spark.memory.offHeap.size | Set the off-heap memory size according to actual conditions. For details, see the recommended configurations below for executors of varying specifications. |
| spark.shuffle.manager | The columnar shuffle manager used by Meson. Set the value to org.apache.spark.shuffle.sort.ColumnarShuffleManager. |
Recommended memory configurations for executors of varying specifications:

| executor-cores | spark.executor.memory | spark.memory.offHeap.size |
| --- | --- | --- |
| 2 | 2GB | 4GB |
| 4 | 3GB | 10GB |
| 8 | 6GB | 20GB |
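The parameter table and the sizing table above can be combined into one lookup. The sketch below is a hypothetical helper (plain Python, not part of EMR); the property names and values are taken from the tables above:

```python
# Hypothetical helper: build the spark-defaults.conf entries for enabling
# Meson on EMR-V3.7.0, using the recommended sizing table above.
# executor-cores -> (spark.executor.memory, spark.memory.offHeap.size)
RECOMMENDED = {
    2: ("2GB", "4GB"),
    4: ("3GB", "10GB"),
    8: ("6GB", "20GB"),
}

def meson_conf(executor_cores):
    """Return the Meson-related Spark properties for a given executor size."""
    heap, offheap = RECOMMENDED[executor_cores]
    return {
        "spark.plugins": "org.apache.gluten.GlutenPlugin",
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": offheap,
        "spark.executor.memory": heap,
        "spark.shuffle.manager":
            "org.apache.spark.shuffle.sort.ColumnarShuffleManager",
    }

# Example: properties for a 4-core executor.
conf = meson_conf(4)
# conf["spark.memory.offHeap.size"] is "10GB"
```

If spark.plugins is already set on the cluster, remember to append org.apache.gluten.GlutenPlugin to the existing value rather than replacing it.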

EMR-V3.6.1(beta)

For a cluster of version EMR-V3.6.1, use the configuration management feature in the EMR console to add the following settings to the spark-defaults.conf configuration file to enable this feature:

| Parameter | Description |
| --- | --- |
| spark.plugins | The plugin used by Spark. Set the value to org.apache.gluten.GlutenPlugin. If spark.plugins is already configured, append org.apache.gluten.GlutenPlugin to the existing value, separated by a comma (","). |
| spark.memory.offHeap.enabled | Set to true. Meson's native acceleration requires JVM off-heap memory. |
| spark.memory.offHeap.size | Set the off-heap memory size according to actual conditions. The initial size can be set to 1G. |
| spark.shuffle.manager | The columnar shuffle manager used by Meson. Set the value to org.apache.spark.shuffle.sort.ColumnarShuffleManager. |
| spark.driver.extraClassPath | The Gluten native JAR used by Spark. The default path of the JAR is /usr/local/service/spark/gluten. |
| spark.executor.extraClassPath | The Gluten native JAR used by Spark. The default path of the JAR is /usr/local/service/spark/gluten. |
| spark.executorEnv.LIBHDFS3_CONF | Path of the integrated HDFS cluster configuration file. The default is /usr/local/service/hadoop/etc/hadoop/hdfs-site.xml. |
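Putting the parameters above together, a spark-defaults.conf fragment for EMR-V3.6.1 could look like the following sketch (paths are the defaults listed above; 1G is only the suggested starting size for off-heap memory and should be tuned to the workload):

```properties
spark.plugins                    org.apache.gluten.GlutenPlugin
spark.memory.offHeap.enabled     true
spark.memory.offHeap.size        1G
spark.shuffle.manager            org.apache.spark.shuffle.sort.ColumnarShuffleManager
spark.driver.extraClassPath      /usr/local/service/spark/gluten
spark.executor.extraClassPath    /usr/local/service/spark/gluten
spark.executorEnv.LIBHDFS3_CONF  /usr/local/service/hadoop/etc/hadoop/hdfs-site.xml
```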

