tencent cloud

Elastic MapReduce

Release Notes and Announcements
Release Notes
Announcements
Security Announcements
Product Introduction
Overview
Strengths
Architecture
Features
Use Cases
Constraints and Limits
Technical Support Scope
Product release
Purchase Guide
EMR on CVM Billing Instructions
EMR on TKE Billing Instructions
EMR Serverless HBase Billing Instructions
Getting Started
EMR on CVM Quick Start
EMR on TKE Quick Start
EMR on CVM Operation Guide
Planning Cluster
Administrative rights
Configuring Cluster
Managing Cluster
Managing Service
Monitoring and Alarms
TCInsight
EMR on TKE Operation Guide
Introduction to EMR on TKE
Configuring Cluster
Cluster Management
Service Management
Monitoring and Ops
Application Analysis
EMR Serverless HBase Operation Guide
EMR Serverless HBase Product Introduction
Quotas and Limits
Planning an Instance
Managing an Instance
Monitoring and Alarms
Development Guide
EMR Development Guide
Hadoop Development Guide
Spark Development Guide
Hbase Development Guide
Phoenix on Hbase Development Guide
Hive Development Guide
Presto Development Guide
Sqoop Development Guide
Hue Development Guide
Oozie Development Guide
Flume Development Guide
Kerberos Development Guide
Knox Development Guide
Alluxio Development Guide
Kylin Development Guide
Livy Development Guide
Kyuubi Development Guide
Zeppelin Development Guide
Hudi Development Guide
Superset Development Guide
Impala Development Guide
Druid Development Guide
TensorFlow Development Guide
Kudu Development Guide
Ranger Development Guide
Kafka Development Guide
Iceberg Development Guide
StarRocks Development Guide
Flink Development Guide
JupyterLab Development Guide
MLflow Development Guide
Practical Tutorial
Practice of EMR on CVM Ops
Data Migration
Practical Tutorial on Custom Scaling
API Documentation
History
Introduction
API Category
Cluster Resource Management APIs
Cluster Services APIs
User Management APIs
Data Inquiry APIs
Scaling APIs
Configuration APIs
Other APIs
Serverless HBase APIs
YARN Resource Scheduling APIs
Making API Requests
Data Types
Error Codes
FAQs
EMR on CVM
Service Level Agreement
Contact Us

SQL AI Tuning

PDF
フォーカスモード
フォントサイズ
最終更新日: 2025-09-02 17:19:50

Feature Introduction

SQL AI tuning optimizes query performance and enhances data processing efficiency using smart algorithms and machine learning technology. It provides tuning policies through a comprehensive analysis of global SQL data, including query plans, data distribution, and resource usage. Meanwhile, it offers optimization suggestions, including index creation and data skew handling by combining insights from SQL queries. In addition, AI tuning supports real-time monitoring and feedback to help users quickly locate issues and provide processing suggestions, ensuring the stability and reliability of big data tasks through efficient exception detection and effective tuning solutions.
Note:
1. Currently, only SQL tuning for the Spark engine is supported.
2. SQL AI tuning is an allowlist feature.

Operation Steps

1. Log in to EMR TCInsight > Exception Center, select TCInsight > Exception Center in the left sidebar on the console, and click to go to the Computing Insights > Spark Overview page.
2. On the Spark Overview > Task Leaderboard List Operations > Insight Suggestions page, you can view AI optimization suggestions for SQL.

Feature Description

SQL AI tuning is divided into hit item tuning and non-hit item tuning. Hit items provide precise SQL optimization suggestions based on exception items from query items (see the insight item list for details). Non-hit items offer non-targeted optimization suggestions from a general AI large model. It is recommended that users verify both types of suggestions according to actual scenarios before adoption as needed.
Note:
1. Hit item tuning: The SQL tuning system uses a general AI model that can address some abnormal scenarios related to insight items, providing optimization suggestions with high precision. It can be applied after verifying the tuning.
2. Non-hit item tuning: Supports AI optimization covering multi-dimensional data, including SQL, Schema, Profile, DAG, and Counters. However, the validity of suggestions is only 25%. It is recommended that users perform full verification according to their business background in practical applications before adoption as needed.
Hit item list description:
Computing Engine
Hit Insight Item
Description
Value Attribute
Spark
Improper partition conditions
Improper selection of partition fields (for example, high-frequency fields or uneven data distribution) causes data skew or full table scanning when querying.
Performance optimization
BroadcastJoin optimization
Identify improper use of BroadcastJoin (for example, improper threshold setting), which causes tablets to perform Shuffle Join with large tables, resulting in massive network transmission and computing overhead that significantly reduces query performance.
Performance optimization
Resource efficiency improvement
CPU resource wastage
Improper task parallelism settings (for example, too many or too few partitions), data skew, or redundant computing logic lead to underutilized or overconsumed CPU resources, resulting in resource wastage.
Performance optimization
Resource efficiency improvement
Cost saving
JOIN data inflation
Improper JOIN operations may cause data inflation, specifically: data skew, Cartesian product, and JOINs between large tables.
Performance optimization
Resource efficiency improvement
Cost saving
Data skew
Data skew (uneven data distribution) causes some partitions or tasks to process significantly larger data volumes than other partitions or tasks. Specific scenarios include JOIN key skew, GROUP BY key skew, input data skew, and Shuffle data skew.
Performance optimization
Resource efficiency improvement
Cost saving
Empty Task input
The input data of specific Tasks in the query job is empty.
Performance optimization
Resource efficiency improvement
Cost saving
ExecutorGC
Executor frequently triggers garbage collection (GC)
Performance optimization
Resource efficiency improvement
Stability enhancement
Full table scanning of partitioned tables
The query does not utilize the partition pruning feature of partitioned tables, resulting in full table scanning.
Performance optimization
Resource efficiency improvement
Cost saving
Global sorting
The query includes global sorting (for example, ORDER BY), leading to:
1. Excessive data shuffle, increasing network and disk I/O pressure.
2. Overloaded single-point sorting tasks, causing performance bottlenecks.
Performance optimization
Resource efficiency improvement
Cost saving
Run history analysis
Identify large fluctuations in duration, prolonged durations in certain Stages or Tasks, or other situations by analyzing the run history of queries.
Performance optimization
Stability enhancement
Resource efficiency improvement
Too many small input files
The input data for queries contains a large number of small files, which leads to excessive task scheduling overhead, increased metadata management pressure, and low data read efficiency.
Performance optimization
Resource efficiency improvement
Cost saving
Insufficient resources
Cluster resources (for example, CPU, memory, and disk) are insufficient to support query demands, resulting in slow task execution or failures.
Performance optimization
Stability enhancement
Excessive data scanning
Excessive data volume scanning during queries may occur due to failing to utilize the partition pruning or index, or insufficient data filter criteria.
Performance optimization
Cost saving
Peak memory exceeds limit
Memory usage exceeds the limit during query execution, resulting in OOM (memory overflow) or task failure.
Performance optimization
Stability enhancement
Memory resource wastage
Due to ineffective use of cached data, improper memory allocation, and partial Executor memory limitations, memory resources are underutilized.
Resource efficiency improvement
Cost saving
Too many small output files
Due to low storage efficiency, high NameNode load, and reduced subsequent read performance, a large number of small output files are generated from the query results.
Performance optimization
Cost saving
Abnormal resource overhead
The resource overhead (for example, CPU, memory, and disk) for queries is abnormally high due to data skew, complex computing logic, or improper resource allocation.
Performance optimization
Cost saving
Scheduling delay
Task scheduling delay may occur due to insufficient cluster resources, intense contention, or high scheduler load.
Performance optimization
Stability enhancement
Scheduling skew
Uneven task scheduling may be caused by uneven data distribution or improper scheduling policy, resulting in some nodes being overloaded.
Performance optimization
Resource efficiency improvement
ShuffleFailure
Shuffle failures can occur due to factors including excessive data volume, network or disk I/O exceeding the limit, insufficient memory, or OOM.
Performance optimization
Stability enhancement
Slow Tasks
Some Tasks may have prolonged execution time due to data skew, complex computing logic, or insufficient resource allocation.
Performance optimization
Resource efficiency improvement
Too many small input files
The input data for queries contains a large number of small files, which leads to the following scenarios:
1. Excessive task scheduling overhead.
2. Low data read efficiency.
Performance optimization
Cost saving
Too small Task input data
The input data volume for specific Tasks is too small for the following reasons:
1. High proportion of task scheduling overhead.
2. Low resource utilization.
Performance optimization
Resource efficiency improvement
StageScheduleDelay
Stage scheduling delay may occur due to incomplete preceding Stage, insufficient cluster resources, or intense contention.
Performance optimization
Stability enhancement

ヘルプとサポート

この記事はお役に立ちましたか?

フィードバック