Release Notes and Announcements
- Release Notes
- Announcements
- Security Announcements
Product Introduction
- Overview
- Strengths
- Architecture
- Features
- Use Cases
- Constraints and Limits
- Technical Support Scope
- Product release
Purchase Guide
- EMR on CVM Billing Instructions
- EMR on TKE Billing Instructions
- EMR Serverless HBase Billing Instructions
- EMR Serverless TCBase Billing Overview
Getting Started
- EMR on CVM Quick Start
- EMR on TKE Quick Start
EMR on CVM Operation Guide
- Planning Cluster
- Administrative rights
- Configuring Cluster
- Managing Cluster
- Managing Service
- Monitoring and Alarms
- TCInsight
EMR on TKE Operation Guide
- Introduction to EMR on TKE
- Configuring Cluster
- Cluster Management
- Service Management
- Monitoring and Ops
- Application Analysis
EMR Serverless HBase Operation Guide
- EMR Serverless HBase Product Introduction
- Quotas and Limits
- Planning an Instance
- Managing an Instance
- Monitoring and Alarms
- Development Guide
EMR Serverless TCBase Operation Guide
- Introduction to EMR Serverless TCBase
- Managing Instances
- Managing Services
- Monitoring and Alarms
EMR Development Guide
- Hadoop Development Guide
- Spark Development Guide
- Hbase Development Guide
- Phoenix on Hbase Development Guide
- Hive Development Guide
- Presto Development Guide
- Sqoop Development Guide
- Hue Development Guide
- Oozie Development Guide
- Flume Development Guide
- Kerberos Development Guide
- Knox Development Guide
- Alluxio Development Guide
- Kylin Development Guide
- Livy Development Guide
- Kyuubi Development Guide
- Zeppelin Development Guide
- Hudi Development Guide
- Superset Development Guide
- Impala Development Guide
- Druid Development Guide
- TensorFlow Development Guide
- Kudu Development Guide
- Ranger Development Guide
- Kafka Development Guide
- StarRocks Development Guide
- Flink Development Guide
- JupyterLab Development Guide
- MLflow Development Guide
Practical Tutorial
- Practice of EMR on CVM Ops
- Data Migration
- Practical Tutorial on Custom Scaling
API Documentation
- History
- Introduction
- API Category
- Making API Requests
- Cluster Resource Management APIs
- Cluster Services APIs
- User Management APIs
- Information Query APIs
- Scaling APIs
- Configuration APIs
- Other APIs
- Cluster Lifecycle APIs
- Serverless HBase APIs
- YARN Resource Scheduling APIs
- Data Types
- Error Codes
FAQs
- EMR on CVM
Service Level Agreement
Contact Us

SQL AI Tuning

Download

フォーカスモード

フォントサイズ

最終更新日: 2025-09-02 17:19:50

Feature Introduction 
SQL AI tuning optimizes query performance and enhances data processing efficiency using smart algorithms and machine learning technology. It provides tuning policies through a comprehensive analysis of global SQL data, including query plans, data distribution, and resource usage. Meanwhile, it offers optimization suggestions, including index creation and data skew handling by combining insights from SQL queries. In addition, AI tuning supports real-time monitoring and feedback to help users quickly locate issues and provide processing suggestions, ensuring the stability and reliability of big data tasks through efficient exception detection and effective tuning solutions.
Note:
1. Currently, only SQL tuning for the Spark engine is supported.
2. SQL AI tuning is an allowlist feature.
Operation Steps
1. Log in to EMR TCInsight > Exception Center, select TCInsight > Exception Center in the left sidebar on the console, and click to go to the Computing Insights > Spark Overview page.
2. On the Spark Overview > Task Leaderboard List Operations > Insight Suggestions page, you can view AI optimization suggestions for SQL.
Feature Description
SQL AI tuning is divided into hit item tuning and non-hit item tuning. Hit items provide precise SQL optimization suggestions based on exception items from query items (see the insight item list for details). Non-hit items offer non-targeted optimization suggestions from a general AI large model. It is recommended that users verify both types of suggestions according to actual scenarios before adoption as needed.
Note:
1. Hit item tuning: The SQL tuning system uses a general AI model that can address some abnormal scenarios related to insight items, providing optimization suggestions with high precision. It can be applied after verifying the tuning.
2. Non-hit item tuning: Supports AI optimization covering multi-dimensional data, including SQL, Schema, Profile, DAG, and Counters. However, the validity of suggestions is only 25%. It is recommended that users perform full verification according to their business background in practical applications before adoption as needed.
Hit item list description:
Computing Engine
Hit Insight Item
Description
Value Attribute
Spark
Improper partition conditions
Improper selection of partition fields (for example, high-frequency fields or uneven data distribution) causes data skew or full table scanning when querying.
Performance optimization
﻿
BroadcastJoin optimization
Identify improper use of BroadcastJoin (for example, improper threshold setting), which causes tablets to perform Shuffle Join with large tables, resulting in massive network transmission and computing overhead that significantly reduces query performance.
Performance optimization
Resource efficiency improvement
﻿
CPU resource wastage
Improper task parallelism settings (for example, too many or too few partitions), data skew, or redundant computing logic lead to underutilized or overconsumed CPU resources, resulting in resource wastage.
Performance optimization
Resource efficiency improvement
Cost saving
﻿
JOIN data inflation
Improper JOIN operations may cause data inflation, specifically: data skew, Cartesian product, and JOINs between large tables.
Performance optimization
Resource efficiency improvement
Cost saving
﻿
Data skew
Data skew (uneven data distribution) causes some partitions or tasks to process significantly larger data volumes than other partitions or tasks. Specific scenarios include JOIN key skew, GROUP BY key skew, input data skew, and Shuffle data skew.
Performance optimization
Resource efficiency improvement
Cost saving
﻿
Empty Task input
The input data of specific Tasks in the query job is empty.
Performance optimization
Resource efficiency improvement
Cost saving
﻿
ExecutorGC
Executor frequently triggers garbage collection (GC)
Performance optimization
Resource efficiency improvement
Stability enhancement
﻿
Full table scanning of partitioned tables
The query does not utilize the partition pruning feature of partitioned tables, resulting in full table scanning.
Performance optimization
Resource efficiency improvement
Cost saving
﻿
Global sorting
The query includes global sorting (for example, ORDER BY), leading to:
1. Excessive data shuffle, increasing network and disk I/O pressure.
2. Overloaded single-point sorting tasks, causing performance bottlenecks.
Performance optimization
Resource efficiency improvement
Cost saving
﻿
Run history analysis
Identify large fluctuations in duration, prolonged durations in certain Stages or Tasks, or other situations by analyzing the run history of queries.
Performance optimization
Stability enhancement
Resource efficiency improvement
﻿
Too many small input files
The input data for queries contains a large number of small files, which leads to excessive task scheduling overhead, increased metadata management pressure, and low data read efficiency.
Performance optimization
Resource efficiency improvement
Cost saving
﻿
Insufficient resources
Cluster resources (for example, CPU, memory, and disk) are insufficient to support query demands, resulting in slow task execution or failures.
Performance optimization
Stability enhancement
﻿
Excessive data scanning
Excessive data volume scanning during queries may occur due to failing to utilize the partition pruning or index, or insufficient data filter criteria.
Performance optimization
Cost saving
﻿
Peak memory exceeds limit
Memory usage exceeds the limit during query execution, resulting in OOM (memory overflow) or task failure.
Performance optimization
Stability enhancement
﻿
Memory resource wastage
Due to ineffective use of cached data, improper memory allocation, and partial Executor memory limitations, memory resources are underutilized.
Resource efficiency improvement
Cost saving
﻿
Too many small output files
Due to low storage efficiency, high NameNode load, and reduced subsequent read performance, a large number of small output files are generated from the query results.
Performance optimization
Cost saving
﻿
Abnormal resource overhead
The resource overhead (for example, CPU, memory, and disk) for queries is abnormally high due to data skew, complex computing logic, or improper resource allocation.
Performance optimization
Cost saving
﻿
Scheduling delay
Task scheduling delay may occur due to insufficient cluster resources, intense contention, or high scheduler load.
Performance optimization
Stability enhancement
﻿
Scheduling skew
Uneven task scheduling may be caused by uneven data distribution or improper scheduling policy, resulting in some nodes being overloaded.
Performance optimization
Resource efficiency improvement
﻿
ShuffleFailure
Shuffle failures can occur due to factors including excessive data volume, network or disk I/O exceeding the limit, insufficient memory, or OOM.
Performance optimization
Stability enhancement
﻿
Slow Tasks
Some Tasks may have prolonged execution time due to data skew, complex computing logic, or insufficient resource allocation.
Performance optimization
Resource efficiency improvement
﻿
Too many small input files
The input data for queries contains a large number of small files, which leads to the following scenarios:
1. Excessive task scheduling overhead.
2. Low data read efficiency.
Performance optimization
Cost saving
﻿
Too small Task input data
The input data volume for specific Tasks is too small for the following reasons:
1. High proportion of task scheduling overhead.
2. Low resource utilization.
Performance optimization
Resource efficiency improvement
﻿
StageScheduleDelay
Stage scheduling delay may occur due to incomplete preceding Stage, insufficient cluster resources, or intense contention.
Performance optimization
Stability enhancement

ヘルプとサポート

この記事はお役に立ちましたか？

営業担当者にお問い合わせいただくかチケットを提出してサポートを求めることができます。

フィードバック

tencent cloud

Elastic MapReduce

SQL AI Tuning

Feature Introduction

Operation Steps

Feature Description

ヘルプとサポート

Computing Engine	Hit Insight Item	Description	Value Attribute
Spark	Improper partition conditions	Improper selection of partition fields (for example, high-frequency fields or uneven data distribution) causes data skew or full table scanning when querying.	Performance optimization
		BroadcastJoin optimization	Identify improper use of BroadcastJoin (for example, improper threshold setting), which causes tablets to perform Shuffle Join with large tables, resulting in massive network transmission and computing overhead that significantly reduces query performance.	Performance optimization Resource efficiency improvement
		CPU resource wastage	Improper task parallelism settings (for example, too many or too few partitions), data skew, or redundant computing logic lead to underutilized or overconsumed CPU resources, resulting in resource wastage.	Performance optimization Resource efficiency improvement Cost saving
		JOIN data inflation	Improper JOIN operations may cause data inflation, specifically: data skew, Cartesian product, and JOINs between large tables.	Performance optimization Resource efficiency improvement Cost saving
		Data skew	Data skew (uneven data distribution) causes some partitions or tasks to process significantly larger data volumes than other partitions or tasks. Specific scenarios include JOIN key skew, GROUP BY key skew, input data skew, and Shuffle data skew.	Performance optimization Resource efficiency improvement Cost saving
		Empty Task input	The input data of specific Tasks in the query job is empty.	Performance optimization Resource efficiency improvement Cost saving
		ExecutorGC	Executor frequently triggers garbage collection (GC)	Performance optimization Resource efficiency improvement Stability enhancement
		Full table scanning of partitioned tables	The query does not utilize the partition pruning feature of partitioned tables, resulting in full table scanning.	Performance optimization Resource efficiency improvement Cost saving
		Global sorting	The query includes global sorting (for example, ORDER BY), leading to: 1. Excessive data shuffle, increasing network and disk I/O pressure. 2. Overloaded single-point sorting tasks, causing performance bottlenecks.	Performance optimization Resource efficiency improvement Cost saving
		Run history analysis	Identify large fluctuations in duration, prolonged durations in certain Stages or Tasks, or other situations by analyzing the run history of queries.	Performance optimization Stability enhancement Resource efficiency improvement
		Too many small input files	The input data for queries contains a large number of small files, which leads to excessive task scheduling overhead, increased metadata management pressure, and low data read efficiency.	Performance optimization Resource efficiency improvement Cost saving
		Insufficient resources	Cluster resources (for example, CPU, memory, and disk) are insufficient to support query demands, resulting in slow task execution or failures.	Performance optimization Stability enhancement
		Excessive data scanning	Excessive data volume scanning during queries may occur due to failing to utilize the partition pruning or index, or insufficient data filter criteria.	Performance optimization Cost saving
		Peak memory exceeds limit	Memory usage exceeds the limit during query execution, resulting in OOM (memory overflow) or task failure.	Performance optimization Stability enhancement
		Memory resource wastage	Due to ineffective use of cached data, improper memory allocation, and partial Executor memory limitations, memory resources are underutilized.	Resource efficiency improvement Cost saving
		Too many small output files	Due to low storage efficiency, high NameNode load, and reduced subsequent read performance, a large number of small output files are generated from the query results.	Performance optimization Cost saving
		Abnormal resource overhead	The resource overhead (for example, CPU, memory, and disk) for queries is abnormally high due to data skew, complex computing logic, or improper resource allocation.	Performance optimization Cost saving
		Scheduling delay	Task scheduling delay may occur due to insufficient cluster resources, intense contention, or high scheduler load.	Performance optimization Stability enhancement
		Scheduling skew	Uneven task scheduling may be caused by uneven data distribution or improper scheduling policy, resulting in some nodes being overloaded.	Performance optimization Resource efficiency improvement
		ShuffleFailure	Shuffle failures can occur due to factors including excessive data volume, network or disk I/O exceeding the limit, insufficient memory, or OOM.	Performance optimization Stability enhancement
		Slow Tasks	Some Tasks may have prolonged execution time due to data skew, complex computing logic, or insufficient resource allocation.	Performance optimization Resource efficiency improvement
		Too many small input files	The input data for queries contains a large number of small files, which leads to the following scenarios: 1. Excessive task scheduling overhead. 2. Low data read efficiency.	Performance optimization Cost saving
		Too small Task input data	The input data volume for specific Tasks is too small for the following reasons: 1. High proportion of task scheduling overhead. 2. Low resource utilization.	Performance optimization Resource efficiency improvement
		StageScheduleDelay	Stage scheduling delay may occur due to incomplete preceding Stage, insufficient cluster resources, or intense contention.	Performance optimization Stability enhancement