tencent cloud

TDSQL Boundless

Release Notes
Product Introduction
Overview
Scenarios
Product Architecture
Instance Types
Compatibility Notes
Kernel Features
Kernel Overview
Kernel Version Release Notes
Functionality Features
Performance Features
Billing
Billing Overview
Purchase Method
Pricing Details
Renewal
Overdue Payments
Refund
Getting Started
Creating an Instance
Connect to Instances
User Guide
Data Migration
Data Subscription
Instance Management
Configuration Change
Parameter Configuration
Account Management
Security Group
Backup and Restoration
Database Auditing
Tag Management
Use Cases
Technical Evolution and Usage Practices of Online DDL
Lock Mechanism Analysis and Troubleshooting Practices
Data Intelligent Scheduling and Related Practices for Performance Optimization
TDSQL Boundless Selection Guide and Practical Tutorial
Developer Guide
Developer Guide (MySQL Compatibility Mode)
Developer Guide (HBase Compatibility Mode)
Performance Tuning
Performance Tuning Overview
SQL Tuning
DDL Tuning
Performance White Paper
Performance Overview
TPC-C Test
Sysbench Test
API Documentation
History
Introduction
API Category
Making API Requests
Instance APIs
Security Group APIs
Task APIs
Backup APIs
Rollback APIs
Parameter APIs
Database APIs
Data Types
Error Codes
General Reference
System Architecture
SQL Reference
Database Parameter Description
TPC-H benchmark data model reference
Error Code Information
Security and Compliance
FAQs
Agreements
Service Level Agreement
Terms of Service
Privacy Policy
Data Processing And Security Agreement
Contact Us
Glossary

Multi-table join query

PDF
포커스 모드
폰트 크기
마지막 업데이트 시간: 2026-03-26 14:55:17
This document describes how to perform multi-table join queries using JOIN statements in TDSQL Boundless databases, including the usage of various join types and join algorithms.

Before You Begin

Before a single-table query is performed, ensure that:
1. TDSQL Boundless instance has been created.
2. Test data has been imported: This document uses the TPC-H benchmarking dataset as an example. For details, see TPC-H Benchmark Data Model Reference.
3. Database connection has been established: Connect to TDSQL Boundless via MySQL client or application.

JOIN types

inner join (INNER JOIN)

Inner joins only return rows from both tables that simultaneously meet the join condition. If no matching row exists in either table, that row will not appear in the result.
Query the customer name corresponding to each order:
SELECT
o.o_orderkey,
o.o_orderdate,
o.o_totalprice,
c.c_name,
c.c_mktsegment
FROM orders o
INNER JOIN customer c ON o.o_custkey = c.c_custkey;
In this query, orders will be returned only if corresponding customers exist in the customer table.
Multi-table Inner Join - Query order details and their belonging order and customer information:
SELECT
c.c_name,
o.o_orderkey,
o.o_orderdate,
l.l_partkey,
l.l_quantity,
l.l_extendedprice
FROM customer c
INNER JOIN orders o ON c.c_custkey = o.o_custkey
INNER JOIN lineitem l ON o.o_orderkey = l.l_orderkey
WHERE o.o_orderdate BETWEEN '1995-01-01' AND '1995-03-31'
ORDER BY l.l_extendedprice DESC
LIMIT 20;
Note:
Since INNER JOIN is used, customers without orders will not appear in the result set. To include customers without orders, LEFT JOIN should be used.

Left Outer Join (LEFT JOIN)

Left outer joins return all rows from the left table. For rows with no matching row in the right table, columns from the right table are populated with NULL.
Query all customers and their order counts (including those who have not placed orders):
SELECT
c.c_custkey,
c.c_name,
c.c_mktsegment,
COUNT(o.o_orderkey) AS order_count
FROM customer c
LEFT JOIN orders o ON c.c_custkey = o.o_custkey
GROUP BY c.c_custkey, c.c_name, c.c_mktsegment
ORDER BY order_count ASC
LIMIT 20;
Unlike INNER JOIN, even if a customer has no records in the orders table, that customer will still appear in the results with order_count being 0.
Using LEFT JOIN to find unmatched rows - Query customers who have not placed orders:
SELECT c.c_custkey, c.c_name, c.c_phone
FROM customer c
LEFT JOIN orders o ON c.c_custkey = o.o_custkey
WHERE o.o_orderkey IS NULL;
This syntax is equivalent to the NOT EXISTS subquery but offers higher execution efficiency in certain scenarios:
-- Equivalent syntax
SELECT c_custkey, c_name, c_phone
FROM customer c
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.o_custkey = c.c_custkey
);

Right Outer Join (RIGHT JOIN)

Right outer joins return all rows from the right table. For rows with no matching row in the left table, columns from the left table are populated with NULL. Its logic is symmetric to LEFT JOIN.
Query all countries and their customer counts:
SELECT
n.n_name AS nation_name,
COUNT(c.c_custkey) AS customer_count
FROM customer c
RIGHT JOIN nation n ON c.c_nationkey = n.n_nationkey
GROUP BY n.n_name
ORDER BY customer_count DESC;
Practical Tutorial: In actual development, it is recommended to consistently use LEFT JOIN instead of RIGHT JOIN by simply swapping the table order. This maintains SQL style consistency and enhances readability.
The above query is equivalent to:
SELECT
n.n_name AS nation_name,
COUNT(c.c_custkey) AS customer_count
FROM nation n
LEFT JOIN customer c ON c.c_nationkey = n.n_nationkey
GROUP BY n.n_name
ORDER BY customer_count DESC;

Cross join (CROSS JOIN)

Cross joins return the Cartesian product of both tables, meaning every row from the left table is combined with every row from the right table. If the left table has m rows and the right table has n rows, the result set will contain m * n rows.
Query all region and market segment combinations:
SELECT
r.r_name AS region_name,
seg.segment_name
FROM region r
CROSS JOIN (
SELECT DISTINCT c_mktsegment AS segment_name
FROM customer
) seg
ORDER BY r.r_name, seg.segment_name;
Note:
Cross joins can produce very large result sets, which should be used with caution in production environments. They are typically reserved for specific scenarios such as generating dimension combinations.

Left semi join (LEFT SEMI JOIN)

Left semi joins are used to determine whether rows in the left table have matching rows in the right table. They return only columns from the left table, with each row appearing at most once. The TDSQL syntax does not directly support LEFT SEMI JOIN; it is typically expressed through EXISTS or IN subqueries.
Query customers with orders:
SELECT c.c_custkey, c.c_name, c.c_acctbal
FROM customer c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.o_custkey = c.c_custkey
);
Equivalent IN syntax:
SELECT c_custkey, c_name, c_acctbal
FROM customer
WHERE c_custkey IN (
SELECT o_custkey FROM orders
);
Unlike INNER JOIN, left semi joins do not produce duplicate rows due to multiple matches in the right table. For example, if a customer has 10 orders, INNER JOIN returns 10 rows, whereas left semi join returns only 1 row.

Implicit joins

Instead of using the JOIN keyword, list multiple tables in the FROM clause and specify join conditions in the WHERE clause. Implicit joins behave equivalently to INNER JOIN.
SELECT
c.c_name,
o.o_orderkey,
o.o_totalprice
FROM customer c, orders o
WHERE c.c_custkey = o.o_custkey
AND o.o_orderdate >= '1995-01-01'
ORDER BY o.o_totalprice DESC
LIMIT 10;
Hands-on Tutorial: It is recommended to use explicit JOIN ... ON syntax instead of implicit joins. Explicit syntax separates join conditions from filter conditions, making the SQL intent clearer and easier to maintain.

JOIN-related algorithms

The TDSQL Boundless optimizer automatically selects the optimal join algorithm based on cost-based evaluation. Currently, it supports two core algorithms: Nested Loop Join and Hash Join.

Nested Loop Join

Nested Loop Join (nested loop join) is the most fundamental join algorithm. Its principle is to traverse each row of the outer table (driving table) and then search for matching rows in the inner table.
TDSQL Boundless executes Nested Loop Join using the Batched Key Access (BKA) approach by default. The core optimization of BKA is: to collect key values from multiple rows of the outer table in batches and send them to the inner table for lookup in a single operation, rather than sending them row by row. This batch processing method offers two significant advantages:
1. Reduce RPC Frequency: Consolidating multiple single-row lookups into a single batch request significantly reduces network overhead.
2. Improve Storage Engine Read Efficiency: After a batch of key values is received, the inner table performs batch index lookups and table lookups, thus improving storage engine read efficiency.
Applicable Scenarios:
The inner table has an index on the join column.
The outer table has a relatively small data volume, or the number of rows is reduced after WHERE filtering.
Good index selectivity (meaning the index can quickly locate a small number of rows).
Example:
-- A typical Nested Loop Join scenario
-- orders as the outer table, customer as the inner table via primary key c_custkey
SELECT
o.o_orderkey,
o.o_totalprice,
c.c_name
FROM orders o
INNER JOIN customer c ON o.o_custkey = c.c_custkey
WHERE o.o_orderdate = '1995-03-15';
In this query, the orders table, after being filtered by o_orderdate, has a small number of rows and serves as the driving table; the customer table, with c_custkey as the primary key, acts as the driven table. BKA will send the filtered o_custkey values in batches and look up matching rows in the customer table in a single operation.

Hash Join

Hash Join (hash join) works by: first loading the smaller table (build table) into memory to construct a hash table, then traversing the larger table (probe table) and using a hash function to quickly find matching rows for each row.
Applicable Scenarios:
There is no appropriate index on the join column.
Equijoin (= condition)
Example:
-- A typical Hash Join scenario
-- Equijoin of two large tables with no index on the join columns
SELECT
l.l_orderkey,
l.l_extendedprice,
l.l_discount,
p.p_name,
p.p_type
FROM lineitem l
INNER JOIN part p ON l.l_partkey = p.p_partkey
WHERE p.p_type LIKE '%BRASS%';

Algorithm Selection

TDSQL Boundless optimizer automatically selects join algorithms based on a cost model, primarily considering the following factors:
Factor
Nested Loop Join (BKA)
Hash Join
Index Dependency
There needs to be an index on the join column of the inner table.
Does not rely on indexes
Data Volume
Suitable for scenarios where the outer table is small and the inner table is large.
Suitable for scenarios where both tables are large.
Memory Consumption
Relatively low, processed in batches.
Relatively high, requires building a hash table in memory.
Connection Type
Equijoin and non-equijoin
Supports equijoin only.
Through EXPLAIN you can view the join algorithm selected by the optimizer:
EXPLAIN SELECT
o.o_orderkey, c.c_name
FROM orders o
INNER JOIN customer c ON o.o_custkey = c.c_custkey
WHERE o.o_orderdate = '1995-03-15';

JOIN order

In multi-table join queries, the join order of tables significantly impacts query performance. The TDSQL Boundless optimizer automatically evaluates different join sequences and selects the execution plan with the lowest cost.

Automatic optimization

When generating execution plans, the optimizer comprehensively considers factors such as table size, index status, and selectivity of filter conditions, automatically adjusting the join order:
-- The optimizer will automatically determine the join order for customer, orders, lineitem
SELECT
c.c_name,
o.o_orderdate,
l.l_extendedprice
FROM customer c
INNER JOIN orders o ON c.c_custkey = o.o_custkey
INNER JOIN lineitem l ON o.o_orderkey = l.l_orderkey
WHERE c.c_mktsegment = 'BUILDING'
AND o.o_orderdate < '1995-03-15'
AND l.l_shipdate > '1995-03-15';

Use Hint to specify the join order

When the join order selected by the optimizer is not optimal, you can use Hint to manually intervene in the join order.
JOIN_ORDER: Specifies the complete join order for all tables:
SELECT /*+ JOIN_ORDER(o, c, l) */
c.c_name,
o.o_orderdate,
l.l_extendedprice
FROM customer c
INNER JOIN orders o ON c.c_custkey = o.o_custkey
INNER JOIN lineitem l ON o.o_orderkey = l.l_orderkey
WHERE c.c_mktsegment = 'BUILDING'
AND o.o_orderdate < '1995-03-15'
AND l.l_shipdate > '1995-03-15';
JOIN_ORDER(o, c, l) forces the optimizer to strictly follow the orders > customer > lineitem sequence for joins, ignoring the table order specified in the SQL.
JOIN_PREFIX: Specifies only the first few tables in the join order, with the optimizer automatically determining the remaining tables:
SELECT /*+ JOIN_PREFIX(o, c) */
c.c_name,
o.o_orderdate,
l.l_extendedprice
FROM customer c
INNER JOIN orders o ON c.c_custkey = o.o_custkey
INNER JOIN lineitem l ON o.o_orderkey = l.l_orderkey
WHERE c.c_mktsegment = 'BUILDING'
AND o.o_orderdate < '1995-03-15'
AND l.l_shipdate > '1995-03-15';
JOIN_PREFIX(o, c) forces the join order of the first two tables to be orders > customer, while the join position of lineitem is chosen by the optimizer itself. When only needing to control the order of some key tables and leaving the rest to the optimizer for optimization, JOIN_PREFIX is more flexible than JOIN_ORDER.

도움말 및 지원

문제 해결에 도움이 되었나요?

피드백