Overview
Scenarios
Product Architecture
Instance Types
Compatibility Notes
JOIN statements in TDSQL Boundless databases, including the usage of various join types and join algorithms.SELECTo.o_orderkey,o.o_orderdate,o.o_totalprice,c.c_name,c.c_mktsegmentFROM orders oINNER JOIN customer c ON o.o_custkey = c.c_custkey;
customer table.SELECTc.c_name,o.o_orderkey,o.o_orderdate,l.l_partkey,l.l_quantity,l.l_extendedpriceFROM customer cINNER JOIN orders o ON c.c_custkey = o.o_custkeyINNER JOIN lineitem l ON o.o_orderkey = l.l_orderkeyWHERE o.o_orderdate BETWEEN '1995-01-01' AND '1995-03-31'ORDER BY l.l_extendedprice DESCLIMIT 20;
INNER JOIN is used, customers without orders will not appear in the result set. To include customers without orders, LEFT JOIN should be used.NULL.SELECTc.c_custkey,c.c_name,c.c_mktsegment,COUNT(o.o_orderkey) AS order_countFROM customer cLEFT JOIN orders o ON c.c_custkey = o.o_custkeyGROUP BY c.c_custkey, c.c_name, c.c_mktsegmentORDER BY order_count ASCLIMIT 20;
INNER JOIN, even if a customer has no records in the orders table, that customer will still appear in the results with order_count being 0.SELECT c.c_custkey, c.c_name, c.c_phoneFROM customer cLEFT JOIN orders o ON c.c_custkey = o.o_custkeyWHERE o.o_orderkey IS NULL;
NOT EXISTS subquery but offers higher execution efficiency in certain scenarios:-- Equivalent syntaxSELECT c_custkey, c_name, c_phoneFROM customer cWHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.o_custkey = c.c_custkey);
NULL. Its logic is symmetric to LEFT JOIN.SELECTn.n_name AS nation_name,COUNT(c.c_custkey) AS customer_countFROM customer cRIGHT JOIN nation n ON c.c_nationkey = n.n_nationkeyGROUP BY n.n_nameORDER BY customer_count DESC;
LEFT JOIN instead of RIGHT JOIN by simply swapping the table order. This maintains SQL style consistency and enhances readability.SELECTn.n_name AS nation_name,COUNT(c.c_custkey) AS customer_countFROM nation nLEFT JOIN customer c ON c.c_nationkey = n.n_nationkeyGROUP BY n.n_nameORDER BY customer_count DESC;
SELECTr.r_name AS region_name,seg.segment_nameFROM region rCROSS JOIN (SELECT DISTINCT c_mktsegment AS segment_nameFROM customer) segORDER BY r.r_name, seg.segment_name;
EXISTS or IN subqueries.SELECT c.c_custkey, c.c_name, c.c_acctbalFROM customer cWHERE EXISTS (SELECT 1 FROM orders oWHERE o.o_custkey = c.c_custkey);
IN syntax:SELECT c_custkey, c_name, c_acctbalFROM customerWHERE c_custkey IN (SELECT o_custkey FROM orders);
INNER JOIN, left semi joins do not produce duplicate rows due to multiple matches in the right table. For example, if a customer has 10 orders, INNER JOIN returns 10 rows, whereas left semi join returns only 1 row.JOIN keyword, list multiple tables in the FROM clause and specify join conditions in the WHERE clause. Implicit joins behave equivalently to INNER JOIN.SELECTc.c_name,o.o_orderkey,o.o_totalpriceFROM customer c, orders oWHERE c.c_custkey = o.o_custkeyAND o.o_orderdate >= '1995-01-01'ORDER BY o.o_totalprice DESCLIMIT 10;
JOIN ... ON syntax instead of implicit joins. Explicit syntax separates join conditions from filter conditions, making the SQL intent clearer and easier to maintain.-- A typical Nested Loop Join scenario-- orders as the outer table, customer as the inner table via primary key c_custkeySELECTo.o_orderkey,o.o_totalprice,c.c_nameFROM orders oINNER JOIN customer c ON o.o_custkey = c.c_custkeyWHERE o.o_orderdate = '1995-03-15';
orders table, after being filtered by o_orderdate, has a small number of rows and serves as the driving table; the customer table, with c_custkey as the primary key, acts as the driven table. BKA will send the filtered o_custkey values in batches and look up matching rows in the customer table in a single operation.= condition)-- A typical Hash Join scenario-- Equijoin of two large tables with no index on the join columnsSELECTl.l_orderkey,l.l_extendedprice,l.l_discount,p.p_name,p.p_typeFROM lineitem lINNER JOIN part p ON l.l_partkey = p.p_partkeyWHERE p.p_type LIKE '%BRASS%';
Factor | Nested Loop Join (BKA) | Hash Join |
Index Dependency | There needs to be an index on the join column of the inner table. | Does not rely on indexes |
Data Volume | Suitable for scenarios where the outer table is small and the inner table is large. | Suitable for scenarios where both tables are large. |
Memory Consumption | Relatively low, processed in batches. | Relatively high, requires building a hash table in memory. |
Connection Type | Equijoin and non-equijoin | Supports equijoin only. |
EXPLAIN you can view the join algorithm selected by the optimizer:EXPLAIN SELECTo.o_orderkey, c.c_nameFROM orders oINNER JOIN customer c ON o.o_custkey = c.c_custkeyWHERE o.o_orderdate = '1995-03-15';
-- The optimizer will automatically determine the join order for customer, orders, lineitemSELECTc.c_name,o.o_orderdate,l.l_extendedpriceFROM customer cINNER JOIN orders o ON c.c_custkey = o.o_custkeyINNER JOIN lineitem l ON o.o_orderkey = l.l_orderkeyWHERE c.c_mktsegment = 'BUILDING'AND o.o_orderdate < '1995-03-15'AND l.l_shipdate > '1995-03-15';
SELECT /*+ JOIN_ORDER(o, c, l) */c.c_name,o.o_orderdate,l.l_extendedpriceFROM customer cINNER JOIN orders o ON c.c_custkey = o.o_custkeyINNER JOIN lineitem l ON o.o_orderkey = l.l_orderkeyWHERE c.c_mktsegment = 'BUILDING'AND o.o_orderdate < '1995-03-15'AND l.l_shipdate > '1995-03-15';
JOIN_ORDER(o, c, l) forces the optimizer to strictly follow the orders > customer > lineitem sequence for joins, ignoring the table order specified in the SQL.SELECT /*+ JOIN_PREFIX(o, c) */c.c_name,o.o_orderdate,l.l_extendedpriceFROM customer cINNER JOIN orders o ON c.c_custkey = o.o_custkeyINNER JOIN lineitem l ON o.o_orderkey = l.l_orderkeyWHERE c.c_mktsegment = 'BUILDING'AND o.o_orderdate < '1995-03-15'AND l.l_shipdate > '1995-03-15';
피드백