Tencent Cloud

Data Lake Compute


Window Functions

Last updated: 2024-08-07 17:33:25

row_number

Function statement:
row_number()
Supported engines: SparkSQL and Presto
Usage instructions: Assign a unique, consecutive number to each row within its partition, in the order given by the ORDER BY clause. Tied rows receive distinct numbers, so their relative order is nondeterministic.
Return type: int
Example:

SELECT a, b, row_number() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 1
A1 1 2
A1 2 3
A2 3 1

rank

Function statement:
rank()
Supported engines: SparkSQL and Presto
Usage instructions: Compute the rank of a value within a group of values. Tied values receive the same rank, and the ranking sequence then leaves gaps: the rank of the next distinct value counts all preceding rows.
Return type: int
Example:
SELECT a, b, rank() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1 1 1
A1 1 1
A1 2 3
A2 3 1

dense_rank

Function statement:
dense_rank()
Supported engines: SparkSQL and Presto
Usage instructions: Compute the rank of a value within a group of values. Unlike the rank function, dense_rank does not leave gaps in the ranking sequence in the case of ties.
Return type: int
Example:
SELECT a, b, dense_rank() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1 1 1
A1 1 1
A1 2 2
A2 3 1
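The difference shows up directly when both functions run over the same tied data: after the two tied rows, rank jumps to 3 while dense_rank continues at 2. A minimal sketch using Python's sqlite3 module, whose window functions share these semantics (the table and data mirror the SQL examples above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tab (a TEXT, b INT)")
con.executemany("INSERT INTO tab VALUES (?, ?)",
                [("A1", 2), ("A1", 1), ("A2", 3), ("A1", 1)])

rows = con.execute("""
    SELECT a, b,
           rank()       OVER w,
           dense_rank() OVER w
    FROM tab
    WINDOW w AS (PARTITION BY a ORDER BY b)
    ORDER BY a, b
""").fetchall()

for row in rows:
    print(row)  # rank leaves a gap after the tie; dense_rank does not
```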

percent_rank

Function statement:
percent_rank()
Supported engines: SparkSQL and Presto
Usage instructions: Compute the percentile rank of a value within a group of values. The return value is a decimal between 0 and 1.
Return type: double
Example:
SELECT a, b, percent_rank() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1 1 0.0
A1 1 0.0
A1 2 1.0
A2 3 0.0
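percent_rank is computed as (rank - 1) / (rows in partition - 1), returning 0 for a single-row partition, which is why both tied rows and the lone A2 row get 0.0 above. A plain-Python re-derivation of the example output (data and column names mirror the SQL example):

```python
from collections import defaultdict

rows = [("A1", 2), ("A1", 1), ("A2", 3), ("A1", 1)]

# Group values into partitions by column a
parts = defaultdict(list)
for a, b in rows:
    parts[a].append(b)

results = []
for a, values in parts.items():
    values.sort()
    n = len(values)
    for b in values:
        rank = values.index(b) + 1  # ties share the rank of their first occurrence
        pr = (rank - 1) / (n - 1) if n > 1 else 0.0
        results.append((a, b, pr))

print(results)
```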

cume_dist

Function statement:
cume_dist()
Supported engines: SparkSQL and Presto
Usage instructions: Compute the cumulative distribution of a value within a partition, i.e., the fraction of partition rows whose value is less than or equal to the current row's value.
Return type: double
Example:
SELECT a, b, cume_dist() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1 1 0.6666666666666666
A1 1 0.6666666666666666
A1 2 1.0
A2 3 1.0
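Ties share the same result because they count each other, so the two tied rows above each get 2/3. Re-deriving the example output in plain Python (the fraction of rows with value less than or equal to the current one):

```python
from collections import defaultdict

rows = [("A1", 2), ("A1", 1), ("A2", 3), ("A1", 1)]

# Group values into partitions by column a
parts = defaultdict(list)
for a, b in rows:
    parts[a].append(b)

results = []
for a, values in parts.items():
    n = len(values)
    for b in sorted(values):
        le = sum(1 for v in values if v <= b)  # rows with value <= current
        results.append((a, b, le / n))

print(results)
```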

first_value

Function statement:
first_value(col)
Supported engines: SparkSQL and Presto
Usage instructions: Return the value of the first row of a column in a partition.
Return type: data type of the col column
Example:
SELECT a, b, first_value(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1 1 1
A1 1 1
A1 2 1
A2 3 3

last_value

Function statement:
last_value(col)
Supported engines: SparkSQL and Presto
Usage instructions: Return the value of the last row of the window frame. Note that the default frame ends at the current row and its peers, so without an explicit frame clause this is effectively the current row's value, as the example below shows.
Return type: data type of the col column
Example:
SELECT a, b, last_value(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1 1 1
A1 1 1
A1 2 2
A2 3 3
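Because the default window frame ends at the current row (and its peers), last_value above follows the current value rather than the partition's final value. A sqlite3 sketch contrasting the default frame with an explicit unbounded frame (SQLite uses the same default frame here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tab (a TEXT, b INT)")
con.executemany("INSERT INTO tab VALUES (?, ?)",
                [("A1", 2), ("A1", 1), ("A2", 3), ("A1", 1)])

# Default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
default_frame = con.execute("""
    SELECT a, b, last_value(b) OVER (PARTITION BY a ORDER BY b)
    FROM tab ORDER BY a, b
""").fetchall()

# Explicit frame covering the whole partition
full_frame = con.execute("""
    SELECT a, b, last_value(b) OVER (
        PARTITION BY a ORDER BY b
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    FROM tab ORDER BY a, b
""").fetchall()

print(default_frame)  # tracks the current row's value
print(full_frame)     # the partition's true last value
```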

lag

Function statement:
lag(col[, n [, default]])
Supported engines: SparkSQL and Presto
Usage instructions: Return the value of the nth row above the current row in the window. The default value for n is 1, and the default value for default is null. If the value of the nth row is null, null is returned. If such an offset row does not exist (e.g., the first row of the window has no row above it when the offset is 1), default is returned. The first parameter is the column name, the second parameter is the nth row before, and the third parameter is the default value.
Return type: data type of the col column
Example:
SELECT a, b, lag(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1 1 NULL
A1 1 1
A1 2 1
A2 3 NULL
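The optional second and third arguments set the offset and the fallback value: lag(b, 1, 0) below returns 0 instead of NULL on each partition's first row. A sqlite3 sketch (same argument order as described above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tab (a TEXT, b INT)")
con.executemany("INSERT INTO tab VALUES (?, ?)",
                [("A1", 2), ("A1", 1), ("A2", 3), ("A1", 1)])

# lag(col, n, default): 0 replaces NULL where no previous row exists
rows = con.execute("""
    SELECT a, b, lag(b, 1, 0) OVER (PARTITION BY a ORDER BY b)
    FROM tab ORDER BY a, b
""").fetchall()

print(rows)
```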

lead

Function statement:
lead(col[, n[, default]])
Supported engines: SparkSQL and Presto
Usage instructions: Return the value of the nth row after the current row in the window. The default value for n is 1, and the default value for default is null. If the value of the nth row is null, null is returned. If such an offset row does not exist (e.g., the last row of the window has no row below it when the offset is 1), default is returned. The first parameter is the column name, the second parameter is the offset n (rows after the current row), and the third parameter is the default value.
Return type: data type of the col column
Example:
SELECT a, b, lead(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1 1 1
A1 1 2
A1 2 NULL
A2 3 NULL

nth_value

Function statement:
nth_value(col[, n])
Supported engines: SparkSQL and Presto
Usage instructions: Return the value of the nth row from the start of the window frame, where n starts from 1. If ignoreNulls is true, null values are skipped when locating the nth row; otherwise every row counts toward n. If the nth row does not exist (e.g., when n is 10 but the window has fewer than 10 rows), null is returned. The first parameter is the column name, and the second parameter is n.
Return type: data type of the col column
Example:
SELECT a, b, nth_value(b, 2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1 1 1
A1 1 1
A1 2 1
A2 3 NULL

ntile

Function statement:
ntile(n)
Supported engines: SparkSQL and Presto
Usage instructions: Divide the rows in the window partition into n buckets and return the bucket number of the row, ranging from 1 to n.
Return type: int
Example:
SELECT a, b, ntile(2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1 1 1
A1 1 1
A1 2 2
A2 3 1
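When the partition size does not divide evenly by n, the earlier buckets get the extra rows: each of the first (rows mod n) buckets holds one more row than the rest, which is why the three A1 rows split 2/1 above. A plain-Python sketch of that assignment rule (ntile_buckets is a hypothetical helper written for illustration, not a DLC function):

```python
from collections import defaultdict

def ntile_buckets(num_rows, n):
    """Hypothetical helper: bucket number per row position, ntile-style.
    The first (num_rows % n) buckets get one extra row."""
    base, extra = divmod(num_rows, n)
    out = []
    for bucket in range(1, n + 1):
        size = base + (1 if bucket <= extra else 0)
        out.extend([bucket] * size)
    return out

rows = [("A1", 2), ("A1", 1), ("A2", 3), ("A1", 1)]
parts = defaultdict(list)
for a, b in rows:
    parts[a].append(b)

results = []
for a, values in parts.items():
    buckets = ntile_buckets(len(values), 2)
    for b, bucket in zip(sorted(values), buckets):
        results.append((a, b, bucket))

print(results)
```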

CLUSTER_SAMPLE

Function statement:
CLUSTER_SAMPLE(<int> N[, <int> M]) OVER (PARTITION BY col1 ORDER BY col2)
Supported engine: SparkSQL
Usage instructions: Sample rows within the window according to the specified quantity or proportion.
N: Required, int type. When only N is provided, approximately N rows are sampled from the window; the sampled count is close to N but not guaranteed to be exact.
M: Optional, int type. When M is also provided, approximately M/N of the rows in the window are sampled; the sampled count is close to (total rows in the window) × M/N.
Return type: boolean, where true indicates the row is sampled, and false indicates it is not sampled.
Example:
SELECT a, b, cluster_sample(2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 2 true
A1 1 true
A2 3 true
A1 1 false
SELECT a, b FROM (SELECT a, b, cluster_sample(2) OVER (PARTITION BY a ORDER BY b) AS c FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)) WHERE c;
A1 2
A1 1
A2 3


