Tencent Cloud

Data Lake Compute


Window Functions

Last updated: 2024-08-07 17:33:25

row_number

Function statement:
row_number()
Supported engines: SparkSQL and Presto
Usage instructions: Assign a unique, consecutive number to each row within its partition, in the order given by the ORDER BY clause. Tied rows receive distinct numbers, so their relative order is nondeterministic.
Return type: int
Example:

SELECT a, b, row_number() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 1
A1 1 2
A1 2 3
A2 3 1

rank

Function statement:
rank()
Supported engines: SparkSQL and Presto
Usage instructions: Compute the rank of a value within a group of values. Tied values receive the same rank, and the ranking sequence then leaves gaps: the rank of the next distinct value counts all preceding rows.
Return type: int
Example:
SELECT a, b, rank() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1 1 1
A1 1 1
A1 2 3
A2 3 1

dense_rank

Function statement:
dense_rank()
Supported engines: SparkSQL and Presto
Usage instructions: Compute the rank of a value within a group of values. Unlike the rank function, dense_rank does not leave gaps in the ranking sequence in the case of ties.
Return type: int
Example:
SELECT a, b, dense_rank() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1 1 1
A1 1 1
A1 2 2
A2 3 1
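The difference shows up directly when both functions run over the same tied data: after the two tied rows, rank jumps to 3 while dense_rank continues at 2. A minimal sketch using Python's sqlite3 module, whose window functions share these semantics (the table and data mirror the SQL examples above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tab (a TEXT, b INT)")
con.executemany("INSERT INTO tab VALUES (?, ?)",
                [("A1", 2), ("A1", 1), ("A2", 3), ("A1", 1)])

rows = con.execute("""
    SELECT a, b,
           rank()       OVER w,
           dense_rank() OVER w
    FROM tab
    WINDOW w AS (PARTITION BY a ORDER BY b)
    ORDER BY a, b
""").fetchall()

for row in rows:
    print(row)  # rank leaves a gap after the tie; dense_rank does not
```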

percent_rank

Function statement:
percent_rank()
Supported engines: SparkSQL and Presto
Usage instructions: Compute the percentile rank of a value within a group of values. The return value is a decimal between 0 and 1.
Return type: double
Example:
SELECT a, b, percent_rank() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1 1 0.0
A1 1 0.0
A1 2 1.0
A2 3 0.0
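percent_rank is computed as (rank - 1) / (rows in partition - 1), returning 0 for a single-row partition, which is why both tied rows and the lone A2 row get 0.0 above. A plain-Python re-derivation of the example output (data and column names mirror the SQL example):

```python
from collections import defaultdict

rows = [("A1", 2), ("A1", 1), ("A2", 3), ("A1", 1)]

# Group values into partitions by column a
parts = defaultdict(list)
for a, b in rows:
    parts[a].append(b)

results = []
for a, values in parts.items():
    values.sort()
    n = len(values)
    for b in values:
        rank = values.index(b) + 1  # ties share the rank of their first occurrence
        pr = (rank - 1) / (n - 1) if n > 1 else 0.0
        results.append((a, b, pr))

print(results)
```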

cume_dist

Function statement:
cume_dist()
Supported engines: SparkSQL and Presto
Usage instructions: Compute the cumulative distribution of a value within a partition, i.e., the fraction of partition rows whose value is less than or equal to the current row's value.
Return type: double
Example:
SELECT a, b, cume_dist() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1 1 0.6666666666666666
A1 1 0.6666666666666666
A1 2 1.0
A2 3 1.0
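Ties share the same result because they count each other, so the two tied rows above each get 2/3. Re-deriving the example output in plain Python (the fraction of rows with value less than or equal to the current one):

```python
from collections import defaultdict

rows = [("A1", 2), ("A1", 1), ("A2", 3), ("A1", 1)]

# Group values into partitions by column a
parts = defaultdict(list)
for a, b in rows:
    parts[a].append(b)

results = []
for a, values in parts.items():
    n = len(values)
    for b in sorted(values):
        le = sum(1 for v in values if v <= b)  # rows with value <= current
        results.append((a, b, le / n))

print(results)
```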

first_value

Function statement:
first_value(col)
Supported engines: SparkSQL and Presto
Usage instructions: Return the value of the first row of a column in a partition.
Return type: data type of the col column
Example:
SELECT a, b, first_value(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1 1 1
A1 1 1
A1 2 1
A2 3 3

last_value

Function statement:
last_value(col)
Supported engines: SparkSQL and Presto
Usage instructions: Return the value of the last row of the window frame. Note that the default frame ends at the current row and its peers, so without an explicit frame clause this is effectively the current row's value, as the example below shows.
Return type: data type of the col column
Example:
SELECT a, b, last_value(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1 1 1
A1 1 1
A1 2 2
A2 3 3
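Because the default window frame ends at the current row (and its peers), last_value above follows the current value rather than the partition's final value. A sqlite3 sketch contrasting the default frame with an explicit unbounded frame (SQLite uses the same default frame here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tab (a TEXT, b INT)")
con.executemany("INSERT INTO tab VALUES (?, ?)",
                [("A1", 2), ("A1", 1), ("A2", 3), ("A1", 1)])

# Default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
default_frame = con.execute("""
    SELECT a, b, last_value(b) OVER (PARTITION BY a ORDER BY b)
    FROM tab ORDER BY a, b
""").fetchall()

# Explicit frame covering the whole partition
full_frame = con.execute("""
    SELECT a, b, last_value(b) OVER (
        PARTITION BY a ORDER BY b
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    FROM tab ORDER BY a, b
""").fetchall()

print(default_frame)  # tracks the current row's value
print(full_frame)     # the partition's true last value
```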

lag

Function statement:
lag(col[, n [, default]])
Supported engines: SparkSQL and Presto
Usage instructions: Return the value of the nth row above the current row in the window. The default value for n is 1, and the default value for default is null. If the value of the nth row is null, null is returned. If such an offset row does not exist (e.g., the first row of the window has no row above it when the offset is 1), default is returned. The first parameter is the column name, the second parameter is the nth row before, and the third parameter is the default value.
Return type: data type of the col column
Example:
SELECT a, b, lag(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1 1 NULL
A1 1 1
A1 2 1
A2 3 NULL
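The optional second and third arguments set the offset and the fallback value: lag(b, 1, 0) below returns 0 instead of NULL on each partition's first row. A sqlite3 sketch (same argument order as described above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tab (a TEXT, b INT)")
con.executemany("INSERT INTO tab VALUES (?, ?)",
                [("A1", 2), ("A1", 1), ("A2", 3), ("A1", 1)])

# lag(col, n, default): 0 replaces NULL where no previous row exists
rows = con.execute("""
    SELECT a, b, lag(b, 1, 0) OVER (PARTITION BY a ORDER BY b)
    FROM tab ORDER BY a, b
""").fetchall()

print(rows)
```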

lead

Function statement:
lead(col[, n[, default]])
Supported engines: SparkSQL and Presto
Usage instructions: Return the value of the nth row after the current row in the window. The default value for n is 1, and the default value for default is null. If the value of the nth row is null, null is returned. If such an offset row does not exist (e.g., the last row of the window has no row below it when the offset is 1), default is returned. The first parameter is the column name, the second parameter is the offset n (rows after the current row), and the third parameter is the default value.
Return type: data type of the col column
Example:
SELECT a, b, lead(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1 1 1
A1 1 2
A1 2 NULL
A2 3 NULL

nth_value

Function statement:
nth_value(col[, n])
Supported engines: SparkSQL and Presto
Usage instructions: Return the value of the nth row from the start of the window frame, where n starts from 1. If ignoreNulls is true, null values are skipped when locating the nth row; otherwise every row counts toward n. If the nth row does not exist (e.g., when n is 10 but the window has fewer than 10 rows), null is returned. The first parameter is the column name, and the second parameter is n.
Return type: data type of the col column
Example:
SELECT a, b, nth_value(b, 2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1 1 1
A1 1 1
A1 2 1
A2 3 NULL

ntile

Function statement:
ntile(n)
Supported engines: SparkSQL and Presto
Usage instructions: Divide the rows in the window partition into n buckets and return the bucket number of the row, ranging from 1 to n.
Return type: int
Example:
SELECT a, b, ntile(2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1 1 1
A1 1 1
A1 2 2
A2 3 1
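When the partition size does not divide evenly by n, the earlier buckets get the extra rows: each of the first (rows mod n) buckets holds one more row than the rest, which is why the three A1 rows split 2/1 above. A plain-Python sketch of that assignment rule (ntile_buckets is a hypothetical helper written for illustration, not a DLC function):

```python
from collections import defaultdict

def ntile_buckets(num_rows, n):
    """Hypothetical helper: bucket number per row position, ntile-style.
    The first (num_rows % n) buckets get one extra row."""
    base, extra = divmod(num_rows, n)
    out = []
    for bucket in range(1, n + 1):
        size = base + (1 if bucket <= extra else 0)
        out.extend([bucket] * size)
    return out

rows = [("A1", 2), ("A1", 1), ("A2", 3), ("A1", 1)]
parts = defaultdict(list)
for a, b in rows:
    parts[a].append(b)

results = []
for a, values in parts.items():
    buckets = ntile_buckets(len(values), 2)
    for b, bucket in zip(sorted(values), buckets):
        results.append((a, b, bucket))

print(results)
```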

CLUSTER_SAMPLE

Function statement:
CLUSTER_SAMPLE(<int> N[, <int> M]) OVER (PARTITION BY col1 ORDER BY col2)
Supported engine: SparkSQL
Usage instructions: Sample rows within the window according to the specified quantity or proportion.
N: Required, int type. When only N is provided, approximately N rows are sampled from the window; the sampled count is close to N but not guaranteed to be exact.
M: Optional, int type. When M is also provided, approximately M/N of the rows in the window are sampled; the sampled count is close to (total rows in the window) × M/N.
Return type: boolean, where true indicates the row is sampled, and false indicates it is not sampled.
Example:
SELECT a, b, cluster_sample(2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 2 true
A1 1 true
A2 3 true
A1 1 false
SELECT a, b FROM (SELECT a, b, cluster_sample(2) OVER (PARTITION BY a ORDER BY b) AS c FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)) WHERE c;
A1 2
A1 1
A2 3


