Overview
Strengths
Use Cases
SELECT statement: Retrieve rows from multiple tables.[ WITH with_query [, ...] ]SELECT [ ALL | DISTINCT ] select_expression [, ...][ FROM from_item [, ...] ][ WHERE condition ][ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ][ HAVING condition ][ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ][ ORDER BY expression [ ASC | DESC ] [ NULLS FIRST | NULLS LAST] [, ...] ][ LIMIT [ count | ALL ] ]
with_query statement is as follows:subquery_table_name [ ( column_name [, ...] ) ] AS (subquery)
[ INNER ] JOINLEFT [ OUTER ] JOINRIGHT [ OUTER ] JOINFULL [ OUTER ] JOINCROSS JOINON join_condition: If join_condition is used, you can specify the column name for the join key of multiple tables. If join_column is used, the join column must exist in both tables.UNION, INTERSECT, and EXCEPT combine the results of more than one statement. UNION combines all the rows in the result set from the first query with those in the result set from the second query. To eliminate duplication, UNION builds a hash table, which consumes memory resources. UNION ALL is recommended for better performance.INTERSECT returns only the rows in the result sets of both the first and the second queries.EXCEPT returns the rows in the result set of the first query, but not the second.expression. Then, the second expression is used to sort rows matching the first expression, and so on.WITH x AS (SELECT a, MAX(b) AS b FROM t GROUP BY a)SELECT a, b FROM x;
WITHt1 AS (SELECT a, MAX(b) AS b FROM x GROUP BY a),t2 AS (SELECT a, AVG(d) AS d FROM y GROUP BY a)SELECT t1.*, t2.*FROM t1JOIN t2 ON t1.a = t2.a;
SELECT count(*), nationkey FROM customer GROUP BY 2;
SELECT count(*), nationkey FROM customer GROUP BY nationkey;
SELECT count(*) FROM customer GROUP BY mktsegment;
SELECT origin_state, origin_zip, destination_state, sum(package_weight)FROM shippingGROUP BY GROUPING SETS ((origin_state),(origin_state, origin_zip),(destination_state));
SELECT origin_state, NULL, NULL, sum(package_weight)FROM shipping GROUP BY origin_stateUNION ALLSELECT origin_state, origin_zip, NULL, sum(package_weight)FROM shipping GROUP BY origin_state, origin_zipUNION ALLSELECT NULL, NULL, destination_state, sum(package_weight)FROM shipping GROUP BY destination_state;
SELECT count(*), mktsegment, nationkey,CAST(sum(acctbal) AS bigint) AS totalbalFROM customerGROUP BY mktsegment, nationkeyHAVING sum(acctbal) > 5700000ORDER BY totalbal DESC;
SELECT nameFROM nationWHERE regionkey IN (SELECT regionkey FROM region)
SELECT column_name(s)FROM table_nameWHERE EXISTS(SELECT column_name FROM table_name WHERE condition)
SELECT *FROM table_1JOIN table_2USING (key_A, key_B)
SELECT *FROM (VALUES(1, 3, 10),(2, 4, 20)) AS table_1 (key_A, key_B, y1)LEFT JOIN (VALUES(1, 3, 100),(2, 4, 200)) AS table_2 (key_A, key_B, y2)USING (key_A, key_B);
SELECT *FROM nationCROSS JOIN region
SELECT orderdate FROM orders LIMIT 5
Statement: ORDER BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...]
SELECT name, age FROM person ORDER BY age
SELECT * FROM studentORDER BY student_id
SELECT * FROM studentORDER BY student_id,student_name
SELECT * FROM (VALUES 13, 42)EXCEPTSELECT 13
SELECT * FROM (VALUES 13, 42)INTERSECTSELECT 1
UNION ALL operator.SELECT 13UNIONSELECT 42
SELECT id FROM aUNION ALLSELECT id FROM b;
BERNOULLI: Each row is selected to be in the table sample with a probability of the sample percentage. When a table is sampled using the Bernoulli method, all physical blocks of the table are scanned and certain rows are skipped (based on a comparison between the sample percentage and a random value calculated at runtime). The probability of a row being included in the result is independent from any other row. This does not reduce the time required to read the sampled table from disk. It may have an impact on the total query time if the sampled output is processed further.SYSTEM: This sampling method divides the table into logical segments of data and samples the table at this granularity. This sampling method either selects all rows from a specific data segment or skips it, based on a comparison between the sampling percentage and a randomly computed value at runtime. The row selected in system sampling depends on the connector used. For example, when Hive is used, it depends on how the data is laid out on HDFS. This method does not guarantee independent sampling probabilities.SELECT *FROM users TABLESAMPLE BERNOULLI (50);
SELECT *FROM users TABLESAMPLE SYSTEM (75);
SELECT * FROM personPIVOT (SUM(age) AS a, AVG(class) AS cFOR name IN ('John' AS john, 'Mike' AS mike));
LATERAL VIEW [ OUTER ] generator_function ( expression [ , ... ] ) [ table_alias ] AS column_alias [ , ... ]
Select 'dlc''test'
CREATE EXTERNAL TABLE IF NOT EXISTS `csv_test_2222` (`_c0` STRING,`_c1` INTEGER,`_c2` INTEGER,`_c3` INTEGER)ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ('separatorChar' = '''','quoteChar' = '''')STORED AS `textfile`LOCATION 'cosn://dlc-nj-1258469122/csv/100M/
Was this page helpful?
You can also Contact sales or Submit a Ticket for help.
Help us improve! Rate your documentation experience in 5 mins.
Feedback