Quick Start with Data Analytics in Data Lake Compute

Last updated: 2024-07-17 15:19:00
    Data Lake Compute allows you to quickly query and analyze COS data. Currently, CSV, ORC, Parquet, JSON, Avro, and text files are supported.

    Preliminary Preparations

    Before initiating a query, you need to grant the necessary internal permissions of Data Lake Compute and configure the path for query results.

    Step 1: Establish the necessary internal permissions for Data Lake Compute.

    Note
    If the user already has the necessary permissions, or if they are the root account administrator, this step can be disregarded.
    If you are logging in as a sub-account for the first time, then in addition to the necessary CAM authorization, you also need to ask a Data Lake Compute administrator or the root account administrator to grant you the required Data Lake Compute permissions from the Permission Management menu on the left side of the Data Lake Compute console (for a detailed explanation of permissions, see DLC Permission Overview).
    1. Table Permissions: Grant read and write permissions on the corresponding catalog, database, table, and view.
    2. Engine Permissions: Grant usage, monitoring, and modification permissions for the compute engine.
    Note
    The system automatically provides each user with a shared public engine based on the Presto kernel, so you can quickly try the service out without purchasing a private cluster first.
    For detailed steps on granting permissions, please refer to Sub-account Permission Management.

    Step 2: Configure the path for query results.

    Upon initial use of Data Lake Compute, you must first configure the path for query results. Once configured, the query results will be saved to this COS path.
    1. Log in to the Data Lake Compute DLC console and select the service region.
    2. Navigate to Data Exploration via the left sidebar menu.
    3. Under the Database and Tables page, click on Storage Configuration to set the path for query results.
    
    Specify the COS path for storage. If there are no available COS buckets in your account, you can create one through the Object Storage Console.
    
    
    

    Analysis Steps

    Step 1. Create a database

    If you are familiar with SQL statements, you can write the CREATE DATABASE statement directly in the query editor and skip the creation wizard (a sample statement follows the steps below).
    1. Log in to the Data Lake Compute console and select the service region.
    2. Select Data Explore on the left sidebar.
    3. Select Database & table, click "+", and select Create a database.
    
    
    Enter the database name and description.
    
    
    
    4. After selecting an execution engine in the top-right corner, run the CREATE DATABASE statement.
    
    
    
    
    
    For details, see Table Management.
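    For reference, a minimal statement for this step might look like the following sketch, where the database name demo_db is a placeholder to replace with your own:
    CREATE DATABASE IF NOT EXISTS `demo_db`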

    Step 2. Create an external table

    If you are familiar with SQL statements, you can write the CREATE TABLE statement directly in the query editor and skip the creation wizard.
    1. Log in to the Data Lake Compute console and select the service region.
    2. Select Data Explore on the left sidebar.
    3. Select Database & table, select the database you created, and right-click it to select Create external table.
    Note:
    An external table generally refers to data files stored in a COS bucket under your account. It can be created directly in Data Lake Compute for analysis, with no need to load additional data. Because the table is external, running DROP TABLE deletes only its metadata, while your original data in COS remains.
    
    4. Generate the table creation statement with the wizard by completing the steps of setting the basic information, selecting the data format, editing the columns, and editing the partitions.
    Step 1. Select the COS path of the data file (which must be a directory in a COS bucket, not a bucket itself). You can also quickly upload a file to COS here. These operations require the relevant COS permissions.
    Step 2. Select the data file format. Under Advanced options, you can enable automatic inference, and the backend will parse the file and automatically generate the table's column information.
    
    
    Note:
    Structure inference is an auxiliary tool for table creation and may not be 100% accurate. You need to check and modify the field names and types as needed.
    
    
    
    Step 3. Skip this step if there is no partition. Proper partitioning helps improve the analysis performance. For more information on partitioning, see Querying Partition Table.
    
    
    5. Click Complete to generate the SQL statement for table creation. Then, select a data engine and run the statement to create the table (a sample statement is shown below).
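    The generated statement depends on your data format and columns. As a rough sketch only (the database name, table name, column names, bucket name, and COS path below are placeholders, and Hive-compatible CSV DDL with a cosn:// path is assumed), it might look like this:
    CREATE EXTERNAL TABLE IF NOT EXISTS `demo_db`.`demo_audit_table` (
        `event_time` string,
        `user_id` string,
        `operation` string,
        `result` string
    )
    -- Add a PARTITIONED BY (...) clause here if the data directory is partitioned.
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    STORED AS TEXTFILE
    LOCATION 'cosn://examplebucket-1250000000/demo/audit/'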
    
    

    Step 3. Run the SQL analysis

    After the data is prepared, write the SQL analysis statement, select an appropriate compute engine, and start data analysis.
    
    

    Sample

    Write a SQL statement that queries all records whose result is SUCCESS, select a compute engine, and run the statement.
    select * from `DataLakeCatalog`.`demo2`.`demo_audit_table` where _c5 = 'SUCCESS'
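    If the table is partitioned, adding a filter on the partition column narrows the data scanned. For example, assuming a hypothetical partition column dt:
    select * from `DataLakeCatalog`.`demo2`.`demo_audit_table` where dt = '2024-07-01' and _c5 = 'SUCCESS'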
    