WeData integrates with the DLC storage-compute separated engine, provides DLC database and table management features, and enables agile and efficient data lake analysis and computation. Built on Spark and Presto, it allows standard SQL to be used for federated analysis and computation across Cloud Object Storage (COS) and multiple source databases.
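As an illustration, a federated query in standard SQL might join a data lake table stored on COS with a table from an external database source. The catalog, database, table, and column names below are hypothetical; the actual names depend on the data sources configured in DLC:

```sql
-- Hypothetical federated query: join a lake table on COS
-- with a table from an external MySQL data source.
SELECT o.order_id,
       o.amount,
       u.user_name
FROM lakehouse_db.orders AS o          -- table stored on COS, managed by DLC
JOIN mysql_catalog.crm.users AS u      -- table from an external MySQL source
  ON o.user_id = u.user_id
WHERE o.order_date >= '2024-01-01';
```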
Background
Tencent Cloud Data Lake Compute (DLC) provides agile and efficient managed data lake analytics and computing services. Users do not need to perform traditional layered data modeling, which drastically reduces the preparation time for massive-scale data analysis and effectively improves enterprise data agility.
Use Limits
| Product | Limits |
| --- | --- |
| Data Lake Compute (DLC) | Currently, WeData supports data management, analytical queries, and computing tasks for DLC databases and tables. The supported DLC computing engine versions are as follows:<br>Standard engine: Presto: Standard-P 1.0; Spark: Standard-S 1.1, Standard-S 1.0, Standard-S 1.1 (native)<br>SuperSQL engine: Spark SQL: SuperSQL-S 1.0, SuperSQL-S 3.5; Spark job: Spark 2.4, Spark 3.2, Spark 3.5; Presto: SuperSQL-P 1.0 |
| WeData | Task types supported in WeData data development: DLC SQL, DLC Spark, and DLC PySpark. WeData also supports creating DLC tables and DLC functions. |
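For example, a DLC table created from WeData might be defined as follows. This is only an illustrative sketch: the database name, table name, columns, and storage format are all assumptions, not values prescribed by the product:

```sql
-- Illustrative DLC table definition; all names are assumptions.
CREATE TABLE IF NOT EXISTS demo_db.user_events (
    user_id    BIGINT,
    event_type STRING,
    event_time TIMESTAMP
)
PARTITIONED BY (dt STRING)   -- partition by date for efficient pruning
STORED AS PARQUET;           -- columnar format commonly used on COS
```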
Usage Process
The main process of using DLC in WeData includes the following steps:
Preparations
| Product | Preparation | Reference |
| --- | --- | --- |
| Data Lake Compute (DLC) | To ensure smooth use of DLC-related table creation, data development, and data exploration features in WeData, the DLC cluster must meet the basic configuration requirements. For example, to use DLC's Spark job engine in WeData, you need to create a Spark job engine in DLC and grant the corresponding user permission to use it. | Create and manage DLC engines |
| WeData | Bind the DLC cluster and obtain the latest cluster configuration from it. By default, newly created projects automatically use dynamic keys to connect to DLC. | Data permission and engine permission management |
Task Development
Creating a Workflow
Task development is carried out by orchestrating data workflows so that computing tasks run as a defined process. Before creating a computing task, create a data workflow, then orchestrate the execution flow of the computing tasks within that workflow.
Create a DLC Node
WeData performs task development based on the DLC engine. After the DLC cluster is bound to a project in WeData, the DLC system data source is integrated into WeData automatically. Currently, DLC SQL nodes in the orchestration space support only the DLC system source.
Task Development
After binding the engine to the WeData project, create DLC-supported computing tasks in the data workflows you have set up. When configuring a task node, use the system data source provided by DLC for task development and debugging.
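Within a DLC SQL node, a task typically reads from the DLC system data source and writes results back to a lake table. A minimal sketch follows; the database, table, and column names are hypothetical:

```sql
-- Minimal DLC SQL task sketch; all names are hypothetical.
INSERT OVERWRITE TABLE demo_db.daily_order_stats PARTITION (dt = '2024-01-01')
SELECT user_id,
       COUNT(*)    AS order_cnt,
       SUM(amount) AS total_amount
FROM demo_db.orders
WHERE dt = '2024-01-01'
GROUP BY user_id;
```

In scheduled production tasks, the fixed date literal would normally be replaced with a scheduling time parameter so that each run processes its own partition.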
Task Submission
After configuring and debugging the task against the DLC system data source without errors, save the computing task, then submit and publish the workflow it belongs to. The task can then be scheduled and executed in the Ops Center.
Subsequent Operations
After completing DLC task development, you can perform DLC metadata management, task operation and maintenance monitoring, and data quality monitoring in WeData to ensure stable output of DLC data. You can also perform multi-source joint queries and data analysis in the data exploration feature.