
DLC PySpark

Last updated: 2024-11-01 16:26:14
Note:
DLC PySpark tasks require a bound DLC engine. Currently, DLC PySpark supports only the Spark job engine. For engine kernel details, see DLC Engine Kernel Version.

Feature Overview

You can create a DLC PySpark task in WeData and submit it to the WeData scheduling platform and the DLC engine for execution.

Task Parameter Description

In the task properties of a DLC PySpark task, you can configure the data access policy, entry parameters, dependent resources, Spark conf parameters, and the task image.

Data access policy: Required. The security policy used to access COS data during task execution. For details, see DLC Configuration Data Access Policy.
Entry parameters: Optional. Entry parameters passed to the program. Multiple parameters are supported and should be separated by spaces; they reach the entry script as ordinary command-line arguments (see the sketch after this table).
Dependent resources: Optional. Supports --py-files, --files, and --archives. Multiple COS paths can be entered for each resource type, separated by commas (,).
Conf parameters: Optional. Parameters starting with spark., in k=v format. Separate multiple parameters with line breaks. Example: spark.network.timeout=120s.
Task image: The image used for task execution. If the task requires a specific image, you can choose between the DLC built-in image and a custom image.
Resource configuration: Using cluster resource configuration applies the cluster's default resource parameters; Custom sets per-task resource parameters, including executor size, driver size, and the number of executors.
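Because entry parameters are passed to the script as positional command-line arguments, the entry script can read them with sys.argv. A minimal sketch, assuming two illustrative parameters (a database name and a run date, e.g. "dlc_db_test_py 2024-11-01") are configured; the parameter names and defaults below are hypothetical, not defined by WeData:

import sys

from pyspark.sql import SparkSession

if __name__ == "__main__":
    # Hypothetical entry parameters: "<db_name> <run_date>", space-separated
    # in the task properties; fall back to defaults when absent.
    db_name = sys.argv[1] if len(sys.argv) > 1 else "dlc_db_test_py"
    run_date = sys.argv[2] if len(sys.argv) > 2 else "1970-01-01"

    spark = SparkSession.builder.appName("Entry Params Example").getOrCreate()
    # Echo the received parameters so the task log shows what was passed in.
    spark.sql(f"SELECT '{db_name}' AS db_name, '{run_date}' AS run_date").show()
    spark.stop()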

Sample Code

from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession \
        .builder \
        .appName("Operate DB Example") \
        .getOrCreate()
    # 1. Create the database
    spark.sql("CREATE DATABASE IF NOT EXISTS `DataLakeCatalog`.`dlc_db_test_py` COMMENT 'demo test' ")
    # 2. Create a managed (internal) table
    spark.sql("CREATE TABLE IF NOT EXISTS `DataLakeCatalog`.`dlc_db_test_py`.`test`(`id` int,`name` string,`age` int) ")
    # 3. Write data into the managed table
    spark.sql("INSERT INTO `DataLakeCatalog`.`dlc_db_test_py`.`test` VALUES (1,'Andy',12),(2,'Justin',3) ")
    # 4. Query the managed table
    spark.sql("SELECT * FROM `DataLakeCatalog`.`dlc_db_test_py`.`test` ").show()
    # 5. Create an external table backed by COS
    spark.sql("CREATE EXTERNAL TABLE IF NOT EXISTS `DataLakeCatalog`.`dlc_db_test_py`.`ext_test`(`id` int, `name` string, `age` int) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' STORED AS TEXTFILE LOCATION 'cosn://cos-bucket-name/ext_test' ")
    # 6. Write data into the external table
    spark.sql("INSERT INTO `DataLakeCatalog`.`dlc_db_test_py`.`ext_test` VALUES (1,'Andy',12),(2,'Justin',3) ")
    # 7. Query the external table
    spark.sql("SELECT * FROM `DataLakeCatalog`.`dlc_db_test_py`.`ext_test` ").show()
    spark.stop()
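If the task depends on shared helper code, you can upload it to COS and reference it through the --py-files option under Dependent resources; the entry script can then import it directly. A minimal sketch, assuming a hypothetical helper module utils.py uploaded to a COS path such as cosn://cos-bucket-name/deps/utils.py (the path, module, and function names are illustrative):

# utils.py -- hypothetical helper module attached via --py-files
def normalize_name(name: str) -> str:
    # Trim whitespace and capitalize, e.g. " andy " -> "Andy".
    return name.strip().title()

# main.py -- entry script; utils is importable because --py-files places it
# on the Python path of the driver and executors.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

import utils

if __name__ == "__main__":
    spark = SparkSession.builder.appName("PyFiles Example").getOrCreate()
    # Wrap the helper as a UDF and apply it to a small test DataFrame.
    normalize = udf(utils.normalize_name, StringType())
    df = spark.createDataFrame([(" andy ",), ("JUSTIN",)], ["name"])
    df.select(normalize("name").alias("name")).show()
    spark.stop()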

