Entering the Task Development Page
1. Click Project List in the left menu and find the target project that requires task development.
2. After selecting the project, click to enter the Offline Development module.
3. Click Orchestration Space in the left menu.
Task Development Overview
Task development in WeData involves orchestrating computation tasks into data workflows for streamlined data processing. It supports flexible development processes through scheduling strategies, event listeners, task parameters, self-dependencies, and function libraries. This allows users to meet requirements for data processing, transformation, and enrichment while providing a visual configuration interface to easily build and manage complex data workflows.
Process-Oriented Data Processing
Define the rules for data flow and transformation between different tasks, enabling operations such as processing, cleansing, and transforming data.
Data Workflow Orchestration
Computation tasks act as data processing nodes and are organized into workflows, forming a complete end-to-end data processing pipeline.
Scheduling Policies
Scheduling strategies determine when tasks are executed. Workflows can be triggered automatically based on periodic schedules and additional conditions, ensuring that tasks are processed in the intended order and timing to meet diverse business needs.
Event Listeners
Event listeners are used when a computation task depends on an event to be triggered. They consist of a trigger program, a trigger event, and the listening task. First, an event is defined under the project based on business requirements. Then, a trigger program sends the event, and once the task detects the event, it runs accordingly.
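The trigger flow above can be sketched in plain Python, with `threading.Event` standing in for a WeData project event (illustrative only; none of these names are WeData APIs):

```python
import threading
import time

def listening_task(event, results):
    """Simulates a task that waits for its trigger event before running."""
    event.wait()  # block until the trigger program sends the event
    results.append("task ran after event fired")

# The event defined under the project.
data_ready = threading.Event()
results = []

worker = threading.Thread(target=listening_task, args=(data_ready, results))
worker.start()

# The trigger program: fires the event once upstream data is available.
time.sleep(0.1)
data_ready.set()
worker.join()

print(results)  # -> ['task ran after event fired']
```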
Task Parameters and Parameter Passing
Variable parameters can be used in workflow design and task configuration, with support for passing parameters between tasks. Each computation task can have different input parameters, and output parameters can be passed to subsequent tasks, enabling data sharing and interaction across tasks.
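As a conceptual sketch of this parameter flow (the function and parameter names here are invented for illustration, not WeData APIs), an upstream task's output parameters become a downstream task's input parameters:

```python
def extract():
    # Output parameters of the upstream task.
    return {"row_count": 3, "table": "ods_orders"}

def transform(row_count, table):
    # Input parameters received from the upstream task's outputs.
    return {"summary": f"{table}: {row_count} rows processed"}

upstream_outputs = extract()
downstream_outputs = transform(**upstream_outputs)
print(downstream_outputs["summary"])  # -> ods_orders: 3 rows processed
```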
Self-Dependency
During operations, tasks can be configured with self-dependencies, meaning a task’s execution can depend on the state of its previous scheduled run.
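A minimal sketch of the self-dependency rule, assuming the previous instance's state is what gates the next run (the state names are illustrative, not WeData's internal states):

```python
def may_start(prev_instance_state):
    """A task with self-dependency starts only when the previous
    scheduled instance has finished successfully (None = first run)."""
    return prev_instance_state in (None, "SUCCESS")

print(may_start(None))       # first cycle, no predecessor: True
print(may_start("SUCCESS"))  # previous cycle succeeded: True
print(may_start("FAILED"))   # previous cycle failed, blocked: False
```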
Function Library
A function library is provided, containing commonly used functions and algorithms for Hive SQL, Spark SQL, and DLC, such as mathematical functions, data transformation functions, and aggregation functions. It also supports user-defined functions (UDFs), helping users perform flexible and powerful data processing and computation.
Collaborative Data Development
In WeData, development scripts can be created, written, and debugged collaboratively with workflows. Ad hoc scripts configured in the development space can directly participate in workflow orchestration as task nodes, enabling code reuse and optimization of the overall process.
Workflow Introduction
The orchestration space provides workflow orchestration and configuration capabilities, allowing users to organize and develop different types of task code and submit them to the scheduling system for periodic execution. A project can contain multiple workflows, and WeData supports grouping workflows into the same folder for efficient management. A workflow is a collection of task objects, including:
Data integration tasks
Computation tasks (Hive SQL, JDBC SQL, MapReduce, PySpark, Python, Shell, Spark, Spark SQL, DLC SQL, DLC Spark, Impala, TCHouse-P, Trino)
General tasks
Workflow Directory
Directory features:

| Feature | Description |
| --- | --- |
| Search | Search folders, workflows, and task names. |
| Refresh | Refresh: refresh the directory tree to get the latest state of the orchestration directory. Locate tree node: locate the current tree node with one click. Collapse tree node: expand or collapse all directories with one click. |
| Batch | Batch operations: perform batch operations on all computation tasks in the directory, including task submission, task deletion, task ownership changes, data source modification, task dependency modification, scheduling cycle modification, scheduling priority modification, and scheduling parameter modification. Batch operation records can also be viewed. Display configuration: show or hide the AI assistant, cross-workflow categories and nodes, bracket highlighting on click, code snippets, and more. |
| Create | Create folders and data workflows. |
Workflow Canvas
Canvas features:

| Feature | Description |
| --- | --- |
| Submit | Click the icon to submit the current workflow to the scheduling system (including node content, configuration properties, and dependency relationships) and generate a new version. |
| Refresh | Click the icon to refresh the content of the current workflow canvas. |
| Go to Ops Center | Click the icon to go to the Ops Center - Workflow List page. |
| Workflow test | Click the icon to test the current workflow. During the test, click the pause icon to stop the test. |
| Advanced execution | Click the icon to add, modify, or delete project parameters and workflow parameters. |
| Task Type Directory | In the Task Type Directory, click a computation task type to add a task node to the workflow canvas. |
| Locate | Click the icon, then select a task in the pop-up filter box to locate it on the canvas. |
| Canvas zoom | Click the icon to zoom the workflow canvas. |
| Formatting | Click the icon to standardize the layout of tasks in the workflow. |
| Box select | Click the icon to switch the mouse to selection mode, allowing you to box-select multiple tasks and perform batch operations. |
| Hide | Click the icon to hide cross-workflow nodes. |
General Settings
Click General Settings on the right to edit the current workflow's name, owner, description, workflow variables, and Spark SQL configuration parameters (optional). The Spark SQL configuration takes effect only for Spark SQL tasks in the workflow.
Feature description:
| Item | Description |
| --- | --- |
| Workflow name | Define the name of the workflow. |
| Workflow owner | Assign the workflow owner, who handles the relevant permissions, submission, update, and approval operations for the workflow. |
| Description (optional) | Custom workflow description. |
| Workflow type (optional) | Specified when creating the workflow. Periodic workflows generate instances based on the scheduling configuration; manual workflows generate instances only when triggered manually and do not execute periodically. |
| Workflow parameters (optional) | Apply to parameters within tasks under the workflow, allowing general parameters to be set through the workflow configuration. Rules: variable name = variable value, with multiple parameters separated by semicolons, e.g. a=${yyyyMMdd};b=123;c=456. Both constants and scheduling date variables are supported. For details, see the workflow-level variable usage process. |
| Spark SQL configuration parameters (optional) | Configure optimization parameters (threads, memory, CPU cores); they act only on Spark SQL nodes. Separate multiple parameters with semicolons. |
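The workflow parameter format (name=value pairs separated by semicolons, with scheduling date variables such as ${yyyyMMdd}) can be illustrated with a small parser. This is a sketch of the substitution concept only, not WeData's actual resolution logic:

```python
from datetime import date

def resolve_params(raw, run_date):
    """Parse a 'name=value;name=value' parameter string and substitute
    the ${yyyyMMdd} scheduling date variable (illustrative sketch)."""
    params = {}
    for pair in raw.split(";"):
        name, _, value = pair.strip().partition("=")
        value = value.replace("${yyyyMMdd}", run_date.strftime("%Y%m%d"))
        params[name] = value
    return params

print(resolve_params("a=${yyyyMMdd};b=123;c=456", date(2025, 5, 29)))
# -> {'a': '20250529', 'b': '123', 'c': '456'}
```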
Common settings can be modified in two modes: Simple Mode and Standard Mode.
In Simple Mode:
Modify the workflow name, workflow owner, and description, then click Save in the bottom-left corner to apply the changes.
Modify workflow parameters and Spark SQL configuration parameters, then submit the scheduling changes through the submit button in the top-left corner of the canvas.
In Standard Mode:
Modify the workflow name and description, then click Save in the lower-right corner to update the information in the production environment.
Modify the workflow owner, workflow parameters, and Spark SQL configuration parameters, then submit through the submit button in the top-left corner of the canvas and release through the Release Center to update the information in the production environment.
Unified Scheduling
Workflow scheduling supports two types of periodic scheduling configuration: regular and crontab.
Regular configuration: includes one-time, minute, hour, day, week, month, and year scheduling options, as described in the scheduling settings.
Crontab configuration: more flexible, but can only be configured within unified workflow scheduling. All tasks under a crontab configuration must share the same scheduling time (crontab expression). Cross-workflow task dependencies are not supported, nor are dependencies between tasks configured with crontab and tasks configured with regular scheduling.
Note:
Unified scheduling behaves like a batch operation: it changes the scheduling cycles of all tasks under the current workflow to one uniform cycle. Use it when the scheduling cycles of the tasks within the workflow should be consistent.
General configuration method
Configuration instructions
| Item | Description |
| --- | --- |
| Scheduling Cycle | The execution cycle unit for task scheduling: minute, hour, day, week, month, year, or one-time. |
| Effective Time | The valid time range for the scheduling configuration. The system schedules automatically within this range according to the time configuration and stops once the validity period is exceeded. |
| Execution Time | The interval between executions and the specific time at which execution starts. For example, with a 10-minute interval, the task runs every 10 minutes from 00:00 to 23:59 each day starting on May 29, 2025. |
| Calendar Scheduling | Select dates on which scheduling should or should not run in the scheduling calendar. |
| Scheduling Plan | Automatically generated based on the cycle time settings. |
| Self-Dependent | Configure the self-dependency attribute for computation tasks in the current workflow. For details, see Task Self-Dependency. |
| Workflow self-dependency | When enabled, computation tasks in the current workflow depend on all computation tasks from the previous cycle of the workflow. This takes effect only when tasks in the workflow share the same scheduling period and follow a daily cycle. |
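The Execution Time example (a 10-minute interval from 00:00 to 23:59) can be checked by enumerating the run times such a minute-cycle schedule produces:

```python
from datetime import datetime, timedelta

def daily_runs(interval_minutes):
    """Enumerate the run times of a minute-cycle schedule that fires
    every `interval_minutes` from 00:00 through 23:59 on one day."""
    t = datetime(2025, 5, 29, 0, 0)
    end = datetime(2025, 5, 29, 23, 59)
    runs = []
    while t <= end:
        runs.append(t.strftime("%H:%M"))
        t += timedelta(minutes=interval_minutes)
    return runs

runs = daily_runs(10)
print(len(runs), runs[0], runs[-1])  # -> 144 00:00 23:50
```

A 10-minute interval therefore yields 144 instances per day.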
Crontab configuration mode
Crontab configuration supports fine-grained settings for year, month, week, day, hour, minute, and second; once configured, you can view the resulting execution times. To configure the scheduling cycle with a crontab statement, click Configuration to enter the configuration page.
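Assuming a Quartz-style seven-field ordering to match the second-through-year granularity described above (this ordering is an assumption for illustration, not a WeData specification), a crontab expression can be broken down field by field:

```python
# Assumed Quartz-style field order: second minute hour day month week year.
FIELDS = ["second", "minute", "hour", "day", "month", "week", "year"]

def describe(expr):
    """Split a seven-field crontab expression into named fields."""
    parts = expr.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))

# Fires at 02:30:00 every day ('?' = no specific day-of-week value).
print(describe("0 30 2 * * ? *"))
```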
History Records
Click History Records on the right sidebar to view historical operations of the current workflow, including operator (execution account), operation time, and specific operation content.
Version
Each time a data workflow is edited and submitted, a workflow version is generated. Click Version on the right sidebar to view the workflow's historical versions, including the version name (version number), saver (version submitter), retention time (submission time), and change description.
Note:
Workflow versions are generated only when the workflow is submitted. Submitting a single task does not generate a workflow version.
Through the View feature in the operation list, you can see the configuration information of the corresponding version. This configuration can be changed in the General Settings of the workflow.
Computing Task Introduction
Canvas Feature
| Feature | Description |
| --- | --- |
| Save | Click the icon to save the current task node. |
| Submit | Click the icon to submit the task node (basic node content and scheduling configuration attributes) to the scheduling system and generate a new version record. Feature limitation: the task can only be submitted after the data source and condition settings are complete. |
| Lock/Unlock | Click the icon to lock or release the current file for editing. A task locked by another user cannot be edited. |
| Run | Click the icon to debug the current task node. |
| Advanced execution | Click the icon to run the current task node with variables. The system automatically pops up the time parameters and custom parameters used in the code. |
| Stop running | Click the icon to stop debugging the current task node. |
| Formatting | Click the icon to standardize the format of the code statements in the task. |
| Refresh | Click the icon to refresh the content of the current task node. |
| Project variables | Click the icon to view project global variables and use them in tasks. |
| Task Ops | Click the icon to go to the task operations page with the current task automatically filtered. |
| Instance Ops | Click the icon to go to the instance operations page with the current task automatically filtered. |
| Data Source | Select the data source used by the current computation task. |
| Execution Resource Group | Select the execution resource group used by the current computation task. |
| Resource queue | Select the resource queue used during execution of the current computation task. |
Task Submit
After completing task editing, click the submit button in the top-left corner of the canvas, fill in the change description in the pop-up dialog box, and submit the task (including basic node content and scheduling configuration attributes) to the scheduling system to generate a new version record. Once the task is submitted successfully, you can view and operate on the task and its instances in the Ops Center.
For non-first submissions, you can click "View version comparison" in the pop-up window to compare code content, task properties, and other information with the last submission.
Online Editor
Script-type tasks, such as Shell, Python, and DLC SQL tasks, can be filled with code content through the online editor. The online editor supports:
Clicking a bracket to highlight the code it encloses (this can be turned on or off in the data development configuration at the bottom-left corner of the offline development interface).
Double-clicking a bracket to select the code inside it.
Selecting code and clicking the run button in the line-number gutter to run the selected statement.
Task Attribute
The task name, task owner, task description, and task scheduling parameters of the current task can be modified. Parameters can be applied, with automatic parsing of variables in the code, and a parameter description document is available to assist with scheduling parameter usage.
Scheduling Setting
Task scheduling includes configuration items such as scheduling policy, event scheduling, dependency configuration, upstream dependency task configuration, scheduling priority, and failure strategy. For details, see Task Scheduling.
Version
The task history shows submission and save records. In the version panel, you can view the node's historical versions, submitter/saver, submission/save time, change type, status, and remarks. Click a version name to view a single version's information, or select two versions to compare them. Each submission generates a submit version and each save generates a save version, with a new record appearing in the corresponding panel. Only submitted task nodes have version information; otherwise, the panel remains empty.
Submit Version
| Feature | Description |
| --- | --- |
| Roll back | Roll back the task's scripts and configurations, excluding dependency relationships. Submission is required after rollback. Changes (including code and task configuration) not submitted before the rollback will be lost. |
| Compare | Compare historical versions of a computation task pairwise, showing key differences through the code comparison panel and the task properties panel. |
Save Version
| Feature | Description |
| --- | --- |
| Change Description | Click the pencil icon in the change description to add or modify the description for this specific version. |
| Roll back | Roll back the task's scripts and configurations, excluding dependency relationships. Changes (including code and task configuration) not saved before the rollback will be lost. |
| Compare | Compare historical versions of a computation task pairwise, showing key differences through the code comparison panel and the task properties panel. |
Dependency Relationship
On the canvas, after connecting task nodes or adding an event dependency in the scheduling settings, the dependency relationship panel displays the specific dependencies. Selecting the production version shows the dependencies of the submitted running version, while selecting the latest saved version shows the dependencies from the last save. Tasks can be searched and filtered by scheduling cycle, status, and owner; events can be filtered by period type and validity time.
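Conceptually, the canvas connections form a directed acyclic graph, and execution order follows a topological sort of that graph. A minimal sketch with invented task names (not WeData APIs):

```python
from graphlib import TopologicalSorter

# Each task maps to the set of upstream tasks it depends on.
dependencies = {
    "clean":     {"extract"},
    "aggregate": {"clean"},
    "report":    {"aggregate", "clean"},
}

# The scheduler runs a task only after all of its upstream tasks finish;
# static_order() yields one valid execution order.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # extract comes first, report last
```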
MetaDB
Displays metadata of the data sources connected to the current project. You can browse database and table information by data source, database, or table, making it easy to look up tables during task development. Quick actions are provided for copying the table name, query SQL, and table DDL.
Note:
Currently, copying a table's query SQL and table DDL is supported only for system data sources.
Function Library
Displays the functions available for use in task development. Currently, DLC SQL, Hive SQL, and Spark SQL functions are supported; select them based on the engine targeted by the development task. The function library includes commonly used built-in functions, such as the analysis functions corr and covar_pop, the encryption functions hash and md5, and the logical functions decode and nvl. Custom functions (UDFs) are also supported: after uploading a function package via resource management and creating the function through the function development feature, the function appears in the library and can be called during task development. For details on creating custom functions, see Function Development.