tencent cloud

Developing a Notebook
Last updated: 2026-01-09 17:45:08

Preliminary Preparations

Enable WeData Studio Feature

Note:
The WeData Studio feature is currently available only to allowlisted accounts. To use it, contact the WeData team to have it enabled.

Purchase Big Data Storage and Computing Engine

Note:
WeData Studio can connect to the following big data storage and computing engines:
1. DLC engine: Standard engine, Spark type. It must contain a resource group of the "machine learning - Spark MLlib" type.
2. EMR engine: EMR on CVM, Hadoop type, version 3.5.0. It must contain the EG component and Spark components.
If you use the EMR engine, check whether the security group used by EMR allows access from the WeData Studio CIDR (30.22.32.0/19). If it does not, add the following rule to the security group:
Allow inbound traffic from 30.22.32.0/19 on TCP port 8888.
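
The check described above can be sketched offline with Python's standard ipaddress module. This is a minimal illustration only: the rule dictionaries (protocol, cidr, ports) are a hypothetical representation, not the format returned by any Tencent Cloud API, and the sketch ignores rule priority and deny rules.

```python
import ipaddress

# Values taken from the requirement above.
STUDIO_CIDR = ipaddress.ip_network("30.22.32.0/19")
STUDIO_PORT = 8888


def allows_studio(rules):
    """Return True if any inbound rule admits the whole Studio CIDR on TCP 8888.

    Each rule is a hypothetical dict: {"protocol": str, "cidr": str, "ports": (low, high)}.
    """
    for rule in rules:
        if rule["protocol"].lower() not in ("tcp", "all"):
            continue
        low, high = rule["ports"]
        if not (low <= STUDIO_PORT <= high):
            continue
        # The rule must cover every address in the Studio range, not just part of it.
        if STUDIO_CIDR.subnet_of(ipaddress.ip_network(rule["cidr"])):
            return True
    return False


rules = [
    {"protocol": "tcp", "cidr": "10.0.0.0/8", "ports": (22, 22)},
    {"protocol": "tcp", "cidr": "30.22.32.0/19", "ports": (8888, 8888)},
]
print(allows_studio(rules))
```

If the function returns False for your exported rules, the inbound rule above is still missing.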

Studio Development IDE

Studio consists of three modules: the Studio development IDE, data catalog navigation, and Git source code management.

Enter Studio Module

Click Data R&D > Studio. Before you can enter Studio, an individual runtime environment must be started. Runtime environments are isolated by project and user, so each user has an exclusive runtime environment in each project.

The first startup of an individual runtime environment takes several minutes. Subsequent entries open within seconds.


File Directory Management

The Studio file directory has four parts: Workspace, GitFolder, recycle bin, and favorites.
Workspace is a local folder.
GitFolder is a Git folder that can connect to a remote Git repository for code management.
The recycle bin stores folders and files that the user has deleted.
Favorites stores folders and files favorited by the current user.


File Management Operation

Studio manages the following object types: folders, Notebooks (.ipynb), SQL files (.sql), and other files (.py, .csv). The management operations supported for each object type are:
Operation Name
Workspace Folder
GitFolder
Notebook, SQL, File
Creating a folder
×
Creating a Notebook
×
Create a new file
×
Moving
Copying
×
×
Copying the path
Copy relative path
Renaming
Uploading
×
Download
Deleted Object
Favorite
Permission configuration.
×

Creating a Notebook

Creating a Folder

Click Create a Folder, then fill in the folder name and folder path.


Creating a Notebook File

Click Create a Notebook, then fill in the Notebook name and folder path.


Editing a Notebook File

The IDE area on the right supports cell operations such as creating, cutting, copying, pasting, deleting, moving, and editing a cell, as well as changing a cell's language type.


Running a Notebook

To run a Notebook file, you need to select a kernel. WeData uses a remote kernel to submit the task to the big data storage and computing engine, where it runs on the engine's compute resources.

Select Kernel

1. Click Run at the top of a cell. The popup for creating a kernel opens automatically.
2. Alternatively, click "kernel not connected" in the upper-right corner of the IDE to open the same popup.


Kernel Configuration

Take the DLC engine as an example. If the current project is bound only to the DLC engine, the kernel configuration page for Notebook files is as follows:


Attribute item | Attribute item description | Use limits
DLC resource group | Select a resource group in the DLC engine bound to the current project. | DLC engine: only standard engine, Spark type DLC engines are supported. Resource group: only resource groups whose business scenario is "machine learning" and whose framework type is "Spark MLlib" are supported.
Enable resource reuse mode | Once enabled, you can choose an existing Spark App to create the kernel, which saves engine resources and shortens kernel creation time. | If two Notebook files use the same Spark App, they share the runtime environment.
Spark App Name | Select a Spark App created by the user in the current project. | -
Disable resource reuse mode | When disabled, a new Spark App is created. | Creating a new Spark App isolates the environment between files, but typically takes a few minutes.
Spark App Name | Fill in a name for the Spark App so that it is easy to select and reuse later. | -
Auto release time | Select how long the Spark App can be inactive before it is automatically released. | -
Custom Images | Defaults to the built-in image of the DLC resource group. You can also select a TCR custom image. | -
Advanced Settings | Fill in the specification parameters for the Spark App to be created. | -

View Execution Results

For DataFrame results, WeData provides an optimized view in two scenarios: when you present data with the display() function, and when you run a query with SQL syntax. In both cases the result is shown as a table that you can inspect and manipulate.

View Data Result

You can preview the data result, select a custom region of data, and copy it with the right-click menu or a keyboard shortcut. Click a column name to sort in ascending or descending order.
Note:
Preview supports up to 10,000 rows of data, with a data size of no more than 2 MB.


Data Retrieval

Enter a keyword to run a fuzzy search. Matches are highlighted in the result. Click the "<" or ">" button to move between matches.


Field Setting

You can configure which column names are displayed, and click the pin button to pin a specified field to the top.



Field Filtering

Click the filter button to add one or more filter conditions to the data result.


Data Download

Click Download to download the data result as a CSV, Excel, or TXT file.
Note:
A download supports up to 10,000 rows of data, with a data size of no more than 2 MB.
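
To estimate in advance whether a result stays within the 10,000-row and 2 MB limits, a rough local check might look like the sketch below. How WeData actually measures data size is not stated here, so this sketch assumes the size is the UTF-8 byte length of the result serialized as CSV and that 2 MB means 2 * 1024 * 1024 bytes. The function name fits_limits is hypothetical.

```python
import csv
import io

MAX_ROWS = 10_000
MAX_BYTES = 2 * 1024 * 1024  # assumption: "2 MB" measured as UTF-8 CSV bytes


def fits_limits(header, rows):
    """Return (ok, row_count, byte_count) for a result serialized as CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    row_count = 0
    for row in rows:
        writer.writerow(row)
        row_count += 1  # the header line is not counted as a data row
    byte_count = len(buf.getvalue().encode("utf-8"))
    ok = row_count <= MAX_ROWS and byte_count <= MAX_BYTES
    return ok, row_count, byte_count


ok, n_rows, n_bytes = fits_limits(["id", "name"], [(1, "a"), (2, "b")])
print(ok, n_rows, n_bytes)
```

If the check fails, narrowing the query (fewer columns or a filter) before previewing or downloading is one way to stay under the limits.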


Dynamic Parameter Implementation

Notebooks support dynamic parameters, so you can debug a file's code with different parameter values.

Parameter Definition

1. Define via code.
Define the parameter name, default value, and label with the dlcutils.widgets.text() function.
Function Name | Parameter Definitions | Example
dlcutils.widgets.text(name: str, default: str, label: str = "") | name: parameter name; default: default value; label: parameter label | dlcutils.widgets.text("fav_food", "bean", "favorite food")
2. Define through the visual interface.
Click the "parameter" button in the top toolbar, then enter the parameter name, parameter value, and parameter label in the pop-up window.


Parameter Retrieval

Get the parameter value with the dlcutils.widgets.get() function.
Function Name | Parameter Definitions | Example
dlcutils.widgets.get(name: str) | name: parameter name | dlcutils.widgets.get("fav_food")
Note:
During a trial run of the Notebook, the parameter value entered in the pop-up window replaces the default value given in the function definition.
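
The define-then-retrieve flow can be sketched as follows. The two dlcutils.widgets calls use the signatures documented above. Everything else is an assumption for illustration: the snippet assumes dlcutils is predefined inside a Studio Notebook, and it falls back to a small local stand-in (_LocalWidgets, hypothetical) so the cell also runs outside Studio. The stand-in only mimics the two documented calls and is not the real implementation.

```python
from types import SimpleNamespace


class _LocalWidgets:
    """Hypothetical local stand-in for dlcutils.widgets, for running outside Studio."""

    def __init__(self):
        self._values = {}

    def text(self, name: str, default: str, label: str = "") -> None:
        # Register the parameter with its default unless a value already exists.
        self._values.setdefault(name, default)

    def get(self, name: str) -> str:
        return self._values[name]


try:
    dlcutils  # assumption: predefined inside a Studio Notebook
except NameError:
    dlcutils = SimpleNamespace(widgets=_LocalWidgets())

# Define the parameter once, then read it wherever the code needs it.
dlcutils.widgets.text("fav_food", "bean", "favorite food")
fav_food = dlcutils.widgets.get("fav_food")
print(f"favorite food: {fav_food}")
```

Reading the value into a variable once keeps the rest of the Notebook independent of where the value came from.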

File Version Management

Click Save to generate a saved version of the current file. Click "Version History" on the right to view all historical versions. From there you can compare versions and roll back to an earlier version.

Version comparison:


File Permission Management

1. Go to "Project Management > Data Development Configuration" and turn on the task permission control switch.
Note: This feature applies only to the DLC engine. The EMR engine is not currently supported.

2. Enter Studio, right-click a folder or file name in Workspace, and select Permission Configuration.
Note:
Only Workspace supports file permission management. GitFolder cannot be configured with permissions, and all members of the project can view and operate it.

3. In the pop-up window, you can add, edit, or delete permission configuration items.
Project administrators are granted management permissions on all folders and files by default.
The creator of a folder or file is granted management permissions on the created object by default.
Subfolders and files in a folder inherit the parent folder's permission configuration by default.

Attribute item | Attribute item description
Authorized object | Options: role, user, all project members
Permission item | Options: manage, none
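
The default rules above (administrator, creator, inheritance from the parent folder) can be illustrated with a small resolution function. This is a sketch under stated assumptions, not WeData's implementation: the data model (a set of admins, a path-to-creator map, and a path-to-user-permission map), the rule that the nearest explicitly configured ancestor wins, and the fallback to "none" when nothing is configured are all hypothetical.

```python
from pathlib import PurePosixPath


def effective_permission(path, user, *, admins, creators, configured):
    """Resolve "manage" or "none" for a user on a Workspace path.

    admins: set of user names (hypothetical model)
    creators: dict mapping path -> creating user
    configured: dict mapping path -> {user: "manage" | "none"}
    """
    if user in admins:
        return "manage"  # project administrators manage everything by default
    if creators.get(path) == user:
        return "manage"  # creators manage the objects they created
    node = PurePosixPath(path)
    # Walk from the object itself up through its parent folders.
    for candidate in (node, *node.parents):
        entry = configured.get(str(candidate))
        if entry is not None and user in entry:
            return entry[user]
    return "none"  # assumption: no configuration means no permission


configured = {
    "/Workspace/etl": {"alice": "manage"},
    "/Workspace/etl/private": {"alice": "none"},
}
print(effective_permission("/Workspace/etl/daily.ipynb", "alice",
                           admins=set(), creators={}, configured=configured))
```

In this model a file with no entry of its own picks up the setting of the closest configured folder above it, which is one way to read "inherit by default".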

Data Catalog

Data Retrieval

The data catalog is displayed as a hierarchy of catalog, database, table, and field. You can browse it during development and access the data through code.
Note:
To use the data catalog, TC-Catalog must be enabled in the region of the DLC engine bound to the current project.
Click the "Search" button and enter a catalog name, database name, or table name to run a fuzzy search.

Data Access

You can read and write data catalog content in code through PySpark.
Shortcut actions:
1. Click the "Insert" button next to a data table or field to insert the table name or field name into the IDE on the right.
2. Click the "Copy" button next to a data table or field to copy the table name or field name to the system clipboard. You can then paste it into the IDE.
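
As a rough sketch of reading and writing catalog tables through PySpark, the helpers below build a catalog.database.table identifier and pass it to SparkSession.table() and DataFrameWriter.saveAsTable(). Several things are assumptions rather than documented behavior: that a SparkSession named spark is available in the Notebook, that tables are addressed with a three-part name matching the hierarchy above, and the helper names themselves (qualified_name, read_table, append_to_table), which are hypothetical.

```python
def qualified_name(catalog, database, table):
    """Join the three catalog levels into one backtick-quoted identifier."""
    parts = (catalog, database, table)
    # Double any backtick inside a part so the quoting stays valid.
    return ".".join("`" + part.replace("`", "``") + "`" for part in parts)


def read_table(spark, catalog, database, table):
    """Load a catalog table as a DataFrame via SparkSession.table()."""
    return spark.table(qualified_name(catalog, database, table))


def append_to_table(df, catalog, database, table):
    """Append a DataFrame's rows to a catalog table via DataFrameWriter.saveAsTable()."""
    df.write.mode("append").saveAsTable(qualified_name(catalog, database, table))


# Example use inside a Notebook cell (all names below are placeholders):
# orders = read_table(spark, "my_catalog", "sales", "orders")
# append_to_table(orders.filter("amount > 100"), "my_catalog", "sales", "big_orders")
print(qualified_name("my_catalog", "sales", "orders"))
```

The names inserted with the "Insert" or "Copy" shortcuts can be passed directly as the arguments to these helpers.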

Git Source Code Management

Initialize Git Configuration

Project Git Configuration

Each project can connect to only one Git repository domain name or IP address. This is configured by the project administrator or another user with permission to manage the project. Every user in the project must also initialize their own personal configuration.
Operation Steps:
1. Go to "Project Management > Git Configuration" and fill in the Git repository address, Git provider, branch, and other settings to be used for the current project.
2. Click Initialize Network Configuration, then click Network Connectivity Test.

3. After the connectivity test passes, go to Personal Information > Personal Configuration to complete your personal Git configuration.
Attribute Item | Attribute Item Description
Git Repository Address | The URL of the remote Git repository you want to connect to.
Git provider | Options: GitLab, GitLab Enterprise Edition.
Git branch | The branch of the remote Git repository you want to connect to.
Network Environment | Public network access or Virtual Private Cloud (VPC). 1. For public network access, click "Initialize Network Configuration" on the page, and the system configures network connectivity automatically. 2. For a VPC, it is recommended that you purchase an endpoint service, bind it to the network location of the Git server, and fill in the endpoint service ID here.
Network connectivity test | Verifies network connectivity to the remote Git repository.

Personal Git Configuration

1. Click Personal Info > Personal Configuration, then enter your username and token in the Git permission configuration.
2. Click "Connectivity Test". After the test passes, click "Initialize Personal Runtime Environment".
3. The personal runtime environment of Studio is then connected to the remote Git repository, and you can manage your Studio code in that repository.


Git Management Operation

After the initial configuration is complete, enter Studio again. The system automatically fetches the code files from the remote Git repository into GitFolder. After that, fetch updates manually as needed.

The Studio Git source code management feature supports common Git operations, including but not limited to:
commit: Commit local changes to the work branch with a change description.
push: Push a new branch to the remote Git repository.
pull: Pull content from the remote Git repository to your local environment.
branch merge: Merge changes from the work branch into another branch, such as the master branch.
resolve merge conflicts: Detect and resolve code conflicts during branch merging.
view history: View the history of the current branch.

Notebook Task Orchestration

Create Workflow

1. Enter "Data R&D - Workflow Orchestration".

2. Create a new workflow in the orchestration space directory.


Creating a Notebook Task

1. Go to Offline Development > Orchestration Space, create a Notebook Task, and reference an existing Notebook file. The file source can be the Studio directory or a remote Git repository.
If the source is the Studio directory, select a Notebook file in the Studio Workspace or GitFolder.
If the source is a remote Git repository, select a Notebook file in the bound remote Git repository and branch.


Notebook Task Configuration

Note:
A Notebook Task in the orchestration space references a Notebook file in the Studio directory or the remote Git repository. You cannot modify the file content in the orchestration space, but you can configure the environment in which the Notebook file runs.
1. In the top right corner of the page, select the storage-compute engine and the scheduling resource group you want to connect to.
2. In the "Task Configuration" sidebar, you can adjust the image and specification parameters.


Running a Notebook Task

Click Run to debug the current Notebook Task. After it runs successfully, you can submit the task.
