Notebook Exploration
Last updated: 2025-09-19 17:27:00
Note:
Currently, Notebook Exploration is in the invited testing stage. To enable the allowlist for trial use, submit a ticket.

Feature Overview

WeData's newly launched Notebook Exploration feature supports reading data from the Tencent Cloud big data engines EMR and DLC through Jupyter Notebook. With interactive data analysis, users can perform data exploration and machine learning.
Currently, Notebook Exploration is available in the following regions:
Domestic site: Beijing, Shanghai, Guangzhou, Singapore, Silicon Valley
International site: Singapore, Frankfurt




Key Features

One-Click Workspace Creation

There is no need to manually install a Python environment or configure dependencies. A Notebook workspace can be created with one click and comes with a complete Jupyter Notebook environment and commonly used dependency packages.

User and Resource Isolation

Each user has an exclusive workspace under different projects. Storage and computing resources across workspaces are isolated, ensuring that tasks and file resources do not interfere with each other.

Integration with Big Data Engines

Supports binding to EMR and DLC big data engines. Users can directly read data from these engines for interactive exploration, algorithm model training, and predictive data analysis.

Built-In Practical Tutorial

The Notebook workspace comes with built-in big data tutorials, allowing users to get started quickly and experience the features out-of-the-box.

Overall Use Flow

The end-to-end process of using Notebook in WeData is shown below:


Operation Steps

Creating Notebook Workspaces

1. Log in to the WeData console.
2. In the left-hand menu, click Project List and locate the target project where you want to use the Notebook Exploration feature.
3. Click the project name to enter the project.
4. In the upper-left corner of the page, expand the top menu and go to Data Analysis > Notebook Exploration.



5. Enter the Notebook Exploration list page and click Create Workspace.



6. On the workspace configuration page, set the basic information and resource configuration.



Basic information
Configure the basic information of the Notebook workspace, used to create a Notebook workspace instance.
Space Name (required): Name of the Notebook workspace. Supports Chinese, English, digits, underscores (_), and hyphens (-), with a maximum length of 32 characters.
Space Template (required): Selecting different templates imports different configurations during initialization. Select Jupyter Notebook to import the standard Notebook example template, or choose the DeepSeek series to import a specific model.
Permission Scope (required): If "Personal use only" is selected, only the current user can access the workspace. If "Shared within the project" is selected, project members can enter the workspace for collaborative development.
Description (optional): Description of the Notebook workspace. Supports Chinese, English, digits, and special characters, with a maximum length of 255 characters.
Engine (optional): Select the EMR or DLC compute engine bound to the current project. The selected engine is pre-mounted for use, and Notebook tasks can access PySpark through it.
Network (required): If an EMR engine is selected, you must also select a network configuration for network access. By default, the VPC and subnet of the EMR engine are used.
DLC Data Engine (required): If a DLC engine is selected, you must also select a DLC data engine bound to the project, which is used to run DLC PySpark tasks.
Note:
Only DLC Spark job type computing resources are supported.
Machine Learning (optional): If the DLC data engine contains a resource group of the "Machine Learning" type, this option is displayed and selected by default. If it does not, this option is not shown; if needed, create such a resource group in DLC beforehand.
RoleArn (required): If a DLC engine is selected, you must also select a RoleArn to grant access permissions for reading and writing data in COS.
Note:
RoleArn is the data access policy (CAM role arn) that the DLC engine uses to access COS. You are advised to configure it in DLC.
Advanced configuration
You can choose to use MLflow to manage experiments, data, and models in Notebook Exploration. This feature currently requires enabling the allowlist.
MLflow Service (optional): When selected, Notebook tasks report experiments and models created through MLflow functions to the MLflow service. You can then view them under Machine Learning > Experiment Management and Model Management.
Resource Configuration
Configure workspace storage and computing resources for Notebook task running and file storage.
Specification Selection (required): Supported specifications include:
2-core 4 GB RAM / 8 GB storage (trial edition)
4-core 8 GB RAM / 16 GB storage (advanced edition)
8-core 16 GB RAM / 32 GB storage (Pro edition)
32-core 64 GB RAM / 32 GB storage (ultra edition)



Workspace Start/Stop Management

Starting Up a Workspace

1. Click Create Now to enter the Notebook workspace launch page.
2. During the startup process, the PySpark environment will be configured for you, and commonly used Python packages such as numpy, pandas, and scikit-learn will be installed. Installation may take some time; please wait until it completes. A quick verification cell is shown after these steps.
3. Once the Notebook workspace displays the following page, it indicates that the workspace has been successfully launched and you can start creating Notebook tasks.
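Once the workspace is up, you can confirm that the preinstalled packages are available with a quick first cell. A minimal sketch (the exact versions will vary with the workspace image):

```python
# Verify that the preinstalled data science packages are available.
import numpy as np
import pandas as pd
import sklearn

print("numpy:", np.__version__)
print("pandas:", pd.__version__)
print("scikit-learn:", sklearn.__version__)
```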




Exiting the Workspace

1. Click the Log Out button at the top-left to exit the current workspace and return to the list page.
The workspace stops automatically ten minutes after you exit. When a stopped workspace is started again, its development environment and data are restored.




Editing the Workspace

Click the Edit button on the list page to modify the configuration of the current workspace. Supported modifications include: space name, description, and resource configuration.


Deleting the Workspace

Click the Delete button on the list page to delete the current workspace.

Creating and Running Notebook Files

1. Creating Notebook Files
You can create folders and Notebook files in the left-side Resource Explorer.
Note:
Notebook file names must end with .ipynb.
2. Select the running kernel
Open the Notebook file, click the kernel selector in the upper-left corner, and choose a kernel from the pop-up drop-down list.
Note:
In Jupyter Notebook, a Kernel is the backend program that executes code, returns computation results, and interacts with the user interface.
Currently, WeData Notebook supports two types of kernels:
Python Environment: The default IPython kernel in Jupyter Notebook, supporting Python code execution. Python 3.10 is pre-installed, and users can choose the built-in Python 3.8 or Python 3.11 versions, or install other Python versions as needed.
DLC resource group: A remote kernel provided by Tencent Cloud Big Data. Python tasks can be submitted to the DLC resource group for execution.

If the DLC resource group is selected, choose a machine learning resource group instance from the DLC data engine options.

3. Running Notebook Files
Click Run to generate a Notebook kernel instance and start running code. The results are shown below each cell.
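As a simple illustration, a first cell might build a small pandas DataFrame and summarize it; the output is rendered beneath the cell. A minimal sketch (the data here is made up for illustration):

```python
import pandas as pd

# Build a small in-memory DataFrame and summarize it.
df = pd.DataFrame({
    "city": ["Beijing", "Shanghai", "Guangzhou"],
    "orders": [120, 98, 87],
})
df.describe()  # the summary table appears below the cell
```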

Periodically Scheduling Notebook Tasks

Creating Notebook Task

1. Enter the project and open the menu Data R&D > Offline Development.



2. Click Create Workflow in the left sidebar, and configure workflow properties including workflow name and folder.






3. Create a task in the workflow with the task type Common-Notebook Exploration. On the create task page, configure task basic attributes including task name and task type.




Configuring and Running Notebook Task

On the Notebook task configuration page, reference files in the Notebook workspace.

1. Select a Notebook workspace
The drop-down list shows all Notebook workspaces in the current project.
2. Select a Notebook file
The drop-down list shows all files in the current Notebook workspace. Note: if the current user does not have permission for the selected Notebook workspace, they cannot enter it for operations.
3. Preview code
After selecting a Notebook file, you can preview its specific content below.
4. Run Notebook Task
In the top-right corner, select the scheduling resource group, then click Run to run the current Notebook file online. Below, you can view the run logs, executed code, and execution results.

Configuring Scheduling

1. Click Scheduling Configuration on the right and set the scheduling interval of the current Notebook task. For example, the figure below sets the task to run once every 5 minutes.



2. Click the Submit button to submit the current task to periodic scheduling.


Task Ops

1. Enter Data R&D > Ops Center.

2. Task Ops
Click Task Ops to see workflows submitted for scheduling and task nodes in the workflow.

3. Instance Ops
Click Instance Ops to view each periodic instance generated by the above workflow.

4. Enter instance details to view running logs and results.

Practical Tutorial

The Notebook workspace includes built-in Big Data Series Tutorials that work out-of-the-box, allowing users to get started quickly.

Tutorial 1: Using the DLC Jupyter Plugin to Perform Data Analysis

This sample Notebook demonstrates how to analyze data in Data Lake Compute (DLC). The Notebook space already has the DLC Jupyter Plugin built-in for direct loading. Example syntax includes running Spark code, SparkSQL code, and using SparkML.
Note:
To use this tutorial, the Notebook workspace must be bound to a DLC engine without selecting "Use Machine Learning Resource Group". The kernel must be set to Python Environment; WeData Notebook then interacts with DLC through the Jupyter plugin.
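For orientation, the analysis in the tutorial follows the usual PySpark pattern sketched below. This is a generic sketch rather than the plugin's exact syntax: the DLC Jupyter Plugin manages the connection to DLC, and the database and table names (demo_db.orders) are illustrative assumptions.

```python
from pyspark.sql import SparkSession

# In the tutorial the DLC Jupyter Plugin provisions the Spark session;
# building one manually here is only for illustration.
spark = SparkSession.builder.appName("dlc-demo").getOrCreate()

# Run SparkSQL against a hypothetical DLC table.
df = spark.sql(
    "SELECT city, SUM(amount) AS total_amount "
    "FROM demo_db.orders GROUP BY city"
)
df.show()

# Convert the result to pandas for local exploration.
print(df.toPandas().head())
```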




Tutorial 2: Reading EMR Data for Model Prediction

1. This sample Notebook demonstrates how to create an EMR-Hive table and import local data into it. It then reads data from the EMR-Hive table and converts it into a pandas DataFrame for data preparation.
2. After completing data preparation, you can use the Prophet time series algorithm to train a predictive model, then evaluate model accuracy and make predictions.
Note:
To use this tutorial, the Notebook workspace must be bound to an EMR engine.
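As a sketch of the training and prediction step, assume the EMR-Hive data has already been converted to a pandas DataFrame with the ds (date) and y (value) columns that Prophet expects; the values below are placeholders:

```python
import pandas as pd
from prophet import Prophet

# Placeholder for data read from the EMR-Hive table and converted to pandas;
# Prophet expects a `ds` date column and a `y` value column.
df = pd.DataFrame({
    "ds": pd.date_range("2024-01-01", periods=90, freq="D"),
    "y": [float(i % 7 + i / 10) for i in range(90)],
})

model = Prophet()
model.fit(df)

# Forecast 30 days beyond the training window and inspect the tail.
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```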




Tutorial 3: Creating a Machine Learning Experiment and Performing Experiment Management

This sample Notebook demonstrates how to use MLflow to create experiments, record data, and manage models. The experiment is based on the iris dataset, uses the KNeighborsClassifier algorithm for model training, and employs MLflow to record and trace experimental data, finally producing a best model for classification prediction.
Note:
To use this tutorial, the Notebook workspace must be bound to a DLC engine with "Use Machine Learning Resource Group" selected. The kernel must be set to the DLC resource group; WeData Notebook then submits the Notebook file to DLC for remote execution.
MLflow is an open-source machine learning platform that provides end-to-end support for the data science lifecycle, including experiment management, model versioning, model deployment, and model monitoring. If MLflow service is enabled in the current workspace, you can record each experiment’s parameters, metrics, and results by calling MLflow functions. These records can then be viewed under WeData Machine Learning > Experiment Management and Model Management, enabling experiment tracking and reproducibility.
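The pattern the tutorial follows can be sketched with standard MLflow and scikit-learn APIs. The experiment name below is an illustrative assumption; when the MLflow service is enabled, the workspace preconfigures the tracking destination:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

mlflow.set_experiment("iris-knn-demo")  # illustrative experiment name

with mlflow.start_run():
    n_neighbors = 5
    model = KNeighborsClassifier(n_neighbors=n_neighbors)
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)

    # Record the parameter, metric, and trained model with MLflow.
    mlflow.log_param("n_neighbors", n_neighbors)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(model, "model")
```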

