
Tencent Cloud WeData

Data Science Overview

Last updated: 2026-04-22 18:08:09

Product Design Philosophy

WeData is built on the product design principles of MLOps and includes a data science module.

MLOps Principles and Value

MLOps (Machine Learning Operations) is a set of engineering methods that connects AI, business, and operations teams. It establishes a standardized, automated, and continuously improving management system for the full lifecycle of machine learning models, enabling organizations to produce high-quality models at scale, stably, reliably, and efficiently, to empower the business. Its core approach is to reduce cost and improve efficiency in large-scale AI development by addressing the following issues:
- The model lifecycle lacks unified management
  - Code assets, data assets, algorithm assets, and model assets lack uniform version management and traceability.
  - Businesses lack appropriate standards for the process from ML production to application.
- Model development and deployment iteration cycles are long
  - Algorithmia 2020: 64% of businesses take over one month to deploy a new model, and 18% of companies need more than 90 days to go live.
- Model services are not sustainable
  - Model iteration and deployment cannot keep pace with rapidly changing business requirements.
  - From the moment it goes live, a model risks degradation (data drift, performance drift).
- The degree of automation is low
  - Many manual processes, low efficiency, and high labor cost.
  - Without a comprehensive monitoring and alerting mechanism, errors cannot be detected and corrected before they cause damage.
- Cross-team collaboration is difficult
  - Teams use different tools and workflows.
  - Silos and communication gaps between business, operations, and AI teams are hard to bridge.
- Potential risk is high
  - Technical risk: unstable model performance, fragile infrastructure.
  - Compliance risk: violations of government regulations or company policy.

Our Insight and Advantage

Building data science capabilities on a data platform like WeData inherently provides powerful data integration, data development, and data governance capabilities, addressing the fragmentation between traditional data platforms and AI development platforms:
- Data development and AI development are separated
  - Big data and AI are two independent systems, making it difficult to implement end-to-end processes such as sample cleaning, storage, analysis, training, and inference.
- Storage and computing costs are high
  - Data must flow between two systems.
  - Big data and AI workloads cannot share CPU and GPU computing resources.

Our Core Philosophy

1. Always advance AI project R&D with business objectives as the driving force.
2. AI R&D can be achieved through a data-centric approach.
3. Drive the full lifecycle via a modular platform, covering data exploration, feature engineering, model training, and online serving.
4. Use automated processes to implement continuous training, continuous integration, and continuous delivery.

Feature Overview

The WeData data science module includes four core function modules: Experiment Management, Feature Management, Model Management, and Model Service. It works closely with associated products such as Studio, Workflow, Data Quality, and Engine to implement MLOps capabilities and deliver end-to-end coverage of the "Data-Model-Inference" lifecycle.

Core Modules

| Module | Core Features |
| --- | --- |
| Experiment Management | After enabling the MLflow service in Studio, you can call MLflow functions in an experiment to record the parameters, metrics, and results of each run, then view them in Experiment Management for traceability and reproducibility. AutoML is also provided to support no-code development. |
| Feature Management | Use the feature processing API provided by WeData in Studio to create, write, read, search, sync, and consume feature tables. You can also view and manage features in Feature Management for unified management and consumption of features. |
| Model Management | After enabling the MLflow service in Studio, you can call MLflow functions in an experiment to register a model, or register a model visually in Experiment Management. Key model information can be viewed, along with its associations with experiments, runs, and services. |
| Model Service | Supports creating API services from models in Model Management, monitoring those services, and viewing the relationships between models for easy information tracing. |

Peripheral Modules

| Module | Core Features |
| --- | --- |
| Studio | The main workspace for AI development. Users can edit, debug, and run code, and call the MLflow and feature engineering APIs to perform CRUD operations on feature tables, train models, and register models. |
| Workflow | The main workspace for automated processes. Users can debug code in Studio and submit it to a workflow for periodic scheduling, producing models automatically on a schedule. |
| Data Quality | Model service inference tables, feature tables, and training data tables can trigger data quality tasks to surface quality information such as field analysis, drift analysis, and model metrics. |
| Engine | Data science integrates two types of engines, Data Lake Compute (DLC) and Elastic MapReduce (EMR), as data sources, offline feature storage, and training resources for AI development. |

Requirements and Prerequisites

Supported engines
- DLC standard engine: applicable to model training, experiment reporting, feature management, and model registration. Note: AutoML experiments are supported only on the DLC engine, and only when the "wedata-data-science" image is selected while creating the resource group.
- EMR on CVM / EMR on TKE: applicable to model training, experiment reporting, feature management, and model registration. Note: cannot be used for AutoML experiments.
- EMR: Ray on TKE: applicable to model training, experiment reporting, and model registration. Note: not applicable to AutoML experiments or feature processing.
MLflow version
WeData's experiment management is fully compatible with MLflow 2.17.2, which is pre-installed in the related image. After connecting to the Studio runtime environment, you can verify it with:
%pip list | grep mlflow
Offline feature store
WeData-managed offline feature tables currently support only Data Lake Compute (DLC) Iceberg tables and Elastic MapReduce (EMR) Hive tables. A primary key and a timestamp key must be specified when registering a feature table; subsequent operations index features by these keys.
Table operation permissions
- DLC:
  - If Catalog is enabled, authorize users with the corresponding database/table permissions in DLC, and grant permissions in the corresponding Catalog under "Data Assets > Catalog directory" in WeData.
  - If Catalog is not enabled, grant database/table permissions in DLC.
- EMR:
  - In WeData, set the engine access account and account mapping under "Project Management > Storage-Compute Engine Settings" to determine table privileges.
Model operation permissions
- DLC:
  - If Catalog is enabled, grant permissions to users in the corresponding Catalog under "Data Assets > Catalog directory" in WeData.
  - If Catalog is not enabled, permissions are managed at project granularity.
- EMR:
  - Permissions are managed at project granularity.
Online feature store
WeData supports Redis as an online feature store. Prepare as follows:
- Step 1: Go to Tencent Cloud Distributed Cache to create a Redis instance. The instance's region and network must be consistent with those of the scheduling resource group and the engine (DLC or EMR) in use.
- Step 2: Add a Redis data source under "Project Management > Data Source Management" in WeData and test connectivity with the scheduling resource group.
- Step 3: Add the default feature library (online) in WeData Feature Management.
Feature engineering package
Connect to the engine and run the following command to install the latest feature engineering package:
%pip install tencent-wedata-feature-engineering
If you use DLC and select the "wedata-data-science" image when building a resource group, the feature engineering package is pre-installed; check the installed version with:
%pip list | grep tencent-wedata-feature-engineering
The latest version is viewable on the tencent-wedata-feature-engineering package page.
Key management
Calling the feature engineering package requires a TencentCloud API SecretId and SecretKey. You can create an access key in Tencent Cloud CAM.
If your organization requires that key material not be transmitted in plain text, you can use Tencent Cloud SSM or Tencent Cloud KMS to manage keys.
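Since the key pair should not be hardcoded in notebook code, one common pattern is to load it from the environment (or from values a secrets manager injects there) at runtime. This is a generic sketch, not a WeData API; the environment-variable names below follow the usual TencentCloud SDK convention but are assumptions here.

```python
import os

def load_api_keys():
    """Load the TencentCloud API key pair from environment variables.

    The variable names follow the common TencentCloud SDK convention
    but are an assumption here; adapt them to however your organization
    injects secrets (e.g. values fetched from SSM or KMS).
    """
    secret_id = os.environ.get("TENCENTCLOUD_SECRET_ID")
    secret_key = os.environ.get("TENCENTCLOUD_SECRET_KEY")
    if not secret_id or not secret_key:
        raise RuntimeError("TencentCloud API key pair is not configured")
    return secret_id, secret_key
```

Code that calls the feature engineering package can then obtain the pair from `load_api_keys()` instead of embedding plaintext credentials.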
