Data quality is one of the core components of data governance. It helps users promptly detect dirty data produced during data integration and data development, automatically intercepts abnormal tasks, and stops dirty data from spreading downstream. This reduces both the cost of handling issues and resource consumption.
Applicable Roles: Data development engineers and data warehouse table owners.
Fee Instructions
The cost of running data quality tasks consists of the following three parts:
1. WeData product feature version cost (prerequisite).
2. WeData execution resource cost: charged based on the volume of scheduling resources consumed by quality task instances.
3. Non-WeData direct costs: quality task verification relies on engines and data source services (such as EMR, DLC, TCHouse-D, TCHouse-P), which incur engine costs. These costs are charged by the engine side and are not included in the WeData billing statement. For the pricing of each engine, refer to the billing information in the engine product documentation on the Tencent Cloud official website.
Core Capabilities
The Quality Module mainly includes the following core features:
1. Supports various Tencent Cloud big data storage engines (EMR, DLC, TCHouse-P, TCHouse-D) and open-source big data storage engines (Doris).
2. Configure data quality inspection rules at the table or field level.
3. Configure execution policies based on actual business scenarios.
4. Set rule strength to determine whether to block downstream tasks.
5. Supports multiple notification channels (WeCom Group, WeChat, call, SMS, mail, Lark Group, DingTalk Group).
6. Quality scores are calculated across six dimensions (accuracy, timeliness, integrity, uniqueness, consistency, and validity) to form quality reports at the database and table level.
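As a rough illustration of how scoring across the six dimensions could work, the sketch below computes a pass rate per dimension and averages the results into an overall score. The formula, the `RuleResult` type, and both function names are assumptions made for this example; the actual WeData scoring method is not described in this document.

```python
from collections import defaultdict
from dataclasses import dataclass

# The six dimensions named in the feature list above.
DIMENSIONS = ("accuracy", "timeliness", "integrity",
              "uniqueness", "consistency", "validity")


@dataclass
class RuleResult:
    """Hypothetical record of one rule execution (not a WeData type)."""
    table: str
    dimension: str
    passed: bool


def dimension_scores(results):
    """Return {dimension: pass rate in percent} for dimensions that have results."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in results:
        if r.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {r.dimension}")
        totals[r.dimension] += 1
        passes[r.dimension] += int(r.passed)
    return {d: 100.0 * passes[d] / totals[d] for d in totals}


def overall_score(results):
    """Assumed aggregation: unweighted mean of the per-dimension scores."""
    scores = dimension_scores(results)
    return sum(scores.values()) / len(scores) if scores else None
```

A real report would likely weight dimensions or rules differently; the unweighted mean is used here only to keep the example short.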
Module Features
Introduction to the features of each module in Data Quality:
| Module | Features |
| --- | --- |
| Quality overview | Quality result overview: view detection status and rule execution status; view alarm status and table alarm ranking. |
| Rule Template | Manage rule templates centrally for easy reuse. 56+ built-in system templates: view only. Custom rule templates: support create, read, update, and delete operations. |
| Data monitoring | Create detection rules: supports multiple big data engines (EMR, DLC, TCHouse-P, TCHouse-D, Doris); supports multiple creation methods (single-table addition, multi-table addition, batch upload). View detection rules: supports multiple views (all rules, by table, by rule); supports viewing the rule list of a table and managing its rules. |
| Ops Management | Execution instances and results: view the run result and historical run status of each rule; export execution results and view historical export logs. Quality tasks: view generated quality inspection tasks; configure alarm information for quality tasks. Alarm information: view historical alarm status. |
| Quality report | Calculates quality scores from historical run results by database table and by rule dimension. Supports viewing quality scores at several levels: comprehensive quality score, per-dimension quality score, and quality detail breakdown. |
Key terms:
| Term | Description |
| --- | --- |
| Independent cycle | Performs periodic quality inspection on selected database tables and core business fields at a user-defined frequency, such as daily, hourly, or by the minute. The quality task runs on this schedule. If an anomaly is detected, subscribers are notified immediately. |
| Associated Scheduling | Associates the quality task with a production task (a data sync task or a data development task). When the production task finishes, a quality rule task is inserted and executed. If an anomaly is detected, the handler is notified immediately. Depending on the task level, downstream task execution is blocked to keep problem data from spreading. |
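The associated scheduling flow described above can be sketched as a simple quality gate: run the production task, run its quality rules, notify on every failure, and block downstream tasks when a blocking rule fails. The `QualityRule` type, the `blocking` flag, and `run_with_quality_gate` are hypothetical names for illustration and are not part of any WeData API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QualityRule:
    """Hypothetical rule: `check` returns True when the data passes."""
    name: str
    check: Callable[[], bool]
    blocking: bool  # assumed flag: a failure of this rule blocks downstream tasks


def run_with_quality_gate(production_task: Callable[[], None],
                          rules: List[QualityRule],
                          notify: Callable[[str], None]) -> bool:
    """Run the production task, then its quality rules.

    Returns True if downstream tasks may proceed, False if they are blocked.
    """
    production_task()  # quality checks run only after the task completes
    allow_downstream = True
    for rule in rules:
        if rule.check():
            continue
        notify(f"quality rule failed: {rule.name}")  # notify on every failure
        if rule.blocking:
            allow_downstream = False  # keep problem data from spreading
    return allow_downstream
```

In this sketch a non-blocking rule only raises a notification, while a blocking rule also stops downstream execution, which mirrors the rule strength setting listed under Core Capabilities.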
Important Notes
To configure table-level and field-level data quality rules for EMR, DLC, TCHouse-P, and TCHouse-D tables, the scheduling nodes that output the data must run on a scheduling resource group that has network connectivity. Data quality rule validation is triggered normally only under this condition, and only if the executor is stable and has been updated to the latest version.
Multiple table-level and field-level data quality rules can be configured for each table, and they are validated simultaneously.
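To illustrate what validating several rules on one table at the same time might look like, here is a minimal sketch using a thread pool. The function name and the rule representation (a name mapped to a zero-argument check) are assumptions for this example and do not reflect how WeData schedules rules internally.

```python
from concurrent.futures import ThreadPoolExecutor


def run_rules_concurrently(rules, max_workers=4):
    """Run every rule for one table concurrently.

    `rules` maps a rule name to a zero-argument callable that returns
    True on pass. Returns {rule name: passed}.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all checks first so they can run in parallel.
        futures = {name: pool.submit(check) for name, check in rules.items()}
        # Collect results; an exception raised by a check propagates here.
        return {name: fut.result() for name, fut in futures.items()}
```

For example, a table-level rule such as "row count is greater than zero" and field-level rules such as "id is unique" or "amount is not null" could be passed in one dictionary and evaluated together.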