tencent cloud

Data Desensitization
Last updated: 2025-11-11 15:35:19
Data Desensitization
Last updated: 2025-11-11 15:35:19

Overview

Data Lake Compute (DLC) supports column-level data masking. You can use the data masking feature to associate masking rules for columns involving sensitive data, and you can configure a series of masking algorithms targeting different user groups to achieve refined masking applications based on roles. For example, for mobile number data, you may want to grant full access permissions to the users in the customer service group, grant only the permission to view the last 4 numbers to the users in the analysis group, and grant strict masking permissions that display as NULL to the users in the finance group.

Restrictions and Limitations

The current restrictions and limitations of the DLC data masking feature are as follows.

Supported Engine Types

Currently, only the following engine types with a kernel version later than October 1, 2024, are supported: Spark standard engine, SuperSQL Spark job engine, and SuperSQL SparkSQL engine. Among these, data desensitization for the SuperSQL SparkSQL engine is an allowlist feature. If you need to use it, submit a ticket for activation. For information on DLC engine categories, refer to Data Engine Introduction.

Applicable Scope

1. The data masking feature will only take effect for DLC regular users. For DLC user types, see permission overview.
2. The configuration for a masking policy tag takes about 1 minute to take effect.
3. Only effective for the metadata under the default data catalog, DataLakeCatalog.

Special Restrictions

1. When the SuperSQL SparkSQL engine is used, views are not supported by masking policies.
2. Users or working groups should have the SELECT permission on databases and tables. Otherwise, queries by the user or working group on that table will return an error. For DLC permission types, refer to Sub-Account Permission Management.

Supported Masking Methods

Currently, DLC supports the following data masking methods for column values:

Default Value

The default masking value for the column based on the column's data type is returned. Use this rule when you want to hide the value of the column but display the data type.
Supported data types: STRING, BINARY, INT, DECIMAL, BIGINT, FLOAT, DOUBLE, BOOLEAN, TIMESTAMP, DATE, ARRAY.
Data Type
Default Masking Value
STRING
""
BINARY
[]
INT
0
DECIMAL
0
BIGINT
0
FLOAT
0
DOUBLE
0
BOOLEAN
false
TIMESTAMP
1970-01-01 08:00:00
DATE
1970-01-01
ARRAY
[]

Retaining the First 4 Characters

The first 4 characters of the column's value are returned, replacing the rest of the string with XXXXX. If the column's value is equal to or less than 4 characters in length, return the column's value after the value has been run through the SHA-256 hash function. You can only use this rule with columns that use the STRING data type.
Supported data type: STRING
Example Value
Masking Value
abcd@example.com
abcdxxxxxxxxxx
abc
ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

Retaining the Last 4 Characters

The last 4 characters of the column's value are returned, replacing the rest of the string with XXXXX. If the column's value is equal to or less than 4 characters in length, then return the column's value after the value has been run through the SHA-256 hash function. Use this rule with columns that use the STRING data type.
Supported data type: STRING
Example Value
Masking Value
abcd@example.com
xxxxxxxxxx.com
abc
ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

Hashing

The column's value is returned after the value has been run through the SHA-256 hash function. Use this rule when you want the end user to be able to use this column in a JOIN operation for a query. You can only use this rule with columns that use the STRING or BYTES data types.
Supported data types: STRING and BYTES
Example Value
Masking Value
abcd@example.com
3f7768839b5bcba43f589cc3af54efaea18bceb1df8b05a7dffaec3e7b43b269

Setting as NULL

NULL is returned regardless of the column value's data type. Use this rule when you want to hide both the value and the data type of the column.
Supported data types: Unlimited
Example Value
Masking Value
abcd@example.com
NULL

Date Masking

Only the year part of a date string and defaults the month and date to 01/01 is displayed. Use this rule with columns that use the DATE and TIMESTAMP data types.
Supported data types: TIMESTAMP and DATE
Example Value
Masking Value
2015-03-05T09:32:05.359
2015-01-01 00:00:00

No Masking

Plaintext of the column value is displayed, without being masked.
Supported data types: All

Masking Method Selection Recommendations

You can flexibly select the masking method for the column value based on the following recommendations:
Masking Method
Recommended Scenario
Default Value
Column value is expected to be hidden but data type needs to be displayed to users.
Retaining the first 4 characters
Plaintext of the column value is expected to be hidden but part of the characters for information confirmation needs to be displayed, for example, the first 4 characters of the customer's email address used by the customer service personnel for confirmation.
Retaining the last 4 characters
Plaintext of the column value is expected to be hidden but part of the characters for information confirmation needs to be displayed, for example, the last 4 characters of the customer's mobile number used by the customer service personnel for confirmation.
Hashing
This rule is used in scenarios where end users can use this column in a JOIN operation for a query or use this column for GROUP BY statistics.
Setting as NULL
This rule is used in scenarios where the column value and its data type are expected to be hidden.
Date masking
This rule is used for the scenario where only the year part is displayed and the rest of the date information is hidden, for example, the year of birth for confirmation.
No masking
Recommended for users who need to use the plaintext.

Data Masking Workflow

To configure DLC data desensitization, you can follow the steps below:

Overview of Masking Policy Tags

A masking policy tag refers to a tag customized by users and associated with columns containing sensitive data. In a masking policy tag, you can configure refined masking methods for multiple user groups. The following example shows the procedure to set a masking policy tag named "mobile number" and set the masking methods for 3 DLC workgroups:
Name of masking policy tag: Mobile number masking
Masking method configurations for workgroups:
Workgroup
Masking Method
Customer service personnel group
No masking
Analysis personnel group
Displaying the last 4 characters
Finance personnel group
Setting as NULL
Effect:
After associating the above masking policy tag with two columns, such as Phone_number1 and Phone_number2, DLC users in the customer service personnel group get the plaintext when querying by Phone_number1 and Phone_number2, DLC users in the analysis personnel group get the result displaying as *********4320, and DLC users in the finance personnel group get the result displaying as NULL.

Masking Policy Execution Priority

Assuming a DLC user is added to multiple working groups, and different masking methods are associated with these different working groups, multiple masking policies may exist for a specific user. If conflicting masking methods exist for a user, the system will apply the method with the highest priority among the working groups of the user, where working groups higher in the list have a higher priority.
For example, in the masking policy tag above, if Zhang San belongs to both the analyst personnel group and the finance personnel group, the query for Zhang San will display data desensitized as, for example, *********4320.

Step 1: Creating a Masking Policy Tag

Creating a Masking Policy Tag

1. Go to the DLC console, choose Metadata Management > Data Desensitization, click Create Masking Policy Tag, and create a masking policy tag in the pop-up dialog box.

2. In Configure Masking Method, select the corresponding working groups to add them to the Selected dialog box, and configure a masking method for each working group.
3. After configuration, you can drag and drop working group modules to sort them from top to bottom, which determines the policy priority. Working groups positioned higher in the list have a higher priority. If no manual sorting is performed, policies will be applied based on the default top-down order.
4. Click OK to complete the creation.
Note:
1. The DLC data desensitization feature currently only takes effect for users within user groups that have been configured with a masking method. It does not take effect for user groups without a configured masking method, or for users not added to any user group.
2. To apply desensitization to all DLC users, it is recommended to:
2.1 Configure a masking method for all user groups.
2.2 Add all DLC users to a user group.

Step 2: Binding the Tag to a Data Column

1. Go to the DLC console, select Metadata Management, and select the data table containing the field to be desensitized from the data catalog.
2. Locate the field requiring desensitization, or search for the field name in the top-right search box. Then, in the Masking Policy Tag column for that field, click

, and select the desired masking policy tag for binding from the dialog box.
3. If no matching masking policy tag is found, click Create Masking Policy Tag in the dialog box to quickly create one.

Note:
1. The various built-in DLC masking methods have field type restrictions. For example, date desensitization can only be associated with columns of TIMESTAMP or DATE data types. For details, refer to Masking Method Selection Recommendations.
2. A field can only be associated with one masking policy tag. However, a single masking policy tag can be reused across multiple data columns.
Effective Time:
After you configure a masking policy tag and bind it to a column, the configuration takes about 1 minute to take effect. During this period, users may still get plaintext in a query. Just wait a moment for the configuration to take effect.

Step 3: Performing a Query

After you bind a masking policy tag to a column requiring desensitization, the configuration is successful if the data type of the column supports all masking methods within that policy tag. Users see the masked results according to the masking method applied to their user groups when querying column data. The following uses a virtual case to further show the effects that the DLC data masking feature can achieve.

Case Description

Assume that company A has a customer list customer_list containing sensitive information, with detailed fields as follows:
Mobile Number
Customer Level
Consumption Amount
Email Address
123456789
High
45,600
abc@example.com
234567891
Medium
15,000
bcd@example.com
345678912
Low
2,000
cde@example.com
456789123
Low
1,000
def@example.com
There are 3 user groups in company A, including the customer service personnel group, the finance personnel group, and the analysis personnel group. Currently, company A hopes that the Mobile Number and Email Address fields containing users' sensitive PII information can only be used by the customer service personnel group, the Consumption Amount field and the Customer Level tag can only be used by the finance personnel group, but the analysis personnel can view the hashed value of the Customer Level to conduct statistical analysis of customer hierarchy.
Based on the above requirements, create the following 3 masking policy tags:
Contact Information Masking
(Bind the policy tag to the Mobile Number and Email Address fields)
Consumption Amount Masking
(Bind the policy tag to the Consumption Amount field)
Customer Level Masking
(Bind the policy tag to the Customer Level field)
Customer service personnel group: No masking
Finance personnel group: NULL
Analysis personnel group: NULL
Customer service personnel group: NULL
Finance personnel group: No masking
Analysis personnel group: NULL
Customer service personnel group: NULL
Finance personnel group: No masking
Analysis personnel group: Hashing
Assume that after the masking policy tags are bound to the corresponding columns as per the above table,
1. For a user existing only within a specific working group, running SELECT * FROM customer_list returns the following results:
Customer service personnel group: This group has been granted a rule of no desensitization for contact information. The following results will be returned:
Mobile Number
Customer Level
Consumption Amount
Email Address
123456789
NULL
NULL
abc@example.com
234567891
NULL
NULL
bcd@example.com
345678912
NULL
NULL
cde@example.com
456789123
NULL
NULL
def@example.com
Finance personnel group: This group has been granted a rule of no desensitization for consumption amount and customer level. The following results will be returned:
Mobile Number
Customer Level
Consumption Amount
Email Address
NULL
High
45,600
NULL
NULL
Medium
15,000
NULL
NULL
Low
2,000
NULL
NULL
Low
1,000
NULL
Analyst personnel group: This group has been assigned hash desensitization for customer level. The following results will be returned. While the actual customer level is obscured, statistical analysis can still be performed using the hash value:
Mobile Number
Customer Level
Consumption Amount
Email Address
NULL
4fa3c0d004d0750fc7bf8631993bd7c668fd33f8d089e0103ad8ef3fc1d9f4bb
45,600
NULL
NULL
35d8f8d59e2630de970e35271547d087278074addd61ce31940da69d82d19929
15,000
NULL
NULL
49542bc83b9d59935686144f352b6acb2264992720d0dbe780be50b56b87fef7
2,000
NULL
NULL
49542bc83b9d59935686144f352b6acb2264992720d0dbe780be50b56b87fef7
1,000
NULL
2. If a user exists in both the customer service personnel group and the finance personnel group, according to the top-down priority of masking rules, the effective masking rules for this user are as follows:
Contact Information Desensitization
(Bind the policy tag to the Mobile Number and Email Address fields)
Consumption Amount Desensitization
(Bind the policy tag to the Consumption Amount field)
Customer Level Desensitization
(Bind the policy tag to the Customer Level field)
Customer service personnel group: No desensitization
Customer service personnel group: NULL
Customer service personnel group: NULL
Running SELECT * FROM customer_list returns the following results:
Mobile Number
Customer Level
Consumption Amount
Email Address
123456789
NULL
NULL
abc@example.com
234567891
NULL
NULL
bcd@example.com
345678912
NULL
NULL
cde@example.com
456789123
NULL
NULL
def@example.com

Was this page helpful?
You can also Contact Sales or Submit a Ticket for help.
Yes
No

Feedback