Tencent Cloud WeData

S3 Data Source

Last updated: 2025-07-25 17:23:07

S3 Offline Single Table Read Node Configuration


Parameter
Description
Data Source
Select an available S3 data source from the current project.
Synchronization Scope
Full synchronization: All files under the file path are synchronized.
Incremental synchronization: Files under the file path that were updated between the scheduled time of the previous cycle and the scheduled time of the current cycle are synchronized. When debugging a synchronization task with incremental synchronization selected, all files under the file path are synchronized by default.
Note:
The synchronization scope takes effect only when the file path is a directory. If the path points to a specific file, the synchronization scope does not apply and that file is synchronized directly.
Synchronization Method
S3 supports two synchronization methods:
Data synchronization: Parses structured file content and maps and synchronizes it based on field mapping relationships.
File transfer: Transfers the entire file without parsing its content; applicable to synchronizing unstructured data.
Note:
File transfer is supported only when both the source and the target are file-type data sources (COS/HDFS/SFTP/FTP/S3/Azure Blob/HTTP), and both ends must use the file transfer synchronization method.
File Path
Path in the S3 file system; the * wildcard is supported.
The S3 file path must include the bucket name, for example s3a://bucket_name.
File Type
S3 supports five file types: TXT, ORC, PARQUET, CSV, JSON.
TXT: Represents the TextFile file format.
ORC: Represents the ORCFile file format.
PARQUET: Represents the ordinary Parquet file format.
CSV: Represents the ordinary CSV file format (a logical two-dimensional table).
JSON: Represents the JSON file format.
Field Separator
For TXT/CSV file types, the field separator used when reading in data synchronization mode; defaults to a comma (,).
Note:
${} is not supported as a separator; ${a} will be interpreted as the configured parameter a.
Line Separator (Optional)
For TXT/CSV file types, the line separator used when reading in data synchronization mode. Up to 3 values can be entered, and all entered values are treated as line separators. If not specified, the default is \n on Linux and \r\n on Windows.
Note:
1. Setting multiple line separators affects read performance.
2. ${} is not supported as a separator; ${a} will be interpreted as the configured parameter a.
Encode
Encoding used when reading files. Supports UTF-8 and GBK.
Null Value Conversion
When reading, the specified string is converted to NULL. NULL represents an unknown or inapplicable value, as distinct from 0, an empty string, or any other value.
Note:
${} is not supported as the specified string; ${a} will be interpreted as the configured parameter a.
Compression Format
Currently supports: none, deflate, gzip, bzip2, lz4, snappy.
Since snappy does not have a unified stream format, Data Integration currently supports only the two most widely used formats:
hadoop-snappy (the snappy stream format used on Hadoop)
framing-snappy (the snappy stream format recommended by Google)
Quote character
No configuration: The source reads the data as its original content.
Double quotation mark ("): The source reads the value within double quotation marks (") as the data content. Note: If the data format is not standardized, reading may cause OOM.
Single quotation mark ('): The source reads the value within single quotation marks (') as the data content. Note: If the data format is not standardized, reading may cause OOM.
Skip the Header
No: When reading, do not skip the header.
Yes: When reading, skip the header.
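Taken together, the TXT/CSV read options above (field separator, quote character, header skipping, and null value conversion) describe a standard delimited-text parse. The Python sketch below illustrates how these options interact; the `read_rows` function and its defaults are hypothetical illustrations, not the product's implementation:

```python
import csv
import io

def read_rows(text, field_sep=",", quote_char='"',
              skip_header=True, null_value="\\N"):
    # Hypothetical illustration of the S3 read node's TXT/CSV options:
    # field separator, quote character, "Skip the Header", and
    # null value conversion (the configured string becomes NULL/None).
    reader = csv.reader(io.StringIO(text),
                        delimiter=field_sep, quotechar=quote_char)
    rows = list(reader)
    if skip_header and rows:
        rows = rows[1:]  # drop the header row when configured
    return [[None if cell == null_value else cell for cell in row]
            for row in rows]

sample = 'id,name,score\n1,"Alice",90\n2,\\N,85\n'
print(read_rows(sample))
# [['1', 'Alice', '90'], ['2', None, '85']]
```

Note that the actual node also applies encoding (UTF-8/GBK) and compression handling before parsing, which this sketch omits.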

S3 Offline Single Table Write Node Configuration


Parameter
Description
Data Source
Select an available S3 data source from the current project.
Synchronization Method
S3 supports two synchronization methods:
Data synchronization: Parses structured file content and maps and synchronizes it based on field mapping relationships.
File transfer: Transfers the entire file without parsing its content; applicable to synchronizing unstructured data.
Note:
File transfer is supported only when both the source and the target are file-type data sources (COS/HDFS/SFTP/FTP/S3/Azure Blob/HTTP), and both ends must use the file transfer synchronization method.
File Path
Path in the S3 file system to write to.
The S3 file path must include the bucket name, for example s3a://bucket_name.
Write Mode
overwrite: Cleans up all files prefixed with the file name before writing.
append: Writes directly without any pre-processing, ensuring file names do not conflict.
nonConflict: Reports an error if the file name already exists.
File Type
S3 supports four file types: TXT, ORC, PARQUET, CSV.
TXT: Represents the TextFile file format.
ORC: Represents the ORCFile file format.
PARQUET: Represents the ordinary Parquet file format.
CSV: Represents the ordinary CSV file format (a logical two-dimensional table).
Field Separator
For TXT/CSV file types, the field separator used when writing in data synchronization mode; defaults to a comma (,).
Note:
${} is not supported as a separator; ${a} will be interpreted as the configured parameter a.
Line Separator (Optional)
For TXT/CSV file types, the line separator used when writing in data synchronization mode. All entered values are treated as line separators. If not specified, the default is \n on Linux and \r\n on Windows. When filling in manually, one value can be entered as the target line separator for writing data.
Note: ${} is not supported as a separator; ${a} will be interpreted as the configured parameter a.
Encode
Encoding used when writing files. Supports UTF-8 and GBK.
Empty Character String Processing
No action taken: Do not process empty strings when writing.
Processed as NULL: When writing, process empty strings as NULL.
Null Value Conversion
When writing, NULL is converted to the specified string. NULL represents an unknown or inapplicable value, as distinct from 0, an empty string, or any other value.
Note:
${} is not supported as the specified string; ${a} will be interpreted as the configured parameter a.
Compression Format
Currently supports: none, deflate, gzip, bzip2, lz4, snappy.
Since snappy does not have a unified stream format, Data Integration currently supports only the two most widely used formats:
hadoop-snappy (the snappy stream format used on Hadoop)
framing-snappy (the snappy stream format recommended by Google)
Quote character
No configuration: The target does not add quotation marks when writing data; values are written as they are at the source.
Double quotation mark ("): The target automatically wraps each value in double quotation marks when writing, for example "123".
Single quotation mark ('): The target automatically wraps each value in single quotation marks when writing, for example '123'.
Header included or Not
No: When writing, exclude the header.
Yes: When writing, include the header.
Tagged completed file
A blank .ok file indicating that task transmission has completed.
Do not generate: Do not generate a blank .ok file after task completion. (Default)
Generate: Generate a blank .ok file after task completion.
Tagged completed file name
When filling in, include the complete file path, name, and suffix. By default, a .ok file is generated in the target path using the target file name and suffix.
Time scheduling parameters such as ${yyyyMMdd} are supported in both data synchronization and file transfer modes.
The built-in variable ${filetrans_di_src} is supported to represent the source file name.
For example: /user/test/${filetrans_di_src}_${yyyyMMdd}.ok.
Note:
In file transfer mode, when ${filetrans_di_src} is used in the tagged completed file name to reference the source file name, one .ok file is generated per source file, so the number of tagged completed files depends on the number of files at the source.
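The write-side options (header inclusion, quote character, null value conversion) and the tagged completed file name placeholders can be illustrated with a similar sketch. Both functions below, `write_csv` and `ok_file_name`, are hypothetical helpers showing the behavior described above, not the product's implementation:

```python
import csv
import io
from datetime import date

def write_csv(rows, header, quote_all=True,
              include_header=True, null_repr="\\N"):
    # Hypothetical illustration of the S3 write node's CSV options:
    # "Header included or Not", quote character (QUOTE_ALL mimics wrapping
    # every value in quotes), and NULL -> configured string conversion.
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        quoting=csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL,
        lineterminator="\n")
    if include_header:
        writer.writerow(header)
    for row in rows:
        writer.writerow([null_repr if v is None else v for v in row])
    return buf.getvalue()

def ok_file_name(template, src_file=None, day=None):
    # Hypothetical expansion of the ${yyyyMMdd} scheduling parameter and
    # the ${filetrans_di_src} built-in variable in the .ok file name.
    day = day or date.today()
    name = template.replace("${yyyyMMdd}", day.strftime("%Y%m%d"))
    if src_file is not None:
        name = name.replace("${filetrans_di_src}", src_file)
    return name

print(write_csv([[1, None]], ["id", "name"]))
# "id","name"
# "1","\N"
print(ok_file_name("/user/test/${filetrans_di_src}_${yyyyMMdd}.ok",
                   src_file="data", day=date(2025, 7, 25)))
# /user/test/data_20250725.ok
```

The placeholder expansion mirrors the note above: when the template contains ${filetrans_di_src}, one .ok name is produced per source file.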

