Tencent Cloud
Cloud Object Storage

Creating Templates

Last updated: 2026-01-12 22:36:56

Feature Description

Create an Automatic Speech Recognition (ASR) template.

Authorization Description

When using a sub-account, you need to add the ci:CreateMediaTemplate permission to the action list in the authorization policy. For all operation APIs supported by Cloud Infinite (CI), see CI actions.

Service Activation

To use this feature, you need to bind a bucket and activate the Cloud Infinite (CI) service in advance.
To use this feature, you need to enable the Smart Audio Service in advance in the console or via the API.
Note: After a bucket is bound to Cloud Infinite (CI), if you manually unbind it, you will no longer be able to use this feature.

Use Limits

Before using this API, confirm the relevant restrictions. For details, see Usage Limits.


Request

Request sample

POST /template HTTP/1.1
Host: <BucketName-APPID>.ci.<Region>.myqcloud.com
Date: <GMT Date>
Authorization: <Auth String>
Content-Length: <length>
Content-Type: application/xml

<body>
Note:
Authorization: A request header that carries authentication information to verify the legitimacy of the request. For details, see the Request Signature document.

Request header

This API uses only common request headers. For details, see the Common Request Headers documentation.

Request body

The following shows the request body required for this operation.
<Request>
<Tag>SpeechRecognition</Tag>
<Name>TemplateName</Name>
<SpeechRecognition>
<EngineModelType>16k_zh</EngineModelType>
<ChannelNum>1</ChannelNum>
<ResTextFormat>1</ResTextFormat>
<FilterDirty>0</FilterDirty>
<FilterModal>1</FilterModal>
<ConvertNumMode>0</ConvertNumMode>
<SpeakerDiarization>1</SpeakerDiarization>
<SpeakerNumber>0</SpeakerNumber>
<FilterPunc>0</FilterPunc>
<OutputFileType>txt</OutputFileType>
</SpeechRecognition>
</Request>
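As an illustrative sketch (not an official SDK call), the request body above can be assembled with Python's standard library and sent with any HTTP client. The bucket name, region, and Authorization signature are placeholders you must supply per the Request Signature document; the helper name build_asr_template_body is an assumption for this example.

```python
import xml.etree.ElementTree as ET

def build_asr_template_body(name: str, speech_params: dict) -> bytes:
    """Build the <Request> XML body for creating a SpeechRecognition template."""
    root = ET.Element("Request")
    ET.SubElement(root, "Tag").text = "SpeechRecognition"
    ET.SubElement(root, "Name").text = name
    sr = ET.SubElement(root, "SpeechRecognition")
    for key, value in speech_params.items():
        ET.SubElement(sr, key).text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=False)

body = build_asr_template_body("TemplateName", {
    "EngineModelType": "16k_zh",
    "ChannelNum": "1",
    "ResTextFormat": "1",
    "OutputFileType": "txt",
})

# Sending it (placeholders; compute <Auth String> per the Request Signature doc):
# import requests
# resp = requests.post(
#     "https://<BucketName-APPID>.ci.<Region>.myqcloud.com/template",
#     data=body,
#     headers={"Content-Type": "application/xml",
#              "Authorization": "<Auth String>"},
# )
```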
The nodes are described as follows:

Request (Parent node: none): container for the request. Type: Container. Required: Yes.

The Request container node contains the following:
Tag (Parent node: Request): template type; the value is SpeechRecognition. Type: String. Required: Yes.

Name (Parent node: Request): template name. Only Chinese characters, letters, digits, _, -, and * are supported, and the length cannot exceed 64 characters. Type: String. Required: Yes.

SpeechRecognition (Parent node: Request): speech recognition parameters. Type: Container. Required: Yes.

The SpeechRecognition container node contains the following:
(All of the following nodes have Request.SpeechRecognition as their parent node.)

FlashAsr: whether to enable ultra-fast ASR. Valid values: true, false. Type: String. Default: false. Required: No.

EngineModelType: engine model type, divided into phone call scenarios and non-phone call scenarios. Type: String. Default: none. Required: Yes.
Phone call scenarios:
8k_zh: 8k phone call Mandarin (applicable to dual-channel/stereo audio)
8k_zh_s: 8k phone call Mandarin with speaker separation (applicable to mono-channel audio only)
8k_en: 8k phone call English
Non-phone call scenarios:
16k_zh: 16k Mandarin
16k_zh_video: 16k audio/video domain
16k_en: 16k English
16k_ca: 16k Cantonese
16k_ja: 16k Japanese
16k_zh_edu: Chinese education
16k_en_edu: English education
16k_zh_medical: medical
16k_th: Thai
16k_zh_dialect: multi-dialect, supporting 23 dialects
Ultra-fast ASR supports 8k_zh, 16k_zh, 16k_en, 16k_zh_video, 16k_zh_dialect, 16k_ms (Malay), and 16k_zh-PY (Chinese, English, and Cantonese).

ChannelNum: number of sound channels. Type: String. Default: none. Required: No.
1: mono (engine models for non-phone call scenarios support mono audio only)
2: dual-channel (supported by the 8k_zh engine model only; the two channels should correspond to the two callers)
Supported by non-ultra-fast ASR only, and required for non-ultra-fast ASR.

ResTextFormat: format of the returned recognition result. Type: String. Default: none. Required: No.
0: recognition result text (with segment-level timestamps)
1: word-level detailed recognition result without punctuation, with speech speed values (word timestamp list, generally used for subtitle generation)
2: word-level detailed recognition result (with punctuation and speech speed values)
3: segmented by punctuation, with a timestamp per segment; especially suitable for subtitle scenarios (includes word-level timestamps, punctuation, and speech speed values)
Supported by non-ultra-fast ASR only.

FilterDirty: whether to filter profanity (currently supported by Mandarin engines). Type: String. Default: 0. Required: No.
0: do not filter profanity
1: filter profanity
2: replace profanity with *

FilterModal: whether to filter modal particles (currently supported by Mandarin engines). Type: String. Default: 0. Required: No.
0: do not filter modal particles
1: partial filtering
2: strict filtering

ConvertNumMode: whether to intelligently convert numbers to Arabic numerals (currently supported by Mandarin engines). Type: String. Default: 0. Required: No.
0: do not convert; output Chinese numbers directly
1: intelligently convert to Arabic numerals based on the scenario
3: enable math-related number conversion
Supported by non-ultra-fast ASR only.

SpeakerDiarization: whether to enable speaker separation. Type: String. Default: 0. Required: No.
0: do not enable
1: enable (supported only by 8k_zh, 16k_zh, and 16k_zh_video with mono-channel audio)
For 8k phone call scenarios, using dual-channel audio to distinguish the two callers is recommended: set ChannelNum to 2, and speaker separation is not needed.

SpeakerNumber: number of speakers to separate (speaker separation must be enabled). Value range: 0-10. Type: String. Default: 0. Required: No.
0: automatic separation (currently supports up to 6 speakers)
1-10: the specified number of speakers to separate
Supported by non-ultra-fast ASR only.

FilterPunc: whether to filter punctuation (currently supported by Mandarin engines). Type: String. Default: 0. Required: No.
0: do not filter
1: filter sentence-ending punctuation
2: filter all punctuation

OutputFileType: output file type. Valid values: txt, srt. Type: String. Default: txt. Required: No.
Ultra-fast ASR supports txt only.
Non-ultra-fast ASR with ResTextFormat set to 3 supports txt only.

Format: audio format for ultra-fast ASR. Supported formats: wav, pcm, ogg-opus, speex, silk, mp3, m4a, and aac. Type: String. Default: none. Required: No.
This parameter is required for ultra-fast ASR.

FirstChannelOnly: whether to recognize only the first sound channel. Type: String. Default: 1. Required: No.
0: recognize all sound channels
1: recognize the first sound channel only
Ultra-fast ASR only.

WordInfo: whether to display word-level timestamps. Type: String. Default: 0. Required: No.
0: do not display
1: display, excluding punctuation timestamps
2: display, including punctuation timestamps
Ultra-fast ASR only.

SentenceMaxLength: maximum number of characters per punctuation-delimited segment. Value range: [6,40]. Type: String. Default: 0. Required: No.
The default value 0 disables this feature.
This parameter can be used in subtitle generation to control the maximum number of characters in a single subtitle line.
When FlashAsr is false, this parameter takes effect only when ResTextFormat is 3.
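The cross-parameter rules above can be summarized in a small client-side validator. This is an illustrative sketch of the documented constraints, not an official check; the function name is an assumption, and "Chinese characters" in the Name rule is approximated with the CJK Unified Ideographs range.

```python
import re

# Name rule: Chinese characters (approximated as CJK Unified Ideographs),
# letters, digits, _, -, and *; length 1-64.
NAME_RE = re.compile(r"^[\u4e00-\u9fffA-Za-z0-9_\-*]{1,64}$")

def validate_template_request(name: str, p: dict) -> list[str]:
    """Return a list of violations of the documented rules (client-side sketch)."""
    errors = []
    if not NAME_RE.match(name):
        errors.append("Name must use Chinese/letters/digits/_/-/* and be at most 64 chars")
    flash = p.get("FlashAsr", "false") == "true"
    if flash and "Format" not in p:
        errors.append("Format is required when FlashAsr is true")
    if not flash and "ChannelNum" not in p:
        errors.append("ChannelNum is required for non-ultra-fast ASR")
    if "SpeakerNumber" in p and not 0 <= int(p["SpeakerNumber"]) <= 10:
        errors.append("SpeakerNumber must be in [0, 10]")
    if flash and p.get("OutputFileType", "txt") != "txt":
        errors.append("Ultra-fast ASR supports txt output only")
    m = int(p.get("SentenceMaxLength", "0"))
    if m != 0 and not 6 <= m <= 40:
        errors.append("SentenceMaxLength must be 0 or in [6, 40]")
    return errors
```

A server-side check still applies; this only catches obvious mistakes before the request is sent.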

Response

Response Headers

This API returns only common response headers. For details, see the Common Response Headers documentation.

Response Body

The response body is returned as application/xml. An example including the complete node data is shown below:
<Response>
<RequestId>NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x</RequestId>
<Template>
<TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
<Name>TemplateName</Name>
<Tag>SpeechRecognition</Tag>
<CreateTime>2020-08-05T11:35:24+0800</CreateTime>
<UpdateTime>2020-08-31T16:15:20+0800</UpdateTime>
<BucketId>test-1234567890</BucketId>
<Category>Custom</Category>
<SpeechRecognition>
<EngineModelType>16k_zh</EngineModelType>
<ChannelNum>1</ChannelNum>
<ResTextFormat>1</ResTextFormat>
<FilterDirty>0</FilterDirty>
<FilterModal>1</FilterModal>
<ConvertNumMode>0</ConvertNumMode>
<SpeakerDiarization>1</SpeakerDiarization>
<SpeakerNumber>0</SpeakerNumber>
<FilterPunc>0</FilterPunc>
<OutputFileType>txt</OutputFileType>
<FlashAsr>false</FlashAsr>
<FirstChannelOnly>0</FirstChannelOnly>
<WordInfo>0</WordInfo>
<SentenceMaxLength>0</SentenceMaxLength>
<HotVocabularyTableId/>
</SpeechRecognition>
</Template>
</Response>
The nodes are described as follows:

Response (Parent node: none): container for the results. Type: Container.

The Response container node contains the following:

RequestId (Parent node: Response): unique request ID. Type: String.

Template (Parent node: Response): container for template details. Type: Container.

The Template container node contains the following:

TemplateId (Parent node: Response.Template): template ID. Type: String.

Name (Parent node: Response.Template): template name. Type: String.

BucketId (Parent node: Response.Template): bucket to which the template belongs. Type: String.

Category (Parent node: Response.Template): template category, Custom or Official. Type: String.

Tag (Parent node: Response.Template): template type; the value is SpeechRecognition. Type: String.

UpdateTime (Parent node: Response.Template): update time. Type: String.

CreateTime (Parent node: Response.Template): creation time. Type: String.

SpeechRecognition (Parent node: Response.Template): same as Request.SpeechRecognition in the request body. Type: Container.
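As a sketch, the response body can be parsed with Python's standard library to pick out the new template's ID; the function name is an assumption for this example.

```python
import xml.etree.ElementTree as ET

def parse_create_template_response(xml_text: str) -> dict:
    """Extract RequestId and key Template fields from the response XML."""
    root = ET.fromstring(xml_text)
    return {
        "RequestId": root.findtext("RequestId"),
        "TemplateId": root.findtext("Template/TemplateId"),
        "Name": root.findtext("Template/Name"),
    }

# Trimmed-down version of the response shown above:
sample = """<Response>
<RequestId>NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x</RequestId>
<Template>
<TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
<Name>TemplateName</Name>
</Template>
</Response>"""

result = parse_create_template_response(sample)
```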

Error Code

This request returns common error responses and error codes. For more information, see Error Codes.

Practical Case

Request

POST /template HTTP/1.1
Authorization: q-sign-algorithm=sha1&q-ak=************************************&q-sign-time=1497530202;1497610202&q-key-time=1497530202;1497610202&q-header-list=&q-url-param-list=&q-signature=****************************************
Host: test-1234567890.ci.ap-chongqing.myqcloud.com
Content-Length: 1666
Content-Type: application/xml

<Request>
<Tag>SpeechRecognition</Tag>
<Name>TemplateName</Name>
<SpeechRecognition>
<EngineModelType>16k_zh</EngineModelType>
<ChannelNum>1</ChannelNum>
<ResTextFormat>1</ResTextFormat>
<FilterDirty>0</FilterDirty>
<FilterModal>1</FilterModal>
<ConvertNumMode>0</ConvertNumMode>
<SpeakerDiarization>1</SpeakerDiarization>
<SpeakerNumber>0</SpeakerNumber>
<FilterPunc>0</FilterPunc>
<OutputFileType>txt</OutputFileType>
<SentenceMaxLength>0</SentenceMaxLength>
</SpeechRecognition>
</Request>

Response

HTTP/1.1 200 OK
Content-Type: application/xml
Content-Length: 100
Connection: keep-alive
Date: Thu, 14 Jul 2022 12:37:29 GMT
Server: tencent-ci
x-ci-request-id: NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x

<Response>
<RequestId>NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x</RequestId>
<Template>
<TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
<Name>TemplateName</Name>
<Tag>SpeechRecognition</Tag>
<CreateTime>2020-08-05T11:35:24+0800</CreateTime>
<UpdateTime>2020-08-31T16:15:20+0800</UpdateTime>
<BucketId>test-1234567890</BucketId>
<Category>Custom</Category>
<SpeechRecognition>
<EngineModelType>16k_zh</EngineModelType>
<ChannelNum>1</ChannelNum>
<ResTextFormat>1</ResTextFormat>
<FilterDirty>0</FilterDirty>
<FilterModal>1</FilterModal>
<ConvertNumMode>0</ConvertNumMode>
<SpeakerDiarization>1</SpeakerDiarization>
<SpeakerNumber>0</SpeakerNumber>
<FilterPunc>0</FilterPunc>
<OutputFileType>txt</OutputFileType>
<FlashAsr>false</FlashAsr>
<FirstChannelOnly>0</FirstChannelOnly>
<WordInfo>0</WordInfo>
<SentenceMaxLength>0</SentenceMaxLength>
<HotVocabularyTableId/>
</SpeechRecognition>
</Template>
</Response>

