Tencent Cloud

Creating Templates
Last updated: 2025-09-15 17:39:18

Feature Description

Create an Automatic Speech Recognition (ASR) template.

Authorization Description

When using a sub-account, you need to grant it the ci:CreateMediaTemplate permission by adding that action to the authorization policy. For all operation APIs supported by Cloud Infinite (CI), see CI actions.

Service Activation

To use this feature, you need to bind a bucket and enable the Cloud Infinite (CI) service in advance.
To use this feature, you need to enable the Smart Audio service in advance in the console or via the API.
Note:
If you manually unbind the bucket after binding Cloud Infinite (CI), you will no longer be able to use this feature.

Use Limits

When using this API, please confirm the relevant restrictions. For details, see Usage Limits.


Request

Request sample

POST /template HTTP/1.1
Host: <BucketName-APPID>.ci.<Region>.myqcloud.com
Date: <GMT Date>
Authorization: <Auth String>
Content-Length: <length>
Content-Type: application/xml

<body>
Note:
Authorization: a request header that carries authentication information used to verify the legitimacy of the request. For details, see the Request Signature document.

Request header

This API only uses common request headers. For details, see Common Request Headers documentation.

Request body

The following example shows the request body required for this operation:
<Request>
<Tag>SpeechRecognition</Tag>
<Name>TemplateName</Name>
<SpeechRecognition>
<EngineModelType>16k_zh</EngineModelType>
<ChannelNum>1</ChannelNum>
<ResTextFormat>1</ResTextFormat>
<FilterDirty>0</FilterDirty>
<FilterModal>1</FilterModal>
<ConvertNumMode>0</ConvertNumMode>
<SpeakerDiarization>1</SpeakerDiarization>
<SpeakerNumber>0</SpeakerNumber>
<FilterPunc>0</FilterPunc>
<OutputFileType>txt</OutputFileType>
</SpeechRecognition>
</Request>
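As a sketch, the request body above can be built programmatically with Python's standard `xml.etree.ElementTree` instead of hand-writing XML. The endpoint comment and the need for a signed Authorization header follow the request sample earlier in this document; how you compute the signature is not shown here.

```python
# Sketch: build the CreateMediaTemplate request body shown above.
# Sending it requires a valid signed Authorization header (see Request Signature).
import xml.etree.ElementTree as ET

def build_asr_template_body(name: str, params: dict) -> bytes:
    """Serialize the <Request> body for a SpeechRecognition template."""
    root = ET.Element("Request")
    ET.SubElement(root, "Tag").text = "SpeechRecognition"
    ET.SubElement(root, "Name").text = name
    sr = ET.SubElement(root, "SpeechRecognition")
    for key, value in params.items():
        ET.SubElement(sr, key).text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=False)

body = build_asr_template_body("TemplateName", {
    "EngineModelType": "16k_zh",
    "ChannelNum": 1,
    "ResTextFormat": 1,
    "OutputFileType": "txt",
})
# POST this body to https://<BucketName-APPID>.ci.<Region>.myqcloud.com/template
# with Content-Type: application/xml and a signed Authorization header.
```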
The nodes are described as follows:

Node: Request (Parent: None)
Description: Container that holds the request
Type: Container
Required: Yes
The child nodes of Request are described as follows:

Node: Tag (Parent: Request)
Description: Template type. Value: SpeechRecognition
Type: String
Required: Yes

Node: Name (Parent: Request)
Description: Template name. Only Chinese characters, English letters, digits, _, -, and * are supported, and the length cannot exceed 64 characters.
Type: String
Required: Yes

Node: SpeechRecognition (Parent: Request)
Description: Speech recognition parameters
Type: Container
Required: Yes
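The Name constraints above can be checked on the client before sending the request. This is a sketch: the document allows Chinese, English, digits, _, -, and *, but the exact Unicode range used here for "Chinese" (CJK Unified Ideographs, U+4E00-U+9FA5) is an assumption, not something the document specifies.

```python
# Sketch: client-side check of the template Name constraints.
# The CJK range \u4e00-\u9fa5 for "Chinese characters" is an assumption.
import re

NAME_RE = re.compile(r"^[A-Za-z0-9\u4e00-\u9fa5_\-*]{1,64}$")

def is_valid_template_name(name: str) -> bool:
    """Return True if the name satisfies the documented character and length rules."""
    return bool(NAME_RE.fullmatch(name))

print(is_valid_template_name("TemplateName"))  # True
print(is_valid_template_name("bad name!"))     # False: contains a space and "!"
```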
The child nodes of SpeechRecognition are described as follows:

Node: FlashAsr (Parent: Request.SpeechRecognition)
Description: Whether to enable ultra-fast ASR. Values: true/false
Type: String
Default: false
Required: No

Node: EngineModelType (Parent: Request.SpeechRecognition)
Description: Engine model type, divided into phone-call and non-phone-call scenarios.
Phone-call scenario:
8k_zh: 8k phone-call Mandarin (applicable to stereo audio)
8k_zh_s: 8k phone-call Mandarin with speaker separation (mono audio only)
8k_en: 8k phone-call English
Non-phone-call scenario:
16k_zh: 16k Mandarin
16k_zh_video: 16k audio/video domain
16k_en: 16k English
16k_ca: 16k Cantonese
16k_ja: 16k Japanese
16k_zh_edu: Chinese education
16k_en_edu: English education
16k_zh_medical: Medical
16k_th: Thai
16k_zh_dialect: Multi-dialect, supporting 23 dialects
Ultra-fast ASR supports 8k_zh, 16k_zh, 16k_en, 16k_zh_video, 16k_zh_dialect, 16k_ms (Malay), and 16k_zh-PY (Chinese-English-Cantonese).
Type: String
Default: None
Required: Yes

Node: ChannelNum (Parent: Request.SpeechRecognition)
Description: Number of audio channels:
1: mono (engine models for non-phone-call scenarios support mono only)
2: stereo (supported only by the 8k_zh engine model; the two channels should correspond to the two callers)
Supported only by non-ultra-fast ASR, for which this parameter is required.
Type: String
Default: None
Required: No

Node: ResTextFormat (Parent: Request.SpeechRecognition)
Description: Format of the returned recognition result:
0: Recognized text (with segment-level timestamps)
1: Word-level detailed recognition result without punctuation, with speech speed values (word timestamp list, generally used to generate subtitles)
2: Word-level detailed recognition result (with punctuation and speech speed values)
3: Segments split by punctuation, each with a timestamp; especially suitable for subtitles (includes word-level timestamps, punctuation, and speech speed values)
Supported only by non-ultra-fast ASR.
Type: String
Default: None
Required: No

Node: FilterDirty (Parent: Request.SpeechRecognition)
Description: Whether to filter profanity (currently supported by the Mandarin engine):
0: Do not filter profanity
1: Filter profanity
2: Replace profanity with *
Type: String
Default: 0
Required: No

Node: FilterModal (Parent: Request.SpeechRecognition)
Description: Whether to filter modal particles (currently supported by the Mandarin engine):
0: Do not filter modal particles
1: Partial filtering
2: Strict filtering
Type: String
Default: 0
Required: No

Node: ConvertNumMode (Parent: Request.SpeechRecognition)
Description: Whether to intelligently convert numbers to Arabic numerals (currently supported by the Mandarin engine):
0: Do not convert; output Chinese numbers directly
1: Intelligently convert to Arabic numerals based on the scenario
3: Enable math-related number conversion
Supported only by non-ultra-fast ASR.
Type: String
Default: 0
Required: No

Node: SpeakerDiarization (Parent: Request.SpeechRecognition)
Description: Whether to enable speaker separation:
0: Do not enable
1: Enable (supported only for 8k_zh, 16k_zh, and 16k_zh_video with mono audio)
For 8k phone-call scenarios, using dual channels to distinguish the two callers is recommended: set ChannelNum to 2 and leave speaker separation disabled.
Type: String
Default: 0
Required: No

Node: SpeakerNumber (Parent: Request.SpeechRecognition)
Description: Number of speakers to separate (speaker separation must be enabled). Value range: 0-10.
0: Automatic separation (currently supports at most 6 speakers)
1-10: Specified number of speakers to separate
Supported only by non-ultra-fast ASR.
Type: String
Default: 0
Required: No

Node: FilterPunc (Parent: Request.SpeechRecognition)
Description: Whether to filter punctuation (currently supported by the Mandarin engine):
0: Do not filter
1: Filter sentence-ending punctuation
2: Filter all punctuation
Type: String
Default: 0
Required: No

Node: OutputFileType (Parent: Request.SpeechRecognition)
Description: Output file type. Values: txt, srt.
Ultra-fast ASR supports txt only.
Non-ultra-fast ASR with ResTextFormat set to 3 supports txt only.
Type: String
Default: txt
Required: No

Node: Format (Parent: Request.SpeechRecognition)
Description: Audio format for ultra-fast ASR. Supports wav, pcm, ogg-opus, speex, silk, mp3, m4a, and aac.
Required for ultra-fast ASR.
Type: String
Default: None
Required: No

Node: FirstChannelOnly (Parent: Request.SpeechRecognition)
Description: Whether to recognize only the first channel:
0: Recognize all channels
1: Recognize the first channel only
Ultra-fast ASR only.
Type: String
Default: 1
Required: No

Node: WordInfo (Parent: Request.SpeechRecognition)
Description: Whether to return word-level timestamps:
0: Do not return
1: Return, excluding punctuation timestamps
2: Return, including punctuation timestamps
Ultra-fast ASR only.
Type: String
Default: 0
Required: No

Node: SentenceMaxLength (Parent: Request.SpeechRecognition)
Description: Maximum number of characters per punctuation-delimited segment. Value range: [6,40].
The default value 0 disables this feature.
This parameter can be used during subtitle generation to limit the number of characters in a single subtitle line.
When FlashAsr is false, this parameter takes effect only when ResTextFormat is 3.
Type: String
Default: 0
Required: No
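Several of the parameters above constrain each other depending on whether ultra-fast ASR is enabled. As a sketch, a few of these cross-parameter rules can be encoded in a pre-flight check; this covers only constraints stated in this document, not the full server-side validation.

```python
# Sketch: pre-flight check of cross-parameter constraints documented above.
# Only a subset of the documented rules is encoded; the server remains authoritative.
def check_speech_recognition(params: dict) -> list:
    """Return a list of constraint violations for a SpeechRecognition node."""
    errors = []
    flash = params.get("FlashAsr", "false") == "true"
    if flash:
        # Format is required for ultra-fast ASR, and only txt output is supported.
        if "Format" not in params:
            errors.append("Format is required when FlashAsr is true")
        if params.get("OutputFileType", "txt") != "txt":
            errors.append("ultra-fast ASR only supports OutputFileType=txt")
    else:
        # ChannelNum is required for non-ultra-fast ASR.
        if "ChannelNum" not in params:
            errors.append("ChannelNum is required for non-ultra-fast ASR")
        # ResTextFormat 3 restricts the output type to txt.
        if params.get("ResTextFormat") == "3" and params.get("OutputFileType", "txt") != "txt":
            errors.append("OutputFileType must be txt when ResTextFormat is 3")
    return errors
```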

Response

Response Headers

This API only returns the public response header. For details, see Common Response Headers documentation.

Response Body

The response body is returned as application/xml. An example including the complete node data is shown below:
<Response>
<RequestId>NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x</RequestId>
<Template>
<TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
<Name>TemplateName</Name>
<Tag>SpeechRecognition</Tag>
<CreateTime>2020-08-05T11:35:24+0800</CreateTime>
<UpdateTime>2020-08-31T16:15:20+0800</UpdateTime>
<BucketId>test-1234567890</BucketId>
<Category>Custom</Category>
<SpeechRecognition>
<EngineModelType>16k_zh</EngineModelType>
<ChannelNum>1</ChannelNum>
<ResTextFormat>1</ResTextFormat>
<FilterDirty>0</FilterDirty>
<FilterModal>1</FilterModal>
<ConvertNumMode>0</ConvertNumMode>
<SpeakerDiarization>1</SpeakerDiarization>
<SpeakerNumber>0</SpeakerNumber>
<FilterPunc>0</FilterPunc>
<OutputFileType>txt</OutputFileType>
<FlashAsr>false</FlashAsr>
<FirstChannelOnly>0</FirstChannelOnly>
<WordInfo>0</WordInfo>
<SentenceMaxLength>0</SentenceMaxLength>
<HotVocabularyTableId/>
</SpeechRecognition>
</Template>
</Response>
The nodes are described as follows:

Node: Response (Parent: None)
Description: Container that holds the results
Type: Container

The child nodes of Response are described as follows:

Node: Template (Parent: Response)
Description: Container that stores the template details
Type: Container

Node: RequestId (Parent: Response)
Description: Unique request ID
Type: String

The child nodes of Template are described as follows:

Node: TemplateId (Parent: Response.Template)
Description: Template ID
Type: String

Node: Name (Parent: Response.Template)
Description: Template name
Type: String

Node: BucketId (Parent: Response.Template)
Description: Bucket the template belongs to
Type: String

Node: Category (Parent: Response.Template)
Description: Template property: Custom or Official
Type: String

Node: Tag (Parent: Response.Template)
Description: Template type: SpeechRecognition
Type: String

Node: UpdateTime (Parent: Response.Template)
Description: Update time
Type: String

Node: CreateTime (Parent: Response.Template)
Description: Creation time
Type: String

Node: SpeechRecognition (Parent: Response.Template)
Description: Same as the Request.SpeechRecognition node in the request body
Type: Container
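As a sketch, the response body can be parsed with Python's standard `xml.etree.ElementTree` to pull out the fields a client typically stores. The sample XML below is trimmed from the response example earlier in this document.

```python
# Sketch: extract the key fields from a CreateMediaTemplate response body.
import xml.etree.ElementTree as ET

def parse_template_response(xml_text: str) -> dict:
    """Return RequestId plus the main Template fields from the response XML."""
    root = ET.fromstring(xml_text)
    tpl = root.find("Template")
    return {
        "RequestId": root.findtext("RequestId"),
        "TemplateId": tpl.findtext("TemplateId"),
        "Name": tpl.findtext("Name"),
        "Tag": tpl.findtext("Tag"),
        "Category": tpl.findtext("Category"),
    }

# Trimmed from the response example in this document.
sample = """<Response>
  <RequestId>NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x</RequestId>
  <Template>
    <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
    <Name>TemplateName</Name>
    <Tag>SpeechRecognition</Tag>
    <BucketId>test-1234567890</BucketId>
    <Category>Custom</Category>
  </Template>
</Response>"""

info = parse_template_response(sample)
# info["TemplateId"] == "t1460606b9752148c4ab182f55163ba7cd"
```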

Error Code

This request returns common error responses and error codes. For more information, see Error Codes.

Practical Case

Request

POST /template HTTP/1.1
Authorization: q-sign-algorithm=sha1&q-ak=************************************&q-sign-time=1497530202;1497610202&q-key-time=1497530202;1497610202&q-header-list=&q-url-param-list=&q-signature=****************************************
Host: test-1234567890.ci.ap-chongqing.myqcloud.com
Content-Length: 1666
Content-Type: application/xml

<Request>
<Tag>SpeechRecognition</Tag>
<Name>TemplateName</Name>
<SpeechRecognition>
<EngineModelType>16k_zh</EngineModelType>
<ChannelNum>1</ChannelNum>
<ResTextFormat>1</ResTextFormat>
<FilterDirty>0</FilterDirty>
<FilterModal>1</FilterModal>
<ConvertNumMode>0</ConvertNumMode>
<SpeakerDiarization>1</SpeakerDiarization>
<SpeakerNumber>0</SpeakerNumber>
<FilterPunc>0</FilterPunc>
<OutputFileType>txt</OutputFileType>
<SentenceMaxLength>0</SentenceMaxLength>
</SpeechRecognition>
</Request>

Response

HTTP/1.1 200 OK
Content-Type: application/xml
Content-Length: 100
Connection: keep-alive
Date: Thu, 14 Jul 2022 12:37:29 GMT
Server: tencent-ci
x-ci-request-id: NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x

<Response>
<RequestId>NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x</RequestId>
<Template>
<TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
<Name>TemplateName</Name>
<Tag>SpeechRecognition</Tag>
<CreateTime>2020-08-05T11:35:24+0800</CreateTime>
<UpdateTime>2020-08-31T16:15:20+0800</UpdateTime>
<BucketId>test-1234567890</BucketId>
<Category>Custom</Category>
<SpeechRecognition>
<EngineModelType>16k_zh</EngineModelType>
<ChannelNum>1</ChannelNum>
<ResTextFormat>1</ResTextFormat>
<FilterDirty>0</FilterDirty>
<FilterModal>1</FilterModal>
<ConvertNumMode>0</ConvertNumMode>
<SpeakerDiarization>1</SpeakerDiarization>
<SpeakerNumber>0</SpeakerNumber>
<FilterPunc>0</FilterPunc>
<OutputFileType>txt</OutputFileType>
<FlashAsr>false</FlashAsr>
<FirstChannelOnly>0</FirstChannelOnly>
<WordInfo>0</WordInfo>
<SentenceMaxLength>0</SentenceMaxLength>
<HotVocabularyTableId/>
</SpeechRecognition>
</Template>
</Response>
