Creating Templates

Last updated: 2026-01-12 22:36:56

Feature Description
Create an Automatic Speech Recognition (ASR) template.
Authorization Description
When calling this API from a sub-account, add the ci:CreateMediaTemplate action to the authorization policy. For all actions supported by Cloud Infinite (CI), see CI action.
Service Activation
To use this feature, you need to bind a bucket in advance and enable the Cloud Infinite service.
To use this feature, you also need to enable the Smart Audio Service in advance via the console or API.
Note:
After binding Cloud Infinite (CI), if you manually unbind the bucket, you will no longer be able to use this feature.
Use Limits
When using this API, please confirm the relevant restrictions. For details, see Usage Limits.
Request
Request sample
POST /template HTTP/1.1
Host: <BucketName-APPID>.ci.<Region>.myqcloud.com
Date: <GMT Date>
Authorization: <Auth String>
Content-Length: <length>
Content-Type: application/xml

<body>
Note:
Authorization: A request header that carries authentication information to verify the legitimacy of the request. For details, see the Request Signature document.
Request header
This API only uses common request headers. For details, see Common Request Headers documentation.
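As an illustration of how the headers in the request sample fit together, the following is a minimal Python sketch that assembles (but does not send) the POST /template request. The function name build_create_template_request, the https scheme, and the placeholder authorization string are assumptions made for this example; computing a real signature is covered by the Request Signature document and is not shown here.

```python
import urllib.request
from email.utils import formatdate


def build_create_template_request(bucket: str, region: str,
                                  body: bytes, auth: str) -> urllib.request.Request:
    """Assemble the POST /template request without sending it.

    bucket is the <BucketName-APPID> string, region the <Region> string,
    and auth an already-computed signature (hypothetical placeholder here).
    """
    host = f"{bucket}.ci.{region}.myqcloud.com"
    headers = {
        "Host": host,
        "Date": formatdate(usegmt=True),       # GMT date, e.g. "Thu, 14 Jul 2022 12:37:29 GMT"
        "Authorization": auth,
        "Content-Type": "application/xml",
        "Content-Length": str(len(body)),
    }
    # Assumption: the endpoint is reached over HTTPS.
    return urllib.request.Request(
        f"https://{host}/template", data=body, headers=headers, method="POST"
    )


req = build_create_template_request(
    "test-1234567890", "ap-chongqing", b"<Request></Request>", "<Auth String>"
)
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; with the placeholder signature used above, the service would be expected to reject the call.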
Request body
The request body required for this operation is shown below.
<Request>
    <Tag>SpeechRecognition</Tag>
    <Name>TemplateName</Name>
    <SpeechRecognition>
        <EngineModelType>16k_zh</EngineModelType>
        <ChannelNum>1</ChannelNum>
        <ResTextFormat>1</ResTextFormat>
        <FilterDirty>0</FilterDirty>
        <FilterModal>1</FilterModal>
        <ConvertNumMode>0</ConvertNumMode>
        <SpeakerDiarization>1</SpeakerDiarization>
        <SpeakerNumber>0</SpeakerNumber>
        <FilterPunc>0</FilterPunc>
        <OutputFileType>txt</OutputFileType>
    </SpeechRecognition>
</Request>
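The request body above can also be generated programmatically. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name build_template_body is hypothetical, and the parameter values are the ones from the sample.

```python
import xml.etree.ElementTree as ET


def build_template_body(name: str, params: dict) -> bytes:
    """Serialize a SpeechRecognition template request body as XML bytes."""
    root = ET.Element("Request")
    ET.SubElement(root, "Tag").text = "SpeechRecognition"
    ET.SubElement(root, "Name").text = name
    container = ET.SubElement(root, "SpeechRecognition")
    # Each parameter becomes one child node; all values are sent as strings.
    for key, value in params.items():
        ET.SubElement(container, key).text = str(value)
    return ET.tostring(root, encoding="utf-8")


body = build_template_body(
    "TemplateName",
    {
        "EngineModelType": "16k_zh",
        "ChannelNum": 1,
        "ResTextFormat": 1,
        "FilterDirty": 0,
        "FilterModal": 1,
        "ConvertNumMode": 0,
        "SpeakerDiarization": 1,
        "SpeakerNumber": 0,
        "FilterPunc": 0,
        "OutputFileType": "txt",
    },
)
print(body.decode("utf-8"))
```

Building the tree with ElementTree rather than string concatenation means special characters in the template name are escaped automatically.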
The nodes of the request body are described as follows:
Node Name (Keyword)	Parent Node	Description	Type	Required
Request	None	Container for the request	Container	Yes
The Request container has the following child nodes:
Node Name (Keyword)	Parent Node	Description	Type	Required
Tag	Request	Template type: SpeechRecognition	String	Yes
Name	Request	Template name. Only Chinese characters, English letters, digits, _, -, and * are supported; the length cannot exceed 64 characters.	String	Yes
SpeechRecognition	Request	Speech recognition parameters	Container	Yes
The SpeechRecognition container has the following child nodes:
Node Name (Keyword)	Parent Node	Description	Type	Default Value	Required
FlashAsr	Request.SpeechRecognition	Whether to enable ultra-fast ASR. Valid values: true, false.	String	false	No
EngineModelType	Request.SpeechRecognition	Engine model type, divided into phone call and non-phone call scenarios. Phone call scenario: 8k_zh (8k phone call Mandarin, applicable to stereo audio); 8k_zh_s (8k phone call Mandarin with speaker separation, applicable only to mono audio); 8k_en (8k phone call English). Non-phone call scenario: 16k_zh (16k Mandarin); 16k_zh_video (16k audio and video domain); 16k_en (16k English); 16k_ca (16k Cantonese); 16k_ja (16k Japanese); 16k_zh_edu (Chinese education); 16k_en_edu (English education); 16k_zh_medical (medical); 16k_th (Thai); 16k_zh_dialect (multi-dialect, supports 23 dialects). Ultra-fast ASR supports 8k_zh, 16k_zh, 16k_en, 16k_zh_video, 16k_zh_dialect, 16k_ms (Malay), and 16k_zh-PY (Chinese-English-Cantonese).	String	None	Yes
ChannelNum	Request.SpeechRecognition	Number of sound channels. 1: mono (engine models for non-phone call scenarios support mono only). 2: stereo (supported only for the 8k_zh engine model; the two channels should correspond to the two parties of the call). Applies only to non-ultra-fast ASR, for which this parameter is required.	String	None	No
ResTextFormat	Request.SpeechRecognition	Format of the returned recognition result. 0: recognition result text (with segment timestamps). 1: word-level detailed recognition result without punctuation, with speech speed value (word timestamp list, generally used for subtitle generation). 2: word-level detailed recognition result (with punctuation and speech speed value). 3: segmentation by punctuation with a timestamp per segment, especially suitable for subtitle scenarios (includes word-level time, punctuation, and speech speed value). Applies only to non-ultra-fast ASR.	String	None	No
FilterDirty	Request.SpeechRecognition	Whether to filter profanity (currently supported by the Mandarin engine). 0: do not filter profanity. 1: filter profanity. 2: replace profanity with *.	String	0	No
FilterModal	Request.SpeechRecognition	Whether to filter modal particles (currently supported by the Mandarin engine). 0: do not filter modal particles. 1: partial filtering. 2: strict filtering.	String	0	No
ConvertNumMode	Request.SpeechRecognition	Whether to intelligently convert numbers to Arabic numerals (currently supported by the Mandarin engine). 0: do not convert; output Chinese numerals directly. 1: intelligently convert to Arabic numerals based on the scenario. 3: enable math-related number conversion. Applies only to non-ultra-fast ASR.	String	0	No
SpeakerDiarization	Request.SpeechRecognition	Whether to enable speaker separation. 0: disabled. 1: enabled (supported only for 8k_zh, 16k_zh, and 16k_zh_video with mono audio). For 8k phone call scenarios, it is recommended to use two channels to distinguish the two parties: set ChannelNum=2 and leave speaker separation disabled.	String	0	No
SpeakerNumber	Request.SpeechRecognition	Number of speakers to separate (speaker separation must be enabled). Value range: 0 to 10. 0: automatic separation (currently supports up to 6 speakers). 1-10: the specified number of speakers to separate. Applies only to non-ultra-fast ASR.	String	0	No
FilterPunc	Request.SpeechRecognition	Whether to filter punctuation (currently supported by the Mandarin engine). 0: do not filter. 1: filter out sentence-ending punctuation. 2: filter out all punctuation.	String	0	No
OutputFileType	Request.SpeechRecognition	Output file type. Valid values: txt, srt. Ultra-fast ASR supports only txt. Non-ultra-fast ASR with ResTextFormat set to 3 supports only txt.	String	txt	No
Format	Request.SpeechRecognition	Audio format for ultra-fast ASR. Supported values: wav, pcm, ogg-opus, speex, silk, mp3, m4a, aac. This parameter is required for ultra-fast ASR.	String	None	No
FirstChannelOnly	Request.SpeechRecognition	Whether to recognize only the first sound channel. 0: recognize all sound channels. 1: recognize only the first sound channel. Applies only to ultra-fast ASR.	String	1	No
WordInfo	Request.SpeechRecognition	Whether to display word-level timestamps. 0: do not display. 1: display, excluding punctuation timestamps. 2: display, including punctuation timestamps. Applies only to ultra-fast ASR.	String	0	No
SentenceMaxLength	Request.SpeechRecognition	Maximum number of characters per punctuation-delimited segment. Value range: [6,40]. The default value 0 disables this feature. This parameter can be used in subtitle generation to limit the number of characters in a single-line subtitle. When FlashAsr is false, this parameter takes effect only when ResTextFormat is 3.	String	0	No
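Several of the parameters above constrain one another. The following Python sketch shows how a client might check a few of these rules before sending the request. The function name check_speech_recognition and the wording of the messages are hypothetical, only a subset of the documented rules is covered, and the service remains the authority on what it accepts.

```python
def check_speech_recognition(params: dict) -> list:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    flash = str(params.get("FlashAsr", "false")).lower() == "true"

    if "EngineModelType" not in params:
        problems.append("EngineModelType is required")

    if flash:
        # Ultra-fast ASR needs the audio format and can only write txt output.
        if "Format" not in params:
            problems.append("Format is required for ultra-fast ASR")
        if params.get("OutputFileType", "txt") != "txt":
            problems.append("ultra-fast ASR only supports OutputFileType=txt")
    else:
        if "ChannelNum" not in params:
            problems.append("ChannelNum is required for non-ultra-fast ASR")
        elif str(params["ChannelNum"]) == "2" and params.get("EngineModelType") != "8k_zh":
            problems.append("ChannelNum=2 is only supported for 8k_zh")

    speakers = int(params.get("SpeakerNumber", 0))
    if not 0 <= speakers <= 10:
        problems.append("SpeakerNumber must be between 0 and 10")

    max_len = int(params.get("SentenceMaxLength", 0))
    if max_len != 0 and not 6 <= max_len <= 40:
        problems.append("SentenceMaxLength must be 0 or within [6,40]")

    return problems


print(check_speech_recognition({"EngineModelType": "16k_zh", "ChannelNum": "2"}))
```

Returning a list of messages instead of raising on the first failure lets a caller report every problem at once.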
Response
Response Headers
This API only returns common response headers. For details, see the Common Response Headers documentation.
Response Body
The response body is returned as application/xml. An example including the complete node data is shown below:
<Response>
    <RequestId>NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x</RequestId>
    <Template>
        <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
        <Name>TemplateName</Name>
        <Tag>SpeechRecognition</Tag>
        <CreateTime>2020-08-05T11:35:24+0800</CreateTime>
        <UpdateTime>2020-08-31T16:15:20+0800</UpdateTime>
        <BucketId>test-1234567890</BucketId>
        <Category>Custom</Category>
        <SpeechRecognition>
            <EngineModelType>16k_zh</EngineModelType>
            <ChannelNum>1</ChannelNum>
            <ResTextFormat>1</ResTextFormat>
            <FilterDirty>0</FilterDirty>
            <FilterModal>1</FilterModal>
            <ConvertNumMode>0</ConvertNumMode>
            <SpeakerDiarization>1</SpeakerDiarization>
            <SpeakerNumber>0</SpeakerNumber>
            <FilterPunc>0</FilterPunc>
            <OutputFileType>txt</OutputFileType>
            <FlashAsr>false</FlashAsr>
            <FirstChannelOnly>0</FirstChannelOnly>
            <WordInfo>0</WordInfo>
            <SentenceMaxLength>0</SentenceMaxLength>
            <HotVocabularyTableId/>
        </SpeechRecognition>
    </Template>
</Response>
The nodes of the response body are described as follows:
Node Name (Keyword)	Parent Node	Description	Type
Response	None	Container for the result	Container
The Response container has the following child nodes:
Node Name (Keyword)	Parent Node	Description	Type
Template	Response	Container for the template details	Container
RequestId	Response	Unique request ID	String
The Template container has the following child nodes:
Node Name (Keyword)	Parent Node	Description	Type
TemplateId	Response.Template	Template ID	String
Name	Response.Template	Template name	String
BucketId	Response.Template	Bucket to which the template belongs	String
Category	Response.Template	Template category: Custom or Official	String
Tag	Response.Template	Template type: SpeechRecognition	String
UpdateTime	Response.Template	Update time	String
CreateTime	Response.Template	Creation time	String
SpeechRecognition	Response.Template	Same as Request.SpeechRecognition in the request body	Container
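To illustrate how the response nodes described above might be consumed, the following Python sketch extracts the template ID and the echoed settings with xml.etree.ElementTree. The function name parse_template_response is hypothetical, and SAMPLE_RESPONSE is an abbreviated copy of the sample response used only for this example.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample response body, for illustration only.
SAMPLE_RESPONSE = """<Response>
    <RequestId>NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x</RequestId>
    <Template>
        <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
        <Name>TemplateName</Name>
        <Tag>SpeechRecognition</Tag>
        <Category>Custom</Category>
        <SpeechRecognition>
            <EngineModelType>16k_zh</EngineModelType>
            <FlashAsr>false</FlashAsr>
        </SpeechRecognition>
    </Template>
</Response>"""


def parse_template_response(xml_text: str) -> dict:
    """Pull the commonly needed fields out of a template response body."""
    root = ET.fromstring(xml_text)
    template = root.find("Template")
    if template is None:
        raise ValueError("response has no Template node")
    settings = template.find("SpeechRecognition")
    return {
        "request_id": root.findtext("RequestId"),
        "template_id": template.findtext("TemplateId"),
        "name": template.findtext("Name"),
        "category": template.findtext("Category"),
        # Flatten the SpeechRecognition container into a plain dict of strings.
        "settings": {c.tag: (c.text or "") for c in settings} if settings is not None else {},
    }


info = parse_template_response(SAMPLE_RESPONSE)
print(info["template_id"])
```

The explicit "is not None" checks matter because an ElementTree element without children is falsy.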
Error Code
This request returns common error responses and error codes. For more information, see Error Codes.
Practical Case
Request
POST /template HTTP/1.1
Authorization: q-sign-algorithm=sha1&q-ak=************************************&q-sign-time=1497530202;1497610202&q-key-time=1497530202;1497610202&q-header-list=&q-url-param-list=&q-signature=****************************************
Host: test-1234567890.ci.ap-chongqing.myqcloud.com
Content-Length: 1666
Content-Type: application/xml

<Request>
    <Tag>SpeechRecognition</Tag>
    <Name>TemplateName</Name>
    <SpeechRecognition>
        <EngineModelType>16k_zh</EngineModelType>
        <ChannelNum>1</ChannelNum>
        <ResTextFormat>1</ResTextFormat>
        <FilterDirty>0</FilterDirty>
        <FilterModal>1</FilterModal>
        <ConvertNumMode>0</ConvertNumMode>
        <SpeakerDiarization>1</SpeakerDiarization>
        <SpeakerNumber>0</SpeakerNumber>
        <FilterPunc>0</FilterPunc>
        <OutputFileType>txt</OutputFileType>
        <SentenceMaxLength>0</SentenceMaxLength>
    </SpeechRecognition>
</Request>
Response
HTTP/1.1 200 OK
Content-Type: application/xml
Content-Length: 100
Connection: keep-alive
Date: Thu, 14 Jul 2022 12:37:29 GMT
Server: tencent-ci
x-ci-request-id: NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x

<Response>
    <RequestId>NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x</RequestId>
    <Template>
        <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
        <Name>TemplateName</Name>
        <Tag>SpeechRecognition</Tag>
        <CreateTime>2020-08-05T11:35:24+0800</CreateTime>
        <UpdateTime>2020-08-31T16:15:20+0800</UpdateTime>
        <BucketId>test-1234567890</BucketId>
        <Category>Custom</Category>
        <SpeechRecognition>
            <EngineModelType>16k_zh</EngineModelType>
            <ChannelNum>1</ChannelNum>
            <ResTextFormat>1</ResTextFormat>
            <FilterDirty>0</FilterDirty>
            <FilterModal>1</FilterModal>
            <ConvertNumMode>0</ConvertNumMode>
            <SpeakerDiarization>1</SpeakerDiarization>
            <SpeakerNumber>0</SpeakerNumber>
            <FilterPunc>0</FilterPunc>
            <OutputFileType>txt</OutputFileType>
            <FlashAsr>false</FlashAsr>
            <FirstChannelOnly>0</FirstChannelOnly>
            <WordInfo>0</WordInfo>
            <SentenceMaxLength>0</SentenceMaxLength>
            <HotVocabularyTableId/>
        </SpeechRecognition>
    </Template>
</Response>
