Creating Templates

Last updated: 2026-01-12 22:36:56

Feature Description

Create an Automatic Speech Recognition (ASR) template.

Authorization Description

When using a sub-account, add the ci:CreateMediaTemplate permission to the action field of the authorization policy. For all operation APIs supported by Cloud Infinite (CI), see CI action.
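As a rough illustration of the authorization step, the sketch below builds a policy document that grants ci:CreateMediaTemplate. The layout (version, statement, effect, action, resource) is an assumption based on the general CAM policy syntax, and the wildcard resource is only a placeholder; neither is taken from this page, so check the CI action reference before using it.

```python
import json


def build_policy(actions, resources=("*",)):
    """Return a policy document as a dict.

    Assumption: the version/statement/effect/action/resource layout
    follows the general CAM policy syntax; "*" is a placeholder resource.
    """
    return {
        "version": "2.0",
        "statement": [
            {
                "effect": "allow",
                "action": list(actions),
                "resource": list(resources),
            }
        ],
    }


policy = build_policy(["ci:CreateMediaTemplate"])
print(json.dumps(policy, indent=2))
```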

Service Activation

To use this feature, you need to bind a bucket in advance and enable the Cloud Infinite service.
To use this feature, you need to enable Smart Audio Service in advance via the console or API.
Note: After binding Cloud Infinite (CI), if you manually unbind the bucket, you will no longer be able to use this feature.

Use Limits

When using this API, please confirm the relevant restrictions. For details, see Usage Limits.


Request

Request sample

POST /template HTTP/1.1
Host: <BucketName-APPID>.ci.<Region>.myqcloud.com
Date: <GMT Date>
Authorization: <Auth String>
Content-Length: <length>
Content-Type: application/xml

<body>
Note:
Authorization: A request header that carries authentication information to verify the legitimacy of the request. For details, see the Request Signature document.
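The request sample above can be assembled programmatically. The following is a minimal sketch using only the Python standard library; it builds the request object but does not send it. The function name, the use of HTTPS, and the placeholder signature string are assumptions for illustration. Computing a real Authorization value is covered by the Request Signature document and is not shown here.

```python
import urllib.request
from email.utils import formatdate


def build_create_template_request(bucket, region, body, auth_string):
    """Build (but do not send) the POST /template request.

    bucket      -- bucket name in <BucketName-APPID> form
    region      -- region identifier, e.g. "ap-chongqing"
    body        -- XML request body as a str
    auth_string -- signature computed elsewhere (placeholder here)
    """
    host = f"{bucket}.ci.{region}.myqcloud.com"
    data = body.encode("utf-8")
    return urllib.request.Request(
        url=f"https://{host}/template",  # assumption: HTTPS endpoint
        data=data,
        method="POST",
        headers={
            "Host": host,
            "Date": formatdate(usegmt=True),  # GMT date, as in the sample
            "Authorization": auth_string,
            "Content-Length": str(len(data)),
            "Content-Type": "application/xml",
        },
    )


req = build_create_template_request(
    "test-1234567890", "ap-chongqing", "<Request/>", "placeholder-signature"
)
print(req.get_method(), req.full_url)
```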

Request header

This API uses only common request headers. For details, see the Common Request Headers documentation.

Request body

The request body for this operation is as follows:
<Request>
  <Tag>SpeechRecognition</Tag>
  <Name>TemplateName</Name>
  <SpeechRecognition>
    <EngineModelType>16k_zh</EngineModelType>
    <ChannelNum>1</ChannelNum>
    <ResTextFormat>1</ResTextFormat>
    <FilterDirty>0</FilterDirty>
    <FilterModal>1</FilterModal>
    <ConvertNumMode>0</ConvertNumMode>
    <SpeakerDiarization>1</SpeakerDiarization>
    <SpeakerNumber>0</SpeakerNumber>
    <FilterPunc>0</FilterPunc>
    <OutputFileType>txt</OutputFileType>
  </SpeechRecognition>
</Request>
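A body of this shape can also be generated rather than written by hand. The sketch below uses xml.etree.ElementTree from the Python standard library; the helper name build_template_body is hypothetical, and the helper does no validation of the values it is given.

```python
import xml.etree.ElementTree as ET


def build_template_body(name, speech_params):
    """Serialize a SpeechRecognition template request body.

    speech_params maps node names (e.g. "EngineModelType") to values;
    every value is written as text, since the nodes are String-typed.
    """
    root = ET.Element("Request")
    ET.SubElement(root, "Tag").text = "SpeechRecognition"
    ET.SubElement(root, "Name").text = name
    container = ET.SubElement(root, "SpeechRecognition")
    for key, value in speech_params.items():
        ET.SubElement(container, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


body = build_template_body(
    "TemplateName",
    {
        "EngineModelType": "16k_zh",
        "ChannelNum": 1,
        "ResTextFormat": 1,
        "SpeakerDiarization": 1,
        "OutputFileType": "txt",
    },
)
print(body)
```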
The request body nodes are described as follows:

| Node Name (Keyword) | Parent Node | Description | Type | Required |
| --- | --- | --- | --- | --- |
| Request | None | Container for the request | Container | Yes |
The Request container node contains the following nodes:

| Node Name (Keyword) | Parent Node | Description | Type | Required |
| --- | --- | --- | --- | --- |
| Tag | Request | Template type: SpeechRecognition | String | Yes |
| Name | Request | Template name, supporting only Chinese, English, digits, _, -, and *, with a length not exceeding 64. | String | Yes |
| SpeechRecognition | Request | Speech recognition parameters | Container | Yes |
The SpeechRecognition container node contains the following nodes:

| Node Name (Keyword) | Parent Node | Description | Type | Default Value | Required |
| --- | --- | --- | --- | --- | --- |
| FlashAsr | Request.SpeechRecognition | Whether to enable ultra-fast ASR. Valid values: true, false. | String | false | No |
| EngineModelType | Request.SpeechRecognition | Engine model type, grouped into phone call and non-phone call scenarios. Phone call scenario: 8k_zh (8k phone call Mandarin, applicable to stereo audio); 8k_zh_s (8k phone call Mandarin with speaker separation, mono audio only); 8k_en (8k phone call English). Non-phone call scenario: 16k_zh (16k Mandarin); 16k_zh_video (16k audio and video domain); 16k_en (16k English); 16k_ca (16k Cantonese); 16k_ja (16k Japanese); 16k_zh_edu (Chinese education); 16k_en_edu (English education); 16k_zh_medical (medical); 16k_th (Thai); 16k_zh_dialect (multi-dialect, supports 23 dialects). Ultra-fast ASR supports 8k_zh, 16k_zh, 16k_en, 16k_zh_video, 16k_zh_dialect, 16k_ms (Malay), and 16k_zh-PY (Chinese-English-Cantonese). | String | None | Yes |
| ChannelNum | Request.SpeechRecognition | Number of sound channels. 1: mono (non-phone call engine models support mono only). 2: stereo (supported only by the 8k_zh engine model; the two channels should correspond to the two callers). Applies to non-ultra-fast ASR only, where this parameter is required. | String | None | No |
| ResTextFormat | Request.SpeechRecognition | Format of the returned recognition result. 0: recognition result text (with segment timestamps). 1: word-level detailed result without punctuation, with speech speed value (word timestamp list, generally used for subtitle generation). 2: word-level detailed result with punctuation and speech speed value. 3: segmented by punctuation with a timestamp per segment, especially suited to subtitle generation (includes word-level time, punctuation, and speech speed value). Applies to non-ultra-fast ASR only. | String | None | No |
| FilterDirty | Request.SpeechRecognition | Whether to filter profanity (currently supported by the Mandarin engine). 0: do not filter. 1: filter. 2: replace profanity with *. | String | 0 | No |
| FilterModal | Request.SpeechRecognition | Whether to filter modal particles (currently supported by the Mandarin engine). 0: do not filter. 1: partial filtering. 2: strict filtering. | String | 0 | No |
| ConvertNumMode | Request.SpeechRecognition | Whether to intelligently convert numbers to Arabic numerals (currently supported by the Mandarin engine). 0: do not convert; output Chinese numerals directly. 1: intelligently convert to Arabic numerals based on the scenario. 3: enable math-related number conversion. Applies to non-ultra-fast ASR only. | String | 0 | No |
| SpeakerDiarization | Request.SpeechRecognition | Whether to enable speaker separation. 0: disable. 1: enable (supported only for 8k_zh, 16k_zh, and 16k_zh_video with mono audio). For 8k phone call scenarios, using two channels to distinguish the callers is recommended: set ChannelNum to 2 and leave speaker separation disabled. | String | 0 | No |
| SpeakerNumber | Request.SpeechRecognition | Number of speakers to separate (speaker separation must be enabled). Value range: 0 to 10. 0: automatic separation (currently supports up to 6 speakers). 1-10: the specified number of speakers. Applies to non-ultra-fast ASR only. | String | 0 | No |
| FilterPunc | Request.SpeechRecognition | Whether to filter punctuation (currently supported by the Mandarin engine). 0: do not filter. 1: filter sentence-ending punctuation. 2: filter all punctuation. | String | 0 | No |
| OutputFileType | Request.SpeechRecognition | Output file type: txt or srt. Ultra-fast ASR only supports txt. Non-ultra-fast ASR with ResTextFormat set to 3 only supports txt. | String | txt | No |
| Format | Request.SpeechRecognition | Audio format for ultra-fast ASR. Supported formats: wav, pcm, ogg-opus, speex, silk, mp3, m4a, aac. Required for ultra-fast ASR. | String | None | No |
| FirstChannelOnly | Request.SpeechRecognition | Whether to recognize only the first sound channel. 0: recognize all sound channels. 1: recognize the first sound channel only. Ultra-fast ASR only. | String | 1 | No |
| WordInfo | Request.SpeechRecognition | Whether to display word-level timestamps. 0: do not display. 1: display, excluding punctuation timestamps. 2: display, including punctuation timestamps. Ultra-fast ASR only. | String | 0 | No |
| SentenceMaxLength | Request.SpeechRecognition | Maximum number of characters per punctuation-delimited segment. Value range: [6,40]. The default value 0 disables this feature. Can be used in subtitle generation to limit the number of characters in a single subtitle line. When FlashAsr is false, this parameter takes effect only when ResTextFormat is 3. | String | 0 | No |
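Several of the parameter rules above interact, so a client-side check before sending can catch mistakes early. The sketch below encodes only a handful of the constraints listed in the table (required fields, the stereo restriction, and the numeric ranges). The function name and the message wording are hypothetical, the check is not exhaustive, and the service remains the authority on what it accepts.

```python
def check_speech_params(params):
    """Return a list of rule violations; an empty list means none found.

    params maps node names to string values, mirroring the String type
    used in the table. Numeric fields are assumed to hold digit strings.
    """
    problems = []
    flash = params.get("FlashAsr", "false") == "true"
    engine = params.get("EngineModelType")

    if not engine:
        problems.append("EngineModelType is required")

    if flash:
        # Ultra-fast ASR needs an audio format and only outputs txt.
        if "Format" not in params:
            problems.append("Format is required for ultra-fast ASR")
        if params.get("OutputFileType", "txt") != "txt":
            problems.append("ultra-fast ASR only supports txt output")
    else:
        if "ChannelNum" not in params:
            problems.append("ChannelNum is required for non-ultra-fast ASR")
        if params.get("ChannelNum") == "2" and engine != "8k_zh":
            problems.append("ChannelNum 2 is only supported by 8k_zh")

    if params.get("OutputFileType", "txt") not in ("txt", "srt"):
        problems.append("OutputFileType must be txt or srt")

    if not 0 <= int(params.get("SpeakerNumber", "0")) <= 10:
        problems.append("SpeakerNumber must be between 0 and 10")

    max_len = int(params.get("SentenceMaxLength", "0"))
    if max_len != 0 and not 6 <= max_len <= 40:
        problems.append("SentenceMaxLength must be 0 or within [6,40]")

    return problems


good = {"EngineModelType": "16k_zh", "ChannelNum": "1", "SpeakerNumber": "0"}
bad = {"EngineModelType": "16k_zh", "ChannelNum": "2", "SentenceMaxLength": "50"}
print(check_speech_params(good))
print(check_speech_params(bad))
```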

Response

Response Headers

This API returns only common response headers. For details, see the Common Response Headers documentation.

Response Body

The response body is returned as application/xml. An example including the complete node data is shown below:
<Response>
  <RequestId>NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x</RequestId>
  <Template>
    <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
    <Name>TemplateName</Name>
    <Tag>SpeechRecognition</Tag>
    <CreateTime>2020-08-05T11:35:24+0800</CreateTime>
    <UpdateTime>2020-08-31T16:15:20+0800</UpdateTime>
    <BucketId>test-1234567890</BucketId>
    <Category>Custom</Category>
    <SpeechRecognition>
      <EngineModelType>16k_zh</EngineModelType>
      <ChannelNum>1</ChannelNum>
      <ResTextFormat>1</ResTextFormat>
      <FilterDirty>0</FilterDirty>
      <FilterModal>1</FilterModal>
      <ConvertNumMode>0</ConvertNumMode>
      <SpeakerDiarization>1</SpeakerDiarization>
      <SpeakerNumber>0</SpeakerNumber>
      <FilterPunc>0</FilterPunc>
      <OutputFileType>txt</OutputFileType>
      <FlashAsr>false</FlashAsr>
      <FirstChannelOnly>0</FirstChannelOnly>
      <WordInfo>0</WordInfo>
      <SentenceMaxLength>0</SentenceMaxLength>
      <HotVocabularyTableId/>
    </SpeechRecognition>
  </Template>
</Response>
The response body nodes are described as follows:

| Node Name (Keyword) | Parent Node | Description | Type |
| --- | --- | --- | --- |
| Response | None | Container for the result | Container |
The Response container node contains the following nodes:

| Node Name (Keyword) | Parent Node | Description | Type |
| --- | --- | --- | --- |
| Template | Response | Container for the template details | Container |
| RequestId | Response | Unique request ID | String |
The Template container node contains the following nodes:

| Node Name (Keyword) | Parent Node | Description | Type |
| --- | --- | --- | --- |
| TemplateId | Response.Template | Template ID | String |
| Name | Response.Template | Template name | String |
| BucketId | Response.Template | Bucket that the template belongs to | String |
| Category | Response.Template | Template category: Custom or Official | String |
| Tag | Response.Template | Template type: SpeechRecognition | String |
| UpdateTime | Response.Template | Update time | String |
| CreateTime | Response.Template | Creation time | String |
| SpeechRecognition | Response.Template | Same as Request.SpeechRecognition in the request body | Container |
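Callers typically need the TemplateId from this response for later jobs. The following sketch parses the response body with xml.etree.ElementTree from the Python standard library; the helper name and the shape of the returned dict are choices made for this example, and the sample XML is abbreviated from the one shown above.

```python
import xml.etree.ElementTree as ET


def parse_template_response(xml_text):
    """Extract the main fields of a create-template response."""
    root = ET.fromstring(xml_text)
    template = root.find("Template")
    speech = template.find("SpeechRecognition")
    return {
        "RequestId": root.findtext("RequestId"),
        "TemplateId": template.findtext("TemplateId"),
        "Name": template.findtext("Name"),
        "Category": template.findtext("Category"),
        # Empty nodes such as <HotVocabularyTableId/> map to "".
        "SpeechRecognition": {c.tag: (c.text or "") for c in speech},
    }


sample = """
<Response>
  <RequestId>NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x</RequestId>
  <Template>
    <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
    <Name>TemplateName</Name>
    <Category>Custom</Category>
    <SpeechRecognition>
      <EngineModelType>16k_zh</EngineModelType>
      <HotVocabularyTableId/>
    </SpeechRecognition>
  </Template>
</Response>
"""

result = parse_template_response(sample)
print(result["TemplateId"])
```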

Error Code

This request returns common error responses and error codes. For more information, see Error Codes.

Practical Case

Request

POST /template HTTP/1.1
Authorization: q-sign-algorithm=sha1&q-ak=************************************&q-sign-time=1497530202;1497610202&q-key-time=1497530202;1497610202&q-header-list=&q-url-param-list=&q-signature=****************************************
Host: test-1234567890.ci.ap-chongqing.myqcloud.com
Content-Length: 1666
Content-Type: application/xml

<Request>
  <Tag>SpeechRecognition</Tag>
  <Name>TemplateName</Name>
  <SpeechRecognition>
    <EngineModelType>16k_zh</EngineModelType>
    <ChannelNum>1</ChannelNum>
    <ResTextFormat>1</ResTextFormat>
    <FilterDirty>0</FilterDirty>
    <FilterModal>1</FilterModal>
    <ConvertNumMode>0</ConvertNumMode>
    <SpeakerDiarization>1</SpeakerDiarization>
    <SpeakerNumber>0</SpeakerNumber>
    <FilterPunc>0</FilterPunc>
    <OutputFileType>txt</OutputFileType>
    <SentenceMaxLength>0</SentenceMaxLength>
  </SpeechRecognition>
</Request>

Response

HTTP/1.1 200 OK
Content-Type: application/xml
Content-Length: 100
Connection: keep-alive
Date: Thu, 14 Jul 2022 12:37:29 GMT
Server: tencent-ci
x-ci-request-id: NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x

<Response>
  <RequestId>NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x</RequestId>
  <Template>
    <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
    <Name>TemplateName</Name>
    <Tag>SpeechRecognition</Tag>
    <CreateTime>2020-08-05T11:35:24+0800</CreateTime>
    <UpdateTime>2020-08-31T16:15:20+0800</UpdateTime>
    <BucketId>test-1234567890</BucketId>
    <Category>Custom</Category>
    <SpeechRecognition>
      <EngineModelType>16k_zh</EngineModelType>
      <ChannelNum>1</ChannelNum>
      <ResTextFormat>1</ResTextFormat>
      <FilterDirty>0</FilterDirty>
      <FilterModal>1</FilterModal>
      <ConvertNumMode>0</ConvertNumMode>
      <SpeakerDiarization>1</SpeakerDiarization>
      <SpeakerNumber>0</SpeakerNumber>
      <FilterPunc>0</FilterPunc>
      <OutputFileType>txt</OutputFileType>
      <FlashAsr>false</FlashAsr>
      <FirstChannelOnly>0</FirstChannelOnly>
      <WordInfo>0</WordInfo>
      <SentenceMaxLength>0</SentenceMaxLength>
      <HotVocabularyTableId/>
    </SpeechRecognition>
  </Template>
</Response>
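In this case the stored template echoes the submitted settings and also carries nodes that were not in the request, such as FlashAsr and WordInfo. A small comparison like the sketch below can confirm that nothing submitted came back changed and list what the service added. The helper names are hypothetical, and the demo strings are shortened stand-ins for the full request and response bodies above.

```python
import xml.etree.ElementTree as ET


def speech_settings(xml_text, path):
    """Return the child nodes found at `path` as a {tag: text} dict."""
    node = ET.fromstring(xml_text).find(path)
    return {child.tag: (child.text or "") for child in node}


def diff_settings(request_xml, response_xml):
    """Compare submitted settings with the settings stored in the template.

    Returns (changed, added): `changed` maps a node name to its
    (sent, stored) pair when they differ; `added` lists node names that
    appear only in the response.
    """
    sent = speech_settings(request_xml, "SpeechRecognition")
    stored = speech_settings(response_xml, "Template/SpeechRecognition")
    changed = {k: (v, stored.get(k)) for k, v in sent.items() if stored.get(k) != v}
    added = sorted(set(stored) - set(sent))
    return changed, added


demo_request = (
    "<Request><SpeechRecognition>"
    "<ChannelNum>1</ChannelNum><OutputFileType>txt</OutputFileType>"
    "</SpeechRecognition></Request>"
)
demo_response = (
    "<Response><Template><SpeechRecognition>"
    "<ChannelNum>1</ChannelNum><OutputFileType>txt</OutputFileType>"
    "<FlashAsr>false</FlashAsr><WordInfo>0</WordInfo>"
    "</SpeechRecognition></Template></Response>"
)

changed, added = diff_settings(demo_request, demo_response)
print(changed, added)
```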

