ci:CreateMediaTemplate permission to the action in the authorization policy. For all operation APIs supported by Cloud Infinite (CI), see CI action.
POST /template HTTP/1.1
Host: <BucketName-APPID>.ci.<Region>.myqcloud.com
Date: <GMT Date>
Authorization: <Auth String>
Content-Length: <length>
Content-Type: application/xml

<body>
<Request>
  <Tag>SpeechRecognition</Tag>
  <Name>TemplateName</Name>
  <SpeechRecognition>
    <EngineModelType>16k_zh</EngineModelType>
    <ChannelNum>1</ChannelNum>
    <ResTextFormat>1</ResTextFormat>
    <FilterDirty>0</FilterDirty>
    <FilterModal>1</FilterModal>
    <ConvertNumMode>0</ConvertNumMode>
    <SpeakerDiarization>1</SpeakerDiarization>
    <SpeakerNumber>0</SpeakerNumber>
    <FilterPunc>0</FilterPunc>
    <OutputFileType>txt</OutputFileType>
  </SpeechRecognition>
</Request>
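The request body above can also be generated programmatically instead of being written by hand. The following is a minimal sketch using only the Python standard library; the helper name build_template_request is hypothetical and not part of any CI SDK, and the element names are taken from the sample body.

```python
import xml.etree.ElementTree as ET


def build_template_request(name: str, params: dict[str, str]) -> bytes:
    """Serialize a SpeechRecognition template request body as UTF-8 XML.

    Hypothetical helper for illustration; not part of an official SDK.
    """
    root = ET.Element("Request")
    ET.SubElement(root, "Tag").text = "SpeechRecognition"
    ET.SubElement(root, "Name").text = name
    container = ET.SubElement(root, "SpeechRecognition")
    # Each key becomes a child node of Request.SpeechRecognition.
    for key, value in params.items():
        ET.SubElement(container, key).text = str(value)
    return ET.tostring(root, encoding="unicode").encode("utf-8")


# Same values as the sample request body.
body = build_template_request(
    "TemplateName",
    {
        "EngineModelType": "16k_zh",
        "ChannelNum": "1",
        "ResTextFormat": "1",
        "FilterDirty": "0",
        "FilterModal": "1",
        "ConvertNumMode": "0",
        "SpeakerDiarization": "1",
        "SpeakerNumber": "0",
        "FilterPunc": "0",
        "OutputFileType": "txt",
    },
)
print(body.decode("utf-8"))
```

All values are passed as strings because every node in the parameter tables is typed as String.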
Node Name (Keyword) | Parent Node | Description | Type | Required or Not |
Request | None. | Container for saving requests | Container | Yes |
Node Name (Keyword) | Parent Node | Description | Type | Required or Not |
Tag | Request | Template type: SpeechRecognition | String | Yes |
Name | Request | Template name, supporting only Chinese, English, digits, _, -, and *, with a length not exceeding 64. | String | Yes |
SpeechRecognition | Request | Speech recognition parameters | Container | Yes |
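The Name constraint in the table above can be checked on the client before sending the request. This is a rough sketch under two assumptions that the table does not state: "Chinese" is approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF), and the 64 limit is counted in characters rather than bytes. The function name is_valid_template_name is hypothetical.

```python
import re

# Assumption: "Chinese" means the basic CJK Unified Ideographs block,
# and the length limit of 64 is measured in characters.
NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_*-]{1,64}")


def is_valid_template_name(name: str) -> bool:
    """Return True if the name uses only the allowed characters and length."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ("TemplateName", "asr-模板_01*", "bad name", "x" * 65):
    print(repr(candidate[:20]), is_valid_template_name(candidate))
```

The service remains the authority on what it accepts; a check like this only catches obvious mistakes early.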
Node Name (Keyword) | Parent Node | Description | Type | Default Value | Required or Not |
FlashAsr | Request.SpeechRecognition | Whether to enable ultra-fast ASR. Valid values: true, false | String | false | No |
EngineModelType | Request.SpeechRecognition | Engine model type, grouped into phone call and non-phone call scenarios. Phone call scenarios: 8k_zh: 8k phone call Mandarin (applicable to stereo audio); 8k_zh_s: 8k phone call Mandarin with speaker separation (mono audio only); 8k_en: 8k phone call English. Non-phone call scenarios: 16k_zh: 16k Mandarin; 16k_zh_video: 16k audio and video domain; 16k_en: 16k English; 16k_ca: 16k Cantonese; 16k_ja: 16k Japanese; 16k_zh_edu: Chinese education; 16k_en_edu: English education; 16k_zh_medical: medical; 16k_th: Thai; 16k_zh_dialect: multi-dialect, supports 23 dialects. Ultra-fast ASR supports 8k_zh, 16k_zh, 16k_en, 16k_zh_video, 16k_zh_dialect, 16k_ms (Malay), and 16k_zh-PY (Chinese-English-Cantonese) | String | None. | Yes |
ChannelNum | Request.SpeechRecognition | Number of sound channels. 1: mono (non-phone call engine models support mono only); 2: stereo (supported only by the 8k_zh engine model; the two channels should correspond to the two parties of the call). Applies to non-ultra-fast ASR only, for which it is required | String | None. | No |
ResTextFormat | Request.SpeechRecognition | Format of the returned recognition result. 0: recognition result text with segment timestamps; 1: word-level detailed result without punctuation, with speech speed value (word timestamp list, generally used for subtitle generation); 2: word-level detailed result with punctuation and speech speed value; 3: segmented by punctuation with a timestamp per segment, especially suited to subtitle generation (includes word-level time, punctuation, and speech speed value). Non-ultra-fast ASR only | String | None. | No |
FilterDirty | Request.SpeechRecognition | Whether to filter profanity (currently supported by the Mandarin engine). 0: do not filter profanity; 1: filter profanity; 2: replace profanity with * | String | 0 | No |
FilterModal | Request.SpeechRecognition | Whether to filter modal particles (currently supported by the Mandarin engine). 0: do not filter modal particles; 1: partial filtering; 2: strict filtering | String | 0 | No |
ConvertNumMode | Request.SpeechRecognition | Whether to intelligently convert numbers to Arabic numerals (currently supported by the Mandarin engine). 0: do not convert, output Chinese numerals directly; 1: convert to Arabic numerals based on the scenario; 3: enable math-related number conversion. Non-ultra-fast ASR only | String | 0 | No |
SpeakerDiarization | Request.SpeechRecognition | Whether to enable speaker separation. 0: disable; 1: enable (supported only for 8k_zh, 16k_zh, and 16k_zh_video with mono audio). For 8k phone call scenarios, dual-channel audio is recommended for distinguishing the two parties: set ChannelNum to 2 and leave speaker separation disabled | String | 0 | No |
SpeakerNumber | Request.SpeechRecognition | Number of speakers to separate (speaker separation must be enabled). Value range: 0 to 10. 0: automatic separation (currently supports up to 6 speakers); 1 to 10: the specified number of speakers. Non-ultra-fast ASR only | String | 0 | No |
FilterPunc | Request.SpeechRecognition | Whether to filter punctuation (currently supported by the Mandarin engine). 0: do not filter; 1: filter sentence-ending punctuation; 2: filter all punctuation | String | 0 | No |
OutputFileType | Request.SpeechRecognition | Output file type. Valid values: txt, srt. Ultra-fast ASR supports txt only. Non-ultra-fast ASR with ResTextFormat set to 3 supports txt only | String | txt | No |
Format | Request.SpeechRecognition | Audio format for ultra-fast ASR. Supported formats: wav, pcm, ogg-opus, speex, silk, mp3, m4a, aac. Required for ultra-fast ASR | String | None. | No |
FirstChannelOnly | Request.SpeechRecognition | Whether to recognize only the first sound channel. 0: recognize all sound channels; 1: recognize only the first sound channel. Ultra-fast ASR only | String | 1 | No |
WordInfo | Request.SpeechRecognition | Whether to display word-level timestamps. 0: do not display; 1: display, excluding punctuation timestamps; 2: display, including punctuation timestamps. Ultra-fast ASR only | String | 0 | No |
SentenceMaxLength | Request.SpeechRecognition | Maximum number of characters per punctuation-delimited segment. Value range: [6,40]. The default value 0 disables this feature. Useful in subtitle generation to cap the number of characters in a single subtitle line. When FlashAsr is false, this parameter takes effect only when ResTextFormat is 3 | String | 0 | No |
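Several rows in the table above depend on each other (for example, OutputFileType is restricted by FlashAsr and ResTextFormat). The sketch below encodes a subset of those rules as a client-side pre-check. The function name check_speech_params is hypothetical, the rules are read from the table descriptions only, and the service may enforce additional or different constraints.

```python
def check_speech_params(params: dict[str, str]) -> list[str]:
    """Return a list of rule violations for Request.SpeechRecognition values.

    Hypothetical pre-check; numeric fields are assumed to hold digit strings
    (int() raises ValueError otherwise).
    """
    problems = []
    flash = params.get("FlashAsr", "false") == "true"
    engine = params.get("EngineModelType")
    output_type = params.get("OutputFileType", "txt")

    if not engine:
        problems.append("EngineModelType is required")

    if flash:
        if "Format" not in params:
            problems.append("Format is required for ultra-fast ASR")
        if output_type != "txt":
            problems.append("ultra-fast ASR supports OutputFileType=txt only")
    else:
        channel = params.get("ChannelNum")
        if channel is None:
            problems.append("ChannelNum is required for non-ultra-fast ASR")
        elif channel == "2" and engine != "8k_zh":
            problems.append("ChannelNum=2 is supported only by 8k_zh")
        if params.get("ResTextFormat") == "3" and output_type != "txt":
            problems.append("ResTextFormat=3 supports OutputFileType=txt only")

    if params.get("SpeakerDiarization") == "1" and engine not in (
        "8k_zh", "16k_zh", "16k_zh_video"
    ):
        problems.append("speaker separation is not supported by this engine")

    if not 0 <= int(params.get("SpeakerNumber", "0")) <= 10:
        problems.append("SpeakerNumber must be between 0 and 10")

    max_len = int(params.get("SentenceMaxLength", "0"))
    if max_len != 0 and not 6 <= max_len <= 40:
        problems.append("SentenceMaxLength must be 0 or within [6,40]")

    return problems


sample = {"EngineModelType": "16k_zh", "ChannelNum": "1", "ResTextFormat": "1",
          "SpeakerDiarization": "1", "SpeakerNumber": "0", "OutputFileType": "txt"}
print(check_speech_params(sample))
```

Returning a list of messages rather than raising on the first violation lets a caller report every problem in one pass.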
<Response>
  <RequestId>NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x</RequestId>
  <Template>
    <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
    <Name>TemplateName</Name>
    <Tag>SpeechRecognition</Tag>
    <CreateTime>2020-08-05T11:35:24+0800</CreateTime>
    <UpdateTime>2020-08-31T16:15:20+0800</UpdateTime>
    <BucketId>test-1234567890</BucketId>
    <Category>Custom</Category>
    <SpeechRecognition>
      <EngineModelType>16k_zh</EngineModelType>
      <ChannelNum>1</ChannelNum>
      <ResTextFormat>1</ResTextFormat>
      <FilterDirty>0</FilterDirty>
      <FilterModal>1</FilterModal>
      <ConvertNumMode>0</ConvertNumMode>
      <SpeakerDiarization>1</SpeakerDiarization>
      <SpeakerNumber>0</SpeakerNumber>
      <FilterPunc>0</FilterPunc>
      <OutputFileType>txt</OutputFileType>
      <FlashAsr>false</FlashAsr>
      <FirstChannelOnly>0</FirstChannelOnly>
      <WordInfo>0</WordInfo>
      <SentenceMaxLength>0</SentenceMaxLength>
      <HotVocabularyTableId/>
    </SpeechRecognition>
  </Template>
</Response>
Node Name (Keyword) | Parent Node | Description | Type |
Response | None. | Container for saving results | Container |
Node Name (Keyword) | Parent Node | Description | Type |
Template | Response | Container for storing template details | Container |
RequestId | Response | Unique request ID | String |
Node Name (Keyword) | Parent Node | Description | Type |
TemplateId | Response.Template | Template ID | String |
Name | Response.Template | Template name | String |
BucketId | Response.Template | Bucket the template belongs to | String |
Category | Response.Template | Template category: Custom or Official | String |
Tag | Response.Template | Template type: SpeechRecognition | String |
UpdateTime | Response.Template | Update time | String |
CreateTime | Response.Template | Creation time | String |
SpeechRecognition | Response.Template | Speech recognition parameters | Container |
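The response nodes described in the tables above can be read with a standard XML parser. The following is a minimal sketch using the Python standard library; parse_template_response is a hypothetical helper, and the sample string is a shortened copy of the response body shown in this document.

```python
import xml.etree.ElementTree as ET

SAMPLE_RESPONSE = (
    "<Response>"
    "<RequestId>NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x</RequestId>"
    "<Template>"
    "<TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>"
    "<Name>TemplateName</Name>"
    "<Tag>SpeechRecognition</Tag>"
    "<CreateTime>2020-08-05T11:35:24+0800</CreateTime>"
    "<UpdateTime>2020-08-31T16:15:20+0800</UpdateTime>"
    "<BucketId>test-1234567890</BucketId>"
    "<Category>Custom</Category>"
    "<SpeechRecognition>"
    "<EngineModelType>16k_zh</EngineModelType>"
    "<ChannelNum>1</ChannelNum>"
    "<HotVocabularyTableId/>"
    "</SpeechRecognition>"
    "</Template>"
    "</Response>"
)


def parse_template_response(xml_text: str) -> dict:
    """Extract the documented response nodes into a plain dictionary."""
    root = ET.fromstring(xml_text)
    template = root.find("Template")
    if template is None:
        raise ValueError("response has no Template node")
    result = {"RequestId": root.findtext("RequestId")}
    for field in ("TemplateId", "Name", "Tag", "BucketId",
                  "Category", "CreateTime", "UpdateTime"):
        result[field] = template.findtext(field)
    speech = template.find("SpeechRecognition")
    # Empty nodes such as <HotVocabularyTableId/> map to an empty string.
    result["SpeechRecognition"] = (
        {child.tag: (child.text or "") for child in speech}
        if speech is not None else {}
    )
    return result


info = parse_template_response(SAMPLE_RESPONSE)
print(info["TemplateId"], info["Category"])
```

The TemplateId value is typically the part a caller keeps, since it identifies the created template in later requests.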
POST /template HTTP/1.1
Authorization: q-sign-algorithm=sha1&q-ak=************************************&q-sign-time=1497530202;1497610202&q-key-time=1497530202;1497610202&q-header-list=&q-url-param-list=&q-signature=****************************************
Host: test-1234567890.ci.ap-chongqing.myqcloud.com
Content-Length: 1666
Content-Type: application/xml

<Request>
  <Tag>SpeechRecognition</Tag>
  <Name>TemplateName</Name>
  <SpeechRecognition>
    <EngineModelType>16k_zh</EngineModelType>
    <ChannelNum>1</ChannelNum>
    <ResTextFormat>1</ResTextFormat>
    <FilterDirty>0</FilterDirty>
    <FilterModal>1</FilterModal>
    <ConvertNumMode>0</ConvertNumMode>
    <SpeakerDiarization>1</SpeakerDiarization>
    <SpeakerNumber>0</SpeakerNumber>
    <FilterPunc>0</FilterPunc>
    <OutputFileType>txt</OutputFileType>
    <SentenceMaxLength>0</SentenceMaxLength>
  </SpeechRecognition>
</Request>
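The request above can be assembled with the Python standard library as follows. This is only a sketch: the helper name build_create_template_request is hypothetical, the https scheme is an assumption (the document shows only the Host header), and the Authorization value is a placeholder because computing the signature is outside the scope of this section. The request object is built but not sent.

```python
import urllib.request
from email.utils import formatdate


def build_create_template_request(
    bucket: str, region: str, auth: str, body: bytes
) -> urllib.request.Request:
    """Build (but do not send) a POST /template request.

    bucket is expected in <BucketName-APPID> form, e.g. test-1234567890.
    """
    host = f"{bucket}.ci.{region}.myqcloud.com"
    return urllib.request.Request(
        url=f"https://{host}/template",  # assumption: HTTPS endpoint
        data=body,
        method="POST",
        headers={
            "Date": formatdate(usegmt=True),  # GMT date header
            "Authorization": auth,  # placeholder, signature not computed here
            "Content-Type": "application/xml",
            "Content-Length": str(len(body)),
        },
    )


xml_body = (
    b"<Request><Tag>SpeechRecognition</Tag><Name>TemplateName</Name>"
    b"<SpeechRecognition><EngineModelType>16k_zh</EngineModelType>"
    b"<ChannelNum>1</ChannelNum></SpeechRecognition></Request>"
)
req = build_create_template_request(
    "test-1234567890", "ap-chongqing", "<Auth String>", xml_body
)
print(req.get_method(), req.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it, provided the Authorization header carries a valid signature.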
HTTP/1.1 200 OK
Content-Type: application/xml
Content-Length: 100
Connection: keep-alive
Date: Thu, 14 Jul 2022 12:37:29 GMT
Server: tencent-ci
x-ci-request-id: NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x

<Response>
  <RequestId>NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x</RequestId>
  <Template>
    <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
    <Name>TemplateName</Name>
    <Tag>SpeechRecognition</Tag>
    <CreateTime>2020-08-05T11:35:24+0800</CreateTime>
    <UpdateTime>2020-08-31T16:15:20+0800</UpdateTime>
    <BucketId>test-1234567890</BucketId>
    <Category>Custom</Category>
    <SpeechRecognition>
      <EngineModelType>16k_zh</EngineModelType>
      <ChannelNum>1</ChannelNum>
      <ResTextFormat>1</ResTextFormat>
      <FilterDirty>0</FilterDirty>
      <FilterModal>1</FilterModal>
      <ConvertNumMode>0</ConvertNumMode>
      <SpeakerDiarization>1</SpeakerDiarization>
      <SpeakerNumber>0</SpeakerNumber>
      <FilterPunc>0</FilterPunc>
      <OutputFileType>txt</OutputFileType>
      <FlashAsr>false</FlashAsr>
      <FirstChannelOnly>0</FirstChannelOnly>
      <WordInfo>0</WordInfo>
      <SentenceMaxLength>0</SentenceMaxLength>
      <HotVocabularyTableId/>
    </SpeechRecognition>
  </Template>
</Response>