ci:CreateMediaTemplate permission to the action in the authorization policy. For all operation APIs supported by Cloud Infinite (CI), please refer to CI actions.

POST /template HTTP/1.1
Host: <BucketName-APPID>.ci.<Region>.myqcloud.com
Date: <GMT Date>
Authorization: <Auth String>
Content-Length: <length>
Content-Type: application/xml

<body>
<Request>
  <Tag>SpeechRecognition</Tag>
  <Name>TemplateName</Name>
  <SpeechRecognition>
    <EngineModelType>16k_zh</EngineModelType>
    <ChannelNum>1</ChannelNum>
    <ResTextFormat>1</ResTextFormat>
    <FilterDirty>0</FilterDirty>
    <FilterModal>1</FilterModal>
    <ConvertNumMode>0</ConvertNumMode>
    <SpeakerDiarization>1</SpeakerDiarization>
    <SpeakerNumber>0</SpeakerNumber>
    <FilterPunc>0</FilterPunc>
    <OutputFileType>txt</OutputFileType>
  </SpeechRecognition>
</Request>
Node Name (Keyword) | Parent Node | Description | Type | Required or Not |
--- | --- | --- | --- | --- |
Request | None | Container that holds the request | Container | Yes |
Node Name (Keyword) | Parent Node | Description | Type | Required or Not |
--- | --- | --- | --- | --- |
Tag | Request | Template type: SpeechRecognition | String | Yes |
Name | Request | Template name. Supports only Chinese characters, letters, digits, _, -, and *, with a length of up to 64 characters. | String | Yes |
SpeechRecognition | Request | Speech recognition parameters | Container | Yes |
Node Name (Keyword) | Parent Node | Description | Type | Default Value | Required or Not |
--- | --- | --- | --- | --- | --- |
FlashAsr | Request.SpeechRecognition | Whether to enable ultra-fast ASR. Valid values: true, false | String | false | No |
EngineModelType | Request.SpeechRecognition | Engine model type, divided into phone-call and non-phone-call scenarios. Phone-call scenarios: 8k_zh: 8k phone-call Mandarin (applicable to stereo audio); 8k_zh_s: 8k phone-call Mandarin with speaker separation (applicable only to mono audio); 8k_en: 8k phone-call English. Non-phone-call scenarios: 16k_zh: 16k Mandarin; 16k_zh_video: 16k audio/video domain; 16k_en: 16k English; 16k_ca: 16k Cantonese; 16k_ja: 16k Japanese; 16k_zh_edu: Chinese education; 16k_en_edu: English education; 16k_zh_medical: medical; 16k_th: Thai; 16k_zh_dialect: multi-dialect, supporting 23 dialects. Ultra-fast ASR supports 8k_zh, 16k_zh, 16k_en, 16k_zh_video, 16k_zh_dialect, 16k_ms (Malay), and 16k_zh-PY (Chinese-English-Cantonese). | String | None | Yes |
ChannelNum | Request.SpeechRecognition | Number of sound channels. 1: mono (engine models for non-phone-call scenarios support mono only); 2: stereo (supported only by the 8k_zh engine model; the two channels should correspond to the two callers). Supported only by non-ultra-fast ASR, and required for non-ultra-fast ASR. | String | None | No |
ResTextFormat | Request.SpeechRecognition | Format of the returned recognition result. 0: recognized text (with segment-level timestamps); 1: word-level detailed recognition result without punctuation, with speech speed values (word timestamp list, generally used for subtitle generation); 2: word-level detailed recognition result (with punctuation and speech speed values); 3: segmented by punctuation with per-segment timestamps, especially suitable for subtitle scenarios (includes word-level timestamps, punctuation, and speech speed values). Supported only by non-ultra-fast ASR. | String | None | No |
FilterDirty | Request.SpeechRecognition | Whether to filter profanity (currently supported by Mandarin engines only). 0: do not filter; 1: filter; 2: replace profanity with *. | String | 0 | No |
FilterModal | Request.SpeechRecognition | Whether to filter modal particles (currently supported by Mandarin engines only). 0: do not filter; 1: partial filtering; 2: strict filtering. | String | 0 | No |
ConvertNumMode | Request.SpeechRecognition | Whether to intelligently convert numbers to Arabic numerals (currently supported by Mandarin engines only). 0: do not convert, output Chinese numbers directly; 1: intelligently convert to Arabic numerals based on the scenario; 3: enable math-related number conversion. Supported only by non-ultra-fast ASR. | String | 0 | No |
SpeakerDiarization | Request.SpeechRecognition | Whether to enable speaker separation. 0: do not enable; 1: enable (supported only by 8k_zh, 16k_zh, and 16k_zh_video on mono audio). For 8k phone-call scenarios, we recommend using dual-channel audio to distinguish the two callers: set ChannelNum to 2, with no need to enable speaker separation. | String | 0 | No |
SpeakerNumber | Request.SpeechRecognition | Number of speakers to separate (speaker separation must be enabled). Value range: 0-10. 0: automatic separation (currently supports up to 6 speakers); 1-10: the specified number of speakers. Supported only by non-ultra-fast ASR. | String | 0 | No |
FilterPunc | Request.SpeechRecognition | Whether to filter punctuation (currently supported by Mandarin engines only). 0: do not filter; 1: filter sentence-ending punctuation; 2: filter all punctuation. | String | 0 | No |
OutputFileType | Request.SpeechRecognition | Output file type. Valid values: txt, srt. Ultra-fast ASR supports txt only. Non-ultra-fast ASR with ResTextFormat set to 3 supports txt only. | String | txt | No |
Format | Request.SpeechRecognition | Audio format for ultra-fast ASR. Valid values: wav, pcm, ogg-opus, speex, silk, mp3, m4a, aac. Required for ultra-fast ASR. | String | None | No |
FirstChannelOnly | Request.SpeechRecognition | Whether to recognize only the first sound channel. 0: recognize all sound channels; 1: recognize only the first sound channel. Ultra-fast ASR only. | String | 1 | No |
WordInfo | Request.SpeechRecognition | Whether to display word-level timestamps. 0: do not display; 1: display, excluding punctuation timestamps; 2: display, including punctuation timestamps. Ultra-fast ASR only. | String | 0 | No |
SentenceMaxLength | Request.SpeechRecognition | Maximum number of characters per punctuation-delimited segment. Value range: [6,40]. The default value 0 disables this feature. This parameter can be used in subtitle generation to limit the number of characters in a single subtitle line. When FlashAsr is false, the parameter takes effect only when ResTextFormat is 3. | String | 0 | No |
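To illustrate how the required nodes fit together, here is a minimal Python sketch (standard library only) that builds the XML request body described above; the helper name `build_template_body` and the parameter dict are illustrative, not part of the API, and sending the request still requires a valid COS Authorization signature:

```python
import xml.etree.ElementTree as ET

def build_template_body(name: str, params: dict) -> str:
    """Build the XML body for creating a SpeechRecognition template."""
    request = ET.Element("Request")
    ET.SubElement(request, "Tag").text = "SpeechRecognition"
    ET.SubElement(request, "Name").text = name
    speech = ET.SubElement(request, "SpeechRecognition")
    for key, value in params.items():
        ET.SubElement(speech, key).text = str(value)
    return ET.tostring(request, encoding="unicode")

body = build_template_body("TemplateName", {
    "EngineModelType": "16k_zh",  # 16k Mandarin engine
    "ChannelNum": 1,              # mono; required for non-ultra-fast ASR
    "ResTextFormat": 1,           # word-level result, for subtitle scenes
    "OutputFileType": "txt",
})
# POST the body to https://<BucketName-APPID>.ci.<Region>.myqcloud.com/template
# with Content-Type: application/xml and a valid Authorization header.
```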
<Response>
  <RequestId>NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x</RequestId>
  <Template>
    <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
    <Name>TemplateName</Name>
    <Tag>SpeechRecognition</Tag>
    <CreateTime>2020-08-05T11:35:24+0800</CreateTime>
    <UpdateTime>2020-08-31T16:15:20+0800</UpdateTime>
    <BucketId>test-1234567890</BucketId>
    <Category>Custom</Category>
    <SpeechRecognition>
      <EngineModelType>16k_zh</EngineModelType>
      <ChannelNum>1</ChannelNum>
      <ResTextFormat>1</ResTextFormat>
      <FilterDirty>0</FilterDirty>
      <FilterModal>1</FilterModal>
      <ConvertNumMode>0</ConvertNumMode>
      <SpeakerDiarization>1</SpeakerDiarization>
      <SpeakerNumber>0</SpeakerNumber>
      <FilterPunc>0</FilterPunc>
      <OutputFileType>txt</OutputFileType>
      <FlashAsr>false</FlashAsr>
      <FirstChannelOnly>0</FirstChannelOnly>
      <WordInfo>0</WordInfo>
      <SentenceMaxLength>0</SentenceMaxLength>
      <HotVocabularyTableId/>
    </SpeechRecognition>
  </Template>
</Response>
Node Name (Keyword) | Parent Node | Description | Type |
--- | --- | --- | --- |
Response | None | Container that holds the response | Container |
Node Name (Keyword) | Parent Node | Description | Type |
--- | --- | --- | --- |
Template | Response | Container that holds the template details | Container |
RequestId | Response | Unique request ID | String |
Node Name (Keyword) | Parent Node | Description | Type |
--- | --- | --- | --- |
TemplateId | Response.Template | Template ID | String |
Name | Response.Template | Template name | String |
BucketId | Response.Template | Bucket to which the template belongs | String |
Category | Response.Template | Template category: Custom or Official | String |
Tag | Response.Template | Template type: SpeechRecognition | String |
UpdateTime | Response.Template | Update time | String |
CreateTime | Response.Template | Creation time | String |
SpeechRecognition | Response.Template | Speech recognition parameters | Container |
POST /template HTTP/1.1
Authorization: q-sign-algorithm=sha1&q-ak=************************************&q-sign-time=1497530202;1497610202&q-key-time=1497530202;1497610202&q-header-list=&q-url-param-list=&q-signature=****************************************
Host: test-1234567890.ci.ap-chongqing.myqcloud.com
Content-Length: 1666
Content-Type: application/xml

<Request>
  <Tag>SpeechRecognition</Tag>
  <Name>TemplateName</Name>
  <SpeechRecognition>
    <EngineModelType>16k_zh</EngineModelType>
    <ChannelNum>1</ChannelNum>
    <ResTextFormat>1</ResTextFormat>
    <FilterDirty>0</FilterDirty>
    <FilterModal>1</FilterModal>
    <ConvertNumMode>0</ConvertNumMode>
    <SpeakerDiarization>1</SpeakerDiarization>
    <SpeakerNumber>0</SpeakerNumber>
    <FilterPunc>0</FilterPunc>
    <OutputFileType>txt</OutputFileType>
    <SentenceMaxLength>0</SentenceMaxLength>
  </SpeechRecognition>
</Request>
HTTP/1.1 200 OK
Content-Type: application/xml
Content-Length: 100
Connection: keep-alive
Date: Thu, 14 Jul 2022 12:37:29 GMT
Server: tencent-ci
x-ci-request-id: NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x

<Response>
  <RequestId>NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x</RequestId>
  <Template>
    <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
    <Name>TemplateName</Name>
    <Tag>SpeechRecognition</Tag>
    <CreateTime>2020-08-05T11:35:24+0800</CreateTime>
    <UpdateTime>2020-08-31T16:15:20+0800</UpdateTime>
    <BucketId>test-1234567890</BucketId>
    <Category>Custom</Category>
    <SpeechRecognition>
      <EngineModelType>16k_zh</EngineModelType>
      <ChannelNum>1</ChannelNum>
      <ResTextFormat>1</ResTextFormat>
      <FilterDirty>0</FilterDirty>
      <FilterModal>1</FilterModal>
      <ConvertNumMode>0</ConvertNumMode>
      <SpeakerDiarization>1</SpeakerDiarization>
      <SpeakerNumber>0</SpeakerNumber>
      <FilterPunc>0</FilterPunc>
      <OutputFileType>txt</OutputFileType>
      <FlashAsr>false</FlashAsr>
      <FirstChannelOnly>0</FirstChannelOnly>
      <WordInfo>0</WordInfo>
      <SentenceMaxLength>0</SentenceMaxLength>
      <HotVocabularyTableId/>
    </SpeechRecognition>
  </Template>
</Response>
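For reference, a small Python sketch (standard library only, response abridged to the relevant nodes) showing how a client might extract the new template ID from a response like the one above; the variable names are illustrative:

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sample response body shown above.
response_xml = (
    "<Response>"
    "<RequestId>NjJmMWQxYjNfOTBmYTUwNjRfNWYyY18x</RequestId>"
    "<Template>"
    "<TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>"
    "<Name>TemplateName</Name>"
    "<Tag>SpeechRecognition</Tag>"
    "</Template>"
    "</Response>"
)

root = ET.fromstring(response_xml)
template_id = root.findtext("Template/TemplateId")  # ID to reference in later jobs
request_id = root.findtext("RequestId")             # keep for troubleshooting
print(template_id)  # t1460606b9752148c4ab182f55163ba7cd
```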