
Submit a Task
Last updated: 2025-09-15 17:39:18

Feature Description

Submit an ASR task.

Authorization Description

When using a sub-account, add the ci:CreateAsrJobs permission to the action field of the authorization policy. For all API operations supported by Cloud Infinite, see CI actions.
When a sub-account calls an asynchronous processing API, it must also be granted the cam:passrole permission. Asynchronous processing APIs read from and write to COS through CAM roles, and the passrole permission allows those roles to be passed. For details, see Access Management > Write Operations > passrole APIs.
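For reference, a sub-account authorization policy granting both permissions above can be sketched as follows. This is a minimal illustration in Python; the layout follows the general CAM policy syntax, and the wildcard resource is an illustrative assumption that you should narrow to your own bucket:

```python
import json

# Minimal sketch of a CAM authorization policy for a sub-account.
# Illustrative only: replace the wildcard resource with your own
# bucket's resource string before using this in production.
policy = {
    "version": "2.0",
    "statement": [
        {
            "effect": "allow",
            "action": [
                "ci:CreateAsrJobs",  # permission required by this API
                "cam:passrole",      # role passing for asynchronous processing
            ],
            "resource": ["*"],
        }
    ],
}

print(json.dumps(policy, indent=2))
```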

Service Activation

To use this feature, bind a bucket and activate the Cloud Infinite (CI) service in advance.
To use this feature, enable the Smart Audio service in advance via the console or an API.
Note:
After a bucket is bound to CI, manually unbinding the bucket makes this feature unavailable for it.

Use Limits

When using this API, please confirm the related use limits. For details, see Use Limits.

Fee Instructions

This API is a paid service, and the fees incurred are collected by Cloud Infinite. For billing details, see Smart Audio Fees.


Request

Request sample

POST /jobs HTTP/1.1
Host: <BucketName-APPID>.ci.<Region>.myqcloud.com
Date: <GMT Date>
Authorization: <Auth String>
Content-Length: <length>
Content-Type: application/xml

<body>
Note:
Authorization: Auth String. For details, see the Request Signature document.

Request header

This API only uses common request headers. For details, see Common Request Headers documentation.

Request body

The following shows the request body required to implement this operation.
<Request>
  <Tag>SpeechRecognition</Tag>
  <Input>
    <Object>input/test.mp3</Object>
  </Input>
  <Operation>
    <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
    <Output>
      <Region>ap-chongqing</Region>
      <Bucket>test-123456789</Bucket>
      <Object>output/asr.txt</Object>
    </Output>
    <UserData>This is my data.</UserData>
    <JobLevel>0</JobLevel>
  </Operation>
  <CallBack>http://callback.demo.com</CallBack>
  <CallBackFormat>JSON</CallBackFormat>
</Request>
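Before the node-by-node description, here is a minimal sketch of submitting this body with Python's requests library. The bucket, region, and template ID reuse the sample values above; compute_auth() is a hypothetical placeholder for the signature described in the Request Signature document:

```python
import requests

def compute_auth() -> str:
    # Placeholder: return a real "q-sign-algorithm=sha1&q-ak=..." signature
    # string here (see the Request Signature document, or let a COS SDK
    # generate this header for you).
    return "<Auth String>"

# <BucketName-APPID>.ci.<Region>.myqcloud.com, using the sample values.
host = "test-1234567890.ci.ap-chongqing.myqcloud.com"
body = """<Request>
  <Tag>SpeechRecognition</Tag>
  <Input>
    <Object>input/test.mp3</Object>
  </Input>
  <Operation>
    <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
    <Output>
      <Region>ap-chongqing</Region>
      <Bucket>test-1234567890</Bucket>
      <Object>output/asr.txt</Object>
    </Output>
  </Operation>
</Request>"""

resp = requests.post(
    f"https://{host}/jobs",
    data=body.encode("utf-8"),
    headers={"Content-Type": "application/xml", "Authorization": compute_auth()},
)
print(resp.status_code)
print(resp.text)
```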
The nodes are described as follows:

| Node Name (Keyword) | Parent Node | Description | Type | Required or Not |
| --- | --- | --- | --- | --- |
| Request | None | Container for saving the request | Container | Yes |
Contents of the Container node Request:

| Node Name (Keyword) | Parent Node | Description | Type | Required or Not |
| --- | --- | --- | --- | --- |
| Tag | Request | Tag of the created task: SpeechRecognition | String | Yes |
| Input | Request | Information about the object to be operated on | Container | Yes |
| Operation | Request | Operation rule | Container | Yes |
| CallBackFormat | Request | Job callback format: JSON or XML. Default: XML. Takes priority over the queue callback format | String | No |
| CallBackType | Request | Job callback type: Url or TDMQ. Default: Url. Takes priority over the queue callback type | String | No |
| CallBack | Request | Job callback address, which takes priority over the queue callback address. When set to `no`, the queue callback address does not receive callbacks | String | No |
| CallBackMqConfig | Request | TDMQ configuration for the task callback. Required when CallBackType is TDMQ. For details, see CallBackMqConfig | Container | No |
Contents of the Container node Input:

| Node Name (Keyword) | Parent Node | Description | Type | Required or Not |
| --- | --- | --- | --- | --- |
| Object | Request.Input | File path | String | No |
Contents of the Container node Operation:

| Node Name (Keyword) | Parent Node | Description | Type | Required or Not |
| --- | --- | --- | --- | --- |
| TemplateId | Request.Operation | ASR template ID. For details, see Creating ASR Templates | String | No |
| SpeechRecognition | Request.Operation | ASR parameters, same as Request.SpeechRecognition in the Create ASR Template API | Container | No |
| Output | Request.Operation | Output configuration | Container | Yes |
| UserData | Request.Operation | Pass-through user information; printable ASCII, length not exceeding 1024 | String | No |
| JobLevel | Request.Operation | Task priority. Valid values: 0, 1, 2. The larger the value, the higher the priority. Default: 0 | String | No |
Note:
The ASR parameters must be set through either TemplateId or SpeechRecognition, with TemplateId taking priority.
Contents of the Container node Output:

| Node Name (Keyword) | Parent Node | Description | Type | Required or Not |
| --- | --- | --- | --- | --- |
| Region | Request.Operation.Output | Bucket region | String | Yes |
| Bucket | Request.Operation.Output | Bucket where the results are stored | String | Yes |
| Object | Request.Operation.Output | File name of the result | String | Yes |

Response

Response Headers

This API only returns common response headers. For details, see the Common Response Headers documentation.

Response Body

The response body is returned as application/xml. An example including the complete node data is shown below:
<Response>
  <JobsDetail>
    <Code>Success</Code>
    <CreationTime>2021-08-05T15:43:50+0800</CreationTime>
    <EndTime>-</EndTime>
    <Input>
      <BucketId>test-1234567890</BucketId>
      <Object>input/test.mp3</Object>
      <Region>ap-chongqing</Region>
    </Input>
    <JobId>s58ccb634149211ed84ce2b1cd7fbb14a</JobId>
    <Message/>
    <Operation>
      <Output>
        <Bucket>test-1234567890</Bucket>
        <Object>output/asr.txt</Object>
        <Region>ap-chongqing</Region>
      </Output>
      <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
      <TemplateName>speech_demo</TemplateName>
      <UserData>This is my data.</UserData>
      <JobLevel>0</JobLevel>
    </Operation>
    <QueueId>pcd463e1467964d39ad2d3f66aacd8199</QueueId>
    <QueueType>Speeching</QueueType>
    <StartTime>-</StartTime>
    <State>Submitted</State>
    <Tag>SpeechRecognition</Tag>
  </JobsDetail>
</Response>
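As a minimal sketch of consuming this response, the snippet below parses the XML with Python's standard library and extracts the job ID and state, for example to poll the job later. The sample body is abbreviated to the fields being read:

```python
import xml.etree.ElementTree as ET

# Abbreviated sample response; in practice, use the body returned by POST /jobs.
resp_xml = """<Response>
  <JobsDetail>
    <Code>Success</Code>
    <JobId>s58ccb634149211ed84ce2b1cd7fbb14a</JobId>
    <State>Submitted</State>
  </JobsDetail>
</Response>"""

root = ET.fromstring(resp_xml)
for job in root.findall("JobsDetail"):
    # Code and Message are only meaningful when State is Failed.
    print(job.findtext("JobId"), job.findtext("State"), job.findtext("Code"))
```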
The nodes are described as follows:

| Node Name (Keyword) | Parent Node | Description | Type |
| --- | --- | --- | --- |
| Response | None | Container for saving the results | Container |
Contents of the Container node Response:

| Node Name (Keyword) | Parent Node | Description | Type |
| --- | --- | --- | --- |
| JobsDetail | Response | Task details | Container array |
Contents of the Container node JobsDetail:

| Node Name (Keyword) | Parent Node | Description | Type |
| --- | --- | --- | --- |
| Code | Response.JobsDetail | Error code, meaningful only when State is Failed | String |
| Message | Response.JobsDetail | Error description, meaningful only when State is Failed | String |
| JobId | Response.JobsDetail | ID of the newly created task | String |
| Tag | Response.JobsDetail | Tag of the newly created task: SpeechRecognition | String |
| State | Response.JobsDetail | Task status. Submitted: submitted, pending execution; Running: executing; Success: executed successfully; Failed: execution failed; Pause: the task is paused (when the queue is paused, tasks pending execution switch to the paused state); Cancel: the task is canceled | String |
| CreationTime | Response.JobsDetail | Task creation time | String |
| StartTime | Response.JobsDetail | Task start time | String |
| EndTime | Response.JobsDetail | Task end time | String |
| QueueId | Response.JobsDetail | ID of the queue the task belongs to | String |
| QueueType | Response.JobsDetail | Queue type of the task | String |
| Input | Response.JobsDetail | Input resource address of the task | Container |
| Operation | Response.JobsDetail | Operation rule of the task | Container |
Contents of the Container node Input:

| Node Name (Keyword) | Parent Node | Description | Type |
| --- | --- | --- | --- |
| Region | Response.JobsDetail.Input | Bucket region | String |
| BucketId | Response.JobsDetail.Input | Bucket where the source file resides | String |
| Object | Response.JobsDetail.Input | File name of the source file | String |
Contents of the Container node Operation:

| Node Name (Keyword) | Parent Node | Description | Type |
| --- | --- | --- | --- |
| TemplateId | Response.JobsDetail.Operation | Template ID of the task | String |
| TemplateName | Response.JobsDetail.Operation | Template name of the task, returned when TemplateId exists | String |
| SpeechRecognition | Response.JobsDetail.Operation | ASR parameters of the task | Container |
| Output | Response.JobsDetail.Operation | Output configuration | Container |
| UserData | Response.JobsDetail.Operation | Pass-through user information | String |
| JobLevel | Response.JobsDetail.Operation | Task priority | String |
| SpeechRecognitionResult | Response.JobsDetail.Operation | ASR task result; not returned if there is none | Container |
Contents of the Container node SpeechRecognitionResult:

| Node Name (Keyword) | Parent Node | Description | Type |
| --- | --- | --- | --- |
| AudioTime | Response.JobsDetail.Operation.SpeechRecognitionResult | Audio duration in seconds | String |
| Result | Response.JobsDetail.Operation.SpeechRecognitionResult | ASR result | String |
| FlashResult | Response.JobsDetail.Operation.SpeechRecognitionResult | Ultra-fast ASR result | Container array |
| ResultDetail | Response.JobsDetail.Operation.SpeechRecognitionResult | Recognition result details, including per-word time offsets for each sentence, generally used in subtitle generation scenarios (this field is not null only when ResTextFormat=1 in the ASR request). Note: this field may be null, indicating that no valid value can be obtained | Container array |
Contents of the Container node FlashResult:

| Node Name (Keyword) | Parent Node | Description | Type |
| --- | --- | --- | --- |
| channel_id | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult | Sound channel identifier, starting from 0, corresponding to the number of channels in the audio | Int |
| text | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult | Recognition result of the channel's complete audio | String |
| sentence_list | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult | Sentence/paragraph-level recognition result list | Container array |
Contents of the Container node sentence_list:

| Node Name (Keyword) | Parent Node | Description | Type |
| --- | --- | --- | --- |
| text | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list | Sentence/paragraph-level text | String |
| start_time | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list | Start time | Int |
| end_time | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list | End time | Int |
| speaker_id | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list | Speaker ID (when speaker_diarization is enabled in the request, speakers are distinguished by speaker_id) | Int |
| word_list | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list | Word-level recognition result list | Container array |
Contents of the Container node word_list:

| Node Name (Keyword) | Parent Node | Description | Type |
| --- | --- | --- | --- |
| word | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list.word_list | Word-level text | String |
| start_time | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list.word_list | Start time | Int |
| end_time | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list.word_list | End time | Int |
Contents of the Container node ResultDetail:

| Node Name (Keyword) | Parent Node | Description | Type |
| --- | --- | --- | --- |
| FinalSentence | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail | Final recognition result of a sentence | String |
| SliceSentence | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail | Intermediate recognition result of a sentence, split into multiple words by spaces | String |
| StartMs | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail | Start time of the sentence (ms) | String |
| EndMs | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail | End time of the sentence (ms) | String |
| WordsNum | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail | Number of words in the sentence | String |
| SpeechSpeed | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail | Speaking rate of the sentence, in words/sec | String |
| SpeakerId | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail | Sound channel or speaker ID (when speaker_diarization is enabled or ChannelNum is set to 2 (stereo), speakers or channels are distinguished) | String |
| Words | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail | Word details of the sentence | Container array |
Contents of the Container node Words:

| Node Name (Keyword) | Parent Node | Description | Type |
| --- | --- | --- | --- |
| Word | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail.Words | Word text | String |
| OffsetStartMs | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail.Words | Start time offset within the sentence (ms) | String |
| OffsetEndMs | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail.Words | End time offset within the sentence (ms) | String |
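Since ResultDetail is generally used for subtitle generation, the following minimal sketch converts its sentence fields (FinalSentence, StartMs, EndMs) into SRT entries. The sentences list stands in for parsed response nodes, and its values are illustrative:

```python
# Convert a millisecond count into the HH:MM:SS,mmm form used by SRT.
def ms_to_srt(ms: int) -> str:
    h, rem = divmod(ms, 3600_000)
    m, rem = divmod(rem, 60_000)
    s, milli = divmod(rem, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{milli:03d}"

# Stand-ins for parsed ResultDetail nodes (values are illustrative).
sentences = [
    {"FinalSentence": "Hello everyone.", "StartMs": "0", "EndMs": "1800"},
    {"FinalSentence": "Welcome to the demo.", "StartMs": "1900", "EndMs": "4200"},
]

for i, s in enumerate(sentences, start=1):
    print(i)
    print(f"{ms_to_srt(int(s['StartMs']))} --> {ms_to_srt(int(s['EndMs']))}")
    print(s["FinalSentence"])
    print()
```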

Error Code

This request returns common error responses and error codes. For more information, see Error Codes.

Practical Case

Request 1: Use an ASR Template ID

POST /jobs HTTP/1.1
Authorization: q-sign-algorithm=sha1&q-ak=************************************&q-sign-time=1497530202;1497610202&q-key-time=1497530202;1497610202&q-header-list=&q-url-param-list=&q-signature=****************************************
Host: test-1234567890.ci.ap-beijing.myqcloud.com
Content-Length: 166
Content-Type: application/xml

<Request>
  <Tag>SpeechRecognition</Tag>
  <Input>
    <Object>input/test.mp3</Object>
  </Input>
  <Operation>
    <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
    <Output>
      <Region>ap-chongqing</Region>
      <Bucket>test-123456789</Bucket>
      <Object>output/asr.txt</Object>
    </Output>
    <UserData>This is my data.</UserData>
    <JobLevel>0</JobLevel>
  </Operation>
  <CallBack>http://callback.demo.com</CallBack>
  <CallBackFormat>JSON</CallBackFormat>
</Request>

Response

HTTP/1.1 200 OK
Content-Type: application/xml
Content-Length: 230
Connection: keep-alive
Date: Mon, 28 Jun 2022 15:23:12 GMT
Server: tencent-ci
x-ci-request-id: NTk0MjdmODlfMjQ4OGY3XzYzYzhf****

<Response>
  <JobsDetail>
    <Code>Success</Code>
    <CreationTime>2021-08-05T15:43:50+0800</CreationTime>
    <EndTime>-</EndTime>
    <Input>
      <BucketId>test-1234567890</BucketId>
      <Object>input/test.mp3</Object>
      <Region>ap-chongqing</Region>
    </Input>
    <JobId>s58ccb634149211ed84ce2b1cd7fbb14a</JobId>
    <Message/>
    <Operation>
      <Output>
        <Bucket>test-1234567890</Bucket>
        <Object>output/asr.txt</Object>
        <Region>ap-chongqing</Region>
      </Output>
      <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
      <TemplateName>speech_demo</TemplateName>
      <UserData>This is my data.</UserData>
      <JobLevel>0</JobLevel>
    </Operation>
    <QueueId>pcd463e1467964d39ad2d3f66aacd8199</QueueId>
    <QueueType>Speeching</QueueType>
    <StartTime>-</StartTime>
    <State>Submitted</State>
    <Tag>SpeechRecognition</Tag>
  </JobsDetail>
</Response>

Request 2: Use ASR Parameters

POST /jobs HTTP/1.1
Authorization: q-sign-algorithm=sha1&q-ak=************************************&q-sign-time=1497530202;1497610202&q-key-time=1497530202;1497610202&q-header-list=&q-url-param-list=&q-signature=****************************************
Host: test-1234567890.ci.ap-beijing.myqcloud.com
Content-Length: 166
Content-Type: application/xml

<Request>
  <Tag>SpeechRecognition</Tag>
  <Input>
    <Object>input/test.mp3</Object>
  </Input>
  <Operation>
    <SpeechRecognition>
      <EngineModelType>16k_zh_video</EngineModelType>
      <ChannelNum>1</ChannelNum>
      <FilterDirty>1</FilterDirty>
      <FilterModal>1</FilterModal>
    </SpeechRecognition>
    <Output>
      <Region>ap-chongqing</Region>
      <Bucket>test-123456789</Bucket>
      <Object>output/asr.txt</Object>
    </Output>
    <UserData>This is my data.</UserData>
    <JobLevel>0</JobLevel>
  </Operation>
  <CallBack>http://callback.demo.com</CallBack>
  <CallBackFormat>JSON</CallBackFormat>
</Request>

Response

HTTP/1.1 200 OK
Content-Type: application/xml
Content-Length: 230
Connection: keep-alive
Date: Mon, 28 Jun 2022 15:23:12 GMT
Server: tencent-ci
x-ci-request-id: NTk0MjdmODlfMjQ4OGY3XzYzYzhf****


<Response>
  <JobsDetail>
    <Code>Success</Code>
    <CreationTime>2021-08-05T15:43:50+0800</CreationTime>
    <EndTime>-</EndTime>
    <Input>
      <BucketId>test-1234567890</BucketId>
      <Object>input/test.mp3</Object>
      <Region>ap-chongqing</Region>
    </Input>
    <JobId>s58ccb634149211ed84ce2b1cd7fbb14a</JobId>
    <Message/>
    <Operation>
      <Output>
        <Bucket>test-1234567890</Bucket>
        <Object>output/asr.txt</Object>
        <Region>ap-chongqing</Region>
      </Output>
      <SpeechRecognition>
        <ChannelNum>1</ChannelNum>
        <ConvertNumMode>0</ConvertNumMode>
        <EngineModelType>16k_zh_video</EngineModelType>
        <FilterDirty>0</FilterDirty>
        <FilterModal>0</FilterModal>
        <FilterPunc>0</FilterPunc>
        <OutputFileType>txt</OutputFileType>
        <ResTextFormat>0</ResTextFormat>
        <SpeakerDiarization>0</SpeakerDiarization>
        <SpeakerNumber>0</SpeakerNumber>
      </SpeechRecognition>
      <UserData>This is my data.</UserData>
      <JobLevel>0</JobLevel>
    </Operation>
    <QueueId>pcd463e1467964d39ad2d3f66aacd8199</QueueId>
    <QueueType>Speeching</QueueType>
    <StartTime>-</StartTime>
    <State>Submitted</State>
    <Tag>SpeechRecognition</Tag>
  </JobsDetail>
</Response>
