Video Production API - NoTraining Image Processing

Last updated: 2025-10-10 11:33:57

API Description
This API generates a lip-synced video from a photo of a real person, driven by input text or audio. No model training is required.
This API returns a TaskId. The final video is returned by the Audio and Video Production Progress Query API. Audio and video resources are currently retained for only 7 days, so download them as soon as possible.
Calling Protocol
HTTPS + JSON
POST     /v2/ivh/videomaker/broadcastservice/phototovideonotrain
Header   Content-Type: application/json;charset=utf-8
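As a rough illustration of the protocol above, the following Python sketch builds (but does not send) the POST request using only the standard library. The host `api.example.invalid` is a placeholder, since this page gives only the path. The function name `build_request` is hypothetical, and any authentication or signing the service requires is not covered here. The `Header`/`Payload` envelope follows the request samples on this page.

```python
import json
import urllib.request

# Path taken from the Calling Protocol section.
API_PATH = "/v2/ivh/videomaker/broadcastservice/phototovideonotrain"


def build_request(base_url, payload):
    """Wrap the payload in the Header/Payload envelope and build a POST request."""
    body = json.dumps({"Header": {}, "Payload": payload}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + API_PATH,
        data=body,
        headers={"Content-Type": "application/json;charset=utf-8"},
        method="POST",
    )


# Placeholder host (assumption): replace with the real service endpoint.
req = build_request(
    "https://api.example.invalid",
    {
        "RefPhotoUrl": "http://virtualhuman-cos-test-1251316161.cos.ap-nanjing.myqcloud.com/ref_photo.jpg",
        "DriverType": "OriginalVoice",
        "InputAudioUrl": "http://virtualhuman-cos-test-1251316161.cos.ap-nanjing.myqcloud.com/audio.mp3",
    },
)
# Sending would be: urllib.request.urlopen(req). It is omitted here on purpose.
```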
Request Parameters
Parameters	Type	Mandatory	Description
RefPhotoUrl	string	Yes	Template image. Supported formats: jpg, jpeg, png, bmp, webp. 1. The file size must be within 10 MB. 2. Each side of the image must be between 192 and 4096 pixels. 3. The aspect ratio (width:height) must be between 1:2 and 2:1. 4. The image must show the face of a real person or a realistic cartoon human. Avoid images with no face, an incomplete or unclear face, a strongly turned face, or obstructed lips.
DriverType	string	Yes	Driver type. 1. Text: text-driven. The InputSsml field is required. 2. OriginalVoice: driven by original voice audio. The InputAudioUrl field is required.
InputAudioUrl	string	No	URL of the audio that drives the digital human. Required when DriverType is OriginalVoice. Audio requirements: 1. Duration: 2 to 60 seconds. 2. Supported formats: wav, mp3, wma, m4a, aac, ogg. 3. File size: no more than 20 MB.
InputSsml	string	No	Broadcast text. SSML tags are supported: see the Digital Human SSML Markup Language Specification for supported tag types and the request sample for tag syntax. The content must not include line breaks, and symbols must be escaped. Length must be between 4 and 300 words, counted as Unicode characters. Text is converted to audio internally, and task creation fails if the resulting audio exceeds 60 seconds. Required when DriverType is empty or Text.
SpeechParam	object	No	Audio parameters. Required when DriverType is Text.
SpeechParam.Speed	float	No	Speech rate. 1.0 is normal speed. The range is [0.5, 1.5], where 0.5 is the slowest and 1.5 is the fastest. Has no effect when DriverType is OriginalVoice (audio-driven). Required when DriverType is Text.
SpeechParam.TimbreKey	string	No	Voice type key. Required when DriverType is Text.
SpeechParam.Volume	int	No	Volume level, from 0 to 10. The default is 0, which represents normal volume. Higher values are louder. Note: volume adjustment is not supported for the TimbreKey values male_1-20 and female_1-23 (male voices 1 to 20, female voices 1 to 23).
SpeechParam.EmotionCategory	string	No	Emotion of the synthesized audio. Supported only for multi-emotion timbres. See Paginated Query Timbre List in the Personal Asset Management API for available values.
SpeechParam.EmotionIntensity	int	No	Intensity of the synthesized audio emotion, in the range [50, 200]. Takes effect only when EmotionCategory is not empty.
SpeechParam.TimbreLanguage	string	No	Voice type language. See Paginated Query Timbre List in the Personal Asset Management API for available languages. For multilingual voice types, the language to synthesize must be specified.
ConcurrencyType	string	No	Resource type used for the video production task. 1. Exclusive: uses purchased concurrency and does not deduct from the hourly package. A concurrency pack must be purchased, otherwise task submission fails. 2. Shared: deducts from the hourly package. An hourly package must be purchased, otherwise task submission fails. 3. Not specified: defaults to Exclusive if you have purchased concurrency (with or without an hourly package), and to Shared if you have purchased only an hourly package. If neither is purchased, task submission fails.
CallbackUrl	string	No	If set, the video production result is sent to this URL in a POST request with a fixed body format, described in Appendix II: Callback Request Body Format. Note: 1. The CallbackUrl length must be less than 1000. 2. The request is sent only once and is never resent, whatever causes it to fail.
VideoParam	object	No	Parameters for the output video. Fields left blank use their default values.
VideoParam.EmotionLevel	int	No	Emotion intensity of the output video. Allowed values: 1, 2, 3. The default is 2. Larger values increase audio control intensity but may cause unnatural results.
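The conditional requirements and value ranges in the parameter table can be checked on the client before a task is submitted. The Python sketch below is one possible way to do that. `validate_payload` is a hypothetical helper, not part of the API. It follows the Mandatory column by rejecting a missing DriverType, and it does not check things that need the actual files (image size, audio duration) or the InputSsml length.

```python
def validate_payload(p):
    """Return a list of rule violations for a request Payload dict (empty if none)."""
    errors = []
    if not p.get("RefPhotoUrl"):
        errors.append("RefPhotoUrl is required")

    driver = p.get("DriverType")
    speech = p.get("SpeechParam") or {}
    if driver == "Text":
        if not p.get("InputSsml"):
            errors.append("InputSsml is required when DriverType is Text")
        for key in ("Speed", "TimbreKey"):
            if key not in speech:
                errors.append(f"SpeechParam.{key} is required when DriverType is Text")
    elif driver == "OriginalVoice":
        if not p.get("InputAudioUrl"):
            errors.append("InputAudioUrl is required when DriverType is OriginalVoice")
    else:
        errors.append("DriverType must be Text or OriginalVoice")

    # Numeric ranges as listed in the parameter table (bounds inclusive).
    ranges = {"Speed": (0.5, 1.5), "Volume": (0, 10), "EmotionIntensity": (50, 200)}
    for key, (low, high) in ranges.items():
        if key in speech and not low <= speech[key] <= high:
            errors.append(f"SpeechParam.{key} must be in [{low}, {high}]")

    level = (p.get("VideoParam") or {}).get("EmotionLevel")
    if level is not None and level not in (1, 2, 3):
        errors.append("VideoParam.EmotionLevel must be 1, 2, or 3")

    if len(p.get("CallbackUrl", "")) >= 1000:
        errors.append("CallbackUrl length must be less than 1000")
    return errors


example = {
    "RefPhotoUrl": "http://virtualhuman-cos-test-1251316161.cos.ap-nanjing.myqcloud.com/ref_photo.jpg",
    "DriverType": "Text",
    "InputSsml": "Hello, I am the virtual anchor",
    "SpeechParam": {"TimbreKey": "female_1", "Speed": 1.0},
}
problems = validate_payload(example)  # no violations expected for this payload
```

Returning all violations at once, instead of raising on the first one, lets a caller report every problem in a single pass.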
Response Parameter
Parameters	Type	Mandatory	Description
TaskId	string	Yes	Video production task ID. Pass the TaskId to the Audio and Video Production Progress Query API to obtain the production progress and result.
Request Sample
Text-driven
{
  "Header": {},
  "Payload": {
    "RefPhotoUrl": "http://virtualhuman-cos-test-1251316161.cos.ap-nanjing.myqcloud.com/ref_photo.jpg",
    "DriverType": "Text",
    "InputSsml": "Hello, I am the virtual <phoneme alphabet=\"py\" ph=\"fu4\">anchor</phoneme>",
    "SpeechParam": {
      "TimbreKey": "female_1",
      "Volume": 1,
      "Speed": 1.0
    }
  }
}
Audio-driven
{
  "Header": {},
  "Payload": {
    "RefPhotoUrl": "http://virtualhuman-cos-test-1251316161.cos.ap-nanjing.myqcloud.com/ref_photo.jpg",
    "DriverType": "OriginalVoice",
    "InputAudioUrl": "http://virtualhuman-cos-test-1251316161.cos.ap-nanjing.myqcloud.com/audio.mp3"
  }
}
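Hand-writing these bodies makes it easy to get the quote escaping in InputSsml wrong. As a minimal sketch, the two request bodies can instead be built in Python and serialized with `json.dumps`, which escapes the double quotes inside SSML attributes. The helper names `text_driven_body` and `audio_driven_body` are hypothetical. Whether the service expects any escaping beyond JSON string escaping is not stated on this page, so none is applied here.

```python
import json


def text_driven_body(ref_photo_url, ssml, timbre_key, speed=1.0):
    """Build a text-driven request body; InputSsml must not contain line breaks."""
    if "\n" in ssml or "\r" in ssml:
        raise ValueError("InputSsml must not include line breaks")
    return {
        "Header": {},
        "Payload": {
            "RefPhotoUrl": ref_photo_url,
            "DriverType": "Text",
            "InputSsml": ssml,
            "SpeechParam": {"TimbreKey": timbre_key, "Speed": speed},
        },
    }


def audio_driven_body(ref_photo_url, audio_url):
    """Build an audio-driven request body."""
    return {
        "Header": {},
        "Payload": {
            "RefPhotoUrl": ref_photo_url,
            "DriverType": "OriginalVoice",
            "InputAudioUrl": audio_url,
        },
    }


photo = "http://virtualhuman-cos-test-1251316161.cos.ap-nanjing.myqcloud.com/ref_photo.jpg"
ssml = 'Hello, I am the virtual <phoneme alphabet="py" ph="fu4">anchor</phoneme>'

# json.dumps turns each " inside the SSML into \" in the serialized body.
body = json.dumps(text_driven_body(photo, ssml, "female_1"))
```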
Response Sample
{
    "Header": {
        "Code": 0,
        "DialogID": "",
        "Message": "",
        "RequestID": "fde854eaa981c7f2f7285d1c7eca335b",
        "SessionID": "gzb7dec22117297528294581119"
    },
    "Payload": {
        "TaskId": "81883d47c6154edf8e276531f09227b6"
    }
}
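A caller usually needs only the TaskId from this response. The Python sketch below extracts it. It assumes that `Code` 0 in the `Header` means success and that any other value is a failure. This page does not state that, but the sample is consistent with it. `extract_task_id` is a hypothetical helper name.

```python
import json


def extract_task_id(response_text):
    """Return Payload.TaskId, raising if Header.Code is not 0 (assumed to mean success)."""
    data = json.loads(response_text)
    header = data.get("Header", {})
    if header.get("Code") != 0:
        raise RuntimeError(
            f"request failed: Code={header.get('Code')} "
            f"Message={header.get('Message')!r} RequestID={header.get('RequestID')}"
        )
    return data["Payload"]["TaskId"]


sample_response = """
{
    "Header": {
        "Code": 0,
        "DialogID": "",
        "Message": "",
        "RequestID": "fde854eaa981c7f2f7285d1c7eca335b",
        "SessionID": "gzb7dec22117297528294581119"
    },
    "Payload": {
        "TaskId": "81883d47c6154edf8e276531f09227b6"
    }
}
"""
task_id = extract_task_id(sample_response)
```

The returned TaskId is what gets passed to the Audio and Video Production Progress Query API. Logging the RequestID on failure may help when contacting support.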
 
 
