Tencent Cloud AI Digital Human

Voice-driven Instructions

Last updated: 2024-07-19 10:08:06
After you create a long connection channel (see Create Long Connection Channel), you can send audio over the persistent WebSocket connection to drive the digital human.

Request Parameters

| Parameter name | Type | Required | Description |
|---|---|---|---|
| ReqId | String | Yes | Unique identifier for a single drive. Each segment of audio is assigned a UUID value. |
| SessionId | String | Yes | Unique identifier for the session. |
| Command | String | Yes | Set to SEND_AUDIO to send audio. |
| Data | Object | Yes | Data object. See the Data table below. |

Data

| Name | Type | Required | Description |
|---|---|---|---|
| Audio | string | Yes | The byte array of the original audio data, encoded as a Base64 string. Only one format is supported: PCM, 16 kHz sampling rate, 16-bit sample depth, mono. |
| Seq | int | Yes | Audio packet sequence number, which must start from 1. |
| IsFinal | bool | No | Whether this packet marks the end of the audio. The default value is false. |
Note:
1. If the audio is streamed in real time from a microphone, send a packet as soon as each 160 ms (5120 bytes) of audio is captured, with no additional waiting interval. If the audio comes from an offline file, use a packet size of 160 ms (5120 bytes) and wait 120 ms between packets.
2. The size of the last audio packet is determined by the data that actually remains (it must be less than 160 ms).
3. After all audio packets have been sent, send one more packet with IsFinal=true and the Audio field left empty. This signals the end of the audio and returns the digital human to a silent state.
4. The real-time rate of sending audio, that is, the sending interval divided by the duration of audio in each packet, must be within [0.75, 1]. A rate lower than 0.75 triggers throttling, and a rate higher than 1 causes video stuttering. For example, with a 160 ms packet, the sending interval must be no less than 120 ms and no more than 160 ms.
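The packet rules above can be sketched in Python as follows. This is a minimal illustration rather than official client code: the function names `build_audio_packets` and `send_interval_seconds` are hypothetical, the assumption that the closing IsFinal packet takes the next Seq value is not stated in this page, and the rate is interpreted as interval divided by packet duration, as the 120 ms / 160 ms example in note 4 suggests.

```python
import base64

# Audio format required by this API: PCM, 16 kHz, 16-bit, mono.
SAMPLE_RATE_HZ = 16000
BYTES_PER_SAMPLE = 2
CHANNELS = 1
BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS

PACKET_MS = 160
PACKET_BYTES = BYTES_PER_SECOND * PACKET_MS // 1000  # 5120 bytes


def build_audio_packets(pcm: bytes) -> list:
    """Split raw PCM into Data objects of at most 160 ms each.

    Hypothetical helper. Seq starts at 1; an empty IsFinal packet is
    appended at the end. Giving that last packet the next Seq value is
    an assumption, not something this page specifies.
    """
    packets = []
    seq = 1
    for offset in range(0, len(pcm), PACKET_BYTES):
        chunk = pcm[offset:offset + PACKET_BYTES]
        packets.append({
            "Audio": base64.b64encode(chunk).decode("ascii"),
            "Seq": seq,
            "IsFinal": False,
        })
        seq += 1
    # Closing packet: empty Audio, IsFinal=true.
    packets.append({"Audio": "", "Seq": seq, "IsFinal": True})
    return packets


def send_interval_seconds(chunk_bytes: int, rate: float = 0.75) -> float:
    """Return the wait between packets for a given real-time rate.

    The rate is taken to be interval / audio duration, so it must lie
    in [0.75, 1].
    """
    if not 0.75 <= rate <= 1.0:
        raise ValueError("rate must be within [0.75, 1]")
    return (chunk_bytes / BYTES_PER_SECOND) * rate
```

For an offline file, a sender would sleep for `send_interval_seconds(PACKET_BYTES, 0.75)`, roughly 0.12 s, between packets, which matches the 120 ms interval in note 1.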

Request Sample

{
  "Header": {},
  "Payload": {
    "ReqId": "d7aa08da33dd4a662ad5be508c5b77cf",
    "SessionId": "m123adfafvbadsafd",
    "Command": "SEND_AUDIO",
    "Data": {
      "Audio": "<Base64-encoded audio binary data>",
      "Seq": 1,
      "IsFinal": false
    }
  }
}
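As a rough sketch, a message with this shape could be assembled in Python like this. The helper names `new_req_id` and `build_send_audio_message` are hypothetical, and generating ReqId as a 32-character hex UUID is an assumption based on the sample value. The resulting string would then be sent as a text frame over the WebSocket connection using whichever client library you prefer.

```python
import json
import uuid


def new_req_id() -> str:
    """Hypothetical helper: a UUID rendered as 32 hex characters,
    matching the look of the sample ReqId."""
    return uuid.uuid4().hex


def build_send_audio_message(session_id: str, req_id: str,
                             audio_b64: str, seq: int,
                             is_final: bool = False) -> str:
    """Serialize one SEND_AUDIO request in the envelope shown above."""
    message = {
        "Header": {},
        "Payload": {
            "ReqId": req_id,
            "SessionId": session_id,
            "Command": "SEND_AUDIO",
            "Data": {
                "Audio": audio_b64,  # Base64 of 16 kHz, 16-bit, mono PCM
                "Seq": seq,          # starts from 1
                "IsFinal": is_final,
            },
        },
    }
    return json.dumps(message)
```

The same ReqId would be reused for every packet of one audio segment, since the parameter table describes it as identifying a single drive.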

