
Tencent Cloud AI Digital Human

Best Practices Demo

Last updated: 2026-04-24 20:32:21
Note:
Before using this tutorial, you need to download the Demo (Python).

1. Overview

This Demo provides a complete Tencent Cloud Intelligent Digital Human (TCADH) interaction solution, supporting driving Digital Humans via text or audio, with real-time rendering of Digital Human video streams on H5 pages.

Core Capabilities

Capability | Description
Two stream creation methods | AssetVirtualmanKey (asset ID) or VirtualmanProjectId (Digital Human project ID)
Three streaming protocols | RTMP, TRTC, WebRTC
Two drive modes | Text-driven (input text to make the Digital Human speak) or audio-driven (upload an audio file to drive lip-sync)
H5 video playback | For the TRTC and WebRTC protocols, a browser playback page opens automatically

File list

├── tencent_virtual_human_completeV1.py # Main script (Python)
├── trtc_player.html # TRTC protocol H5 playback page
├── webrtc_player.html # WebRTC protocol H5 playback page
└── TcPlayer-2.4.5.js # TCPlayerLite SDK (WebRTC playback dependency)

2. Environment Preparation

2.1 Python dependencies

Warning:
Use Python version 3.11, as some dependencies are unavailable or have been removed in higher versions. It is strongly recommended to create a Python virtual environment first and install dependencies within the virtual environment.
pip install requests websocket-client pydub

2.2 System dependencies

The audio-driven feature requires ffmpeg (used for audio format conversion):
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# Windows
# Download https://ffmpeg.org/download.html and add it to PATH

2.3 Tencent Cloud credentials

Obtain the following information on the Tencent Cloud Digital Human Platform:
Parameter | Description | Obtaining Method
appkey | Application identifier | Digital Human Platform → Application Management (see Figure 1)
accesstoken | Access token | Digital Human Platform → Application Management (see Figure 1)
asset_virtualman_key | Avatar asset ID | Digital Human Platform → Image Asset Management (see Figure 2)
virtualman_project_id | Digital Human project ID | Digital Human Platform → Project Management (see Figure 3)
Note:
asset_virtualman_key and virtualman_project_id represent two different stream creation methods; configure exactly one of the two.

Figure 1. How to Obtain the appkey and accesstoken


Figure 2. How to Obtain the asset_virtualman_key


Figure 3. How to Obtain the virtualman_project_id

Note:
Before proceeding, you must first create an interactive project. For instructions on how to create one, see the figure below. After the project is created, you need to configure the digital human's avatar and voice (see the documentation).



3. Quick Start

3.1 Configuration Parameters

Edit the CONFIG dictionary at the bottom of the tencent_virtual_human_completeV1.py file:
CONFIG = {
    "appkey": "your_appkey",
    "accesstoken": "your_accesstoken",
    "asset_virtualman_key": "your_asset_key",  # Image asset ID (preferred)
    "virtualman_project_id": "",               # Project ID (used when asset_key is empty)
    "protocol": "rtmp",                        # Streaming protocol: "rtmp" / "trtc" / "webrtc"
    "protocol_option": None,                   # Protocol option (see Section 5)
}
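Before running, a quick sanity check on these fields can catch the most common mistakes. The helper below is illustrative only, not part of the demo script:

```python
# Illustrative sanity check for the CONFIG dictionary (not part of the script).
def validate_config(config: dict) -> None:
    if not (config.get("asset_virtualman_key") or config.get("virtualman_project_id")):
        raise ValueError("Set asset_virtualman_key or virtualman_project_id")
    if config.get("protocol") not in ("rtmp", "trtc", "webrtc"):
        raise ValueError('protocol must be "rtmp", "trtc", or "webrtc"')

validate_config({"asset_virtualman_key": "your_asset_key", "protocol": "rtmp"})
```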

Running Environment

Operating System: Ubuntu 24.04.3 LTS / x86_64

Runtime Version: Python 3.11.1

3.2 Run script

python tencent_virtual_human_completeV1.py
The script will automatically execute the following process flow:
1️⃣ Create a session (automatically select the stream creation method)
2️⃣ Wait for session readiness (polling status, maximum 120 seconds)
3️⃣ Enable session
🎬 Automatically launch the H5 player (for TRTC/WebRTC protocols)
4️⃣ Create WebSocket connection
5️⃣ Enter interactive mode
6️⃣ Close session

3.3 Interactive mode

After the interactive mode is entered, the terminal will prompt you to select a driver mode:
📋 Driver mode selection:
1 - text-driven (input text to make Digital Humans speak)
2 - audio-driven (selecting audio files to drive the Digital Human)
q - Exit
Input 1: Enter text-driven mode. After you enter text, the Digital Human synthesizes speech via TTS with synchronized lip-sync.
Input 2: Enter audio-driven mode. Enter an audio file path (formats such as mp3 and wav are supported), and the Digital Human's lip-sync is driven by your audio.
Input q: Exit and close the session.

4. Stream creation method

4.1 AssetVirtualmanKey stream creation (default)

Use image asset ID to create a session, which applies to Digital Humans created via the image asset management page.

API interface: /v2/ivh/sessionmanager/sessionmanagerservice/createsessionbyasset
CONFIG = {
    "asset_virtualman_key": "your_asset_key",
    "virtualman_project_id": "",  # Leave blank
}


Note:
You can refer to using personal asset image for stream creation to perform related operations.

4.2 VirtualmanProjectId stream creation

Use Digital Human project ID to create a session, which applies to Digital Humans created via the project management page.

API interface: /v2/ivh/sessionmanager/sessionmanagerservice/createsession
CONFIG = {
    "asset_virtualman_key": "",  # Leave blank
    "virtualman_project_id": "your_project_id",
}


Note:
You can refer to Create New Live Stream Session to perform related operations.

4.3 Priority Logic

The script automatically selects the stream creation method through the unified entry point of create_session():
def create_session(self) -> Tuple[bool, str]:
    if self.asset_virtualman_key:
        return self.create_session_by_asset()    # Priority
    elif self.virtualman_project_id:
        return self.create_session_by_project()  # Fallback
    else:
        return False, "Both parameters are empty, unable to create stream"

5. Streaming Protocol and ProtocolOption

5.1 Comparison of Three Protocols

Protocol | Latency | Playback Method | Scenarios
RTMP | 2-5 seconds | VLC and other external players | General scenarios, highest compatibility
TRTC | 200-400 ms | H5 page opens automatically (TRTC Web SDK) | Ultra-low-latency real-time interaction
WebRTC | 500 ms-1 s | H5 page opens automatically (TCPlayerLite) | Low-latency web playback

5.2 RTMP protocol (default)

CONFIG = {
    "protocol": "rtmp",
    "protocol_option": None,
}
After successful stream creation, the RTMP playback URL is returned, such as: rtmp://liveplay.ivh.qq.com/live/m789. It can be opened using players like VLC.

5.3 TRTC protocol

# Debug mode (uses the platform-unified AppId, no additional configuration required)
CONFIG = {
    "protocol": "trtc",
    "protocol_option": None,
}
After successful stream creation, a trtc:// playback URL is returned, and the script automatically:
1. Starts an HTTP server on local port 8080;
2. Parses parameters such as appId, roomId, and userSig from the trtc:// address;
3. Opens the trtc_player.html playback page in the browser;
4. The H5 page then enters the TRTC room as an audience role and pulls the Digital Human video stream.
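The parameter extraction in step 2 can be sketched as follows. The URL below is made up, and the assumption that the trtc:// address carries its parameters as an ordinary query string is ours, not confirmed by the source:

```python
# Extract appId, roomId, and userSig from a trtc:// address, assuming the
# parameters travel as a standard query string (illustrative URL).
from urllib.parse import parse_qs, urlsplit

def parse_trtc_url(trtc_url: str) -> dict:
    query = parse_qs(urlsplit(trtc_url).query)
    return {k: v[0] for k, v in query.items()}  # unwrap single-value lists

params = parse_trtc_url(
    "trtc://host/play?appId=1400695865&roomId=402183450&userSig=eJw8..."
)
```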


5.4 WebRTC Protocol

CONFIG = {
    "protocol": "webrtc",
    "protocol_option": None,
}
After successful stream creation, a webrtc:// playback URL is returned, and the script automatically:
1. Starts an HTTP server on local port 8080;
2. Opens the webrtc_player.html playback page in the browser;
3. Uses the TCPlayerLite SDK for low-latency playback.


5.5 ProtocolOption Advanced Configuration

ProtocolOption is used for TRTC production environments or custom streaming scenarios.

Available fields

Field | Type | Description
TrtcUseExternalApp | bool | Whether to use an external TRTC AppId
TrtcAppId | str | TRTC AppId (required when an external AppId is used)
TrtcRoomId | int | TRTC numeric room ID
TrtcStrRoomId | str | TRTC string room ID (choose one of TrtcRoomId / TrtcStrRoomId)
TrtcAutoGenRoomIdType | int | Auto-generated room ID type: 0 = numeric (default), 1 = string
TrtcUserSig | str | TRTC user signature (required when an external AppId is used)
TrtcPrivateMapKey | str | TRTC permission ticket (enter "dummy" if advanced permissions are not enabled)
CssCustomPushUrl | str | Custom CSS push URL (any protocol may be used)

Example Scenario

TRTC Production Mode (using external AppId):
CONFIG = {
    "protocol": "trtc",
    "protocol_option": {
        "TrtcUseExternalApp": True,
        "TrtcAppId": "1400xxxxxx",
        "TrtcRoomId": 12345,
        "TrtcUserSig": "eJw8js0Kgk...",
        "TrtcPrivateMapKey": "dummy"
    },
}
Custom push URL (supported for both RTMP and WebRTC):
CONFIG = {
    "protocol": "rtmp",
    "protocol_option": {
        "CssCustomPushUrl": "rtmp://domain/appName/streamName?txSecret={0}&txTime={1}"
    },
}

5.6 Optional configuration

If the avatar supports a transparent background, you can also adjust the Demo parameters to see the transparent background effect, as shown below:
"ExtraInfo": {"AlphaChannelEnable":True} # Enable Alpha channel (if transparent background is required)

6. H5 Playback Page

6.1 TRTC Playback Page (trtc_player.html)

Technical Solution: TRTC Web SDK v5 (loaded via unpkg CDN)

Core Logic:
Parse appId, roomId, userId, and userSig from URL parameters;
Enter the TRTC room as an audience role (role: 'audience');
Listen for the REMOTE_VIDEO_AVAILABLE event and automatically pull the remote video stream;
Display the video with object-fit: contain so the full frame is visible.

Key code snippet:
// Enter the room (pull stream only as audience, no pushing)
await trtc.enterRoom({
  sdkAppId: config.appId,
  userId: config.userId,
  userSig: config.userSig,
  roomId: config.roomId,
  scene: 'live',
  role: 'audience'
});

// Listen for and play remote video
trtc.on(TRTC.EVENT.REMOTE_VIDEO_AVAILABLE, async (event) => {
  await trtc.startRemoteVideo({
    userId: event.userId,
    streamType: event.streamType,
    view: 'remote-video',
    option: { objectFit: 'contain' }
  });
});
Standalone Usage (without relying on Python scripts):
http://localhost:8080/trtc_player.html?appId=1400695865&roomId=402183450&userId=user_xxx&userSig=eJw8...&virtualManUserId=402183450_ivh_anchor


6.2 WebRTC Playback Page (webrtc_player.html)

Technical Solution: TCPlayerLite v2.4.5 (locally deployed TcPlayer-2.4.5.js)

Core Logic:
Read the WebRTC playback URL from the URL parameter ?url=webrtc://...
Initialize the player using TCPlayerLite, and it will automatically play in live streaming mode.

Key code snippet:
player = new TcPlayer('player-container', {
  "webrtc": webrtcUrl,
  "width": '100%',
  "height": '540',
  "autoplay": true,
  "live": true,
  "controls": "none",
  "webrtcConfig": {
    "streamType": "auto"
  },
  "listener": function (msg) {
    handlePlayerEvent(msg);
  }
});
Standalone Use:
http://localhost:8080/webrtc_player.html?url=webrtc://liveplay.ivh.qq.com/live/m11533590420520971383?min_delay_ms=100
Note:
The WebRTC playback page must be accessed via an HTTP server; opening the local file directly (via the file:// protocol) is not supported.
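What the script's built-in H5 server amounts to is a static file server for the script's directory. A minimal sketch is below; the demo uses port 8080, while port 0 here lets the OS pick a free port so the sketch runs anywhere:

```python
# Minimal static file server, equivalent in spirit to what the demo script
# starts for the H5 player pages (the script uses port 8080; 0 = any free port).
import threading
import urllib.request
from http.server import ThreadingHTTPServer, SimpleHTTPRequestHandler

server = ThreadingHTTPServer(("127.0.0.1", 0), SimpleHTTPRequestHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The player page would then be reachable at, e.g.:
player_url = f"http://localhost:{port}/webrtc_player.html"
```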


7. API Signature Mechanism

All API requests require signature authentication. Signing process:
1. Parameter sorting: sort all request parameters in lexicographical order;
2. String concatenation: join as key1=value1&key2=value2&...;
3. HMAC-SHA256: compute the signature using accesstoken as the key;
4. Base64 + URL encoding: encode the signature result.
def _generate_signature(self, parameters: Dict[str, str]) -> str:
    sorted_params = sorted(parameters.items())
    signing_content = '&'.join(f'{k}={v}' for k, v in sorted_params)
    h = hmac.new(
        self.accesstoken.encode('utf-8'),
        signing_content.encode('utf-8'),
        hashlib.sha256
    )
    hash_in_base64 = base64.b64encode(h.digest()).decode('utf-8')
    return quote(hash_in_base64)
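The same four steps can be exercised outside the class. The parameter names and the token below are placeholders, not real credentials:

```python
# Standalone version of the signing steps (placeholder values throughout).
import base64
import hashlib
import hmac
from urllib.parse import quote

def generate_signature(parameters: dict, accesstoken: str) -> str:
    # Steps 1-2: sort lexicographically, join as key=value&key=value
    signing_content = "&".join(f"{k}={v}" for k, v in sorted(parameters.items()))
    # Step 3: HMAC-SHA256 keyed with the accesstoken
    digest = hmac.new(accesstoken.encode("utf-8"),
                      signing_content.encode("utf-8"),
                      hashlib.sha256).digest()
    # Step 4: Base64, then URL-encode
    return quote(base64.b64encode(digest).decode("utf-8"))

sig = generate_signature({"appkey": "demo", "timestamp": "1700000000"}, "dummy_token")
```

Note that because the parameters are sorted before concatenation, the signature is independent of the order in which they were supplied.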


8. Session Lifecycle

┌────────────────────┐
│ Create session     │  create_session()
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ Wait for readiness │  wait_for_session_ready() ← polls SessionStatus
└─────────┬──────────┘    (SessionStatus=3 → Preparing, SessionStatus=1 → Ready)
          ▼
┌────────────────────┐
│ Enable session     │  start_session()
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ H5 player          │  start_h5_player() ← launched automatically for TRTC/WebRTC
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ WebSocket channel  │  create_websocket_connection() (persistent connection)
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ Interactive drive  │  send_text_drive() / send_audio_drive()  (loop)
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ Close session      │  close_session()
└────────────────────┘
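The lifecycle can be read as a linear driver over the method names in the diagram. The sketch below is illustrative: `client` stands for any object exposing those methods (a recording stub here), and the real script's signatures differ (for example, create_session() returns a (bool, str) tuple):

```python
# Hedged sketch of the session lifecycle; the Recorder stub is illustrative.
def run_lifecycle(client, interact):
    client.create_session()
    client.wait_for_session_ready()       # polls SessionStatus until Ready (1)
    client.start_session()
    client.start_h5_player()              # TRTC/WebRTC only
    client.create_websocket_connection()
    try:
        interact(client)                  # send_text_drive()/send_audio_drive() loop
    finally:
        client.close_session()            # runs even if the interaction fails

class Recorder:
    """Records method-call order so the flow can be inspected."""
    def __init__(self):
        self.calls = []
    def __getattr__(self, name):
        return lambda *a, **kw: self.calls.append(name)

client = Recorder()
run_lifecycle(client, lambda c: c.send_text_drive())
```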


9. Driver Instructions

9.1 text-driven

By sending text via WebSocket, the Digital Human uses TTS to synthesize speech and synchronize lip-sync.
drive_cmd = {
    "Header": {},
    "Payload": {
        "ReqId": req_id,
        "SessionId": self.session_id,
        "Command": "SEND_TEXT",
        "Data": {
            "Text": "Hello, welcome to Tencent Cloud Digital Human Platform",
            "ChatCommand": "NotUseChat"
        }
    }
}
self.ws.send(json.dumps(drive_cmd, ensure_ascii=False))

9.2 audio-driven

Convert the audio file to PCM format (16kHz, mono, 16bit), packetize it, and send it via WebSocket.

Audio Conversion:
from pydub import AudioSegment

audio = AudioSegment.from_file(audio_file_path)
audio = audio.set_channels(1).set_frame_rate(16000).set_sample_width(2)
pcm_data = audio.raw_data

Packetization and sending policy:
5120 bytes per packet (160 ms of audio)
The first 6 packets are sent back-to-back (no interval)
Subsequent packets are sent at 120 ms intervals
A final end packet is sent with IsFinal: True
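The packetization policy can be sketched as below. The function names and the `send(chunk, is_final)` callback shape are ours for illustration; the constants come from the policy above (5120 bytes = 16000 Hz × 2 bytes × 0.16 s):

```python
# Sketch of the packetization policy (names and callback shape are illustrative).
import time

PACKET_SIZE = 5120        # bytes: 16000 Hz * 2 bytes/sample * 0.16 s = 160 ms
FAST_PACKETS = 6          # first packets are sent back-to-back
SEND_INTERVAL_S = 0.12    # 120 ms between subsequent packets

def iter_packets(pcm_data: bytes):
    """Split raw PCM into fixed-size packets (last one may be shorter)."""
    for offset in range(0, len(pcm_data), PACKET_SIZE):
        yield pcm_data[offset:offset + PACKET_SIZE]

def send_audio(pcm_data: bytes, send) -> None:
    """Send packets via send(chunk, is_final), pacing after the first 6."""
    for i, chunk in enumerate(iter_packets(pcm_data)):
        if i >= FAST_PACKETS:
            time.sleep(SEND_INTERVAL_S)
        send(chunk, False)
    send(b"", True)  # final end packet (IsFinal: True)
```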



10. FAQs

Q1: Audio conversion failure

Ensure that ffmpeg is installed and available in the PATH. The script will automatically locate the ffmpeg path:
_ffmpeg_path = shutil.which('ffmpeg')


Q2: TRTC playback page video is cropped

The H5 page is already configured with object-fit: contain. If the issue persists, make sure the browser is up to date.


Q3: WebRTC playback page fails to open

Ensure that the TcPlayer-2.4.5.js file is in the same directory as webrtc_player.html.
The page must be accessed via an HTTP server (the script starts one automatically); it cannot be opened directly via the file:// protocol.


Q4: Session creation timeout

After session creation, a "preparing" status (SessionStatus=3) is normal, since the model needs time to load. By default, the script polls for up to 120 seconds.
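A generic version of this readiness polling looks like the sketch below; the helper name and parameters are ours, with the 120-second default taken from the script's behavior:

```python
# Generic poll-until-ready helper (illustrative; the demo script's
# wait_for_session_ready() plays this role with a 120 s limit).
import time

def wait_until(check, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Call check() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval_s)
    return False
```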


Q5: Port 8080 is occupied

H5 Player uses port 8080 by default. If you need to change it, modify the start_h5_player() function's h5_port parameter:
h5_url = self.start_h5_player(h5_port=9090)


11. Reference Documents



