tencent cloud

Pricing Guide
Last updated: 2025-11-05 10:02:23

Basic Structure

Tencent Cloud AI Digital Human (TCADH) offers three products: Image Procurement, Broadcasting Service, and Interactive Service. Image Procurement is required and can be combined with the Broadcasting Service and the Interactive Service. Note that none of the three products can be applied to a final application scenario on its own; a combination purchase is needed.
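The combination rule above can be expressed as a small check. This is a minimal sketch only: the product identifiers and the function name are hypothetical labels chosen for illustration and are not part of any TCADH API.

```python
# Hypothetical identifiers for the three products named in this guide.
REQUIRED_PRODUCT = "image_procurement"
SERVICE_PRODUCTS = {"broadcasting_service", "interactive_service"}


def is_usable_combination(products):
    """Return True if the selection contains Image Procurement
    plus at least one of the two services."""
    selected = set(products)
    return REQUIRED_PRODUCT in selected and bool(selected & SERVICE_PRODUCTS)


print(is_usable_combination(["image_procurement", "broadcasting_service"]))  # True
print(is_usable_combination(["image_procurement"]))                          # False
print(is_usable_combination(["interactive_service"]))                        # False
```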
2D/3D Avatar
Avatar customization
Customize your exclusive Digital Human avatar. A 3D avatar requires an additional purchase of the cloud-driven engine; a 2D avatar does not. Choose either customization or leasing.
Renewal of customized Avatar
A customized avatar has a default validity period of 1 year. Purchase this service to keep using a customized avatar after it expires. A rented avatar can simply be rented again after it expires, without this renewal service.
Application scenarios
Conversation interaction
Cloud rendering
The avatar is rendered and generated through cloud services and then pushed to the terminal for real-time display. You need to purchase "Cloud Rendering Session Driver Concurrency". API and SDK are supported. It is mutually exclusive with local rendering.
Conversation interaction
Local rendering
The avatar is rendered and displayed locally on the terminal. The cloud service is responsible only for pushing conversation content. You need to purchase the "Local Rendering Session Driver Usage Package" or a terminal-authorized license. API and SDK are supported. It is mutually exclusive with cloud rendering.

Audio-video broadcasting
Generate video (including audio)
Generate a video with a specified virtual avatar and voice, following a preset text. You need to purchase the "Video Broadcast Synthesis Hour Package" (which includes audio synthesis capabilities).
Generate audio only
Generate audio with the specified voice timbre according to the preset text. You need to purchase the "Audio Broadcast Synthesis Hour Package" (this package is required when generating audio only).
Concurrent audio-video broadcasting
Increase the number of concurrent channels to generate videos or audio more efficiently, without affecting the generated results. This purchase is optional.
Voice Customization
Voice replication
Train and generate a specified voice timbre from the provided voice materials for use in application scenarios.
Renewal of customized voice
A replicated voice has a default validity period of one year. Purchase this service to keep using a replicated voice after it expires.

Avatar Introduction

Introduction to Avatar Categories
Avatar Type
Definition
Use Cases
Example
2D Premium
After about two weeks of training with motion materials recorded in a professional studio, a digital human is generated for broadcasting and interactive scenarios. The premium avatar allows specified actions to be inserted at arbitrary points in the text, presenting a variety of motions.
Suitable for finance and media customers who have requirements for the digital human's avatar and actions.



2D small sample - general lip movement
Train a digital human using real-person video footage. The appearance of the digital human matches that of the real person. Lip movement uses general lips and teeth generated by a large model. The requirements for the training footage are lower. Refer to Image Recording Guide - General Lip Movement.
Suitable for customers who have no requirements for the digital human's lip shape and do not have good shooting conditions.



2D small sample - exclusive lip shape
Train a digital human using real-person video footage. The appearance of the digital human matches that of the real person. Lip movement uses the real person's own lips and teeth. The training footage should contain no other voices and no obvious ambient sound. Refer to Avatar Recording Guide - Exclusive Lip Movement.
Suitable for customers who have requirements for digital human avatar replication and have good shooting conditions.
2D small sample - high-precision version
Train a digital human using 4K real-person video footage. The material collection requirements and the final lip and teeth effect are the same as those of 2D small sample - exclusive lip shape, and the clarity of the digital human is improved to 4K. Refer to Avatar Recording Guide - High-Precision Version.
Suitable for large conferences, face-to-face dialogues, product launch events, and large screen scenarios.
2D small sample - photo avatar
An avatar can be trained from a photo. This version is low-cost and fast: generally, it can be used 10 minutes after the material is submitted.
Suitable for general internet and entertainment scenarios.



3D Cartoon
Set the digital human's facial features, hairstyle, clothing, accessories, and so on according to customer needs to complete the concept art. After the customer reviews and finalizes the design, the model is built. After stages such as rigging, rendering, and UE optimization, the output is a digital human that covers interactive and broadcasting scenarios.
Suitable for scenarios where an existing 2D mascot is to be upgraded to a 3D avatar that serves users.



3D Semi-Realistic
Set the digital human's facial features, hairstyle, clothing, accessories, and so on according to customer needs to complete the concept art. After the customer reviews and finalizes the design, the model is built. After stages such as rigging, rendering, and UE optimization, the output is a digital human that covers interactive and broadcasting scenarios.
Suitable for scenarios requiring a certain degree of realism but with low accuracy requirements, such as news reading and mobile smart customer service scenarios.



3D Realistic
Set the digital human's facial features, hairstyle, clothing, accessories, and so on according to customer needs to complete the concept art. After the customer reviews and finalizes the design, the model is built. After stages such as rigging, rendering, and UE optimization, the output is a digital human that covers interactive and broadcasting scenarios.
Suitable for scenarios requiring a high degree of realism and high-precision display, such as brand promotion and large screen interaction scenarios.




Image comparison


Recording requirements
2D Small Sample - General Lip Movement: Record a video of at least 60s. There are no requirements for the sound in the video.
2D Small Sample - Exclusive Lip Shape: Record a video of at least 180s. The recording environment must be quiet, and only the subject's voice may be recorded.
2D Small Sample - High-Precision Version: The recording standard is the same as for the exclusive lip shape. The video resolution must be 4K.
2D Small Sample - Photo-Based Digital Human: Only a clear frontal portrait of the individual is required.
Delivery cycle
2D Small Sample - General Lip Movement: A demo is delivered within 1 day for the customer to confirm the effect. The avatar can be used after the customer clicks to confirm.
2D Small Sample - Exclusive Lip Shape: A demo is delivered within 2 days for the customer to confirm the effect. The avatar can be used after the customer clicks to confirm.
2D Small Sample - High-Precision Version: A demo is delivered within 3 days for the customer to confirm the effect. The avatar can be used after the customer clicks to confirm.
2D Small Sample - Photo-Based Digital Human: Available within 10 minutes.
Finished product effect
2D Small Sample - General Lip Movement: The general version uses lips and teeth generated by the big data model.
2D Small Sample - Exclusive Lip Shape: The exclusive version records the person's own lip movement and has better facial clarity.
2D Small Sample - High-Precision Version: Builds on the exclusive lip shape effect and outputs 4K resolution for a clearer view.
2D Small Sample - Photo-Based Digital Human: The photo avatar uses lips and teeth generated by the big data model, and the body pose cannot show even slight movement.
General lip movement vs Exclusive lip shape



General lip movement vs Photo avatar



Exclusive lip shape vs High-precision version




Price Details

Image Procurement

This refers to the purchase of an avatar, which is divided into Avatar Rental and Avatar Customization. Voice Replication is also supported.
Avatar Rental: Rent an avatar from the Public Basic Image Library. The rental is non-exclusive: during the rental period you have only the right to use the avatar. Ownership of the avatar remains with Tencent, and Tencent has the right to lease the avatar to other customers. It is suitable for customers who do not have high requirements for exclusive avatars and whose businesses are in the initial stage.
Customize Avatar: Customize the digital human avatar through recording and training or through modeling. It is suitable for customers who require their own avatars and need to own them.
Voice Replication: Replicate a specific voice through the collection and training of speech data.

1. Image Leasing
You can lease avatars from the public image library. For the specific avatars available, refer to 2D small sample (Instant Lip-Sync) Basic Image Library.
Avatar Type
Feature Description
Price
2D small sample - Instant Lip-Sync
Select an avatar from the 2D small sample - Instant Lip-Sync avatar library for leasing. The lease is non-exclusive and supports being driven by text or original sound. Basic actions are determined by the actual image condition. Includes timbre.
25 USD/each/month
2D small sample - Studio Lip-Sync
Select an avatar from the 2D small sample - Studio Lip-Sync avatar library for leasing. The lease is non-exclusive and supports being driven by text or original sound. Basic actions are determined by the actual image condition. Includes timbre.
60 USD/each/month
3D
Select an avatar from the 3D avatar library for leasing. The lease is non-exclusive and supports being driven by text or original sound. Basic actions are determined by the actual image condition. Includes timbre.
5358 USD/each/month
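As a worked example of the rental pricing above, the sketch below multiplies the listed monthly price by the number of avatars and the number of months. It assumes whole-month billing with no proration or discounts, which this guide does not state; the dictionary keys and function name are hypothetical.

```python
# Monthly rental prices in USD per avatar, taken from the table above.
# The keys are hypothetical labels, not official product codes.
RENTAL_USD_PER_MONTH = {
    "2d_instant_lip_sync": 25,
    "2d_studio_lip_sync": 60,
    "3d": 5358,
}


def rental_cost(avatar_type, count, months):
    """Total rental cost, assuming whole months and no discounts."""
    if count < 0 or months < 0:
        raise ValueError("count and months must be non-negative")
    return RENTAL_USD_PER_MONTH[avatar_type] * count * months


# Two Studio Lip-Sync avatars for six months: 60 * 2 * 6
print(rental_cost("2d_studio_lip_sync", 2, 6))  # 720
```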
2. Avatar Customization
Note: The avatar customization quota takes effect immediately after purchase and is valid for one year.
Avatar Type
Feature Description
Price
2D small sample - general lip movement
Limited to cloud services. Driven by text and original sound.
By providing 1-minute video footage, you can customize an AI Digital Human with 1 default timbre.
Clothing style, pose, and motion follow the video material.
Background replacement is supported only when the material has a solid-color green screen background.
200 USD/each

2D small sample - exclusive lip shape
Driven by text or original sound.
By providing one 3-minute video, you can customize a broadcasting digital human with 1 default timbre.
Clothing style, pose, and motion follow the video material.
Background replacement is supported only when the material has a solid-color green screen background.
1,000 USD/each
2D small sample photo
Driven by text or original sound.
An avatar can be trained through a photo. It is cost-effective and has a quick customization speed.
2.5 USD/each
3D
Available for both cloud services and private use.
Supports text, sound, and single-camera video driving. Based on the default version of the 3D portrait (refer to YoYo Character Body Template), the face shape, hairstyle, clothing, and motion are customized as specified. The complete model set includes 1 face shape, 1 hairstyle, 1 clothing set, and an action library of 8 actions.
If additional customization of hairstyle, clothing, motion, or expression is required, extra items need to be added to the cart.
The asset accuracy supported for 3D realistic avatars is Grade S.
Contact us for a quote
3D cloud-based driver engine
Suitable for 3D avatar assets that the customer already owns and that meet the drive specification. Driven by text or original sound.
137,500 USD/each
3. Voice Clone
Note: The VRS quota takes effect immediately after purchase and is valid for one year.
Category
Feature Description
Price
Voice Reproduce (VRS) - Ultra-fast Version
Provide a few seconds of audio data to obtain an exclusive AI-customized timbre within 10 minutes. Mainly used with photo avatars and usable immediately. See Voice Clone Recording Guide - Ultra-fast Version. The generated avatar image is permanently valid.
2.5 USD/each
Voice Reproduce (VRS) - Ultra-fast Version (Minority Language)
Same features as above, with support for multiple languages. For details, see Appendix 4 - Language List.
50 USD/each
4. Renew custom image or voice
It can be used to extend the effective time of customized images or cloned voices.
Avatar Type
Feature Description
Price
2D Avatar - Studio Lip-Sync
Supports on-shelf service renewal for 2D small sample Studio Lip-Sync custom avatars.
18 USD/each/month
2D Avatar - Instant Lip-Sync
Supports on-shelf service renewal for 2D small sample Instant Lip-Sync custom avatars.
2 USD/each/month
3D Avatar
Supports on-shelf service renewal for custom avatars in styles including 3D cartoon, 3D semi-realistic, and 3D realistic.
84 USD/each/month
Ultra-fast Voice (Minority Language)
Supports on-shelf service renewal for the Ultra-fast Voice (Minority Language) edition of Voice Clone.
4 USD/each/month
Note:
Photo avatars are permanently valid.

Broadcasting Service

This refers to the capability of providing audio and video broadcasts through an avatar. Services are provided in three categories: Video Generation Service - Hourly Package, Audio Generation Service - Hourly Package, and Video Generation Concurrency Service. The Video Generation Service - Hourly Package and the concurrency service are charged based on the avatar type, and packages for different avatar types are not interchangeable.
Video Generation Service - Hourly Package: video duration resource packages for broadcasting audio and video content.
Audio Generation Service - Hourly Package: audio duration resource packages for broadcasting audio content.
Video Generation Concurrency Service: the number of videos that can be generated simultaneously.
1. Video Generation Service - Hourly Package
Avatar Type
Feature Description
Price
2D small sample - general lip movement
Lease/Clone Digital Human Voice and Video Generation Duration: 10 Hours
1800 USD/each
2D small sample - exclusive lip shape
Lease/Clone Digital Human Voice and Video Generation Duration: 10 Hours
1800 USD/each
2D small sample photo avatar
Lease/Clone Digital Human Voice and Video Generation Duration: 10 Hours
1800 USD/each
3D
Lease/Clone Digital Human Voice and Video Generation Duration: 10 Hours
3600 USD/each
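To estimate spend from the table above, the sketch below rounds the required hours for each avatar type up to whole 10-hour packages and sums the cost. Keeping each avatar type separate reflects the rule that packages for different types are not interchangeable. Selling only whole packages is an assumption, and the keys and function name are hypothetical.

```python
import math

PACKAGE_HOURS = 10  # each package in the table covers 10 hours

# Package prices in USD from the table above; keys are hypothetical labels.
PACKAGE_PRICE_USD = {
    "2d_general_lip": 1800,
    "2d_exclusive_lip": 1800,
    "2d_photo": 1800,
    "3d": 3600,
}


def video_package_plan(hours_by_type):
    """Return (packages per avatar type, total cost in USD).

    Hours are rounded up per avatar type because packages for
    different avatar types cannot be mixed.
    """
    plan = {}
    total = 0
    for avatar_type, hours in hours_by_type.items():
        packages = math.ceil(hours / PACKAGE_HOURS)
        plan[avatar_type] = packages
        total += packages * PACKAGE_PRICE_USD[avatar_type]
    return plan, total


plan, total = video_package_plan({"2d_general_lip": 12, "3d": 3})
print(plan)   # {'2d_general_lip': 2, '3d': 1}
print(total)  # 7200
```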
2. Generate Video (with Audio) Hourly Package - Training-Free Version
This version synthesizes videos directly, without requiring a trained avatar.
Avatar Type
Feature Description
Price
2D small sample without training (video footage)
Video Broadcast Synthesis Hourly Package - No Training Version 10 hr
4,500 USD/each (10 hr)
2D small sample without training (photo materials)
Video Broadcast Synthesis Hourly Package - No Training Version 1 hr
2,000 USD/each (1 hr)
Note:
2D small sample without training (video footage): By default, a user can submit 5 video production tasks at the same time, shared between the main account and sub-accounts. The underlying concurrent resources are shared among all customers, so tasks may have to wait in a queue.
2D small sample without training (photo materials): By default, a user can submit 1 video production task at the same time, shared between the main account and sub-accounts. The underlying concurrent resources are shared among all customers, so tasks may have to wait in a queue.
Purchased training-free concurrency is exclusive to the customer and requires no additional purchase. For example, a customer who purchases 2 concurrent channels can submit 2 training-free video production tasks at the same time.
Hourly packages and concurrency purchased for video footage cannot be used for photo materials, and vice versa.
3. Audio Generation Service - Hourly Package
Renewal Type of Hourly Package
Feature Description
Price
Common audio broadcast synthesis
Limited to Cloud Services, Lease/Clone Digital Human Voice Audio Generation Duration: 1 Hour Package
10 USD/each
4. Broadcast concurrency
Avatar Type
Feature Description
Price
2D small sample - general lip movement
Support 2D small sample - general lip movement, with a maximum resolution of 1080p.
500 USD/month/channel
2D small sample - exclusive lip shape
Support 2D small sample - exclusive lip shape, with a maximum resolution of 1080p.
500 USD/month/channel
2D small sample photo
Support 2D small sample photos, with a maximum resolution of 1080p.
500 USD/month/channel
3D
Supports 3D with a max resolution of 1080p.
800 USD/month/channel
2D small sample without training (video footage)
Supports 2D small sample without training (video footage), with a max resolution of 1080p.
Note: This concurrency cannot be used interchangeably with the photo materials version. It can replace the corresponding hourly package.
2,000 USD/channel/month
2D small sample without training (photo materials)
Supports 2D small sample without training (photo materials), with a max resolution of 1080p.
Note: This concurrency cannot be used interchangeably with the video footage version. It can replace the corresponding hourly package.
2,000 USD/channel/month
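Because training-free concurrency can replace the corresponding hourly package, it may help to compare the two on price. The sketch below computes the monthly generated-video duration at which one concurrency channel costs the same as buying hourly packages. It is a rough comparison only: it assumes package hours can be valued at a flat per-hour rate and ignores how much video a single channel can actually generate in a month, which this guide does not specify. The function name is hypothetical.

```python
def breakeven_hours_per_month(package_price_usd, package_hours,
                              channel_price_usd_per_month):
    """Monthly video hours at which one channel costs the same as packages."""
    hourly_rate = package_price_usd / package_hours
    return channel_price_usd_per_month / hourly_rate


# Video footage: 4,500 USD per 10 hr package vs 2,000 USD/channel/month
print(round(breakeven_hours_per_month(4500, 10, 2000), 2))  # 4.44

# Photo materials: 2,000 USD per 1 hr package vs 2,000 USD/channel/month
print(breakeven_hours_per_month(2000, 1, 2000))  # 1.0
```

Under these assumptions, a concurrency channel becomes the cheaper option once monthly output exceeds roughly 4.4 hours for video footage or 1 hour for photo materials.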

Interaction Service (Cloud Rendered Conversation Drive Concurrency)

This refers to the avatar's voice interaction capability, commonly used in intelligent customer service, digital human live streaming, and similar scenarios. The service is sold as interactive concurrency, that is, the number of concurrent online interactions and streams established. Interactive concurrency is provided separately for each avatar type, and concurrency for different avatar types cannot be mixed.
Avatar Type
Feature Description
Price
2D small sample - general lip movement
Support 2D small sample - general lip movement, with a maximum resolution of 1080p.
500 USD/month/channel
2D small sample - exclusive lip shape
Support 2D small sample - exclusive lip shape, with a maximum resolution of 1080p.
500 USD/month/channel
2D small sample photo
Support 2D small sample photos, with a maximum resolution of 1080p.
500 USD/month/channel
3D
Supports 3D with a max resolution of 1080p.
800 USD/month/channel

Interaction Service (Client Rendering Scene)

Client-side rendering mode
In this mode, the client-side rendering SDK must be used.
Note:
The billing logic for 2D and 3D is different, so check carefully. For 2D, you only need to purchase an annual authorization package. For 3D, you must purchase both a permanent authorization package and a session driver usage package.
Avatar Type
Product Content
Feature Description
Price
2D
2D client-side rendering SDK annual authorization package - per device
Supports digital humans of the general lip movement, exclusive lip shape, and photo types. Licensed per device.
1,200 USD/year
2D client-side rendering SDK annual authorization package - per app
Supports digital humans of the general lip movement, exclusive lip shape, and photo types. Licensed per application, including iOS and Android, with no limit on the number of users.
150,000 USD/year
3D
3D client-side rendering SDK license for H5
Authorizes the SDK. Purchase once for lifetime use. The H5 version of the SDK requires a WebGL avatar. Required. (Offline purchase, includes 1 year of maintenance.)
34,287 USD/each
3D client-side rendered conversation usage package
Used to meter 3D SDK API calls. Each package supports 1 million calls, counted in GBK encoding (40 bytes = 1 call, that is, 20 Chinese characters = 1 call). Valid for one year from the date of purchase. Required.
2,000 USD/each
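The GBK-based metering of the 3D conversation usage package can be estimated as follows. This is a sketch under assumptions: the guide states only that 40 GBK bytes equal 1 call, so rounding partial units up to a whole call is an assumption, and the function names are hypothetical rather than part of the SDK.

```python
import math

BYTES_PER_CALL = 40            # from the table: 40 GBK bytes = 1 call
CALLS_PER_PACKAGE = 1_000_000  # from the table: 1 million calls per package


def calls_consumed(text):
    """Estimate calls consumed by one piece of text.

    Assumes a partial 40-byte unit is rounded up to a full call.
    Raises UnicodeEncodeError if the text contains characters that
    cannot be encoded in GBK.
    """
    n_bytes = len(text.encode("gbk"))
    return math.ceil(n_bytes / BYTES_PER_CALL)


def packages_needed(total_calls):
    """Number of usage packages required to cover total_calls."""
    return math.ceil(total_calls / CALLS_PER_PACKAGE)


# 20 Chinese characters occupy 40 bytes in GBK, so they count as 1 call.
print(calls_consumed("你好" * 10))  # 1
print(packages_needed(2_500_000))   # 3
```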

Private Service

If you need to purchase private services, please contact your business manager for a quote.

