2D/3D Avatar | Avatar customization | Customize your exclusive Digital Human avatar. For 3D, an additional purchase of the cloud-driven engine is required, while for 2D it's not necessary. Choose either customization or leasing for purchase. | | |
Renewal of customized Avatar | A customized avatar is valid for 1 year by default. Purchase this service to keep using a customized avatar after it expires. A leased avatar that has expired can simply be leased again; it does not need this renewal service. | | | |
Application scenarios | Conversation interaction | Cloud Rendering | The avatar is rendered in the cloud and then pushed to the terminal for real-time display. You need to purchase "Cloud Rendering Session Driver Concurrency". API and SDK are supported. It is mutually exclusive with local rendering. | |
| Conversation interaction | Local Rendering | The avatar is rendered and displayed locally on the terminal. The cloud service is only responsible for pushing conversation content. You need to purchase the "Local Rendering Session Driver Usage Package" or a terminal-authorized license. API and SDK are supported. It is mutually exclusive with cloud rendering. | |
| Audio-video broadcasting | Generate video (including audio) | Generate a video using a specified virtual avatar and voice, following a preset text. You need to purchase the "Video Broadcast Synthesis Hour Package" (which includes audio synthesis capabilities). | |
| | Generate audio only | Generate audio with the specified voice tone according to the preset text. You need to purchase the "Audio Broadcast Synthesis Hour Package" (required only when generating audio alone). | |
| | Concurrent audio-video broadcasting | Increase the number of concurrent channels to generate videos or audio faster, without affecting the generated results. Optional purchase. | |
Voice Customization | Voice replication | Train and generate a specified voice timbre through the provided voice materials, which can be used in application scenarios. | | |
Renewal of customized voice | The replicated voice has a default validity period of one year. This service is specifically for purchase and use after the replicated voice expires. | | | |
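The purchase rules in the table above can be sketched as a small lookup. This is only an illustrative sketch: the package names are quoted from the table, while the scenario keys and the `required_packages` helper are hypothetical names introduced here, not part of any product API.

```python
# Package names are quoted from the table above.
# The scenario keys and this helper are hypothetical, for illustration only.
REQUIRED_PACKAGES = {
    "cloud_rendering": "Cloud Rendering Session Driver Concurrency",
    "local_rendering": "Local Rendering Session Driver Usage Package or terminal-authorized license",
    "video_broadcast": "Video Broadcast Synthesis Hour Package",
    "audio_broadcast": "Audio Broadcast Synthesis Hour Package",
}


def required_packages(scenarios):
    """Return the packages to purchase for the chosen scenarios."""
    chosen = list(scenarios)
    # The table states that cloud and local rendering are mutually exclusive.
    if "cloud_rendering" in chosen and "local_rendering" in chosen:
        raise ValueError("cloud rendering and local rendering are mutually exclusive")
    return [REQUIRED_PACKAGES[name] for name in chosen]


if __name__ == "__main__":
    for package in required_packages(["cloud_rendering", "video_broadcast"]):
        print(package)
```

Concurrent audio-video broadcasting is left out of the lookup because the table lists it as an optional purchase.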
Avatar Type | Definition | Use Cases | Example |
2D Premium | Trained for about two weeks on motion footage recorded in a professional studio, producing a digital human for broadcasting and interactive scenarios. A Premium avatar lets you insert specified actions at any point in the text, giving a wider variety of motions. | Suitable for finance and media customers with specific requirements for the digital human's appearance and actions. | ![]() |
2D small sample - general lip movement | Trains a digital human from real-person video footage. The digital human's appearance matches that of the real person. Lip movement uses generic lips and teeth generated by a large model, so the requirements for the training footage are lower. Refer to Image Recording Guide - General Lip Movement. | Suitable for customers who have no specific requirements for the digital human's lip shape and lack good shooting conditions. | ![]() |
2D small sample - exclusive lip shape | Trains a digital human from real-person video footage. The digital human's appearance matches that of the real person. Lip movement uses the real person's own lips and teeth. The training footage must contain no other voices or obvious ambient sound. Refer to Avatar Recording Guide - Exclusive Lip Movement. | Suitable for customers who need faithful replication of the real person and have good shooting conditions. | |
2D small sample - high-precision version | Trains a digital human from 4K real-person video footage. The footage requirements and the final lip and teeth effect are the same as for the 2D small sample (exclusive lip shape), and the output resolution of the digital human is raised to 4K. Refer to Avatar Recording Guide - High-Precision Version. | Suitable for large conferences, face-to-face dialogues, product launch events, and large-screen scenarios. | |
2D small sample - photo avatar | An avatar can be trained through a photo. This version features low price and high speed. Generally, it can be used 10 minutes after material submission. | Suitable for general internet and entertainment scenarios. | ![]() |
3D Cartoon | Facial features, hairstyle, clothing, accessories, and so on are designed to the customer's requirements to produce the concept art. After the customer reviews and approves the final design, the model is built. After rigging, rendering, and UE optimization, the result is a digital human that covers interactive and broadcasting scenarios. | Suitable for customers who already have a 2D mascot and want to upgrade it to a 3D avatar that serves users. | ![]() |
3D Semi-Realistic | Facial features, hairstyle, clothing, accessories, and so on are designed to the customer's requirements to produce the concept art. After the customer reviews and approves the final design, the model is built. After rigging, rendering, and UE optimization, the result is a digital human that covers interactive and broadcasting scenarios. | Suitable for scenarios that need a certain degree of realism but have lower precision requirements, such as news reading and mobile smart customer service. | ![]() |
3D Realistic | Facial features, hairstyle, clothing, accessories, and so on are designed to the customer's requirements to produce the concept art. After the customer reviews and approves the final design, the model is built. After rigging, rendering, and UE optimization, the result is a digital human that covers interactive and broadcasting scenarios. | Suitable for scenarios that need a high degree of realism and high-precision display, such as brand promotion and large-screen interaction. | ![]() |
| 2D Small Sample - General Lip Movement | 2D Small Sample - Exclusive Lip Shape | 2D Small Sample - High-Precision Version | 2D Small Sample - Photo-Based Digital Human |
Recording requirements | Record a video of at least 60 s. There is no requirement for the audio track. | Record a video of at least 180 s. The recording environment must be quiet, and only the subject's voice may be recorded. | Same recording standard as the exclusive lip shape. The video resolution must be 4K. | Only a clear frontal portrait of the individual is required. |
Delivery cycle | A demo is delivered within 1 day for the customer to confirm the effect. The avatar can be used after the customer clicks to confirm. | A demo is delivered within 2 days for the customer to confirm the effect. The avatar can be used after the customer clicks to confirm. | A demo is delivered within 3 days for the customer to confirm the effect. The avatar can be used after the customer clicks to confirm. | Available within 10 minutes |
Finished product effect | The general version uses lips and teeth generated by the big data model. | The exclusive version captures the person's own lip movement and has better facial clarity. | Builds on the exclusive lip shape effect and outputs 4K resolution for a clearer picture. | The photo avatar uses lips and teeth generated by the big data model, and the body stays still, without even slight movement. |
General lip movement vs Exclusive lip shape | ![]() | | | |
General lip movement vs Photo avatar | ![]() | | | |
Exclusive lip shape vs High-precision version | ![]() | | | |
Avatar Type | Feature Description | Price |
2D small sample - Instant Lip-Sync | Lease an avatar from the 2D small sample - Instant Lip-Sync avatar library. The lease is non-exclusive, and the avatar can be driven by text or original sound. Basic actions depend on the selected avatar. A voice timbre is included. | 25 USD/each/month |
2D small sample - Studio Lip-Sync | Lease an avatar from the 2D small sample - Studio Lip-Sync avatar library. The lease is non-exclusive, and the avatar can be driven by text or original sound. Basic actions depend on the selected avatar. A voice timbre is included. | 60 USD/each/month |
3D | Lease an avatar from the 3D avatar library. The lease is non-exclusive, and the avatar can be driven by text or original sound. Basic actions depend on the selected avatar. A voice timbre is included. | 5,358 USD/each/month |
Avatar Type | Feature Description | Price |
2D small sample - general lip movement | Limited to cloud services. Driven by text or original sound. With 1 minute of video footage, you can customize an AI Digital Human with 1 default timbre. Clothing style, pose, and motion follow the video footage. Background replacement is supported only when the footage has a solid-color green-screen background. | 200 USD/each |
2D small sample - exclusive lip shape | Driven by text or original sound. With one piece of 3-minute video footage, you can customize a broadcasting digital human with one default timbre. Clothing style, pose, and motion follow the video footage. Background replacement is supported only when the footage has a solid-color green-screen background. | 1,000 USD/each |
2D small sample photo | Driven by text or original sound. The avatar is trained from a single photo. It is cost-effective and quick to customize. | 2.5 USD/each |
3D | Available for both cloud services and private use. Supports driving by text, sound, or single-camera video. Starting from the default 3D character (refer to YoYo Character Body Template), the face shape, hairstyle, clothing, and motion are customized as specified. The complete model set includes 1 face shape, 1 hairstyle, 1 clothing set, and an action library of 8 actions. If additional hairstyles, clothing, motions, or expressions are required, add the extra items to the cart. The supporting asset accuracy of 3D realistic avatars is Grade S. | Contact us for a quote |
3D cloud-based driver engine | For 3D avatar assets that the customer already owns and that meet the driver specification. Driven by text or original sound. | 137,500 USD/each |
Category | Feature Description | Price |
Voice Reproduce (VRS) - Ultra-fast Version | Provide audio that is only seconds long to obtain an exclusive AI-customized timbre within 10 minutes. Mainly used with photo avatars and usable immediately. See Voice Clone Recording Guide - Ultra-fast Version. The generated avatar image is permanently valid. | 2.5 USD/each |
Voice Reproduce (VRS) - Ultra-fast Version (Minority Language) | 50 USD/each |
Avatar Type | Feature Description | Price |
2D Avatar - Studio Lip-Sync | Supports on-shelf service renewal for 2D small sample Studio Lip-Sync custom avatars. | 18 USD/each/month |
2D Avatar - Instant Lip-Sync | Supports on-shelf service renewal for 2D small sample Instant Lip-Sync custom avatars. | 2 USD/each/month |
3D Avatar | Supports on-shelf service renewal for custom avatars in styles including 3D cartoon, 3D semi-realistic, and 3D realistic. | 84 USD/each/month |
Ultra-fast Voice (Minority Language) | Supports on-shelf service renewal for the Ultra-fast Voice (Minority Language) edition of Voice Clone. | 4 USD/each/month |
Avatar Type | Feature Description | Price |
2D small sample - general lip movement | Lease/Clone Digital Human Voice and Video Generation Duration: 10 Hours | 1,800 USD/each |
2D small sample - exclusive lip shape | Lease/Clone Digital Human Voice and Video Generation Duration: 10 Hours | 1,800 USD/each |
2D small sample photo avatar | Lease/Clone Digital Human Voice and Video Generation Duration: 10 Hours | 1,800 USD/each |
3D | Lease/Clone Digital Human Voice and Video Generation Duration: 10 Hours | 3,600 USD/each |
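As a rough illustration of how the 10-hour packages above add up, the following sketch estimates how many packages a planned amount of video generation would need. The prices and package size come from the table. The assumptions that packages are sold only in whole units and that unused hours carry over between packages are mine and should be checked against the actual billing rules. The function names are hypothetical.

```python
import math

# Values from the table above: each package covers 10 hours of generation.
PACKAGE_HOURS = 10
PACKAGE_PRICE_USD = {"2d": 1800, "3d": 3600}


def packages_needed(hours: float) -> int:
    """Whole packages needed to cover `hours` (assumes no partial purchases)."""
    if hours <= 0:
        return 0
    return math.ceil(hours / PACKAGE_HOURS)


def estimated_cost_usd(avatar_kind: str, hours: float) -> int:
    """Estimated spend in USD for `hours` of generation with a 2D or 3D avatar."""
    return packages_needed(hours) * PACKAGE_PRICE_USD[avatar_kind]


if __name__ == "__main__":
    # 25 hours of 2D video needs 3 packages under these assumptions.
    print(packages_needed(25), estimated_cost_usd("2d", 25))
```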
Avatar Type | Feature Description | Price |
2D small sample without training (video footage) | Video Broadcast Synthesis Hourly Package - No Training Version 10 hr | 4,500 USD/each (10 hr) |
2D small sample without training (photo materials) | Video Broadcast Synthesis Hourly Package - No Training Version 1 hr | 2,000 USD/each (1 hr) |
Renewal Type of Hourly Package | Feature Description | Price |
Common audio broadcast synthesis | Limited to Cloud Services, Lease/Clone Digital Human Voice Audio Generation Duration: 1 Hour Package | 10 USD/each |
Avatar Type | Feature Description | Price |
2D small sample - general lip movement | Support 2D small sample - general lip movement, with a maximum resolution of 1080p. | 500 USD/month/channel |
2D small sample - exclusive lip shape | Support 2D small sample - exclusive lip shape, with a maximum resolution of 1080p. | 500 USD/month/channel |
2D small sample photo | Support 2D small sample photos, with a maximum resolution of 1080p. | 500 USD/month/channel |
3D | Supports 3D with a max resolution of 1080p. | 800 USD/month/channel |
2D small sample without training (video footage) | Supports 2D small sample without training (video footage), with a max resolution of 1080p. Note: Cannot be used interchangeably with photo materials and can replace the corresponding hourly package. | 2,000 USD/channel/month |
2D small sample without training (photo materials) | Supports 2D small sample without training (photo materials), with a max resolution of 1080p. Note: Cannot be used interchangeably with video footage and can replace the corresponding hourly package. | 2,000 USD/channel/month |
Avatar Type | Feature Description | Price |
2D small sample - general lip movement | Support 2D small sample - general lip movement, with a maximum resolution of 1080p. | 500 USD/month/channel |
2D small sample - exclusive lip shape | Support 2D small sample - exclusive lip shape, with a maximum resolution of 1080p. | 500 USD/month/channel |
2D small sample photo | Support 2D small sample photos, with a maximum resolution of 1080p. | 500 USD/month/channel |
3D | Supports 3D with a max resolution of 1080p. | 800 USD/month/channel |
Avatar Type | Product Content | Feature Description | Price |
2D | 2D client-side rendering SDK annual authorization package - per device | Supports digital humans of the general lip movement, exclusive lip shape, and photo types. Licensed per device. | 1,200 USD/year |
| 2D client-side rendering SDK annual authorization package - per app | Supports digital humans of the general lip movement, exclusive lip shape, and photo types. Licensed per application, covering iOS and Android, with no limit on the number of users. | 150,000 USD/year |
3D | 3D client-side rendering SDK license for H5 | A one-time purchase that authorizes the SDK for lifetime use. The H5 SDK requires a WebGL avatar. Required. (Offline purchase, includes 1-year maintenance.) | 34,287 USD/each |
| 3D client-side rendered conversation usage package | Meters 3D SDK API calls. Each package covers 1 million calls, counted in GBK encoding (40 bytes = 1 call, that is, 20 Chinese characters = 1 call). Valid for one year from the date of purchase. Required. | 2,000 USD/each |
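The GBK-based metering for the 3D conversation usage package can be estimated with a short sketch. The 40-byte unit and the 1 million calls per package come from the table. Rounding a partial unit up to a full call and replacing characters that GBK cannot encode are assumptions made here, and the function names are hypothetical, so actual billing may differ.

```python
BYTES_PER_CALL = 40            # from the table: 40 GBK bytes = 1 call
CALLS_PER_PACKAGE = 1_000_000  # from the table: 1 million calls per package


def calls_for_text(text: str) -> int:
    """Estimate the calls consumed by one piece of conversation text."""
    # Assumption: characters GBK cannot represent are replaced by "?" (1 byte).
    size = len(text.encode("gbk", errors="replace"))
    # Assumption: a partial 40-byte unit is rounded up to a full call.
    return -(-size // BYTES_PER_CALL)


def packages_for_calls(total_calls: int) -> int:
    """Whole usage packages needed to cover `total_calls`."""
    return -(-total_calls // CALLS_PER_PACKAGE)


if __name__ == "__main__":
    sample = "你好" * 10  # 20 Chinese characters, 2 bytes each in GBK
    print(calls_for_text(sample))
```

Because ASCII characters take 1 byte in GBK and Chinese characters take 2, mixed-language text consumes calls at a rate between the two.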