Tencent's in-house ASR engine is natively integrated into the TRTC platform and reads audio directly from TRTC's real-time audio pipeline, which keeps recognition latency low. Built-in audio processing, including AI noise suppression, echo cancellation, and customizable conversation modes, helps keep transcription clear in noisy environments. The engine framework supports a range of models covering Chinese, English, Cantonese, and mixed-language scenarios, all configurable through STTConfig fields with no additional service accounts required. It suits teams that want the fastest integration path with no external dependencies.
Usage
To use Tencent ASR as the STT engine, pass the following JSON in the STTConfig field of the StartAIConversation API:
{
  "Language": "zh",
  "VadSilenceTime": 1000
}
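As a rough illustration of where this object sits in a request, the following Python sketch assembles a request body with the STTConfig shown above. The helper names are hypothetical, and the surrounding fields (SdkAppId, RoomId) and their values are placeholders assumed for illustration. Nesting STTConfig as a JSON object is also an assumption based on the example above. Check the StartAIConversation API reference for the actual required parameters.

```python
import json


def build_stt_config(language="zh", vad_silence_ms=1000):
    """Return an STTConfig object for Tencent ASR using the two fields shown above."""
    return {"Language": language, "VadSilenceTime": vad_silence_ms}


def build_request_body(sdk_app_id, room_id, stt_config):
    """Assemble a request body (hypothetical helper).

    SdkAppId and RoomId are assumed placeholder fields; the real request
    may need further parameters that are not covered here.
    """
    return {
        "SdkAppId": sdk_app_id,
        "RoomId": room_id,
        "STTConfig": stt_config,
    }


# Placeholder values for illustration only.
body = build_request_body(1400000000, "demo-room", build_stt_config())
print(json.dumps(body, indent=2))
```

No CustomParam entry appears in the config, since the built-in engine is configured entirely through top-level STTConfig fields.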
Built-in provider:
Tencent ASR is TRTC's built-in speech recognition engine. Unlike third-party providers (Azure, Deepgram, Soniox), it does not require the CustomParam field; configure only the top-level STTConfig fields listed below.
Parameter reference
The following fields are part of STTConfig. For the full definition, see STTConfig.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Language | String | No | Primary language code for recognition (e.g., "zh", "en"). See STTConfig. |
| VadSilenceTime | Integer | No | VAD silence duration in milliseconds. When silence exceeds this value, the current speech segment ends. See STTConfig. |
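A client-side check can catch type mistakes in these two fields before a request is sent. The sketch below is a hypothetical helper, not part of any TRTC SDK. It checks only the types stated above and additionally assumes that VadSilenceTime should be positive; the valid range, if any, is defined in the STTConfig reference and is not enforced here.

```python
def validate_stt_config(cfg):
    """Return a list of problems found in an STTConfig dict (empty if none).

    Both fields are optional, so a missing key is not an error.
    Other STTConfig fields are ignored by this sketch.
    """
    errors = []

    if "Language" in cfg and not isinstance(cfg["Language"], str):
        errors.append("Language must be a string, e.g. 'zh' or 'en'")

    if "VadSilenceTime" in cfg:
        vad = cfg["VadSilenceTime"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(vad, bool) or not isinstance(vad, int):
            errors.append("VadSilenceTime must be an integer (milliseconds)")
        elif vad <= 0:
            # Assumption: a silence duration only makes sense when positive.
            errors.append("VadSilenceTime must be positive")

    return errors


print(validate_stt_config({"Language": "zh", "VadSilenceTime": 1000}))
print(validate_stt_config({"Language": "zh", "VadSilenceTime": "1000"}))
```

The first call returns an empty list. The second reports the string value, a common slip when the number is quoted while the JSON is written by hand.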