AI Interviewing
Last updated: 2025-12-10 11:10:54

Scenario Introduction

AI interviewing is an innovative solution that leverages artificial intelligence (AI) and high-quality real-time audio and video communication to enable automated online interviews. In traditional interview processes, limitations such as the number of interviewers, scheduling constraints, and subjective evaluations often lead to inefficiencies, high costs, and poor experiences for enterprises during large-scale recruitment and talent screening. With Conversational AI capabilities, AI interviewing provides enterprises and candidates with a 24/7 available, standardized, and efficient online interview experience. AI interviewers, powered by large language models (LLMs), can engage in natural conversations with candidates, ask and follow up on questions in real time, assess the comprehensive abilities of candidates, and automatically record and organize textual data from interviews to support subsequent evaluations.
Tencent RTC serves as the foundational support, delivering stable, high-quality, and low-latency audio and video communication for AI interview scenarios. The cross-platform and global interoperability advantages of Tencent RTC allow interviewees to confidently use any terminal device, including iOS, Android, Windows, Mac, Web, and WeChat/QQ mini programs, to participate in AI interviews anytime, anywhere. Whether for initial screening or in-depth QA, RTC Engine ensures a clear and smooth communication process, offering a user experience comparable to native applications.
For enterprise developers, RTC Engine provides a rich set of scenario-based UI components and low-barrier development and integration capabilities, enabling rapid integration and deployment with just a few lines of code. Whether deploying AI interview entry points in proprietary apps, public accounts, or mini programs, enterprises can easily implement their solutions.

Implementation Solution

A comprehensive AI interview scenario typically includes the following key modules: Real-Time Communication (RTC), Conversational AI, a large language model (LLM), Text To Speech (TTS), Chat, and an interview management backend. Below are the features and characteristics of each module in AI interview scenarios:
Feature Module
Application in AI Interview Scenarios
RTC
Powered by RTC Engine, RTC offers high-quality, low-latency audio and video connections and supports 720P/1080P/2K HD video and 48 kHz high-fidelity audio. Smooth interaction is ensured even in poor network environments, simulating a real interview setting.
Conversational AI
Tencent Conversational AI enables businesses to flexibly connect with multiple large language models and build real-time audio and video interactions between AI and users. Powered by the global low-latency transmission of Tencent Real-Time Communication (Tencent RTC), Conversational AI delivers natural and realistic conversation effects and is convenient to integrate, ready to use out of the box.
LLM
LLMs can intelligently understand candidates' speech content and context, accurately extract key points from responses, dynamically generate follow-up interview questions, and enable personalized and structured interview processes. LLM technology can also automatically adjust scoring criteria based on the requirements of different positions, improving the fairness and accuracy of evaluations.
TTS
Integration with third-party TTS services is supported, offering output in multiple languages and voice styles. AI interviewers can use TTS technology to present different tones and personalities, closely simulating real interviewers and enhancing the candidate experience.
Chat
Key business signaling is transmitted via Chat.
Interview Management Backend
Supported capabilities include question bank and interview design, automated scoring, data storage, visual analysis, and interview schedule management.

Solution Architecture



The following introduces the main process of AI interview integration.

Prerequisites

Preparing LLM

Conversational AI supports any LLM that complies with the OpenAI standard protocol, as well as LLM application development platforms such as Tencent Cloud Agent Development Platform (ADP), Dify, and Coze. For supported platforms, see Large Language Model Configuration.

Preparing TTS

Use Tencent Cloud TTS:
To use the TTS feature, you need to activate the TTS service for your application.
Go to Account Information to obtain the AppID.
Go to API Key Management to obtain the SecretId and SecretKey. SecretKey can only be viewed during key creation. Please save SecretKey promptly.
Go to Timbre List to view the available timbres.
Use third-party or custom TTS: For currently supported TTS configurations, see Text-to-Speech Configuration.

Preparing RTC Engine

Note:
Conversational AI calls may incur usage fees. For more information, see Billing of Conversational AI Services.

Integration Steps

Business Flowchart



Step 1: Importing the RTC Engine SDK into the Project and Entering the RTC Engine Room

Step 2: Releasing the Audio Stream

Android&iOS&Flutter
Web&H5
Windows
Mac
You can call startLocalAudio to enable microphone capture. When calling this API, set the quality parameter to determine the capture mode. Although the parameter is named quality, a higher value does not always mean better audio quality; the parameter actually describes the usage scene, and each business scenario has an optimal setting.
SPEECH mode is recommended for AI conversation scenarios. In this mode, the audio module of the SDK focuses on refining voice signals and filtering out ambient noise as much as possible. In addition, this mode can ensure the quality of audio data in environments with poor network quality. Therefore, this mode is particularly suitable for scenarios that focus on vocal communication, such as "video calls" and "online meetings".
Android
iOS&Mac
Flutter
// Enable capture via microphone and set the mode to SPEECH mode (strong denoising capability and resistance to poor network conditions).
mCloud.startLocalAudio(TRTCCloudDef.TRTC_AUDIO_QUALITY_SPEECH);
self.trtcCloud = [TRTCCloud sharedInstance];
// Enable capture via microphone and set the mode to SPEECH mode (strong denoising capability and resistance to poor network conditions).
[self.trtcCloud startLocalAudio:TRTCAudioQualitySpeech];
// Enable capture via microphone and set the mode to SPEECH mode (strong denoising capability and resistance to poor network conditions).
trtcCloud.startLocalAudio(TRTCAudioQuality.speech);
Use the trtc.startLocalAudio() method to enable the microphone and release the audio stream to the room.
await trtc.startLocalAudio();
Call startLocalAudio to enable capture via microphone. SPEECH mode is recommended for AI conversation scenarios.
// Enable capture via microphone and set the mode to SPEECH mode.
// Provide strong denoising capability and resistance to poor network conditions.
ITRTCCloud* trtcCloud = CRTCWindowsApp::GetInstance()->trtc_cloud_;
trtcCloud->startLocalAudio(TRTCAudioQualitySpeech);
Call startLocalAudio to enable capture via microphone. SPEECH mode is recommended for AI conversation scenarios.
// Enable capture via microphone and set the mode to SPEECH mode.
// Provide strong denoising capability and resistance to poor network conditions.
AppDelegate *appDelegate = (AppDelegate *)[[NSApplication sharedApplication] delegate];
[appDelegate.trtcCloud startLocalAudio:TRTCAudioQualitySpeech];

Step 3: Initiating an AI Conversation

Starting an AI Conversation: StartAIConversation
Call the StartAIConversation API in the business backend to initiate a real-time AI conversation. After the call succeeds, the AI chatbot will enter the RTC Engine room. Fill in the relevant LLM and TTS information from Prerequisites into LLMConfig and TTSConfig.
LLMConfig
TTSConfig
The following describes how to configure LLMConfig by using an LLM that follows the OpenAI standard protocol as an example.
Configuration Descriptions
Name
Type
Required
Description
LLMType
String
Yes
LLM type. Fill in openai for LLMs that comply with the OpenAI API protocol.
Model
String
Yes
Specific LLM name. For example, gpt-4o and deepseek-chat.
APIKey
String
Yes
LLM APIKey.
APIUrl
String
Yes
LLM APIUrl.
Streaming
Boolean
No
Whether streaming transmission is used. Default value: false. Recommended value: true.
SystemPrompt
String
No
System prompt.
Timeout
Float
No
Timeout period, in seconds. Value range: 1-50. Default value: 3.
History
Integer
No
Set the context rounds for LLM. Default value: 0 (No context management is provided). Maximum value: 50 (Context management is provided for the most recent 50 rounds).
MaxTokens
Integer
No
Maximum token limit for output text.
Temperature
Float
No
Sampling temperature.
TopP
Float
No
Selection range for sampling. This parameter controls the diversity of output tokens.
UserMessages
Object[]
No
User prompt.
MetaInfo
Object
No
Custom parameters. These parameters will be contained in the request body and passed to the LLM.
Configuration Example
"LLMConfig": {
    "LLMType": "openai",
    "Model": "gpt-4o",
    "APIKey": "api-key",
    "APIUrl": "https://api.openai.com/v1/chat/completions",
    "Streaming": true,
    "SystemPrompt": "You are a personal assistant",
    "Timeout": 3.0,
    "History": 5,
    "MetaInfo": {},
    "MaxTokens": 4096,
    "Temperature": 0.8,
    "TopP": 0.8,
    "UserMessages": [
        {
            "Role": "user",
            "Content": "content"
        },
        {
            "Role": "assistant",
            "Content": "content"
        }
    ]
}
The following describes how to configure TTSConfig by using Tencent TTS as an example.
{
    "TTSType": "tencent", // TTS type in String format. Valid values: "tencent" and "minimax". Other vendors will be supported in future versions.
    "AppId": Your application ID, // Required. Integer value.
    "SecretId": "Your key ID", // Required. String value.
    "SecretKey": "Your key", // Required. String value.
    "VoiceType": 101001, // Required. Timbre ID in integer format. Standard timbre and premium timbre are supported. The premium timbre is more realistic, and its price differs from that of the standard timbre. See TTS Billing Overview for details. For the complete list of timbre IDs, see TTS Timbre List.
    "Speed": 1.25, // Optional. Speech speed. Value range: [-2, 6]. Different values correspond to different speech speeds. -2: 0.6x; -1: 0.8x; 0: 1.0x (default value); 1: 1.2x; 2: 1.5x; 6: 2.5x. If you need a more fine-grained speech speed, the value can be accurate to 2 decimal places, such as 0.5, 1.25, and 2.81. For the conversion between the parameter value and actual speech speed, see Speech Speed Conversion.
    "Volume": 5, // Optional. Volume level in integer format. Value range: [0, 10]. The valid values correspond to 11 volume levels. The default value is 0, representing the normal volume.
    "PrimaryLanguage": 1, // Optional. Primary language in integer format. 1: Chinese (default value); 2: English; 3: Japanese.
    "FastVoiceType": "xxxx", // Optional. Fast voice cloning parameter in String format.
    "EmotionCategory": "angry", // Optional. String value. This parameter controls the emotion of the synthesized audio and is only available for multi-emotion timbres. Example values: neutral and sad.
    "EmotionIntensity": 150 // Optional. Integer value. This parameter controls the emotion intensity of the synthesized audio. Value range: [50, 200]. Default value: 100. This parameter takes effect only when EmotionCategory is not empty.
}
For descriptions of all currently supported STTConfig, LLMConfig, and TTSConfig options, see Large Language Model Configuration and Text-to-Speech Configuration.
Note:
The value of RoomId should be consistent with that of RoomId used by the client for room entry. The room ID types (number/string) should also be the same. This means that the chatbot and user need to be in the same room.
The value of TargetUserId should be consistent with that of UserId used by the client for room entry.
The values of LLMConfig and TTSConfig are JSON strings and should be properly configured before you can successfully initiate a real-time AI conversation.
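As an illustration only, a minimal server-side sketch of this step in Python might look like the following. It assumes that your tencentcloud-sdk-python version already exposes the StartAIConversation action and that the chatbot's UserId/UserSig fields inside AgentConfig are named as shown; all placeholder values are hypothetical, so verify the exact field names against the StartAIConversation API reference before use.
import json

from tencentcloud.common import credential
from tencentcloud.trtc.v20190722 import trtc_client, models

# Hypothetical placeholder values; replace them with your own.
cred = credential.Credential("SecretId", "SecretKey")
client = trtc_client.TrtcClient(cred, "ap-singapore")

llm_config = {
    "LLMType": "openai",
    "Model": "gpt-4o",
    "APIKey": "api-key",
    "APIUrl": "https://api.openai.com/v1/chat/completions",
    "Streaming": True,
    "SystemPrompt": "You are an AI interviewer",
    "Timeout": 3.0,
    "History": 5
}
tts_config = {
    "TTSType": "tencent",
    "AppId": 1300000000,
    "SecretId": "Your key ID",
    "SecretKey": "Your key",
    "VoiceType": 101001
}

req = models.StartAIConversationRequest()
req.from_json_string(json.dumps({
    "SdkAppId": 1400000000,
    # Must match the RoomId used by the client for room entry (same value and same type, number/string).
    "RoomId": "interview_room_001",
    "AgentConfig": {
        "UserId": "ai_interviewer",      # userid of the AI chatbot in the room
        "UserSig": "chatbot_usersig",
        # Must match the UserId used by the candidate's client for room entry.
        "TargetUserId": "candidate_001"
    },
    # LLMConfig and TTSConfig are passed as JSON strings.
    "LLMConfig": json.dumps(llm_config),
    "TTSConfig": json.dumps(tts_config)
}))
resp = client.StartAIConversation(req)
# The response contains the task identifier needed to stop the conversation in Step 5.
print(resp.to_json_string())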

Step 4: Starting a Conversation

At this point, the user can have a conversation with the AI chatbot.

Step 5: Stopping the AI Conversation and Exiting the RTC Engine Room

1. Stop the AI conversation task in the server. Call the StopAIConversation API in the business backend to terminate the conversation task.
2. Exit the RTC Engine room in the client. For details, see Exiting the Room.
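Continuing the Python sketch from Step 3 (again assuming your tencentcloud-sdk-python version includes this action and that the task identifier field is named TaskId), stopping the task might look like this:
import json

from tencentcloud.trtc.v20190722 import models

# "client" is the TrtcClient created in the Step 3 sketch; "task_id" is the
# task identifier returned by StartAIConversation.
req = models.StopAIConversationRequest()
req.from_json_string(json.dumps({"TaskId": task_id}))
client.StopAIConversation(req)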

Advanced Features

Far-Field Voice Suppression

During an AI interview, the AI chatbot may mistakenly recognize and respond to voices from other people on the user side as the user's speech. To avoid such cases as much as possible, we recommend enabling the far-field voice suppression capability. When calling the StartAIConversation API, you can set STTConfig.VadLevel to 2 or 3 for enhanced far-field voice suppression performance.
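For example, the relevant fragment of the StartAIConversation request could look like the sketch below (illustrative only; verify against the StartAIConversation API reference whether STTConfig is passed as a structured object or a JSON string):
# Fragment of the StartAIConversation request parameters.
start_params = {
    # ... SdkAppId, RoomId, AgentConfig, LLMConfig, TTSConfig as in Step 3 ...
    "STTConfig": {
        "VadLevel": 3   # 2 or 3 enables enhanced far-field voice suppression
    }
}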

Conversation Latency Optimization

In Conversational AI, the latency of AI responses primarily consists of the first-token duration of the LLM and TTS, the VadSilenceTime in Automatic Speech Recognition (ASR), and the RTC Engine channel latency.
With its self-developed multi-level optimal addressing algorithm, RTC Engine can perform scheduling across the entire network, achieving an average end-to-end latency of less than 300 ms. Compared with the first-token duration of the LLM and TTS, the latency of RTC Engine is extremely low. Therefore, developers generally do not need to pay attention to the latency of RTC Engine.
The duration of ASR is primarily determined by VadSilenceTime. If this value is set too high, conversation latency will increase. If this value is set too low, the interval for sentence segmentation in ASR will become too short. As a result, even a short pause in user speech may be misinterpreted as a complete sentence, prompting a request to the LLM.
The first-token duration of the LLM and TTS has the greatest impact on the latency of AI responses. Developers can obtain callbacks for the first-packet duration of the LLM and TTS through Conversational AI SDK Webhooks and AI Conversation Server Webhooks.

Metric Name Descriptions

Metric Name
Description
asr_latency
ASR latency. Note: This metric includes the time set by VadSilenceTime when Conversational AI is started.
llm_network_latency
Network latency of LLM requests.
llm_first_token
LLM first-token duration, including network latency.
tts_network_latency
Network latency of TTS requests.
tts_first_frame_latency
TTS first-frame duration, including network latency.
tts_discontinuity
Number of occurrences of TTS request discontinuity. Discontinuity indicates that no result is returned for the next request after the current TTS streaming request is completed. This is usually caused by high TTS latency.
interruption
This metric indicates that this round of conversation is interrupted.
The most important metrics are llm_first_token (LLM first-token duration) and tts_first_frame_latency (TTS first-frame duration).
llm_first_token
We recommend keeping the LLM first-token duration below 2 seconds, and ideally as low as possible. In voice conversation scenarios, we recommend enabling LLM streaming return (set Streaming in LLMConfig to true), which can significantly reduce latency. We do not recommend reasoning models such as DeepSeek-R1, because their latency is too high for voice conversations. If you are especially sensitive to conversation latency, you can choose models with a smaller parameter count, many of which can keep the first-token duration around 500 ms.
In addition, integrating some agents or workflow platforms may increase the first-token duration. Using an LLM with a prompt alone typically results in a lower first-token duration.
tts_first_frame_latency
The TTS first-frame duration typically ranges from 500 ms to 1000 ms. If the duration is particularly high, you can switch to a different timbre or TTS provider to optimize the conversation latency experience.

Receiving AI Conversation Subtitles and AI Status

You can use the Receive Custom Message feature of RTC Engine to listen for callbacks in the client to receive data such as real-time subtitles and AI status. The cmdID value is fixed at 1.

Receiving Real-Time Subtitles

Message format:
{
    "type": 10000, // 10000 indicates real-time subtitles are delivered.
    "sender": "user_a", // userid of the speaker.
    "receiver": [], // List of receiver userid. The message is actually broadcast within the room.
    "payload": {
        "text": "", // Text recognized by ASR.
        "start_time": "00:00:01", // Start time of a sentence.
        "end_time": "00:00:02", // End time of a sentence.
        "roundid": "xxxxx", // Unique identifier of a conversation round.
        "end": true // If the value is true, the sentence is a complete sentence.
    }
}

Receiving Chatbot Status

Message format:
{
    "type": 10001, // Chatbot status.
    "sender": "user_a", // userid of the sender, which is the chatbot ID in this case.
    "receiver": [], // List of receiver userid. The message is actually broadcast within the room.
    "payload": {
        "roundid": "xxx", // Unique identifier of a conversation round.
        "timestamp": 123,
        "state": 1 // 1: Listening; 2: Thinking; 3: Speaking; 4: Interrupted; 5: Finished speaking.
    }
}

Example code
Android
iOS&Mac
Web&H5
Windows
Flutter
@Override
public void onRecvCustomCmdMsg(String userId, int cmdID, int seq, byte[] message) {
    String data = new String(message, StandardCharsets.UTF_8);
    try {
        JSONObject jsonData = new JSONObject(data);
        Log.i(TAG, String.format("receive custom msg from %s cmdId: %d seq: %d data: %s", userId, cmdID, seq, data));
    } catch (JSONException e) {
        Log.e(TAG, "onRecvCustomCmdMsg err");
        throw new RuntimeException(e);
    }
}
func onRecvCustomCmdMsgUserId(_ userId: String, cmdID: Int, seq: UInt32, message: Data) {
    if cmdID == 1 {
        do {
            if let jsonObject = try JSONSerialization.jsonObject(with: message, options: []) as? [String: Any] {
                print("Dictionary: \(jsonObject)")
            } else {
                print("The data is not a dictionary.")
            }
        } catch {
            print("Error parsing JSON: \(error)")
        }
    }
}
trtcClient.on(TRTC.EVENT.CUSTOM_MESSAGE, (event) => {
    let data = new TextDecoder().decode(event.data);
    let jsonData = JSON.parse(data);
    console.log(`receive custom msg from ${event.userId} cmdId: ${event.cmdId} seq: ${event.seq} data: ${data}`);
    if (jsonData.type == 10000 && jsonData.payload.end == false) {
        // Subtitle intermediate state.
    } else if (jsonData.type == 10000 && jsonData.payload.end == true) {
        // That is all for this sentence.
    }
});
void onRecvCustomCmdMsg(const char* userId, int cmdID, int seq,
                        const uint8_t* message, uint32_t msgLen) {
    std::string data;
    if (message != nullptr && msgLen > 0) {
        data.assign(reinterpret_cast<const char*>(message), msgLen);
    }
    if (cmdID == 1) {
        try {
            auto j = nlohmann::json::parse(data);
            std::cout << "Dictionary: " << j.dump() << std::endl;
        } catch (const std::exception& e) {
            std::cerr << "Error parsing JSON: " << e.what() << std::endl;
        }
        return;
    }
}
void onRecvCustomCmdMsg(String userId, int cmdID, int seq, String message) {
  if (cmdID == 1) {
    try {
      final decoded = json.decode(message);
      if (decoded is Map<String, dynamic>) {
        print('Dictionary: $decoded');
      } else {
        print('The data is not a dictionary. Raw: $decoded');
      }
    } catch (e) {
      print('Error parsing JSON: $e');
    }
    return;
  }
}
Note:
Additional client-side callbacks for Conversational AI are provided. For details, see Conversational AI Status Callback, Conversational AI Subtitle Callback, Conversational AI Metrics Callback, and Conversational AI Error Callback.

Proxying LLM Requests

The Conversational AI service supports the standard OpenAI protocol, enabling developers to implement customized LLMs in their own business. Developers can build an LLM API that is compatible with OpenAI APIs in their business backend, encapsulate context logic and RAG within the LLM requests, and forward these requests to third-party LLMs. The implementation process is as follows:

This flowchart displays the basic steps for custom context management. Developers can adjust and optimize this process based on their specific needs.
Example code
import time
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from typing import List, Optional
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI

app = FastAPI(debug=True)

# Add CORS middleware.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

class Message(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    model: str
    messages: List[Message]
    temperature: Optional[float] = 0.7

class ChatResponse(BaseModel):
    id: str
    object: str
    created: int
    model: str
    choices: List[dict]
    usage: dict

@app.post("/v1/chat/completions")
async def chat_completions(request: ChatRequest):
    try:
        # Convert the request message to the LangChain message format.
        langchain_messages = []
        for msg in request.messages:
            if msg.role == "system":
                langchain_messages.append(SystemMessage(content=msg.content))
            elif msg.role == "user":
                langchain_messages.append(HumanMessage(content=msg.content))

        # Add more history messages here if needed.
        # Use the ChatOpenAI model from LangChain.
        chat = ChatOpenAI(temperature=request.temperature,
                          model_name=request.model)
        response = chat(langchain_messages)
        print(response)

        # Construct a response that conforms to the OpenAI API format.
        return ChatResponse(
            id="chatcmpl-" + "".join([str(ord(c))
                                      for c in response.content[:8]]),
            object="chat.completion",
            created=int(time.time()),
            model=request.model,
            choices=[{
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": response.content
                },
                "finish_reason": "stop"
            }],
            usage={
                "prompt_tokens": -1,  # LangChain does not provide this information. Thus, we use a placeholder value.
                "completion_tokens": -1,
                "total_tokens": -1
            }
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

Transmitting Custom Signaling via LLMs

To exclude certain content returned by the LLM from TTS, you can add a custom field metainfo to the content returned by the LLM. When the AI service detects metainfo, it will forward the data to the client SDK via Custom Message, enabling the pass-through of metainfo.
LLM sending method: When the LLM streams back chat.completion.chunk objects, a meta.info chunk is returned at the same time.
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-xxxx", "system_fingerprint": "fp_xxxx", "choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-xxxx", "system_fingerprint": "fp_xxxx", "choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}]}
// Add the following custom message.
{"id":"chatcmpl-123","type":"meta.info","created":1694268190,"metainfo": {}}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-xxxx", "system_fingerprint": "fp_xxxx", "choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}
Client-side receiving method: Once the AI service detects metainfo, it will distribute the data via the Custom Message feature of RTC Engine. The client can receive the data through the onRecvCustomCmdMsg API in the SDK callback.
{
    "type": 10002, // Custom message.
    "sender": "user_a", // userid of the sender, which is the chatbot ID in this case.
    "receiver": [], // List of receiver userid. The message is actually broadcast within the room.
    "roundid": "xxxxxx",
    "payload": {} // metainfo
}

Transmitting Key Signaling via Chat

If the business server needs to transmit key business signaling to the client, we recommend using the Chat SDK for transmission. This avoids the issue of metainfo signaling being lost when the AI chatbot is interrupted. Messages are sent from the server side through the Chat server APIs and received on the client side through the Chat SDK's message callbacks.
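As an illustration only, a server-side sketch in Python using the Chat REST API (openim/sendmsg) and an admin account might look like the following; the endpoint, query parameters, and the admin UserSig shown here are assumptions to verify against the Chat server API documentation.
import json
import random

import requests

CHAT_SDKAPPID = 1400000000          # hypothetical values; replace with your own
ADMIN_ACCOUNT = "administrator"
ADMIN_USERSIG = "admin_usersig"     # generate with the UserSig tool or a server-side library

def send_key_signaling(to_account: str, payload: dict) -> dict:
    """Push key business signaling to a client as a Chat custom message."""
    url = "https://console.tim.qq.com/v4/openim/sendmsg"
    params = {
        "sdkappid": CHAT_SDKAPPID,
        "identifier": ADMIN_ACCOUNT,
        "usersig": ADMIN_USERSIG,
        "random": random.randint(0, 4294967295),
        "contenttype": "json",
    }
    body = {
        "SyncOtherMachine": 2,                  # do not sync the message back to the sender's devices
        "To_Account": to_account,
        "MsgRandom": random.randint(0, 4294967295),
        "MsgBody": [{
            "MsgType": "TIMCustomElem",
            "MsgContent": {"Data": json.dumps(payload)},
        }],
    }
    return requests.post(url, params=params, json=body, timeout=5).json()
On the client, the payload arrives as a custom message through the Chat SDK's message callbacks; see the Chat SDK documentation for the callback used on each platform.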

Preventing the AI Interviewer from “Interrupting”

Using the Manual Round Mode

You can set the AgentConfig.TurnDetectionMode parameter to 1 in StartAIConversation to enable the manual round mode. When this mode is enabled, the client can determine whether to manually send a chat signaling message to trigger a new round of conversation upon receiving a subtitle message.
Parameter Descriptions
Parameter
Type
Description
TurnDetectionMode
Integer
Controls the trigger mode for a new round of conversation. Default value: 0.
0 means a new round of conversation is automatically triggered once the server ASR detects a complete sentence.
1 means the client can determine whether to manually send a chat signaling message to trigger a new round of conversation upon receiving a subtitle message.
Example value: 0.
Chat Signaling
{
    "type": 20000, // Custom text message sent by the client.
    "sender": "user_a", // Sender userid. The server will check whether the userid is valid.
    "receiver": ["user_bot"], // List of receiver userid. Fill in the chatbot userid. The server will check whether the userid is valid.
    "payload": {
        "id": "uuid", // Message ID. You can use a UUID. The ID is used for troubleshooting.
        "message": "xxx", // Message content.
        "timestamp": 123 // Timestamp. The timestamp is used for troubleshooting.
    }
}
Example code
Android
iOS
Web&H5
public void sendMessage() {
    try {
        int cmdID = 0x2;

        long time = System.currentTimeMillis();
        String timeStamp = String.valueOf(time / 1000);
        JSONObject payLoadContent = new JSONObject();
        payLoadContent.put("timestamp", timeStamp);
        payLoadContent.put("message", message);
        payLoadContent.put("id", String.valueOf(GenerateTestUserSig.SDKAPPID) + "_" + mRoomId);

        String[] receivers = new String[]{robotUserId};

        JSONObject interruptContent = new JSONObject();
        interruptContent.put("type", 20000);
        interruptContent.put("sender", mUserId);
        interruptContent.put("receiver", new JSONArray(receivers));
        interruptContent.put("payload", payLoadContent);

        String interruptString = interruptContent.toString();
        byte[] data = interruptString.getBytes("UTF-8");

        Log.i(TAG, "sendInterruptCode :" + interruptString);

        mTRTCCloud.sendCustomCmdMsg(cmdID, data, true, true);
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    } catch (JSONException e) {
        throw new RuntimeException(e);
    }
}
@objc func sendMessage() {
    let cmdId = 0x2
    let timestamp = Int(Date().timeIntervalSince1970 * 1000)
    let payload = [
        "id": userId + "_\(roomId)" + "_\(timestamp)", // Message ID. You can use a UUID. The ID is used for troubleshooting.
        "timestamp": timestamp, // Timestamp. The timestamp is used for troubleshooting.
        "message": "xxx" // Message content.
    ] as [String : Any]
    let dict = [
        "type": 20000,
        "sender": userId,
        "receiver": [botId],
        "payload": payload
    ] as [String : Any]
    do {
        let jsonData = try JSONSerialization.data(withJSONObject: dict, options: [])
        self.trtcCloud.sendCustomCmdMsg(cmdId, data: jsonData, reliable: true, ordered: true)
    } catch {
        print("Error serializing dictionary to JSON: \(error)")
    }
}
const message = {
    "type": 20000,
    "sender": "user_a",
    "receiver": ["user_bot"],
    "payload": {
        "id": "uuid",
        "timestamp": 123,
        "message": "xxx", // Message content.
    }
};

trtc.sendCustomMessage({
    cmdId: 2,
    data: new TextEncoder().encode(JSON.stringify(message)).buffer
});
Note:
Custom messages are limited to a maximum size of 1 KB per message packet (data size). To send messages larger than 1 KB, you can use the Chat Signaling Channel.

Interruption Latency Optimization

If you notice a high latency when interrupting AI speech during a conversation, you can lower the values of AgentConfig.InterruptSpeechDuration and STTConfig.VadSilenceTime parameters in StartAIConversation to reduce the interruption latency. We also recommend enabling the Far-Field Voice Suppression feature to reduce the likelihood of false interruptions.
Parameter Descriptions
Parameter
Type
Description
AgentConfig.InterruptSpeechDuration
Integer
Used when InterruptMode is 0. Unit: millisecond. Default value: 500 ms. The server interrupts the chatbot when it detects continuous human speech lasting at least this duration.
Example value: 500
STTConfig.VadSilenceTime
Integer
ASR VAD time. Value range: 240–2000. Default value: 1000. Unit: ms. A smaller value results in faster sentence segmentation in ASR.
Example value: 500
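The sketch below shows where these two parameters live in the StartAIConversation request (values are illustrative only):
# Fragment of the StartAIConversation request parameters.
start_params = {
    # ... other parameters as in Step 3 ...
    "AgentConfig": {
        "InterruptSpeechDuration": 300   # interrupt after ~300 ms of continuous speech instead of the 500 ms default
    },
    "STTConfig": {
        "VadSilenceTime": 500            # faster sentence segmentation; too small a value may split sentences prematurely
    }
}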

Server Callbacks

Note:
The callback address is configured in the RTC Engine console for Conversational AI callbacks.
Developers can use server callbacks together with Room&Media Webhooks of RTC Engine to enable more features.

On-Cloud Recording

The newly upgraded on-cloud recording feature of RTC Engine uses the internal real-time recording cluster in RTC Engine for audio and video recording, offering a more complete and unified recording experience.
Single-stream recording: Through the on-cloud recording feature of RTC Engine, you can record the audio stream of each user in a room into a separate file.



Mixed-stream recording: All audio media streams from the same room are mixed and recorded into a single file.



Note:
For a detailed introduction and activation instructions of on-cloud recording in RTC Engine, see RTC Engine On-Cloud Recording Instructions.

FAQs

Why Is the Chatbot Not Speaking?

1. Check whether capture via microphone is enabled and the audio stream is released in the client.
2. Check whether data such as real-time subtitles and AI status can be received via the Receive Custom Message feature of RTC Engine. If the data cannot be received, check that the RoomId used when calling the StartAIConversation API matches the RoomId used by the client for room entry and that the room ID types (number/string) are the same; that is, the chatbot and the user need to be in the same room. Additionally, check that the value of TargetUserId matches the UserId used by the client for room entry.
3. If you can receive subtitles for your own speech but not the reply subtitles of the chatbot, check your LLM-related configurations.
4. If you can receive the reply subtitles of the chatbot but cannot hear its voice, check your TTS-related configurations.
5. You can obtain LLM and TTS error messages through Conversational AI SDK Webhooks and AI Conversation Server Webhooks callbacks, allowing developers to troubleshoot issues efficiently.
Service Category
Error Code
Error Description
ASR
30100
Request timeout.
30102
Internal error.
LLM
30200
LLM request timeout.
30201
LLM request frequency limited.
30202
LLM service return failure.
TTS
30300
TTS request timeout.
30301
TTS request frequency limited.
30302
TTS service return failure.

LLM Timeout Errors

If you encounter an LLM Timeout error, such as llm error Timeout on reading data from socket, it usually indicates that the LLM request timed out. You can increase the value of the Timeout parameter in LLMConfig (the default value is 3 seconds). In addition, when the first-token duration of the LLM exceeds 3 seconds, the conversation latency is relatively high, which may impact the AI conversation experience. Unless you have special requirements, we recommend optimizing the first-token duration of the LLM. See Conversation Latency Optimization.

Tencent TTS Errors

If you encounter a Tencent TTS error, such as the following error:
TencentTTS chunk error {'Response': {'RequestId': 'xxxxxx', 'Error': {'Code': 'AuthorizationFailed', 'Message': "Please check http header 'Authorization' field or request parameter"}}}
You can troubleshoot from the following aspects:
1. Check whether the TTS service is activated for your application.
2. Check whether the APPID, SecretId, and SecretKey are filled in correctly.
3. Check whether you have claimed a free TTS resource package.
4. Check whether the specified timbre ID is included in the free resource package.
See the TTS section in Prerequisites and complete the steps again. In addition, if a sub-account is used, you should grant the sub-account TTS permissions.

Why Does a Single-Word User Response Not Trigger an LLM Request?

If no LLM request is triggered when a user responds with a single word, such as "Yes" or "OK", check the AgentConfig.FilterOneWord parameter in StartAIConversation. The default value is true, which filters out single-word sentences; set it to false to allow them (see the fragment after the parameter table below).
Parameter
Type
Description
FilterOneWord
Boolean
Whether to filter out single-word sentences from the user. true: filter; false: do not filter. Default value: true.
Example value: true.
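A minimal fragment of the StartAIConversation request that allows single-word replies to reach the LLM (illustrative only):
# Fragment of the StartAIConversation request parameters.
start_params = {
    # ... other parameters as in Step 3 ...
    "AgentConfig": {
        "FilterOneWord": False   # let single-word replies such as "Yes" trigger an LLM request
    }
}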

Exception Error Handling

When the RTC Engine SDK encounters an unrecoverable error, the error is thrown in the onError callback. For details, see Error Codes.
UserSig-related errors.
UserSig verification failure can lead to room entry failure. You can use the UserSig tool for verification.
Enumeration
Value
Description
ERR_TRTC_INVALID_USER_SIG
-3320
The room entry parameter UserSig is incorrect. Check whether TRTCParams.userSig is empty.
ERR_TRTC_USER_SIG_CHECK_FAILED
-100018
The UserSig verification fails. Check whether the parameter TRTCParams.userSig is filled in correctly or has expired.
Errors related to room entry and exit.
If room entry fails, first check whether the room entry parameters are correct. The room entry and exit APIs should be called in pairs. This means that even if room entry fails, the room exit API should still be called.
Enumeration
Value
Description
ERR_TRTC_CONNECT_SERVER_TIMEOUT
-3308
The room entry request times out. Check whether the Internet connection is lost or if a VPN is enabled. You may also attempt to switch to 4G for testing.
ERR_TRTC_INVALID_SDK_APPID
-3317
The room entry parameter SDKAppId is incorrect. Check whether TRTCParams.sdkAppId is empty.
ERR_TRTC_INVALID_ROOM_ID
-3318
The room entry parameter roomId is incorrect. Check whether TRTCParams.roomId or TRTCParams.strRoomId is empty. Note that roomId and strRoomId cannot be used interchangeably.
ERR_TRTC_INVALID_USER_ID
-3319
The room entry parameter UserID is incorrect. Check whether TRTCParams.userId is empty.
ERR_TRTC_ENTER_ROOM_REFUSED
-3340
The room entry request is denied. Check whether enterRoom is called consecutively to enter a room with the same ID.
Device-related errors.
Monitor device-related errors and prompt users via the UI when they occur.
Enumeration
Value
Description
ERR_MIC_START_FAIL
-1302
Failed to turn the microphone on. This may occur when there is a problem with the microphone configuration program (driver) on Windows or macOS. Disable and re-enable the microphone, restart the microphone, or update the configuration program.
ERR_SPEAKER_START_FAIL
-1321
Failed to turn the speaker on. This may occur when there is a problem with the speaker configuration program (driver) on Windows or macOS. Disable and re-enable the speaker, restart the speaker, or update the configuration program.
ERR_MIC_OCCUPY
-1319
The microphone is occupied. For example, when the user is currently having a call on a mobile device, the microphone will fail to turn on.

Supporting Products for the Solution

System Level
Product Name
Application Scenarios
Access Layer
Provides low-latency, high-quality real-time audio and video interaction solutions, serving as the foundational capability for audio and video call scenarios.
Access Layer
Completes the transmission of key business signaling.
Cloud Services
Enables real-time audio and video interactions between AI and users and develops Conversational AI capabilities tailored to business scenarios.
Cloud Services
Provides identity authentication and anti-cheating capabilities.
LLM
Serves as the brain of smart customer services and offers multiple agent development frameworks such as LLM+RAG, Workflow, and Multi-agent.
Data Storage
Provides storage services for audio recording files and audio slicing files.
