| Feature Module | Application in AI Interview Scenarios |
|---|---|
| RTC | Powered by RTC Engine, RTC offers high-quality, low-latency audio and video connections, supporting 720P/1080P/2K HD video and 48 kHz high-fidelity audio. Smooth interaction is ensured regardless of network environment, simulating real interview scenarios. |
| Conversational AI | Tencent Conversational AI enables businesses to flexibly connect with multiple large language models and build real-time audio and video interactions between AI and users. Powered by the global low-delay transmission of Tencent Real-Time Communication (Tencent RTC), Conversational AI delivers natural and realistic conversation effects and is convenient to integrate, working out of the box. |
| LLM | The LLM can intelligently understand candidates' speech content and context, accurately extract key points from responses, dynamically generate follow-up interview questions, and enable personalized, structured interview processes. It can also automatically adjust scoring criteria for different positions, improving the fairness and accuracy of evaluations. |
| TTS | Integration with third-party TTS services is supported, offering output in multiple languages and voice styles. AI interviewers can use TTS to present different tones and personalities, closely simulating real interviewers and enhancing the candidate experience. |
| Chat | Key business signaling is transmitted via Chat. |
| Interview Management Backend | Supported capabilities include question bank and interview design, automated scoring, data storage, visual analysis, and interview schedule management. |


Android (Java):

// Enable capture via microphone and set the mode to SPEECH mode (strong denoising capability and resistance to poor network conditions).
mCloud.startLocalAudio(TRTCCloudDef.TRTC_AUDIO_QUALITY_SPEECH);

iOS (Objective-C):

self.trtcCloud = [TRTCCloud sharedInstance];
// Enable capture via microphone and set the mode to SPEECH mode (strong denoising capability and resistance to poor network conditions).
[self.trtcCloud startLocalAudio:TRTCAudioQualitySpeech];

Flutter (Dart):

// Enable capture via microphone and set the mode to SPEECH mode (strong denoising capability and resistance to poor network conditions).
trtcCloud.startLocalAudio(TRTCAudioQuality.speech);

Web (JavaScript):

await trtc.startLocalAudio();

Windows (C++):

// Enable capture via microphone and set the mode to SPEECH mode.
// Provides strong denoising capability and resistance to poor network conditions.
ITRTCCloud* trtcCloud = CRTCWindowsApp::GetInstance()->trtc_cloud_;
trtcCloud->startLocalAudio(TRTCAudioQualitySpeech);

macOS (Objective-C):

// Enable capture via microphone and set the mode to SPEECH mode.
// Provides strong denoising capability and resistance to poor network conditions.
AppDelegate *appDelegate = (AppDelegate *)[[NSApplication sharedApplication] delegate];
[appDelegate.trtcCloud startLocalAudio:TRTCAudioQualitySpeech];
Configure LLMConfig and TTSConfig. LLMConfig parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| LLMType | String | Yes | LLM type. Fill in openai for LLMs that comply with the OpenAI API protocol. |
| Model | String | Yes | Specific LLM name, for example, gpt-4o or deepseek-chat. |
| APIKey | String | Yes | LLM APIKey. |
| APIUrl | String | Yes | LLM APIUrl. |
| Streaming | Boolean | No | Whether streaming transmission is used. Default value: false. Recommended value: true. |
| SystemPrompt | String | No | System prompt. |
| Timeout | Float | No | Timeout period, in seconds. Value range: 1–50. Default value: 3. |
| History | Integer | No | Number of context rounds kept for the LLM. Default value: 0 (no context management). Maximum value: 50 (context management for the most recent 50 rounds). |
| MaxTokens | Integer | No | Maximum token limit for output text. |
| Temperature | Float | No | Sampling temperature. |
| TopP | Float | No | Sampling range. This parameter controls the diversity of output tokens. |
| UserMessages | Object[] | No | User prompt. |
| MetaInfo | Object | No | Custom parameters. These parameters will be included in the request body and passed to the LLM. |
"LLMConfig": {"LLMType": "openai","Model": "gpt-4o","APIKey": "api-key","APIUrl": "https://api.openai.com/v1/chat/completions","Streaming": true,"SystemPrompt": "You are a personal assistant","Timeout": 3.0,"History": 5,"MetaInfo": {},"MaxTokens": 4096,"Temperature": 0.8,"TopP": 0.8,"UserMessages": [{"Role": "user","Content": "content"},{"Role": "assistant","Content": "content"}]}
{"TTSType": "tencent", // TTS type in String format. Valid values: "tencent" and "minixmax". Other vendors will be supported in future versions."AppId": Your application ID, // Required. Integer value."SecretId": "Your key ID", // Required. String value."SecretKey": "Your key", // Required. String value."VoiceType": 101001, // Required. Timbre ID in integer format. Standard timbre and premium timbre are supported. The premium timbre is more real, and its price differs from that of the standard timbre. See TTS Billing Overview for details. For the complete list of timbre IDs, see TTS Timbre List."Speed": 1.25, // Optional. Speech speed in integer format. Value range: [-2, 6]. Different values correspond to different speech speeds. -2: 0.6x; -1: 0.8x; 0: 1.0x (default value); 1: 1.2x; 2: 1.5x; 6: 2.5x. If you need a more fine-grained speech speed, the value can be accurate to 2 decimal places, such as 0.5, 1.25, and 2.81. For the conversion between the parameter value and actual speech speed, see Speech Speed Conversion."Volume": 5, // Optional. Volume level in integer format. Value range: [0, 10]. The valid values correspond to 11 volume levels. The default value is 0, representing the normal volume."PrimaryLanguage": 1, // Optional. Primary language in integer format. 1: Chinese (default value); 2: English; 3: Japanese."FastVoiceType": "xxxx" // Optional. Fast voice cloning parameter in String format."EmotionCategory":"angry",// Optional. String value. This parameter controls the emotion of the synthesized audio and is only available for multi-emotion timbres. Example values: neutral and sad."EmotionIntensity":150 // Optional. Integer value. This parameter controls the emotion intensity of the synthesized audio. Value range: [50, 200]. Default value: 100. This parameter takes effect only when EmotionCategory is not empty.}
When configuring STTConfig, LLMConfig, and TTSConfig, note the following (a runnable sketch follows this list):
- RoomId must be consistent with the RoomId used by the client for room entry, and the room ID types (number/string) must also be the same. That is, the chatbot and the user need to be in the same room.
- TargetUserId must be consistent with the UserId used by the client for room entry.
- LLMConfig and TTSConfig are JSON strings and must be properly configured before you can successfully initiate a real-time AI conversation.
- Set STTConfig.VadLevel to 2 or 3 for enhanced far-field voice suppression.
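For reference, here is a minimal sketch of starting a conversation from a Python backend. It assumes the tencentcloud-sdk-python package and its generic CommonClient; the IDs, keys, region, and the AgentConfig field names other than TargetUserId are placeholders or assumptions, so verify them against the StartAIConversation API reference:

import json
from tencentcloud.common import credential
from tencentcloud.common.common_client import CommonClient

# Placeholder credentials and region -- substitute your own.
cred = credential.Credential("SecretId", "SecretKey")
client = CommonClient("trtc", "2019-07-22", cred, "ap-singapore")

params = {
    "SdkAppId": 1400000000,           # placeholder SDKAppId
    "RoomId": "room_1",               # must match the client's room-entry RoomId (and its type)
    "AgentConfig": {
        "UserId": "ai_interviewer",   # chatbot's own UserId in the room (assumed field name)
        "UserSig": "usersig",         # UserSig generated for the chatbot (assumed field name)
        "TargetUserId": "user_a",     # must match the client's room-entry UserId
    },
    "STTConfig": {"VadLevel": 2},     # 2 or 3 for better far-field voice suppression
    # LLMConfig and TTSConfig are passed as JSON strings.
    "LLMConfig": json.dumps({"LLMType": "openai", "Model": "gpt-4o", "APIKey": "api-key",
                             "APIUrl": "https://api.openai.com/v1/chat/completions",
                             "Streaming": True}),
    "TTSConfig": json.dumps({"TTSType": "tencent", "AppId": 130000000, "SecretId": "SecretId",
                             "SecretKey": "SecretKey", "VoiceType": 101001}),
}
print(client.call_json("StartAIConversation", params))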
The following metrics describe the latency of each round of conversation:

| Metric | Description |
|---|---|
| asr_latency | ASR latency. Note: This metric includes the time set by VadSilenceTime when Conversational AI is started. |
| llm_network_latency | Network latency of LLM requests. |
| llm_first_token | LLM first-token duration, including network latency. |
| tts_network_latency | Network latency of TTS requests. |
| tts_first_frame_latency | TTS first-frame duration, including network latency. |
| tts_discontinuity | Number of occurrences of TTS request discontinuity. Discontinuity means that no result is returned for the next request after the current TTS streaming request is completed; this is usually caused by high TTS latency. |
| interruption | Indicates that this round of conversation was interrupted. |
We recommend enabling streaming output (set Streaming in LLMConfig to true). This can significantly reduce latency. We do not recommend reasoning models such as DeepSeek-R1, because their latency is too high for voice conversations. If you are especially sensitive to conversation latency, choose models with smaller parameter counts, many of which can keep the first-token duration around 500 ms.

Real-time subtitle messages are delivered in the following format:

{
    "type": 10000, // 10000 indicates real-time subtitles are delivered.
    "sender": "user_a", // userid of the speaker.
    "receiver": [], // List of receiver userids. The message is actually broadcast within the room.
    "payload": {
        "text": "", // Text recognized by ASR.
        "start_time": "00:00:01", // Start time of the sentence.
        "end_time": "00:00:02", // End time of the sentence.
        "roundid": "xxxxx", // Unique identifier of a conversation round.
        "end": true // If the value is true, the sentence is a complete sentence.
    }
}
{"type": 10001, // Chatbot status."sender": "user_a", // userid of the sender, which is the chatbot ID in this case."receiver": [], // List of receiver userid. The message is actually broadcast within the room."payload": {"roundid": "xxx", // Unique identifier of a conversation round."timestamp": 123,"state": 1, // 1: Listening; 2: Thinking; 3: Speaking; 4: Interrupted; 5: Finished speaking.}}
Sample code for receiving these messages on each platform:

Android (Java):

@Override
public void onRecvCustomCmdMsg(String userId, int cmdID, int seq, byte[] message) {
    String data = new String(message, StandardCharsets.UTF_8);
    try {
        JSONObject jsonData = new JSONObject(data);
        Log.i(TAG, String.format("receive custom msg from %s cmdId: %d seq: %d data: %s", userId, cmdID, seq, data));
    } catch (JSONException e) {
        // Log and ignore malformed messages instead of crashing the app.
        Log.e(TAG, "onRecvCustomCmdMsg parse error", e);
    }
}
iOS (Swift):

func onRecvCustomCmdMsgUserId(_ userId: String, cmdID: Int, seq: UInt32, message: Data) {
    if cmdID == 1 {
        do {
            if let jsonObject = try JSONSerialization.jsonObject(with: message, options: []) as? [String: Any] {
                print("Dictionary: \(jsonObject)")
            } else {
                print("The data is not a dictionary.")
            }
        } catch {
            print("Error parsing JSON: \(error)")
        }
    }
}
Web (JavaScript):

trtcClient.on(TRTC.EVENT.CUSTOM_MESSAGE, (event) => {
    let data = new TextDecoder().decode(event.data);
    let jsonData = JSON.parse(data);
    console.log(`receive custom msg from ${event.userId} cmdId: ${event.cmdId} seq: ${event.seq} data: ${data}`);
    if (jsonData.type == 10000 && jsonData.payload.end == false) {
        // Subtitle intermediate state.
    } else if (jsonData.type == 10000 && jsonData.payload.end == true) {
        // That is all for this sentence.
    }
});
Windows (C++):

void onRecvCustomCmdMsg(const char* userId, int cmdID, int seq, const uint8_t* message, uint32_t msgLen) {
    std::string data;
    if (message != nullptr && msgLen > 0) {
        data.assign(reinterpret_cast<const char*>(message), msgLen);
    }
    if (cmdID == 1) {
        try {
            auto j = nlohmann::json::parse(data);
            std::cout << "Dictionary: " << j.dump() << std::endl;
        } catch (const std::exception& e) {
            std::cerr << "Error parsing JSON: " << e.what() << std::endl;
        }
        return;
    }
}
Flutter (Dart):

void onRecvCustomCmdMsg(String userId, int cmdID, int seq, String message) {
    if (cmdID == 1) {
        try {
            final decoded = json.decode(message);
            if (decoded is Map<String, dynamic>) {
                print('Dictionary: $decoded');
            } else {
                print('The data is not a dictionary. Raw: $decoded');
            }
        } catch (e) {
            print('Error parsing JSON: $e');
        }
        return;
    }
}
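Regardless of platform, the handling logic is the same: decode the bytes, parse the JSON, and branch on the documented type field. Below is a platform-neutral sketch in Python; the message formats are the documented ones, while the handler name and print statements are illustrative:

import json

def on_custom_message(payload_bytes: bytes) -> None:
    msg = json.loads(payload_bytes.decode("utf-8"))
    if msg["type"] == 10000:          # real-time subtitle
        p = msg["payload"]
        if p["end"]:
            print(f"final subtitle [{p['roundid']}]: {p['text']}")
        else:
            print(f"partial subtitle: {p['text']}")
    elif msg["type"] == 10001:        # chatbot state change
        states = {1: "listening", 2: "thinking", 3: "speaking",
                  4: "interrupted", 5: "finished speaking"}
        print(f"bot is {states.get(msg['payload']['state'], 'unknown')}")
    elif msg["type"] == 10002:        # metainfo pass-through (described below)
        print(f"metainfo: {msg['payload']}")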

import time
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from typing import List, Optional
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI

app = FastAPI(debug=True)

# Add CORS middleware.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

class Message(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    model: str
    messages: List[Message]
    temperature: Optional[float] = 0.7

class ChatResponse(BaseModel):
    id: str
    object: str
    created: int
    model: str
    choices: List[dict]
    usage: dict

@app.post("/v1/chat/completions")
async def chat_completions(request: ChatRequest):
    try:
        # Convert the request messages to the LangChain message format.
        langchain_messages = []
        for msg in request.messages:
            if msg.role == "system":
                langchain_messages.append(SystemMessage(content=msg.content))
            elif msg.role == "user":
                langchain_messages.append(HumanMessage(content=msg.content))
            # Add more history messages here as needed.

        # Use the ChatOpenAI model from LangChain.
        chat = ChatOpenAI(temperature=request.temperature, model_name=request.model)
        response = chat.invoke(langchain_messages)
        print(response)

        # Construct a response that conforms to the OpenAI API format.
        return ChatResponse(
            id="chatcmpl-" + "".join([str(ord(c)) for c in response.content[:8]]),
            object="chat.completion",
            created=int(time.time()),
            model=request.model,
            choices=[{
                "index": 0,
                "message": {"role": "assistant", "content": response.content},
                "finish_reason": "stop",
            }],
            usage={
                "prompt_tokens": -1,  # LangChain does not provide token counts here, so placeholders are used.
                "completion_tokens": -1,
                "total_tokens": -1,
            },
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
You can add metainfo to the content returned by the LLM. When the AI service detects metainfo, it forwards the data to the client SDK via Custom Message, enabling the pass-through of metainfo. Among the returned chat.completion.chunk objects, a meta.info chunk is returned at the same time:

{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-xxxx","system_fingerprint":"fp_xxxx","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}

{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-xxxx","system_fingerprint":"fp_xxxx","choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}]}

// Add the following custom message.
{"id":"chatcmpl-123","type":"meta.info","created":1694268190,"metainfo":{}}

{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-xxxx","system_fingerprint":"fp_xxxx","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}
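If your custom LLM server (such as the FastAPI example above) streams responses, the meta.info chunk can be emitted between the content chunks. Below is a minimal sketch of such a streaming endpoint, assuming SSE framing; the chunk shapes follow the examples above, and the "question_id" field is a hypothetical example of metainfo content:

import json
import time
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def sse(obj: dict) -> str:
    # Server-sent-events framing: "data: <json>\n\n".
    return f"data: {json.dumps(obj)}\n\n"

@app.post("/v1/chat/completions")
async def chat_completions():
    def generate():
        base = {"id": "chatcmpl-123", "object": "chat.completion.chunk",
                "created": int(time.time()), "model": "gpt-xxxx"}
        # Ordinary content chunk.
        yield sse({**base, "choices": [{"index": 0, "delta": {"content": "Hello"},
                                        "finish_reason": None}]})
        # Custom meta.info chunk: the AI service passes "metainfo" through to the client.
        yield sse({"id": "chatcmpl-123", "type": "meta.info",
                   "created": int(time.time()), "metainfo": {"question_id": "q1"}})
        # Final chunk.
        yield sse({**base, "choices": [{"index": 0, "delta": {},
                                        "finish_reason": "stop"}]})
        yield "data: [DONE]\n\n"
    return StreamingResponse(generate(), media_type="text/event-stream")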
When the AI service detects metainfo, it distributes the data via the Custom Message feature of RTC Engine. The client can receive the data through the onRecvCustomCmdMsg API in the SDK callback:

{
    "type": 10002, // Custom message.
    "sender": "user_a", // userid of the sender, which is the chatbot ID in this case.
    "receiver": [], // List of receiver userids. The message is actually broadcast within the room.
    "roundid": "xxxxxx",
    "payload": {} // metainfo
}
Note that the metainfo signal may be lost if the AI chatbot is interrupted.

Set the AgentConfig.TurnDetectionMode parameter to 1 in StartAIConversation to enable the manual turn mode. When this mode is enabled, the client can decide whether to manually send a chat signaling message to trigger a new round of conversation upon receiving a subtitle message.

| Parameter | Type | Description |
|---|---|---|
| TurnDetectionMode | Integer | Controls the trigger mode for a new round of conversation. Default value: 0. 0: a new round of conversation is automatically triggered once the server-side ASR detects a complete sentence. 1: the client decides whether to manually send a chat signaling message to trigger a new round of conversation upon receiving a subtitle message. Example value: 0. |
{"type": 20000, // Custom text message sent by the client."sender": "user_a", // Sender userid. The server will check whether the userid is valid."receiver": ["user_bot"], // List of receiver userid. Fill in the chatbot userid. The server will check whether the userid is valid."payload": {"id": "uuid", // Message ID. You can use a UUID. The ID is used for troubleshooting."message": "xxx", // Message content."timestamp": 123 // Timestamp. The timestamp is used for troubleshooting.}}
Android (Java):

public void sendMessage() {
    try {
        int cmdID = 0x2;
        long time = System.currentTimeMillis();
        String timeStamp = String.valueOf(time / 1000);
        JSONObject payLoadContent = new JSONObject();
        payLoadContent.put("timestamp", timeStamp);
        payLoadContent.put("message", message);
        payLoadContent.put("id", String.valueOf(GenerateTestUserSig.SDKAPPID) + "_" + mRoomId);
        String[] receivers = new String[]{robotUserId};
        JSONObject interruptContent = new JSONObject();
        interruptContent.put("type", 20000);
        interruptContent.put("sender", mUserId);
        interruptContent.put("receiver", new JSONArray(receivers));
        interruptContent.put("payload", payLoadContent);
        String interruptString = interruptContent.toString();
        byte[] data = interruptString.getBytes("UTF-8");
        Log.i(TAG, "sendMessage: " + interruptString);
        mTRTCCloud.sendCustomCmdMsg(cmdID, data, true, true);
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    } catch (JSONException e) {
        throw new RuntimeException(e);
    }
}
iOS (Swift):

@objc func sendMessage() {
    let cmdId = 0x2
    let timestamp = Int(Date().timeIntervalSince1970 * 1000)
    let payload = [
        "id": userId + "_\(roomId)" + "_\(timestamp)", // Message ID. You can use a UUID. The ID is used for troubleshooting.
        "timestamp": timestamp, // Timestamp. The timestamp is used for troubleshooting.
        "message": "xxx" // Message content.
    ] as [String: Any]
    let dict = [
        "type": 20000,
        "sender": userId,
        "receiver": [botId],
        "payload": payload
    ] as [String: Any]
    do {
        let jsonData = try JSONSerialization.data(withJSONObject: dict, options: [])
        self.trtcCloud.sendCustomCmdMsg(cmdId, data: jsonData, reliable: true, ordered: true)
    } catch {
        print("Error serializing dictionary to JSON: \(error)")
    }
}
Web (JavaScript):

const message = {
    "type": 20000,
    "sender": "user_a",
    "receiver": ["user_bot"],
    "payload": {
        "id": "uuid",
        "timestamp": 123,
        "message": "xxx" // Message content.
    }
};
trtc.sendCustomMessage({
    cmdId: 2,
    data: new TextEncoder().encode(JSON.stringify(message)).buffer
});
You can adjust the AgentConfig.InterruptSpeechDuration and STTConfig.VadSilenceTime parameters in StartAIConversation to reduce the interruption latency. We also recommend enabling the Far-Field Voice Suppression feature to reduce the likelihood of false interruptions.

| Parameter | Type | Description |
|---|---|---|
| AgentConfig.InterruptSpeechDuration | Integer | Used when InterruptMode is 0. Unit: millisecond. Default value: 500. The server interrupts the chatbot when it detects continuous human speech lasting the specified duration. Example value: 500. |
| STTConfig.VadSilenceTime | Integer | ASR VAD time. Value range: 240–2000. Unit: ms. Default value: 1000. A smaller value results in faster sentence segmentation in ASR. Example value: 500. |


Check whether the RoomId used when the StartAIConversation API is called matches the RoomId used by the client for room entry, and whether the room ID types (number/string) are the same; that is, the chatbot and user need to be in the same room. Additionally, check whether the value of TargetUserId matches the UserId used by the client for room entry.

| Service Category | Error Code | Error Description |
|---|---|---|
| ASR | 30100 | Request timeout. |
| | 30102 | Internal error. |
| LLM | 30200 | LLM request timeout. |
| | 30201 | LLM request frequency limited. |
| | 30202 | LLM service returned a failure. |
| TTS | 30300 | TTS request timeout. |
| | 30301 | TTS request frequency limited. |
| | 30302 | TTS service returned a failure. |
If the error message contains llm error Timeout on reading data from socket, it usually indicates that the LLM request timed out. You can increase the value of the Timeout parameter in LLMConfig (the default value is 3 seconds). In addition, when the first-token duration of the LLM exceeds 3 seconds, the conversation latency is relatively high, which may degrade the AI conversation experience. Unless you have special requirements, we recommend optimizing the first-token duration of the LLM. See Conversation Latency Optimization.

If the error message contains TencentTTS chunk error {'Response': {'RequestId': 'xxxxxx', 'Error': {'Code': 'AuthorizationFailed', 'Message': "Please check http header 'Authorization' field or request parameter"}}}, the TTS request failed authentication. Check whether the AppId, SecretId, and SecretKey in TTSConfig are filled in correctly.
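To check whether your LLM's first-token duration is the bottleneck, you can measure it directly against the endpoint configured in LLMConfig. Below is a minimal sketch using the Python requests library with OpenAI-style streaming; the URL, key, and model are placeholders:

import time
import requests

# Placeholders -- use the same APIUrl, APIKey, and Model as in your LLMConfig.
API_URL = "https://api.openai.com/v1/chat/completions"
API_KEY = "api-key"

start = time.monotonic()
with requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": "gpt-4o",
          "messages": [{"role": "user", "content": "Hello"}],
          "stream": True},
    stream=True,
    timeout=30,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:  # the first non-empty SSE line approximates the first token
            print(f"first token after {(time.monotonic() - start) * 1000:.0f} ms")
            break

If the measured first-token duration regularly exceeds a few seconds, switch to a lower-latency model before tuning Timeout.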
Check whether the AgentConfig.FilterOneWord parameter in StartAIConversation is set to false (the default value is true).

| Parameter | Type | Description |
|---|---|---|
| FilterOneWord | Boolean | Whether to filter out single-word sentences from the user. true: filter; false: do not filter. Default value: true. Example value: true. |
Pay attention to the onError callback. For details, see Error Codes.

| Enumeration | Value | Description |
|---|---|---|
| ERR_TRTC_INVALID_USER_SIG | -3320 | The room entry parameter UserSig is incorrect. Check whether TRTCParams.userSig is empty. |
| ERR_TRTC_USER_SIG_CHECK_FAILED | -100018 | UserSig verification failed. Check whether TRTCParams.userSig is filled in correctly or has expired. |
| Enumeration | Value | Description |
|---|---|---|
| ERR_TRTC_CONNECT_SERVER_TIMEOUT | -3308 | The room entry request timed out. Check whether the Internet connection is lost or a VPN is enabled. You can also switch to a 4G network for testing. |
| ERR_TRTC_INVALID_SDK_APPID | -3317 | The room entry parameter SDKAppId is incorrect. Check whether TRTCParams.sdkAppId is empty. |
| ERR_TRTC_INVALID_ROOM_ID | -3318 | The room entry parameter roomId is incorrect. Check whether TRTCParams.roomId or TRTCParams.strRoomId is empty. Note that roomId and strRoomId cannot be used interchangeably. |
| ERR_TRTC_INVALID_USER_ID | -3319 | The room entry parameter UserId is incorrect. Check whether TRTCParams.userId is empty. |
| ERR_TRTC_ENTER_ROOM_REFUSED | -3340 | The room entry request was denied. Check whether enterRoom is called consecutively to enter a room with the same ID. |
| Enumeration | Value | Description |
|---|---|---|
| ERR_MIC_START_FAIL | -1302 | Failed to turn the microphone on. This may occur when there is a problem with the microphone configuration program (driver) on Windows or macOS. Disable and re-enable the microphone, restart the microphone, or update the configuration program. |
| ERR_SPEAKER_START_FAIL | -1321 | Failed to turn the speaker on. This may occur when there is a problem with the speaker configuration program (driver) on Windows or macOS. Disable and re-enable the speaker, restart the speaker, or update the configuration program. |
| ERR_MIC_OCCUPY | -1319 | The microphone is occupied. For example, if the user is currently in a call on a mobile device, the microphone will fail to turn on. |
| System Level | Product Name | Application Scenarios |
|---|---|---|
| Access Layer | | Provides low-latency, high-quality real-time audio and video interaction solutions, serving as the foundational capability for audio and video call scenarios. |
| Access Layer | | Completes the transmission of key business signaling. |
| Cloud Services | | Enables real-time audio and video interactions between AI and users and develops Conversational AI capabilities tailored to business scenarios. |
| Cloud Services | | Provides identity authentication and anti-cheating capabilities. |
| LLM | | Serves as the brain of smart customer service and offers multiple agent development frameworks such as LLM+RAG, Workflow, and Multi-agent. |
| Data Storage | | Provides storage services for audio recording files and audio slicing files. |