Text generation

Last updated: 2026-05-13 14:42:32

Overview
Text generation is one of the platform's core capabilities. It supports multiple large language models (LLMs) and covers scenarios such as conversation, content creation, code generation, and reasoning and analysis. The platform is compatible with the OpenAI Completions API and Anthropic API protocols, so you can access it directly with the OpenAI SDK or any compatible client.
Supported Protocols by Model:
| Model Name | Model (API Parameter) | OpenAI Completions | Anthropic |
| --- | --- | --- | --- |
| DeepSeek-V4-Flash | deepseek-v4-flash | ✅ | ✅ |
| DeepSeek-V4-Pro | deepseek-v4-pro | ✅ | ✅ |
| DeepSeek-V3.2 | deepseek-v3.2 | ✅ | ✅ |
| GLM-5 | glm-5 | ✅ | ✅ |
| GLM-5V-Turbo | glm-5v-turbo | ✅ | ✅ |
| GLM-5-Turbo | glm-5-turbo | ✅ | ❌ |
| GLM-5.1 | glm-5.1 | ✅ | ✅ |
| Kimi-K2.6 | kimi-k2.6 | ✅ | ✅ |
| Kimi-K2.5 | kimi-k2.5 | ✅ | ✅ |
| MiniMax-M2.5 | minimax-m2.5 | ✅ | ✅ |
| MiniMax-M2.7 | minimax-m2.7 | ✅ | ✅ |
OpenAI API Usage
BaseURL
Singapore: https://tokenhub-intl.tencentcloudmaas.com/v1
Guangzhou: https://tokenhub.tencentcloudmaas.com/v1
Request Parameters
| Parameter Name | Required | Type | Description |
| --- | --- | --- | --- |
| model | Yes | String | Service ID, shown in the Service ID field on the Online Inference Service page. For services created by default on the platform, the service ID is the same as the model name, for example: glm-5. For custom services created by users, the service ID has the format ep-xxxxxxxx. |
| messages | Yes | Array | Message array for the chat context. For details, see Messages Parameter. |
| stream | No | Boolean | Whether to enable streaming output. Valid values: true / false. Default: false. |
| temperature | No | Float | Output randomness. Value range: [0.0, 2.0]. |
| top_p | No | Float | Output diversity (nucleus sampling). Value range: [0.0, 1.0]. |
| max_tokens | No | Integer | Maximum number of output tokens. |
| stop | No | String or Array of String | Stop sequences for model output. When the generated text reaches any of the specified sequences, the model stops, and the response does not include that stop sequence. Accepts a single string or an array of up to 4 strings. For example, to have the model generate a list of 10 items without continuing to an 11th item, set this parameter to ["11."]. |
| tools | No | Array | List of Function Calling tool definitions. |
| tool_choice | No | String | Tool invocation policy: none (disable) / auto (auto-select) / required (force-invoke). |
| seed | No | Integer | Random seed for result reproducibility. When the same seed is used across multiple requests and all other parameters are unchanged, the model is more likely to return identical or very similar results. |
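The parameter constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any SDK: `build_chat_request` is a hypothetical helper name, and it only enforces the ranges and limits stated in the table (temperature, top_p, and the 4-sequence limit on stop).

```python
import json


def build_chat_request(model, messages, *, stream=False, temperature=None,
                       top_p=None, max_tokens=None, stop=None, seed=None):
    """Assemble a chat-completions request body, checking the documented ranges.

    Hypothetical helper for illustration; the platform performs its own validation.
    """
    body = {"model": model, "messages": messages, "stream": stream}
    if temperature is not None:
        if not 0.0 <= temperature <= 2.0:
            raise ValueError("temperature must be within [0.0, 2.0]")
        body["temperature"] = temperature
    if top_p is not None:
        if not 0.0 <= top_p <= 1.0:
            raise ValueError("top_p must be within [0.0, 1.0]")
        body["top_p"] = top_p
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    if stop is not None:
        # A single string is accepted; normalize it to a list.
        sequences = [stop] if isinstance(stop, str) else list(stop)
        if len(sequences) > 4:
            raise ValueError("stop accepts at most 4 sequences")
        body["stop"] = sequences
    if seed is not None:
        body["seed"] = seed
    return body


payload = build_chat_request(
    "glm-5",
    [{"role": "user", "content": "List 10 fruits, one per line, numbered."}],
    temperature=0.2,
    stop="11.",
    seed=42,
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be serialized and sent as the body of a POST to the /chat/completions path under the BaseURL.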
Messages Parameter
Each object in the message array contains the following fields:
| Field | Type | Description |
| --- | --- | --- |
| role | String | Role: system (system prompt), user (user), assistant (assistant), tool (tool response) |
| content | String | Message text content. |
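Message sequence rule: [system (optional) → user → assistant → user → ...]. The sequence must end with a user message, except when returning tool results, where the last message has the tool role (see Example 5).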
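The sequence rule can be checked locally before sending a request. This is a rough sketch under a stated simplification: `validate_messages` is a hypothetical helper that covers only plain system/user/assistant conversations and does not model tool messages.

```python
def validate_messages(messages):
    """Raise ValueError if roles do not follow system? -> user -> assistant -> ... -> user.

    Hypothetical helper; tool-call turns are intentionally out of scope.
    """
    roles = [message["role"] for message in messages]
    # An optional system message may appear only in the first position.
    turns = roles[1:] if roles[:1] == ["system"] else roles
    if not turns:
        raise ValueError("at least one user message is required")
    for position, role in enumerate(turns):
        expected = "user" if position % 2 == 0 else "assistant"
        if role != expected:
            raise ValueError(f"turn {position} should be {expected!r}, got {role!r}")
    if turns[-1] != "user":
        raise ValueError("the last message must come from the user")
    return True


conversation = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Please introduce quantum computing"},
    {"role": "assistant", "content": "Quantum computing is ..."},
    {"role": "user", "content": "What is the difference between it and traditional computing?"},
]
print(validate_messages(conversation))
```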
Response Parameters
| Parameter Name | Type | Description |
| --- | --- | --- |
| id | String | Unique request identifier. |
| object | String | Object type, fixed as chat.completion. |
| created | Integer | Creation time (Unix timestamp). |
| model | String | The model name actually used. |
| choices | Array | The list of candidate results returned by the model for the same request. For details, see Choices Array Elements. |
| usage | Object | Token consumption statistics. For details, see Usage Object. |
Choices Array Elements
| Field | Type | Description |
| --- | --- | --- |
| index | Integer | Option index. |
| message | Object | Response message, containing role and content. |
| finish_reason | String | End reason: stop (normal end), length (reached maximum length), tool_calls (requires tool calls) |
Usage Object
| Field | Type | Description |
| --- | --- | --- |
| prompt_tokens | Integer | Number of input tokens |
| completion_tokens | Integer | Number of output tokens |
| total_tokens | Integer | Total number of tokens (used for billing) |
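To illustrate how these response fields fit together, the sketch below reads a response body and pulls out the reply text, the end reason, and the token counts. The JSON is abridged from the translation sample response in this document; `summarize_response` is a hypothetical helper name, not an SDK function.

```python
import json

# Abridged from a sample response in this document.
raw = """
{
    "id": "5d42fea3-413e-42ce-99b2-0d1595dae996",
    "object": "chat.completion",
    "model": "glm-5",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "The weather is really nice today."},
            "finish_reason": "stop"
        }
    ],
    "usage": {"prompt_tokens": 38, "completion_tokens": 7, "total_tokens": 45}
}
"""


def summarize_response(response):
    """Extract the first choice and the token statistics from a response dict."""
    choice = response["choices"][0]
    usage = response["usage"]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # finish_reason "length" means the output hit the maximum length.
        "truncated": choice["finish_reason"] == "length",
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
    }


summary = summarize_response(json.loads(raw))
print(summary["text"])
print(summary["total_tokens"])
```

Checking `truncated` is a cheap way to notice that max_tokens was too small for the requested output.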
Example
Example 1: Basic conversation
# Replace YOUR_API_KEY with the API KEY created in the previous steps
# Replace the model field with the service ID you want to test
curl -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [
      {"role": "user", "content": "Hello, please introduce yourself"}
    ]
  }'
Response sample:
{
    "id": "5e9c7ae9-e0e4-4ec1-bbd0-22bcfda61e45",
    "object": "chat.completion",
    "model": "deepseek-v3.2",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! Nice to meet you! 😊\n\nI am DeepSeek, an AI assistant created by DeepSeek Company. Let me briefly introduce myself:\n\n**My features:**\n- 📚 Knowledge cutoff is July 2024, and I am the latest version model of DeepSeek\n- 💬 A pure text dialogue model, focused on understanding and generating text content\n- 📁 Supports file upload feature—can handle files like images, txt, pdf, ppt, word, excel, and extract text information from them\n- 🌐 Supports online search (requires you to manually enable it in the Web/App)\n- 💾 Has a 128K context length, able to remember our longer conversations\n\n**What I can help you with:**\n- Answer various questions and engage in in-depth discussions\n- Assist with writing, translation, analysis\n- Process uploaded document content\n- Provide suggestions for learning, work, and life\n\n**Important reminders:**\n- I am completely free to use, with no paid plans\n- Currently, voice features are not supported\n- You can download the app from the official app store to use it\n\nMy response style is warm and detailed, hoping to bring you a warm communication experience! If there's anything you want to talk about or need help with, feel free to tell me! ✨"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 244,
        "total_tokens": 254,
        "prompt_tokens_details": {
            "cached_token": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}
Example 2: Streaming dialogue
# Replace YOUR_API_KEY with the API KEY created in the previous steps
# Replace the model field with the service ID you want to test
curl -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "glm-5",
    "messages": [
      {"role": "system", "content": "You are a helpful AI assistant."},
      {"role": "user", "content": "Calculate 1+1"}
    ],
    "stream": true
  }'
Streaming responses use the Server-Sent Events (SSE) format:
data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"role":"assistant","content":"1"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"+"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"1"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"="},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"2"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
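A client has to reassemble the full reply from these chunks. The sketch below shows one way to do that with the standard library only, assuming each event arrives as a single `data:` line as in the sample above; `collect_stream_text` is a hypothetical helper name.

```python
import json


def collect_stream_text(lines):
    """Concatenate delta.content from SSE data lines until the [DONE] sentinel."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The final chunk carries an empty delta and only a finish_reason.
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)


sample_stream = [
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"role":"assistant","content":"1"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"+"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"1"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"="},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"2"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print(collect_stream_text(sample_stream))  # prints 1+1=2
```

In a real client, `lines` would come from iterating over the HTTP response body line by line instead of a fixed list.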
Example 3: System Prompt
# Replace YOUR_API_KEY with the API KEY created in the previous steps
# Replace the model field with the service ID you want to test
curl -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "glm-5",
    "messages": [
      {
        "role": "system",
        "content": "You are a professional English translation assistant. Translate user input from Chinese to English and from English to Chinese. Return only the translation result without any explanation."
      },
      {
        "role": "user",
        "content": "The weather is very nice today."
      }
    ]
  }'
Response sample:
{
    "id": "5d42fea3-413e-42ce-99b2-0d1595dae996",
    "object": "chat.completion",
    "model": "glm-5",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The weather is really nice today."
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 38,
        "completion_tokens": 7,
        "total_tokens": 45,
        "prompt_tokens_details": {
            "cached_token": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}
Example 4: Multi-turn conversation
# Replace YOUR_API_KEY with the API KEY created in the previous steps
# Replace the model field with the service ID you want to test
curl -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [
      {"role": "user", "content": "Please introduce quantum computing"},
      {"role": "assistant", "content": "Quantum computing is a computational method that utilizes the principles of quantum mechanics for information processing..."},
      {"role": "user", "content": "What is the difference between it and traditional computing?"}
    ]
  }'
Response sample:
{
    "id": "fda59c08-6a85-4514-bdbf-d77a8d68e018",
    "object": "chat.completion",
    "model": "deepseek-v3.2",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Okay, this is a very core question. The fundamental difference between quantum computing and classical computing lies in the basic units they use to process information and their underlying principles.\\n\\nWe can start with a classic analogy:\\n\\n*   **Classical Computers** are like a huge **library** where a **librarian** (CPU) runs down a long corridor (bus), opening only one room (memory address) at a time, looking at one book (one bit of data), and then making a decision.\\n*   **Quantum Computers**, on the other hand, are like sending **all librarians** (qubits) into **all rooms** simultaneously, reading **every possible combination of all books** in an instant, and then telling you the final result.\\n\\nBelow, we provide a detailed comparison across several key dimensions:\\n\\n### 1. Basic Unit of Information: Bit vs. Qubit\\n\\n| Feature | Classical Computing (Bit) | Quantum Computing (Qubit) |\\n| :--- | :--- | :--- |\\n| **State** | **Binary**: Can only be **0** or **1**. Like a light switch, either on or off. Very definite. | **Superposition**: Can be **simultaneously** 0 and 1, or any probabilistic combination of 0 and 1. Like a \\"quantum light\\" that is both on and off at the same time. |\\n| **Representation** | A definite, discrete value. | A state vector, represented in Dirac notation as: \\\\|ψ⟩ = α\\\\|0⟩ + β\\\\|1⟩, where α and β are complex numbers, and \\\\|α\\\\|² + \\\\|β\\\\|² = 1. |\\n| **Core Difference** | **Determinism**: Each bit has a definite value at any moment. | **Probability**: When measured, a qubit collapses to 0 with probability \\\\|α\\\\|² or to 1 with probability \\\\|β\\\\|². |\\n\\n### 2. Working Principle: Logic Gates vs. Quantum Phenomena\\n\\n| Feature | Classical Computing | Quantum Computing |\\n| :--- | :--- | :--- |\\n| **Operation Method** | Uses **logic gates** (e.g., AND, OR, NOT) to manipulate bits. One operation changes the state of one or a group of bits. 
| Uses **quantum logic gates** to manipulate qubits. These operations are **reversible** and leverage superposition for **parallel computation**. |\\n| **Core Advantage** | **Serial Processing**: Tasks are broken down into sequential steps. Extremely efficient for simple, logically clear tasks. | **Quantum Parallelism**: Because qubits are in superposition, a single quantum operation can **act on all possible inputs simultaneously**. This is the root of quantum speedup. |\\n| **Unique Phenomenon** | None | **Quantum Entanglement**: Two or more qubits can form a mysterious correlation. Regardless of distance, measuring one instantly determines the state of the other(s). This allows quantum computers to tightly link the states of different qubits for highly coordinated computation. |\\n\\n### 3. Performance and Applicable Domains\\n\\n| Feature | Classical Computing | Quantum Computing |\\n| :--- | :--- | :--- |\\n| **Strong Suit** | - **General-purpose computing**: Office software, web browsing, games<br>- **Logic control**: Operating systems, application logic<br>- **Most data processing**: Database management (DMC), spreadsheets | - **Exponential speedup in specific domains**:<br>  - **Cryptography**: Breaking encryption algorithms like RSA (Shor's algorithm)<br>  - **Material simulation**: Accurately simulating quantum properties of molecules and materials<br>  - **Optimization problems**: Logistics route planning, financial portfolio optimization<br>  - **Artificial intelligence**: Accelerating machine learning training |\\n| **Computational Complexity** | For certain complex problems (e.g., large number factorization), classical algorithms require **exponentially** increasing time. | For specific problems, quantum algorithms can reduce complexity to the **polynomial** level, achieving \\"quantum supremacy\\". |\\n| **Output** | Precise, deterministic results. | Typically **probabilistic** results. 
Due to measurement, we get a potentially correct answer, so algorithms often need multiple runs to increase confidence. |\\n\\n### 4. Physical Implementation and Challenges\\n\\n| Feature | Classical Computers | Quantum Computers |\\n| :--- | :--- | :--- |\\n| **Hardware Basis** | Based on **transistors** (semiconductors), mature technology, allows massive integration (e.g., CPUs with billions of transistors). | Requires physical systems capable of maintaining quantum states, such as: superconducting circuits, trapped ions, photonics. Technology is still nascent. |\\n| **Main Challenge** | Power consumption, heat dissipation, transistor size approaching physical limits (Moore's Law slowing). | **Quantum Decoherence**: Quantum states are extremely fragile and easily lose their quantum properties due to environmental interference (e.g., heat, vibration). Requires extremely low temperatures (near absolute zero) and highly isolated environments. |\\n| **Error Correction** | Very low error rates, relatively simple correction (e.g., parity checks). | High error rates, require complex **quantum error correction codes**, using multiple physical qubits to encode one logical qubit, resulting in massive overhead. 
|\\n\\n### Summary Table\\n\\n| Comparison Dimension | Classical Computing | Quantum Computing |\\n| :--- | :--- | :--- |\\n| **Basic Unit** | Bit (0 or 1) | Qubit (Superposition: 0 and 1 simultaneously) |\\n| **Operation Method** | Logic Gates (Serial) | Quantum Gates (Parallel) |\\n| **Core Principle** | Boolean Logic | Superposition, Entanglement, Interference |\\n| **Output** | Deterministic | Probabilistic |\\n| **Strong Domain** | General tasks, logic control | Specific complex problems (e.g., simulation, optimization, cryptanalysis) |\\n| **Technology Maturity** | Very mature, widely used | Early stage, primarily for research and specific computations |\\n| **Relationship to Users** | **Complementary, not Replacement**: Quantum computers are **not** meant to replace your phone or laptop. They are more like **specialized accelerators** for solving specific, intractable problems that classical computers cannot solve in the foreseeable future. In the future, we might access quantum computers via the cloud, letting them handle the most complex parts, while classical computers manage daily tasks and user interaction. |\\n\\nSimply put, classical computers are \\"precision marksmen,\\" while quantum computers are \\"oracles capable of exploring all possibilities simultaneously.\\" Each has its own strengths, and they will work together for a long time to come."
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 32,
        "completion_tokens": 1321,
        "total_tokens": 1353,
        "prompt_tokens_details": {
            "cached_token": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}
Example 5: Function Calling (tool invocation)
# Replace YOUR_API_KEY with the API KEY created in the previous steps
# Replace the model field with the service ID you want to test
curl -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "glm-5",
    "messages": [
      {"role": "user", "content": "What's the weather like in Beijing today?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Obtains the weather information for a specified city",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {
                "type": "string",
                "description": "City name, e.g. Beijing"
              }
            },
            "required": ["city"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
When the model decides to invoke a tool, it returns:
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"Beijing\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
Return the tool execution result to the model and continue the conversation:
{
  "model": "glm-5",
  "messages": [
    {"role": "user", "content": "What's the weather like in Beijing today?"},
    {"role": "assistant", "content": null, "tool_calls": [{"id": "call_abc123", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"Beijing\"}"}}]},
    {"role": "tool", "tool_call_id": "call_abc123", "content": "{\"temperature\": 22, \"weather\": \"sunny\", \"humidity\": 45}"}
  ]
}
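The round trip above (model requests a tool, client runs it, client sends the result back) can be sketched as follows. This is an illustration only: `get_weather` here is a hypothetical local stub returning fixed data, and `TOOL_REGISTRY` and `run_tool_calls` are made-up names rather than SDK features.

```python
import json


def get_weather(city):
    # Hypothetical stand-in for a real weather lookup; returns fixed data.
    return {"temperature": 22, "weather": "sunny", "humidity": 45}


# Maps the tool names declared in the request to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(assistant_message):
    """Execute each requested tool call and build the matching tool messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        handler = TOOL_REGISTRY[function["name"]]
        # The arguments field is a JSON-encoded string, not an object.
        arguments = json.loads(function["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(handler(**arguments)),
        })
    return results


assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Beijing\"}"},
    }],
}
messages = [
    {"role": "user", "content": "What's the weather like in Beijing today?"},
    assistant_message,
    *run_tool_calls(assistant_message),
]
print(json.dumps(messages[-1], indent=2))
```

The `messages` list built here has the same shape as the follow-up request body shown above and can be sent as the next request.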
Example 6: Deep thinking
# Replace YOUR_API_KEY with the API KEY created in the previous steps
# Replace the model field with the service ID you want to test
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "glm-5",
    "reasoning_effort": "high",
    "messages": [
      {
        "role": "system",
        "content": "You are a rigorous assistant; first reason thoroughly, then provide clear conclusions."
      },
      {
        "role": "user",
        "content": "Please analyze why idempotency is important in distributed systems and provide an example in a payment scenario."
      }
    ],
    "temperature": 0.2
  }'
Response sample:
{
    "id": "663d466c-4469-48a1-b0aa-6ebd31bba09f",
    "object": "chat.completion",
    "model": "glm-5",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "### I. Analysis of the Importance of Idempotence in Distributed Systems\\n\\n#### 1. Definition of Idempotence\\nIdempotence refers to **the same operation producing the same final effect whether executed once or multiple times**. Mathematically, if $f(f(x)) = f(x)$, then $f$ is idempotent; in distributed systems, it manifests as one or multiple requests to a resource resulting in the same final state as if executed only once.\\n\\n#### 2. Characteristics and Challenges of Distributed Systems\\nDistributed systems inherently face **uncertainty** due to factors like networks, nodes, and concurrency, making duplicate requests unavoidable:\\n- **Unreliable Networks**: Requests may time out or be lost before reaching the server, prompting client retries (e.g., HTTP retries, RPC retransmissions).\\n- **Node Failures**: If a server crashes while processing a request, it may reprocess the request upon recovery (e.g., "at least once" delivery semantics in message queues).\\n- **Client Behavior**: Users may click repeatedly due to interface lag (e.g., payment buttons), triggering multiple requests.\\n- **Concurrency Conflicts**: Multiple nodes may process the same request simultaneously (e.g., coordination issues in distributed transactions).\\n\\nIf operations lack idempotence, duplicate requests can cause **cumulative side effects** (e.g., duplicate deductions, duplicate orders), compromising data consistency and potentially leading to financial losses or business anomalies.\\n\\n#### 3. 
Core Value of Idempotence\\n- **Prevent Side Effects**: Avoid erroneous resource consumption (e.g., funds, inventory) from repeated operations.\\n- **Ensure Consistency**: Guarantee eventual consistency across nodes in distributed systems (e.g., order status, account balances).\\n- **Support Safe Retries**: Allow clients to retry confidently on timeouts or failures without additional logic to check "whether processed".\\n- **Simplify System Design**: Reduce complex deduplication logic (e.g., distributed locks, transaction compensation), lowering development and maintenance costs.\\n\\n\\n### II. Idempotence Example in a Payment Scenario\\n\\n#### Scenario Description\\nA user pays a 100 CNY order (order ID: `ORDER_20231001_001`) via an APP, with the following flow:\\n1. The client calls the payment interface (`/pay?orderId=ORDER_20231001_001&amount=100`).\\n2. The server verifies the order status (unpaid), deducts the user balance, updates the order to "paid", and returns success.\\n3. If the client does not receive a response due to a network timeout, it may retry the payment interface (with the same parameters).\\n\\n#### Problems Without Idempotence\\nAssume the payment interface lacks idempotence control:\\n- First request: The server deducts 100 CNY, updates the order status to "paid", but the client receives no response due to network issues.\\n- Second request (retry): The server rechecks the order status (still "unpaid" because the first update hasn't synced to all nodes or the client is unaware), deducts another 100 CNY, and updates the status again (redundant operation).\\n- **Result**: The user is charged 200 CNY for a 100 CNY order, causing financial loss and data inconsistency.\\n\\n#### Idempotent Solution\\nImplement idempotence using a **unique idempotence key** (e.g., order ID `ORDER_20231001_001`):\\n1. When initiating a payment request, the client includes a unique order ID (or a Token generated by the server).\\n2. 
Upon receiving the request, the server first checks the payment record for that order ID (e.g., querying a database or cache):\\n   - If a "paid" record exists, return success directly (no duplicate deduction).\\n   - If not, execute the deduction and status update, and record the payment result for the order ID.\\n3. Even if the client retries, the server identifies duplicate requests via the order ID and processes them only once.\\n\\n**Key Implementation Points**:\\n- The idempotence key must be globally unique (e.g., order ID, payment transaction ID).\\n- Store idempotence status (e.g., in Redis or a database), marking "in progress" or "completed".\\n- Handle concurrency (e.g., two requests arriving simultaneously) with atomic operations (e.g., database unique indexes, distributed locks) to avoid race conditions.\\n\\n\\n### III. Conclusion\\nIn distributed systems, idempotence is a core mechanism for addressing uncertainties like unreliable networks, node failures, and client retries. By ensuring that repeated requests have the same effect as a single request, it prevents side effects from duplicate operations (e.g., financial losses, data chaos) and is a key design principle for system reliability and consistency. In payment scenarios, idempotence directly impacts user fund security, providing essential assurance for "one payment, one deduction".\\n",
                "reasoning_content": "Hmm, the user asked me to analyze the importance of idempotence in distributed systems and provide an example in a payment scenario. First, I need to recall the definition of idempotence: multiple executions of the same operation yield the same result as a single execution. Why is this crucial in distributed systems?\\n\\nFirst, what characterizes distributed systems? Unreliable networks with potential delays, packet loss, and retries; node failures leading to duplicate requests; and concurrency issues. In such scenarios, clients might resend requests—like users clicking a payment button multiple times or clients automatically retrying after network timeouts. Without idempotence, this could cause duplicate deductions or repeated orders, compromising data consistency.\\n\\nRegarding importance, this can be analyzed from several perspectives: preventing side effects from repeated operations (e.g., financial losses); ensuring data consistency across distributed nodes handling the same request; enhancing system reliability by allowing safe client retries; and simplifying business logic by eliminating extra deduplication mechanisms.\\n\\nNext, the payment scenario example. Suppose a user pays for an order, but the first request times out due to network issues. The client doesn’t receive a response and retries. Without idempotence, the payment interface would process the retry, leading to duplicate charges. With idempotence implemented—using a unique order ID as the idempotence key—the system checks if the order was already paid. If so, it returns success without reprocessing, avoiding duplicate deductions.\\n\\nKey details: The example should clearly illustrate the cause (network timeout → retry) and consequence (duplicate charges without idempotence), then contrast it with the solution (unique ID + state validation). 
Common implementation methods (e.g., unique IDs, optimistic locking, Token mechanisms) can be mentioned briefly, as the focus is on explaining the rationale and example.\\n\\nAdditionally, the user likely seeks to understand why idempotence is essential in real-world development, especially in financial contexts. Emphasize the sensitivity of fund security in payments—duplicate charges severely impact users and system integrity, making idempotence non-negotiable.\\n\\nTo summarize: Start by defining idempotence, then analyze distributed system challenges (network issues, retries, concurrency) that necessitate it. Highlight its core value (preventing side effects, ensuring consistency, enabling reliability, simplifying logic), and conclude with a concrete payment example contrasting problems without idempotence and solutions with it.\\n"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 36,
        "completion_tokens": 1329,
        "total_tokens": 1365,
        "prompt_tokens_details": {
            "cached_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}
Example 7: Using Python SDK
from openai import OpenAI

client = OpenAI(
    # Replace YOUR_API_KEY with the API KEY created in the previous steps
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

# Basic conversation
response = client.chat.completions.create(
    # Replace the model field with the service ID you want to test
    model="glm-5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Please explain what a large language model is in one sentence"},
    ],
)
print(response.choices[0].message.content)
Example 8: Using the Node.js SDK
import OpenAI from 'openai';

const client = new OpenAI({
  // Replace YOUR_API_KEY with the API KEY created in the previous steps
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://tokenhub-intl.tencentcloudmaas.com/v1',
});

async function main() {
  const response = await client.chat.completions.create({
    // Replace the model field with the service ID you want to test
    model: 'glm-5',
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Please explain what a large language model is in one sentence' },
    ],
  });
  console.log(response.choices[0].message.content);
}

main();
Error Codes
| HTTP Status Code | Error Code | Error Description |
| --- | --- | --- |
| 400 | invalid_request_error | The request parameters are incorrect. |
| 401 | authentication_error | The API Key is invalid. |
| 403 | model_not_available | The model does not exist or has not been activated. |
| 429 | rate_limit_exceeded | Request rate limit exceeded. |
| 500 | engine_server_error | Internal engine error. |
| 503 | service_unavailable | Service is temporarily unavailable. |
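Clients commonly treat some of these errors as transient and retry them with a growing delay. The sketch below is one possible policy, not documented platform behavior: the choice of 429, 500, and 503 as retryable is an assumption, and `should_retry` and `backoff_delays` are hypothetical helper names.

```python
import random

# Assumption: rate-limit and server-side errors are worth retrying;
# 400, 401, and 403 indicate a problem that a retry will not fix.
RETRYABLE_STATUS = {429, 500, 503}


def should_retry(status_code):
    """Return True when a request with this HTTP status may be retried."""
    return status_code in RETRYABLE_STATUS


def backoff_delays(max_attempts=5, base=0.5, cap=8.0, rng=random.random):
    """Yield the wait in seconds before each retry: capped exponential with jitter."""
    for attempt in range(max_attempts - 1):
        yield min(cap, base * 2 ** attempt) * rng()


for status in (400, 401, 403, 429, 500, 503):
    print(status, "retry" if should_retry(status) else "fail fast")

# With jitter disabled, the schedule is the plain exponential sequence.
print(list(backoff_delays(max_attempts=4, rng=lambda: 1.0)))
```

A caller would sleep for each yielded delay between attempts and stop as soon as a request succeeds or returns a non-retryable status.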
Anthropic API Usage
BaseURL
Singapore: https://tokenhub-intl.tencentcloudmaas.com
Guangzhou: https://tokenhub.tencentcloudmaas.com
HTTP Headers
| Field | Support status | Description |
| --- | --- | --- |
| anthropic-beta | Ignored | This header is not processed. |
| anthropic-version | Ignored | This header is not processed. |
| x-api-key | Fully supported | Used for authentication. |
Request Parameters
| Field | Support status | Description |
| --- | --- | --- |
| model | Supported | Specify a platform model that supports the Anthropic protocol (for example, minimax-m2.7) in place of an Anthropic model name. |
| max_tokens | Fully supported | Maximum number of output tokens |
| container | Ignored | This field is not processed. |
| mcp_servers | Ignored | This field is not processed. |
| metadata | Ignored | This field is not processed. |
| service_tier | Ignored | This field is not processed. |
| stop_sequences | Fully supported | Stop sequences |
| stream | Fully supported | Streaming response |
| system | Fully supported | System message |
| temperature | Fully supported | Temperature parameter (0.0-2.0) |
| thinking | Ignored | This field is not processed. |
| top_k | Ignored | This field is not processed. |
| top_p | Fully supported | Top-p sampling |
Tool Support
Tools
| Field | Support status | Description |
| --- | --- | --- |
| name | Fully supported | Tool name |
| input_schema | Fully supported | Input parameter schema |
| description | Fully supported | Tool description |
| cache_control | Ignored | This field is not processed. |
| tool_choice | Fully supported | String format |
| tool_choice | Fully supported | Object format |
| tool_choice.disable_parallel_tool_use | Ignored | This field is not processed. |
Tool_choice
| Field | Support status |
| --- | --- |
| none | Fully supported |
| auto | Fully supported |
| any | Fully supported |
| tool | Fully supported |
| disable_parallel_tool_use | Ignored |
Supported Message Fields
| Field Type | Variant | Subfield | Support status |
| --- | --- | --- | --- |
| content | string | - | Fully supported |
| content | array, type="text" | text | Fully supported |
| content | array, type="text" | cache_control | Ignored |
| content | array, type="text" | citations | Ignored |
| content | array, type="image" | - | Not supported |
| content | array, type="document" | - | Not supported |
| content | array, type="search_result" | - | Not supported |
| content | array, type="thinking" | - | Ignored |
| content | array, type="redacted_thinking" | - | Not supported |
| content | array, type="tool_use" | id | Fully supported |
| content | array, type="tool_use" | input | Fully supported |
| content | array, type="tool_use" | name | Fully supported |
| content | array, type="tool_use" | cache_control | Ignored |
| content | array, type="tool_result" | tool_use_id | Fully supported |
| content | array, type="tool_result" | content | Fully supported |
| content | array, type="tool_result" | cache_control | Ignored |
| content | array, type="tool_result" | is_error | Ignored |
Note:
1. Ignored fields: Certain Anthropic-specific fields will be ignored without raising errors.
2. Parallel tool invocation: The disable_parallel_tool_use parameter is ignored.
3. Cache Control: All cache_control related fields are ignored.
For more information on using the Anthropic API, see: Claude API Docs.
Sample Code
# Replace YOUR_API_KEY with the API key created in the previous steps
curl https://tokenhub-intl.tencentcloudmaas.com/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: YOUR_API_KEY" \
-d '{
    "model": "minimax-m2.7",
    "max_tokens": 1000,
    "stream": true,
    "system": [
        {
            "type": "text",
            "text": "You are a helpful assistant."
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Hi, how are you?"
                }
            ]
        }
    ]
}'
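The curl request above can also be issued from Python. The sketch below uses only the standard library and mirrors the same URL, headers, and body; build_messages_request and send are hypothetical helper names, and stream defaults to false here so that send can decode a single JSON body.

```python
import json
import urllib.request

# Singapore endpoint, taken from the curl sample above.
BASE_URL = "https://tokenhub-intl.tencentcloudmaas.com/v1"


def build_messages_request(api_key, model, user_text, system_text=None,
                           max_tokens=1000, stream=False):
    """Build (but do not send) a POST request for the Anthropic-compatible endpoint."""
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "stream": stream,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": user_text}]}
        ],
    }
    if system_text is not None:
        body["system"] = [{"type": "text", "text": system_text}]
    return urllib.request.Request(
        f"{BASE_URL}/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def send(request, timeout=60):
    """Send a non-streaming request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Replace YOUR_API_KEY with a real key before calling send(request).
request = build_messages_request(
    "YOUR_API_KEY", "minimax-m2.7", "Hi, how are you?",
    system_text="You are a helpful assistant.",
)
print(request.get_method(), request.full_url)
```

With stream set to true, as in the curl sample, the reply arrives as server-sent events instead of a single JSON body, so send would need to read the response line by line.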
Response sample:
event: content_block_start
data: {"content_block":{"text":"","type":"text"},"index":1,"type":"content_block_start"}

event: content_block_delta
data: {"delta":{"text":"Hey","type":"text_delta"},"index":0,"type":"content_block_delta"}

event: content_block_delta
data: {"delta":{"text":"! I'm doing well, thanks for asking! I'm","type":"text_delta"},"index":0,"type":"content_block_delta"}

event: content_block_delta
data: {"delta":{"text":" here and ready to help with whatever you need.","type":"text_delta"},"index":0,"type":"content_block_delta"}

event: content_block_delta
data: {"delta":{"text":" How are you doing today? Is there something I","type":"text_delta"},"index":0,"type":"content_block_delta"}

event: content_block_delta
data: {"delta":{"text":" can assist you with?","type":"text_delta"},"index":0,"type":"content_block_delta"}

event: content_block_stop
data: {"index":1,"type":"content_block_stop"}

event: message_delta
data: {"delta":{"stop_reason":"end_turn","stop_sequence":null},"type":"message_delta","usage":{"output_tokens":57}}

event: message_stop
data: {"type":"message_stop"}
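A client usually reassembles the streamed text by concatenating the text_delta fragments. The following minimal sketch parses event/data line pairs like those in the response sample above; iter_sse_events and collect_text are hypothetical helper names, and SAMPLE_STREAM is an abbreviated copy of that sample rather than live output.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, payload) pairs from an iterable of SSE text lines."""
    event_name = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            payload = json.loads(line[len("data:"):].strip())
            # Fall back to the payload's own "type" if no event line preceded it.
            yield event_name or payload.get("type"), payload
            event_name = None


def collect_text(lines):
    """Concatenate all text_delta fragments and capture the final stop_reason."""
    parts, stop_reason = [], None
    for name, payload in iter_sse_events(lines):
        if name == "content_block_delta" and payload["delta"].get("type") == "text_delta":
            parts.append(payload["delta"]["text"])
        elif name == "message_delta":
            stop_reason = payload["delta"].get("stop_reason")
    return "".join(parts), stop_reason


# Abbreviated from the response sample above.
SAMPLE_STREAM = """\
event: content_block_delta
data: {"delta":{"text":"Hey","type":"text_delta"},"index":0,"type":"content_block_delta"}

event: content_block_delta
data: {"delta":{"text":"! I'm doing well, thanks for asking!","type":"text_delta"},"index":0,"type":"content_block_delta"}

event: message_delta
data: {"delta":{"stop_reason":"end_turn","stop_sequence":null},"type":"message_delta","usage":{"output_tokens":57}}

event: message_stop
data: {"type":"message_stop"}
"""

text, stop_reason = collect_text(SAMPLE_STREAM.splitlines())
print(text)         # Hey! I'm doing well, thanks for asking!
print(stop_reason)  # end_turn
```

The same functions accept any iterable of decoded lines, so they could be fed from an HTTP response object read line by line.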
Integrate the model with Claude Code
Install Claude Code
Install or update Anthropic Claude Code by running the following command:
npm install -g @anthropic-ai/claude-code
Configure environment variables
export ANTHROPIC_BASE_URL=https://tokenhub-intl.tencentcloudmaas.com
export ANTHROPIC_AUTH_TOKEN=${API_KEY}
export API_TIMEOUT_MS=600000
export ANTHROPIC_MODEL=${MODEL_NAME}
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
Note:
Set API_TIMEOUT_MS so that long-running generations do not hit the Claude Code client's timeout. The value here is 600000 ms (10 minutes); adjust it as needed.
Execute the claude command
Go to your project directory and run the claude command to get started.
cd my-project
claude

