tencent cloud

Glossary

Last updated: 2026-04-22 10:57:28

Token usage per minute

Tokens Per Minute (TPM) is the upper limit on the total number of tokens (input plus output) that a service can process within one minute. It is a key quota metric that caps service throughput.

RPM

Requests Per Minute (RPM) is the upper limit on the number of independent requests (API calls) that a service can process within one minute. It is a key quota metric that caps how many requests the service can handle.
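As an illustration of how the two quotas interact, the sketch below tracks both limits over a sliding one-minute window on the client side. The class `MinuteQuota`, its method names, and the sliding-window policy are assumptions made for this example; the service may count its window differently. Because the output token count is not known before a request completes, the caller is assumed to pass an estimate.

```python
from collections import deque


class MinuteQuota:
    """Hypothetical client-side tracker for RPM and TPM over a sliding window."""

    def __init__(self, rpm_limit, tpm_limit, window_seconds=60.0):
        self.rpm_limit = rpm_limit
        self.tpm_limit = tpm_limit
        self.window_seconds = window_seconds
        self._events = deque()  # one (timestamp, tokens) entry per accepted request
        self._tokens_in_window = 0

    def _evict(self, now):
        # Forget requests that have left the one-minute window.
        while self._events and now - self._events[0][0] >= self.window_seconds:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def try_acquire(self, input_tokens, output_tokens, now):
        """Record the request and return True only if both limits still hold."""
        self._evict(now)
        tokens = input_tokens + output_tokens  # TPM counts input plus output
        if len(self._events) + 1 > self.rpm_limit:
            return False  # RPM would be exceeded
        if self._tokens_in_window + tokens > self.tpm_limit:
            return False  # TPM would be exceeded
        self._events.append((now, tokens))
        self._tokens_in_window += tokens
        return True


quota = MinuteQuota(rpm_limit=3, tpm_limit=1000)
results = [
    quota.try_acquire(300, 200, now=0.0),   # accepted: 500 tokens used
    quota.try_acquire(200, 200, now=10.0),  # accepted: 900 tokens used
    quota.try_acquire(100, 100, now=20.0),  # rejected: 1100 tokens would exceed TPM
    quota.try_acquire(50, 50, now=30.0),    # accepted: 1000 tokens, third request
    quota.try_acquire(1, 1, now=40.0),      # rejected: fourth request exceeds RPM
    quota.try_acquire(1, 1, now=61.0),      # accepted: the request from t=0 has expired
]
print(results)
```

A request is rejected as soon as either limit would be exceeded, so a workload of a few very large requests tends to hit TPM first, while many small requests tend to hit RPM first.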

Per-output Token latency

Time Per Output Token (TPOT) is the latency per output token, excluding the first token. It represents the average time the model needs to generate each subsequent output token after the first token is produced. This metric determines how smooth streaming output feels.

First Token Latency

Time To First Token (TTFT) is the first-token latency: the time from when a user sends a complete request to when the model returns the first token. This metric directly affects how responsive the service feels to users.
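Both latency metrics can be derived from timestamps recorded while reading a streamed response. The following is a minimal sketch under the assumption that the client records the time the request was sent and the arrival time of each output token; the function name `streaming_latency` and its parameters are hypothetical.

```python
def streaming_latency(request_sent_at, token_arrival_times):
    """Return (ttft, tpot) in seconds from client-side timestamps.

    ttft: delay between sending the request and receiving the first token.
    tpot: average gap per token after the first, or None if only one
          token was produced (there are no subsequent tokens to average).
    """
    if not token_arrival_times:
        raise ValueError("at least one output token is required")

    first = token_arrival_times[0]
    last = token_arrival_times[-1]
    ttft = first - request_sent_at

    subsequent_tokens = len(token_arrival_times) - 1
    if subsequent_tokens == 0:
        return ttft, None

    # Average over tokens after the first; the first token is excluded.
    tpot = (last - first) / subsequent_tokens
    return ttft, tpot


# Request sent at t=0.0 s; five tokens arrive 0.25 s apart, starting at 0.5 s.
ttft, tpot = streaming_latency(0.0, [0.5, 0.75, 1.0, 1.25, 1.5])
print(ttft, tpot)  # 0.5 0.25
```

Under these definitions, the time to receive a full response of n tokens is approximately TTFT + TPOT × (n − 1).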

Token

A token is the basic unit of text that a large language model processes. In Chinese, a word, a character, or even a punctuation mark may be split into one or more tokens. The token is the core unit for measuring model processing volume and computational cost.

