Glossary

Last updated: 2026-04-22 10:57:28

Token usage per minute
Tokens Per Minute (TPM). The upper limit on the total number of tokens (input plus output) that a service can process within one minute. It is a key quota metric that caps service throughput.
RPM
Requests Per Minute (RPM). The upper limit on the number of independent requests (API calls) that a service can process within one minute. It is a key quota metric that caps how many calls the service accepts, regardless of how many tokens each call carries.
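To illustrate how the two quota metrics interact, here is a minimal sketch of a client-side check that admits a request only if both TPM and RPM would still hold. It assumes a sliding 60-second window; the actual service may count its quota differently (for example with fixed windows), and the class name `MinuteQuota`, its methods, and the limits used are hypothetical. Because output length is unknown before a call, `tokens` here stands for input tokens plus an estimate of output tokens.

```python
from collections import deque


class MinuteQuota:
    """Hypothetical sliding 60-second window tracking requests (RPM) and tokens (TPM)."""

    def __init__(self, rpm_limit: int, tpm_limit: int) -> None:
        self.rpm_limit = rpm_limit
        self.tpm_limit = tpm_limit
        self.events = deque()  # one (timestamp_seconds, tokens) pair per admitted request
        self.tokens_in_window = 0

    def _expire(self, now: float) -> None:
        # Drop admitted requests that are 60 seconds old or older.
        while self.events and now - self.events[0][0] >= 60.0:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_admit(self, now: float, tokens: int) -> bool:
        """Record the request and return True only if both limits still hold."""
        self._expire(now)
        if len(self.events) + 1 > self.rpm_limit:
            return False  # would exceed RPM
        if self.tokens_in_window + tokens > self.tpm_limit:
            return False  # would exceed TPM
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True
```

Either limit can be the one that rejects a call: many small requests tend to hit RPM first, while a few very long prompts or completions tend to hit TPM first.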
Per-Output Token Latency
Time Per Output Token (TPOT). The average time the model takes to generate each output token after the first one; the first token is excluded. This metric determines how smooth streaming output feels.
First Token Latency
Time To First Token (TTFT). The time from when a user sends a complete request to when the model returns the first token. This metric directly affects how responsive the service feels to users.
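As a worked illustration of the two latency metrics, the sketch below derives TTFT and TPOT from timestamps recorded on the client while reading a streamed response. The function name `latency_metrics` and its parameters are hypothetical, and it assumes one timestamp is recorded per output token; if a stream delivers several tokens per chunk, the TPOT value would need to be adjusted accordingly.

```python
from typing import List, Optional, Tuple


def latency_metrics(request_sent: float, token_times: List[float]) -> Tuple[float, Optional[float]]:
    """Return (TTFT, TPOT) in seconds.

    request_sent: time the complete request was sent.
    token_times:  arrival time of each output token, in order.
    """
    if not token_times:
        raise ValueError("at least one output token is required")
    # TTFT: delay between sending the request and receiving the first token.
    ttft = token_times[0] - request_sent
    # TPOT: average gap between tokens after the first; undefined for a single token.
    if len(token_times) == 1:
        return ttft, None
    tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    return ttft, tpot


# Example: request sent at t=10.0 s, four tokens arriving 0.25 s apart from t=10.5 s.
ttft, tpot = latency_metrics(10.0, [10.5, 10.75, 11.0, 11.25])
# ttft == 0.5, tpot == 0.25
```

In practice the timestamps would come from a monotonic clock such as `time.perf_counter()`, sampled once before sending and once per received token.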
Token
Token. The basic unit in which large language models process text. In Chinese, a word, a single character, or even a punctuation mark may be split into one or more tokens. It is the core unit for measuring model processing volume and computational cost.
