Online Inference

Last updated: 2026-04-22 10:56:25

Feature Overview

The online inference service manages how models are used, including free quota usage, whether pay-per-token billing is enabled, security policies, and rate limiting policies.

Service Types

Online inference services are categorized into two types:

1. Default

The platform creates an online inference service for every supported model by default. To start using a default service, enable Pay-as-you-go for it in the service list on the Model Gallery page or the Online Inference page.

2. Custom

If you need to customize the billing policy for a model service, or to create multiple services so that usage statistics and permissions can be tracked and managed by team, you can create custom inference services in Online Inference.
Custom inference services support a wider range of billing options, such as TPM Reservation. In the future, the platform plans to add capabilities such as intelligent routing, rate limiting rules, and enabling or disabling plug-ins to custom services, for more flexible service management and governance.


Service Status

Each online inference service has a status, as described below:
| Status | Description |
| --- | --- |
| Not enabled | Default-type inference services remain in the Not enabled state until the user activates them. After postpaid (pay-as-you-go) billing is enabled, the service transitions to the Running state. |
| Creating | When a service is enabled for the first time, it is briefly in the Creating state and is expected to transition to Running within 5 seconds. |
| Running | The service is accessible. |
| Stopped | (1) When the account has an overdue payment, pay-as-you-go services become Stopped; once the account balance is replenished, the service automatically returns to Running. (2) When postpaid billing is manually disabled for a service, its status changes to Stopped; the user must manually enable postpaid billing again on the Online Inference page to restore the service. |
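
The status transitions described above can be sketched as a small state machine. This is an illustrative model only, not the platform's implementation: the event names (`enable_postpaid`, `creation_finished`, `account_overdue`, `disable_postpaid`, `balance_replenished`) and the `Service` class are hypothetical, and the sketch assumes that any event not described in the table leaves the status unchanged.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    NOT_ENABLED = "Not enabled"
    CREATING = "Creating"
    RUNNING = "Running"
    STOPPED = "Stopped"


class StopReason(Enum):
    OVERDUE = "overdue"  # account has an overdue payment
    MANUAL = "manual"    # postpaid billing was manually disabled


@dataclass
class Service:
    """Hypothetical model of one online inference service."""
    status: Status = Status.NOT_ENABLED
    stop_reason: Optional[StopReason] = None

    def apply(self, event: str) -> Status:
        """Apply a (hypothetical) event and return the resulting status."""
        s = self.status
        if event == "enable_postpaid" and s is Status.NOT_ENABLED:
            # First activation passes briefly through Creating.
            self.status = Status.CREATING
        elif event == "creation_finished" and s is Status.CREATING:
            self.status = Status.RUNNING
        elif event == "account_overdue" and s is Status.RUNNING:
            self.status, self.stop_reason = Status.STOPPED, StopReason.OVERDUE
        elif event == "disable_postpaid" and s is Status.RUNNING:
            self.status, self.stop_reason = Status.STOPPED, StopReason.MANUAL
        elif (event == "balance_replenished" and s is Status.STOPPED
              and self.stop_reason is StopReason.OVERDUE):
            # Overdue stops recover automatically once the balance is topped up.
            self.status, self.stop_reason = Status.RUNNING, None
        elif (event == "enable_postpaid" and s is Status.STOPPED
              and self.stop_reason is StopReason.MANUAL):
            # Manual stops require the user to enable postpaid again.
            self.status, self.stop_reason = Status.RUNNING, None
        # Assumption: any other (status, event) pair leaves the status unchanged.
        return self.status
```

Tracking the stop reason separately reflects the two recovery paths in the table: a service stopped for overdue payment resumes on its own, while a manually disabled one does not resume when the balance changes.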

Billing Mode

The billing mode indicates how the current service is billed, as described below:

| Billing mode | Description |
| --- | --- |
| Pay-as-you-go | The service has postpaid billing enabled and is billed by Token usage. |
| TPM Reservation | The service has TPM Reservation enabled. Traffic exceeding the reserved TPM limit is billed by Token. |
| None | Postpaid billing is not enabled. The service has no billing mode and is stopped. |
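
To make the TPM Reservation overflow rule concrete, the following sketch splits the tokens consumed in a single minute into the part covered by the reservation and the part billed by Token. It is an assumption-laden illustration: the function name `split_minute_usage` is hypothetical, and it assumes usage is metered in independent one-minute windows, which the platform documentation does not specify.

```python
from typing import Tuple


def split_minute_usage(tokens_used: int, reserved_tpm: int) -> Tuple[int, int]:
    """Split one minute of usage into (covered_by_reservation, billed_by_token).

    Assumption: each minute is evaluated on its own against the reserved TPM.
    """
    if tokens_used < 0 or reserved_tpm < 0:
        raise ValueError("tokens_used and reserved_tpm must be non-negative")
    covered = min(tokens_used, reserved_tpm)
    overflow = tokens_used - covered  # tokens beyond the reservation
    return covered, overflow


# Example: 120,000 tokens in a minute against a 100,000 TPM reservation.
covered, overflow = split_minute_usage(120_000, 100_000)
print(covered, overflow)  # 100000 20000
```

Under these assumptions, only the 20,000 overflow tokens in the example would be charged at the pay-per-token rate; a minute that stays within the reservation produces no overflow.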




