Tencent Cloud

Service Registry and Governance

Viewing Default Monitoring

Last updated: 2026-05-07 17:26:54

Scenarios

The AI Gateway provides multi-dimensional monitoring metrics for running gateway instances so that you can track both instance health and AI invocation quality. The metrics cover general gateway performance (request counts, latency, error codes) as well as LLM-specific metrics for large-model scenarios (token consumption, model response time).
You can use these metrics to understand the real-time status of gateway instances and model APIs, gain insight into AI invocation cost and performance, and address potential risks promptly to keep AI services stable and costs under control. This document describes how to view the default gateway monitoring metrics in the TSF console.

Operation Steps

1. Log in to the Microservices Platform (TSF) console. In the left sidebar, click Cloud Native Intelligent Gateway > Instance List.
2. On the instance list page, click the ID of the target gateway instance to go to its basic information page.
3. In the left sidebar, click Data Observation.
4. Use the filters at the top of the page to view monitoring data across different dimensions.

Supported Monitoring Metrics and Their Meanings

Request Monitoring

Instance/Node
This set of metrics applies to all traffic passing through the gateway, used to evaluate the general performance and health status of the gateway and backend services.
Total number of requests: Total number of requests, summed based on the selected time granularity.
Average request latency: Average request latency, calculated based on the selected time granularity.
Maximum request latency: Maximum request latency, calculated based on the selected time granularity.
Number of requests directly returned by the gateway: Number of requests that are not forwarded to the backend but answered directly by the gateway (for example, when authentication fails or throttling is triggered), summed based on the selected time granularity.
Average gateway latency: Average time taken by the gateway itself to process requests.
Maximum gateway latency: Maximum time taken by the gateway itself to process requests.
Number of 2xx requests: Number of client requests to the AI Gateway that succeed (for example, 200 OK), summed based on the selected time granularity.
Number of 3xx requests: Number of client requests to the AI Gateway that are redirected, summed based on the selected time granularity.
Number of 4xx requests: Number of invalid client requests (for example, authentication failure or exceeding the throttling threshold) that the gateway answers directly with client error codes (such as 401 Unauthorized, 403 Forbidden, or 429 Too Many Requests), summed based on the selected time granularity.
Number of 5xx requests: Number of requests forwarded by the AI Gateway to the backend service that receive server error responses (for example, 500 Internal Server Error, 502 Bad Gateway, or 504 Gateway Timeout), summed based on the selected time granularity.
Number of 404 requests: Number of requests that fail because the requested resource is not found on the backend server, summed based on the selected time granularity.
Number of 429 requests: Number of requests that fail to reach the backend service because they are throttled, summed based on the selected time granularity.
Number of 499 requests: Number of requests that fail because the client disconnects before receiving a response from the backend, summed based on the selected time granularity.
Number of 502 requests: Number of requests that fail because the gateway receives an invalid response from the backend service, summed based on the selected time granularity.
Number of 504 requests: Number of requests that fail because the backend server is unreachable when the gateway forwards them, summed based on the selected time granularity.
Number of requests forwarded to the backend: Number of requests successfully forwarded by the gateway to the backend service, summed based on the selected time granularity.
Average backend latency: Average time taken by the backend service to process requests, calculated based on the selected time granularity.
Maximum backend latency: Maximum time taken by the backend service to process requests, calculated based on the selected time granularity.
Number of backend 2xx requests: Number of successful responses (for example, 200 OK) returned by the backend service, summed based on the selected time granularity.
Number of backend 3xx requests: Number of redirect responses returned by the backend service, summed based on the selected time granularity.
Number of backend 4xx requests: Number of client-error responses returned by the backend service, summed based on the selected time granularity.
Number of backend 5xx requests: Number of server-side errors returned by the backend service (for example, 500 backend exception, 502 backend invalid response, 504 backend unreachable), summed based on the selected time granularity.
Number of backend 404 requests: Number of requests that fail because the requested resource is not found on the backend server, summed based on the selected time granularity.
Number of backend 429 requests: Number of requests that fail because the backend service throttles them, summed based on the selected time granularity.
Number of backend 499 requests: Number of requests that fail because the client disconnects before receiving a response from the backend, summed based on the selected time granularity.
Number of backend 502 requests: Number of requests that fail because the backend service returns an invalid response, summed based on the selected time granularity.
Number of backend 504 requests: Number of requests that fail because the backend server is unreachable, summed based on the selected time granularity.
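As a quick illustration of how these request counters can be combined, the sketch below (not console output; all values are placeholders) derives a success rate and a gateway-side rejection rate from the total request count, the 2xx count, and the number of requests directly returned by the gateway.

```python
# Hypothetical metric snapshot for one time granularity; all values are placeholders.
snapshot = {
    "total_requests": 1000,
    "requests_2xx": 930,
    "direct_gateway_responses": 35,  # answered by the gateway itself (auth failure, throttling)
}

def success_rate(m: dict) -> float:
    """Share of client requests answered with a 2xx status code."""
    return m["requests_2xx"] / m["total_requests"]

def gateway_rejection_rate(m: dict) -> float:
    """Share of requests the gateway answered itself instead of forwarding."""
    return m["direct_gateway_responses"] / m["total_requests"]

print(f"success rate: {success_rate(snapshot):.1%}")                   # 93.0%
print(f"rejected by gateway: {gateway_rejection_rate(snapshot):.1%}")  # 3.5%
```

A sustained drop in the success rate with a rising rejection rate usually points at authentication or throttling configuration rather than backend health.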
LLM Dedicated Monitoring
This set of metrics is specifically designed for monitoring large model invocation scenarios, helping you analyze Token consumption costs and model provider performance.
Number of LLM HTTP requests: Number of HTTP calls initiated by the gateway to the LLM provider. This metric directly reflects the invocation frequency of the model API.
Total LLM token consumption: Total number of tokens consumed through the gateway from the LLM provider, that is, the sum of actual input (prompt) and output (completion) tokens. It is used to evaluate overall token throughput.
LLM prompt token consumption: Total number of tokens actually consumed for the input (prompt) part when the large language model processes a request.
LLM completion token consumption: Total number of tokens actually consumed for the output (completion) part when the large language model generates a response. This metric is one of the core bases for evaluating model invocation costs.
Average LLM provider response time (ms): Average duration from when the gateway sends a request to the model provider until the complete response is received. This metric reflects the provider's end-to-end response performance.
Average time per token for LLM provider (ms): Average time the model provider takes to generate each token. This metric reflects the provider's token output speed.
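The token counters above are what drive invocation cost, and "average time per token" can be approximated as response time divided by output tokens. A minimal sketch, assuming hypothetical per-1K-token prices (actual pricing depends on your model provider):

```python
# Hypothetical per-1K-token prices; actual pricing depends on the model provider.
PROMPT_PRICE_PER_1K = 0.002
COMPLETION_PRICE_PER_1K = 0.006

def invocation_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate spend from the prompt/completion token counters."""
    return (prompt_tokens / 1000 * PROMPT_PRICE_PER_1K
            + completion_tokens / 1000 * COMPLETION_PRICE_PER_1K)

def avg_time_per_token_ms(response_time_ms: float, completion_tokens: int) -> float:
    """Approximate average time per token as response time over output tokens."""
    return response_time_ms / completion_tokens

print(round(invocation_cost(120_000, 40_000), 4))  # 0.48
print(avg_time_per_token_ms(1800.0, 90))           # 20.0
```

Tracking cost per time granularity this way makes sudden prompt-size regressions visible before the monthly bill does.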

System Monitoring

This set of metrics covers the resource usage and connection status of gateway instances and nodes and is used to evaluate the load and health of the gateway itself.
Instance/Node monitoring metrics
CPU utilization: CPU utilization of the AI Gateway, averaged based on the selected time granularity.
Memory utilization: Memory utilization of the AI Gateway, averaged based on the selected time granularity.
Inbound bandwidth traffic: Ingress bandwidth traffic of the AI Gateway, averaged based on the selected time granularity.
Outbound bandwidth traffic: Egress bandwidth traffic of the AI Gateway, averaged based on the selected time granularity.
TCP inbound connections: Number of TCP connections to the AI Gateway, averaged based on the selected time granularity.
Maximum memory utilization: Maximum memory utilization of the AI Gateway within the selected time granularity. It is used to observe memory usage peaks and determine whether there is a risk of a sudden memory surge, such as a memory leak or sudden traffic pressure.
Maximum CPU utilization: Maximum CPU utilization of the AI Gateway within the selected time granularity. It is used to discover CPU load peaks and locate performance surges caused by compute-intensive operations, such as complex authentication and protocol conversion.
Number of running nodes: Number of healthy nodes in the AI Gateway within the selected time granularity. It reflects the deployment scale and available node status; an abnormal decrease may indicate a failure or a scaling operation.
New connections from client to gateway process: Number of newly established TCP connections between the client and the gateway process within the selected time granularity. It is used to observe connection establishment frequency over a short period and gauge client connection activity.
Active connections from client to gateway process: Number of TCP connections in an active communication state between the client and the gateway process within the selected time granularity. It reflects the effective connection load currently borne by the gateway.
Inactive connections from client to gateway process: Number of established TCP connections with no active communication between the client and the gateway process within the selected time granularity. It helps judge connection resource idleness; an excessive number may indicate that the connection reclamation/management mechanism needs optimization.
Concurrent connections from client to gateway process: Total number of concurrent TCP connections (active plus inactive) between the client and the gateway process within the selected time granularity. It directly reflects the concurrent connection pressure on the gateway and is a key metric for evaluating its connection capacity.
Inbound traffic from client to gateway process: Total data volume sent from the client to the gateway process within the selected time granularity.
Outbound traffic from gateway process to client: Total data volume sent from the gateway process to the client within the selected time granularity.
Inbound bandwidth from client to gateway process: Average bandwidth usage from the client to the gateway process within the selected time granularity (traffic transmission rate per unit of time). It is used to evaluate bandwidth pressure from the client to the gateway and avoid connection or transmission delays caused by bandwidth bottlenecks.
Outbound bandwidth from gateway process to client: Average bandwidth usage from the gateway process to the client within the selected time granularity (traffic transmission rate per unit of time). Used together with inbound bandwidth, it helps analyze the gateway's outbound bandwidth load and prevent bandwidth bottlenecks from affecting response transmission.
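As the connection metrics above note, concurrent connections are the sum of active and inactive connections. A minimal sketch (placeholder values) of deriving the idle-connection ratio used to judge whether connection reclamation needs optimization:

```python
def idle_ratio(active: int, inactive: int) -> float:
    """Inactive share of concurrent connections (concurrent = active + inactive)."""
    concurrent = active + inactive
    return inactive / concurrent if concurrent else 0.0

# Placeholder values for one time granularity.
print(f"idle connections: {idle_ratio(active=600, inactive=400):.0%}")  # 40%
```

A persistently high idle ratio suggests clients are holding connections open without traffic, so keep-alive timeouts may be worth tightening.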
Public Network CLB Monitoring Metrics
1. Client to LB Monitoring
Inbound traffic: Traffic from the client to the CLB within the statistical granularity.
Outbound traffic: Traffic from the CLB to the client within the statistical granularity.
Number of inbound packets: Number of data packets sent from the client to the CLB within the statistical granularity.
Number of outbound packets: Number of data packets sent from the CLB to the client within the statistical granularity.
Inbound bandwidth: Bandwidth used by traffic from the client to the CLB within the statistical granularity.
Outbound bandwidth: Bandwidth used by traffic from the CLB to the client within the statistical granularity.
Number of active connections: Number of active connections from the client to the CLB within the statistical granularity.
Number of inactive connections: Number of inactive connections from the client to the CLB within the statistical granularity.
Number of concurrent connections: Number of concurrent connections from the client to the CLB within the statistical granularity.
Number of new connections: Number of new connections from the client to the CLB within the statistical granularity.
2. Discard/Utilization Monitoring
Inbound bandwidth utilization: Utilization of the bandwidth used by clients accessing the CLB over the public network within the statistical granularity.
Outbound bandwidth utilization: Utilization of the bandwidth used by the CLB accessing the public network within the statistical granularity.
Concurrent connection utilization: Ratio of concurrent client-to-CLB connections at a given moment within the statistical granularity to the upper limit of concurrent connections in the CLB specification.
New connection utilization: Ratio of new client-to-CLB connections within the statistical granularity to the maximum number of new connections in the CLB specification.
Discarded connections: Number of connections discarded by the CLB within the statistical granularity.
Discarded inbound bandwidth: Amount of data discarded when clients access the CLB over the public network within the statistical granularity.
Discarded outbound bandwidth: Amount of data discarded when the CLB accesses the public network within the statistical granularity.
Discarded inbound packets: Number of data packets discarded when clients access the CLB over the public network within the statistical granularity.
Discarded outbound packets: Number of data packets discarded when the CLB accesses the public network within the statistical granularity.
Discarded QPS: Number of requests discarded by the CLB within the statistical granularity.
QPS utilization: Ratio of the CLB's QPS within the statistical granularity to the maximum QPS in the CLB specification.
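The utilization metrics above are all ratios of an observed value to the corresponding performance upper limit in the CLB specification. A minimal sketch with a hypothetical spec limit:

```python
def spec_utilization(observed: float, spec_limit: float) -> float:
    """Ratio of an observed value (QPS, new or concurrent connections)
    to the performance upper limit in the CLB specification."""
    return observed / spec_limit

# Placeholder values: 5,000 observed QPS against a hypothetical 50,000 QPS spec limit.
print(f"QPS utilization: {spec_utilization(5_000, 50_000):.0%}")  # 10%
```

Sustained utilization near 100% on any of these ratios means the CLB specification, not the gateway, is the bottleneck, and discard metrics will start climbing.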
3. Monitoring from LB to Backend
Outbound traffic: Traffic from backend servers to the CLB within the statistical granularity.
Inbound bandwidth: Bandwidth used by traffic from the CLB to backend servers within the statistical granularity.
Outbound bandwidth: Bandwidth used by traffic from backend servers to the CLB within the statistical granularity.
4. Layer 7 Protocol Monitoring
3xx status codes returned by CLB: Number of requests with 3xx status codes returned by the CLB within the statistical granularity (sum of codes returned by the CLB and the gateway node).
4xx status codes returned by CLB: Number of requests with 4xx status codes returned by the CLB within the statistical granularity (sum of codes returned by the CLB and the gateway node).
5xx status codes returned by CLB: Number of requests with 5xx status codes returned by the CLB within the statistical granularity (sum of codes returned by the CLB and the gateway node).
404 status code returned by CLB: Number of requests with status code 404 returned by the CLB within the statistical granularity (sum of codes returned by the CLB and the gateway node).
499 status code returned by CLB: Number of requests with status code 499 returned by the CLB within the statistical granularity (sum of codes returned by the CLB and the gateway node).
502 status code returned by CLB: Number of requests with status code 502 returned by the CLB within the statistical granularity (sum of codes returned by the CLB and the gateway node).
503 status code returned by CLB: Number of requests with status code 503 returned by the CLB within the statistical granularity (sum of codes returned by the CLB and the gateway node).
504 status code returned by CLB: Number of requests with status code 504 returned by the CLB within the statistical granularity (sum of codes returned by the CLB and the gateway node).
2xx status codes: Number of requests with 2xx status codes returned by the backend service within the statistical granularity.
3xx status codes: Number of requests with 3xx status codes returned by the backend service within the statistical granularity.
4xx status codes: Number of requests with 4xx status codes returned by the backend service within the statistical granularity.
5xx status codes: Number of requests with 5xx status codes returned by the backend service within the statistical granularity.
404 status code: Number of requests with status code 404 returned by the backend service within the statistical granularity.
499 status code: Number of requests with status code 499 returned by the backend service within the statistical granularity.
502 status code: Number of requests with status code 502 returned by the backend service within the statistical granularity.
503 status code: Number of requests with status code 503 returned by the backend service within the statistical granularity.
504 status code: Number of requests with status code 504 returned by the backend service within the statistical granularity.
Maximum request time: Maximum request time of the CLB within the statistical granularity.
Average response time: Average response time of the CLB within the statistical granularity.
Maximum response time: Maximum response time of the CLB within the statistical granularity.
Number of response timeouts: Number of CLB responses that time out within the statistical granularity.
Successful requests per minute: Number of successful CLB requests per minute within the statistical granularity.
Requests per second: Number of CLB requests per second within the statistical granularity.
5. Health Check Monitoring
Number of health check exceptions: Number of health check exceptions for the CLB within the statistical period.
