Metric Name | Metric Meaning |
Total number of requests | Total number of requests, summed based on the selected time granularity. |
Average request latency | Average request latency, calculated based on the selected time granularity. |
Maximum request latency | Maximum request latency, calculated based on the selected time granularity. |
Number of requests directly returned by the gateway | Number of requests that are not forwarded to the backend but directly responded to by the gateway (for example, when authentication fails or traffic throttling is triggered), summed based on the selected time granularity. |
Average gateway latency | Average time taken by the gateway itself to process requests. |
Maximum gateway latency | Maximum time taken by the gateway itself to process requests. |
Number of 2xx requests | Number of requests sent from the client to the AI Gateway that are successful (for example, 200 OK), summed based on the selected time granularity. |
Number of 3xx requests | Number of requests sent from the client to the AI Gateway that are redirected, summed based on the selected time granularity. |
Number of 4xx requests | Number of requests sent from the client to the AI Gateway that are rejected as invalid (for example, due to authentication failure or exceeding the throttling threshold) and are directly responded to by the gateway with client error codes (such as 401 Unauthorized, 403 Forbidden, 429 Too Many Requests), summed based on the selected time granularity. |
Number of 5xx requests | Number of requests forwarded by the AI Gateway to the backend service that result in server error responses from the backend (for example, 500 Internal Server Error, 502 Bad Gateway, 504 Gateway Timeout), summed based on the selected time granularity. |
Number of 404 requests | Number of requests that fail to reach the backend service because the requested resource is not found on the backend server, summed based on the selected time granularity. |
Number of 429 requests | Number of requests that are not sent to the backend service because they are throttled, summed based on the selected time granularity. |
Number of 499 requests | Number of requests that fail to reach the backend service because the client actively disconnects before a response is received from the backend, summed based on the selected time granularity. |
Number of 502 requests | Number of requests that fail because the gateway receives an invalid response from the backend service (502 Bad Gateway), summed based on the selected time granularity. |
Number of 504 requests | Number of requests that fail to reach the backend service because the backend server is unreachable when the gateway attempts to execute the requests, summed based on the selected time granularity. |
Number of requests forwarded to the backend | Number of requests successfully forwarded by the gateway to the backend service, summed based on the selected time granularity. |
Average backend latency | Average time taken by the backend service to process requests, calculated based on the selected time granularity. |
Maximum backend latency | Maximum time taken by the backend service to process requests, calculated based on the selected time granularity. |
Number of backend 2xx requests | Number of requests for which the backend service returns a successful response (for example, 200 OK), summed based on the selected time granularity. |
Number of backend 3xx requests | Number of requests for which the backend service returns a redirect response, summed based on the selected time granularity. |
Number of backend 4xx requests | Number of requests for which the backend service returns a client error response, summed based on the selected time granularity. |
Number of backend 5xx requests | Number of server-side errors returned by the backend service (for example, 500 backend exception, 502 backend invalid response, 504 backend unreachable), summed based on the selected time granularity. |
Number of backend 404 requests | Number of requests that fail because the requested backend service resource is not found on the backend server, summed based on the selected time granularity. |
Number of backend 429 requests | Number of requests to the backend service that fail because they are throttled, summed based on the selected time granularity. |
Number of backend 499 requests | Number of requests to the backend service that fail because the client actively disconnects before it receives a response from the backend, summed based on the selected time granularity. |
Number of backend 502 requests | Number of requests to the backend service that fail because the backend service receives an invalid response, summed based on the selected time granularity. |
Number of backend 504 requests | Number of requests to the backend service that fail because the backend server is unreachable, summed based on the selected time granularity. |
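As a rough illustration of what "summed based on the selected time granularity" means for the counters above, the sketch below groups request records into fixed windows and counts each status class per window. The record shape `(timestamp_seconds, status_code)` and the function name `count_by_status_class` are assumptions made for this example; they are not part of the gateway's API.

```python
from collections import Counter, defaultdict


def count_by_status_class(records, granularity_s=60):
    """Count requests per status class (2xx, 3xx, ...) in each time window.

    records: iterable of (timestamp_seconds, status_code) pairs (hypothetical shape).
    granularity_s: window length in seconds, standing in for the selected granularity.
    """
    buckets = defaultdict(Counter)
    for ts, status in records:
        # Align the timestamp to the start of its window.
        window_start = int(ts // granularity_s) * granularity_s
        buckets[window_start][f"{status // 100}xx"] += 1
        buckets[window_start]["total"] += 1
    return {start: dict(counts) for start, counts in sorted(buckets.items())}


# Made-up sample data: three requests in the first minute, one in each of the next two.
sample = [(0, 200), (15, 200), (59, 429), (61, 502), (130, 200)]
result = count_by_status_class(sample, granularity_s=60)
```

With a 60-second granularity, the sample yields one entry per minute, each holding the per-class counts and a total for that window.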
Metric Name | Metric Meaning |
Number of LLM HTTP Requests | The number of HTTP calls initiated by the gateway to the LLM provider. This metric directly reflects the invocation frequency of the model API. |
Total LLM Token Consumption | The total number of tokens consumed by the gateway from the LLM provider, which is the sum of the actual tokens consumed for input (Prompt) and output (Completion). It is used to evaluate overall token throughput. |
LLM prompt token Consumption | The total number of tokens actually consumed by the model for the input (Prompt) part when the large language model processes a request. |
LLM completion token consumption | The total number of tokens actually consumed by the model for the output (Completion) part when the large language model generates a response. This metric is one of the core bases for evaluating model invocation costs. |
Average LLM Provider Response Time (ms) | The average duration from when the gateway sends a request to the model provider to when the complete response is received. This metric reflects the end-to-end response performance of the model provider. |
Average Time per token for LLM Provider (ms) | The average time spent by the model provider to process each token. This metric reflects the token processing speed of the model provider. |
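The relationship between the token metrics above can be sketched as follows. Total consumption is the sum of prompt and completion tokens, as the table states. The per-token time is computed here as total response time divided by total tokens, which is an assumption; the gateway's exact formula is not given in this document. The `LlmCall` record and `summarize_llm_calls` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LlmCall:
    """One gateway-to-provider call (hypothetical record shape)."""
    prompt_tokens: int
    completion_tokens: int
    response_time_ms: float


def summarize_llm_calls(calls):
    """Aggregate per-call records into the metrics described in the table."""
    prompt = sum(c.prompt_tokens for c in calls)
    completion = sum(c.completion_tokens for c in calls)
    total_time = sum(c.response_time_ms for c in calls)
    total_tokens = prompt + completion  # Total = Prompt + Completion
    return {
        "http_requests": len(calls),
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": total_tokens,
        "avg_response_time_ms": total_time / len(calls) if calls else 0.0,
        # Assumed definition: total response time spread over all tokens.
        "avg_time_per_token_ms": total_time / total_tokens if total_tokens else 0.0,
    }


# Made-up sample data for two calls.
summary = summarize_llm_calls([LlmCall(100, 50, 1500.0), LlmCall(200, 150, 3500.0)])
```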
Metric Name | Metric Meaning |
CPU Utilization | CPU utilization of the AI Gateway, averaged based on the selected time granularity. |
Memory Utilization | Memory utilization of the AI Gateway, averaged based on the selected time granularity. |
Inbound bandwidth traffic | Ingress bandwidth traffic of the AI Gateway, averaged based on the selected time granularity. |
Outbound bandwidth traffic | Egress bandwidth traffic of the AI Gateway, averaged based on the selected time granularity. |
TCP inbound connections | Number of inbound TCP connections to the AI Gateway, averaged based on the selected time granularity. |
Maximum memory utilization | Maximum memory utilization of the AI Gateway within the selected time granularity. It is used to observe memory usage peaks and determine whether there is a risk of a sudden memory surge, such as memory leaks or sudden traffic pressure. |
Maximum CPU utilization | Maximum CPU utilization of the AI Gateway within the selected time granularity. It is used to discover CPU load peak fluctuations and locate performance surges caused by compute-intensive operations, such as complex authentication and protocol conversion. |
Number of running nodes | Number of healthy nodes in the AI Gateway within the selected time granularity. It reflects the deployment scale and available node status. An abnormal decrease in the number of nodes may indicate a failure or scaling operation. |
New connections from client to gateway process | Number of newly established TCP connections between the client and the gateway process within the selected time granularity. It is used to observe the connection establishment frequency over a short period and determine client connection activity. |
Active connections from client to gateway process | Number of TCP connections in an active communication state between the client and the gateway process within the selected time granularity. It reflects the effective connection load currently borne by the gateway. |
Inactive connections from client to gateway process | Number of TCP connections that are established but have no active communication between the client and the gateway process within the selected time granularity. It assists in determining connection resource idleness. An excessive number may indicate that the connection reclamation / management mechanism needs optimization. |
Concurrent connections from client to gateway process | Total number of concurrent TCP connections (including active and inactive) between the client and the gateway process within the selected time granularity. It directly reflects the concurrent connection pressure on the gateway and is a key metric for evaluating the gateway's connection capacity. |
Inbound traffic from client to gateway process | Total data volume sent from the client to the gateway process within the selected time granularity. |
Outbound traffic from gateway process to client | Total data volume sent from the gateway process to the client within the selected time granularity. |
Inbound bandwidth from client to gateway process | Average bandwidth usage from the client to the gateway process within the selected time granularity (traffic transmission rate per unit of time). It is used to evaluate the bandwidth pressure from the client to the gateway and to avoid connection / transmission delays caused by bandwidth bottlenecks. |
Outbound bandwidth from gateway process to client | Average bandwidth usage from the gateway process to the client within the selected time granularity (traffic transmission rate per unit of time). It is used in conjunction with "inbound bandwidth" to analyze the gateway's outbound bandwidth load and prevent bandwidth bottlenecks from affecting response transmission. |
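Two relationships in the table above lend themselves to a short sketch: concurrent connections are the sum of active and inactive connections, and average bandwidth is traffic divided by the length of the window. The function names and the choice of units (bytes in, bits per second out) are assumptions for this example; the console may report different units.

```python
def concurrent_connections(active, inactive):
    """Concurrent connections include both active and inactive connections."""
    return active + inactive


def average_bandwidth_bps(traffic_bytes, window_s):
    """Average bandwidth over a window: traffic per unit of time.

    Assumes traffic is measured in bytes and bandwidth in bits per second.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return traffic_bytes * 8 / window_s


# Made-up sample values: 7.5 MB received over a 60-second window.
total_conns = concurrent_connections(active=120, inactive=30)
inbound_bps = average_bandwidth_bps(traffic_bytes=7_500_000, window_s=60)
```

Comparing inbound and outbound values computed this way over the same window gives a simple view of which direction carries more load.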
Metric Name | Metric Meaning |
Inbound traffic | Traffic from the client to CLB within the statistical granularity |
Outbound traffic | Traffic from CLB to the client within the statistical granularity |
Number of inbound packets | Number of data packets sent from the client to CLB within the statistical granularity |
Number of outbound packets | Number of data packets sent from CLB to the client within the statistical granularity |
Inbound bandwidth | Bandwidth used by traffic from the client to CLB within the statistical granularity |
Outbound bandwidth | Bandwidth used by traffic from CLB to the client within the statistical granularity |
Number of Active Connections | Number of active connections from the client to CLB within the statistical granularity |
Inactive connections | Number of inactive connections from the client to CLB within the statistical granularity |
Number of concurrent connections | Number of concurrent connections from the client to CLB within the statistical granularity |
New connections | Number of new connections from the client to CLB within the statistical granularity |
Metric Name | Metric Meaning |
Inbound bandwidth utilization | Utilization of bandwidth used by the client to access CLB through the public network within the statistical granularity |
Outbound bandwidth utilization | Utilization of bandwidth used by CLB to access the public network within the statistical granularity |
Concurrent connection utilization | Ratio of concurrent connections from the client to CLB at a given moment within the statistical granularity to the maximum number of concurrent connections in the CLB specifications |
New connection utilization | Ratio of new connections from the client to CLB within the statistical granularity to the maximum number of new connections in the CLB specifications |
Discarded connections | Number of connections discarded by CLB within the statistical granularity |
Discarded inbound bandwidth | Discarded data when the client accesses CLB through the public network within the statistical granularity |
Discarded outbound bandwidth | Discarded data when CLB accesses the public network within the statistical granularity |
Discarded inbound packets | Number of data packets discarded when the client accesses CLB through the public network within the statistical granularity |
Discarded outbound packets | Number of data packets discarded when CLB accesses the public network within the statistical granularity |
Discarded QPS | Number of requests discarded by CLB within the statistical granularity |
QPS utilization | Ratio of QPS of CLB within the statistical granularity to the maximum QPS in the CLB specifications |
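The utilization metrics above are all ratios of an observed value to the corresponding limit in the CLB specifications. A minimal sketch, assuming the result is expressed as a percentage (the function name and the sample limits are hypothetical and not taken from any actual CLB specification):

```python
def utilization_percent(observed, spec_limit):
    """Observed value as a percentage of the specification's upper limit."""
    if spec_limit <= 0:
        raise ValueError("spec_limit must be positive")
    return 100 * observed / spec_limit


# Made-up sample values: 4,000 QPS observed against an assumed 5,000 QPS limit.
qps_utilization = utilization_percent(observed=4000, spec_limit=5000)
# 45,000 concurrent connections against an assumed limit of 50,000.
conn_utilization = utilization_percent(observed=45_000, spec_limit=50_000)
```

Values approaching 100 suggest the instance is near its specification limit, which is the situation in which the discarded connection, packet, and QPS counters would be expected to rise.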
Metric Name | Metric Meaning |
Outbound traffic | Traffic from backend servers to the CLB within the statistical granularity. |
Inbound bandwidth | Bandwidth used by traffic from the CLB to backend servers within the statistical granularity. |
Outbound bandwidth | Bandwidth used by traffic from backend servers to the CLB within the statistical granularity. |
Metric Name | Metric Meaning |
3xx status codes returned by CLB | Number of requests with status code 3xx returned by CLB within the statistical granularity (sum of codes returned by CLB and the gateway node) |
4xx status codes returned by CLB | Number of requests with status code 4xx returned by CLB within the statistical granularity (sum of codes returned by CLB and the gateway node) |
5xx status codes returned by CLB | Number of requests with status code 5xx returned by CLB within the statistical granularity (sum of codes returned by CLB and the gateway node) |
404 status code returned by CLB | Number of requests with status code 404 returned by CLB within the statistical granularity (sum of codes returned by CLB and the gateway node) |
499 status code returned by CLB | Number of requests with status code 499 returned by CLB within the statistical granularity (sum of codes returned by CLB and the gateway node) |
502 status code returned by CLB | Number of requests with status code 502 returned by CLB within the statistical granularity (sum of codes returned by CLB and the gateway node) |
503 status code returned by CLB | Number of requests with status code 503 returned by CLB within the statistical granularity (sum of codes returned by CLB and the gateway node) |
504 status code returned by CLB | Number of requests with status code 504 returned by CLB within the statistical granularity (sum of codes returned by CLB and the gateway node) |
2xx status codes | Number of requests with status code 2xx returned by the backend service within the statistical granularity. |
3xx status codes | Number of requests with status code 3xx returned by the backend service within the statistical granularity. |
4xx status codes | Number of requests with status code 4xx returned by the backend service within the statistical granularity. |
5xx status codes | Number of requests with status code 5xx returned by the backend service within the statistical granularity. |
404 status code | Number of requests with status code 404 returned by the backend service within the statistical granularity. |
499 status code | Number of requests with status code 499 returned by the backend service within the statistical granularity. |
502 status code | Number of requests with status code 502 returned by the backend service within the statistical granularity. |
503 status code | Number of requests with status code 503 returned by the backend service within the statistical granularity. |
504 status code | Number of requests with status code 504 returned by the backend service within the statistical granularity. |
Maximum request time | Maximum request time of CLB within the statistical granularity |
Average Response Time | Average response time of CLB within the statistical granularity |
Maximum response time | Maximum response time of CLB within the statistical granularity |
Number of response timeouts | Number of CLB responses that timed out within the statistical granularity |
Successful requests per minute | Number of successful requests of CLB within the statistical granularity |
Requests per second | Number of requests of CLB within the statistical granularity |
Metric Name | Metric Meaning |
Number of Health Check Exceptions | Number of health check exceptions for the CLB within the statistical period |