Tencent Cloud TI Platform


Online Service Operations

Last updated: 2025-05-09 15:12:44

Auto Scaling

If your business workload has distinct peak and off-peak hours, you can use the auto scaling capability of the online service module to improve the utilization of inference computing resources. This feature automatically adjusts the number of online service instances according to the scaling policy you configure: it scales out instances during peak hours and scales them in during off-peak hours. Auto scaling supports two types of adjustment policies: time-based adjustment and HPA-based adjustment. The following sections describe how to use each policy.

Time-Based Adjustment

If your business workload has distinct time characteristics, you can configure the auto scaling policy based on time.
1. How to enable scheduled scaling
1.1 Log in to the TI-ONE console. Select Model Service > Online Services in the left navigation bar to go to the online service list page;
1.2 In the service list, locate the service for which the scheduled policy needs to be enabled. Click the service name to go to the version list page, and then click Update to go to the service configuration update page. Alternatively, click Scaling to open the instance adjustment pop-up window;


1.3 On the service configuration page, set the "instance adjustment" field to "auto adjustment", and select "time-based" as the policy adjustment type. You can then configure the rules of the time-based adjustment policy;
1.4 You can configure multiple scheduled policy rules based on the time characteristics of your actual business load. For example, if the peak hours range from 8:00 to 20:00 and the off-peak hours range from 20:00 to 8:00, you can configure the scheduled policy as shown in the figure below: scale out to 2 instances at 8:00 every day and scale in to 1 instance at 20:00 every day. If you select the default policy, its value is used as the initial number of instances after the service starts;


1.5 If you have configured multiple scheduled policy rules and there is a time conflict between these rules, the policy with a higher priority level (that is, the policy ranked higher in the priority order) will prevail;
1.6 After configuring the scaling policy, click Update Service to save the configuration. The auto scaling policy you configured takes effect once the service has been updated.
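The interaction between scheduled rules, priority, and the default policy in steps 1.4 and 1.5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the platform's implementation: the `ScheduledRule` class and the `target_instances` function are hypothetical names, and the sketch assumes that the most recently triggered rule stays in effect until the next one triggers.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class ScheduledRule:
    """Hypothetical model of one scheduled policy rule (not a platform API)."""
    priority: int   # 1 = ranked highest in the priority order
    at: time        # time of day at which the rule triggers
    instances: int  # number of instances to scale to


def target_instances(rules, now, default):
    """Return the instance count expected to be in effect at `now`.

    Assumption: the most recently triggered rule stays in effect until
    the next rule triggers. If several rules trigger at the same time,
    the one ranked higher in the priority order prevails.
    """
    if not rules:
        # No scheduled rule configured: fall back to the default policy value.
        return default
    fired = [r for r in rules if r.at <= now]
    # Before the first trigger of the day, the last trigger of the
    # previous day is still the one in effect.
    candidates = fired if fired else rules
    latest = max(r.at for r in candidates)
    tied = [r for r in candidates if r.at == latest]
    return min(tied, key=lambda r: r.priority).instances


# The example from step 1.4: 2 instances from 8:00, 1 instance from 20:00.
rules = [
    ScheduledRule(priority=1, at=time(8, 0), instances=2),
    ScheduledRule(priority=2, at=time(20, 0), instances=1),
]
print(target_instances(rules, time(9, 30), default=1))
print(target_instances(rules, time(23, 0), default=1))
```

Modeling priority as a plain integer keeps the conflict rule from step 1.5 to a single `min` call. How the platform actually stores and orders rules is not described in this document.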
2. Exception time configuration rules
2.1 If a scheduled policy should not run at a specific time, you can configure an exception time for that scheduled policy rule. Multiple exception times can be added;
2.2 An exception time is configured through a Cron expression. The Cron expression contains 6 fields separated by spaces, representing "second", "minute", "hour", "day", "month" and "week" respectively. To match any value in a field, use an asterisk (*). To match a range of consecutive values, use a hyphen (-). To match multiple discrete values, use a comma (,).
2.3 The minimum configuration granularity of the exception time is one day. Therefore, the first three fields of the Cron expression must be "*" (other values configured for the first three fields do not take effect), and the last three fields can be configured as needed. The available value range is 1-31 for the 4th field "day", 1-12 or JAN-DEC for the 5th field "month", and 0-6 or SUN-SAT for the 6th field "week".
2.4 For example, the Cron expression for October 1 to October 7 of each year is "* * * 1-7 10 *".
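To check an exception-time expression before entering it in the console, the rules above can be approximated locally. The following Python sketch is hypothetical (`parse_field` and `is_exception_day` are not platform functions). It assumes the six fields are separated by spaces and that a date is excluded only when the "day", "month" and "week" fields all match. The document does not state how the platform combines the "day" and "week" fields, so treat that part as an assumption.

```python
from datetime import date

MONTHS = {name: i for i, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}
WEEKDAYS = {name: i for i, name in enumerate(
    "SUN MON TUE WED THU FRI SAT".split())}


def parse_field(field, low, high, names=None):
    """Expand one Cron field into the set of integers it matches."""
    names = names or {}

    def to_int(token):
        token = token.upper()
        return names[token] if token in names else int(token)

    if field == "*":
        return set(range(low, high + 1))
    values = set()
    for part in field.split(","):          # comma: discrete values
        if "-" in part:                    # hyphen: consecutive values
            start, end = (to_int(t) for t in part.split("-"))
            values.update(range(start, end + 1))
        else:
            values.add(to_int(part))
    if not values or min(values) < low or max(values) > high:
        raise ValueError(f"field {field!r} is outside {low}-{high}")
    return values


def is_exception_day(expression, day):
    """Return True if `day` falls on an exception time (day granularity)."""
    fields = expression.split()
    if len(fields) != 6:
        raise ValueError("expected 6 space-separated fields")
    # second, minute and hour are ignored: granularity is one day.
    _, _, _, dom, month, dow = fields
    cron_weekday = (day.weekday() + 1) % 7  # Python: Mon=0; Cron: SUN=0
    # Assumption: all three date fields must match.
    return (day.day in parse_field(dom, 1, 31)
            and day.month in parse_field(month, 1, 12, MONTHS)
            and cron_weekday in parse_field(dow, 0, 6, WEEKDAYS))


print(is_exception_day("* * * 1-7 10 *", date(2025, 10, 3)))
```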



HPA-Based Adjustment

If scheduled adjustment is not suitable for your business model, you can choose the "HPA-based" auto scaling adjustment policy. Under this policy, the number of service instances is automatically adjusted between the minimum and maximum numbers of instances based on the policy metric and metric threshold you configure. The supported policy metrics are CPU utilization, memory usage, GPU usage, instance QPS, and maximum concurrency usage rate. Before selecting "maximum concurrency usage rate" as the policy metric, first configure the "maximum number of concurrent connections of a single instance" in request traffic throttling.
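To build intuition for how a metric threshold translates into an instance count, the sketch below applies the standard Kubernetes Horizontal Pod Autoscaler formula, desired = ceil(current × metric / threshold), and clamps the result to the configured minimum and maximum. That the platform uses exactly this formula is an assumption. The function name `desired_instances` is hypothetical, and refinements in the Kubernetes algorithm such as the tolerance band and stabilization windows are omitted.

```python
import math


def desired_instances(current, metric_value, threshold,
                      min_instances, max_instances):
    """Estimate the instance count an HPA-style policy would converge to.

    current       -- number of instances running now
    metric_value  -- observed average of the policy metric (e.g. CPU %)
    threshold     -- configured metric threshold, in the same unit
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # Standard HPA ratio: scale proportionally to metric / threshold.
    raw = math.ceil(current * metric_value / threshold)
    # Never leave the configured [min_instances, max_instances] range.
    return max(min_instances, min(max_instances, raw))


# 2 instances at 90% CPU with a 60% threshold.
print(desired_instances(2, 90, 60, min_instances=1, max_instances=5))
```

With a 60% threshold, 2 instances at 90% average CPU utilization give ceil(2 × 90 / 60) = 3 instances. The same policy at 15% utilization with 4 instances scales in to 1.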




Traffic Allocation

To meet the requirements of grayscale verification or A/B testing services, the platform allows users to add multiple versions to a single service and allocate traffic.
1. Log in to the TI-ONE console. Select Model Service > Online Services in the left navigation bar to go to the online service list page.
2. Locate the service to be tested, and click Add Version for that service to go to the service version creation page. Configure the container information and instance adjustment information for the new service version as needed.


3. Click Start Service. If the service uses the postpaid billing mode, you need to confirm the fee freeze before the new version is created.
4. After you create a service version, the system creates a gateway backend and schedules computing resources for you, which takes some time. When the service version has been deployed successfully, its status changes to running.
5. Click Traffic Allocation above the service version list to set the traffic proportion of multiple versions.
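The effect of a traffic proportion can be simulated locally with weighted random selection. The sketch below is illustrative only and does not describe the platform gateway: `validate_allocation` and `pick_version` are hypothetical helpers, the version names are made up, and the sketch assumes that proportions are entered as percentages that must add up to 100.

```python
import random


def validate_allocation(weights):
    """Check a {version_name: percentage} mapping (assumed to sum to 100)."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("traffic proportions must not be negative")
    if sum(weights.values()) != 100:
        raise ValueError("traffic proportions must add up to 100")


def pick_version(weights, rng=random):
    """Choose the version that serves one request, weighted by proportion."""
    validate_allocation(weights)
    versions = list(weights)
    return rng.choices(versions,
                       weights=[weights[v] for v in versions], k=1)[0]


# Grayscale example: send about 10% of requests to the new version.
allocation = {"v1": 90, "v2": 10}
rng = random.Random(0)  # fixed seed so the simulation is repeatable
hits = [pick_version(allocation, rng) for _ in range(10_000)]
print(hits.count("v2") / len(hits))  # close to 0.10
```

Because each request is routed independently, the observed share only approaches the configured proportion over many requests. Short test runs can deviate noticeably.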



Service Monitoring

To help you track the status of services, the platform provides service data monitoring, call data monitoring, and event and log viewing.
1. On the online service list page, click the service name to enter the version list page. Then, select Service Calling > Call Monitoring to view statistics on service calls, including the received request count, number of successful requests, number of failed requests, restricted request count, and average response time.
2. On the online service monitoring page, you can navigate to Alarm Management in the TCOP to add an alarm policy for the service.
3. On the online service list page, click the service name to enter the version list page. Then, click the service version name to enter the version details page. You can view service monitoring metrics, including the number of instances, number of running instances, CPU utilization, memory utilization, GPU utilization, video memory utilization, network traffic, QPS, and QPS throttling, as well as event monitoring and running logs.

Service Update

For a deployed service, you can update the instance adjustment information to change the scaling policy, and update the instance container information to iterate on the model. When a service with multiple instances is updated, the backend performs a batch rolling update on these instances, so the production business's calls to the model service are not affected.
1. On the online service list page, click the service name to enter the version list page. Then, click Update for the service version to enter the service update page. You can edit the information in the instance container and instance adjustment modules.


2. To scale a service, you can click Scaling to quickly update the number of instances. Traffic is directed to a scaled-out instance once its status is running.
3. To update the model information, modify the model file or runtime environment in the instance container module.
4. Confirm that the configuration information is correct and click Start Service to complete the update of the service parameter configuration.
5. On the service version list page, click the service version name to enter the update history module. You can view the update records of the current service version.


Notes:
When you update the content in COS or CFS, you can trigger a service update in the following ways:
Rebuild an instance. You can click the name to enter service management, and then click the service name to enter the instance list.


When updating the service, fill in an environment variable with any content.



