Resource management and control supports both public and private cloud deployments and provides granular management of system resources and traffic. Through model resource quotas at the space and application levels, call frequency limits, and workflow timeout control, it allocates resources rationally and balances load. This prevents system overload, keeps services continuously and stably available, and meets the resource requirements of different business scenarios.
Applicable Roles
Primary users: Ops personnel (responsible for resource configuration, monitoring, and adjustment).
Secondary (low-frequency) users: Business personnel (R&D / product / operations, and so on) who need to view resource configurations or request adjustments.
Core Principles of Resource Management and Control
Hierarchical quota constraint: Application quota ≤ space quota, so a child level can never be allocated more than its parent ("child exceeds parent" overallocation).
Resource sharing constraint: The sum of the quotas configured for all applications in a space ≤ the total quota of that space.
Real-time enforcement: Configuration changes take effect immediately for resource invocations. Requests that exceed a limit are throttled or rejected with an error.
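The two quota principles above can be expressed as a simple consistency check. The sketch below is illustrative only: the function name `check_space_allocation` and the idea of holding quotas in a plain dictionary are assumptions, not part of the product.

```python
from typing import List, Mapping


def check_space_allocation(space_quota: int, app_quotas: Mapping[str, int]) -> List[str]:
    """Return violations of the quota principles for one space (empty list if valid)."""
    violations: List[str] = []
    # Hierarchical quota constraint: no application quota may exceed the space quota.
    # With non-negative quotas this is implied by the sharing check below, but
    # checking it separately names the offending application.
    for app, quota in app_quotas.items():
        if quota > space_quota:
            violations.append(f"{app}: quota {quota} exceeds space quota {space_quota}")
    # Resource sharing constraint: the applications' quotas together must fit in the space.
    total = sum(app_quotas.values())
    if total > space_quota:
        violations.append(f"application quotas total {total}, exceeding space quota {space_quota}")
    return violations


print(check_space_allocation(1000, {"app-a": 600, "app-b": 300}))  # []
print(check_space_allocation(1000, {"app-a": 600, "app-b": 500}))
```

The second call reports a single violation: each application fits individually, but together they exceed the space quota.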
Core Metrics Description
Resource management and control covers two major categories of metrics: timeout control and model control. It supports configuration and management at both the space and application levels. The specific metric definitions are as follows:
| Category | Metric | Description | Space Level | Application Level |
| --- | --- | --- | --- | --- |
| Timeout Control | Workflow Synchronous Call Timeout | Maximum response time for synchronous workflow calls. Default: 10 minutes; configurable in seconds or minutes; upper limit: 15 minutes. If no response is received before the timeout, the workflow is terminated. | No | Yes |
| | Workflow Asynchronous Call Timeout | Maximum response time for asynchronous workflow calls. Default: 12 hours; configurable in minutes or hours; upper limit: 24 hours. If no response is received before the timeout, the workflow is terminated. | No | Yes |
| Model Control | TPM (Tokens / Minute) | Maximum total tokens (input + output) the subject (space or application) can consume within 1 minute, limiting the token consumption rate. The counter resets every minute; once the limit is reached, subsequent requests are rejected. | Yes | Yes |
| | QPM (Requests / Minute) | Maximum number of large model invocation requests the subject can initiate within 1 minute, limiting call frequency. The counter resets every minute; once the limit is reached, subsequent requests are rejected or queued. | Yes | Yes |
| | Concurrency Limit | If a resource package has been purchased, an independent concurrency limit is set according to that package; otherwise, the system default applies. Invocations beyond the limit fail with an error. | Yes | Yes |
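The "reset every minute" behavior of QPM and TPM corresponds to a fixed-window counter. The following is a minimal single-threaded sketch, not the platform's implementation; the class name `MinuteWindowLimiter` is hypothetical, windows are assumed to align with clock minutes, and the caller is assumed to know (or estimate) the request's total input + output tokens at admission time.

```python
import time
from typing import Callable, Optional


class MinuteWindowLimiter:
    """Fixed one-minute window for QPM and TPM; counters reset at each minute boundary."""

    def __init__(self, qpm: int, tpm: int, clock: Callable[[], float] = time.time):
        self.qpm = qpm
        self.tpm = tpm
        self._clock = clock
        self._window: Optional[int] = None
        self._requests = 0
        self._tokens = 0

    def _roll_window(self) -> None:
        window = int(self._clock() // 60)  # index of the current clock minute
        if window != self._window:
            self._window = window
            self._requests = 0
            self._tokens = 0

    def try_acquire(self, tokens: int) -> bool:
        """Admit one request expected to use `tokens` (input + output), or reject it."""
        self._roll_window()
        if self._requests + 1 > self.qpm or self._tokens + tokens > self.tpm:
            return False  # caller rejects (or, for QPM, may queue) the request
        self._requests += 1
        self._tokens += tokens
        return True
```

The injectable `clock` makes the window easy to test. A real service would also need locking or a shared store such as a distributed counter, because several workers serve the same space or application.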
Additional Notes:
1. Resource control must be configured top-down: if the space has no resource control settings, QPM and TPM limits cannot be applied to the applications in it.
2. QPM/TPM call limits for applications:
Applications in a space without space-level resource control are limited by the tenant's total callable quota.
Applications in a space with space-level resource control (and no application-level control) are limited by the space's total callable quota.
Applications with application-level resource control are limited by their own application quota.
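The notes above describe which quota bounds an application's QPM or TPM calls. A minimal sketch of that lookup follows; the function name `effective_quota` and the use of `None` for "not configured" are assumptions for illustration, and the sketch only selects the bounding quota without modeling how a shared space or tenant pool is consumed.

```python
from typing import Optional


def effective_quota(tenant_total: int, space_quota: Optional[int], app_quota: Optional[int]) -> int:
    """Return the quota that bounds an application's QPM or TPM calls."""
    if space_quota is None:
        # Controls are configured top-down: no application-level limit without a space-level one.
        if app_quota is not None:
            raise ValueError("application-level control requires space-level control")
        return tenant_total  # no space-level control: tenant total applies
    # Application-level control, when present, takes precedence over the space quota.
    return app_quota if app_quota is not None else space_quota


print(effective_quota(10000, 5000, None))  # 5000
```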
Operation Instructions
1. Enterprise-Level Resource Control (Tenant Level)
1.1 Entry Path
1.2 List View
Displayed Content: Resource control information of all spaces under the current tenant, including Space Name, Status, Last Modification Time, and Last Editor.
Filtering: Filter by Space Name and Status (Not Configured / Configured).
Quick Actions:
Settings: Click to go to the Resource Control configuration page for this space.
View Data: Click to open the monitoring dashboard in a new page and view resource usage data.
1.3 Configuration Process
Configuration Method Selection:
Unified Resource Configuration: Uniformly configure QPM and TPM for all resources within the space.
Per-Resource Configuration: Configure individually for each resource (such as youtu-mrc-pro, deepseek-v3, and so on).
Allocation Rule Settings:
Select the metrics to configure (QPM / TPM) and enter their values. Only positive integers are supported.
Constraints: The space quota ≤ the tenant's total quota limit; the space quota ≥ the sum of configured application quotas under this space.
Configured Application Details: Displays the total QPM and TPM quotas already configured for all applications in this space.
Save Configuration: After verifying the values, click Save; the configuration takes effect immediately. Click Cancel to discard the changes.
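The allocation rules in 1.3 amount to three checks on each value entered for a space. The sketch below is illustrative; `validate_space_quota` is a hypothetical name, and it assumes the tenant total and the sum of configured application quotas are tracked per metric (and, under Per-Resource Configuration, per resource), so it would be called once per resource/metric pair.

```python
def validate_space_quota(value: object, tenant_total: int, configured_app_sum: int) -> None:
    """Raise ValueError if a space-level QPM or TPM value breaks the rules in 1.3."""
    # Only positive integers are accepted (bool is rejected even though it subclasses int).
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError("quota must be a positive integer")
    # Space quota <= tenant's total quota limit.
    if value > tenant_total:
        raise ValueError(f"space quota {value} exceeds tenant total {tenant_total}")
    # Space quota >= sum of application quotas already configured in this space.
    if value < configured_app_sum:
        raise ValueError(
            f"space quota {value} is below the {configured_app_sum} already allocated to applications"
        )


validate_space_quota(500, tenant_total=1000, configured_app_sum=300)  # passes silently
```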
2. Space-Level Resource Management (Application Tier)
2.1 Entry Path
2.2 List View
Displayed Content: Resource management information of all applications under the current space, including Application Name, Configuration Status (Configured / Not Configured), Last Modification Time, and Last Editor.
Quick Actions:
Settings: Click to go to the Resource Control configuration page for this application.
Application Shared Concurrency: Configures the maximum model invocation concurrency for applications. This feature was moved here from "Billing Resource List > Set Concurrency Limit".
View Data: Click to open a new page to view the monitoring dashboard.
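A concurrency limit that fails fast, rather than waiting for a free slot, matches the behavior described for the Concurrency Limit metric ("invocations fail with an error"). The sketch below uses a non-blocking `threading.BoundedSemaphore`; the class names `SharedConcurrencyLimit` and `ConcurrencyLimitExceeded` are hypothetical, and it assumes one limiter instance is shared by all invocations the limit governs.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class ConcurrencyLimitExceeded(RuntimeError):
    """Hypothetical error raised when the shared concurrency limit is reached."""


class SharedConcurrencyLimit:
    def __init__(self, limit: int):
        self._slots = threading.BoundedSemaphore(limit)

    @contextmanager
    def slot(self) -> Iterator[None]:
        # Non-blocking acquire: fail immediately instead of queueing the invocation.
        if not self._slots.acquire(blocking=False):
            raise ConcurrencyLimitExceeded("model invocation concurrency limit reached")
        try:
            yield  # the model invocation runs while the slot is held
        finally:
            self._slots.release()


limiter = SharedConcurrencyLimit(limit=2)
with limiter.slot():
    pass  # call the model here
```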
2.3 Configuration Process
Basic Settings:
Workflow Synchronous Call Timeout: Only positive integers are supported, in seconds or minutes. The default is 10 minutes, with an upper limit of 60 minutes.
Workflow Asynchronous Call Timeout: Only positive integers are supported, in minutes or hours. The default is 12 hours, with an upper limit of 24 hours.
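The timeout inputs can be validated by normalizing them to seconds and comparing against the upper limit. In this sketch the helper name `timeout_seconds` is hypothetical; the allowed units, defaults, and upper limits are passed in as parameters, and the constants plug in the values from the Basic Settings list.

```python
from typing import Tuple

UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600}


def timeout_seconds(value: int, unit: str, allowed_units: Tuple[str, ...], max_seconds: int) -> int:
    """Validate a workflow timeout entered in the console and return it in seconds."""
    # Only positive integers are accepted (bool is rejected even though it subclasses int).
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError("timeout must be a positive integer")
    if unit not in allowed_units:
        raise ValueError(f"unit must be one of {allowed_units}")
    seconds = value * UNIT_SECONDS[unit]
    if seconds > max_seconds:
        raise ValueError(f"{seconds}s exceeds the upper limit of {max_seconds}s")
    return seconds


# Units, defaults, and upper limits as listed under Basic Settings.
SYNC = {"allowed_units": ("seconds", "minutes"), "max_seconds": 60 * 60}
ASYNC = {"allowed_units": ("minutes", "hours"), "max_seconds": 24 * 3600}
SYNC_DEFAULT_SECONDS = 10 * 60
ASYNC_DEFAULT_SECONDS = 12 * 3600

print(timeout_seconds(10, "minutes", **SYNC))  # 600
```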
Resource Management:
Configuration Method Selection:
Unified Resource Configuration: Uniformly configure QPM and TPM for all resources under the application.
Per-Resource Configuration: Configure individually for each resource.
Allocation Rule Settings:
Select the metrics to be configured (QPM / TPM) and enter specific values.
Constraints: The application quota ≤ the remaining allocatable quota of the space (space remaining quota = total space quota - sum of configured quotas for other applications).
Displayed on the right: Space configuration details (Total QPM / TPM quota) and remaining allocatable quota.
Save Configuration: After verifying the values, click Save; the configuration takes effect immediately. Click Cancel to discard the changes.
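The application-level constraint uses the space's remaining allocatable quota, which excludes the application being edited so that its current value does not count against itself. A minimal sketch, with the hypothetical helper names `remaining_allocatable` and `can_set_app_quota`:

```python
from typing import Mapping


def remaining_allocatable(space_total: int, app_quotas: Mapping[str, int], app: str) -> int:
    """Space remaining quota = total space quota - sum of configured quotas for other applications."""
    others = sum(quota for name, quota in app_quotas.items() if name != app)
    return space_total - others


def can_set_app_quota(space_total: int, app_quotas: Mapping[str, int], app: str, new_quota: int) -> bool:
    """Application quota must not exceed the space's remaining allocatable quota."""
    return new_quota <= remaining_allocatable(space_total, app_quotas, app)


quotas = {"app-a": 400, "app-b": 300}
print(remaining_allocatable(1000, quotas, "app-a"))  # 700
```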