
Task-Based Modeling
Last updated: 2025-05-26 17:06:15

Namespace

Namespace = QCE/TI_TRAINTASK

Monitoring Metrics

Metric Name
Metric Meaning
Description
Unit
Dimension
Statistical Rule [period, statType]
CfsClientDataReadBandwidth
turocfs single-node server read bandwidth
turocfs single-node server read bandwidth
KBytes/s
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
CfsClientDataWriteBandwidth
turocfs single-node server write bandwidth
turocfs single-node server write bandwidth
KBytes/s
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
CfsDataReadIoBytes
cfs server read bandwidth
cfs Server Read Bandwidth
KBytes/s
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
CfsDataReadIoLatency
cfs Read Latency
cfs Read Latency
ms
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
CfsDataWriteIoBytes
cfs Server Write Bandwidth
cfs server write bandwidth
KBytes/s
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
CfsDataWriteIoLatency
cfs Write Latency
cfs Write Latency
ms
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
CfsStrageUsageGb
cfs storage data capacity
cfs storage data capacity
GBytes
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
Cpuutil
CPU utilization
CPU utilization
%
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
DcgmFiDevFbUsed
GPU memory usage
GPU memory usage
MBytes
AppId
InstanceGpuNum
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
DcgmFiDevGpuUtil
GPU utilization
GPU utilization
%
AppId
InstanceGpuNum
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
DcgmFiDevMemCopyUtil
GPU memory utilization
GPU memory utilization
%
AppId
InstanceGpuNum
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
DiskIoUtil
Disk ioutil
Disk ioutil
%
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
DiskIoWait
Disk iowait
Disk iowait
%
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
DiskReadByte
Disk Read Bandwidth
Disk Read Bandwidth
MBytes/s
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
DiskReadIops
Disk read iops
Disk read iops
Count
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
DiskUsageRadio
System Disk Partition Utilization
System Disk Partition Utilization
%
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
DiskWriteByte
disk write bandwidth
disk write bandwidth
MBytes/s
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
DiskWriteIops
disk write iops
disk write iops
Count
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
Fp16EngineActivity
FP16 active time ratio
FP16 Active Time Ratio
%
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
Fp32EngineActivity
FP32 active time ratio
FP32 Active Time Ratio
%
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
Fp64EngineActivity
FP64 Active Time Ratio
FP64 Active Time Ratio
%
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
GpuFp16EngineActivity
FP16 active time ratio
FP16 active time ratio
%
AppId
InstanceGpuNum
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
GpuFp32EngineActivity
FP32 Active Time Ratio
FP32 Active Time Ratio
%
AppId
InstanceGpuNum
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
GpuFp64EngineActivity
FP64 Active Time Ratio
FP64 Active Time Ratio
%
AppId
InstanceGpuNum
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
Gpumemutil
GPU vRAM Utilization
GPU vRAM Utilization
%
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
Gpumemvalue
GPU memory usage
GPU memory usage
MBytes
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
GpuNvlinkBandwidth
nvlink transmission rate
nvlink transmission rate
Bytes/s
AppId
InstanceGpuNum
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
GpuPcieBandwidth
PCIe bus transmission rate
PCIe bus transmission rate
Bytes/s
AppId
InstanceGpuNum
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
GpuSmActivity
SM active state time ratio
SM Active State Time Ratio
%
AppId
InstanceGpuNum
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
GpuTensorActivity
Tensor active state time ratio
Tensor active state time ratio
%
AppId
InstanceGpuNum
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
Gpuutil
GPU utilization
GPU utilization
%
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
Instancecpuutil
CPU utilization
CPU utilization
%
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
Instancegpumemutil
GPU vRAM utilization
GPU vRAM utilization
%
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
Instancegpumemvalue
GPU memory usage
GPU memory usage
MBytes
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
Instancegpuutil
GPU utilization
GPU utilization
%
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
Instancememutil
Memory utilization
Memory utilization
%
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
Instancememvalue
Memory usage
Memory usage
MBytes
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
Memutil
Memory utilization
Memory utilization
%
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
Memvalue
Memory Usage
Memory Usage
MBytes
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
NvlinkBandwidth
nvlink transmission rate
nvlink transmission rate
Bytes/s
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
PcieBandwidth
PCIe bus transmission rate
PCIe bus transmission rate
Bytes/s
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
RdmaInpkt
RDMA network card inbound packets
RDMA network card inbound packets
pps
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
RdmaIntraffic
RDMA network card received bandwidth
RDMA network interface received bandwidth
Mbps
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
RdmaOutpkt
RDMA network card packet output
RDMA network card packet output
pps
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
RdmaOuttraffic
RDMA network interface transmitted bandwidth
RDMA network interface transmitted bandwidth
Mbps
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
SmActivity
SM Active State Time Ratio
SM Active State Time Ratio
%
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskCfsClientDataReadBandwidth
turocfs single-node server read bandwidth
turocfs single-node server read bandwidth
KBytes/s
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskCfsClientDataWriteBandwidth
turocfs single-node server write bandwidth
turocfs single-node server write bandwidth
KBytes/s
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskCfsDataReadIoBytes
cfs Server Read Bandwidth
cfs Server Read Bandwidth
KBytes/s
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskCfsDataReadIoLatency
cfs Read Latency
cfs Read Latency
ms
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskCfsDataWriteIoBytes
cfs server write bandwidth
cfs server write bandwidth
KBytes/s
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskCfsDataWriteIoLatency
cfs Write Latency
cfs Write Latency
ms
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskCfsStrageUsageGb
cfs storage data capacity
cfs storage data capacity
GBytes
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskDiskIoUtil
Disk ioutil
Disk ioutil
%
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskDiskIoWait
Disk iowait
Disk iowait
%
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskDiskReadByte
Disk Read Bandwidth
Disk Read Bandwidth
MBytes/s
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskDiskReadIops
Disk read iops
Disk read iops
Count
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskDiskUsageRadio
System Disk Partition Utilization
System Disk Partition Utilization
%
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskDiskWriteByte
disk write bandwidth
disk write bandwidth
MBytes/s
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskDiskWriteIops
disk write iops
disk write iops
Count
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskFp16EngineActivity
FP16 active time ratio
FP16 active time ratio
%
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskFp32EngineActivity
FP32 Active Time Ratio
FP32 Active Time Ratio
%
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskFp64EngineActivity
FP64 Active Time Ratio
FP64 Active Time Ratio
%
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskNvlinkBandwidth
nvlink transmission rate
nvlink transmission rate
Bytes/s
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskPcieBandwidth
PCIe bus transmission rate
PCIe bus transmission rate
Bytes/s
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskRdmaInpkt
RDMA network card inbound packets
RDMA network card inbound packets
pps
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskRdmaIntraffic
RDMA network interface received bandwidth
RDMA network interface received bandwidth
Mbps
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskRdmaOutpkt
RDMA network card packet output
RDMA network card packet output
pps
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskRdmaOuttraffic
RDMA network interface transmitted bandwidth
RDMA network interface transmitted bandwidth
Mbps
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskSmActivity
SM Active State Time Ratio
SM Active State Time Ratio
%
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskTensorActivity
Tensor active state time ratio
Tensor active state time ratio
%
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TensorActivity
Tensor active state time ratio
Tensor active state time ratio
%
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
GpuDecUtil
GPU decode utilization
GPU decode utilization
%
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
GpuEncUtil
GPU encoder utilization
GPU encoder utilization
%
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
GpuMemoryClock
GPU Memory Frequency
GPU Memory Frequency
s
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
GpuMemoryFree
GPU Memory idle capacity
GPU Memory idle capacity
MBytes
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
GpuMemoryUtil
GPU memory utilization
GPU memory utilization
%
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
GpuNvlinkRxMb
nvlink amount of data received
nvlink amount of data received
Mbps
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
GpuNvlinkTxMb
nvlink amount of data sent
nvlink amount of data sent
Mbps
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
GpuPcieRxMb
pcie amount of data received
pcie amount of data received
Mbps
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
GpuPcieTxMb
pcie data transmission volume
pcie data transmission volume
Mbps
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
GpuSmClock
SM clock frequency
SM clock frequency
s
AppId
InstanceId
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskGpuDecUtil
GPU decode utilization
GPU decode utilization
%
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskGpuEncUtil
GPU encoder utilization
GPU encoder utilization
%
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskGpuMemoryClock
GPU Memory Frequency
GPU Memory Frequency
s
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskGpuMemoryFree
GPU Memory idle capacity
GPU Memory idle capacity
MBytes
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskGpuMemoryUtil
GPU memory bandwidth utilization
GPU memory bandwidth utilization
%
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskGpuNvlinkRxMb
nvlink amount of data received
nvlink amount of data received
Mbps
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskGpuNvlinkTxMb
nvlink amount of data sent
nvlink amount of data sent
Mbps
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskGpuPcieRxMb
pcie amount of data received
pcie amount of data received
Mbps
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskGpuPcieTxMb
pcie data transmission volume
pcie data transmission volume
Mbps
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskGpuSmClock
SM clock frequency
SM clock frequency
s
AppId
SubUin
TaskId
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskGpuDecUtilGpu
GPU decode usage
GPU decode usage
%
AppId
InstanceGpuNum
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskGpuEncUtilGpu
GPU encoder usage
GPU encoder usage
%
AppId
InstanceGpuNum
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskGpuMemoryClockGpu
GPU Memory Frequency
GPU Memory Frequency
s
AppId
InstanceGpuNum
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskGpuMemoryFreeGpu
GPU Memory idle capacity
GPU Memory idle capacity
MBytes
AppId
InstanceGpuNum
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskGpuMemoryUtilGpu
GPU memory bandwidth utilization
GPU memory bandwidth utilization
%
AppId
InstanceGpuNum
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskGpuNvlinkRxMbGpu
nvlink amount of data received
nvlink amount of data received
Mbps
AppId
InstanceGpuNum
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskGpuNvlinkTxMbGpu
nvlink amount of data sent
nvlink amount of data sent
Mbps
AppId
InstanceGpuNum
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskGpuPcieRxMbGpu
pcie amount of data received
pcie amount of data received
Mbps
AppId
InstanceGpuNum
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskGpuPcieTxMbGpu
pcie data transmission volume
pcie data transmission volume
Mbps
AppId
InstanceGpuNum
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]
TaskGpuSmClockGpu
SM clock frequency
SM clock frequency
s
AppId
InstanceGpuNum
SubUin
[ 10s, avg ]
[ 60s, avg ]
[ 300s, avg ]
[ 3600s, avg ]
[ 86400s, avg ]

Overview of Parameters Corresponding to Each Dimension

Parameter Name
Dimension Name
Dimension Explanation
Format
Instances.N.Dimensions.0.Name
AppId
Dimension name for the account APPID
Enter the String-type dimension name: AppId (set automatically during SDK calls; no need to pass this parameter)
Instances.N.Dimensions.0.Value
AppId
Account APPID
Enter the ID, for example: 1231231231 (set automatically during SDK calls; no need to pass this parameter)
Instances.N.Dimensions.1.Name
SubUin
Dimension name for the sub-account ID
Enter the String-type dimension name: SubUin
Instances.N.Dimensions.1.Value
SubUin
Sub-account ID
Enter the ID, for example: 100001231231
Instances.N.Dimensions.2.Name
InstanceId
Dimension name for the training task instance ID
Enter the String-type dimension name: InstanceId
Instances.N.Dimensions.2.Value
InstanceId
Training task instance ID
Enter a specific instance ID, for example: train-9187850047592xxxxx-9ludoo1s1n9c-master-0
Instances.N.Dimensions.3.Name
InstanceGpuNum
Dimension name for the GPU card number used by the training task instance (whole-card GPU tasks only)
Enter the String-type dimension name: InstanceGpuNum
Instances.N.Dimensions.3.Value
InstanceGpuNum
GPU card number used by the training task instance (whole-card GPU tasks only)
Enter the training task instance ID concatenated with a GPU card number or avg, for example: train-9187850047592xxxxx-9ludoo1s1n9c-master-0-0 or train-9187850047592xxxxx-9ludoo1s1n9c-master-0-avg
Instances.N.Dimensions.4.Name
TaskId
Dimension name for the training task ID
Enter the String-type dimension name: TaskId
Instances.N.Dimensions.4.Value
TaskId
Training task ID
Enter an ID, for example: train-9187850047592xxxxx
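
The InstanceGpuNum value is a concatenation of the instance ID and a card number (or avg), which a small helper can make explicit. The following is a minimal Python sketch: the function name `instance_gpu_num` is hypothetical, and the hyphen separator is inferred from the example values in this table rather than from a formal specification.

```python
from typing import Union


def instance_gpu_num(instance_id: str, card: Union[int, str] = "avg") -> str:
    """Build an InstanceGpuNum dimension value (hypothetical helper).

    card is either a GPU card index (0, 1, ...) or the literal "avg"
    that appears in the documented example. The "-" separator is an
    assumption inferred from those examples.
    """
    if isinstance(card, bool) or not isinstance(card, (int, str)):
        raise TypeError("card must be an int index or 'avg'")
    if isinstance(card, int):
        if card < 0:
            raise ValueError("card index must be non-negative")
    elif card != "avg":
        raise ValueError("card must be an int index or 'avg'")
    return f"{instance_id}-{card}"


instance = "train-9187850047592xxxxx-9ludoo1s1n9c-master-0"
print(instance_gpu_num(instance, 0))  # value for card 0
print(instance_gpu_num(instance))     # value with the "avg" suffix
```

Rejecting anything other than a non-negative index or avg keeps malformed dimension values from reaching a query, where they would likely just return no data.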

Input Parameters

Use the following input parameters to query monitoring data for task-based modeling metrics:
&Namespace=QCE/TI_TRAINTASK
&Instances.N.Dimensions.0.Name=AppId
&Instances.N.Dimensions.0.Value=specific account ID
&Instances.N.Dimensions.1.Name=SubUin
&Instances.N.Dimensions.1.Value=specific sub-account ID
&Instances.N.Dimensions.2.Name=InstanceId
&Instances.N.Dimensions.2.Value=training task instance ID
&Instances.N.Dimensions.3.Name=InstanceGpuNum
&Instances.N.Dimensions.3.Value=GPU Card Number used by training task instance
&Instances.N.Dimensions.4.Name=TaskId
&Instances.N.Dimensions.4.Value=training task ID
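
As a rough illustration of how these parameters fit together, the Python sketch below flattens a list of dimensions into the `Instances.N.Dimensions.M.Name` / `Value` form. The helper name `build_query_params` is hypothetical, the `MetricName` and `Period` keys are assumed to be the accompanying request parameters of the monitoring data query API, and the IDs are the placeholder values from this page. Indices follow list order, so TaskId lands at index 2 here rather than the 4 used in the overview; whether the service requires fixed indices is not stated on this page. The code only builds the parameter dictionary and sends no request.

```python
from typing import Dict, Sequence, Tuple

NAMESPACE = "QCE/TI_TRAINTASK"
# Statistical periods (in seconds) listed for the metrics in this document.
SUPPORTED_PERIODS = (10, 60, 300, 3600, 86400)


def build_query_params(
    metric_name: str,
    dimensions: Sequence[Tuple[str, str]],
    period: int = 60,
    instance_index: int = 0,
) -> Dict[str, str]:
    """Flatten (name, value) dimension pairs into query parameters.

    Hypothetical helper: dimension indices are assigned in list order.
    """
    if period not in SUPPORTED_PERIODS:
        raise ValueError(f"unsupported period: {period}")
    params = {
        "Namespace": NAMESPACE,
        "MetricName": metric_name,  # assumed parameter name
        "Period": str(period),      # assumed parameter name
    }
    for i, (name, value) in enumerate(dimensions):
        prefix = f"Instances.{instance_index}.Dimensions.{i}"
        params[f"{prefix}.Name"] = name
        params[f"{prefix}.Value"] = value
    return params


# Gpuutil is listed with the AppId, SubUin, and TaskId dimensions.
params = build_query_params(
    "Gpuutil",
    [
        ("AppId", "1231231231"),
        ("SubUin", "100001231231"),
        ("TaskId", "train-9187850047592xxxxx"),
    ],
)
for key, value in params.items():
    print(f"&{key}={value}")
```

Checking the period against the documented list up front catches a typo such as 30 before any request is made.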