Metric Name | Metric Meaning | Description | Unit | Dimension | Statistical Rule [period, statType] |
--- | --- | --- | --- | --- | --- |
CfsClientDataReadBandwidth | TurboCFS single-node server read bandwidth | TurboCFS single-node server read bandwidth | KBytes/s | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
CfsClientDataWriteBandwidth | TurboCFS single-node server write bandwidth | TurboCFS single-node server write bandwidth | KBytes/s | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
CfsDataReadIoBytes | CFS server read bandwidth | CFS server read bandwidth | KBytes/s | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
CfsDataReadIoLatency | CFS read latency | CFS read latency | ms | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
CfsDataWriteIoBytes | CFS server write bandwidth | CFS server write bandwidth | KBytes/s | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
CfsDataWriteIoLatency | CFS write latency | CFS write latency | ms | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
CfsStrageUsageGb | CFS storage usage | CFS storage usage | GBytes | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Cpuutil | CPU utilization | CPU utilization | % | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
DcgmFiDevFbUsed | GPU memory usage | GPU memory usage | MBytes | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
DcgmFiDevGpuUtil | GPU utilization | GPU utilization | % | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
DcgmFiDevMemCopyUtil | GPU memory bandwidth utilization | GPU memory bandwidth utilization | % | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
DiskIoUtil | Disk I/O utilization | Disk I/O utilization | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
DiskIoWait | Disk I/O wait | Disk I/O wait | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
DiskReadByte | Disk read bandwidth | Disk read bandwidth | MBytes/s | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
DiskReadIops | Disk read IOPS | Disk read IOPS | Count | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
DiskUsageRadio | System disk partition utilization | System disk partition utilization | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
DiskWriteByte | Disk write bandwidth | Disk write bandwidth | MBytes/s | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
DiskWriteIops | Disk write IOPS | Disk write IOPS | Count | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Fp16EngineActivity | FP16 active time ratio | FP16 active time ratio | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Fp32EngineActivity | FP32 active time ratio | FP32 active time ratio | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Fp64EngineActivity | FP64 active time ratio | FP64 active time ratio | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuFp16EngineActivity | FP16 active time ratio | FP16 active time ratio | % | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuFp32EngineActivity | FP32 active time ratio | FP32 active time ratio | % | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuFp64EngineActivity | FP64 active time ratio | FP64 active time ratio | % | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Gpumemutil | GPU VRAM utilization | GPU VRAM utilization | % | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Gpumemvalue | GPU memory usage | GPU memory usage | MBytes | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuNvlinkBandwidth | NVLink transmission rate | NVLink transmission rate | Bytes/s | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuPcieBandwidth | PCIe bus transmission rate | PCIe bus transmission rate | Bytes/s | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuSmActivity | SM active state time ratio | SM active state time ratio | % | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuTensorActivity | Tensor active state time ratio | Tensor active state time ratio | % | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Gpuutil | GPU utilization | GPU utilization | % | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Instancecpuutil | CPU utilization | CPU utilization | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Instancegpumemutil | GPU VRAM utilization | GPU VRAM utilization | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Instancegpumemvalue | GPU memory usage | GPU memory usage | MBytes | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Instancegpuutil | GPU utilization | GPU utilization | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Instancememutil | Memory utilization | Memory utilization | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Instancememvalue | Memory usage | Memory usage | MBytes | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Memutil | Memory utilization | Memory utilization | % | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Memvalue | Memory usage | Memory usage | MBytes | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
NvlinkBandwidth | NVLink transmission rate | NVLink transmission rate | Bytes/s | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
PcieBandwidth | PCIe bus transmission rate | PCIe bus transmission rate | Bytes/s | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
RdmaInpkt | RDMA network card inbound packets | RDMA network card inbound packets | pps | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
RdmaIntraffic | RDMA network card inbound bandwidth | RDMA network card inbound bandwidth | Mbps | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
RdmaOutpkt | RDMA network card outbound packets | RDMA network card outbound packets | pps | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
RdmaOuttraffic | RDMA network card outbound bandwidth | RDMA network card outbound bandwidth | Mbps | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
SmActivity | SM active state time ratio | SM active state time ratio | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskCfsClientDataReadBandwidth | TurboCFS single-node server read bandwidth | TurboCFS single-node server read bandwidth | KBytes/s | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskCfsClientDataWriteBandwidth | TurboCFS single-node server write bandwidth | TurboCFS single-node server write bandwidth | KBytes/s | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskCfsDataReadIoBytes | CFS server read bandwidth | CFS server read bandwidth | KBytes/s | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskCfsDataReadIoLatency | CFS read latency | CFS read latency | ms | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskCfsDataWriteIoBytes | CFS server write bandwidth | CFS server write bandwidth | KBytes/s | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskCfsDataWriteIoLatency | CFS write latency | CFS write latency | ms | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskCfsStrageUsageGb | CFS storage usage | CFS storage usage | GBytes | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskDiskIoUtil | Disk I/O utilization | Disk I/O utilization | % | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskDiskIoWait | Disk I/O wait | Disk I/O wait | % | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskDiskReadByte | Disk read bandwidth | Disk read bandwidth | MBytes/s | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskDiskReadIops | Disk read IOPS | Disk read IOPS | Count | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskDiskUsageRadio | System disk partition utilization | System disk partition utilization | % | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskDiskWriteByte | Disk write bandwidth | Disk write bandwidth | MBytes/s | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskDiskWriteIops | Disk write IOPS | Disk write IOPS | Count | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskFp16EngineActivity | FP16 active time ratio | FP16 active time ratio | % | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskFp32EngineActivity | FP32 active time ratio | FP32 active time ratio | % | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskFp64EngineActivity | FP64 active time ratio | FP64 active time ratio | % | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskNvlinkBandwidth | NVLink transmission rate | NVLink transmission rate | Bytes/s | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskPcieBandwidth | PCIe bus transmission rate | PCIe bus transmission rate | Bytes/s | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskRdmaInpkt | RDMA network card inbound packets | RDMA network card inbound packets | pps | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskRdmaIntraffic | RDMA network card inbound bandwidth | RDMA network card inbound bandwidth | Mbps | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskRdmaOutpkt | RDMA network card outbound packets | RDMA network card outbound packets | pps | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskRdmaOuttraffic | RDMA network card outbound bandwidth | RDMA network card outbound bandwidth | Mbps | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskSmActivity | SM active state time ratio | SM active state time ratio | % | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskTensorActivity | Tensor active state time ratio | Tensor active state time ratio | % | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TensorActivity | Tensor active state time ratio | Tensor active state time ratio | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuDecUtil | GPU decoder utilization | GPU decoder utilization | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuEncUtil | GPU encoder utilization | GPU encoder utilization | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuMemoryClock | GPU memory clock frequency | GPU memory clock frequency | MHz | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuMemoryFree | GPU free memory | GPU free memory | MBytes | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuMemoryUtil | GPU memory bandwidth utilization | GPU memory bandwidth utilization | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuNvlinkRxMb | NVLink data received | NVLink data received | Mbps | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuNvlinkTxMb | NVLink data sent | NVLink data sent | Mbps | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuPcieRxMb | PCIe data received | PCIe data received | Mbps | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuPcieTxMb | PCIe data sent | PCIe data sent | Mbps | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuSmClock | SM clock frequency | SM clock frequency | MHz | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskGpuDecUtil | GPU decoder utilization | GPU decoder utilization | % | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskGpuEncUtil | GPU encoder utilization | GPU encoder utilization | % | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskGpuMemoryClock | GPU memory clock frequency | GPU memory clock frequency | MHz | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskGpuMemoryFree | GPU free memory | GPU free memory | MBytes | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskGpuMemoryUtil | GPU memory bandwidth utilization | GPU memory bandwidth utilization | % | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskGpuNvlinkRxMb | NVLink data received | NVLink data received | Mbps | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskGpuNvlinkTxMb | NVLink data sent | NVLink data sent | Mbps | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskGpuPcieRxMb | PCIe data received | PCIe data received | Mbps | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskGpuPcieTxMb | PCIe data sent | PCIe data sent | Mbps | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskGpuSmClock | SM clock frequency | SM clock frequency | MHz | AppId SubUin TaskId | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskGpuDecUtilGpu | GPU decoder utilization | GPU decoder utilization | % | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskGpuEncUtilGpu | GPU encoder utilization | GPU encoder utilization | % | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskGpuMemoryClockGpu | GPU memory clock frequency | GPU memory clock frequency | MHz | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskGpuMemoryFreeGpu | GPU free memory | GPU free memory | MBytes | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskGpuMemoryUtilGpu | GPU memory bandwidth utilization | GPU memory bandwidth utilization | % | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskGpuNvlinkRxMbGpu | NVLink data received | NVLink data received | Mbps | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskGpuNvlinkTxMbGpu | NVLink data sent | NVLink data sent | Mbps | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskGpuPcieRxMbGpu | PCIe data received | PCIe data received | Mbps | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskGpuPcieTxMbGpu | PCIe data sent | PCIe data sent | Mbps | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TaskGpuSmClockGpu | SM clock frequency | SM clock frequency | MHz | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
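The metric table follows a regular pattern: each metric is tied to one of three dimension sets (instance-level `AppId InstanceId SubUin`, task-level `AppId SubUin TaskId`, or per-GPU `AppId InstanceGpuNum SubUin`), and every metric supports the same five periods with `avg` statistics. The Python sketch below encodes a few rows of the table so that a planned query can be checked before it is sent. `MetricSpec`, `CATALOG`, and `check_query` are hypothetical names, not part of any Tencent Cloud SDK, and the catalog is only a hand-copied excerpt. It treats `AppId` as optional because the dimension table notes that the SDK fills it in.

```python
from dataclasses import dataclass

# The three dimension sets used by the metrics in the table.
INSTANCE_DIMS = frozenset({"AppId", "InstanceId", "SubUin"})
TASK_DIMS = frozenset({"AppId", "SubUin", "TaskId"})
GPU_DIMS = frozenset({"AppId", "InstanceGpuNum", "SubUin"})

# AppId is filled in by the SDK, so a caller may omit it.
AUTO_FILLED = frozenset({"AppId"})

# Every metric lists the same statistical periods (in seconds), all with statType avg.
SUPPORTED_PERIODS = (10, 60, 300, 3600, 86400)


@dataclass(frozen=True)
class MetricSpec:
    name: str
    unit: str
    dimensions: frozenset


# Hypothetical excerpt of the table; add the remaining rows as needed.
CATALOG = {
    spec.name: spec
    for spec in (
        MetricSpec("Instancegpuutil", "%", INSTANCE_DIMS),
        MetricSpec("Gpuutil", "%", TASK_DIMS),
        MetricSpec("DcgmFiDevGpuUtil", "%", GPU_DIMS),
        MetricSpec("TaskDiskReadByte", "MBytes/s", TASK_DIMS),
        MetricSpec("RdmaIntraffic", "Mbps", INSTANCE_DIMS),
    )
}


def check_query(metric: str, dimensions: dict, period: int) -> list:
    """Return a list of problems with a planned query; an empty list means none were found."""
    spec = CATALOG.get(metric)
    if spec is None:
        return [f"unknown metric {metric!r}"]
    given = set(dimensions)
    problems = []
    # Required dimensions the caller did not supply (ignoring SDK-filled ones).
    missing = spec.dimensions - given - AUTO_FILLED
    # Supplied dimensions that this metric does not use.
    extra = given - spec.dimensions
    if missing:
        problems.append(f"missing dimensions: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected dimensions: {sorted(extra)}")
    if period not in SUPPORTED_PERIODS:
        problems.append(f"unsupported period {period}s; choose one of {SUPPORTED_PERIODS}")
    return problems


if __name__ == "__main__":
    # A task-level metric queried with instance-level dimensions is flagged.
    print(check_query(
        "Gpuutil",
        {"SubUin": "100001231231", "InstanceId": "train-9187850047592xxxxx-9ludoo1s1n9c-master-0"},
        60,
    ))
```

In this example `Gpuutil` is a task-level metric, so passing `InstanceId` instead of `TaskId` is reported as both a missing `TaskId` and an unexpected `InstanceId`.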
Parameter Name | Dimension Name | Dimension Explanation | Format |
--- | --- | --- | --- |
Instances.N.Dimensions.0.Name | AppId | Dimension name for the account APPID | Enter the String-type dimension name: AppId (filled in automatically by the SDK; you do not need to pass this parameter) |
Instances.N.Dimensions.0.Value | AppId | Account APPID | Enter the APPID, for example: 1231231231 (filled in automatically by the SDK; you do not need to pass this parameter) |
Instances.N.Dimensions.1.Name | SubUin | Dimension name for the sub-account ID | Enter the String-type dimension name: SubUin |
Instances.N.Dimensions.1.Value | SubUin | Sub-account ID | Enter the sub-account ID, for example: 100001231231 |
Instances.N.Dimensions.2.Name | InstanceId | Dimension name for the training task instance ID | Enter the String-type dimension name: InstanceId |
Instances.N.Dimensions.2.Value | InstanceId | Training task instance ID | Enter a specific instance ID, for example: train-9187850047592xxxxx-9ludoo1s1n9c-master-0 |
Instances.N.Dimensions.3.Name | InstanceGpuNum | Dimension name for the GPU card number used by the training task instance (whole-GPU tasks only) | Enter the String-type dimension name: InstanceGpuNum |
Instances.N.Dimensions.3.Value | InstanceGpuNum | GPU card number used by the training task instance (whole-GPU tasks only) | Append a hyphen and either the GPU card number or avg to the training task instance ID, for example: train-9187850047592xxxxx-9ludoo1s1n9c-master-0-0 or train-9187850047592xxxxx-9ludoo1s1n9c-master-0-avg |
Instances.N.Dimensions.4.Name | TaskId | Dimension name for the training task ID | Enter the String-type dimension name: TaskId |
Instances.N.Dimensions.4.Value | TaskId | Training task ID | Enter a specific task ID, for example: train-9187850047592xxxxx |
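The dimension table describes two mechanics that are easy to get wrong by hand: the `Instances.N.Dimensions.M.Name`/`Value` parameter naming, and the `InstanceGpuNum` value, which is the instance ID followed by a hyphen and either a GPU card number or `avg`. The Python sketch below shows both. `instance_gpu_num` and `flatten_instances` are hypothetical helpers, not SDK functions. The table lists each dimension at a fixed index (`Dimensions.0` for AppId through `Dimensions.4` for TaskId); this sketch instead numbers dimensions in the order they are passed, which is an assumption to confirm against the API reference before relying on it. It only matters if you assemble the flattened parameters yourself.

```python
from typing import Dict, List, Union


def instance_gpu_num(instance_id: str, card: Union[int, str]) -> str:
    """Build an InstanceGpuNum value: the instance ID, a hyphen, then a GPU card number or "avg"."""
    # Reject bools (a subclass of int), negative numbers, and any string other than "avg".
    if isinstance(card, bool) or not (
        (isinstance(card, int) and card >= 0) or card == "avg"
    ):
        raise ValueError('card must be a non-negative int or "avg"')
    return f"{instance_id}-{card}"


def flatten_instances(instances: List[Dict[str, str]]) -> Dict[str, str]:
    """Flatten [{dimension_name: value, ...}, ...] into Instances.N.Dimensions.M.Name/Value keys.

    Dimensions are numbered in the order given. Callers can omit AppId, which the SDK fills in.
    """
    params: Dict[str, str] = {}
    for n, dims in enumerate(instances):
        for m, (name, value) in enumerate(dims.items()):
            params[f"Instances.{n}.Dimensions.{m}.Name"] = name
            params[f"Instances.{n}.Dimensions.{m}.Value"] = value
    return params


if __name__ == "__main__":
    instance_id = "train-9187850047592xxxxx-9ludoo1s1n9c-master-0"
    # GPU 0 and the "avg" series of one instance, for a per-GPU metric such as DcgmFiDevGpuUtil.
    query = flatten_instances([
        {"SubUin": "100001231231", "InstanceGpuNum": instance_gpu_num(instance_id, 0)},
        {"SubUin": "100001231231", "InstanceGpuNum": instance_gpu_num(instance_id, "avg")},
    ])
    for key, value in query.items():
        print(f"{key}={value}")
```

Passing one dictionary per instance keeps each instance's dimensions together, so querying several GPU cards of the same instance is just a matter of adding more dictionaries with different `InstanceGpuNum` values.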