Metric Name | Metric Meaning | Description | Unit | Dimension | Statistical Rule [period, statType] |
CfsClientDataReadBandwidth | turbocfs single-node server read bandwidth | turbocfs single-node server read bandwidth | KBytes/s | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
CfsClientDataWriteBandwidth | turbocfs single-node server write bandwidth | turbocfs single-node server write bandwidth | KBytes/s | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
CfsDataReadIoBytes | CFS server read bandwidth | CFS server read bandwidth | KBytes/s | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
CfsDataReadIoLatency | CFS read latency | CFS read latency | ms | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
CfsDataWriteIoBytes | CFS server write bandwidth | CFS server write bandwidth | KBytes/s | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
CfsDataWriteIoLatency | CFS write latency | CFS write latency | ms | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
CfsStrageUsageGb | CFS storage data capacity | CFS storage data capacity | GBytes | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
DiskIoUtil | Disk I/O utilization | Disk I/O utilization | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
DiskIoWait | Disk I/O wait | Disk I/O wait | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
DiskReadByte | Disk read bandwidth | Disk read bandwidth | MBytes/s | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
DiskReadIops | Disk read IOPS | Disk read IOPS | Count | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
DiskUsageRadio | System disk partition utilization | System disk partition utilization | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
DiskWriteByte | Disk write bandwidth | Disk write bandwidth | MBytes/s | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
DiskWriteIops | Disk write IOPS | Disk write IOPS | Count | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Instancecpuutil | CPU utilization | CPU utilization | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Instancegpumemutil | GPU vRAM utilization | GPU vRAM utilization | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Instancegpumemvalue | GPU memory usage | GPU memory usage | MBytes | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Instancegpuutil | GPU utilization | GPU utilization | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Instancememutil | Memory utilization | Memory utilization | % | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Instancememvalue | Memory usage | Memory usage | MBytes | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuFp16EngineActivity | FP16 active time ratio | FP16 active time ratio | % | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuFp32EngineActivity | FP32 active time ratio | FP32 active time ratio | % | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuFp64EngineActivity | FP64 active time ratio | FP64 active time ratio | % | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
NvlinkBandwidth | NVLink transmission rate | NVLink transmission rate | Bytes/s | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
PcieBandwidth | PCIe bus transmission rate | PCIe bus transmission rate | Bytes/s | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuSmActivity | SM active time ratio | SM active time ratio | % | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
TensorActivity | Tensor active time ratio | Tensor active time ratio | % | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
Dcgmfidevfbused | GPU memory usage | GPU memory usage | MBytes | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
DcgmFiDevGpuUtil | GPU utilization | GPU utilization | % | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
DcgmFiDevMemCopyUtil | GPU memory utilization | GPU memory utilization | % | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuMemoryClockGpu | GPU memory frequency | GPU memory frequency | s | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuMemoryFreeGpuv | GPU memory idle capacity | GPU memory idle capacity | MBytes | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuNvlinkRxMb | NVLink data received | NVLink data received | Mbps | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuNvlinkTxMb | NVLink data sent | NVLink data sent | Mbps | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuPcieRxMb | PCIe data received | PCIe data received | Mbps | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuPcieTxMb | PCIe data sent | PCIe data sent | Mbps | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
GpuSmClock | SM clock frequency | SM clock frequency | s | AppId InstanceGpuNum SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
PodDiskLimit | Total instance disk | Total instance disk | GBytes | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
PodDiskValue | Instance disk usage | Instance disk usage | GBytes | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
NodeDiskLimit | Total node disk | Total node disk | GBytes | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
NodeDiskValue | Node disk usage | Node disk usage | GBytes | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
RdmaInpkt | RDMA network card inbound packets | RDMA network card inbound packets | pps | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
RdmaOutpkt | RDMA network card outbound packets | RDMA network card outbound packets | pps | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
RdmaIntraffic | RDMA network interface received bandwidth | RDMA network interface received bandwidth | Mbps | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
RdmaOuttraffic | RDMA network interface transmitted bandwidth | RDMA network interface transmitted bandwidth | Mbps | AppId InstanceId SubUin | [ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ] |
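
Every metric in the table above lists the same five statistical periods (10s, 60s, 300s, 3600s, 86400s), all with the avg statistic. The following is a minimal sketch of how a caller might check a requested period against that list before querying. The helper names `validate_period` and `largest_period_within` are hypothetical and are not part of any SDK.

```python
# Periods (in seconds) taken from the "Statistical Rule" column; all use avg.
SUPPORTED_PERIODS = (10, 60, 300, 3600, 86400)


def validate_period(period: int) -> int:
    """Return the period unchanged if the table lists it, else raise ValueError."""
    if period not in SUPPORTED_PERIODS:
        raise ValueError(
            f"unsupported period {period}s; expected one of {SUPPORTED_PERIODS}"
        )
    return period


def largest_period_within(window_seconds: int) -> int:
    """Return the largest listed period that does not exceed the query window."""
    candidates = [p for p in SUPPORTED_PERIODS if p <= window_seconds]
    if not candidates:
        raise ValueError(f"window of {window_seconds}s is shorter than every period")
    return max(candidates)
```

For example, a two-hour window (7200 seconds) would resolve to the 3600s period under this sketch.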

Parameter Name | Dimension Name | Dimension Explanation | Format |
Instances.N.Dimensions.0.Name | AppId | Dimension name for the basic account APPID | Enter the String-type dimension name: AppId (filled in automatically during SDK calls; no need to pass it) |
Instances.N.Dimensions.0.Value | AppId | Basic account APPID | Enter the ID, for example: 1231231231 (filled in automatically during SDK calls; no need to pass it) |
Instances.N.Dimensions.1.Name | SubUin | Dimension name for the sub-account ID | Enter the String-type dimension name: SubUin |
Instances.N.Dimensions.1.Value | SubUin | Sub-account ID | Enter the ID, for example: 100001231231 |
Instances.N.Dimensions.2.Name | InstanceId | Dimension name for the development machine ID | Enter the String-type dimension name: InstanceId |
Instances.N.Dimensions.2.Value | InstanceId | Development machine ID | Enter the specific instance ID, for example: nb-11521601712664xxxxx-9igs95i88a68 |
Instances.N.Dimensions.3.Name | InstanceGpuNum | Dimension name for the GPU card number used by the development machine instance (whole-card GPU tasks only) | Enter the String-type dimension name: InstanceGpuNum |
Instances.N.Dimensions.3.Value | InstanceGpuNum | GPU card number used by the development machine instance (whole-card GPU tasks only) | Enter the instance ID concatenated with the GPU card number/avg, for example: nb-11521601712664xxxxx-9igs95i88a68-0 |
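
The dimension rows above can be assembled programmatically. The sketch below rests on several assumptions. Instance-level metrics take SubUin plus InstanceId, and GPU-level metrics take SubUin plus InstanceGpuNum, as the Dimension column of the metrics table suggests. The InstanceGpuNum value is the instance ID joined to the card number with a hyphen, inferred from the example value. AppId is omitted because the table says it is filled in automatically during SDK calls. Dimension indices are numbered sequentially in the flattened form. The function names `build_dimensions` and `flatten_dimensions` are hypothetical and do not come from any SDK.

```python
from typing import Dict, List, Optional


def build_dimensions(
    sub_uin: str, instance_id: str, gpu_card: Optional[int] = None
) -> List[Dict[str, str]]:
    """Build Name/Value pairs for one instance (AppId is left to the SDK)."""
    dims = [{"Name": "SubUin", "Value": sub_uin}]
    if gpu_card is None:
        # Instance-level metrics, e.g. DiskIoUtil or RdmaInpkt.
        dims.append({"Name": "InstanceId", "Value": instance_id})
    else:
        # GPU-level metrics: assumed "<instance ID>-<card number>" format.
        dims.append({"Name": "InstanceGpuNum", "Value": f"{instance_id}-{gpu_card}"})
    return dims


def flatten_dimensions(dims: List[Dict[str, str]], n: int = 0) -> Dict[str, str]:
    """Flatten to Instances.N.Dimensions.i.Name / .Value request parameters."""
    params: Dict[str, str] = {}
    for i, dim in enumerate(dims):
        params[f"Instances.{n}.Dimensions.{i}.Name"] = dim["Name"]
        params[f"Instances.{n}.Dimensions.{i}.Value"] = dim["Value"]
    return params
```

Calling `build_dimensions("100001231231", "nb-11521601712664xxxxx-9igs95i88a68", gpu_card=0)` would yield an InstanceGpuNum value of `nb-11521601712664xxxxx-9igs95i88a68-0`, matching the example in the table.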