Task-based Modeling

Last updated: 2025-05-22 16:59:39

Namespace

Namespace = QCE/TI_TRAINTASK

Monitoring metrics

| Metric name | Metric display name | Description | Unit | Dimensions | Statistical rule [period, statType] |
| --- | --- | --- | --- | --- | --- |
| CfsClientDataReadBandwidth | turocfs single-node server-side read bandwidth | turocfs single-node server-side read bandwidth | KBytes/s | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| CfsClientDataWriteBandwidth | turocfs single-node server-side write bandwidth | turocfs single-node server-side write bandwidth | KBytes/s | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| CfsDataReadIoBytes | CFS server-side read bandwidth | CFS server-side read bandwidth | KBytes/s | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| CfsDataReadIoLatency | CFS read latency | CFS read latency | ms | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| CfsDataWriteIoBytes | CFS server-side write bandwidth | CFS server-side write bandwidth | KBytes/s | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| CfsDataWriteIoLatency | CFS write latency | CFS write latency | ms | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| CfsStrageUsageGb | CFS stored data capacity | CFS stored data capacity | GBytes | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| Cpuutil | CPU utilization | CPU utilization | % | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| DcgmFiDevFbUsed | GPU memory used | GPU memory used | MBytes | AppId, InstanceGpuNum, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| DcgmFiDevGpuUtil | GPU usage | GPU usage | % | AppId, InstanceGpuNum, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| DcgmFiDevMemCopyUtil | GPU memory usage | GPU memory usage | % | AppId, InstanceGpuNum, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| DiskIoUtil | Disk ioutil | Disk ioutil | % | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| DiskIoWait | Disk iowait | Disk iowait | % | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| DiskReadByte | Disk read bandwidth | Disk read bandwidth | MBytes/s | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| DiskReadIops | Disk read IOPS | Disk read IOPS | Count | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| DiskUsageRadio | System disk partition utilization | System disk partition utilization | % | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| DiskWriteByte | Disk write bandwidth | Disk write bandwidth | MBytes/s | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| DiskWriteIops | Disk write IOPS | Disk write IOPS | Count | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| Fp16EngineActivity | FP16 active time ratio | FP16 active time ratio | % | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| Fp32EngineActivity | FP32 active time ratio | FP32 active time ratio | % | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| Fp64EngineActivity | FP64 active time ratio | FP64 active time ratio | % | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| GpuFp16EngineActivity | FP16 active time ratio | FP16 active time ratio | % | AppId, InstanceGpuNum, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| GpuFp32EngineActivity | FP32 active time ratio | FP32 active time ratio | % | AppId, InstanceGpuNum, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| GpuFp64EngineActivity | FP64 active time ratio | FP64 active time ratio | % | AppId, InstanceGpuNum, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| Gpumemutil | GPU memory utilization | GPU memory utilization | % | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| Gpumemvalue | GPU memory used | GPU memory used | MBytes | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| GpuNvlinkBandwidth | NVLink transfer rate | NVLink transfer rate | Bytes/s | AppId, InstanceGpuNum, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| GpuPcieBandwidth | PCIe bus transfer rate | PCIe bus transfer rate | Bytes/s | AppId, InstanceGpuNum, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| GpuSmActivity | SM active time ratio | SM active time ratio | % | AppId, InstanceGpuNum, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| GpuTensorActivity | Tensor active time ratio | Tensor active time ratio | % | AppId, InstanceGpuNum, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| Gpuutil | GPU utilization | GPU utilization | % | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| Instancecpuutil | CPU utilization | CPU utilization | % | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| Instancegpumemutil | GPU memory utilization | GPU memory utilization | % | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| Instancegpumemvalue | GPU memory used | GPU memory used | MBytes | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| Instancegpuutil | GPU utilization | GPU utilization | % | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| Instancememutil | Memory utilization | Memory utilization | % | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| Instancememvalue | Memory used | Memory used | MBytes | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| Memutil | Memory utilization | Memory utilization | % | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| Memvalue | Memory used | Memory used | MBytes | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| NvlinkBandwidth | NVLink transfer rate | NVLink transfer rate | Bytes/s | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| PcieBandwidth | PCIe bus transfer rate | PCIe bus transfer rate | Bytes/s | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| RdmaInpkt | RDMA NIC inbound packets | RDMA NIC inbound packets | pps | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| RdmaIntraffic | RDMA NIC receive bandwidth | RDMA NIC receive bandwidth | Mbps | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| RdmaOutpkt | RDMA NIC outbound packets | RDMA NIC outbound packets | pps | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| RdmaOuttraffic | RDMA NIC send bandwidth | RDMA NIC send bandwidth | Mbps | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| SmActivity | SM active time ratio | SM active time ratio | % | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskCfsClientDataReadBandwidth | turocfs single-node server-side read bandwidth | turocfs single-node server-side read bandwidth | KBytes/s | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskCfsClientDataWriteBandwidth | turocfs single-node server-side write bandwidth | turocfs single-node server-side write bandwidth | KBytes/s | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskCfsDataReadIoBytes | CFS server-side read bandwidth | CFS server-side read bandwidth | KBytes/s | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskCfsDataReadIoLatency | CFS read latency | CFS read latency | ms | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskCfsDataWriteIoBytes | CFS server-side write bandwidth | CFS server-side write bandwidth | KBytes/s | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskCfsDataWriteIoLatency | CFS write latency | CFS write latency | ms | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskCfsStrageUsageGb | CFS stored data capacity | CFS stored data capacity | GBytes | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskDiskIoUtil | Disk ioutil | Disk ioutil | % | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskDiskIoWait | Disk iowait | Disk iowait | % | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskDiskReadByte | Disk read bandwidth | Disk read bandwidth | MBytes/s | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskDiskReadIops | Disk read IOPS | Disk read IOPS | Count | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskDiskUsageRadio | System disk partition utilization | System disk partition utilization | % | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskDiskWriteByte | Disk write bandwidth | Disk write bandwidth | MBytes/s | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskDiskWriteIops | Disk write IOPS | Disk write IOPS | Count | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskFp16EngineActivity | FP16 active time ratio | FP16 active time ratio | % | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskFp32EngineActivity | FP32 active time ratio | FP32 active time ratio | % | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskFp64EngineActivity | FP64 active time ratio | FP64 active time ratio | % | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskNvlinkBandwidth | NVLink transfer rate | NVLink transfer rate | Bytes/s | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskPcieBandwidth | PCIe bus transfer rate | PCIe bus transfer rate | Bytes/s | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskRdmaInpkt | RDMA NIC inbound packets | RDMA NIC inbound packets | pps | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskRdmaIntraffic | RDMA NIC receive bandwidth | RDMA NIC receive bandwidth | Mbps | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskRdmaOutpkt | RDMA NIC outbound packets | RDMA NIC outbound packets | pps | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskRdmaOuttraffic | RDMA NIC send bandwidth | RDMA NIC send bandwidth | Mbps | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskSmActivity | SM active time ratio | SM active time ratio | % | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskTensorActivity | Tensor active time ratio | Tensor active time ratio | % | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TensorActivity | Tensor active time ratio | Tensor active time ratio | % | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| GpuDecUtil | GPU decoder usage | GPU decoder usage | % | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| GpuEncUtil | GPU encoder usage | GPU encoder usage | % | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| GpuMemoryClock | GPU memory clock frequency | GPU memory clock frequency | s | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| GpuMemoryFree | GPU memory free | GPU memory free | MBytes | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| GpuMemoryUtil | GPU memory usage | GPU memory usage | % | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| GpuNvlinkRxMb | NVLink received data volume | NVLink received data volume | Mbps | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| GpuNvlinkTxMb | NVLink sent data volume | NVLink sent data volume | Mbps | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| GpuPcieRxMb | PCIe received data volume | PCIe received data volume | Mbps | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| GpuPcieTxMb | PCIe sent data volume | PCIe sent data volume | Mbps | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| GpuSmClock | SM clock frequency | SM clock frequency | s | AppId, InstanceId, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskGpuDecUtil | GPU decoder usage | GPU decoder usage | % | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskGpuEncUtil | GPU encoder usage | GPU encoder usage | % | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskGpuMemoryClock | GPU memory clock frequency | GPU memory clock frequency | s | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskGpuMemoryFree | GPU memory free | GPU memory free | MBytes | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskGpuMemoryUtil | GPU memory bandwidth usage | GPU memory bandwidth usage | % | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskGpuNvlinkRxMb | NVLink received data volume | NVLink received data volume | Mbps | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskGpuNvlinkTxMb | NVLink sent data volume | NVLink sent data volume | Mbps | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskGpuPcieRxMb | PCIe received data volume | PCIe received data volume | Mbps | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskGpuPcieTxMb | PCIe sent data volume | PCIe sent data volume | Mbps | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskGpuSmClock | SM clock frequency | SM clock frequency | s | AppId, SubUin, TaskId | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskGpuDecUtilGpu | GPU decoder usage | GPU decoder usage | % | AppId, InstanceGpuNum, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskGpuEncUtilGpu | GPU encoder usage | GPU encoder usage | % | AppId, InstanceGpuNum, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskGpuMemoryClockGpu | GPU memory clock frequency | GPU memory clock frequency | s | AppId, InstanceGpuNum, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskGpuMemoryFreeGpu | GPU memory free | GPU memory free | MBytes | AppId, InstanceGpuNum, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskGpuMemoryUtilGpu | GPU memory bandwidth usage | GPU memory bandwidth usage | % | AppId, InstanceGpuNum, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskGpuNvlinkRxMbGpu | NVLink received data volume | NVLink received data volume | Mbps | AppId, InstanceGpuNum, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskGpuNvlinkTxMbGpu | NVLink sent data volume | NVLink sent data volume | Mbps | AppId, InstanceGpuNum, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskGpuPcieRxMbGpu | PCIe received data volume | PCIe received data volume | Mbps | AppId, InstanceGpuNum, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskGpuPcieTxMbGpu | PCIe sent data volume | PCIe sent data volume | Mbps | AppId, InstanceGpuNum, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
| TaskGpuSmClockGpu | SM clock frequency | SM clock frequency | s | AppId, InstanceGpuNum, SubUin | [10s, avg], [60s, avg], [300s, avg], [3600s, avg], [86400s, avg] |
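
Every metric in the table is listed with the same five periods and the `avg` statistic, and each metric is reported under one fixed set of three dimensions. As a rough illustration, the sketch below checks a planned query against a few rows copied from the table. The helper name `check_query` and the structure of the lookup are hypothetical and not part of any Tencent Cloud SDK. Skipping `AppId` in the "missing" check assumes the SDK fills it in automatically, as the dimension parameter overview on this page states.

```python
# Periods (in seconds) from the statistical-rule column; the only statType listed is avg.
SUPPORTED_PERIODS = (10, 60, 300, 3600, 86400)

# A few rows copied from the metric table: metric name -> dimensions it is reported under.
# Extend this mapping with further rows as needed (hypothetical helper structure).
METRIC_DIMENSIONS = {
    "Instancegpuutil": {"AppId", "InstanceId", "SubUin"},
    "Gpuutil": {"AppId", "SubUin", "TaskId"},
    "DcgmFiDevGpuUtil": {"AppId", "InstanceGpuNum", "SubUin"},
}


def check_query(metric, period, dimension_names):
    """Return a list of problems; an empty list means the query matches the table."""
    if metric not in METRIC_DIMENSIONS:
        return [f"unknown metric: {metric}"]

    problems = []
    if period not in SUPPORTED_PERIODS:
        problems.append(f"unsupported period: {period}s")

    expected = METRIC_DIMENSIONS[metric]
    given = set(dimension_names)
    # AppId is not reported as missing: the SDK is documented to supply it.
    missing = expected - given - {"AppId"}
    extra = given - expected
    if missing:
        problems.append("missing dimensions: " + ", ".join(sorted(missing)))
    if extra:
        problems.append("unexpected dimensions: " + ", ".join(sorted(extra)))
    return problems
```

For example, `check_query("Gpuutil", 60, ["SubUin", "TaskId"])` returns an empty list, while passing `TaskId` for an instance-level metric such as `Instancegpuutil` is flagged.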

Overview of parameters for each dimension

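| Parameter name | Dimension name | Dimension description | Format |
| --- | --- | --- | --- |
| Instances.N.Dimensions.0.Name | AppId | Dimension name for the account APPID | Enter the String-type dimension name: AppId (obtained automatically in SDK calls, so it does not need to be passed) |
| Instances.N.Dimensions.0.Value | AppId | Account APPID | Enter the ID, for example: 1231231231 (obtained automatically in SDK calls, so it does not need to be passed) |
| Instances.N.Dimensions.1.Name | SubUin | Dimension name for the sub-account ID | Enter the String-type dimension name: SubUin |
| Instances.N.Dimensions.1.Value | SubUin | Sub-account ID | Enter the ID, for example: 100001231231 |
| Instances.N.Dimensions.2.Name | InstanceId | Dimension name for the training task instance ID | Enter the String-type dimension name: InstanceId |
| Instances.N.Dimensions.2.Value | InstanceId | Training task instance ID | Enter the specific instance ID, for example: train-9187850047592xxxxx-9ludoo1s1n9c-master-0 |
| Instances.N.Dimensions.3.Name | InstanceGpuNum | Dimension name for the GPU card number used by the training task instance (whole-GPU-card tasks only) | Enter the String-type dimension name: InstanceGpuNum |
| Instances.N.Dimensions.3.Value | InstanceGpuNum | GPU card number used by the training task instance (whole-GPU-card tasks only) | The training task instance ID concatenated with the GPU card number or avg, for example: train-9187850047592xxxxx-9ludoo1s1n9c-master-0-0, train-9187850047592xxxxx-9ludoo1s1n9c-master-0-avg |
| Instances.N.Dimensions.4.Name | TaskId | Dimension name for the training task ID | Enter the String-type dimension name: TaskId |
| Instances.N.Dimensions.4.Value | TaskId | Training task ID | Enter the ID, for example: train-9187850047592xxxxx |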
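
The `InstanceGpuNum` value is described above as the training task instance ID joined with either a GPU card number or `avg`. A minimal sketch of that concatenation follows. The function name `instance_gpu_num` is hypothetical, and the hyphen separator is inferred from the two example values on this page rather than from a formal specification.

```python
def instance_gpu_num(instance_id, gpu_index=None):
    """Build an InstanceGpuNum dimension value.

    instance_id: training task instance ID (the InstanceId dimension value).
    gpu_index:   GPU card number, or None to address the per-instance average.
    """
    if gpu_index is None:
        suffix = "avg"
    else:
        if not isinstance(gpu_index, int) or gpu_index < 0:
            raise ValueError("gpu_index must be a non-negative integer or None")
        suffix = str(gpu_index)
    # Separator inferred from the documented examples "...-master-0-0" and "...-master-0-avg".
    return f"{instance_id}-{suffix}"
```

Called with the example instance ID from the table and `gpu_index=0`, this yields the first documented example value. Omitting `gpu_index` yields the `-avg` form.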

Input parameters

To query monitoring data for task-based modeling metrics, use the following input parameter values:
&Namespace=QCE/TI_TRAINTASK
&Instances.N.Dimensions.0.Name=AppId
&Instances.N.Dimensions.0.Value=the specific account ID
&Instances.N.Dimensions.1.Name=SubUin
&Instances.N.Dimensions.1.Value=the specific sub-account ID
&Instances.N.Dimensions.2.Name=InstanceId
&Instances.N.Dimensions.2.Value=the training task instance ID
&Instances.N.Dimensions.3.Name=InstanceGpuNum
&Instances.N.Dimensions.3.Value=the GPU card number used by the training task instance
&Instances.N.Dimensions.4.Name=TaskId
&Instances.N.Dimensions.4.Value=the training task ID
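
The flattened `Instances.N.Dimensions.M.Name` / `.Value` keys above can be generated from a list of dimension pairs. The sketch below uses only the Python standard library and does not sign or send a request. The function name `build_params` is hypothetical. The `MetricName` and `Period` keys do not appear on this page and are assumed from the general monitoring data query API. Numbering the dimensions in the order supplied is also an assumption, since each metric uses only three of the five dimensions listed.

```python
from urllib.parse import urlencode

NAMESPACE = "QCE/TI_TRAINTASK"  # namespace stated on this page


def build_params(metric_name, period, dimensions, instance_index=0):
    """Flatten (name, value) dimension pairs into query-style parameters.

    dimensions: ordered list of (dimension name, dimension value) tuples.
    instance_index: the N in Instances.N (0 for a single queried object).
    """
    params = {
        "Namespace": NAMESPACE,
        "MetricName": metric_name,  # assumed key, not listed on this page
        "Period": period,           # assumed key; seconds, e.g. 60
    }
    for i, (name, value) in enumerate(dimensions):
        prefix = f"Instances.{instance_index}.Dimensions.{i}"
        params[f"{prefix}.Name"] = name
        params[f"{prefix}.Value"] = value
    return params


if __name__ == "__main__":
    # Task-level GPU utilization; AppId is left out on the assumption that the SDK supplies it.
    query = build_params(
        "Gpuutil",
        60,
        [("SubUin", "100001231231"), ("TaskId", "train-9187850047592xxxxx")],
    )
    print(urlencode(query))
```

Keeping the parameters in a plain dictionary makes it straightforward to pass them to whichever client is used to issue the request.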
