Title | Metric | Unit | Description |
Nodes | NumActiveNMs | - | Number of live NodeManagers |
| NumDecommissionedNMs | - | Number of decommissioned NodeManagers |
| NumLostNMs | - | Number of lost NodeManagers |
| NumUnhealthyNMs | - | Number of unhealthy NodeManagers |
CPU cores | AllocatedVCores | - | Number of allocated VCores in the current queue |
| ReservedVCores | - | Number of reserved VCores in the current queue |
| AvailableVCores | - | Number of available VCores in the current queue |
| PendingVCores | - | Number of pending VCores in resource requests in the current queue |
Total applications | AppsSubmitted | - | Number of submitted jobs in the current queue |
| AppsRunning | - | Number of running jobs in the current queue |
| AppsPending | - | Number of pending jobs in the current queue |
| AppsCompleted | - | Number of completed jobs in the current queue |
| AppsKilled | - | Number of killed jobs in the current queue |
| AppsFailed | - | Number of failed jobs in the current queue |
| ActiveApplications | - | Number of active jobs in the current queue |
| running_0 | - | Number of running jobs in the current queue that have run for less than 60 minutes |
| running_60 | - | Number of running jobs in the current queue that have run for 60–300 minutes |
| running_300 | - | Number of running jobs in the current queue that have run for 300–1,440 minutes |
| running_1440 | - | Number of running jobs in the current queue that have run for more than 1,440 minutes |
Memory size | AllocatedMB | MB | Amount of allocated memory in the current queue |
| AvailableMB | MB | Amount of available memory in the current queue |
| PendingMB | MB | Amount of pending memory in resource requests in the current queue |
| ReservedMB | MB | Amount of reserved memory in the current queue |
Containers | AllocatedContainers | - | Number of allocated containers in the current queue |
| PendingContainers | - | Number of pending containers in resource requests in the current queue |
| ReservedContainers | - | Number of reserved containers in the current queue |
Total allocated/released containers | AggregateContainersAllocated | - | Total number of allocated containers in the current queue |
| AggregateContainersReleased | - | Total number of released containers in the current queue |
Users | ActiveUsers | - | Number of active users in the current queue |
Memory | allocatedMB | MB | Amount of allocated memory in the cluster |
| availableMB | MB | Amount of available memory in the cluster |
| reservedMB | MB | Amount of reserved memory in the cluster |
| totalMB | MB | Total amount of memory in the cluster |
Applications | completed | - | Number of completed jobs in the cluster during the statistical period |
| failed | - | Number of failed jobs in the cluster during the statistical period |
| killed | - | Number of killed jobs in the cluster during the statistical period |
| pending | - | Number of pending jobs in the cluster during the statistical period |
| running | - | Number of running jobs in the cluster during the statistical period |
| submitted | - | Number of submitted jobs in the cluster during the statistical period |
Containers | containersAllocated | - | Number of allocated containers in the cluster |
| containersPending | - | Number of pending containers in the cluster |
| containersReserved | - | Number of reserved containers in the cluster |
Memory utilization | usageRatio | % | Current memory utilization of the cluster |
Memory Utilization Size | configMemRatioMax_queue | % | The maximum proportion of the memory allocated to the queue |
| configMemRatio_queue | % | The proportion of the memory allocated to the queue |
The proportion of the memory to the cluster | configMemRatio_cluster | % | The proportion of the memory allocated for the queue to the cluster memory |
| configMemMaxRatio_cluster | % | The proportion of the maximum memory allocated for the queue to the cluster memory |
| usedMemRatio_cluster | % | The proportion of the memory used by the queue to the cluster memory |
Cores | allocatedVirtualCores | - | Number of allocated CPU cores in the cluster |
| availableVirtualCores | - | Number of available CPU cores in the cluster |
| reservedVirtualCores | - | Number of reserved CPU cores in the cluster |
| totalVirtualCores | - | Total number of CPU cores in the cluster |
CPU utilization | usageRatio | % | Current CPU utilization of the cluster |
CPU Utilization Size | configVCoresRatioMax_queue | % | The maximum proportion of the CPU allocated to the queue |
| configVCoresRatio_queue | % | The proportion of the CPU allocated to the queue |
The proportion of the CPU to the cluster | configVCoresRatio_cluster | % | The proportion of the CPU allocated for the queue to the cluster CPU |
| configVCoresMaxRatio_cluster | % | The proportion of the maximum CPU allocated for the queue to the cluster CPU |
| usedVCoresRatio_cluster | % | The proportion of the CPU used by the queue to the cluster CPU |
Launched AMs | AMLaunchDelayNumOps | - | Number of launched AMs |
Average time for RM to launch AM | AMLaunchDelayAvgTime | ms | Average time for RM to launch AM |
Total registered AMs | AMRegisterDelayNumOps | - | Total number of registered AMs |
Average time for AM to register with RM | AMRegisterDelayAvgTime | ms | Average time for AM to register with RM |
Queue CPU utilization | YARN.RM.QUEUE.VCORES.RATIO | - | Utilization of CPU allocated for the current queue |
Queue memory utilization | YARN.RM.QUEUE.MEM.RATIO | - | Utilization of memory allocated for the current queue |
Percentage of available memory resources | availableMemPercentage | % | Percentage of currently available memory resources in the cluster |
Percentage of pending containers | containerPendingRatio | % | Percentage of pending containers |
Percentage of available CPU | availableCoresPercentage | % | Percentage of available CPU |
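The `running_0` to `running_1440` rows in the table above bucket running jobs by elapsed time. A minimal sketch of that bucketing follows, together with a generic percentage helper of the kind that `availableMemPercentage` suggests. The function names are hypothetical, the handling of jobs that sit exactly on a boundary is an assumption (the table does not say which side is inclusive), and the formula `availableMB / totalMB * 100` is likewise an assumption rather than a documented definition.

```python
from bisect import bisect_right

# Bucket boundaries in minutes, taken from the running_* descriptions above.
BOUNDARIES_MIN = [60, 300, 1440]
BUCKET_NAMES = ["running_0", "running_60", "running_300", "running_1440"]


def bucket_for(elapsed_minutes):
    """Return the running_* bucket for a job that has run this many minutes.

    Assumption: a job at exactly 60, 300, or 1,440 minutes is counted in
    the higher bucket; the source table does not specify this.
    """
    return BUCKET_NAMES[bisect_right(BOUNDARIES_MIN, elapsed_minutes)]


def count_buckets(elapsed_minutes_per_job):
    """Count running jobs per bucket from a list of elapsed times."""
    counts = {name: 0 for name in BUCKET_NAMES}
    for minutes in elapsed_minutes_per_job:
        counts[bucket_for(minutes)] += 1
    return counts


def percent(part, total):
    """Return part/total as a percentage, or 0.0 when total is zero.

    Assumed usage: percent(availableMB, totalMB) as an approximation of
    availableMemPercentage.
    """
    return 0.0 if total == 0 else 100.0 * part / total
```

For example, `count_buckets([5, 61, 400, 2000])` places one job in each of the four buckets.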
Title | Metric | Unit | Description |
RPC authentications/authorizations | RpcAuthenticationFailures | - | Number of failed RPC authentications |
| RpcAuthenticationSuccesses | - | Number of successful RPC authentications |
| RpcAuthorizationFailures | - | Number of failed RPC authorizations |
| RpcAuthorizationSuccesses | - | Number of successful RPC authorizations |
Data received/sent by RPC | ReceivedBytes | bytes/s | Amount of data received by RPC |
| SentBytes | bytes/s | Amount of data sent by RPC |
RPC connections | NumOpenConnections | - | Current number of open connections |
RPC requests | RpcProcessingTimeNumOps | - | Number of RPC requests |
| RpcQueueTimeNumOps | - | Number of RPC requests |
RPC queue length | CallQueueLength | - | Length of the current RPC queue |
Average RPC processing time | RpcProcessingTimeAvgTime | s | Average RPC request processing time |
| RpcQueueTimeAvgTime | s | Average time of RPC in the queue |
GC count | YGC | - | Young GC count |
| FGC | - | Full GC count |
GC time | FGCT | s | Full GC time |
| GCT | s | Garbage collection time |
| YGCT | s | Young GC time |
Memory zone proportion | S0 | % | Percentage of used Survivor 0 memory |
| E | % | Percentage of used Eden memory |
| CCS | % | Percentage of used compressed class space memory |
| S1 | % | Percentage of used Survivor 1 memory |
| O | % | Percentage of used Old memory |
| M | % | Percentage of used Metaspace memory |
Heap memory utilization | MemHeapUsedRate | % | Percentage of the heap memory configured for the JVM that is currently in use |
JVM threads | ThreadsNew | - | Number of threads in NEW status |
| ThreadsRunnable | - | Number of threads in RUNNABLE status |
| ThreadsBlocked | - | Number of threads in BLOCKED status |
| ThreadsWaiting | - | Number of threads in WAITING status |
| ThreadsTimedWaiting | - | Number of threads in TIMED_WAITING status |
| ThreadsTerminated | - | Number of threads in TERMINATED status |
JVM logs | LogFatal | - | Number of FATAL-level logs |
| LogError | - | Number of ERROR-level logs |
| LogWarn | - | Number of WARN-level logs |
| LogInfo | - | Number of INFO-level logs |
JVM memory | MemNonHeapUsedM | MB | Non-heap memory size used by process |
| MemNonHeapCommittedM | MB | Non-heap memory size committed to process |
| MemHeapUsedM | MB | Heap memory size used by process |
| MemHeapCommittedM | MB | Heap memory size committed to process |
| MemHeapMaxM | MB | Maximum heap memory size available to process |
| MemMaxM | MB | Maximum memory size available to process |
CPU utilization | ProcessCpuLoad | % | CPU utilization |
Cumulative CPU usage time | ProcessCpuTime | ms | Cumulative CPU usage time |
File descriptors | MaxFileDescriptorCount | - | Maximum number of file descriptors |
| OpenFileDescriptorCount | - | Number of opened file descriptors |
Process execution duration | Uptime | s | Process execution duration |
Worker threads | DaemonThreadCount | - | Number of daemon threads in the process |
| ThreadCount | - | Number of threads in the process |
Node status | haState | 1: Active 0: Standby | ResourceManager active/standby status |
Active/Standby switch | switchOccurred | - | ResourceManager active/standby switch |
Number of CPU Cores under Tag | AllocatedVCores | Count | Number of VCores allocated to the current Tag |
Memory size under Tag | AllocatedMB | MBytes | Memory size allocated to the current Tag |
Number of audit log writing failures to ES | WriteEsFailed | Count | Number of audit log writing failures to ES |
Number of audit log writing successes to ES | WriteEsSuccess | Count | Number of audit log writing successes to ES |
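Two rows in the table above encode a small amount of logic: `MemHeapUsedRate` is a ratio of two heap figures, and `haState` is a numeric flag. The sketch below shows one way to derive or decode them on the consumer side. It assumes that "heap memory configured by the JVM" corresponds to `MemHeapMaxM`, which the table does not state explicitly, and the function names are hypothetical.

```python
# Mapping taken from the Unit column of the haState row above.
HA_STATE_LABELS = {1: "Active", 0: "Standby"}


def mem_heap_used_rate(mem_heap_used_m, mem_heap_max_m):
    """Return used heap as a percentage of maximum heap.

    Assumption: the denominator is MemHeapMaxM. Returns 0.0 when the
    maximum is zero or negative (for example, when it is undefined).
    """
    if mem_heap_max_m <= 0:
        return 0.0
    return 100.0 * mem_heap_used_m / mem_heap_max_m


def ha_state_label(value):
    """Translate the numeric haState value into a readable label."""
    return HA_STATE_LABELS.get(int(value), "Unknown")
```

With 512 MB used out of a 2,048 MB maximum, `mem_heap_used_rate(512, 2048)` yields 25.0.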
Title | Metric | Unit | Description |
JVM threads | ThreadsNew | - | Number of threads in NEW status |
| ThreadsRunnable | - | Number of threads in RUNNABLE status |
| ThreadsBlocked | - | Number of threads in BLOCKED status |
| ThreadsWaiting | - | Number of threads in WAITING status |
| ThreadsTimedWaiting | - | Number of threads in TIMED_WAITING status |
| ThreadsTerminated | - | Number of threads in TERMINATED status |
JVM logs | LogFatal | - | Number of FATAL-level logs |
| LogError | - | Number of ERROR-level logs |
| LogWarn | - | Number of WARN-level logs |
| LogInfo | - | Number of INFO-level logs |
JVM memory | MemNonHeapUsedM | MB | Non-heap memory size used by process |
| MemNonHeapCommittedM | MB | Non-heap memory size committed to process |
| MemHeapUsedM | MB | Heap memory size used by process |
| MemHeapCommittedM | MB | Heap memory size committed to process |
| MemHeapMaxM | MB | Maximum heap memory size available to process |
| MemMaxM | MB | Maximum memory size available to process |
Heap memory utilization | MemHeapUsedRate | % | Percentage of the heap memory configured for the JVM that is currently in use |
GC count | YGC | - | Young GC count |
| FGC | - | Full GC count |
GC time | FGCT | s | Full GC time |
| GCT | s | Garbage collection time |
| YGCT | s | Young GC time |
Memory zone proportion | S0 | % | Percentage of used Survivor 0 memory |
| E | % | Percentage of used Eden memory |
| CCS | % | Percentage of used compressed class space memory |
| S1 | % | Percentage of used Survivor 1 memory |
| O | % | Percentage of used Old memory |
| M | % | Percentage of used Metaspace memory |
CPU utilization | ProcessCpuLoad | % | CPU utilization |
Cumulative CPU usage time | ProcessCpuTime | ms | Cumulative CPU usage time |
File descriptors | MaxFileDescriptorCount | - | Maximum number of file descriptors |
| OpenFileDescriptorCount | - | Number of opened file descriptors |
Process execution duration | Uptime | s | Process execution duration |
Worker threads | DaemonThreadCount | - | Number of daemon threads in the process |
| ThreadCount | - | Number of threads in the process |
Current active connections | numActiveConnections | Count | Current active connections |
Total count of exceptions captured by the Shuffle service | numCaughtExceptions | Count | Total count of exceptions captured by the Shuffle service |
Direct memory used by the Shuffle service | usedDirectMemory | Bytes | Direct memory used by the Shuffle service |
Delay in fetching merged block metadata | fetchMergedBlocksMetaLatencyMillis_mean | ms | Average delay in fetching merged block metadata |
Final stage delay for merging Shuffle data | finalizeShuffleMergeLatencyMillis_mean | ms | Average final stage delay for merging Shuffle data |
Heap memory used by the Shuffle service | usedHeapMemory | Bytes | Heap memory used by the Shuffle service |
Delay in opening data blocks | openBlockRequestLatencyMillis_mean | ms | Average delay in opening data blocks |
Current number of registered client connections | numRegisteredConnections | Count | Current number of registered client connections |
Number of Executors registered to the Shuffle service | registeredExecutorsSize | Count | Number of Executors registered to the Shuffle service |
Executor registration request latency | registerExecutorRequestLatencyMillis_mean | ms | Average Executor registration request latency |
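The GC and memory-zone metric names in the table above (`S0`, `S1`, `E`, `O`, `M`, `CCS`, `YGC`, `YGCT`, `FGC`, `FGCT`, `GCT`) match the column headers printed by the JDK's `jstat -gcutil` tool. Assuming the values are collected that way, which the table does not confirm, a minimal sketch for turning one `jstat -gcutil` sample into a metric dictionary could look like this. The function name and the sample numbers are illustrative only.

```python
def parse_gcutil(output):
    """Pair the header row of `jstat -gcutil` output with its last data row.

    Values printed as "-" (jstat's placeholder for an unavailable column)
    are mapped to None; everything else is converted to float.
    """
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    values = lines[-1].split()
    return {
        name: (None if raw == "-" else float(raw))
        for name, raw in zip(header, values)
    }


# Illustrative sample, not real measurements.
SAMPLE = """
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  96.88  12.50  45.10  97.20  94.30     12    0.150     2    0.300    0.450
"""

metrics = parse_gcutil(SAMPLE)
```

Here `metrics["YGC"]` holds the Young GC count and `metrics["GCT"]` the total GC time in seconds, in line with the units listed in the table.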
Title | Metric | Unit | Description |
JVM GC count | GcCount | count | JVM GC count |
JVM GC time | GcTimeMillis | ms | JVM GC time |
JVM memory | MemNonHeapUsedM | MB | Non-heap memory size used by the process |
| MemNonHeapCommittedM | MB | Non-heap memory size committed to the process |
| MemNonHeapMaxM | MB | Maximum non-heap memory size available to the process |
| MemHeapUsedM | MB | Heap memory size used by the process |
| MemHeapCommittedM | MB | Heap memory size committed to the process |
| MemHeapMaxM | MB | Maximum heap memory size available to the process |
Get domain operations | Ops | count | Number of get-domain operations |
Batch get domains operations | Ops | count | Number of batch get-domains operations |
Batch get domains average time | Time | ms | Average time of batch get-domains operations |
Get domain average time | Time | ms | Average time of get-domain operations |
Batch get entities operations | Ops | count | Number of batch get-entities operations |
Batch get entities average time | Time | ms | Average time of batch get-entities operations |
Get entity operations | Ops | count | Number of get-entity operations |
Get entity average time | Time | ms | Average time of get-entity operations |
Batch get events operations | Ops | count | Number of batch get-events operations |
Batch get events average time | Time | ms | Average time of batch get-events operations |
Batch update entities operations | Ops | count | Number of batch update-entities operations |
Batch update entities average time | Time | ms | Average time of batch update-entities operations |
Update domain operations | Ops | count | Number of update-domain operations |
Update domain average time | Time | ms | Average time of update-domain operations |