Title | Metric | Unit | Description |
Cluster regions in RIT status | ritCount | - | Number of regions in transition |
| ritCountOverThreshold | - | Number of regions that have been in transition for more than the threshold time |
Cluster RIT time | ritOldestAge | ms | Age of the region that has been in transition the longest |
Average number of regions per RegionServer | averageLoad | - | Average number of regions per RegionServer |
Cluster RegionServers | numRegionServers | - | Number of live RegionServers |
| numDeadRegionServers | - | Number of dead RegionServers |
Data read/written from/to HMaster | receivedBytes | bytes/s | Amount of data received by cluster |
| sentBytes | bytes/s | Amount of data sent by cluster |
Total cluster API requests | clusterRequests | count/s | Total number of cluster requests |
Cluster assignment manager operation | Assign_num_ops | - | Number of region assignments |
| BulkAssign_num_ops | - | Number of bulk region assignments |
Cluster load balancing operations | BalancerCluster_num_ops | - | Number of cluster load balancing operations |
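The overview metrics above are typically read together when judging cluster health. The following is a minimal sketch, assuming a metrics snapshot is already available as a dictionary keyed by the metric names in the table; the function name `cluster_warnings` and the `max_rit_age_ms` threshold are hypothetical and not part of any monitoring API.

```python
def cluster_warnings(metrics, max_rit_age_ms=60_000):
    """Return warnings for one snapshot of the cluster overview metrics.

    `metrics` maps metric names from the table above to numeric values.
    `max_rit_age_ms` is a hypothetical alert threshold, not a documented default.
    """
    warnings = []
    dead = metrics.get("numDeadRegionServers", 0)
    if dead > 0:
        warnings.append(f"{dead} dead RegionServer(s)")
    stuck = metrics.get("ritCountOverThreshold", 0)
    if stuck > 0:
        warnings.append(f"{stuck} region(s) in transition longer than the threshold")
    oldest = metrics.get("ritOldestAge", 0)
    if oldest > max_rit_age_ms:
        warnings.append(f"oldest region in transition is {oldest} ms old")
    return warnings


# Illustrative values only; they are not taken from a real cluster.
snapshot = {"ritCount": 3, "ritCountOverThreshold": 1, "ritOldestAge": 120_000,
            "numRegionServers": 5, "numDeadRegionServers": 0}
for line in cluster_warnings(snapshot):
    print(line)
```

Any threshold used this way should be tuned to the cluster; the value here only illustrates the comparison.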
Title | Metric | Unit | Description |
GC count | YGC | - | Young GC count |
| FGC | - | Full GC count |
GC time | FGCT | s | Full GC time |
| GCT | s | Garbage collection time |
| YGCT | s | Young GC time |
Memory zone proportion | S0 | % | Percentage of used Survivor 0 memory |
| E | % | Percentage of used Eden memory |
| CCS | % | Percentage of used compressed class space memory |
| S1 | % | Percentage of used Survivor 1 memory |
| O | % | Percentage of used Old memory |
| M | % | Percentage of used Metaspace memory |
JVM logs | LogFatal | - | Number of Fatal logs |
| LogError | - | Number of Error logs |
| LogWarn | - | Number of Warn logs |
| LogInfo | - | Number of Info logs |
JVM memory | MemNonHeapUsedM | MB | Non-heap memory size used by process |
| MemNonHeapCommittedM | MB | Non-heap memory size committed to process |
| MemHeapUsedM | MB | Heap memory size used by process |
| MemHeapCommittedM | MB | Heap memory size committed to process |
| MemHeapMaxM | MB | Maximum heap memory size available to process |
| MemMaxM | MB | Maximum memory size available to process |
Heap memory utilization | MemHeapUsedRate | % | Percentage of the JVM's configured heap memory that is currently in use |
JVM threads | ThreadsNew | - | Number of threads in NEW status |
| ThreadsRunnable | - | Number of threads in RUNNABLE status |
| ThreadsBlocked | - | Number of threads in BLOCKED status |
| ThreadsWaiting | - | Number of threads in WAITING status |
| ThreadsTimedWaiting | - | Number of threads in TIMED_WAITING status |
| ThreadsTerminated | - | Number of threads currently in TERMINATED status |
RPC connections | numOpenConnections | - | Number of RPC connections |
RPC exceptions | FailedSanityCheckException | - | Number of FailedSanityCheckException exceptions |
| NotServingRegionException | - | Number of NotServingRegionException exceptions |
| OutOfOrderScannerNextException | - | Number of OutOfOrderScannerNextException exceptions |
| RegionMovedException | - | Number of RegionMovedException exceptions |
| RegionTooBusyException | - | Number of RegionTooBusyException exceptions |
| UnknownScannerException | - | Number of UnknownScannerException exceptions |
RPC queue requests | numCallsInPriorityQueue | - | Number of RPC requests in the priority queue |
| numCallsInReplicationQueue | - | Number of RPC requests in the replication queue |
Process start time | masterActiveTime | s | Master active time |
| masterStartTime | s | Master process start time |
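The heap memory utilization row relates used heap to configured heap. A minimal sketch of that ratio follows, assuming the configured heap corresponds to `MemHeapMaxM` and the used heap to `MemHeapUsedM`; this mapping is an assumption, and the helper name `heap_used_rate` is hypothetical.

```python
def heap_used_rate(mem_heap_used_m: float, mem_heap_max_m: float) -> float:
    """Heap utilization in percent.

    Assumes MemHeapUsedRate = MemHeapUsedM / MemHeapMaxM * 100; the table does
    not state which metric supplies the denominator.
    """
    if mem_heap_max_m <= 0:
        raise ValueError("maximum heap size must be positive")
    return 100.0 * mem_heap_used_m / mem_heap_max_m


# Illustrative values: 512 MB used out of a 2048 MB maximum heap.
print(heap_used_rate(512, 2048))
```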
Title | Metric | Unit | Description |
GC count | YGC | - | Young GC count |
| FGC | - | Full GC count |
GC time | FGCT | s | Full GC time |
| GCT | s | Garbage collection time |
| YGCT | s | Young GC time |
Memory zone proportion | S0 | % | Percentage of used Survivor 0 memory |
| E | % | Percentage of used Eden memory |
| CCS | % | Percentage of used compressed class space memory |
| S1 | % | Percentage of used Survivor 1 memory |
| O | % | Percentage of used Old memory |
| M | % | Percentage of used Metaspace memory |
JVM logs | LogFatal | - | Number of Fatal logs |
| LogError | - | Number of Error logs |
| LogWarn | - | Number of Warn logs |
| LogInfo | - | Number of Info logs |
JVM memory | MemNonHeapUsedM | MB | Non-heap memory size used by process |
| MemNonHeapCommittedM | MB | Non-heap memory size committed to process |
| MemHeapUsedM | MB | Heap memory size used by process |
| MemHeapCommittedM | MB | Heap memory size committed to process |
| MemHeapMaxM | MB | Maximum heap memory size available to process |
| MemMaxM | MB | Maximum memory size available to process |
Heap memory utilization | MemHeapUsedRate | % | Percentage of the JVM's configured heap memory that is currently in use |
JVM threads | ThreadsNew | - | Number of threads in NEW status |
| ThreadsRunnable | - | Number of threads in RUNNABLE status |
| ThreadsBlocked | - | Number of threads in BLOCKED status |
| ThreadsWaiting | - | Number of threads in WAITING status |
| ThreadsTimedWaiting | - | Number of threads in TIMED_WAITING status |
| ThreadsTerminated | - | Number of threads currently in TERMINATED status |
Regions | regionCount | - | Number of regions |
Region localization | percentFilesLocal | % | Percentage of HFiles on the local HDFS data node in the region |
Region replica localization | percentFilesLocalSecondaryRegions | % | Percentage of HFiles on the local HDFS data node in the region replica |
RPC authentications | authenticationFailures | - | Number of RPC authentication failures |
| authenticationSuccesses | - | Number of RPC authentication successes |
RPC connections | numOpenConnections | - | Number of RPC connections |
RPC exceptions | FailedSanityCheckException | - | Number of FailedSanityCheckException exceptions |
| NotServingRegionException | - | Number of NotServingRegionException exceptions |
| OutOfOrderScannerNextException | - | Number of OutOfOrderScannerNextException exceptions |
| RegionMovedException | - | Number of RegionMovedException exceptions |
| RegionTooBusyException | - | Number of RegionTooBusyException exceptions |
| UnknownScannerException | - | Number of UnknownScannerException exceptions |
RPC handlers | numActiveHandler | - | Number of active RPC handlers |
| numActiveWriteHandler | - | Number of active write RPC handlers |
| numActiveReadHandler | - | Number of active read RPC handlers |
| numActiveScanHandler | - | Number of active scan RPC handlers |
RPC queue requests | numCallsInPriorityQueue | - | Number of requests in the priority queue |
| numCallsInReplicationQueue | - | Number of RPC requests in the replication queue |
| numCallsInGeneralQueue | - | Number of requests in the general queue |
| numCallsInWriteQueue | - | Number of RPC calls in the write call queue |
| numCallsInReadQueue | - | Number of RPC calls in the read call queue |
| numCallsInScanQueue | - | Number of RPC calls in the scan call queue |
WAL files | hlogFileCount | - | Number of WAL files |
WAL file size | hlogFileSize | Byte | WAL file size |
MemStore size | memStoreSize | MB | MemStore size |
Stores | storeCount | - | Number of stores |
StoreFiles | storeFileCount | - | Number of StoreFiles |
StoreFile size | storeFileSize | MB | StoreFile size |
Disk write rate | flushedCellsSize | bytes/s | Disk write rate |
Average latency | Append_mean | ms | Average Append latency |
| Replay_mean | ms | Average Replay latency |
| Get_mean | ms | Average GET latency |
| updatesBlockedTime | ms | Number of milliseconds updates have been blocked so the memstore can be flushed |
RegionServer disk writes | FlushTime_num_ops | - | Number of MemStore flushes |
Requests in operation queue | splitQueueLength | - | Length of the split queue |
| compactionQueueLength | - | Length of the compaction queue |
| flushQueueLength | - | Length of the region flush queue |
Replay operations | Replay_num_ops | - | Number of Replay operations |
Slow operations | slowAppendCount | - | Number of Append requests that took over 1s to complete |
| slowDeleteCount | - | Number of Delete requests that took over 1s to complete |
| slowGetCount | - | Number of Get requests that took over 1s to complete |
| slowIncrementCount | - | Number of Increment requests that took over 1s to complete |
| slowPutCount | - | Number of Put requests that took over 1s to complete |
Split requests | splitRequestCount | - | Number of split requests |
| splitSuccessCount | - | Number of successfully executed splits |
Cache blocks | blockCacheCount | - | Number of blocks in the block cache |
| blockCacheHitCount | - | Number of block cache hits |
| blockCacheMissCount | - | Number of block cache misses |
Cache read hit rate | blockCacheExpressHitPercent | % | Cache read hit rate |
Memory size used by the block cache | blockCacheSize | Byte | Memory size used by the block cache |
Index size | staticBloomSize | Byte | Uncompressed size of static bloom filters |
| staticIndexSize | Byte | Uncompressed size of static indexes |
| storeFileIndexSize | Byte | Size of indexes in StoreFiles on disk |
Received bytes | receivedBytes | bytes/s | Received bytes |
| sentBytes | bytes/s | Sent bytes |
Read and write requests | Total | count/s | Total number of requests. When there are scan requests, this value will be smaller than the sum of read and write requests |
| Read | count/s | Number of read requests |
| Write | count/s | Number of write requests |
| Append_num_ops | count/s | Number of Append requests |
| Mutate_num_ops | count/s | Number of Mutate requests |
| Delete_num_ops | count/s | Number of Delete requests |
| Increment_num_ops | count/s | Number of Increment requests |
| Get_num_ops | count/s | Number of Get requests |
| Put_num_ops | count/s | Number of Put requests |
| ScanTime_num_ops | count/s | Scan requests (time) |
| ScanSize_num_ops | count/s | Scan requests (size) |
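Mutations | mutationsWithoutWALCount | - | Number of mutations sent by clients with write-ahead logging turned off |
Mutation size | mutationsWithoutWALSize | Byte | Size of data sent by clients with write-ahead logging turned off |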
Process start time | regionServerStartTime | s | Process start time |
Log sync | source.sizeOfLogQueue | - | Number of WAL files pending in the replication queue |
Sync duration | source.ageOfLastShippedOp | ms | Latency of the last successfully replicated WAL operation |
Requests | ReadRequestCount | count/s | Read requests/s |
| WriteRequestCount | count/s | Write requests/s |
Requests | Read | count/s | Read requests/s |
| Write | count/s | Write requests/s |
Store size | memstoreSize | Byte | MemStore size |
| storeFileSize | Byte | StoreFile size |
Table-level request latency | getTime_99th_percentile | ms | 99th percentile of Get request processing latency |
| scanTime_99th_percentile | ms | 99th percentile of Scan request processing latency |
| putTime_99th_percentile | ms | 99th percentile of Put request processing latency |
| incrementTime_99th_percentile | ms | 99th percentile of Increment request processing latency |
| appendTime_99th_percentile | ms | 99th percentile of Append request processing latency |
| deleteTime_99th_percentile | ms | 99th percentile of Delete request processing latency |
Request processing latency | 99th_percentile | ms | 99th percentile of request processing latency |
| 99.9th_percentile | ms | 99.9th percentile of request processing latency |
Request queueing latency | 99th_percentile | ms | 99th percentile of request queueing latency |
| 99.9th_percentile | ms | 99.9th percentile of request queueing latency |
Scan size | max | bytes | Maximum scan size |
| mean | bytes | Average scan size |
| min | bytes | Minimum scan size |
Scan time | max | s | Maximum scan time |
| mean | s | Average scan time |
| min | s | Minimum scan time |
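Bulkload latency | 99th_percentile | ms | 99th percentile of Bulkload latency |
| 999th_percentile | ms | 99.9th percentile of Bulkload latency |
Append latency | 99th_percentile | ms | 99th percentile of Append latency |
| 999th_percentile | ms | 99.9th percentile of Append latency |
Delete latency | 99th_percentile | ms | 99th percentile of Delete latency |
| 999th_percentile | ms | 99.9th percentile of Delete latency |
MultiGet latency | 99th_percentile | ms | 99th percentile of MultiGet latency |
| 999th_percentile | ms | 99.9th percentile of MultiGet latency |
Get latency | 99th_percentile | ms | 99th percentile of Get latency |
| 999th_percentile | ms | 99.9th percentile of Get latency |
PutBatch latency | 99th_percentile | ms | 99th percentile of PutBatch latency |
| 999th_percentile | ms | 99.9th percentile of PutBatch latency |
Put latency | 99th_percentile | ms | 99th percentile of Put latency |
| 999th_percentile | ms | 99.9th percentile of Put latency |
Increment latency | 99th_percentile | ms | 99th percentile of Increment latency |
| 999th_percentile | ms | 99.9th percentile of Increment latency |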
Compacted count rate | MinorCompactedCells | per second | Average number of minor compacted cells per second |
| MajorCompactedCells | per second | Average number of major compacted cells per second |
Compacted size rate | MinorCompactedCells | bytes/s | Average size of minor compacted cells per second |
| MajorCompactedCells | bytes/s | Average size of major compacted cells per second |
Region localization | percentFilesLocal | % | Percentage of HFiles on the local HDFS data node in the region |
Average latency | updatesBlockedTime | ms | Number of milliseconds updates have been blocked so the memStore can be flushed |
pauseThresholdExceeded | info | count | INFO-level pause alarm count |
| warn | count | WARN-level pause alarm count |
Number of non-GC pause operations | ops | count | Number of non-GC pause operations |
Maximum duration of non-GC pauses | max | ms | Maximum duration of non-GC pauses |
Number of GC pause operations | ops | count | Number of GC pause operations |
Maximum duration of GC pauses | max | ms | Maximum duration of GC pauses |
L1 cache hits per second | l1HitCount | count/s | L1 cache hits per second |
L1 cache misses per second | l1MissCount | count/s | L1 cache misses per second |
L1 cache hit rate | l1HitRatio | % | L1 cache hit rate |
L2 cache hits per second | l2HitCount | count/s | L2 cache hits per second |
L2 cache misses per second | l2MissCount | count/s | L2 cache misses per second |
L2 cache hit rate | l2HitRatio | % | L2 cache hit rate |
Data synchronization latency time | ageOfLastShippedOp | ms | Latency of the last successfully replicated WAL log operation from the source cluster |
| Sink_ageOfLastAppliedOp_max | ms | Latency of the last successfully applied replication operation from the source cluster |
Number of synchronized WAL files | source_sizeOfLogQueue | count | Number of pending WAL files in the replication queue of the source cluster |
| source_completedLogs | count | Number of WAL files successfully acknowledged and sent to associated nodes |
| source_uncleanlyClosedLogs | count | Number of WAL files considered completed by the replication system when facing improperly closed files |
| source_ignoredUncleanlyClosedLogContentsInBytes | bytes | Number of partially serialized entry bytes remaining at the end of skipped files when WAL files are not properly closed, as determined by the replication system |
| source_restartedLogReading | count | Number of times the replication system detected properly closed WAL files that could not be correctly parsed |
| source_closedLogsWithUnknownFileLength | count | Number of WAL files where the replication system reached the end of the file but could not determine the file length |
Number of synchronized operations | source_shippedOps | count | Number of transmitted change operations |
| source_logEditsRead | count | Number of change operations read by the source cluster from WAL files |
Number of data synchronization recovery queues | source_completedRecoverQueues | count | Number of recovery queues that have completed synchronization after the source cluster RegionServer crash |
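The L1 and L2 cache rows above list hit counts, miss counts, and a hit rate. A minimal sketch of how such a rate can be derived from the two counters follows, assuming the rate is hits divided by total lookups over the same sampling interval; that formula and the helper name `hit_ratio_percent` are assumptions rather than a documented definition.

```python
def hit_ratio_percent(hit_count: float, miss_count: float) -> float:
    """Cache hit rate in percent, computed as hits / (hits + misses) * 100.

    Returns 0.0 when there were no lookups in the interval, which is a
    convention chosen for this sketch.
    """
    total = hit_count + miss_count
    if total == 0:
        return 0.0
    return 100.0 * hit_count / total


# Illustrative values: 90 hits/s and 10 misses/s (for example l1HitCount and l1MissCount).
print(hit_ratio_percent(90, 10))
```

The same helper applies to the L2 counters (`l2HitCount`, `l2MissCount`) under the same assumption.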
Title | Metric | Unit | Description |
GC count | YGC | count | Young GC count |
| FGC | count | Full GC count |
GC time | FGCT | s | Full GC time |
| GCT | s | Garbage collection time |
| YGCT | s | Young GC time |
Memory zone proportion | S0 | % | Percentage of used Survivor 0 memory |
| E | % | Percentage of used Eden memory |
| CCS | % | Percentage of used compressed class space memory |
| S1 | % | Percentage of used Survivor 1 memory |
| O | % | Percentage of used Old memory |
| M | % | Percentage of used Metaspace memory |
Number of JVM threads | ThreadsNew | - | Number of threads in NEW status |
| ThreadsRunnable | - | Number of threads in RUNNABLE status |
| ThreadsBlocked | - | Number of threads in BLOCKED status |
| ThreadsWaiting | - | Number of threads in WAITING status |
| ThreadsTimedWaiting | - | Number of threads in TIMED_WAITING status |
| ThreadsTerminated | - | Number of threads currently in TERMINATED status |
Number of JVM logs | LogFatal | count | Number of FATAL-level logs |
| LogError | count | Number of ERROR-level logs |
| LogWarn | count | Number of WARN-level logs |
| LogInfo | count | Number of INFO-level logs |
JVM memory | MemNonHeapUsedM | MB | Non-heap memory size used by the process |
| MemNonHeapCommittedM | MB | Non-heap memory size committed to the process |
| MemHeapUsedM | MB | Heap memory size used by the process |
| MemHeapCommittedM | MB | Heap memory size committed to the process |
| MemHeapMaxM | MB | Maximum heap memory size available to the process |
| MemMaxM | MB | Maximum memory size available to the process |
Heap memory utilization | MemHeapUsedRate | % | Percentage of used heap memory |
Average waiting duration of the Thrift request queue | mean | ms | Average waiting duration of the Thrift request queue |
Waiting length for the Thrift request | len | count | Waiting length for the Thrift request |
Thrift cumulative request volume | ops | count | Thrift cumulative request volume |