tencent cloud


HDFS Monitoring Metrics

Last updated: 2022-07-04 11:23:37

    HDFS - Overview

    Title Metric Unit Description
    Cluster storage capacity CapacityTotal GB Total cluster storage capacity
    CapacityUsed GB Used cluster storage capacity
    CapacityRemaining GB Remaining cluster storage capacity
    CapacityUsedNonDFS GB Cluster capacity used by non-HDFS data
    Cluster load TotalLoad - Number of current connections
    Total files in cluster FilesTotal - Total number of files
    Blocks BlocksTotal - Total number of blocks
    PendingReplicationBlocks - Number of blocks pending replication
    UnderReplicatedBlocks - Number of blocks with insufficient replicas
    CorruptBlocks - Number of corrupted blocks
    ScheduledReplicationBlocks - Number of blocks scheduled for replication
    PendingDeletionBlocks - Number of blocks waiting to be deleted
    ExcessBlocks - Number of excess blocks
    PostponedMisreplicatedBlocks - Number of misreplicated blocks whose processing has been postponed
    Block capacity BlockCapacity - Block capacity
    Cluster DataNodes NumLiveDataNodes - Number of live DataNodes
    NumDeadDataNodes - Number of DataNodes marked as dead
    NumDecomLiveDataNodes - Number of decommissioned DataNodes that are live
    NumDecomDeadDataNodes - Number of decommissioned DataNodes that are dead
    NumDecommissioningDataNodes - Number of DataNodes being decommissioned
    NumStaleDataNodes - Number of DataNodes marked as stale
    HDFS storage space utilization CapacityUsedRate - HDFS cluster storage space utilization
    Snapshots Snapshots - Number of snapshots
    Disk failure VolumeFailuresTotal - Total number of volume failures across all DataNodes
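The Overview values are NameNode-wide counters. In open-source Hadoop the same attributes can also be read from the NameNode JMX servlet, for example `http://<namenode-host>:<http-port>/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem`, which returns a JSON object with a top-level `beans` list and reports capacity in bytes. The sketch below assumes that JSON shape, converts capacity to GB, and computes utilization as `CapacityUsed / CapacityTotal`. That formula, the choice of alert counters, and the function name `summarize_fsnamesystem` are illustrative assumptions, not a description of how Tencent Cloud computes `CapacityUsedRate`.

```python
import json

# Assumption: capacity attributes in the FSNamesystem JMX bean are reported in bytes.
BYTES_PER_GB = 1024 ** 3


def summarize_fsnamesystem(jmx_body: str) -> dict:
    """Summarize one FSNamesystem JMX response (hypothetical helper)."""
    # The JMX servlet wraps the requested MBean in a top-level "beans" list.
    bean = json.loads(jmx_body)["beans"][0]
    total = bean.get("CapacityTotal", 0)
    used = bean.get("CapacityUsed", 0)
    summary = {
        "CapacityTotalGB": total / BYTES_PER_GB,
        "CapacityUsedGB": used / BYTES_PER_GB,
        "CapacityRemainingGB": bean.get("CapacityRemaining", 0) / BYTES_PER_GB,
        # Assumed definition: HDFS-used bytes as a percentage of total capacity.
        "CapacityUsedRate": 100.0 * used / total if total else 0.0,
    }
    # Counters from the Overview table that usually deserve attention when non-zero.
    watched = ("CorruptBlocks", "UnderReplicatedBlocks", "NumDeadDataNodes")
    summary["alerts"] = [name for name in watched if bean.get(name, 0) > 0]
    return summary
```

Missing attributes default to 0, so the helper degrades gracefully if a Hadoop version exposes a slightly different attribute set.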

    HDFS - NameNode

    Title Metric Unit Description
    Data traffic ReceivedBytes Bytes/s Data receiving rate
    SentBytes Bytes/s Data sending rate
    QPS RpcQueueTimeNumOps count/s RPC call rate
    Request processing latency RpcQueueTimeAvgTime ms Average time RPC calls spend waiting in the queue
    RpcProcessingTimeAvgTime ms Average RPC request processing time
    Authentication and authorization RpcAuthenticationFailures - Number of RPC authentication failures
    RpcAuthenticationSuccesses - Number of RPC authentication successes
    RpcAuthorizationFailures - Number of RPC authorization failures
    RpcAuthorizationSuccesses - Number of RPC authorization successes
    Current connections NumOpenConnections - Number of current connections
    Length of RPC processing queue CallQueueLength - Length of current RPC processing queue
    JVM memory MemNonHeapUsedM MB Size of non-heap memory currently used by the JVM
    MemNonHeapCommittedM MB Committed size of JVM non-heap memory
    MemHeapUsedM MB Size of heap memory currently used by the JVM
    MemHeapCommittedM MB Committed size of JVM heap memory
    MemHeapMaxM MB Maximum heap memory size configured for the JVM
    MemMaxM MB Maximum memory available to the JVM at runtime
    Block reporting latency BlockReportAvgTime ms Average latency of processing DataNode block reports
    JVM threads ThreadsNew - Number of threads in NEW status
    ThreadsRunnable - Number of threads in RUNNABLE status
    ThreadsBlocked - Number of threads in BLOCKED status
    ThreadsWaiting - Number of threads in WAITING status
    ThreadsTimedWaiting - Number of threads in TIMED_WAITING status
    ThreadsTerminated - Number of threads in TERMINATED status
    JVM logs LogFatal - Number of FATAL-level logs
    LogError - Number of ERROR-level logs
    LogWarn - Number of WARN-level logs
    LogInfo - Number of INFO-level logs
    GC count YGC - Young GC count
    FGC - Full GC count
    GC time FGCT s Full GC time
    GCT s Garbage collection time
    YGCT s Young GC time
    Memory zone proportion S0 % Percentage of used Survivor 0 memory
    S1 % Percentage of used Survivor 1 memory
    E % Percentage of used Eden memory
    O % Percentage of used Old memory
    M % Percentage of used Metaspace memory
    CCS % Percentage of used compressed class space memory
    Storages marked as content stale NumStaleStorages - Number of DataNode storages marked as content stale
    Pending block-related messages for later processing on the standby NameNode PendingDataNodeMessageCount - Number of block-related messages from DataNodes queued for later processing on the standby NameNode
    Missing blocks NumberOfMissingBlocks - Number of missing blocks
    NumberOfMissingBlocksWithReplicationFactorOne - Number of missing blocks with a replication factor of 1
    Snapshot operation AllowSnapshotOps count/s Number of AllowSnapshot operations executed per second
    DisallowSnapshotOps count/s Number of DisallowSnapshot operations executed per second
    CreateSnapshotOps count/s Number of CreateSnapshot operations executed per second
    DeleteSnapshotOps count/s Number of DeleteSnapshot operations executed per second
    ListSnapshottableDirOps count/s Number of ListSnapshottableDir operations executed per second
    SnapshotDiffReportOps count/s Number of SnapshotDiffReport operations executed per second
    RenameSnapshotOps count/s Number of RenameSnapshot operations executed per second
    File operation CreateFileOps count/s Number of CreateFile operations executed per second
    GetListingOps count/s Number of GetListing operations executed per second
    TotalFileOps count/s Total number of file operations executed per second
    DeleteFileOps count/s Number of DeleteFile operations executed per second
    FileInfoOps count/s Number of FileInfo operations executed per second
    GetAdditionalDatanodeOps count/s Number of GetAdditionalDatanode operations executed per second
    CreateSymlinkOps count/s Number of CreateSymlink operations executed per second
    GetLinkTargetOps count/s Number of GetLinkTarget operations executed per second
    FilesInGetListingOps count/s Number of files and directories returned by GetListing operations per second
    File statistics FilesDeleted count Number of files and directories deleted or renamed
    FilesCreated count Number of files and directories created
    FilesAppended count Number of appended files
    Transaction operation TransactionsNumOps count/s Number of journal transaction operations processed per second
    TransactionsBatchedInSync count/s Number of journal transaction operations batch processed per second
    Image operation GetEditNumOps count/s Number of edit log downloads per second
    GetImageNumOps count/s Number of fsimage downloads per second
    PutImageNumOps count/s Number of fsimage uploads per second
    Sync operation SyncsNumOps count/s Number of journal sync operations processed per second
    Block operation BlockReceivedAndDeletedOps count/s Number of BlockReceivedAndDeleted operations executed per second
    BlockOpsQueued count/s Number of queued DataNode block report and BlockReceivedAndDeleted operations
    Cache reporting CacheReportNumOps count/s Number of CacheReport operations processed per second
    Block reporting BlockReportNumOps count/s Number of DataNode block reporting operations processed per second
    Sync operation latency SyncsAvgTime ms Average latency of processing journal sync operations
    Cache reporting latency CacheReportAvgTime ms Average latency of cache reporting
    Image operation latency GetEditAvgTime ms Average latency of edit log downloads
    GetImageAvgTime ms Average latency of fsimage downloads
    PutImageAvgTime ms Average latency of fsimage uploads
    Transaction operation latency TransactionsAvgTime ms Average latency of processing journal transaction operations
    Start time StartTime ms Process start time
    Active/Standby status State - NameNode HA status (1: Active; 0: Standby)
    Threads PeakThreadCount - Peak number of threads
    ThreadCount - Number of threads
    DaemonThreadCount - Number of daemon threads
    Transactions since the last checkpoint SinceLastCheckpoint count Total number of transactions since the last checkpoint
    Checkpoint time LastCheckpoint time Time since the last checkpoint
    Lock wait queue length LockQueueLength count Number of threads waiting to acquire the FSNamesystem lock
    Average RPC time (1) CompleteAvgTime ms Average latency of Complete requests
    CreateAvgTime ms Average latency of Create requests
    RenameAvgTime ms Average latency of Rename requests
    AddBlockAvgTime ms Average latency of AddBlock requests
    GetListingAvgTime ms Average latency of GetListing requests
    GetFileInfoAvgTime ms Average latency of GetFileInfo requests
    SendHeartbeatAvgTime ms Average latency of SendHeartbeat requests
    Average RPC time (2) RegisterDatanodeAvgTime ms Average latency of RegisterDatanode requests
    BlockReportAvgTime ms Average latency of BlockReport requests
    DeleteAvgTime ms Average latency of Delete requests
    RenewLeaseAvgTime ms Average latency of RenewLease requests
    BlockReceivedAndDeletedAvgTime ms Average latency of BlockReceivedAndDeleted requests
    FsyncAvgTime ms Average latency of fsync requests
    VersionRequestAvgTime ms Average latency of VersionRequest requests
    Average RPC time (3) ListEncryptionZonesAvgTime ms Average latency of ListEncryptionZones requests
    SetPermissionAvgTime ms Average latency of SetPermission requests
    SetTimesAvgTime ms Average latency of SetTimes requests
    SetSafeModeAvgTime ms Average latency of SetSafeMode requests
    MkdirsAvgTime ms Average latency of Mkdirs requests
    GetServerDefaultsAvgTime ms Average latency of GetServerDefaults requests
    GetBlockLocationsAvgTime ms Average latency of GetBlockLocations requests
    RPC statistics (1) CompleteNumOps count/s Number of Complete calls per second
    CreateNumOps count/s Number of Create calls per second
    RenameNumOps count/s Number of Rename calls per second
    AddBlockNumOps count/s Number of AddBlock calls per second
    GetListingNumOps count/s Number of GetListing calls per second
    GetFileInfoNumOps count/s Number of GetFileInfo calls per second
    SendHeartbeatNumOps count/s Number of SendHeartbeat calls per second
    RPC statistics (2) RegisterDatanodeNumOps count/s Number of RegisterDatanode calls per second
    BlockReportNumOps count/s Number of BlockReport calls per second
    DeleteNumOps count/s Number of Delete calls per second
    RenewLeaseNumOps count/s Number of RenewLease calls per second
    BlockReceivedAndDeletedNumOps count/s Number of BlockReceivedAndDeleted calls per second
    FsyncNumOps count/s Number of fsync calls per second
    VersionRequestNumOps count/s Number of VersionRequest calls per second
    RPC statistics (3) ListEncryptionZonesNumOps count/s Number of ListEncryptionZones calls per second
    SetPermissionNumOps count/s Number of SetPermission calls per second
    SetTimesNumOps count/s Number of SetTimes calls per second
    SetSafeModeNumOps count/s Number of SetSafeMode calls per second
    MkdirsNumOps count/s Number of Mkdirs calls per second
    GetServerDefaultsNumOps count/s Number of GetServerDefaults calls per second
    GetBlockLocationsNumOps count/s Number of GetBlockLocations calls per second
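In Hadoop's metrics system, each `<Name>NumOps` attribute exposed over JMX is a counter that accumulates from process start, while the matching `<Name>AvgTime` is the average over the most recent sampling interval. A per-second figure such as `CreateNumOps` count/s can therefore be derived from two samples taken a known interval apart. The helper below is a minimal sketch of that calculation; the sample dictionaries and the function name `per_second_rates` are hypothetical, and skipping negative deltas is just one simple way to handle a counter reset after a restart.

```python
from typing import Mapping


def per_second_rates(prev: Mapping[str, float], curr: Mapping[str, float],
                     interval_s: float) -> dict:
    """Turn two samples of cumulative *NumOps counters into per-second rates."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = {}
    for name, value in curr.items():
        # Only cumulative counters are differentiated; *AvgTime values are
        # already per-interval averages and are left out.
        if not name.endswith("NumOps") or name not in prev:
            continue
        delta = value - prev[name]
        # A negative delta means the counter was reset (for example after a
        # NameNode restart), so that sample is skipped instead of reported.
        if delta >= 0:
            rates[name] = delta / interval_s
    return rates
```

For example, sampling `CreateNumOps` at 100 and again at 160 thirty seconds later gives 2.0 Create calls per second.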

    HDFS - DataNode

    Title Metric Unit Description
    Xceivers XceiverCount - Number of Xceivers
    Data read/write rate BytesWrittenMB Bytes/s DataNode byte write rate
    BytesReadMB Bytes/s DataNode byte read rate
    RemoteBytesReadMB Bytes/s Remote client byte read rate
    RemoteBytesWrittenMB Bytes/s Remote client byte write rate
    Client connections WritesFromRemoteClient - Remote client write QPS
    WritesFromLocalClient - Local client write QPS
    ReadsFromRemoteClient - Remote client read QPS
    ReadsFromLocalClient - Local client read QPS
    Block verification failure BlockVerificationFailures count/s Number of block verification failures
    Disk failure VolumeFailures count/s Number of disk failures
    Network error DatanodeNetworkErrors count/s Number of DataNode network errors
    Heartbeat latency HeartbeatsAvgTime ms Average heartbeat time
    Heartbeat QPS HeartbeatsNumOps count/s Heartbeat QPS
    Packet transfer latency SendDataPacketTransferNanosAvgTime ms Average time to send a data packet
    Block operation ReadBlockOpNumOps count/s Block read OPS from DataNode
    WriteBlockOpNumOps count/s Block write OPS to DataNode
    BlockChecksumOpNumOps count/s Block checksum OPS
    CopyBlockOpNumOps count/s Block copying OPS
    ReplaceBlockOpNumOps count/s Block replacement OPS
    BlockReportsNumOps count/s Block reporting OPS
    IncrementalBlockReportsNumOps count/s Incremental block reporting OPS
    CacheReportsNumOps count/s Cache reporting OPS
    PacketAckRoundTripTimeNanosNumOps count/s Number of ACK round trips processed per second
    Fsync operation FsyncNanosNumOps count/s Number of fsync operations processed per second
    Flush operation FlushNanosNumOps count/s Number of flush operations processed per second
    Block operation latency statistics ReadBlockOpAvgTime ms Average block read time
    WriteBlockOpAvgTime ms Average block write time
    BlockChecksumOpAvgTime ms Average block checksum time
    CopyBlockOpAvgTime ms Average block copy time
    ReplaceBlockOpAvgTime ms Average block replacement time
    BlockReportsAvgTime ms Average block reporting time
    IncrementalBlockReportsAvgTime ms Average time of incremental block reporting
    CacheReportsAvgTime ms Average time of cache reporting
    PacketAckRoundTripTimeNanosAvgTime ms Average time of ACK round trip processing
    Flush latency FlushNanosAvgTime ms Average flush time
    Fsync latency FsyncNanosAvgTime ms Average fsync time
    RamDisk Blocks RamDiskBlocksWrite blocks/s Number of blocks written to memory
    RamDiskBlocksWriteFallback blocks/s Number of blocks that could not be written to memory and fell back to disk
    RamDiskBlocksDeletedBeforeLazyPersisted blocks/s Number of blocks deleted before being lazily persisted to disk
    RamDiskBlocksReadHits blocks/s Number of blocks read from memory
    RamDiskBlocksEvicted blocks/s Number of blocks evicted from memory
    RamDiskBlocksEvictedWithoutRead blocks/s Number of blocks evicted from memory without ever being read
    RamDiskBlocksLazyPersisted blocks/s Number of blocks written to disk by the lazy writer
    RamDiskBytesLazyPersisted Bytes/s Number of bytes written to disk by the lazy writer
    RamDisk write speed RamDiskBytesWrite Bytes/s Number of bytes written to memory
    JVM memory MemNonHeapUsedM MB Size of non-heap memory currently used by the JVM
    MemNonHeapCommittedM MB Committed size of JVM non-heap memory
    MemHeapUsedM MB Size of heap memory currently used by the JVM
    MemHeapCommittedM MB Committed size of JVM heap memory
    MemHeapMaxM MB Maximum heap memory size configured for the JVM
    MemMaxM MB Maximum memory available to the JVM at runtime
    JVM threads ThreadsNew - Number of threads in NEW status
    ThreadsRunnable - Number of threads in RUNNABLE status
    ThreadsBlocked - Number of threads in BLOCKED status
    ThreadsWaiting - Number of threads in WAITING status
    ThreadsTimedWaiting - Number of threads in TIMED_WAITING status
    ThreadsTerminated - Number of threads in TERMINATED status
    JVM logs LogFatal - Number of FATAL-level logs
    LogError - Number of ERROR-level logs
    LogWarn - Number of WARN-level logs
    LogInfo - Number of INFO-level logs
    GC count YGC - Young GC count
    FGC - Full GC count
    GC time FGCT s Full GC time
    GCT s Garbage collection time
    YGCT s Young GC time
    Memory zone proportion S0 % Percentage of used Survivor 0 memory
    E % Percentage of used Eden memory
    CCS % Percentage of used compressed class space memory
    S1 % Percentage of used Survivor 1 memory
    O % Percentage of used Old memory
    M % Percentage of used Metaspace memory
    Data traffic ReceivedBytes Bytes/s Data receiving rate
    SentBytes Bytes/s Data sending rate
    QPS RpcQueueTimeNumOps count/s RPC call rate
    Request processing latency RpcQueueTimeAvgTime ms Average time RPC calls spend waiting in the queue
    RpcProcessingTimeAvgTime ms Average RPC request processing time
    Authentication and authorization RpcAuthenticationFailures count/s Number of RPC authentication failures
    RpcAuthenticationSuccesses count/s Number of RPC authentication successes
    RpcAuthorizationFailures count/s Number of RPC authorization failures
    RpcAuthorizationSuccesses count/s Number of RPC authorization successes
    Current connections NumOpenConnections - Number of current connections
    Length of RPC processing queue CallQueueLength - Length of current RPC processing queue
    CPU time CurrentThreadSystemTime ms System time
    CurrentThreadUserTime ms User time
    Start time StartTime s Process start time
    Threads PeakThreadCount - Peak number of threads
    DaemonThreadCount - Number of daemon threads
    Read/Write latency write ms Write time
    read ms Read time
    Packet transfer QPS DataPacketOps count/s Packet transfer QPS
    Blocks Named by data directory, such as `/data/qcloud/data/hdfs` - Number of blocks
    Used disk capacity Named by data directory, such as `/data/qcloud/data/hdfs` GB Used disk capacity
    Free disk capacity Named by data directory, such as `/data/qcloud/data/hdfs` GB Free disk capacity
    Reserved disk capacity Named by data directory, such as `/data/qcloud/data/hdfs` GB Reserved disk capacity
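The per-disk rows are reported once for each DataNode data directory. One quick way to spot a disk that is filling up faster than its neighbours is to compare used capacity against used plus free capacity for each directory. This sketch assumes that used + free approximates a directory's usable capacity (reserved space is left out); the input layout, the field names `used_gb` and `free_gb`, the function name `disk_usage_report`, and the 85% threshold are all hypothetical choices.

```python
def disk_usage_report(disks: dict, threshold_pct: float = 85.0) -> dict:
    """Compute used percentage per data directory and flag disks above a threshold.

    `disks` maps a data directory (such as "/data/qcloud/data/hdfs") to a dict
    with hypothetical "used_gb" and "free_gb" fields.
    """
    report = {}
    for path, d in disks.items():
        # Assumption: used + free is the usable capacity; reserved space is excluded.
        capacity = d["used_gb"] + d["free_gb"]
        pct = 100.0 * d["used_gb"] / capacity if capacity else 0.0
        report[path] = {"used_pct": round(pct, 1), "alert": pct >= threshold_pct}
    return report
```

Comparing `used_pct` across the directories of a single DataNode also makes uneven disk placement easy to see.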

    HDFS - JournalNode

    Title Metric Unit Description
    JVM memory MemNonHeapUsedM MB Size of non-heap memory currently used by the JVM
    MemNonHeapCommittedM MB Committed size of JVM non-heap memory
    MemHeapUsedM MB Size of heap memory currently used by the JVM
    MemHeapCommittedM MB Committed size of JVM heap memory
    MemHeapMaxM MB Maximum heap memory size configured for the JVM
    MemMaxM MB Maximum memory available to the JVM at runtime
    JVM threads ThreadsNew - Number of threads in NEW status
    ThreadsRunnable - Number of threads in RUNNABLE status
    ThreadsBlocked - Number of threads in BLOCKED status
    ThreadsWaiting - Number of threads in WAITING status
    ThreadsTimedWaiting - Number of threads in TIMED_WAITING status
    ThreadsTerminated - Number of threads in TERMINATED status
    JVM logs LogFatal - Number of FATAL-level logs
    LogError - Number of ERROR-level logs
    LogWarn - Number of WARN-level logs
    LogInfo - Number of INFO-level logs
    GC count YGC - Young GC count
    FGC - Full GC count
    GC time FGCT s Full GC time
    GCT s Garbage collection time
    YGCT s Young GC time
    Memory zone proportion S0 % Percentage of used Survivor 0 memory
    E % Percentage of used Eden memory
    CCS % Percentage of used compressed class space memory
    S1 % Percentage of used Survivor 1 memory
    O % Percentage of used Old memory
    M % Percentage of used Metaspace memory
    Data traffic ReceivedBytes Bytes/s Data receiving rate
    SentBytes Bytes/s Data sending rate
    Request processing latency RpcQueueTimeAvgTime ms Average time RPC calls spend waiting in the queue
    Authentication and authorization RpcAuthenticationFailures count/s Number of RPC authentication failures
    RpcAuthenticationSuccesses count/s Number of RPC authentication successes
    RpcAuthorizationFailures count/s Number of RPC authorization failures
    RpcAuthorizationSuccesses count/s Number of RPC authorization successes
    Current connections NumOpenConnections - Number of current connections
    Length of RPC processing queue CallQueueLength - Length of current RPC processing queue
    CPU time CurrentThreadSystemTime ms System time
    CurrentThreadUserTime ms User time
    Start time StartTime s Process start time
    Threads PeakThreadCount - Peak number of threads
    DaemonThreadCount - Number of daemon threads
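The JVM rows shared by every HDFS role can be combined into simple health checks. The sketch below flags a JournalNode (or any other role) whose heap usage is close to `MemHeapMaxM`, that has threads in BLOCKED state, or that has recorded FATAL or ERROR log entries. The 80% threshold, treating any BLOCKED thread as worth a look, and the function name `jvm_health` are illustrative assumptions; the input is a plain dict keyed by the metric names in the table above.

```python
def jvm_health(m: dict, heap_alert_pct: float = 80.0) -> list:
    """Return warning strings derived from JVM metrics keyed by their table names."""
    warnings = []
    heap_max = m.get("MemHeapMaxM", 0)
    # MemHeapMaxM can be undefined (reported as 0 or negative); skip the check then.
    if heap_max > 0:
        heap_pct = 100.0 * m.get("MemHeapUsedM", 0) / heap_max
        if heap_pct >= heap_alert_pct:
            warnings.append(f"heap usage {heap_pct:.1f}% of MemHeapMaxM")
    # Heuristic: BLOCKED threads often point to lock contention.
    if m.get("ThreadsBlocked", 0) > 0:
        warnings.append(f"{m['ThreadsBlocked']} threads BLOCKED")
    if m.get("LogFatal", 0) > 0 or m.get("LogError", 0) > 0:
        warnings.append("FATAL or ERROR log entries recorded")
    return warnings
```

A single BLOCKED sample is rarely significant on its own; evaluating the checks over several consecutive samples reduces noise.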

    HDFS - ZKFC

    Title Metric Unit Description
    GC count YGC - Young GC count
    FGC - Full GC count
    GC time FGCT s Full GC time
    GCT s Garbage collection time
    YGCT s Young GC time
    Memory zone proportion S0 % Percentage of used Survivor 0 memory
    E % Percentage of used Eden memory
    CCS % Percentage of used compressed class space memory
    S1 % Percentage of used Survivor 1 memory
    O % Percentage of used Old memory
    M % Percentage of used Metaspace memory
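The memory-area and GC metrics (S0, S1, E, O, M, CCS, YGC, YGCT, FGC, FGCT, GCT) use the same column names that `jstat -gcutil <pid>` prints, so they can presumably be cross-checked on the node that runs the process. The sketch below parses the header row and first data row of that output and derives the average pause per young and per full GC. Newer JDKs add columns such as CGC and CGCT and may print `-` for columns that do not apply; the parser maps those to `None`. The function names are hypothetical.

```python
def parse_gcutil(output: str) -> dict:
    """Parse the header row and first data row of `jstat -gcutil <pid>` output."""
    rows = [line.split() for line in output.strip().splitlines() if line.strip()]
    if len(rows) < 2:
        raise ValueError("expected a header row and at least one data row")
    header, values = rows[0], rows[1]
    # "-" marks a column that does not apply to the running collector.
    return {name: None if raw == "-" else float(raw) for name, raw in zip(header, values)}


def avg_pause_seconds(gc: dict) -> dict:
    """Average pause per collection: YGCT / YGC and FGCT / FGC (0.0 if no collections)."""
    ygc = gc.get("YGC") or 0
    fgc = gc.get("FGC") or 0
    return {
        "avg_young_pause_s": gc["YGCT"] / ygc if ygc else 0.0,
        "avg_full_pause_s": gc["FGCT"] / fgc if fgc else 0.0,
    }
```

A steadily rising `avg_full_pause_s` alongside a high O (old generation) percentage usually indicates that the heap is undersized for the workload.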