tencent cloud


Cloud Monitor Event

Last updated: 2022-06-13 11:49:14

    Overview

    Cloud Monitor events are collected from Tencent Cloud service modules and the underlying infrastructure services, and are then aggregated, analyzed, and condensed before being presented. The information comes from the system logs and monitoring items of each module, which helps ensure that the information passed to customers is accurate and useful.

    Cloud Monitor is now fully integrated with EventBridge. After you activate EventBridge, all Cloud Monitor events are automatically delivered to the Tencent Cloud service event bus.

    Cloud Monitor Event Format

    Using a "ping unreachable" event generated by CVM as an example, an event delivered to EventBridge has the following standard format:

    {
        "specversion": "1.0",
        "id": "13a3f42d-7258-4ada-da6d-023a333b4662",
        "source": "${ProductName}.cloud.tencent",
        "type": "cvm:ErrorEvent:ping_unreachable",
        "subject": "${resource ID}",
        "time": 1615430559146,
        "region": "ap-guangzhou",
        "resource": [
            "qcs::eb:ap-guangzhou:uid1250000000:eventbusid/eventruleid"
        ],
        "datacontenttype": "application/json;charset=utf-8",
        "tags": {
            "key1": "value1",
            "key2": "value2"
        },
        "status": "1",
        "data": {
            "appId": "1250000011",
            "instanceId": "ins-sjdksjk",
            "projectId": "11",
            "dimensions": {
                "ip": "127.0.0.1"
            },
            "additionalMsg": {
                "IP": "something unnormal"
            }
        }
    }
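    As a minimal sketch, the Python code below parses an event in this format with the standard `json` module. It converts the millisecond `time` field to a UTC timestamp and pulls out a few fields. The function name `summarize_event` and the fields it returns are illustrative choices, not part of the EventBridge API. Reading the product from the prefix of `type` assumes that every event type follows the `product:category:name` pattern of the example.

```python
import json
from datetime import datetime, timezone


def summarize_event(raw: str) -> dict:
    """Parse a Cloud Monitor event delivered as a JSON string and return a few key fields."""
    event = json.loads(raw)
    # Assumption: "type" follows the product:category:name pattern of the example,
    # e.g. "cvm:ErrorEvent:ping_unreachable".
    product = event["type"].split(":", 1)[0]
    # "time" is a Unix timestamp in milliseconds in the example event.
    occurred_at = datetime.fromtimestamp(event["time"] / 1000, tz=timezone.utc)
    data = event.get("data", {})
    return {
        "product": product,
        "type": event["type"],
        "region": event.get("region"),
        "instance_id": data.get("instanceId"),
        "occurred_at": occurred_at.isoformat(),
    }


# A trimmed copy of the example event above.
sample = """{
    "type": "cvm:ErrorEvent:ping_unreachable",
    "time": 1615430559146,
    "region": "ap-guangzhou",
    "data": {"instanceId": "ins-sjdksjk"}
}"""
print(summarize_event(sample))
```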
    

    Cloud Monitor Event Source

    Based on their information sources, causes, characteristics, and forms, Cloud Monitor events are divided into two categories:

    • Events generated by resource instances and products that customers purchase and use in Tencent Cloud (such as CVM instances). These events are triggered directly or indirectly by customers during use and belong to specific resource instances, which customers can control and manage. The resource instances affected by or associated with these events can be explicitly identified.
    • Events generated by the underlying platform infrastructure and services that support Tencent Cloud services. Examples include the Virtual Machine Manager (VMM) that supports CVM at the virtualization layer, and the underlying physical machines, networks, and storage modules. These events are generated or caused by Tencent Cloud infrastructure and services, not by customer behavior, and they belong to services. Customers cannot control these events; only Tencent Cloud can handle them. The services or product modules affected by or associated with these events can be identified, but the affected resource instances cannot always be determined.

    Event List

    The following table lists Cloud Monitor events generated by the underlying platform infrastructure and services.

    Event Type | Event | Cause | Impact
    Problem | CVM storage problem | CVM infrastructure storage module | The I/O performance of the CVM instance decreases, and data read/write exceptions occur
    Problem | CVM network connection problem | CVM infrastructure network | The CVM instance network slows down or is disconnected
    Problem | CVM running exception | CVM infrastructure | The CVM instance bears a high load or crashes, causing service unavailability

    The following tables list Cloud Monitor events generated by resource instances and products that customers purchase and use in Tencent Cloud (such as CVM instances).

    CVM

    Event | Event Name | Event Type | Dimension | Recoverable | Event Description | Solution and Suggestion
    Kernel failure | GuestCoreError | Exception | CVM instance | No | An OS kernel bug or driver issue causes a fatal error in the OS kernel | 1. Check whether any kernel driver modules other than those provided by the kernel are loaded into the system. Try not loading these modules and observe how the system runs. 2. Check the published bug reports for the kernel and OS, and try upgrading the kernel. 3. kdump is enabled for CVM by default. When a panic occurs, a system memory dump is generated in the /var/crash directory, which you can analyze with the crash tool
    OOM | GuestOom | Exception | CVM instance | No | System memory usage is overloaded | 1. Check whether the memory capacity configured for the current system meets business requirements. If more capacity is required, we recommend upgrading the CVM memory configuration. 2. Use system logs such as dmesg and /var/log/messages to find the processes killed during OOM, and check whether their memory usage is as expected. Use tools such as valgrind to check for memory leaks
    Ping failure | PingUnreachable | Exception | CVM instance | Yes | The CVM instance does not respond to ping | 1. Check whether the CVM instance is running normally. If an exception has occurred (for example, the system has crashed), force restart the instance in the console to restore it. 2. If the CVM instance is running normally, check its network configuration, including the internal network service, firewall, and security group configuration
    Read-only disk | DiskReadonly | Exception | CVM instance | Yes | Data cannot be written to the disk | 1. Check whether the disk is full. 2. On Linux, run the `df -i` command to check whether inodes are used up. 3. Check whether the file system is damaged
    Server restart | GuestReboot | Status change | CVM instance | No | The CVM instance restarts | This event is triggered when the CVM instance restarts. Check whether the status change is expected
    Packet loss caused by over-limit outbound internet bandwidth | PacketDroppedByQosWanOutBandwidth | Exception | CVM instance | Yes | The public network outbound bandwidth of the CVM instance exceeds the upper limit, causing packet loss. Packet loss caused by brief bandwidth spikes is not reflected in the bandwidth view, because the minimum granularity of bandwidth statistics is 10 seconds (total traffic in 10 seconds / 10 seconds). If the sustained bandwidth does not significantly exceed the limit, the event can be ignored | Increase the upper limit of the public network bandwidth. If the maximum purchasable limit has been reached, reduce the server's bandwidth consumption through load balancing or other means
    CVM NVMe device error | NvmeError | Exception | CVM instance | No | The CVM NVMe disk has failed | 1. Stop reads and writes to the disk and unmount the corresponding directory. 2. Submit a ticket and wait for technical staff to replace the disk. 3. After the disk is replaced, format the new disk before use
    The instance has been restarted (host system error) | GuestRestarted_HostFailure | Status change | CVM instance | No | The CVM host was abnormal. The fault has been handled and the instance has been restarted | Check whether the service has recovered. If it has, you can ignore the event
    Planned instance restart (maintenance of host system) | GuestScheduledToRes | Status change | CVM instance | No | The CVM host is abnormal and is being repaired | If your business has disaster recovery capability, perform a primary-secondary switch and authorize the maintenance promptly
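    For the read-only disk event, `df -i` reports inode usage per file system. If the inodes run out, new files cannot be created even when free space remains. The same check can be sketched in Python with `os.statvfs`, which is available on Linux and other Unix systems. The function name `inode_usage_percent` is hypothetical. The result covers only the file system that contains the given path, so repeat the check for each mount point you care about.

```python
import os


def inode_usage_percent(path: str = "/") -> float:
    """Return the percentage of inodes in use on the file system containing `path`.

    Uses the same counters df -i reports: total inodes (f_files) and free inodes (f_ffree).
    """
    st = os.statvfs(path)
    total = st.f_files
    free = st.f_ffree
    if total == 0:
        # Some file systems (for example, certain virtual ones) report no inode count.
        return 0.0
    return 100.0 * (total - free) / total


usage = inode_usage_percent("/")
print(f"Inode usage on /: {usage:.1f}%")
if usage >= 100.0:
    print("Inodes are exhausted; new files cannot be created even if disk space remains.")
```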

    Cloud Load Balancer

    Event | Event Name | Event Type | Dimension | Recoverable | Event Description | Solution and Suggestion
    Blocked public IP | VipBlockInfo | Exception | CLB instance | Yes | The public IP of a CLB instance under attack is blocked after the security system detects an exception | Submit a ticket to find out the specific cause of the block and how to unblock the IP
    Server port status has an exception | RsPortStatusChange | Exception | Real server port | Yes | A health check of the public network CLB instance finds an exception at the real server port | Check the service status of the real server port

    VPN Gateway

    Event | Event Name | Event Type | Dimension | Recoverable | Event Description | Solution and Suggestion
    Packet loss caused by over-limit outbound internet bandwidth | PacketDroppedByQosWanOutBandwidth | Exception | VPN Gateway instance | Yes | The public network outbound bandwidth of the VPN gateway instance exceeds the upper limit, causing packet loss. Packet loss caused by brief bandwidth spikes is not reflected in the bandwidth view, because the minimum granularity of bandwidth statistics is 10 seconds (total traffic in 10 seconds / 10 seconds). If the sustained bandwidth does not significantly exceed the limit, the event can be ignored | Increase the upper limit of the public network bandwidth
    Packet loss caused by over-limit connections | PacketDroppedByQosConnectionSession | Exception | VPN Gateway instance | Yes | The number of connections to the VPN gateway instance exceeds the limit, causing packet loss | Submit a ticket to contact us
    VPN tunnel disconnected | VpnconnDisconnected | Exception | VPN Gateway instance | Yes | The VPN tunnel is disconnected | Submit a ticket to contact us
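    The bandwidth events above note that packet loss from brief spikes may not appear in the bandwidth view. This is because bandwidth statistics are computed as total traffic over 10 seconds divided by 10 seconds. The worked example below illustrates the effect with made-up numbers: a hypothetical 10 Mbps cap, and per-second samples of the traffic an instance tries to send. A one-second 30 Mbps burst exceeds the cap, yet the 10-second average stays well below it.

```python
def window_average(samples_mbps: list[float], window: int = 10) -> list[float]:
    """Average per-second samples over consecutive fixed windows
    (total traffic in the window / window length). A partial trailing window is ignored."""
    return [
        sum(samples_mbps[i:i + window]) / window
        for i in range(0, len(samples_mbps) - window + 1, window)
    ]


LIMIT_MBPS = 10.0  # hypothetical bandwidth cap
# Nine seconds at 5 Mbps plus a one-second burst to 30 Mbps.
per_second = [5.0] * 9 + [30.0]

peak = max(per_second)
averaged = window_average(per_second)
print(f"peak={peak} Mbps, 10 s average={averaged[0]} Mbps")
print("burst exceeds limit:", peak > LIMIT_MBPS)
print("average exceeds limit:", averaged[0] > LIMIT_MBPS)
```

    The per-second peak (30 Mbps) triggers drops, while the 10-second value (7.5 Mbps) looks healthy. This is why the descriptions suggest ignoring the event unless the sustained bandwidth clearly exceeds the limit.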

    Peering connection

    Event | Event Name | Event Type | Dimension | Recoverable | Event Description | Solution and Suggestion
    Packet loss caused by over-limit inbound bandwidth | PacketDroppedByQosInBandwidth | Exception | Peering Connection instance | Yes | The public network inbound bandwidth of the peering connection instance exceeds the upper limit, causing packet loss. Packet loss caused by brief bandwidth spikes is not reflected in the bandwidth view, because the minimum granularity of bandwidth statistics is 10 seconds (total traffic in 10 seconds / 10 seconds). If the sustained bandwidth does not significantly exceed the limit, the event can be ignored | Increase the upper limit of the public network bandwidth
    Packet loss caused by over-limit outbound bandwidth | PacketDroppedByQosOutBandwidth | Exception | Peering Connection instance | Yes | The public network outbound bandwidth of the peering connection instance exceeds the upper limit, causing packet loss. Packet loss caused by brief bandwidth spikes is not reflected in the bandwidth view, because the minimum granularity of bandwidth statistics is 10 seconds (total traffic in 10 seconds / 10 seconds). If the sustained bandwidth does not significantly exceed the limit, the event can be ignored | Increase the upper limit of the public network bandwidth

    NAT Gateway

    Event | Event Name | Event Type | Dimension | Recoverable | Event Description | Solution and Suggestion
    Packet loss caused by over-limit connections | PacketDroppedByConnLimit | Exception | NAT Gateway instance | Yes | There are too many connections to the NAT gateway instance. The maximum number of connections from one EIP to the same destination service is 55,000; if this limit is exceeded, packet loss occurs | Submit a ticket to contact us
    Packet loss caused by over-limit outbound bandwidth | PacketDroppedByBandwidthLimit | Exception | NAT Gateway instance | Yes | The public network outbound bandwidth of the NAT gateway instance exceeds the upper limit, causing packet loss. Packet loss caused by brief bandwidth spikes is not reflected in the bandwidth view, because the minimum granularity of bandwidth statistics is 10 seconds (total traffic in 10 seconds / 10 seconds). If the sustained bandwidth does not significantly exceed the limit, the event can be ignored | Increase the upper limit of the public network bandwidth
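    The 55,000-connection limit applies per EIP and per destination service. A quick arithmetic check is therefore to divide the expected concurrent connections to one destination by 55,000 and round up. The result shows how many EIPs' worth of per-destination capacity a workload needs. The sketch below assumes that connections are spread evenly across EIPs, which may not hold in practice. The function name is hypothetical.

```python
import math

MAX_CONN_PER_EIP_PER_DEST = 55_000  # limit stated in the NAT Gateway event description


def min_eips_needed(concurrent_conns_to_dest: int) -> int:
    """Rough lower bound on EIPs needed so that no single EIP exceeds the
    per-destination connection limit, assuming an even spread across EIPs."""
    if concurrent_conns_to_dest <= 0:
        return 0
    return math.ceil(concurrent_conns_to_dest / MAX_CONN_PER_EIP_PER_DEST)


print(min_eips_needed(120_000))  # 120,000 / 55,000 rounded up
```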

    TKE

    Event | Event Name | Event Type | Dimension | Recoverable | Event Description | Solution and Suggestion
    Host node OOM | oom | Exception | Service dimension | Yes | OOM occurs on the host node due to high memory utilization | Check the cause of the OOM on the host node by reviewing monitoring data, syslog, dmesg, and other sources

    TencentDB for MySQL

    Event Name | Event Parameter | Event Type | Dimension | Recoverable | Event Description | Solution and Suggestion
    OOM | OutOfMemory | Exception | TencentDB for MySQL instance | Yes | Database memory usage is overloaded | Check whether the memory capacity configured for the database meets business requirements. If more capacity is required, we recommend upgrading the MySQL memory configuration
    Primary-secondary switch | PrimarySwitch | Exception | TencentDB for MySQL instance | No | A primary-secondary switch occurs | This event can be triggered when a physical machine fails. Check whether the instance status is normal
    Read-only instance removal | RORemoval | Exception | TencentDB for MySQL instance | Yes | A read-only instance fails or exceeds the latency threshold | If the read-only group contains only one instance, switch the read traffic after the read-only instance is removed to avoid a single point of failure. We recommend purchasing at least two read-only instances for the group
    Instance migration caused by server failure | ServerfailureInstanceMigration | Exception | TencentDB for MySQL instance | No | A server failure results in instance migration | The migration time is subject to the maintenance window. Change the migration time promptly if needed; the new migration time will be subject to the new maintenance window
    Auditing function is not enabled | Auditclose | Exception | TencentDB for MySQL instance | No | This event has been deprecated | This event has been deprecated
    Instance replication status | InstRepStatus | Exception | TencentDB for MySQL instance | Yes | Replication between the read-only instance and the master instance is abnormal, and the read-only instance configuration needs to be checked | This exception can be caused by an undersized read-only instance or by large transactions on the master instance. Increase the read-only instance configuration or reduce large transactions as needed
    The database agent mount node is removed | ProxyNodeRemoval | Exception | TencentDB for MySQL instance | No | Read-only nodes are removed from the database proxy because of excessive delay, a connection failure, or an I/O or SQL thread exception, provided that the minimum number of reserved read-only nodes is still satisfied and the delayed-removal time has elapsed | If the database proxy has only one read-only instance, we recommend configuring at least two read-only instances for it to avoid a single point of failure
    Database agent exception | ProxyNotAvailable | Exception | TencentDB for MySQL instance | Yes | The proxy node is faulty and cannot provide the proxy service. While the database proxy is abnormal, the database instance cannot be accessed through the database proxy VIP | Ensure that the database proxy failover capability is enabled

    TencentDB for Redis

    Event Name | Event Parameter | Event Type | Dimension | Recoverable | Event Description | Solution and Suggestion
    Primary-secondary switch | MasterSlaveSwitched | Status change | TencentDB for Redis instance | No | A TencentDB for Redis primary-secondary switch occurs | This event causes the Redis service to disconnect and become briefly unavailable. Check whether your business has an automatic reconnection mechanism to ensure fast recovery
    Service unavailable | ServiceNotAvailable | Exception | TencentDB for Redis instance | Yes | A TencentDB for Redis fault occurs and the service is unavailable | We will recover the service as soon as possible and send a service recovery notice once it is recovered. If you have a disaster recovery instance, try switching your business over to it
    Read replica failover | ReadonlyReplicaSwitched | Status change | TencentDB for Redis instance | No | A TencentDB for Redis read-only replica switch occurs | We will recover the service as soon as possible and send a service recovery notice once it is recovered. If you have a disaster recovery instance, try switching your business over to it or add a read-only replica
    The read-only replica is unavailable | ReadonlyReplicaNotAvailable | Exception | TencentDB for Redis instance | Yes | A TencentDB for Redis read-only replica fault occurs | We will recover the service as soon as possible and send a service recovery notice once it is recovered. If you have a disaster recovery instance, try switching your business over to it or add a read-only replica
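    The Redis primary-secondary switch event causes a brief disconnection, so applications should reconnect automatically. Below is a minimal sketch of reconnection with capped exponential backoff. `connect` stands for whatever client factory your application uses and is a placeholder, not a specific library API. The sketch assumes it raises Python's built-in `ConnectionError` on failure. A real client library may raise its own exception types, which you would catch instead. The retry count and delays are illustrative.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    max_attempts: int = 6,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, doubling the wait after each failure
    (capped at `max_delay`). Re-raises the last error after `max_attempts` tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise RuntimeError("unreachable")  # the loop always returns or raises


# Simulated client factory that fails twice (as during a switchover), then succeeds.
attempts = {"n": 0}


def flaky_connect() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("connection reset during switchover")
    return "connected"


print(connect_with_backoff(flaky_connect, sleep=lambda s: None))
```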

    TencentDB for MongoDB

    Event Name | Event Parameter | Event Type | Dimension | Recoverable | Event Description | Solution and Suggestion
    Insufficient oplog backup | oplogInsufficient | Exception | TencentDB for MongoDB instance | No | When TencentDB for MongoDB backs up data, it cannot read the full oplog from the last backup to the current one. This affects rolling the database back to any point in time within the last seven days | We recommend adjusting the oplog size or the backup frequency in the MongoDB console. If you do not need this event notification, disable it on the backup page in the MongoDB console
    The number of connections exceeds the limit | connectionOverlimit | Exception | TencentDB for MongoDB instance | Yes | The number of connections to the instance exceeds the limit | Check whether the maximum number of connections configured for the TencentDB for MongoDB instance meets business requirements. If more connections are required, we recommend upgrading the instance configuration
    Primary-secondary switch | primarywitch | Exception | TencentDB for MongoDB instance | Yes | A primary-secondary switch occurs | This event can be triggered when a physical machine fails. Check whether the instance status is normal
    The disk capacity has run out | instanceOutOfDisk | Exception | TencentDB for MongoDB instance | Yes | The disk is full and the instance becomes read-only | Clean up the disk
    Instance rollback | instanceRollback | Exception | TencentDB for MongoDB instance | Yes | Instance data is rolled back | This event may be triggered if the primary node fails and a primary-secondary switch occurs while some data on the primary node has not yet been synced to the secondary node. Check whether the instance status is normal
    Node CPU exception | NodeCPUAbnormal | Exception | TencentDB for MongoDB instance | Yes | This alarm is triggered when the CPU usage of any node in the cluster reaches 80% | A single alarm of this type only indicates a high load on one node. Use other running statistics of the instance, such as the number of connections and slow logs, to evaluate the overall running status of the cluster. If necessary, upgrade the configuration of the TencentDB for MongoDB instance
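    The NodeCPUAbnormal condition fires when any single node reaches 80% CPU usage, and it can be expressed as a simple check. This sketch assumes you already have per-node CPU percentages, for example from your own monitoring data. The node names and the function name are hypothetical. As the table notes, a hit on one node points to a local hotspot rather than to cluster-wide overload.

```python
CPU_ALARM_THRESHOLD = 80.0  # percent, per the NodeCPUAbnormal description


def nodes_over_cpu_threshold(cpu_by_node: dict[str, float],
                             threshold: float = CPU_ALARM_THRESHOLD) -> list[str]:
    """Return the nodes whose CPU usage has reached the threshold.
    A non-empty result corresponds to the alarm condition."""
    return sorted(node for node, cpu in cpu_by_node.items() if cpu >= threshold)


usage = {"node-primary": 85.5, "node-secondary-1": 40.0, "node-secondary-2": 79.9}
print(nodes_over_cpu_threshold(usage))
```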

    TencentDB for PostgreSQL

    Event Name | Event Parameter | Event Type | Dimension | Recoverable | Event Description | Solution and Suggestion
    HA switch | HASwitch | Exception | TencentDB for PostgreSQL instance | Yes | A TencentDB for PostgreSQL HA switch occurs | Submit a ticket to contact us

    Direct Connect (connection, dedicated tunnel)

    Event | Event Name | Event Type | Dimension | Recoverable | Event Description | Solution and Suggestion
    Connection downtime | DirectConnecDown | Exception | Connection | Yes | The physical link of the connection is interrupted or abnormal | 1. Check whether the physical link is abnormal or interrupted (for example, the fiber cable is cut or the line is unplugged). 2. Check whether the receiving port and optical/electrical modules are normal. 3. Check whether the network device port is disabled
    Dedicated tunnel downtime | DirectConnectTunnelDown | Exception | Dedicated tunnel | Yes | The physical link of the connection is interrupted or abnormal | 1. Check whether the physical link is abnormal or interrupted (for example, the fiber cable is cut or the line is unplugged). 2. Check whether the receiving port and optical/electrical modules are normal. 3. Check whether the network device port is disabled
    Dedicated tunnel BGP session downtime | DirectConnectTunnelBGPSessionDown | Exception | Dedicated tunnel | Yes | The BGP session of the dedicated tunnel is interrupted | 1. Check whether the BGP process of the network device is normal. 2. Check whether the dedicated tunnel is normal. 3. Check whether the physical line is normal
    Alarm for exceeded number of BGP tunnel routes | DirectConnectTunnelRouteTableOverload | Exception | Dedicated tunnel | No | The number of routes in the BGP session of a dedicated tunnel has reached 80% of the threshold | Check whether the routes published by the BGP session of the dedicated tunnel have reached 80% of the threshold, which is 100 by default. For more information, see Use Limits
    Dedicated tunnel BFD detection interrupted | DirectConnectTunnelBFDDown | Exception | Dedicated tunnel | Yes | BFD detection for the dedicated tunnel is interrupted | 1. Check whether the dedicated tunnel is normal. 2. Check whether the physical line is normal
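    The route alarm fires once published routes reach 80% of the threshold. With the default threshold of 100, that means 80 routes. A small sketch of this arithmetic follows. The function names are hypothetical, and the threshold that applies to your tunnel may differ (see Use Limits). Integer arithmetic is used to avoid floating-point rounding when computing the cut-off.

```python
DEFAULT_ROUTE_THRESHOLD = 100  # default BGP route threshold per dedicated tunnel
ALARM_PERCENT = 80             # alarm fires at 80% of the threshold


def route_alarm_cutoff(threshold: int = DEFAULT_ROUTE_THRESHOLD) -> int:
    """Smallest route count that reaches ALARM_PERCENT of the threshold
    (integer ceiling of threshold * 80 / 100)."""
    return (threshold * ALARM_PERCENT + 99) // 100


def route_alarm_triggered(published_routes: int,
                          threshold: int = DEFAULT_ROUTE_THRESHOLD) -> bool:
    return published_routes >= route_alarm_cutoff(threshold)


print(route_alarm_cutoff())        # cut-off for the default threshold
print(route_alarm_triggered(79))
print(route_alarm_triggered(80))
```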

    Anti-DDoS

    Event Name | Event Parameter | Event Type | Dimension | Recoverable | Event Description | Solution and Suggestion
    DDoS abnormal traffic | DDoSAbnormalFlow | Exception | Anti-DDoS instance | No | - | Submit a ticket to contact us
    DDoS attack | DDoSAlaram | Exception | Anti-DDoS instance | No | - | Submit a ticket to contact us
    Block | DDoSBlock | Exception | Anti-DDoS instance | No | - | Submit a ticket to contact us
    CC attack | CCAlaram | Exception | Anti-DDoS instance | No | - | Submit a ticket to contact us

    Database backup service

    Event Name | Event Parameter | Event Type | Dimension | Recoverable | Event Description | Solution and Suggestion
    Full backup error | FullBackFail | Exception | Backup task instance | No | This alarm is triggered if a full backup task is interrupted or abnormal | Submit a ticket to contact us
    Incremental backup error | IncrementBackFail | Exception | Backup task instance | No | This alarm is triggered if an incremental backup task is interrupted or abnormal | Submit a ticket to contact us
    Data recovery error | RestoreFail | Exception | Backup task instance | No | This alarm is triggered if a data recovery task is interrupted or abnormal | Submit a ticket to contact us
    Data recovered successfully | RestoreSuccess | Exception | Backup task instance | No | This notification is triggered when a data recovery task completes successfully | Submit a ticket to contact us

    Tencent Cloud Service Engine

    Event Name | Event Parameter | Event Type | Dimension | Recoverable | Event Description | Solution and Suggestion
    Service instance error | InstanceTurnUnHealth | Exception | Instance | No | - | Submit a ticket to contact us
    Service instance isolated | InstanceOpenIsolate | Exception | Instance | No | - | Submit a ticket to contact us
    Abnormal instance recovered | InstanceTurnHealth | Exception | Instance | No | - | Submit a ticket to contact us
    Service instance released from isolation | InstanceCloseIsolate | Exception | Instance | No | - | Submit a ticket to contact us
    The service instance goes offline | InstanceOffline | Exception | Instance | No | - | Submit a ticket to contact us

    Stream Compute Service

    Event Name | Event Parameter | Event Type | Dimension | Recoverable | Event Description | Solution and Suggestion
    TaskManager is under high back pressure | OceanusBackpressureHigh | Exception | Task instance | No | - | Submit a ticket to contact us
    TaskManager is under severely high back pressure | OceanusBackpressureTooHigh | Exception | Instance | No | - | Submit a ticket to contact us
    TaskManager CPU workload is too high | OceanusTaskmanagerLoadTooHigh | Exception | Instance | No | - | Submit a ticket to contact us
    TaskManager Pod exited abnormally | OceanusTaskmanagerPodExitUnexpectedly | Exception | Instance | No | - | Submit a ticket to contact us
    JobManager Pod exited abnormally | OceanusJobmanagerPodExitUnexpectedly | Exception | Instance | No | - | Submit a ticket to contact us
    TaskManager Full GC takes too long | OceanusTaskmanagerFullGcTooLong | Exception | Instance | No | - | Submit a ticket to contact us