Exception Alarms

Last updated: 2022-09-01 18:34:46

    The Exception Alarm page displays an overview of the exception alarms (exceptions detected by "24/7 Exception Diagnosis") generated by the database instances connected to DBbrain under your account.

    Note:

    Currently, exception alarms are supported only for TencentDB for MySQL (excluding basic single-node instances), TDSQL-C for MySQL, self-built MySQL, TencentDB for Redis, and TencentDB for MongoDB.

    Viewing an Exception Alarm

    Log in to the DBbrain console and select Monitoring and Alarming > Exception Alarm on the left sidebar. On the displayed page, select a database type and an instance at the top.

    • The risk level distribution chart and the exception distribution chart are displayed at the top. If there are multiple instances, you can filter them by instance ID in the exception distribution chart.

    • The exception alarm list at the bottom displays the basic information of the database instance, risk level, diagnosis items, duration, and operations. In the search bar, you can search for instances by instance ID, instance name, diagnosis item, etc. You can also filter instances by region and time.

      • Risk levels include note, alarm, serious, and critical. You can filter, aggregate, and search for alarms by field. You can also click Details to view the details of an exception and the corresponding optimization suggestions.
      • There are over 30 diagnosis items for exception diagnosis, such as slow SQL, primary-secondary switch, deadlock, uncommitted transaction, and OOM. You can filter, aggregate, and search for items by field. You can also sort them by duration.
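    The filtering, aggregation, and sorting described above can be pictured with a minimal Python sketch. The `Alarm` record, its field names, the instance IDs, and the sample values are hypothetical and chosen for illustration only; they do not reflect the console's data model or any DBbrain API. The sketch also assumes that the risk levels are listed from least to most severe.

```python
from collections import Counter
from dataclasses import dataclass

# Risk levels named in this document. Assumption: the listed order runs
# from least to most severe.
RISK_LEVELS = ["note", "alarm", "serious", "critical"]


@dataclass
class Alarm:
    """Hypothetical alarm record; field names are illustrative only."""
    instance_id: str
    diagnosis_item: str
    risk_level: str
    duration_seconds: int


def filter_by_risk(alarms, minimum_level):
    """Keep alarms whose risk level is at least `minimum_level`."""
    threshold = RISK_LEVELS.index(minimum_level)
    return [a for a in alarms if RISK_LEVELS.index(a.risk_level) >= threshold]


def count_by_risk(alarms):
    """Aggregate alarms into a count per risk level."""
    return Counter(a.risk_level for a in alarms)


def sort_by_duration(alarms):
    """Sort alarms from longest to shortest duration."""
    return sorted(alarms, key=lambda a: a.duration_seconds, reverse=True)


# Made-up sample data.
alarms = [
    Alarm("cdb-example1", "Slow SQL", "alarm", 120),
    Alarm("cdb-example1", "Deadlock", "serious", 30),
    Alarm("cdb-example2", "OOM", "critical", 600),
    Alarm("cdb-example2", "Uncommitted transaction", "note", 45),
]

severe = filter_by_risk(alarms, "serious")
counts = count_by_risk(alarms)
longest_first = sort_by_duration(alarms)
```

    Here `severe` keeps only the serious and critical alarms, `counts` gives the per-level totals that a distribution chart would plot, and `longest_first` mirrors sorting the list by duration.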

    Ignoring/Unignoring an Alarm

    You can ignore or unignore exception alarms that are not generated by health inspections to better filter exception alarms.

    • In the exception alarm list, locate an alarm and click Ignore in the Operation column to ignore it. By doing so, other diagnosis item alarms of the instance generated by the same root cause will also be ignored.
    • The ignored exception alarm is grayed out. You can click Unignore to unignore it, and other diagnosis item alarms of the instance generated by the same root cause will also be unignored.
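    The cascading behavior described above (ignoring or unignoring one alarm also affects the other alarms of the same instance that share its root cause) can be sketched as follows. This is a simplified model, not DBbrain's implementation: the `Alarm` record, the `root_cause_id` field, and the sample IDs are assumptions, and the restriction on alarms generated by health inspections is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    """Hypothetical alarm record; field names are illustrative only."""
    alarm_id: int
    instance_id: str
    diagnosis_item: str
    root_cause_id: str  # assumed identifier that groups alarms by root cause
    ignored: bool = False


def set_ignored(alarms, alarm_id, ignored):
    """Ignore or unignore one alarm together with the alarms of the same
    instance that share its root cause."""
    target = next(a for a in alarms if a.alarm_id == alarm_id)
    for alarm in alarms:
        same_instance = alarm.instance_id == target.instance_id
        same_cause = alarm.root_cause_id == target.root_cause_id
        if same_instance and same_cause:
            alarm.ignored = ignored
    return alarms


# Made-up sample data: alarms 1 and 2 share an instance and a root cause.
alarms = [
    Alarm(1, "cdb-example1", "Row lock wait", "rc-1"),
    Alarm(2, "cdb-example1", "Slow SQL", "rc-1"),
    Alarm(3, "cdb-example1", "High CPU utilization", "rc-2"),
    Alarm(4, "cdb-example2", "Row lock wait", "rc-1"),
]

set_ignored(alarms, 1, True)
ignored_ids = [a.alarm_id for a in alarms if a.ignored]

set_ignored(alarms, 2, False)
ignored_after_unignore = [a.alarm_id for a in alarms if a.ignored]
```

    Ignoring alarm 1 also ignores alarm 2, while alarm 3 (different root cause) and alarm 4 (different instance) are left alone; unignoring alarm 2 restores both.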

    Viewing an Alarm from a Database

    Option 1

    Log in to the TencentDB for MySQL console. If an exception diagnosis problem occurs on an instance while you are in the console, an exception alarm notification pops up in the top-right corner in real time. The notification contains database instance information such as the instance ID, instance name, diagnosis item, and start time, so that you can quickly check the running status of the instance.

    • Click View Exception Diagnosis Details in the message notification to view the diagnosis details and optimization suggestions for the instance.
    • If you select No alarm again today in the message notification, no further exception alarm messages will be pushed to you in a pop-up window today, even if an exception diagnosis problem occurs on a database instance under your account.
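    One way to picture the No alarm again today option is as a per-day mute flag. The sketch below is a hypothetical model: the `PopupMuter` class and its methods are invented for illustration, and it assumes that suppression ends when the calendar day changes, which this document does not state explicitly.

```python
from datetime import date


class PopupMuter:
    """Hypothetical model of the "No alarm again today" checkbox."""

    def __init__(self):
        # Calendar day on which the box was checked, or None if never checked.
        self.muted_on = None

    def mute_for_today(self, today):
        """Record that pop-ups were muted on the given day."""
        self.muted_on = today

    def should_show_popup(self, today):
        """Pop-ups are suppressed only on the day the box was checked."""
        return self.muted_on != today


muter = PopupMuter()
shown_before = muter.should_show_popup(date(2022, 9, 1))
muter.mute_for_today(date(2022, 9, 1))
shown_same_day = muter.should_show_popup(date(2022, 9, 1))
shown_next_day = muter.should_show_popup(date(2022, 9, 2))
```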

    Option 2

    Log in to the TencentDB for MySQL console, select Instance List, Task List, Parameter Template, Recycle Bin, or Placement Group on the left sidebar, and click Exception Alarm in the top-right corner to expand the list of historical exception alarm messages. The number of alarms generated by the instances under your account is displayed next to the button.

    In the expanded list of historical exception alarm messages, you can view all pushed exception alarm messages, view them by region, and filter them by alarm level. You can also click a message to view the diagnosis details of the exception alarm event.

    Detailed Descriptions of Diagnosis Items

    A diagnosis item is an item diagnosed intelligently, which can be divided into four categories: performance, availability, reliability, and maintainability. Each diagnosis item belongs to only one category.

    Diagnosis Item Name | Category | Description
    Connectivity check | Availability | Unable to connect to the database
    Slow insertion, update, or deletion | Performance | There is a thread pending for a long time
    Slow SQL | Performance | There is a thread that is in the status of temp table creation, temp table replication, result sorting, etc.
    Row lock wait | Performance | There is a transaction with lock wait
    Uncommitted transaction | Performance | There is a thread in sleep status for a long time
    DDL statement metadata lock wait | Performance | There is a thread running DDL statements with metadata lock wait
    INSERT, UPDATE, and DELETE statement metadata lock wait | Performance | There is a thread running IUD statements with metadata lock wait
    SELECT statement metadata lock wait | Performance | There is a thread running SELECT statements with metadata lock wait
    Deadlock | Reliability | A deadlock is detected in the monitoring data, and the deadlock information exists in INNODB STATUS
    Read-only lock | Performance | There is a thread with global read-only lock wait
    SQL statement metadata lock wait | Performance | There is a thread running DDL statements with metadata lock wait
    Waiting for flush tables | Performance | There is a thread waiting for flush table
    High number of active sessions | Performance | The number of active sessions exceeds three times the CPU specification of the database instance
    High disk utilization | Reliability | The disk utilization is too high
    Memory utilization | Reliability | The memory utilization is too high
    High CPU utilization | Performance | The CPU utilization is too high
    Low hit rate of table open cache | Performance | The hit rate of the table open cache is low
    High-risk account | Maintainability | There are anonymous or password-free accounts
    Big table | Maintainability | The size of a single table exceeds 10% of the instance disk specification
    I/O replication thread interruption | Reliability | A replication monitoring metric is abnormal and triggers diagnosis, and there is an I/O thread exception in SHOW SLAVE STATUS
    SQL replication thread interruption | Reliability | A replication monitoring metric is abnormal and triggers diagnosis, and there is a SQL thread exception in SHOW SLAVE STATUS
    Replication delay caused by DDL | Reliability | A replication monitoring metric is abnormal and triggers diagnosis, and there is a thread running DDL statements with metadata lock wait
    Replication delay caused by transaction | Reliability | A replication monitoring metric is abnormal and triggers diagnosis, and there is a thread in sleep status with metadata lock wait
    Replication delay caused by read-only lock | Reliability | A replication monitoring metric is abnormal and triggers diagnosis, and there is a thread with global read-only lock wait
    Primary-secondary switch | Availability | The primary-secondary switch monitoring metric is abnormal
    Instance migration caused by server failure | Availability | The monitoring metric of instance migration is abnormal due to a server failure
    Read-only instance removal | Availability | The read-only instance removal monitoring metric is abnormal
    Disk limit exceeded | Availability | The disk limit monitoring metric is abnormal
    Memory limit exceeded | Availability | The memory limit monitoring metric is abnormal
    OOM | Availability | The database memory is overloaded
    Error command (ErrCmd) | Maintainability | There is a command execution error. The current number of errors is %d (only for Redis)
    High-risk command (RiskCmd) | Maintainability | The KEYS command is detected (only for Redis)
    Proxy load/inbound traffic/outbound traffic/balance conditions | Maintainability | The proxy load/inbound traffic/outbound traffic/balance conditions are abnormal (only for Redis)

    Note:

    • The diagnosis items related to source-replica replication are currently unavailable for self-built databases accessed through the agent.
    • The diagnosis items related to server resources and source-replica replication are currently unavailable for self-built databases accessed through direct connection.
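
    Two diagnosis items in the table state explicit numeric thresholds: "High number of active sessions" (more than three times the CPU specification) and "Big table" (a single table larger than 10% of the instance disk specification). The sketch below expresses those two rules as plain Python checks. The function names and parameters are hypothetical, and it assumes that the CPU specification means the number of CPU cores; how DBbrain evaluates these rules internally is not described in this document.

```python
def has_high_active_sessions(active_sessions, cpu_cores):
    """True when active sessions exceed three times the CPU specification.

    Assumption: the CPU specification is the instance's CPU core count.
    """
    return active_sessions > 3 * cpu_cores


def is_big_table(table_size_gb, disk_size_gb):
    """True when a single table exceeds 10% of the instance disk specification.

    The comparison is written as `size * 10 > disk` to avoid
    floating-point rounding at the boundary.
    """
    return table_size_gb * 10 > disk_size_gb


# Made-up example: a 4-core instance with a 200 GB disk.
sessions_flagged = has_high_active_sessions(active_sessions=13, cpu_cores=4)
sessions_ok = has_high_active_sessions(active_sessions=12, cpu_cores=4)
table_flagged = is_big_table(table_size_gb=25, disk_size_gb=200)
table_ok = is_big_table(table_size_gb=20, disk_size_gb=200)
```

    Both checks use a strict comparison because the table says "exceeds", so a value exactly at the threshold is not flagged.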