Feature Overview
The Tencent Cloud Distributed Cache audit feature is a complete system for recording and analyzing database operation logs. It collects all commands executed by users in real time, stores them securely, and makes them efficiently queryable, strengthening both the security and the observability of the database.
Full Behavior Logging: Collects and records all operation commands for the distributed cache database in real time.
Meeting Compliance Requirements: Helps enterprises meet stringent compliance audit standards in industries such as finance, healthcare, and government affairs (e.g., classified cybersecurity protection compliance).
Problem Tracing: Provides O&M and security teams with historical operation query and security monitoring capabilities, enabling rapid risk localization and responsibility attribution.
Feature List
| Category | Feature | Description |
| --- | --- | --- |
| Enabling and disabling on demand | Enable Database Audit | Supports one-click enabling of the audit feature for single or batch instances. After it is enabled, the system automatically collects and logs command execution details in the background. |
| | Disable Database Audit | Supports disabling audit collection for single or batch instances at any time; related charges stop accruing once auditing is disabled. Note: Disabling audit also clears all historical audit logs for the instance, and the data cannot be restored. Proceed with caution. |
| Custom Audit Rules | Adjust audit types on demand | Supports dynamically adjusting the audit scope based on security policies, with modifications taking effect immediately. Write commands (default): audits write operation commands such as SET, DEL, and HSET, targeting data modification activities. Read commands: audits read operation commands such as GET, HGET, and LRANGE, tracking data access behavior. All commands: audits all read/write operation commands to achieve full operation traceability. |
| | Automatic Degradation Policy | Supports configuring the P99 latency threshold that triggers audit degradation (default: 500 ms, range: 300-1000 ms). Note: When the instance latency reaches this threshold, the system automatically suspends log collection to prioritize the availability of core database operations. |
| Audit Log Management | Set Log Retention Period | Supports a customizable retention period (7-30 days, default: 7 days). Logs are automatically purged upon expiration. |
| | Tiered Storage | Provides a tiered architecture combining high-frequency storage and low-frequency storage. The system automatically stores recently active logs in the high-frequency zone to ensure fast search speeds for daily troubleshooting, and automatically migrates historical logs to the low-frequency zone, balancing long-term compliance retention with storage cost control. |
| | Complete Audit Information Record | Records the full command execution context, including: execution time, executed command (with full content), user account, database number, client IP address, node ID, execution duration, and error messages. |
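
The configurable settings in the feature list (audit type, degradation threshold, retention period) can be pictured as one small validated configuration object. The following is a minimal sketch, not the service's API: `AuditType`, `AuditConfig`, and `should_audit` are hypothetical names, and the command sets contain only the examples named in the feature list rather than the service's full classification.

```python
from dataclasses import dataclass
from enum import Enum


class AuditType(Enum):
    WRITE = "write"  # default: data-modifying commands
    READ = "read"    # data-access commands
    ALL = "all"      # both read and write commands


# Illustrative subsets only (the examples from the feature list).
# The service's complete command classification is not given here.
WRITE_COMMANDS = {"SET", "DEL", "HSET"}
READ_COMMANDS = {"GET", "HGET", "LRANGE"}


@dataclass(frozen=True)
class AuditConfig:
    audit_type: AuditType = AuditType.WRITE
    degradation_p99_ms: int = 500  # documented range: 300-1000 ms
    retention_days: int = 7        # documented range: 7-30 days

    def __post_init__(self):
        # Reject values outside the documented ranges.
        if not 300 <= self.degradation_p99_ms <= 1000:
            raise ValueError("degradation threshold must be 300-1000 ms")
        if not 7 <= self.retention_days <= 30:
            raise ValueError("retention period must be 7-30 days")


def should_audit(command_name: str, config: AuditConfig) -> bool:
    """Return True if the command falls within the configured audit scope."""
    name = command_name.upper()
    if config.audit_type is AuditType.WRITE:
        return name in WRITE_COMMANDS
    if config.audit_type is AuditType.READ:
        return name in READ_COMMANDS
    return name in WRITE_COMMANDS or name in READ_COMMANDS
```

With the defaults, `should_audit("SET", AuditConfig())` is true while `should_audit("GET", AuditConfig())` is false, which mirrors the write-only default scope.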
Conditions for Use
| Item | Requirement |
| --- | --- |
| Instance Version | Redis: version 4.0 or later. Valkey: version 8.0 or later. Proxy: version 5.8.25 or later. |
| Instance Status | Instances must be in the "Running" state. Instances in other states, such as "Creating," "Upgrading," "Isolated," or "Destroyed," cannot enable or configure the audit feature. |
| Region Availability | The audit feature must be available in the region where the instance is located. Currently, the audit feature covers all public cloud regions. During the gradual rollout period, newly added regions may be subject to allowlist restrictions. Refer to the console for actual availability. |
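
The version and status conditions above amount to a simple pre-check. The sketch below is illustrative only: `can_enable_audit` and `parse_version` are hypothetical helpers, not a Tencent Cloud API, and region availability is left out because it has to be confirmed in the console. Versions are compared as integer tuples so that, for example, proxy 5.8.3 is correctly treated as older than 5.8.25.

```python
# Minimum versions taken from the conditions above.
MIN_ENGINE_VERSION = {"redis": (4, 0, 0), "valkey": (8, 0, 0)}
MIN_PROXY_VERSION = (5, 8, 25)


def parse_version(text: str) -> tuple:
    """Turn '5.8.25' into (5, 8, 25), padding short versions with zeros."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def can_enable_audit(engine: str, engine_version: str,
                     proxy_version: str, status: str) -> bool:
    """Check the version and status conditions (region is not modeled)."""
    minimum = MIN_ENGINE_VERSION.get(engine.lower())
    if minimum is None:
        return False  # unknown engine type
    return (
        status == "Running"
        and parse_version(engine_version) >= minimum
        and parse_version(proxy_version) >= MIN_PROXY_VERSION
    )
```

Comparing the raw strings instead would rank "5.8.3" above "5.8.25", which is why the sketch parses each component as an integer first.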
Concepts
P99 latency
Definition: P99 latency (99th Percentile Latency) is a key metric for measuring system performance stability, indicating that 99% of all requests have response times not exceeding this value, while only 1% of requests have response times that exceed it.
Role in Auditing: P99 latency serves as the core trigger metric for the audit feature's degradation protection mechanism. When high instance workload causes P99 latency to exceed the configured threshold, the system automatically triggers the degradation policy. This temporarily suspends audit log collection and prioritizes allocating computational resources to business request processing, ensuring core business operations remain unaffected.
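To make the definition concrete, the sketch below computes P99 from a batch of latency samples using the nearest-rank method. This is one common way to compute a percentile; how the service itself samples and aggregates latency is not specified here, and `p99_latency` is a hypothetical helper name.

```python
def p99_latency(latencies_ms):
    """Return the nearest-rank 99th percentile of a list of latencies."""
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(latencies_ms)
    # 1-based nearest rank: ceil(0.99 * n), done in integer arithmetic
    # to avoid floating-point rounding surprises.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

For instance, `p99_latency([10] * 99 + [900])` returns 10, because a single slow request out of 100 falls within the 1% that P99 ignores, whereas `p99_latency([10] * 98 + [900] * 2)` returns 900.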
Degradation Policy
Definition: A degradation policy is an adaptive protection mechanism. When an instance's performance metric (P99 latency) exceeds the preset threshold, the system automatically suspends the collection and reporting of audit logs. It prioritizes allocating system resources to business request processing, ensuring business availability is not affected by the audit feature.
Normal State: The instance's P99 latency is within the normal range. All user commands that comply with audit rules are fully recorded in the audit logs.
Degradation Trigger: When P99 latency continuously exceeds the configured threshold (default: 500ms) for a certain time window, the system automatically enters degradation state and suspends audit data collection.
During Degradation: Commands executed in the degraded state will not be recorded in the audit logs, but this does not affect their normal execution.
Automatic Restoration: When P99 latency returns to normal levels and remains stable, the system automatically exits the degradation state and resumes normal collection of audit logs.
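The four states above behave like a small state machine with hysteresis. The sketch below is one way to model it, under stated assumptions: the length of the trigger window and the stability period required for restoration are not documented, so `trigger_samples` and `recover_samples` are placeholder values, and `DegradationController` is a hypothetical name rather than part of the service.

```python
from dataclasses import dataclass, field


@dataclass
class DegradationController:
    threshold_ms: int = 500   # documented default threshold
    trigger_samples: int = 3  # assumed: consecutive breaches before degrading
    recover_samples: int = 3  # assumed: consecutive healthy readings to restore
    degraded: bool = False
    _streak: int = field(default=0, init=False, repr=False)

    def observe(self, p99_ms: float) -> bool:
        """Feed one P99 reading; return True if audit collection is active."""
        # Strict comparison, following the "exceeds the threshold" wording.
        breaching = p99_ms > self.threshold_ms
        # Count consecutive readings that argue for leaving the current state;
        # any reading that agrees with the current state resets the count.
        if breaching != self.degraded:
            self._streak += 1
        else:
            self._streak = 0
        limit = self.recover_samples if self.degraded else self.trigger_samples
        if self._streak >= limit:
            self.degraded = not self.degraded
            self._streak = 0
        return not self.degraded
```

Requiring several consecutive readings in both directions keeps a single latency spike from suspending auditing and keeps a single good reading from resuming it prematurely.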
FAQs
Q1: Does enabling the audit feature affect performance?
Yes, there is some impact, but several mechanisms keep it within an acceptable range. The audit feature collects logs asynchronously, decoupling command execution from log reporting. Additionally, the system has a built-in degradation policy that automatically suspends auditing when P99 latency exceeds the configured threshold (default: 500 ms), prioritizing business availability. Auditing automatically resumes once performance is restored. For *read commands* and *all commands* auditing, the log volume is larger, so the performance impact should be evaluated against the actual read/write ratio of the business. We recommend stress testing and verifying the feature in a test environment before enabling it in production.
Q2: How long can audit logs be retained?
The retention period for audit logs is configurable from 7 to 30 days, with a default of 7 days. You can adjust the retention period at any time in the console based on industry compliance requirements and cost budgets. The modification takes effect immediately for newly generated logs. Logs that exceed the retention period are automatically purged and cannot be restored after deletion.
Q3: What information is recorded in audit logs?
Audit logs record the following fields, providing a sufficient basis for troubleshooting and compliance audits:
Command Execution Time: The exact timestamp of command execution.
Client IP: The client IP address initiating the request, used to trace the source of operations.
User Account: The database account name used to run the command (displays as "default" if the default account is used).
Database Number: The number (0-15) of the specific database targeted by the command.
Command Type: The name of the executed command, such as SET, GET, HSET, DEL, LPUSH, and so on.
Node ID: The identifier of the specific node executing the command in a cluster architecture, which facilitates locating issues at the shard level.
Command Details: The complete command content, including Key names and parameters (sensitive data such as passwords and keys will be automatically masked).
Execution Duration: Command execution time in milliseconds, which can be used to identify slow queries.
Error Message: Description of the error cause when command execution fails.
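The fields above map naturally onto a record type. The sketch below is an illustration only: the field names and the `slow_commands` helper are hypothetical, and the actual export format of audit logs is not described here. It shows how the Execution Duration field could be used to pick out slow commands from a set of records.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    executed_at: datetime   # Command Execution Time
    client_ip: str          # Client IP
    user_account: str       # User Account ("default" for the default account)
    database: int           # Database Number (0-15)
    command_type: str       # Command Type, e.g. SET or GET
    node_id: str            # Node ID
    command_detail: str     # Command Details (sensitive values already masked)
    duration_ms: float      # Execution Duration in milliseconds
    error_message: Optional[str] = None  # Error Message, if execution failed


def slow_commands(records, min_duration_ms):
    """Return records at or above the duration cutoff, slowest first."""
    matches = [r for r in records if r.duration_ms >= min_duration_ms]
    return sorted(matches, key=lambda r: r.duration_ms, reverse=True)
```

The same pattern extends to the other fields, for example filtering by `client_ip` to trace the source of an operation or by `node_id` to narrow an issue to one shard.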