Scenarios
Tencent Cloud Distributed Cache provides automatic failover capability, ensuring service availability. Automatic failover includes failover of Proxy nodes and failover of data storage nodes. You can experience the failure simulation feature in the console. The system implements failure simulation by sending the shutdown command to all master nodes, triggering the automatic HA (High Availability) logic.
Proxy Node Failover
In Tencent Cloud Distributed Cache, both the standard architecture and the cluster architecture have Proxy nodes. The standard architecture has 3 Proxy nodes, while the number of Proxy nodes in the cluster architecture grows linearly with the number of shards. The high availability design of Proxy nodes is as follows:
Multiple Proxy nodes ensure high availability and load balancing of Proxy services.
Proxy nodes are deployed on 3 physical devices to ensure high availability.
If a Proxy node fails, the detection system identifies that the node is unavailable and automatically adds a new node.
Redis Server Failover
In Tencent Cloud Distributed Cache, both the standard architecture and the cluster architecture adopt the native cluster management mechanism of Redis Cluster, in which the nodes in the cluster exchange Gossip messages to determine each other's status. How quickly a node failure is detected depends on the cluster-node-timeout parameter, which defaults to 15000 ms. We recommend that you do not modify this parameter. For details on node failure detection, see Redis Cluster Native Design.
Usage Instructions
Only instances in the Running status support failure simulation.
Only instances in multi-AZ deployment mode support failure simulation. Instances deployed in the same AZ do not support this operation.
Must-Knows
Failure simulation makes the Tencent Cloud Distributed Cache service unavailable for a period of time. Recovery typically takes less than 1 minute. Data written during this period may be lost, so proceed with caution.
Service unavailability caused by failure simulation is not covered by the Redis service SLA.
Prerequisites
The database version is 4.0 or later.
The instance status is Running.
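The conditions listed under Usage Instructions and Prerequisites can be checked before opening the console. The following is a minimal sketch, assuming you already have the instance's version, status, and deployment mode at hand. The `InstanceInfo` class, its field names, and the `simulation_blockers` helper are hypothetical and are not part of any Tencent Cloud SDK.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class InstanceInfo:
    """Hypothetical snapshot of an instance; field names are illustrative only."""
    version: str    # database version, for example "4.0" or "5.0"
    status: str     # instance status as shown in the console, for example "Running"
    multi_az: bool  # True if the instance is deployed across multiple AZs


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "4.0" into (4, 0); pad to two components so "4" compares as (4, 0)."""
    parts = [int(part) for part in version.split(".")]
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def simulation_blockers(info: InstanceInfo) -> List[str]:
    """Return the reasons failure simulation is not allowed (empty if none)."""
    blockers = []
    if parse_version(info.version) < (4, 0):
        blockers.append("database version must be 4.0 or later")
    if info.status != "Running":
        blockers.append("instance status must be Running")
    if not info.multi_az:
        blockers.append("instance must use multi-AZ deployment")
    return blockers
```

An empty list means that none of the documented restrictions apply. Each entry otherwise names one unmet condition.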
Operation Steps
2. At the upper part of the Instance List page on the right, select a region.
3. Find the target multi-AZ instance that requires failure simulation in the instance list.
4. Click the instance ID to enter the Instance Details page.
5. Click the Node Management tab on the Instance Details page, and click Simulate Failure in the drop-down list of More.
6. In the Simulate Failure dialog box, confirm the instance name and ID, review the principle and notes of failure simulation, and click OK. The instance status will change to Processing.
7. Click Task Management in the left sidebar and wait for the task to complete. When the instance status changes to Running, the simulation is successful.
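The wait in the last step can also be scripted. The following is a minimal sketch that polls until the instance status returns to Running or a timeout expires. The `get_status` callable is a hypothetical placeholder for whatever you use to read the instance status; it is not a real SDK function, and the timeout and interval values are illustrative assumptions.

```python
import time
from typing import Callable


def wait_until_running(
    get_status: Callable[[], str],
    timeout_s: float = 120.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_status() until it returns "Running" or timeout_s elapses.

    Returns True if the instance reached Running, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == "Running":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

For example, `wait_until_running(lambda: my_status_lookup())` would block for up to two minutes, where `my_status_lookup` is your own function. The `sleep` and `clock` parameters exist only so that the loop can be tested without real delays.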
APIs
| API Name | Description |
| --- | --- |
| | Simulates faults on a Proxy node. |