Native Node Shutdown

Last updated: 2026-03-31 23:02:35

Background and Significance

In a Kubernetes environment, nodes are the critical infrastructure on which Pods run. To verify a cluster's self-healing capability and the fault tolerance of application services when a node becomes unavailable, it is often necessary to simulate a node shutdown. TSA-Chaotic Fault Generator (TSA-CFG) provides a native node shutdown fault action that helps users quickly and accurately identify potential weaknesses in their high availability and disaster recovery policies.
By simulating a node shutdown, users can:
1. Verify whether the cluster scheduler automatically reschedules Pods from the failed node to healthy nodes.
2. Verify the auto scaling capabilities of business systems.
3. Improve the system's disaster recovery capability and resilience under extreme fault conditions.
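For the first goal, one rough post-experiment check is to count how many Pods of the test service are Running on nodes other than the shut-down node. The sketch below is a minimal illustration, not part of TSA-CFG: it assumes the output of `kubectl get pods -n <namespace> -o json` has already been parsed into a dict, and that the test service's Pods carry an `app` label (the label key, function names, and node names are hypothetical). When a node is shut down rather than drained, Kubernetes by default lets Pods tolerate the `node.kubernetes.io/unreachable` and `node.kubernetes.io/not-ready` taints for 300 seconds, so replacement Pods may take several minutes to appear.

```python
from typing import Any


def running_replicas_off_node(pod_list: dict[str, Any], app: str,
                              failed_node: str) -> int:
    """Count Running, non-terminating Pods labeled app=<app> that are
    scheduled on a node other than failed_node."""
    count = 0
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        if meta.get("labels", {}).get("app") != app:
            continue  # not part of the test service (hypothetical "app" label)
        if meta.get("deletionTimestamp"):
            continue  # Pod is being deleted, e.g. evicted from the failed node
        if pod.get("spec", {}).get("nodeName") == failed_node:
            continue  # still bound to the shut-down node
        if pod.get("status", {}).get("phase") == "Running":
            count += 1
    return count


def pods_rescheduled(pod_list: dict[str, Any], app: str, failed_node: str,
                     desired_replicas: int) -> bool:
    """True when the desired number of replicas is Running off the failed node."""
    return running_replicas_off_node(pod_list, app, failed_node) >= desired_replicas
```

Pods that are still terminating are skipped on purpose, because a Pod evicted from an unreachable node can remain in that state until the node comes back.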

Experiment Steps

Step 1: Preparing an Experiment

1. Purchase a standard cluster: Make sure that a standard cluster has been deployed, and deploy test services on it.
2. Create a container node: Add an instance to the cluster and deploy test services on it. If container nodes are already available for the experiment, skip this step and create the experiment directly.
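A node shutdown only tests fault tolerance if the test service has replicas on more than one node. As a minimal sketch, the snippet below builds a Kubernetes Deployment manifest that uses preferred Pod anti-affinity on the `kubernetes.io/hostname` topology key to spread replicas across nodes. The name `cfg-demo`, the `nginx:1.25` image, and the replica count of three are hypothetical placeholders, not values required by TSA-CFG. The printed JSON can be saved to a file and applied with `kubectl apply -f`.

```python
import json


def build_test_deployment(name: str = "cfg-demo", replicas: int = 3,
                          image: str = "nginx:1.25") -> dict:
    """Build a Deployment manifest whose replicas prefer different nodes."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Prefer (but do not require) one replica per node, so that
                    # shutting down a single node leaves other replicas serving.
                    "affinity": {
                        "podAntiAffinity": {
                            "preferredDuringSchedulingIgnoredDuringExecution": [
                                {
                                    "weight": 100,
                                    "podAffinityTerm": {
                                        "labelSelector": {"matchLabels": labels},
                                        "topologyKey": "kubernetes.io/hostname",
                                    },
                                }
                            ]
                        }
                    },
                    "containers": [
                        {"name": name, "image": image,
                         "ports": [{"containerPort": 80}]}
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # Redirect to a file, then apply it with: kubectl apply -f <file>.json
    print(json.dumps(build_test_deployment(), indent=2))
```

Preferred rather than required anti-affinity is a deliberate choice here: with a hard requirement and exactly as many replicas as nodes, the Pod displaced by the shutdown could not be placed on any remaining node and would stay Pending.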

Step 2: Creating an Experiment

1. Log in to the Tencent Cloud Smart Advisor (TSA) console, choose Architecture Governance, select Governance Mode, and click CFG. (For details about how to create an experiment, see Using TSA to Execute a Chaos Experiment on CFG.)
2. Click Create Experiment, enter the basic information about the experiment, and click Next.
3. Choose Container > Standard Cluster Native Node from the Experiment Instance drop-down list, click Add via Search, and add an instance resource. Alternatively, click Add via Architecture Diagram, click a Tencent Kubernetes Engine (TKE) resource on the architecture diagram, select the required instance, and add it.
4. In the experiment actions section, click Add Immediately to add a fault action. Select the Node Shutdown fault action and click Next.
5. Set action parameters and click OK.
Shutdown modes:
Shutdown Mode | Execution Method | Advantage | Disadvantage | Applicable Scenario
Soft Shutdown | Normal system shutdown process (timeout: 5 minutes) | Process-friendly; protects data | Takes more time | Database servers and critical business systems
Priority Soft Shutdown | Soft shutdown (up to 5 minutes), followed by a hard shutdown if the soft shutdown times out | Balances data safety and time control | Data may be lost if the timeout is reached | Automated Ops and batch tasks with SLAs
Hard Shutdown | Direct power-off or forced virtual machine stop | Fast execution | May cause data loss or system damage | Emergency recovery when the system is unresponsive in a test environment
6. After configuring the action parameters, go to the Global Configuration section, set Execution Mode and Guardrail Policy, and add the metrics to monitor under Observability Metrics. Then click Submit to create the experiment.
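To make the Priority Soft Shutdown semantics in the table concrete ("soft shutdown, then hard shutdown after a 5-minute timeout"), here is a minimal Python sketch. It is an illustration only, not TSA-CFG's implementation: the three callables (`request_soft_shutdown`, `is_stopped`, `force_stop`) are hypothetical hooks standing in for whatever actually signals and checks the node, and the clock and sleep functions are injectable so the logic can be tested without waiting five minutes.

```python
import time
from typing import Callable

SOFT_SHUTDOWN_TIMEOUT_S = 5 * 60  # 5-minute soft shutdown window


def priority_soft_shutdown(
    request_soft_shutdown: Callable[[], None],  # hypothetical hook
    is_stopped: Callable[[], bool],             # hypothetical hook
    force_stop: Callable[[], None],             # hypothetical hook
    timeout_s: float = SOFT_SHUTDOWN_TIMEOUT_S,
    poll_interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try a graceful shutdown; fall back to a forced stop after the timeout.

    Returns "soft" if the node stopped within the timeout, otherwise "hard".
    """
    request_soft_shutdown()
    deadline = clock() + timeout_s
    while clock() < deadline:
        if is_stopped():
            return "soft"
        sleep(poll_interval_s)
    # Final check in case the node stopped during the last sleep.
    if is_stopped():
        return "soft"
    # Timeout reached: this is where unsaved data may be lost.
    force_stop()
    return "hard"
```

In these terms, Soft Shutdown corresponds roughly to the same wait without the forced fallback, and Hard Shutdown to calling the forced stop directly.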

Step 3: Executing the Experiment

1. Log in to the TKE console and select Cluster in the left sidebar.
2. Click the cluster name to go to the cluster details page.
3. Select Node Management in the left sidebar. On the Nodes tab, click the node name to view the node status before fault execution on the node details page.
4. Log in to the TSA console, select CFG, open the experiment details panel, and click Execute in the fault action group or Start Experiment at the bottom of the panel.
5. Click the action card to view the action execution details.
6. View the execution log and confirm that the action was executed successfully.
7. View the node status after fault execution. The node status is now abnormal, which indicates that the fault was injected successfully.
8. Execute the recovery action, view the log, and confirm that the recovery action was executed successfully.
9. After the recovery action succeeds, view the cluster node status again. The node is running normally and all Pods on it are running normally, which indicates that the node has recovered from the fault.
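The node and Pod status checks in steps 3, 7, and 9 can also be scripted. The sketch below assumes `kubectl` is installed and configured for the TKE cluster; the helper names are hypothetical. It reads the node's `Ready` condition and lists Pods on the node whose phase is not `Running` or `Succeeded`. When a node is powered off, its kubelet stops reporting and Kubernetes typically sets the `Ready` condition to `Unknown`, which `kubectl get nodes` shows as `NotReady`.

```python
import json
import subprocess
from typing import Any


def kubectl_json(*args: str) -> dict[str, Any]:
    """Run a kubectl command with JSON output and return the parsed result."""
    out = subprocess.run(["kubectl", *args, "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def node_ready_status(node: dict[str, Any]) -> str:
    """Return the node's Ready condition status: "True", "False" or "Unknown"."""
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status", "Unknown")
    return "Unknown"  # no Ready condition reported yet


def unhealthy_pods(pod_list: dict[str, Any]) -> list[str]:
    """Return namespace/name of Pods whose phase is not Running or Succeeded."""
    bad = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase")
        if phase not in ("Running", "Succeeded"):
            meta = pod.get("metadata", {})
            bad.append(f"{meta.get('namespace', 'default')}/{meta.get('name', '?')}")
    return bad
```

For example, with `<node-name>` as a placeholder, `node_ready_status(kubectl_json("get", "node", "<node-name>"))` should return `"True"` before the fault and after recovery, and `unhealthy_pods(kubectl_json("get", "pods", "-A", "--field-selector", "spec.nodeName=<node-name>"))` should return an empty list after recovery. A Running phase does not guarantee that containers pass their readiness probes, so treat this as a coarse check.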
