Background and Significance
In a Kubernetes cluster, nodes are the fundamental resources for running Pods. An abnormal restart or a maintenance-induced restart of a node can affect the stability and disaster recovery capability of cluster services. By simulating a native node restart, you can verify the following scenarios:
1. Cluster scheduling capability: whether Pods can be rescheduled to other nodes during the restart.
2. Service disaster recovery capability: whether the service can maintain continuity when the node is briefly unavailable.
3. Node recovery capability: whether the node can correctly rejoin the cluster after the restart.
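The scheduling check above can be sketched as a comparison of Pod placements captured before and after the fault (for example, from `kubectl get pods -o wide`). `rescheduled_pods` is a hypothetical helper for illustration, not part of TSA-CFG:

```python
def rescheduled_pods(before, after, restarted_node):
    """Compare pod-to-node assignments captured before and after the
    restart; return the pods that moved off the restarted node,
    mapped to the node they landed on."""
    moved = {}
    for pod, node in before.items():
        # A pod counts as rescheduled if it was on the restarted node
        # and now runs somewhere else.
        if node == restarted_node and after.get(pod) not in (None, restarted_node):
            moved[pod] = after[pod]
    return moved

# Example: web-0 was evicted from node-a and landed on node-c.
before = {"web-0": "node-a", "web-1": "node-b"}
after = {"web-0": "node-c", "web-1": "node-b"}
print(rescheduled_pods(before, after, "node-a"))  # → {'web-0': 'node-c'}
```

If the result is empty while the restarted node hosted Pods, the workload may lack replicas or have scheduling constraints that prevent rescheduling.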
TSA-Chaotic Fault Generator (TSA-CFG) provides a native node restart fault action, helping you identify potential issues in cluster scheduling and disaster recovery and optimize your recovery policies.
Experiment Steps
Step 1: Preparing an Experiment
1. Purchase a standard cluster instance: ensure that a standard cluster has been deployed.
2. Create a container node: add a node instance and deploy test services on it. If container nodes are already available for experiments, skip this step and create the experiment directly.
Step 2: Creating an Experiment
1. Log in to the TSA console and select CFG.
2. Click Create Experiment, enter the basic information about the experiment, and click Next.
3. Choose Container > Standard Cluster Native Node from the Experiment Instance drop-down list, click Add via Search, and add an instance resource. Alternatively, click Add via Architecture Diagram, click a Tencent Kubernetes Engine (TKE) resource on the architecture diagram, select the required instance, and add it.
4. In the experiment action, click Add Immediately to add a fault action. Select the Node Restart fault action, and click Next.
5. Set action parameters and click OK. The shutdown modes are described as follows:

| Shutdown Mode | Description | Advantages | Disadvantages | Applicable Scenarios |
| --- | --- | --- | --- | --- |
| Soft Shutdown | Normal system shutdown process (timeout period: 5 minutes) | Process-friendly and keeps data safe | More time is required. | Database servers and critical business systems |
| Priority Soft Shutdown | Soft shutdown (5-minute timeout), followed by a hard shutdown if the timeout is exceeded | Balances data security and time control | Timeout may cause data loss. | Automated Ops and batch tasks with SLAs |
| Hard Shutdown | Direct power-off or forced virtual machine stop | Quick execution | Data loss or system damage may occur. | Emergency recovery scenarios in a test environment where the system is unresponsive |
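The fallback behavior of Priority Soft Shutdown can be sketched as follows; `soft_shutdown` and `hard_shutdown` are hypothetical callables standing in for the real shutdown operations, not TSA-CFG APIs:

```python
def shutdown_with_fallback(soft_shutdown, hard_shutdown, timeout_s=300):
    """Priority Soft Shutdown sketch: attempt the graceful path first,
    and fall back to a forced stop if the timeout is exceeded."""
    try:
        soft_shutdown(timeout_s)   # graceful OS shutdown with a deadline
        return "soft"
    except TimeoutError:
        hard_shutdown()            # forced stop; data loss is possible here
        return "hard"

# Example: a soft shutdown that hangs past the deadline triggers the hard path.
def hung_soft(timeout_s):
    raise TimeoutError(f"did not finish within {timeout_s}s")

print(shutdown_with_fallback(hung_soft, lambda: None))  # → hard
```

This is why the table lists "Timeout may cause data loss" as the trade-off: once the soft path times out, the forced stop proceeds without waiting for processes to flush state.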
6. After completing the parameter configuration, set Execution Mode and Guardrail Policy, and add metrics for Observability Metrics in the Global Configuration section. After the configuration is complete, click Submit to complete the experiment creation.
Step 3: Executing the Experiment
1. Log in to the TKE console and select Cluster in the left sidebar.
2. Click the cluster name to go to the cluster details page.
3. Select Node Management in the left sidebar. On the Nodes tab, click the node name to view the node status before fault execution on the node details page.
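As an alternative to the console, the node status can also be read from `kubectl get nodes` output. A minimal parsing sketch, assuming the default five-column table (NAME, STATUS, ROLES, AGE, VERSION):

```python
def node_statuses(kubectl_output):
    """Parse the table printed by `kubectl get nodes` into a
    {node_name: status} dict."""
    rows = kubectl_output.strip().splitlines()[1:]   # drop the header row
    return {cols[0]: cols[1] for cols in (row.split() for row in rows)}

# Example output captured before fault execution (node names are illustrative).
sample = """NAME     STATUS     ROLES    AGE   VERSION
node-a   Ready      <none>   12d   v1.28.3
node-b   NotReady   <none>   12d   v1.28.3"""

print(node_statuses(sample))  # → {'node-a': 'Ready', 'node-b': 'NotReady'}
```

Capturing this snapshot before the experiment gives a baseline to compare against after fault injection.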
4. Log in to the TSA console, select CFG, go to the experiment details panel, and click Execute in the fault action group or Start Experiment in the lower part of the panel.
5. Click the action card to view the action execution details.
6. View the execution log and confirm that the execution is successful.
7. View the node status after fault execution. You can see that the node status is abnormal, which indicates that the fault injection is successful.
8. This fault action does not include a recovery action. Wait for the native node to finish restarting, and then confirm that the node rejoins the cluster and its status returns to normal.
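Waiting for the node to rejoin can be automated with a simple polling loop. This is a sketch with an injectable `get_status` callable (hypothetical, for example backed by `kubectl get nodes`), not a TSA-CFG API:

```python
import time

def wait_for_ready(get_status, node, timeout_s=600, interval_s=5, sleep=time.sleep):
    """Poll the node's status until it reports Ready again after the
    restart, or give up once timeout_s seconds have elapsed."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_status(node) == "Ready":
            return True
        sleep(interval_s)          # back off between checks
    return False

# Example: the node reports NotReady twice, then recovers.
statuses = iter(["NotReady", "NotReady", "Ready"])
ok = wait_for_ready(lambda n: next(statuses), "node-a", sleep=lambda s: None)
print(ok)  # → True
```

A 10-minute default timeout leaves room for the OS boot and the kubelet to re-register; tune it to your node's actual restart time.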