
Node drain


Last updated: 2026-03-31 23:02:35

Background and Significance
In a Kubernetes environment, nodes are the infrastructure on which Pods run. When a node requires maintenance or an upgrade, or encounters a fault, the node drain operation is typically used to evict its Pods so that they are rescheduled onto other nodes, preserving service continuity and high availability. TSA-Chaotic Fault Generator (TSA-CFG) provides a node drain fault action that helps users verify the following:
1. Whether the cluster scheduler automatically reschedules the evicted Pods.
2. Whether the service maintains business continuity while the node is unavailable.
3. Whether the disaster recovery capabilities and resilience of the system hold up under extreme conditions.
By simulating the node drain operation, users can identify potential scheduling issues and optimize disaster recovery policies.
Experiment Steps
Step 1: Preparing an Experiment
1. Purchase a standard cluster instance: Ensure that a Kubernetes standard cluster has been deployed.
2. Deploy test services: Deploy at least one test service on the target node so that the impact of the node operation can be observed.
Step 2: Configuring Experiment Resources
Prepare a target node in either of the following ways:
Create container nodes: Create a node, add it to the cluster, and deploy test services (such as Nginx or a simple Pod service) on it.
Use existing nodes: If the cluster already has running native nodes, you can use them directly for the experiment.
Step 3: Creating an Experiment
1. Log in to the Tencent Cloud Smart Advisor (TSA) console, choose Architecture Governance, select Governance Mode, and click CFG. (For details about how to create an experiment, see Using TSA to Execute a Chaos Experiment on CFG.)
2. Click Create Experiment, enter the basic information about the experiment, and click Next.
3. Choose Container > Standard Cluster Ordinary Node or Container > Standard Cluster Native Node from the Experiment Instance drop-down list, click Add via Search, and add an instance resource. Alternatively, click Add via Architecture Diagram, click a Tencent Kubernetes Engine (TKE) resource on the architecture diagram, select the required instance, and add it.
4. In the experiment action, click Add Immediately to add a fault action. Select the Node Drain fault action, and click Next.
5. Set action parameters, and click OK.
Pod Eviction Timeout (s): specifies the timeout period for Pod eviction. If the Pods are not evicted within this period, the action fails.
Delete Pods with Local Storage: equivalent to the kubectl drain flag --delete-local-data (named --delete-emptydir-data in newer kubectl versions). If this parameter is set to Yes, Pods that use emptyDir volumes are also evicted, and the data in those volumes may be deleted.
6. Set Execution Mode and Guardrail Policy, and add metrics under Observability Metrics in the Global Configuration section. Then click Submit to create the experiment.
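For readers who want to relate the two action parameters in step 5 to a manual drain, the following sketch builds a comparable kubectl drain command line. It is an illustration only: this document does not state how TSA-CFG performs the drain internally, so the mapping of Pod Eviction Timeout to --timeout, and the use of --ignore-daemonsets, are assumptions. The function name build_drain_command is hypothetical.

```python
from typing import List


def build_drain_command(node: str, eviction_timeout_s: int,
                        delete_local_storage: bool) -> List[str]:
    """Build a kubectl drain argv that roughly mirrors the action parameters.

    Assumption: "Pod Eviction Timeout (s)" corresponds to --timeout and
    "Delete Pods with Local Storage" to --delete-emptydir-data.
    """
    if eviction_timeout_s <= 0:
        raise ValueError("eviction_timeout_s must be positive")
    cmd = [
        "kubectl", "drain", node,
        # Assumption: DaemonSet-managed Pods are left in place, as is common
        # for manual drains; the console may behave differently.
        "--ignore-daemonsets",
        f"--timeout={eviction_timeout_s}s",
    ]
    if delete_local_storage:
        # Newer kubectl name for the flag formerly called --delete-local-data.
        cmd.append("--delete-emptydir-data")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is drained by accident.
    print(" ".join(build_drain_command("example-node", 120, True)))
```

The sketch only prints the command. Running it against a real cluster evicts Pods, so it is best tried on a test node first.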
Step 4: Executing the Experiment
1. Log in to the TKE console and select Cluster in the left sidebar.
2. Click the cluster name to go to the cluster details page.
3. In the Node Management module, view the node status before fault execution.
Node health check: Before the experiment begins, ensure that the target node is in a normal running status.
Workload check: Check whether the Pods on the node are running normally.
4. Log in to the TSA console, select CFG, go to the experiment details panel, and click Execute in the fault action group or Start Experiment in the lower part of the panel.
5. Click the action card to view the action execution details.
6. View the execution log and confirm that the execution is successful. Verify whether the node status has changed to unschedulable and whether the Pods on the node have been rescheduled to other available nodes.
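The pre-execution checks in item 3 above (node health and workload status) can also be scripted. The sketch below assumes the inputs are the parsed JSON of `kubectl get node <node-name> -o json` and `kubectl get pods --all-namespaces -o json`; the function names are hypothetical and the sample data is made up for illustration.

```python
from typing import Any, Dict, List


def node_is_ready(node: Dict[str, Any]) -> bool:
    """Return True if the Node object reports the condition Ready=True."""
    conditions = node.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


def pods_not_running(pod_list: Dict[str, Any], node_name: str) -> List[str]:
    """Return names of Pods on node_name that are neither Running nor Succeeded."""
    unhealthy = []
    for pod in pod_list.get("items", []):
        if pod.get("spec", {}).get("nodeName") != node_name:
            continue  # Pod is scheduled on a different node
        phase = pod.get("status", {}).get("phase")
        if phase not in ("Running", "Succeeded"):
            unhealthy.append(pod["metadata"]["name"])
    return unhealthy


if __name__ == "__main__":
    # Illustrative data shaped like kubectl JSON output.
    sample_node = {"status": {"conditions": [{"type": "Ready", "status": "True"}]}}
    sample_pods = {"items": [
        {"metadata": {"name": "nginx-a"}, "spec": {"nodeName": "example-node"},
         "status": {"phase": "Running"}},
        {"metadata": {"name": "nginx-b"}, "spec": {"nodeName": "example-node"},
         "status": {"phase": "Pending"}},
    ]}
    print("node ready:", node_is_ready(sample_node))
    print("pods not running:", pods_not_running(sample_pods, "example-node"))
```

If the second function returns any names, it is worth resolving those Pods before starting the fault action, so that pre-existing problems are not mistaken for effects of the drain.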
Step 5: Verifying the Experiment Effect
1. Node status: In the TKE console, go to the Node Management page and check whether the node status has changed to Cordoned.
2. Pod scheduling status: Check whether all Pods on the node have been successfully migrated to other nodes and remain running normally.
3. Service availability: Verify whether the service can continue to function properly when the node is unavailable.
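The first two checks above can be approximated outside the console. The following sketch assumes that a cordoned node has spec.unschedulable set to true in its Node object, and that Pods owned by a DaemonSet are expected to stay on a drained node and should therefore be ignored. Inputs are the parsed JSON of `kubectl get node <node-name> -o json` and `kubectl get pods --all-namespaces -o json`; the function names are hypothetical.

```python
from typing import Any, Dict, List


def is_cordoned(node: Dict[str, Any]) -> bool:
    """Return True if the Node object is marked unschedulable."""
    return node.get("spec", {}).get("unschedulable") is True


def pods_still_on_node(pod_list: Dict[str, Any], node_name: str) -> List[str]:
    """Return names of active, non-DaemonSet Pods still bound to node_name."""
    remaining = []
    for pod in pod_list.get("items", []):
        if pod.get("spec", {}).get("nodeName") != node_name:
            continue
        owners = pod.get("metadata", {}).get("ownerReferences", [])
        if any(owner.get("kind") == "DaemonSet" for owner in owners):
            continue  # assumption: DaemonSet Pods are not evicted by a drain
        if pod.get("status", {}).get("phase") in ("Succeeded", "Failed"):
            continue  # finished Pods no longer run workloads on the node
        remaining.append(pod["metadata"]["name"])
    return remaining


if __name__ == "__main__":
    # Illustrative data shaped like kubectl JSON output.
    node = {"spec": {"unschedulable": True}}
    pods = {"items": [
        {"metadata": {"name": "log-agent",
                      "ownerReferences": [{"kind": "DaemonSet"}]},
         "spec": {"nodeName": "example-node"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "nginx-a"},
         "spec": {"nodeName": "other-node"}, "status": {"phase": "Running"}},
    ]}
    print("cordoned:", is_cordoned(node))
    print("pods left on node:", pods_still_on_node(pods, "example-node"))
```

An empty list from pods_still_on_node, together with a cordoned node, suggests that the drain completed. Service availability still needs to be verified separately against the test service itself.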
Step 6: Performing the Recovery Operation
1. Go to the experiment details panel.
2. Execute the fault recovery action, and confirm that the recovery action is successfully executed.
3. Check whether the node status is healthy after recovery.
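Recovery may take a short time to be reflected in the node status, so a polling check can be more convenient than refreshing the console. The sketch below assumes that "healthy" means the node is schedulable again (spec.unschedulable is absent or false) and reports Ready=True; this document does not define the term precisely. The helper names, the fetch_node callback, and the default timeout values are all hypothetical.

```python
import time
from typing import Any, Callable, Dict


def node_is_healthy(node: Dict[str, Any]) -> bool:
    """Return True if the node is schedulable and reports Ready=True."""
    if node.get("spec", {}).get("unschedulable") is True:
        return False  # still cordoned
    conditions = node.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


def wait_for_recovery(fetch_node: Callable[[], Dict[str, Any]],
                      timeout_s: float = 120.0,
                      interval_s: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll fetch_node until the node is healthy or timeout_s has elapsed.

    fetch_node is a caller-supplied function returning the current Node
    object as a dict, for example by parsing `kubectl get node -o json`.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if node_is_healthy(fetch_node()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated sequence: cordoned on the first poll, healthy on the second.
    ready = {"conditions": [{"type": "Ready", "status": "True"}]}
    states = iter([
        {"spec": {"unschedulable": True}, "status": ready},
        {"spec": {}, "status": ready},
    ])
    print("recovered:", wait_for_recovery(lambda: next(states), interval_s=0))
```

Passing the fetch function as an argument keeps the polling logic independent of how the Node object is retrieved, which also makes it easy to exercise without a cluster.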
