Tencent Cloud Smart Advisor

Release Notes
Product Introduction
Overview
Features
Product Strengths
Scenarios
Customer Cases
Purchase Guide
Getting Started
Using TSA to Perform a Cloud Risk Assessment
Using TSA to Execute a Chaos Experiment on CFG
Operation Guide
Operation Guide to TSA-Cloud Architecture
Operation Guide to TSA-Cloud Risk Assessment
Operation Guide to TSA-Chaotic Fault Generator
Operation Guide to TSA-Digital Assets
Permission Management
API Documentation
History
Introduction
API Category
Making API Requests
Other APIs
Task APIs
Cloud Architecture Console APIs
Data Types
Error Codes
FAQs
FAQs: TSA
FAQs: TSA-Cloud Risk Assessment
FAQs: TSA-Cloud Architecture
FAQs: TSA-Chaotic Fault Generator
Related Protocol
Tencent Cloud Smart Advisor Service Level Agreement
PRIVACY POLICY MODULE CHAOTIC FAULT GENERATOR
DATA PRIVACY AND SECURITY AGREEMENT MODULE CHAOTIC FAULT GENERATOR
Contact Us

Elasticsearch Service Node Down

Last updated: 2025-03-24 15:23:00

Background

An Elasticsearch cluster is made up of multiple nodes that work together to process client requests. In production environments, nodes may fail because of hardware faults, network problems, or software defects. A faulty node can degrade overall cluster performance and even disrupt normal business operations. For this reason, the Chaotic Fault Generator (CFG) provides node fault simulation.
Node fault simulation shows how an Elasticsearch cluster behaves under different fault scenarios. For example, by simulating node downtime, network partitions, disk damage, and other faults, you can observe the cluster's recovery process and assess risks such as data loss and increased query latency. Running fault simulations regularly helps you identify and fix potential issues, tune the cluster configuration, and make the cluster more robust. Node fault simulation is also useful for training: by reproducing real-world fault scenarios, team members become familiar with troubleshooting procedures and improve their incident response. Fault simulation can also serve as a stress test to verify the cluster's stability under high load.
Conducting node fault simulations for Elasticsearch is a crucial method for ensuring cluster stability and reliability. By simulating various fault scenarios, you can proactively discover and resolve issues, improve the cluster's fault tolerance and availability, and ensure the smooth operation of the business.
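One way to observe the recovery process described above is to poll the standard Elasticsearch `GET /_cluster/health` API while an experiment runs. The following Python sketch is an illustration only, not part of CFG: it assumes the cluster's private endpoint is reachable from the machine running the script and that HTTP basic authentication is enabled (for example with the `elastic` user). The helper names `fetch_json`, `summarize_health`, `is_recovered`, and `watch_recovery` are hypothetical.

```python
import base64
import json
import time
import urllib.request


def fetch_json(base_url: str, path: str, user: str, password: str) -> object:
    """GET an Elasticsearch REST endpoint with HTTP basic auth and decode the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode())


def summarize_health(health: dict) -> str:
    """Turn a GET /_cluster/health response into a one-line status summary."""
    return (
        f"status={health['status']} nodes={health['number_of_nodes']} "
        f"unassigned={health['unassigned_shards']} "
        f"initializing={health['initializing_shards']} "
        f"relocating={health['relocating_shards']}"
    )


def is_recovered(health: dict) -> bool:
    """Treat the cluster as recovered once it is green and no shards are
    unassigned, initializing, or relocating (an assumed, conservative criterion)."""
    return (
        health["status"] == "green"
        and health["unassigned_shards"] == 0
        and health["initializing_shards"] == 0
        and health["relocating_shards"] == 0
    )


def watch_recovery(base_url: str, user: str, password: str,
                   interval_s: float = 10.0, max_checks: int = 60) -> bool:
    """Poll cluster health, printing a timestamped summary each time,
    until the cluster is recovered or max_checks polls have been made."""
    for _ in range(max_checks):
        health = fetch_json(base_url, "/_cluster/health", user, password)
        print(time.strftime("%H:%M:%S"), summarize_health(health))
        if is_recovered(health):
            return True
        time.sleep(interval_s)
    return False
```

For example, `watch_recovery("http://<private-ip>:9200", "elastic", "<password>")` prints one summary line per poll. During a node-down fault you would typically expect the status to drop to yellow or red and the shard counters to rise, then return to green as the node rejoins and shards are reallocated.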

Experiment Preparation

Prepare an Elasticsearch Service (ES) cluster instance to use as the experiment target.
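Before creating the experiment, it can help to confirm that the target cluster can tolerate losing a node at all. Elasticsearch never places a replica shard on the same node as its primary, so a cluster with a single data node has no second copy to fail over to. The sketch below is a hypothetical pre-flight check, not a CFG requirement: it takes the parsed JSON of `GET /_cluster/health`, and the `min_data_nodes` threshold of 2 is an assumption you can adjust.

```python
def preflight_problems(health: dict, min_data_nodes: int = 2) -> list:
    """Check a GET /_cluster/health response before injecting a node-down fault.

    Returns human-readable reasons the cluster may be a poor target;
    an empty list means no problems were found. min_data_nodes is an
    assumed threshold, not a CFG requirement.
    """
    problems = []
    # Starting from a degraded state makes it hard to attribute changes to the fault.
    if health["status"] != "green":
        problems.append(f"cluster status is {health['status']}, expected green")
    # Replicas are never placed on the same node as their primary, so a
    # single data node leaves no copy of the data to fail over to.
    if health["number_of_data_nodes"] < min_data_nodes:
        problems.append(
            f"only {health['number_of_data_nodes']} data node(s), "
            f"expected at least {min_data_nodes}"
        )
    return problems
```

For example, passing the decoded response of a healthy three-data-node cluster returns an empty list, while a yellow single-node cluster returns both warnings.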

Step 1: Create an experiment

2. In the left sidebar, select Experiment Management, and click Create a New Experiment.
3. Click Skip and create a blank experiment.
4. Fill in the basic information, then proceed to the experiment object configuration. Select Big Data as the resource type and Elasticsearch Cluster as the resource object, then click Add Instance. A list of all Elasticsearch cluster instances in the current region appears; you can filter it by cluster name, cluster ID, or private IP address.
5. After selecting the target instance, click Add Now to add the ES Node Down experiment action, then click Next.
6. Set the action parameters. This document uses Random Node Downtime; choose the fault parameters that match your experiment's objectives. Click Confirm.
7. Click Next to go to Global Configuration. For details on these settings, see Quick Start.
8. After confirmation, click Submit.
9. After creating the experiment, click Experiment Details in the pop-up dialog box to enter the Experiment Details page.

Step 2: Execute the experiment

1. Before the experiment, record the instance's monitoring data as a baseline, focusing on the advanced monitoring metrics. To view them, go to the ES console and click Elasticsearch Cluster > Cluster ID/Name > Node Monitoring.
2. On the Experiment Details page, click Execute to initiate the fault actions.
3. After fault injection succeeds, click the Fault Action panel to view the results and the nodes on which the action was executed.
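To cross-check the executed nodes shown in the Fault Action panel against the cluster's own view, one option is to save the output of `GET /_cat/nodes?format=json&h=name,ip` before clicking Execute and again after injection, then compare the two snapshots. The sketch below assumes both snapshots were already fetched (with `curl` or any HTTP client) and parsed into lists of dicts; `missing_nodes` is a hypothetical helper name. If the node restarts quickly, it may already have rejoined when the second snapshot is taken, so an empty result does not by itself show that no fault occurred.

```python
def missing_nodes(before: list, after: list) -> list:
    """Return (name, ip) pairs present in the baseline snapshot but absent afterwards.

    Each snapshot is the parsed JSON of GET /_cat/nodes?format=json&h=name,ip,
    i.e. a list of dicts such as {"name": "node-1", "ip": "10.0.0.1"}.
    """
    after_names = {row["name"] for row in after}
    # Sort so the output is stable regardless of the order nodes were listed in.
    return sorted(
        (row["name"], row["ip"]) for row in before if row["name"] not in after_names
    )


if __name__ == "__main__":
    # Illustrative snapshots, not real experiment data.
    baseline = [
        {"name": "node-1", "ip": "10.0.0.1"},
        {"name": "node-2", "ip": "10.0.0.2"},
        {"name": "node-3", "ip": "10.0.0.3"},
    ]
    during = [
        {"name": "node-1", "ip": "10.0.0.1"},
        {"name": "node-3", "ip": "10.0.0.3"},
    ]
    print(missing_nodes(baseline, during))  # prints [('node-2', '10.0.0.2')]
```

Matching on node name rather than IP is a design choice for this sketch; if your node names can change across restarts, comparing the `ip` field instead may be more reliable.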


