TSA Helps Moomoo Conduct Disaster Recovery Experiments

Last updated: 2026-04-01 18:16:24

Background

Futu Holdings Limited (Futu) is a leading digital fintech company that provides users with fully digital financial services across multiple markets to improve their investment experience. As its business expands, the company faces growing challenges in system complexity and stability: cloud migration, distributed architectures, and rapid iterations have all increased the probability of faults. To proactively identify system availability issues and deliver more stable services to customers, Moomoo, a brand of Futu, worked with the TSA-Chaotic Fault Generator (TSA-CFG) team to conduct GameDay disaster recovery experiments.

Business Challenges

1. Multiple resource object types and complex operations
The fault experiments span multiple cloud products. With traditional manual Ops, multiple product teams must coordinate to inject the faults, resulting in high collaboration and communication costs and low experiment efficiency.
2. Large-scale automated instance operations
To realistically simulate an availability zone (AZ) fault, fault injection and rollback must be completed for hundreds of instances at a time, which is difficult to do by hand.
3. Difficulties in real-time monitoring and observation
During GameDay experiments, business teams need to monitor metrics from multiple cloud products in real time to evaluate the effect of the fault experiments and control risks. In practice, monitoring dashboards are scattered across cloud products, switching between them is cumbersome, and the metric information is often incomplete. These issues make observation inefficient and limit the effectiveness of the experiments.

Solutions

1. Rich cloud-based fault scenarios and flexible orchestration capabilities
TSA-CFG supports fault injection for more than 20 types of Tencent Cloud resource objects, such as hosts, containers, databases, and Direct Connect (DC) resources, and provides nearly 100 fault simulation scenarios. Users can easily orchestrate combinations of fault actions on the platform, which reduces team communication and Ops costs and greatly improves GameDay experiment efficiency.
2. Concurrent fault injection and automated rollback for a large number of instances
TSA-CFG can inject faults into multiple instances concurrently, enabling realistic and effective simulation of AZ-level fault scenarios. In addition, the system automates fault recovery, reducing the risks associated with manual intervention.
3. Comprehensive monitoring metric system
TSA-CFG integrates the monitoring metric systems of various basic cloud products, including Tencent Cloud Observability Platform (TCOP). Users can view instance-level monitoring changes across cloud products in one place, observe the effect of fault injection in real time, and verify the effectiveness of the alarm system.
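
The concurrent injection and automated rollback pattern described in item 2 can be illustrated with a minimal, generic sketch. It is not the TSA-CFG API or implementation: `run_experiment`, `inject`, `rollback`, and `observe` are hypothetical names introduced only for illustration, and the sketch assumes each callable takes an instance ID (or a list of IDs, for `observe`) and raises an exception on failure.

```python
from concurrent.futures import ThreadPoolExecutor


def run_experiment(instance_ids, inject, rollback, observe, max_workers=32):
    """Inject a fault on every instance concurrently, observe, then roll back.

    inject, rollback, and observe are hypothetical caller-supplied callables;
    they stand in for whatever fault actions a real platform would provide.
    """
    injected, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        try:
            # Submit all injections at once so they run concurrently.
            futures = [(iid, pool.submit(inject, iid)) for iid in instance_ids]
            for iid, fut in futures:
                try:
                    fut.result()
                    injected.append(iid)
                except Exception:
                    # Record the failure and keep going with the rest.
                    failed.append(iid)
            # Observation window: check metrics or alarms while faults are active.
            observe(injected)
        finally:
            # Roll back every successfully injected instance, even if
            # observation raised an exception.
            list(pool.map(rollback, injected))
    return {"injected": injected, "failed": failed}
```

Placing the rollback in a `finally` block is the design choice that matters here: recovery runs for every instance that was actually faulted, whether the observation step completes or fails, so no manual cleanup step is needed.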

Customer Benefits

1. Experiment efficiency improvement: Automated fault injection and recovery reduce communication and Ops costs and improve GameDay experiment efficiency.
2. Realistic scenario simulation: Concurrent fault injection on multiple instances effectively simulates AZ-level fault scenarios, helping users better handle real faults.
3. Fault effectiveness monitoring: Integration with the Tencent Cloud monitoring metric system lets users observe the effect of fault injection in real time, providing a basis for addressing actual risks.
