What sampling strategy is used for logs in cloud stress testing, and what is the sampling ratio?

In cloud stress testing, the sampling strategy for logs is designed to collect a representative subset of log data to analyze system performance under load while minimizing storage and processing overhead. The goal is to ensure the sampled data accurately reflects the behavior of the entire system during the test.

Sampling Strategies:

Random Sampling: Every log entry has an equal chance of being selected. This avoids systematic bias, but a small sample can miss rare patterns.
Example: From 10,000 generated logs, randomly select 1,000 for analysis (a 10% sampling ratio).
Systematic (Interval-Based) Sampling: Logs are sampled at fixed intervals (e.g., every 10th log entry). Useful for capturing consistent patterns over time.
Example: Capturing every 50th log entry from a stream of 50,000 logs yields 1,000 entries, a 2% sampling ratio.
Event-Based Sampling: Focuses on specific events (e.g., errors, high-latency requests). Only logs matching predefined criteria are sampled.
Example: Sampling all logs with HTTP 5xx errors during a stress test, regardless of volume.
Stratified Sampling: Divides logs into categories (e.g., by API endpoint or user type) and samples proportionally from each. Ensures coverage of critical subsystems.
Example: If 70% of traffic is to API A and 30% to API B, the sample maintains this ratio (e.g., 700 logs from API A and 300 from API B in a 1,000-log sample).
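The four strategies above can be sketched in a few lines of Python. This is a minimal illustration, not a CLS feature or any tool's API: it assumes each log entry is a dict with hypothetical fields `id`, `endpoint`, and `status`, and the function names are made up for this example. The demo reproduces the numbers from the examples: 1,000 of 10,000 at random, every 50th of 50,000 entries (1,000 entries, 2%), all 5xx entries, and a 700/300 split across two endpoints.

```python
import random
from collections import defaultdict


def random_sample(logs, ratio, seed=None):
    """Simple random sampling: pick round(len(logs) * ratio) entries without replacement."""
    rng = random.Random(seed)
    k = round(len(logs) * ratio)
    return rng.sample(logs, k)


def systematic_sample(logs, step):
    """Fixed-interval sampling: keep every `step`-th entry (the step-th, 2*step-th, ...)."""
    return logs[step - 1::step]


def event_sample(logs, predicate):
    """Event-based sampling: keep every entry that matches the predicate."""
    return [log for log in logs if predicate(log)]


def stratified_sample(logs, key, total, seed=None):
    """Proportional stratified sampling: each stratum's share of the sample
    matches its share of all logs. Per-stratum rounding can make the final
    size differ from `total` by a few entries."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for log in logs:
        strata[key(log)].append(log)
    sample = []
    for group in strata.values():
        k = round(total * len(group) / len(logs))
        sample.extend(rng.sample(group, k))
    return sample


if __name__ == "__main__":
    # Hypothetical synthetic logs: 7,000 to API A, 3,000 to API B, every 100th a 503.
    logs = [
        {"id": i, "endpoint": "A" if i < 7000 else "B",
         "status": 503 if i % 100 == 0 else 200}
        for i in range(10_000)
    ]
    print(len(random_sample(logs, 0.10, seed=1)))                        # 1000
    print(len(systematic_sample(list(range(50_000)), 50)))               # 1000
    print(len(event_sample(logs, lambda r: 500 <= r["status"] < 600)))   # 100
    s = stratified_sample(logs, key=lambda r: r["endpoint"], total=1000, seed=1)
    print(sum(r["endpoint"] == "A" for r in s),
          sum(r["endpoint"] == "B" for r in s))                          # 700 300
```

For a live log stream whose total size is not known in advance, a per-entry probability check (or reservoir sampling) would replace `random_sample`, since `random.Random.sample` needs the full list in memory.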

Sampling Ratio:

The ratio depends on the test goals and log volume:

High-volume systems: Lower ratios (e.g., 1–5%) may suffice if logs are abundant and patterns are consistent.
Critical systems: Higher ratios (e.g., 10–20%) or event-based sampling reduce the risk of missing key issues.
Error-focused tests: Near 100% sampling for error logs, as they are rare but critical.
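One way to combine these guidelines is a per-entry decision: always keep error logs and keep everything else with a probability set by the system's profile. The sketch below assumes hypothetical profile names and picks ratios from the ranges above (2% for high-volume, 15% for critical); the `status` field and the 5xx rule for "error" are also assumptions, not settings of any particular logging service.

```python
import random

# Hypothetical ratios chosen from the ranges above; tune them to the test goals.
SAMPLING_RATIOS = {
    "high_volume": 0.02,  # within the 1–5% range
    "critical": 0.15,     # within the 10–20% range
}


def is_error(entry):
    """Treat HTTP 5xx responses as errors (assumes a numeric 'status' field)."""
    return 500 <= entry.get("status", 0) < 600


def should_keep(entry, profile, rng):
    """Keep every error log; keep other logs with the profile's ratio (Bernoulli sampling)."""
    if is_error(entry):
        return True
    return rng.random() < SAMPLING_RATIOS[profile]


if __name__ == "__main__":
    rng = random.Random(42)
    stream = [{"status": 503 if i % 1000 == 0 else 200} for i in range(100_000)]
    kept = [e for e in stream if should_keep(e, "high_volume", rng)]
    errors_kept = sum(is_error(e) for e in kept)
    print(f"kept {len(kept)} of {len(stream)} entries, including {errors_kept} errors")
```

Because each entry is decided independently, the number of sampled non-error logs only approximates the configured ratio, which suits streams whose size is not known up front.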

For cloud stress testing, Tencent Cloud's Log Service (CLS) can efficiently collect, store, and analyze sampled logs, offering real-time monitoring and visualization to identify performance bottlenecks. CLS supports custom sampling rules and integrates with load-testing tools to streamline the process.