Statistical Description of Sampled Data

Last updated: 2026-01-14 17:00:49
The CDN data analysis feature helps you understand traffic patterns by examining large volumes of log data. To keep queries both accurate and fast on large datasets, data analysis uses sampling-based statistics.

What is sampling-based statistics?

In data analysis, sampling refers to selecting a representative subset from the entire dataset for analysis, in order to extract valuable information. For example, when conducting a social survey, researchers cannot survey every single person; therefore, they select a portion of the population as a representative sample, using the responses from this sample to reflect the tendencies of the entire population.

Which metrics are sampled?

CDN uses dynamic sampling to adapt to the differing log volumes of different users, keeping data analysis accurate and efficient. For data analysis queries such as TOP URLs, TOP 100 client IPs, TOP 100 Referers, and TOP User Agents, statistics are computed from sampled data when the domain's QPS falls in one of the following ranges:
QPS is in the range [10,000, 100,000), and the sampling rate is 10%
QPS is in the range [100,000, 1,000,000), and the sampling rate is 1%
QPS is in the range [1,000,000, +∞), and the sampling rate is 0.1%
The sampling strategy evaluates QPS over 5-minute intervals. If the QPS for an interval falls in one of the ranges above, sampling is applied to that interval; otherwise, no sampling occurs. For example:
If the domain's QPS (queries per second) reaches 10,000 in the 5-minute log data from 00:01 to 00:05, 10% sampling is applied: 10% of the log entries from that 5-minute interval are used for calculation.
If the domain's QPS reaches 100,000 in the 5-minute log data from 00:06 to 00:10, 1% sampling is applied: 1% of the log entries from that 5-minute interval are used for calculation.
If the domain's QPS is 5,000 in the 5-minute log data from 00:11 to 00:15, no sampling is applied, and the calculation is based on all request logs.
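The tier lookup described above can be sketched in a few lines of Python. This is only an illustration of the published ranges, not the CDN's implementation: the names `SAMPLING_TIERS`, `sampling_rate_for_qps`, and `window_qps` are hypothetical, and computing QPS as the average over the 5-minute interval is an assumption, since the page does not say how QPS is derived.

```python
# Tiers from the ranges above: (lower bound inclusive, upper bound exclusive, sampling rate).
SAMPLING_TIERS = [
    (10_000, 100_000, 0.10),
    (100_000, 1_000_000, 0.01),
    (1_000_000, float("inf"), 0.001),
]

WINDOW_SECONDS = 5 * 60  # QPS is evaluated per 5-minute interval


def window_qps(request_count: int) -> float:
    """Assumption: QPS is the average request rate over the 5-minute interval."""
    return request_count / WINDOW_SECONDS


def sampling_rate_for_qps(qps: float) -> float:
    """Return the fraction of log entries used; 1.0 means no sampling."""
    for lower, upper, rate in SAMPLING_TIERS:
        if lower <= qps < upper:
            return rate
    return 1.0  # below 10,000 QPS: all request logs are used


# The three intervals from the example above.
for qps in (10_000, 100_000, 5_000):
    print(f"QPS {qps}: fraction of logs used = {sampling_rate_for_qps(qps)}")
```

Because the rate is chosen per interval, a domain whose traffic crosses a range boundary can be sampled at different rates in consecutive 5-minute intervals, as in the example.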
Note:
The CDN continuously optimizes and adjusts its sampling strategy based on the scale of platform log data and users' actual needs. If you have any questions about the data analysis query results, please feel free to contact us.

How do I get statistics based on the full dataset?

If your business requires in-depth analysis of all log data, we recommend using the CDN Real-time Logs feature. Real-time Logs delivers detailed, complete log data to a log analysis system that you designate (such as Tencent Cloud CLS), so that you can perform fine-grained processing on the full dataset. This gives you more accurate analysis results in scenarios that require higher data precision.

Explanation of Data Representativeness

The CDN assigns a unique identifier (Request ID) to each request log. The sampling system samples on this identifier, which keeps the selection of log entries random. Our tests show that when the feature you are analyzing makes up a high percentage of the overall data, sampling analysis gives fast and accurate results. However, when the feature makes up only a small percentage of the overall data, the results of sampling analysis may be skewed because the sample is small.
For example, suppose you have a dataset of 10,000 log entries covering three URL paths A, B, and C, with 7,000 (70%), 2,900 (29%), and 100 (1%) entries respectively. In the ideal scenario, after 10% sampling, the samples for URL paths A, B, and C would contain 700, 290, and 10 entries. Because the sample for URL C is so small, estimates of its overall count based on the sample are much less accurate. In this case, the results of your drill-down analysis on URL C may not meet expectations.
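To put rough numbers on the example above, the following sketch estimates how much each URL's scaled-up count could vary. It assumes every log entry is kept independently with probability equal to the sampling rate (Bernoulli sampling) and that the sampled count is divided by the rate to estimate the total; both are modeling assumptions for illustration, not a description of the CDN's internals. The function name `relative_std_error` is hypothetical.

```python
import math


def relative_std_error(true_count: int, rate: float) -> float:
    """Relative standard error of the estimate (sampled_count / rate).

    Under independent Bernoulli sampling the sampled count is
    Binomial(true_count, rate), so the estimate has standard deviation
    sqrt(true_count * (1 - rate) / rate); dividing by true_count gives
    sqrt((1 - rate) / (true_count * rate)).
    """
    return math.sqrt((1 - rate) / (true_count * rate))


counts = {"A": 7000, "B": 2900, "C": 100}  # distribution from the example above
rate = 0.10

for url, n in counts.items():
    expected_sampled = n * rate
    rse = relative_std_error(n, rate)
    print(f"URL {url}: expected sampled entries = {expected_sampled:.0f}, "
          f"relative standard error = {rse:.1%}")
```

Under these assumptions the relative standard error is about 3.6% for URL A, 5.6% for URL B, and 30% for URL C, which is why drill-down results for a rare feature such as URL C can differ noticeably from the true values.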

