Statistical Description of Sampled Data

Last updated: 2026-01-14 17:00:49

The data analysis feature of CDN helps you understand traffic patterns by examining large volumes of log data. To keep queries both accurate and fast on large datasets, the feature uses sampling-based statistics.
What is sampled data statistics?
In data analysis, sampling refers to selecting a representative subset from the entire dataset for analysis, in order to extract valuable information. For example, when conducting a social survey, researchers cannot survey every single person; therefore, they select a portion of the population as a representative sample, using the responses from this sample to reflect the tendencies of the entire population.
Which metrics are sampled?
CDN uses dynamic sampling to adapt to the different log volumes of different users while keeping data analysis accurate and efficient. For data analysis queries such as TOP URLs, TOP 100 client IPs, TOP 100 Referers, and TOP User Agents, sampling is applied when the domain's QPS (queries per second) falls in one of the following ranges:
QPS is in the range [10,000, 100,000), and the sampling rate is 10%
QPS is in the range [100,000, 1,000,000), and the sampling rate is 1%
QPS is in the range [1,000,000, +∞), and the sampling rate is 0.1%
The sampling strategy evaluates QPS over 5-minute windows of log data. If the QPS for a window falls in one of the ranges above, that window is sampled; otherwise, it is not. For example:
If the domain's QPS reaches 10,000 in the 5-minute log data from 00:01 to 00:05, 10% sampling is applied: 10% of the log entries in that 5-minute window are used for calculation.
If the domain's QPS reaches 100,000 in the 5-minute log data from 00:06 to 00:10, 1% sampling is applied: 1% of the log entries in that 5-minute window are used for calculation.
If the domain's QPS is 5,000 in the 5-minute log data from 00:11 to 00:15, no sampling is applied, and the calculation is based on all request logs.
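The tier selection above can be sketched in a few lines of Python. This is only an illustration of the published thresholds, not the CDN's actual implementation. The function names `sampling_rate` and `window_qps` are hypothetical, and computing QPS as the average request rate over the 300-second window is an assumption, since the source only says that QPS is determined from 5-minute data.

```python
WINDOW_SECONDS = 300  # one 5-minute evaluation window

# (inclusive lower QPS bound, sampling rate), highest tier first.
TIERS = [
    (1_000_000, 0.001),  # [1,000,000, +inf) -> 0.1%
    (100_000, 0.01),     # [100,000, 1,000,000) -> 1%
    (10_000, 0.1),       # [10,000, 100,000) -> 10%
]


def sampling_rate(qps: float) -> float:
    """Return the fraction of log entries used for a window with this QPS."""
    for lower_bound, rate in TIERS:
        if qps >= lower_bound:
            return rate
    return 1.0  # below 10,000 QPS: no sampling, all logs are used


def window_qps(request_count: int, window_seconds: int = WINDOW_SECONDS) -> float:
    """Average QPS over one window (assumed definition, see lead-in)."""
    return request_count / window_seconds


if __name__ == "__main__":
    for qps in (5_000, 10_000, 100_000, 1_000_000):
        print(f"QPS {qps:>9,}: sampling rate {sampling_rate(qps):.1%}")
```

Because the lower bound of each range is inclusive, a window at exactly 100,000 QPS moves to the 1% tier rather than staying at 10%.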
Note:
The CDN continuously optimizes and adjusts its sampling strategy based on the scale of platform log data and users' actual needs. If you have any questions about the data analysis query results, please feel free to contact us.
How do I get statistics based on full data?
If your business requires in-depth analysis of all log data, we recommend the CDN Real-time Logs feature. Real-time Logs delivers detailed, complete log data to the log analysis system you designate (such as Tencent Cloud CLS), so you can run fine-grained processing on the complete dataset. This gives more accurate results in scenarios that demand higher data precision.
Explanation of Data Representativeness
The CDN assigns a unique identifier (Request ID) to each request log. The sampling system samples on this identifier, which keeps the selection random. Our tests show that when the feature you are analyzing accounts for a large share of the overall data, sampling analysis gives fast and accurate results. When the feature accounts for only a small share of the overall data, however, the results may be skewed because too few matching entries end up in the sample.
For example, suppose a dataset has 10,000 log entries across three URL paths A, B, and C, with 7,000 (70%), 2,900 (29%), and 100 (1%) entries respectively. In the ideal case, 10% sampling yields 700, 290, and 10 entries for A, B, and C. Because the sample for URL C is so small, any estimate of its true total based on the sample is much less accurate. In this case, a drill-down analysis on URL C may not meet expectations.
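To put a rough number on that loss of accuracy, the sketch below computes the relative standard error of a scaled-up count. It assumes each log entry is kept independently with probability equal to the sampling rate (Bernoulli sampling), which is a simplifying model and not a description of the CDN's sampler. Under that model the sampled count k follows Binomial(n, p), the estimate of the true count is k / p, and its relative standard error is sqrt((1 - p) / (n * p)). The function names are hypothetical.

```python
import math


def scale_up(sampled_count: int, rate: float) -> float:
    """Estimate the full-dataset count from the count seen in the sample."""
    return sampled_count / rate


def relative_standard_error(true_count: int, rate: float) -> float:
    """Relative standard error of the scaled-up estimate.

    Assumes independent Bernoulli sampling of each log entry:
    RSE = sqrt((1 - rate) / (true_count * rate)).
    """
    return math.sqrt((1 - rate) / (true_count * rate))


if __name__ == "__main__":
    rate = 0.1  # 10% sampling, as in the example above
    for name, count in {"A": 7000, "B": 2900, "C": 100}.items():
        rse = relative_standard_error(count, rate)
        print(f"URL {name}: expected sample size {count * rate:.0f}, "
              f"relative standard error about {rse:.1%}")
```

Under these assumptions the relative standard error is roughly 3.6% for URL A and 5.6% for URL B, but about 30% for URL C. With only about 10 sampled entries, the estimate for C can easily land well away from the true value of 100.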
