Tencent Cloud Enhanced OpenTelemetry Probe Self-Protection Mechanism

Last updated: 2025-10-13 19:10:51
The Tencent Cloud Enhanced OpenTelemetry Java agent implements two self-protection mechanisms: circuit breaking (fuse) and flow control. When application load is very high, the agent downgrades instrumentation and data reporting to reduce its impact on normal business operations in extreme cases. When application load returns to a normal level, the agent automatically restores instrumentation and data reporting.

Circuit Breaker Protection Mechanism

Fuse Triggering Logic

The probe decides whether to trigger circuit breaking based on the application's memory usage and CPU utilization, which are computed as follows:
Memory usage = Used heap memory / Maximum heap memory
CPU utilization = Total CPU time used by the JVM in the statistical period / (Length of the statistical period * Number of CPU cores)
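The two formulas above can be sketched in Java with the standard JMX beans. This is a minimal illustration, not the agent's actual implementation: the class name `ResourceUsageSketch`, the 1-second statistical period, and the use of `com.sun.management.OperatingSystemMXBean` (available on HotSpot-compatible JVMs) are assumptions made for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Illustrative only: mirrors the two formulas, not the agent's real code. */
final class ResourceUsageSketch {

    /** Memory usage = used heap memory / maximum heap memory, as a fraction. */
    static double memoryUsage(long usedHeapBytes, long maxHeapBytes) {
        if (maxHeapBytes <= 0) {
            return 0.0; // MemoryUsage.getMax() returns -1 when the maximum is undefined
        }
        return (double) usedHeapBytes / maxHeapBytes;
    }

    /** CPU utilization = JVM CPU time in the period / (period length * number of cores). */
    static double cpuUtilization(long cpuTimeNanos, long periodNanos, int cores) {
        if (periodNanos <= 0 || cores <= 0) {
            return 0.0;
        }
        return (double) cpuTimeNanos / ((double) periodNanos * cores);
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: a HotSpot-compatible JVM that exposes the process CPU time.
        com.sun.management.OperatingSystemMXBean os =
                ManagementFactory.getPlatformMXBean(com.sun.management.OperatingSystemMXBean.class);
        int cores = Runtime.getRuntime().availableProcessors();

        long cpuStart = os.getProcessCpuTime(); // nanoseconds of CPU time used by this JVM
        long wallStart = System.nanoTime();
        Thread.sleep(1000); // hypothetical 1-second statistical period
        long cpuUsed = os.getProcessCpuTime() - cpuStart;
        long period = System.nanoTime() - wallStart;

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("memory usage: %.1f%%%n",
                100 * memoryUsage(heap.getUsed(), heap.getMax()));
        System.out.printf("CPU utilization: %.1f%%%n",
                100 * cpuUtilization(cpuUsed, period, cores));
    }
}
```

Dividing by the number of cores keeps the result in the 0 to 100% range even when the JVM keeps several cores busy, which is what lets a single percentage threshold apply to machines of different sizes.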

Fuse Triggering Behavior

When resource utilization exceeds the preset threshold, the probe limits data reporting in one of the following ways:
Drop part of the link data: Randomly drop 50% of the link data. This action is no longer supported in agent version 2.3-20241031 and later.
Completely disable data reporting: Stop reporting link data entirely.

Default Fuse Threshold

The following table lists the default fuse thresholds and fuse actions for each agent version:
| Judgment Rule | Agent Version | Drop Partial Link Data: Trigger Threshold | Drop Partial Link Data: Recovery Threshold | Completely Disable Data Reporting: Trigger Threshold | Completely Disable Data Reporting: Recovery Threshold |
| --- | --- | --- | --- | --- | --- |
| Memory Utilization | Before 2.3-20241031 | 65% | 60% | 75% | 70% |
| Memory Utilization | 2.3-20241031 to 2.11-20250704 | - | - | 90% | 85% |
| Memory Utilization | 2.11-20250825 and later | - | - | 90% | - |
| CPU Utilization | Before 2.3-20241031 | 80% | 75% | 90% | 85% |
| CPU Utilization | 2.3-20241031 to 2.11-20250704 | - | - | 90% | 85% |
| CPU Utilization | 2.11-20250825 and later | - | - | 90% | - |
Taking CPU utilization as an example, the fuse protection mechanism behaves as follows:
Probe Version before 2.3-20241031: When the application's CPU utilization reaches 80%, the probe will introduce a sampling mechanism and randomly drop 50% of link data. If the CPU utilization subsequently drops to 75% or below, the probe will restore normal data reporting. When the application's CPU utilization reaches 90%, the probe will completely disable data reporting.
Probe Version between 2.3-20241031 and 2.11-20250704: When the application's CPU utilization reaches 90%, the probe will completely disable data reporting. If the CPU utilization subsequently drops to 85% or below, the probe will restore normal data reporting.
Probe Version 2.11-20250825 and later: When the application's CPU utilization reaches 90%, the probe will completely disable data reporting. If the CPU utilization subsequently drops below 90%, the probe will restore normal data reporting.
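The trigger and recovery behavior described above for probe versions between 2.3-20241031 and 2.11-20250704 amounts to a two-threshold switch with hysteresis. The following Java sketch shows that idea under stated assumptions: the class `FuseSketch` and its `update` method are hypothetical names, and the agent's real sampling and evaluation logic is not documented here.

```java
/** Illustrative only: a two-threshold (hysteresis) switch, not the agent's code. */
final class FuseSketch {
    private final double triggerPercent;
    private final double recoveryPercent;
    private boolean reportingDisabled = false;

    FuseSketch(double triggerPercent, double recoveryPercent) {
        if (recoveryPercent >= triggerPercent) {
            throw new IllegalArgumentException("recovery threshold must be below trigger threshold");
        }
        this.triggerPercent = triggerPercent;
        this.recoveryPercent = recoveryPercent;
    }

    /** Feeds one utilization sample and returns true while reporting is disabled. */
    boolean update(double utilizationPercent) {
        if (!reportingDisabled && utilizationPercent >= triggerPercent) {
            reportingDisabled = true;   // utilization reached the trigger threshold
        } else if (reportingDisabled && utilizationPercent <= recoveryPercent) {
            reportingDisabled = false;  // utilization dropped to the recovery threshold or below
        }
        return reportingDisabled;
    }

    public static void main(String[] args) {
        FuseSketch cpuFuse = new FuseSketch(90, 85); // default CPU thresholds from the table
        double[] samples = {70, 91, 88, 85, 89};
        for (double sample : samples) {
            System.out.println(sample + "% -> reporting disabled: " + cpuFuse.update(sample));
        }
    }
}
```

The gap between the two thresholds keeps reporting from flapping on and off when utilization hovers around 90%. Versions 2.11-20250825 and later drop the recovery threshold, so this sketch does not model them.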
Note:
Starting from version 2.3-20241031, the probe has optimized instrumentation and data reporting to enhance performance while no longer providing the protection mechanism of dropping partial link data.
Starting from version 2.11-20250825, the probe has simplified the fuse protection mechanism by removing the recovery threshold and supporting fuse threshold configuration in the console.
The new version of the agent can improve data reporting success rate and reduce performance overhead. It is recommended to upgrade the probe version as soon as possible.

Custom Fuse Threshold

Console Mode (Recommended)

Note:
Console mode is available for Tencent Cloud enhanced OpenTelemetry probe version 2.11-20250825 and later.
Log in to the APM console and go to System Configuration > Application Configuration > Probe Configuration, where you can dynamically modify the fuse protection thresholds. To disable the memory-based or CPU utilization-based protection mechanism, set the corresponding threshold to 100.

JVM Startup Parameters and Environment Variable Method

Note:
The JVM startup parameter and environment variable method is available for Tencent Cloud enhanced OpenTelemetry probe version 2.3-20241031 and later.
Agent versions 2.11-20250825 and later no longer support the recovery threshold.
You can customize the thresholds through JVM startup parameters or environment variables. The following table lists the related parameters.
| Setting Method | Judgment Rule | Disable Data Reporting - Trigger Threshold | Disable Data Reporting - Recovery Threshold |
| --- | --- | --- | --- |
| JVM Startup Parameters | Memory Utilization | disable.reporting.on.memory.percentage | recover.reporting.on.memory.percentage |
| JVM Startup Parameters | CPU Utilization | disable.reporting.on.cpu.percentage | recover.reporting.on.cpu.percentage |
| Environment Variable | Memory Utilization | DISABLE_REPORTING_ON_MEMORY_PERCENTAGE | RECOVER_REPORTING_ON_MEMORY_PERCENTAGE |
| Environment Variable | CPU Utilization | DISABLE_REPORTING_ON_CPU_PERCENTAGE | RECOVER_REPORTING_ON_CPU_PERCENTAGE |
Taking the memory usage-based trigger threshold as an example, if you want to set it to 95%, you can add -Ddisable.reporting.on.memory.percentage=95 in the JVM startup parameters. The complete Java launch command is as follows:
java -javaagent:/path/to/opentelemetry-javaagent.jar \
-Dotel.resource.attributes=service.name=myService,token=myToken \
-Dotel.exporter.otlp.endpoint=http://pl-demo.ap-guangzhou.apm.tencentcs.com:4317 \
-Ddisable.reporting.on.memory.percentage=95 \
-jar SpringCloudApplication.jar
You can also configure the threshold through an environment variable, for example:
export DISABLE_REPORTING_ON_MEMORY_PERCENTAGE=95
To disable the fuse protection mechanism based on memory usage or CPU utilization, set the trigger threshold to 0 or 100. For example, add -Ddisable.reporting.on.memory.percentage=100 in the JVM startup parameters.
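In the parameter tables of this document, each environment variable name is the JVM parameter name in upper case with dots replaced by underscores. The Java sketch below illustrates that mapping and one plausible way to resolve a threshold. The class `ThresholdConfigSketch`, the lookup order (JVM parameter first, then environment variable, then default), and the fallback on an unparsable value are assumptions for illustration, not documented agent behavior.

```java
import java.util.Locale;
import java.util.function.Function;

/** Illustrative only: the precedence and fallback rules here are assumptions. */
final class ThresholdConfigSketch {

    /** Maps a JVM parameter name to the matching environment variable name. */
    static String toEnvName(String propertyName) {
        return propertyName.toUpperCase(Locale.ROOT).replace('.', '_');
    }

    /** Resolves a percentage: JVM parameter, then environment variable, then the default. */
    static int resolvePercent(String propertyName,
                              Function<String, String> properties,
                              Function<String, String> environment,
                              int defaultValue) {
        String raw = properties.apply(propertyName);
        if (raw == null) {
            raw = environment.apply(toEnvName(propertyName));
        }
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return defaultValue; // assumed fallback for values that are not integers
        }
    }

    public static void main(String[] args) {
        // 90 is the default trigger threshold listed for recent agent versions.
        int memoryTrigger = resolvePercent("disable.reporting.on.memory.percentage",
                System::getProperty, System::getenv, 90);
        System.out.println("memory trigger threshold: " + memoryTrigger + "%");
    }
}
```

Running the sketch with `-Ddisable.reporting.on.memory.percentage=95`, or with `DISABLE_REPORTING_ON_MEMORY_PERCENTAGE=95` exported, prints a threshold of 95%.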

Traffic Control Protection Mechanism

The traffic control protection mechanism dynamically controls the rate of data collection and reporting so that the monitoring system's resource consumption stays within a controllable range. The following table lists the default flow control thresholds:
| Configuration Name | Default Value | Description |
| --- | --- | --- |
| Maximum span creation count per second | 5000 | Spans beyond the configured number per second are not created. |
| Send queue length | 10240 | If span data cannot be reported to the APM server promptly due to network issues, pending spans accumulate in the queue. Spans that the queue cannot hold are discarded. |
| Maximum number of spans per batch | 2048 | The maximum number of spans the probe reports to the APM server in each batch. |
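The three limits in the table can be pictured as a per-second creation cap, a bounded send queue, and a capped batch size. The Java sketch below shows one simple way such limits could interact. The class `FlowControlSketch`, the fixed one-second window, and the use of strings as stand-ins for spans are assumptions for illustration. The agent's actual algorithm is not described in this document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Illustrative only: a fixed-window limiter plus a bounded queue, not the agent's code. */
final class FlowControlSketch {
    private final int maxSpansPerSecond;
    private final BlockingQueue<String> sendQueue;
    private long currentWindow = -1;
    private int createdInWindow = 0;

    FlowControlSketch(int maxSpansPerSecond, int queueLength) {
        this.maxSpansPerSecond = maxSpansPerSecond;
        this.sendQueue = new ArrayBlockingQueue<>(queueLength);
    }

    /** Returns true if another span may be created in the second containing nowMillis. */
    synchronized boolean tryCreateSpan(long nowMillis) {
        long window = nowMillis / 1000;
        if (window != currentWindow) {
            currentWindow = window; // a new second started, so reset the counter
            createdInWindow = 0;
        }
        if (createdInWindow >= maxSpansPerSecond) {
            return false; // over the per-second cap: the span is not created
        }
        createdInWindow++;
        return true;
    }

    /** Queues a finished span; returns false (span discarded) when the queue is full. */
    boolean enqueue(String span) {
        return sendQueue.offer(span);
    }

    /** Removes up to maxBatchSize queued spans for one report to the server. */
    List<String> nextBatch(int maxBatchSize) {
        List<String> batch = new ArrayList<>();
        sendQueue.drainTo(batch, maxBatchSize);
        return batch;
    }

    public static void main(String[] args) {
        FlowControlSketch control = new FlowControlSketch(5000, 10240); // defaults from the table
        int accepted = 0;
        long now = System.currentTimeMillis();
        for (int i = 0; i < 6000; i++) {
            if (control.tryCreateSpan(now) && control.enqueue("span-" + i)) {
                accepted++;
            }
        }
        System.out.println("accepted spans: " + accepted);
        System.out.println("first batch size: " + control.nextBatch(2048).size());
    }
}
```

Using a non-blocking `offer` means a slow or unreachable server costs dropped spans rather than blocked application threads, which matches the goal of protecting the business workload first.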
You can customize the flow control thresholds through JVM startup parameters or environment variables. The following table lists the parameter for each setting method.
| Setting Method | Rule | Parameter Configuration |
| --- | --- | --- |
| JVM Startup Parameters | Maximum span creation count per second | max.span.per.second |
| JVM Startup Parameters | Send queue length | otel.bsp.max.queue.size |
| JVM Startup Parameters | Maximum number of spans per batch | otel.bsp.max.export.batch.size |
| Environment Variable | Maximum span creation count per second | MAX_SPAN_PER_SECOND |
| Environment Variable | Send queue length | OTEL_BSP_MAX_QUEUE_SIZE |
| Environment Variable | Maximum number of spans per batch | OTEL_BSP_MAX_EXPORT_BATCH_SIZE |
Note:
Customizing the flow control thresholds may lead to data loss or increase probe overhead. Keep the default values where possible.


