Tencent Cloud Enhanced OpenTelemetry Probe Self-Protection Mechanism
Last updated: 2025-10-13 19:10:51
The Tencent Cloud Enhanced OpenTelemetry Java Agent implements two self-protection mechanisms: circuit breaking (fuse protection) and flow control (traffic control). When application load is very high, the agent degrades instrumentation and data reporting so that, in extreme cases, it has less impact on normal business operations. When application load returns to a normal level, the agent automatically restores instrumentation and data reporting.

Circuit Breaker Protection Mechanism

Fuse Triggering Logic

The probe determines whether to trigger circuit breaking based on the memory usage and CPU utilization of the application. The computation logic is as follows:
Memory usage = Used heap memory / Maximum heap memory
CPU utilization = Total CPU time used by the JVM in the statistical period / (Length of the statistical period * Number of CPU cores)
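
As a rough illustration of the two formulas above, the following Java sketch computes both ratios from the standard JVM management beans. It is not the probe's actual implementation: the class name `ResourceUsage`, the 200 ms statistical period, and the use of `getProcessCpuTime()` as the source of JVM CPU time are assumptions made for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;

/** Illustrative only: mirrors the two formulas above, not the probe's code. */
final class ResourceUsage {

    /** Memory usage = used heap / maximum heap, as a fraction between 0 and 1. */
    static double memoryUsage(long usedHeapBytes, long maxHeapBytes) {
        if (maxHeapBytes <= 0) {
            return 0.0; // getMax() returns -1 when the maximum heap is undefined
        }
        return (double) usedHeapBytes / maxHeapBytes;
    }

    /** CPU utilization = JVM CPU time in the period / (period length * CPU cores). */
    static double cpuUtilization(long cpuTimeNanos, long periodNanos, int cpuCores) {
        if (periodNanos <= 0 || cpuCores <= 0) {
            return 0.0;
        }
        return (double) cpuTimeNanos / ((double) periodNanos * cpuCores);
    }

    public static void main(String[] args) throws InterruptedException {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("memory usage: %.2f%n", memoryUsage(heap.getUsed(), heap.getMax()));

        // getProcessCpuTime() lives on the com.sun.management extension of the bean,
        // which HotSpot-based JVMs provide; other JVMs may not.
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            com.sun.management.OperatingSystemMXBean sunOs =
                    (com.sun.management.OperatingSystemMXBean) os;
            long cpuStart = sunOs.getProcessCpuTime();
            long wallStart = System.nanoTime();
            Thread.sleep(200); // assumed statistical period for this example
            long cpuUsed = sunOs.getProcessCpuTime() - cpuStart;
            long period = System.nanoTime() - wallStart;
            int cores = Runtime.getRuntime().availableProcessors();
            System.out.printf("cpu utilization: %.2f%n", cpuUtilization(cpuUsed, period, cores));
        }
    }
}
```

Dividing by the number of cores normalizes the result, so a JVM that keeps two of four cores fully busy for the whole period reports 50% rather than 200%.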

Fuse Triggering Behavior

When memory usage or CPU utilization exceeds the preset threshold, the probe limits data reporting in one of the following ways:
Drop part of the link data: Randomly drop 50% of the link data (no longer supported in agent version 2.3-20241031 and later).
Completely disable data reporting: Disable link data reporting completely.

Default Fuse Threshold

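The default fuse thresholds and fuse actions depend on the agent version, as shown in the following table. A "-" means that the action or threshold does not apply to that version.

| Judgment Rule | Agent Version | Drop Partial Link Data: Trigger Threshold | Drop Partial Link Data: Recovery Threshold | Completely Disable Data Reporting: Trigger Threshold | Completely Disable Data Reporting: Recovery Threshold |
| --- | --- | --- | --- | --- | --- |
| Memory Utilization | Probe version before 2.3-20241031 | 65% | 60% | 75% | 70% |
| Memory Utilization | Probe version between 2.3-20241031 and 2.11-20250704 | - | - | 90% | 85% |
| Memory Utilization | Probe version 2.11-20250825 and later | - | - | 90% | - |
| CPU Utilization | Probe version before 2.3-20241031 | 80% | 75% | 90% | 85% |
| CPU Utilization | Probe version between 2.3-20241031 and 2.11-20250704 | - | - | 90% | 85% |
| CPU Utilization | Probe version 2.11-20250825 and later | - | - | 90% | - |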
Taking CPU utilization as an example, the fuse protection mechanism behaves as follows:
Probe Version before 2.3-20241031: When the application's CPU utilization reaches 80%, the probe will introduce a sampling mechanism and randomly drop 50% of link data. If the CPU utilization subsequently drops to 75% or below, the probe will restore normal data reporting. When the application's CPU utilization reaches 90%, the probe will completely disable data reporting.
Probe Version between 2.3-20241031 and 2.11-20250704: When the application's CPU utilization reaches 90%, the probe will completely disable data reporting. If the CPU utilization subsequently drops to 85% or below, the probe will restore normal data reporting.
Probe Version 2.11-20250825 and later: When the application's CPU utilization reaches 90%, the probe will completely disable data reporting. If the CPU utilization subsequently drops below 90%, the probe will restore normal data reporting.
Note:
Starting from version 2.3-20241031, the probe optimizes instrumentation and data reporting for better performance and no longer drops partial link data as a protection mechanism.
Starting from version 2.11-20250825, the probe simplifies the fuse protection mechanism: the recovery threshold is removed, and the fuse threshold can be configured in the console.
Newer agent versions improve the data reporting success rate and reduce performance overhead. We recommend upgrading the probe as soon as possible.
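
The trigger and recovery behavior described above can be pictured as a small state machine with hysteresis. The Java sketch below is a minimal illustration, not the probe's code: the class name `ReportingBreaker`, the one-sample-at-a-time `update` method, and the exact comparison operators (taken from the wording "reaches" and "drops to ... or below") are assumptions.

```java
/** Illustrative two-threshold circuit breaker; not the probe's implementation. */
final class ReportingBreaker {
    private final double triggerPercent;
    private final double recoveryPercent;
    private boolean reportingDisabled;

    ReportingBreaker(double triggerPercent, double recoveryPercent) {
        if (recoveryPercent > triggerPercent) {
            throw new IllegalArgumentException("recovery threshold must not exceed trigger threshold");
        }
        this.triggerPercent = triggerPercent;
        this.recoveryPercent = recoveryPercent;
    }

    /** Feeds one utilization sample (0 to 100) and returns whether reporting is disabled. */
    boolean update(double utilizationPercent) {
        if (utilizationPercent >= triggerPercent) {
            reportingDisabled = true;   // utilization reached the trigger threshold
        } else if (utilizationPercent <= recoveryPercent) {
            reportingDisabled = false;  // utilization dropped to the recovery threshold or below
        }
        // Between the two thresholds the previous state is kept (hysteresis).
        return reportingDisabled;
    }

    public static void main(String[] args) {
        ReportingBreaker cpuBreaker = new ReportingBreaker(90, 85);
        double[] samples = {80, 92, 88, 85, 88};
        for (double sample : samples) {
            System.out.println(sample + "% -> reporting disabled: " + cpuBreaker.update(sample));
        }
    }
}
```

With a trigger of 90 and a recovery of 85, a sample of 88% leaves the state unchanged, which avoids rapid toggling around a single value. Passing the same value for both arguments models the single-threshold behavior of version 2.11-20250825 and later, where reporting resumes as soon as utilization drops below the trigger threshold.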

Custom Fuse Threshold

Console Mode (Recommended)

Note:
Console mode is available for Tencent Cloud enhanced OpenTelemetry probe version 2.11-20250825 and later.
Log in to the APM console and go to System Configuration > Application Configuration > Probe Configuration, where you can modify the fuse protection thresholds dynamically. To disable the memory-based or CPU utilization-based protection mechanism, set its threshold to 100.

JVM Startup Parameters and Environment Variable Method

Note:
This method is available for Tencent Cloud enhanced OpenTelemetry probe version 2.3-20241031 and later.
Agent versions 2.11-20250825 and later no longer support the recovery threshold.
Customize the threshold through JVM startup parameters or environment variables. See the following table for related parameters.
| Setting Method | Judgment Rule | Disable Data Reporting - Trigger Threshold | Disable Data Reporting - Recovery Threshold |
| --- | --- | --- | --- |
| JVM Startup Parameters | Memory Utilization | disable.reporting.on.memory.percentage | recover.reporting.on.memory.percentage |
| JVM Startup Parameters | CPU Utilization | disable.reporting.on.cpu.percentage | recover.reporting.on.cpu.percentage |
| Environment Variable | Memory Utilization | DISABLE_REPORTING_ON_MEMORY_PERCENTAGE | RECOVER_REPORTING_ON_MEMORY_PERCENTAGE |
| Environment Variable | CPU Utilization | DISABLE_REPORTING_ON_CPU_PERCENTAGE | RECOVER_REPORTING_ON_CPU_PERCENTAGE |
Taking the memory usage-based trigger threshold as an example, if you want to set it to 95%, you can add -Ddisable.reporting.on.memory.percentage=95 in the JVM startup parameters. The complete Java launch command is as follows:
java -javaagent:/path/to/opentelemetry-javaagent.jar \
-Dotel.resource.attributes=service.name=myService,token=myToken \
-Dotel.exporter.otlp.endpoint=http://pl-demo.ap-guangzhou.apm.tencentcs.com:4317 \
-Ddisable.reporting.on.memory.percentage=95 \
-jar SpringCloudApplication.jar
Alternatively, set the equivalent environment variable before starting the application:
export DISABLE_REPORTING_ON_MEMORY_PERCENTAGE=95
To disable the fuse protection mechanism based on memory usage or CPU utilization, set the trigger threshold to 0 or 100. For example, add -Ddisable.reporting.on.memory.percentage=100 in the JVM startup parameters.
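
To make the relationship between the two setting methods concrete, the following Java sketch reads the memory trigger threshold from either source. It is only an illustration: the class `FuseThresholdConfig`, the precedence of the JVM parameter over the environment variable, the fallback to the default on a malformed value, and the default of 90 (taken from the table for version 2.3-20241031 and later) are assumptions, not documented probe behavior.

```java
import java.util.Map;
import java.util.Properties;

/** Illustrative lookup of a fuse threshold; not the probe's implementation. */
final class FuseThresholdConfig {
    static final String MEMORY_PROPERTY = "disable.reporting.on.memory.percentage";
    static final String MEMORY_ENV = "DISABLE_REPORTING_ON_MEMORY_PERCENTAGE";

    /**
     * Resolves a threshold, preferring the JVM startup parameter over the
     * environment variable (assumed precedence) and falling back to the default.
     */
    static int resolve(Properties props, Map<String, String> env,
                       String propertyName, String envName, int defaultPercent) {
        String raw = props.getProperty(propertyName);
        if (raw == null) {
            raw = env.get(envName);
        }
        if (raw == null) {
            return defaultPercent;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return defaultPercent; // assumed: malformed values are ignored
        }
    }

    /** Per the text above, a trigger threshold of 0 or 100 disables the mechanism. */
    static boolean isProtectionDisabled(int triggerPercent) {
        return triggerPercent == 0 || triggerPercent == 100;
    }

    public static void main(String[] args) {
        int memory = resolve(System.getProperties(), System.getenv(),
                MEMORY_PROPERTY, MEMORY_ENV, 90);
        System.out.println("memory trigger threshold: " + memory
                + "%, protection disabled: " + isProtectionDisabled(memory));
    }
}
```

Running the sketch with `-Ddisable.reporting.on.memory.percentage=95` or with `DISABLE_REPORTING_ON_MEMORY_PERCENTAGE=95` exported resolves the threshold to 95 in both cases.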

Traffic Control Protection Mechanism

The traffic control protection mechanism dynamically limits the rate of data collection and reporting so that the monitoring system's resource consumption always stays within a controllable range. The default flow control thresholds are listed in the following table:
| Configuration Name | Default Value | Description |
| --- | --- | --- |
| Maximum span creation count per second | 5000 | Spans beyond this per-second limit are not created. |
| Send queue length | 10240 | If span data cannot be reported to the APM server promptly because of network issues, pending spans accumulate in the queue. Spans that the queue cannot hold are discarded. |
| Maximum number of spans per batch | 2048 | The maximum number of spans the probe reports to the APM server in each batch. |
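
One simple way to picture the per-second span creation limit is a fixed-window counter, sketched below in Java. The probe's real limiter is not described on this page, so the class `SpanRateLimiter`, the fixed one-second window, and the explicit time argument (used to keep the example deterministic) are assumptions.

```java
/** Illustrative fixed-window limiter for span creation; not the probe's code. */
final class SpanRateLimiter {
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    private final int maxSpansPerSecond;
    private long currentSecond = Long.MIN_VALUE;
    private int createdInCurrentSecond;

    SpanRateLimiter(int maxSpansPerSecond) {
        this.maxSpansPerSecond = maxSpansPerSecond;
    }

    /** Returns true if a span may be created at the given time in nanoseconds. */
    synchronized boolean tryAcquire(long nowNanos) {
        long second = Math.floorDiv(nowNanos, NANOS_PER_SECOND);
        if (second != currentSecond) {
            currentSecond = second;       // a new one-second window starts
            createdInCurrentSecond = 0;
        }
        if (createdInCurrentSecond >= maxSpansPerSecond) {
            return false;                 // over the limit: the span is not created
        }
        createdInCurrentSecond++;
        return true;
    }

    public static void main(String[] args) {
        SpanRateLimiter limiter = new SpanRateLimiter(5000); // default from the table above
        long now = System.nanoTime();
        int accepted = 0;
        for (int i = 0; i < 6000; i++) {
            if (limiter.tryAcquire(now)) {
                accepted++;
            }
        }
        System.out.println("accepted within one second: " + accepted);
    }
}
```

A fixed window is the simplest choice but allows short bursts across a window boundary; a token bucket would smooth these out at the cost of slightly more state.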
You can customize the flow control thresholds through JVM startup parameters or environment variables. See the following table for the configuration method.
| Setting Method | Rule | Parameter Configuration |
| --- | --- | --- |
| JVM Startup Parameters | Maximum span creation count per second | max.span.per.second |
| JVM Startup Parameters | Send queue length | otel.bsp.max.queue.size |
| JVM Startup Parameters | Maximum number of spans per batch | otel.bsp.max.export.batch.size |
| Environment Variable | Maximum span creation count per second | MAX_SPAN_PER_SECOND |
| Environment Variable | Send queue length | OTEL_BSP_MAX_QUEUE_SIZE |
| Environment Variable | Maximum number of spans per batch | OTEL_BSP_MAX_EXPORT_BATCH_SIZE |
Note:
Customizing the flow control thresholds may cause data loss or increase probe overhead. Keep the default values whenever possible.
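
The interaction between the send queue length and the batch size can be sketched as a bounded queue that drops spans when full and is drained in fixed-size batches. The Java example below is a simplified model under stated assumptions: the class `BoundedSpanQueue` and its method names are hypothetical, and the real batching logic in the probe also involves timers and export retries that are omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative bounded send queue with batch draining; not the probe's code. */
final class BoundedSpanQueue<T> {
    private final BlockingQueue<T> queue;
    private final int maxBatchSize;
    private final AtomicLong dropped = new AtomicLong();

    BoundedSpanQueue(int maxQueueSize, int maxBatchSize) {
        this.queue = new ArrayBlockingQueue<>(maxQueueSize);
        this.maxBatchSize = maxBatchSize;
    }

    /** Enqueues a finished span; returns false and counts a drop if the queue is full. */
    boolean offer(T span) {
        boolean accepted = queue.offer(span); // non-blocking: never stalls the application thread
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    /** Removes up to maxBatchSize spans, in arrival order, for one export request. */
    List<T> nextBatch() {
        List<T> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatchSize);
        return batch;
    }

    long droppedCount() {
        return dropped.get();
    }

    public static void main(String[] args) {
        // Defaults from the table above: queue length 10240, batch size 2048.
        BoundedSpanQueue<String> spans = new BoundedSpanQueue<>(10240, 2048);
        for (int i = 0; i < 12000; i++) {
            spans.offer("span-" + i);
        }
        System.out.println("dropped: " + spans.droppedCount());
        System.out.println("first batch size: " + spans.nextBatch().size());
    }
}
```

In this model a larger queue tolerates longer network stalls at the cost of more heap, while a larger batch sends more spans per request, which is one way the overhead mentioned in the note above can arise.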

