In recent years, large language models (LLMs) have developed rapidly. With the rise of DeepSeek, the barrier to building AI applications has fallen further, bringing efficient, low-cost solutions to many industries and driving an explosion of AI adoption.
Running an LLM application involves multiple components and complex interactions. Application Performance Management (APM) uses distributed tracing to clearly show how a request travels between components, so that when a failure occurs, the specific issue can be located quickly. APM also monitors the running state of model applications in real time, detects exceptions promptly, and raises alarms so that O&M personnel can act before problems escalate.
Supported LLM Components and Frameworks
Tencent Cloud's self-developed Python probe provides automatic instrumentation for both common LLM frameworks and traditional (non-LLM) Python frameworks. It is compatible with the OpenTelemetry protocol standard, so it interoperates with other applications that use OpenTelemetry, and it supports the following components and frameworks.
Note:
Requires Python 3.9 or later.
| Category | Components and frameworks |
| --- | --- |
| LLM components and frameworks | OpenAI SDK (openai >= 0.27.0): the official API wrapper provided by OpenAI, for directly calling any large model compatible with the OpenAI standard.<br>Ollama (ollama >= 0.4.0): a lightweight inference framework for running and managing open-source large models locally.<br>LangChain / LangGraph (langchain-core > 0.1.0): a workflow framework for building and orchestrating LLM applications, supporting complex chains and state management.<br>LlamaIndex (llama-index >= 0.7.0, llama-index-core >= 0.7.0): a RAG framework focused on connecting LLMs to external data, providing retrieval and indexing capabilities. |
| Traditional Python components and frameworks | |
Access Process
Note:
If the service is deployed in a TKE cluster and the cluster has the Tencent Cloud APM operator installed, manually add the environment variable OTEL_EXPORTER_METRICS_TEMPORALITY_PREFERENCE with the value true in the workload.
Getting Access Point and Token
1. Log in to the Tencent Cloud console.
2. In the left menu bar, select APM, then click Application list > Access application.
3. In the Access application drawer that pops up on the right, click the Python language.
4. On the Access Python application page, select the Region and Business System.
5. Select OpenTelemetry as Access protocol type.
6. Select a Reporting method through which you want to report data, and obtain your Access Point and Token.
Note:
Report over private network: this method requires your service to run in a Tencent Cloud VPC. Direct connection through the VPC avoids the security risks of public network communication and saves reporting traffic costs.
Report over public network: if your service is deployed on-premises or in a non-Tencent Cloud VPC, you can report data this way. However, it carries the security risks of public network communication and incurs reporting traffic fees.
Installing pip Packages
Install Tencent Cloud's self-developed probe, together with the related OpenTelemetry SDK dependencies, via pip:
pip install tapm-distro opentelemetry-exporter-otlp==1.34.1
tapm-bootstrap -a install
Command Line Method Reporting
Prefix the start command with tapm-instrument to complete instrumentation at startup. Assuming the original project start command was python app.py, execute the following command to start the Python application.
tapm-instrument --traces_exporter otlp \
    --metrics_exporter otlp \
    --logs_exporter none \
    --service_name <service_name> \
    --resource_attributes "token=<token>,host.name=<host.name>" \
    --exporter_otlp_endpoint <endpoint> \
    python app.py
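Because the probe follows the OpenTelemetry protocol standard, the same settings can usually also be supplied through standard OpenTelemetry environment variables instead of command-line flags. This is a sketch using the standard OTel variable names; confirm that your probe version honors them before relying on it:

```shell
# Standard OpenTelemetry environment variables (values are placeholders;
# replace them as described in the field descriptions below)
export OTEL_SERVICE_NAME="<service_name>"
export OTEL_RESOURCE_ATTRIBUTES="token=<token>,host.name=<host.name>"
export OTEL_EXPORTER_OTLP_ENDPOINT="<endpoint>"
export OTEL_TRACES_EXPORTER="otlp"
export OTEL_METRICS_EXPORTER="otlp"
export OTEL_LOGS_EXPORTER="none"

tapm-instrument python app.py
```

Environment variables are convenient in containerized deployments, where they can be set on the workload rather than baked into the start command.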
The fields are described below; replace them with your actual values.
<service_name>: the application name. Multiple application processes connecting with the same service name are displayed as multiple instances under the same application in APM. The name can contain up to 63 characters, may only contain lowercase letters, digits, and the hyphen "-", and must start with a lowercase letter and end with a digit or lowercase letter.
<token>: the business system token obtained in the preceding steps.
<host.name>: the hostname of this instance, which uniquely identifies the application instance. It can usually be set to the instance's IP address.
<endpoint>: the access point obtained in the preceding steps.
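The <service_name> naming rules above can be checked with a small shell helper before starting the application. This is a sketch; the regular expression is our own encoding of the stated constraints, not part of the probe:

```shell
# Validate a candidate service name: up to 63 characters; only lowercase
# letters, digits, and "-"; starts with a lowercase letter; ends with a
# digit or lowercase letter.
valid_service_name() {
  printf '%s' "$1" | grep -Eq '^[a-z]([a-z0-9-]{0,61}[a-z0-9])?$'
}

valid_service_name "llm-demo-1" && echo "llm-demo-1: ok"
valid_service_name "My_App" || echo "My_App: invalid"
```

Running the helper before startup avoids a rejected or silently misnamed application in the APM console.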
Access Verification
After completing the access steps, start the LLM application, and it will report trace data to APM. The connected application will be displayed on the Application Performance Monitoring > LLM Observability > Application list page. Because observability data is processed with some latency, if the application or instance does not appear in the console right after connecting, wait about 30 seconds.