tencent cloud

Tencent Cloud TI Platform

Related Agreement
개인 정보 보호 정책
데이터 처리 및 보안 계약
문서Tencent Cloud TI Platform

Automated Evaluation: Quick Configuration File Specifications for Metrics

포커스 모드
폰트 크기
마지막 업데이트 시간: 2026-01-23 17:02:05
You can quickly configure metrics by uploading a file on the automated evaluation configuration page. The file must include the evaluation set, the corresponding metric names for the evaluation set, and the detailed configuration information corresponding to each metric name (such as judge model information, scoring prompts, and preprocessing and post-processing scripts). You can click Quick Configuration to upload a custom YAML configuration file and any files that need to be referenced. Once the upload is completed, click Apply. The platform will automatically populate the configuration fields on the page by matching the evaluation set name​ that you specify with the metrics​ and their configuration information defined in the YAML configuration file.


Requirements for the YAML Configuration File Structure

The YAML file must clearly define the evaluation rules for the evaluation set. Examples of core fields are as follows:
- data_name: your_dataset_name # Data set name. This field is associated with the name entered on the evaluation set configuration page.
metrics: # Configuration for one or more metrics included in this data set.
# name: metric name.
- name: judge model scoring.
# The pipeline field is used to define the scoring process, where each element in this array corresponds to a processing node in sequence.
pipeline:
# The type field is set to PREPROCESS, which indicates that this node is a preprocessing node.
- type: PREPROCESS
# filename specifies the name of the Python script on which preprocessing or postprocessing depends.
# This field should be filled with the relative path of the corresponding file to the root directory within the compressed package.
filename: scripts/preprocessor.py
# file_content: For preprocessing or postprocessing steps, you can directly enter the script content in this field.
file_content: |-
def preprocess(data, resp, **kwargs) -> bool | int | str | float | None:
pass
# type is JUDGE_MODEL, which indicates that this node is a scoring node based on a judge model.
- type: JUDGE_MODEL
# Configure the judge model information by specifying the judge_model field.
judge_model:
name: DeepSeek-R1 # Name of a judge model.
# Method for calling a judge model:
# If MS is entered, it indicates using the Online Services module of Tencent Cloud TI-ONE Platform (TI-ONE).
# If a URL is entered, it indicates using a third-party URL.
source: MS
# ti_model_service_api: This field is required if the Online Services module of TI-ONE is used.
ti_model_service_api:
# service_group_id: ID of an online service, for example, ms-45mrs4rv.
service_group_id: ms-xxxx
# service_group_name: Name of an online service.
service_group_name: "DeepSeek-R1 judge model"
# service_id: ID of an online service. You can choose Online Service Details > Service Management to view the ID of a version, for example, ms-45mrs4rv-1.
# If this field is left unspecified, service_group_id with the suffix -1 will be generated by default.
service_id: ms-xxxx-1
# url_prefix: address of Regular Service Calling. You can choose Online Service Details > Service Call > Regular Service Calling to view it.
url_prefix: http://ms-xxxx-uuuu-sw.gw.ap-region.ti.tencentcs.com/ms-xxxx
# path_suffix: path for the calling chat API exposed by an online service.
path_suffix: /v1/chat/completions
# auth_token: authentication token. You can choose Service Authentication > Online Service Details to view it.
auth_token: "aaaaaaaa"
# third_party_api: If URL is specified for calling, this field is required.
third_party_api:
# url: calling URL.
url: http://ms-xxxx-uuuu-sw.gw.ap-region.ti.tencentcs.com/ms-xxxx/v1/chat/completions
# authorization_header: authentication HTTP header.
authorization_header:
# key HTTP Header Key
key: authorization
# value: HTTP header content.
value: your_token
# generation_params: parameters for calling a judge model.
# Note: According to restrictions in Tencent Cloud API specifications, this field should be a string.
# Namely, add `|-` at the end of this field to limit the field type to a string.
# During subsequent parsing, we will convert this field into one of the parameters in the judge model request body.
generation_params: |-
temperature: 0.8
top_p: 0.85
# judge_template_filename: scoring template file of a judge model.
judge_template_filename: template.jinja
# judge_template_content: If no file is specified, you can use this field to specify the scoring template content of a judge model.
judge_template_content: |-
You are a judge. Please rate the answers, with 5 as the highest score and 1 as the lowest score.
[Question]
{{ data.question }}
[Answer from the to-be-evaluated model]
{{ response.content }}
[Reference answer]
{{ data.ref_answer }}
Now give your score.
# type is POSTPROCESS, which indicates that this node is a postprocessing node.
- type: POSTPROCESS
filename: scripts/postprocessor.py
- data_name: another_dataset_name # You can configure multiple data sets in a YAML configuration file.
metrics:
- name: judge model scoring.
pipeline:
- type: PREPROCESS
filename: preprocessor-at-root.py
- type: JUDGE_MODEL
filename: judge.yaml # You can also configure a judge model separately through a YAML file.
- type: POSTPROCESS
filename: postprocessor-at-root.py

File Upload Specifications

Other files referenced in the YAML configuration file can be uploaded through the frontend. You can select files one by one to upload or compress the files into a ZIP package to upload.
File download examples:


도움말 및 지원

문제 해결에 도움이 되었나요?

피드백