Kebijakan Privasi
Perjanjian Pemrosesan dan Keamanan Data
Parameter Name | Required | Parameter Description |
messages | Required | Main content of an evaluation chat, including system (system settings), role (assistant or user), and content. |
ref_answer | Optional | Reference answer of a model. |
Other Fields | Optional | User-defined fields. |
{"messages": [{"role": "system","content": "Intelligent Assistant is an LLM developed by xxx. xxx is a Chinese technology company that has been dedicated to research related to LLMs."},{"role": "user","content": "Hello!"},{"role": "assistant","content": "Hello! Is there anything I can do for you"},{"role": "user","content": "What is the sum of 1 and 1"}],"ref_answer": "The answer is 2","max_tokens": 4096,"extra_content": "a user-defined field. It can be referenced in a judge model"}
{"messages": [{"role": "system","content": "ht is an LLM developed by xxx. xxx is a Chinese technology company that has been dedicated to researches related to LLMs."},{"role": "user","content": "Hello!"},{"role": "assistant","content": "Hello! Is there anything I can do for you"},{"role": "user","content": "What is the sum of 1 and 1"},{"role": "assistant","content": "2"}],"ref_answer": "The answer is 2","extra_content": "a user-defined field. It can be referenced in a judge model"}
{"messages": [{"role": "user","content": "What is the sum of 1 and 1"}],"ref_answer": "The answer is 2","extra_content": "a user-defined field. It can be referenced in a judge model"}
{"system": "You are helpful.", "conversation": [{"prompt": "712+165+223+711=","response": "1811"}]}
{"conversation": [{"prompt": "712+165+223+711=","response": "1811"}]}
Parameter Name | Required | Parameter Description |
messages | Required | Main content of an evaluation chat, including system (system settings), role (assistant or user), and content. |
model_outputs | Required | Inference result of a model. |
ref_answer | Optional | Reference answer of a model. |
Other Fields | Optional | User-defined fields. |
{"messages":[{"role": "user","content": "Hello!"},{"role": "assistant","content": "Hello! Is there anything I can do for you"},{"role": "user","content": "What is the sum of 16 and 16"}],"ref_answer": "The answer is 32","id": "141234314","max_tokens": 1024,"temperature": 0.7,"top_p": 0.9,"top_k": 50,"model_outputs": // Above are the fields in the original data set.// This field is used to record the outputs of each model. The type is a list, and each object in the list contains the following:// model_name: name of the model.// responses: output of the model.[{"model_name": "llama3","responses":[{"content": "The answer is 32."},{"content": "32 is the answer."},]},{"model_name": "qwen3","responses":[{"content": "The answer is 32.","reasoning_content": "Wait..."},{"content": "The answer is 32.","reasoning_content": "Thinking..."}]}]}
Key Name | Type | Required | Description |
model_name | String | Yes | Unique identifier of the model to be tested, for example, llama3-8b. Note: In the entire data set, the same name should be used for the same model so that the report can correctly summarize all the performance data of the model. |
responses | Array | Yes | An array containing one or more inference results of the model. It supports scenarios where n is greater than 1. That is, inference is performed multiple times for the same input prompt. |
Key Name | Type | Required | Description |
content | String | Yes | Final result generated by the model to be tested. It is referenced during scoring based on a judge model. |
reasoning_content | String | No | Used to store the thinking process, chain of thought, or any intermediate drafts before the model to be tested generates the final content. This content is provided to help with deeper analysis. If the model does not output this content, this key can be omitted. |
// Line 1: contains the inference results of two models, where qwen2 has reasoning_content.{"messages": [{"role": "user", "content": "What is the sum 16 and 16"}], "ref_answer": "The answer is 32", "id": "141234314", "model_outputs": [{"model_name": "llama3", "responses": [{"content": "The answer is 32."}]}, {"model_name": "qwen2", "responses": [{"content": "The answer is 32.", "reasoning_content": "The user is asking a simple math question. 16 + 16 equals 32."}]}]}// Second line: contains two different inference results of a model (n=2).{"messages": [{"role": "user", "content": "Write a poem praising programmers"}], "ref_answer": "Code is like poetry, expressing the beauty of logic and shaping a world using your fingertips.", "id": "141234315", "model_outputs": [{"model_name": "llama3", "responses": [{"content": "Ten lines of code can change the world."}, {"content": "Fingers dance and code likes a song."}]}]}
Apakah halaman ini membantu?
Anda juga dapat Menghubungi Penjualan atau Mengirimkan Tiket untuk meminta bantuan.
masukan