Authorization: Bearer {your_access_token}
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
POST https://api.example.com/api/v1/document/parse/submit
Parameter Name | Type | Required | Description |
Authorization | string | Yes | Bearer Token Authentication |
Content-Type | string | Yes | application/json |
Parameter Name | Type | Required | Description | Examples |
FileType | string | Yes | File types supported: PDF, DOCX, DOC, XLSX, XLS, PPTX, PPT, TXT, MD, HTML | "PDF" |
FileName | string | Yes | File name, including extension | "example.pdf" |
FileUrl | string | Yes | Download URL of the file. Must be an accessible HTTPS link; signed temporary URLs are supported (validity period should be at least 1 hour). | "https://example.cos.ap-jakarta.myqcloud.com/public/example/example.pdf" |
{"FileType": "PDF","FileName": "example.pdf","FileUrl": "https://example.cos.ap-jakarta.myqcloud.com/public/example/example.pdf"}
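Before calling the submit endpoint, a client may want to catch obvious mistakes locally. The sketch below builds the request body and checks it against the constraints stated in the table above: a supported FileType, a FileName that includes an extension, and an HTTPS FileUrl. The function name build_submit_payload is hypothetical, and these checks are only a convenience; the service still performs its own validation and may reject a request for other reasons (for example, an expired signed URL).

```python
from urllib.parse import urlparse

# Supported FileType values, as listed in the request parameter table.
SUPPORTED_FILE_TYPES = {"PDF", "DOCX", "DOC", "XLSX", "XLS", "PPTX", "PPT", "TXT", "MD", "HTML"}


def build_submit_payload(file_url: str, file_name: str, file_type: str) -> dict:
    """Build a request body for /parse/submit or /parse/sync_parse.

    Hypothetical client-side checks that mirror the documented constraints.
    """
    file_type = file_type.upper()
    if file_type not in SUPPORTED_FILE_TYPES:
        raise ValueError(f"Unsupported FileType: {file_type}")

    # FileName must include an extension, e.g. "example.pdf".
    stem, dot, extension = file_name.rpartition(".")
    if not dot or not stem or not extension:
        raise ValueError(f"FileName must include an extension: {file_name!r}")

    # FileUrl must be an HTTPS link.
    if urlparse(file_url).scheme != "https":
        raise ValueError("FileUrl must be an HTTPS link")

    return {"FileType": file_type, "FileName": file_name, "FileUrl": file_url}
```

The result can be passed directly as the json= argument of requests.post in the Python examples below.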
Parameter Name | Type | Description |
Response | object | Response object. |
Response.RequestId | string | Request ID for troubleshooting. |
Response.TaskId | string | Task ID for querying parsing results. |
{"Response": {"RequestId": "5e148c27-9c21-43cd-992c-799117bb4216","TaskId": "236e51fd-827b-41cb-b303-56003a817ce5"}}
{"Response": {"Error": {"Code": "InvalidParameter","Message": "FileUrl is required"},"RequestId": "5e148c27-9c21-43cd-992c-799117bb4216"}}
Error Code | HTTP Status Code | Description |
InvalidParameter | 400 | The request parameters are incorrect. |
InvalidFileUrl | 400 | The file URL is invalid or inaccessible. |
UnsupportedFileType | 400 | Unsupported file type. |
Unauthorized | 401 | Unauthorized: the token is missing. |
Forbidden | 403 | The token is invalid or expired. |
FileTooLarge | 413 | The file size exceeds the limit. |
TooManyRequests | 429 | The request rate limit was exceeded. |
InternalError | 500 | Internal service error. |
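Error responses wrap a Code and Message inside Response.Error, alongside the RequestId. A minimal sketch for unwrapping this envelope might look like the following. DocumentParseError, unwrap_response, and the choice to treat TooManyRequests and InternalError as retryable are assumptions of this example, not part of the documented API. Note that the Python examples later on call response.raise_for_status(), which raises on 4xx/5xx before the body is read; to surface the error Code, pass response.text to unwrap_response first.

```python
import json

# Codes this sketch treats as transient; the retry policy is an assumption.
RETRYABLE_CODES = {"TooManyRequests", "InternalError"}


class DocumentParseError(Exception):
    """Hypothetical exception wrapping the documented error envelope."""

    def __init__(self, code: str, message: str, request_id: str):
        super().__init__(f"{code}: {message} (RequestId={request_id})")
        self.code = code
        self.message = message
        self.request_id = request_id

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def unwrap_response(body: str) -> dict:
    """Return the inner Response object, or raise if it carries an Error."""
    response = json.loads(body)["Response"]
    error = response.get("Error")
    if error:
        raise DocumentParseError(error["Code"], error["Message"], response.get("RequestId", ""))
    return response
```

Keeping the RequestId on the exception makes it easy to include in logs or support tickets.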
POST https://api.example.com/api/v1/document/parse/query
Parameter Name | Type | Required | Description |
Authorization | string | Yes | Bearer Token Authentication |
Content-Type | string | Yes | application/json |
Parameter Name | Type | Required | Description | Examples |
TaskId | string | Yes | Task ID returned by the submit endpoint (/parse/submit). | "236e51fd-827b-41cb-b303-56003a817ce5" |
{"TaskId": "236e51fd-827b-41cb-b303-56003a817ce5"}
Parameter Name | Type | Description |
Response | object | Response object. |
Response.RequestId | string | Request ID for troubleshooting. |
Response.Status | string | Task status: Pending (waiting to be processed), Processing (in progress), Success (completed successfully), or Failed (processing failed). |
Response.DocumentRecognizeResultUrl | string | Download URL of the parsed result file (ZIP format). Returned only when Status is Success. |
Response.Progress | integer | Task progress (0-100), returned only when Status is Processing. |
Response.ErrorCode | string | Error code, returned only when Status is Failed. |
Response.ErrorMessage | string | Error message, returned only when Status is Failed. |
{"Response": {"RequestId": "ffe23ed8-2b64-4835-aedc-ca9a5b5a7384","Status": "Success","DocumentRecognizeResultUrl": "https://example.cos.ap-jakarta.myqcloud.com/public/example/example.zip"}}
{"Response": {"RequestId": "ffe23ed8-2b64-4835-aedc-ca9a5b5a7384","Status": "Processing","Progress": 65}}
{"Response": {"RequestId": "ffe23ed8-2b64-4835-aedc-ca9a5b5a7384","Status": "Pending"}}
{"Response": {"RequestId": "ffe23ed8-2b64-4835-aedc-ca9a5b5a7384","Status": "Failed","ErrorCode": "ParseError","ErrorMessage": "Document format corrupted and cannot be parsed"}}
{"Response": {"Error": {"Code": "TaskNotFound","Message": "Task not found"},"RequestId": "ffe23ed8-2b64-4835-aedc-ca9a5b5a7384"}}
Status | Description | Recommended Actions |
Pending | Task has been submitted and is pending processing. | Continue polling the task status. |
Processing | Task in progress. | Continue polling the task status. |
Success | Task completed successfully. | Download the result file. |
Failed | Task failed. | Check ErrorCode and ErrorMessage, fix the cause, then resubmit the task. |
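The status table implies a simple client loop: keep polling while the task is Pending or Processing, and stop on Success or Failed. The sketch below adds exponential backoff and an overall deadline. It assumes a caller-supplied query function that posts to /parse/query and returns the inner Response object; wait_for_result and its interval and timeout defaults are illustrative, not service-recommended values.

```python
import time
from typing import Callable


def wait_for_result(
    query: Callable[[str], dict],
    task_id: str,
    initial_interval: float = 2.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the task leaves Pending/Processing; return the result URL.

    `query(task_id)` must return the inner Response object from /parse/query.
    """
    deadline = time.monotonic() + timeout
    interval = initial_interval
    while True:
        response = query(task_id)
        status = response["Status"]
        if status == "Success":
            return response["DocumentRecognizeResultUrl"]
        if status == "Failed":
            raise RuntimeError(f"{response.get('ErrorCode')}: {response.get('ErrorMessage')}")
        if status not in ("Pending", "Processing"):
            raise RuntimeError(f"Unexpected status: {status}")
        # Give up if the next wait would pass the overall deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"Task {task_id} still {status} after {timeout} s")
        sleep(interval)
        interval = min(interval * 2, max_interval)  # exponential backoff
```

Injecting `sleep` keeps the loop testable without real waiting; in production the default time.sleep is used.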
Error Code | HTTP Status Code | Description |
InvalidParameter | 400 | The request parameters are incorrect. |
Unauthorized | 401 | Unauthorized: the token is missing. |
Forbidden | 403 | The token is invalid or expired. |
TaskNotFound | 404 | The task does not exist. |
InternalError | 500 | Internal service error. |
POST https://api.example.com/api/v1/document/parse/sync_parse
Parameter Name | Type | Required | Description |
Authorization | string | Yes | Bearer Token Authentication |
Content-Type | string | Yes | application/json |
Parameter Name | Type | Required | Description | Examples |
FileType | string | Yes | File types supported: PDF, DOCX, DOC, XLSX, XLS, PPTX, PPT, TXT, MD, HTML. | "PDF" |
FileName | string | Yes | File name, including extension | "example.pdf" |
FileUrl | string | Yes | Download URL of the file. Must be an accessible HTTPS link; signed temporary URLs are supported (validity period should be at least 1 hour). | "https://example.cos.ap-jakarta.myqcloud.com/public/example/example.pdf" |
{"FileType": "PDF","FileName": "example.pdf","FileUrl": "https://example.cos.ap-jakarta.myqcloud.com/public/example/example.pdf"}
Parameter Name | Type | Description |
Response | object | Response object. |
Response.RequestId | string | Request ID for troubleshooting. |
Response.Status | string | Task status: Pending (waiting to be processed), Processing (in progress), Success (completed successfully), or Failed (processing failed). |
Response.DocumentRecognizeResultUrl | string | Download URL of the parsed result file (ZIP format). Returned only when Status is Success. |
Response.Progress | integer | Task progress (0-100), returned only when Status is Processing. |
Response.ErrorCode | string | Error code, returned only when Status is Failed. |
Response.ErrorMessage | string | Error message, returned only when Status is Failed. |
{"Response": {"RequestId": "ffe23ed8-2b64-4835-aedc-ca9a5b5a7384","Status": "Success","DocumentRecognizeResultUrl": "https://example.cos.ap-jakarta.myqcloud.com/public/example/example.zip"}}
{"Response": {"RequestId": "ffe23ed8-2b64-4835-aedc-ca9a5b5a7384","Status": "Failed","ErrorCode": "ParseError","ErrorMessage": "Document format corrupted and cannot be parsed"}}
{"Response": {"Error": {"Code": "InvalidParameter","Message": "FileUrl is required"},"RequestId": "ffe23ed8-2b64-4835-aedc-ca9a5b5a7384"}}
Error Code | HTTP Status Code | Description |
InvalidParameter | 400 | The request parameters are incorrect. |
InvalidFileUrl | 400 | The file URL is invalid or inaccessible. |
UnsupportedFileType | 400 | Unsupported file type. |
Unauthorized | 401 | Unauthorized: the token is missing. |
Forbidden | 403 | The token is invalid or expired. |
RequestTimeout | 408 | The request timed out because parsing took too long. |
FileTooLarge | 413 | The file size exceeds the limit. |
TooManyRequests | 429 | The request rate limit was exceeded. |
InternalError | 500 | Internal service error. |
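Because sync_parse can fail with RequestTimeout (408) on long documents, one option is to fall back to the asynchronous submit/query flow. The sketch below shows that policy with caller-supplied wrappers for the three HTTP calls. ApiError, parse_with_fallback, and the fallback policy itself are hypothetical; whether the service keeps processing a document after a sync timeout is not documented here, so the fallback may repeat parsing work.

```python
from typing import Callable


class ApiError(Exception):
    """Hypothetical error carrying the documented error Code, e.g. "RequestTimeout"."""

    def __init__(self, code: str, message: str = ""):
        super().__init__(f"{code}: {message}")
        self.code = code


def parse_with_fallback(
    sync_parse: Callable[[dict], str],
    submit: Callable[[dict], str],
    wait: Callable[[str], str],
    payload: dict,
) -> str:
    """Try /parse/sync_parse first; on RequestTimeout, fall back to submit + query.

    sync_parse(payload) -> result URL
    submit(payload)     -> TaskId
    wait(task_id)       -> result URL (e.g. a polling loop)
    """
    try:
        return sync_parse(payload)
    except ApiError as err:
        # Only a server-side timeout triggers the fallback; other errors propagate.
        if err.code != "RequestTimeout":
            raise
    task_id = submit(payload)
    return wait(task_id)
```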
DocumentRecognizeResultUrl returns a download URL for a ZIP archive with the following structure:

```
76aef68b-c444-41d2-829a-d513fa35e42b.zip                        # ZIP file downloaded from DocumentRecognizeResultUrl
├── 76aef68b-c444-41d2-829a-d513fa35e42b/                       # Subdirectory (named with the task ID)
│   ├── 76aef68b-c444-41d2-829a-d513fa35e42b_parse_page0.json   # Parsing result of page 1
│   ├── 76aef68b-c444-41d2-829a-d513fa35e42b_parse_page1.json   # Parsing result of page 2
│   ├── ...                                                     # Parsing results of further pages
│   └── images/                                                 # Directory for extracted images
│       ├── 76c7b6051d432f6527bd91a02321d126-image.png          # Image file (named with a UUID)
│       └── ...
```
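Page files use a zero-based index in their names (_parse_page0.json holds page 1). When reading pages back, sorting by the numeric index avoids the lexicographic trap where page10 sorts before page2. A minimal sketch using only the standard library, assuming the naming pattern shown in the tree above; load_pages is a hypothetical helper:

```python
import json
import re
import zipfile
from typing import BinaryIO, Union

# Matches the page-file naming shown above, e.g. "<task_id>_parse_page0.json".
PAGE_FILE = re.compile(r"_parse_page(\d+)\.json$")


def load_pages(zip_source: Union[str, BinaryIO]) -> list:
    """Return per-page JSON objects from a result ZIP, ordered by page index.

    zip_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        indexed = []
        for name in archive.namelist():
            match = PAGE_FILE.search(name)
            if match:
                indexed.append((int(match.group(1)), name))
        # Numeric sort: page2 comes before page10.
        indexed.sort()
        return [json.loads(archive.read(name)) for _, name in indexed]
```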
Each page's parsing result is stored in its own JSON file (`*_parse_page{N}.json`, where N starts from 0), containing the complete recognition information for that page.

Field Name | Type | Description |
PageNumber | integer | The page number, starting from 1 |
Angle | integer | Page rotation angle (°) |
RotatedAngle | integer | Current rotation angle (°) |
Height | integer | Page height (px) |
Width | integer | Page width (px) |
OriginHeight | integer | Original page height (px) |
OriginWidth | integer | Original page width (px) |
Elements | array | Page element list, see below for details. |
Field Name | Type | Description |
Index | integer | The index of the element in the page, starting from 0. |
Type | string | Element type: title, text, figure, table, or figure_text (see the element type table below). |
Text | string | Element text content |
Level | integer | Element level: 0 indicates top-level elements, 1 indicates nested elements |
Polygon | object | Element position coordinates (quadrilateral), see below for details. |
InsetImageName | string | Embedded image name; an empty string if there is none. |
Elements | array | Nested child elements with the same structure (recursive); null if there are none. |
ImagePath | string | Image file path (relative to the ZIP root directory); an empty string if there is none. |
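Because Elements can nest (a figure may contain figure_text children) and leaf elements carry "Elements": null, a recursive walk is the simplest way to visit every element on a page. A sketch, with hypothetical helper names iter_elements and image_paths:

```python
from typing import Iterator, Optional, Tuple


def iter_elements(elements: Optional[list], depth: int = 0) -> Iterator[Tuple[int, dict]]:
    """Yield (depth, element) pairs depth-first, descending into nested Elements.

    A null Elements field (None after json.loads) is treated as an empty list.
    """
    for element in elements or []:
        yield depth, element
        yield from iter_elements(element.get("Elements"), depth + 1)


def image_paths(page: dict) -> list:
    """Collect the non-empty ImagePath values on a page (e.g. from figure elements)."""
    return [e["ImagePath"] for _, e in iter_elements(page.get("Elements")) if e.get("ImagePath")]
```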
Field Name | Type | Description |
LeftTop | object | Top-left corner coordinates {"X": int, "Y": int} |
RightTop | object | Top-right corner coordinates {"X": int, "Y": int} |
LeftBottom | object | Bottom-left corner coordinates {"X": int, "Y": int} |
RightBottom | object | Bottom-right corner coordinates {"X": int, "Y": int} |
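Polygon gives four corners rather than a rectangle, so rotated regions can be represented. To get an axis-aligned box, take the minimum and maximum of the corner coordinates. The sketch below also normalizes the box to the 0 to 1 range; it assumes the coordinates are in the same pixel space as the page's Width and Height (not OriginWidth and OriginHeight), which this document does not state explicitly. The function names are hypothetical.

```python
CORNERS = ("LeftTop", "RightTop", "LeftBottom", "RightBottom")


def bounding_box(polygon: dict) -> tuple:
    """Return (x_min, y_min, x_max, y_max) enclosing the four Polygon corners."""
    xs = [polygon[c]["X"] for c in CORNERS]
    ys = [polygon[c]["Y"] for c in CORNERS]
    return min(xs), min(ys), max(xs), max(ys)


def normalized_box(polygon: dict, page_width: int, page_height: int) -> tuple:
    """Scale the bounding box by the page Width/Height (assumed same pixel space)."""
    x0, y0, x1, y1 = bounding_box(polygon)
    return x0 / page_width, y0 / page_height, x1 / page_width, y1 / page_height
```

Using min/max rather than reading LeftTop and RightBottom directly keeps the result correct even when the quadrilateral is rotated.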
Type | Description | Characteristic |
title | Title | Typically a section heading in the document. |
text | Plain text | Paragraphs and body text. |
figure | Figure | Visual content such as images and charts; may contain nested figure_text elements. |
figure_text | Text within a figure | Text recognized inside a figure. |
table | Table | Structured tabular data |
{"PageNumber": 1,"Angle": 0,"RotatedAngle": 0,"Height": 286,"Width": 736,"OriginHeight": 286,"OriginWidth": 736,"Elements": [{"Index": 0,"Type": "title","Text": "# Data Scale","Level": 0,"Polygon": {"LeftTop": {"X": 3, "Y": 98},"RightTop": {"X": 25, "Y": 98},"LeftBottom": {"X": 3, "Y": 169},"RightBottom": {"X": 25, "Y": 169}},"InsetImageName": "","Elements": null,"ImagePath": ""},{"Index": 1,"Type": "figure","Text": "","Level": 0,"Polygon": {"LeftTop": {"X": 41, "Y": 4},"RightTop": {"X": 733, "Y": 4},"LeftBottom": {"X": 41, "Y": 286},"RightBottom": {"X": 733, "Y": 286}},"InsetImageName": "","Elements": [{"Index": 0,"Type": "figure_text","Text": "10 000 000\n1 000 000\n100 000\n10 000\n1 000","Level": 1,"Polygon": {"LeftTop": {"X": 41, "Y": 4},"RightTop": {"X": 733, "Y": 4},"LeftBottom": {"X": 41, "Y": 286},"RightBottom": {"X": 733, "Y": 286}},"InsetImageName": "","Elements": null,"ImagePath": ""}],"ImagePath": "images/76c7b6051d432f6527bd91a02321d126-image.png"}]}
{"PageNumber": 2,"Angle": 0,"RotatedAngle": 0,"Height": 842,"Width": 595,"OriginHeight": 842,"OriginWidth": 595,"Elements": [{"Index": 0,"Type": "text","Text": "This is a plain text used to describe the main information of the document.","Level": 0,"Polygon": {"LeftTop": {"X": 50, "Y": 100},"RightTop": {"X": 545, "Y": 100},"LeftBottom": {"X": 50, "Y": 130},"RightBottom": {"X": 545, "Y": 130}},"InsetImageName": "","Elements": null,"ImagePath": ""},{"Index": 1,"Type": "table","Text": "Name\tAge\tPosition\nZhang San\t28\tEngineer\nLi Si\t32\tManager","Level": 0,"Polygon": {"LeftTop": {"X": 50, "Y": 200},"RightTop": {"X": 545, "Y": 200},"LeftBottom": {"X": 50, "Y": 350},"RightBottom": {"X": 545, "Y": 350}},"InsetImageName": "","Elements": null,"ImagePath": ""}]}
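In the page 2 example, a table element's Text encodes cells separated by tabs and rows separated by newlines. Assuming that layout holds generally (it is only inferred from this one example), a table can be turned into rows or CSV as follows; table_rows and table_to_csv are hypothetical helpers:

```python
import csv
import io


def table_rows(table_text: str) -> list:
    """Split a table element's Text into rows of cells.

    Assumes tab-separated cells and newline-separated rows, as in the example.
    """
    return [line.split("\t") for line in table_text.split("\n") if line]


def table_to_csv(table_text: str) -> str:
    """Render the table as CSV text (quoting cells only when needed)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table_rows(table_text))
    return buffer.getvalue()
```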
```python
import time

import requests

# Configuration
API_BASE_URL = "https://api.example.com"
BEARER_TOKEN = "your_access_token"

headers = {
    "Authorization": f"Bearer {BEARER_TOKEN}",
    "Content-Type": "application/json",
}


# 1. Submit a parsing task
def submit_parse_task(file_url, file_name, file_type):
    url = f"{API_BASE_URL}/api/v1/document/parse/submit"
    payload = {"FileType": file_type, "FileName": file_name, "FileUrl": file_url}
    response = requests.post(url, headers=headers, json=payload, timeout=30)
    response.raise_for_status()
    result = response.json()
    return result["Response"]["TaskId"]


# 2. Poll for the parsing result
def query_parse_result(task_id, max_retries=60, interval=5):
    url = f"{API_BASE_URL}/api/v1/document/parse/query"
    payload = {"TaskId": task_id}
    for i in range(max_retries):
        response = requests.post(url, headers=headers, json=payload, timeout=30)
        response.raise_for_status()
        result = response.json()["Response"]
        status = result["Status"]
        print(f"[{i + 1}/{max_retries}] Task status: {status}")
        if status == "Success":
            return result["DocumentRecognizeResultUrl"]
        elif status == "Failed":
            raise Exception(f"Parse failed: {result.get('ErrorMessage')}")
        # Pending or Processing: wait before querying again
        time.sleep(interval)
    raise Exception("Query timeout")


# 3. Usage example
try:
    # Submit the task
    task_id = submit_parse_task(
        file_url="https://example.com/document.pdf",
        file_name="document.pdf",
        file_type="PDF",
    )
    print(f"Task submitted: {task_id}")

    # Poll for the result
    result_url = query_parse_result(task_id)
    print(f"Parse completed: {result_url}")

    # Download the results
    # download_and_extract(result_url)
except Exception as e:
    print(f"Error: {e}")
```
Operating System: Ubuntu 24.04.3 LTS / x86_64
Runtime Version: Python 3.11.1
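The Python examples leave download_and_extract as a commented-out call. One possible implementation, using only the standard library, is sketched below; the function, its default destination directory, and the 60-second timeout are assumptions, not part of the API. It streams the ZIP to a temporary file before extracting it, so large archives are not held in memory.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(result_url: str, dest_dir: str = "parse_result") -> Path:
    """Download the ZIP from DocumentRecognizeResultUrl and extract it into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryFile() as tmp:
        # Stream the download into a temporary file.
        with urllib.request.urlopen(result_url, timeout=60) as response:
            shutil.copyfileobj(response, tmp)
        tmp.seek(0)
        with zipfile.ZipFile(tmp) as archive:
            archive.extractall(dest)
    return dest
```

After extraction, the page files sit under a subdirectory named with the task ID, as shown in the ZIP structure above.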
```python
import requests

# Configuration
API_BASE_URL = "https://api.example.com"
BEARER_TOKEN = "your_access_token"

headers = {
    "Authorization": f"Bearer {BEARER_TOKEN}",
    "Content-Type": "application/json",
}


# Synchronous parsing: the result URL is returned in a single request
def sync_parse_document(file_url, file_name, file_type):
    url = f"{API_BASE_URL}/api/v1/document/parse/sync_parse"
    payload = {"FileType": file_type, "FileName": file_name, "FileUrl": file_url}
    # Set a long client-side timeout (5 minutes recommended)
    response = requests.post(url, headers=headers, json=payload, timeout=300)
    response.raise_for_status()
    result = response.json()["Response"]
    if result["Status"] == "Success":
        return result["DocumentRecognizeResultUrl"]
    elif result["Status"] == "Failed":
        raise Exception(f"Parse failed: {result.get('ErrorMessage', 'Unknown error')}")
    else:
        raise Exception(f"Unexpected status: {result['Status']}")


# Usage example
try:
    result_url = sync_parse_document(
        file_url="https://example.com/document.pdf",
        file_name="document.pdf",
        file_type="PDF",
    )
    print(f"Parse completed: {result_url}")

    # Download the results
    # download_and_extract(result_url)
except requests.exceptions.Timeout:
    print("Error: Request timed out (the file may be too large or complex)")
except Exception as e:
    print(f"Error: {e}")
```
Operating System: Ubuntu 24.04.3 LTS / x86_64
Runtime Version: Python 3.11.1
```bash
curl -X POST https://api.example.com/api/v1/document/parse/submit \
  -H "Authorization: Bearer your_access_token" \
  -H "Content-Type: application/json" \
  -d '{"FileType": "PDF","FileName": "example.pdf","FileUrl": "https://qidian-qbot-1251316161.cos.ap-jakarta.myqcloud.com/public/example/example.pdf"}'
```

```bash
curl -X POST https://api.example.com/api/v1/document/parse/query \
  -H "Authorization: Bearer your_access_token" \
  -H "Content-Type: application/json" \
  -d '{"TaskId": "236e51fd-827b-41cb-b303-56003a817ce5"}'
```

```bash
curl -X POST https://api.example.com/api/v1/document/parse/sync_parse \
  -H "Authorization: Bearer your_access_token" \
  -H "Content-Type: application/json" \
  --max-time 300 \
  -d '{"FileType": "PDF","FileName": "example.pdf","FileUrl": "https://qidian-qbot-1251316161.cos.ap-jakarta.myqcloud.com/public/example/example.pdf"}'
```