ADP Document Parsing Protocol
Last updated:2026-02-06 15:30:35
Note:
This document defines the document parsing protocol specified by ADP and is intended to guide users in encapsulating their own services. For the end-to-end process of accessing document parsing services in ADP, refer to the ADP Document Parsing Service Access Guide.

Version information

Version number: v1.0
Update date: 2025-12-23
Protocol: HTTPS
Request method: POST
Data format: JSON
Character encoding: UTF-8

Authentication method

Bearer Token Authentication

All API requests must carry a Bearer Token in the HTTP Header for authentication.
Header format:
Authorization: Bearer {your_access_token}
Example:
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...

API List

1. Submit a Document Parsing Task

API Description

Submit a document parsing task that supports various document formats such as PDF, Word, Excel, and PPT. After task submission, a TaskId will be returned for subsequent querying of parsing results.

API address

POST https://api.example.com/api/v1/document/parse/submit

Request Headers (Headers)

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | Yes | Bearer Token Authentication |
| Content-Type | string | Yes | application/json |

Request Parameters (Request Body)

| Parameter Name | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| FileType | string | Yes | File type. Supported values: PDF, DOCX, DOC, XLSX, XLS, PPTX, PPT, TXT, MD, HTML. | "PDF" |
| FileName | string | Yes | File name, including the extension. | "example.pdf" |
| FileUrl | string | Yes | File download address. Must be an accessible HTTPS link; signed temporary URLs are supported (validity period should be ≥ 1 hour). | "https://example.cos.ap-jakarta.myqcloud.com/public/example/example.pdf" |

Request Example

{
  "FileType": "PDF",
  "FileName": "example.pdf",
  "FileUrl": "https://example.cos.ap-jakarta.myqcloud.com/public/example/example.pdf"
}
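
A request body like the one above can be assembled and checked on the client before it is sent. The following is a minimal sketch, not part of the protocol: `build_submit_payload` and `SUPPORTED_FILE_TYPES` are hypothetical names, and deriving FileType from the file extension is an assumption (the protocol only lists the accepted values).

```python
from urllib.parse import urlparse

# File types listed in the request parameter table.
SUPPORTED_FILE_TYPES = {
    "PDF", "DOCX", "DOC", "XLSX", "XLS", "PPTX", "PPT", "TXT", "MD", "HTML",
}


def build_submit_payload(file_url: str, file_name: str) -> dict:
    """Build a submit request body, rejecting obviously invalid input early."""
    # Assumption: FileType is the upper-cased file extension.
    _, dot, extension = file_name.rpartition(".")
    file_type = extension.upper() if dot else ""
    if file_type not in SUPPORTED_FILE_TYPES:
        raise ValueError(f"Unsupported file type: {file_name!r}")
    # The protocol requires an HTTPS download link.
    if urlparse(file_url).scheme != "https":
        raise ValueError("FileUrl must be an HTTPS link")
    return {"FileType": file_type, "FileName": file_name, "FileUrl": file_url}
```

Checking locally avoids a round trip that would otherwise end in UnsupportedFileType or InvalidFileUrl, although the service remains the authority on what it accepts.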

Response Parameters (Response)

| Parameter Name | Type | Description |
| --- | --- | --- |
| Response | object | Response object. |
| Response.RequestId | string | Request ID for troubleshooting. |
| Response.TaskId | string | Task ID for querying parsing results. |

Response Example

Successful Response (200 OK):
{
  "Response": {
    "RequestId": "5e148c27-9c21-43cd-992c-799117bb4216",
    "TaskId": "236e51fd-827b-41cb-b303-56003a817ce5"
  }
}
Error Response:
{
  "Response": {
    "Error": {
      "Code": "InvalidParameter",
      "Message": "FileUrl is required"
    },
    "RequestId": "5e148c27-9c21-43cd-992c-799117bb4216"
  }
}
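
Both responses share the same envelope: a top-level Response object that carries either the result fields or an Error object. A client can unwrap it in one place. The sketch below is illustrative only; `ApiError` and `unwrap_response` are hypothetical names, not part of the protocol.

```python
class ApiError(Exception):
    """Raised when the response envelope carries an Error object."""

    def __init__(self, code: str, message: str, request_id: str):
        super().__init__(f"{code}: {message} (RequestId: {request_id})")
        self.code = code
        self.message = message
        self.request_id = request_id


def unwrap_response(body: dict) -> dict:
    """Return the inner Response object, or raise ApiError if it holds an Error."""
    response = body.get("Response", {})
    error = response.get("Error")
    if error:
        raise ApiError(
            error.get("Code", "Unknown"),
            error.get("Message", ""),
            response.get("RequestId", ""),
        )
    return response
```

Keeping the RequestId on the exception makes it easy to quote when troubleshooting a failed call.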

Error Code Description

| Error Code | HTTP Status Code | Description |
| --- | --- | --- |
| InvalidParameter | 400 | The request parameters were incorrect. |
| InvalidFileUrl | 400 | File URL invalid or inaccessible. |
| UnsupportedFileType | 400 | Unsupported file type. |
| Unauthorized | 401 | Unauthorized; token missing. |
| Forbidden | 403 | Token invalid or expired. |
| FileTooLarge | 413 | File size exceeds the limit. |
| TooManyRequests | 429 | Request rate limit exceeded. |
| InternalError | 500 | Internal service error. |
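
The protocol does not say which of these errors are worth retrying. One plausible client policy, given here as an assumption rather than a rule of the protocol, is to retry only on 429 and 500 with exponential backoff and to treat every other code as permanent. `call_with_retry` and its parameters are hypothetical names.

```python
import time

# Assumption: only rate limiting (429) and internal errors (500) are transient.
RETRYABLE_STATUS_CODES = {429, 500}


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning (http_status, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUS_CODES or attempt == max_attempts - 1:
            return status, body
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the function easy to exercise without real delays.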

2. Query Document Parsing Results

API Description

Query the status and result of a document parsing task by its TaskId. Once parsing is complete, the response provides a download URL for the result file.

API address

POST https://api.example.com/api/v1/document/parse/query

Request Headers (Headers)

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | Yes | Bearer Token Authentication |
| Content-Type | string | Yes | application/json |

Request Parameters (Request Body)

| Parameter Name | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| TaskId | string | Yes | Task ID returned by the submission interface. | "236e51fd-827b-41cb-b303-56003a817ce5" |

Request Example

{
  "TaskId": "236e51fd-827b-41cb-b303-56003a817ce5"
}

Response Parameters (Response)

| Parameter Name | Type | Description |
| --- | --- | --- |
| Response | object | Response object. |
| Response.RequestId | string | Request ID. |
| Response.Status | string | Task status: Pending (waiting), Processing (in progress), Success (succeeded), or Failed (failed). |
| Response.DocumentRecognizeResultUrl | string | Download URL of the parsed result file (ZIP format). Returned only when Status is Success. |
| Response.Progress | integer | Task progress (0-100). Returned only when Status is Processing. |
| Response.ErrorCode | string | Error code. Returned only when Status is Failed. |
| Response.ErrorMessage | string | Error message. Returned only when Status is Failed. |

Response Example

Successful Response - Task Completed (200 OK):
{
  "Response": {
    "RequestId": "ffe23ed8-2b64-4835-aedc-ca9a5b5a7384",
    "Status": "Success",
    "DocumentRecognizeResultUrl": "https://example.cos.ap-jakarta.myqcloud.com/public/example/example.zip"
  }
}
Successful Response - Task in Progress (200 OK):
{
  "Response": {
    "RequestId": "ffe23ed8-2b64-4835-aedc-ca9a5b5a7384",
    "Status": "Processing",
    "Progress": 65
  }
}
Successful Response - Task Pending (200 OK):
{
  "Response": {
    "RequestId": "ffe23ed8-2b64-4835-aedc-ca9a5b5a7384",
    "Status": "Pending"
  }
}
Successful Response - Task Failed (200 OK):
{
  "Response": {
    "RequestId": "ffe23ed8-2b64-4835-aedc-ca9a5b5a7384",
    "Status": "Failed",
    "ErrorCode": "ParseError",
    "ErrorMessage": "Document format corrupted and cannot be parsed"
  }
}
Error Response - Task Does Not Exist (404 Not Found):
{
  "Response": {
    "Error": {
      "Code": "TaskNotFound",
      "Message": "Task not found"
    },
    "RequestId": "ffe23ed8-2b64-4835-aedc-ca9a5b5a7384"
  }
}

Task Status Descriptions

| Status | Description | Recommended Action |
| --- | --- | --- |
| Pending | Task has been submitted and is pending processing. | Continue polling the task status. |
| Processing | Task in progress. | Continue polling the task status. |
| Success | Task completed successfully. | Download the result file. |
| Failed | Task failed. | View the error message and resubmit. |
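
The recommended actions above amount to a small dispatch on the Status field. A minimal sketch, assuming the inner Response object has already been extracted from the envelope; `next_action` and `describe` are hypothetical helper names.

```python
def next_action(response: dict) -> str:
    """Map a query response to "poll", "download", or "resubmit"."""
    status = response.get("Status")
    if status in ("Pending", "Processing"):
        return "poll"
    if status == "Success":
        return "download"
    if status == "Failed":
        return "resubmit"
    raise ValueError(f"Unknown task status: {status!r}")


def describe(response: dict) -> str:
    """Build a one-line, human-readable summary of a query response."""
    status = response.get("Status")
    # Progress is only present while the task is Processing.
    if status == "Processing" and "Progress" in response:
        return f"Processing ({response['Progress']}%)"
    # ErrorCode and ErrorMessage are only present when the task Failed.
    if status == "Failed":
        return f"Failed: {response.get('ErrorCode')} - {response.get('ErrorMessage')}"
    return str(status)
```

Raising on an unknown status, rather than polling forever, surfaces protocol changes early.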

Error Code Description

| Error Code | HTTP Status Code | Description |
| --- | --- | --- |
| InvalidParameter | 400 | The request parameters were incorrect. |
| TaskNotFound | 404 | Task does not exist. |
| Unauthorized | 401 | Unauthorized; token missing. |
| Forbidden | 403 | Token invalid or expired. |
| InternalError | 500 | Internal service error. |

3. Synchronous Parsing Interface

API Description

Parse a document synchronously in a single request. Supported formats include PDF, Word, Excel, and PPT. The response returns the parsing result directly, without requiring a follow-up query.

API address

POST https://api.example.com/api/v1/document/parse/sync_parse

Request Headers (Headers)

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | Yes | Bearer Token Authentication |
| Content-Type | string | Yes | application/json |

Request Parameters (Request Body)

| Parameter Name | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| FileType | string | Yes | File type. Supported values: PDF, DOCX, DOC, XLSX, XLS, PPTX, PPT, TXT, MD, HTML. | "PDF" |
| FileName | string | Yes | File name, including the extension. | "example.pdf" |
| FileUrl | string | Yes | File download address. Must be an accessible HTTPS link; signed temporary URLs are supported (validity period should be ≥ 1 hour). | "https://example.cos.ap-jakarta.myqcloud.com/public/example/example.pdf" |

Request Example

{
  "FileType": "PDF",
  "FileName": "example.pdf",
  "FileUrl": "https://example.cos.ap-jakarta.myqcloud.com/public/example/example.pdf"
}

Response Parameters (Response)

| Parameter Name | Type | Description |
| --- | --- | --- |
| Response | object | Response object. |
| Response.RequestId | string | Request ID. |
| Response.Status | string | Task status: Pending (waiting), Processing (in progress), Success (succeeded), or Failed (failed). |
| Response.DocumentRecognizeResultUrl | string | Download URL of the parsed result file (ZIP format). Returned only when Status is Success. |
| Response.Progress | integer | Task progress (0-100). Returned only when Status is Processing. |
| Response.ErrorCode | string | Error code. Returned only when Status is Failed. |
| Response.ErrorMessage | string | Error message. Returned only when Status is Failed. |

Response Example

Successful Response - Task Completed (200 OK):
{
  "Response": {
    "RequestId": "ffe23ed8-2b64-4835-aedc-ca9a5b5a7384",
    "Status": "Success",
    "DocumentRecognizeResultUrl": "https://example.cos.ap-jakarta.myqcloud.com/public/example/example.zip"
  }
}
Error Response - Task Failed (200 OK):
{
  "Response": {
    "RequestId": "ffe23ed8-2b64-4835-aedc-ca9a5b5a7384",
    "Status": "Failed",
    "ErrorCode": "ParseError",
    "ErrorMessage": "Document format corrupted and cannot be parsed"
  }
}
Error Response - Parameter Error (400 Bad Request):
{
  "Response": {
    "Error": {
      "Code": "InvalidParameter",
      "Message": "FileUrl is required"
    },
    "RequestId": "ffe23ed8-2b64-4835-aedc-ca9a5b5a7384"
  }
}

Error Code Description

| Error Code | HTTP Status Code | Description |
| --- | --- | --- |
| InvalidParameter | 400 | The request parameters were incorrect. |
| InvalidFileUrl | 400 | File URL invalid or inaccessible. |
| UnsupportedFileType | 400 | Unsupported file type. |
| Unauthorized | 401 | Unauthorized; token missing. |
| Forbidden | 403 | Token invalid or expired. |
| FileTooLarge | 413 | File size exceeds the limit. |
| TooManyRequests | 429 | Request rate limit exceeded. |
| RequestTimeout | 408 | Request timeout (parsing takes too long). |
| InternalError | 500 | Internal service error. |

Parsing Result File Description

Result File Format

After the parsing is complete, DocumentRecognizeResultUrl returns a download URL for a ZIP archive, containing the following:
76aef68b-c444-41d2-829a-d513fa35e42b.zip                               # ZIP file downloaded from DocumentRecognizeResultUrl
├── 76aef68b-c444-41d2-829a-d513fa35e42b/                              # Subdirectory (named with the task ID)
│   ├── 76aef68b-c444-41d2-829a-d513fa35e42b_parse_page0.json          # Parsing result of page 1
│   ├── 76aef68b-c444-41d2-829a-d513fa35e42b_parse_page1.json          # Parsing result of page 2
│   ├── ...                                                            # Parsing results of more pages
│   └── images/                                                        # Directory for extracted images
│       ├── 76c7b6051d432f6527bd91a02321d126-image.png                 # Image file (named with UUID)
│       └── ...
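
Given this layout, the per-page JSON files can be read straight from the downloaded archive. The following is a minimal sketch: `load_pages` is a hypothetical helper, and it assumes the archive bytes have already been fetched from DocumentRecognizeResultUrl (for example with an HTTP GET). Page files are matched by the `_parse_page{N}.json` suffix, so the task-ID subdirectory name does not need to be known in advance.

```python
import io
import json
import re
import zipfile

# Matches "<task-id>_parse_page<N>.json" anywhere inside the archive.
PAGE_FILE_PATTERN = re.compile(r"_parse_page(\d+)\.json$")


def load_pages(zip_bytes: bytes) -> list:
    """Return the parsed page objects from a result ZIP, in page order."""
    pages = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            match = PAGE_FILE_PATTERN.search(name)
            if match:
                page = json.loads(archive.read(name).decode("utf-8"))
                pages.append((int(match.group(1)), page))
    # Sort numerically so that page10 comes after page2.
    pages.sort(key=lambda item: item[0])
    return [page for _, page in pages]
```

Sorting on the numeric suffix matters because a plain string sort would place `page10` before `page2`.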

Parsing Result File Structure Description

The parsing result of each page is saved in a separate JSON file (*_parse_page{N}.json) that contains the complete recognition results for that page.

Parsing Page Structure (Page Object)

| Field Name | Type | Description |
| --- | --- | --- |
| PageNumber | integer | The page number, starting from 1. |
| Angle | integer | Page rotation angle (°). |
| RotatedAngle | integer | Current rotation angle (°). |
| Height | integer | Page height (px). |
| Width | integer | Page width (px). |
| OriginHeight | integer | Original page height (px). |
| OriginWidth | integer | Original page width (px). |
| Elements | array | Page element list, see below for details. |

Element Structure (Element Object)

| Field Name | Type | Description |
| --- | --- | --- |
| Index | integer | The index of the element in the page, starting from 0. |
| Type | string | Element type: title, text, figure, table, or figure_text. |
| Text | string | Element text content. |
| Level | integer | Element level: 0 indicates a top-level element, 1 indicates a nested element. |
| Polygon | object | Element position coordinates (quadrilateral), see below for details. |
| InsetImageName | string | Embedded image name (if any). |
| Elements | array | Nested child element list (recursive structure). |
| ImagePath | string | Image file path (relative to the ZIP root directory). |
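
Because Elements is recursive, a consumer usually needs a depth-first walk rather than a single loop. A minimal sketch, assuming a page object already loaded as a Python dict; `iter_elements` and `collect_text` are hypothetical helper names.

```python
def iter_elements(elements):
    """Yield every element depth-first, parents before their children."""
    # "Elements" may be null (None) when an element has no children.
    for element in elements or []:
        yield element
        yield from iter_elements(element.get("Elements"))


def collect_text(page: dict, types=("title", "text")) -> list:
    """Return the non-empty Text of all elements whose Type is in types."""
    return [
        element["Text"]
        for element in iter_elements(page.get("Elements"))
        if element.get("Type") in types and element.get("Text")
    ]
```

For example, `collect_text(page, types=("figure_text",))` would gather only the text recognized inside figures.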

Coordinate Structure (Polygon Object)

| Field Name | Type | Description |
| --- | --- | --- |
| LeftTop | object | Top-left corner coordinates {"X": int, "Y": int} |
| RightTop | object | Top-right corner coordinates {"X": int, "Y": int} |
| LeftBottom | object | Bottom-left corner coordinates {"X": int, "Y": int} |
| RightBottom | object | Bottom-right corner coordinates {"X": int, "Y": int} |
Coordinate system description:
The origin (0, 0) is located at the top-left corner of the page.
The X-axis increases to the right.
The Y-axis increases downward.
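
With this coordinate system, the four corners can be reduced to an axis-aligned bounding box, which is often what cropping or highlighting code needs. A minimal sketch; `polygon_bbox` is a hypothetical helper name. Taking the min and max over all four corners also covers quadrilaterals that are not perfect rectangles.

```python
def polygon_bbox(polygon: dict) -> tuple:
    """Return (left, top, width, height) of the box enclosing the four corners."""
    corners = [
        polygon[key]
        for key in ("LeftTop", "RightTop", "LeftBottom", "RightBottom")
    ]
    xs = [corner["X"] for corner in corners]
    ys = [corner["Y"] for corner in corners]
    left, top = min(xs), min(ys)
    # Y grows downward, so the smallest Y is the top edge.
    return left, top, max(xs) - left, max(ys) - top


title_polygon = {
    "LeftTop": {"X": 3, "Y": 98},
    "RightTop": {"X": 25, "Y": 98},
    "LeftBottom": {"X": 3, "Y": 169},
    "RightBottom": {"X": 25, "Y": 169},
}
print(polygon_bbox(title_polygon))  # (3, 98, 22, 71)
```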

Element Type Description

| Type | Description | Characteristics |
| --- | --- | --- |
| title | Title | Typically a section heading in the document. |
| text | Plain text | Paragraphs and body text. |
| figure | Figure | Visual content such as images and charts; may contain nested figure_text elements. |
| figure_text | Figure text | Text content recognized within a figure. |
| table | Table | Structured tabular data. |

Parsing Result Example

Example 1: Page with Title and Charts

{
  "PageNumber": 1,
  "Angle": 0,
  "RotatedAngle": 0,
  "Height": 286,
  "Width": 736,
  "OriginHeight": 286,
  "OriginWidth": 736,
  "Elements": [
    {
      "Index": 0,
      "Type": "title",
      "Text": "# Data Scale",
      "Level": 0,
      "Polygon": {
        "LeftTop": {"X": 3, "Y": 98},
        "RightTop": {"X": 25, "Y": 98},
        "LeftBottom": {"X": 3, "Y": 169},
        "RightBottom": {"X": 25, "Y": 169}
      },
      "InsetImageName": "",
      "Elements": null,
      "ImagePath": ""
    },
    {
      "Index": 1,
      "Type": "figure",
      "Text": "",
      "Level": 0,
      "Polygon": {
        "LeftTop": {"X": 41, "Y": 4},
        "RightTop": {"X": 733, "Y": 4},
        "LeftBottom": {"X": 41, "Y": 286},
        "RightBottom": {"X": 733, "Y": 286}
      },
      "InsetImageName": "",
      "Elements": [
        {
          "Index": 0,
          "Type": "figure_text",
          "Text": "10 000 000\n1 000 000\n100 000\n10 000\n1 000",
          "Level": 1,
          "Polygon": {
            "LeftTop": {"X": 41, "Y": 4},
            "RightTop": {"X": 733, "Y": 4},
            "LeftBottom": {"X": 41, "Y": 286},
            "RightBottom": {"X": 733, "Y": 286}
          },
          "InsetImageName": "",
          "Elements": null,
          "ImagePath": ""
        }
      ],
      "ImagePath": "images/76c7b6051d432f6527bd91a02321d126-image.png"
    }
  ]
}

Example 2: Page Containing Text and Tables

{
  "PageNumber": 2,
  "Angle": 0,
  "RotatedAngle": 0,
  "Height": 842,
  "Width": 595,
  "OriginHeight": 842,
  "OriginWidth": 595,
  "Elements": [
    {
      "Index": 0,
      "Type": "text",
      "Text": "This is a plain text used to describe the main information of the document.",
      "Level": 0,
      "Polygon": {
        "LeftTop": {"X": 50, "Y": 100},
        "RightTop": {"X": 545, "Y": 100},
        "LeftBottom": {"X": 50, "Y": 130},
        "RightBottom": {"X": 545, "Y": 130}
      },
      "InsetImageName": "",
      "Elements": null,
      "ImagePath": ""
    },
    {
      "Index": 1,
      "Type": "table",
      "Text": "Name\tAge\tPosition\nZhang San\t28\tEngineer\nLi Si\t32\tManager",
      "Level": 0,
      "Polygon": {
        "LeftTop": {"X": 50, "Y": 200},
        "RightTop": {"X": 545, "Y": 200},
        "LeftBottom": {"X": 50, "Y": 350},
        "RightBottom": {"X": 545, "Y": 350}
      },
      "InsetImageName": "",
      "Elements": null,
      "ImagePath": ""
    }
  ]
}
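
The table element in this example carries its cells as plain text. Assuming, as the example suggests but the protocol does not state explicitly, that cells are separated by tabs and rows by newlines, the text can be turned back into rows. `table_rows` and `table_records` are hypothetical helper names.

```python
def table_rows(text: str) -> list:
    """Split a table element's Text into rows of cells."""
    # Assumption: "\n" separates rows and "\t" separates cells.
    return [line.split("\t") for line in text.split("\n") if line]


def table_records(text: str) -> list:
    """Treat the first row as the header and return one dict per data row."""
    rows = table_rows(text)
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


sample = "Name\tAge\tPosition\nZhang San\t28\tEngineer\nLi Si\t32\tManager"
print(table_records(sample)[0])
# {'Name': 'Zhang San', 'Age': '28', 'Position': 'Engineer'}
```

All cell values stay strings; converting "28" to a number is left to the caller, since the protocol gives no column types.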

API Call Sample Code

Python Examples

import time

import requests

# Configuration
API_BASE_URL = "https://api.example.com"
BEARER_TOKEN = "your_access_token"

headers = {
    "Authorization": f"Bearer {BEARER_TOKEN}",
    "Content-Type": "application/json"
}

# 1. Submit a parsing task
def submit_parse_task(file_url, file_name, file_type):
    """Submit a document and return the TaskId used for later queries."""
    url = f"{API_BASE_URL}/api/v1/document/parse/submit"
    payload = {
        "FileType": file_type,
        "FileName": file_name,
        "FileUrl": file_url
    }

    # A timeout keeps the call from hanging indefinitely on network problems
    response = requests.post(url, headers=headers, json=payload, timeout=30)
    response.raise_for_status()

    result = response.json()
    return result["Response"]["TaskId"]

# 2. Query the parsing result
def query_parse_result(task_id, max_retries=60, interval=5):
    """Poll the query interface until the task succeeds, fails, or times out."""
    url = f"{API_BASE_URL}/api/v1/document/parse/query"
    payload = {"TaskId": task_id}

    for i in range(max_retries):
        response = requests.post(url, headers=headers, json=payload, timeout=30)
        response.raise_for_status()

        result = response.json()["Response"]
        status = result["Status"]

        print(f"[{i+1}/{max_retries}] Task status: {status}")

        if status == "Success":
            return result["DocumentRecognizeResultUrl"]
        elif status == "Failed":
            raise Exception(f"Parse failed: {result.get('ErrorMessage')}")

        # Pending or Processing: wait before polling again
        time.sleep(interval)

    raise Exception("Query timeout")

# 3. Usage Example
try:
    # Submit Task
    task_id = submit_parse_task(
        file_url="https://example.com/document.pdf",
        file_name="document.pdf",
        file_type="PDF"
    )
    print(f"Task submitted: {task_id}")

    # Query Result
    result_url = query_parse_result(task_id)
    print(f"Parse completed: {result_url}")

    # Download Results
    # download_and_extract(result_url)

except Exception as e:
    print(f"Error: {e}")

Running Environment

Operating System: Ubuntu 24.04.3 LTS / x86_64

Runtime Version: Python 3.11.1

Python Synchronous Interface Example

import requests

# Configuration
API_BASE_URL = "https://api.example.com"
BEARER_TOKEN = "your_access_token"

headers = {
    "Authorization": f"Bearer {BEARER_TOKEN}",
    "Content-Type": "application/json"
}

# Synchronous Parsing (Get results in a single request)
def sync_parse_document(file_url, file_name, file_type):
    """Parse a document in one blocking request and return the result URL."""
    url = f"{API_BASE_URL}/api/v1/document/parse/sync_parse"
    payload = {
        "FileType": file_type,
        "FileName": file_name,
        "FileUrl": file_url
    }

    # Set a longer timeout (recommended 5 minutes)
    response = requests.post(url, headers=headers, json=payload, timeout=300)
    response.raise_for_status()

    result = response.json()["Response"]

    if result["Status"] == "Success":
        return result["DocumentRecognizeResultUrl"]
    elif result["Status"] == "Failed":
        raise Exception(f"Parse failed: {result.get('ErrorMessage', 'Unknown error')}")
    else:
        raise Exception(f"Unexpected status: {result['Status']}")

# Usage Example
try:
    result_url = sync_parse_document(
        file_url="https://example.com/document.pdf",
        file_name="document.pdf",
        file_type="PDF"
    )
    print(f"Parse completed: {result_url}")

    # Download Results
    # download_and_extract(result_url)

except requests.exceptions.Timeout:
    print("Error: Request timeout (file too large or complex)")
except Exception as e:
    print(f"Error: {e}")

Running Environment

Operating System: Ubuntu 24.04.3 LTS / x86_64

Runtime Version: Python 3.11.1

CURL Example

Asynchronous Interface Invocation

Submit Task:
curl -X POST https://api.example.com/api/v1/document/parse/submit \
  -H "Authorization: Bearer your_access_token" \
  -H "Content-Type: application/json" \
  -d '{
    "FileType": "PDF",
    "FileName": "example.pdf",
    "FileUrl": "https://qidian-qbot-1251316161.cos.ap-jakarta.myqcloud.com/public/example/example.pdf"
  }'
Query Result:
curl -X POST https://api.example.com/api/v1/document/parse/query \
  -H "Authorization: Bearer your_access_token" \
  -H "Content-Type: application/json" \
  -d '{
    "TaskId": "236e51fd-827b-41cb-b303-56003a817ce5"
  }'

Synchronous Interface Call

Synchronous Parsing (Get results in a single request):
curl -X POST https://api.example.com/api/v1/document/parse/sync_parse \
  -H "Authorization: Bearer your_access_token" \
  -H "Content-Type: application/json" \
  --max-time 300 \
  -d '{
    "FileType": "PDF",
    "FileName": "example.pdf",
    "FileUrl": "https://qidian-qbot-1251316161.cos.ap-jakarta.myqcloud.com/public/example/example.pdf"
  }'
Note:
1. --max-time 300 sets the request timeout to 300 seconds (5 minutes). Adjust it as needed based on file size.
2. The response returns the parsing results directly, so no additional query is required.

Must-Knows

If your service uses authentication, authentication failures must return HTTP status code 401 or 403 and nothing else; otherwise, custom models cannot be added.
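
On the service side, this rule can be isolated in one small function. The sketch below follows the error code tables (401 when the token is missing, 403 when it is invalid or expired). `auth_status` and `valid_tokens` are hypothetical names, and answering 401 to a header that does not use the Bearer scheme is an assumption.

```python
def auth_status(authorization_header, valid_tokens) -> int:
    """Return the HTTP status an encapsulated service should answer with."""
    prefix = "Bearer "
    # Missing header, wrong scheme, or empty token -> 401 Unauthorized.
    if not authorization_header or not authorization_header.startswith(prefix):
        return 401
    token = authorization_header[len(prefix):].strip()
    if not token:
        return 401
    # Token present but not accepted (invalid or expired) -> 403 Forbidden.
    if token not in valid_tokens:
        return 403
    return 200
```

A real service would replace the `valid_tokens` lookup with its own token verification, keeping the same 401/403 split.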