tencent cloud

TencentDB for PostgreSQL

DocumentationTencentDB for PostgreSQLAI CapabilitiesPractical TutorialBuilding a GraphRAG Knowledge Graph Application Based on Apache AGE

Building a GraphRAG Knowledge Graph Application Based on Apache AGE

Download
Focus Mode
Font Size
Last updated: 2026-06-08 10:32:14

Overview

GraphRAG is an advanced search paradigm that combines knowledge graphs with RAG. It constructs a knowledge graph through entity extraction, stores it in Apache AGE, and integrates vector search using pgvector. This approach enables more accurate multi-hop reasoning capabilities compared to pure vector-based RAG.

GraphRAG vs. Traditional RAG

Level
Traditional RAG
GraphRAG
Search Methods
Pure vector similarity
Graph traversal + vector hybrid
Multi-hop Reasoning
Not supported
Reasoning via graph relationships
Entity Relationships
Implicit in text
Explicitly stored as a graph schema
Global Comprehension
Limited to a single chunk
Obtains a global perspective via graphs
Applicable Scenario
Simple Q&A
Complex reasoning, relationship querying

Architecture Design

┌──────────────────────────────────────────────────────┐
│ GraphRAG System │
├──────────────────────────────────────────────────────┤
│ │
│ Document Input → Entity Extraction → Graph Construction → Index Creation │
│ │
User Query → Entity Recognition → Graph Search + Vector Search → Fusion Generation │
│ │
├──────────────────────────────────────────────────────┤
│ Storage Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ Apache AGE │ │ pgvector │ │ Relational Table │ │
│ │ (Knowledge Graph) │ │ (Document Vector) │ │ (Metadata) │ │
│ └──────────────┘ └──────────────┘ └───────────┘ │
│ │
│ AI Layer │
│ ┌─────────────────────────────────────────────┐ │
│ │ tencentdb_ai (Entity Extraction / Relation Recognition / Generation) │ │
│ └─────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────┘

Database Preparation

-- Create the extension
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS age;

-- Set search_path
SET search_path = ag_catalog, "$user", public;

-- Create a knowledge graph
SELECT create_graph('knowledge_graph');

-- Document storage table
CREATE TABLE graphrag.documents (
id BIGSERIAL PRIMARY KEY,
title TEXT,
content TEXT NOT NULL,
embedding vector(1024),
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX ON graphrag.documents
USING hnsw (embedding vector_cosine_ops);

-- Entity table (auxiliary index)
CREATE TABLE graphrag.entities (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
entity_type TEXT NOT NULL,
description TEXT,
embedding vector(1024),
source_doc_id BIGINT REFERENCES graphrag.documents(id),
UNIQUE(name, entity_type)
);

CREATE INDEX ON graphrag.entities
USING hnsw (embedding vector_cosine_ops);

Core Process

Note:
The vector dimension must match the output dimension of the large model you are using. Common model dimensions: OpenAI text-embedding-3-small (1536 dimensions), BGE-M3 (1024 dimensions), ChatGLM Embedding (1024 dimensions).

Entity Extraction

-- Use a large model to extract entities and relations from documents
CREATE OR REPLACE FUNCTION graphrag.extract_entities(doc_content TEXT)
RETURNS JSONB AS $$
DECLARE
result TEXT;
BEGIN
SELECT tencentdb_ai.chat_completions(
'<your-llm-model>',
'Extract entities and relations from the following text and return them in JSON format.' ||
'Format: {"entities": [{"name": "entity name", "type": "type", "description": "description"}], ' ||
"relations": [{"source": "source entity", "target": "target entity", "relation": "relation type", "description": "description"}]}' ||
E'\\n\\nText: ' || doc_content
) INTO result;

RETURN result::JSONB;
END;
$$ LANGUAGE plpgsql;

Knowledge Graph Construction (Including Security Verification)

The entity.type / relation.relation from the LLM are used as Cypher Tags in queries. You must validate them against an allowlist to prevent arbitrary strings from entering Cypher:
-- Allowlist for permitted entity types and relation types
CREATE TABLE IF NOT EXISTS graphrag.allowed_entity_types (type TEXT PRIMARY KEY);
CREATE TABLE IF NOT EXISTS graphrag.allowed_relation_types (type TEXT PRIMARY KEY);

INSERT INTO graphrag.allowed_entity_types(type) VALUES
('Person'), ('Company'), ('Product'), ('Technology'), ('Concept')
ON CONFLICT DO NOTHING;

INSERT INTO graphrag.allowed_relation_types(type) VALUES
('WORKS_AT'), ('USES'), ('HAS_FEATURE'), ('RELATED_TO'), ('PART_OF')
ON CONFLICT DO NOTHING;

-- Store the extracted entities in the graph
CREATE OR REPLACE FUNCTION graphrag.build_graph(doc_id BIGINT)
RETURNS VOID AS $$
DECLARE
doc_content TEXT;
extracted JSONB;
entity JSONB;
relation JSONB;
e_type TEXT;
r_type TEXT;
BEGIN
SELECT content INTO doc_content FROM graphrag.documents WHERE id = doc_id;
extracted := graphrag.extract_entities(doc_content);

-- Create entity nodes (with allowlist validation for types)
FOR entity IN SELECT * FROM jsonb_array_elements(extracted->'entities')
LOOP
e_type := entity->>'type';
IF NOT EXISTS (SELECT 1 FROM graphrag.allowed_entity_types WHERE type = e_type) THEN
RAISE NOTICE 'Skip entity with disallowed type: %', e_type;
CONTINUE;
END IF;

PERFORM * FROM cypher('knowledge_graph', format($$
MERGE (n:%I {name: %L})
SET n.description = %L, n.doc_id = %s
$$, e_type, entity->>'name', entity->>'description', doc_id))
AS (v agtype);
END LOOP;

-- Create relations (with allowlist validation for relation types)
FOR relation IN SELECT * FROM jsonb_array_elements(extracted->'relations')
LOOP
r_type := relation->>'relation';
IF NOT EXISTS (SELECT 1 FROM graphrag.allowed_relation_types WHERE type = r_type) THEN
RAISE NOTICE 'Skip relation with disallowed type: %', r_type;
CONTINUE;
END IF;

PERFORM * FROM cypher('knowledge_graph', format($$
MATCH (a {name: %L}), (b {name: %L})
MERGE (a)-[r:%I]->(b)
SET r.description = %L
$$, relation->>'source', relation->>'target',
r_type, relation->>'description'))
AS (r agtype);
END LOOP;
END;
$$ LANGUAGE plpgsql;
Key Points:
Node Tags / Relation Types: Use dual validation with %I safe identifiers and an allowlist.
Attribute values: Use %L for safe escaping.
If it fails the allowlist check: skip it directly, log the event, and do not allow any string to enter the Cypher statement.

GraphRAG Search

-- Hybrid Search: Graph Traversal + Vector Search
CREATE OR REPLACE FUNCTION graphrag.hybrid_retrieve(query TEXT, top_k INT DEFAULT 5)
RETURNS TABLE(source TEXT, content TEXT, score FLOAT) AS $$
BEGIN
-- Vector Search Documentation
RETURN QUERY
SELECT 'vector'::TEXT, d.content,
(1 - (d.embedding <=> tencentdb_ai.get_embedding('bge-m3', query)))::FLOAT
FROM graphrag.documents d
ORDER BY d.embedding <=> tencentdb_ai.get_embedding('bge-m3', query)
LIMIT top_k;

-- Graph Search for Relevant Entities
RETURN QUERY
SELECT 'graph'::TEXT,
(a->>'name' || ' --[' || r->>'relation' || ']--> ' || b->>'name')::TEXT,
0.8::FLOAT
FROM cypher('knowledge_graph', format($$
MATCH (a)-[r]->(b)
WHERE a.name CONTAINS %L OR b.name CONTAINS %L
RETURN properties(a) as a, properties(r) as r, properties(b) as b
LIMIT %s
$$, query, query, top_k)) AS (a agtype, r agtype, b agtype);
END;
$$ LANGUAGE plpgsql;

Generating Answers

-- GraphRAG Q&A
CREATE OR REPLACE FUNCTION graphrag.answer(question TEXT)
RETURNS TEXT AS $$
DECLARE
context TEXT;
answer TEXT;
BEGIN
SELECT string_agg('[' || source || '] ' || content, E'\\n')
INTO context
FROM graphrag.hybrid_retrieve(question, 5);

SELECT tencentdb_ai.chat_completions(
'<your-llm-model>',
Answer the question based on the following knowledge graph information and document content. Mark knowledge graph information as [graph] and document content as [vector].
E'\\n\\nContext:\\n' || COALESCE(context, 'None') ||
E'\\n\\nQuestion: ' || question
) INTO answer;

RETURN answer;
END;
$$ LANGUAGE plpgsql;

Usage Examples

-- Import Documents and Build Graph
INSERT INTO graphrag.documents (title, content, embedding)
VALUES (
'PostgreSQL and AI',
TencentDB for PostgreSQL supports directly invoking large models within SQL through the tencentdb_ai plugin...
tencentdb_ai.get_embedding('bge-m3', 'Tencent Cloud PostgreSQL supports directly calling large models within SQL through the tencentdb_ai plugin')
);

-- Build Graph
SELECT graphrag.build_graph(1);

-- GraphRAG Q&A
SELECT graphrag.answer('What AI capabilities does PostgreSQL support?');

Best Practices

1. Entity Deduplication: Use MERGE to avoid duplicate entities.
2. Incremental Update: New documents are only built incrementally, not rebuilt in full.
3. Graph Index: Create an index for frequently queried attributes.
4. Community Detection: Document clustering is performed using graph algorithms.
5. Hybrid Recall: Graph context + vector documents complement each other.
6. Allowlist Protection: Entity types / relation types must be validated against the allowlist to prevent arbitrary string injection from LLMs.

References

Help and Support

Was this page helpful?

Help us improve! Rate your documentation experience in 5 mins.

Feedback