LightRAG: lightweight, fast Graph RAG with incremental updates

Layer 1 overview of LightRAG (HKUDS/LightRAG): it starts with the approach, setup, and use cases; later sections go deeper into components and techniques.

Snapshot: v1.4.15 / 64d3326 | Package: lightrag-hku==1.4.15 | Paper: EMNLP Findings 2025 | Date: 2026-04-22 | Deep dives: 4

Overview

This is the layer 1 report. Each major technique group has its own layer 2 page: Indexing, Retrieval, Storage/API, and Operations.

LightRAG is a graph-based RAG framework from HKUDS. Instead of only storing chunks in a vector DB as NaiveRAG does, LightRAG extracts entities and relationships from each chunk, stores both graph and vector representations, and then queries at two levels: low-level for specific entities and high-level for broader relationships/themes. The goal is to preserve multi-hop context without taking the heavier community-report route of GraphRAG.

34k GitHub stars (repo HKUDS/LightRAG)

v1.4.15 release (source read on 2026-04-19)

11.2 s average query time (paper appendix), vs. 23.6 s for GraphRAG

39.5 MB storage in the test (paper appendix), vs. 286.7 MB for GraphRAG

Figure: LightRAG architecture diagram showing graph-based text indexing, low-level keys, high-level keys, the index graph, and dual-level retrieval. Architecture overview: raw text is converted into an entity/relation graph, and retrieval then uses both low-level entity keys and high-level relation/theme keys. Source: HKUDS/LightRAG README.

Key takeaway: LightRAG is more than a RAG app with a WebUI. It is a storage + retrieval design worth copying: separate KV/vector/graph/doc-status stores, a four-stage query context build, mix as the default mode, and an operational pipeline with explicit concurrency, cancellation, and status handling.

Approach

LightRAG addresses a common trade-off in RAG: flat vector chunks are fast and simple but weak on questions that require multi-step relations; GraphRAG provides rich context, but community traversal and report regeneration are expensive. LightRAG takes a middle path: build a smaller graph, query it with vector search over entity/relation keys, and fetch only the subgraph and chunks that are needed.

NaiveRAG: Query -> Vector chunks -> LLM
    Fast, simple, but weak at multi-hop synthesis
GraphRAG: Query -> Community reports -> LLM
    Rich global summaries, but expensive to retrieve/update
LightRAG: Query -> low-level entity keys + high-level relation keys + vector chunks -> LLM
    Graph-aware, incremental, and cheaper than report traversal

Aspect	NaiveRAG	GraphRAG	LightRAG
Index artifact	Vector chunks	Entity graph + community reports	Entity/relation graph + vector stores
Query target	Nearest chunks	Community summaries + graph traversal	Entity keys, relation keys, chunks
New updates	Add chunks	May require rebuilding community reports	Incrementally merge entities/relations
Strengths	Simple, low latency	Good global summaries	Balances local detail and global relations
Weaknesses	Loses relation structure	High retrieval/update cost	Depends on LLM extraction and embedding/rerank quality

How to read LightRAG

Don't read it as a vector DB wrapper; read it as a graph artifact builder plus a query context compiler.
If the corpus has few meaningful entities/relations, LightRAG may be overkill compared with NaiveRAG.
If the corpus changes continuously, incremental merge is the main reason to consider LightRAG over GraphRAG.

Setup

There are two main ways to use LightRAG: run the server/API/WebUI as a standalone service, or embed LightRAG Core in a Python application. The repo currently recommends uv but still supports pip.

Step 1: Choose a deployment mode
Use the server if you want the WebUI, API, document upload, graph exploration, and Ollama-compatible endpoints. Use the core if you are writing a Python app or researching the algorithm.
```
uv tool install "lightrag-hku[api]"
# or: uv pip install lightrag-hku
```
Step 2: Create the model/storage configuration
Use the wizard to generate .env instead of editing everything by hand. The first required step configures the LLM, embedding model, and reranker.
```
make env-base
make env-storage
make env-server
make env-security-check
```
Step 3: Run the server or the core
The server reads .env from the startup directory; the core requires initialize_storages before any insert/query.
```
lightrag-server
# production non-Windows: lightrag-gunicorn --workers 4
```

Minimal Core API

```python
import asyncio
from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed

async def main():
    rag = LightRAG(
        working_dir="./rag_storage",
        llm_model_func=gpt_4o_mini_complete,
        embedding_func=openai_embed,
    )
    # Storages must be initialized before any insert or query.
    await rag.initialize_storages()
    try:
        await rag.ainsert("Your text")
        answer = await rag.aquery(
            "What are the top themes?",
            param=QueryParam(mode="mix"),
        )
        print(answer)
    finally:
        # Always release storage resources, even if insert/query fails.
        await rag.finalize_storages()

asyncio.run(main())
```

Do not change the embedding model after indexing. The vector dimension is usually baked into the table/collection schema. Switching from bge-m3 to text-embedding-3-large, for example, requires recreating the vector storage rather than just changing an env var.
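One way to catch such a mismatch early is a startup guard that compares the model recorded at index time with the one currently configured. The sketch below is illustrative only: `check_embedding_dim` and the `embedding_meta.json` sidecar file are hypothetical names, not part of LightRAG.

```python
import json
from pathlib import Path


def check_embedding_dim(meta_path: Path, model_name: str, dim: int) -> None:
    """Record the embedding model on first run; refuse to continue if it changes later."""
    if not meta_path.exists():
        # First run: remember which model and dimension built the index.
        meta_path.write_text(json.dumps({"model": model_name, "dim": dim}))
        return
    stored = json.loads(meta_path.read_text())
    if stored["model"] != model_name or stored["dim"] != dim:
        raise RuntimeError(
            f"Index was built with {stored['model']} (dim={stored['dim']}), "
            f"but the current config uses {model_name} (dim={dim}). "
            "Recreate the vector storage before re-indexing."
        )
```

Calling it with `("bge-m3", 1024)` at index time and later with `("text-embedding-3-large", 3072)` raises before any vectors of the wrong size reach the store.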

Use cases

LightRAG fits best when questions require synthesizing entities/relations across many documents. If you only need Q&A over a few short files, the cost of graph extraction may not be worth it. If the documents are long, relation-rich, and need incremental updates, LightRAG has a clear advantage.

Use case	Why it fits	Note
Legal	Contracts, compliance, governance	Multi-hop
Tech Docs	Architecture docs, runbooks, dependency graphs	Ops
Research	Paper corpus, entity/concept relations, literature review	Research
Product	FAQ + docs + changelog that need continuous updates	Incremental

Advantages

Good for questions that need to link multiple entities/relations rather than just retrieve top-k chunks.
Ships with a server/WebUI for quick trials, graph exploration, upload, and querying.
Diverse storage backends: local JSON/NetworkX for dev; PostgreSQL/Neo4j/Milvus/Qdrant/OpenSearch for production.
Clear evolution path: rerank, references, Langfuse tracing, RAGAS evaluation, RAG-Anything multimodal.

Disadvantages

Indexing consumes LLM calls and is sensitive to extraction quality.
A small model, or a reasoning model used at the wrong stage, can produce a noisy graph or slow the pipeline down.
Embedding/storage schemas need disciplined management to avoid mismatches in operation.
It does not replace an exact-citation pipeline when every claim must be audited strictly at the chunk/source level.

Technique summary table

ID	Technique	Theme
T1	Graph-based text indexing	Indexing
T2	Incremental graph update	Indexing
T3	Dual-level retrieval	Retrieval
T4	Mix mode: KG + vector chunks	Retrieval
T5	Token budget + context compiler	Context
T6	4-way storage abstraction	Storage
T7	Server/API/WebUI + setup wizard	API
T8	Rerank + references	Quality
T9	Concurrent control hierarchy	Ops
T10	Observability, cache, evaluation	Ops

Components and techniques in depth

1. Graph-based text indexing

Paper §3.1; Source: lightrag/operate.py

This is the foundational difference from NaiveRAG. Chunks are not only embedded; they are also used to create entity nodes and relation edges. Retrieval can then search by semantic key while still preserving the structure of the knowledge graph.

The pipeline consists of chunking, entity/relation extraction, output parsing, merging duplicate nodes/edges, and upserting into both the graph store and the vector stores. Entities use their name as the key; relations carry keywords and a description to support high-level retrieval.

Figure: LightRAG indexing flowchart showing document splitting, entity/relationship extraction, the JSON KV store, the vector database, and the graph. Indexing flow: a document is split into chunks, the LLM extracts entities/relations, and the results are written to the KV store, vector DB, and knowledge graph. Source: LearnOpenCV.

Chunking by token size (lightrag/operate.py)

```python
def chunking_by_token_size(
    tokenizer,
    content,
    split_by_character=None,
    split_by_character_only=False,
    chunk_overlap_token_size=100,
    chunk_token_size=1200,
):
    tokens = tokenizer.encode(content)
    results = []
    # Slide a fixed-size window; the stride is chunk size minus overlap.
    for index, start in enumerate(
        range(0, len(tokens), chunk_token_size - chunk_overlap_token_size)
    ):
        chunk_content = tokenizer.decode(tokens[start : start + chunk_token_size])
        results.append({
            "tokens": min(chunk_token_size, len(tokens) - start),
            "content": chunk_content.strip(),
            "chunk_order_index": index,
        })
    return results
```
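To make the window arithmetic concrete, the following sketch reproduces only the offset logic of the loop above, without a tokenizer. `chunk_windows` is a hypothetical helper written for this report, using the same defaults (1200-token chunks, 100-token overlap).

```python
def chunk_windows(n_tokens: int, chunk_token_size: int = 1200, overlap: int = 100):
    """Return the (start, end) token offsets produced by the sliding window."""
    stride = chunk_token_size - overlap
    return [
        (start, min(start + chunk_token_size, n_tokens))
        for start in range(0, n_tokens, stride)
    ]


print(chunk_windows(2500))  # [(0, 1200), (1100, 2300), (2200, 2500)]
print(chunk_windows(2300))  # [(0, 1200), (1100, 2300), (2200, 2300)]
```

The second call shows an edge case of this windowing scheme: when the text ends exactly at a window boundary, the final window contains only tokens already covered by the previous chunk.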

Advantages

Preserves relation structure instead of only similarity-ranked chunks.
Entities/relations can be queried independently through the vector stores.
The graph artifact can be visualized, exported, and partially rebuilt.

Disadvantages

Indexing depends on LLM extraction; a weak model produces a noisy graph.
Insert cost is higher than NaiveRAG because the LLM must be called on chunks.
Entity types, delimiters, and source metadata need to be controlled.

References

Deep dive: indexing pipeline, extraction parser, merge, and incremental update

2. Incremental graph update

Paper §3.1; Source: lightrag/lightrag.py + operate.py

Incremental update is why LightRAG is lighter than GraphRAG on dynamic corpora. A new document is extracted into a graph fragment and merged into the existing graph; there is no need to rebuild global community reports.

In the source, insert does not write to the graph ad hoc. It goes through a queue, status storage, a pipeline lock, and merge helpers. When a document is deleted, the affected entities/relations are rebuilt from the remaining chunks/cache to avoid stale sources.

New document -> chunk -> extract entities/relations -> merge source_ids + descriptions -> upsert graph nodes/edges -> upsert entity/relationship vectors -> update doc_status

There is no free lunch. Incremental graph merge reduces rebuild cost, but if extraction produces inconsistent entity aliases, the graph still fragments. Production needs a clear prompt/schema for entity types and a workflow for inspecting the graph.
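The merge step and the alias problem can be illustrated with a toy in-memory graph keyed by entity name. This is a minimal sketch under stated assumptions: `merge_entity` and the dict layout are hypothetical and much simpler than the merge helpers in the LightRAG source.

```python
def merge_entity(graph: dict, name: str, description: str, source_id: str) -> dict:
    """Merge one extracted entity into an in-memory graph keyed by entity name."""
    node = graph.setdefault(name, {"descriptions": [], "source_ids": []})
    # Accumulate new evidence on the existing node instead of rebuilding it.
    if description not in node["descriptions"]:
        node["descriptions"].append(description)
    if source_id not in node["source_ids"]:
        node["source_ids"].append(source_id)
    return node


graph = {}
merge_entity(graph, "LightRAG", "A graph-based RAG framework.", "chunk-1")
merge_entity(graph, "LightRAG", "Supports incremental updates.", "chunk-7")
merge_entity(graph, "lightrag", "Same system, different casing.", "chunk-9")
print(sorted(graph))  # ['LightRAG', 'lightrag']
```

The first two calls land on one node with two source IDs, while the third creates a separate node because the key differs only by casing. That is the fragmentation risk described above when the extractor emits inconsistent aliases.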

3. Dual-level retrieval

Paper §3.2; Source: lightrag/operate.py

Dual-level retrieval is what turns the graph into a query advantage. Low-level retrieval focuses on specific entities; high-level retrieval focuses on relationships/themes. Full LightRAG combines both to get depth and breadth together.

The paper describes low-level retrieval for specific questions such as entity facts, and high-level retrieval for abstract questions. The source implements this with ll_keywords and hl_keywords: the local branch queries entities_vdb, and the global branch queries relationships_vdb.

QueryParam modes (lightrag/base.py)

```python
@dataclass
class QueryParam:
    mode: Literal["local", "global", "hybrid", "naive", "mix", "bypass"] = "mix"
    only_need_context: bool = False
    only_need_prompt: bool = False
    response_type: str = "Multiple Paragraphs"
    stream: bool = False
    top_k: int = int(os.getenv("TOP_K", str(DEFAULT_TOP_K)))
    chunk_top_k: int = int(os.getenv("CHUNK_TOP_K", str(DEFAULT_CHUNK_TOP_K)))
    enable_rerank: bool = os.getenv("RERANK_BY_DEFAULT", "true").lower() == "true"
    include_references: bool = False
```
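As a reading aid, the modes can be summarized by which vector indexes each one consults. The table below is a sketch of this report's understanding, not code from the library: the local/global/mix rows follow the description in this report, while the `naive` (chunks only) and `bypass` (no retrieval) rows are assumptions that should be checked against the source. `MODE_BRANCHES` and `branches_for` are hypothetical names.

```python
# Which indexes each query mode consults (illustrative summary, not library code).
MODE_BRANCHES = {
    "local": {"entities_vdb"},
    "global": {"relationships_vdb"},
    "hybrid": {"entities_vdb", "relationships_vdb"},
    "mix": {"entities_vdb", "relationships_vdb", "chunks_vdb"},
    "naive": {"chunks_vdb"},   # assumption: plain vector search over chunks
    "bypass": set(),           # assumption: the query goes to the LLM without retrieval
}


def branches_for(mode: str) -> set:
    """Return the set of indexes a mode queries, or raise on an unknown mode."""
    try:
        return MODE_BRANCHES[mode]
    except KeyError:
        raise ValueError(f"unknown query mode: {mode!r}") from None
```

For example, `branches_for("hybrid")` returns both graph-side indexes, and `branches_for("mix")` adds `chunks_vdb` on top of them.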

Advantages

The local branch answers entity/detail questions well.
The global branch extends to relationships and themes.
Hybrid/mix balances breadth and depth better than a single mode.

Disadvantages

Wrong keyword extraction sends retrieval in the wrong direction.
The high-level relation VDB depends on relation keywords generated by the LLM.
top_k, chunk_top_k, and the token budget need tuning per corpus.

Deep dive: query modes, keyword extraction, mix mode, rerank, and context build

4. Mix mode: KG + vector chunks

Source: lightrag/operate.py::_perform_kg_search

mix is the pragmatic default. It does not force the user to choose between graph and chunks: local/global graph search retrieves entities/relations, direct vector search retrieves chunks with a high semantic match, and the results are merged and deduplicated before the prompt is built.

Figure: LightRAG retrieval flowchart showing the query, low-level retrieval, high-level retrieval, graph context, text chunks, and response generation. Retrieval flow: the query is split into low-level/high-level keys, and graph retrieval is combined with vector retrieval before the context is passed to the LLM. Source: LearnOpenCV.

Mix branch in search (lightrag/operate.py)

```python
if query_param.mode == "mix" and chunks_vdb:
    vector_chunks = await _get_vector_context(
        query,
        chunks_vdb,
        query_param,
        query_embedding,
    )
    # Track where each chunk came from ("C" = direct chunk vector search).
    for i, chunk in enumerate(vector_chunks):
        chunk_id = chunk.get("chunk_id") or chunk.get("id")
        if chunk_id:
            chunk_tracking[chunk_id] = {
                "source": "C",
                "frequency": 1,
                "order": i + 1,
            }
```
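The merge/dedupe step that follows retrieval can be sketched as a round-robin interleave that keeps the first occurrence of each chunk ID. Round-robin is one plausible policy chosen here for illustration; `merge_chunks` is a hypothetical function, and the actual ordering rules in the source may differ.

```python
from itertools import zip_longest


def merge_chunks(vector_chunks, entity_chunks, relation_chunks):
    """Interleave three chunk lists round-robin, keeping the first copy of each chunk_id."""
    seen, merged = set(), []
    for group in zip_longest(vector_chunks, entity_chunks, relation_chunks):
        for chunk in group:
            # zip_longest pads shorter lists with None; skip padding and duplicates.
            if chunk is None or chunk["chunk_id"] in seen:
                continue
            seen.add(chunk["chunk_id"])
            merged.append(chunk)
    return merged


vector = [{"chunk_id": "c1"}, {"chunk_id": "c2"}]
entity = [{"chunk_id": "c2"}, {"chunk_id": "c3"}]
relation = [{"chunk_id": "c1"}]
print([c["chunk_id"] for c in merge_chunks(vector, entity, relation)])  # ['c1', 'c2', 'c3']
```

Interleaving rather than concatenating means that a later truncation by token budget still keeps some context from every source.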

5. Token budget + context compiler

Source: lightrag/operate.py::_build_query_context

Retrieval is only useful if the final context is compact and clean. LightRAG does not concatenate every entity/relation/chunk into the prompt. It searches, truncates, merges chunks, and then builds the context within a token budget.

_build_query_context()
1. _perform_kg_search()
2. _apply_token_truncation()
3. _merge_all_chunks()
4. _build_context_str()

Stage 4 first accounts for the system prompt tokens, KG context tokens, query tokens, and a buffer. Only the remaining tokens are allocated to chunks. This is a pattern worth learning: a context compiler must know its overhead rather than truncate to a fixed constant.
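The budgeting idea can be written down in a few lines. The sketch below is not the LightRAG implementation: `chunk_token_budget`, `fit_chunks`, the 200-token buffer, and the example numbers are all assumptions chosen for illustration.

```python
def chunk_token_budget(max_total_tokens, sys_prompt_tokens, kg_context_tokens,
                       query_tokens, buffer_tokens=200):
    """Tokens left for chunks after fixed overhead has been reserved."""
    used = sys_prompt_tokens + kg_context_tokens + query_tokens + buffer_tokens
    return max(0, max_total_tokens - used)


def fit_chunks(chunks, budget, count_tokens):
    """Keep chunks in ranked order until the next one would exceed the budget."""
    kept, total = [], 0
    for chunk in chunks:
        n = count_tokens(chunk)
        if total + n > budget:
            break
        kept.append(chunk)
        total += n
    return kept


budget = chunk_token_budget(30000, sys_prompt_tokens=1500,
                            kg_context_tokens=9000, query_tokens=40)
print(budget)  # 19260
```

Because the KG context is measured before chunks are selected, a query that pulls in a large subgraph automatically leaves less room for raw chunks, which a fixed per-chunk limit would not capture.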


6. 4-way storage abstraction

Source: lightrag/lightrag.py; docs/ProgramingWithCore.md

LightRAG's production readiness rests largely on its storage abstraction. It does not hard-code a single backend. KV, vector, graph, and doc-status storage can each use a different backend.

Storage role	Default (dev)	Production options	Purpose
KV	`JsonKVStorage`	PostgreSQL, Redis, MongoDB, OpenSearch	Full docs, text chunks, caches, entity/relation metadata
Vector	`NanoVectorDBStorage`	PGVector, Milvus, Qdrant, Faiss, MongoVector, OpenSearch	Semantic search over entities, relationships, and chunks
Graph	`NetworkXStorage`	Neo4j, PostgreSQL AGE, Memgraph, OpenSearch	Nodes, edges, degree, traversal, subgraph
Doc status	`JsonDocStatusStorage`	PostgreSQL, MongoDB, OpenSearch	Pending/processing/processed/failed states and pipeline monitoring

Storage objects created in __post_init__ (lightrag/lightrag.py)

```python
# KV store for text chunks
self.text_chunks = self.key_string_value_json_storage_cls(
    namespace=NameSpace.KV_STORE_TEXT_CHUNKS,
    workspace=self.workspace,
    embedding_func=self.embedding_func,
)
# Graph store for entity nodes and relation edges
self.chunk_entity_relation_graph = self.graph_storage_cls(
    namespace=NameSpace.GRAPH_STORE_CHUNK_ENTITY_RELATION,
    workspace=self.workspace,
    embedding_func=self.embedding_func,
)
# Vector store used for entity-level (low-level) retrieval
self.entities_vdb = self.vector_db_storage_cls(
    namespace=NameSpace.VECTOR_STORE_ENTITIES,
    workspace=self.workspace,
    embedding_func=self.embedding_func,
    meta_fields={"entity_name", "source_id", "content", "file_path"},
)
```
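Since each role is configured independently, a deployment check can flag roles that still run on the file-based dev defaults. The sketch below uses the default class names from the table above; the environment variable names and the `Neo4JStorage` value are assumptions that should be verified against the repo's env.example, and `resolve_storage` / `dev_defaults_in_use` are hypothetical helpers.

```python
# role -> (assumed env var name, dev default from the storage table)
ROLES = {
    "kv": ("LIGHTRAG_KV_STORAGE", "JsonKVStorage"),
    "vector": ("LIGHTRAG_VECTOR_STORAGE", "NanoVectorDBStorage"),
    "graph": ("LIGHTRAG_GRAPH_STORAGE", "NetworkXStorage"),
    "doc_status": ("LIGHTRAG_DOC_STATUS_STORAGE", "JsonDocStatusStorage"),
}


def resolve_storage(env: dict) -> dict:
    """Pick the backend for each role, falling back to the dev default."""
    return {role: env.get(var) or default for role, (var, default) in ROLES.items()}


def dev_defaults_in_use(config: dict) -> list:
    """List the roles that still use a file-based dev backend."""
    return [role for role, (_, default) in ROLES.items() if config[role] == default]


config = resolve_storage({"LIGHTRAG_GRAPH_STORAGE": "Neo4JStorage"})
print(dev_defaults_in_use(config))  # ['kv', 'vector', 'doc_status']
```

Passing `os.environ` as `env` would run the same check against a real process environment.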

Deep dive: storage backend matrix, API server, WebUI, Docker, and setup wizard

7. Server/API/WebUI + setup wizard

docs/LightRAG-API-Server.md; docs/InteractiveSetup.md

The WebUI turns LightRAG from a library into a deployable service. It supports document indexing, knowledge graph exploration, a query interface, and an Ollama-compatible endpoint for other chatbots.

Figure: screenshot of the LightRAG WebUI showing graph exploration and the management/query interface. LightRAG Server provides a WebUI to upload/index documents, inspect the knowledge graph, and run RAG queries. Source: HKUDS/LightRAG README.

The setup wizard is new in 2026.03. Instead of editing env.example by hand, you can run make env-base, make env-storage, make env-server, and make env-security-check.

8. Rerank + references

Source: lightrag/rerank.py; docs/AdvancedFeatures.md

Rerank is the quality gate after retrieval. Graph/vector retrieval casts a wide net of candidates; the reranker helps cut noise, especially in mix mode and on corpora with long chunks.

`rerank.py` has a helper that chunks documents exceeding the token limit before calling the rerank API, then aggregates the scores back to the original document using max, mean, or first. It is a small detail, but it matters when the reranker caps input at 512 tokens per document.

Rerank long documents (lightrag/rerank.py)

```python
def aggregate_chunk_scores(chunk_results, doc_indices, num_original_docs, aggregation="max"):
    # Collect the scores of all chunks that belong to each original document.
    doc_scores = {i: [] for i in range(num_original_docs)}
    for result in chunk_results:
        chunk_idx = result["index"]
        score = result["relevance_score"]
        if 0 <= chunk_idx < len(doc_indices):
            doc_scores[doc_indices[chunk_idx]].append(score)

    aggregated_results = []
    for doc_idx, scores in doc_scores.items():
        if not scores:
            continue
        if aggregation == "max":
            final_score = max(scores)
        elif aggregation == "first":
            final_score = scores[0]
        else:  # "mean"
            final_score = sum(scores) / len(scores)
        aggregated_results.append({"index": doc_idx, "relevance_score": final_score})
    return sorted(aggregated_results, key=lambda x: x["relevance_score"], reverse=True)
```
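The aggregation above needs a `doc_indices` list that maps each chunk position back to its source document. A minimal sketch of how such a mapping can be produced is shown below. It uses whitespace-separated words as a stand-in for model tokens, which is an assumption; `split_for_rerank` is a hypothetical name and not the helper from `rerank.py`.

```python
def split_for_rerank(documents, max_tokens=512):
    """Split each document into pieces of at most max_tokens whitespace tokens.

    Returns the pieces plus doc_indices, where doc_indices[i] is the index of
    the original document that piece i came from.
    """
    pieces, doc_indices = [], []
    for doc_idx, doc in enumerate(documents):
        words = doc.split() or [""]  # keep one (empty) piece for empty documents
        for start in range(0, len(words), max_tokens):
            pieces.append(" ".join(words[start:start + max_tokens]))
            doc_indices.append(doc_idx)
    return pieces, doc_indices


pieces, doc_indices = split_for_rerank(["a b c d e", "f g"], max_tokens=2)
print(pieces)       # ['a b', 'c d', 'e', 'f g']
print(doc_indices)  # [0, 0, 0, 1]
```

The reranker then scores `pieces`, and the per-piece scores are folded back into two document scores via `doc_indices`.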

9. Concurrent control hierarchy

docs/LightRAG_concurrent_explain.md; lightrag/lightrag.py

Concurrency in graph RAG easily creates bottlenecks and conflicts. LightRAG separates control by document, chunk, graph merge, and a global LLM queue so that document processing cannot starve queries of resources.

Document level: max_parallel_insert
Chunk extraction level: per-document semaphore = llm_model_max_async
Graph merge/rebuild level: graph_max_async ~= llm_model_max_async * 2
LLM level: global prioritized queue (user query > merge > extraction)
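The LLM-level ordering (user query > merge > extraction) can be illustrated with a small asyncio priority queue. This is a sketch of the idea only: `run_jobs`, the `PRIORITY` values, and the `asyncio.sleep(0)` stand-in for an LLM call are assumptions and do not mirror the LightRAG implementation.

```python
import asyncio

# Lower number = served first.
PRIORITY = {"query": 0, "merge": 1, "extraction": 2}


async def run_jobs(jobs, max_async=1):
    """Run (kind, name) jobs with at most max_async workers, highest priority first."""
    queue = asyncio.PriorityQueue()
    for seq, (kind, name) in enumerate(jobs):
        # seq breaks ties so jobs of equal priority keep their arrival order.
        queue.put_nowait((PRIORITY[kind], seq, name))
    done = []

    async def worker():
        while True:
            try:
                _, _, name = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            await asyncio.sleep(0)  # stand-in for the actual LLM call
            done.append(name)

    await asyncio.gather(*(worker() for _ in range(max_async)))
    return done


jobs = [("extraction", "e1"), ("merge", "m1"), ("query", "q1"), ("extraction", "e2")]
print(asyncio.run(run_jobs(jobs)))  # ['q1', 'm1', 'e1', 'e2']
```

Even though the query arrived third, it is served first, which is the behavior the hierarchy above aims for when indexing and user traffic share one LLM backend.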

Advantages

pipeline_status is available to monitor busy/request_pending/cancellation.
User queries are prioritized over extraction/merge.
Limiting max_parallel_insert reduces entity-naming conflicts between files.

Disadvantages

Setting concurrency too high is not necessarily faster, because the LLM queue remains the bottleneck.
Parallel document insert increases rollback/retry cost if many files are midway through processing.
With a local LLM, account for context length and throughput before raising MAX_ASYNC.

Deep dive: concurrency, deletion/rebuild, cache, tracing, and failure modes

10. Observability, cache, evaluation


docs/AdvancedFeatures.md

Graph RAG is hard to debug without visibility into tokens, cache, traces, and retrieved context. LightRAG provides the pieces needed to operate it: TokenTracker, graph export, cache tools, Langfuse, and RAGAS evaluation.

Capability	When to use	Notes
TokenTracker	Tracking token usage during batch insert/query	Useful for spotting abnormal extraction cost
Graph export	Backup, audit, analysis outside LightRAG	CSV, Excel, Markdown, TXT
Cache management	Debugging the query cache/extraction cache	`aclear_cache()` clears the entire llm_response_cache; the query cache has its own tool
Langfuse	Tracing OpenAI-compatible LLM calls	Does not yet cover every backend, e.g. Ollama/Azure/AWS Bedrock
RAGAS	Evaluating RAG quality/context precision	The repo includes a separate evaluation script

Decision matrix

Situation	Use LightRAG?	Reason
Small corpus, simple Q&A	Not a priority	NaiveRAG or simple full-text/vector search is sufficient and cheaper.
Corpus rich in entities/relations	Yes	Graph indexing handles multi-hop questions better than top-k chunks.
Continuously updated data	Yes	Incremental graph merge avoids rebuilding all community reports.
Strict exact-citation requirements	Consider carefully	LightRAG has references, but a separate audit pipeline is needed under strict compliance.
Team without LLM/embedding ops experience	Prototype first	Easy to run as a demo, but production requires managing models, storage, cache, tracing, and backups.

Summary

LightRAG is worth studying because it gathers many engineering lessons of graph RAG into a relatively compact implementation: graph-based indexing, dual-level retrieval, mix mode, storage abstraction, and an operational pipeline with clear concurrency and status handling. Its main strengths are dynamic corpora and multi-hop questions; its main weakness is the cost and quality of extraction.

Practical conclusions

Use LightRAG when the corpus has clear relations and needs incremental updates.
Start with the server/WebUI to inspect the graph before embedding the core into an app.
In production, lock the embedding model, set up tracing/evaluation, and tune concurrency to the LLM backend.
If you only need a chatbot over short documents, LightRAG is more than you need.
