Deep Dive: LightRAG Retrieval & Query Modes

A detailed analysis from the LightRAG Overview report: from user query to final prompt/context.

Parent report: LightRAG Overview · Topic: Retrieval · Default mode: mix · Date: 2026-04-22

Overview

Retrieval is the part of LightRAG that turns graph artifacts into context for the LLM. The query is analyzed into low-level and high-level keywords, candidates are looked up in the entity and relationship vector stores, neighbors and chunks are added from the graph, and the result is compiled into a context that fits a token budget.

Figure: LightRAG retrieval flowchart covering the query, low-level retrieval, high-level retrieval, the knowledge graph, text chunks, and answer generation. The query does not go straight to chunks; it passes through keyword extraction, entity/relation retrieval, graph context, and vector chunks. Source: LearnOpenCV

The docs and the source differ in one notable way: some older docs describe the default mode as global, but the v1.4.15 source snapshot sets QueryParam.mode = "mix". When writing an app, check the source or the installed package version instead of relying on old snippets.
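One way to make that check routine is to read the default from the installed code instead of from a snippet. The sketch below is illustrative only: it assumes QueryParam is a dataclass, it assumes the distribution is published as "lightrag-hku", and it uses a stand-in class so the example runs without LightRAG installed.

```python
from dataclasses import dataclass, fields
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def dataclass_default(cls, field_name: str):
    """Read a dataclass field's declared default without instantiating the class."""
    for f in fields(cls):
        if f.name == field_name:
            return f.default
    raise AttributeError(f"{cls.__name__} has no field {field_name!r}")


# Stand-in for illustration only. In a real app, pass the QueryParam class
# imported from the installed lightrag package instead.
@dataclass
class StandInQueryParam:
    mode: str = "mix"


# "lightrag-hku" is an assumed distribution name; adjust it to your install.
print("installed version:", installed_version("lightrag-hku"))
print("default mode:", dataclass_default(StandInQueryParam, "mode"))
```

Logging both values at application start makes a silent change of default visible after an upgrade.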

Query modes

Mode	Retrieval source	When to use	Risk
`naive`	Direct vector chunks	Fast baseline, small corpus, debugging vector search	Ignores the entity/relation graph
`local`	Entity VDB + related graph data	Detail questions about a specific entity	May miss broader context
`global`	Relationship VDB + related entities	Broad topic/theme questions	May miss specific facts
`hybrid`	Local + global graph branches	Balancing entities and relationships	Does not fetch direct vector chunks the way mix does
`mix`	Hybrid graph + direct vector chunks	Pragmatic default for production	More candidates; needs rerank/token control
`bypass`	No retrieval	Testing the LLM or a custom prompt	Not grounded in the corpus
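The table can be restated as a small lookup, which is handy in tests or when logging which stores a query is expected to touch. This is a sketch of the table only, not LightRAG code; the source labels ("entities", "relations", "vector_chunks") and helper names are hypothetical.

```python
# Which retrieval sources each mode draws on, transcribed from the table above.
# The labels are illustrative names, not LightRAG identifiers.
MODE_SOURCES = {
    "naive": frozenset({"vector_chunks"}),
    "local": frozenset({"entities"}),
    "global": frozenset({"relations"}),
    "hybrid": frozenset({"entities", "relations"}),
    "mix": frozenset({"entities", "relations", "vector_chunks"}),
    "bypass": frozenset(),
}


def sources_for(mode: str) -> frozenset:
    """Return the sources a mode uses, rejecting unknown mode names."""
    try:
        return MODE_SOURCES[mode]
    except KeyError:
        raise ValueError(f"unknown query mode: {mode!r}") from None


def uses_graph(mode: str) -> bool:
    """True when the mode consults the entity or relationship stores."""
    return bool(sources_for(mode) & {"entities", "relations"})


for mode in MODE_SOURCES:
    print(f"{mode:7s} graph={uses_graph(mode)} sources={sorted(sources_for(mode))}")
```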

Figure: LightRAG architecture showing low-level and high-level keys in dual-level retrieval. Entity keys serve local detail; relation/theme keys serve global abstraction. Source: HKUDS/LightRAG README

Retrieval internals

1. Keyword extraction separates low-level and high-level intent

R.1

lightrag/operate.py:get_keywords_from_query

LightRAG does not search with the raw query alone. It converts the query into two groups of keywords that steer retrieval: low-level keywords for specific entities, high-level keywords for relationships and themes.

User query: "How do electric vehicles affect urban air quality and transportation infrastructure?"
Low-level keywords: electric vehicles, urban air quality, transportation infrastructure
High-level keywords: environmental impact, infrastructure planning, policy trade-offs

Pros

Separating intent lets the local and global branches each do their own job.
hl_keywords/ll_keywords can be passed in QueryParam ahead of time for debugging.
Retrieval depends less on a single vector query.

Cons

Adds an LLM call before retrieval when keywords are not supplied up front.
A wrong keyword extraction pulls in the wrong candidate graph.
Needs a prompt/model that handles domain-specific terminology well.
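The trade-off between the extra LLM call and pre-supplied keywords can be sketched as follows. This is not the body of get_keywords_from_query; KeywordParams and resolve_keywords are hypothetical stand-ins, and the rule "skip extraction when either list is supplied" is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class KeywordParams:
    """Hypothetical stand-in carrying the two keyword fields named in the text."""
    hl_keywords: List[str] = field(default_factory=list)
    ll_keywords: List[str] = field(default_factory=list)


def resolve_keywords(
    query: str,
    params: KeywordParams,
    extract: Callable[[str], Tuple[List[str], List[str]]],
) -> Tuple[List[str], List[str]]:
    """Return (high_level, low_level) keywords, calling the extractor only if needed."""
    if params.hl_keywords or params.ll_keywords:
        # Keywords were supplied by the caller: no LLM call is made.
        return params.hl_keywords, params.ll_keywords
    return extract(query)


llm_calls = []


def fake_llm_extract(query: str) -> Tuple[List[str], List[str]]:
    """Toy extractor that records each call in place of a real LLM request."""
    llm_calls.append(query)
    return ["environmental impact"], ["electric vehicles"]


auto = resolve_keywords("How do EVs affect air quality?", KeywordParams(), fake_llm_extract)
manual = resolve_keywords(
    "How do EVs affect air quality?",
    KeywordParams(hl_keywords=["policy trade-offs"], ll_keywords=["urban air quality"]),
    fake_llm_extract,
)
print(auto, manual, len(llm_calls))
```

Pinning the keywords this way isolates the rest of the pipeline from extraction quality, which is useful when a bad context needs to be traced to its cause.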

2. Local/global branches query the entity and relationship VDBs

R.2

lightrag/operate.py:_perform_kg_search

Graph retrieval still starts with vector search. LightRAG does not brute-force the graph: it uses the vector DB to select entity/relation candidates, and only then uses the graph to expand the context.
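The two-step idea, vector search to pick seeds and then graph expansion around them, can be shown on toy data. This is a minimal sketch, not the logic of _get_node_data; the vectors, the adjacency map, and the one-hop expansion rule are all assumptions.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_entities(query_vec: List[float], entity_vecs: Dict[str, List[float]], k: int) -> List[str]:
    """Step 1: choose seed entities by vector similarity, not by walking the graph."""
    ranked = sorted(entity_vecs, key=lambda name: cosine(query_vec, entity_vecs[name]), reverse=True)
    return ranked[:k]


def expand_one_hop(seeds: List[str], adjacency: Dict[str, List[str]]) -> List[str]:
    """Step 2: add direct neighbors of the seeds, keeping first-seen order."""
    result = list(seeds)
    for seed in seeds:
        for neighbor in adjacency.get(seed, []):
            if neighbor not in result:
                result.append(neighbor)
    return result


# Toy data, invented for this example.
entity_vecs = {"EV": [1.0, 0.0], "Air quality": [0.8, 0.6], "Zoning": [0.0, 1.0]}
adjacency = {"EV": ["Charging grid"], "Air quality": ["Emissions", "EV"]}

seeds = top_k_entities([1.0, 0.1], entity_vecs, k=2)
context_entities = expand_one_hop(seeds, adjacency)
print(seeds, context_entities)
```

Only the neighborhoods of the selected seeds are touched, so the cost of the graph step scales with k and not with the size of the graph.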

Branch selection in _perform_kg_search

if query_param.mode == "local" and len(ll_keywords) > 0:
    local_entities, local_relations = await _get_node_data(
        ll_keywords,
        knowledge_graph_inst,
        entities_vdb,
        query_param,
        query_embedding=ll_embedding,
    )
elif query_param.mode == "global" and len(hl_keywords) > 0:
    global_relations, global_entities = await _get_edge_data(
        hl_keywords,
        knowledge_graph_inst,
        relationships_vdb,
        query_param,
        query_embedding=hl_embedding,
    )
else:
    # hybrid or mix: run both branches (body not shown in this excerpt)
    ...

A notable optimization: the source pre-computes the embeddings for the query, the low-level keywords, and the high-level keywords in a single batched call when needed. This avoids two to three sequential embedding round-trips.
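The batching idea can be sketched with a counting stub in place of a real embedding function. The function name precompute_embeddings, the "query"/"ll"/"hl" labels, and the comma-joined keyword text are assumptions for illustration, not the source's implementation.

```python
import asyncio
from typing import Dict, List


class CountingEmbedder:
    """Stub embedding function that counts how many round-trips are made."""

    def __init__(self) -> None:
        self.calls = 0

    async def __call__(self, texts: List[str]) -> List[List[float]]:
        self.calls += 1
        # Fake one-dimensional "embedding": the length of the text.
        return [[float(len(text))] for text in texts]


async def precompute_embeddings(
    embed, query: str, ll_keywords: List[str], hl_keywords: List[str]
) -> Dict[str, List[float]]:
    """Embed the query and any non-empty keyword groups in one batched call."""
    labelled = [("query", query)]
    if ll_keywords:
        labelled.append(("ll", ", ".join(ll_keywords)))
    if hl_keywords:
        labelled.append(("hl", ", ".join(hl_keywords)))
    vectors = await embed([text for _, text in labelled])
    return {label: vec for (label, _), vec in zip(labelled, vectors)}


embedder = CountingEmbedder()
vectors = asyncio.run(
    precompute_embeddings(
        embedder,
        "How do electric vehicles affect urban air quality?",
        ["electric vehicles", "urban air quality"],
        ["environmental impact"],
    )
)
print(sorted(vectors), "round-trips:", embedder.calls)
```

With a remote embedding service, each avoided call saves a full network round-trip, which is where the latency benefit comes from.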

3. Mix mode adds vector chunks and a round-robin merge

R.3

lightrag/operate.py:3694-4053

mix keeps the benefits of NaiveRAG. Even when the graph helps, many questions still need raw text passages. Mix mode fetches additional chunks from chunks_vdb, then merges them with the chunks related to entities and relations.

Round-robin merge of chunks from three sources

merged_chunks = []  # initialized here so the excerpt stands on its own
max_len = max(len(vector_chunks), len(entity_chunks), len(relation_chunks))
for i in range(max_len):
    if i < len(vector_chunks):
        # source C: direct vector chunk
        merged_chunks.append(vector_chunks[i])
    if i < len(entity_chunks):
        # source E: entity-related chunk
        merged_chunks.append(entity_chunks[i])
    if i < len(relation_chunks):
        # source R: relation-related chunk
        merged_chunks.append(relation_chunks[i])

Source	In logs	Meaning
Direct vector chunks	`C`	Semantic chunks from the raw query, same as the NaiveRAG branch
Entity-related chunks	`E`	Chunks tied to entity candidates from the local branch
Relation-related chunks	`R`	Chunks tied to relation candidates from the global branch
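A runnable version of the interleaving, with the C/E/R tags attached, makes the resulting order easy to inspect. This is a sketch: the chunk_id key, the "source" field, and the de-duplication step are assumptions added for the example and are not shown in the excerpt above.

```python
from itertools import zip_longest
from typing import Dict, List


def round_robin_merge(
    vector_chunks: List[Dict], entity_chunks: List[Dict], relation_chunks: List[Dict]
) -> List[Dict]:
    """Interleave the three lists in C, E, R order, skipping repeated chunk ids."""
    merged, seen = [], set()
    tags = ("C", "E", "R")
    for row in zip_longest(vector_chunks, entity_chunks, relation_chunks):
        for tag, chunk in zip(tags, row):
            # zip_longest pads exhausted lists with None.
            if chunk is None or chunk["chunk_id"] in seen:
                continue
            seen.add(chunk["chunk_id"])
            merged.append({**chunk, "source": tag})
    return merged


# Toy chunks, invented for this example; "c1" appears in two sources.
vector_chunks = [{"chunk_id": "c1"}, {"chunk_id": "c2"}]
entity_chunks = [{"chunk_id": "c3"}, {"chunk_id": "c1"}]
relation_chunks = [{"chunk_id": "c4"}]

merged = round_robin_merge(vector_chunks, entity_chunks, relation_chunks)
print([(chunk["chunk_id"], chunk["source"]) for chunk in merged])
```

Interleaving means that a later truncation cuts evenly across the three sources; simple concatenation would drop the last source first.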

4. The context compiler applies the token budget stage by stage

R.4

lightrag/operate.py:_build_query_context, _build_context_str

More candidates do not mean better context. LightRAG separates search from context building: once it has entity, relation, and chunk candidates, it truncates them to the token budget and then builds the final prompt.

4-stage query context architecture

# _build_query_context()
search_result = await _perform_kg_search(...)
truncation_result = await _apply_token_truncation(search_result, query_param, ...)
merged_chunks = await _merge_all_chunks(
    filtered_entities=truncation_result["filtered_entities"],
    filtered_relations=truncation_result["filtered_relations"],
    vector_chunks=search_result["vector_chunks"],
    ...
)
context, raw_data = await _build_context_str(
    truncation_result["entities_context"],
    truncation_result["relations_context"],
    merged_chunks,
    query,
    query_param,
    ...
)
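The truncation stage can be pictured as a cumulative token cut applied to each candidate list. The sketch below is not _apply_token_truncation: the whitespace tokenizer, the "content" key, and the budget values are assumptions chosen to keep the example self-contained.

```python
from typing import Dict, List


def count_tokens(text: str) -> int:
    """Whitespace split stands in for a real tokenizer (assumption)."""
    return len(text.split())


def truncate_to_budget(items: List[Dict], max_tokens: int) -> List[Dict]:
    """Keep items in rank order until adding the next one would exceed the budget."""
    kept, used = [], 0
    for item in items:
        cost = count_tokens(item["content"])
        if used + cost > max_tokens:
            break
        kept.append(item)
        used += cost
    return kept


# Toy candidates, already sorted by relevance.
entities = [
    {"content": "EV: battery electric vehicle"},  # 4 tokens
    {"content": "Air quality: urban pollutant concentration levels"},  # 6 tokens
]
relations = [
    {"content": "EV -> Air quality: reduces tailpipe emissions"},  # 7 tokens
]

# Hypothetical per-stage budgets; real values depend on the model's context size.
budgets = {"entities": 8, "relations": 8}

filtered_entities = truncate_to_budget(entities, budgets["entities"])
filtered_relations = truncate_to_budget(relations, budgets["relations"])
print(len(filtered_entities), len(filtered_relations))
```

Because the cut follows rank order, the quality of the upstream ranking decides what survives, which is one reason reranking matters before this stage.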

Debug tip: set only_need_context=True to see the actual context before the LLM generates an answer. This is the fastest way to tell whether a problem lies in retrieval or in generation.
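Once the context string is in hand, the retrieval-versus-generation question reduces to a simple check: are the facts the answer needed present in the context at all? The helper below is a hypothetical sketch that works on any context string; the name diagnose and the case-insensitive substring match are assumptions, and real checks may need fuzzier matching.

```python
from typing import List, Tuple


def diagnose(context: str, expected_terms: List[str]) -> Tuple[str, List[str]]:
    """Classify a bad answer as a retrieval or generation problem.

    Returns ("retrieval", missing_terms) when expected facts never reached the
    context, and ("generation", []) when they did but the answer was still wrong.
    """
    lowered = context.lower()
    missing = [term for term in expected_terms if term.lower() not in lowered]
    if missing:
        return "retrieval", missing
    return "generation", []


# Toy context, standing in for what a context-only query would return.
context = "Entities: EV, Charging grid. Chunks: EV adoption reduces tailpipe emissions."

print(diagnose(context, ["tailpipe emissions"]))
print(diagnose(context, ["tailpipe emissions", "zoning policy"]))
```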

5. Rerank and references form the quality and trace layer

R.5

lightrag/rerank.py; docs/AdvancedFeatures.md

Rerank keeps mix mode from becoming too broad. When retrieval pulls candidates from both the graph and chunks, the reranker can filter the chunks again before they reach the LLM. References give the response a trace back to the source path.

Figure: LightRAG WebUI screenshot showing graph/query interaction. The WebUI/API is a useful place to inspect retrieved context and graph query behavior before embedding LightRAG in an app. Source: LightRAG API Server docs

Rerank chunking guard

if enable_chunking:
    documents, doc_indices = chunk_documents_for_rerank(
        documents, max_tokens=max_tokens_per_doc
    )
    if top_n is not None:
        top_n = None  # aggregate document scores after chunk-level rerank
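The guard above postpones top_n because a cut at chunk level could keep several pieces of one document and drop every piece of another. The sketch below illustrates that reasoning with a toy scorer. It is not the code in lightrag/rerank.py: the word-based splitter, the overlap scorer, and the choice of max as the aggregation are all assumptions.

```python
from typing import List, Optional, Tuple


def chunk_documents(documents: List[str], max_tokens: int) -> Tuple[List[str], List[int]]:
    """Split each document into word windows and remember which document each came from."""
    chunks, doc_indices = [], []
    for idx, doc in enumerate(documents):
        words = doc.split()
        pieces = [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)] or [""]
        for piece in pieces:
            chunks.append(piece)
            doc_indices.append(idx)
    return chunks, doc_indices


def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: number of query words that appear in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def aggregate_scores(
    chunk_scores: List[int], doc_indices: List[int], top_n: Optional[int] = None
) -> List[Tuple[int, int]]:
    """Collapse chunk scores to one score per document (max), then apply top_n."""
    best = {}
    for score, idx in zip(chunk_scores, doc_indices):
        best[idx] = max(score, best.get(idx, float("-inf")))
    ranked = sorted(best.items(), key=lambda pair: pair[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


documents = [
    "ev charging grid load planning",
    "urban air quality improves with ev adoption in dense cities",
]
chunks, doc_indices = chunk_documents(documents, max_tokens=4)
scores = [overlap_score("air quality", chunk) for chunk in chunks]
ranking = aggregate_scores(scores, doc_indices, top_n=1)
print(doc_indices, scores, ranking)
```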

Failure modes

Failure mode	Symptom	Mitigation
Skewed keyword extraction	Context returns unrelated entities/relations	Log and inspect `hl_keywords`/`ll_keywords`; try passing keywords manually
Fragmented entity aliases	Graph has many near-synonymous nodes	Tighten entity types/prompt; review the graph in the WebUI; consider normalization outside the pipeline
Chunks consume the whole context	Answer resembles NaiveRAG, with little relation reasoning	Lower `chunk_top_k`, enable rerank, raise the entity/relation budget
Noisy relation VDB	High-level queries return broad but shallow results	Tune the extraction prompt/model; check relation keywords and descriptions
Embedding mismatch	Query fails with schema/dimension errors or returns empty results	Do not change the embedding model after indexing; recreate vector storage if the model changes
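The embedding-mismatch row is the easiest one to catch before any query runs, provided the application records which model built the index. The sketch below assumes such a record exists; the metadata keys and the function name are hypothetical and are not something LightRAG is known to write.

```python
from typing import Dict, List


def check_embedding_compat(index_meta: Dict, model_name: str, dim: int) -> List[str]:
    """Compare the embedding settings recorded at index time with the current ones."""
    problems = []
    if index_meta.get("embedding_model") != model_name:
        problems.append(
            f"model changed: index={index_meta.get('embedding_model')!r}, current={model_name!r}"
        )
    if index_meta.get("embedding_dim") != dim:
        problems.append(
            f"dimension changed: index={index_meta.get('embedding_dim')!r}, current={dim!r}"
        )
    return problems


# Hypothetical record saved by the application when the index was built.
index_meta = {"embedding_model": "model-a", "embedding_dim": 1024}

print(check_embedding_compat(index_meta, "model-a", 1024))
print(check_embedding_compat(index_meta, "model-b", 768))
```

A non-empty result is a signal to rebuild the vector storage before serving queries.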

Summary

Retrieval takeaways

mix is the mode to start with because it combines the graph with vector chunks.
The LightRAG query context is a 4-stage pipeline; debugging each stage is faster than debugging the final answer.
Rerank is not a garnish; on a large corpus it is the quality gate that reduces noise.
If a query needs an exact audit, always inspect the references and raw context before trusting the output.

References