
Deep Dive: Knowledge Graph & Hybrid Search

Detailed analysis from the LLM Wiki Overview report — the knowledge graph layer and hybrid search at wiki scale.
Parent report: ← LLM Wiki Overview
Topic: Knowledge Graph & Hybrid Search
Benchmark: 95.2% LongMemEval-S (agentmemory)
Date: 2026-04-20

Overview

The original LLM Wiki used index.md as its primary navigation — a single catalog file listing every page. This approach works well up to ~200 pages, but beyond that the index becomes too long for the LLM to read in one pass.

This deep dive covers the two layers added when the wiki grew past 200 pages: the Knowledge Graph (entities + typed relationships, replacing flat wikilinks) and Hybrid Search (BM25 + vector + graph with RRF fusion, replacing brute-force index.md reading).

[Figure: Obsidian graph view showing interconnected entity pages as nodes with wiki-links as edges]
Obsidian graph view of the LLM Wiki after 5 papers. The knowledge graph emerges from [[wiki-links]] — transformer architecture linking to attention mechanism, BERT linking to fine-tuning. ↗ Data Science Dojo Tutorial

Knowledge Graph Layer

1. Entity Extraction — Structured Knowledge from Text

LLM Wiki v2 — Beyond flat pages
Flat pages with wikilinks waste structure. When the LLM ingests a source, it doesn't just write prose — it extracts structured entities with types, attributes, and relationships. This is the foundation of the knowledge graph.

Entity types

| Entity Type | Example | Key Attributes |
|---|---|---|
| Concept | Attention Mechanism | definition, properties, variants |
| Person | Ashish Vaswani | role, affiliation, contributions |
| Project | BERT, GPT-3 | status, owner, tech stack |
| Library | PyTorch, TensorFlow | version, language, use case |
| Decision | Use Transformer over RNN | made_by, date, rationale, status |
| Event | GPT-4 release | date, impact, related entities |
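The type table above maps naturally onto a small schema that ingest tooling can validate against. A minimal sketch, assuming a Python pipeline — the `Entity` class and its field names are illustrative, not the wiki's actual tooling:

```python
from dataclasses import dataclass, field

# Entity types from the table above
ENTITY_TYPES = {"Concept", "Person", "Project", "Library", "Decision", "Event"}

@dataclass
class Entity:
    """One extracted entity (illustrative schema, not the wiki's real tooling)."""
    name: str                  # exact name, consistent with existing wiki
    type: str                  # one of ENTITY_TYPES
    attributes: list = field(default_factory=list)     # 3-5 most important facts
    relationships: list = field(default_factory=list)  # (relationship_type, target) pairs

    def __post_init__(self):
        # Reject types outside the agreed schema at ingest time
        if self.type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.type}")

attention = Entity(
    name="Attention Mechanism",
    type="Concept",
    attributes=["Scaled dot-product of Query, Key, Value vectors"],
    relationships=[("part_of", "Transformer Architecture")],
)
```

Validating the type at construction catches one of the two common extraction mistakes (wrong types) before any page is written.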

Entity extraction prompt

Entity extraction during ingest

````md
Before writing wiki pages, extract all entities from this source.

For each entity found, list:
- Name: (exact name, consistent with existing wiki)
- Type: Concept | Person | Project | Library | Decision | Event
- Key attributes: (3-5 most important facts)
- Relationships to other entities: (type: name)

Format:
```
Entity: Attention Mechanism
Type: Concept
Attributes:
  - Scaled dot-product of Query, Key, Value vectors
  - Computational complexity: O(n²) in sequence length
  - Enables direct dependency between any two positions
Relationships:
  - part_of: Transformer Architecture
  - enables: Long-range dependency modeling
  - variant_of: [none — this IS the base]
  - used_by: BERT, GPT series
```

Wait for my confirmation before creating pages.
````
Entity extraction first, then page creation. Confirm the entity list with the user before writing pages. This is the step where the LLM most easily makes mistakes (duplicate entities, wrong types). Catch them early.
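Once the user confirms the entity list, it can be parsed back into structured records for page creation. A sketch of such a parser, assuming the exact block format shown in the prompt — `parse_entities` is a hypothetical helper, not part of any existing tool:

```python
def parse_entities(text: str) -> list:
    """Parse 'Entity: / Type: / Attributes: / Relationships:' blocks
    (the format from the extraction prompt) into dicts."""
    entities = []
    current = None
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Entity:"):
            current = {"name": stripped.split(":", 1)[1].strip(),
                       "type": None, "attributes": [], "relationships": []}
            entities.append(current)
            section = None
        elif stripped.startswith("Type:") and current:
            current["type"] = stripped.split(":", 1)[1].strip()
        elif stripped == "Attributes:":
            section = "attributes"
        elif stripped == "Relationships:":
            section = "relationships"
        elif stripped.startswith("- ") and current and section:
            item = stripped[2:]
            if section == "relationships" and ":" in item:
                rel, target = item.split(":", 1)
                current["relationships"].append((rel.strip(), target.strip()))
            else:
                current["attributes"].append(item)
    return entities

sample = """\
Entity: Attention Mechanism
Type: Concept
Attributes:
  - Scaled dot-product of Query, Key, Value vectors
Relationships:
  - part_of: Transformer Architecture
"""
parsed = parse_entities(sample)
```

Parsing after confirmation (rather than before) keeps the human review step between raw extraction and page creation, as the workflow above prescribes.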

2. Typed Relationships — Beyond Flat Wikilinks

LLM Wiki v2 — Typed relationships
"A relates to B" is far less useful than "A caused B (confidence 0.9, 3 sources)". Typed relationships enable graph traversal queries that keyword search misses entirely.

Relationship taxonomy

STRUCTURAL:
- part_of — "Attention is part_of Transformer"
- contains — "Transformer contains Attention, MLP, LayerNorm"
- variant_of — "Multi-head Attention is variant_of Attention"

DEPENDENCY:
- depends_on — "BERT depends_on Transformer Architecture"
- uses — "GPT-3 uses Byte Pair Encoding"
- requires — "LoRA requires pre-trained weights"

TEMPORAL:
- supersedes — "GPT-4 supersedes GPT-3.5" (with date)
- evolved_from — "Llama2 evolved_from Llama1"
- preceded_by — "Attention preceded_by seq2seq with RNN"

CAUSAL:
- caused — "Scaling laws caused shift to larger models"
- enables — "Attention enables long-range dependencies"
- prevents — "Gradient clipping prevents training instability"

EPISTEMIC:
- contradicts — "GPT-3 paper contradicts Chinchilla on optimal token ratio"
- supports — "Scaling laws supports compute-optimal training"
- challenges — "MoE architecture challenges standard dense scaling"
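Since the taxonomy requires team/schema agreement, it helps to keep it as data that a lint pass can check edges against. A minimal sketch — grouping and the `lint_edge` helper are assumptions layered on the taxonomy above:

```python
# Relationship taxonomy from the list above, grouped by category
RELATIONSHIP_TAXONOMY = {
    "STRUCTURAL": {"part_of", "contains", "variant_of"},
    "DEPENDENCY": {"depends_on", "uses", "requires"},
    "TEMPORAL":   {"supersedes", "evolved_from", "preceded_by"},
    "CAUSAL":     {"caused", "enables", "prevents"},
    "EPISTEMIC":  {"contradicts", "supports", "challenges"},
}

ALL_RELATIONS = set().union(*RELATIONSHIP_TAXONOMY.values())

def lint_edge(rel_type: str) -> bool:
    """Lint check: is this edge type in the agreed taxonomy?"""
    return rel_type in ALL_RELATIONS
```

Running this check during ingest catches the failure mode noted below — the LLM assigning relationship types outside the schema.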

Relationships in an entity page

wiki/entities/bert.md

```md
# BERT
**Type**: Project
**Confidence**: 0.90

## Relationships
- **depends_on**: [[transformer-architecture]] — uses encoder-only stack
- **uses**: [[masked-language-modeling]] — core pre-training objective
- **uses**: [[next-sentence-prediction]] — secondary pre-training task
- **evolved_from**: [[elmo]] — BERT replaced ELMo as SOTA on NLP benchmarks
- **superseded_by**: [[roberta]] — same arch, better training (no NSP)
- **enables**: [[transfer-learning-nlp]] — fine-tune on downstream tasks
- **contradicts**: [[gpt-architecture]] — bidirectional vs autoregressive
- **part_of**: [[pre-training-paradigm]] — foundational to modern LLMs
```
Pros
  • Graph traversal queries: "what depends on X?" → walk dependency edges
  • Contradiction chains: "what contradicts GPT-3?" → instant answers
  • Impact analysis: "if we deprecate Redis?" → walk used_by edges downstream
  • Timeline reconstruction: evolved_from/supersedes chains → automatic lineage
Cons
  • Typed relationships require team/schema agreement on the taxonomy
  • The LLM can assign wrong relationship types → needs review
  • More verbose pages → slightly slower ingest

3. Graph Traversal Queries

LLM Wiki v2 — Graph traversal
Graph traversal catches connections keyword search misses. "What's the impact of upgrading Redis?" cannot be answered by keyword search on "Redis" — you have to walk used_by edges to find all downstream components.

Graph traversal query patterns

| Query pattern | Traversal | Example |
|---|---|---|
| Impact analysis | Walk used_by + depends_on edges outward | "What breaks if Redis goes down?" |
| Lineage / provenance | Walk evolved_from + preceded_by backwards | "What came before Transformer?" |
| Contradiction chain | Walk contradicts edges | "What does the wiki disagree about?" |
| Dependency graph | Walk depends_on recursively | "What does BERT need to work?" |
| Discovery | Walk enables edges | "What does attention mechanism enable?" |
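All of these patterns reduce to a bounded breadth-first walk over typed edges. A sketch, assuming the graph is held as an adjacency dict of (relationship, target) pairs — the toy graph and function names are illustrative:

```python
from collections import deque

# Toy graph: entity -> list of (relationship_type, target_entity)
GRAPH = {
    "attention-mechanism": [("used_by", "transformer-architecture"),
                            ("enables", "long-range-deps")],
    "transformer-architecture": [("used_by", "bert"), ("used_by", "gpt-3")],
    "bert": [("used_by", "transfer-learning-nlp")],
}

def impacted(start: str, edge_types: set, max_depth: int = 3) -> set:
    """Walk outward from `start` via the given edge types (bounded BFS),
    returning every entity reachable within max_depth hops."""
    seen, queue = set(), deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # don't expand past the depth bound
        for rel, target in GRAPH.get(node, []):
            if rel in edge_types and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen

result = impacted("attention-mechanism", {"used_by", "depends_on", "enables"})
```

Swapping the edge set and direction gives the other patterns: `evolved_from`/`preceded_by` for lineage, `contradicts` for contradiction chains.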

Graph query prompt

Graph traversal query

```md
Use graph traversal to answer: "What would be impacted if we deprecated the attention mechanism?"

Start at entity: [[attention-mechanism]]
Walk outward via edges: used_by, depends_on, enables

For each entity encountered:
- Note it
- Walk its outward edges (max depth: 3)

Return a dependency tree showing all impacted entities.
Format as nested list with relationship type on each edge.
```

Scale Path & Tooling

8. Scale Breakpoints — When You Need What

LLM Wiki v2 — Implementation spectrum

Scale breakpoints

0–50 pages: index.md is enough
→ The LLM reads the entire index in one pass (<10K tokens)
→ Nothing more needed

50–200 pages: index.md + structured format
→ Add confidence scores, typed relationships
→ The LLM reads the index + drills into relevant pages
→ Still no search infra needed

200–500 pages: BM25
→ index.md too long to read in one pass
→ Add a BM25 index (whoosh or rank-bm25)
→ The LLM queries BM25 → gets top-10 → reads those pages

500–2000 pages: BM25 + Vector
→ BM25 misses semantic matches
→ Add a vector index (FAISS + sentence-transformers)
→ Hybrid search with simple score fusion

2000+ pages: Full hybrid (BM25 + Vector + Graph)
→ Graph traversal essential for complex queries
→ RRF fusion
→ Consider a hosted solution (Weaviate, Qdrant, Pinecone)
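At the full-hybrid tier, Reciprocal Rank Fusion combines the ranked lists from BM25, vector, and graph retrieval. A minimal sketch of RRF itself (k=60 is the commonly used constant); the ranked lists here are made-up page names:

```python
def rrf_fuse(ranked_lists: list, k: int = 60) -> list:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
    where rank is the 1-based position of d in each list that contains it."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["bert.md", "attention.md", "tokenizer.md"]
vector_hits = ["attention.md", "transformer.md", "bert.md"]
graph_hits  = ["attention.md", "bert.md"]

fused = rrf_fuse([bm25_hits, vector_hits, graph_hits])
```

RRF needs only ranks, not scores, which is why it fuses cleanly across retrievers whose raw scores (BM25 weights, cosine similarities, graph distances) are not comparable.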

9. Obsidian — Visual Layer for the Knowledge Graph

Karpathy's workflow + tutorial
Obsidian is the "IDE" of the LLM Wiki. Karpathy: "Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase." Obsidian's graph view renders [[wiki-links]] as a visual knowledge graph — no external graph database needed.

Obsidian workflow

Setup:
1. Install Obsidian (free, local)
2. Open folder as vault → select wiki/
3. Press Ctrl+G → Graph View

Graph View shows:
- Nodes = entity pages
- Edges = [[wiki-links]] between pages
- Clusters = naturally formed topic groups
- Isolated nodes = orphan pages (lint target!)

Useful Obsidian features for the LLM Wiki:
- Graph View: visual navigation
- Backlinks panel: "what links here?"
- Quick switcher: jump between pages
- Search: full-text across the vault
- Web Clipper plugin: save URLs → raw/ as markdown
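Orphan detection needs no Obsidian plugin: a lint pass can scan pages for [[wiki-links]] and flag anything nothing links to. A sketch — the in-memory page dict stands in for reading files from wiki/, and the regex covers the common `[[target]]`, `[[target|alias]]`, and `[[target#heading]]` forms:

```python
import re

# Capture the link target, stopping at ']', '|' (alias) or '#' (heading)
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def find_orphans(pages: dict) -> set:
    """Given {page_name: markdown_text}, return pages no other page links to."""
    linked = set()
    for name, text in pages.items():
        for target in WIKILINK.findall(text):
            if target.strip() != name:  # ignore self-links
                linked.add(target.strip())
    return set(pages) - linked

pages = {
    "bert": "Depends on [[transformer-architecture]].",
    "transformer-architecture": "Contains [[attention-mechanism]].",
    "attention-mechanism": "Base mechanism.",
    "old-notes": "Nothing links here.",
}
orphans = find_orphans(pages)
```

Note that root pages like index.md also show up as "orphans" under this definition, so a real lint pass would likely whitelist them.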

Obsidian Web Clipper

Install the Obsidian Web Clipper browser extension → it converts any webpage into markdown and saves it to raw/. Bookmarking articles becomes ingest-ready, as fast as a regular bookmark.

| Tool | Role in the LLM Wiki |
|---|---|
| Obsidian | Read/browse wiki, graph view, backlinks |
| Claude Code | LLM agent: ingest, query, lint |
| Web Clipper | Save web articles → raw/ as markdown |
| VS Code | Alternative editor, better for large files |
| Git | Version control for wiki/ (optional but recommended) |

Wrap-up

Knowledge graph and hybrid search are not the first step of the LLM Wiki — they are the third and fifth steps on the scaling path. Start with index.md. Add typed relationships when you start missing connections. Add BM25 when index.md gets too long. Add vectors when semantic queries fail. Add graph traversal when structural queries fail.

agentmemory's 95.2% on LongMemEval-S with BM25 + vector + graph is proof that this approach works at production scale. But most personal wikis will never need that tier.