Design Artifact
April 2026
memex project

memex

Karpathy's Pattern — Signal & Noise
Source: karpathy.gist
Analysis: Signal vs Noise
Decision Rationale
PERSISTENT MEMORY · RAG vs WIKI · COMPILE ONCE, QUERY FOREVER · ~100 ARTICLES SWEET SPOT · KNOWLEDGE COMPOUNDS · PERSONAL SCALE ONLY · HALLUCINATIONS PERSIST · MARKDOWN IS FUTURE-PROOF
17M+
Views · Karpathy Tweet
~100
Articles · Sweet Spot
400K
Words · Karpathy's Wiki
50K
Token Ceiling
The Core Idea

Instead of making the LLM rediscover knowledge from raw documents on every query — the RAG way — Karpathy proposes having the LLM compile a structured, interlinked wiki once at ingest time. Knowledge accumulates. The LLM maintains the wiki, not the human.

★ This analysis produced memex

From analysis to implementation

This document was the design artifact that preceded the memex repository — a structured Signal & Noise pass over Karpathy's pattern that found seven real weaknesses and worked out concrete mitigations for each. Every automation component in memex maps directly to a mitigation identified here.

Read the repo:

Architecture

Layer 1
raw/
PDFs, articles, web clips. Immutable. Human adds, LLM never modifies.
Process
🤖 LLM
Reads sources. Synthesizes, links, and compiles structured pages. Runs lint checks.
Layer 2
wiki/
Compiled markdown pages. Encyclopedia-style articles with cross-references.
+
Layer 3
schema
CLAUDE.md / AGENTS.md. Rules that discipline the LLM's behavior as maintainer.
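
A schema file of this kind might contain rules like the following. This is an illustrative sketch of the idea, not the actual memex CLAUDE.md:

```markdown
# CLAUDE.md — wiki maintainer rules (illustrative)
- Never modify anything under raw/; it is append-only and human-owned.
- Every new page gets an entry in index.md and at least one [[wikilink]].
- When a new claim conflicts with an existing page, flag the contradiction
  for human review; do not silently overwrite.
- Run the lint pass before finishing any ingest session.
```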


Strengths
Knowledge Compounds Over Time
Unlike RAG — where every query starts from scratch re-deriving connections — the LLM wiki is stateful. Each new source you add integrates into existing pages, strengthening existing connections and building new ones. The system gets more valuable with every addition, not just bigger.
Zero Maintenance Burden on Humans
The grunt work of knowledge management — cross-referencing, updating related pages, creating summaries, flagging contradictions — is what kills every personal wiki humans try to maintain. LLMs do this tirelessly. The human's job shrinks to: decide what to read, and what questions to ask.
Token-Efficient at Personal Scale
At ~100 articles, the wiki's index.md fits in context. The LLM reads the index, identifies relevant articles, and loads only those — no embedding, no vector search, no retrieval noise. This is faster and cheaper per query than a full RAG pipeline for this scale.
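
The two-step lookup can be sketched in a few lines. `pick_relevant` stands in for the LLM call that reads the index and names the useful pages; it is an assumption, not a real API:

```python
from pathlib import Path

def load_context(wiki_dir, query, pick_relevant):
    """Index-first retrieval: no embeddings, no vector search."""
    wiki = Path(wiki_dir)
    index = (wiki / "index.md").read_text()
    # Step 1: only the index goes to the model — small enough for context.
    wanted = pick_relevant(index, query)
    # Step 2: load just the named pages and assemble the context.
    pages = [(wiki / name).read_text() for name in wanted]
    return "\n\n".join([index] + pages)
```

The entire retrieval cost is a handful of file reads, which is why this beats a RAG pipeline at personal scale.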
Human-Readable & Auditable
The wiki is just markdown. You can open it in any editor, read it yourself, version it in git, and inspect every claim. There's no black-box vector math. Every connection the LLM made is visible. This transparency is a genuine advantage over opaque embeddings.
Future-Proof & Portable
Plain markdown files work with any tool, any model, any era. No vendor lock-in. No proprietary database. When the next-gen model releases, you point it at the same folder. The data outlives the tooling.
Self-Healing via Lint Passes
Karpathy describes periodic "health check" passes where the LLM scans the entire wiki for contradictions, orphaned pages (no links pointing to them), and concepts referenced but not yet given their own page. The wiki actively repairs itself rather than rotting silently.
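
The deterministic half of such a health check is easy to sketch. Assuming `[[wikilink]]` syntax and a flat folder of .md files, this finds both orphaned pages and concepts linked but not yet written:

```python
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # captures [[Page]] link targets

def lint(wiki_dir):
    """Return (orphans, missing): pages nothing links to, and
    link targets that have no page yet."""
    pages = {p.stem: p.read_text() for p in Path(wiki_dir).glob("*.md")}
    linked = set()
    for text in pages.values():
        linked |= {m.strip() for m in WIKILINK.findall(text)}
    orphans = {n for n in pages if n not in linked and n != "index"}
    missing = linked - pages.keys()
    return orphans, missing
```

Contradiction detection, by contrast, genuinely needs the LLM pass; link-graph checks like this are just the cheap, mechanical layer underneath it.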
Path to Fine-Tuning
As the wiki matures and gets "purified" through continuous lint passes, it becomes high-quality synthetic training data. Karpathy points to the possibility of fine-tuning a smaller, efficient model directly on the wiki — so the LLM "knows" your knowledge base in its own weights, not just its context.
Weaknesses
Errors Persist & Compound
This is the most serious structural flaw. With RAG, hallucinations are ephemeral — wrong answer this query, clean slate next time. With an LLM wiki, if the LLM incorrectly links two concepts at ingest time, that mistake becomes a prior that future ingest passes build upon. Persistent errors are more dangerous than ephemeral ones.
Hard Scale Ceiling (~50K tokens)
The wiki approach stops working reliably when the index can no longer fit in the model's context window — roughly 50,000–100,000 tokens. Karpathy's own wiki is ~100 articles / ~400K words on a single topic. A mid-size company has thousands of documents; a large one has millions. The architecture simply doesn't extend to that scale.
No Access Control or Multi-User Support
It's a folder of markdown files. There is no Role-Based Access Control, no audit logging, no concurrency handling for simultaneous writes, no permissions model. Multiple users or agents creating write conflicts is unmanaged. This is not a limitation that can be patched — it's a structural consequence of the architecture.
Manual Cross-Checking Burden Returns
In precision-critical domains (API specs, version constraints, legal records), LLM-generated content requires human cross-checking against raw sources to catch subtle factual errors. At that point, the maintenance burden you thought you'd eliminated returns in a different form: verification overhead.
Cognitive Outsourcing Risk
Critics argued that the bookkeeping Karpathy outsources — filing, cross-referencing, summarizing — is precisely where genuine understanding forms. By handing this to an LLM, you may end up with a comprehensive wiki you haven't internalized. You have a great reference; you may lack deep ownership of the knowledge.
Knowledge Staleness Without Active Upkeep
Community reports show that most people who try this pattern get the folder structure right but end up with a wiki that slowly becomes unreliable or gets abandoned. The system requires consistent source ingestion and regular lint passes. If you stop feeding it, the wiki rots — its age relative to your domain's pace of change becomes a liability.
Weaker Semantic Retrieval than RAG
Markdown wikilinks are explicit and manually created. Vector embeddings discover semantic connections across differently worded text that manual linking simply cannot — finding, for example, that an article titled "caching strategies" relates to "performance bottlenecks" even with no explicit link between them. At large corpora, RAG's fuzzy matching is the superior retrieval mechanism.
RAG retrieves and forgets. A wiki accumulates and compounds. — Design rationale for memex, April 2026
Scale matters most here. The comparison is not absolute — it is highly scale-dependent. Below ~50K tokens, the wiki wins. Above that threshold, RAG's architecture becomes necessary regardless of the storage format.
Dimension | LLM Wiki | RAG  (✦ marks the stronger option per row)
Knowledge Accumulation | ✦ Compounds with each ingest | Stateless — restarts every query
Maintenance Cost | ✦ LLM does the filing | Chunking pipelines need upkeep
Scale Ceiling | ~50–100K tokens hard limit | ✦ Millions of documents, no ceiling
Human Readability | ✦ Plain markdown, fully auditable | Black-box vector space
Semantic Retrieval | Explicit links only | ✦ Fuzzy semantic matching
Error Persistence | Errors compound into future pages | Errors are ephemeral per query
Multi-user / RBAC | None — flat file system | ✦ Supported by most platforms
Query Latency | ✦ Fast at personal scale | Embedding search overhead
Setup Complexity | ✦ Just folders & markdown | Vector DB, chunking, embeddings
Vendor Lock-in | ✦ Zero — any model, any editor | Often tied to embedding provider
Cross-reference Quality | ✦ Rich, named wikilinks | Implicit via similarity score
Fine-tuning Pathway | ✦ Wiki becomes training data | Raw chunks are poor training data
Excellent Fit

Solo Deep Research

Reading papers, articles, and reports over weeks or months on a single topic. Karpathy's primary use case — his ML research wiki has ~100 articles and 400K words, all compiled without writing a line manually.

Excellent Fit

Personal Knowledge Base

Goals, health tracking, journal entries, podcast notes — building a structured picture of yourself over time. The LLM creates concept pages for recurring themes and connects them across months or years.

Good Fit

Small Team Wiki (<500 articles)

Engineering team internal docs, competitive analysis, trip planning. Works well if one person owns ingestion and the team reads via Obsidian. Breaks at concurrent writes or RBAC requirements.

Good Fit

Agentic Pipeline Memory

AI agent systems that need persistent memory between sessions. The wiki prevents agents from "waking up blank." Session context is compiled rather than re-derived, dramatically cutting token overhead.

Poor Fit

Mission-Critical Precision

API parameter specs, version constraints, legal records, medical protocols. LLM-generated pages can silently misstate critical details. Manual cross-checking eliminates the maintenance savings that make this pattern attractive.

Avoid

Enterprise Knowledge Management

Millions of documents, hundreds of users, RBAC, audit trails, regulatory compliance. The flat file architecture cannot address concurrency, access control, or governance. This is a personal productivity hack, not enterprise infrastructure.

A breakdown of where the pattern generates real signal vs. where the noise grows louder.

Signal

The Compile-Time Insight

Moving synthesis from query-time (RAG) to ingest-time (wiki) is a genuinely novel architectural choice with real benefits for accumulation. This is the core innovation and it holds up to scrutiny.

Strong
Signal

LLM as Librarian

Offloading the maintenance bottleneck — the work that kills all human-maintained wikis — to an LLM is elegant and correct. The pattern solves a real problem people actually have.

Strong
Noise

"RAG is Dead"

Community hyperbole. RAG and the wiki pattern solve different problems at different scales. The wiki pattern is a personal productivity tool, not a replacement for enterprise-grade retrieval infrastructure.

High Noise
Noise

Error Amplification Risk

Real and underweighted by enthusiasts. The persistent-error problem is structural — not a bug to fix with better prompting. It's a genuine trade-off the pattern makes, and it's most dangerous in precision-critical domains.

Real Risk
Signal

The Idea File Paradigm

Karpathy's framing of sharing an "idea file" vs. a code repo — letting each person's agent instantiate a custom version — is genuinely forward-thinking about how patterns propagate in the agent era.

Solid
Noise

"It'll Replace Enterprise RAG"

Karpathy explicitly scoped this to individual researchers. The limitations (no RBAC, no concurrency, ~50K token ceiling) are not bugs — they are consequences of the design assumptions. Enterprise use requires entirely different infrastructure.

Pure Noise
The schema file is a wish, not a discipline. Without an actual security model, this is structurally just a pattern with a dedicated output directory and no guardrails. — Community critique, April 2026
The bottleneck for personal knowledge bases was never the reading. It was the boring maintenance work nobody wanted to do. LLMs eliminate that bottleneck. — Design rationale for memex
These are the real engineering answers: for each known limitation, concrete mitigations exist, some from Karpathy's own gist, others from production implementations and community analysis. Every mitigation below maps to a component in the memex repository. The Active Upkeep section is the one that matters most.
📈

Scaling Past the Token Ceiling

High Priority
01 Add qmd as your search layer at 50–100+ articles qmd · CLI + MCP

index.md breaks down somewhere around 100–150 articles, once it no longer fits cleanly in context. The fix is qmd — built by Tobi Lütke (Shopify's CEO) and explicitly recommended by Karpathy himself. It's a local, on-device search engine for markdown files that combines hybrid BM25 + vector search with LLM re-ranking. No API calls; no data leaves your machine.

memex uses qmd from day one with three collections: wiki (live), wiki-archive (excluded by default), and wiki-conversations (mined sessions). Wing + room structural filtering narrows retrieval before search runs.

In memex: Configured at install time via docs/SETUP.md. The agent picks the right collection per query via guidance in the example CLAUDE.md files.
02 Shard the index — one sub-index per topic domain Schema · CLAUDE.md

Before reaching for qmd, a simpler scaling step is to split index.md into domain-specific sub-indexes: wiki/patterns/index.md, wiki/decisions/index.md, etc. A root index.md points to sub-indexes, keeping any single file within comfortable context window bounds.
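
A minimal sketch of that sharding step, assuming pages already live in domain subfolders (wiki/patterns/, wiki/decisions/, and so on) and the folder name doubles as the domain. The layout is hypothetical, not the memex schema itself:

```python
from collections import defaultdict
from pathlib import Path

def shard_index(wiki_dir):
    """Rebuild per-domain sub-indexes plus a small root index.md
    that only points at them."""
    wiki = Path(wiki_dir)
    domains = defaultdict(list)
    for page in wiki.glob("*/*.md"):
        if page.name != "index.md":           # skip existing sub-indexes
            domains[page.parent.name].append(page.name)
    for domain, names in domains.items():
        lines = [f"# {domain}"] + [f"- [[{n[:-3]}]]" for n in sorted(names)]
        (wiki / domain / "index.md").write_text("\n".join(lines) + "\n")
    root = ["# index"] + [f"- [[{d}/index]]" for d in sorted(domains)]
    (wiki / "index.md").write_text("\n".join(root) + "\n")
```

The root index stays tiny no matter how many pages exist; only the sub-index for the queried domain ever needs to enter context.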

03 Consolidation tiers — promote stable knowledge up the stack memex · confidence field

Structure knowledge in tiers by confidence and stability. Low-confidence claims live in draft pages. After multi-source confirmation, the LLM promotes them. Core principles graduate to a high-confidence tier that rarely changes.

In memex: Implemented via the confidence frontmatter field with time-based decay (6/9/12 month thresholds). Pages age out naturally as the automation re-promotes or archives them.
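
The decay rule can be sketched as a pure function. The level names mirror the `confidence` frontmatter field and the 6/9/12-month thresholds come from the description above; the function itself is illustrative:

```python
from datetime import date

RANK = {"archive": 0, "low": 1, "medium": 2, "high": 3}

def decayed_confidence(confidence, last_verified, today):
    """Cap a page's confidence by the age of its last verification.
    Decay only ever lowers confidence; re-verification resets the clock."""
    age_days = (today - last_verified).days
    if age_days >= 365:
        cap = "archive"   # 12 months unused: candidate for archiving
    elif age_days >= 270:
        cap = "low"       # 9 months
    elif age_days >= 180:
        cap = "medium"    # 6 months
    else:
        cap = "high"      # fresh enough: no cap applied
    return min(confidence, cap, key=RANK.get)
```

Making decay a function of `last_verified` rather than a stored value means a hygiene run can recompute it idempotently on every pass.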

⚠️

Cross-Check & Error Persistence

High Priority
01 Confidence scoring — every claim carries a decay score Frontmatter · Schema

Make uncertainty explicit. Every factual claim carries metadata: confidence level, last verified date, source count. Confidence decays with time and strengthens with reinforcement from new sources.

In memex: Implemented with confidence: high|medium|low + last_verified + sources: fields. The hygiene script auto-decays stale pages and flags them for re-verification.

Key benefit: Errors become visible, decaying warnings instead of permanent silent landmines.
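
A page header under this scheme might look like the following. Field names follow the description above; the values and source paths are illustrative:

```yaml
---
title: caching-strategies
confidence: medium        # high | medium | low
last_verified: 2026-03-14
sources:
  - raw/articles/llm-wiki-pattern.md
  - raw/papers/cache-survey.pdf
status: Active
---
```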
02 Typed supersession — new info explicitly replaces old claims archive/ · log.md

When new information contradicts an existing claim, the wrong pattern is leaving the old claim with an appended note. The right pattern: the new claim explicitly supersedes the old one, which moves to archive with a link to its replacement.

In memex: Pages with status: Superseded by ... are auto-archived. The archive retains the full history with archived_date, archived_reason, and original_path fields.
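
An illustrative sketch of that supersession move, using the provenance fields named above. Frontmatter handling is reduced here to prepending plain lines; the real memex script is presumably more careful:

```python
from datetime import date
from pathlib import Path

def supersede(wiki_dir, old_name, new_name, reason):
    """Move the superseded page into archive/ with a pointer to its
    replacement; the live wiki keeps only the new claim."""
    wiki = Path(wiki_dir)
    src = wiki / old_name
    dst = wiki / "archive" / old_name
    dst.parent.mkdir(exist_ok=True)
    header = (
        f"status: Superseded by [[{new_name.removesuffix('.md')}]]\n"
        f"archived_date: {date.today().isoformat()}\n"
        f"archived_reason: {reason}\n"
        f"original_path: {old_name}\n\n"
    )
    dst.write_text(header + src.read_text())
    src.unlink()
    return dst
```

Because the archived copy records `original_path` and its replacement, the history stays auditable and the page can be restored if it is referenced again.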

★ Biggest Mitigation Challenge

Active Upkeep — The Real Failure Mode

Community analysis of 120+ comments on Karpathy's gist converged on one clear finding: most people who try this pattern get the folder structure right and still end up with a wiki that slowly becomes unreliable, redundant, or abandoned. The difference between a wiki that compounds and one that quietly rots comes down to operational discipline — not technical setup. This is why memex's automation layer exists.

Daily
Feed the Machine
  • Extract new Claude Code sessions (hourly cron)
  • Summarize + index (daily 2am)
  • Harvest URLs + quick hygiene (daily 3am)
Weekly
Deep Pass
  • Full hygiene with LLM checks (Sun 4am)
  • Duplicate detection (auto-merge)
  • Contradiction report (human review)
  • Technology lifecycle checks
Continuous
Decay & Archive
  • last_verified refreshes from new sessions
  • Unused pages decay 6mo → 9mo → 12mo
  • Stale pages auto-archive
  • Archive auto-restores on reference
Review
Human in Loop
  • Staging pipeline for automated content
  • wiki-staging.py --review workflow
  • Hygiene reports split fixed vs needs-review
  • Promote / reject / defer
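
The cadence above can be wired up with ordinary cron. The schedule matches the times given in the lists; the script names are hypothetical stand-ins for the memex automation entry points, not its actual commands:

```cron
# Daily: feed the machine
0 * * * *   memex-extract-sessions    # hourly: pull new Claude Code sessions
0 2 * * *   memex-summarize-index     # 2am: summarize + index
0 3 * * *   memex-harvest-hygiene     # 3am: harvest URLs + quick hygiene
# Weekly: deep pass
0 4 * * 0   memex-hygiene --deep      # Sun 4am: full hygiene with LLM checks
```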