Design Artifact
April 2026
memex project

memex

Karpathy's Pattern — Signal & Noise
Source: karpathy.gist
Analysis: Signal vs Noise
Decision Rationale
PERSISTENT MEMORY · RAG vs WIKI · COMPILE ONCE, QUERY FOREVER · ~100 ARTICLES SWEET SPOT · KNOWLEDGE COMPOUNDS · PERSONAL SCALE ONLY · HALLUCINATIONS PERSIST · MARKDOWN IS FUTURE-PROOF
17M+
Views · Karpathy Tweet
~100
Articles · Sweet Spot
400K
Words · Karpathy's Wiki
50K
Token Ceiling
The Core Idea

Instead of making the LLM rediscover knowledge from raw documents on every query — the RAG way — Karpathy proposes having the LLM compile a structured, interlinked wiki once at ingest time. Knowledge accumulates. The LLM maintains the wiki, not the human.

★ This analysis produced memex

From analysis to implementation

This document was the design artifact that preceded the memex repository — a structured Signal & Noise pass over Karpathy's pattern that found seven real weaknesses and worked out concrete mitigations for each. Every automation component in memex maps directly to a mitigation identified here.

Read the repo:

Architecture

Layer 1
raw/
PDFs, articles, web clips. Immutable. Human adds, LLM never modifies.
Process
🤖 LLM
Reads sources. Synthesizes, links, and compiles structured pages. Runs lint checks.
Layer 2
wiki/
Compiled markdown pages. Encyclopedia-style articles with cross-references.
+
Layer 3
schema
CLAUDE.md / AGENTS.md. Rules that discipline the LLM's behavior as maintainer.
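
A schema file of this kind might contain rules like the following. This is an illustrative sketch of the idea, not the actual memex CLAUDE.md:

```markdown
# CLAUDE.md — wiki maintainer rules (illustrative)
- Never modify anything under raw/; it is append-only and human-owned.
- Every new page gets an entry in index.md and at least one [[wikilink]].
- When a new claim conflicts with an existing page, flag the contradiction
  for human review; do not silently overwrite.
- Run the lint pass before finishing any ingest session.
```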


Strengths
Knowledge Compounds Over Time
Unlike RAG — where every query starts from scratch re-deriving connections — the LLM wiki is stateful. Each new source you add integrates into existing pages, strengthening existing connections and building new ones. The system gets more valuable with every addition, not just bigger.
Zero Maintenance Burden on Humans
The grunt work of knowledge management — cross-referencing, updating related pages, creating summaries, flagging contradictions — is what kills every personal wiki humans try to maintain. LLMs do this tirelessly. The human's job shrinks to: decide what to read, and what questions to ask.
Token-Efficient at Personal Scale
At ~100 articles, the wiki's index.md fits in context. The LLM reads the index, identifies relevant articles, and loads only those — no embedding, no vector search, no retrieval noise. This is faster and cheaper per query than a full RAG pipeline for this scale.
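
The two-step lookup can be sketched in a few lines. `pick_relevant` stands in for the LLM call that reads the index and names the useful pages; it is an assumption, not a real API:

```python
from pathlib import Path

def load_context(wiki_dir, query, pick_relevant):
    """Index-first retrieval: no embeddings, no vector search."""
    wiki = Path(wiki_dir)
    index = (wiki / "index.md").read_text()
    # Step 1: only the index goes to the model — small enough for context.
    wanted = pick_relevant(index, query)
    # Step 2: load just the named pages and assemble the context.
    pages = [(wiki / name).read_text() for name in wanted]
    return "\n\n".join([index] + pages)
```

The entire retrieval cost is a handful of file reads, which is why this beats a RAG pipeline at personal scale.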
Human-Readable & Auditable
The wiki is just markdown. You can open it in any editor, read it yourself, version it in git, and inspect every claim. There's no black-box vector math. Every connection the LLM made is visible. This transparency is a genuine advantage over opaque embeddings.
Future-Proof & Portable
Plain markdown files work with any tool, any model, any era. No vendor lock-in. No proprietary database. When the next-gen model releases, you point it at the same folder. The data outlives the tooling.
Self-Healing via Lint Passes
Karpathy describes periodic "health check" passes where the LLM scans the entire wiki for contradictions, orphaned pages (no links pointing to them), and concepts referenced but not yet given their own page. The wiki actively repairs itself rather than rotting silently.
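
The deterministic half of such a health check is easy to sketch. Assuming `[[wikilink]]` syntax and a flat folder of .md files, this finds both orphaned pages and concepts linked but not yet written:

```python
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # captures [[Page]] link targets

def lint(wiki_dir):
    """Return (orphans, missing): pages nothing links to, and
    link targets that have no page yet."""
    pages = {p.stem: p.read_text() for p in Path(wiki_dir).glob("*.md")}
    linked = set()
    for text in pages.values():
        linked |= {m.strip() for m in WIKILINK.findall(text)}
    orphans = {n for n in pages if n not in linked and n != "index"}
    missing = linked - pages.keys()
    return orphans, missing
```

Contradiction detection, by contrast, genuinely needs the LLM pass; link-graph checks like this are just the cheap, mechanical layer underneath it.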
Path to Fine-Tuning
As the wiki matures and gets "purified" through continuous lint passes, it becomes high-quality synthetic training data. Karpathy points to the possibility of fine-tuning a smaller, efficient model directly on the wiki — so the LLM "knows" your knowledge base in its own weights, not just its context.
Weaknesses
Errors Persist & Compound
This is the most serious structural flaw. With RAG, hallucinations are ephemeral — wrong answer this query, clean slate next time. With an LLM wiki, if the LLM incorrectly links two concepts at ingest time, that mistake becomes a prior that future ingest passes build upon. Persistent errors are more dangerous than ephemeral ones.
Hard Scale Ceiling (~50K tokens)
The wiki approach stops working reliably when the index can no longer fit in the model's context window — roughly 50,000–100,000 tokens. Karpathy's own wiki is ~100 articles / ~400K words on a single topic. A mid-size company has thousands of documents; a large one has millions. The architecture simply doesn't extend to that scale.
No Access Control or Multi-User Support
It's a folder of markdown files. There is no Role-Based Access Control, no audit logging, no concurrency handling for simultaneous writes, no permissions model. Multiple users or agents creating write conflicts is unmanaged. This is not a limitation that can be patched — it's a structural consequence of the architecture.
Manual Cross-Checking Burden Returns
In precision-critical domains (API specs, version constraints, legal records), LLM-generated content requires human cross-checking against raw sources to catch subtle factual errors. At that point, the maintenance burden you thought you'd eliminated returns in a different form: verification overhead.
Cognitive Outsourcing Risk
Critics argued that the bookkeeping Karpathy outsources — filing, cross-referencing, summarizing — is precisely where genuine understanding forms. By handing this to an LLM, you may end up with a comprehensive wiki you haven't internalized. You have a great reference; you may lack deep ownership of the knowledge.
Knowledge Staleness Without Active Upkeep
Community reports show that most people who try this pattern get the folder structure right but end up with a wiki that slowly becomes unreliable or gets abandoned. The system requires consistent source ingestion and regular lint passes. If you stop feeding it, the wiki rots — its age relative to your domain's pace of change becomes a liability.
Weaker Semantic Retrieval than RAG
Markdown wikilinks are explicit and manually created. Vector embeddings discover semantic connections across differently worded text that manual linking simply cannot — finding, for example, that an article titled "caching strategies" relates to "performance bottlenecks" even with no explicit link between them. At large corpora, RAG's fuzzy matching is the superior retrieval mechanism.
RAG retrieves and forgets. A wiki accumulates and compounds. — Design rationale for memex, April 2026
Scale matters most here. The comparison is not absolute — it is highly scale-dependent. Below ~50K tokens, the wiki wins. Above that threshold, RAG's architecture becomes necessary regardless of the storage format.
Dimension | LLM Wiki | RAG  (✦ marks the stronger option per row)
Knowledge Accumulation | ✦ Compounds with each ingest | Stateless — restarts every query
Maintenance Cost | ✦ LLM does the filing | Chunking pipelines need upkeep
Scale Ceiling | ~50–100K tokens hard limit | ✦ Millions of documents, no ceiling
Human Readability | ✦ Plain markdown, fully auditable | Black-box vector space
Semantic Retrieval | Explicit links only | ✦ Fuzzy semantic matching
Error Persistence | Errors compound into future pages | Errors are ephemeral per query
Multi-user / RBAC | None — flat file system | ✦ Supported by most platforms
Query Latency | ✦ Fast at personal scale | Embedding search overhead
Setup Complexity | ✦ Just folders & markdown | Vector DB, chunking, embeddings
Vendor Lock-in | ✦ Zero — any model, any editor | Often tied to embedding provider
Cross-reference Quality | ✦ Rich, named wikilinks | Implicit via similarity score
Fine-tuning Pathway | ✦ Wiki becomes training data | Raw chunks are poor training data
Excellent Fit

Solo Deep Research

Reading papers, articles, and reports over weeks or months on a single topic. Karpathy's primary use case — his ML research wiki has ~100 articles and 400K words, all compiled without writing a line manually.

Excellent Fit

Personal Knowledge Base

Goals, health tracking, journal entries, podcast notes — building a structured picture of yourself over time. The LLM creates concept pages for recurring themes and connects them across months or years.

Good Fit

Small Team Wiki (<500 articles)

Engineering team internal docs, competitive analysis, trip planning. Works well if one person owns ingestion and the team reads via Obsidian. Breaks at concurrent writes or RBAC requirements.

Good Fit

Agentic Pipeline Memory

AI agent systems that need persistent memory between sessions. The wiki prevents agents from "waking up blank." Session context is compiled rather than re-derived, dramatically cutting token overhead.

Poor Fit

Mission-Critical Precision

API parameter specs, version constraints, legal records, medical protocols. LLM-generated pages can silently misstate critical details. Manual cross-checking eliminates the maintenance savings that make this pattern attractive.

Avoid

Enterprise Knowledge Management

Millions of documents, hundreds of users, RBAC, audit trails, regulatory compliance. The flat file architecture cannot address concurrency, access control, or governance. This is a personal productivity hack, not enterprise infrastructure.

A breakdown of where the pattern generates real signal vs. where the noise grows louder.

Signal

The Compile-Time Insight

Moving synthesis from query-time (RAG) to ingest-time (wiki) is a genuinely novel architectural choice with real benefits for accumulation. This is the core innovation and it holds up to scrutiny.

Strong
Signal

LLM as Librarian

Offloading the maintenance bottleneck — the work that kills all human-maintained wikis — to an LLM is elegant and correct. The pattern solves a real problem people actually have.

Strong
Noise

"RAG is Dead"

Community hyperbole. RAG and the wiki pattern solve different problems at different scales. The wiki pattern is a personal productivity tool, not a replacement for enterprise-grade retrieval infrastructure.

High Noise
Noise

Error Amplification Risk

Real and underweighted by enthusiasts. The persistent-error problem is structural — not a bug to fix with better prompting. It's a genuine trade-off the pattern makes, and it's most dangerous in precision-critical domains.

Real Risk
Signal

The Idea File Paradigm

Karpathy's framing of sharing an "idea file" vs. a code repo — letting each person's agent instantiate a custom version — is genuinely forward-thinking about how patterns propagate in the agent era.

Solid
Noise

"It'll Replace Enterprise RAG"

Karpathy explicitly scoped this to individual researchers. The limitations (no RBAC, no concurrency, ~50K token ceiling) are not bugs — they are consequences of the design assumptions. Enterprise use requires entirely different infrastructure.

Pure Noise
The schema file is a wish, not a discipline. Without an actual security model, this is structurally just a pattern with a dedicated output directory and no guardrails. — Community critique, April 2026
The bottleneck for personal knowledge bases was never the reading. It was the boring maintenance work nobody wanted to do. LLMs eliminate that bottleneck. — Design rationale for memex
These are the real engineering answers: for each known limitation, concrete mitigations exist, some from Karpathy's own gist, others from production implementations and community analysis. Every mitigation below maps to a component in the memex repository. The Active Upkeep section is the one that matters most.
📈

Scaling Past the Token Ceiling

High Priority
01 Add qmd as your search layer at 50–100+ articles qmd · CLI + MCP

index.md breaks down somewhere around 100–150 articles, once it no longer fits cleanly in context. The fix is qmd — built by Tobi Lütke (Shopify's CEO) and explicitly recommended by Karpathy himself. It's a local, on-device search engine for markdown files that combines hybrid BM25 + vector search with LLM re-ranking. No API calls; no data leaves your machine.

memex uses qmd from day one with three collections: wiki (live), wiki-archive (excluded by default), and wiki-conversations (mined sessions). Wing + room structural filtering narrows retrieval before search runs.

In memex: Configured at install time via docs/SETUP.md. The agent picks the right collection per query via guidance in the example CLAUDE.md files.
02 Shard the index — one sub-index per topic domain Schema · CLAUDE.md

Before reaching for qmd, a simpler scaling step is to split index.md into domain-specific sub-indexes: wiki/patterns/index.md, wiki/decisions/index.md, etc. A root index.md points to sub-indexes, keeping any single file within comfortable context window bounds.
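
A minimal sketch of that sharding step, assuming pages already live in domain subfolders (wiki/patterns/, wiki/decisions/, and so on) and the folder name doubles as the domain. The layout is hypothetical, not the memex schema itself:

```python
from collections import defaultdict
from pathlib import Path

def shard_index(wiki_dir):
    """Rebuild per-domain sub-indexes plus a small root index.md
    that only points at them."""
    wiki = Path(wiki_dir)
    domains = defaultdict(list)
    for page in wiki.glob("*/*.md"):
        if page.name != "index.md":           # skip existing sub-indexes
            domains[page.parent.name].append(page.name)
    for domain, names in domains.items():
        lines = [f"# {domain}"] + [f"- [[{n[:-3]}]]" for n in sorted(names)]
        (wiki / domain / "index.md").write_text("\n".join(lines) + "\n")
    root = ["# index"] + [f"- [[{d}/index]]" for d in sorted(domains)]
    (wiki / "index.md").write_text("\n".join(root) + "\n")
```

The root index stays tiny no matter how many pages exist; only the sub-index for the queried domain ever needs to enter context.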

03 Consolidation tiers — promote stable knowledge up the stack memex · confidence field

Structure knowledge in tiers by confidence and stability. Low-confidence claims live in draft pages. After multi-source confirmation, the LLM promotes them. Core principles graduate to a high-confidence tier that rarely changes.

In memex: Implemented via the confidence frontmatter field with time-based decay (6/9/12 month thresholds). Pages age out naturally as the automation re-promotes or archives them.
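
The decay rule can be sketched as a pure function. The level names mirror the `confidence` frontmatter field and the 6/9/12-month thresholds come from the description above; the function itself is illustrative:

```python
from datetime import date

RANK = {"archive": 0, "low": 1, "medium": 2, "high": 3}

def decayed_confidence(confidence, last_verified, today):
    """Cap a page's confidence by the age of its last verification.
    Decay only ever lowers confidence; re-verification resets the clock."""
    age_days = (today - last_verified).days
    if age_days >= 365:
        cap = "archive"   # 12 months unused: candidate for archiving
    elif age_days >= 270:
        cap = "low"       # 9 months
    elif age_days >= 180:
        cap = "medium"    # 6 months
    else:
        cap = "high"      # fresh enough: no cap applied
    return min(confidence, cap, key=RANK.get)
```

Making decay a function of `last_verified` rather than a stored value means a hygiene run can recompute it idempotently on every pass.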

⚠️

Cross-Check & Error Persistence

High Priority
01 Confidence scoring — every claim carries a decay score Frontmatter · Schema

Make uncertainty explicit. Every factual claim carries metadata: confidence level, last verified date, source count. Confidence decays with time and strengthens with reinforcement from new sources.

In memex: Implemented with confidence: high|medium|low + last_verified + sources: fields. The hygiene script auto-decays stale pages and flags them for re-verification.

Key benefit: Errors become visible, decaying warnings instead of permanent silent landmines.
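
A page header under this scheme might look like the following. Field names follow the description above; the values and source paths are illustrative:

```yaml
---
title: caching-strategies
confidence: medium        # high | medium | low
last_verified: 2026-03-14
sources:
  - raw/articles/llm-wiki-pattern.md
  - raw/papers/cache-survey.pdf
status: Active
---
```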
02 Typed supersession — new info explicitly replaces old claims archive/ · log.md

When new information contradicts an existing claim, the wrong pattern is leaving the old claim with an appended note. The right pattern: the new claim explicitly supersedes the old one, which moves to archive with a link to its replacement.

In memex: Pages with status: Superseded by ... are auto-archived. The archive retains the full history with archived_date, archived_reason, and original_path fields.
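
An illustrative sketch of that supersession move, using the provenance fields named above. Frontmatter handling is reduced here to prepending plain lines; the real memex script is presumably more careful:

```python
from datetime import date
from pathlib import Path

def supersede(wiki_dir, old_name, new_name, reason):
    """Move the superseded page into archive/ with a pointer to its
    replacement; the live wiki keeps only the new claim."""
    wiki = Path(wiki_dir)
    src = wiki / old_name
    dst = wiki / "archive" / old_name
    dst.parent.mkdir(exist_ok=True)
    header = (
        f"status: Superseded by [[{new_name.removesuffix('.md')}]]\n"
        f"archived_date: {date.today().isoformat()}\n"
        f"archived_reason: {reason}\n"
        f"original_path: {old_name}\n\n"
    )
    dst.write_text(header + src.read_text())
    src.unlink()
    return dst
```

Because the archived copy records `original_path` and its replacement, the history stays auditable and the page can be restored if it is referenced again.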

★ Biggest Mitigation Challenge

Active Upkeep — The Real Failure Mode

Community analysis of 120+ comments on Karpathy's gist converged on one clear finding: most people who try this pattern get the folder structure right and still end up with a wiki that slowly becomes unreliable, redundant, or abandoned. The difference between a wiki that compounds and one that quietly rots comes down to operational discipline — not technical setup. This is why memex's automation layer exists.

Daily
Feed the Machine
  • Extract new Claude Code sessions (hourly cron)
  • Summarize + index (daily 2am)
  • Harvest URLs + quick hygiene (daily 3am)
Weekly
Deep Pass
  • Full hygiene with LLM checks (Sun 4am)
  • Duplicate detection (auto-merge)
  • Contradiction report (human review)
  • Technology lifecycle checks
Continuous
Decay & Archive
  • last_verified refreshes from new sessions
  • Unused pages decay 6mo → 9mo → 12mo
  • Stale pages auto-archive
  • Archive auto-restores on reference
Review
Human in Loop
  • Staging pipeline for automated content
  • wiki-staging.py --review workflow
  • Hygiene reports split fixed vs needs-review
  • Promote / reject / defer
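
The cadence above can be wired up with ordinary cron. The schedule matches the times given in the lists; the script names are hypothetical stand-ins for the memex automation entry points, not its actual commands:

```cron
# Daily: feed the machine
0 * * * *   memex-extract-sessions    # hourly: pull new Claude Code sessions
0 2 * * *   memex-summarize-index     # 2am: summarize + index
0 3 * * *   memex-harvest-hygiene     # 3am: harvest URLs + quick hygiene
# Weekly: deep pass
0 4 * * 0   memex-hygiene --deep      # Sun 4am: full hygiene with LLM checks
```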