memex/docs/DESIGN-RATIONALE.md
Eric Turner ee54a2f5d4 Initial commit — memex
A compounding LLM-maintained knowledge wiki.

Synthesis of Andrej Karpathy's persistent-wiki gist and milla-jovovich's
mempalace, with an automation layer on top for conversation mining, URL
harvesting, human-in-the-loop staging, staleness decay, and hygiene.

Includes:
- 11 pipeline scripts (extract, summarize, index, harvest, stage,
  hygiene, maintain, sync, + shared library)
- Full docs: README, SETUP, ARCHITECTURE, DESIGN-RATIONALE, CUSTOMIZE
- Example CLAUDE.md files (wiki schema + global instructions) tuned for
  the three-collection qmd setup
- 171-test pytest suite (cross-platform, runs in ~1.3s)
- MIT licensed
2026-04-12 21:16:02 -06:00


Design Rationale — Signal & Noise

Why each part of this repo exists. This is the "why" document; the other docs are the "what" and "how."

Before implementing anything, the design was worked out interactively with Claude as a structured Signal & Noise analysis of Andrej Karpathy's original persistent-wiki pattern:

Interactive design artifact: The LLM Wiki — Karpathy's Pattern — Signal & Noise

That artifact walks through the pattern's seven genuine strengths, seven real weaknesses, and concrete mitigations for each weakness. This repo is the implementation of those mitigations. If you want to understand why a component exists, the artifact has the longer-form argument; this document is the condensed version.


Where the pattern is genuinely strong

The analysis found seven strengths that hold up under scrutiny. This repo preserves all of them:

| Strength | How this repo keeps it |
| --- | --- |
| Knowledge compounds over time | Every ingest adds to the existing wiki rather than restarting; conversation mining and URL harvesting continuously feed new material in |
| Zero maintenance burden on humans | Cron-driven harvest + hygiene; the only manual step is staging review, and that's fast because the AI already compiled the page |
| Token-efficient at personal scale | index.md fits in context; qmd kicks in only at 50+ articles; the wake-up briefing is ~200 tokens |
| Human-readable & auditable | Plain markdown everywhere; every cross-reference is visible; git history shows every change |
| Future-proof & portable | No vendor lock-in; you can point any agent at the same tree tomorrow |
| Self-healing via lint passes | wiki-hygiene.py runs quick checks daily and full (LLM) checks weekly |
| Path to fine-tuning | Wiki pages are high-quality synthetic training data once purified through hygiene |

Where the pattern is genuinely weak — and how this repo answers

The analysis identified seven real weaknesses. Five have direct mitigations in this repo; two remain open trade-offs you should be aware of.

1. Errors persist and compound

The problem: Unlike RAG — where a hallucination is ephemeral and the next query starts clean — an LLM wiki persists its mistakes. If the LLM incorrectly links two concepts at ingest time, future ingests build on that wrong prior.

How this repo mitigates:

  • confidence field — every page carries high/medium/low with decay based on last_verified. Wrong claims aren't treated as permanent — they age out visibly.
  • Archive + restore — decayed pages get moved to archive/ where they're excluded from default search. If they get referenced again they're auto-restored with confidence: medium (never straight to high — they have to re-earn trust).
  • Raw harvested material is immutable — raw/harvested/*.md files are the ground truth. Every compiled wiki page can be traced back to its source via the sources: frontmatter field.
  • Full-mode contradiction detection — wiki-hygiene.py --full uses sonnet to find conflicting claims across pages. Report-only (humans decide which side wins).
  • Staging review — automated content goes to staging/ first. Nothing enters the live wiki without human approval, so errors have two chances to get caught (AI compile + human review) before they become persistent.
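These mitigations all hang off a handful of frontmatter fields. A hypothetical wiki-page header illustrating them (field names are the ones this document describes; the exact schema lives in the repo's CLAUDE.md, and the paths shown are made up):

```yaml
---
title: Example Page
confidence: medium          # high / medium / low, decays based on last_verified
last_verified: 2026-03-01   # bumped when a conversation references this page
sources:
  - raw/harvested/2026-02-14-some-article.md   # hypothetical immutable source
related:
  - conversations/mc/2026-02-20-session.md     # hypothetical cross-reference
---
```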

2. Hard scale ceiling at ~50K tokens

The problem: The wiki approach stops working when index.md no longer fits in context. Karpathy's own wiki was ~100 articles / 400K words — already near the ceiling.

How this repo mitigates:

  • qmd from day one — qmd (BM25 + vector + LLM re-ranking) is set up in the default configuration so the agent never has to load the full index. At 50+ pages, qmd search replaces cat index.md.
  • Wing/room structural filtering — conversations are partitioned by project code (wing) and topic (room, via the topics: frontmatter). Retrieval is pre-narrowed to the relevant wing before search runs. This extends the effective ceiling because qmd works on a relevant subset, not the whole corpus.
  • Hygiene full mode flags redundancy — duplicate detection auto-merges weaker pages into stronger ones, keeping the corpus lean.
  • Archive excludes stale content — the wiki-archive collection has includeByDefault: false, so archived pages don't eat context until explicitly queried.

3. Manual cross-checking burden returns in precision-critical domains

The problem: For API specs, version constraints, legal records, and medical protocols, LLM-generated content needs human verification. The maintenance burden you thought you'd eliminated comes back as verification overhead.

How this repo mitigates:

  • Staging workflow — every automated page goes through human review. For precision-critical content, that review IS the cross-check. The AI does the drafting; you verify.
  • compilation_notes field — staging pages include the AI's own explanation of what it did and why. Makes review faster — you can spot-check the reasoning rather than re-reading the whole page.
  • Immutable raw sources — every wiki claim traces back to a specific file in raw/harvested/ with a SHA-256 content_hash. Verification means comparing the claim to the source, not "trust the LLM."
  • confidence: low for precision domains — the agent's instructions (via CLAUDE.md) tell it to flag low-confidence content when citing. Humans see the warning before acting.
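The traceability claim reduces to recomputing a hash. A minimal sketch of the check (function names are illustrative; the repo records the hash in the content_hash frontmatter field):

```python
import hashlib

def content_hash(data: bytes) -> str:
    """SHA-256 hex digest of a raw harvested file's bytes."""
    return hashlib.sha256(data).hexdigest()

def source_intact(raw_bytes: bytes, recorded_hash: str) -> bool:
    """True if the raw source still matches the hash captured at harvest time,
    i.e. the wiki claim can still be compared against an unmodified source."""
    return content_hash(raw_bytes) == recorded_hash
```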

Residual trade-off: For truly mission-critical data (legal, medical, compliance), no amount of automation replaces domain-expert review. If that's your use case, treat this repo as a drafting tool, not a canonical source.

4. Knowledge staleness without active upkeep

The problem: Community analysis of 120+ comments on Karpathy's gist found this is the #1 failure mode. Most people who try the pattern get the folder structure right and still end up with a wiki that slowly becomes unreliable because they stop feeding it. Six-week half-life is typical.

How this repo mitigates (this is the biggest thing):

  • Automation replaces human discipline — daily cron runs wiki-maintain.sh (harvest + hygiene + qmd reindex); weekly cron runs --full mode. You don't need to remember anything.
  • Conversation mining is the feed — you don't need to curate sources manually. Every Claude Code session becomes potential ingest. The feed is automatic and continuous, as long as you're doing work.
  • last_verified refreshes from conversation references — when the summarizer links a conversation to a wiki page via related:, the hygiene script picks that up and bumps last_verified. Pages stay fresh as long as they're still being discussed.
  • Decay thresholds force attention — pages without refresh signals for 6/9/12 months get downgraded and eventually archived. The wiki self-trims.
  • Hygiene reports — reports/hygiene-YYYY-MM-DD-needs-review.md flags the things that do need human judgment. Everything else is auto-fixed.
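The 6/9/12-month decay ladder can be sketched as a pure function from freshness to action. The exact rules live in wiki-hygiene.py; this mapping and its names are an illustrative assumption:

```python
from datetime import date

def decayed_state(confidence: str, last_verified: date, today: date) -> str:
    """Map months-without-refresh to an action: keep, downgrade, or archive."""
    months = (today.year - last_verified.year) * 12 + (today.month - last_verified.month)
    if months >= 12:
        return "archive"      # moved to archive/, excluded from default search
    if months >= 9:
        return "low"
    if months >= 6 and confidence == "high":
        return "medium"
    return confidence         # fresh enough: unchanged
```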

This is the single biggest reason this repo exists. The automation layer is entirely about removing "I forgot to lint" as a failure mode.

5. Cognitive outsourcing risk

The problem: Hacker News critics argued that the bookkeeping Karpathy outsources — filing, cross-referencing, summarizing — is precisely where genuine understanding forms. Outsource it and you end up with a comprehensive wiki you haven't internalized.

How this repo mitigates:

  • Staging review is a forcing function — you see every automated page before it lands. Even skimming forces engagement with the material.
  • qmd query "..." for exploration — searching the wiki is an active process, not passive retrieval. You're asking questions, not pulling a file.
  • The wake-up briefing — context/wake-up.md is a 200-token digest the agent reads at session start. You read it too (or the agent reads it to you) — ongoing re-exposure to your own knowledge base.

Residual trade-off: This is a real concern even with mitigations. The wiki is designed as augmentation, not replacement. If you never read your own wiki and only consult it through the agent, you're in the outsourcing failure mode. The fix is discipline, not architecture.

6. Weaker semantic retrieval than RAG at scale

The problem: At large corpora, vector embeddings find semantically related content across different wording in ways explicit wikilinks can't match.

How this repo mitigates:

  • qmd is hybrid (BM25 + vector) — not just keyword search. Vector similarity is built into the retrieval pipeline from day one.
  • Structural navigation complements semantic search — project codes (wings) and topic frontmatter narrow the search space before the hybrid search runs. Structure + semantics is stronger than either alone.
  • Missing cross-reference detection — full-mode hygiene asks the LLM to find pages that should link to each other but don't, then auto-adds them. This is the explicit-linking approach catching up to semantic retrieval over time.
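A toy version of the missing-cross-reference check: flag pages whose body mentions another page's title without a [[wikilink]]. The real pass is LLM-driven; this only illustrates the mechanical shape of the check, and the names are made up:

```python
def missing_links(pages: dict[str, str]) -> list[tuple[str, str]]:
    """Return (page, should_link_to_title) pairs for unlinked mentions.
    Titles are derived from lowercase, hyphenated filenames."""
    findings = []
    for name, body in pages.items():
        for other in pages:
            title = other.removesuffix(".md").replace("-", " ")
            if (other != name
                    and title in body.lower()
                    and f"[[{title}]]" not in body.lower()):
                findings.append((name, title))
    return findings
```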

Residual trade-off: At enterprise scale (millions of documents), a proper vector DB with specialized retrieval wins. This repo is for personal / small-team scale where the hybrid approach is sufficient.

7. No access control or multi-user support

The problem: It's a folder of markdown files. No RBAC, no audit logging, no concurrency handling, no permissions model.

How this repo mitigates:

  • Git-based sync with merge-union — concurrent writes on different machines auto-resolve because markdown is set to merge=union in .gitattributes. Both sides win.
  • Network boundary as soft access control — the suggested deployment is over Tailscale or a VPN, so the network does the work an RBAC layer would otherwise do. Not enterprise-grade, but sufficient for personal/family/small-team use.
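The merge-union behavior comes down to a single .gitattributes line (a sketch of the suggested setup):

```
*.md merge=union
```

Note the trade-off: merge=union keeps lines from both sides of a conflict without markers, which works well for append-heavy markdown but can interleave lines if both machines edit the same paragraph.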

Residual trade-off: This is the big one. The repo is not a replacement for enterprise knowledge management. No audit trails, no fine-grained permissions, no compliance story. If you need any of that, you need a different architecture. This repo is explicitly scoped to the personal/small-team use case.


The #1 failure mode — active upkeep

Every other weakness has a mitigation. Active upkeep is the one that kills wikis in the wild. The community data is unambiguous:

  • People who automate the lint schedule → wikis healthy at 6+ months
  • People who rely on "I'll remember to lint" → wikis abandoned at 6 weeks

The entire automation layer of this repo exists to remove upkeep as a thing the human has to think about:

| Cadence | Job | Purpose |
| --- | --- | --- |
| Every 15 min | wiki-sync.sh | Commit/pull/push — cross-machine sync |
| Every 2 hours | wiki-sync.sh full | Full sync + qmd reindex |
| Every hour | mine-conversations.sh --extract-only | Capture new Claude Code sessions (no LLM) |
| Daily 2am | summarize-conversations.py --claude + index | Classify + summarize (LLM) |
| Daily 3am | wiki-maintain.sh | Harvest + quick hygiene + reindex |
| Weekly Sun 4am | wiki-maintain.sh --hygiene-only --full | LLM-powered duplicate/contradiction/cross-ref detection |
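The cadence table maps onto a crontab along these lines (a sketch: script paths and any extra flags depend on your install, so adjust accordingly):

```
*/15 * * * *  wiki-sync.sh
0 */2 * * *   wiki-sync.sh full
0 * * * *     mine-conversations.sh --extract-only
0 2 * * *     summarize-conversations.py --claude
0 3 * * *     wiki-maintain.sh
0 4 * * 0     wiki-maintain.sh --hygiene-only --full
```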

If you disable all of these, you get the same outcome as every abandoned wiki: six-week half-life. The scripts aren't optional convenience — they're the load-bearing answer to the pattern's primary failure mode.


What was borrowed from where

This repo is a synthesis of two ideas with an automation layer on top:

From Karpathy

  • The core pattern: LLM-maintained persistent wiki, compile at ingest time instead of retrieve at query time
  • Separation of raw/ (immutable sources) from wiki/ (compiled pages)
  • CLAUDE.md as the schema that disciplines the agent
  • Periodic "lint" passes to catch orphans, contradictions, missing refs
  • The idea that the wiki becomes fine-tuning material over time

From mempalace

  • Wings = per-person or per-project namespaces → this repo uses project codes (mc, wiki, web, etc.) as the same thing in conversations/<project>/
  • Rooms = topics within a wing → the topics: frontmatter on conversation files
  • Halls = memory-type corridors (fact / event / discovery / preference / advice / tooling) → the halls: frontmatter field classified by the summarizer
  • Closets = summary layer → the summary body of each summarized conversation
  • Drawers = verbatim archive, never lost → the extracted conversation transcripts under conversations/<project>/*.md
  • Tunnels = cross-wing connections → the related: frontmatter linking conversations to wiki pages
  • Wing + room structural filtering gives a documented +34% retrieval boost over flat search
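Put together, the taxonomy lands in a conversation file's frontmatter. A hypothetical example (field names are the ones this document describes; values and layout are illustrative, and the wing is also encoded by the conversations/<project>/ directory):

```yaml
---
topics: [retrieval, decay]    # rooms within the wing
halls: [discovery, tooling]   # memory-type corridors, set by the summarizer
related:
  - wiki/confidence-decay.md  # tunnels: cross-links into the wiki
---
```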

The MemPalace taxonomy solved a problem Karpathy's pattern doesn't address: how do you navigate a growing corpus without reading everything? The answer is to give the corpus structural metadata at ingest time, then filter on that metadata before doing semantic search. This repo borrows that wholesale.

What this repo adds

  • Automation layer tying the pieces together with cron-friendly orchestration
  • Staging pipeline as a human-in-the-loop checkpoint for automated content
  • Confidence decay + auto-archive + auto-restore as the "retention curve" that community analysis identified as critical for long-term wiki health
  • qmd integration as the scalable search layer (chosen over ChromaDB because it uses the same markdown storage as the wiki — one index to maintain, not two)
  • Hygiene reports with fixed vs needs-review separation so automation handles mechanical fixes and humans handle ambiguity
  • Cross-machine sync via git with markdown merge-union so the same wiki lives on multiple machines without merge hell

Honest residual trade-offs

Five items from the analysis that this repo doesn't fully solve, and where you should know the limits:

  1. Enterprise scale — this is a personal/small-team tool. Millions of documents, hundreds of users, RBAC, compliance: wrong architecture.
  2. True semantic retrieval at massive scale — qmd hybrid search is great for thousands of pages, not millions.
  3. Cognitive outsourcing — no architecture fix. Discipline yourself to read your own wiki, not just query it through the agent.
  4. Precision-critical domains — for legal/medical/regulatory data, use this as a drafting tool, not a source of truth. Human domain-expert review is not replaceable.
  5. Access control — network boundary (Tailscale) is the fastest path; nothing in the repo itself enforces permissions.

If any of these are dealbreakers for your use case, a different architecture is probably what you need.


Further reading