diff --git a/README.md b/README.md
index 2d42e9c..76eaa65 100644
--- a/README.md
+++ b/README.md
@@ -100,30 +100,34 @@ keeping cross-references intact, and flagging ambiguity for review.
 
 ---
 
-## Why each part exists
+## How memex extends Karpathy's pattern
 
 Before implementing anything, the design was worked out interactively
-with Claude as a
-[Signal & Noise analysis of Karpathy's pattern](https://eric-turner.com/memex/signal-and-noise.html).
-That analysis found seven real weaknesses in the core pattern. This
-repo exists because each weakness has a concrete mitigation — and
-every component maps directly to one:
+with Claude as a structured
+[Signal & Noise analysis](https://eric-turner.com/memex/signal-and-noise.html).
+Karpathy's original gist is a concept pitch, not an implementation —
+he was explicit that he was sharing an "idea file" for others to build
+on. memex is one attempt at that build-out. The analysis identified
+seven places where the core idea needed an engineering layer to become
+practical day-to-day, and every automation component in this repo maps
+to one of those extensions:
 
-| Karpathy-pattern weakness | How this repo answers it |
-|---------------------------|--------------------------|
-| **Errors persist and compound** | `confidence` field with time-based decay → pages age out visibly. Staging review catches automated content before it goes live. Full-mode hygiene does LLM contradiction detection. |
-| **Hard ~50K-token ceiling** | `qmd` (BM25 + vector + re-ranking) set up from day one. Wing/room structural filtering narrows search before retrieval. Archive collection is excluded from default search. |
-| **Manual cross-checking returns** | Every wiki claim traces back to immutable `raw/harvested/*.md` with SHA-256 hash. Staging review IS the cross-check. `compilation_notes` field makes review fast. |
-| **Knowledge staleness** (the #1 failure mode in community data) | Daily + weekly cron removes "I forgot" as a failure mode. `last_verified` auto-refreshes from conversation references. Decayed pages auto-archive. |
-| **Cognitive outsourcing risk** | Staging review forces engagement with every automated page. `qmd query` makes retrieval an active exploration. Wake-up briefing ~200 tokens the human reads too. |
-| **Weaker semantic retrieval** | `qmd` hybrid (BM25 + vector). Full-mode hygiene adds missing cross-references. Structural metadata (wings, rooms) complements semantic search. |
-| **No access control** | Git sync with `merge=union` markdown handling. Network-boundary ACL via Tailscale is the suggested path. *This one is a residual trade-off — see [DESIGN-RATIONALE.md](docs/DESIGN-RATIONALE.md).* |
+| What memex adds | How it works |
+|-----------------|--------------|
+| **Time-decaying confidence** — pages earn trust through reinforcement and fade without it | `confidence` field + `last_verified`, 6/9/12 month decay thresholds, auto-archive. Full-mode hygiene also adds LLM contradiction detection across pages. |
+| **Scalable search beyond the context window** | `qmd` (BM25 + vector + LLM re-ranking) from day one, with three collections (`wiki` / `wiki-archive` / `wiki-conversations`) so queries can route to the right surface. |
+| **Traceable sources for every claim** | Every compiled page traces back to an immutable `raw/harvested/*.md` file with a SHA-256 content hash. Staging review is the built-in cross-check, and `compilation_notes` makes review fast. |
+| **Continuous feed without manual discipline** | Daily + weekly cron chains extract → summarize → harvest → hygiene → reindex. `last_verified` auto-refreshes from new conversation references; decayed pages auto-archive and auto-restore when referenced again. |
+| **Human-in-the-loop staging** for automated content | Every automated page lands in `staging/` first with `origin: automated`, `status: pending`. Nothing bypasses human review — one promotion step and it's in the live wiki with `last_verified` set. |
+| **Hybrid retrieval** — structural navigation + semantic search | Wings/rooms/halls (borrowed from mempalace) give structural filtering that narrows the search space before qmd's hybrid BM25 + vector pass runs. Full-mode hygiene also auto-adds missing cross-references. |
+| **Cross-machine git sync** for collaborative knowledge bases | `.gitattributes` with `merge=union` on markdown so concurrent writes on different machines merge additively. Harvest and hygiene state files sync across machines so both agree on what's been processed. |
 
-The short version: Karpathy published the idea, the community found the
-holes, and this repo is the automation layer that plugs the holes.
-See **[`docs/DESIGN-RATIONALE.md`](docs/DESIGN-RATIONALE.md)** for the
-full argument with honest residual trade-offs and what this repo
-explicitly does NOT solve.
+The short version: Karpathy shared the idea, milla-jovovich's mempalace
+added the structural memory taxonomy, and memex is the automation layer
+that lets the whole thing run day-to-day without constant human
+maintenance. See **[`docs/DESIGN-RATIONALE.md`](docs/DESIGN-RATIONALE.md)**
+for the longer rationale on each extension, plus honest notes on what
+memex doesn't cover.
 
 ---
 
@@ -384,9 +388,9 @@ on top. It would not exist without either of them.
 
 **Core pattern — [Andrej Karpathy — "Agent-Maintained Persistent Wiki" gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)**
 The foundational idea of a compounding LLM-maintained wiki that moves
-synthesis from query-time (RAG) to ingest-time. This repo is an
-implementation of Karpathy's pattern with the community-identified
-failure modes plugged.
+synthesis from query-time (RAG) to ingest-time. memex is an
+implementation of Karpathy's pattern with the engineering layer that
+turns the concept into something practical to run day-to-day.
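The `merge=union` behavior in the cross-machine sync row above can be sketched end-to-end. This is an illustrative demo, not the repo's actual setup — file names and commit messages are hypothetical, and the real `.gitattributes` may list more patterns; it simulates two machines as two branches appending to the same markdown file:

```shell
# Simulate two machines editing the same markdown file, then merging
# with git's built-in "union" merge driver enabled via .gitattributes.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main repo
cd repo
git config user.email demo@example.com
git config user.name demo
echo '*.md merge=union' > .gitattributes   # union merge for all markdown
printf 'shared line\n' > notes.md
git add -A
git commit -qm 'base'
git checkout -qb machine-b                 # "machine B" makes an edit
printf 'shared line\nline added on machine B\n' > notes.md
git commit -qam 'edit on B'
git checkout -q main                       # "machine A" edits concurrently
printf 'shared line\nline added on machine A\n' > notes.md
git commit -qam 'edit on A'
# Union driver: overlapping hunks are concatenated instead of
# conflicting — no <<<<<<< markers, both added lines survive.
git merge -q -m 'sync' machine-b
cat notes.md
```

After the merge, `notes.md` contains both machines' added lines with no conflict markers — which is why concurrent wiki writes can sync additively without a human resolving conflicts.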
**Structural memory taxonomy — [milla-jovovich/mempalace](https://github.com/milla-jovovich/mempalace)**
 The wing/room/hall/closet/drawer/tunnel concepts that turn a flat
@@ -411,12 +415,12 @@ simple pages.
 
 The repo is Claude-specific (see the section above for what that means
 and how to adapt for other agents).
 
-**Design process** — this repo was designed interactively with Claude
-as a structured Signal & Noise analysis before any code was written.
-The analysis walks through the seven real strengths and seven real
-weaknesses of Karpathy's pattern, then works through concrete
-mitigations for each weakness. Every component in this repo maps back
-to a specific mitigation identified there.
+**Design process** — memex was designed interactively with Claude as a
+structured Signal & Noise analysis before any code was written. The
+analysis walks through the seven real strengths of Karpathy's pattern
+and seven places where it needs an engineering layer to be practical,
+and works through the concrete extension for each. Every component in
+this repo maps back to a specific extension identified there.
 
 - **Live interactive version**: [eric-turner.com/memex/signal-and-noise.html](https://eric-turner.com/memex/signal-and-noise.html)
diff --git a/docs/DESIGN-RATIONALE.md b/docs/DESIGN-RATIONALE.md
index 82d1349..791dc60 100644
--- a/docs/DESIGN-RATIONALE.md
+++ b/docs/DESIGN-RATIONALE.md
@@ -14,10 +14,11 @@ original persistent-wiki pattern:
 
 > — same content, works offline
 
 The analysis walks through the pattern's seven genuine strengths, seven
-real weaknesses, and concrete mitigations for each weakness. This repo
-is the implementation of those mitigations. If you want to understand
-*why* a component exists, the interactive version has the longer-form
-argument; this document is the condensed written version.
+places where it needs an engineering layer to be practical, and the
+concrete extension for each. memex is the implementation of those
+extensions. If you want to understand *why* a component exists, the
+interactive version has the longer-form argument; this document is the
+condensed written version.
 
 ---
 
@@ -38,20 +39,24 @@ repo preserves all of them:
 
 ---
 
-## Where the pattern is genuinely weak — and how this repo answers
+## Where memex extends the pattern
 
-The analysis identified seven real weaknesses. Five have direct
-mitigations in this repo; two remain open trade-offs you should be aware
-of.
+Karpathy's gist is a concept pitch. He was explicit that he was sharing
+an "idea file" for others to build on, not publishing a working
+implementation. The analysis identified seven places where the core idea
+needs an engineering layer to become practical day-to-day — five have
+first-class answers in memex, and two remain scoped-out trade-offs that
+the architecture cleanly acknowledges.
 
-### 1. Errors persist and compound
+### 1. Claim freshness and reversibility
 
-**The problem**: Unlike RAG — where a hallucination is ephemeral and the
-next query starts clean — an LLM wiki persists its mistakes. If the LLM
-incorrectly links two concepts at ingest time, future ingests build on
-that wrong prior.
+**The gap**: Unlike RAG — where a hallucination is ephemeral and the
+next query starts clean — an LLM-maintained wiki is stateful. If a
+claim is wrong at ingest time, it stays wrong until something corrects
+it. For the pattern to work long-term, claims need a way to earn trust
+over time and lose it when unused.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **`confidence` field** — every page carries `high`/`medium`/`low`
   with decay based on `last_verified`. Wrong claims aren't treated as
@@ -71,13 +76,14 @@ that wrong prior.
   two chances to get caught (AI compile + human review) before they
   become persistent.
 
-### 2. Hard scale ceiling at ~50K tokens
+### 2. Scalable search beyond the context window
 
-**The problem**: The wiki approach stops working when `index.md` no
-longer fits in context. Karpathy's own wiki was ~100 articles / 400K
-words — already near the ceiling.
+**The gap**: The pattern works beautifully up to ~100 articles, where
+`index.md` still fits in context. Karpathy's own wiki was right at the
+ceiling. Past that point, the agent needs a real search layer — loading
+the full index stops being practical.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **`qmd` from day one** — `qmd` (BM25 + vector + LLM re-ranking) is
   set up in the default configuration so the agent never has to load the
@@ -93,14 +99,14 @@ words — already near the ceiling.
   `includeByDefault: false`, so archived pages don't eat context until
   explicitly queried.
 
-### 3. Manual cross-checking burden returns in precision-critical domains
+### 3. Traceable sources for every claim
 
-**The problem**: For API specs, version constraints, legal records, and
-medical protocols, LLM-generated content needs human verification. The
-maintenance burden you thought you'd eliminated comes back as
-verification overhead.
+**The gap**: In precision-sensitive domains (API specs, version
+constraints, legal records, medical protocols), LLM-generated content
+needs to be verifiable against a source. For the pattern to work in
+those contexts, every claim needs to trace back to something immutable.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **Staging workflow** — every automated page goes through human review.
   For precision-critical content, that review IS the cross-check. The
@@ -120,15 +126,16 @@ medical, compliance), no amount of automation replaces domain-expert
 review. If that's your use case, treat this repo as a *drafting* tool,
 not a canonical source.
 
-### 4. Knowledge staleness without active upkeep
+### 4. Continuous feed without manual discipline
 
-**The problem**: Community analysis of 120+ comments on Karpathy's gist
-found this is the #1 failure mode. Most people who try the pattern get
+**The gap**: Community analysis of 120+ comments on Karpathy's gist
+converged on one clear finding: this is the #1 friction point. Most
+people who try the pattern get
 the folder structure right and still end up with a wiki that slowly
 becomes unreliable because they stop feeding it. Six-week half-life is
 typical.
 
-**How this repo mitigates** (this is the biggest thing):
+**How memex extends it** (this is the biggest layer):
 
 - **Automation replaces human discipline** — daily cron runs
   `wiki-maintain.sh` (harvest + hygiene + qmd reindex); weekly cron runs
@@ -147,17 +154,20 @@ typical.
   flags the things that *do* need human judgment. Everything else is
   auto-fixed.
 
-This is the single biggest reason this repo exists. The automation
-layer is entirely about removing "I forgot to lint" as a failure mode.
+This is the single biggest layer memex adds. Nothing about it is
+exotic — it's a cron-scheduled pipeline that runs the scripts you'd
+otherwise have to remember to run. That's the whole trick.
 
-### 5. Cognitive outsourcing risk
+### 5. Keeping the human engaged with their own knowledge
 
-**The problem**: Hacker News critics argued that the bookkeeping
+**The gap**: Hacker News critics pointed out that the bookkeeping
 Karpathy outsources — filing, cross-referencing, summarizing — is
-precisely where genuine understanding forms. Outsource it and you end up
-with a comprehensive wiki you haven't internalized.
+precisely where genuine understanding forms. If the LLM does all of
+it, you can end up with a comprehensive wiki you haven't internalized.
+For the pattern to be an actual memory aid and not a false one, the
+human needs touchpoints that keep them engaged.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **Staging review is a forcing function** — you see every automated
   page before it lands. Even skimming forces engagement with the
@@ -169,19 +179,21 @@ with a comprehensive wiki you haven't internalized.
   the agent reads at session start. You read it too (or the agent
   reads it to you) — ongoing re-exposure to your own knowledge base.
 
-**Residual trade-off**: This is a real concern even with mitigations.
-The wiki is designed as *augmentation*, not *replacement*. If you
-never read your own wiki and only consult it through the agent, you're
-in the outsourcing failure mode. The fix is discipline, not
-architecture.
+**Caveat**: memex is designed as *augmentation*, not *replacement*.
+It's most valuable when you engage with it actively — reading your own
+wake-up briefing, spot-checking promoted pages, noticing decay flags.
+If you only consult the wiki through the agent and never look at it
+yourself, you've outsourced the learning. That's a usage pattern
+choice, not an architecture problem.
 
-### 6. Weaker semantic retrieval than RAG at scale
+### 6. Hybrid retrieval — structure and semantics
 
-**The problem**: At large corpora, vector embeddings find semantically
-related content across different wording in ways explicit wikilinks
-can't match.
+**The gap**: Explicit wikilinks catch direct topic references but miss
+semantic neighbors that use different wording. At scale, the pattern
+benefits from vector similarity to find cross-topic connections the
+human (or the LLM at ingest time) didn't think to link manually.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **`qmd` is hybrid (BM25 + vector)** — not just keyword search. Vector
   similarity is built into the retrieval pipeline from day one.
@@ -198,33 +210,38 @@ can't match.
   proper vector DB with specialized retrieval wins. This repo is for
   personal / small-team scale where the hybrid approach is sufficient.
 
-### 7. No access control or multi-user support
+### 7. Cross-machine collaboration
 
-**The problem**: It's a folder of markdown files. No RBAC, no audit
-logging, no concurrency handling, no permissions model.
+**The gap**: Karpathy's gist describes a single-user, single-machine
+setup. In practice, people work from multiple machines (laptop,
+workstation, server) and sometimes collaborate with small teams. The
+pattern needs a sync story that handles concurrent writes gracefully.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **Git-based sync with merge-union** — concurrent writes on different
   machines auto-resolve because markdown is set to `merge=union` in
   `.gitattributes`. Both sides win.
-- **Network boundary as soft access control** — the suggested
-  deployment is over Tailscale or a VPN, so the network does the work a
-  RBAC layer would otherwise do. Not enterprise-grade, but sufficient
-  for personal/family/small-team use.
+- **State file sync** — `.harvest-state.json` and `.hygiene-state.json`
+  are committed, so two machines running the same pipeline agree on
+  what's already been processed instead of re-doing the work.
+- **Network boundary as access gate** — the suggested deployment is
+  over Tailscale or a VPN, so the network enforces who can reach the
+  wiki at all. Simple and sufficient for personal/family/small-team
+  use.
 
-**Residual trade-off**: **This is the big one.** The repo is not a
-replacement for enterprise knowledge management. No audit trails, no
-fine-grained permissions, no compliance story. If you need any of
-that, you need a different architecture. This repo is explicitly
-scoped to the personal/small-team use case.
+**Explicit scope**: memex is **deliberately not** enterprise knowledge
+management. No audit trails, no fine-grained permissions, no compliance
+story. If you need any of that, you need a different architecture.
+This is for the personal and small-team case where git + Tailscale is
+the right amount of rigor.
 
 ---
 
-## The #1 failure mode — active upkeep
+## The biggest layer — active upkeep
 
-Every other weakness has a mitigation. *Active upkeep is the one that
-kills wikis in the wild.* The community data is unambiguous:
+The other six extensions are important, but this is the one that makes
+or breaks the pattern in practice. The community data is unambiguous:
 
 - People who automate the lint schedule → wikis healthy at 6+ months
 - People who rely on "I'll remember to lint" → wikis abandoned at 6 weeks
@@ -243,8 +260,9 @@ thing the human has to think about:
 
 If you disable all of these, you get the same outcome as every
 abandoned wiki: six-week half-life. The scripts aren't optional
-convenience — they're the load-bearing answer to the pattern's primary
-failure mode.
+convenience — they're the load-bearing automation that lets the pattern
+actually compound over months and years instead of requiring a
+disciplined human to keep it alive.
 
 ---
 
@@ -305,26 +323,32 @@ This repo borrows that wholesale.
 
 ---
 
-## Honest residual trade-offs
+## What memex deliberately doesn't try to do
 
-Five items from the analysis that this repo doesn't fully solve and
-where you should know the limits:
+Five things memex is explicitly scoped around — not because they're
+unsolvable, but because solving them well requires a different kind of
+architecture than a personal/small-team wiki. If any of these are
+dealbreakers for your use case, memex is probably not the right fit:
 
-1. **Enterprise scale** — this is a personal/small-team tool. Millions
-   of documents, hundreds of users, RBAC, compliance: wrong
-   architecture.
+1. **Enterprise scale** — millions of documents, hundreds of users,
+   RBAC, compliance: these need real enterprise knowledge management
+   infrastructure. memex is tuned for personal and small-team use.
 2. **True semantic retrieval at massive scale** — `qmd` hybrid search
-   is great for thousands of pages, not millions.
-3. **Cognitive outsourcing** — no architecture fix. Discipline
-   yourself to read your own wiki, not just query it through the agent.
-4. **Precision-critical domains** — for legal/medical/regulatory data,
-   use this as a drafting tool, not a source of truth. Human
-   domain-expert review is not replaceable.
-5. **Access control** — network boundary (Tailscale) is the fastest
-   path; nothing in the repo itself enforces permissions.
+   works great up to thousands of pages. At millions, a dedicated
+   vector database with specialized retrieval wins.
+3. **Replacing your own learning** — memex is an augmentation layer,
+   not a substitute for reading. Used well, it's a memory aid; used as
+   a bypass, it just lets you forget more.
+4. **Precision-critical source of truth** — for legal, medical, or
+   regulatory data, memex is a drafting tool. Human domain-expert
+   review still owns the final call.
+5. **Access control** — the network boundary (Tailscale) is the
+   fastest path to "only authorized people can reach it." memex itself
+   doesn't enforce permissions inside that boundary.
 
-If any of these are dealbreakers for your use case, a different
-architecture is probably what you need.
+These are scope decisions, not unfinished work. memex is the best
+personal/small-team answer to Karpathy's pattern I could build; it's
+not trying to be every answer.
 
 ---
 
diff --git a/docs/artifacts/signal-and-noise.html b/docs/artifacts/signal-and-noise.html
index 14b903c..b875a66 100644
--- a/docs/artifacts/signal-and-noise.html
+++ b/docs/artifacts/signal-and-noise.html
@@ -3,7 +3,7 @@
 
-memex — Signal & Noise Analysis
+memex — Karpathy's Pattern — Signal & Noise
 
@@ -839,18 +1103,18 @@
- Design Artifact
+ Vol. I • No. 1
April 2026
- memex project + Special Report

memex

Karpathy's Pattern — Signal & Noise
- Source: karpathy.gist
- Analysis: Signal vs Noise
- Decision Rationale + Source: github.com/karpathy
+ 17M+ Views • 5K+ Stars
+ Community Analysis
@@ -864,6 +1128,7 @@
 KNOWLEDGE COMPOUNDS
 PERSONAL SCALE ONLY
 HALLUCINATIONS PERSIST
+NO ENTERPRISE RBAC
 MARKDOWN IS FUTURE-PROOF
 PERSISTENT MEMORY
 RAG vs WIKI
@@ -872,6 +1137,7 @@
 KNOWLEDGE COMPOUNDS
 PERSONAL SCALE ONLY
 HALLUCINATIONS PERSIST
+NO ENTERPRISE RBAC
 MARKDOWN IS FUTURE-PROOF
@@ -880,7 +1146,7 @@
17M+
-
Views · Karpathy Tweet
+
Tweet Views
~100
@@ -901,15 +1167,7 @@
The Core Idea
-

Instead of making the LLM rediscover knowledge from raw documents on every query — the RAG way — Karpathy proposes having the LLM compile a structured, interlinked wiki once at ingest time. Knowledge accumulates. The LLM maintains the wiki, not the human.

-
- - -
-
★ This analysis produced memex
-

From analysis to implementation

-

This document was the design artifact that preceded the memex repository — a structured Signal & Noise pass over Karpathy's pattern that found seven real weaknesses and worked out concrete mitigations for each. Every automation component in memex maps directly to a mitigation identified here.

-

Read the repo:

+

Instead of making the LLM rediscover knowledge from raw documents on every query — the RAG way — Karpathy proposes having the LLM compile a structured, interlinked wiki once at ingest time. Knowledge accumulates. The LLM maintains the wiki, not the human.

@@ -950,38 +1208,42 @@ +

↓ Tap any row to expand analysis

+
-
Strengths
+
+ Strengths +
Knowledge Compounds Over Time
-
Unlike RAG — where every query starts from scratch re-deriving connections — the LLM wiki is stateful. Each new source you add integrates into existing pages, strengthening existing connections and building new ones. The system gets more valuable with every addition, not just bigger.
+
Unlike RAG — where every query starts from scratch re-deriving connections — the LLM wiki is stateful. Each new source you add integrates into existing pages, strengthening existing connections and building new ones. The system gets more valuable with every addition, not just bigger.
+
Zero Maintenance Burden on Humans
-
The grunt work of knowledge management — cross-referencing, updating related pages, creating summaries, flagging contradictions — is what kills every personal wiki humans try to maintain. LLMs do this tirelessly. The human's job shrinks to: decide what to read, and what questions to ask.
+
The grunt work of knowledge management — cross-referencing, updating related pages, creating summaries, flagging contradictions — is what kills every personal wiki humans try to maintain. LLMs do this tirelessly. The human's job shrinks to: decide what to read, and what questions to ask.
+
Token-Efficient at Personal Scale
-
At ~100 articles, the wiki's index.md fits in context. The LLM reads the index, identifies relevant articles, and loads only those — no embedding, no vector search, no retrieval noise. This is faster and cheaper per query than a full RAG pipeline for this scale.
+
At ~100 articles, the wiki's index.md fits in context. The LLM reads the index, identifies relevant articles, and loads only those — no embedding, no vector search, no retrieval noise. This is faster and cheaper per query than a full RAG pipeline for this scale.
+
-
Human-Readable & Auditable
-
The wiki is just markdown. You can open it in any editor, read it yourself, version it in git, and inspect every claim. There's no black-box vector math. Every connection the LLM made is visible. This transparency is a genuine advantage over opaque embeddings.
+
Human-Readable & Auditable
+
The wiki is just markdown. You can open it in any editor, read it yourself, version it in git, and inspect every claim. There's no black-box vector math. Every connection the LLM made is visible as a [[wikilink]]. This transparency is a genuine advantage over opaque embeddings.
+
-
Future-Proof & Portable
-
Plain markdown files work with any tool, any model, any era. No vendor lock-in. No proprietary database. When the next-gen model releases, you point it at the same folder. The data outlives the tooling.
+
Future-Proof & Portable
+
Plain markdown files work with any tool, any model, any era. No vendor lock-in. No proprietary database. When GPT-7 or Claude 5 releases, you point it at the same folder. The data outlives the tooling.
+
@@ -991,28 +1253,31 @@
Path to Fine-Tuning
-
As the wiki matures and gets "purified" through continuous lint passes, it becomes high-quality synthetic training data. Karpathy points to the possibility of fine-tuning a smaller, efficient model directly on the wiki — so the LLM "knows" your knowledge base in its own weights, not just its context.
+
As the wiki matures and gets "purified" through continuous lint passes, it becomes high-quality synthetic training data. Karpathy points to the possibility of fine-tuning a smaller, efficient model directly on the wiki — so the LLM "knows" your knowledge base in its own weights, not just its context.
+
+
-
Weaknesses
+
+ Weaknesses +
Errors Persist & Compound
-
This is the most serious structural flaw. With RAG, hallucinations are ephemeral — wrong answer this query, clean slate next time. With an LLM wiki, if the LLM incorrectly links two concepts at ingest time, that mistake becomes a prior that future ingest passes build upon. Persistent errors are more dangerous than ephemeral ones.
+
This is the most serious structural flaw. With RAG, hallucinations are ephemeral — wrong answer this query, clean slate next time. With an LLM wiki, if the LLM incorrectly links two concepts at ingest time, that mistake becomes a prior that future ingest passes build upon. Persistent errors are more dangerous than ephemeral ones.
+
Hard Scale Ceiling (~50K tokens)
-
The wiki approach stops working reliably when the index can no longer fit in the model's context window — roughly 50,000–100,000 tokens. Karpathy's own wiki is ~100 articles / ~400K words on a single topic. A mid-size company has thousands of documents; a large one has millions. The architecture simply doesn't extend to that scale.
+
The wiki approach stops working reliably when the index can no longer fit in the model's context window — roughly 50,000–100,000 tokens. Karpathy's own wiki is ~100 articles / ~400K words on a single topic. A mid-size company has thousands of documents; a large one has millions. The architecture simply doesn't extend to that scale.
+
No Access Control or Multi-User Support
-
It's a folder of markdown files. There is no Role-Based Access Control, no audit logging, no concurrency handling for simultaneous writes, no permissions model. Multiple users or agents creating write conflicts is unmanaged. This is not a limitation that can be patched — it's a structural consequence of the architecture.
+
It's a folder of markdown files. There is no Role-Based Access Control, no audit logging, no concurrency handling for simultaneous writes, no permissions model. Multiple users or agents creating write conflicts is unmanaged. This is not a limitation that can be patched — it's a structural consequence of the architecture.
+
@@ -1022,17 +1287,17 @@
Cognitive Outsourcing Risk
-
Critics argued that the bookkeeping Karpathy outsources — filing, cross-referencing, summarizing — is precisely where genuine understanding forms. By handing this to an LLM, you may end up with a comprehensive wiki you haven't internalized. You have a great reference; you may lack deep ownership of the knowledge.
+
Critics on Hacker News argued that the bookkeeping Karpathy outsources — filing, cross-referencing, summarizing — is precisely where genuine understanding forms. By handing this to an LLM, you may end up with a comprehensive wiki you haven't internalized. You have a great reference; you may lack deep ownership of the knowledge.
+
Knowledge Staleness Without Active Upkeep
-
Community reports show that most people who try this pattern get the folder structure right but end up with a wiki that slowly becomes unreliable or gets abandoned. The system requires consistent source ingestion and regular lint passes. If you stop feeding it, the wiki rots — its age relative to your domain's pace of change becomes a liability.
+
Community reports show that most people who try this pattern get the folder structure right but end up with a wiki that slowly becomes unreliable or gets abandoned. The system requires consistent source ingestion and regular lint passes. If you stop feeding it, the wiki rots — its age relative to your domain's pace of change becomes a liability.
+
Weaker Semantic Retrieval than RAG
-
Markdown wikilinks are explicit and manually-created. Vector embeddings discover semantic connections across differently-worded text that manual linking simply cannot — finding that an article titled "caching strategies" is semantically related to "performance bottlenecks" without an explicit link. At large corpora, RAG's fuzzy matching is the superior retrieval mechanism.
+
Markdown wikilinks are explicit and manually-created. Vector embeddings discover semantic connections across differently-worded text that manual linking simply cannot — finding that an article titled "caching strategies" is semantically related to "performance bottlenecks" without an explicit link. At large corpora, RAG's fuzzy matching is the superior retrieval mechanism.
+
@@ -1044,33 +1309,81 @@
 RAG retrieves and forgets. A wiki accumulates and compounds.
-— Design rationale for memex, April 2026
+— LLM Wiki v2, community extension of Karpathy's pattern
-
Scale matters most here. The comparison is not absolute — it is highly scale-dependent. Below ~50K tokens, the wiki wins. Above that threshold, RAG's architecture becomes necessary regardless of the storage format.
+
Scale matters most here. The comparison is not absolute — it is highly scale-dependent. Below ~50K tokens, the wiki pattern wins. Above that threshold, RAG's architecture becomes necessary regardless of the storage format.
-| Dimension | LLM Wiki | RAG |
-| Knowledge Accumulation | ✦ Compounds with each ingest | Stateless — restarts every query |
-| Maintenance Cost | ✦ LLM does the filing | Chunking pipelines need upkeep |
-| Scale Ceiling | ~50–100K tokens hard limit | ✦ Millions of documents, no ceiling |
-| Human Readability | ✦ Plain markdown, fully auditable | Black-box vector space |
-| Semantic Retrieval | Explicit links only | ✦ Fuzzy semantic matching |
-| Error Persistence | Errors compound into future pages | Errors are ephemeral per query |
-| Multi-user / RBAC | None — flat file system | ✦ Supported by most platforms |
-| Query Latency | ✦ Fast at personal scale | Embedding search overhead |
-| Setup Complexity | ✦ Just folders & markdown | Vector DB, chunking, embeddings |
-| Vendor Lock-in | ✦ Zero — any model, any editor | Often tied to embedding provider |
-| Cross-reference Quality | ✦ Rich, named wikilinks | Implicit via similarity score |
-| Fine-tuning Pathway | ✦ Wiki becomes training data | Raw chunks are poor training data |
+| Dimension | memex / LLM Wiki | RAG |
+| Knowledge Accumulation | ✦ Compounds with each ingest | Stateless — restarts every query |
+| Maintenance Cost | ✦ LLM does the filing | Chunking pipelines need upkeep |
+| Scale Ceiling | ~50–100K tokens hard limit | ✦ Millions of documents, no ceiling |
+| Human Readability | ✦ Plain markdown, fully auditable | Black-box vector space |
+| Semantic Retrieval | Explicit links only | ✦ Fuzzy semantic matching |
+| Error Persistence | Errors compound into future pages | Errors are ephemeral per query |
+| Multi-user / RBAC | None — flat file system | ✦ Supported by most platforms |
+| Query Latency | ✦ Fast at personal scale | Embedding search overhead |
+| Setup Complexity | ✦ Just folders & markdown | Vector DB, chunking, embeddings |
+| Vendor Lock-in | ✦ Zero — any model, any editor | Often tied to embedding provider |
+| Cross-reference Quality | ✦ Rich, named wikilinks | Implicit via similarity score |
+| Fine-tuning Pathway | ✦ Wiki becomes training data | Raw chunks are poor training data |
Excellent Fit
Solo Deep Research

Reading papers, articles, and reports over weeks or months on a single topic. Karpathy's primary use case — his ML research wiki has ~100 articles and 400K words, all compiled without writing a line manually.

Excellent Fit
Personal Knowledge Base

Goals, health tracking, journal entries, podcast notes — building a structured picture of yourself over time. The LLM creates concept pages for recurring themes and connects them across months or years.
Signal
LLM as Librarian

Offloading the maintenance bottleneck — the work that kills all human-maintained wikis — to an LLM is elegant and correct. The pattern solves a real problem people actually have.

Strong
Noise
Error Amplification Risk

Real and underweighted by enthusiasts. The persistent-error problem is structural — not a bug to fix with better prompting. It's a genuine trade-off the pattern makes, and it's most dangerous in precision-critical domains.

Real Risk
Signal
The Idea File Paradigm

Karpathy's framing of sharing an "idea file" vs. a code repo — letting each person's agent instantiate a custom version — is genuinely forward-thinking about how patterns propagate in the agent era.

Solid
Noise
"It'll Replace Enterprise RAG"

Karpathy explicitly scoped this to individual researchers. The limitations (no RBAC, no concurrency, ~50K token ceiling) are not bugs — they are consequences of the design assumptions. Enterprise use requires entirely different infrastructure.

Pure Noise
The schema file is a wish, not a discipline. The lack of an actual security model structurally makes this a skill with a dedicated output directory and no guardrails.
— Threads community critique, April 2026
The bottleneck for personal knowledge bases was never the reading. It was the boring maintenance work nobody wanted to do. LLMs eliminate that bottleneck.
— LLM Wiki v2 community extension
These are the real engineering answers. For each known limitation, the community has converged on concrete mitigations — some from Karpathy's own gist, others from production implementations. Click any row to expand the full approach. The Active Upkeep section at the bottom is the one that matters most.
The index.md breaks around 100–150 articles when it stops fitting cleanly in context. The community-endorsed fix is qmd — built by Tobi Lütke (Shopify CEO) and explicitly recommended by Karpathy himself. It's a local, on-device search engine for markdown files using hybrid BM25 + vector search with LLM re-ranking. No API calls, no data leaves your machine.

Install and integrate:

```
npm install -g @tobilu/qmd
qmd collection add ./wiki --name my-research
qmd mcp
```

The qmd mcp command exposes it as an MCP server so Claude Code uses it as a native tool — no shell-out friction. Three search modes: keyword BM25 (qmd search), semantic vector (qmd vsearch), and hybrid re-ranked (qmd query). Use the JSON output flag to pipe results into agent workflows.

Sweet spot: Use plain index.md navigation up to ~50 articles. Introduce qmd around 50–100. At 200+, qmd becomes essential — not optional.

Setup Effort: 30 min one-time setup
Before reaching for qmd, a simpler scaling step is to split index.md into domain-specific sub-indexes: wiki/ml-theory/index.md, wiki/infrastructure/index.md, etc. A root index.md points to sub-indexes, keeping any single file within comfortable context window bounds.

Define this in your schema file (CLAUDE.md) so the LLM knows which sub-index to update on ingest and which to consult on query. The LLM reads only the relevant sub-index, not the full corpus.

Sharding adds maintenance complexity to the schema. Document the domain boundaries clearly or the LLM will make inconsistent decisions about where new content lands.

Setup Effort: 15 min schema update
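As a minimal sketch, the root index.md stays a pure routing file. The domain names reuse the illustrative examples above; the per-line descriptions are assumptions:

```markdown
# Wiki Index (root)
<!-- Routing only: details live in the sub-indexes -->
- [[ml-theory/index]]: papers, concepts, derivations
- [[infrastructure/index]]: tooling, deployments, homelab notes
```

The LLM loads this file plus exactly one sub-index per operation, so the context cost of navigation stays flat as domains multiply.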
03 Consolidation tiers — promote stable knowledge up the stack
LLM Wiki v2 pattern

From the LLM Wiki v2 community extension: structure knowledge in tiers by confidence and stability. Raw observations live in low-confidence pages. After multi-source confirmation, the LLM promotes them to "established" pages. Core principles graduate to a high-confidence tier that rarely changes.

Each tier is more compressed, more confident, and longer-lived than the one below it. The LLM only loads lower tiers when deeper detail is needed. This naturally keeps context window usage lean as the wiki grows — you're querying the compressed tier first, the full tier only on demand.

Payoff: This also solves the staleness problem. Lower-tier pages decay naturally; upper-tier facts are reinforced repeatedly and earn their permanence.

Setup Effort: Schema design work, ongoing co-evolution
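One way to encode the tiers is directly in page frontmatter. A sketch — the field names and the draft/established/core vocabulary are assumptions, not a fixed schema:

```yaml
# Illustrative frontmatter for a page in the middle tier.
tier: established        # draft -> established -> core
confidence: 0.8          # strengthened by multi-source confirmation
sources: 3
promoted: 2026-03-14     # when the LLM last moved this page up a tier
```

The lint pass can then select promotion candidates with a simple query: draft pages whose source count and confidence have crossed the thresholds your schema defines.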
🔐 Access Control & Multi-User
Medium Priority

01 Host behind a lightweight wrapper — llmwiki.app or self-hosted MCP
MCP · llmwiki · FastAPI

The flat-file architecture has no access control by default. The cleanest mitigation is to expose the wiki through an MCP server rather than as raw files. The open-source llmwiki project (lucasastorian/llmwiki) does exactly this: it wraps the Karpathy pattern with a FastAPI backend, Supabase auth, and MCP endpoints. Claude connects via MCP and has read/write tools — but only through the authenticated layer.

For self-hosted setups: build a minimal FastAPI wrapper that authenticates via JWT before allowing MCP tool calls. The markdown files stay on disk; the API layer enforces who can read and write. This pattern is already used in production implementations like Hjarni.

Eric's wheelhouse: Given your OPNsense VLAN setup and existing FastAPI work on TaskForge, a simple auth wrapper is well within reach. Expose via Tailscale to keep it off the public internet entirely — no RBAC needed if the network boundary does the work.

Setup Effort: Weekend project for self-hosted

02 Scoped directories for shared vs. private content
Git · Directory structure

For small teams, a simpler pattern than full RBAC: separate wiki/shared/ from wiki/private/ directories, with git branch-level access control. The MCP server only exposes the shared/ tree to team members; personal pages stay in private/ on a branch only you merge from.

The LLM Wiki v2 pattern calls this "mesh sync with shared/private scoping." The schema file defines what can be promoted from private to shared and the conditions for that promotion.

This is soft access control — it relies on disciplined git usage, not cryptographic enforcement. Fine for trusted small teams; not for anything requiring audit trails or compliance.

Setup Effort: Git config + schema update
The LLM Wiki v2 pattern solves persistent errors by making uncertainty explicit. Every factual claim in a wiki page carries metadata: how many sources support it, when it was last confirmed, and a confidence score (e.g., 0.85). Confidence decays with time and strengthens with reinforcement from new sources.

Implement this in YAML frontmatter on each page:

```yaml
confidence: 0.85
sources: 2
last_confirmed: 2026-04-01
```

The lint pass checks for pages with decayed confidence scores and flags them for re-verification. The LLM can say "I'm fairly sure about X but less sure about Y" — it's no longer a flat collection of equally-weighted claims.

Key benefit: This turns errors from permanent silent landmines into visible, decaying warnings. A wrong claim doesn't compound forever — it eventually gets flagged by its own decaying score.

Setup Effort: Schema + frontmatter template update
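The decay itself can be a few lines. A sketch assuming an exponential half-life — the 180-day half-life and the 0.5 review threshold are illustrative knobs, not part of the pattern:

```python
from datetime import date

def effective_confidence(confidence: float, last_confirmed: date,
                         today: date, half_life_days: int = 180) -> float:
    """Decay a page's stored confidence by time since last confirmation."""
    age = (today - last_confirmed).days
    return confidence * 0.5 ** (age / half_life_days)

def needs_review(confidence: float, last_confirmed: date,
                 today: date, threshold: float = 0.5) -> bool:
    """What the lint pass checks: flag pages whose decayed score fell too low."""
    return effective_confidence(confidence, last_confirmed, today) < threshold
```

Reinforcement is then just resetting last_confirmed (and optionally bumping confidence) when a new source corroborates the claim.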
02 Typed supersession — new info explicitly replaces old claims
Schema · log.md
When new information contradicts an existing wiki claim, the wrong pattern is leaving the old claim with an appended note. The right pattern: the new claim explicitly supersedes the old one. The old version is preserved but marked stale with a timestamp and link to what replaced it — version control for knowledge, not just for files.

Define supersession in your schema: the LLM's ingest instructions should check for contradictions against existing pages before writing, and when found, issue a formal supersession record rather than a quiet edit.

log.md discipline: Karpathy's second navigation file — the append-only audit log — is the mechanism for this. Every supersession event gets a log entry with timestamp, old claim, new claim, and source. The log is immutable context you can audit.

Setup Effort: Schema + ingest prompt engineering
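A supersession record in log.md might look like this. The exact field layout is whatever your schema defines, and the page and source names below are hypothetical:

```markdown
## 2026-04-12 SUPERSESSION
- old: "Service X supports streaming" (wiki/service-x.md, last_confirmed 2026-01-30)
- new: "Service X streaming deprecated in v4" (source: raw/harvested/service-x-notes.md)
- action: old claim archived with a link to its replacement
```

Because the log is append-only, the full history of what believed what, and when, survives even after the superseded page itself is archived.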

03 Typed entity system — prevent duplicate and conflicting concepts
Schema · ELF / LLMWiki v2

Community implementation ELF (Eli's Lab Framework) uses a strict typed-entity system where every page is declared as a type (library, project, person, concept, decision) and every link between pages has a typed relationship (uses, depends-on, contradicts, caused, fixed, supersedes). This prevents the LLM from creating duplicate concept pages under different names.

A 5-step incremental ingest pass: diff → summarize → extract → write → image. The extract step enforces entity typing before the write step creates any new page — if a typed entity already exists, it merges rather than duplicates.

Typed entity systems add upfront schema design work. Start loose; only formalize types after you see which duplicates are actually causing problems.

Setup Effort: Significant schema design investment
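Sketched as frontmatter, the type and relationship vocabularies below mirror the ELF lists quoted above, but the field names and the target page are illustrative:

```yaml
# Every page declares a type; every link declares a relationship.
entity:
  name: qmd
  type: library          # library | project | person | concept | decision
links:
  - rel: supersedes      # uses | depends-on | contradicts | caused | fixed | supersedes
    target: chromadb-indexing   # hypothetical page name
```

The merge-not-duplicate rule falls out of this: before creating a page, the extract step looks up the declared entity name and type, and only writes a new file when no match exists.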
★ Biggest Mitigation Challenge
Active Upkeep — The Real Failure Mode

Community analysis of 120+ comments on Karpathy's gist converged on a clear finding: most people who try this pattern get the folder structure right and still end up with a wiki that slowly becomes unreliable, redundant, or abandoned. The difference between a wiki that compounds and one that quietly rots comes down to operational discipline — not technical setup.
Daily
Feed the Machine
• Drop new sources into raw/ via Obsidian Web Clipper
• Ingest anything queued in _raw/ staging dir
• Log questions answered by the wiki (reinforces confidence)
Weekly
Lint Pass
• Run health check — orphan pages, broken wikilinks
• Flag contradictions for review
• Identify concepts referenced but not yet given own page
• Review low-confidence / decayed pages
Monthly
Schema Evolution
• Review CLAUDE.md / AGENTS.md for outdated rules
• Promote stable lower-tier pages up to established tier
• Run qmd re-index if collection has grown significantly
• Purge truly stale pages per retention curve
As Needed
Circuit Breakers
• Separate vault and agent working directories
• Never let agent write directly to vault/verified/
• Manual audit any page cited in high-stakes decisions
• Keep raw/ as ground truth — always traceable back
🔄 Upkeep Automation — Making It Stick
Critical

01 Separate vault from agent working directory — hard partition
Directory structure

The instinct is to have the agent write directly into the wiki. This creates the rot. The principle: your curated/verified vault and the agent's working vault (speculative writes, messy drafts, exploratory connections still being tested) must be physically separate directories. Only the human promotes content from agent-working to vault.

Structure: wiki/verified/ (human-promoted, high trust) vs wiki/staging/ (agent writes here first). The lint pass reviews staging and proposes promotions. You approve them. The signal-to-noise ratio in your verified wiki stays high permanently.

Why this works: You're not adding friction to the agent — you're protecting the valuable layer. The agent still does all the work. You just gate what graduates to trusted status.

Setup Effort: Directory rename + schema update
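The partition is nothing more than a directory convention. A sketch of one possible layout, following the wiki/verified/ and wiki/staging/ split described above:

```
wiki/
├── verified/    # human-promoted, high trust; the agent never writes here
├── staging/     # the agent's working area; lint pass proposes promotions
└── archive/     # faded pages, out of the active index but traceable
raw/             # immutable sources, ground truth for every claim
```

Enforcement can be as light as the schema file telling the agent its write root is wiki/staging/, or as hard as filesystem permissions that make verified/ read-only for the agent's user.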

02 Automate the ingest trigger — don't rely on memory to feed it
Cron · Webhooks · Claude Code

The number one reason wikis rot: the human stops ingesting because life gets busy. The fix is removing the human from the trigger loop. Set up a cron job or a filesystem watcher on raw/ that automatically triggers the ingest command whenever a new file lands. The human's job shrinks to: drop file, walk away.

Implementations: inotifywait on Linux, fswatch on macOS, or a Node.js chokidar watcher. On drop, the watcher calls your ingest script which runs the LLM compilation pass. You get a notification when it completes.

For your stack: This maps cleanly to your existing automation patterns — a simple Node-RED flow watching a directory, triggering a webhook to Claude Code, and notifying via Slack/Telegram through OpenClaw when ingest completes.

Setup Effort: 2–4 hours watcher + webhook
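For a cron-driven setup, the same idea works as a polling sketch. This illustrative function stands in for the event-driven watchers named above; the ingest hand-off itself is whatever script you already run:

```python
import pathlib

def find_new_sources(raw_dir: str, seen: set[str]) -> list[str]:
    """Return markdown files in raw_dir not seen on a previous poll.

    `seen` persists between polls (e.g., a small state file); each name
    returned gets handed to the ingest script / LLM compilation pass.
    """
    current = {p.name for p in pathlib.Path(raw_dir).glob("*.md")}
    new = sorted(current - seen)
    seen.update(current)
    return new
```

Run it every few minutes from cron; the event-driven variants (inotifywait, fswatch, chokidar) trade this polling delay for an extra dependency.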

03 Schedule the weekly lint as a non-negotiable calendar block
Cron · Scheduler

Lint passes don't happen if you have to remember to run them. The solution is automating them on a schedule — a weekly cron job that runs the lint command, writes a report to a lint-reports/ directory, and sends you a summary notification. The report tells you: N orphan pages found, N contradictions flagged, N pages with decayed confidence.

You review the report (5 minutes), decide which flagged items to address, and optionally run the LLM to resolve them. The system is telling you what needs attention rather than you having to inspect everything.

What community data shows: People who automate the lint schedule have wikis that stay healthy at 6 months. People who rely on manual "I'll remember to lint" have wikis that are abandoned or unreliable at 6 weeks.

Setup Effort: Cron setup + notification routing
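A minimal crontab line for the weekly pattern; the script path is a placeholder for whatever lint command you run (note the `\%` — cron treats a bare `%` as a newline):

```
# Weekly lint, Sunday 04:00: write a dated report for review.
0 4 * * 0  $HOME/wiki/scripts/lint.sh > $HOME/wiki/lint-reports/$(date +\%F).md 2>&1
```

Pair it with whatever notification route you already have (email from cron, a webhook, a chat bot) so the report reaches you without being asked for.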

04 Identity-aware filter — the schema knows who the wiki is for
Schema · CLAUDE.md

A community-evolved enhancement to Karpathy's original: add an identity-aware filter to your schema. A prompt section that tells the LLM exactly who the wiki is for, what their goals are, and what "high-signal" means in that context. The LLM then scores sources before ingesting and rewrites that filter over time based on what has proven useful.

This prevents the wiki from becoming a neutral encyclopedia of everything you've read. It stays opinionated, relevant, and tuned to your actual work. Over months, the schema itself becomes a reflection of what you find worth knowing — a second-order artifact of the system.

Upkeep benefit: A well-tuned identity filter means the LLM rejects low-signal sources at ingest time rather than filling the wiki with noise you'll have to purge later. Garbage-in prevention beats garbage-out cleanup.

Setup Effort: 10 min schema addition, self-evolving after
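The filter is just a schema section. An illustrative wording — the persona and the scoring rule are examples to adapt, not part of the pattern:

```markdown
## Identity filter
This wiki serves one person: a systems engineer focused on homelab
automation and LLM tooling. High-signal: decisions, benchmarks, failure
modes, anything that changes how I build. Low-signal: news, announcements,
material I can re-find in official docs.
Before ingesting a source, score it 1-5 against this profile; skip
anything below 3 and log the rejection. Revise this section when the
lint pass shows ingested-but-never-referenced pages.
```

Logging rejections matters: it is the evidence the LLM uses when it rewrites the filter, closing the self-evolution loop the pattern promises.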

05 Retention curve — build in structured forgetting
Frontmatter · Lint pass

Not everything should live forever. A wiki that never forgets becomes noisy — important signals buried under outdated context. Implement a retention curve: facts that were important once but haven't been accessed or reinforced in months gradually fade to "archived" status. The lint pass executes this curve automatically.

Frontmatter fields to add: last_accessed, access_count, status: active|fading|archived. The lint pass updates status based on time-since-access and reinforcement count. Archived pages aren't deleted — they move to wiki/archive/ where they're out of the active index but still traceable.

The payoff: Active upkeep gets easier over time as the wiki self-trims. After 6 months of running with a retention curve, your active wiki is denser and higher-signal than at month 1 — not bloated and harder to navigate.

Setup Effort: Frontmatter + lint script update
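The status transition the lint pass applies can be sketched in a few lines. The 90/180-day windows and the reinforcement override are illustrative thresholds to tune against your own corpus:

```python
from datetime import date

def retention_status(last_accessed: date, access_count: int, today: date) -> str:
    """Map time-since-access plus reinforcement to active|fading|archived."""
    idle = (today - last_accessed).days
    if access_count >= 10 or idle < 90:
        return "active"      # heavily reinforced pages earn permanence
    if idle < 180:
        return "fading"      # flagged in the lint report, still indexed
    return "archived"        # moved to wiki/archive/, out of active index
```

The lint pass reads each page's frontmatter, calls this, and rewrites the status field; the archive move is then a plain file move plus an index update.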
⬡ Your Stack Extension — MemPalace + qmd + Conversation Pipeline

The wiki gains a living feed and a structural memory layer.

Standard Karpathy wiki is fed by sources you manually drop into raw/. Your setup replaces that bottleneck with an automated conversation pipeline: every AI session gets mined into MemPalace, summarized, and fed into raw/ on a continuous basis. The wiki stops being a project you maintain and becomes an organism that grows from your daily work. Combined with qmd replacing ChromaDB for indexing, you have a genuinely novel hybrid that addresses the core limitations differently than any single pattern alone.

Note: You are skipping MemPalace's ChromaDB storage layer and using qmd for indexing instead. The implications of that choice are documented throughout this tab.

96.6% MemPalace R@5 raw mode
+34% retrieval boost via wing+room filtering
~170 tokens on wake-up (L0+L1)
19 MCP tools available
qmd replaces ChromaDB indexing

Your Architecture — Data Flow

Layer 0 — Conversation Capture:
Claude / AI Sessions → MemPalace (mine --mode convos) → Wings / Rooms / Halls / Tunnels → Closets (summaries) → Drawers (verbatim)

Layer 1 — Wiki Compilation:
Conversation Summaries → raw/ (staged) → LLM Compiler → wiki/ (compiled pages) → qmd Index

Layer 2 — Query:
Natural Language Query → MemPalace wing+room filter → qmd BM25+vector → LLM reads wiki pages → Grounded Answer

MemPalace Concepts

🏛️ Wing
Person or Project
Top-level namespace — one per person you work with or project you run. Conversations and facts are scoped to their wing automatically via keyword detection on mining.
→ Maps to wiki domain sub-index (e.g. wiki/taskforge/)

🚪 Room
Topic / Concept
Specific subject within a wing — auth-migration, ci-pipeline, database-decisions. When the same room exists across wings, a tunnel auto-connects them. Provides the +34% retrieval boost via wing+room filtering.
→ Maps to wiki concept page (e.g. wiki/taskforge/auth.md)

🗂️ Closet
Summary Layer
Plain-text summaries that point the LLM to the right drawer. This is the layer you are feeding into raw/ — closet output becomes a high-quality, pre-structured input to the wiki compiler rather than raw transcript noise.
→ These summaries become your raw/ inputs

📦 Drawer
Verbatim Archive
The exact original words — never summarized, never lost. This is your ground truth for cross-checking. When confidence scoring flags a wiki claim as decayed, you trace it back to the drawer for verification. Eliminates the "no original source" problem.
→ Ground truth for cross-check / error persistence mitigation

🏃 Hall
Memory Type Corridor
Fixed corridors within every wing: hall_facts (decisions), hall_events (sessions/milestones), hall_discoveries (breakthroughs), hall_preferences (habits), hall_advice (recommendations). Memory typed at ingest time — no post-hoc categorization needed.
→ Maps to wiki page type in CLAUDE.md schema

🚇 Tunnel
Cross-Wing Connection
Automatic links when the same room topic appears across different wings. "Auth-migration" in wing_kai and wing_taskforge creates a tunnel — the palace navigation finds cross-project connections that explicit wikilinks alone would miss.
→ Enriches wiki cross-references beyond manual [[wikilinks]]

Impact on Known Limitations

Largely Solved
Active Upkeep — The #1 Failure Mode

Conversation mining + auto-save hooks make the feed automatic. You no longer have to remember to drop files into raw/. Every Claude Code session is mined. The PreCompact hook fires before context compression. The Stop hook fires every 15 messages.

Before: Humans forget to ingest → wiki rots at 6 weeks
After: Hooks auto-mine every session → continuous feed
Largely Solved
Error Persistence / Cross-Check

Drawers preserve verbatim originals permanently. When a wiki claim is flagged as low-confidence, you have an exact traceable source to verify against — not just "raw/source-2026-04.md" but a wing-scoped, room-tagged original with a drawer ID.

Before: Errors persist silently, no clear original to check
After: Drawers = verbatim ground truth, always traceable
Significantly Reduced
Scale Ceiling

MemPalace's wing+room metadata filtering means qmd doesn't have to search the entire corpus — it searches a pre-narrowed wing/room scope first. This extends the effective scale ceiling because retrieval is structurally guided before the BM25+vector pass fires.

Before: qmd searches entire wiki — token ceiling still binding
After: Wing+room filter → qmd works on relevant subset
Character Shifted
Knowledge Staleness

Conversations are the primary source — they're inherently current. Every session you have becomes a potential ingest. Staleness now depends on how actively you use AI tools (which you do constantly), not on whether you remember to read and clip articles.

Before: Staleness from manual source curation gaps
After: Staleness from conversation coverage gaps (much smaller)
Reduced
Semantic Retrieval Gap vs RAG

The combination of MemPalace structural navigation (wing → room → closet → drawer) plus qmd's BM25+vector search covers both explicit structural navigation and fuzzy semantic matching. You have the best of both retrieval patterns without a full vector database.

Before: Explicit wikilinks only — misses differently-worded concepts
After: Structural nav + qmd semantic fills the gap
New Consideration
Conversation Noise in raw/

Not every conversation deserves to enter the wiki. Debugging rabbit holes, exploratory dead-ends, and casual exchanges are valuable in MemPalace's verbatim drawers but would pollute the wiki if compiled directly. The summarization/filtering step before raw/ is now load-bearing.

Old Risk: No raw/ source, hard to feed continuously
New Risk: Too much raw/ — summarization quality is critical

qmd vs ChromaDB — Your Trade-off

⚠ Honest Assessment of the Trade-off
MemPalace's benchmark-leading 96.6% R@5 score comes specifically from raw verbatim storage in ChromaDB. By replacing ChromaDB with qmd, you are choosing a different design point: simpler local infrastructure and tighter wiki integration over maximum semantic recall on conversation search. This is a defensible choice for your use case — but it's worth knowing what you're trading.
| Dimension | qmd (your choice) | ChromaDB (MemPalace default) |
|---|---|---|
| Storage format | Markdown files (same as wiki) | ✦ Proprietary vector DB |
| Semantic recall (LongMemEval) | Not benchmarked on this task | ✦ 96.6% R@5 raw mode |
| Wiki integration | ✦ Native — indexes wiki/ directly | Separate store, no wiki awareness |
| Single index to maintain | ✦ Yes — one qmd collection | No — wiki + ChromaDB separate |
| MCP exposure | ✦ qmd mcp — native tool for Claude | Via MemPalace MCP server |
| Hybrid search (BM25 + vector) | ✦ Built in — qmd query | ChromaDB semantic only |
| Dependencies | ✦ npm only, local GGUF model | Python, chromadb, potential version pin issues |
| Verbatim drawer retrieval | Not designed for this | ✦ Core feature — drawers are ChromaDB entries |
| Architectural simplicity | ✦ One search layer for everything | Two parallel search systems |

The key practical point: MemPalace's structural navigation (wing+room filtering) still provides the +34% retrieval boost regardless of what sits behind it. You retain the palace architecture's biggest advantage. The ChromaDB vs qmd choice only affects the semantic search layer, not the structural navigation layer.
— Analysis based on MemPalace architecture documentation, April 2026

Updated Mitigation Status

| Limitation | Before MemPalace | With MemPalace + qmd | Residual Work |
|---|---|---|---|
| Active Upkeep | Manual — wikis rot | ✦ Auto-hooks feed continuously | Summarization quality tuning |
| Error Persistence | No traceable ground truth | ✦ Drawers = verbatim source | Confidence scoring in schema |
| Scale Ceiling | ~50–100K token hard limit | Extended by wing+room filtering | qmd still needed at 200+ articles |
| Semantic Retrieval Gap | Explicit links only | ✦ Structure + qmd BM25+vector | Some ChromaDB recall lost (see above) |
| Knowledge Staleness | Depends on manual curation | ✦ Continuous from session mining | Retention curve still needed |
| Cross-check | Raw docs only, imprecise | ✦ Drawer-level verbatim traceability | fact_checker.py not yet wired (v3) |
| Access Control | Flat file, none | Still needs MCP wrapper layer | Tailscale boundary is your fastest path |
| Cognitive Outsourcing | Valid concern | Unchanged — wiki is still reference only | Design intent: reference, not replacement |

New Risks Introduced

! Summarization quality is now load-bearing
Critical Path

In the original pattern, you curated sources manually — only deliberate, quality inputs entered raw/. With conversation mining, the filter is your summarization scripts. If those scripts surface debugging dead-ends, exploratory rabbit holes, or noise, it enters the wiki compilation pipeline. Garbage-in still applies — it's just at a different point in the flow.


**Mitigation:** Tune your conversation scripts to filter by memory type (`hall_facts` and `hall_discoveries` are high-signal; `hall_events` is medium; raw session transcripts are low). Only promote closet summaries tagged as decisions, discoveries, or recommendations. Use MemPalace's `--extract general` mode to auto-classify before staging.

**Practical rule:** Only closets from `hall_facts` and `hall_discoveries` should auto-promote to `raw/`. Other halls should require a manual review step before staging.
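The promotion rule above reduces to a small routing function. The hall names come from the MemPalace taxonomy discussed in this document; the tag vocabulary and function shape are illustrative assumptions, not MemPalace's actual API.

```python
# High-signal halls whose tagged closets may skip human review:
AUTO_PROMOTE_HALLS = {"hall_facts", "hall_discoveries"}
# Closet tags worth promoting at all (assumed vocabulary):
PROMOTABLE_TAGS = {"decision", "discovery", "recommendation"}

def route_closet(hall: str, tags: set[str]) -> str:
    """Decide where a closet summary goes: raw/, manual review, or nowhere."""
    if hall in AUTO_PROMOTE_HALLS and tags & PROMOTABLE_TAGS:
        return "auto-promote"   # straight to raw/ for harvesting
    if tags & PROMOTABLE_TAGS:
        return "manual-review"  # medium-signal halls get a human look first
    return "skip"               # debugging dead-ends, rabbit holes, noise
```

Anything untagged is dropped regardless of hall, which is the garbage-in filter doing its job at the new point in the flow.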
### Known Gap (Issue #27): MemPalace `fact_checker.py` is not yet wired into KG ops (v3.0.0)

MemPalace's contradiction detection (fact_checker.py) exists as a standalone utility but is not currently called automatically during knowledge graph operations — the authors acknowledged this in their April 7 correction note. This means cross-wing contradictions won't be auto-flagged at ingest time yet.


**Mitigation:** Call `fact_checker.py` manually as part of your lint-pass script until Issue #27 is resolved. Wire it as a pre-commit hook on `wiki/` changes: any new page goes through `fact_checker` before being promoted from staging to verified.

Track Issue #27 on the MemPalace repo. This is being actively fixed. Once wired, contradiction detection becomes a native part of your ingest pipeline — a major upgrade to the cross-check mitigation.
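Until that lands, the manual lint pass can be approximated with a thin wrapper. This is a sketch under stated assumptions: the `fact_checker.py` path and its exit-code contract (nonzero on a detected contradiction) are guesses to verify against the MemPalace repo before relying on it.

```python
"""Lint-pass sketch: run MemPalace's fact_checker.py over staged pages
before they are promoted to the live wiki. Interface is assumed, not
confirmed -- see Issue #27 on the MemPalace repo."""
import subprocess
import sys
from pathlib import Path

def pending_pages(staging_dir: str = "staging") -> list[Path]:
    """Markdown pages awaiting promotion from staging/ to the live wiki."""
    return sorted(Path(staging_dir).glob("*.md"))

def lint_pass(staging_dir: str = "staging") -> list[Path]:
    """Return the pages fact_checker flagged (assumed: nonzero exit)."""
    flagged = []
    for page in pending_pages(staging_dir):
        result = subprocess.run(
            [sys.executable, "mempalace/fact_checker.py", str(page)]
        )
        if result.returncode != 0:
            flagged.append(page)
    return flagged
```

Run `lint_pass()` from a pre-commit hook or as the first step of your promotion script, and refuse to promote anything it returns.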
### Operational Risk: Two memory systems need schema alignment

MemPalace's taxonomy (wings, rooms, halls) and the wiki's taxonomy (domains, concept pages, page types in CLAUDE.md) are separate schemas. If they drift — MemPalace calls something "wing_taskforge/hall_facts/auth" while the wiki calls it "infrastructure/auth-decisions" — the structural navigation loses coherence. Tunnels and wikilinks stop reinforcing each other.


**Mitigation:** Define a canonical mapping document (a simple markdown table) that maps MemPalace wing/room names to wiki domain/page paths. Reference it in both `CLAUDE.md` and your MemPalace `wing_config.json`. Review quarterly — schemas co-evolve, but they need to co-evolve together.

**Your advantage:** You already have a discipline around `CLAUDE.md` management. Add a "Palace Map" section to your global `CLAUDE.md` that specifies the canonical wing→wiki-domain mapping. The LLM consults it on every ingest.
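A minimal sketch of how that canonical map can be enforced at ingest time, so schema drift surfaces as a hard error rather than silent incoherence. The wing and domain names are the illustrative examples from this section; the authoritative copy of the map lives in `CLAUDE.md` and `wing_config.json`, not in code.

```python
# Hypothetical "Palace Map": MemPalace path -> wiki domain path.
PALACE_MAP = {
    "wing_taskforge/hall_facts/auth": "infrastructure/auth-decisions",
    "wing_taskforge/hall_discoveries/deploy": "infrastructure/deploy-notes",
}

def wiki_path_for(palace_path: str) -> str:
    """Resolve a MemPalace path to its wiki path, failing loudly on drift."""
    try:
        return PALACE_MAP[palace_path]
    except KeyError:
        # An unmapped wing/hall means the two schemas have drifted:
        # fix the Palace Map before ingesting, don't guess a destination.
        raise KeyError(
            f"{palace_path!r} has no wiki mapping; update the Palace Map"
        ) from None
```

Failing on unmapped paths keeps tunnels and wikilinks reinforcing each other instead of quietly diverging.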
---

*Sources: VentureBeat · Epsilla · Atlan · Medium · Starmorph · GitHub Gist Community · MemPalace README*

*memex · Karpathy's Pattern + MemPalace · April 2026 · Compiled April 11, 2026*