docs: reframe as extensions + replace Signal & Noise artifact
Two changes, one commit:

1. Reframe "weaknesses" as "extensions memex adds": Karpathy's gist is a
   concept pitch, not an implementation. Reframe the seven places memex
   extends the pattern as engineering-layer additions rather than problems
   to fix. Cleaner narrative — memex builds on Karpathy's work instead of
   critiquing it. Touches README.md (Why each part exists + Credits) and
   DESIGN-RATIONALE.md (section titles, trade-off framing, biggest layer
   section, scope note at the end).

2. Replace docs/artifacts/signal-and-noise.html with the full upstream
   version: the earlier abbreviated copy dropped the MemPalace integration
   tab, the detailed mitigation steps with effort pips, the impact
   before/after cards, and the qmd vs ChromaDB comparison. This restores
   all of that. Also swaps self-references from "LLM Wiki" to "memex"
   while leaving external "LLM Wiki v2" community citations alone (those
   refer to a separate pattern and aren't ours to rename).

The live hosted copy at eric-turner.com/memex/signal-and-noise.html has
already been updated via scp — Hugo picks up static changes with
--poll 1s, so the public URL reflects this file immediately.
README.md
@@ -100,30 +100,34 @@ keeping cross-references intact, and flagging ambiguity for review.
 
 ---
 
-## Why each part exists
+## How memex extends Karpathy's pattern
 
 Before implementing anything, the design was worked out interactively
-with Claude as a
-[Signal & Noise analysis of Karpathy's pattern](https://eric-turner.com/memex/signal-and-noise.html).
-That analysis found seven real weaknesses in the core pattern. This
-repo exists because each weakness has a concrete mitigation — and
-every component maps directly to one:
+with Claude as a structured
+[Signal & Noise analysis](https://eric-turner.com/memex/signal-and-noise.html).
+Karpathy's original gist is a concept pitch, not an implementation —
+he was explicit that he was sharing an "idea file" for others to build
+on. memex is one attempt at that build-out. The analysis identified
+seven places where the core idea needed an engineering layer to become
+practical day-to-day, and every automation component in this repo maps
+to one of those extensions:
 
-| Karpathy-pattern weakness | How this repo answers it |
-|---------------------------|--------------------------|
-| **Errors persist and compound** | `confidence` field with time-based decay → pages age out visibly. Staging review catches automated content before it goes live. Full-mode hygiene does LLM contradiction detection. |
-| **Hard ~50K-token ceiling** | `qmd` (BM25 + vector + re-ranking) set up from day one. Wing/room structural filtering narrows search before retrieval. Archive collection is excluded from default search. |
-| **Manual cross-checking returns** | Every wiki claim traces back to immutable `raw/harvested/*.md` with SHA-256 hash. Staging review IS the cross-check. `compilation_notes` field makes review fast. |
-| **Knowledge staleness** (the #1 failure mode in community data) | Daily + weekly cron removes "I forgot" as a failure mode. `last_verified` auto-refreshes from conversation references. Decayed pages auto-archive. |
-| **Cognitive outsourcing risk** | Staging review forces engagement with every automated page. `qmd query` makes retrieval an active exploration. Wake-up briefing ~200 tokens the human reads too. |
-| **Weaker semantic retrieval** | `qmd` hybrid (BM25 + vector). Full-mode hygiene adds missing cross-references. Structural metadata (wings, rooms) complements semantic search. |
-| **No access control** | Git sync with `merge=union` markdown handling. Network-boundary ACL via Tailscale is the suggested path. *This one is a residual trade-off — see [DESIGN-RATIONALE.md](docs/DESIGN-RATIONALE.md).* |
+| What memex adds | How it works |
+|-----------------|--------------|
+| **Time-decaying confidence** — pages earn trust through reinforcement and fade without it | `confidence` field + `last_verified`, 6/9/12 month decay thresholds, auto-archive. Full-mode hygiene also adds LLM contradiction detection across pages. |
+| **Scalable search beyond the context window** | `qmd` (BM25 + vector + LLM re-ranking) from day one, with three collections (`wiki` / `wiki-archive` / `wiki-conversations`) so queries can route to the right surface. |
+| **Traceable sources for every claim** | Every compiled page traces back to an immutable `raw/harvested/*.md` file with a SHA-256 content hash. Staging review is the built-in cross-check, and `compilation_notes` makes review fast. |
+| **Continuous feed without manual discipline** | Daily + weekly cron chains extract → summarize → harvest → hygiene → reindex. `last_verified` auto-refreshes from new conversation references; decayed pages auto-archive and auto-restore when referenced again. |
+| **Human-in-the-loop staging** for automated content | Every automated page lands in `staging/` first with `origin: automated`, `status: pending`. Nothing bypasses human review — one promotion step and it's in the live wiki with `last_verified` set. |
+| **Hybrid retrieval** — structural navigation + semantic search | Wings/rooms/halls (borrowed from mempalace) give structural filtering that narrows the search space before qmd's hybrid BM25 + vector pass runs. Full-mode hygiene also auto-adds missing cross-references. |
+| **Cross-machine git sync** for collaborative knowledge bases | `.gitattributes` with `merge=union` on markdown so concurrent writes on different machines merge additively. Harvest and hygiene state files sync across machines so both agree on what's been processed. |
 
-The short version: Karpathy published the idea, the community found the
-holes, and this repo is the automation layer that plugs the holes.
-See **[`docs/DESIGN-RATIONALE.md`](docs/DESIGN-RATIONALE.md)** for the
-full argument with honest residual trade-offs and what this repo
-explicitly does NOT solve.
+The short version: Karpathy shared the idea, milla-jovovich's mempalace
+added the structural memory taxonomy, and memex is the automation layer
+that lets the whole thing run day-to-day without constant human
+maintenance. See **[`docs/DESIGN-RATIONALE.md`](docs/DESIGN-RATIONALE.md)**
+for the longer rationale on each extension, plus honest notes on what
+memex doesn't cover.
 
 ---
 
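The staging handoff described in the table above can be sketched in a few lines of shell: an automated page lands in `staging/` carrying `origin: automated` and `status: pending`, and promotion is a metadata touch plus a move. The `status: approved` value, the file names, and the GNU `sed` edit are illustrative assumptions, not the repo's actual promotion script:

```shell
# Sketch of staging → promotion (illustrative paths and status value;
# the repo's real promotion step may differ).
cd "$(mktemp -d)"
mkdir -p staging wiki

# An automated page lands in staging/ first:
cat > staging/new-page.md <<'EOF'
---
origin: automated
status: pending
confidence: medium
---
Compiled claim goes here.
EOF

# Human reviews, then promotes: flip the status, stamp last_verified,
# and move the page into the live wiki (GNU sed assumed).
sed -i "s/^status: pending/status: approved\nlast_verified: $(date +%F)/" staging/new-page.md
mv staging/new-page.md wiki/new-page.md
```

The point of the sketch is the invariant, not the tooling: nothing reaches `wiki/` without passing through `staging/` and a human decision first.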
@@ -384,9 +388,9 @@ on top. It would not exist without either of them.
 
 **Core pattern — [Andrej Karpathy — "Agent-Maintained Persistent Wiki" gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)**
 The foundational idea of a compounding LLM-maintained wiki that moves
-synthesis from query-time (RAG) to ingest-time. This repo is an
-implementation of Karpathy's pattern with the community-identified
-failure modes plugged.
+synthesis from query-time (RAG) to ingest-time. memex is an
+implementation of Karpathy's pattern with the engineering layer that
+turns the concept into something practical to run day-to-day.
 
 **Structural memory taxonomy — [milla-jovovich/mempalace](https://github.com/milla-jovovich/mempalace)**
 The wing/room/hall/closet/drawer/tunnel concepts that turn a flat
@@ -411,12 +415,12 @@ simple pages.
 The repo is Claude-specific (see the section above for what that means
 and how to adapt for other agents).
 
-**Design process** — this repo was designed interactively with Claude
-as a structured Signal & Noise analysis before any code was written.
-The analysis walks through the seven real strengths and seven real
-weaknesses of Karpathy's pattern, then works through concrete
-mitigations for each weakness. Every component in this repo maps back
-to a specific mitigation identified there.
+**Design process** — memex was designed interactively with Claude as a
+structured Signal & Noise analysis before any code was written. The
+analysis walks through the seven real strengths of Karpathy's pattern
+and seven places where it needs an engineering layer to be practical,
+and works through the concrete extension for each. Every component in
+this repo maps back to a specific extension identified there.
 
 - **Live interactive version**:
   [eric-turner.com/memex/signal-and-noise.html](https://eric-turner.com/memex/signal-and-noise.html)
docs/DESIGN-RATIONALE.md

@@ -14,10 +14,11 @@ original persistent-wiki pattern:
 > — same content, works offline
 
 The analysis walks through the pattern's seven genuine strengths, seven
-real weaknesses, and concrete mitigations for each weakness. This repo
-is the implementation of those mitigations. If you want to understand
-*why* a component exists, the interactive version has the longer-form
-argument; this document is the condensed written version.
+places where it needs an engineering layer to be practical, and the
+concrete extension for each. memex is the implementation of those
+extensions. If you want to understand *why* a component exists, the
+interactive version has the longer-form argument; this document is the
+condensed written version.
 
 ---
 
@@ -38,20 +39,24 @@ repo preserves all of them:
 
 ---
 
-## Where the pattern is genuinely weak — and how this repo answers
+## Where memex extends the pattern
 
-The analysis identified seven real weaknesses. Five have direct
-mitigations in this repo; two remain open trade-offs you should be aware
-of.
+Karpathy's gist is a concept pitch. He was explicit that he was sharing
+an "idea file" for others to build on, not publishing a working
+implementation. The analysis identified seven places where the core idea
+needs an engineering layer to become practical day-to-day — five have
+first-class answers in memex, and two remain scoped-out trade-offs that
+the architecture cleanly acknowledges.
 
-### 1. Errors persist and compound
+### 1. Claim freshness and reversibility
 
-**The problem**: Unlike RAG — where a hallucination is ephemeral and the
-next query starts clean — an LLM wiki persists its mistakes. If the LLM
-incorrectly links two concepts at ingest time, future ingests build on
-that wrong prior.
+**The gap**: Unlike RAG — where a hallucination is ephemeral and the
+next query starts clean — an LLM-maintained wiki is stateful. If a
+claim is wrong at ingest time, it stays wrong until something corrects
+it. For the pattern to work long-term, claims need a way to earn trust
+over time and lose it when unused.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **`confidence` field** — every page carries `high`/`medium`/`low` with
   decay based on `last_verified`. Wrong claims aren't treated as
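The decay rule described above can be made concrete with a small sketch: effective confidence steps down as `last_verified` ages past the 6/9/12-month thresholds the design specifies. The function name and the exact day counts (~180/270/365) are illustrative, not code from the repo:

```shell
# Sketch of time-decaying confidence (illustrative; thresholds follow the
# design's 6 / 9 / 12 month figures, approximated in days).
decayed_confidence() {
    age_days=$1   # days since last_verified
    if   [ "$age_days" -lt 180 ]; then echo high
    elif [ "$age_days" -lt 270 ]; then echo medium
    elif [ "$age_days" -lt 365 ]; then echo low
    else echo archive   # past 12 months: candidate for auto-archive
    fi
}

decayed_confidence 30    # prints "high"
decayed_confidence 300   # prints "low"
```

The design choice worth noting: decay is computed from `last_verified`, so a page that keeps being referenced in new conversations keeps resetting its clock, while an untouched page drifts toward the archive.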
@@ -71,13 +76,14 @@ that wrong prior.
   two chances to get caught (AI compile + human review) before they
   become persistent.
 
-### 2. Hard scale ceiling at ~50K tokens
+### 2. Scalable search beyond the context window
 
-**The problem**: The wiki approach stops working when `index.md` no
-longer fits in context. Karpathy's own wiki was ~100 articles / 400K
-words — already near the ceiling.
+**The gap**: The pattern works beautifully up to ~100 articles, where
+`index.md` still fits in context. Karpathy's own wiki was right at the
+ceiling. Past that point, the agent needs a real search layer — loading
+the full index stops being practical.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **`qmd` from day one** — `qmd` (BM25 + vector + LLM re-ranking) is set
   up in the default configuration so the agent never has to load the
@@ -93,14 +99,14 @@ words — already near the ceiling.
   `includeByDefault: false`, so archived pages don't eat context until
   explicitly queried.
 
-### 3. Manual cross-checking burden returns in precision-critical domains
+### 3. Traceable sources for every claim
 
-**The problem**: For API specs, version constraints, legal records, and
-medical protocols, LLM-generated content needs human verification. The
-maintenance burden you thought you'd eliminated comes back as
-verification overhead.
+**The gap**: In precision-sensitive domains (API specs, version
+constraints, legal records, medical protocols), LLM-generated content
+needs to be verifiable against a source. For the pattern to work in
+those contexts, every claim needs to trace back to something immutable.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **Staging workflow** — every automated page goes through human review.
   For precision-critical content, that review IS the cross-check. The
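The provenance chain behind "traceable sources" can be sketched with plain `sha256sum`: record a content hash when a source file is harvested, then recompute it at review time to confirm the raw file is still byte-identical. Paths and variable names here are illustrative; where the repo actually stores the recorded hash is not shown in this diff:

```shell
# Sketch of the SHA-256 provenance check (illustrative paths).
cd "$(mktemp -d)"
mkdir -p raw/harvested
printf 'harvested conversation excerpt\n' > raw/harvested/example.md

# Hash recorded at compile time (would live in the page's metadata):
recorded_hash=$(sha256sum raw/harvested/example.md | cut -d' ' -f1)

# Hash re-derived at review time:
current_hash=$(sha256sum raw/harvested/example.md | cut -d' ' -f1)

if [ "$recorded_hash" = "$current_hash" ]; then
    echo "source intact"          # prints "source intact"
else
    echo "source changed: provenance broken"
fi
```

Because the harvested files are immutable, a hash mismatch can only mean the source was tampered with or corrupted, which is exactly the signal a precision-critical review needs.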
@@ -120,15 +126,16 @@ medical, compliance), no amount of automation replaces domain-expert
 review. If that's your use case, treat this repo as a *drafting* tool,
 not a canonical source.
 
-### 4. Knowledge staleness without active upkeep
+### 4. Continuous feed without manual discipline
 
-**The problem**: Community analysis of 120+ comments on Karpathy's gist
-found this is the #1 failure mode. Most people who try the pattern get
+**The gap**: Community analysis of 120+ comments on Karpathy's gist
+converged on one clear finding: this is the #1 friction point. Most
+people who try the pattern get
 the folder structure right and still end up with a wiki that slowly
 becomes unreliable because they stop feeding it. Six-week half-life is
 typical.
 
-**How this repo mitigates** (this is the biggest thing):
+**How memex extends it** (this is the biggest layer):
 
 - **Automation replaces human discipline** — daily cron runs
   `wiki-maintain.sh` (harvest + hygiene + qmd reindex); weekly cron runs
@@ -147,17 +154,20 @@ typical.
   flags the things that *do* need human judgment. Everything else is
   auto-fixed.
 
-This is the single biggest reason this repo exists. The automation
-layer is entirely about removing "I forgot to lint" as a failure mode.
+This is the single biggest layer memex adds. Nothing about it is
+exotic — it's a cron-scheduled pipeline that runs the scripts you'd
+otherwise have to remember to run. That's the whole trick.
 
-### 5. Cognitive outsourcing risk
+### 5. Keeping the human engaged with their own knowledge
 
-**The problem**: Hacker News critics argued that the bookkeeping
+**The gap**: Hacker News critics pointed out that the bookkeeping
 Karpathy outsources — filing, cross-referencing, summarizing — is
-precisely where genuine understanding forms. Outsource it and you end up
-with a comprehensive wiki you haven't internalized.
+precisely where genuine understanding forms. If the LLM does all of
+it, you can end up with a comprehensive wiki you haven't internalized.
+For the pattern to be an actual memory aid and not a false one, the
+human needs touchpoints that keep them engaged.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **Staging review is a forcing function** — you see every automated
   page before it lands. Even skimming forces engagement with the
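For concreteness, the cron wiring described above might look like the fragment below. The schedule times, paths, and the weekly command name are assumptions; the diff only confirms that a daily run of `wiki-maintain.sh` exists and that a separate weekly job follows it:

```cron
# Illustrative crontab entries (times, paths, and the weekly command
# name are placeholders; see the repo's install docs for the real ones).

# Daily pass: harvest + hygiene + qmd reindex
0 6 * * *   /path/to/memex/wiki-maintain.sh

# Weekly pass: the heavier full-mode run
0 3 * * 0   /path/to/memex/weekly-full-pass
```

The design point stands independent of the exact entries: once these lines are installed, "I forgot to run maintenance" stops being a possible failure mode.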
@@ -169,19 +179,21 @@ with a comprehensive wiki you haven't internalized.
   the agent reads at session start. You read it too (or the agent reads
   it to you) — ongoing re-exposure to your own knowledge base.
 
-**Residual trade-off**: This is a real concern even with mitigations.
-The wiki is designed as *augmentation*, not *replacement*. If you
-never read your own wiki and only consult it through the agent, you're
-in the outsourcing failure mode. The fix is discipline, not
-architecture.
+**Caveat**: memex is designed as *augmentation*, not *replacement*.
+It's most valuable when you engage with it actively — reading your own
+wake-up briefing, spot-checking promoted pages, noticing decay flags.
+If you only consult the wiki through the agent and never look at it
+yourself, you've outsourced the learning. That's a usage pattern
+choice, not an architecture problem.
 
-### 6. Weaker semantic retrieval than RAG at scale
+### 6. Hybrid retrieval — structure and semantics
 
-**The problem**: At large corpora, vector embeddings find semantically
-related content across different wording in ways explicit wikilinks
-can't match.
+**The gap**: Explicit wikilinks catch direct topic references but miss
+semantic neighbors that use different wording. At scale, the pattern
+benefits from vector similarity to find cross-topic connections the
+human (or the LLM at ingest time) didn't think to link manually.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **`qmd` is hybrid (BM25 + vector)** — not just keyword search. Vector
   similarity is built into the retrieval pipeline from day one.
@@ -198,33 +210,38 @@ can't match.
   proper vector DB with specialized retrieval wins. This repo is for
   personal / small-team scale where the hybrid approach is sufficient.
 
-### 7. No access control or multi-user support
+### 7. Cross-machine collaboration
 
-**The problem**: It's a folder of markdown files. No RBAC, no audit
-logging, no concurrency handling, no permissions model.
+**The gap**: Karpathy's gist describes a single-user, single-machine
+setup. In practice, people work from multiple machines (laptop,
+workstation, server) and sometimes collaborate with small teams. The
+pattern needs a sync story that handles concurrent writes gracefully.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **Git-based sync with merge-union** — concurrent writes on different
   machines auto-resolve because markdown is set to `merge=union` in
   `.gitattributes`. Both sides win.
-- **Network boundary as soft access control** — the suggested
-  deployment is over Tailscale or a VPN, so the network does the work a
-  RBAC layer would otherwise do. Not enterprise-grade, but sufficient
-  for personal/family/small-team use.
+- **State file sync** — `.harvest-state.json` and `.hygiene-state.json`
+  are committed, so two machines running the same pipeline agree on
+  what's already been processed instead of re-doing the work.
+- **Network boundary as access gate** — the suggested deployment is
+  over Tailscale or a VPN, so the network enforces who can reach the
+  wiki at all. Simple and sufficient for personal/family/small-team
+  use.
 
-**Residual trade-off**: **This is the big one.** The repo is not a
-replacement for enterprise knowledge management. No audit trails, no
-fine-grained permissions, no compliance story. If you need any of
-that, you need a different architecture. This repo is explicitly
-scoped to the personal/small-team use case.
+**Explicit scope**: memex is **deliberately not** enterprise knowledge
+management. No audit trails, no fine-grained permissions, no compliance
+story. If you need any of that, you need a different architecture.
+This is for the personal and small-team case where git + Tailscale is
+the right amount of rigor.
 
 ---
 
-## The #1 failure mode — active upkeep
+## The biggest layer — active upkeep
 
-Every other weakness has a mitigation. *Active upkeep is the one that
-kills wikis in the wild.* The community data is unambiguous:
+The other six extensions are important, but this is the one that makes
+or breaks the pattern in practice. The community data is unambiguous:
 
 - People who automate the lint schedule → wikis healthy at 6+ months
 - People who rely on "I'll remember to lint" → wikis abandoned at 6 weeks
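Since `merge=union` carries the whole concurrent-write story above, a throwaway demo makes the behavior concrete: two branches append different lines to the same markdown file, and the union merge driver keeps both instead of raising a conflict. This is a sketch in a temporary repo, not memex's actual layout:

```shell
# Demonstration of the `merge=union` attribute in a throwaway repo.
set -e
cd "$(mktemp -d)"
git init -q .
git config user.email demo@example.com
git config user.name demo
base=$(git symbolic-ref --short HEAD)   # default branch, whatever git names it

echo '*.md merge=union' > .gitattributes
printf 'shared line\n' > notes.md
git add -A && git commit -qm base

# "Machine A" appends a line on its own branch:
git checkout -qb machine-a
printf 'shared line\nadded on machine A\n' > notes.md
git commit -qam "edit on A"

# "Machine B" appends a different line to the same spot:
git checkout -q "$base"
printf 'shared line\nadded on machine B\n' > notes.md
git commit -qam "edit on B"

# Without merge=union this would conflict; with it, both lines survive.
git merge -q -m merged machine-a
cat notes.md
```

The trade-off baked into this choice: union merges never lose a line, but they also never flag a semantic conflict, which is acceptable for append-heavy wiki pages and why the attribute is scoped to markdown only.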
@@ -243,8 +260,9 @@ thing the human has to think about:
 
 If you disable all of these, you get the same outcome as every
 abandoned wiki: six-week half-life. The scripts aren't optional
-convenience — they're the load-bearing answer to the pattern's primary
-failure mode.
+convenience — they're the load-bearing automation that lets the pattern
+actually compound over months and years instead of requiring a
+disciplined human to keep it alive.
 
 ---
@@ -305,26 +323,32 @@ This repo borrows that wholesale.
 
 ---
 
-## Honest residual trade-offs
+## What memex deliberately doesn't try to do
 
-Five items from the analysis that this repo doesn't fully solve and
-where you should know the limits:
+Five things memex is explicitly scoped around — not because they're
+unsolvable, but because solving them well requires a different kind of
+architecture than a personal/small-team wiki. If any of these are
+dealbreakers for your use case, memex is probably not the right fit:
 
-1. **Enterprise scale** — this is a personal/small-team tool. Millions
-   of documents, hundreds of users, RBAC, compliance: wrong
-   architecture.
+1. **Enterprise scale** — millions of documents, hundreds of users,
+   RBAC, compliance: these need real enterprise knowledge management
+   infrastructure. memex is tuned for personal and small-team use.
 2. **True semantic retrieval at massive scale** — `qmd` hybrid search
-   is great for thousands of pages, not millions.
-3. **Cognitive outsourcing** — no architecture fix. Discipline
-   yourself to read your own wiki, not just query it through the agent.
-4. **Precision-critical domains** — for legal/medical/regulatory data,
-   use this as a drafting tool, not a source of truth. Human
-   domain-expert review is not replaceable.
-5. **Access control** — network boundary (Tailscale) is the fastest
-   path; nothing in the repo itself enforces permissions.
+   works great up to thousands of pages. At millions, a dedicated
+   vector database with specialized retrieval wins.
+3. **Replacing your own learning** — memex is an augmentation layer,
+   not a substitute for reading. Used well, it's a memory aid; used as
+   a bypass, it just lets you forget more.
+4. **Precision-critical source of truth** — for legal, medical, or
+   regulatory data, memex is a drafting tool. Human domain-expert
+   review still owns the final call.
+5. **Access control** — the network boundary (Tailscale) is the
+   fastest path to "only authorized people can reach it." memex itself
+   doesn't enforce permissions inside that boundary.
 
-If any of these are dealbreakers for your use case, a different
-architecture is probably what you need.
+These are scope decisions, not unfinished work. memex is the best
+personal/small-team answer to Karpathy's pattern I could build; it's
+not trying to be every answer.
 
 ---
 
docs/artifacts/signal-and-noise.html: file diff suppressed because it is too large.