docs: reframe as extensions + replace Signal & Noise artifact

Two changes, one commit:

1. Reframe "weaknesses" as "extensions memex adds":
   Karpathy's gist is a concept pitch, not an implementation. Reframe
   the seven places memex extends the pattern as engineering-layer
   additions rather than problems to fix. Cleaner narrative — memex
   builds on Karpathy's work instead of critiquing it.

   Touches README.md (Why each part exists + Credits) and
   DESIGN-RATIONALE.md (section titles, trade-off framing, biggest
   layer section, scope note at the end).

2. Replace docs/artifacts/signal-and-noise.html with the full
   upstream version:
   The earlier abbreviated copy dropped the MemPalace integration tab,
   the detailed mitigation steps with effort pips, the impact
   before/after cards, and the qmd vs ChromaDB comparison. This
   restores all of that. Also swaps self-references from "LLM Wiki"
   to "memex" while leaving external "LLM Wiki v2" community
   citations alone (those refer to a separate pattern and aren't ours
   to rename).

The live hosted copy at eric-turner.com/memex/signal-and-noise.html
has already been updated via scp — Hugo picks up static changes with
--poll 1s so the public URL reflects this file immediately.
Eric Turner
2026-04-12 22:01:31 -06:00
parent 2a37e33fd6
commit 4c6b7609a1
3 changed files with 1191 additions and 238 deletions

README.md

@@ -100,30 +100,34 @@ keeping cross-references intact, and flagging ambiguity for review.
 ---
 
-## Why each part exists
+## How memex extends Karpathy's pattern
 
 Before implementing anything, the design was worked out interactively
-with Claude as a
+with Claude as a structured
 [Signal & Noise analysis of Karpathy's pattern](https://eric-turner.com/memex/signal-and-noise.html).
-That analysis found seven real weaknesses in the core pattern. This
-repo exists because each weakness has a concrete mitigation — and
-every component maps directly to one:
+Karpathy's original gist is a concept pitch, not an implementation —
+he was explicit that he was sharing an "idea file" for others to build
+on. memex is one attempt at that build-out. The analysis identified
+seven places where the core idea needed an engineering layer to become
+practical day-to-day, and every automation component in this repo maps
+to one of those extensions:
 
-| Karpathy-pattern weakness | How this repo answers it |
-|---------------------------|--------------------------|
-| **Errors persist and compound** | `confidence` field with time-based decay → pages age out visibly. Staging review catches automated content before it goes live. Full-mode hygiene does LLM contradiction detection. |
-| **Hard ~50K-token ceiling** | `qmd` (BM25 + vector + re-ranking) set up from day one. Wing/room structural filtering narrows search before retrieval. Archive collection is excluded from default search. |
-| **Manual cross-checking returns** | Every wiki claim traces back to immutable `raw/harvested/*.md` with SHA-256 hash. Staging review IS the cross-check. `compilation_notes` field makes review fast. |
-| **Knowledge staleness** (the #1 failure mode in community data) | Daily + weekly cron removes "I forgot" as a failure mode. `last_verified` auto-refreshes from conversation references. Decayed pages auto-archive. |
-| **Cognitive outsourcing risk** | Staging review forces engagement with every automated page. `qmd query` makes retrieval an active exploration. Wake-up briefing ~200 tokens the human reads too. |
-| **Weaker semantic retrieval** | `qmd` hybrid (BM25 + vector). Full-mode hygiene adds missing cross-references. Structural metadata (wings, rooms) complements semantic search. |
-| **No access control** | Git sync with `merge=union` markdown handling. Network-boundary ACL via Tailscale is the suggested path. *This one is a residual trade-off — see [DESIGN-RATIONALE.md](docs/DESIGN-RATIONALE.md).* |
+| What memex adds | How it works |
+|-----------------|--------------|
+| **Time-decaying confidence** — pages earn trust through reinforcement and fade without it | `confidence` field + `last_verified`, 6/9/12 month decay thresholds, auto-archive. Full-mode hygiene also adds LLM contradiction detection across pages. |
+| **Scalable search beyond the context window** | `qmd` (BM25 + vector + LLM re-ranking) from day one, with three collections (`wiki` / `wiki-archive` / `wiki-conversations`) so queries can route to the right surface. |
+| **Traceable sources for every claim** | Every compiled page traces back to an immutable `raw/harvested/*.md` file with a SHA-256 content hash. Staging review is the built-in cross-check, and `compilation_notes` makes review fast. |
+| **Continuous feed without manual discipline** | Daily + weekly cron chains extract → summarize → harvest → hygiene → reindex. `last_verified` auto-refreshes from new conversation references; decayed pages auto-archive and auto-restore when referenced again. |
+| **Human-in-the-loop staging** for automated content | Every automated page lands in `staging/` first with `origin: automated`, `status: pending`. Nothing bypasses human review — one promotion step and it's in the live wiki with `last_verified` set. |
+| **Hybrid retrieval** — structural navigation + semantic search | Wings/rooms/halls (borrowed from mempalace) give structural filtering that narrows the search space before qmd's hybrid BM25 + vector pass runs. Full-mode hygiene also auto-adds missing cross-references. |
+| **Cross-machine git sync** for collaborative knowledge bases | `.gitattributes` with `merge=union` on markdown so concurrent writes on different machines merge additively. Harvest and hygiene state files sync across machines so both agree on what's been processed. |
 
-The short version: Karpathy published the idea, the community found the
-holes, and this repo is the automation layer that plugs the holes.
-See **[`docs/DESIGN-RATIONALE.md`](docs/DESIGN-RATIONALE.md)** for the
-full argument with honest residual trade-offs and what this repo
-explicitly does NOT solve.
+The short version: Karpathy shared the idea, milla-jovovich's mempalace
+added the structural memory taxonomy, and memex is the automation layer
+that lets the whole thing run day-to-day without constant human
+maintenance. See **[`docs/DESIGN-RATIONALE.md`](docs/DESIGN-RATIONALE.md)**
+for the longer rationale on each extension, plus honest notes on what
+memex doesn't cover.
 
 ---
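The staging flow the table above describes (automated pages land with `origin: automated`, `status: pending`, and one promotion step moves them live with `last_verified` set) can be sketched roughly as follows. This is a hypothetical illustration: the `promote` function name and the `"live"` status value are invented, and it assumes the page's frontmatter is already parsed into a dict.

```python
from datetime import date

def promote(page: dict) -> dict:
    """Promote a staged page into the live wiki (illustrative sketch).

    Assumes frontmatter is already parsed into a dict carrying the
    `origin`/`status` fields; the "live" status value is invented.
    """
    if page.get("status") != "pending":
        raise ValueError("only pending staged pages can be promoted")
    promoted = dict(page)
    promoted["status"] = "live"
    # Promotion is the human review checkpoint, so it also stamps
    # last_verified with the review date.
    promoted["last_verified"] = date.today().isoformat()
    return promoted

staged = {"title": "example-page", "origin": "automated", "status": "pending"}
live_page = promote(staged)
```

The point of the shape: nothing reaches the live wiki without passing through the `status != "pending"` gate, so review cannot be skipped by any automated path.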
@@ -384,9 +388,9 @@ on top. It would not exist without either of them.
 **Core pattern — [Andrej Karpathy — "Agent-Maintained Persistent Wiki" gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)**
 The foundational idea of a compounding LLM-maintained wiki that moves
-synthesis from query-time (RAG) to ingest-time. This repo is an
-implementation of Karpathy's pattern with the community-identified
-failure modes plugged.
+synthesis from query-time (RAG) to ingest-time. memex is an
+implementation of Karpathy's pattern with the engineering layer that
+turns the concept into something practical to run day-to-day.
 
 **Structural memory taxonomy — [milla-jovovich/mempalace](https://github.com/milla-jovovich/mempalace)**
 The wing/room/hall/closet/drawer/tunnel concepts that turn a flat
@@ -411,12 +415,12 @@ simple pages.
 The repo is Claude-specific (see the section above for what that means
 and how to adapt for other agents).
 
-**Design process** — this repo was designed interactively with Claude
-as a structured Signal & Noise analysis before any code was written.
-The analysis walks through the seven real strengths and seven real
-weaknesses of Karpathy's pattern, then works through concrete
-mitigations for each weakness. Every component in this repo maps back
-to a specific mitigation identified there.
+**Design process** — memex was designed interactively with Claude as a
+structured Signal & Noise analysis before any code was written. The
+analysis walks through the seven real strengths of Karpathy's pattern
+and seven places where it needs an engineering layer to be practical,
+and works through the concrete extension for each. Every component in
+this repo maps back to a specific extension identified there.
 
 - **Live interactive version**:
   [eric-turner.com/memex/signal-and-noise.html](https://eric-turner.com/memex/signal-and-noise.html)

docs/DESIGN-RATIONALE.md

@@ -14,10 +14,11 @@ original persistent-wiki pattern:
 > — same content, works offline
 
 The analysis walks through the pattern's seven genuine strengths, seven
-real weaknesses, and concrete mitigations for each weakness. This repo
-is the implementation of those mitigations. If you want to understand
-*why* a component exists, the interactive version has the longer-form
-argument; this document is the condensed written version.
+places where it needs an engineering layer to be practical, and the
+concrete extension for each. memex is the implementation of those
+extensions. If you want to understand *why* a component exists, the
+interactive version has the longer-form argument; this document is the
+condensed written version.
 
 ---
@@ -38,20 +39,24 @@ repo preserves all of them:
 ---
 
-## Where the pattern is genuinely weak — and how this repo answers
+## Where memex extends the pattern
 
-The analysis identified seven real weaknesses. Five have direct
-mitigations in this repo; two remain open trade-offs you should be aware
-of.
+Karpathy's gist is a concept pitch. He was explicit that he was sharing
+an "idea file" for others to build on, not publishing a working
+implementation. The analysis identified seven places where the core idea
+needs an engineering layer to become practical day-to-day — five have
+first-class answers in memex, and two remain scoped-out trade-offs that
+the architecture cleanly acknowledges.
 
-### 1. Errors persist and compound
+### 1. Claim freshness and reversibility
 
-**The problem**: Unlike RAG — where a hallucination is ephemeral and the
-next query starts clean — an LLM wiki persists its mistakes. If the LLM
-incorrectly links two concepts at ingest time, future ingests build on
-that wrong prior.
+**The gap**: Unlike RAG — where a hallucination is ephemeral and the
+next query starts clean — an LLM-maintained wiki is stateful. If a
+claim is wrong at ingest time, it stays wrong until something corrects
+it. For the pattern to work long-term, claims need a way to earn trust
+over time and lose it when unused.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **`confidence` field** — every page carries `high`/`medium`/`low` with
   decay based on `last_verified`. Wrong claims aren't treated as
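The 6/9/12-month decay thresholds mentioned for the `confidence` field could be read as "drop one level per threshold crossed, archive past the last one." A minimal sketch under that assumption (the exact mapping and the function name are guesses, not the scripts' actual rule):

```python
def decayed_confidence(confidence: str, months_since_verified: int) -> str:
    """Drop one confidence level per decay threshold crossed since
    last_verified; past the final threshold the page is archived.
    Assumption: the real hygiene scripts may apply the 6/9/12-month
    thresholds differently."""
    levels = ["high", "medium", "low", "archived"]
    # Count how many of the thresholds have elapsed.
    drops = sum(months_since_verified >= t for t in (6, 9, 12))
    start = levels.index(confidence)
    return levels[min(start + drops, len(levels) - 1)]
```

Under this reading a `high` page untouched for seven months shows as `medium`, and any page untouched past twelve months ages out to the archive regardless of where it started.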
@@ -71,13 +76,14 @@ that wrong prior.
   two chances to get caught (AI compile + human review) before they
   become persistent.
 
-### 2. Hard scale ceiling at ~50K tokens
+### 2. Scalable search beyond the context window
 
-**The problem**: The wiki approach stops working when `index.md` no
-longer fits in context. Karpathy's own wiki was ~100 articles / 400K
-words — already near the ceiling.
+**The gap**: The pattern works beautifully up to ~100 articles, where
+`index.md` still fits in context. Karpathy's own wiki was right at the
+ceiling. Past that point, the agent needs a real search layer — loading
+the full index stops being practical.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **`qmd` from day one** — `qmd` (BM25 + vector + LLM re-ranking) is set
   up in the default configuration so the agent never has to load the
@@ -93,14 +99,14 @@ words — already near the ceiling.
   `includeByDefault: false`, so archived pages don't eat context until
   explicitly queried.
 
-### 3. Manual cross-checking burden returns in precision-critical domains
+### 3. Traceable sources for every claim
 
-**The problem**: For API specs, version constraints, legal records, and
-medical protocols, LLM-generated content needs human verification. The
-maintenance burden you thought you'd eliminated comes back as
-verification overhead.
+**The gap**: In precision-sensitive domains (API specs, version
+constraints, legal records, medical protocols), LLM-generated content
+needs to be verifiable against a source. For the pattern to work in
+those contexts, every claim needs to trace back to something immutable.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **Staging workflow** — every automated page goes through human review.
   For precision-critical content, that review IS the cross-check. The
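The SHA-256 content hash that anchors each compiled page to its `raw/harvested/*.md` source works like this sketch (function names are illustrative; only the hash-over-raw-bytes idea comes from the docs):

```python
import hashlib
from pathlib import Path

def content_hash(path: Path) -> str:
    """SHA-256 over the harvested file's exact bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def traceable(page_hash: str, source: Path) -> bool:
    """A page stays traceable as long as its stored hash matches the
    immutable raw file it was compiled from; any edit to the raw file
    changes the digest and breaks the link visibly."""
    return page_hash == content_hash(source)
```

Because the raw files are treated as immutable, a matching digest is enough to show the claim's provenance without re-reading the source.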
@@ -120,15 +126,16 @@ medical, compliance), no amount of automation replaces domain-expert
 review. If that's your use case, treat this repo as a *drafting* tool,
 not a canonical source.
 
-### 4. Knowledge staleness without active upkeep
+### 4. Continuous feed without manual discipline
 
-**The problem**: Community analysis of 120+ comments on Karpathy's gist
-found this is the #1 failure mode. Most people who try the pattern get
+**The gap**: Community analysis of 120+ comments on Karpathy's gist
+converged on one clear finding: this is the #1 friction point. Most
+people who try the pattern get
 the folder structure right and still end up with a wiki that slowly
 becomes unreliable because they stop feeding it. Six-week half-life is
 typical.
 
-**How this repo mitigates** (this is the biggest thing):
+**How memex extends it** (this is the biggest layer):
 
 - **Automation replaces human discipline** — daily cron runs
   `wiki-maintain.sh` (harvest + hygiene + qmd reindex); weekly cron runs
@@ -147,17 +154,20 @@ typical.
   flags the things that *do* need human judgment. Everything else is
   auto-fixed.
 
-This is the single biggest reason this repo exists. The automation
-layer is entirely about removing "I forgot to lint" as a failure mode.
+This is the single biggest layer memex adds. Nothing about it is
+exotic — it's a cron-scheduled pipeline that runs the scripts you'd
+otherwise have to remember to run. That's the whole trick.
 
-### 5. Cognitive outsourcing risk
+### 5. Keeping the human engaged with their own knowledge
 
-**The problem**: Hacker News critics argued that the bookkeeping
+**The gap**: Hacker News critics pointed out that the bookkeeping
 Karpathy outsources — filing, cross-referencing, summarizing — is
-precisely where genuine understanding forms. Outsource it and you end up
-with a comprehensive wiki you haven't internalized.
+precisely where genuine understanding forms. If the LLM does all of
+it, you can end up with a comprehensive wiki you haven't internalized.
+For the pattern to be an actual memory aid and not a false one, the
+human needs touchpoints that keep them engaged.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **Staging review is a forcing function** — you see every automated
   page before it lands. Even skimming forces engagement with the
@@ -169,19 +179,21 @@ with a comprehensive wiki you haven't internalized.
   the agent reads at session start. You read it too (or the agent reads
   it to you) — ongoing re-exposure to your own knowledge base.
 
-**Residual trade-off**: This is a real concern even with mitigations.
-The wiki is designed as *augmentation*, not *replacement*. If you
-never read your own wiki and only consult it through the agent, you're
-in the outsourcing failure mode. The fix is discipline, not
-architecture.
+**Caveat**: memex is designed as *augmentation*, not *replacement*.
+It's most valuable when you engage with it actively — reading your own
+wake-up briefing, spot-checking promoted pages, noticing decay flags.
+If you only consult the wiki through the agent and never look at it
+yourself, you've outsourced the learning. That's a usage pattern
+choice, not an architecture problem.
 
-### 6. Weaker semantic retrieval than RAG at scale
+### 6. Hybrid retrieval — structure and semantics
 
-**The problem**: At large corpora, vector embeddings find semantically
-related content across different wording in ways explicit wikilinks
-can't match.
+**The gap**: Explicit wikilinks catch direct topic references but miss
+semantic neighbors that use different wording. At scale, the pattern
+benefits from vector similarity to find cross-topic connections the
+human (or the LLM at ingest time) didn't think to link manually.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **`qmd` is hybrid (BM25 + vector)** — not just keyword search. Vector
   similarity is built into the retrieval pipeline from day one.
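One standard way to combine a keyword (BM25) ranking with a vector ranking is reciprocal-rank fusion; a sketch of the general idea follows. This is illustrative only — qmd's actual fusion step is not documented here, and the page names are made up:

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal-rank fusion: each ranked list contributes
    1/(k + rank) per document, so pages ranked well by either
    method float to the top of the merged list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["wiki/qmd-setup.md", "wiki/cron.md", "wiki/decay.md"]
vector_hits = ["wiki/decay.md", "wiki/qmd-setup.md", "wiki/sync.md"]
fused = rrf_merge([bm25_hits, vector_hits])
```

A page that appears high in both lists (here `wiki/qmd-setup.md`) beats pages that only one retrieval method found, which is the behavior a hybrid layer is after.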
@@ -198,33 +210,38 @@ can't match.
   proper vector DB with specialized retrieval wins. This repo is for
   personal / small-team scale where the hybrid approach is sufficient.
 
-### 7. No access control or multi-user support
+### 7. Cross-machine collaboration
 
-**The problem**: It's a folder of markdown files. No RBAC, no audit
-logging, no concurrency handling, no permissions model.
+**The gap**: Karpathy's gist describes a single-user, single-machine
+setup. In practice, people work from multiple machines (laptop,
+workstation, server) and sometimes collaborate with small teams. The
+pattern needs a sync story that handles concurrent writes gracefully.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **Git-based sync with merge-union** — concurrent writes on different
   machines auto-resolve because markdown is set to `merge=union` in
   `.gitattributes`. Both sides win.
-- **Network boundary as soft access control** — the suggested
-  deployment is over Tailscale or a VPN, so the network does the work a
-  RBAC layer would otherwise do. Not enterprise-grade, but sufficient
-  for personal/family/small-team use.
+- **State file sync** — `.harvest-state.json` and `.hygiene-state.json`
+  are committed, so two machines running the same pipeline agree on
+  what's already been processed instead of re-doing the work.
+- **Network boundary as access gate** — the suggested deployment is
+  over Tailscale or a VPN, so the network enforces who can reach the
+  wiki at all. Simple and sufficient for personal/family/small-team
+  use.
 
-**Residual trade-off**: **This is the big one.** The repo is not a
-replacement for enterprise knowledge management. No audit trails, no
-fine-grained permissions, no compliance story. If you need any of
-that, you need a different architecture. This repo is explicitly
-scoped to the personal/small-team use case.
+**Explicit scope**: memex is **deliberately not** enterprise knowledge
+management. No audit trails, no fine-grained permissions, no compliance
+story. If you need any of that, you need a different architecture.
+This is for the personal and small-team case where git + Tailscale is
+the right amount of rigor.
 
 ---
 
-## The #1 failure mode — active upkeep
+## The biggest layer — active upkeep
 
-Every other weakness has a mitigation. *Active upkeep is the one that
-kills wikis in the wild.* The community data is unambiguous:
+The other six extensions are important, but this is the one that makes
+or breaks the pattern in practice. The community data is unambiguous:
 
 - People who automate the lint schedule → wikis healthy at 6+ months
 - People who rely on "I'll remember to lint" → wikis abandoned at 6 weeks
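The reason syncing `.harvest-state.json` and `.hygiene-state.json` keeps two machines in agreement is the same additive idea behind `merge=union`: treat each machine's processed entries as a set and take the union. A sketch under an assumed state-file shape (the real format is not shown in this commit, so the list-of-ids layout here is invented):

```python
def merge_processed(ours: list[str], theirs: list[str]) -> list[str]:
    """Order-preserving union of two machines' processed-entry lists.
    An entry processed on either machine counts as processed
    everywhere, so neither side redoes the other's work."""
    seen: set[str] = set()
    merged: list[str] = []
    for entry in [*ours, *theirs]:
        if entry not in seen:
            seen.add(entry)
            merged.append(entry)
    return merged
```

Additive merging is safe here because processing is idempotent-by-id: an entry only ever moves from "not processed" to "processed," so the union is always the correct combined state.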
@@ -243,8 +260,9 @@ thing the human has to think about:
 If you disable all of these, you get the same outcome as every
 abandoned wiki: six-week half-life. The scripts aren't optional
-convenience — they're the load-bearing answer to the pattern's primary
-failure mode.
+convenience — they're the load-bearing automation that lets the pattern
+actually compound over months and years instead of requiring a
+disciplined human to keep it alive.
 
 ---
@@ -305,26 +323,32 @@ This repo borrows that wholesale.
 ---
 
-## Honest residual trade-offs
+## What memex deliberately doesn't try to do
 
-Five items from the analysis that this repo doesn't fully solve and
-where you should know the limits:
+Five things memex is explicitly scoped around — not because they're
+unsolvable, but because solving them well requires a different kind of
+architecture than a personal/small-team wiki. If any of these are
+dealbreakers for your use case, memex is probably not the right fit:
 
-1. **Enterprise scale** — this is a personal/small-team tool. Millions
-   of documents, hundreds of users, RBAC, compliance: wrong
-   architecture.
+1. **Enterprise scale** — millions of documents, hundreds of users,
+   RBAC, compliance: these need real enterprise knowledge management
+   infrastructure. memex is tuned for personal and small-team use.
 2. **True semantic retrieval at massive scale** — `qmd` hybrid search
-   is great for thousands of pages, not millions.
-3. **Cognitive outsourcing** — no architecture fix. Discipline
-   yourself to read your own wiki, not just query it through the agent.
-4. **Precision-critical domains** — for legal/medical/regulatory data,
-   use this as a drafting tool, not a source of truth. Human
-   domain-expert review is not replaceable.
-5. **Access control** — network boundary (Tailscale) is the fastest
-   path; nothing in the repo itself enforces permissions.
+   works great up to thousands of pages. At millions, a dedicated
+   vector database with specialized retrieval wins.
+3. **Replacing your own learning** — memex is an augmentation layer,
+   not a substitute for reading. Used well, it's a memory aid; used as
+   a bypass, it just lets you forget more.
+4. **Precision-critical source of truth** — for legal, medical, or
+   regulatory data, memex is a drafting tool. Human domain-expert
+   review still owns the final call.
+5. **Access control** — the network boundary (Tailscale) is the
+   fastest path to "only authorized people can reach it." memex itself
+   doesn't enforce permissions inside that boundary.
 
-If any of these are dealbreakers for your use case, a different
-architecture is probably what you need.
+These are scope decisions, not unfinished work. memex is the best
+personal/small-team answer to Karpathy's pattern I could build; it's
+not trying to be every answer.
 
 ---

docs/artifacts/signal-and-noise.html (diff suppressed because it is too large)