docs: reframe as extensions + replace Signal & Noise artifact

Two changes, one commit:

1. Reframe "weaknesses" as "extensions memex adds":
   Karpathy's gist is a concept pitch, not an implementation. Reframe
   the seven places memex extends the pattern as engineering-layer
   additions rather than problems to fix. Cleaner narrative — memex
   builds on Karpathy's work instead of critiquing it.

   Touches README.md (Why each part exists + Credits) and
   DESIGN-RATIONALE.md (section titles, trade-off framing, biggest
   layer section, scope note at the end).

2. Replace docs/artifacts/signal-and-noise.html with the full
   upstream version:
   The earlier abbreviated copy dropped the MemPalace integration tab,
   the detailed mitigation steps with effort pips, the impact
   before/after cards, and the qmd vs ChromaDB comparison. This
   restores all of that. Also swaps self-references from "LLM Wiki"
   to "memex" while leaving external "LLM Wiki v2" community
   citations alone (those refer to a separate pattern and aren't ours
   to rename).

The live hosted copy at eric-turner.com/memex/signal-and-noise.html
has already been updated via scp — Hugo picks up static changes with
--poll 1s so the public URL reflects this file immediately.
Eric Turner
2026-04-12 22:01:31 -06:00
parent 2a37e33fd6
commit 4c6b7609a1
3 changed files with 1191 additions and 238 deletions


@@ -100,30 +100,34 @@ keeping cross-references intact, and flagging ambiguity for review.
---
## Why each part exists
## How memex extends Karpathy's pattern
Before implementing anything, the design was worked out interactively
with Claude as a
[Signal & Noise analysis of Karpathy's pattern](https://eric-turner.com/memex/signal-and-noise.html).
That analysis found seven real weaknesses in the core pattern. This
repo exists because each weakness has a concrete mitigation — and
every component maps directly to one:
with Claude as a structured
[Signal & Noise analysis](https://eric-turner.com/memex/signal-and-noise.html).
Karpathy's original gist is a concept pitch, not an implementation —
he was explicit that he was sharing an "idea file" for others to build
on. memex is one attempt at that build-out. The analysis identified
seven places where the core idea needed an engineering layer to become
practical day-to-day, and every automation component in this repo maps
to one of those extensions:
| Karpathy-pattern weakness | How this repo answers it |
|---------------------------|--------------------------|
| **Errors persist and compound** | `confidence` field with time-based decay → pages age out visibly. Staging review catches automated content before it goes live. Full-mode hygiene does LLM contradiction detection. |
| **Hard ~50K-token ceiling** | `qmd` (BM25 + vector + re-ranking) set up from day one. Wing/room structural filtering narrows search before retrieval. Archive collection is excluded from default search. |
| **Manual cross-checking returns** | Every wiki claim traces back to immutable `raw/harvested/*.md` with SHA-256 hash. Staging review IS the cross-check. `compilation_notes` field makes review fast. |
| **Knowledge staleness** (the #1 failure mode in community data) | Daily + weekly cron removes "I forgot" as a failure mode. `last_verified` auto-refreshes from conversation references. Decayed pages auto-archive. |
| **Cognitive outsourcing risk** | Staging review forces engagement with every automated page. `qmd query` makes retrieval an active exploration. Wake-up briefing ~200 tokens the human reads too. |
| **Weaker semantic retrieval** | `qmd` hybrid (BM25 + vector). Full-mode hygiene adds missing cross-references. Structural metadata (wings, rooms) complements semantic search. |
| **No access control** | Git sync with `merge=union` markdown handling. Network-boundary ACL via Tailscale is the suggested path. *This one is a residual trade-off — see [DESIGN-RATIONALE.md](docs/DESIGN-RATIONALE.md).* |
| What memex adds | How it works |
|-----------------|--------------|
| **Time-decaying confidence** — pages earn trust through reinforcement and fade without it | `confidence` field + `last_verified`, 6/9/12 month decay thresholds, auto-archive. Full-mode hygiene also adds LLM contradiction detection across pages. |
| **Scalable search beyond the context window** | `qmd` (BM25 + vector + LLM re-ranking) from day one, with three collections (`wiki` / `wiki-archive` / `wiki-conversations`) so queries can route to the right surface. |
| **Traceable sources for every claim** | Every compiled page traces back to an immutable `raw/harvested/*.md` file with a SHA-256 content hash. Staging review is the built-in cross-check, and `compilation_notes` makes review fast. |
| **Continuous feed without manual discipline** | Daily + weekly cron chains extract → summarize → harvest → hygiene → reindex. `last_verified` auto-refreshes from new conversation references; decayed pages auto-archive and auto-restore when referenced again. |
| **Human-in-the-loop staging** for automated content | Every automated page lands in `staging/` first with `origin: automated`, `status: pending`. Nothing bypasses human review — one promotion step and it's in the live wiki with `last_verified` set. |
| **Hybrid retrieval** — structural navigation + semantic search | Wings/rooms/halls (borrowed from mempalace) give structural filtering that narrows the search space before qmd's hybrid BM25 + vector pass runs. Full-mode hygiene also auto-adds missing cross-references. |
| **Cross-machine git sync** for collaborative knowledge bases | `.gitattributes` with `merge=union` on markdown so concurrent writes on different machines merge additively. Harvest and hygiene state files sync across machines so both agree on what's been processed. |
The short version: Karpathy published the idea, the community found the
holes, and this repo is the automation layer that plugs the holes.
See **[`docs/DESIGN-RATIONALE.md`](docs/DESIGN-RATIONALE.md)** for the
full argument with honest residual trade-offs and what this repo
explicitly does NOT solve.
The short version: Karpathy shared the idea, milla-jovovich's mempalace
added the structural memory taxonomy, and memex is the automation layer
that lets the whole thing run day-to-day without constant human
maintenance. See **[`docs/DESIGN-RATIONALE.md`](docs/DESIGN-RATIONALE.md)**
for the longer rationale on each extension, plus honest notes on what
memex doesn't cover.
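
As a rough illustration of the time-decaying confidence row in the table above, the sketch below maps a page's `last_verified` date to a confidence level. Only the `confidence` and `last_verified` field names and the 6/9/12 month thresholds come from the table; the level names (`high`, `medium`, `low`, `archive`), the function names, and the whole-calendar-month arithmetic are assumptions made for the example, not memex's actual code.

```python
from datetime import date

# Assumed mapping: the table only states "6/9/12 month decay thresholds"
# and "auto-archive"; the level names here are hypothetical.
DECAY_STEPS = [(12, "archive"), (9, "low"), (6, "medium")]


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later` (never negative)."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the current month has not fully elapsed yet
    return max(months, 0)


def decayed_confidence(last_verified: date, today: date) -> str:
    """Return the assumed confidence level for a page last verified on `last_verified`."""
    age = months_between(last_verified, today)
    # Check the largest threshold first so the oldest pages archive.
    for threshold, level in DECAY_STEPS:
        if age >= threshold:
            return level
    return "high"
```

Under these assumptions, re-verifying a page (updating `last_verified`) resets it to `high`, which is one way to read "pages earn trust through reinforcement and fade without it".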
---
@@ -384,9 +388,9 @@ on top. It would not exist without either of them.
**Core pattern — [Andrej Karpathy — "Agent-Maintained Persistent Wiki" gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)**
The foundational idea of a compounding LLM-maintained wiki that moves
synthesis from query-time (RAG) to ingest-time. This repo is an
implementation of Karpathy's pattern with the community-identified
failure modes plugged.
synthesis from query-time (RAG) to ingest-time. memex is an
implementation of Karpathy's pattern with the engineering layer that
turns the concept into something practical to run day-to-day.
**Structural memory taxonomy — [milla-jovovich/mempalace](https://github.com/milla-jovovich/mempalace)**
The wing/room/hall/closet/drawer/tunnel concepts that turn a flat
@@ -411,12 +415,12 @@ simple pages.
The repo is Claude-specific (see the section above for what that means
and how to adapt for other agents).
**Design process**: this repo was designed interactively with Claude
as a structured Signal & Noise analysis before any code was written.
The analysis walks through the seven real strengths and seven real
weaknesses of Karpathy's pattern, then works through concrete
mitigations for each weakness. Every component in this repo maps back
to a specific mitigation identified there.
**Design process**: memex was designed interactively with Claude as a
structured Signal & Noise analysis before any code was written. The
analysis walks through the seven real strengths of Karpathy's pattern
and seven places where it needs an engineering layer to be practical,
and works through the concrete extension for each. Every component in
this repo maps back to a specific extension identified there.
- **Live interactive version**:
[eric-turner.com/memex/signal-and-noise.html](https://eric-turner.com/memex/signal-and-noise.html)