docs: reframe as extensions + replace Signal & Noise artifact
Two changes, one commit:

1. Reframe "weaknesses" as "extensions memex adds":

   Karpathy's gist is a concept pitch, not an implementation. Reframe
   the seven places memex extends the pattern as engineering-layer
   additions rather than problems to fix. Cleaner narrative: memex
   builds on Karpathy's work instead of critiquing it.

   Touches README.md (Why each part exists + Credits) and
   DESIGN-RATIONALE.md (section titles, trade-off framing, biggest
   layer section, scope note at the end).

2. Replace docs/artifacts/signal-and-noise.html with the full upstream
   version:

   The earlier abbreviated copy dropped the MemPalace integration tab,
   the detailed mitigation steps with effort pips, the impact
   before/after cards, and the qmd vs ChromaDB comparison. This
   restores all of that. Also swaps self-references from "LLM Wiki" to
   "memex" while leaving external "LLM Wiki v2" community citations
   alone (those refer to a separate pattern and aren't ours to rename).

The live hosted copy at eric-turner.com/memex/signal-and-noise.html has
already been updated via scp. Hugo picks up static changes with
--poll 1s, so the public URL reflects this file immediately.
62
README.md
@@ -100,30 +100,34 @@ keeping cross-references intact, and flagging ambiguity for review.
 
 ---
 
-## Why each part exists
+## How memex extends Karpathy's pattern
 
 Before implementing anything, the design was worked out interactively
-with Claude as a
-[Signal & Noise analysis of Karpathy's pattern](https://eric-turner.com/memex/signal-and-noise.html).
-That analysis found seven real weaknesses in the core pattern. This
-repo exists because each weakness has a concrete mitigation — and
-every component maps directly to one:
+with Claude as a structured
+[Signal & Noise analysis](https://eric-turner.com/memex/signal-and-noise.html).
+Karpathy's original gist is a concept pitch, not an implementation —
+he was explicit that he was sharing an "idea file" for others to build
+on. memex is one attempt at that build-out. The analysis identified
+seven places where the core idea needed an engineering layer to become
+practical day-to-day, and every automation component in this repo maps
+to one of those extensions:
 
-| Karpathy-pattern weakness | How this repo answers it |
-|---------------------------|--------------------------|
-| **Errors persist and compound** | `confidence` field with time-based decay → pages age out visibly. Staging review catches automated content before it goes live. Full-mode hygiene does LLM contradiction detection. |
-| **Hard ~50K-token ceiling** | `qmd` (BM25 + vector + re-ranking) set up from day one. Wing/room structural filtering narrows search before retrieval. Archive collection is excluded from default search. |
-| **Manual cross-checking returns** | Every wiki claim traces back to immutable `raw/harvested/*.md` with SHA-256 hash. Staging review IS the cross-check. `compilation_notes` field makes review fast. |
-| **Knowledge staleness** (the #1 failure mode in community data) | Daily + weekly cron removes "I forgot" as a failure mode. `last_verified` auto-refreshes from conversation references. Decayed pages auto-archive. |
-| **Cognitive outsourcing risk** | Staging review forces engagement with every automated page. `qmd query` makes retrieval an active exploration. Wake-up briefing ~200 tokens the human reads too. |
-| **Weaker semantic retrieval** | `qmd` hybrid (BM25 + vector). Full-mode hygiene adds missing cross-references. Structural metadata (wings, rooms) complements semantic search. |
-| **No access control** | Git sync with `merge=union` markdown handling. Network-boundary ACL via Tailscale is the suggested path. *This one is a residual trade-off — see [DESIGN-RATIONALE.md](docs/DESIGN-RATIONALE.md).* |
+| What memex adds | How it works |
+|-----------------|--------------|
+| **Time-decaying confidence** — pages earn trust through reinforcement and fade without it | `confidence` field + `last_verified`, 6/9/12 month decay thresholds, auto-archive. Full-mode hygiene also adds LLM contradiction detection across pages. |
+| **Scalable search beyond the context window** | `qmd` (BM25 + vector + LLM re-ranking) from day one, with three collections (`wiki` / `wiki-archive` / `wiki-conversations`) so queries can route to the right surface. |
+| **Traceable sources for every claim** | Every compiled page traces back to an immutable `raw/harvested/*.md` file with a SHA-256 content hash. Staging review is the built-in cross-check, and `compilation_notes` makes review fast. |
+| **Continuous feed without manual discipline** | Daily + weekly cron chains extract → summarize → harvest → hygiene → reindex. `last_verified` auto-refreshes from new conversation references; decayed pages auto-archive and auto-restore when referenced again. |
+| **Human-in-the-loop staging** for automated content | Every automated page lands in `staging/` first with `origin: automated`, `status: pending`. Nothing bypasses human review — one promotion step and it's in the live wiki with `last_verified` set. |
+| **Hybrid retrieval** — structural navigation + semantic search | Wings/rooms/halls (borrowed from mempalace) give structural filtering that narrows the search space before qmd's hybrid BM25 + vector pass runs. Full-mode hygiene also auto-adds missing cross-references. |
+| **Cross-machine git sync** for collaborative knowledge bases | `.gitattributes` with `merge=union` on markdown so concurrent writes on different machines merge additively. Harvest and hygiene state files sync across machines so both agree on what's been processed. |
 
-The short version: Karpathy published the idea, the community found the
-holes, and this repo is the automation layer that plugs the holes.
-See **[`docs/DESIGN-RATIONALE.md`](docs/DESIGN-RATIONALE.md)** for the
-full argument with honest residual trade-offs and what this repo
-explicitly does NOT solve.
+The short version: Karpathy shared the idea, milla-jovovich's mempalace
+added the structural memory taxonomy, and memex is the automation layer
+that lets the whole thing run day-to-day without constant human
+maintenance. See **[`docs/DESIGN-RATIONALE.md`](docs/DESIGN-RATIONALE.md)**
+for the longer rationale on each extension, plus honest notes on what
+memex doesn't cover.
 
 ---
 
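Note on the "Time-decaying confidence" row in the hunk above: the README names a `confidence` field, a `last_verified` date, and 6/9/12 month decay thresholds, but this diff does not say what happens at each threshold. The sketch below is one possible reading, not memex's actual implementation. The level names (`high`, `medium`, `low`, `archive`), the mapping of each threshold to a level, and the function names are all assumptions made for illustration.

```python
from datetime import date

# Assumed ordering of trust levels, most trusted first. "archive" stands in
# for the auto-archive step the README mentions; the names are hypothetical.
LEVELS = ["high", "medium", "low", "archive"]

# Assumed mapping of the README's 6/9/12-month thresholds to levels.
DECAY_STEPS = [(6, "medium"), (9, "low"), (12, "archive")]


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later` (never negative)."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not complete yet
    return max(months, 0)


def decayed_confidence(current: str, last_verified: date, today: date) -> str:
    """Return the page's confidence after applying age-based decay.

    Decay only ever lowers confidence: a page already marked "low" is not
    promoted to "medium" just because it is only six months old.
    """
    age = months_between(last_verified, today)
    aged = "high"
    for threshold, level in DECAY_STEPS:
        if age >= threshold:
            aged = level
    # Pick whichever of the two levels is less trusted.
    return max(current, aged, key=LEVELS.index)
```

Under these assumptions, a page last verified on 2025-01-15 stays at its current level until 2025-07-15, and refreshing `last_verified` is what stops the decay from taking effect.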
@@ -384,9 +388,9 @@ on top. It would not exist without either of them.
 
 **Core pattern — [Andrej Karpathy — "Agent-Maintained Persistent Wiki" gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)**
 The foundational idea of a compounding LLM-maintained wiki that moves
-synthesis from query-time (RAG) to ingest-time. This repo is an
-implementation of Karpathy's pattern with the community-identified
-failure modes plugged.
+synthesis from query-time (RAG) to ingest-time. memex is an
+implementation of Karpathy's pattern with the engineering layer that
+turns the concept into something practical to run day-to-day.
 
 **Structural memory taxonomy — [milla-jovovich/mempalace](https://github.com/milla-jovovich/mempalace)**
 The wing/room/hall/closet/drawer/tunnel concepts that turn a flat
@@ -411,12 +415,12 @@ simple pages.
 The repo is Claude-specific (see the section above for what that means
 and how to adapt for other agents).
 
-**Design process** — this repo was designed interactively with Claude
-as a structured Signal & Noise analysis before any code was written.
-The analysis walks through the seven real strengths and seven real
-weaknesses of Karpathy's pattern, then works through concrete
-mitigations for each weakness. Every component in this repo maps back
-to a specific mitigation identified there.
+**Design process** — memex was designed interactively with Claude as a
+structured Signal & Noise analysis before any code was written. The
+analysis walks through the seven real strengths of Karpathy's pattern
+and seven places where it needs an engineering layer to be practical,
+and works through the concrete extension for each. Every component in
+this repo maps back to a specific extension identified there.
 
 - **Live interactive version**:
   [eric-turner.com/memex/signal-and-noise.html](https://eric-turner.com/memex/signal-and-noise.html)