# Design Rationale — Signal & Noise

Why each part of this repo exists. This is the "why" document; the other
docs are the "what" and "how."

Before implementing anything, the design was worked out interactively
with Claude as a structured Signal & Noise analysis of Andrej Karpathy's
original persistent-wiki pattern:

> **Interactive version**: [eric-turner.com/memex/signal-and-noise.html](https://eric-turner.com/memex/signal-and-noise.html)
> — tabs for pros/cons, vs RAG, use-case fits, signal breakdown, mitigations
>
> **Self-contained archive**: [`artifacts/signal-and-noise.html`](artifacts/signal-and-noise.html)
> — same content, works offline

The analysis walks through the pattern's seven genuine strengths, seven
places where it needs an engineering layer to be practical, and the
concrete extension for each. memex is the implementation of those
extensions. If you want to understand *why* a component exists, the
interactive version has the longer-form argument; this document is the
condensed written version.

---

## Where the pattern is genuinely strong

The analysis found seven strengths that hold up under scrutiny. This
repo preserves all of them:

| Strength | How this repo keeps it |
|----------|-----------------------|
| **Knowledge compounds over time** | Every ingest adds to the existing wiki rather than restarting; conversation mining and URL harvesting continuously feed new material in |
| **Zero maintenance burden on humans** | Cron-driven harvest + hygiene; the only manual step is staging review, and that's fast because the AI already compiled the page |
| **Token-efficient at personal scale** | `index.md` fits in context; `qmd` kicks in only at 50+ articles; the wake-up briefing is ~200 tokens |
| **Human-readable & auditable** | Plain markdown everywhere; every cross-reference is visible; git history shows every change |
| **Future-proof & portable** | No vendor lock-in; you can point any agent at the same tree tomorrow |
| **Self-healing via lint passes** | `wiki-hygiene.py` runs quick checks daily and full (LLM) checks weekly |
| **Path to fine-tuning** | Wiki pages are high-quality synthetic training data once purified through hygiene |

---

## Where memex extends the pattern

Karpathy's gist is a concept pitch. He was explicit that he was sharing
an "idea file" for others to build on, not publishing a working
implementation. The analysis identified seven places where the core idea
needs an engineering layer to become practical day to day. Five have
first-class answers in memex; the other two remain scoped-out trade-offs
that the architecture explicitly acknowledges.

### 1. Claim freshness and reversibility

**The gap**: Unlike RAG, where a hallucination is ephemeral and the
next query starts clean, an LLM-maintained wiki is stateful. If a
claim is wrong at ingest time, it stays wrong until something corrects
it. For the pattern to work long-term, claims need a way to earn trust
over time and lose it when unused.

**How memex extends it**:

- **`confidence` field** — every page carries `high`/`medium`/`low`, with
  decay based on `last_verified`. Wrong claims aren't treated as
  permanent; they age out visibly.
- **Archive + restore** — decayed pages get moved to `archive/`, where
  they're excluded from default search. If they get referenced again,
  they're auto-restored with `confidence: medium` (never straight to
  `high`; they have to re-earn trust).
- **Raw harvested material is immutable** — `raw/harvested/*.md` files
  are the ground truth. Every compiled wiki page can be traced back to
  its source via the `sources:` frontmatter field.
- **Full-mode contradiction detection** — `wiki-hygiene.py --full` uses
  Sonnet to find conflicting claims across pages. It is report-only:
  humans decide which side wins.
- **Staging review** — automated content goes to `staging/` first.
  Nothing enters the live wiki without human approval, so errors have
  two chances to get caught (AI compile + human review) before they
  become persistent.

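The archive/restore rule fits in a few lines of Python. This is a minimal illustration, not memex's code: the `Page` record, the `archived` flag, and the `default_search_pool()`/`restore()` helpers are hypothetical, and bumping `last_verified` on restore is an assumption (a fresh reference is treated as a refresh signal).

```python
from dataclasses import dataclass, replace
from datetime import date


# Hypothetical in-memory view of a page's frontmatter. Only `confidence`
# and `last_verified` are field names taken from this document.
@dataclass(frozen=True)
class Page:
    path: str
    confidence: str          # "high" | "medium" | "low"
    last_verified: date
    archived: bool = False   # assumption: stands in for "lives under archive/"


def default_search_pool(pages: list[Page], include_archived: bool = False) -> list[Page]:
    """Archived pages are skipped unless the caller explicitly asks for them."""
    return [p for p in pages if include_archived or not p.archived]


def restore(page: Page, today: date) -> Page:
    """Bring an archived page back after it is referenced again.

    It re-enters at "medium" regardless of its old level, so it has to
    re-earn "high". Bumping last_verified to `today` is an assumption.
    """
    if not page.archived:
        return page
    return replace(page, archived=False, confidence="medium", last_verified=today)
```

`restore()` never returns `high`: the only way back to the top level is the normal verification path.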
### 2. Scalable search beyond the context window

**The gap**: The pattern works beautifully up to ~100 articles, while
`index.md` still fits in context. Karpathy's own wiki was right at that
ceiling. Past that point, the agent needs a real search layer, because
loading the full index stops being practical.

**How memex extends it**:

- **`qmd` from day one** — `qmd` (BM25 + vector + LLM re-ranking) is set
  up in the default configuration, so the agent doesn't have to load the
  full index once the wiki grows. At 50+ pages, `qmd search` replaces
  `cat index.md`.
- **Wing/room structural filtering** — conversations are partitioned by
  project code (wing) and topic (room, via the `topics:` frontmatter).
  Retrieval is pre-narrowed to the relevant wing before search runs.
  This extends the effective ceiling because `qmd` works on a relevant
  subset, not the whole corpus.
- **Hygiene full mode catches redundancy** — duplicate detection
  auto-merges weaker pages into stronger ones, keeping the corpus lean.
- **Archive excludes stale content** — the `wiki-archive` collection has
  `includeByDefault: false`, so archived pages don't eat context until
  explicitly queried.

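A sketch of the pre-narrowing step, assuming each searchable document carries a collection name, a wing, and `topics:` rooms. The `Doc` type, the `wiki`/`conversations` collection names, and the `INCLUDE_BY_DEFAULT` table are hypothetical (only `wiki-archive` and `includeByDefault` come from this document), and the term-count ranker is a placeholder for the spot where `qmd` would run its BM25 + vector + re-ranking pipeline.

```python
from dataclasses import dataclass, field

# Hypothetical collection table; only "wiki-archive" and the idea of
# includeByDefault come from the document.
INCLUDE_BY_DEFAULT = {"wiki": True, "conversations": True, "wiki-archive": False}


@dataclass
class Doc:
    collection: str
    wing: str = ""                                   # project code, e.g. "mc"
    topics: list[str] = field(default_factory=list)  # rooms
    text: str = ""


def candidates(docs, wing=None, room=None, collections=None):
    """Narrow the corpus structurally before any text search runs."""
    if collections is None:
        allowed = {name for name, on in INCLUDE_BY_DEFAULT.items() if on}
    else:
        allowed = set(collections)
    return [
        d for d in docs
        if d.collection in allowed
        and (wing is None or d.wing == wing)
        and (room is None or room in d.topics)
    ]


def search(docs, query, **filters):
    """Placeholder ranker (query-term counts) over the narrowed subset."""
    terms = query.lower().split()
    scored = [(sum(d.text.lower().count(t) for t in terms), d)
              for d in candidates(docs, **filters)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored if score > 0]
```

The ordering is the point: structure cuts the candidate set first, so whatever ranker runs afterwards only ever sees the relevant subset.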
### 3. Traceable sources for every claim

**The gap**: In precision-sensitive domains (API specs, version
constraints, legal records, medical protocols), LLM-generated content
needs to be verifiable against a source. For the pattern to work in
those contexts, every claim needs to trace back to something immutable.

**How memex extends it**:

- **Staging workflow** — every automated page goes through human review.
  For precision-critical content, that review IS the cross-check. The
  AI does the drafting; you verify.
- **`compilation_notes` field** — staging pages include the AI's own
  explanation of what it did and why. Makes review faster — you can
  spot-check the reasoning rather than re-reading the whole page.
- **Immutable raw sources** — every wiki claim traces back to a specific
  file in `raw/harvested/` with a SHA-256 `content_hash`. Verification
  means comparing the claim to the source, not "trust the LLM."
- **`confidence: low` for precision domains** — the agent's instructions
  (via `CLAUDE.md`) tell it to flag low-confidence content when
  citing. Humans see the warning before acting.

**Residual trade-off**: For *truly* mission-critical data (legal,
medical, compliance), no amount of automation replaces domain-expert
review. If that's your use case, treat this repo as a *drafting* tool,
not a canonical source.

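A minimal `content_hash` check using Python's `hashlib`. It assumes the raw file stores `content_hash:` in flat `key: value` frontmatter and that the hash covers only the body below the frontmatter (hashing the whole file would include the hash itself). Both are assumptions; the real harvester may hash the fetched content before conversion, or normalize it first. The tiny frontmatter parser is illustrative, not a YAML parser.

```python
import hashlib


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split flat `key: value` frontmatter delimited by `---` lines from the body."""
    if not text.startswith("---\n"):
        return {}, text
    end = text.find("\n---\n", 3)
    if end == -1:
        return {}, text
    meta = {}
    for line in text[4:end].splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, text[end + len("\n---\n"):]


def sha256_hex(body: str) -> str:
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def verify_raw(text: str) -> bool:
    """True only if the file records a content_hash and the body still matches it."""
    meta, body = split_frontmatter(text)
    recorded = meta.get("content_hash")
    return recorded is not None and sha256_hex(body) == recorded
```

A mismatch means the raw file changed after harvest, which is exactly what "immutable" is supposed to rule out.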
### 4. Continuous feed without manual discipline

**The gap**: Community analysis of 120+ comments on Karpathy's gist
converged on one clear finding: keeping the wiki fed is the #1 friction
point. Most people who try the pattern get the folder structure right
and still end up with a wiki that slowly becomes unreliable because
they stop feeding it. A six-week half-life is typical.

**How memex extends it** (this is the biggest layer):

- **Automation replaces human discipline** — daily cron runs
  `wiki-maintain.sh` (harvest + hygiene + qmd reindex); weekly cron runs
  `--full` mode. You don't need to remember anything.
- **Conversation mining is the feed** — you don't need to curate sources
  manually. Every Claude Code session becomes potential ingest. The
  feed is automatic and continuous, as long as you're doing work.
- **`last_verified` refreshes from conversation references** — when the
  summarizer links a conversation to a wiki page via `related:`, the
  hygiene script picks that up and bumps `last_verified`. Pages stay
  fresh as long as they're still being discussed.
- **Decay thresholds force attention** — pages without refresh signals
  for 6/9/12 months get downgraded and eventually archived. The wiki
  self-trims.
- **Hygiene reports** — `reports/hygiene-YYYY-MM-DD-needs-review.md`
  flags the things that *do* need human judgment. Everything else is
  auto-fixed.

This is the single biggest layer memex adds. Nothing about it is
exotic — it's a cron-scheduled pipeline that runs the scripts you'd
otherwise have to remember to run. That's the whole trick.

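The refresh-then-decay pass can be sketched as two pure functions. The 6/9/12-month thresholds come from this document; how each threshold maps to a level (cap at `medium`, cap at `low`, then flag for archive) is an assumption, as are the function names.

```python
from datetime import date

LEVELS = ["low", "medium", "high"]  # ordered weakest to strongest


def whole_months(since: date, today: date) -> int:
    """Complete calendar months between two dates."""
    months = (today.year - since.year) * 12 + (today.month - since.month)
    if today.day < since.day:
        months -= 1  # the current month isn't complete yet
    return months


def refresh(last_verified: date, referenced_on: list[date]) -> date:
    """Bump last_verified to the newest conversation that links here via related:."""
    return max([last_verified, *referenced_on])


def decay(confidence: str, last_verified: date, today: date) -> tuple[str, bool]:
    """Return (new_confidence, should_archive).

    Assumed mapping: 6+ months caps at "medium", 9+ caps at "low",
    12+ also flags the page for archiving. Decay only ever lowers a level.
    """
    age = whole_months(last_verified, today)
    cap = "low" if age >= 9 else "medium" if age >= 6 else "high"
    new_level = LEVELS[min(LEVELS.index(confidence), LEVELS.index(cap))]
    return new_level, age >= 12
```

Running `refresh` before `decay` is what keeps a page that is still being discussed from ever reaching the thresholds.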
### 5. Keeping the human engaged with their own knowledge

**The gap**: Hacker News critics pointed out that the bookkeeping
Karpathy outsources — filing, cross-referencing, summarizing — is
precisely where genuine understanding forms. If the LLM does all of
it, you can end up with a comprehensive wiki you haven't internalized.
For the pattern to be an actual memory aid and not a false one, the
human needs touchpoints that keep them engaged.

**How memex extends it**:

- **Staging review is a forcing function** — you see every automated
  page before it lands. Even skimming forces engagement with the
  material.
- **`qmd query "..."` for exploration** — searching the wiki is an
  active process, not passive retrieval. You're asking questions, not
  pulling a file.
- **The wake-up briefing** — `context/wake-up.md` is a 200-token digest
  the agent reads at session start. You read it too (or the agent reads
  it to you) — ongoing re-exposure to your own knowledge base.

**Caveat**: memex is designed as *augmentation*, not *replacement*.
It's most valuable when you engage with it actively — reading your own
wake-up briefing, spot-checking promoted pages, noticing decay flags.
If you only consult the wiki through the agent and never look at it
yourself, you've outsourced the learning. That's a usage pattern
choice, not an architecture problem.

### 6. Hybrid retrieval — structure and semantics

**The gap**: Explicit wikilinks catch direct topic references but miss
semantic neighbors that use different wording. At scale, the pattern
benefits from vector similarity to find cross-topic connections the
human (or the LLM at ingest time) didn't think to link manually.

**How memex extends it**:

- **`qmd` is hybrid (BM25 + vector)** — not just keyword search. Vector
  similarity is built into the retrieval pipeline from day one.
- **Structural navigation complements semantic search** — project codes
  (wings) and topic frontmatter narrow the search space before the
  hybrid search runs. Structure + semantics is stronger than either
  alone.
- **Missing cross-reference detection** — full-mode hygiene asks the
  LLM to find pages that *should* link to each other but don't, then
  auto-adds them. This is the explicit-linking approach catching up to
  semantic retrieval over time.

**Residual trade-off**: At enterprise scale (millions of documents), a
proper vector DB with specialized retrieval wins. This repo targets
personal and small-team scale, where the hybrid approach is sufficient.

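This document doesn't say how `qmd` merges its keyword and vector rankings, so the sketch below uses reciprocal rank fusion (RRF), one common fusion method, purely as an illustration of why hybrid retrieval surfaces things keyword search alone would miss. The page IDs are hypothetical; `k = 60` is the constant conventionally used with RRF.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25 = ["auth-tokens", "session-cookies", "oauth"]   # keyword ranking
vector = ["oauth", "auth-tokens", "sso"]              # embedding ranking
print(rrf([bm25, vector]))  # → ['auth-tokens', 'oauth', 'session-cookies', 'sso']
```

Here `sso` appears only in the vector ranking: it is the kind of differently worded neighbor that explicit wikilinks miss.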
### 7. Cross-machine collaboration

**The gap**: Karpathy's gist describes a single-user, single-machine
setup. In practice, people work from multiple machines (laptop,
workstation, server) and sometimes collaborate with small teams. The
pattern needs a sync story that handles concurrent writes gracefully.

**How memex extends it**:

- **Git-based sync with merge-union** — markdown files are set to
  `merge=union` in `.gitattributes`, so concurrent writes on different
  machines merge without conflict markers: git keeps the lines from
  both sides.
- **State file sync** — `.harvest-state.json` and `.hygiene-state.json`
  are committed, so two machines running the same pipeline agree on
  what's already been processed instead of re-doing the work.
- **Network boundary as access gate** — the suggested deployment is
  over Tailscale or a VPN, so the network enforces who can reach the
  wiki at all. Simple and sufficient for personal, family, or
  small-team use.

**Explicit scope**: memex is **deliberately not** enterprise knowledge
management. No audit trails, no fine-grained permissions, no compliance
story. If you need any of that, you need a different architecture.
This is for the personal and small-team case where git + Tailscale is
the right amount of rigor.

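A sketch of the state-file idea: each run reads the committed state, skips anything already listed, and emits a sorted, deterministic file so git diffs stay small. The `{"processed": [...]}` schema and the function names are hypothetical, since the real `.harvest-state.json` layout isn't specified here. The returned text would be written back to the state file and committed, so the next pull on another machine sees the same set.

```python
import json


def parse_state(text: str) -> set[str]:
    """Read committed state; an empty string means 'no state file yet'."""
    if not text.strip():
        return set()
    return set(json.loads(text).get("processed", []))


def dump_state(processed: set[str]) -> str:
    # Sorted, indented output keeps diffs small and identical across machines.
    return json.dumps({"processed": sorted(processed)}, indent=2) + "\n"


def run_once(state_text: str, source_ids: list[str], process) -> tuple[str, list[str]]:
    """Process only unseen IDs; return (new state text, IDs handled this run)."""
    done = parse_state(state_text)
    newly = []
    for sid in source_ids:
        if sid in done:
            continue  # another machine (or an earlier run) already handled it
        process(sid)
        done.add(sid)
        newly.append(sid)
    return dump_state(done), newly
```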
---

## The biggest layer — active upkeep

The other six extensions are important, but the continuous feed (#4) is
the one that makes or breaks the pattern in practice. The community
data is unambiguous:

- People who automate the lint schedule → wikis healthy at 6+ months
- People who rely on "I'll remember to lint" → wikis abandoned at 6 weeks

The entire automation layer of this repo exists to remove upkeep as a
thing the human has to think about:

| Cadence | Job | Purpose |
|---------|-----|---------|
| Every 15 min | `wiki-sync.sh` | Commit/pull/push — cross-machine sync |
| Every hour | `mine-conversations.sh --extract-only` | Capture new Claude Code sessions (no LLM) |
| Every 2 hours | `wiki-sync.sh full` | Full sync + qmd reindex |
| Daily 2am | `summarize-conversations.py --claude` + index | Classify + summarize (LLM) |
| Daily 3am | `wiki-maintain.sh` | Harvest + quick hygiene + reindex |
| Weekly Sun 4am | `wiki-maintain.sh --hygiene-only --full` | LLM-powered duplicate/contradiction/cross-ref detection |

If you disable all of these, you get the same outcome as every
abandoned wiki: a six-week half-life. The scripts aren't optional
convenience — they're the load-bearing automation that lets the pattern
actually compound over months and years instead of requiring a
disciplined human to keep it alive.

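The cadence table can also be read as one predicate per job. This sketch encodes it in Python to make overlaps explicit (Sunday 4am, for example, also triggers the 15-minute, hourly, and 2-hour jobs). Reading "every 2 hours" as even hours is an assumption; in crontab terms the rows correspond to `*/15 * * * *`, `0 * * * *`, `0 */2 * * *`, `0 2 * * *`, `0 3 * * *`, and `0 4 * * 0`.

```python
from datetime import datetime

# One predicate per row of the cadence table (Python weekday(): Monday=0, Sunday=6).
SCHEDULE = [
    ("wiki-sync.sh", lambda t: t.minute % 15 == 0),
    ("mine-conversations.sh --extract-only", lambda t: t.minute == 0),
    ("wiki-sync.sh full", lambda t: t.minute == 0 and t.hour % 2 == 0),  # assumes even hours
    ("summarize-conversations.py --claude + index", lambda t: (t.hour, t.minute) == (2, 0)),
    ("wiki-maintain.sh", lambda t: (t.hour, t.minute) == (3, 0)),
    ("wiki-maintain.sh --hygiene-only --full",
     lambda t: t.weekday() == 6 and (t.hour, t.minute) == (4, 0)),
]


def jobs_due(t: datetime) -> list[str]:
    """Jobs whose cadence matches the minute `t` falls in (seconds ignored)."""
    return [job for job, due in SCHEDULE if due(t)]
```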
---

## What was borrowed from where

This repo is a synthesis of two ideas with an automation layer on top:

### From Karpathy

- The core pattern: LLM-maintained persistent wiki, compile at ingest
  time instead of retrieve at query time
- Separation of `raw/` (immutable sources) from `wiki/` (compiled pages)
- `CLAUDE.md` as the schema that disciplines the agent
- Periodic "lint" passes to catch orphans, contradictions, missing refs
- The idea that the wiki becomes fine-tuning material over time

### From MemPalace

- **Wings** = per-person or per-project namespaces → this repo uses
  project codes (`mc`, `wiki`, `web`, etc.) as the same thing in
  `conversations/<project>/`
- **Rooms** = topics within a wing → the `topics:` frontmatter on
  conversation files
- **Halls** = memory-type corridors (fact / event / discovery /
  preference / advice / tooling) → the `halls:` frontmatter field
  classified by the summarizer
- **Closets** = summary layer → the summary body of each summarized
  conversation
- **Drawers** = verbatim archive, never lost → the extracted
  conversation transcripts under `conversations/<project>/*.md`
- **Tunnels** = cross-wing connections → the `related:` frontmatter
  linking conversations to wiki pages
- Wing + room structural filtering gives a documented +34% retrieval
  boost over flat search

The MemPalace taxonomy solved a problem Karpathy's pattern doesn't
address: how do you navigate a growing corpus without reading
everything? The answer is to give the corpus structural metadata at
ingest time, then filter on that metadata before doing semantic search.
This repo borrows that wholesale.

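To make the mapping concrete, here is a sketch of a conversation record carrying the MemPalace-style metadata. The field names `topics`, `halls`, and `related`, the six hall types, and the `conversations/<project>/` layout come from this document; the `Conversation` type, the helper functions, and the `wiki/...` page paths are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Optional

# The six hall types listed above.
HALLS = frozenset({"fact", "event", "discovery", "preference", "advice", "tooling"})


def wing_of(path: str) -> Optional[str]:
    """Wing = <project> in conversations/<project>/<file>.md; None otherwise."""
    parts = PurePosixPath(path).parts
    if len(parts) >= 3 and parts[0] == "conversations":
        return parts[1]
    return None


def unknown_halls(halls: list[str]) -> list[str]:
    """Hall labels outside the six known memory types."""
    return sorted(set(halls) - HALLS)


@dataclass
class Conversation:
    path: str                                         # drawer: verbatim transcript
    topics: list[str] = field(default_factory=list)   # rooms
    halls: list[str] = field(default_factory=list)    # memory-type corridors
    related: list[str] = field(default_factory=list)  # tunnels to wiki pages

    @property
    def wing(self) -> Optional[str]:
        return wing_of(self.path)


def tunnels(convs: list[Conversation]) -> dict[str, set]:
    """Map each linked wiki page to the set of wings that reach it via related:."""
    out: dict[str, set] = {}
    for c in convs:
        for page in c.related:
            out.setdefault(page, set()).add(c.wing)
    return out
```

`tunnels()` gives the cross-wing view: a single wiki page reached from conversations in two different projects.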
### What this repo adds

- **Automation layer** tying the pieces together with cron-friendly
  orchestration
- **Staging pipeline** as a human-in-the-loop checkpoint for automated
  content
- **Confidence decay + auto-archive + auto-restore** as the "retention
  curve" that community analysis identified as critical for long-term
  wiki health
- **`qmd` integration** as the scalable search layer (chosen over
  ChromaDB because it uses the same markdown storage as the wiki —
  one index to maintain, not two)
- **Hygiene reports** with fixed vs needs-review separation so
  automation handles mechanical fixes and humans handle ambiguity
- **Cross-machine sync** via git with markdown merge-union so the same
  wiki lives on multiple machines without merge hell

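A sketch of the fixed vs needs-review split. Following sections 1, 2, and 6, duplicates and missing cross-references are treated as mechanical (auto-merged, auto-added) and contradictions as report-only; the `Finding` type and the kind strings are hypothetical. The report path follows the `reports/hygiene-YYYY-MM-DD-needs-review.md` pattern named in section 4.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Finding:
    page: str
    kind: str    # hypothetical labels, e.g. "duplicate", "contradiction"
    detail: str


# Mechanical kinds get fixed automatically; everything else needs a human.
AUTO_FIXED = {"duplicate", "missing-cross-ref"}


def partition(findings: list[Finding]) -> tuple[list[Finding], list[Finding]]:
    fixed = [f for f in findings if f.kind in AUTO_FIXED]
    review = [f for f in findings if f.kind not in AUTO_FIXED]
    return fixed, review


def needs_review_report(review: list[Finding], today: date) -> tuple[str, str]:
    """Return (path, markdown) for the day's needs-review report."""
    stamp = today.isoformat()  # YYYY-MM-DD
    lines = [f"# Hygiene: needs review ({stamp})", ""]
    lines += [f"- [ ] `{f.page}` ({f.kind}): {f.detail}" for f in review]
    return f"reports/hygiene-{stamp}-needs-review.md", "\n".join(lines) + "\n"
```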
---

## What memex deliberately doesn't try to do

Five things are explicitly out of scope for memex, not because they're
unsolvable, but because solving them well requires a different kind of
architecture than a personal/small-team wiki. If any of these are
dealbreakers for your use case, memex is probably not the right fit:

1. **Enterprise scale** — millions of documents, hundreds of users,
   RBAC, compliance: these need real enterprise knowledge management
   infrastructure. memex is tuned for personal and small-team use.
2. **True semantic retrieval at massive scale** — `qmd` hybrid search
   works well up to thousands of pages. At millions, a dedicated
   vector database with specialized retrieval wins.
3. **Replacing your own learning** — memex is an augmentation layer,
   not a substitute for reading. Used well, it's a memory aid; used as
   a bypass, it just lets you forget more.
4. **Precision-critical source of truth** — for legal, medical, or
   regulatory data, memex is a drafting tool. Human domain-expert
   review still owns the final call.
5. **Access control** — the network boundary (Tailscale) is the
   fastest path to "only authorized people can reach it." memex itself
   doesn't enforce permissions inside that boundary.

These are scope decisions, not unfinished work. memex is the best
personal/small-team answer to Karpathy's pattern I could build; it's
not trying to be every answer.

---

## Further reading

- [The original Karpathy gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)
  — the concept
- [mempalace](https://github.com/milla-jovovich/mempalace) — the
  structural memory layer
- [Signal & Noise interactive analysis](https://eric-turner.com/memex/signal-and-noise.html)
  — the design rationale this document summarizes (live interactive version)
- [`artifacts/signal-and-noise.html`](artifacts/signal-and-noise.html)
  — self-contained archive of the same analysis, works offline
- [README](../README.md) — the concept pitch
- [ARCHITECTURE.md](ARCHITECTURE.md) — component deep-dive
- [SETUP.md](SETUP.md) — installation
- [CUSTOMIZE.md](CUSTOMIZE.md) — adapting for non-Claude-Code setups