# Design Rationale — Signal & Noise

Why each part of this repo exists. This is the "why" document; the other
docs are the "what" and "how."

Before implementing anything, the design was worked out interactively
with Claude as a structured Signal & Noise analysis of Andrej Karpathy's
original persistent-wiki pattern:

> **Interactive version**: [eric-turner.com/memex/signal-and-noise.html](https://eric-turner.com/memex/signal-and-noise.html)
> — tabs for pros/cons, vs RAG, use-case fits, signal breakdown, mitigations
>
> **Self-contained archive**: [`artifacts/signal-and-noise.html`](artifacts/signal-and-noise.html)
> — same content, works offline

The analysis walks through the pattern's seven genuine strengths, seven
real weaknesses, and concrete mitigations for each weakness. This repo
is the implementation of those mitigations. If you want to understand
*why* a component exists, the interactive version has the longer-form
argument; this document is the condensed written version.
---

## Where the pattern is genuinely strong

The analysis found seven strengths that hold up under scrutiny. This
repo preserves all of them:

| Strength | How this repo keeps it |
|----------|------------------------|
| **Knowledge compounds over time** | Every ingest adds to the existing wiki rather than restarting; conversation mining and URL harvesting continuously feed new material in |
| **Zero maintenance burden on humans** | Cron-driven harvest + hygiene; the only manual step is staging review, and that's fast because the AI already compiled the page |
| **Token-efficient at personal scale** | `index.md` fits in context; `qmd` kicks in only at 50+ articles; the wake-up briefing is ~200 tokens |
| **Human-readable & auditable** | Plain markdown everywhere; every cross-reference is visible; git history shows every change |
| **Future-proof & portable** | No vendor lock-in; you can point any agent at the same tree tomorrow |
| **Self-healing via lint passes** | `wiki-hygiene.py` runs quick checks daily and full (LLM) checks weekly |
| **Path to fine-tuning** | Wiki pages are high-quality synthetic training data once purified through hygiene |

---
## Where the pattern is genuinely weak — and how this repo answers

The analysis identified seven real weaknesses. Five have direct
mitigations in this repo; two remain open trade-offs you should be
aware of.

### 1. Errors persist and compound

**The problem**: Unlike RAG — where a hallucination is ephemeral and the
next query starts clean — an LLM wiki persists its mistakes. If the LLM
incorrectly links two concepts at ingest time, future ingests build on
that wrong prior.

**How this repo mitigates**:

- **`confidence` field** — every page carries `high`/`medium`/`low` with
  decay based on `last_verified`. Wrong claims aren't treated as
  permanent — they age out visibly.
- **Archive + restore** — decayed pages get moved to `archive/`, where
  they're excluded from default search. If they get referenced again
  they're auto-restored with `confidence: medium` (never straight to
  `high` — they have to re-earn trust).
- **Raw harvested material is immutable** — `raw/harvested/*.md` files
  are the ground truth. Every compiled wiki page can be traced back to
  its source via the `sources:` frontmatter field.
- **Full-mode contradiction detection** — `wiki-hygiene.py --full` uses
  Sonnet to find conflicting claims across pages. Report-only (humans
  decide which side wins).
- **Staging review** — automated content goes to `staging/` first.
  Nothing enters the live wiki without human approval, so errors have
  two chances to get caught (AI compile + human review) before they
  become persistent.
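
The restore rule is the load-bearing detail here: a page coming out of
`archive/` never re-enters at its old confidence. A minimal sketch of
that rule (the function and value names are illustrative, not the
repo's actual API):

```python
# Sketch of the archive-restore confidence rule described above.
# A page restored from archive/ is capped at "medium" and has to
# re-earn "high" through fresh verification. Illustrative only.
LEVELS = ["low", "medium", "high"]

def restored_confidence(archived_level: str) -> str:
    # Cap at "medium": "high" drops to "medium"; lower levels keep their rank
    return LEVELS[min(LEVELS.index(archived_level), LEVELS.index("medium"))]
```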

### 2. Hard scale ceiling at ~50K tokens

**The problem**: The wiki approach stops working when `index.md` no
longer fits in context. Karpathy's own wiki was ~100 articles / 400K
words — already near the ceiling.

**How this repo mitigates**:

- **`qmd` from day one** — `qmd` (BM25 + vector + LLM re-ranking) is set
  up in the default configuration so the agent never has to load the
  full index. At 50+ pages, `qmd search` replaces `cat index.md`.
- **Wing/room structural filtering** — conversations are partitioned by
  project code (wing) and topic (room, via the `topics:` frontmatter).
  Retrieval is pre-narrowed to the relevant wing before search runs.
  This extends the effective ceiling because `qmd` works on a relevant
  subset, not the whole corpus.
- **Hygiene full mode flags redundancy** — duplicate detection auto-merges
  weaker pages into stronger ones, keeping the corpus lean.
- **Archive excludes stale content** — the `wiki-archive` collection has
  `includeByDefault: false`, so archived pages don't eat context until
  explicitly queried.
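
The pre-narrowing step is just a metadata filter that runs before any
search does. A sketch, where the page-dict shape and field names are
assumptions for illustration:

```python
# Sketch of wing/room pre-narrowing: restrict the candidate set by
# project code (wing) and topic (room) before hybrid search sees it.
# The page-dict shape here is an assumption, not the repo's format.
def narrow(pages, wing=None, topic=None):
    if wing is not None:
        pages = [p for p in pages if p.get("wing") == wing]
    if topic is not None:
        pages = [p for p in pages if topic in p.get("topics", [])]
    return pages
```

Search then only ever scores the narrowed subset, which is why the
effective ceiling moves with corpus structure rather than corpus size.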

### 3. Manual cross-checking burden returns in precision-critical domains

**The problem**: For API specs, version constraints, legal records, and
medical protocols, LLM-generated content needs human verification. The
maintenance burden you thought you'd eliminated comes back as
verification overhead.

**How this repo mitigates**:

- **Staging workflow** — every automated page goes through human review.
  For precision-critical content, that review *is* the cross-check. The
  AI does the drafting; you verify.
- **`compilation_notes` field** — staging pages include the AI's own
  explanation of what it did and why. This makes review faster — you can
  spot-check the reasoning rather than re-reading the whole page.
- **Immutable raw sources** — every wiki claim traces back to a specific
  file in `raw/harvested/` with a SHA-256 `content_hash`. Verification
  means comparing the claim to the source, not "trust the LLM."
- **`confidence: low` for precision domains** — the agent's instructions
  (via `CLAUDE.md`) tell it to flag low-confidence content when
  citing. Humans see the warning before acting.
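
Checking a claim against an immutable source is mechanical: recompute
the digest and compare. A sketch, assuming the frontmatter stores a hex
SHA-256 of the raw file body:

```python
import hashlib

# Recompute a harvested file's SHA-256 and compare it to the
# content_hash recorded in a wiki page's frontmatter. A match means
# the claim can be checked against an unmodified source file.
def matches_source(raw_text: str, recorded_hash: str) -> bool:
    return hashlib.sha256(raw_text.encode("utf-8")).hexdigest() == recorded_hash
```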

**Residual trade-off**: For *truly* mission-critical data (legal,
medical, compliance), no amount of automation replaces domain-expert
review. If that's your use case, treat this repo as a *drafting* tool,
not a canonical source.

### 4. Knowledge staleness without active upkeep

**The problem**: Community analysis of 120+ comments on Karpathy's gist
found this is the #1 failure mode. Most people who try the pattern get
the folder structure right and still end up with a wiki that slowly
becomes unreliable because they stop feeding it. A six-week half-life
is typical.

**How this repo mitigates** (this is the biggest thing):

- **Automation replaces human discipline** — daily cron runs
  `wiki-maintain.sh` (harvest + hygiene + qmd reindex); weekly cron runs
  `--full` mode. You don't need to remember anything.
- **Conversation mining is the feed** — you don't need to curate sources
  manually. Every Claude Code session becomes potential ingest. The
  feed is automatic and continuous, as long as you're doing work.
- **`last_verified` refreshes from conversation references** — when the
  summarizer links a conversation to a wiki page via `related:`, the
  hygiene script picks that up and bumps `last_verified`. Pages stay
  fresh as long as they're still being discussed.
- **Decay thresholds force attention** — pages without refresh signals
  for 6/9/12 months get downgraded and eventually archived. The wiki
  self-trims.
- **Hygiene reports** — `reports/hygiene-YYYY-MM-DD-needs-review.md`
  flags the things that *do* need human judgment. Everything else is
  auto-fixed.
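
The decay ladder can be sketched as a pure function of time elapsed
since `last_verified`. The 6/9/12-month thresholds come from the text;
the exact action taken at each step is an assumption for illustration:

```python
from datetime import date

# Sketch of the 6/9/12-month decay ladder: pages with no refresh
# signal are downgraded stepwise and eventually queued for archive.
# Action names are illustrative, not the hygiene script's actual output.
def decay_action(last_verified: date, today: date) -> str:
    months = (today.year - last_verified.year) * 12 + (today.month - last_verified.month)
    if months >= 12:
        return "archive"
    if months >= 9:
        return "downgrade-to-low"
    if months >= 6:
        return "downgrade-to-medium"
    return "no-op"
```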

This is the single biggest reason this repo exists. The automation
layer is entirely about removing "I forgot to lint" as a failure mode.

### 5. Cognitive outsourcing risk

**The problem**: Hacker News critics argued that the bookkeeping
Karpathy outsources — filing, cross-referencing, summarizing — is
precisely where genuine understanding forms. Outsource it and you end
up with a comprehensive wiki you haven't internalized.

**How this repo mitigates**:

- **Staging review is a forcing function** — you see every automated
  page before it lands. Even skimming forces engagement with the
  material.
- **`qmd query "..."` for exploration** — searching the wiki is an
  active process, not passive retrieval. You're asking questions, not
  pulling a file.
- **The wake-up briefing** — `context/wake-up.md` is a 200-token digest
  the agent reads at session start. You read it too (or the agent reads
  it to you) — ongoing re-exposure to your own knowledge base.

**Residual trade-off**: This is a real concern even with mitigations.
The wiki is designed as *augmentation*, not *replacement*. If you
never read your own wiki and only consult it through the agent, you're
in the outsourcing failure mode. The fix is discipline, not
architecture.

### 6. Weaker semantic retrieval than RAG at scale

**The problem**: At large corpora, vector embeddings find semantically
related content across different wording in ways explicit wikilinks
can't match.

**How this repo mitigates**:

- **`qmd` is hybrid (BM25 + vector)** — not just keyword search. Vector
  similarity is built into the retrieval pipeline from day one.
- **Structural navigation complements semantic search** — project codes
  (wings) and topic frontmatter narrow the search space before the
  hybrid search runs. Structure + semantics is stronger than either
  alone.
- **Missing cross-reference detection** — full-mode hygiene asks the
  LLM to find pages that *should* link to each other but don't, then
  auto-adds them. This is the explicit-linking approach catching up to
  semantic retrieval over time.

**Residual trade-off**: At enterprise scale (millions of documents), a
proper vector DB with specialized retrieval wins. This repo is for
personal / small-team scale where the hybrid approach is sufficient.

### 7. No access control or multi-user support

**The problem**: It's a folder of markdown files. No RBAC, no audit
logging, no concurrency handling, no permissions model.

**How this repo mitigates**:

- **Git-based sync with merge-union** — concurrent writes on different
  machines auto-resolve because markdown is set to `merge=union` in
  `.gitattributes`. Both sides win.
- **Network boundary as soft access control** — the suggested
  deployment is over Tailscale or a VPN, so the network does the work a
  RBAC layer would otherwise do. Not enterprise-grade, but sufficient
  for personal/family/small-team use.
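
The merge-union behavior is a one-line git configuration. A sketch of
the relevant `.gitattributes` entry (the pattern may be scoped more
narrowly in the actual repo):

```gitattributes
# Union-merge markdown: on conflict, keep both sides' lines instead of
# emitting conflict markers. A good fit for append-heavy wiki pages.
*.md merge=union
```

Note that union merges can keep both versions of a line edited on both
machines; a later hygiene pass is a reasonable place to de-duplicate.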

**Residual trade-off**: **This is the big one.** The repo is not a
replacement for enterprise knowledge management. No audit trails, no
fine-grained permissions, no compliance story. If you need any of
that, you need a different architecture. This repo is explicitly
scoped to the personal/small-team use case.

---
## The #1 failure mode — active upkeep

Every other weakness has a mitigation. *Active upkeep is the one that
kills wikis in the wild.* The community data is unambiguous:

- People who automate the lint schedule → wikis healthy at 6+ months
- People who rely on "I'll remember to lint" → wikis abandoned at 6 weeks

The entire automation layer of this repo exists to remove upkeep as a
thing the human has to think about:

| Cadence | Job | Purpose |
|---------|-----|---------|
| Every 15 min | `wiki-sync.sh` | Commit/pull/push — cross-machine sync |
| Every 2 hours | `wiki-sync.sh full` | Full sync + qmd reindex |
| Every hour | `mine-conversations.sh --extract-only` | Capture new Claude Code sessions (no LLM) |
| Daily 2am | `summarize-conversations.py --claude` + index | Classify + summarize (LLM) |
| Daily 3am | `wiki-maintain.sh` | Harvest + quick hygiene + reindex |
| Weekly Sun 4am | `wiki-maintain.sh --hygiene-only --full` | LLM-powered duplicate/contradiction/cross-ref detection |
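
As a crontab, the schedule above might look like this. The install path
and exact minutes are illustrative assumptions, and the daily 2am job's
follow-up index step from the table is elided:

```crontab
# Illustrative crontab for the cadences above; adjust paths to your checkout
*/15 * * * * ~/memex/scripts/wiki-sync.sh
0 */2 * * *  ~/memex/scripts/wiki-sync.sh full
0 * * * *    ~/memex/scripts/mine-conversations.sh --extract-only
0 2 * * *    ~/memex/scripts/summarize-conversations.py --claude
0 3 * * *    ~/memex/scripts/wiki-maintain.sh
0 4 * * 0    ~/memex/scripts/wiki-maintain.sh --hygiene-only --full
```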

If you disable all of these, you get the same outcome as every
abandoned wiki: a six-week half-life. The scripts aren't optional
convenience — they're the load-bearing answer to the pattern's primary
failure mode.

---

## What was borrowed from where

This repo is a synthesis of two ideas with an automation layer on top:

### From Karpathy

- The core pattern: an LLM-maintained persistent wiki, compiled at
  ingest time instead of retrieved at query time
- Separation of `raw/` (immutable sources) from `wiki/` (compiled pages)
- `CLAUDE.md` as the schema that disciplines the agent
- Periodic "lint" passes to catch orphans, contradictions, missing refs
- The idea that the wiki becomes fine-tuning material over time

### From mempalace

- **Wings** = per-person or per-project namespaces → this repo uses
  project codes (`mc`, `wiki`, `web`, etc.) as the same thing in
  `conversations/<project>/`
- **Rooms** = topics within a wing → the `topics:` frontmatter on
  conversation files
- **Halls** = memory-type corridors (fact / event / discovery /
  preference / advice / tooling) → the `halls:` frontmatter field
  classified by the summarizer
- **Closets** = summary layer → the summary body of each summarized
  conversation
- **Drawers** = verbatim archive, never lost → the extracted
  conversation transcripts under `conversations/<project>/*.md`
- **Tunnels** = cross-wing connections → the `related:` frontmatter
  linking conversations to wiki pages
- Wing + room structural filtering gives a documented +34% retrieval
  boost over flat search
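
Concretely, the mapping lands in per-conversation frontmatter along
these lines. Field names follow the list above; the values are invented
for illustration:

```yaml
# Illustrative frontmatter for a summarized conversation file.
topics: [retrieval, indexing]        # rooms within this wing
halls: [discovery, tooling]          # memory-type corridors
related: [wiki/qmd-integration.md]   # tunnels to wiki pages
```

The wing itself needs no field: it's encoded in the file's location
under `conversations/<project>/`.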

The MemPalace taxonomy solved a problem Karpathy's pattern doesn't
address: how do you navigate a growing corpus without reading
everything? The answer is to give the corpus structural metadata at
ingest time, then filter on that metadata before doing semantic search.
This repo borrows that wholesale.

### What this repo adds

- **Automation layer** tying the pieces together with cron-friendly
  orchestration
- **Staging pipeline** as a human-in-the-loop checkpoint for automated
  content
- **Confidence decay + auto-archive + auto-restore** as the "retention
  curve" that community analysis identified as critical for long-term
  wiki health
- **`qmd` integration** as the scalable search layer (chosen over
  ChromaDB because it uses the same markdown storage as the wiki —
  one index to maintain, not two)
- **Hygiene reports** with fixed vs needs-review separation, so
  automation handles mechanical fixes and humans handle ambiguity
- **Cross-machine sync** via git with markdown merge-union, so the same
  wiki lives on multiple machines without merge hell

---

## Honest residual trade-offs

Five items from the analysis that this repo doesn't fully solve, and
where you should know the limits:

1. **Enterprise scale** — this is a personal/small-team tool. Millions
   of documents, hundreds of users, RBAC, compliance: wrong
   architecture.
2. **True semantic retrieval at massive scale** — `qmd` hybrid search
   is great for thousands of pages, not millions.
3. **Cognitive outsourcing** — no architecture fix. Discipline
   yourself to read your own wiki, not just query it through the agent.
4. **Precision-critical domains** — for legal/medical/regulatory data,
   use this as a drafting tool, not a source of truth. Human
   domain-expert review is not replaceable.
5. **Access control** — the network boundary (Tailscale) is the fastest
   path; nothing in the repo itself enforces permissions.

If any of these are dealbreakers for your use case, a different
architecture is probably what you need.

---

## Further reading

- [The original Karpathy gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)
  — the concept
- [mempalace](https://github.com/milla-jovovich/mempalace) — the
  structural memory layer
- [Signal & Noise interactive analysis](https://eric-turner.com/memex/signal-and-noise.html)
  — the design rationale this document summarizes (live interactive version)
- [`artifacts/signal-and-noise.html`](artifacts/signal-and-noise.html)
  — self-contained archive of the same analysis; works offline
- [README](../README.md) — the concept pitch
- [ARCHITECTURE.md](ARCHITECTURE.md) — component deep-dive
- [SETUP.md](SETUP.md) — installation
- [CUSTOMIZE.md](CUSTOMIZE.md) — adapting for non-Claude-Code setups