docs: reframe as extensions + replace Signal & Noise artifact

Two changes, one commit:

1. Reframe "weaknesses" as "extensions memex adds":
   Karpathy's gist is a concept pitch, not an implementation. Reframe
   the seven places memex extends the pattern as engineering-layer
   additions rather than problems to fix. Cleaner narrative — memex
   builds on Karpathy's work instead of critiquing it.

   Touches README.md (Why each part exists + Credits) and
   DESIGN-RATIONALE.md (section titles, trade-off framing, biggest
   layer section, scope note at the end).

2. Replace docs/artifacts/signal-and-noise.html with the full
   upstream version:
   The earlier abbreviated copy dropped the MemPalace integration tab,
   the detailed mitigation steps with effort pips, the impact
   before/after cards, and the qmd vs ChromaDB comparison. This
   restores all of that. Also swaps self-references from "LLM Wiki"
   to "memex" while leaving external "LLM Wiki v2" community
   citations alone (those refer to a separate pattern and aren't ours
   to rename).

The live hosted copy at eric-turner.com/memex/signal-and-noise.html
has already been updated via scp — Hugo picks up static changes with
--poll 1s so the public URL reflects this file immediately.
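The deploy step above amounts to a one-line copy into the web root plus an
already-running Hugo watcher. A sketch, where the remote host and the path
under the Hugo site are assumptions (the real layout isn't shown here):

```shell
# Hypothetical deploy step. The server is assumed to be running something
# like `hugo server --poll 1s`, so it re-reads static files on a 1-second
# polling interval and republishes without a manual rebuild.
scp docs/artifacts/signal-and-noise.html \
    webhost:/srv/hugo/static/memex/signal-and-noise.html
```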
Eric Turner
2026-04-12 22:01:31 -06:00
parent 2a37e33fd6
commit 4c6b7609a1
3 changed files with 1191 additions and 238 deletions

DESIGN-RATIONALE.md

@@ -14,10 +14,11 @@ original persistent-wiki pattern:
 > — same content, works offline
 
 The analysis walks through the pattern's seven genuine strengths, seven
-real weaknesses, and concrete mitigations for each weakness. This repo
-is the implementation of those mitigations. If you want to understand
-*why* a component exists, the interactive version has the longer-form
-argument; this document is the condensed written version.
+places where it needs an engineering layer to be practical, and the
+concrete extension for each. memex is the implementation of those
+extensions. If you want to understand *why* a component exists, the
+interactive version has the longer-form argument; this document is the
+condensed written version.
 
 ---
@@ -38,20 +39,24 @@ repo preserves all of them:
 ---
 
-## Where the pattern is genuinely weak — and how this repo answers
+## Where memex extends the pattern
 
-The analysis identified seven real weaknesses. Five have direct
-mitigations in this repo; two remain open trade-offs you should be aware
-of.
+Karpathy's gist is a concept pitch. He was explicit that he was sharing
+an "idea file" for others to build on, not publishing a working
+implementation. The analysis identified seven places where the core idea
+needs an engineering layer to become practical day-to-day — five have
+first-class answers in memex, and two remain scoped-out trade-offs that
+the architecture cleanly acknowledges.
 
-### 1. Errors persist and compound
+### 1. Claim freshness and reversibility
 
-**The problem**: Unlike RAG — where a hallucination is ephemeral and the
-next query starts clean — an LLM wiki persists its mistakes. If the LLM
-incorrectly links two concepts at ingest time, future ingests build on
-that wrong prior.
+**The gap**: Unlike RAG — where a hallucination is ephemeral and the
+next query starts clean — an LLM-maintained wiki is stateful. If a
+claim is wrong at ingest time, it stays wrong until something corrects
+it. For the pattern to work long-term, claims need a way to earn trust
+over time and lose it when unused.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **`confidence` field** — every page carries `high`/`medium`/`low` with
   decay based on `last_verified`. Wrong claims aren't treated as
@@ -71,13 +76,14 @@ that wrong prior.
   two chances to get caught (AI compile + human review) before they
   become persistent.
 
-### 2. Hard scale ceiling at ~50K tokens
+### 2. Scalable search beyond the context window
 
-**The problem**: The wiki approach stops working when `index.md` no
-longer fits in context. Karpathy's own wiki was ~100 articles / 400K
-words — already near the ceiling.
+**The gap**: The pattern works beautifully up to ~100 articles, where
+`index.md` still fits in context. Karpathy's own wiki was right at the
+ceiling. Past that point, the agent needs a real search layer — loading
+the full index stops being practical.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **`qmd` from day one** — `qmd` (BM25 + vector + LLM re-ranking) is set
   up in the default configuration so the agent never has to load the
@@ -93,14 +99,14 @@ words — already near the ceiling.
   `includeByDefault: false`, so archived pages don't eat context until
   explicitly queried.
 
-### 3. Manual cross-checking burden returns in precision-critical domains
+### 3. Traceable sources for every claim
 
-**The problem**: For API specs, version constraints, legal records, and
-medical protocols, LLM-generated content needs human verification. The
-maintenance burden you thought you'd eliminated comes back as
-verification overhead.
+**The gap**: In precision-sensitive domains (API specs, version
+constraints, legal records, medical protocols), LLM-generated content
+needs to be verifiable against a source. For the pattern to work in
+those contexts, every claim needs to trace back to something immutable.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **Staging workflow** — every automated page goes through human review.
   For precision-critical content, that review IS the cross-check. The
@@ -120,15 +126,16 @@ medical, compliance), no amount of automation replaces domain-expert
 review. If that's your use case, treat this repo as a *drafting* tool,
 not a canonical source.
 
-### 4. Knowledge staleness without active upkeep
+### 4. Continuous feed without manual discipline
 
-**The problem**: Community analysis of 120+ comments on Karpathy's gist
-found this is the #1 failure mode. Most people who try the pattern get
+**The gap**: Community analysis of 120+ comments on Karpathy's gist
+converged on one clear finding: this is the #1 friction point. Most
+people who try the pattern get
 the folder structure right and still end up with a wiki that slowly
 becomes unreliable because they stop feeding it. Six-week half-life is
 typical.
 
-**How this repo mitigates** (this is the biggest thing):
+**How memex extends it** (this is the biggest layer):
 
 - **Automation replaces human discipline** — daily cron runs
   `wiki-maintain.sh` (harvest + hygiene + qmd reindex); weekly cron runs
@@ -147,17 +154,20 @@ typical.
   flags the things that *do* need human judgment. Everything else is
   auto-fixed.
 
-This is the single biggest reason this repo exists. The automation
-layer is entirely about removing "I forgot to lint" as a failure mode.
+This is the single biggest layer memex adds. Nothing about it is
+exotic — it's a cron-scheduled pipeline that runs the scripts you'd
+otherwise have to remember to run. That's the whole trick.
 
-### 5. Cognitive outsourcing risk
+### 5. Keeping the human engaged with their own knowledge
 
-**The problem**: Hacker News critics argued that the bookkeeping
+**The gap**: Hacker News critics pointed out that the bookkeeping
 Karpathy outsources — filing, cross-referencing, summarizing — is
-precisely where genuine understanding forms. Outsource it and you end up
-with a comprehensive wiki you haven't internalized.
+precisely where genuine understanding forms. If the LLM does all of
+it, you can end up with a comprehensive wiki you haven't internalized.
+For the pattern to be an actual memory aid and not a false one, the
+human needs touchpoints that keep them engaged.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **Staging review is a forcing function** — you see every automated
   page before it lands. Even skimming forces engagement with the
@@ -169,19 +179,21 @@ with a comprehensive wiki you haven't internalized.
   the agent reads at session start. You read it too (or the agent reads
   it to you) — ongoing re-exposure to your own knowledge base.
 
-**Residual trade-off**: This is a real concern even with mitigations.
-The wiki is designed as *augmentation*, not *replacement*. If you
-never read your own wiki and only consult it through the agent, you're
-in the outsourcing failure mode. The fix is discipline, not
-architecture.
+**Caveat**: memex is designed as *augmentation*, not *replacement*.
+It's most valuable when you engage with it actively — reading your own
+wake-up briefing, spot-checking promoted pages, noticing decay flags.
+If you only consult the wiki through the agent and never look at it
+yourself, you've outsourced the learning. That's a usage pattern
+choice, not an architecture problem.
 
-### 6. Weaker semantic retrieval than RAG at scale
+### 6. Hybrid retrieval — structure and semantics
 
-**The problem**: At large corpora, vector embeddings find semantically
-related content across different wording in ways explicit wikilinks
-can't match.
+**The gap**: Explicit wikilinks catch direct topic references but miss
+semantic neighbors that use different wording. At scale, the pattern
+benefits from vector similarity to find cross-topic connections the
+human (or the LLM at ingest time) didn't think to link manually.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **`qmd` is hybrid (BM25 + vector)** — not just keyword search. Vector
   similarity is built into the retrieval pipeline from day one.
@@ -198,33 +210,38 @@ can't match.
   proper vector DB with specialized retrieval wins. This repo is for
   personal / small-team scale where the hybrid approach is sufficient.
 
-### 7. No access control or multi-user support
+### 7. Cross-machine collaboration
 
-**The problem**: It's a folder of markdown files. No RBAC, no audit
-logging, no concurrency handling, no permissions model.
+**The gap**: Karpathy's gist describes a single-user, single-machine
+setup. In practice, people work from multiple machines (laptop,
+workstation, server) and sometimes collaborate with small teams. The
+pattern needs a sync story that handles concurrent writes gracefully.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **Git-based sync with merge-union** — concurrent writes on different
   machines auto-resolve because markdown is set to `merge=union` in
   `.gitattributes`. Both sides win.
-- **Network boundary as soft access control** — the suggested
-  deployment is over Tailscale or a VPN, so the network does the work a
-  RBAC layer would otherwise do. Not enterprise-grade, but sufficient
-  for personal/family/small-team use.
+- **State file sync** — `.harvest-state.json` and `.hygiene-state.json`
+  are committed, so two machines running the same pipeline agree on
+  what's already been processed instead of re-doing the work.
+- **Network boundary as access gate** — the suggested deployment is
+  over Tailscale or a VPN, so the network enforces who can reach the
+  wiki at all. Simple and sufficient for personal/family/small-team
+  use.
 
-**Residual trade-off**: **This is the big one.** The repo is not a
-replacement for enterprise knowledge management. No audit trails, no
-fine-grained permissions, no compliance story. If you need any of
-that, you need a different architecture. This repo is explicitly
-scoped to the personal/small-team use case.
+**Explicit scope**: memex is **deliberately not** enterprise knowledge
+management. No audit trails, no fine-grained permissions, no compliance
+story. If you need any of that, you need a different architecture.
+This is for the personal and small-team case where git + Tailscale is
+the right amount of rigor.
 
 ---
 
-## The #1 failure mode — active upkeep
+## The biggest layer — active upkeep
 
-Every other weakness has a mitigation. *Active upkeep is the one that
-kills wikis in the wild.* The community data is unambiguous:
+The other six extensions are important, but this is the one that makes
+or breaks the pattern in practice. The community data is unambiguous:
 
 - People who automate the lint schedule → wikis healthy at 6+ months
 - People who rely on "I'll remember to lint" → wikis abandoned at 6 weeks
@@ -243,8 +260,9 @@ thing the human has to think about:
 If you disable all of these, you get the same outcome as every
 abandoned wiki: six-week half-life. The scripts aren't optional
-convenience — they're the load-bearing answer to the pattern's primary
-failure mode.
+convenience — they're the load-bearing automation that lets the pattern
+actually compound over months and years instead of requiring a
+disciplined human to keep it alive.
 
 ---
@@ -305,26 +323,32 @@ This repo borrows that wholesale.
 ---
 
-## Honest residual trade-offs
+## What memex deliberately doesn't try to do
 
-Five items from the analysis that this repo doesn't fully solve and
-where you should know the limits:
+Five things memex is explicitly scoped around — not because they're
+unsolvable, but because solving them well requires a different kind of
+architecture than a personal/small-team wiki. If any of these are
+dealbreakers for your use case, memex is probably not the right fit:
 
-1. **Enterprise scale** — this is a personal/small-team tool. Millions
-   of documents, hundreds of users, RBAC, compliance: wrong
-   architecture.
+1. **Enterprise scale** — millions of documents, hundreds of users,
+   RBAC, compliance: these need real enterprise knowledge management
+   infrastructure. memex is tuned for personal and small-team use.
 2. **True semantic retrieval at massive scale** — `qmd` hybrid search
-   is great for thousands of pages, not millions.
-3. **Cognitive outsourcing** — no architecture fix. Discipline
-   yourself to read your own wiki, not just query it through the agent.
-4. **Precision-critical domains** — for legal/medical/regulatory data,
-   use this as a drafting tool, not a source of truth. Human
-   domain-expert review is not replaceable.
-5. **Access control** — network boundary (Tailscale) is the fastest
-   path; nothing in the repo itself enforces permissions.
+   works great up to thousands of pages. At millions, a dedicated
+   vector database with specialized retrieval wins.
+3. **Replacing your own learning** — memex is an augmentation layer,
+   not a substitute for reading. Used well, it's a memory aid; used as
+   a bypass, it just lets you forget more.
+4. **Precision-critical source of truth** — for legal, medical, or
+   regulatory data, memex is a drafting tool. Human domain-expert
+   review still owns the final call.
+5. **Access control** — the network boundary (Tailscale) is the
+   fastest path to "only authorized people can reach it." memex itself
+   doesn't enforce permissions inside that boundary.
 
-If any of these are dealbreakers for your use case, a different
-architecture is probably what you need.
+These are scope decisions, not unfinished work. memex is the best
+personal/small-team answer to Karpathy's pattern I could build; it's
+not trying to be every answer.
 
 ---

docs/artifacts/signal-and-noise.html: file diff suppressed because it is too large
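The `merge=union` behavior the DESIGN-RATIONALE diff above leans on can be
demonstrated end to end in a throwaway repo. Everything here (branch names,
file contents, paths) is illustrative, not taken from memex:

```shell
# Demo: git's built-in union merge driver. Two "machines" append different
# lines to the same markdown page; the merge keeps both sides' lines
# instead of raising a conflict.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q wiki && cd wiki
git config user.email demo@example.com
git config user.name demo
main=$(git symbolic-ref --short HEAD)      # main or master, version-dependent
echo '*.md merge=union' > .gitattributes   # the one-line setup from the diff
printf 'shared line\n' > page.md
git add -A && git commit -qm base
git checkout -qb laptop                    # "machine B" edits the page
printf 'shared line\nedit from laptop\n' > page.md
git commit -qam laptop-edit
git checkout -q "$main"                    # "machine A" edits concurrently
printf 'shared line\nedit from workstation\n' > page.md
git commit -qam workstation-edit
git merge -q -m sync laptop                # union driver: no conflict markers
cat page.md                                # contains both machines' lines
```

The trade-off is worth knowing: union merge never conflicts, so it can also
silently keep a line one side meant to delete; that's acceptable for
append-heavy wiki pages, less so for code.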
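And the "automation replaces human discipline" layer reduces to crontab
entries along these lines. The times, the install path, and the weekly
script's name are placeholders (the diff above shows only that daily cron
runs `wiki-maintain.sh` and that a weekly cron exists; its target is
truncated):

```shell
# Hypothetical crontab sketch of the automation layer.
#
#   # daily: harvest + hygiene + qmd reindex
#   15 5 * * *  $HOME/memex/wiki-maintain.sh
#
#   # weekly pass (script name is a placeholder; the real name is
#   # truncated out of the diff above)
#   30 5 * * 0  $HOME/memex/weekly-job.sh
```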