docs: reframe as extensions + replace Signal & Noise artifact
Two changes, one commit:

1. Reframe "weaknesses" as "extensions memex adds": Karpathy's gist is a
   concept pitch, not an implementation. Reframe the seven places memex
   extends the pattern as engineering-layer additions rather than problems
   to fix. Cleaner narrative — memex builds on Karpathy's work instead of
   critiquing it. Touches README.md (Why each part exists + Credits) and
   DESIGN-RATIONALE.md (section titles, trade-off framing, biggest layer
   section, scope note at the end).

2. Replace docs/artifacts/signal-and-noise.html with the full upstream
   version: the earlier abbreviated copy dropped the MemPalace integration
   tab, the detailed mitigation steps with effort pips, the impact
   before/after cards, and the qmd vs ChromaDB comparison. This restores
   all of that. Also swaps self-references from "LLM Wiki" to "memex"
   while leaving external "LLM Wiki v2" community citations alone (those
   refer to a separate pattern and aren't ours to rename).

The live hosted copy at eric-turner.com/memex/signal-and-noise.html has
already been updated via scp — Hugo picks up static changes with
--poll 1s, so the public URL reflects this file immediately.
README.md
@@ -100,30 +100,34 @@ keeping cross-references intact, and flagging ambiguity for review.
 
 ---
 
-## Why each part exists
+## How memex extends Karpathy's pattern
 
 Before implementing anything, the design was worked out interactively
-with Claude as a
-[Signal & Noise analysis of Karpathy's pattern](https://eric-turner.com/memex/signal-and-noise.html).
-That analysis found seven real weaknesses in the core pattern. This
-repo exists because each weakness has a concrete mitigation — and
-every component maps directly to one:
+with Claude as a structured
+[Signal & Noise analysis](https://eric-turner.com/memex/signal-and-noise.html).
+Karpathy's original gist is a concept pitch, not an implementation —
+he was explicit that he was sharing an "idea file" for others to build
+on. memex is one attempt at that build-out. The analysis identified
+seven places where the core idea needed an engineering layer to become
+practical day-to-day, and every automation component in this repo maps
+to one of those extensions:
 
-| Karpathy-pattern weakness | How this repo answers it |
-|---------------------------|--------------------------|
-| **Errors persist and compound** | `confidence` field with time-based decay → pages age out visibly. Staging review catches automated content before it goes live. Full-mode hygiene does LLM contradiction detection. |
-| **Hard ~50K-token ceiling** | `qmd` (BM25 + vector + re-ranking) set up from day one. Wing/room structural filtering narrows search before retrieval. Archive collection is excluded from default search. |
-| **Manual cross-checking returns** | Every wiki claim traces back to immutable `raw/harvested/*.md` with SHA-256 hash. Staging review IS the cross-check. `compilation_notes` field makes review fast. |
-| **Knowledge staleness** (the #1 failure mode in community data) | Daily + weekly cron removes "I forgot" as a failure mode. `last_verified` auto-refreshes from conversation references. Decayed pages auto-archive. |
-| **Cognitive outsourcing risk** | Staging review forces engagement with every automated page. `qmd query` makes retrieval an active exploration. Wake-up briefing ~200 tokens the human reads too. |
-| **Weaker semantic retrieval** | `qmd` hybrid (BM25 + vector). Full-mode hygiene adds missing cross-references. Structural metadata (wings, rooms) complements semantic search. |
-| **No access control** | Git sync with `merge=union` markdown handling. Network-boundary ACL via Tailscale is the suggested path. *This one is a residual trade-off — see [DESIGN-RATIONALE.md](docs/DESIGN-RATIONALE.md).* |
+| What memex adds | How it works |
+|-----------------|--------------|
+| **Time-decaying confidence** — pages earn trust through reinforcement and fade without it | `confidence` field + `last_verified`, 6/9/12 month decay thresholds, auto-archive. Full-mode hygiene also adds LLM contradiction detection across pages. |
+| **Scalable search beyond the context window** | `qmd` (BM25 + vector + LLM re-ranking) from day one, with three collections (`wiki` / `wiki-archive` / `wiki-conversations`) so queries can route to the right surface. |
+| **Traceable sources for every claim** | Every compiled page traces back to an immutable `raw/harvested/*.md` file with a SHA-256 content hash. Staging review is the built-in cross-check, and `compilation_notes` makes review fast. |
+| **Continuous feed without manual discipline** | Daily + weekly cron chains extract → summarize → harvest → hygiene → reindex. `last_verified` auto-refreshes from new conversation references; decayed pages auto-archive and auto-restore when referenced again. |
+| **Human-in-the-loop staging** for automated content | Every automated page lands in `staging/` first with `origin: automated`, `status: pending`. Nothing bypasses human review — one promotion step and it's in the live wiki with `last_verified` set. |
+| **Hybrid retrieval** — structural navigation + semantic search | Wings/rooms/halls (borrowed from mempalace) give structural filtering that narrows the search space before qmd's hybrid BM25 + vector pass runs. Full-mode hygiene also auto-adds missing cross-references. |
+| **Cross-machine git sync** for collaborative knowledge bases | `.gitattributes` with `merge=union` on markdown so concurrent writes on different machines merge additively. Harvest and hygiene state files sync across machines so both agree on what's been processed. |
 
-The short version: Karpathy published the idea, the community found the
-holes, and this repo is the automation layer that plugs the holes.
-See **[`docs/DESIGN-RATIONALE.md`](docs/DESIGN-RATIONALE.md)** for the
-full argument with honest residual trade-offs and what this repo
-explicitly does NOT solve.
+The short version: Karpathy shared the idea, milla-jovovich's mempalace
+added the structural memory taxonomy, and memex is the automation layer
+that lets the whole thing run day-to-day without constant human
+maintenance. See **[`docs/DESIGN-RATIONALE.md`](docs/DESIGN-RATIONALE.md)**
+for the longer rationale on each extension, plus honest notes on what
+memex doesn't cover.
 
 ---
 
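The staging handoff described in the table above can be sketched in a few lines of shell: an automated page lands in `staging/` carrying `origin: automated` and `status: pending`, and promotion is a metadata touch plus a move. The `status: approved` value, the file names, and the GNU `sed` edit are illustrative assumptions, not the repo's actual promotion script:

```shell
# Sketch of staging → promotion (illustrative paths and status value;
# the repo's real promotion step may differ).
cd "$(mktemp -d)"
mkdir -p staging wiki

# An automated page lands in staging/ first:
cat > staging/new-page.md <<'EOF'
---
origin: automated
status: pending
confidence: medium
---
Compiled claim goes here.
EOF

# Human reviews, then promotes: flip the status, stamp last_verified,
# and move the page into the live wiki (GNU sed assumed).
sed -i "s/^status: pending/status: approved\nlast_verified: $(date +%F)/" staging/new-page.md
mv staging/new-page.md wiki/new-page.md
```

The point of the sketch is the invariant, not the tooling: nothing reaches `wiki/` without passing through `staging/` and a human decision first.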
@@ -384,9 +388,9 @@ on top. It would not exist without either of them.
 
 **Core pattern — [Andrej Karpathy — "Agent-Maintained Persistent Wiki" gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)**
 The foundational idea of a compounding LLM-maintained wiki that moves
-synthesis from query-time (RAG) to ingest-time. This repo is an
-implementation of Karpathy's pattern with the community-identified
-failure modes plugged.
+synthesis from query-time (RAG) to ingest-time. memex is an
+implementation of Karpathy's pattern with the engineering layer that
+turns the concept into something practical to run day-to-day.
 
 **Structural memory taxonomy — [milla-jovovich/mempalace](https://github.com/milla-jovovich/mempalace)**
 The wing/room/hall/closet/drawer/tunnel concepts that turn a flat
@@ -411,12 +415,12 @@ simple pages.
 The repo is Claude-specific (see the section above for what that means
 and how to adapt for other agents).
 
-**Design process** — this repo was designed interactively with Claude
-as a structured Signal & Noise analysis before any code was written.
-The analysis walks through the seven real strengths and seven real
-weaknesses of Karpathy's pattern, then works through concrete
-mitigations for each weakness. Every component in this repo maps back
-to a specific mitigation identified there.
+**Design process** — memex was designed interactively with Claude as a
+structured Signal & Noise analysis before any code was written. The
+analysis walks through the seven real strengths of Karpathy's pattern
+and seven places where it needs an engineering layer to be practical,
+and works through the concrete extension for each. Every component in
+this repo maps back to a specific extension identified there.
 
 - **Live interactive version**:
   [eric-turner.com/memex/signal-and-noise.html](https://eric-turner.com/memex/signal-and-noise.html)
docs/DESIGN-RATIONALE.md

@@ -14,10 +14,11 @@ original persistent-wiki pattern:
 > — same content, works offline
 
 The analysis walks through the pattern's seven genuine strengths, seven
-real weaknesses, and concrete mitigations for each weakness. This repo
-is the implementation of those mitigations. If you want to understand
-*why* a component exists, the interactive version has the longer-form
-argument; this document is the condensed written version.
+places where it needs an engineering layer to be practical, and the
+concrete extension for each. memex is the implementation of those
+extensions. If you want to understand *why* a component exists, the
+interactive version has the longer-form argument; this document is the
+condensed written version.
 
 ---
 
@@ -38,20 +39,24 @@ repo preserves all of them:
 
 ---
 
-## Where the pattern is genuinely weak — and how this repo answers
+## Where memex extends the pattern
 
-The analysis identified seven real weaknesses. Five have direct
-mitigations in this repo; two remain open trade-offs you should be aware
-of.
+Karpathy's gist is a concept pitch. He was explicit that he was sharing
+an "idea file" for others to build on, not publishing a working
+implementation. The analysis identified seven places where the core idea
+needs an engineering layer to become practical day-to-day — five have
+first-class answers in memex, and two remain scoped-out trade-offs that
+the architecture cleanly acknowledges.
 
-### 1. Errors persist and compound
+### 1. Claim freshness and reversibility
 
-**The problem**: Unlike RAG — where a hallucination is ephemeral and the
-next query starts clean — an LLM wiki persists its mistakes. If the LLM
-incorrectly links two concepts at ingest time, future ingests build on
-that wrong prior.
+**The gap**: Unlike RAG — where a hallucination is ephemeral and the
+next query starts clean — an LLM-maintained wiki is stateful. If a
+claim is wrong at ingest time, it stays wrong until something corrects
+it. For the pattern to work long-term, claims need a way to earn trust
+over time and lose it when unused.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **`confidence` field** — every page carries `high`/`medium`/`low` with
   decay based on `last_verified`. Wrong claims aren't treated as
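The decay rule described above can be made concrete with a small sketch: effective confidence steps down as `last_verified` ages past the 6/9/12-month thresholds the design specifies. The function name and the exact day counts (~180/270/365) are illustrative, not code from the repo:

```shell
# Sketch of time-decaying confidence (illustrative; thresholds follow the
# design's 6 / 9 / 12 month figures, approximated in days).
decayed_confidence() {
    age_days=$1   # days since last_verified
    if   [ "$age_days" -lt 180 ]; then echo high
    elif [ "$age_days" -lt 270 ]; then echo medium
    elif [ "$age_days" -lt 365 ]; then echo low
    else echo archive   # past 12 months: candidate for auto-archive
    fi
}

decayed_confidence 30    # prints "high"
decayed_confidence 300   # prints "low"
```

The design choice worth noting: decay is computed from `last_verified`, so a page that keeps being referenced in new conversations keeps resetting its clock, while an untouched page drifts toward the archive.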
@@ -71,13 +76,14 @@ that wrong prior.
   two chances to get caught (AI compile + human review) before they
   become persistent.
 
-### 2. Hard scale ceiling at ~50K tokens
+### 2. Scalable search beyond the context window
 
-**The problem**: The wiki approach stops working when `index.md` no
-longer fits in context. Karpathy's own wiki was ~100 articles / 400K
-words — already near the ceiling.
+**The gap**: The pattern works beautifully up to ~100 articles, where
+`index.md` still fits in context. Karpathy's own wiki was right at the
+ceiling. Past that point, the agent needs a real search layer — loading
+the full index stops being practical.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **`qmd` from day one** — `qmd` (BM25 + vector + LLM re-ranking) is set
   up in the default configuration so the agent never has to load the
@@ -93,14 +99,14 @@ words — already near the ceiling.
   `includeByDefault: false`, so archived pages don't eat context until
   explicitly queried.
 
-### 3. Manual cross-checking burden returns in precision-critical domains
+### 3. Traceable sources for every claim
 
-**The problem**: For API specs, version constraints, legal records, and
-medical protocols, LLM-generated content needs human verification. The
-maintenance burden you thought you'd eliminated comes back as
-verification overhead.
+**The gap**: In precision-sensitive domains (API specs, version
+constraints, legal records, medical protocols), LLM-generated content
+needs to be verifiable against a source. For the pattern to work in
+those contexts, every claim needs to trace back to something immutable.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **Staging workflow** — every automated page goes through human review.
   For precision-critical content, that review IS the cross-check. The
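The provenance chain behind "traceable sources" can be sketched with plain `sha256sum`: record a content hash when a source file is harvested, then recompute it at review time to confirm the raw file is still byte-identical. Paths and variable names here are illustrative; where the repo actually stores the recorded hash is not shown in this diff:

```shell
# Sketch of the SHA-256 provenance check (illustrative paths).
cd "$(mktemp -d)"
mkdir -p raw/harvested
printf 'harvested conversation excerpt\n' > raw/harvested/example.md

# Hash recorded at compile time (would live in the page's metadata):
recorded_hash=$(sha256sum raw/harvested/example.md | cut -d' ' -f1)

# Hash re-derived at review time:
current_hash=$(sha256sum raw/harvested/example.md | cut -d' ' -f1)

if [ "$recorded_hash" = "$current_hash" ]; then
    echo "source intact"          # prints "source intact"
else
    echo "source changed: provenance broken"
fi
```

Because the harvested files are immutable, a hash mismatch can only mean the source was tampered with or corrupted, which is exactly the signal a precision-critical review needs.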
@@ -120,15 +126,16 @@ medical, compliance), no amount of automation replaces domain-expert
 review. If that's your use case, treat this repo as a *drafting* tool,
 not a canonical source.
 
-### 4. Knowledge staleness without active upkeep
+### 4. Continuous feed without manual discipline
 
-**The problem**: Community analysis of 120+ comments on Karpathy's gist
-found this is the #1 failure mode. Most people who try the pattern get
+**The gap**: Community analysis of 120+ comments on Karpathy's gist
+converged on one clear finding: this is the #1 friction point. Most
+people who try the pattern get
 the folder structure right and still end up with a wiki that slowly
 becomes unreliable because they stop feeding it. Six-week half-life is
 typical.
 
-**How this repo mitigates** (this is the biggest thing):
+**How memex extends it** (this is the biggest layer):
 
 - **Automation replaces human discipline** — daily cron runs
   `wiki-maintain.sh` (harvest + hygiene + qmd reindex); weekly cron runs
@@ -147,17 +154,20 @@ typical.
   flags the things that *do* need human judgment. Everything else is
   auto-fixed.
 
-This is the single biggest reason this repo exists. The automation
-layer is entirely about removing "I forgot to lint" as a failure mode.
+This is the single biggest layer memex adds. Nothing about it is
+exotic — it's a cron-scheduled pipeline that runs the scripts you'd
+otherwise have to remember to run. That's the whole trick.
 
-### 5. Cognitive outsourcing risk
+### 5. Keeping the human engaged with their own knowledge
 
-**The problem**: Hacker News critics argued that the bookkeeping
+**The gap**: Hacker News critics pointed out that the bookkeeping
 Karpathy outsources — filing, cross-referencing, summarizing — is
-precisely where genuine understanding forms. Outsource it and you end up
-with a comprehensive wiki you haven't internalized.
+precisely where genuine understanding forms. If the LLM does all of
+it, you can end up with a comprehensive wiki you haven't internalized.
+For the pattern to be an actual memory aid and not a false one, the
+human needs touchpoints that keep them engaged.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **Staging review is a forcing function** — you see every automated
   page before it lands. Even skimming forces engagement with the
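For concreteness, the cron wiring described above might look like the fragment below. The schedule times, paths, and the weekly command name are assumptions; the diff only confirms that a daily run of `wiki-maintain.sh` exists and that a separate weekly job follows it:

```cron
# Illustrative crontab entries (times, paths, and the weekly command
# name are placeholders; see the repo's install docs for the real ones).

# Daily pass: harvest + hygiene + qmd reindex
0 6 * * *   /path/to/memex/wiki-maintain.sh

# Weekly pass: the heavier full-mode run
0 3 * * 0   /path/to/memex/weekly-full-pass
```

The design point stands independent of the exact entries: once these lines are installed, "I forgot to run maintenance" stops being a possible failure mode.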
@@ -169,19 +179,21 @@ with a comprehensive wiki you haven't internalized.
   the agent reads at session start. You read it too (or the agent reads
   it to you) — ongoing re-exposure to your own knowledge base.
 
-**Residual trade-off**: This is a real concern even with mitigations.
-The wiki is designed as *augmentation*, not *replacement*. If you
-never read your own wiki and only consult it through the agent, you're
-in the outsourcing failure mode. The fix is discipline, not
-architecture.
+**Caveat**: memex is designed as *augmentation*, not *replacement*.
+It's most valuable when you engage with it actively — reading your own
+wake-up briefing, spot-checking promoted pages, noticing decay flags.
+If you only consult the wiki through the agent and never look at it
+yourself, you've outsourced the learning. That's a usage pattern
+choice, not an architecture problem.
 
-### 6. Weaker semantic retrieval than RAG at scale
+### 6. Hybrid retrieval — structure and semantics
 
-**The problem**: At large corpora, vector embeddings find semantically
-related content across different wording in ways explicit wikilinks
-can't match.
+**The gap**: Explicit wikilinks catch direct topic references but miss
+semantic neighbors that use different wording. At scale, the pattern
+benefits from vector similarity to find cross-topic connections the
+human (or the LLM at ingest time) didn't think to link manually.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **`qmd` is hybrid (BM25 + vector)** — not just keyword search. Vector
   similarity is built into the retrieval pipeline from day one.
@@ -198,33 +210,38 @@ can't match.
   proper vector DB with specialized retrieval wins. This repo is for
   personal / small-team scale where the hybrid approach is sufficient.
 
-### 7. No access control or multi-user support
+### 7. Cross-machine collaboration
 
-**The problem**: It's a folder of markdown files. No RBAC, no audit
-logging, no concurrency handling, no permissions model.
+**The gap**: Karpathy's gist describes a single-user, single-machine
+setup. In practice, people work from multiple machines (laptop,
+workstation, server) and sometimes collaborate with small teams. The
+pattern needs a sync story that handles concurrent writes gracefully.
 
-**How this repo mitigates**:
+**How memex extends it**:
 
 - **Git-based sync with merge-union** — concurrent writes on different
   machines auto-resolve because markdown is set to `merge=union` in
   `.gitattributes`. Both sides win.
-- **Network boundary as soft access control** — the suggested
-  deployment is over Tailscale or a VPN, so the network does the work a
-  RBAC layer would otherwise do. Not enterprise-grade, but sufficient
-  for personal/family/small-team use.
+- **State file sync** — `.harvest-state.json` and `.hygiene-state.json`
+  are committed, so two machines running the same pipeline agree on
+  what's already been processed instead of re-doing the work.
+- **Network boundary as access gate** — the suggested deployment is
+  over Tailscale or a VPN, so the network enforces who can reach the
+  wiki at all. Simple and sufficient for personal/family/small-team
+  use.
 
-**Residual trade-off**: **This is the big one.** The repo is not a
-replacement for enterprise knowledge management. No audit trails, no
-fine-grained permissions, no compliance story. If you need any of
-that, you need a different architecture. This repo is explicitly
-scoped to the personal/small-team use case.
+**Explicit scope**: memex is **deliberately not** enterprise knowledge
+management. No audit trails, no fine-grained permissions, no compliance
+story. If you need any of that, you need a different architecture.
+This is for the personal and small-team case where git + Tailscale is
+the right amount of rigor.
 
 ---
 
-## The #1 failure mode — active upkeep
+## The biggest layer — active upkeep
 
-Every other weakness has a mitigation. *Active upkeep is the one that
-kills wikis in the wild.* The community data is unambiguous:
+The other six extensions are important, but this is the one that makes
+or breaks the pattern in practice. The community data is unambiguous:
 
 - People who automate the lint schedule → wikis healthy at 6+ months
 - People who rely on "I'll remember to lint" → wikis abandoned at 6 weeks
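Since `merge=union` carries the whole concurrent-write story above, a throwaway demo makes the behavior concrete: two branches append different lines to the same markdown file, and the union merge driver keeps both instead of raising a conflict. This is a sketch in a temporary repo, not memex's actual layout:

```shell
# Demonstration of the `merge=union` attribute in a throwaway repo.
set -e
cd "$(mktemp -d)"
git init -q .
git config user.email demo@example.com
git config user.name demo
base=$(git symbolic-ref --short HEAD)   # default branch, whatever git names it

echo '*.md merge=union' > .gitattributes
printf 'shared line\n' > notes.md
git add -A && git commit -qm base

# "Machine A" appends a line on its own branch:
git checkout -qb machine-a
printf 'shared line\nadded on machine A\n' > notes.md
git commit -qam "edit on A"

# "Machine B" appends a different line to the same spot:
git checkout -q "$base"
printf 'shared line\nadded on machine B\n' > notes.md
git commit -qam "edit on B"

# Without merge=union this would conflict; with it, both lines survive.
git merge -q -m merged machine-a
cat notes.md
```

The trade-off baked into this choice: union merges never lose a line, but they also never flag a semantic conflict, which is acceptable for append-heavy wiki pages and why the attribute is scoped to markdown only.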
@@ -243,8 +260,9 @@ thing the human has to think about:
 
 If you disable all of these, you get the same outcome as every
 abandoned wiki: six-week half-life. The scripts aren't optional
-convenience — they're the load-bearing answer to the pattern's primary
-failure mode.
+convenience — they're the load-bearing automation that lets the pattern
+actually compound over months and years instead of requiring a
+disciplined human to keep it alive.
 
 ---
@@ -305,26 +323,32 @@ This repo borrows that wholesale.
 
 ---
 
-## Honest residual trade-offs
+## What memex deliberately doesn't try to do
 
-Five items from the analysis that this repo doesn't fully solve and
-where you should know the limits:
+Five things memex is explicitly scoped around — not because they're
+unsolvable, but because solving them well requires a different kind of
+architecture than a personal/small-team wiki. If any of these are
+dealbreakers for your use case, memex is probably not the right fit:
 
-1. **Enterprise scale** — this is a personal/small-team tool. Millions
-   of documents, hundreds of users, RBAC, compliance: wrong
-   architecture.
+1. **Enterprise scale** — millions of documents, hundreds of users,
+   RBAC, compliance: these need real enterprise knowledge management
+   infrastructure. memex is tuned for personal and small-team use.
 2. **True semantic retrieval at massive scale** — `qmd` hybrid search
-   is great for thousands of pages, not millions.
-3. **Cognitive outsourcing** — no architecture fix. Discipline
-   yourself to read your own wiki, not just query it through the agent.
-4. **Precision-critical domains** — for legal/medical/regulatory data,
-   use this as a drafting tool, not a source of truth. Human
-   domain-expert review is not replaceable.
-5. **Access control** — network boundary (Tailscale) is the fastest
-   path; nothing in the repo itself enforces permissions.
+   works great up to thousands of pages. At millions, a dedicated
+   vector database with specialized retrieval wins.
+3. **Replacing your own learning** — memex is an augmentation layer,
+   not a substitute for reading. Used well, it's a memory aid; used as
+   a bypass, it just lets you forget more.
+4. **Precision-critical source of truth** — for legal, medical, or
+   regulatory data, memex is a drafting tool. Human domain-expert
+   review still owns the final call.
+5. **Access control** — the network boundary (Tailscale) is the
+   fastest path to "only authorized people can reach it." memex itself
+   doesn't enforce permissions inside that boundary.
 
-If any of these are dealbreakers for your use case, a different
-architecture is probably what you need.
+These are scope decisions, not unfinished work. memex is the best
+personal/small-team answer to Karpathy's pattern I could build; it's
+not trying to be every answer.
 
 ---
 
docs/artifacts/signal-and-noise.html: file diff suppressed because it is too large.