Replace all four references to the Claude public artifact URL with the self-hosted version at eric-turner.com/memex/signal-and-noise.html plus the offline-capable archive at docs/artifacts/signal-and-noise.html. The Claude artifact can now be unpublished without breaking any links in the repo. The self-hosted HTML is deployed to the Hugo site's static directory and lives alongside the archived copy in this repo — either can stand on its own.
# memex — Compounding Knowledge for AI Agents

A persistent, LLM-maintained knowledge base that sits between you and the
sources it was compiled from. Unlike RAG — which re-discovers the same
answers on every query — memex **gets richer over time**. Facts get
cross-referenced, contradictions get flagged, stale advice ages out and
gets archived, and new knowledge discovered during a session gets written
back so it's there next time.

The agent reads the wiki at the start of every session and updates it as
new things are learned. The wiki is the long-term memory; the session is
the working memory.

> **Why "memex"?** In 1945, Vannevar Bush wrote
> [*As We May Think*](https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/)
> describing a hypothetical machine called the **memex** (a portmanteau
> of "memory" and "index") that would store and cross-reference a
> person's entire library of books, records, and communications, with
> "associative trails" linking related ideas. He imagined someone using
> it would build up a personal knowledge web over a lifetime, and that
> the trails themselves — the network of learned associations — were
> more valuable than any individual document.
>
> Eighty years later, LLMs make the memex finally buildable. The
> "associative trails" Bush imagined are the `related:` frontmatter
> fields and wikilinks the agent maintains. This repo is one attempt
> at that.

> **Inspiration**: memex combines
> [Andrej Karpathy's persistent-wiki gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)
> and [milla-jovovich/mempalace](https://github.com/milla-jovovich/mempalace),
> and adds an automation layer on top so the wiki maintains itself.

---

## The problem with stateless RAG

Most people's experience with LLMs and documents looks like RAG: you upload
files, the LLM retrieves chunks at query time, generates an answer, done.
This works — but the LLM is rediscovering knowledge from scratch on every
question. There's no accumulation.

Ask the same subtle question twice and the LLM does all the same work twice.
Ask something that requires synthesizing five documents and the LLM has to
find and piece together the relevant fragments every time. Nothing is built
up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.

Worse, raw sources go stale. URLs rot. Documentation lags. Blog posts
get retracted. If your knowledge base is "the original documents,"
stale advice keeps showing up alongside current advice and there's no way
to know which is which.

## The core idea — a compounding wiki

Instead of retrieving from raw documents at query time, the LLM
**incrementally builds and maintains a persistent wiki** — a structured,
interlinked collection of markdown files that sits between you and the
raw sources.

When a new source shows up (a doc page, a blog post, a CLI `--help`, a
conversation transcript), the LLM doesn't just index it. It reads it,
extracts what's load-bearing, and integrates it into the existing wiki —
updating topic pages, revising summaries, noting where new data
contradicts old claims, strengthening or challenging the evolving
synthesis. The knowledge is compiled once and then *kept current*, not
re-derived on every query.

This is the key difference: **the wiki is a persistent, compounding
artifact.** The cross-references are already there. The contradictions have
already been flagged. The synthesis already reflects everything the LLM
has read. The wiki gets richer with every source added and every question
asked.

You never (or rarely) write the wiki yourself. The LLM writes and maintains
all of it. You're in charge of sourcing, exploration, and asking the right
questions. The LLM does the summarizing, cross-referencing, filing, and
bookkeeping that make a knowledge base actually useful over time.

---

## What this adds beyond Karpathy's gist

Karpathy's gist describes the *idea* — a wiki the agent maintains. This
repo is a working implementation with an automation layer that handles the
lifecycle of knowledge, not just its creation:

| Layer | What it does |
|-------|--------------|
| **Conversation mining** | Extracts Claude Code session transcripts into searchable markdown. Summarizes them via `claude -p` with model routing (haiku for short sessions, sonnet for long ones). Links summaries to wiki pages by topic. |
| **URL harvesting** | Scans summarized conversations for external reference URLs. Fetches them via `trafilatura` → `crawl4ai` → stealth mode cascade. Compiles clean markdown into pending wiki pages. |
| **Human-in-the-loop staging** | Automated content lands in `staging/` with `status: pending`. You review via CLI, interactive prompts, or an in-session Claude review. Nothing automated goes live without approval. |
| **Staleness decay** | Every page tracks `last_verified`. After 6 months without a refresh signal, confidence decays `high → medium`; 9 months → `low`; 12 months → `stale` → auto-archived. |
| **Auto-restoration** | Archived pages that get referenced again in new conversations or wiki updates are automatically restored. |
| **Hygiene** | Daily structural checks (orphans, broken cross-refs, index drift, frontmatter repair). Weekly LLM-powered checks (duplicates, contradictions, missing cross-references). |
| **Orchestrator** | One script chains all of the above into a daily cron-able pipeline. |

The result: you don't have to maintain the wiki. You just *use* it. The
automation handles harvesting new knowledge, retiring old knowledge,
keeping cross-references intact, and flagging ambiguity for review.

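
The decay schedule in the **Staleness decay** row can be sketched in a few lines of Python. This is a minimal illustration, not the code in `wiki-hygiene.py`: the names `DECAY_STEPS`, `months_between`, and `decayed_confidence` are hypothetical, and the sketch assumes that decay only ever lowers a page's confidence, never raises it.

```python
from datetime import date

# Confidence levels from best to worst, as named in the table above.
LEVELS = ["high", "medium", "low", "stale"]

# (months without a refresh signal, resulting level), checked longest first.
DECAY_STEPS = [(12, "stale"), (9, "low"), (6, "medium")]


def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed from start to end (never negative)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not complete yet
    return max(months, 0)


def decayed_confidence(current: str, last_verified: date, today: date) -> str:
    """Return the confidence a page should carry after time-based decay."""
    idle = months_between(last_verified, today)
    floor = "high"
    for threshold, level in DECAY_STEPS:
        if idle >= threshold:
            floor = level
            break
    # Assumption: keep whichever of the two levels is worse.
    return max(current, floor, key=LEVELS.index)


if __name__ == "__main__":
    print(decayed_confidence("high", date(2025, 1, 15), date(2025, 7, 15)))
```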

---

## Why each part exists

Before implementing anything, the design was worked out interactively
with Claude as a
[Signal & Noise analysis of Karpathy's pattern](https://eric-turner.com/memex/signal-and-noise.html).
That analysis found seven real weaknesses in the core pattern. This
repo exists because each weakness has a concrete mitigation — and
every component maps directly to one:

| Karpathy-pattern weakness | How this repo answers it |
|---------------------------|--------------------------|
| **Errors persist and compound** | `confidence` field with time-based decay → pages age out visibly. Staging review catches automated content before it goes live. Full-mode hygiene does LLM contradiction detection. |
| **Hard ~50K-token ceiling** | `qmd` (BM25 + vector + re-ranking) set up from day one. Wing/room structural filtering narrows search before retrieval. Archive collection is excluded from default search. |
| **Manual cross-checking returns** | Every wiki claim traces back to immutable `raw/harvested/*.md` with SHA-256 hash. Staging review IS the cross-check. `compilation_notes` field makes review fast. |
| **Knowledge staleness** (the #1 failure mode in community data) | Daily + weekly cron removes "I forgot" as a failure mode. `last_verified` auto-refreshes from conversation references. Decayed pages auto-archive. |
| **Cognitive outsourcing risk** | Staging review forces engagement with every automated page. `qmd query` makes retrieval an active exploration. The wake-up briefing is ~200 tokens, short enough that the human reads it too. |
| **Weaker semantic retrieval** | `qmd` hybrid (BM25 + vector). Full-mode hygiene adds missing cross-references. Structural metadata (wings, rooms) complements semantic search. |
| **No access control** | Git sync with `merge=union` markdown handling. Network-boundary ACL via Tailscale is the suggested path. *This one is a residual trade-off — see [DESIGN-RATIONALE.md](docs/DESIGN-RATIONALE.md).* |

The short version: Karpathy published the idea, the community found the
holes, and this repo is the automation layer that plugs the holes.
See **[`docs/DESIGN-RATIONALE.md`](docs/DESIGN-RATIONALE.md)** for the
full argument with honest residual trade-offs and what this repo
explicitly does NOT solve.

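
To make the "immutable `raw/harvested/*.md` with SHA-256 hash" idea concrete, here is a minimal sketch of how a recorded hash could be used to confirm that a harvested source has not changed since a page was compiled from it. The helper names `sha256_of` and `source_unchanged` are hypothetical, and where the repo actually stores its hashes is not shown here; only the standard-library `hashlib` calls are real.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks to keep memory flat."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def source_unchanged(path: Path, recorded_hash: str) -> bool:
    """True if the file on disk still matches the hash recorded at compile time."""
    return sha256_of(path) == recorded_hash
```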

---

## Compounding loop

```
┌─────────────────────┐
│  Claude Code        │
│  sessions (.jsonl)  │
└──────────┬──────────┘
           │ extract-sessions.py (hourly, no LLM)
           ▼
┌─────────────────────┐
│  conversations/     │  markdown transcripts
│  <project>/*.md     │  (status: extracted)
└──────────┬──────────┘
           │ summarize-conversations.py --claude (daily)
           ▼
┌─────────────────────┐
│  conversations/     │  summaries with related: wiki links
│  <project>/*.md     │  (status: summarized)
└──────────┬──────────┘
           │ wiki-harvest.py (daily)
           ▼
┌─────────────────────┐
│  raw/harvested/     │  fetched URL content
│  *.md               │  (immutable source material)
└──────────┬──────────┘
           │ claude -p compile step
           ▼
┌─────────────────────┐
│  staging/<type>/    │  pending pages
│  *.md               │  (status: pending, origin: automated)
└──────────┬──────────┘
           │ human review (wiki-staging.py --review)
           ▼
┌─────────────────────┐
│  patterns/          │  LIVE wiki
│  decisions/         │  (origin: manual or promoted-from-automated)
│  concepts/          │
│  environments/      │
└──────────┬──────────┘
           │ wiki-hygiene.py (daily quick / weekly full)
           │   - refresh last_verified from new conversations
           │   - decay confidence on idle pages
           │   - auto-restore archived pages referenced again
           │   - fuzzy-fix broken cross-references
           ▼
┌─────────────────────┐
│  archive/<type>/    │  stale/superseded content
│  *.md               │  (excluded from default search)
└─────────────────────┘
```

Every arrow is automated. The only human step is staging review — and
that's quick because the AI compilation step already wrote the page; you
just approve or reject.

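
The "fuzzy-fix broken cross-references" step in the diagram can be illustrated with the standard library's `difflib`. This is a sketch under stated assumptions, not the logic in `wiki-hygiene.py`: the function name `fuzzy_fix_link`, the page-path format, and the `0.8` similarity cutoff are all hypothetical choices. A high cutoff keeps the fix conservative, so a link with no close match is left for a human rather than silently rewritten.

```python
import difflib
from typing import Optional, Sequence


def fuzzy_fix_link(target: str, known_pages: Sequence[str],
                   cutoff: float = 0.8) -> Optional[str]:
    """Resolve a cross-reference to an existing page, or return None.

    An exact match wins. Otherwise the closest known page is accepted only
    if its similarity ratio reaches the cutoff (an assumed threshold).
    """
    if target in known_pages:
        return target
    matches = difflib.get_close_matches(target, list(known_pages),
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    pages = ["patterns/retry-with-backoff", "decisions/use-qmd"]
    # A one-character typo resolves to the existing page.
    print(fuzzy_fix_link("patterns/retry-with-backof", pages))
```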

---

## Quick start — two paths

### Path A: just the idea (Karpathy-style)

Open a Claude Code session in an empty directory and tell it:

```
I want you to start maintaining a persistent knowledge wiki for me.
Create a directory structure with patterns/, decisions/, concepts/, and
environments/ subdirectories. Each page should have YAML frontmatter with
title, type, confidence, sources, related, last_compiled, and last_verified
fields. Create an index.md at the root that catalogs every page.

From now on, when I share a source (a doc page, a CLI --help, a conversation
I had), read it, extract what's load-bearing, and integrate it into the
wiki. Update existing pages when new knowledge refines them. Flag
contradictions between pages. Create new pages when topics aren't
covered yet. Update index.md every time you create or remove a page.

When I ask a question, read the relevant wiki pages first, then answer.
If you rely on a wiki page with `confidence: low`, flag that to me.
```

That's the whole idea. The agent will build you a growing markdown tree
that compounds over time. This is the minimum viable version.

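
For orientation, the frontmatter fields named in the prompt above could produce a page header like the one this sketch renders. It is an illustration only: `PageMeta` and `render_frontmatter` are hypothetical names (the repo's own `WikiPage` dataclass in `wiki_lib.py` may look different), and the example values for `type` and `sources` are made up. Strings are written with `json.dumps` because a JSON string is also a valid double-quoted YAML scalar.

```python
import json
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class PageMeta:
    """The seven frontmatter fields listed in the Path A prompt."""
    title: str
    type: str
    confidence: str
    last_compiled: date
    last_verified: date
    sources: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)


def render_frontmatter(meta: PageMeta) -> str:
    """Render the metadata as a YAML frontmatter block."""
    def flow_list(items: List[str]) -> str:
        return "[" + ", ".join(json.dumps(item) for item in items) + "]"

    lines = [
        "---",
        f"title: {json.dumps(meta.title)}",
        f"type: {meta.type}",
        f"confidence: {meta.confidence}",
        f"sources: {flow_list(meta.sources)}",
        f"related: {flow_list(meta.related)}",
        f"last_compiled: {meta.last_compiled.isoformat()}",
        f"last_verified: {meta.last_verified.isoformat()}",
        "---",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = PageMeta(
        title="Retry with backoff",           # made-up page
        type="pattern",                       # assumed type value
        confidence="high",
        last_compiled=date(2026, 1, 15),
        last_verified=date(2026, 1, 15),
        sources=["raw/harvested/example.md"],  # made-up path
    )
    print(render_frontmatter(example))
```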
### Path B: the full automation (this repo)

```bash
git clone <this-repo> ~/projects/wiki
cd ~/projects/wiki

# Install the Python extraction tools
pipx install trafilatura
pipx install crawl4ai && crawl4ai-setup

# Install qmd for full-text + vector search
npm install -g @tobilu/qmd

# Configure qmd (3 collections — see docs/SETUP.md for the YAML)
# Edit scripts/extract-sessions.py with your project codes
# Edit scripts/update-conversation-index.py with matching display names

# Copy the example CLAUDE.md files (wiki schema + global instructions)
cp docs/examples/wiki-CLAUDE.md CLAUDE.md
cat docs/examples/global-CLAUDE.md >> ~/.claude/CLAUDE.md
# edit both for your conventions

# Run the full pipeline once, manually
bash scripts/mine-conversations.sh --extract-only  # Fast, no LLM
python3 scripts/summarize-conversations.py --claude  # Classify + summarize
python3 scripts/update-conversation-index.py --reindex

# Then maintain
bash scripts/wiki-maintain.sh  # Daily hygiene
bash scripts/wiki-maintain.sh --hygiene-only --full  # Weekly deep pass
```

See [`docs/SETUP.md`](docs/SETUP.md) for complete setup including qmd
configuration (three collections: `wiki`, `wiki-archive`,
`wiki-conversations`), optional cron schedules, git sync, and the
post-merge hook. See [`docs/examples/`](docs/examples/) for starter
`CLAUDE.md` files (wiki schema + global instructions) with explicit
guidance on using the three qmd collections.

---

## Directory layout after setup

```
wiki/
├── CLAUDE.md               ← Schema + instructions the agent reads every session
├── index.md                ← Content catalog (the agent reads this first)
├── patterns/               ← HOW things should be built (LIVE)
├── decisions/              ← WHY we chose this approach (LIVE)
├── concepts/               ← WHAT the foundational ideas are (LIVE)
├── environments/           ← WHERE implementations differ (LIVE)
├── staging/                ← PENDING automated content awaiting review
│   ├── index.md
│   └── <type>/
├── archive/                ← STALE / superseded (excluded from search)
│   ├── index.md
│   └── <type>/
├── raw/                    ← Immutable source material (never modified)
│   ├── <topic>/
│   └── harvested/          ← URL harvester output
├── conversations/          ← Mined Claude Code session transcripts
│   ├── index.md
│   └── <project>/
├── context/                ← Auto-updated AI session briefing
│   ├── wake-up.md          ← Loaded at the start of every session
│   └── active-concerns.md  ← Current blockers and focus areas
├── reports/                ← Hygiene operation logs
├── scripts/                ← The automation pipeline
├── tests/                  ← Pytest suite (171 tests)
├── .harvest-state.json     ← URL dedup state (committed, synced)
├── .hygiene-state.json     ← Content hashes, deferred issues (committed, synced)
└── .mine-state.json        ← Conversation extraction offsets (gitignored, per-machine)
```

---

## What's Claude-specific (and what isn't)

This repo is built around **Claude Code** as the agent. Specifically:

1. **Session mining** expects `~/.claude/projects/<hashed-path>/*.jsonl`
   files written by the Claude Code CLI. Other agents won't produce these.
2. **Summarization** uses `claude -p` (the Claude Code CLI's one-shot mode)
   with haiku/sonnet routing by conversation length. Other LLM CLIs would
   need a different wrapper.
3. **URL compilation** uses `claude -p` to turn raw harvested content into
   a wiki page with proper frontmatter.
4. **The agent itself** (the thing that reads `CLAUDE.md` and maintains the
   wiki conversationally) is Claude Code. Any agent that reads markdown
   and can write files could do this job — `CLAUDE.md` is just a text
   file telling the agent what the wiki's conventions are.

**What's NOT Claude-specific**:

- The wiki schema (frontmatter, directory layout, lifecycle states)
- The staleness decay model and archive/restore semantics
- The human-in-the-loop staging workflow
- The hygiene checks (orphans, broken cross-refs, duplicates)
- The `trafilatura` + `crawl4ai` URL fetching
- The qmd search integration
- The git-based cross-machine sync

If you use a different agent, you replace parts **1-4** above with
equivalents for your agent. The other 80% of the repo is agent-agnostic.
See [`docs/CUSTOMIZE.md`](docs/CUSTOMIZE.md) for concrete adaptation
recipes.

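
As a rough picture of the wrapper that part 2 describes, the haiku/sonnet routing by conversation length might look like the sketch below. Everything specific here is an assumption: the names `choose_model` and `build_command`, the character-count threshold, and the exact CLI invocation (`claude -p` with a `--model` alias) are illustrative and not taken from `summarize-conversations.py`. Swapping agents would mostly mean replacing `build_command`.

```python
from typing import List

# Assumed cutoff between a "short" and a "long" session, in characters.
LONG_SESSION_CHARS = 20_000


def choose_model(transcript: str, threshold: int = LONG_SESSION_CHARS) -> str:
    """Pick the cheaper model for short sessions, the larger one for long ones."""
    return "sonnet" if len(transcript) > threshold else "haiku"


def build_command(prompt: str, model: str) -> List[str]:
    """Assemble a one-shot CLI call (assumed flags; adjust for your agent)."""
    return ["claude", "-p", prompt, "--model", model]


if __name__ == "__main__":
    transcript = "user: how do I rotate the API key?\nassistant: ..."
    model = choose_model(transcript)
    # Printed rather than executed, so the sketch runs without the CLI installed.
    print(build_command("Summarize this session:\n" + transcript, model))
```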

---

## Architecture at a glance

Eleven scripts organized in three layers:

**Mining layer** (ingests conversations):
- `extract-sessions.py` — Parse Claude Code JSONL → markdown transcripts
- `summarize-conversations.py` — Classify + summarize via `claude -p`
- `update-conversation-index.py` — Regenerate conversation index + wake-up context

**Automation layer** (maintains the wiki):
- `wiki_lib.py` — Shared frontmatter parser, `WikiPage` dataclass, constants
- `wiki-harvest.py` — URL classification + fetch cascade + compile to staging
- `wiki-staging.py` — Human review (list/promote/reject/review/sync)
- `wiki-hygiene.py` — Quick + full hygiene checks, archival, auto-restore
- `wiki-maintain.sh` — Top-level orchestrator chaining harvest + hygiene

**Sync layer**:
- `wiki-sync.sh` — Git commit/pull/push with merge-union markdown handling
- `mine-conversations.sh` — Mining orchestrator

See [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) for a deeper tour.

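
The "fetch cascade" in `wiki-harvest.py` follows a common pattern: try the cheapest extractor first and fall through to heavier ones only when it fails or returns too little text. The sketch below shows that pattern in isolation. It deliberately takes the fetchers as plain callables instead of calling `trafilatura` or `crawl4ai` directly, and the names `fetch_with_cascade`, `Fetcher`, and the `min_chars` threshold are hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple

# A fetcher takes a URL and returns extracted text, or None/"" on failure.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_cascade(
    url: str,
    fetchers: Iterable[Tuple[str, Fetcher]],
    min_chars: int = 200,
) -> Optional[Tuple[str, str]]:
    """Try each (name, fetcher) in order; return the first usable result.

    A result counts as usable when it has at least min_chars of
    non-whitespace-trimmed text (an assumed heuristic for "real content").
    Exceptions from one fetcher do not stop the cascade.
    """
    for name, fetch in fetchers:
        try:
            text = fetch(url)
        except Exception:
            continue  # fall through to the next, heavier fetcher
        if text and len(text.strip()) >= min_chars:
            return name, text
    return None


if __name__ == "__main__":
    def static_fetch(url: str) -> Optional[str]:
        return ""  # stand-in for a fast static-page extractor that found nothing

    def rendered_fetch(url: str) -> Optional[str]:
        return "article text " * 40  # stand-in for a JS-rendering extractor

    result = fetch_with_cascade(
        "https://example.com/post",
        [("static", static_fetch), ("rendered", rendered_fetch)],
    )
    print(result[0] if result else "all fetchers failed")
```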

---

## Why markdown, not a real database?

Markdown files are:

- **Human-readable without any tooling** — you can browse in Obsidian, VS Code, or `cat`
- **Git-native** — full history, branching, rollback, cross-machine sync for free
- **Agent-friendly** — every LLM was trained on markdown, so reading and writing it is free
- **Durable** — no schema migrations, no database corruption, no vendor lock-in
- **Interoperable** — Obsidian graph view, `grep`, `qmd`, `ripgrep`, any editor

A SQLite file with the same content would be faster to query but harder
to browse, harder to merge, harder to audit, and fundamentally less
*collaborative* between you and the agent. Markdown is to knowledge
management what Postgres is to transactions.

---

## Testing

Full pytest suite in `tests/` — 171 tests across all scripts, runs in
**~1.3 seconds**, no network or LLM calls needed, works on macOS and
Linux/WSL.

```bash
cd tests && python3 -m pytest
# or
bash tests/run.sh
```

The test suite uses a disposable `tmp_wiki` fixture so no test ever
touches your real wiki.

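
The idea behind a disposable wiki can be sketched without any test framework: build the directory skeleton inside a temporary directory, run the code under test against it, and let the directory vanish afterwards. The helper `build_tmp_wiki` below is hypothetical and is not the repo's `tmp_wiki` fixture; in a pytest suite the same function would typically be wrapped in a fixture that passes it pytest's built-in `tmp_path`.

```python
import tempfile
from pathlib import Path

# Directory names taken from the layout section of this README.
LIVE_DIRS = ("patterns", "decisions", "concepts", "environments")
OTHER_DIRS = ("staging", "archive", "raw/harvested", "conversations")


def build_tmp_wiki(root: Path) -> Path:
    """Create an empty wiki skeleton under root and return root."""
    for name in LIVE_DIRS + OTHER_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    (root / "index.md").write_text("# Index\n", encoding="utf-8")
    return root


if __name__ == "__main__":
    # Everything lives in a throwaway directory, so the real wiki is never touched.
    with tempfile.TemporaryDirectory() as tmp:
        wiki = build_tmp_wiki(Path(tmp))
        print(sorted(p.name for p in wiki.iterdir()))
```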

---

## Credits and inspiration

This repo is a synthesis of two existing ideas with an automation layer
on top. It would not exist without either of them.

**Core pattern — [Andrej Karpathy — "Agent-Maintained Persistent Wiki" gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)**
The foundational idea of a compounding LLM-maintained wiki that moves
synthesis from query-time (RAG) to ingest-time. This repo is an
implementation of Karpathy's pattern with the community-identified
failure modes plugged.

**Structural memory taxonomy — [milla-jovovich/mempalace](https://github.com/milla-jovovich/mempalace)**
The wing/room/hall/closet/drawer/tunnel concepts that turn a flat
corpus into something you can navigate without reading everything. See
[`ARCHITECTURE.md#borrowed-concepts`](docs/ARCHITECTURE.md#borrowed-concepts)
for the explicit mapping of MemPalace terms to this repo's
implementation.

**Search layer — [qmd](https://github.com/tobi/qmd)** by Tobi Lütke
(Shopify CEO). Local BM25 + vector + LLM re-ranking on markdown files.
Chosen over ChromaDB because it uses the same storage format as the
wiki — one index to maintain, not two. Explicitly recommended by
Karpathy as well.

**URL extraction stack** — [trafilatura](https://github.com/adbar/trafilatura)
for fast static-page extraction and [crawl4ai](https://github.com/unclecode/crawl4ai)
for JS-rendered and anti-bot cases. The two-tool cascade handles
essentially any web content without needing a full browser stack for
simple pages.

**The agent** — [Claude Code](https://claude.com/claude-code) by Anthropic.
The repo is Claude-specific (see the section above for what that means
and how to adapt for other agents).

**Design process** — this repo was designed interactively with Claude
as a structured Signal & Noise analysis before any code was written.
The analysis walks through the seven real strengths and seven real
weaknesses of Karpathy's pattern, then works through concrete
mitigations for each weakness. Every component in this repo maps back
to a specific mitigation identified there.

- **Live interactive version**:
  [eric-turner.com/memex/signal-and-noise.html](https://eric-turner.com/memex/signal-and-noise.html)
  — click tabs to explore pros/cons, vs RAG, use-case fits, signal
  breakdown, and mitigations
- **Self-contained archive in this repo**:
  [`docs/artifacts/signal-and-noise.html`](docs/artifacts/signal-and-noise.html)
  — download and open locally; works offline
- **Condensed written version**:
  [`docs/DESIGN-RATIONALE.md`](docs/DESIGN-RATIONALE.md)
  — every tradeoff and mitigation rendered as prose

---

## License

MIT — see [`LICENSE`](LICENSE).

## Contributing

This is a personal project that I'm making public in case the pattern is
useful to others. Issues and PRs welcome, but I make no promises about
response time. If you fork and make it your own, I'd love to hear how you
adapted it.