Initial commit — memex

A compounding LLM-maintained knowledge wiki. Synthesis of Andrej Karpathy's persistent-wiki gist and milla-jovovich's mempalace, with an automation layer on top for conversation mining, URL harvesting, human-in-the-loop staging, staleness decay, and hygiene. Includes:

- 11 pipeline scripts (extract, summarize, index, harvest, stage, hygiene, maintain, sync, + shared library)
- Full docs: README, SETUP, ARCHITECTURE, DESIGN-RATIONALE, CUSTOMIZE
- Example CLAUDE.md files (wiki schema + global instructions) tuned for the three-collection qmd setup
- 171-test pytest suite (cross-platform, runs in ~1.3s)
- MIT licensed
# LLM Wiki — Compounding Knowledge for AI Agents

A persistent, LLM-maintained knowledge base that sits between you and the
sources it was compiled from. Unlike RAG — which re-discovers the same
answers on every query — the wiki **gets richer over time**. Facts get
cross-referenced, contradictions get flagged, stale advice ages out and
gets archived, and new knowledge discovered during a session gets written
back so it's there next time.

The agent reads the wiki at the start of every session and updates it as
new things are learned. The wiki is the long-term memory; the session is
the working memory.

> **Inspiration**: this combines the ideas from
> [Andrej Karpathy's persistent-wiki gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)
> and [milla-jovovich/mempalace](https://github.com/milla-jovovich/mempalace),
> and adds an automation layer on top so the wiki maintains itself.

---

## The problem with stateless RAG

Most people's experience with LLMs and documents looks like RAG: you upload
files, the LLM retrieves chunks at query time, generates an answer, done.
This works — but the LLM is rediscovering knowledge from scratch on every
question. There's no accumulation.

Ask the same subtle question twice and the LLM does all the same work twice.
Ask something that requires synthesizing five documents and the LLM has to
find and piece together the relevant fragments every time. Nothing is built
up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.

Worse, raw sources go stale. URLs rot. Documentation lags. Blog posts
get retracted. If your knowledge base is "the original documents,"
stale advice keeps showing up alongside current advice and there's no way
to know which is which.

## The core idea — a compounding wiki

Instead of retrieving from raw documents at query time, the LLM
**incrementally builds and maintains a persistent wiki** — a structured,
interlinked collection of markdown files that sits between you and the
raw sources.

When a new source shows up (a doc page, a blog post, a CLI `--help`, a
conversation transcript), the LLM doesn't just index it. It reads it,
extracts what's load-bearing, and integrates it into the existing wiki —
updating topic pages, revising summaries, noting where new data
contradicts old claims, strengthening or challenging the evolving
synthesis. The knowledge is compiled once and then *kept current*, not
re-derived on every query.

This is the key difference: **the wiki is a persistent, compounding
artifact.** The cross-references are already there. The contradictions have
already been flagged. The synthesis already reflects everything the LLM
has read. The wiki gets richer with every source added and every question
asked.

You never (or rarely) write the wiki yourself. The LLM writes and maintains
all of it. You're in charge of sourcing, exploration, and asking the right
questions. The LLM does the summarizing, cross-referencing, filing, and
bookkeeping that make a knowledge base actually useful over time.

---

## What this adds beyond Karpathy's gist

Karpathy's gist describes the *idea* — a wiki the agent maintains. This
repo is a working implementation with an automation layer that handles the
lifecycle of knowledge, not just its creation:

| Layer | What it does |
|-------|--------------|
| **Conversation mining** | Extracts Claude Code session transcripts into searchable markdown. Summarizes them via `claude -p` with model routing (haiku for short sessions, sonnet for long ones). Links summaries to wiki pages by topic. |
| **URL harvesting** | Scans summarized conversations for external reference URLs. Fetches them via `trafilatura` → `crawl4ai` → stealth mode cascade. Compiles clean markdown into pending wiki pages. |
| **Human-in-the-loop staging** | Automated content lands in `staging/` with `status: pending`. You review via CLI, interactive prompts, or an in-session Claude review. Nothing automated goes live without approval. |
| **Staleness decay** | Every page tracks `last_verified`. After 6 months without a refresh signal, confidence decays `high → medium`; 9 months → `low`; 12 months → `stale` → auto-archived. |
| **Auto-restoration** | Archived pages that get referenced again in new conversations or wiki updates are automatically restored. |
| **Hygiene** | Daily structural checks (orphans, broken cross-refs, index drift, frontmatter repair). Weekly LLM-powered checks (duplicates, contradictions, missing cross-references). |
| **Orchestrator** | One script chains all of the above into a daily cron-able pipeline. |
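The decay schedule in the table can be sketched as a small function. This is an illustrative assumption, not the repo's actual code: the function name is invented, months are approximated as 30 days, and the return value is an upper bound on confidence given idle time (a page already at `medium` would not be promoted by it).

```python
from datetime import date

def decayed_confidence(last_verified: date, today: date) -> str:
    """Ceiling on a page's confidence given how long it has idled.
    Thresholds mirror the table: 6 months, 9 months, 12 months."""
    months_idle = (today - last_verified).days / 30
    if months_idle >= 12:
        return "stale"   # flagged for auto-archival by the hygiene pass
    if months_idle >= 9:
        return "low"
    if months_idle >= 6:
        return "medium"
    return "high"
```

A page verified in January and untouched until the following August would come back `medium`; anything idle a year or more decays to `stale`.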

The result: you don't have to maintain the wiki. You just *use* it. The
automation handles harvesting new knowledge, retiring old knowledge,
keeping cross-references intact, and flagging ambiguity for review.

---

## Why each part exists

Before implementing anything, the design was worked out interactively
with Claude as a [Signal & Noise analysis of Karpathy's
pattern](https://claude.ai/public/artifacts/0f6e1d9b-3b8c-43df-99d7-3a4328a1620c).
That analysis found seven real weaknesses in the core pattern. This
repo exists because each weakness has a concrete mitigation — and
every component maps directly to one:

| Karpathy-pattern weakness | How this repo answers it |
|---------------------------|--------------------------|
| **Errors persist and compound** | `confidence` field with time-based decay → pages age out visibly. Staging review catches automated content before it goes live. Full-mode hygiene does LLM contradiction detection. |
| **Hard ~50K-token ceiling** | `qmd` (BM25 + vector + re-ranking) set up from day one. Wing/room structural filtering narrows search before retrieval. Archive collection is excluded from default search. |
| **Manual cross-checking returns** | Every wiki claim traces back to immutable `raw/harvested/*.md` with SHA-256 hash. Staging review IS the cross-check. `compilation_notes` field makes review fast. |
| **Knowledge staleness** (the #1 failure mode in community data) | Daily + weekly cron removes "I forgot" as a failure mode. `last_verified` auto-refreshes from conversation references. Decayed pages auto-archive. |
| **Cognitive outsourcing risk** | Staging review forces engagement with every automated page. `qmd query` makes retrieval an active exploration. The ~200-token wake-up briefing is read by the human too. |
| **Weaker semantic retrieval** | `qmd` hybrid (BM25 + vector). Full-mode hygiene adds missing cross-references. Structural metadata (wings, rooms) complements semantic search. |
| **No access control** | Git sync with `merge=union` markdown handling. Network-boundary ACL via Tailscale is the suggested path. *This one is a residual trade-off — see [DESIGN-RATIONALE.md](docs/DESIGN-RATIONALE.md).* |

The short version: Karpathy published the idea, the community found the
holes, and this repo is the automation layer that plugs the holes.
See **[`docs/DESIGN-RATIONALE.md`](docs/DESIGN-RATIONALE.md)** for the
full argument with honest residual trade-offs and what this repo
explicitly does NOT solve.

---

## Compounding loop

```
┌─────────────────────┐
│ Claude Code         │
│ sessions (.jsonl)   │
└──────────┬──────────┘
           │  extract-sessions.py (hourly, no LLM)
           ▼
┌─────────────────────┐
│ conversations/      │  markdown transcripts
│ <project>/*.md      │  (status: extracted)
└──────────┬──────────┘
           │  summarize-conversations.py --claude (daily)
           ▼
┌─────────────────────┐
│ conversations/      │  summaries with related: wiki links
│ <project>/*.md      │  (status: summarized)
└──────────┬──────────┘
           │  wiki-harvest.py (daily)
           ▼
┌─────────────────────┐
│ raw/harvested/      │  fetched URL content
│ *.md                │  (immutable source material)
└──────────┬──────────┘
           │  claude -p compile step
           ▼
┌─────────────────────┐
│ staging/<type>/     │  pending pages
│ *.md                │  (status: pending, origin: automated)
└──────────┬──────────┘
           │  human review (wiki-staging.py --review)
           ▼
┌─────────────────────┐
│ patterns/           │  LIVE wiki
│ decisions/          │  (origin: manual or promoted-from-automated)
│ concepts/           │
│ environments/       │
└──────────┬──────────┘
           │  wiki-hygiene.py (daily quick / weekly full)
           │   - refresh last_verified from new conversations
           │   - decay confidence on idle pages
           │   - auto-restore archived pages referenced again
           │   - fuzzy-fix broken cross-references
           ▼
┌─────────────────────┐
│ archive/<type>/     │  stale/superseded content
│ *.md                │  (excluded from default search)
└─────────────────────┘
```

Every arrow is automated. The only human step is staging review — and
that's quick because the AI compilation step already wrote the page;
you just approve or reject.

---

## Quick start — two paths

### Path A: just the idea (Karpathy-style)

Open a Claude Code session in an empty directory and tell it:

```
I want you to start maintaining a persistent knowledge wiki for me.
Create a directory structure with patterns/, decisions/, concepts/, and
environments/ subdirectories. Each page should have YAML frontmatter with
title, type, confidence, sources, related, last_compiled, and last_verified
fields. Create an index.md at the root that catalogs every page.

From now on, when I share a source (a doc page, a CLI --help, a conversation
I had), read it, extract what's load-bearing, and integrate it into the
wiki. Update existing pages when new knowledge refines them. Flag
contradictions between pages. Create new pages when topics aren't
covered yet. Update index.md every time you create or remove a page.

When I ask a question, read the relevant wiki pages first, then answer.
If you rely on a wiki page with `confidence: low`, flag that to me.
```

That's the whole idea. The agent will build you a growing markdown tree
that compounds over time. This is the minimum viable version.
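A page carrying the frontmatter fields named in the prompt might look like this. The page topic and all field values here are invented for illustration; the exact value formats the agent settles on may differ:

```yaml
---
title: Retry with exponential backoff
type: pattern
confidence: high
sources:
  - raw/harvested/2024-05-10-backoff-blog.md
related:
  - concepts/idempotency.md
last_compiled: 2024-05-10
last_verified: 2024-06-01
---
```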

### Path B: the full automation (this repo)

```bash
git clone <this-repo> ~/projects/wiki
cd ~/projects/wiki

# Install the Python extraction tools
pipx install trafilatura
pipx install crawl4ai && crawl4ai-setup

# Install qmd for full-text + vector search
npm install -g @tobilu/qmd

# Configure qmd (3 collections — see docs/SETUP.md for the YAML)
# Edit scripts/extract-sessions.py with your project codes
# Edit scripts/update-conversation-index.py with matching display names

# Copy the example CLAUDE.md files (wiki schema + global instructions)
cp docs/examples/wiki-CLAUDE.md CLAUDE.md
cat docs/examples/global-CLAUDE.md >> ~/.claude/CLAUDE.md
# edit both for your conventions

# Run the full pipeline once, manually
bash scripts/mine-conversations.sh --extract-only     # Fast, no LLM
python3 scripts/summarize-conversations.py --claude   # Classify + summarize
python3 scripts/update-conversation-index.py --reindex

# Then maintain
bash scripts/wiki-maintain.sh                         # Daily hygiene
bash scripts/wiki-maintain.sh --hygiene-only --full   # Weekly deep pass
```

See [`docs/SETUP.md`](docs/SETUP.md) for complete setup including qmd
configuration (three collections: `wiki`, `wiki-archive`,
`wiki-conversations`), optional cron schedules, git sync, and the
post-merge hook. See [`docs/examples/`](docs/examples/) for starter
`CLAUDE.md` files (wiki schema + global instructions) with explicit
guidance on using the three qmd collections.
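Once the manual run looks right, the cadences mentioned throughout (hourly extraction, daily maintenance, weekly deep pass) map onto a small crontab. The times below are arbitrary placeholders; see `docs/SETUP.md` for the recommended schedules:

```
# Illustrative crontab; adjust paths and times to taste
0 * * * *   bash ~/projects/wiki/scripts/mine-conversations.sh --extract-only
30 3 * * *  bash ~/projects/wiki/scripts/wiki-maintain.sh
30 4 * * 0  bash ~/projects/wiki/scripts/wiki-maintain.sh --hygiene-only --full
```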

---

## Directory layout after setup

```
wiki/
├── CLAUDE.md              ← Schema + instructions the agent reads every session
├── index.md               ← Content catalog (the agent reads this first)
├── patterns/              ← HOW things should be built (LIVE)
├── decisions/             ← WHY we chose this approach (LIVE)
├── concepts/              ← WHAT the foundational ideas are (LIVE)
├── environments/          ← WHERE implementations differ (LIVE)
├── staging/               ← PENDING automated content awaiting review
│   ├── index.md
│   └── <type>/
├── archive/               ← STALE / superseded (excluded from search)
│   ├── index.md
│   └── <type>/
├── raw/                   ← Immutable source material (never modified)
│   ├── <topic>/
│   └── harvested/         ← URL harvester output
├── conversations/         ← Mined Claude Code session transcripts
│   ├── index.md
│   └── <project>/
├── context/               ← Auto-updated AI session briefing
│   ├── wake-up.md         ← Loaded at the start of every session
│   └── active-concerns.md ← Current blockers and focus areas
├── reports/               ← Hygiene operation logs
├── scripts/               ← The automation pipeline
├── tests/                 ← Pytest suite (171 tests)
├── .harvest-state.json    ← URL dedup state (committed, synced)
├── .hygiene-state.json    ← Content hashes, deferred issues (committed, synced)
└── .mine-state.json       ← Conversation extraction offsets (gitignored, per-machine)
```

---

## What's Claude-specific (and what isn't)

This repo is built around **Claude Code** as the agent. Specifically:

1. **Session mining** expects `~/.claude/projects/<hashed-path>/*.jsonl`
   files written by the Claude Code CLI. Other agents won't produce these.
2. **Summarization** uses `claude -p` (the Claude Code CLI's one-shot mode)
   with haiku/sonnet routing by conversation length. Other LLM CLIs would
   need a different wrapper.
3. **URL compilation** uses `claude -p` to turn raw harvested content into
   a wiki page with proper frontmatter.
4. **The agent itself** (the thing that reads `CLAUDE.md` and maintains the
   wiki conversationally) is Claude Code. Any agent that reads markdown
   and can write files could do this job — `CLAUDE.md` is just a text
   file telling the agent what the wiki's conventions are.
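The length-based model routing in item 2 could be as simple as the sketch below. The threshold, function name, and use of character count (rather than, say, token count) are illustrative assumptions, not the script's actual logic:

```python
def pick_model(transcript: str, long_threshold_chars: int = 40_000) -> str:
    """Route short sessions to a cheap model and long ones to a stronger
    one; the result would be passed along to the `claude -p` invocation."""
    return "sonnet" if len(transcript) >= long_threshold_chars else "haiku"
```

Swapping in a different agent means replacing this routing with whatever model-selection knob your LLM CLI exposes.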

**What's NOT Claude-specific**:

- The wiki schema (frontmatter, directory layout, lifecycle states)
- The staleness decay model and archive/restore semantics
- The human-in-the-loop staging workflow
- The hygiene checks (orphans, broken cross-refs, duplicates)
- The `trafilatura` + `crawl4ai` URL fetching
- The qmd search integration
- The git-based cross-machine sync

If you use a different agent, you replace parts **1-4** above with
equivalents for your agent. The other 80% of the repo is agent-agnostic.
See [`docs/CUSTOMIZE.md`](docs/CUSTOMIZE.md) for concrete adaptation
recipes.

---

## Architecture at a glance

Eleven scripts organized in three layers:

**Mining layer** (ingests conversations):
- `extract-sessions.py` — Parse Claude Code JSONL → markdown transcripts
- `summarize-conversations.py` — Classify + summarize via `claude -p`
- `update-conversation-index.py` — Regenerate conversation index + wake-up context

**Automation layer** (maintains the wiki):
- `wiki_lib.py` — Shared frontmatter parser, `WikiPage` dataclass, constants
- `wiki-harvest.py` — URL classification + fetch cascade + compile to staging
- `wiki-staging.py` — Human review (list/promote/reject/review/sync)
- `wiki-hygiene.py` — Quick + full hygiene checks, archival, auto-restore
- `wiki-maintain.sh` — Top-level orchestrator chaining harvest + hygiene
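The shared frontmatter parsing in `wiki_lib.py` presumably resembles this minimal sketch. The function name and return shape are assumptions (the real code builds a `WikiPage` dataclass), and this version only handles flat `key: value` lines, not nested YAML:

```python
def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split a wiki page into (frontmatter dict, markdown body)."""
    if not text.startswith("---\n"):
        return {}, text                      # no frontmatter block
    header, _, body = text[4:].partition("\n---\n")
    meta = {}
    for line in header.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta, body.lstrip("\n")
```

Centralizing this in one module is what lets harvest, staging, and hygiene all agree on how `confidence`, `status`, and `last_verified` are read and rewritten.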

**Sync layer**:
- `wiki-sync.sh` — Git commit/pull/push with merge-union markdown handling
- `mine-conversations.sh` — Mining orchestrator
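The merge-union handling refers to git's built-in `union` merge driver, enabled per path in `.gitattributes`. On conflicting hunks it keeps both sides' lines instead of emitting conflict markers, which suits append-heavy wiki pages, though it can occasionally interleave lines oddly:

```
*.md merge=union
```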

See [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) for a deeper tour.

---

## Why markdown, not a real database?

Markdown files are:

- **Human-readable without any tooling** — you can browse in Obsidian, VS Code, or `cat`
- **Git-native** — full history, branching, rollback, cross-machine sync for free
- **Agent-friendly** — every LLM was trained on markdown, so reading and writing it is free
- **Durable** — no schema migrations, no database corruption, no vendor lock-in
- **Interoperable** — Obsidian graph view, `grep`, `qmd`, `ripgrep`, any editor

A SQLite file with the same content would be faster to query but harder
to browse, harder to merge, harder to audit, and fundamentally less
*collaborative* between you and the agent. Markdown is to knowledge
management what Postgres is to transactions.

---

## Testing

Full pytest suite in `tests/` — 171 tests across all scripts, runs in
**~1.3 seconds**, no network or LLM calls needed, works on macOS and
Linux/WSL.

```bash
cd tests && python3 -m pytest
# or
bash tests/run.sh
```

The test suite uses a disposable `tmp_wiki` fixture so no test ever
touches your real wiki.
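The isolation idea behind the `tmp_wiki` fixture can be sketched as a plain helper. The names and the exact directory set are assumptions based on the layout above; the suite's real fixture presumably wraps the same idea in pytest's `tmp_path`:

```python
import tempfile
from pathlib import Path

LIVE_DIRS = ["patterns", "decisions", "concepts", "environments",
             "staging", "archive", "raw", "conversations"]

def make_tmp_wiki() -> Path:
    """Create a disposable wiki skeleton in a fresh temp directory,
    so tests mutate it freely without touching the real wiki."""
    root = Path(tempfile.mkdtemp(prefix="tmp_wiki_"))
    for name in LIVE_DIRS:
        (root / name).mkdir()
    (root / "index.md").write_text("# Index\n")
    return root
```

Because every test gets its own skeleton, the suite needs no network, no LLM, and no cleanup ordering between tests.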

---

## Credits and inspiration

This repo is a synthesis of two existing ideas with an automation layer
on top. It would not exist without either of them.

**Core pattern — [Andrej Karpathy — "Agent-Maintained Persistent Wiki" gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)**
The foundational idea of a compounding LLM-maintained wiki that moves
synthesis from query-time (RAG) to ingest-time. This repo is an
implementation of Karpathy's pattern with the community-identified
failure modes plugged.

**Structural memory taxonomy — [milla-jovovich/mempalace](https://github.com/milla-jovovich/mempalace)**
The wing/room/hall/closet/drawer/tunnel concepts that turn a flat
corpus into something you can navigate without reading everything. See
[`ARCHITECTURE.md#borrowed-concepts`](docs/ARCHITECTURE.md#borrowed-concepts)
for the explicit mapping of MemPalace terms to this repo's
implementation.

**Search layer — [qmd](https://github.com/tobi/qmd)** by Tobi Lütke
(Shopify CEO). Local BM25 + vector + LLM re-ranking on markdown files.
Chosen over ChromaDB because it uses the same storage format as the
wiki — one index to maintain, not two. Explicitly recommended by
Karpathy as well.

**URL extraction stack** — [trafilatura](https://github.com/adbar/trafilatura)
for fast static-page extraction and [crawl4ai](https://github.com/unclecode/crawl4ai)
for JS-rendered and anti-bot cases. The two-tool cascade handles
essentially any web content without needing a full browser stack for
simple pages.

**The agent** — [Claude Code](https://claude.com/claude-code) by Anthropic.
The repo is Claude-specific (see the section above for what that means
and how to adapt for other agents).

**Design process** — this repo was designed interactively with Claude
as a structured Signal & Noise analysis before any code was written.
The interactive design artifact is here:
[The LLM Wiki — Karpathy's Pattern — Signal & Noise](https://claude.ai/public/artifacts/0f6e1d9b-3b8c-43df-99d7-3a4328a1620c).
That artifact walks through the seven real strengths and seven real
weaknesses of the core pattern, then works through concrete mitigations
for each weakness. Every component in this repo maps back to a specific
mitigation identified there.
[`docs/DESIGN-RATIONALE.md`](docs/DESIGN-RATIONALE.md) is the condensed
version of that analysis as it applies to this implementation.

---

## License

MIT — see [`LICENSE`](LICENSE).

## Contributing

This is a personal project that I'm making public in case the pattern is
useful to others. Issues and PRs welcome, but I make no promises about
response time. If you fork and make it your own, I'd love to hear how you
adapted it.