Initial commit — memex

A compounding LLM-maintained knowledge wiki.

Synthesis of Andrej Karpathy's persistent-wiki gist and milla-jovovich's
mempalace, with an automation layer on top for conversation mining, URL
harvesting, human-in-the-loop staging, staleness decay, and hygiene.

Includes:
- 11 pipeline scripts (extract, summarize, index, harvest, stage,
  hygiene, maintain, sync, + shared library)
- Full docs: README, SETUP, ARCHITECTURE, DESIGN-RATIONALE, CUSTOMIZE
- Example CLAUDE.md files (wiki schema + global instructions) tuned for
  the three-collection qmd setup
- 171-test pytest suite (cross-platform, runs in ~1.3s)
- MIT licensed
Eric Turner
2026-04-12 21:16:02 -06:00
commit ee54a2f5d4
31 changed files with 10792 additions and 0 deletions

@@ -0,0 +1,161 @@
# Global Claude Code Instructions — Wiki Section
**What this is**: Content to add to your global `~/.claude/CLAUDE.md`
(the user-level instructions Claude Code reads at the start of every
session, regardless of which project you're in). These instructions tell
Claude how to consult the wiki from outside the wiki directory.
**Where to paste it**: Append these sections to `~/.claude/CLAUDE.md`.
Don't overwrite the whole file — this is additive.
---
Copy everything below this line into your global `~/.claude/CLAUDE.md`:
---
## Wake-Up Context
At the start of each session, read `~/projects/wiki/context/wake-up.md`
for a briefing on active projects, recent decisions, and current
concerns. This provides conversation continuity across sessions.
## LLM Wiki — When to Consult It
**Before creating API endpoints, Docker configs, CI pipelines, or making
architectural decisions**, check the wiki at `~/projects/wiki/` for
established patterns and decisions.
The wiki captures the **why** behind patterns — not just what to do, but
the reasoning, constraints, alternatives rejected, and environment-
specific differences. It compounds over time as projects discover new
knowledge.
**When to read from the wiki** (query mode):
- Creating any operational endpoint (/health, /version, /status)
- Setting up secrets management in a new service
- Writing Dockerfiles or docker-compose configurations
- Configuring CI/CD pipelines
- Adding database users or migrations
- Making architectural decisions that should be consistent across projects
**When to write back to the wiki** (ingest mode):
- When you discover something new that should apply across projects
- When a project reveals an exception or edge case to an existing pattern
- When a decision is made that future projects should follow
- When the human explicitly says "add this to the wiki"
Human-initiated wiki writes go directly to the live wiki with
`origin: manual`. Script-initiated writes go through `staging/` first.
See the wiki's own `CLAUDE.md` for the full ingest protocol.
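The routing rule above can be sketched as a small shell helper. This is only an illustration: `wiki_write_dir`, the `human`/`script` initiator labels, and the `WIKI_ROOT` variable are hypothetical names, not part of the wiki's scripts. The full ingest protocol is still the one in the wiki's own `CLAUDE.md`.

```bash
# Hypothetical helper: pick the directory tree a new page should land in.
# WIKI_ROOT is an assumed override; it defaults to the path used in this file.
wiki_write_dir() {
  local initiator="$1"
  local root="${WIKI_ROOT:-$HOME/projects/wiki}"
  case "$initiator" in
    human)  echo "$root" ;;          # live wiki; the page carries origin: manual
    script) echo "$root/staging" ;;  # held in staging/ for human review first
    *)      echo "unknown initiator: $initiator" >&2; return 1 ;;
  esac
}
```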
## LLM Wiki — How to Search It
Use the `qmd` CLI for fast, structured search. DO NOT read `index.md`
for targeted lookups; it is only for full-catalog browsing. DO NOT grep
the wiki manually when `qmd` is available.
The wiki has **three qmd collections**. Pick the right one for the
question:
### Default collection: `wiki` (live content)
For "what's our current pattern for X?" type questions. This is the
default — no `-c` flag needed.
```bash
# Keyword search (fast, BM25)
qmd search "health endpoint version" --json -n 5
# Semantic search (finds conceptually related pages)
qmd vsearch "how should API endpoints be structured" --json -n 5
# Best quality — hybrid BM25 + vector + LLM re-ranking
qmd query "health endpoint" --json -n 5
# Then read the matched page
cat ~/projects/wiki/patterns/health-endpoints.md
```
### Archive collection: `wiki-archive` (stale / superseded)
For "what was our OLD pattern before we changed it?" questions. This is
excluded from default searches; query explicitly with `-c wiki-archive`.
```bash
# "Did we use to use Alpine? Why did we stop?"
qmd search "alpine" -c wiki-archive --json -n 5
# Semantic search across archive
qmd vsearch "container base image considerations" -c wiki-archive --json -n 5
```
When you cite content from an archived page, tell the user it's
archived and may be outdated.
### Conversations collection: `wiki-conversations` (mined session transcripts)
For "when did we discuss this, and what did we decide?" questions. This
is the mined history of your actual Claude Code sessions — decisions,
debugging breakthroughs, design discussions. Excluded from default
searches because transcripts would flood results.
```bash
# "When did we decide to use staging?"
qmd search "staging review workflow" -c wiki-conversations --json -n 5
# "What debugging did we do around Docker networking?"
qmd vsearch "docker network conflicts" -c wiki-conversations --json -n 5
```
Useful for:
- Tracing the reasoning behind a decision back to the session where it
was made
- Finding a solution to a problem you remember solving but didn't write
up
- Context-gathering when returning to a project after time away
### Searching across all collections
Rarely needed, but for "find everything on this topic across time":
```bash
qmd search "topic" -c wiki -c wiki-archive -c wiki-conversations --json -n 10
```
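As a rough sketch, the choice of collection can be reduced to a lookup from the kind of question to the `-c` flags shown above. The function name `qmd_collection_flags` and the scope labels (`current`, `archive`, `history`, `all`) are hypothetical. The helper only prints flags and never runs `qmd` itself.

```bash
# Hypothetical helper: map a question scope to the qmd collection flags
# used in the examples above. It prints the flags; it does not call qmd.
qmd_collection_flags() {
  case "$1" in
    current) printf '%s\n' "" ;;                       # default `wiki` collection, no -c needed
    archive) printf '%s\n' "-c wiki-archive" ;;        # stale / superseded pages
    history) printf '%s\n' "-c wiki-conversations" ;;  # mined session transcripts
    all)     printf '%s\n' "-c wiki -c wiki-archive -c wiki-conversations" ;;
    *)       printf 'unknown scope: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Usage (the unquoted substitution is intentional so the flags split into words):
#   qmd search "alpine" $(qmd_collection_flags archive) --json -n 5
```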
## LLM Wiki — Rules When Citing
1. **Always use `--json`** for structured qmd output. Never try to parse
prose.
2. **Flag `confidence: low` pages** to the user when citing. The content
may be aging out.
3. **Flag `status: pending` pages** (in `staging/`) as unverified when
citing: "Note: this is from a pending wiki page that has not been
human-reviewed yet."
4. **Flag archived pages** as "archived and may be outdated" when citing.
5. **Use `index.md` for browsing only**, not for targeted lookups. `qmd`
is faster and more accurate.
6. **Prefer semantic search for conceptual queries**, keyword search for
specific names/terms.
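Rules 2 through 4 can be checked mechanically before citing a page. The sketch below is one possible approach, not the wiki's actual tooling. It assumes the fields appear as plain `confidence: low` and `status: pending` lines at the start of a line in the page, and that archived pages live under an `archive/` directory. `wiki_cite_caveat` is a hypothetical name.

```bash
# Hypothetical helper: print one caveat per line for a page about to be cited.
# Assumes "status: pending" / "confidence: low" appear as plain lines in the
# page, and that archived pages sit under an archive/ directory.
wiki_cite_caveat() {
  local page="$1"
  case "$page" in
    */archive/*) echo "archived and may be outdated" ;;
  esac
  if grep -q '^status: pending' "$page" 2>/dev/null; then
    echo "pending, not human-reviewed yet"
  fi
  if grep -q '^confidence: low' "$page" 2>/dev/null; then
    echo "low confidence, content may be aging out"
  fi
}
```

A page with none of these markers produces no output, so an empty result means the page can be cited without a caveat.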
## LLM Wiki — Quick Reference
- `~/projects/wiki/CLAUDE.md` — Full wiki schema and operations (read this when working IN the wiki)
- `~/projects/wiki/index.md` — Content catalog (browse the full wiki)
- `~/projects/wiki/patterns/` — How things should be built
- `~/projects/wiki/decisions/` — Why we chose this approach
- `~/projects/wiki/environments/` — Where environments differ
- `~/projects/wiki/concepts/` — Foundational ideas
- `~/projects/wiki/raw/` — Immutable source material (never modify)
- `~/projects/wiki/staging/` — Pending automated content (flag when citing)
- `~/projects/wiki/archive/` — Stale content (flag when citing)
- `~/projects/wiki/conversations/` — Session history (search via `-c wiki-conversations`)
---
**End of additions for `~/.claude/CLAUDE.md`.**
See also the wiki's own `CLAUDE.md` at the wiki root — that file tells
the agent how to *maintain* the wiki when working inside it. This file
(the global one) tells the agent how to *consult* the wiki from anywhere
else.