Initial commit — memex

A compounding LLM-maintained knowledge wiki. Synthesis of Andrej Karpathy's
persistent-wiki gist and milla-jovovich's mempalace, with an automation layer
on top for conversation mining, URL harvesting, human-in-the-loop staging,
staleness decay, and hygiene.

Includes:
- 11 pipeline scripts (extract, summarize, index, harvest, stage, hygiene,
  maintain, sync, + shared library)
- Full docs: README, SETUP, ARCHITECTURE, DESIGN-RATIONALE, CUSTOMIZE
- Example CLAUDE.md files (wiki schema + global instructions) tuned for the
  three-collection qmd setup
- 171-test pytest suite (cross-platform, runs in ~1.3s)
- MIT licensed
2026-04-12 21:16:02 -06:00
commit ee54a2f5d4
31 changed files with 10792 additions and 0 deletions
--- a/docs/examples/global-CLAUDE.md
+++ b/docs/examples/global-CLAUDE.md
@@ -0,0 +1,161 @@
+# Global Claude Code Instructions — Wiki Section
+
+**What this is**: Content to add to your global `~/.claude/CLAUDE.md`
+(the user-level instructions Claude Code reads at the start of every
+session, regardless of which project you're in). These instructions tell
+Claude how to consult the wiki from outside the wiki directory.
+
+**Where to paste it**: Append these sections to `~/.claude/CLAUDE.md`.
+Don't overwrite the whole file — this is additive.
+
+---
+
+Copy everything below this line into your global `~/.claude/CLAUDE.md`:
+
+---
+
+## Wake-Up Context
+
+At the start of each session, read `~/projects/wiki/context/wake-up.md`
+for a briefing on active projects, recent decisions, and current
+concerns. This provides conversation continuity across sessions.
+
+## LLM Wiki — When to Consult It
+
+**Before creating API endpoints, Docker configs, CI pipelines, or making
+architectural decisions**, check the wiki at `~/projects/wiki/` for
+established patterns and decisions.
+
+The wiki captures the **why** behind patterns — not just what to do, but
+the reasoning, constraints, alternatives rejected, and
+environment-specific differences. It compounds over time as projects
+discover new knowledge.
+
+**When to read from the wiki** (query mode):
+- Creating any operational endpoint (/health, /version, /status)
+- Setting up secrets management in a new service
+- Writing Dockerfiles or docker-compose configurations
+- Configuring CI/CD pipelines
+- Adding database users or migrations
+- Making architectural decisions that should be consistent across projects
+
+**When to write back to the wiki** (ingest mode):
+- When you discover something new that should apply across projects
+- When a project reveals an exception or edge case to an existing pattern
+- When a decision is made that future projects should follow
+- When the human explicitly says "add this to the wiki"
+
+Human-initiated wiki writes go directly to the live wiki with
+`origin: manual`. Script-initiated writes go through `staging/` first.
+See the wiki's own `CLAUDE.md` for the full ingest protocol.
+
+## LLM Wiki — How to Search It
+
+Use the `qmd` CLI for fast, structured search. DO NOT read `index.md`
+for targeted lookups; it's only for full-catalog browsing. DO NOT grep
+the wiki manually when `qmd` is available.
+
+The wiki has **three qmd collections**. Pick the right one for the
+question:
+
+### Default collection: `wiki` (live content)
+
+For "what's our current pattern for X?" type questions. This is the
+default — no `-c` flag needed.
+
+```bash
+# Keyword search (fast, BM25)
+qmd search "health endpoint version" --json -n 5
+
+# Semantic search (finds conceptually related pages)
+qmd vsearch "how should API endpoints be structured" --json -n 5
+
+# Best quality — hybrid BM25 + vector + LLM re-ranking
+qmd query "health endpoint" --json -n 5
+
+# Then read the matched page
+cat ~/projects/wiki/patterns/health-endpoints.md
+```
+
+### Archive collection: `wiki-archive` (stale / superseded)
+
+For "what was our OLD pattern before we changed it?" questions. This is
+excluded from default searches; query explicitly with `-c wiki-archive`.
+
+```bash
+# "Did we use to use Alpine? Why did we stop?"
+qmd search "alpine" -c wiki-archive --json -n 5
+
+# Semantic search across archive
+qmd vsearch "container base image considerations" -c wiki-archive --json -n 5
+```
+
+When you cite content from an archived page, tell the user it's
+archived and may be outdated.
+
+### Conversations collection: `wiki-conversations` (mined session transcripts)
+
+For "when did we discuss this, and what did we decide?" questions. This
+is the mined history of your actual Claude Code sessions — decisions,
+debugging breakthroughs, design discussions. Excluded from default
+searches because transcripts would flood results.
+
+```bash
+# "When did we decide to use staging?"
+qmd search "staging review workflow" -c wiki-conversations --json -n 5
+
+# "What debugging did we do around Docker networking?"
+qmd vsearch "docker network conflicts" -c wiki-conversations --json -n 5
+```
+
+Useful for:
+- Tracing the reasoning behind a decision back to the session where it
+  was made
+- Finding a solution to a problem you remember solving but didn't write
+  up
+- Context-gathering when returning to a project after time away
+
+### Searching across all collections
+
+Rarely needed, but for "find everything on this topic across time":
+
+```bash
+qmd search "topic" -c wiki -c wiki-archive -c wiki-conversations --json -n 10
+```
+
+## LLM Wiki — Rules When Citing
+
+1. **Always use `--json`** for structured qmd output. Never try to parse
+   prose.
+2. **Flag `confidence: low` pages** to the user when citing. The content
+   may be aging out.
+3. **Flag `status: pending` pages** (in `staging/`) as unverified when
+   citing: "Note: this is from a pending wiki page that has not been
+   human-reviewed yet."
+4. **Flag archived pages** as "archived and may be outdated" when citing.
+5. **Use `index.md` for browsing only**, not for targeted lookups. `qmd`
+   is faster and more accurate.
+6. **Prefer semantic search for conceptual queries**, keyword search for
+   specific names/terms.
+
+## LLM Wiki — Quick Reference
+
+- `~/projects/wiki/CLAUDE.md` — Full wiki schema and operations (read this when working IN the wiki)
+- `~/projects/wiki/index.md` — Content catalog (browse the full wiki)
+- `~/projects/wiki/patterns/` — How things should be built
+- `~/projects/wiki/decisions/` — Why we chose this approach
+- `~/projects/wiki/environments/` — Where environments differ
+- `~/projects/wiki/concepts/` — Foundational ideas
+- `~/projects/wiki/raw/` — Immutable source material (never modify)
+- `~/projects/wiki/staging/` — Pending automated content (flag when citing)
+- `~/projects/wiki/archive/` — Stale content (flag when citing)
+- `~/projects/wiki/conversations/` — Session history (search via `-c wiki-conversations`)
+
+---
+
+**End of additions for `~/.claude/CLAUDE.md`.**
+
+See also the wiki's own `CLAUDE.md` at the wiki root — that file tells
+the agent how to *maintain* the wiki when working inside it. This file
+(the global one) tells the agent how to *consult* the wiki from anywhere
+else.
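
Note on the file above: the collection choices described under "How to Search It" can be sketched as a small shell helper. This is an illustrative sketch, not part of the commit. `wiki_search_cmd` and its `scope` and `mode` arguments are hypothetical names. The function only prints a `qmd` command, built from the subcommands and flags shown in the file, rather than running it, so it works without `qmd` installed.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of memex): maps a question type to the qmd
# command the instructions recommend, and prints it instead of executing it.
# Assumes the query contains no double quotes.

wiki_search_cmd() {
  local scope="$1" mode="$2" query="$3" limit="${4:-5}"
  local subcmd collections

  # Keyword search for specific names/terms, semantic search for concepts.
  case "$mode" in
    keyword)  subcmd="search" ;;
    semantic) subcmd="vsearch" ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac

  # The default collection needs no -c flag; the others are opt-in.
  case "$scope" in
    current) collections="" ;;
    archive) collections=" -c wiki-archive" ;;
    history) collections=" -c wiki-conversations" ;;
    all)
      # The file only shows multi-collection lookups with `search`, so this
      # sketch refuses other combinations rather than guessing.
      if [ "$mode" != "keyword" ]; then
        echo "scope 'all' is only sketched for keyword search" >&2
        return 2
      fi
      collections=" -c wiki -c wiki-archive -c wiki-conversations" ;;
    *) echo "unknown scope: $scope" >&2; return 2 ;;
  esac

  printf 'qmd %s "%s"%s --json -n %s\n' "$subcmd" "$query" "$collections" "$limit"
}
```

For example, `wiki_search_cmd archive keyword "alpine"` prints `qmd search "alpine" -c wiki-archive --json -n 5`, which matches the archive example in the file. Printing the command keeps the sketch side-effect free; piping the output to `sh` would run it where `qmd` is installed.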