A compounding LLM-maintained knowledge wiki. A synthesis of Andrej Karpathy's persistent-wiki gist and milla-jovovich's mempalace, with an automation layer on top for conversation mining, URL harvesting, human-in-the-loop staging, staleness decay, and hygiene. Includes:
- 11 pipeline scripts (extract, summarize, index, harvest, stage, hygiene, maintain, sync, + shared library)
- Full docs: README, SETUP, ARCHITECTURE, DESIGN-RATIONALE, CUSTOMIZE
- Example CLAUDE.md files (wiki schema + global instructions) tuned for the three-collection qmd setup
- 171-test pytest suite (cross-platform, runs in ~1.3s)
- MIT licensed
LLM Wiki — Schema
This is a persistent, compounding knowledge base maintained by LLM agents.
It captures the why behind patterns, decisions, and implementations —
not just the what. Copy this file to the root of your wiki directory
(e.g. ~/projects/wiki/CLAUDE.md) and edit for your own conventions.
This is an example `CLAUDE.md` for the wiki root. The agent reads this at the start of every session when working inside the wiki. It's the "constitution" that tells the agent how to maintain the knowledge base.
How This Wiki Works
You are the maintainer. When working in this wiki directory, you read raw sources, compile knowledge into wiki pages, maintain cross-references, and keep everything consistent.
You are a consumer. When working in any other project directory, you read wiki pages to inform your work — applying established patterns, respecting decisions, and understanding context.
Directory Structure
wiki/
├── CLAUDE.md ← You are here (schema)
├── index.md ← Content catalog — read this FIRST on any query
├── log.md ← Chronological record of all operations
│
├── patterns/ ← LIVE: HOW things should be built (with WHY)
├── decisions/ ← LIVE: WHY we chose this approach (with alternatives rejected)
├── environments/ ← LIVE: WHERE implementations differ
├── concepts/ ← LIVE: WHAT the foundational ideas are
│
├── raw/ ← Immutable source material (NEVER modify)
│ └── harvested/ ← URL harvester output
│
├── staging/ ← PENDING automated content awaiting human review
│ ├── index.md
│ └── <type>/
│
├── archive/ ← STALE / superseded (excluded from default search)
│ ├── index.md
│ └── <type>/
│
├── conversations/ ← Mined Claude Code session transcripts
│ ├── index.md
│ └── <wing>/ ← per-project or per-person (MemPalace "wing")
│
├── context/ ← Auto-updated AI session briefing
│ ├── wake-up.md ← Loaded at the start of every session
│ └── active-concerns.md
│
├── reports/ ← Hygiene operation logs
└── scripts/ ← The automation pipeline
Core rule — automated vs manual content:
| Origin | Destination | Status |
|---|---|---|
| Script-generated (harvester, hygiene, URL compile) | staging/ | pending |
| Human-initiated ("add this to the wiki" in a Claude session) | Live wiki (patterns/, etc.) | verified |
| Human-reviewed from staging | Live wiki (promoted) | verified |
Managed via scripts/wiki-staging.py --list / --promote / --reject / --review.
Page Conventions
Frontmatter (required on all wiki pages)
---
title: Page Title
type: pattern | decision | environment | concept
confidence: high | medium | low
origin: manual | automated # How the page entered the wiki
sources: [list of raw/ files this was compiled from]
related: [list of other wiki pages this connects to]
last_compiled: YYYY-MM-DD # Date this page was last (re)compiled from sources
last_verified: YYYY-MM-DD # Date the content was last confirmed accurate
---
origin values:
- `manual` — Created by a human in a Claude session. Goes directly to the live wiki, no staging.
- `automated` — Created by a script (harvester, hygiene, etc.). Must pass through `staging/` for human review before promotion.
Confidence decay: Pages with no refresh signal for 6 months decay high → medium; 9 months → low; 12 months → stale (auto-archived). last_verified drives decay, not last_compiled. See scripts/wiki-hygiene.py and archive/index.md.
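The decay schedule can be sketched as follows. This is a hypothetical simplification (months approximated as 30 days); the real rules live in scripts/wiki-hygiene.py:

```python
from datetime import date

# Hypothetical sketch of the 6/9/12-month decay schedule driven by
# last_verified. Returns the highest confidence the page's age allows;
# the real implementation is in scripts/wiki-hygiene.py.
def decayed_confidence(last_verified: date, today: date) -> str:
    months = (today - last_verified).days / 30
    if months >= 12:
        return "stale"    # auto-archived by hygiene
    if months >= 9:
        return "low"
    if months >= 6:
        return "medium"
    return "high"
```

A page verified seven months ago would decay to medium; one untouched for over a year would be marked stale and archived.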
Staging Frontmatter (pages in staging/<type>/)
Automated-origin pages get additional staging metadata that is stripped on promotion:
---
title: ...
type: ...
origin: automated
status: pending # Awaiting review
staged_date: YYYY-MM-DD # When the automated script staged this
staged_by: wiki-harvest # Which script staged it (wiki-harvest, wiki-hygiene, ...)
target_path: patterns/foo.md # Where it should land on promotion
modifies: patterns/bar.md # Only present when this is an update to an existing live page
compilation_notes: "..." # AI's explanation of what it did and why
harvest_source: https://... # Only present for URL-harvested content
sources: [...]
related: [...]
last_verified: YYYY-MM-DD
---
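Promotion strips the staging-only keys and keeps the live-page frontmatter intact. A minimal sketch, assuming the key set listed above (the real implementation is scripts/wiki-staging.py):

```python
# Hypothetical sketch of the promotion step: staging-only metadata is
# dropped and the page keeps its live frontmatter. The real logic
# lives in scripts/wiki-staging.py.
STAGING_ONLY_KEYS = {
    "status", "staged_date", "staged_by", "target_path",
    "modifies", "compilation_notes", "harvest_source",
}

def promote_frontmatter(fm: dict) -> dict:
    return {k: v for k, v in fm.items() if k not in STAGING_ONLY_KEYS}
```

Note that `origin: automated` survives promotion, so the wiki retains a record of how the page entered.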
Pattern Pages (patterns/)
Structure:
- What — One-paragraph description of the pattern
- Why — The reasoning, constraints, and goals that led to this pattern
- Canonical Example — A concrete implementation (link to raw/ source or inline)
- Structure — The specification: fields, endpoints, formats, conventions
- When to Deviate — Known exceptions or conditions where the pattern doesn't apply
- History — Key changes and the decisions that drove them
Decision Pages (decisions/)
Structure:
- Decision — One sentence: what we decided
- Context — What problem or constraint prompted this
- Options Considered — What alternatives existed (with pros/cons)
- Rationale — Why this option won
- Consequences — What this decision enables and constrains
- Status — Active | Superseded by [link] | Under Review
Environment Pages (environments/)
Structure:
- Overview — What this environment is (platform, CI, infra)
- Key Differences — Table comparing environments for this domain
- Implementation Details — Environment-specific configs, credentials, deploy method
- Gotchas — Things that have bitten us
Concept Pages (concepts/)
Structure:
- Definition — What this concept means in our context
- Why It Matters — How this concept shapes our decisions
- Related Patterns — Links to patterns that implement this concept
- Related Decisions — Links to decisions driven by this concept
Operations
Ingest (adding new knowledge)
When a new raw source is added or you learn something new:
- Read the source material thoroughly
- Identify which existing wiki pages need updating
- Identify if new pages are needed
- Update/create pages following the conventions above
- Update cross-references (`related:` frontmatter) on all affected pages
- Update `index.md` with any new pages
- Set `last_verified:` to today's date on every page you create or update
- Set `origin: manual` on any page you create when a human directed you to
- Append to `log.md`: `## [YYYY-MM-DD] ingest | Source Description`
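The log-append step might look like this hypothetical helper (adjust to your own log.md conventions):

```python
from datetime import date
from pathlib import Path

# Hypothetical helper for the "append to log.md" step. The entry
# format follows the schema: ## [YYYY-MM-DD] operation | description
def append_log(wiki_root: Path, operation: str, description: str) -> str:
    entry = f"\n## [{date.today().isoformat()}] {operation} | {description}\n"
    with open(wiki_root / "log.md", "a", encoding="utf-8") as f:
        f.write(entry)
    return entry
```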
Where to write:
- Human-initiated ("add this to the wiki", "create a pattern for X") — write directly to the live directory (`patterns/`, `decisions/`, etc.) with `origin: manual`. The human's instruction IS the approval.
- Script-initiated (harvest, auto-compile, hygiene auto-fix) — write to `staging/<type>/` with `origin: automated`, `status: pending`, plus `staged_date`, `staged_by`, `target_path`, and `compilation_notes`. For updates to existing live pages, also set `modifies: <live-page-path>`.
Query (answering questions from other projects)
When working in another project and consulting the wiki:
- Use `qmd` to search first (see Search Strategy below). Read `index.md` only when browsing the full catalog.
- Read the specific pattern/decision/concept pages
- Apply the knowledge, respecting environment differences
- If a page's `confidence` is `low`, flag that to the user — the content may be aging out
- If a page has `status: pending` (it's in `staging/`), flag that to the user: "Note: this is from a pending wiki page in staging, not yet verified." Use the content but make the uncertainty visible.
- If you find yourself consulting a page under `archive/`, mention it's archived and may be outdated
- If your work reveals new knowledge, file it back — update the wiki (and bump `last_verified`)
Search Strategy — which qmd collection to use
The wiki has three qmd collections. Pick the right one for the question:
| Question type | Collection | Command |
|---|---|---|
| "What's our current pattern for X?" | wiki (default) | qmd search "X" --json -n 5 |
| "What's the rationale behind decision Y?" | wiki (default) | qmd vsearch "why did we choose Y" --json -n 5 |
| "What was our OLD approach before we changed it?" | wiki-archive | qmd search "X" -c wiki-archive --json -n 5 |
| "When did we discuss this, and what did we decide?" | wiki-conversations | qmd search "X" -c wiki-conversations --json -n 5 |
| "Find everything across time" | all three | qmd search "X" -c wiki -c wiki-archive -c wiki-conversations --json -n 10 |
Rules of thumb:
- Use `qmd search` for keyword matches (BM25, fast)
- Use `qmd vsearch` for conceptual / semantically-similar queries (vector)
- Use `qmd query` for the best quality — hybrid BM25 + vector + LLM re-ranking
- Always use `--json` for structured output
- Read individual matched pages with `cat` or your file tool after finding them
Mine (conversation extraction and summarization)
Four-phase pipeline that extracts sessions into searchable conversation pages:
1. Extract (`extract-sessions.py`) — Parse session files into markdown transcripts
2. Summarize (`summarize-conversations.py --claude`) — Classify + summarize via `claude -p` with haiku/sonnet routing
3. Index (`update-conversation-index.py --reindex`) — Regenerate conversation index + `context/wake-up.md`
4. Harvest (`wiki-harvest.py`) — Scan summarized conversations for external reference URLs and compile them into wiki pages
Full pipeline via mine-conversations.sh. Extraction is incremental (tracks byte offsets). Summarization is incremental (tracks message count).
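Byte-offset tracking can be sketched as below. The state-file format here is hypothetical; the real tracking lives inside extract-sessions.py:

```python
import json
from pathlib import Path

# Hypothetical sketch of byte-offset tracking for incremental
# extraction: remember how far into each session file the last run
# read, and only return lines added since then.
def read_new_lines(session_file: Path, state_file: Path) -> list[str]:
    offsets = json.loads(state_file.read_text()) if state_file.exists() else {}
    start = offsets.get(str(session_file), 0)
    with open(session_file, "rb") as f:
        f.seek(start)               # skip everything already processed
        chunk = f.read()
    offsets[str(session_file)] = start + len(chunk)
    state_file.write_text(json.dumps(offsets))
    return chunk.decode("utf-8").splitlines()
```

Re-running after new messages are appended yields only the new lines; re-running with no changes yields nothing.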
Maintain (wiki health automation)
scripts/wiki-maintain.sh chains harvest + hygiene + qmd reindex:
bash scripts/wiki-maintain.sh # Harvest + quick hygiene + reindex
bash scripts/wiki-maintain.sh --full # Harvest + full hygiene (LLM) + reindex
bash scripts/wiki-maintain.sh --harvest-only # Harvest only
bash scripts/wiki-maintain.sh --hygiene-only # Hygiene only
bash scripts/wiki-maintain.sh --dry-run # Show what would run
Lint (periodic health check)
Automated via scripts/wiki-hygiene.py. Two tiers:
Quick mode (no LLM, run daily — python3 scripts/wiki-hygiene.py):
- Backfill missing `last_verified`
- Refresh `last_verified` from conversation `related:` references
- Auto-restore archived pages that are referenced again
- Repair frontmatter (missing required fields, invalid values)
- Confidence decay per 6/9/12-month thresholds
- Archive stale and superseded pages
- Orphan pages (auto-linked into `index.md`)
- Broken cross-references (fuzzy-match fix via `difflib`, or restore from archive)
- Main index drift (auto add missing entries, remove stale ones)
- Empty stubs (report-only)
- State file drift (report-only)
- Staging/archive index resync
Full mode (LLM, run weekly — python3 scripts/wiki-hygiene.py --full):
- Everything in quick mode, plus:
- Missing cross-references between related pages (haiku)
- Duplicate coverage — weaker page auto-merged into stronger (sonnet)
- Contradictions between pages (sonnet, report-only)
- Technology lifecycle — flag pages referencing versions older than what's in recent conversations
Reports (written to reports/):
- `hygiene-YYYY-MM-DD-fixed.md` — what was auto-fixed
- `hygiene-YYYY-MM-DD-needs-review.md` — what needs human judgment
Cross-Reference Conventions
- Link between wiki pages using relative markdown links: `[Pattern Name](../patterns/file.md)`
- Link to raw sources: `[Source](../raw/path/to/file.md)`
- In frontmatter `related:` use the relative filename: `patterns/secrets-at-startup.md`
Naming Conventions
- Filenames: `kebab-case.md`
- Patterns: named by what they standardize (e.g., `health-endpoints.md`, `secrets-at-startup.md`)
- Decisions: named by what was decided (e.g., `no-alpine.md`, `dhi-base-images.md`)
- Environments: named by domain (e.g., `docker-registries.md`, `ci-cd-platforms.md`)
- Concepts: named by the concept (e.g., `two-user-database-model.md`, `build-once-deploy-many.md`)
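A hypothetical helper for deriving kebab-case filenames from page titles (not part of the pipeline scripts):

```python
import re

# Hypothetical helper: derive a kebab-case.md filename from a page
# title, per the naming conventions above. Collapses any run of
# non-alphanumeric characters into a single hyphen.
def slugify(title: str) -> str:
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{slug}.md"
```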
Customization Notes
Things you should change for your own wiki:
- Directory structure — the four live dirs (`patterns/`, `decisions/`, `concepts/`, `environments/`) reflect engineering use cases. Pick categories that match how you think — research wikis might use `findings/`, `hypotheses/`, `methods/`, `literature/` instead. Update `LIVE_CONTENT_DIRS` in `scripts/wiki_lib.py` to match.
- Page-type sections — the "Structure" blocks under each page type are for my use. Define your own conventions.
- `status` field — if you want to track Superseded/Active/Under Review explicitly, this is a natural add. The hygiene script already checks for `status: Superseded by ...` and archives those automatically.
- Environment Detection — if you don't have multiple environments, remove the section. If you do, update it for your own environments (work/home, dev/prod, mac/linux, etc.).
- Cross-reference path format — I use `patterns/foo.md` in the `related:` field. Obsidian users might prefer `[[foo]]` wikilink format. The hygiene script handles standard markdown links; adapt as needed.