# LLM Wiki — Schema

This is a persistent, compounding knowledge base maintained by LLM agents. It captures the **why** behind patterns, decisions, and implementations — not just the what.

Copy this file to the root of your wiki directory (e.g. `~/projects/wiki/CLAUDE.md`) and edit for your own conventions.

> This is an example `CLAUDE.md` for the wiki root. The agent reads this
> at the start of every session when working inside the wiki. It's the
> "constitution" that tells the agent how to maintain the knowledge base.

## How This Wiki Works

**You are the maintainer.** When working in this wiki directory, you read raw sources, compile knowledge into wiki pages, maintain cross-references, and keep everything consistent.

**You are a consumer.** When working in any other project directory, you read wiki pages to inform your work — applying established patterns, respecting decisions, and understanding context.

## Directory Structure

```
wiki/
├── CLAUDE.md                ← You are here (schema)
├── index.md                 ← Content catalog — read this FIRST on any query
├── log.md                   ← Chronological record of all operations
│
├── patterns/                ← LIVE: HOW things should be built (with WHY)
├── decisions/               ← LIVE: WHY we chose this approach (with alternatives rejected)
├── environments/            ← LIVE: WHERE implementations differ
├── concepts/                ← LIVE: WHAT the foundational ideas are
│
├── raw/                     ← Immutable source material (NEVER modify)
│   └── harvested/           ← URL harvester output
│
├── staging/                 ← PENDING automated content awaiting human review
│   ├── index.md
│   └── /
│
├── archive/                 ← STALE / superseded (excluded from default search)
│   ├── index.md
│   └── /
│
├── conversations/           ← Mined Claude Code session transcripts
│   ├── index.md
│   └── /                    ← per-project or per-person (MemPalace "wing")
│
├── context/                 ← Auto-updated AI session briefing
│   ├── wake-up.md           ← Loaded at the start of every session
│   └── active-concerns.md
│
├── reports/                 ← Hygiene operation logs
└── scripts/                 ← The automation pipeline
```

**Core rule — automated vs
manual content**:

| Origin | Destination | Status |
|--------|-------------|--------|
| Script-generated (harvester, hygiene, URL compile) | `staging/` | `pending` |
| Human-initiated ("add this to the wiki" in a Claude session) | Live wiki (`patterns/`, etc.) | `verified` |
| Human-reviewed from staging | Live wiki (promoted) | `verified` |

Managed via `scripts/wiki-staging.py --list / --promote / --reject / --review`.

## Page Conventions

### Frontmatter (required on all wiki pages)

```yaml
---
title: Page Title
type: pattern | decision | environment | concept
confidence: high | medium | low
origin: manual | automated       # How the page entered the wiki
sources: [list of raw/ files this was compiled from]
related: [list of other wiki pages this connects to]
last_compiled: YYYY-MM-DD        # Date this page was last (re)compiled from sources
last_verified: YYYY-MM-DD        # Date the content was last confirmed accurate
---
```

**`origin` values**:

- `manual` — Created by a human in a Claude session. Goes directly to the live wiki, no staging.
- `automated` — Created by a script (harvester, hygiene, etc.). Must pass through `staging/` for human review before promotion.

**Confidence decay**: Pages with no refresh signal for 6 months decay `high → medium`; 9 months → `low`; 12 months → `stale` (auto-archived). `last_verified` drives decay, not `last_compiled`. See `scripts/wiki-hygiene.py` and `archive/index.md`.

### Staging Frontmatter (pages in `staging//`)

Automated-origin pages get additional staging metadata that is **stripped on promotion**:

```yaml
---
title: ...
type: ...
origin: automated
status: pending                  # Awaiting review
staged_date: YYYY-MM-DD          # When the automated script staged this
staged_by: wiki-harvest          # Which script staged it (wiki-harvest, wiki-hygiene, ...)
target_path: patterns/foo.md     # Where it should land on promotion
modifies: patterns/bar.md        # Only present when this is an update to an existing live page
compilation_notes: "..."         # AI's explanation of what it did and why
harvest_source: https://...      # Only present for URL-harvested content
sources: [...]
related: [...]
last_verified: YYYY-MM-DD
---
```

### Pattern Pages (`patterns/`)

Structure:

1. **What** — One-paragraph description of the pattern
2. **Why** — The reasoning, constraints, and goals that led to this pattern
3. **Canonical Example** — A concrete implementation (link to raw/ source or inline)
4. **Structure** — The specification: fields, endpoints, formats, conventions
5. **When to Deviate** — Known exceptions or conditions where the pattern doesn't apply
6. **History** — Key changes and the decisions that drove them

### Decision Pages (`decisions/`)

Structure:

1. **Decision** — One sentence: what we decided
2. **Context** — What problem or constraint prompted this
3. **Options Considered** — What alternatives existed (with pros/cons)
4. **Rationale** — Why this option won
5. **Consequences** — What this decision enables and constrains
6. **Status** — Active | Superseded by [link] | Under Review

### Environment Pages (`environments/`)

Structure:

1. **Overview** — What this environment is (platform, CI, infra)
2. **Key Differences** — Table comparing environments for this domain
3. **Implementation Details** — Environment-specific configs, credentials, deploy method
4. **Gotchas** — Things that have bitten us

### Concept Pages (`concepts/`)

Structure:

1. **Definition** — What this concept means in our context
2. **Why It Matters** — How this concept shapes our decisions
3. **Related Patterns** — Links to patterns that implement this concept
4. **Related Decisions** — Links to decisions driven by this concept

## Operations

### Ingest (adding new knowledge)

When a new raw source is added or you learn something new:

1. Read the source material thoroughly
2. Identify which existing wiki pages need updating
3. Identify if new pages are needed
4. Update/create pages following the conventions above
5. Update cross-references (`related:` frontmatter) on all affected pages
6. Update `index.md` with any new pages
7. Set `last_verified:` to today's date on every page you create or update
8. Set `origin: manual` on any page you create when a human directed you to
9. Append to `log.md`: `## [YYYY-MM-DD] ingest | Source Description`

**Where to write**:

- **Human-initiated** ("add this to the wiki", "create a pattern for X") — write directly to the live directory (`patterns/`, `decisions/`, etc.) with `origin: manual`. The human's instruction IS the approval.
- **Script-initiated** (harvest, auto-compile, hygiene auto-fix) — write to `staging//` with `origin: automated`, `status: pending`, plus `staged_date`, `staged_by`, `target_path`, and `compilation_notes`. For updates to existing live pages, also set `modifies: `.

### Query (answering questions from other projects)

When working in another project and consulting the wiki:

1. Use `qmd` to search first (see Search Strategy below). Read `index.md` only when browsing the full catalog.
2. Read the specific pattern/decision/concept pages
3. Apply the knowledge, respecting environment differences
4. If a page's `confidence` is `low`, flag that to the user — the content may be aging out
5. If a page has `status: pending` (it's in `staging/`), flag that to the user: "Note: this is from a pending wiki page in staging, not yet verified." Use the content but make the uncertainty visible.
6. If you find yourself consulting a page under `archive/`, mention it's archived and may be outdated
7. If your work reveals new knowledge, **file it back** — update the wiki (and bump `last_verified`)

### Search Strategy — which qmd collection to use

The wiki has three qmd collections. Pick the right one for the question:

| Question type | Collection | Command |
|---|---|---|
| "What's our current pattern for X?" | `wiki` (default) | `qmd search "X" --json -n 5` |
| "What's the rationale behind decision Y?" | `wiki` (default) | `qmd vsearch "why did we choose Y" --json -n 5` |
| "What was our OLD approach before we changed it?" | `wiki-archive` | `qmd search "X" -c wiki-archive --json -n 5` |
| "When did we discuss this, and what did we decide?" | `wiki-conversations` | `qmd search "X" -c wiki-conversations --json -n 5` |
| "Find everything across time" | all three | `qmd search "X" -c wiki -c wiki-archive -c wiki-conversations --json -n 10` |

**Rules of thumb**:

- Use `qmd search` for keyword matches (BM25, fast)
- Use `qmd vsearch` for conceptual / semantically-similar queries (vector)
- Use `qmd query` for the best quality — hybrid BM25 + vector + LLM re-ranking
- Always use `--json` for structured output
- Read individual matched pages with `cat` or your file tool after finding them

### Mine (conversation extraction and summarization)

Four-phase pipeline that extracts sessions into searchable conversation pages:

1. **Extract** (`extract-sessions.py`) — Parse session files into markdown transcripts
2. **Summarize** (`summarize-conversations.py --claude`) — Classify + summarize via `claude -p` with haiku/sonnet routing
3. **Index** (`update-conversation-index.py --reindex`) — Regenerate conversation index + `context/wake-up.md`
4. **Harvest** (`wiki-harvest.py`) — Scan summarized conversations for external reference URLs and compile them into wiki pages

Full pipeline via `mine-conversations.sh`. Extraction is incremental (tracks byte offsets). Summarization is incremental (tracks message count).
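The byte-offset tracking used for incremental extraction can be sketched roughly as follows. This is a minimal illustration and not the actual logic of `extract-sessions.py`: the function name `read_new_bytes`, the JSON layout of the state file, and the file names in the demo are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def read_new_bytes(session_file: Path, state_file: Path) -> bytes:
    """Return only the bytes appended to session_file since the previous call."""
    # Hypothetical state layout: {"<session file path>": <bytes already processed>}
    offsets = json.loads(state_file.read_text()) if state_file.exists() else {}
    offset = offsets.get(str(session_file), 0)

    if session_file.stat().st_size < offset:
        # The file shrank (truncated or replaced), so start again from the top.
        offset = 0

    with session_file.open("rb") as fh:
        fh.seek(offset)
        new_data = fh.read()

    # Remember how far we read so the next run skips this content.
    offsets[str(session_file)] = offset + len(new_data)
    state_file.write_text(json.dumps(offsets))
    return new_data


# Demo with throwaway files (names are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    session = Path(tmp) / "session.jsonl"
    offsets_file = Path(tmp) / "offsets.json"

    session.write_bytes(b'{"msg": 1}\n')
    first = read_new_bytes(session, offsets_file)   # everything written so far
    second = read_new_bytes(session, offsets_file)  # nothing new yet

    with session.open("ab") as fh:
        fh.write(b'{"msg": 2}\n')
    third = read_new_bytes(session, offsets_file)   # only the appended line
```

Storing the offset per file means a re-run costs one `seek` rather than a full re-parse. The same idea would apply to the message-count tracking used by summarization, with a count in place of a byte position.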
### Maintain (wiki health automation)

`scripts/wiki-maintain.sh` chains harvest + hygiene + qmd reindex:

```bash
bash scripts/wiki-maintain.sh                 # Harvest + quick hygiene + reindex
bash scripts/wiki-maintain.sh --full          # Harvest + full hygiene (LLM) + reindex
bash scripts/wiki-maintain.sh --harvest-only  # Harvest only
bash scripts/wiki-maintain.sh --hygiene-only  # Hygiene only
bash scripts/wiki-maintain.sh --dry-run       # Show what would run
```

### Lint (periodic health check)

Automated via `scripts/wiki-hygiene.py`. Two tiers:

**Quick mode** (no LLM, run daily — `python3 scripts/wiki-hygiene.py`):

- Backfill missing `last_verified`
- Refresh `last_verified` from conversation `related:` references
- Auto-restore archived pages that are referenced again
- Repair frontmatter (missing required fields, invalid values)
- Confidence decay per 6/9/12-month thresholds
- Archive stale and superseded pages
- Orphan pages (auto-linked into `index.md`)
- Broken cross-references (fuzzy-match fix via `difflib`, or restore from archive)
- Main index drift (auto add missing entries, remove stale ones)
- Empty stubs (report-only)
- State file drift (report-only)
- Staging/archive index resync

**Full mode** (LLM, run weekly — `python3 scripts/wiki-hygiene.py --full`):

- Everything in quick mode, plus:
- Missing cross-references between related pages (haiku)
- Duplicate coverage — weaker page auto-merged into stronger (sonnet)
- Contradictions between pages (sonnet, report-only)
- Technology lifecycle — flag pages referencing versions older than what's in recent conversations

**Reports** (written to `reports/`):

- `hygiene-YYYY-MM-DD-fixed.md` — what was auto-fixed
- `hygiene-YYYY-MM-DD-needs-review.md` — what needs human judgment

## Cross-Reference Conventions

- Link between wiki pages using relative markdown links: `[Pattern Name](../patterns/file.md)`
- Link to raw sources: `[Source](../raw/path/to/file.md)`
- In frontmatter `related:` use the relative filename: `patterns/secrets-at-startup.md`

## Naming Conventions

- Filenames: `kebab-case.md`
- Patterns: named by what they standardize (e.g., `health-endpoints.md`, `secrets-at-startup.md`)
- Decisions: named by what was decided (e.g., `no-alpine.md`, `dhi-base-images.md`)
- Environments: named by domain (e.g., `docker-registries.md`, `ci-cd-platforms.md`)
- Concepts: named by the concept (e.g., `two-user-database-model.md`, `build-once-deploy-many.md`)

## Customization Notes

Things you should change for your own wiki:

1. **Directory structure** — the four live dirs (`patterns/`, `decisions/`, `concepts/`, `environments/`) reflect engineering use cases. Pick categories that match how you think — research wikis might use `findings/`, `hypotheses/`, `methods/`, `literature/` instead. Update `LIVE_CONTENT_DIRS` in `scripts/wiki_lib.py` to match.
2. **Page-type sections** — the "Structure" blocks under each page type are for my use. Define your own conventions.
3. **`status` field** — if you want to track Superseded/Active/Under Review explicitly, this is a natural add. The hygiene script already checks for `status: Superseded by ...` and archives those automatically.
4. **Environment Detection** — if you don't have multiple environments, remove the section. If you do, update it for your own environments (work/home, dev/prod, mac/linux, etc.).
5. **Cross-reference path format** — I use `patterns/foo.md` in the `related:` field. Obsidian users might prefer `[[foo]]` wikilink format. The hygiene script handles standard markdown links; adapt as needed.
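
If you do switch `related:` entries to wikilinks, the conversion itself is small. The sketch below is only an illustration: the helper name `to_wikilink` is hypothetical and is not part of the hygiene script, which would still need its own adaptation to parse the new format.

```python
from pathlib import PurePosixPath


def to_wikilink(related_path: str) -> str:
    """Turn a related: entry such as 'patterns/foo.md' into '[[foo]]'."""
    # Keep only the note name: drop the directory and the .md suffix.
    return f"[[{PurePosixPath(related_path).stem}]]"


related = ["patterns/secrets-at-startup.md", "decisions/no-alpine.md"]
wikilinks = [to_wikilink(path) for path in related]
print(wikilinks)  # ['[[secrets-at-startup]]', '[[no-alpine]]']
```

Dropping the directory assumes filenames are unique across the live directories. If two page types could share a filename, keeping the directory in the link (for example `[[patterns/foo]]`) would avoid ambiguity.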