# LLM Wiki — Schema

This is a persistent, compounding knowledge base maintained by LLM agents.
It captures the **why** behind patterns, decisions, and implementations —
not just the what. Copy this file to the root of your wiki directory
(e.g. `~/projects/wiki/CLAUDE.md`) and edit it to match your own conventions.

> This is an example `CLAUDE.md` for the wiki root. The agent reads this
> at the start of every session when working inside the wiki. It's the
> "constitution" that tells the agent how to maintain the knowledge base.

## How This Wiki Works

**You are the maintainer.** When working in this wiki directory, you read
raw sources, compile knowledge into wiki pages, maintain cross-references,
and keep everything consistent.

**You are a consumer.** When working in any other project directory, you
read wiki pages to inform your work — applying established patterns,
respecting decisions, and understanding context.

## Directory Structure

```
wiki/
├── CLAUDE.md              ← You are here (schema)
├── index.md               ← Content catalog (browse for the full listing; search with qmd first)
├── log.md                 ← Chronological record of all operations
│
├── patterns/              ← LIVE: HOW things should be built (with WHY)
├── decisions/             ← LIVE: WHY we chose this approach (with alternatives rejected)
├── environments/          ← LIVE: WHERE implementations differ
├── concepts/              ← LIVE: WHAT the foundational ideas are
│
├── raw/                   ← Immutable source material (NEVER modify)
│   └── harvested/         ← URL harvester output
│
├── staging/               ← PENDING automated content awaiting human review
│   ├── index.md
│   └── <type>/
│
├── archive/               ← STALE / superseded (excluded from default search)
│   ├── index.md
│   └── <type>/
│
├── conversations/         ← Mined Claude Code session transcripts
│   ├── index.md
│   └── <wing>/            ← per-project or per-person (MemPalace "wing")
│
├── context/               ← Auto-updated AI session briefing
│   ├── wake-up.md         ← Loaded at the start of every session
│   └── active-concerns.md
│
├── reports/               ← Hygiene operation logs
└── scripts/               ← The automation pipeline
```

**Core rule — automated vs manual content**:

| Origin | Destination | Status |
|--------|-------------|--------|
| Script-generated (harvester, hygiene, URL compile) | `staging/` | `pending` |
| Human-initiated ("add this to the wiki" in a Claude session) | Live wiki (`patterns/`, etc.) | `verified` |
| Human-reviewed from staging | Live wiki (promoted) | `verified` |

Staging is managed via `scripts/wiki-staging.py` with the `--list`, `--promote`, `--reject`, and `--review` flags.

## Page Conventions

### Frontmatter (required on all wiki pages)

```yaml
---
title: Page Title
type: pattern | decision | environment | concept
confidence: high | medium | low
origin: manual | automated    # How the page entered the wiki
sources: [list of raw/ files this was compiled from]
related: [list of other wiki pages this connects to]
last_compiled: YYYY-MM-DD     # Date this page was last (re)compiled from sources
last_verified: YYYY-MM-DD     # Date the content was last confirmed accurate
---
```

**`origin` values**:
- `manual` — Created by a human in a Claude session. Goes directly to the live wiki, no staging.
- `automated` — Created by a script (harvester, hygiene, etc.). Must pass through `staging/` for human review before promotion.
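
The frontmatter rules above can be checked mechanically. The following is a minimal sketch, not the check that `scripts/wiki-hygiene.py` performs: `validate_frontmatter` is a hypothetical helper that takes an already-parsed frontmatter dict, and treating all eight fields as required is an assumption.

```python
from datetime import date

# Allowed values as listed in the frontmatter spec.
ALLOWED = {
    "type": {"pattern", "decision", "environment", "concept"},
    "confidence": {"high", "medium", "low"},
    "origin": {"manual", "automated"},
}
# Assumption: every field shown in the spec is required.
REQUIRED = ["title", "type", "confidence", "origin", "sources",
            "related", "last_compiled", "last_verified"]
DATE_FIELDS = ["last_compiled", "last_verified"]


def validate_frontmatter(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means the frontmatter is valid."""
    problems = []
    for field in REQUIRED:
        if field not in meta:
            problems.append(f"missing field: {field}")
    for field, allowed in ALLOWED.items():
        if field in meta and meta[field] not in allowed:
            problems.append(f"invalid {field}: {meta[field]!r}")
    for field in DATE_FIELDS:
        value = meta.get(field)
        if value is None:
            continue
        try:
            # str() also covers values a YAML parser already turned into dates.
            date.fromisoformat(str(value))
        except ValueError:
            problems.append(f"invalid date in {field}: {value!r}")
    return problems
```

Returning a list of problems rather than raising lets a caller report every issue on a page in one pass.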

**Confidence decay**: A page with no refresh signal decays from `high` to `medium` after 6 months, to `low` after 9 months, and to `stale` (auto-archived) after 12 months. Decay is driven by `last_verified`, not `last_compiled`. See `scripts/wiki-hygiene.py` and `archive/index.md`.
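
As an illustration of the thresholds, here is a small sketch of the decay rule. The function names are hypothetical and the real logic lives in `scripts/wiki-hygiene.py`. Two assumptions are baked in: months are counted as whole calendar months since `last_verified`, and decay never raises a page's confidence.

```python
from datetime import date

# Month thresholds from the decay rule; "stale" means the page is auto-archived.
DECAY_STEPS = [(12, "stale"), (9, "low"), (6, "medium")]
RANK = {"high": 0, "medium": 1, "low": 2, "stale": 3}


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed between two dates (never negative)."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the current month is not complete yet
    return max(months, 0)


def decayed_confidence(current: str, last_verified: date, today: date) -> str:
    """Return the confidence a page should carry given its age."""
    age = months_between(last_verified, today)
    for threshold, level in DECAY_STEPS:
        if age >= threshold:
            # Assumption: decay only lowers confidence, it never raises it.
            return level if RANK[level] > RANK[current] else current
    return current
```

For example, a `high` page last verified on 2024-01-15 would read as `medium` on 2024-07-15 and as `stale` on 2025-01-15.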

### Staging Frontmatter (pages in `staging/<type>/`)

Automated-origin pages get additional staging metadata that is **stripped on promotion**:

```yaml
---
title: ...
type: ...
origin: automated
status: pending              # Awaiting review
staged_date: YYYY-MM-DD      # When the automated script staged this
staged_by: wiki-harvest      # Which script staged it (wiki-harvest, wiki-hygiene, ...)
target_path: patterns/foo.md # Where it should land on promotion
modifies: patterns/bar.md    # Only present when this is an update to an existing live page
compilation_notes: "..."     # AI's explanation of what it did and why
harvest_source: https://...  # Only present for URL-harvested content
sources: [...]
related: [...]
last_verified: YYYY-MM-DD
---
```
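
Stripping on promotion could look roughly like the sketch below. `promote_frontmatter` is a hypothetical helper, not the code in `scripts/wiki-staging.py`. Treating every field that appears only in the staging block as staging-only is an assumption, and so is keeping `origin: automated` on the promoted page.

```python
# Assumption: these are the fields that exist only while a page is staged.
STAGING_ONLY_FIELDS = {
    "status", "staged_date", "staged_by", "target_path",
    "modifies", "compilation_notes", "harvest_source",
}


def promote_frontmatter(staged: dict) -> tuple[str, dict]:
    """Return (destination path, live frontmatter) for a staged page."""
    target = staged["target_path"]  # where the page lands in the live wiki
    live = {key: value for key, value in staged.items()
            if key not in STAGING_ONLY_FIELDS}
    return target, live
```

The input dict is left untouched, so a caller can still log the staging metadata after promotion.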

### Pattern Pages (`patterns/`)

Structure:
1. **What** — One-paragraph description of the pattern
2. **Why** — The reasoning, constraints, and goals that led to this pattern
3. **Canonical Example** — A concrete implementation (link to raw/ source or inline)
4. **Structure** — The specification: fields, endpoints, formats, conventions
5. **When to Deviate** — Known exceptions or conditions where the pattern doesn't apply
6. **History** — Key changes and the decisions that drove them

### Decision Pages (`decisions/`)

Structure:
1. **Decision** — One sentence: what we decided
2. **Context** — What problem or constraint prompted this
3. **Options Considered** — What alternatives existed (with pros/cons)
4. **Rationale** — Why this option won
5. **Consequences** — What this decision enables and constrains
6. **Status** — Active | Superseded by [link] | Under Review

### Environment Pages (`environments/`)

Structure:
1. **Overview** — What this environment is (platform, CI, infra)
2. **Key Differences** — Table comparing environments for this domain
3. **Implementation Details** — Environment-specific configs, credentials, deploy method
4. **Gotchas** — Things that have bitten us

### Concept Pages (`concepts/`)

Structure:
1. **Definition** — What this concept means in our context
2. **Why It Matters** — How this concept shapes our decisions
3. **Related Patterns** — Links to patterns that implement this concept
4. **Related Decisions** — Links to decisions driven by this concept

## Operations

### Ingest (adding new knowledge)

When a new raw source is added or you learn something new:

1. Read the source material thoroughly
2. Identify which existing wiki pages need updating
3. Identify whether new pages are needed
4. Update/create pages following the conventions above
5. Update cross-references (`related:` frontmatter) on all affected pages
6. Update `index.md` with any new pages
7. Set `last_verified:` to today's date on every page you create or update
8. Set `origin: manual` on any page you create at a human's direction
9. Append to `log.md`: `## [YYYY-MM-DD] ingest | Source Description`
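
The log entry format from the last step can be produced with a couple of small helpers. This is only a sketch: `log_heading` and `append_log` are hypothetical names, and the blank line placed between entries is an assumption about how `log.md` is laid out.

```python
from datetime import date


def log_heading(operation: str, description: str, when: date) -> str:
    """Format a log heading such as '## [2024-03-05] ingest | Source Description'."""
    return f"## [{when.isoformat()}] {operation} | {description}"


def append_log(log_text: str, heading: str) -> str:
    """Return log_text with the heading appended as a new entry."""
    if log_text and not log_text.endswith("\n"):
        log_text += "\n"
    # Assumption: entries are separated by one blank line.
    separator = "\n" if log_text else ""
    return f"{log_text}{separator}{heading}\n"
```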

**Where to write**:
- **Human-initiated** ("add this to the wiki", "create a pattern for X") — write directly to the live directory (`patterns/`, `decisions/`, etc.) with `origin: manual`. The human's instruction IS the approval.
- **Script-initiated** (harvest, auto-compile, hygiene auto-fix) — write to `staging/<type>/` with `origin: automated`, `status: pending`, plus `staged_date`, `staged_by`, `target_path`, and `compilation_notes`. For updates to existing live pages, also set `modifies: <live-page-path>`.
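
For the script-initiated case, the required staging metadata can be assembled as in the sketch below. `stage_page` is a hypothetical helper. It assumes that `staging/<type>/` mirrors the live directory names, so `patterns/foo.md` is staged at `staging/patterns/foo.md`.

```python
from datetime import date
from typing import Optional


def stage_page(meta: dict, target_path: str, staged_by: str, notes: str,
               today: date, modifies: Optional[str] = None) -> tuple[str, dict]:
    """Return (staging path, frontmatter) for a script-initiated write."""
    staged = dict(meta)  # copy so the caller's dict is not mutated
    staged.update({
        "origin": "automated",
        "status": "pending",
        "staged_date": today.isoformat(),
        "staged_by": staged_by,
        "target_path": target_path,
        "compilation_notes": notes,
    })
    if modifies is not None:
        # Only set when the staged page updates an existing live page.
        staged["modifies"] = modifies
    # Assumption: the staging tree mirrors the live directory layout.
    return f"staging/{target_path}", staged
```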

### Query (answering questions from other projects)

When working in another project and consulting the wiki:

1. Use `qmd` to search first (see Search Strategy below). Read `index.md` only when browsing the full catalog.
2. Read the specific pattern/decision/concept pages
3. Apply the knowledge, respecting environment differences
4. If a page's `confidence` is `low`, flag that to the user — the content may be aging out
5. If a page has `status: pending` (it's in `staging/`), flag that to the user: "Note: this is from a pending wiki page in staging, not yet verified." Use the content but make the uncertainty visible.
6. If you find yourself consulting a page under `archive/`, mention it's archived and may be outdated
7. If your work reveals new knowledge, **file it back** — update the wiki (and bump `last_verified`)

### Search Strategy — which qmd collection to use

The wiki has three qmd collections. Pick the right one for the question:

| Question type | Collection | Command |
|---|---|---|
| "What's our current pattern for X?" | `wiki` (default) | `qmd search "X" --json -n 5` |
| "What's the rationale behind decision Y?" | `wiki` (default) | `qmd vsearch "why did we choose Y" --json -n 5` |
| "What was our OLD approach before we changed it?" | `wiki-archive` | `qmd search "X" -c wiki-archive --json -n 5` |
| "When did we discuss this, and what did we decide?" | `wiki-conversations` | `qmd search "X" -c wiki-conversations --json -n 5` |
| "Find everything across time" | all three | `qmd search "X" -c wiki -c wiki-archive -c wiki-conversations --json -n 10` |

**Rules of thumb**:
- Use `qmd search` for keyword matches (BM25, fast)
- Use `qmd vsearch` for conceptual / semantically-similar queries (vector)
- Use `qmd query` for the best quality — hybrid BM25 + vector + LLM re-ranking
- Always use `--json` for structured output
- Read individual matched pages with `cat` or your file tool after finding them
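
When a script needs to issue these searches, the argument list can be built rather than hand-typed. The sketch below uses only the subcommands and flags that appear in the table and rules above. `qmd_command` is a hypothetical helper, and it is an assumption that `qmd query` accepts the same `-c`, `--json`, and `-n` flags as `search` and `vsearch`.

```python
def qmd_command(mode: str, query: str,
                collections: tuple[str, ...] = (), limit: int = 5) -> list[str]:
    """Build an argv list for a qmd search, suitable for subprocess.run."""
    if mode not in {"search", "vsearch", "query"}:
        raise ValueError(f"unknown qmd mode: {mode}")
    argv = ["qmd", mode, query]
    for name in collections:
        argv += ["-c", name]  # omit entirely to use the default `wiki` collection
    argv += ["--json", "-n", str(limit)]
    return argv
```

Passing the list to `subprocess.run` without a shell avoids having to quote the query string.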

### Mine (conversation extraction and summarization)

Four-phase pipeline that extracts sessions into searchable conversation pages:

1. **Extract** (`extract-sessions.py`) — Parse session files into markdown transcripts
2. **Summarize** (`summarize-conversations.py --claude`) — Classify + summarize via `claude -p` with haiku/sonnet routing
3. **Index** (`update-conversation-index.py --reindex`) — Regenerate conversation index + `context/wake-up.md`
4. **Harvest** (`wiki-harvest.py`) — Scan summarized conversations for external reference URLs and compile them into wiki pages

Full pipeline via `mine-conversations.sh`. Extraction is incremental (tracks byte offsets). Summarization is incremental (tracks message count).
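
Byte-offset tracking can be sketched as follows. This illustrates the general technique and is not the code in `extract-sessions.py`. `read_new_bytes` and the in-memory `offsets` dict are hypothetical, and a real script would persist the offsets between runs.

```python
from pathlib import Path


def read_new_bytes(path: Path, offsets: dict[str, int]) -> bytes:
    """Return bytes appended since the last call and record the new offset."""
    key = str(path)
    start = offsets.get(key, 0)
    if path.stat().st_size < start:
        start = 0  # file was truncated or replaced, so re-read from the start
    with path.open("rb") as handle:
        handle.seek(start)
        data = handle.read()
    offsets[key] = start + len(data)
    return data
```

A second call on an unchanged file returns an empty byte string, so repeated runs do no extra work.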

### Maintain (wiki health automation)

`scripts/wiki-maintain.sh` chains harvest + hygiene + qmd reindex:

```bash
bash scripts/wiki-maintain.sh                 # Harvest + quick hygiene + reindex
bash scripts/wiki-maintain.sh --full          # Harvest + full hygiene (LLM) + reindex
bash scripts/wiki-maintain.sh --harvest-only  # Harvest only
bash scripts/wiki-maintain.sh --hygiene-only  # Hygiene only
bash scripts/wiki-maintain.sh --dry-run       # Show what would run
```

### Lint (periodic health check)

Automated via `scripts/wiki-hygiene.py`. Two tiers:

**Quick mode** (no LLM, run daily — `python3 scripts/wiki-hygiene.py`):
- Backfill missing `last_verified`
- Refresh `last_verified` from conversation `related:` references
- Auto-restore archived pages that are referenced again
- Repair frontmatter (missing required fields, invalid values)
- Confidence decay per 6/9/12-month thresholds
- Archive stale and superseded pages
- Orphan pages (auto-linked into `index.md`)
- Broken cross-references (fuzzy-match fix via `difflib`, or restore from archive)
- Main index drift (auto add missing entries, remove stale ones)
- Empty stubs (report-only)
- State file drift (report-only)
- Staging/archive index resync
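
The fuzzy-match repair for broken cross-references can be illustrated with `difflib.get_close_matches` from the standard library. This is a sketch only: `suggest_target` is a hypothetical name, and the 0.8 cutoff is an assumption, not the threshold the hygiene script uses.

```python
import difflib
from typing import Optional


def suggest_target(broken: str, existing: list[str],
                   cutoff: float = 0.8) -> Optional[str]:
    """Return the closest existing page path for a broken link, if any."""
    matches = difflib.get_close_matches(broken, existing, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A fairly high cutoff keeps the repair conservative: a link that resembles nothing closely is left for the needs-review report instead of being rewritten.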

**Full mode** (LLM, run weekly — `python3 scripts/wiki-hygiene.py --full`):
- Everything in quick mode, plus:
- Missing cross-references between related pages (haiku)
- Duplicate coverage — weaker page auto-merged into stronger (sonnet)
- Contradictions between pages (sonnet, report-only)
- Technology lifecycle — flag pages referencing versions older than what's in recent conversations

**Reports** (written to `reports/`):
- `hygiene-YYYY-MM-DD-fixed.md` — what was auto-fixed
- `hygiene-YYYY-MM-DD-needs-review.md` — what needs human judgment

## Cross-Reference Conventions

- Link between wiki pages using relative markdown links: `[Pattern Name](../patterns/file.md)`
- Link to raw sources: `[Source](../raw/path/to/file.md)`
- In frontmatter `related:`, use the path relative to the wiki root: `patterns/secrets-at-startup.md`
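
Turning a root-relative `related:` path into a relative markdown link is a small path computation. The sketch below uses a hypothetical `relative_link` helper and assumes both paths are given relative to the wiki root.

```python
import posixpath


def relative_link(from_page: str, to_page: str, label: str) -> str:
    """Build a markdown link from one wiki page to another."""
    start = posixpath.dirname(from_page) or "."
    href = posixpath.relpath(to_page, start)
    return f"[{label}]({href})"
```

For a link from `decisions/no-alpine.md` to `patterns/health-endpoints.md` this yields `../patterns/health-endpoints.md`. Note that a link between two pages in the same directory collapses to the bare filename.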

## Naming Conventions

- Filenames: `kebab-case.md`
- Patterns: named by what they standardize (e.g., `health-endpoints.md`, `secrets-at-startup.md`)
- Decisions: named by what was decided (e.g., `no-alpine.md`, `dhi-base-images.md`)
- Environments: named by domain (e.g., `docker-registries.md`, `ci-cd-platforms.md`)
- Concepts: named by the concept (e.g., `two-user-database-model.md`, `build-once-deploy-many.md`)
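
The filename rule is easy to enforce in a script. The sketch below uses hypothetical helpers and assumes that digits are allowed in filenames alongside lowercase letters.

```python
import re

# Lowercase alphanumeric words joined by single hyphens, ending in .md
KEBAB_FILENAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*\.md")


def is_kebab_filename(name: str) -> bool:
    """Check that a filename follows the kebab-case.md convention."""
    return KEBAB_FILENAME.fullmatch(name) is not None


def to_kebab_filename(title: str) -> str:
    """Derive a kebab-case filename from a page title."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words) + ".md"
```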

## Customization Notes

Things you should change for your own wiki:

1. **Directory structure** — the four live dirs (`patterns/`, `decisions/`, `concepts/`, `environments/`) reflect engineering use cases. Pick categories that match how you think — research wikis might use `findings/`, `hypotheses/`, `methods/`, `literature/` instead. Update `LIVE_CONTENT_DIRS` in `scripts/wiki_lib.py` to match.

2. **Page-type sections** — the "Structure" blocks under each page type reflect my own usage. Define your own conventions.

3. **`status` field** — if you want to track Active / Superseded / Under Review explicitly, this field is a natural addition. The hygiene script already checks for `status: Superseded by ...` and archives those pages automatically.

4. **Environment pages** — if you don't have multiple environments, remove the `environments/` directory and the Environment Pages section. If you do, adapt them to your own environments (work/home, dev/prod, mac/linux, etc.).

5. **Cross-reference path format** — I use `patterns/foo.md` in the `related:` field. Obsidian users might prefer `[[foo]]` wikilink format. The hygiene script handles standard markdown links; adapt as needed.
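
If you switch to wikilinks, the conversion from a `related:` path is mechanical. A minimal sketch with a hypothetical `to_wikilink` helper follows. It assumes page filenames are unique across directories, since a wikilink drops the directory.

```python
from pathlib import PurePosixPath


def to_wikilink(related_path: str) -> str:
    """Convert a path such as 'patterns/foo.md' to the wikilink '[[foo]]'."""
    return f"[[{PurePosixPath(related_path).stem}]]"
```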