memex/docs/examples/wiki-CLAUDE.md
Eric Turner d8fabc5a50 docs: rename self-references from "LLM Wiki" to "memex"
Replace project self-references throughout README, SETUP, and the example
CLAUDE.md files. External artifact titles are preserved as-is since they
refer to the actual title of the Claude design artifact.

Also add a "Why 'memex'?" aside to the README that roots the project in
Vannevar Bush's 1945 "As We May Think" essay, where the term originates.
The compounding knowledge wiki is the LLM-era realization of Bush's
memex concept: the "associative trails" he imagined are the related:
frontmatter fields and wikilinks the agent maintains.

Kept lowercase where referring to the generic pattern (e.g. "an LLM wiki
persists its mistakes") since that refers to the class of system, not
this specific project.
2026-04-12 21:32:17 -06:00


# memex — Schema
This is a persistent, compounding knowledge base maintained by LLM
agents. It captures the **why** behind patterns, decisions, and
implementations — not just the what. Copy this file to the root of your
wiki directory (e.g. `~/projects/wiki/CLAUDE.md`) and edit it to match your own
conventions.
> This is an example `CLAUDE.md` for the wiki root. The agent reads this
> at the start of every session when working inside the wiki. It's the
> "constitution" that tells the agent how to maintain the knowledge base.
## How This Wiki Works
**You are the maintainer.** When working in this wiki directory, you read
raw sources, compile knowledge into wiki pages, maintain cross-references,
and keep everything consistent.

**You are a consumer.** When working in any other project directory, you
read wiki pages to inform your work: applying established patterns,
respecting decisions, and understanding context.
## Directory Structure
```
wiki/
├── CLAUDE.md ← You are here (schema)
├── index.md ← Content catalog — read this FIRST on any query
├── log.md ← Chronological record of all operations
├── patterns/ ← LIVE: HOW things should be built (with WHY)
├── decisions/ ← LIVE: WHY we chose this approach (with alternatives rejected)
├── environments/ ← LIVE: WHERE implementations differ
├── concepts/ ← LIVE: WHAT the foundational ideas are
├── raw/ ← Immutable source material (NEVER modify)
│ └── harvested/ ← URL harvester output
├── staging/ ← PENDING automated content awaiting human review
│ ├── index.md
│ └── <type>/
├── archive/ ← STALE / superseded (excluded from default search)
│ ├── index.md
│ └── <type>/
├── conversations/ ← Mined Claude Code session transcripts
│ ├── index.md
│ └── <wing>/ ← per-project or per-person (MemPalace "wing")
├── context/ ← Auto-updated AI session briefing
│ ├── wake-up.md ← Loaded at the start of every session
│ └── active-concerns.md
├── reports/ ← Hygiene operation logs
└── scripts/ ← The automation pipeline
```
**Core rule — automated vs manual content**:
| Origin | Destination | Status |
|--------|-------------|--------|
| Script-generated (harvester, hygiene, URL compile) | `staging/` | `pending` |
| Human-initiated ("add this to the wiki" in a Claude session) | Live wiki (`patterns/`, etc.) | `verified` |
| Human-reviewed from staging | Live wiki (promoted) | `verified` |

Staged pages are managed with `scripts/wiki-staging.py`, using its `--list`, `--promote`, `--reject`, and `--review` flags.
## Page Conventions
### Frontmatter (required on all wiki pages)
```yaml
---
title: Page Title
type: pattern | decision | environment | concept
confidence: high | medium | low
origin: manual | automated # How the page entered the wiki
sources: [list of raw/ files this was compiled from]
related: [list of other wiki pages this connects to]
last_compiled: YYYY-MM-DD # Date this page was last (re)compiled from sources
last_verified: YYYY-MM-DD # Date the content was last confirmed accurate
---
```
**`origin` values**:
- `manual` — Created by a human in a Claude session. Goes directly to the live wiki, no staging.
- `automated` — Created by a script (harvester, hygiene, etc.). Must pass through `staging/` for human review before promotion.
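
As a rough illustration of the schema above, the check below validates an already-parsed frontmatter mapping. The name `validate_frontmatter`, the assumption that all eight fields are required, and the plain-string problem reports are choices made for this sketch; it is not the logic in `scripts/wiki-hygiene.py`.

```python
from datetime import date

# Allowed values, copied from the frontmatter schema above.
ALLOWED = {
    "type": {"pattern", "decision", "environment", "concept"},
    "confidence": {"high", "medium", "low"},
    "origin": {"manual", "automated"},
}
# Assumption: every field shown in the schema is required.
REQUIRED = ["title", "type", "confidence", "origin", "sources",
            "related", "last_compiled", "last_verified"]
DATE_FIELDS = ["last_compiled", "last_verified"]


def validate_frontmatter(meta: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the page passes."""
    problems = [f"missing field: {key}" for key in REQUIRED if key not in meta]
    for key, allowed in ALLOWED.items():
        if key in meta and meta[key] not in allowed:
            problems.append(f"invalid {key}: {meta[key]!r}")
    for key in DATE_FIELDS:
        if key in meta:
            try:
                date.fromisoformat(str(meta[key]))  # expects YYYY-MM-DD
            except ValueError:
                problems.append(f"invalid date in {key}: {meta[key]!r}")
    return problems
```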

**Confidence decay**: A page with no refresh signal decays from `high` to `medium` after 6 months, to `low` after 9 months, and to `stale` (auto-archived) after 12 months. Decay is driven by `last_verified`, not `last_compiled`. See `scripts/wiki-hygiene.py` and `archive/index.md`.
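
The decay thresholds can be sketched as a small pure function. This is a minimal illustration, assuming "months" means whole calendar months elapsed since `last_verified`; the function name `decayed_confidence` is hypothetical and the real rules live in `scripts/wiki-hygiene.py`.

```python
from datetime import date


def decayed_confidence(confidence: str, last_verified: date, today: date) -> str:
    """Apply the 6/9/12-month decay thresholds to a page's confidence."""
    months = (today.year - last_verified.year) * 12 + (today.month - last_verified.month)
    if today.day < last_verified.day:
        months -= 1  # the current month has not fully elapsed yet
    if months >= 12:
        return "stale"  # candidates for auto-archiving
    if months >= 9:
        return "low"
    if months >= 6 and confidence == "high":
        return "medium"
    return confidence
```

For example, a `high` page last verified on 2025-01-15 would read as `medium` on 2025-07-15 and as `low` on 2025-10-20 under these assumptions.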
### Staging Frontmatter (pages in `staging/<type>/`)
Automated-origin pages get additional staging metadata that is **stripped on promotion**:
```yaml
---
title: ...
type: ...
origin: automated
status: pending # Awaiting review
staged_date: YYYY-MM-DD # When the automated script staged this
staged_by: wiki-harvest # Which script staged it (wiki-harvest, wiki-hygiene, ...)
target_path: patterns/foo.md # Where it should land on promotion
modifies: patterns/bar.md # Only present when this is an update to an existing live page
compilation_notes: "..." # AI's explanation of what it did and why
harvest_source: https://... # Only present for URL-harvested content
sources: [...]
related: [...]
last_verified: YYYY-MM-DD
---
```
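
Stripping the staging metadata on promotion might look like the sketch below. It assumes that every key appearing only in the staging example is staging-only (including `harvest_source`) and that the page lands at `target_path`; `promote_frontmatter` is a hypothetical name, not a function from `scripts/wiki-staging.py`.

```python
# Assumption: keys that appear only in the staging example are staging-only.
STAGING_ONLY_KEYS = {
    "status", "staged_date", "staged_by", "target_path",
    "modifies", "compilation_notes", "harvest_source",
}


def promote_frontmatter(staged: dict) -> tuple[str, dict]:
    """Return (destination path, live frontmatter) for a staged page."""
    destination = staged["target_path"]
    live = {key: value for key, value in staged.items()
            if key not in STAGING_ONLY_KEYS}
    return destination, live
```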
### Pattern Pages (`patterns/`)
Structure:
1. **What** — One-paragraph description of the pattern
2. **Why** — The reasoning, constraints, and goals that led to this pattern
3. **Canonical Example** — A concrete implementation (link to raw/ source or inline)
4. **Structure** — The specification: fields, endpoints, formats, conventions
5. **When to Deviate** — Known exceptions or conditions where the pattern doesn't apply
6. **History** — Key changes and the decisions that drove them
### Decision Pages (`decisions/`)
Structure:
1. **Decision** — One sentence: what we decided
2. **Context** — What problem or constraint prompted this
3. **Options Considered** — What alternatives existed (with pros/cons)
4. **Rationale** — Why this option won
5. **Consequences** — What this decision enables and constrains
6. **Status** — Active | Superseded by [link] | Under Review
### Environment Pages (`environments/`)
Structure:
1. **Overview** — What this environment is (platform, CI, infra)
2. **Key Differences** — Table comparing environments for this domain
3. **Implementation Details** — Environment-specific configs, credentials, deploy method
4. **Gotchas** — Things that have bitten us
### Concept Pages (`concepts/`)
Structure:
1. **Definition** — What this concept means in our context
2. **Why It Matters** — How this concept shapes our decisions
3. **Related Patterns** — Links to patterns that implement this concept
4. **Related Decisions** — Links to decisions driven by this concept
## Operations
### Ingest (adding new knowledge)
When a new raw source is added or you learn something new:
1. Read the source material thoroughly
2. Identify which existing wiki pages need updating
3. Identify if new pages are needed
4. Update/create pages following the conventions above
5. Update cross-references (`related:` frontmatter) on all affected pages
6. Update `index.md` with any new pages
7. Set `last_verified:` to today's date on every page you create or update
8. Set `origin: manual` on any page you create when a human directed you to
9. Append to `log.md`: `## [YYYY-MM-DD] ingest | Source Description`

**Where to write**:
- **Human-initiated** ("add this to the wiki", "create a pattern for X") — write directly to the live directory (`patterns/`, `decisions/`, etc.) with `origin: manual`. The human's instruction IS the approval.
- **Script-initiated** (harvest, auto-compile, hygiene auto-fix) — write to `staging/<type>/` with `origin: automated`, `status: pending`, plus `staged_date`, `staged_by`, `target_path`, and `compilation_notes`. For updates to existing live pages, also set `modifies: <live-page-path>`.
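
The two write paths can be summarized in code. The sketch below is illustrative only: `plan_write` and its parameters are hypothetical, and it assumes `staging/<type>/` mirrors the live directory name (so `patterns/foo.md` stages at `staging/patterns/foo.md`).

```python
from datetime import date
from typing import Optional


def plan_write(live_path: str, human_initiated: bool, today: date,
               staged_by: str = "", compilation_notes: str = "",
               modifies: Optional[str] = None) -> tuple[str, dict]:
    """Return (path to write, frontmatter fields to set) for a new or updated page."""
    if human_initiated:
        # The human's instruction is the approval: write straight to the live wiki.
        return live_path, {"origin": "manual", "last_verified": today.isoformat()}
    fields = {
        "origin": "automated",
        "status": "pending",
        "staged_date": today.isoformat(),
        "staged_by": staged_by,
        "target_path": live_path,
        "compilation_notes": compilation_notes,
    }
    if modifies:
        fields["modifies"] = modifies  # only for updates to an existing live page
    return f"staging/{live_path}", fields
```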
### Query (answering questions from other projects)
When working in another project and consulting the wiki:
1. Use `qmd` to search first (see Search Strategy below). Read `index.md` only when browsing the full catalog.
2. Read the specific pattern/decision/concept pages
3. Apply the knowledge, respecting environment differences
4. If a page's `confidence` is `low`, flag that to the user — the content may be aging out
5. If a page has `status: pending` (it's in `staging/`), flag that to the user: "Note: this is from a pending wiki page in staging, not yet verified." Use the content but make the uncertainty visible.
6. If you find yourself consulting a page under `archive/`, mention it's archived and may be outdated
7. If your work reveals new knowledge, **file it back** — update the wiki (and bump `last_verified`)
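
Steps 4 to 6 amount to collecting caveats before using a page. A minimal sketch, assuming the page path is relative to the wiki root and the frontmatter is already parsed into a dict; `page_caveats` is a hypothetical helper and the first and last messages are illustrative wording.

```python
def page_caveats(path: str, meta: dict) -> list[str]:
    """Collect the warnings the Query steps ask the agent to surface to the user."""
    caveats = []
    if meta.get("confidence") == "low":
        caveats.append("Low confidence: this content may be aging out.")
    if path.startswith("staging/") or meta.get("status") == "pending":
        caveats.append("Note: this is from a pending wiki page in staging, not yet verified.")
    if path.startswith("archive/"):
        caveats.append("Archived page: it may be outdated.")
    return caveats
```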
### Search Strategy — which qmd collection to use
The wiki has three qmd collections. Pick the right one for the question:
| Question type | Collection | Command |
|---|---|---|
| "What's our current pattern for X?" | `wiki` (default) | `qmd search "X" --json -n 5` |
| "What's the rationale behind decision Y?" | `wiki` (default) | `qmd vsearch "why did we choose Y" --json -n 5` |
| "What was our OLD approach before we changed it?" | `wiki-archive` | `qmd search "X" -c wiki-archive --json -n 5` |
| "When did we discuss this, and what did we decide?" | `wiki-conversations` | `qmd search "X" -c wiki-conversations --json -n 5` |
| "Find everything across time" | all three | `qmd search "X" -c wiki -c wiki-archive -c wiki-conversations --json -n 10` |

**Rules of thumb**:
- Use `qmd search` for keyword matches (BM25, fast)
- Use `qmd vsearch` for conceptual / semantically-similar queries (vector)
- Use `qmd query` for the best quality — hybrid BM25 + vector + LLM re-ranking
- Always use `--json` for structured output
- Read individual matched pages with `cat` or your file tool after finding them
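
If an agent or script assembles these commands programmatically, a small builder keeps the flags consistent. The sketch below uses only the subcommands and flags shown in the table above (`search`, `vsearch`, `query`, `-c`, `--json`, `-n`); the helper name `qmd_command` is hypothetical.

```python
QMD_MODES = {"search", "vsearch", "query"}  # subcommands named in the rules of thumb


def qmd_command(query: str, collections: tuple = (), mode: str = "search",
                limit: int = 5) -> list:
    """Build an argv list for qmd; pass no collections to use the default `wiki`."""
    if mode not in QMD_MODES:
        raise ValueError(f"unknown qmd mode: {mode}")
    argv = ["qmd", mode, query]
    for name in collections:
        argv += ["-c", name]
    return argv + ["--json", "-n", str(limit)]
```

The resulting list can be passed to `subprocess.run` as-is, which avoids shell-quoting issues with multi-word queries.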
### Mine (conversation extraction and summarization)
Four-phase pipeline that extracts sessions into searchable conversation pages:
1. **Extract** (`extract-sessions.py`) — Parse session files into markdown transcripts
2. **Summarize** (`summarize-conversations.py --claude`) — Classify + summarize via `claude -p` with haiku/sonnet routing
3. **Index** (`update-conversation-index.py --reindex`) — Regenerate conversation index + `context/wake-up.md`
4. **Harvest** (`wiki-harvest.py`) — Scan summarized conversations for external reference URLs and compile them into wiki pages

The full pipeline runs via `mine-conversations.sh`. Extraction is incremental (it tracks byte offsets), and so is summarization (it tracks message counts).
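
Byte-offset tracking for incremental extraction could look roughly like the sketch below. The JSON state-file layout and the `read_new_bytes` name are assumptions made for illustration; the actual bookkeeping in `extract-sessions.py` may differ.

```python
import json
from pathlib import Path


def read_new_bytes(session_file: Path, state_file: Path) -> bytes:
    """Return only the bytes appended to session_file since the previous call."""
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    key = str(session_file)
    offset = state.get(key, 0)  # start of file on the first run
    with session_file.open("rb") as handle:
        handle.seek(offset)
        new_data = handle.read()
    state[key] = offset + len(new_data)  # remember where this run stopped
    state_file.write_text(json.dumps(state))
    return new_data
```

Because the offset is stored per file, re-running the extractor on an unchanged session reads nothing, and a session that grew is read only from where the last run stopped.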
### Maintain (wiki health automation)
`scripts/wiki-maintain.sh` chains harvest + hygiene + qmd reindex:
```bash
bash scripts/wiki-maintain.sh # Harvest + quick hygiene + reindex
bash scripts/wiki-maintain.sh --full # Harvest + full hygiene (LLM) + reindex
bash scripts/wiki-maintain.sh --harvest-only # Harvest only
bash scripts/wiki-maintain.sh --hygiene-only # Hygiene only
bash scripts/wiki-maintain.sh --dry-run # Show what would run
```
### Lint (periodic health check)
Automated via `scripts/wiki-hygiene.py`. Two tiers:

**Quick mode** (no LLM, run daily: `python3 scripts/wiki-hygiene.py`):
- Backfill missing `last_verified`
- Refresh `last_verified` from conversation `related:` references
- Auto-restore archived pages that are referenced again
- Repair frontmatter (missing required fields, invalid values)
- Confidence decay per 6/9/12-month thresholds
- Archive stale and superseded pages
- Orphan pages (auto-linked into `index.md`)
- Broken cross-references (fuzzy-match fix via `difflib`, or restore from archive)
- Main index drift (auto add missing entries, remove stale ones)
- Empty stubs (report-only)
- State file drift (report-only)
- Staging/archive index resync
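
The fuzzy-match repair for broken cross-references can be approximated with `difflib.get_close_matches` from the standard library. This is a sketch under assumptions: the `suggest_target` name and the `0.8` cutoff are illustrative, not values taken from `scripts/wiki-hygiene.py`.

```python
import difflib
from typing import Optional


def suggest_target(broken: str, existing_pages: list, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest existing page path for a broken reference, or None."""
    matches = difflib.get_close_matches(broken, existing_pages, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```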

**Full mode** (LLM, run weekly: `python3 scripts/wiki-hygiene.py --full`):
- Everything in quick mode, plus:
- Missing cross-references between related pages (haiku)
- Duplicate coverage — weaker page auto-merged into stronger (sonnet)
- Contradictions between pages (sonnet, report-only)
- Technology lifecycle — flag pages referencing versions older than what's in recent conversations

**Reports** (written to `reports/`):
- `hygiene-YYYY-MM-DD-fixed.md` — what was auto-fixed
- `hygiene-YYYY-MM-DD-needs-review.md` — what needs human judgment
## Cross-Reference Conventions
- Link between wiki pages using relative markdown links: `[Pattern Name](../patterns/file.md)`
- Link to raw sources: `[Source](../raw/path/to/file.md)`
- In frontmatter `related:` use the relative filename: `patterns/secrets-at-startup.md`
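
Relative links between pages can be derived from the wiki-root-relative paths used in `related:`. A minimal sketch using `posixpath.relpath`; the helper name `markdown_link` is hypothetical.

```python
import posixpath


def markdown_link(title: str, from_page: str, to_page: str) -> str:
    """Build a relative markdown link between two wiki-root-relative page paths."""
    relative = posixpath.relpath(to_page, start=posixpath.dirname(from_page))
    return f"[{title}]({relative})"
```

Called with `from_page="decisions/no-alpine.md"` and `to_page="patterns/health-endpoints.md"`, this yields a link to `../patterns/health-endpoints.md`.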
## Naming Conventions
- Filenames: `kebab-case.md`
- Patterns: named by what they standardize (e.g., `health-endpoints.md`, `secrets-at-startup.md`)
- Decisions: named by what was decided (e.g., `no-alpine.md`, `dhi-base-images.md`)
- Environments: named by domain (e.g., `docker-registries.md`, `ci-cd-platforms.md`)
- Concepts: named by the concept (e.g., `two-user-database-model.md`, `build-once-deploy-many.md`)
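
Deriving a `kebab-case.md` filename from a page title is mechanical. The sketch below is one possible approach, assuming ASCII titles; `kebab_filename` is a hypothetical helper.

```python
import re


def kebab_filename(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{slug}.md"
```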
## Customization Notes
Things you should change for your own wiki:
1. **Directory structure** — the four live dirs (`patterns/`, `decisions/`, `concepts/`, `environments/`) reflect engineering use cases. Pick categories that match how you think — research wikis might use `findings/`, `hypotheses/`, `methods/`, `literature/` instead. Update `LIVE_CONTENT_DIRS` in `scripts/wiki_lib.py` to match.
2. **Page-type sections**: the "Structure" blocks under each page type reflect my own usage. Define your own conventions.
3. **`status` field**: if you want to track Active/Superseded/Under Review explicitly, this is a natural addition. The hygiene script already checks for `status: Superseded by ...` and archives those pages automatically.
4. **Environment Detection** — if you don't have multiple environments, remove the section. If you do, update it for your own environments (work/home, dev/prod, mac/linux, etc.).
5. **Cross-reference path format** — I use `patterns/foo.md` in the `related:` field. Obsidian users might prefer `[[foo]]` wikilink format. The hygiene script handles standard markdown links; adapt as needed.
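
For the wikilink variant in item 5, converting a `related:` path into Obsidian-style syntax could be as simple as the sketch below. `to_wikilink` is a hypothetical helper; note that dropping the directory makes same-named pages in different directories ambiguous.

```python
import posixpath


def to_wikilink(related_path: str) -> str:
    """Convert a path like `patterns/foo.md` into the wikilink `[[foo]]`."""
    stem, _extension = posixpath.splitext(posixpath.basename(related_path))
    return f"[[{stem}]]"
```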