Initial commit — memex
A compounding LLM-maintained knowledge wiki. Synthesis of Andrej Karpathy's persistent-wiki gist and milla-jovovich's mempalace, with an automation layer on top for conversation mining, URL harvesting, human-in-the-loop staging, staleness decay, and hygiene.

Includes:
- 11 pipeline scripts (extract, summarize, index, harvest, stage, hygiene, maintain, sync, + shared library)
- Full docs: README, SETUP, ARCHITECTURE, DESIGN-RATIONALE, CUSTOMIZE
- Example CLAUDE.md files (wiki schema + global instructions) tuned for the three-collection qmd setup
- 171-test pytest suite (cross-platform, runs in ~1.3s)
- MIT licensed
278
docs/examples/wiki-CLAUDE.md
Normal file
@@ -0,0 +1,278 @@
# LLM Wiki — Schema

This is a persistent, compounding knowledge base maintained by LLM agents.
It captures the **why** behind patterns, decisions, and implementations —
not just the what. Copy this file to the root of your wiki directory
(e.g. `~/projects/wiki/CLAUDE.md`) and edit for your own conventions.

> This is an example `CLAUDE.md` for the wiki root. The agent reads this
> at the start of every session when working inside the wiki. It's the
> "constitution" that tells the agent how to maintain the knowledge base.

## How This Wiki Works

**You are the maintainer.** When working in this wiki directory, you read
raw sources, compile knowledge into wiki pages, maintain cross-references,
and keep everything consistent.

**You are a consumer.** When working in any other project directory, you
read wiki pages to inform your work — applying established patterns,
respecting decisions, and understanding context.

## Directory Structure

```
wiki/
├── CLAUDE.md              ← You are here (schema)
├── index.md               ← Content catalog — read this FIRST on any query
├── log.md                 ← Chronological record of all operations
│
├── patterns/              ← LIVE: HOW things should be built (with WHY)
├── decisions/             ← LIVE: WHY we chose this approach (with alternatives rejected)
├── environments/          ← LIVE: WHERE implementations differ
├── concepts/              ← LIVE: WHAT the foundational ideas are
│
├── raw/                   ← Immutable source material (NEVER modify)
│   └── harvested/         ← URL harvester output
│
├── staging/               ← PENDING automated content awaiting human review
│   ├── index.md
│   └── <type>/
│
├── archive/               ← STALE / superseded (excluded from default search)
│   ├── index.md
│   └── <type>/
│
├── conversations/         ← Mined Claude Code session transcripts
│   ├── index.md
│   └── <wing>/            ← per-project or per-person (MemPalace "wing")
│
├── context/               ← Auto-updated AI session briefing
│   ├── wake-up.md         ← Loaded at the start of every session
│   └── active-concerns.md
│
├── reports/               ← Hygiene operation logs
└── scripts/               ← The automation pipeline
```

**Core rule — automated vs manual content**:

| Origin | Destination | Status |
|--------|-------------|--------|
| Script-generated (harvester, hygiene, URL compile) | `staging/` | `pending` |
| Human-initiated ("add this to the wiki" in a Claude session) | Live wiki (`patterns/`, etc.) | `verified` |
| Human-reviewed from staging | Live wiki (promoted) | `verified` |

Managed via `scripts/wiki-staging.py --list / --promote / --reject / --review`.
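The routing rule in the table above can be sketched in a few lines of Python. This is an illustration only, not the logic in `scripts/wiki-staging.py`: the names `LIVE_DIRS`, `Placement`, and `place_page` are hypothetical, and the sketch assumes that `staging/<type>/` mirrors the live directory names.

```python
from dataclasses import dataclass

# Assumed mapping from the frontmatter `type` value to its live directory.
LIVE_DIRS = {
    "pattern": "patterns",
    "decision": "decisions",
    "environment": "environments",
    "concept": "concepts",
}


@dataclass(frozen=True)
class Placement:
    directory: str  # directory relative to the wiki root
    status: str     # "pending" or "verified"


def place_page(page_type: str, origin: str) -> Placement:
    """Return where a new page should be written, given its type and origin."""
    if page_type not in LIVE_DIRS:
        raise ValueError(f"unknown page type: {page_type!r}")
    live_dir = LIVE_DIRS[page_type]
    if origin == "manual":
        # A human asked for this page: the instruction is the approval.
        return Placement(live_dir, "verified")
    if origin == "automated":
        # Script output waits in staging for human review.
        return Placement(f"staging/{live_dir}", "pending")
    raise ValueError(f"unknown origin: {origin!r}")
```

For example, `place_page("decision", "manual")` yields the live `decisions` directory with status `verified`, while any `automated` page lands under `staging/`.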

## Page Conventions

### Frontmatter (required on all wiki pages)

```yaml
---
title: Page Title
type: pattern | decision | environment | concept
confidence: high | medium | low
origin: manual | automated        # How the page entered the wiki
sources: [list of raw/ files this was compiled from]
related: [list of other wiki pages this connects to]
last_compiled: YYYY-MM-DD         # Date this page was last (re)compiled from sources
last_verified: YYYY-MM-DD         # Date the content was last confirmed accurate
---
```

**`origin` values**:
- `manual` — Created by a human in a Claude session. Goes directly to the live wiki, no staging.
- `automated` — Created by a script (harvester, hygiene, etc.). Must pass through `staging/` for human review before promotion.
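A frontmatter check along these lines could catch missing or invalid fields before a page is written. It is a minimal sketch that assumes the frontmatter has already been parsed into a dict; `REQUIRED_FIELDS`, `ALLOWED_VALUES`, and `frontmatter_problems` are hypothetical names, not part of the pipeline's shared library.

```python
from datetime import date

# Field names and allowed values are taken from the schema above.
REQUIRED_FIELDS = (
    "title", "type", "confidence", "origin",
    "sources", "related", "last_compiled", "last_verified",
)
ALLOWED_VALUES = {
    "type": ("pattern", "decision", "environment", "concept"),
    "confidence": ("high", "medium", "low"),
    "origin": ("manual", "automated"),
}


def _is_iso_date(value) -> bool:
    """Accept date objects or strings that parse as ISO dates."""
    if isinstance(value, date):
        return True
    try:
        date.fromisoformat(str(value))
        return True
    except ValueError:
        return False


def frontmatter_problems(meta: dict) -> list:
    """Return a list of human-readable problems; empty means the page passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in meta]
    for name, allowed in ALLOWED_VALUES.items():
        if name in meta and meta[name] not in allowed:
            problems.append(f"invalid {name}: {meta[name]!r}")
    for name in ("last_compiled", "last_verified"):
        if name in meta and not _is_iso_date(meta[name]):
            problems.append(f"invalid date in {name}: {meta[name]!r}")
    return problems
```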

**Confidence decay**: `last_verified` drives decay, not `last_compiled`. A page with no refresh signal drops from `high` to `medium` after 6 months, to `low` after 9 months, and to `stale` (auto-archived) after 12 months. See `scripts/wiki-hygiene.py` and `archive/index.md`.
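The decay thresholds can be expressed as a small function. This sketch assumes whole calendar months and inclusive thresholds; the real `scripts/wiki-hygiene.py` may count days instead, and the names below are hypothetical.

```python
from datetime import date

# Ordered from strongest to weakest; "stale" means the page gets archived.
CONFIDENCE_ORDER = ("high", "medium", "low", "stale")


def whole_months_between(earlier: date, later: date) -> int:
    """Count completed calendar months between two dates (never negative)."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the current month is not complete yet
    return max(months, 0)


def decayed_confidence(current: str, last_verified: date, today: date) -> str:
    """Apply the 6/9/12-month thresholds without ever raising confidence."""
    age = whole_months_between(last_verified, today)
    if age >= 12:
        ceiling = "stale"
    elif age >= 9:
        ceiling = "low"
    elif age >= 6:
        ceiling = "medium"
    else:
        ceiling = "high"
    # Pick whichever of the two levels is weaker (further along the order).
    return max(current, ceiling, key=CONFIDENCE_ORDER.index)
```

Taking the weaker of the current level and the age-based ceiling means a page already marked `low` is not promoted back to `medium` just because it is only six months old.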

### Staging Frontmatter (pages in `staging/<type>/`)

Automated-origin pages get additional staging metadata that is **stripped on promotion**:

```yaml
---
title: ...
type: ...
origin: automated
status: pending                   # Awaiting review
staged_date: YYYY-MM-DD           # When the automated script staged this
staged_by: wiki-harvest           # Which script staged it (wiki-harvest, wiki-hygiene, ...)
target_path: patterns/foo.md      # Where it should land on promotion
modifies: patterns/bar.md         # Only present when this is an update to an existing live page
compilation_notes: "..."          # AI's explanation of what it did and why
harvest_source: https://...       # Only present for URL-harvested content
sources: [...]
related: [...]
last_verified: YYYY-MM-DD
---
```
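Stripping the staging metadata on promotion might look like the sketch below. Which fields count as staging-only is an assumption here: the sketch drops `status`, `staged_date`, `staged_by`, `target_path`, `modifies`, and `compilation_notes`, and keeps `harvest_source` as provenance. The function name `promote_metadata` is hypothetical; check `scripts/wiki-staging.py` for the actual behavior.

```python
# Assumed set of fields that exist only while a page sits in staging/.
STAGING_ONLY_FIELDS = (
    "status", "staged_date", "staged_by",
    "target_path", "modifies", "compilation_notes",
)


def promote_metadata(staged: dict):
    """Return (target_path, live_frontmatter) for a staged page.

    The input dict is left untouched so a rejected promotion can be retried.
    """
    if "target_path" not in staged:
        raise ValueError("staged page has no target_path")
    live = {key: value for key, value in staged.items()
            if key not in STAGING_ONLY_FIELDS}
    return staged["target_path"], live
```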

### Pattern Pages (`patterns/`)

Structure:
1. **What** — One-paragraph description of the pattern
2. **Why** — The reasoning, constraints, and goals that led to this pattern
3. **Canonical Example** — A concrete implementation (link to raw/ source or inline)
4. **Structure** — The specification: fields, endpoints, formats, conventions
5. **When to Deviate** — Known exceptions or conditions where the pattern doesn't apply
6. **History** — Key changes and the decisions that drove them

### Decision Pages (`decisions/`)

Structure:
1. **Decision** — One sentence: what we decided
2. **Context** — What problem or constraint prompted this
3. **Options Considered** — What alternatives existed (with pros/cons)
4. **Rationale** — Why this option won
5. **Consequences** — What this decision enables and constrains
6. **Status** — Active | Superseded by [link] | Under Review

### Environment Pages (`environments/`)

Structure:
1. **Overview** — What this environment is (platform, CI, infra)
2. **Key Differences** — Table comparing environments for this domain
3. **Implementation Details** — Environment-specific configs, credentials, deploy method
4. **Gotchas** — Things that have bitten us

### Concept Pages (`concepts/`)

Structure:
1. **Definition** — What this concept means in our context
2. **Why It Matters** — How this concept shapes our decisions
3. **Related Patterns** — Links to patterns that implement this concept
4. **Related Decisions** — Links to decisions driven by this concept

## Operations

### Ingest (adding new knowledge)

When a new raw source is added or you learn something new:

1. Read the source material thoroughly
2. Identify which existing wiki pages need updating
3. Identify if new pages are needed
4. Update/create pages following the conventions above
5. Update cross-references (`related:` frontmatter) on all affected pages
6. Update `index.md` with any new pages
7. Set `last_verified:` to today's date on every page you create or update
8. Set `origin: manual` on any page you create when a human directed you to
9. Append to `log.md`: `## [YYYY-MM-DD] ingest | Source Description`
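The log heading format from the last step can be produced with a small helper. This is a sketch with hypothetical names (`log_heading`, `append_log_entry`); it writes only the heading line and leaves any body text to the caller.

```python
from datetime import date
from pathlib import Path


def log_heading(operation: str, description: str, when: date) -> str:
    """Format a log.md heading such as '## [2025-01-02] ingest | Source Description'."""
    return f"## [{when.isoformat()}] {operation} | {description}"


def append_log_entry(log_path: Path, operation: str, description: str, when: date) -> None:
    """Append one heading to the log, separated from earlier entries by a blank line."""
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write("\n" + log_heading(operation, description, when) + "\n")
```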

**Where to write**:
- **Human-initiated** ("add this to the wiki", "create a pattern for X") — write directly to the live directory (`patterns/`, `decisions/`, etc.) with `origin: manual`. The human's instruction IS the approval.
- **Script-initiated** (harvest, auto-compile, hygiene auto-fix) — write to `staging/<type>/` with `origin: automated`, `status: pending`, plus `staged_date`, `staged_by`, `target_path`, and `compilation_notes`. For updates to existing live pages, also set `modifies: <live-page-path>`.

### Query (answering questions from other projects)

When working in another project and consulting the wiki:

1. Use `qmd` to search first (see Search Strategy below). Read `index.md` only when browsing the full catalog.
2. Read the specific pattern/decision/concept pages
3. Apply the knowledge, respecting environment differences
4. If a page's `confidence` is `low`, flag that to the user — the content may be aging out
5. If a page has `status: pending` (it's in `staging/`), flag that to the user: "Note: this is from a pending wiki page in staging, not yet verified." Use the content but make the uncertainty visible.
6. If you find yourself consulting a page under `archive/`, mention it's archived and may be outdated
7. If your work reveals new knowledge, **file it back** — update the wiki (and bump `last_verified`)
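The three flagging rules in the steps above (low confidence, pending, archived) can be collected into one check. A minimal sketch, assuming the page path is relative to the wiki root and the frontmatter is already parsed into a dict; `page_caveats` is a hypothetical name.

```python
def page_caveats(rel_path: str, meta: dict) -> list:
    """Return the warnings that should accompany content taken from a wiki page."""
    notes = []
    if meta.get("confidence") == "low":
        notes.append("Low confidence: the content may be aging out.")
    if meta.get("status") == "pending" or rel_path.startswith("staging/"):
        notes.append("Note: this is from a pending wiki page in staging, not yet verified.")
    if rel_path.startswith("archive/"):
        notes.append("Archived page: it may be outdated.")
    return notes
```

An empty list means the page can be used without a caveat.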

### Search Strategy — which qmd collection to use

The wiki has three qmd collections. Pick the right one for the question:

| Question type | Collection | Command |
|---|---|---|
| "What's our current pattern for X?" | `wiki` (default) | `qmd search "X" --json -n 5` |
| "What's the rationale behind decision Y?" | `wiki` (default) | `qmd vsearch "why did we choose Y" --json -n 5` |
| "What was our OLD approach before we changed it?" | `wiki-archive` | `qmd search "X" -c wiki-archive --json -n 5` |
| "When did we discuss this, and what did we decide?" | `wiki-conversations` | `qmd search "X" -c wiki-conversations --json -n 5` |
| "Find everything across time" | all three | `qmd search "X" -c wiki -c wiki-archive -c wiki-conversations --json -n 10` |

**Rules of thumb**:
- Use `qmd search` for keyword matches (BM25, fast)
- Use `qmd vsearch` for conceptual / semantically-similar queries (vector)
- Use `qmd query` for the best quality — hybrid BM25 + vector + LLM re-ranking
- Always use `--json` for structured output
- Read individual matched pages with `cat` or your file tool after finding them
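When calling qmd from a script, the commands in the table can be assembled as an argument list. The sketch below only builds the list and does not run qmd; it uses just the subcommands and flags shown above (`-c`, `--json`, `-n`), and it assumes `qmd query` accepts the same flags as `qmd search`. The function name `qmd_command` is hypothetical.

```python
def qmd_command(query: str, mode: str = "search", collections=(), limit: int = 5) -> list:
    """Build an argv list for qmd, e.g. to pass to subprocess.run().

    An empty `collections` leaves qmd on its default collection (`wiki`).
    """
    if mode not in ("search", "vsearch", "query"):
        raise ValueError(f"unknown qmd mode: {mode!r}")
    argv = ["qmd", mode, query]
    for name in collections:
        argv += ["-c", name]  # one -c flag per collection, as in the table
    argv += ["--json", "-n", str(limit)]
    return argv
```

Passing the list to `subprocess.run` (rather than a shell string) avoids quoting problems when the query contains spaces or quotes.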

### Mine (conversation extraction and summarization)

Four-phase pipeline that extracts sessions into searchable conversation pages:

1. **Extract** (`extract-sessions.py`) — Parse session files into markdown transcripts
2. **Summarize** (`summarize-conversations.py --claude`) — Classify + summarize via `claude -p` with haiku/sonnet routing
3. **Index** (`update-conversation-index.py --reindex`) — Regenerate conversation index + `context/wake-up.md`
4. **Harvest** (`wiki-harvest.py`) — Scan summarized conversations for external reference URLs and compile them into wiki pages

Full pipeline via `mine-conversations.sh`. Extraction is incremental (tracks byte offsets). Summarization is incremental (tracks message count).
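Incremental extraction by byte offset can be illustrated as follows. This is a sketch of the general technique, not the code in `extract-sessions.py`: the JSON state file layout and the name `read_new_bytes` are assumptions made for the example.

```python
import json
from pathlib import Path


def read_new_bytes(session_file: Path, state_file: Path) -> bytes:
    """Return only the bytes appended to session_file since the previous call."""
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    key = str(session_file)
    offset = state.get(key, 0)
    if session_file.stat().st_size < offset:
        offset = 0  # file was truncated or replaced: start over
    with session_file.open("rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Remember how far we got so the next run skips what was already parsed.
    state[key] = offset + len(data)
    state_file.write_text(json.dumps(state))
    return data
```

Calling it twice without new writes returns empty bytes the second time, so a rerun of the pipeline costs nothing for unchanged sessions.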

### Maintain (wiki health automation)

`scripts/wiki-maintain.sh` chains harvest + hygiene + qmd reindex:

```bash
bash scripts/wiki-maintain.sh                # Harvest + quick hygiene + reindex
bash scripts/wiki-maintain.sh --full         # Harvest + full hygiene (LLM) + reindex
bash scripts/wiki-maintain.sh --harvest-only # Harvest only
bash scripts/wiki-maintain.sh --hygiene-only # Hygiene only
bash scripts/wiki-maintain.sh --dry-run      # Show what would run
```

### Lint (periodic health check)

Automated via `scripts/wiki-hygiene.py`. Two tiers:

**Quick mode** (no LLM, run daily — `python3 scripts/wiki-hygiene.py`):
- Backfill missing `last_verified`
- Refresh `last_verified` from conversation `related:` references
- Auto-restore archived pages that are referenced again
- Repair frontmatter (missing required fields, invalid values)
- Confidence decay per 6/9/12-month thresholds
- Archive stale and superseded pages
- Orphan pages (auto-linked into `index.md`)
- Broken cross-references (fuzzy-match fix via `difflib`, or restore from archive)
- Main index drift (auto add missing entries, remove stale ones)
- Empty stubs (report-only)
- State file drift (report-only)
- Staging/archive index resync
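The fuzzy-match repair for broken cross-references can be sketched with `difflib.get_close_matches` from the standard library. The function name `repair_reference` and the `0.8` cutoff are assumptions for illustration; the hygiene script may use different thresholds.

```python
import difflib


def repair_reference(broken: str, existing_pages: list, cutoff: float = 0.8):
    """Return the closest existing page path for a broken reference, or None.

    A None result means no candidate was similar enough, so the reference
    should be reported (or restored from the archive) instead of rewritten.
    """
    if broken in existing_pages:
        return broken  # not actually broken
    matches = difflib.get_close_matches(broken, existing_pages, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A fairly high cutoff keeps the fix conservative: a one-character typo is repaired, while an unrelated path falls through to human review.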

**Full mode** (LLM, run weekly — `python3 scripts/wiki-hygiene.py --full`):
- Everything in quick mode, plus:
- Missing cross-references between related pages (haiku)
- Duplicate coverage — weaker page auto-merged into stronger (sonnet)
- Contradictions between pages (sonnet, report-only)
- Technology lifecycle — flag pages referencing versions older than what's in recent conversations

**Reports** (written to `reports/`):
- `hygiene-YYYY-MM-DD-fixed.md` — what was auto-fixed
- `hygiene-YYYY-MM-DD-needs-review.md` — what needs human judgment

## Cross-Reference Conventions

- Link between wiki pages using relative markdown links: `[Pattern Name](../patterns/file.md)`
- Link to raw sources: `[Source](../raw/path/to/file.md)`
- In frontmatter `related:`, use the path relative to the wiki root: `patterns/secrets-at-startup.md`
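A relative markdown link between two pages can be computed from their wiki-root paths. This sketch uses `posixpath.relpath` so the result always has forward slashes; `relative_link` is a hypothetical helper name.

```python
import posixpath


def relative_link(from_page: str, to_page: str, label: str) -> str:
    """Build a markdown link from one wiki page to another.

    Both paths are relative to the wiki root, e.g. 'decisions/no-alpine.md'.
    """
    start_dir = posixpath.dirname(from_page)
    target = posixpath.relpath(to_page, start=start_dir)
    return f"[{label}]({target})"
```

Note that for two pages in the same directory this yields the bare filename (`other.md`) rather than `../patterns/other.md`; both forms resolve to the same file.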

## Naming Conventions

- Filenames: `kebab-case.md`
- Patterns: named by what they standardize (e.g., `health-endpoints.md`, `secrets-at-startup.md`)
- Decisions: named by what was decided (e.g., `no-alpine.md`, `dhi-base-images.md`)
- Environments: named by domain (e.g., `docker-registries.md`, `ci-cd-platforms.md`)
- Concepts: named by the concept (e.g., `two-user-database-model.md`, `build-once-deploy-many.md`)
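Deriving a `kebab-case.md` filename from a page title is mechanical. A minimal sketch, assuming ASCII titles; `kebab_filename` is a hypothetical helper, and non-ASCII characters are simply treated as separators here.

```python
import re


def kebab_filename(title: str) -> str:
    """Turn a page title into a kebab-case markdown filename."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if not slug:
        raise ValueError(f"title has no usable characters: {title!r}")
    return f"{slug}.md"
```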

## Customization Notes

Things you should change for your own wiki:

1. **Directory structure** — the four live dirs (`patterns/`, `decisions/`, `concepts/`, `environments/`) reflect engineering use cases. Pick categories that match how you think — research wikis might use `findings/`, `hypotheses/`, `methods/`, `literature/` instead. Update `LIVE_CONTENT_DIRS` in `scripts/wiki_lib.py` to match.

2. **Page-type sections** — the "Structure" blocks under each page type are for my use. Define your own conventions.

3. **`status` field** — if you want to track Superseded/Active/Under Review explicitly, this is a natural add. The hygiene script already checks for `status: Superseded by ...` and archives those automatically.

4. **Environment Detection** — if you don't have multiple environments, remove the section. If you do, update it for your own environments (work/home, dev/prod, mac/linux, etc.).

5. **Cross-reference path format** — I use `patterns/foo.md` in the `related:` field. Obsidian users might prefer `[[foo]]` wikilink format. The hygiene script handles standard markdown links; adapt as needed.
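If you do switch to wikilinks, the conversion between the two `related:` formats is small. This is a sketch with hypothetical helper names; it assumes page filenames are unique across directories, which is what makes a bare `[[foo]]` resolvable.

```python
from pathlib import PurePosixPath


def to_wikilink(related_path: str) -> str:
    """Convert 'patterns/foo.md' into the Obsidian-style '[[foo]]'."""
    return f"[[{PurePosixPath(related_path).stem}]]"


def resolve_wikilink(wikilink: str, known_pages: list):
    """Map '[[foo]]' back to a page path, or None if missing or ambiguous."""
    name = wikilink.strip()
    if not (name.startswith("[[") and name.endswith("]]")):
        raise ValueError(f"not a wikilink: {wikilink!r}")
    name = name[2:-2]
    matches = [page for page in known_pages if PurePosixPath(page).stem == name]
    return matches[0] if len(matches) == 1 else None
```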