A compounding LLM-maintained knowledge wiki. A synthesis of Andrej Karpathy's persistent-wiki gist and milla-jovovich's mempalace, with an automation layer on top for conversation mining, URL harvesting, human-in-the-loop staging, staleness decay, and hygiene. Includes:
- 11 pipeline scripts (extract, summarize, index, harvest, stage, hygiene, maintain, sync, + shared library)
- Full docs: README, SETUP, ARCHITECTURE, DESIGN-RATIONALE, CUSTOMIZE
- Example CLAUDE.md files (wiki schema + global instructions) tuned for the three-collection qmd setup
- 171-test pytest suite (cross-platform, runs in ~1.3s)
- MIT licensed
LLM Wiki — Schema
This is a persistent, compounding knowledge base maintained by LLM agents.
It captures the why behind patterns, decisions, and implementations —
not just the what. Copy this file to the root of your wiki directory
(e.g. ~/projects/wiki/CLAUDE.md) and edit for your own conventions.
This is an example `CLAUDE.md` for the wiki root. The agent reads this at the start of every session when working inside the wiki. It's the "constitution" that tells the agent how to maintain the knowledge base.
How This Wiki Works
You are the maintainer. When working in this wiki directory, you read raw sources, compile knowledge into wiki pages, maintain cross-references, and keep everything consistent.
You are a consumer. When working in any other project directory, you read wiki pages to inform your work — applying established patterns, respecting decisions, and understanding context.
Directory Structure
wiki/
├── CLAUDE.md ← You are here (schema)
├── index.md ← Content catalog — read this FIRST on any query
├── log.md ← Chronological record of all operations
│
├── patterns/ ← LIVE: HOW things should be built (with WHY)
├── decisions/ ← LIVE: WHY we chose this approach (with alternatives rejected)
├── environments/ ← LIVE: WHERE implementations differ
├── concepts/ ← LIVE: WHAT the foundational ideas are
│
├── raw/ ← Immutable source material (NEVER modify)
│ └── harvested/ ← URL harvester output
│
├── staging/ ← PENDING automated content awaiting human review
│ ├── index.md
│ └── <type>/
│
├── archive/ ← STALE / superseded (excluded from default search)
│ ├── index.md
│ └── <type>/
│
├── conversations/ ← Mined Claude Code session transcripts
│ ├── index.md
│ └── <wing>/ ← per-project or per-person (MemPalace "wing")
│
├── context/ ← Auto-updated AI session briefing
│ ├── wake-up.md ← Loaded at the start of every session
│ └── active-concerns.md
│
├── reports/ ← Hygiene operation logs
└── scripts/ ← The automation pipeline
Core rule — automated vs manual content:
| Origin | Destination | Status |
|---|---|---|
| Script-generated (harvester, hygiene, URL compile) | staging/ | pending |
| Human-initiated ("add this to the wiki" in a Claude session) | Live wiki (patterns/, etc.) | verified |
| Human-reviewed from staging | Live wiki (promoted) | verified |
Managed via scripts/wiki-staging.py --list / --promote / --reject / --review.
Page Conventions
Frontmatter (required on all wiki pages)
---
title: Page Title
type: pattern | decision | environment | concept
confidence: high | medium | low
origin: manual | automated # How the page entered the wiki
sources: [list of raw/ files this was compiled from]
related: [list of other wiki pages this connects to]
last_compiled: YYYY-MM-DD # Date this page was last (re)compiled from sources
last_verified: YYYY-MM-DD # Date the content was last confirmed accurate
---
origin values:
- `manual` — Created by a human in a Claude session. Goes directly to the live wiki, no staging.
- `automated` — Created by a script (harvester, hygiene, etc.). Must pass through `staging/` for human review before promotion.
Confidence decay: Pages with no refresh signal for 6 months decay high → medium; 9 months → low; 12 months → stale (auto-archived). last_verified drives decay, not last_compiled. See scripts/wiki-hygiene.py and archive/index.md.
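The decay schedule can be sketched as follows. This is a hypothetical simplification (months approximated as 30 days); the real rules live in scripts/wiki-hygiene.py:

```python
from datetime import date

# Hypothetical sketch of the 6/9/12-month decay schedule driven by
# last_verified. Returns the highest confidence the page's age allows;
# the real implementation is in scripts/wiki-hygiene.py.
def decayed_confidence(last_verified: date, today: date) -> str:
    months = (today - last_verified).days / 30
    if months >= 12:
        return "stale"    # auto-archived by hygiene
    if months >= 9:
        return "low"
    if months >= 6:
        return "medium"
    return "high"
```

A page verified seven months ago would decay to medium; one untouched for over a year would be marked stale and archived.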
Staging Frontmatter (pages in staging/<type>/)
Automated-origin pages get additional staging metadata that is stripped on promotion:
---
title: ...
type: ...
origin: automated
status: pending # Awaiting review
staged_date: YYYY-MM-DD # When the automated script staged this
staged_by: wiki-harvest # Which script staged it (wiki-harvest, wiki-hygiene, ...)
target_path: patterns/foo.md # Where it should land on promotion
modifies: patterns/bar.md # Only present when this is an update to an existing live page
compilation_notes: "..." # AI's explanation of what it did and why
harvest_source: https://... # Only present for URL-harvested content
sources: [...]
related: [...]
last_verified: YYYY-MM-DD
---
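Promotion strips the staging-only keys and keeps the live-page frontmatter intact. A minimal sketch, assuming the key set listed above (the real implementation is scripts/wiki-staging.py):

```python
# Hypothetical sketch of the promotion step: staging-only metadata is
# dropped and the page keeps its live frontmatter. The real logic
# lives in scripts/wiki-staging.py.
STAGING_ONLY_KEYS = {
    "status", "staged_date", "staged_by", "target_path",
    "modifies", "compilation_notes", "harvest_source",
}

def promote_frontmatter(fm: dict) -> dict:
    return {k: v for k, v in fm.items() if k not in STAGING_ONLY_KEYS}
```

Note that `origin: automated` survives promotion, so the wiki retains a record of how the page entered.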
Pattern Pages (patterns/)
Structure:
- What — One-paragraph description of the pattern
- Why — The reasoning, constraints, and goals that led to this pattern
- Canonical Example — A concrete implementation (link to raw/ source or inline)
- Structure — The specification: fields, endpoints, formats, conventions
- When to Deviate — Known exceptions or conditions where the pattern doesn't apply
- History — Key changes and the decisions that drove them
Decision Pages (decisions/)
Structure:
- Decision — One sentence: what we decided
- Context — What problem or constraint prompted this
- Options Considered — What alternatives existed (with pros/cons)
- Rationale — Why this option won
- Consequences — What this decision enables and constrains
- Status — Active | Superseded by [link] | Under Review
Environment Pages (environments/)
Structure:
- Overview — What this environment is (platform, CI, infra)
- Key Differences — Table comparing environments for this domain
- Implementation Details — Environment-specific configs, credentials, deploy method
- Gotchas — Things that have bitten us
Concept Pages (concepts/)
Structure:
- Definition — What this concept means in our context
- Why It Matters — How this concept shapes our decisions
- Related Patterns — Links to patterns that implement this concept
- Related Decisions — Links to decisions driven by this concept
Operations
Ingest (adding new knowledge)
When a new raw source is added or you learn something new:
- Read the source material thoroughly
- Identify which existing wiki pages need updating
- Identify if new pages are needed
- Update/create pages following the conventions above
- Update cross-references (`related:` frontmatter) on all affected pages
- Update `index.md` with any new pages
- Set `last_verified:` to today's date on every page you create or update
- Set `origin: manual` on any page you create when a human directed you to
- Append to `log.md`: `## [YYYY-MM-DD] ingest | Source Description`
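The log-append step might look like this hypothetical helper (adjust to your own log.md conventions):

```python
from datetime import date
from pathlib import Path

# Hypothetical helper for the "append to log.md" step. The entry
# format follows the schema: ## [YYYY-MM-DD] operation | description
def append_log(wiki_root: Path, operation: str, description: str) -> str:
    entry = f"\n## [{date.today().isoformat()}] {operation} | {description}\n"
    with open(wiki_root / "log.md", "a", encoding="utf-8") as f:
        f.write(entry)
    return entry
```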
Where to write:
- Human-initiated ("add this to the wiki", "create a pattern for X") — write directly to the live directory (`patterns/`, `decisions/`, etc.) with `origin: manual`. The human's instruction IS the approval.
- Script-initiated (harvest, auto-compile, hygiene auto-fix) — write to `staging/<type>/` with `origin: automated`, `status: pending`, plus `staged_date`, `staged_by`, `target_path`, and `compilation_notes`. For updates to existing live pages, also set `modifies: <live-page-path>`.
Query (answering questions from other projects)
When working in another project and consulting the wiki:
- Use `qmd` to search first (see Search Strategy below). Read `index.md` only when browsing the full catalog.
- Read the specific pattern/decision/concept pages
- Apply the knowledge, respecting environment differences
- If a page's `confidence` is `low`, flag that to the user — the content may be aging out
- If a page has `status: pending` (it's in `staging/`), flag that to the user: "Note: this is from a pending wiki page in staging, not yet verified." Use the content but make the uncertainty visible.
- If you find yourself consulting a page under `archive/`, mention it's archived and may be outdated
- If your work reveals new knowledge, file it back — update the wiki (and bump `last_verified`)
Search Strategy — which qmd collection to use
The wiki has three qmd collections. Pick the right one for the question:
| Question type | Collection | Command |
|---|---|---|
| "What's our current pattern for X?" | wiki (default) | qmd search "X" --json -n 5 |
| "What's the rationale behind decision Y?" | wiki (default) | qmd vsearch "why did we choose Y" --json -n 5 |
| "What was our OLD approach before we changed it?" | wiki-archive | qmd search "X" -c wiki-archive --json -n 5 |
| "When did we discuss this, and what did we decide?" | wiki-conversations | qmd search "X" -c wiki-conversations --json -n 5 |
| "Find everything across time" | all three | qmd search "X" -c wiki -c wiki-archive -c wiki-conversations --json -n 10 |
Rules of thumb:
- Use `qmd search` for keyword matches (BM25, fast)
- Use `qmd vsearch` for conceptual / semantically-similar queries (vector)
- Use `qmd query` for the best quality — hybrid BM25 + vector + LLM re-ranking
- Always use `--json` for structured output
- Read individual matched pages with `cat` or your file tool after finding them
Mine (conversation extraction and summarization)
Four-phase pipeline that extracts sessions into searchable conversation pages:
1. Extract (`extract-sessions.py`) — Parse session files into markdown transcripts
2. Summarize (`summarize-conversations.py --claude`) — Classify + summarize via `claude -p` with haiku/sonnet routing
3. Index (`update-conversation-index.py --reindex`) — Regenerate conversation index + `context/wake-up.md`
4. Harvest (`wiki-harvest.py`) — Scan summarized conversations for external reference URLs and compile them into wiki pages
Full pipeline via mine-conversations.sh. Extraction is incremental (tracks byte offsets). Summarization is incremental (tracks message count).
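Byte-offset tracking can be sketched as below. The state-file format here is hypothetical; the real tracking lives inside extract-sessions.py:

```python
import json
from pathlib import Path

# Hypothetical sketch of byte-offset tracking for incremental
# extraction: remember how far into each session file the last run
# read, and only return lines added since then.
def read_new_lines(session_file: Path, state_file: Path) -> list[str]:
    offsets = json.loads(state_file.read_text()) if state_file.exists() else {}
    start = offsets.get(str(session_file), 0)
    with open(session_file, "rb") as f:
        f.seek(start)               # skip everything already processed
        chunk = f.read()
    offsets[str(session_file)] = start + len(chunk)
    state_file.write_text(json.dumps(offsets))
    return chunk.decode("utf-8").splitlines()
```

Re-running after new messages are appended yields only the new lines; re-running with no changes yields nothing.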
Maintain (wiki health automation)
scripts/wiki-maintain.sh chains harvest + hygiene + qmd reindex:
bash scripts/wiki-maintain.sh # Harvest + quick hygiene + reindex
bash scripts/wiki-maintain.sh --full # Harvest + full hygiene (LLM) + reindex
bash scripts/wiki-maintain.sh --harvest-only # Harvest only
bash scripts/wiki-maintain.sh --hygiene-only # Hygiene only
bash scripts/wiki-maintain.sh --dry-run # Show what would run
Lint (periodic health check)
Automated via scripts/wiki-hygiene.py. Two tiers:
Quick mode (no LLM, run daily — python3 scripts/wiki-hygiene.py):
- Backfill missing `last_verified`
- Refresh `last_verified` from conversation `related:` references
- Auto-restore archived pages that are referenced again
- Repair frontmatter (missing required fields, invalid values)
- Confidence decay per 6/9/12-month thresholds
- Archive stale and superseded pages
- Orphan pages (auto-linked into `index.md`)
- Broken cross-references (fuzzy-match fix via `difflib`, or restore from archive)
- Main index drift (auto add missing entries, remove stale ones)
- Empty stubs (report-only)
- State file drift (report-only)
- Staging/archive index resync
Full mode (LLM, run weekly — python3 scripts/wiki-hygiene.py --full):
- Everything in quick mode, plus:
- Missing cross-references between related pages (haiku)
- Duplicate coverage — weaker page auto-merged into stronger (sonnet)
- Contradictions between pages (sonnet, report-only)
- Technology lifecycle — flag pages referencing versions older than what's in recent conversations
Reports (written to reports/):
- `hygiene-YYYY-MM-DD-fixed.md` — what was auto-fixed
- `hygiene-YYYY-MM-DD-needs-review.md` — what needs human judgment
Cross-Reference Conventions
- Link between wiki pages using relative markdown links: `[Pattern Name](../patterns/file.md)`
- Link to raw sources: `[Source](../raw/path/to/file.md)`
- In frontmatter `related:` use the relative filename: `patterns/secrets-at-startup.md`
Naming Conventions
- Filenames: `kebab-case.md`
- Patterns: named by what they standardize (e.g., `health-endpoints.md`, `secrets-at-startup.md`)
- Decisions: named by what was decided (e.g., `no-alpine.md`, `dhi-base-images.md`)
- Environments: named by domain (e.g., `docker-registries.md`, `ci-cd-platforms.md`)
- Concepts: named by the concept (e.g., `two-user-database-model.md`, `build-once-deploy-many.md`)
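A hypothetical helper for deriving kebab-case filenames from page titles (not part of the pipeline scripts):

```python
import re

# Hypothetical helper: derive a kebab-case.md filename from a page
# title, per the naming conventions above. Collapses any run of
# non-alphanumeric characters into a single hyphen.
def slugify(title: str) -> str:
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{slug}.md"
```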
Customization Notes
Things you should change for your own wiki:
- Directory structure — the four live dirs (`patterns/`, `decisions/`, `concepts/`, `environments/`) reflect engineering use cases. Pick categories that match how you think — research wikis might use `findings/`, `hypotheses/`, `methods/`, `literature/` instead. Update `LIVE_CONTENT_DIRS` in `scripts/wiki_lib.py` to match.
- Page-type sections — the "Structure" blocks under each page type are for my use. Define your own conventions.
- `status` field — if you want to track Superseded/Active/Under Review explicitly, this is a natural add. The hygiene script already checks for `status: Superseded by ...` and archives those automatically.
- Environment Detection — if you don't have multiple environments, remove the section. If you do, update it for your own environments (work/home, dev/prod, mac/linux, etc.).
- Cross-reference path format — I use `patterns/foo.md` in the `related:` field. Obsidian users might prefer `[[foo]]` wikilink format. The hygiene script handles standard markdown links; adapt as needed.