A compounding LLM-maintained knowledge wiki. Synthesis of Andrej Karpathy's persistent-wiki gist and milla-jovovich's mempalace, with an automation layer on top for conversation mining, URL harvesting, human-in-the-loop staging, staleness decay, and hygiene.

Includes:

- 11 pipeline scripts (extract, summarize, index, harvest, stage, hygiene, maintain, sync, + shared library)
- Full docs: README, SETUP, ARCHITECTURE, DESIGN-RATIONALE, CUSTOMIZE
- Example `CLAUDE.md` files (wiki schema + global instructions) tuned for the three-collection qmd setup
- 171-test pytest suite (cross-platform, runs in ~1.3s)
- MIT licensed
# Customization Guide
This repo is built around Claude Code, cron-based automation, and a specific directory layout. None of those are load-bearing for the core idea. This document walks through adapting it for different agents, different scheduling, and different subsets of functionality.
## What's actually required for the core idea
The minimum viable compounding wiki is:
- A markdown directory tree
- An agent that reads the tree at the start of a session and writes to it during the session
- Some convention (a `CLAUDE.md` or equivalent) telling the agent how to maintain the wiki
Everything else in this repo is optional optimization — automated extraction, URL harvesting, hygiene checks, cron scheduling. They're worth the setup effort once the wiki grows past a few dozen pages, but they're not the idea.
## Adapting for non-Claude-Code agents
Five components are Claude-specific: four scripts plus the `CLAUDE.md` instructions file. Each has a natural replacement path:
### 1. `extract-sessions.py` — Claude Code JSONL parsing
**What it does:** Reads session files from `~/.claude/projects/` and converts them to markdown transcripts.
**What's Claude-specific:** The JSONL format and directory structure are specific to the Claude Code CLI. Other agents don't produce these files.
**Replacements:**
- **Cursor:** Cursor stores chat history in `~/Library/Application Support/Cursor/User/globalStorage/` (macOS) as SQLite. Write an equivalent `extract-sessions.py` that queries that SQLite and produces the same markdown format.
- **Aider:** Aider stores chat history as `.aider.chat.history.md` in each project directory. A much simpler extractor: walk all project directories, read each `.aider.chat.history.md`, split on session boundaries, write to `conversations/<project>/`.
- **OpenAI Codex / Gemini CLI / other:** Whatever session format your tool uses — the target format is a markdown file with a specific frontmatter shape (`title`, `type: conversation`, `project`, `date`, `status: extracted`, `messages: N`, body of user/assistant turns). Anything that produces files in that shape will flow through the rest of the pipeline unchanged.
- **No agent at all — just manual:** Skip this script entirely. Paste interesting conversations into `conversations/general/YYYY-MM-DD-slug.md` by hand and set `status: extracted` yourself.
The pipeline downstream of `extract-sessions.py` doesn't care how the transcripts got there, only that they exist with the right frontmatter.
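For the manual path, a small helper that writes a transcript in that frontmatter shape might look like this. A sketch: only the frontmatter keys come from the pipeline contract above; the `write_transcript` name, the `**role:**` turn markers, and the title derivation are assumptions.

```python
from datetime import date
from pathlib import Path

def write_transcript(conversations_dir: Path, project: str, slug: str,
                     turns: list[tuple[str, str]]) -> Path:
    """Write a list of (role, text) turns as a pipeline-shaped transcript."""
    today = date.today().isoformat()
    frontmatter = "\n".join([
        "---",
        f"title: {slug.replace('-', ' ')}",   # title derivation is a guess
        "type: conversation",
        f"project: {project}",
        f"date: {today}",
        "status: extracted",
        f"messages: {len(turns)}",
        "---",
    ])
    body = "\n\n".join(f"**{role}:** {text}" for role, text in turns)
    out = conversations_dir / project / f"{today}-{slug}.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(frontmatter + "\n\n" + body + "\n", encoding="utf-8")
    return out
```

Anything that emits files in this shape — a shell script, a Cursor SQLite query, a paste-by-hand workflow — is a valid extractor.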
### 2. `summarize-conversations.py` — `claude -p` summarization
**What it does:** Classifies extracted conversations into "halls" (fact/discovery/preference/advice/event/tooling) and writes summaries.
**What's Claude-specific:** Uses `claude -p` with haiku/sonnet routing.
**Replacements:**
- **OpenAI:** Replace the `call_claude` helper with a function that calls the `openai` Python SDK or a `gpt` CLI. Use gpt-4o-mini for short conversations (equivalent to haiku routing) and gpt-4o for long ones.
- **Local LLM:** The script already supports this path — just omit the `--claude` flag and run a `llama-server` on localhost:8080 (or the WSL gateway IP on Windows). Phi-4-14B scored 400/400 on our internal eval.
- **Ollama:** Point `AI_BASE_URL` at your Ollama endpoint (e.g. `http://localhost:11434/v1`). Ollama exposes an OpenAI-compatible API.
- **Any OpenAI-compatible endpoint:** The `AI_BASE_URL` and `AI_MODEL` env vars configure the script — no code changes needed.
- **No LLM at all — manual summaries:** Edit each conversation file by hand to set `status: summarized` and add your own `topics` / `related` frontmatter. Tedious but works for a small wiki.
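A minimal OpenAI-compatible replacement for the `call_claude` helper might look like this. A sketch: only `AI_BASE_URL` and `AI_MODEL` come from the doc; the function names, the gpt-4o/gpt-4o-mini routing, and the `OPENAI_API_KEY` handling are assumptions, not the repo's actual code.

```python
import json
import os
import urllib.request

def build_payload(prompt: str, long_input: bool = False) -> dict:
    """Mirror the haiku/sonnet routing: a cheap model for short inputs,
    a bigger one for long conversations. Model names are assumptions."""
    model = os.environ.get("AI_MODEL", "gpt-4o" if long_input else "gpt-4o-mini")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
    }

def call_llm(prompt: str, long_input: bool = False) -> str:
    """POST to any OpenAI-compatible /chat/completions endpoint."""
    base = os.environ.get("AI_BASE_URL", "https://api.openai.com/v1")
    req = urllib.request.Request(
        f"{base}/chat/completions",
        data=json.dumps(build_payload(prompt, long_input)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

The same helper covers the Ollama and llama-server paths: set `AI_BASE_URL` and `AI_MODEL` and the transport is unchanged.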
### 3. `wiki-harvest.py` — AI compile step
**What it does:** After fetching raw URL content, sends it to `claude -p` to get a structured JSON verdict (`new_page` / `update_page` / `both` / `skip`) plus the page content.
**What's Claude-specific:** `claude -p --model haiku|sonnet`.
**Replacements:**
- **Any other LLM:** Replace `call_claude_compile()` with a function that calls your preferred backend. The prompt template (`COMPILE_PROMPT_TEMPLATE`) is reusable — just swap the transport.
- **Skip AI compilation entirely:** Run `wiki-harvest.py --no-compile` and the harvester will save raw content to `raw/harvested/` without trying to compile it. You can then turn the raw content into wiki pages manually (or via a different script).
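If you swap in another backend, the one structural contract worth preserving is the verdict JSON. A defensive parser might look like this — a sketch: the four verdict values come from the doc, but the fallback behavior and the `reason` field are assumptions.

```python
import json

VALID_VERDICTS = {"new_page", "update_page", "both", "skip"}

def parse_verdict(raw: str) -> dict:
    """Parse the model's JSON reply, falling back to 'skip' on garbage
    so one bad completion can't wedge the whole harvest run."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"verdict": "skip", "reason": "unparseable model output"}
    if data.get("verdict") not in VALID_VERDICTS:
        return {"verdict": "skip", "reason": f"unknown verdict {data.get('verdict')!r}"}
    return data
```

Degrading to `skip` keeps the failure mode conservative: a flaky backend costs you a harvested page, never a corrupted one.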
### 4. `wiki-hygiene.py --full` — LLM-powered checks
**What it does:** Duplicate detection, contradiction detection, missing cross-reference suggestions.
**What's Claude-specific:** `claude -p --model haiku|sonnet`.
**Replacements:**
- **Same as #3:** Replace the `call_claude()` helper in `wiki-hygiene.py`.
- **Skip full mode entirely:** Only run `wiki-hygiene.py --quick` (the default). Quick mode has no LLM calls and catches 90% of structural issues. Contradictions and duplicates just have to be caught by human review during `wiki-staging.py --review` sessions.
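To get a feel for what a quick-mode structural check looks like, here is a self-contained broken-link detector. A sketch only: the `[[page]]` link syntax and the function names are assumptions, not the actual `wiki-hygiene.py` internals.

```python
import re
from pathlib import Path

LINK_RE = re.compile(r"\[\[([^\]|#]+)")  # [[page]]-style wiki links (assumed syntax)

def broken_links(wiki_root: Path) -> list[tuple[str, str]]:
    """Return (source page, missing target) pairs for wiki links that
    point at pages that don't exist. Pure text scan, no LLM involved."""
    pages = {p.stem for p in wiki_root.rglob("*.md")}
    missing = []
    for page in wiki_root.rglob("*.md"):
        for target in LINK_RE.findall(page.read_text(encoding="utf-8")):
            if target.strip() not in pages:
                missing.append((page.name, target.strip()))
    return missing
```

Checks in this style (orphans, broken links, missing frontmatter) are cheap enough to run on every commit, which is why quick mode can be the default.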
### 5. `CLAUDE.md` at the wiki root
**What it does:** The instructions Claude Code reads at the start of every session that explain the wiki schema and maintenance operations.
**What's Claude-specific:** The filename. Claude Code specifically looks for `CLAUDE.md`; other agents look for other files.
**Replacements:**
| Agent | Equivalent file |
|---|---|
| Claude Code | `CLAUDE.md` |
| Cursor | `.cursorrules` or `.cursor/rules/` |
| Aider | `CONVENTIONS.md` (read via `--read CONVENTIONS.md`) |
| Gemini CLI | `GEMINI.md` |
| Continue.dev | `config.json` prompts or `.continue/rules/` |
The content is the same — just rename the file and point your agent at it.
## Running without cron
Cron is convenient but not required. Alternatives:
### Manual runs
Just call the scripts when you want the wiki updated:
```bash
cd ~/projects/wiki

# When you want to ingest new Claude Code sessions
bash scripts/mine-conversations.sh

# When you want hygiene + harvest
bash scripts/wiki-maintain.sh

# When you want the expensive LLM pass
bash scripts/wiki-maintain.sh --hygiene-only --full
```
This is arguably better than cron if you work in bursts — run maintenance when you start a session, not on a schedule.
### systemd timers (Linux)
More observable than cron, better journaling:
```ini
# ~/.config/systemd/user/wiki-maintain.service
[Unit]
Description=Wiki maintenance pipeline

[Service]
Type=oneshot
WorkingDirectory=%h/projects/wiki
ExecStart=/usr/bin/bash %h/projects/wiki/scripts/wiki-maintain.sh
```

```ini
# ~/.config/systemd/user/wiki-maintain.timer
[Unit]
Description=Run wiki-maintain daily

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```

```bash
systemctl --user enable --now wiki-maintain.timer
journalctl --user -u wiki-maintain.service   # see logs
```
### launchd (macOS)
More native than cron on macOS:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<!-- ~/Library/LaunchAgents/com.user.wiki-maintain.plist -->
<plist version="1.0">
<dict>
  <key>Label</key><string>com.user.wiki-maintain</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/bash</string>
    <string>/Users/YOUR_USER/projects/wiki/scripts/wiki-maintain.sh</string>
  </array>
  <key>StartCalendarInterval</key>
  <dict>
    <key>Hour</key><integer>3</integer>
    <key>Minute</key><integer>0</integer>
  </dict>
  <key>StandardOutPath</key><string>/tmp/wiki-maintain.log</string>
  <key>StandardErrorPath</key><string>/tmp/wiki-maintain.err</string>
</dict>
</plist>
```

```bash
launchctl load ~/Library/LaunchAgents/com.user.wiki-maintain.plist
launchctl list | grep wiki   # verify
```
### Git hooks (pre-push)
Run hygiene before every push so the wiki is always clean when it hits the remote:
```bash
cat > ~/projects/wiki/.git/hooks/pre-push <<'HOOK'
#!/usr/bin/env bash
set -euo pipefail
bash ~/projects/wiki/scripts/wiki-maintain.sh --hygiene-only --no-reindex
HOOK
chmod +x ~/projects/wiki/.git/hooks/pre-push
```
Downside: every push is slow. Upside: you never push a broken wiki.
### CI pipeline
Run `wiki-hygiene.py --check-only` in a CI workflow on every PR:
```yaml
# .github/workflows/wiki-check.yml (or .gitea/workflows/...)
name: Wiki hygiene check
on: [push, pull_request]
jobs:
  hygiene:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
      - run: python3 scripts/wiki-hygiene.py --check-only
```
`--check-only` reports issues without auto-fixing them, so CI can flag problems without modifying files.
## Minimal subsets
You don't have to run the whole pipeline. Pick what's useful:
### "Just the wiki" (no automation)
- Delete `scripts/wiki-*` and `scripts/*-conversations*`
- Delete `tests/`
- Keep the directory structure (`patterns/`, `decisions/`, etc.)
- Keep `index.md` and `CLAUDE.md`
- Write and maintain the wiki manually with your agent
This is the Karpathy-gist version. Works great for small wikis.
### "Wiki + mining" (no harvesting, no hygiene)
- Keep the mining layer (`extract-sessions.py`, `summarize-conversations.py`, `update-conversation-index.py`)
- Delete the automation layer (`wiki-harvest.py`, `wiki-hygiene.py`, `wiki-staging.py`, `wiki-maintain.sh`)
- The wiki grows from session mining but you maintain it manually
Useful if you want session continuity (the wake-up briefing) without the full automation.
### "Wiki + hygiene" (no mining, no harvesting)
- Keep `wiki-hygiene.py` and `wiki_lib.py`
- Delete everything else
- Run `wiki-hygiene.py --quick` periodically to catch structural issues
Useful if you write the wiki manually but want automated checks for orphans, broken links, and staleness.
### "Wiki + harvesting" (no session mining)
- Keep `wiki-harvest.py`, `wiki-staging.py`, `wiki_lib.py`
- Delete mining scripts
- Source URLs manually — put them in a file and point the harvester at it. You'd need to write a wrapper that extracts URLs from your source file and feeds them into the fetch cascade.
Useful if URLs come from somewhere other than Claude Code sessions (e.g. browser bookmarks, Pocket export, RSS).
## Schema customization
The repo uses these live content types:
- `patterns/` — HOW things should be built
- `decisions/` — WHY we chose this approach
- `concepts/` — WHAT the foundational ideas are
- `environments/` — WHERE implementations differ
These reflect my engineering-focused use case. Your wiki might need different categories. To change them:
- Rename / add directories under the wiki root
- Edit `LIVE_CONTENT_DIRS` in `scripts/wiki_lib.py`
- Update the `type:` frontmatter validation in `scripts/wiki-hygiene.py` (the `VALID_TYPES` constant)
- Update `CLAUDE.md` to describe the new categories
- Update `index.md` section headers to match
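As a sketch of the `wiki_lib.py` and `wiki-hygiene.py` edits: suppose you switch to a findings/hypotheses/methods/literature layout. The two constants might then look like this, plus a sanity check that they agree (the values, the singular/plural mapping, and the `check_schema` helper are illustrative, not the repo's actual code):

```python
# Hypothetical post-edit constants for a research-wiki schema.
# LIVE_CONTENT_DIRS lives in scripts/wiki_lib.py, VALID_TYPES in
# scripts/wiki-hygiene.py.
LIVE_CONTENT_DIRS = ["findings", "hypotheses", "methods", "literature"]
VALID_TYPES = {"finding", "hypothesis", "method", "literature", "conversation"}

def check_schema(dirs: list[str], types: set[str]) -> list[str]:
    """Return directories with no matching `type:` value, so a schema
    edit that touches one constant but not the other is caught early."""
    singular = {"findings": "finding", "hypotheses": "hypothesis",
                "methods": "method", "literature": "literature"}
    return [d for d in dirs if singular.get(d, d.rstrip("s")) not in types]
```

The two constants drift apart easily because they live in different files, which is exactly the kind of thing a one-line check catches.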
Examples of alternative schemas:
**Research wiki:**

- `findings/` — experimental results
- `hypotheses/` — what you're testing
- `methods/` — how you test
- `literature/` — external sources
**Product wiki:**

- `features/` — what the product does
- `decisions/` — why we chose this
- `users/` — personas, interviews, feedback
- `metrics/` — what we measure
**Personal knowledge wiki:**

- `topics/` — general subject matter
- `projects/` — specific ongoing work
- `journal/` — dated entries
- `references/` — external links/papers
None of these are better or worse — pick what matches how you think.
## Frontmatter customization
The required fields are documented in `CLAUDE.md` (frontmatter spec). You can add your own fields freely — the parser and hygiene checks ignore unknown keys.
Useful additions you might want:
```yaml
author: alice            # who wrote or introduced the page
tags: [auth, security]   # flat tag list
urgency: high            # for to-do-style wiki pages
stakeholders:            # who cares about this page
  - product-team
  - security-team
review_by: 2026-06-01    # explicit review date instead of age-based decay
```
If you want age-based decay to key off a different field than `last_verified` (say, `review_by`), edit `expected_confidence()` in `scripts/wiki-hygiene.py` to read from your custom field.
## Working across multiple wikis
The scripts all honor the `WIKI_DIR` environment variable. Run multiple wikis against the same scripts:
```bash
# Work wiki
WIKI_DIR=~/projects/work-wiki bash scripts/wiki-maintain.sh

# Personal wiki
WIKI_DIR=~/projects/personal-wiki bash scripts/wiki-maintain.sh

# Research wiki
WIKI_DIR=~/projects/research-wiki bash scripts/wiki-maintain.sh
```
Each has its own state files, its own cron entries, its own qmd collection. You can symlink or copy `scripts/` into each wiki, or run all three against a single checked-out copy of the scripts.
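Inside a Python script, honoring the same convention is a one-liner (a sketch; the default path is illustrative):

```python
import os
from pathlib import Path

def wiki_dir(default: str = "~/projects/wiki") -> Path:
    """Resolve the wiki root: WIKI_DIR env var wins, otherwise fall
    back to a default. Every script resolves the root the same way."""
    return Path(os.environ.get("WIKI_DIR", default)).expanduser()
```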
## What I'd change if starting over
Honest notes on the design choices, in case you're about to fork:
- **Config should be in YAML, not inline constants.** I bolted a "CONFIGURE ME" comment onto `PROJECT_MAP` and `SKIP_DOMAIN_PATTERNS` as a shortcut. Better: a `config.yaml` at the wiki root that all scripts read.
- **The mining layer is tightly coupled to Claude Code.** A cleaner design would put a `Session` interface in `wiki_lib.py` and have extractors for each agent produce `Session` objects — the rest of the pipeline would be agent-agnostic.
- **The hygiene script is a monolith.** 1100+ lines is a lot. Splitting it into `wiki_hygiene/checks.py`, `wiki_hygiene/archive.py`, `wiki_hygiene/llm.py`, etc., would be cleaner. It started as a single file and grew.
- **The hyphenated filenames (`wiki-harvest.py`) make Python imports awkward.** Standard Python convention is underscores. I used hyphens for consistency with the shell scripts, and `conftest.py` has a module-loader workaround. A cleaner fork would use underscores everywhere.
- **The wiki schema assumes you know what you want to catalog.** If you don't, start with a free-form `notes/` directory and let categories emerge organically, then refactor into `patterns/` etc. later.
None of these are blockers. They're all "if I were designing v2" observations.
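For what it's worth, the `Session` interface from the second point could be sketched as a plain dataclass (field names are assumptions, not a committed design):

```python
from dataclasses import dataclass

@dataclass
class Session:
    """Agent-agnostic session record: each extractor (Claude Code,
    Cursor, Aider, ...) produces these; downstream stages consume
    Sessions and never touch agent-specific storage."""
    project: str
    date: str                      # ISO date of the session
    turns: list[tuple[str, str]]   # (role, text) pairs
    source: str = "unknown"        # which extractor produced it

    @property
    def messages(self) -> int:
        return len(self.turns)
```

With this in `wiki_lib.py`, porting to a new agent means writing one extractor that emits `Session` objects, and nothing downstream changes.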