# Customization Guide

This repo is built around Claude Code, cron-based automation, and a specific directory layout. None of those are load-bearing for the core idea. This document walks through adapting it for different agents, different scheduling, and different subsets of functionality.

## What's actually required for the core idea

The minimum viable compounding wiki is:

1. A markdown directory tree
2. An agent that reads the tree at the start of a session and writes to it during the session
3. Some convention (a `CLAUDE.md` or equivalent) telling the agent how to maintain the wiki

**Everything else in this repo is optional optimization** — automated extraction, URL harvesting, hygiene checks, cron scheduling. These are worth the setup effort once the wiki grows past a few dozen pages, but they're not the *idea*.

---

## Adapting for non-Claude-Code agents

Four script components and one convention file are Claude-specific. Each has a natural replacement path:

### 1. `extract-sessions.py` — Claude Code JSONL parsing

**What it does**: Reads session files from `~/.claude/projects/` and converts them to markdown transcripts.

**What's Claude-specific**: The JSONL format and directory structure are specific to the Claude Code CLI. Other agents don't produce these files.

**Replacements**:

- **Cursor**: Cursor stores chat history in `~/Library/Application Support/Cursor/User/globalStorage/` (macOS) as SQLite. Write an equivalent `extract-sessions.py` that queries that SQLite database and produces the same markdown format.
- **Aider**: Aider stores chat history as `.aider.chat.history.md` in each project directory. A much simpler extractor: walk all project directories, read each `.aider.chat.history.md`, split on session boundaries, write to `conversations//`.
- **OpenAI Codex / Gemini CLI / other**: Whatever session format your tool uses — the target format is a markdown file with a specific frontmatter shape (`title`, `type: conversation`, `project`, `date`, `status: extracted`, `messages: N`, body of user/assistant turns). Anything that produces files in that shape will flow through the rest of the pipeline unchanged.
- **No agent at all — just manual**: Skip this script entirely. Paste interesting conversations into `conversations/general/YYYY-MM-DD-slug.md` by hand and set `status: extracted` yourself. The pipeline downstream of `extract-sessions.py` doesn't care how the transcripts got there, only that they exist with the right frontmatter.

### 2. `summarize-conversations.py` — `claude -p` summarization

**What it does**: Classifies extracted conversations into "halls" (fact/discovery/preference/advice/event/tooling) and writes summaries.

**What's Claude-specific**: Uses `claude -p` with haiku/sonnet routing.

**Replacements**:

- **OpenAI**: Replace the `call_claude` helper with a function that calls the `openai` Python SDK or the `gpt` CLI. Use gpt-4o-mini for short conversations (equivalent to haiku routing) and gpt-4o for long ones.
- **Local LLM**: The script already supports this path — just omit the `--claude` flag and run a `llama-server` on localhost:8080 (or the WSL gateway IP on Windows). Phi-4-14B scored 400/400 on our internal eval.
- **Ollama**: Point `AI_BASE_URL` at your Ollama endpoint (e.g. `http://localhost:11434/v1`). Ollama exposes an OpenAI-compatible API.
- **Any OpenAI-compatible endpoint**: The `AI_BASE_URL` and `AI_MODEL` env vars configure the script — no code changes needed.
- **No LLM at all — manual summaries**: Edit each conversation file by hand to set `status: summarized` and add your own `topics`/`related` frontmatter. Tedious, but it works for a small wiki.

### 3. `wiki-harvest.py` — AI compile step

**What it does**: After fetching raw URL content, sends it to `claude -p` to get a structured JSON verdict (new_page / update_page / both / skip) plus the page content.

**What's Claude-specific**: `claude -p --model haiku|sonnet`.

**Replacements**:

- **Any other LLM**: Replace `call_claude_compile()` with a function that calls your preferred backend. The prompt template (`COMPILE_PROMPT_TEMPLATE`) is reusable — just swap the transport.
- **Skip AI compilation entirely**: Run `wiki-harvest.py --no-compile` and the harvester will save raw content to `raw/harvested/` without trying to compile it. You can then turn the raw content into wiki pages manually (or via a different script).

### 4. `wiki-hygiene.py --full` — LLM-powered checks

**What it does**: Duplicate detection, contradiction detection, missing cross-reference suggestions.

**What's Claude-specific**: `claude -p --model haiku|sonnet`.

**Replacements**:

- **Same as #3**: Replace the `call_claude()` helper in `wiki-hygiene.py`.
- **Skip full mode entirely**: Only run `wiki-hygiene.py --quick` (the default). Quick mode has no LLM calls and catches 90% of structural issues. Contradictions and duplicates then have to be caught by human review during `wiki-staging.py --review` sessions.

### 5. `CLAUDE.md` at the wiki root

**What it does**: The instructions Claude Code reads at the start of every session that explain the wiki schema and maintenance operations.

**What's Claude-specific**: The filename. Claude Code specifically looks for `CLAUDE.md`; other agents look for other files.

**Replacements**:

| Agent | Equivalent file |
|-------|-----------------|
| Claude Code | `CLAUDE.md` |
| Cursor | `.cursorrules` or `.cursor/rules/` |
| Aider | `CONVENTIONS.md` (read via `--read CONVENTIONS.md`) |
| Gemini CLI | `GEMINI.md` |
| Continue.dev | `config.json` prompts or `.continue/rules/` |

The content is the same — just rename the file and point your agent at it.
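To make the extractor replacement in #1 concrete, here is a rough sketch of an Aider version. Treat it as a sketch under assumptions, not a tested drop-in: the `# aider chat started at` session separator reflects Aider's history format as I understand it, and `extract_aider_sessions` with its output naming is mine, not part of this repo.

```python
import re
from pathlib import Path

# Assumed session separator: aider writes a "# aider chat started at ..."
# header before each session in .aider.chat.history.md.
SESSION_MARKER = re.compile(r"^# aider chat started at (\S+)", re.MULTILINE)

def extract_aider_sessions(project_dir: Path, out_dir: Path) -> int:
    """Split one project's Aider history into per-session transcripts."""
    history = project_dir / ".aider.chat.history.md"
    if not history.exists():
        return 0
    text = history.read_text(encoding="utf-8")
    starts = list(SESSION_MARKER.finditer(text))
    written = 0
    for i, m in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        body = text[m.end():end].strip()
        if not body:
            continue
        date = m.group(1)  # the assumed header starts with an ISO date
        # Aider prefixes user turns with "#### "; count them as messages.
        messages = len(re.findall(r"^#### ", body, re.MULTILINE))
        frontmatter = "\n".join([
            "---",
            f"title: {project_dir.name} session {i + 1}",
            "type: conversation",
            f"project: {project_dir.name}",
            f"date: {date}",
            "status: extracted",
            f"messages: {messages}",
            "---",
        ])
        dest = out_dir / project_dir.name
        dest.mkdir(parents=True, exist_ok=True)
        (dest / f"{date}-session-{i + 1}.md").write_text(
            frontmatter + "\n\n" + body, encoding="utf-8")
        written += 1
    return written
```

The frontmatter block matches the target shape from #2, so the downstream summarization and indexing scripts should pick these files up unchanged.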
---

## Running without cron

Cron is convenient but not required. Alternatives:

### Manual runs

Just call the scripts when you want the wiki updated:

```bash
cd ~/projects/wiki

# When you want to ingest new Claude Code sessions
bash scripts/mine-conversations.sh

# When you want hygiene + harvest
bash scripts/wiki-maintain.sh

# When you want the expensive LLM pass
bash scripts/wiki-maintain.sh --hygiene-only --full
```

This is arguably *better* than cron if you work in bursts — run maintenance when you start a session, not on a schedule.

### systemd timers (Linux)

More observable than cron, better journaling:

```ini
# ~/.config/systemd/user/wiki-maintain.service
[Unit]
Description=Wiki maintenance pipeline

[Service]
Type=oneshot
WorkingDirectory=%h/projects/wiki
ExecStart=/usr/bin/bash %h/projects/wiki/scripts/wiki-maintain.sh
```

```ini
# ~/.config/systemd/user/wiki-maintain.timer
[Unit]
Description=Run wiki-maintain daily

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```

```bash
systemctl --user enable --now wiki-maintain.timer
journalctl --user -u wiki-maintain.service   # see logs
```

### launchd (macOS)

More native than cron on macOS:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.user.wiki-maintain</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/bash</string>
    <string>/Users/YOUR_USER/projects/wiki/scripts/wiki-maintain.sh</string>
  </array>
  <key>StartCalendarInterval</key>
  <dict>
    <key>Hour</key>
    <integer>3</integer>
    <key>Minute</key>
    <integer>0</integer>
  </dict>
  <key>StandardOutPath</key>
  <string>/tmp/wiki-maintain.log</string>
  <key>StandardErrorPath</key>
  <string>/tmp/wiki-maintain.err</string>
</dict>
</plist>
```

```bash
launchctl load ~/Library/LaunchAgents/com.user.wiki-maintain.plist
launchctl list | grep wiki   # verify
```

### Git hooks (pre-push)

Run hygiene before every push so the wiki is always clean when it hits the remote:

```bash
cat > ~/projects/wiki/.git/hooks/pre-push <<'HOOK'
#!/usr/bin/env bash
set -euo pipefail
bash ~/projects/wiki/scripts/wiki-maintain.sh --hygiene-only --no-reindex
HOOK
chmod +x ~/projects/wiki/.git/hooks/pre-push
```

Downside: every push is slow. Upside: you never push a broken wiki.
### CI pipeline

Run `wiki-hygiene.py --check-only` in a CI workflow on every PR:

```yaml
# .github/workflows/wiki-check.yml (or .gitea/workflows/...)
name: Wiki hygiene check
on: [push, pull_request]
jobs:
  hygiene:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
      - run: python3 scripts/wiki-hygiene.py --check-only
```

`--check-only` reports issues without auto-fixing them, so CI can flag problems without modifying files.

---

## Minimal subsets

You don't have to run the whole pipeline. Pick what's useful:

### "Just the wiki" (no automation)

- Delete `scripts/wiki-*` and `scripts/*-conversations*`
- Delete `tests/`
- Keep the directory structure (`patterns/`, `decisions/`, etc.)
- Keep `index.md` and `CLAUDE.md`
- Write and maintain the wiki manually with your agent

This is the Karpathy-gist version. Works great for small wikis.

### "Wiki + mining" (no harvesting, no hygiene)

- Keep the mining layer (`extract-sessions.py`, `summarize-conversations.py`, `update-conversation-index.py`)
- Delete the automation layer (`wiki-harvest.py`, `wiki-hygiene.py`, `wiki-staging.py`, `wiki-maintain.sh`)
- The wiki grows from session mining but you maintain it manually

Useful if you want session continuity (the wake-up briefing) without the full automation.

### "Wiki + hygiene" (no mining, no harvesting)

- Keep `wiki-hygiene.py` and `wiki_lib.py`
- Delete everything else
- Run `wiki-hygiene.py --quick` periodically to catch structural issues

Useful if you write the wiki manually but want automated checks for orphans, broken links, and staleness.

### "Wiki + harvesting" (no session mining)

- Keep `wiki-harvest.py`, `wiki-staging.py`, `wiki_lib.py`
- Delete the mining scripts
- Source URLs manually — put them in a file and point the harvester at it. You'd need to write a wrapper that extracts URLs from your source file and feeds them into the fetch cascade.

Useful if URLs come from somewhere other than Claude Code sessions (e.g.
browser bookmarks, Pocket export, RSS).

---

## Schema customization

The repo uses these live content types:

- `patterns/` — HOW things should be built
- `decisions/` — WHY we chose this approach
- `concepts/` — WHAT the foundational ideas are
- `environments/` — WHERE implementations differ

These reflect my engineering-focused use case. Your wiki might need different categories. To change them:

1. Rename / add directories under the wiki root
2. Edit `LIVE_CONTENT_DIRS` in `scripts/wiki_lib.py`
3. Update the `type:` frontmatter validation in `scripts/wiki-hygiene.py` (the `VALID_TYPES` constant)
4. Update `CLAUDE.md` to describe the new categories
5. Update `index.md` section headers to match

Examples of alternative schemas:

**Research wiki**:

- `findings/` — experimental results
- `hypotheses/` — what you're testing
- `methods/` — how you test
- `literature/` — external sources

**Product wiki**:

- `features/` — what the product does
- `decisions/` — why we chose this
- `users/` — personas, interviews, feedback
- `metrics/` — what we measure

**Personal knowledge wiki**:

- `topics/` — general subject matter
- `projects/` — specific ongoing work
- `journal/` — dated entries
- `references/` — external links/papers

None of these are better or worse — pick what matches how you think.

---

## Frontmatter customization

The required fields are documented in `CLAUDE.md` (frontmatter spec). You can add your own fields freely — the parser and hygiene checks ignore unknown keys. Useful additions you might want:

```yaml
author: alice            # who wrote or introduced the page
tags: [auth, security]   # flat tag list
urgency: high            # for to-do-style wiki pages
stakeholders:            # who cares about this page
  - product-team
  - security-team
review_by: 2026-06-01    # explicit review date instead of age-based decay
```

If you want age-based decay to key off a different field than `last_verified` (say, `review_by`), edit `expected_confidence()` in `scripts/wiki-hygiene.py` to read from your custom field.
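A `review_by`-keyed replacement could look roughly like the sketch below. The signature and the return values here are assumptions, not the real function from `scripts/wiki-hygiene.py` — this shows the shape of the change, not a drop-in patch.

```python
from datetime import date

def expected_confidence(frontmatter, today=None):
    """Hypothetical review_by-keyed variant of expected_confidence().

    Assumes the caller passes the page's parsed frontmatter as a dict;
    the real function's signature and return values may differ.
    """
    today = today or date.today()
    review_by = frontmatter.get("review_by")
    if review_by is None:
        return "unknown"   # no review date set: flag for human attention
    if isinstance(review_by, str):
        review_by = date.fromisoformat(review_by)
    days_left = (review_by - today).days
    if days_left < 0:
        return "low"       # past its review date: treat as decayed
    if days_left < 30:
        return "medium"    # due for review soon
    return "high"
```

An explicit `review_by` date makes decay a deliberate choice per page, rather than a side effect of when someone last happened to touch `last_verified`.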
---

## Working across multiple wikis

The scripts all honor the `WIKI_DIR` environment variable. Run multiple wikis against the same scripts:

```bash
# Work wiki
WIKI_DIR=~/projects/work-wiki bash scripts/wiki-maintain.sh

# Personal wiki
WIKI_DIR=~/projects/personal-wiki bash scripts/wiki-maintain.sh

# Research wiki
WIKI_DIR=~/projects/research-wiki bash scripts/wiki-maintain.sh
```

Each has its own state files, its own cron entries, its own qmd collection. You can symlink or copy `scripts/` into each wiki, or run all three against a single checked-out copy of the scripts.

---

## What I'd change if starting over

Honest notes on the design choices, in case you're about to fork:

1. **Config should be in YAML, not inline constants.** I bolted a "CONFIGURE ME" comment onto `PROJECT_MAP` and `SKIP_DOMAIN_PATTERNS` as a shortcut. Better: a `config.yaml` at the wiki root that all scripts read.
2. **The mining layer is tightly coupled to Claude Code.** A cleaner design would put a `Session` interface in `wiki_lib.py` and have extractors for each agent produce `Session` objects — the rest of the pipeline would be agent-agnostic.
3. **The hygiene script is a monolith.** 1100+ lines is a lot. Splitting it into `wiki_hygiene/checks.py`, `wiki_hygiene/archive.py`, `wiki_hygiene/llm.py`, etc., would be cleaner. It started as a single file and grew.
4. **The hyphenated filenames (`wiki-harvest.py`) make Python imports awkward.** Standard Python convention is underscores. I used hyphens for consistency with the shell scripts, and `conftest.py` has a module-loader workaround. A cleaner fork would use underscores everywhere.
5. **The wiki schema assumes you know what you want to catalog.** If you don't, start with a free-form `notes/` directory and let categories emerge organically, then refactor into `patterns/` etc. later.

None of these are blockers. They're all "if I were designing v2" observations.
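As a footnote to point 2: the `Session` interface could be as small as a dataclass plus one renderer. Everything below is hypothetical — `wiki_lib.py` has no such class today, and the field names are guesses at what the downstream pipeline needs.

```python
from dataclasses import dataclass

@dataclass
class Session:
    """Hypothetical agent-agnostic session object (point 2 above)."""
    project: str                  # maps to the `project` frontmatter field
    date: str                     # ISO date, maps to `date`
    turns: list[tuple[str, str]]  # (role, text) pairs

    def to_markdown(self) -> str:
        """Render in the frontmatter shape the mining pipeline expects."""
        head = "\n".join([
            "---",
            f"title: {self.project} session {self.date}",
            "type: conversation",
            f"project: {self.project}",
            f"date: {self.date}",
            "status: extracted",
            f"messages: {len(self.turns)}",
            "---",
        ])
        body = "\n\n".join(f"**{role}**: {text}" for role, text in self.turns)
        return head + "\n\n" + body
```

Each per-agent extractor (Claude Code, Cursor, Aider) would emit `Session` objects, and a single shared writer would render them to transcripts, keeping everything downstream agent-agnostic.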