feat(distill): close the MemPalace loop — conversations → wiki pages
Add wiki-distill.py as Phase 1a of the maintenance pipeline. This is
the 8th extension memex adds to Karpathy's pattern and the one that
makes the MemPalace integration a real ingest pipeline instead of
just a searchable archive beside the wiki.
## The gap distill closes
The mining layer was extracting Claude Code sessions, classifying
bullets into halls (fact/discovery/preference/advice/event/tooling),
and tagging topics. The URL harvester scanned conversations for cited
links. Hygiene refreshed last_verified on wiki pages referenced in
related: fields. But none of those steps compiled the knowledge
*inside* the conversations themselves into wiki pages. Decisions,
root causes, and patterns stayed in the summaries forever — findable
via qmd but never synthesized into canonical pages.
## What distill does
Narrow today-filter with historical rollup:
1. Find all summarized conversations dated TODAY
2. Extract their topics: — this is the "topics of today" set
3. For each topic in that set, pull ALL summarized conversations
across history that share that topic (full historical context)
4. Extract hall_facts + hall_discoveries + hall_advice bullets
(the high-signal hall types — skips event/preference/tooling)
5. Send topic group + wiki index.md to claude -p
6. Model emits JSON actions[]: new_page / update_page / skip
7. Write each action to staging/<type>/ with distill provenance
frontmatter (staged_by: wiki-distill, distill_topic,
distill_source_conversations, compilation_notes)
First-run bootstrap: uses 7-day lookback instead of today-only so
the state file gets seeded reasonably. After that, daily runs stay
narrow.
Self-triggering: dormant topics that resurface in a new conversation
automatically pull in all historical conversations on that topic via
the rollup. Old knowledge gets distilled when it becomes relevant
again without manual intervention.
## Orchestration — distill BEFORE harvest
wiki-maintain.sh now has Phase 1a (distill) + Phase 1b (harvest):
1a. wiki-distill.py — conversations → staging (PRIORITY)
1b. wiki-harvest.py — URLs → raw/harvested → staging (supplement)
2. wiki-hygiene.py — decay, archive, repair, checks
3. qmd reindex
Conversation content drives the page shape; URL harvesting fills
gaps for external references conversations don't cover. New flags:
--distill-only, --no-distill, --distill-first-run.
## Verified on real wiki
Tested end-to-end on the production wiki with 611 summarized
conversations across 14 wings. First-run dry-run found 116 topic
groups worth distilling (+ 3 too-thin). Tested single-topic compile
with --topic zoho-api: the LLM rolled up 2 conversations (34
bullets), synthesized a proper pattern page with "What / Why /
Known Limitations" structure, linked it to existing wiki pages,
and landed it in staging with full distill provenance. LLM
correctly rejected claude-code-statusline (already well-covered
by an existing live page) — so the "skip" path works.
## Code additions
- scripts/wiki-distill.py (new, ~530 lines)
- scripts/wiki_lib.py: HIGH_SIGNAL_HALLS + parse_conversation_halls
+ high_signal_halls + _flatten_bullet helpers
- scripts/wiki-maintain.sh: Phase 1a distill, new flags
- tests/test_wiki_distill.py (21 new tests — hall parsing, rollup,
state management, CLI smoke tests)
- tests/test_shell_scripts.py: updated phase-name assertion for
the Phase 1a/1b split
## Docs additions
- README.md: 8th row in extensions table, updated compounding-loop
diagram, new wiki-distill.py reference in architecture overview
- docs/DESIGN-RATIONALE.md: new section 8 "Closing the MemPalace
loop" with full mempalace taxonomy mapping
- docs/ARCHITECTURE.md: wiki-distill.py section, updated phase
order, updated state file table, updated dep graph
- docs/SETUP.md: updated cron comment, first-run distill guidance,
verify section test count
- .gitignore: note distill-state.json is committed (sync across
machines), not gitignored
- docs/artifacts/signal-and-noise.html: new "Distill ⬣" top-level
tab with flow diagram, hall filter table, narrow-today/wide-
history explanation, staging provenance example
## Tests
192 tests total (+21 new, +1 regression fix), all green in ~1.5s.
This commit is contained in:
@@ -281,7 +281,13 @@ python3 scripts/summarize-conversations.py --claude
|
||||
# 3. Regenerate conversation index + wake-up context
|
||||
python3 scripts/update-conversation-index.py --reindex
|
||||
|
||||
# 4. Dry-run the maintenance pipeline
|
||||
# 4. First-run distill bootstrap (7-day lookback, burns claude -p calls)
|
||||
# Only do this if you have summarized conversations from recent work.
|
||||
# Skip it if you're starting with a fresh wiki.
|
||||
python3 scripts/wiki-distill.py --first-run --dry-run # plan
|
||||
python3 scripts/wiki-distill.py --first-run # actually do it
|
||||
|
||||
# 5. Dry-run the maintenance pipeline
|
||||
bash scripts/wiki-maintain.sh --dry-run --no-compile
|
||||
```
|
||||
|
||||
@@ -322,7 +328,7 @@ PATH=/home/YOUR_USER/.nvm/versions/node/v22/bin:/home/YOUR_USER/.local/bin:/usr/
|
||||
0 2 * * * cd /home/YOUR_USER/projects/wiki && python3 scripts/summarize-conversations.py --claude >> /tmp/wiki-mine.log 2>&1 && python3 scripts/update-conversation-index.py --reindex >> /tmp/wiki-mine.log 2>&1
|
||||
|
||||
# ─── Maintenance ───────────────────────────────────────────────────────────
|
||||
# Daily at 3am: harvest + quick hygiene + qmd reindex
|
||||
# Daily at 3am: distill conversations + harvest URLs + quick hygiene + qmd reindex
|
||||
0 3 * * * cd /home/YOUR_USER/projects/wiki && bash scripts/wiki-maintain.sh >> scripts/.maintain.log 2>&1
|
||||
|
||||
# Weekly Sunday at 4am: full hygiene with LLM checks
|
||||
@@ -424,8 +430,8 @@ cd tests && python3 -m pytest
|
||||
|
||||
Expected:
|
||||
- `qmd collection list` shows all three collections: `wiki`, `wiki-archive [excluded]`, `wiki-conversations [excluded]`
|
||||
- `wiki-maintain.sh --dry-run` completes all three phases
|
||||
- `pytest` passes all 171 tests in ~1.3 seconds
|
||||
- `wiki-maintain.sh --dry-run` completes all four phases (distill, harvest, hygiene, reindex)
|
||||
- `pytest` passes all 192 tests in ~1.5 seconds
|
||||
|
||||
---
|
||||
|
||||
|
||||
Reference in New Issue
Block a user