memex/tests
Eric Turner 997aa837de feat(distill): close the MemPalace loop — conversations → wiki pages
Add wiki-distill.py as Phase 1a of the maintenance pipeline. This is
the 8th extension memex adds to Karpathy's pattern and the one that
makes the MemPalace integration a real ingest pipeline instead of
just a searchable archive beside the wiki.

## The gap distill closes

The mining layer was extracting Claude Code sessions, classifying
bullets into halls (fact/discovery/preference/advice/event/tooling),
and tagging topics. The URL harvester scanned conversations for cited
links. Hygiene refreshed last_verified on wiki pages referenced in
related: fields. But none of those steps compiled the knowledge
*inside* the conversations themselves into wiki pages. Decisions,
root causes, and patterns stayed in the summaries forever — findable
via qmd but never synthesized into canonical pages.

## What distill does

Narrow today-filter with historical rollup:

  1. Find all summarized conversations dated TODAY
  2. Extract their topics: — this is the "topics of today" set
  3. For each topic in that set, pull ALL summarized conversations
     across history that share that topic (full historical context)
  4. Extract hall_facts + hall_discoveries + hall_advice bullets
     (the high-signal hall types — skips event/preference/tooling)
  5. Send topic group + wiki index.md to claude -p
  6. Model emits JSON actions[]: new_page / update_page / skip
  7. Write each action to staging/<type>/ with distill provenance
     frontmatter (staged_by: wiki-distill, distill_topic,
     distill_source_conversations, compilation_notes)

First-run bootstrap: uses a 7-day lookback instead of today-only so
the state file gets a reasonable seed. After that, daily runs stay
narrow.
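The narrow-today / wide-history selection (steps 1–3 plus the first-run lookback) can be sketched as a pure function; the conversation-record shape here is an assumption, standing in for whatever the real pipeline reads out of summarized-conversation frontmatter:

```python
from datetime import date, timedelta

def rollup_topic_groups(conversations, today, lookback_days=0):
    """Seed topics from a narrow date window, then pull wide history per topic.

    lookback_days=0 models the normal daily run (today only); a first-run
    bootstrap would pass lookback_days=7.
    """
    window_start = today - timedelta(days=lookback_days)
    seed_topics = set()
    for conv in conversations:
        if window_start <= conv["date"] <= today:
            seed_topics.update(conv["topics"])
    # For each seed topic, gather ALL conversations ever tagged with it.
    return {
        topic: [c for c in conversations if topic in c["topics"]]
        for topic in sorted(seed_topics)
    }
```

A dormant topic that resurfaces today lands in the seed set, and the dict comprehension then drags its entire history back into scope — which is all the self-triggering behavior amounts to.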

Self-triggering: dormant topics that resurface in a new conversation
automatically pull in all historical conversations on that topic via
the rollup. Old knowledge gets distilled when it becomes relevant
again without manual intervention.

## Orchestration — distill BEFORE harvest

wiki-maintain.sh now has Phase 1a (distill) + Phase 1b (harvest):

  1a. wiki-distill.py    — conversations → staging (PRIORITY)
  1b. wiki-harvest.py    — URLs → raw/harvested → staging (supplement)
  2.  wiki-hygiene.py    — decay, archive, repair, checks
  3.  qmd reindex

Conversation content drives the page shape; URL harvesting fills
gaps for external references conversations don't cover. New flags:
--distill-only, --no-distill, --distill-first-run.

## Verified on real wiki

Tested end-to-end on the production wiki with 611 summarized
conversations across 14 wings. First-run dry-run found 116 topic
groups worth distilling (+ 3 too-thin). Tested single-topic compile
with --topic zoho-api: the LLM rolled up 2 conversations (34
bullets), synthesized a proper pattern page with "What / Why /
Known Limitations" structure, linked it to existing wiki pages,
and landed it in staging with full distill provenance. The LLM
also correctly rejected claude-code-statusline (already well covered
by an existing live page), so the "skip" path works.

## Code additions

- scripts/wiki-distill.py (new, ~530 lines)
- scripts/wiki_lib.py: HIGH_SIGNAL_HALLS + parse_conversation_halls
  + high_signal_halls + _flatten_bullet helpers
- scripts/wiki-maintain.sh: Phase 1a distill, new flags
- tests/test_wiki_distill.py (21 new tests — hall parsing, rollup,
  state management, CLI smoke tests)
- tests/test_shell_scripts.py: updated phase-name assertion for
  the Phase 1a/1b split

## Docs additions

- README.md: 8th row in extensions table, updated compounding-loop
  diagram, new wiki-distill.py reference in architecture overview
- docs/DESIGN-RATIONALE.md: new section 8 "Closing the MemPalace
  loop" with full mempalace taxonomy mapping
- docs/ARCHITECTURE.md: wiki-distill.py section, updated phase
  order, updated state file table, updated dep graph
- docs/SETUP.md: updated cron comment, first-run distill guidance,
  verify section test count
- .gitignore: note distill-state.json is committed (sync across
  machines), not gitignored
- docs/artifacts/signal-and-noise.html: new "Distill ⬣" top-level
  tab with flow diagram, hall filter table, narrow-today/wide-
  history explanation, staging provenance example

## Tests

192 tests total (+21 new, +1 regression fix), all green in ~1.5s.
2026-04-12 22:34:33 -06:00

Wiki Pipeline Test Suite

Pytest-based test suite covering all 11 scripts in scripts/. It runs on both macOS and Linux/WSL and uses only the Python standard library plus pytest.

Running

# Full suite (from wiki root)
bash tests/run.sh

# Single test file
bash tests/run.sh test_wiki_lib.py

# Single test class or function
bash tests/run.sh test_wiki_hygiene.py::TestArchiveRestore
bash tests/run.sh test_wiki_hygiene.py::TestArchiveRestore::test_restore_reverses_archive

# Pattern matching
bash tests/run.sh -k "archive"

# Verbose
bash tests/run.sh -v

# Stop on first failure
bash tests/run.sh -x

# Or invoke pytest directly from the tests dir
cd tests && python3 -m pytest -v

What's tested

Each test file and what it covers:

  • test_wiki_lib.py: YAML parser, frontmatter round-trip, page iterators, date parsing, content hashing, WIKI_DIR env override
  • test_wiki_hygiene.py: Backfill, confidence decay math, frontmatter repair, archive/restore round-trip, orphan detection, broken-xref fuzzy matching, index drift, empty stubs, conversation refresh signals, auto-restore, staging/archive sync, state drift, hygiene state file, full quick-run idempotency
  • test_wiki_staging.py: List, promote, reject, promote-with-modifies, dry-run, staging index regeneration, path resolution
  • test_wiki_harvest.py: URL classification (harvest/check/skip), private IP detection, URL extraction + filtering, filename derivation, content validation, state management, raw file writing, dry-run CLI smoke test
  • test_conversation_pipeline.py: CLI smoke tests for extract-sessions, summarize-conversations, update-conversation-index; dry-run behavior; help flags; integration test with fake conversation files
  • test_shell_scripts.py: wiki-maintain.sh / mine-conversations.sh / wiki-sync.sh: help, dry-run, mutex flags, bash syntax check, strict-mode check, shebang check, py_compile for all .py scripts

How it works

Isolation: Every test runs against a disposable tmp_wiki fixture (pytest tmp_path). The fixture sets the WIKI_DIR environment variable so all scripts resolve paths against the tmp directory instead of the real wiki. No test ever touches ~/projects/wiki.

Hyphenated filenames: Scripts like wiki-harvest.py use hyphens, which Python's import can't handle directly. conftest.py has a _load_script_module helper that loads a script file by path and exposes it as a module object.

Clean module state: Each test that loads a module clears any cached import first, so WIKI_DIR env overrides take effect correctly between tests.

Subprocess tests (for CLI smoke tests): conftest.py provides a run_script fixture that invokes a script via python3 or bash with WIKI_DIR set to the tmp wiki. Uses subprocess.run with capture_output and a timeout.

Cross-platform

  • #!/usr/bin/env bash shebangs (tested explicitly)
  • set -euo pipefail in all shell scripts (tested explicitly)
  • bash -n syntax check on all shell scripts
  • py_compile on all Python scripts
  • Uses pathlib everywhere — no hardcoded path separators
  • Uses the Python stdlib only (except pytest itself)

Requirements

  • Python 3.11+
  • pytest — install with pip install --user pytest or your distro's package manager
  • bash (any version — scripts use only portable features)

The tests do NOT require:

  • claude CLI (mocked / skipped)
  • trafilatura or crawl4ai (only dry-run / classification paths tested)
  • qmd (reindex phase is skipped in tests)
  • Network access
  • The real ~/projects/wiki or ~/.claude/projects directories

Speed

Full suite runs in ~1 second on a modern laptop. All tests are isolated and independent so they can run in any order and in parallel.

What's NOT tested

  • Real LLM calls (claude -p): too expensive, non-deterministic. Tested: CLI parsing, dry-run paths, mocked error handling.
  • Real web fetches (trafilatura/crawl4ai): too slow, non-deterministic. Tested: URL classification, filter logic, fetch-result validation.
  • Real git operations (wiki-sync.sh): requires a git repo fixture. Tested: script loads, handles non-git dir gracefully, --status exits clean.
  • Real qmd indexing: tested elsewhere via qmd collection list in the setup verification step.
  • Real Claude Code session JSONL parsing with actual sessions: would require fixture JSONL files. Tested: CLI parsing, empty-dir behavior, CLAUDE_PROJECTS_DIR env override.

These are smoke-tested end-to-end via the integration tests in test_conversation_pipeline.py and the dry-run paths in test_shell_scripts.py::TestWikiMaintainSh.