memex

6 Commits

Author	SHA1	Message	Date
Eric Turner	997aa837de	feat(distill): close the MemPalace loop — conversations → wiki pages

Add wiki-distill.py as Phase 1a of the maintenance pipeline. This is the 8th extension memex adds to Karpathy's pattern, and the one that turns the MemPalace integration into a real ingest pipeline instead of just a searchable archive beside the wiki.

## The gap distill closes

The mining layer was extracting Claude Code sessions, classifying bullets into halls (fact/discovery/preference/advice/event/tooling), and tagging topics. The URL harvester scanned conversations for cited links. Hygiene refreshed last_verified on wiki pages referenced in related: fields. But none of those steps compiled the knowledge inside the conversations themselves into wiki pages. Decisions, root causes, and patterns stayed in the summaries forever: findable via qmd, but never synthesized into canonical pages.

## What distill does

A narrow today-filter with a historical rollup:

1. Find all summarized conversations dated TODAY.
2. Extract their topics: fields; together these form the "topics of today" set.
3. For each topic in that set, pull ALL summarized conversations across history that share that topic (full historical context).
4. Extract hall_facts + hall_discoveries + hall_advice bullets (the high-signal hall types; event/preference/tooling are skipped).
5. Send the topic group + the wiki's index.md to claude -p.
6. The model emits JSON actions[]: new_page / update_page / skip.
7. Write each action to staging/<type>/ with distill provenance frontmatter (staged_by: wiki-distill, distill_topic, distill_source_conversations, compilation_notes).

First-run bootstrap: the first run uses a 7-day lookback instead of today-only, so the state file gets seeded reasonably. After that, daily runs stay narrow.

Self-triggering: a dormant topic that resurfaces in a new conversation automatically pulls in all historical conversations on that topic via the rollup. Old knowledge gets distilled when it becomes relevant again, without manual intervention.
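Steps 1 to 4, plus the first-run lookback, can be sketched in Python as below. This is a minimal sketch, not the actual wiki-distill.py code: the Conversation record (id, date, topics, bullets keyed by hall name) and the function names active_topics, rollup, and high_signal_bullets are hypothetical, and the HIGH_SIGNAL_HALLS tuple here only mirrors the hall filter described above. Reading "7-day lookback" as today plus the six preceding days is also an assumption.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical mirror of the hall filter described above; the real
# HIGH_SIGNAL_HALLS in wiki_lib.py may use different names or types.
HIGH_SIGNAL_HALLS = ("fact", "discovery", "advice")


@dataclass
class Conversation:
    """Hypothetical in-memory view of one summarized conversation."""
    conv_id: str
    day: date
    topics: list[str]
    halls: dict[str, list[str]] = field(default_factory=dict)


def active_topics(convs: list[Conversation], today: date, first_run: bool = False) -> set[str]:
    """Topics of conversations dated today, or within the last 7 days on a first run."""
    window_days = 7 if first_run else 1
    start = today - timedelta(days=window_days - 1)
    topics: set[str] = set()
    for conv in convs:
        if start <= conv.day <= today:
            topics.update(conv.topics)
    return topics


def rollup(convs: list[Conversation], topics: set[str]) -> dict[str, list[Conversation]]:
    """For each active topic, gather ALL conversations in history that share it."""
    groups: dict[str, list[Conversation]] = defaultdict(list)
    for conv in convs:
        for topic in sorted(set(conv.topics) & topics):
            groups[topic].append(conv)
    return dict(groups)


def high_signal_bullets(group: list[Conversation]) -> list[str]:
    """Keep fact/discovery/advice bullets; event/preference/tooling are dropped."""
    return [
        bullet
        for conv in group
        for hall in HIGH_SIGNAL_HALLS
        for bullet in conv.halls.get(hall, [])
    ]


today = date(2026, 4, 12)
convs = [
    Conversation("c1", date(2026, 3, 1), ["api"], {"fact": ["fact one"], "event": ["event one"]}),
    Conversation("c2", today, ["api", "cron"], {"advice": ["advice one"], "tooling": ["tool one"]}),
    Conversation("c3", date(2026, 4, 8), ["dns"], {"fact": ["fact two"]}),
]
groups = rollup(convs, active_topics(convs, today))
print(sorted(groups))                      # ['api', 'cron']
print([c.conv_id for c in groups["api"]])  # ['c1', 'c2']
print(high_signal_bullets(groups["api"]))  # ['fact one', 'advice one']
```

Because the rollup keys on topic rather than date, the March conversation c1 re-enters the api group as soon as a new conversation tags that topic, which is the self-triggering behavior described above. On a first run the window also catches c3, so dns becomes active too.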
## Orchestration: distill BEFORE harvest

wiki-maintain.sh now has Phase 1a (distill) + Phase 1b (harvest):

1a. wiki-distill.py: conversations → staging (PRIORITY)
1b. wiki-harvest.py: URLs → raw/harvested → staging (supplement)
2. wiki-hygiene.py: decay, archive, repair, checks
3. qmd reindex

Conversation content drives the page shape; URL harvesting fills gaps for external references that conversations don't cover.

New flags: --distill-only, --no-distill, --distill-first-run.

## Verified on real wiki

Tested end-to-end on the production wiki with 611 summarized conversations across 14 wings. The first-run dry run found 116 topic groups worth distilling (plus 3 too thin to distill).

A single-topic compile with --topic zoho-api rolled up 2 conversations (34 bullets); the LLM synthesized a proper pattern page with a "What / Why / Known Limitations" structure, linked it to existing wiki pages, and landed it in staging with full distill provenance.

The LLM correctly rejected claude-code-statusline (already well covered by an existing live page), so the "skip" path works.
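The phase order and the three new flags can be illustrated with a small Python planner. This is a hypothetical sketch, not wiki-maintain.sh itself (which is a shell script): PHASES and plan_phases are invented names, and the exact semantics assumed here (--distill-only runs Phase 1a alone, --no-distill drops it, the two are mutually exclusive) are guesses that the real script may not share.

```python
from __future__ import annotations

import argparse

# Phase order from this commit: distill (1a) runs before harvest (1b).
PHASES = [
    ("1a", "distill"),  # wiki-distill.py: conversations -> staging (priority)
    ("1b", "harvest"),  # wiki-harvest.py: URLs -> raw/harvested -> staging
    ("2", "hygiene"),   # wiki-hygiene.py: decay, archive, repair, checks
    ("3", "reindex"),   # qmd reindex
]


def plan_phases(argv: list[str] | None = None) -> tuple[list[str], bool]:
    """Return (ordered phase names, first_run flag) for the given arguments.

    Assumes --distill-only runs Phase 1a alone and --no-distill skips it;
    the real wiki-maintain.sh may, for example, still reindex afterward.
    """
    parser = argparse.ArgumentParser(description="sketch of wiki-maintain.sh phase selection")
    exclusive = parser.add_mutually_exclusive_group()
    exclusive.add_argument("--distill-only", action="store_true")
    exclusive.add_argument("--no-distill", action="store_true")
    parser.add_argument("--distill-first-run", action="store_true")
    args = parser.parse_args(argv)

    if args.distill_only:
        names = ["distill"]
    else:
        names = [name for _, name in PHASES if not (args.no_distill and name == "distill")]
    return names, args.distill_first_run


print(plan_phases([]))                # (['distill', 'harvest', 'hygiene', 'reindex'], False)
print(plan_phases(["--no-distill"]))  # (['harvest', 'hygiene', 'reindex'], False)
```

Keeping the order in a single PHASES list makes "distill before harvest" a property of the data rather than of scattered conditionals; the flags only filter that list.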
## Code additions

- scripts/wiki-distill.py (new, ~530 lines)
- scripts/wiki_lib.py: HIGH_SIGNAL_HALLS + parse_conversation_halls + high_signal_halls + _flatten_bullet helpers
- scripts/wiki-maintain.sh: Phase 1a distill, new flags
- tests/test_wiki_distill.py (21 new tests: hall parsing, rollup, state management, CLI smoke tests)
- tests/test_shell_scripts.py: updated phase-name assertion for the Phase 1a/1b split

## Docs additions

- README.md: 8th row in extensions table, updated compounding-loop diagram, new wiki-distill.py reference in architecture overview
- docs/DESIGN-RATIONALE.md: new section 8 "Closing the MemPalace loop" with full mempalace taxonomy mapping
- docs/ARCHITECTURE.md: wiki-distill.py section, updated phase order, updated state file table, updated dep graph
- docs/SETUP.md: updated cron comment, first-run distill guidance, verify section test count
- .gitignore: note that distill-state.json is committed (synced across machines), not gitignored
- docs/artifacts/signal-and-noise.html: new "Distill ⬣" top-level tab with flow diagram, hall filter table, narrow-today/wide-history explanation, staging provenance example

## Tests

192 tests total (+21 new, +1 regression fix), all green in ~1.5s.	2026-04-12 22:34:33 -06:00
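Step 7 of the distill commit above (writing each model action into staging/<type>/ with distill provenance frontmatter) can be sketched as follows. Only the frontmatter keys staged_by, distill_topic, distill_source_conversations, and compilation_notes come from the commit message; the JSON action schema (action, type, slug, body, compilation_notes), the stage_actions function, the <slug>.md naming, and the path-safety check are assumptions for illustration, not wiki-distill.py's actual code.

```python
from __future__ import annotations

import json
from pathlib import Path


def _safe_name(name: str) -> str:
    """Reject empty, hidden, or path-like names so writes stay inside staging."""
    if not name or name.startswith(".") or Path(name).name != name:
        raise ValueError(f"unsafe path component: {name!r}")
    return name


def stage_actions(raw_json: str, topic: str, sources: list[str],
                  staging_root: str | Path) -> list[Path]:
    """Write new_page/update_page actions to staging/<type>/<slug>.md; skip the rest.

    The action schema is a hypothetical stand-in for whatever JSON the
    model is prompted to emit.
    """
    written: list[Path] = []
    for action in json.loads(raw_json).get("actions", []):
        kind = action.get("action")
        if kind == "skip":
            continue  # e.g. the topic is already covered by a live page
        if kind not in ("new_page", "update_page"):
            raise ValueError(f"unknown action: {kind!r}")
        out_dir = Path(staging_root) / _safe_name(action["type"])
        path = out_dir / f"{_safe_name(action['slug'])}.md"
        frontmatter = {
            "staged_by": "wiki-distill",
            "distill_topic": topic,
            "distill_source_conversations": sources,
            "compilation_notes": action.get("compilation_notes", ""),
        }
        # json.dumps emits double-quoted strings and [...] lists, both valid YAML.
        header = "\n".join(f"{key}: {json.dumps(value)}" for key, value in frontmatter.items())
        out_dir.mkdir(parents=True, exist_ok=True)
        path.write_text(f"---\n{header}\n---\n{action['body']}\n", encoding="utf-8")
        written.append(path)
    return written
```

Emitting values through json.dumps avoids a YAML library dependency while still producing parseable frontmatter. An update_page action probably needs merge logic against the existing live page; this sketch simply writes a fresh staged file for both kinds.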
Eric Turner	4c6b7609a1	docs: reframe as extensions + replace Signal & Noise artifact

Two changes, one commit:

1. Reframe "weaknesses" as "extensions memex adds": Karpathy's gist is a concept pitch, not an implementation. Reframe the seven places memex extends the pattern as engineering-layer additions rather than problems to fix. Cleaner narrative: memex builds on Karpathy's work instead of critiquing it. Touches README.md (Why each part exists + Credits) and DESIGN-RATIONALE.md (section titles, trade-off framing, biggest layer section, scope note at the end).

2. Replace docs/artifacts/signal-and-noise.html with the full upstream version: the earlier abbreviated copy dropped the MemPalace integration tab, the detailed mitigation steps with effort pips, the impact before/after cards, and the qmd vs ChromaDB comparison. This restores all of that. Also swaps self-references from "LLM Wiki" to "memex" while leaving external "LLM Wiki v2" community citations alone (those refer to a separate pattern and aren't ours to rename).

The live hosted copy at eric-turner.com/memex/signal-and-noise.html has already been updated via scp; Hugo picks up static changes with --poll 1s, so the public URL reflects this file immediately.	2026-04-12 22:01:31 -06:00
Eric Turner	2a37e33fd6	docs: point Signal & Noise links at self-hosted version

Replace all four references to the Claude public artifact URL with the self-hosted version at eric-turner.com/memex/signal-and-noise.html plus the offline-capable archive at docs/artifacts/signal-and-noise.html. The Claude artifact can now be unpublished without breaking any links in the repo.

The self-hosted HTML is deployed to the Hugo site's static directory and lives alongside the archived copy in this repo; either can stand on its own.	2026-04-12 21:42:11 -06:00
Eric Turner	55773bf668	docs: add Signal & Noise interactive artifact

Archive a self-contained HTML copy of the design rationale artifact: the interactive Signal & Noise analysis of Karpathy's pattern that produced memex. Fully self-contained (inline CSS + JS; the only external dependency is Google Fonts), works offline, and renders identically in any modern browser.

Updated the README Credits section to link:

1. Live interactive version at eric-turner.com/memex/signal-and-noise.html
2. Original Claude artifact
3. Archived copy in this repo
4. Condensed written version in DESIGN-RATIONALE.md

The archived HTML means the analysis survives even if the live site or the Claude artifact URL ever goes away.	2026-04-12 21:40:33 -06:00
Eric Turner	d8fabc5a50	docs: rename self-references from "LLM Wiki" to "memex"

Replace project self-references throughout README, SETUP, and the example CLAUDE.md files. External artifact titles are preserved as-is, since they refer to the actual title of the Claude design artifact.

Also add a "Why 'memex'?" aside to the README that roots the project in Vannevar Bush's 1945 essay "As We May Think", where the term originates. The compounding knowledge wiki is the LLM-era realization of Bush's memex concept: the "associative trails" he imagined are the related: frontmatter fields and wikilinks the agent maintains.

Kept lowercase where referring to the generic pattern (e.g. "an LLM wiki persists its mistakes"), since that refers to the class of system, not this specific project.	2026-04-12 21:32:17 -06:00
Eric Turner	ee54a2f5d4	Initial commit — memex

A compounding LLM-maintained knowledge wiki. Synthesis of Andrej Karpathy's persistent-wiki gist and milla-jovovich's mempalace, with an automation layer on top for conversation mining, URL harvesting, human-in-the-loop staging, staleness decay, and hygiene.

Includes:

- 11 pipeline scripts (extract, summarize, index, harvest, stage, hygiene, maintain, sync, + shared library)
- Full docs: README, SETUP, ARCHITECTURE, DESIGN-RATIONALE, CUSTOMIZE
- Example CLAUDE.md files (wiki schema + global instructions) tuned for the three-collection qmd setup
- 171-test pytest suite (cross-platform, runs in ~1.3s)
- MIT licensed	2026-04-12 21:16:02 -06:00