feat(distill): close the MemPalace loop — conversations → wiki pages
Add wiki-distill.py as Phase 1a of the maintenance pipeline. This is
the 8th extension memex adds to Karpathy's pattern and the one that
makes the MemPalace integration a real ingest pipeline instead of
just a searchable archive beside the wiki.
## The gap distill closes
The mining layer was extracting Claude Code sessions, classifying
bullets into halls (fact/discovery/preference/advice/event/tooling),
and tagging topics. The URL harvester scanned conversations for cited
links. Hygiene refreshed last_verified on wiki pages referenced in
related: fields. But none of those steps compiled the knowledge
*inside* the conversations themselves into wiki pages. Decisions,
root causes, and patterns stayed in the summaries forever — findable
via qmd but never synthesized into canonical pages.
## What distill does
Narrow today-filter with historical rollup:
1. Find all summarized conversations dated TODAY
2. Extract their topics: — this is the "topics of today" set
3. For each topic in that set, pull ALL summarized conversations
across history that share that topic (full historical context)
4. Extract hall_facts + hall_discoveries + hall_advice bullets
(the high-signal hall types — skips event/preference/tooling)
5. Send topic group + wiki index.md to claude -p
6. Model emits JSON actions[]: new_page / update_page / skip
7. Write each action to staging/<type>/ with distill provenance
frontmatter (staged_by: wiki-distill, distill_topic,
distill_source_conversations, compilation_notes)
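For illustration, a staged page's provenance frontmatter might look like the block below. The field names (staged_by, distill_topic, distill_source_conversations, compilation_notes) are the ones this commit adds; the title, conversation paths, and note text are hypothetical:

```yaml
---
title: zoho-api integration patterns        # hypothetical page title
origin: automated
staged_by: wiki-distill
distill_topic: zoho-api
# hypothetical conversation paths
distill_source_conversations: conversations/2025-05-02-zoho-sync.md, conversations/2025-06-10-zoho-batch.md
compilation_notes: rolled up 2 conversations (34 bullets); merged duplicate findings
---
```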
First-run bootstrap: uses 7-day lookback instead of today-only so
the state file gets seeded reasonably. After that, daily runs stay
narrow.
Self-triggering: dormant topics that resurface in a new conversation
automatically pull in all historical conversations on that topic via
the rollup. Old knowledge gets distilled when it becomes relevant
again without manual intervention.
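A minimal sketch of the narrow-today/wide-history rollup (steps 1-4), assuming each summarized conversation is available as a dict with `path`, `date`, `topics`, and per-hall bullets. `HIGH_SIGNAL_HALLS` mirrors the constant this commit adds to wiki_lib.py; the function name and data shape are illustrative, not the real API:

```python
from collections import defaultdict
from datetime import date

# Mirrors the constant added to wiki_lib.py; everything else is a sketch.
HIGH_SIGNAL_HALLS = {"hall_facts", "hall_discoveries", "hall_advice"}

def rollup_topic_groups(conversations, today=None):
    """Group ALL historical conversations under each topic seen TODAY.

    conversations: iterable of dicts with 'path', 'date' (ISO string),
    'topics' (list of str), and 'halls' (hall name -> list of bullets).
    """
    today = today or date.today().isoformat()
    # Narrow: the "topics of today" set
    today_topics = {t for c in conversations if c["date"] == today
                    for t in c["topics"]}
    # Wide: every conversation in history that shares a today-topic
    groups = defaultdict(list)
    for c in conversations:
        for topic in today_topics & set(c["topics"]):
            bullets = [b for hall, items in c["halls"].items()
                       if hall in HIGH_SIGNAL_HALLS for b in items]
            if bullets:  # skip event/preference/tooling-only conversations
                groups[topic].append({"path": c["path"], "bullets": bullets})
    return dict(groups)
```

The self-triggering behavior falls out of this shape for free: the moment a dormant topic reappears in a today-dated conversation, the full history behind it is grouped again.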
## Orchestration — distill BEFORE harvest
wiki-maintain.sh now has Phase 1a (distill) + Phase 1b (harvest):
1a. wiki-distill.py — conversations → staging (PRIORITY)
1b. wiki-harvest.py — URLs → raw/harvested → staging (supplement)
2. wiki-hygiene.py — decay, archive, repair, checks
3. qmd reindex
Conversation content drives the page shape; URL harvesting fills
gaps for external references conversations don't cover. New flags:
--distill-only, --no-distill, --distill-first-run.
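The real orchestrator is wiki-maintain.sh; as a rough Python sketch of the same contract (fixed phase order, error-tolerant, skippable phases), with the command lines assumed rather than taken from the script:

```python
import subprocess

# Phase order from this commit; the exact commands are assumptions.
PHASES = [
    ("1a-distill", ["python3", "scripts/wiki-distill.py"]),
    ("1b-harvest", ["python3", "scripts/wiki-harvest.py"]),
    ("2-hygiene",  ["python3", "scripts/wiki-hygiene.py"]),
    ("3-reindex",  ["qmd", "update"]),
]

def run_phases(phases, skip=()):
    """Run phases in order; a failed phase is recorded, not fatal."""
    failures = []
    for name, cmd in phases:
        # skip=("1a",) plays the role of a flag like --no-distill
        if any(name.startswith(s) for s in skip):
            continue
        if subprocess.run(cmd).returncode != 0:
            failures.append(name)  # later phases still run
    return failures
```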
## Verified on real wiki
Tested end-to-end on the production wiki with 611 summarized
conversations across 14 wings. First-run dry-run found 116 topic
groups worth distilling (+ 3 too-thin). Tested single-topic compile
with --topic zoho-api: the LLM rolled up 2 conversations (34
bullets), synthesized a proper pattern page with "What / Why /
Known Limitations" structure, linked it to existing wiki pages,
and landed it in staging with full distill provenance. LLM
correctly rejected claude-code-statusline (already well-covered
by an existing live page) — so the "skip" path works.
## Code additions
- scripts/wiki-distill.py (new, ~530 lines)
- scripts/wiki_lib.py: HIGH_SIGNAL_HALLS + parse_conversation_halls
+ high_signal_halls + _flatten_bullet helpers
- scripts/wiki-maintain.sh: Phase 1a distill, new flags
- tests/test_wiki_distill.py (21 new tests — hall parsing, rollup,
state management, CLI smoke tests)
- tests/test_shell_scripts.py: updated phase-name assertion for
the Phase 1a/1b split
## Docs additions
- README.md: 8th row in extensions table, updated compounding-loop
diagram, new wiki-distill.py reference in architecture overview
- docs/DESIGN-RATIONALE.md: new section 8 "Closing the MemPalace
loop" with full mempalace taxonomy mapping
- docs/ARCHITECTURE.md: wiki-distill.py section, updated phase
order, updated state file table, updated dep graph
- docs/SETUP.md: updated cron comment, first-run distill guidance,
verify section test count
- .gitignore: note distill-state.json is committed (sync across
machines), not gitignored
- docs/artifacts/signal-and-noise.html: new "Distill ⬣" top-level
tab with flow diagram, hall filter table, narrow-today/wide-
history explanation, staging provenance example
## Tests
192 tests total (+21 new, +1 regression fix), all green in ~1.5s.
@@ -77,6 +77,7 @@ Automation + lifecycle management on top of both:
 ┌─────────────────────────────────┐
 │ AUTOMATION LAYER │
 │ wiki_lib.py (shared helpers) │
+│ wiki-distill.py │ (conversations → staging) ← closes MemPalace loop
 │ wiki-harvest.py │ (URL → raw → staging)
 │ wiki-staging.py │ (human review)
 │ wiki-hygiene.py │ (decay, archive, repair, checks)
@@ -169,10 +170,63 @@ Provides:
 All paths honor the `WIKI_DIR` environment variable, so tests and
 alternate installs can override the root.
 
+### `wiki-distill.py`
+
+**Closes the MemPalace loop.** Reads the *content* of summarized
+conversations — not the URLs they cite — and compiles wiki pages from
+the high-signal hall entries (`hall_facts`, `hall_discoveries`,
+`hall_advice`). Runs as Phase 1a in `wiki-maintain.sh`, before URL
+harvesting.
+
+**Scope filter (deliberately narrow)**:
+1. Find all summarized conversations dated TODAY
+2. Extract their `topics:` — this is the "topics-of-today" set
+3. For each topic in that set, pull ALL summarized conversations across
+   history that share that topic (full historical context via rollup)
+4. Extract `hall_facts` + `hall_discoveries` + `hall_advice` bullet
+   content from each conversation's body
+5. Send the topic group (topic + matching conversations + halls) to
+   `claude -p` with the current `index.md`
+6. Model emits a JSON `actions` array with `new_page` / `update_page` /
+   `skip` verdicts; the script writes each to `staging/<type>/`
+
+**First-run bootstrap**: the very first run uses a 7-day lookback
+instead of today-only, so the state file gets seeded with a reasonable
+starting set. After that, daily runs stay narrow.
+
+**Self-triggering**: dormant topics that resurface in a new
+conversation automatically pull in all historical conversations on
+that topic via the rollup. No manual intervention needed to
+reprocess old knowledge when it becomes relevant again.
+
+**Model routing**: haiku for short topic groups (< 15K chars prompt,
+< 20 bullets), sonnet for longer ones.
+
+**State** lives in `.distill-state.json` — tracks processed
+conversations by content hash and topics-at-distill-time. A
+conversation is re-processed if its body changes OR if it gains a new
+topic not seen at previous distill.
+
+**Staging output** includes distill-specific frontmatter:
+- `staged_by: wiki-distill`
+- `distill_topic: <topic>`
+- `distill_source_conversations: <comma-separated conversation paths>`
+
+Commands:
+- `wiki-distill.py` — today-only rollup (default mode after first run)
+- `wiki-distill.py --first-run` — 7-day lookback bootstrap
+- `wiki-distill.py --topic TOPIC` — explicit single-topic processing
+- `wiki-distill.py --project WING` — only today-topics from this wing
+- `wiki-distill.py --dry-run` — plan only, no LLM calls, no writes
+- `wiki-distill.py --no-compile` — rollup only, skip claude -p step
+- `wiki-distill.py --limit N` — stop after N topic groups
+
 ### `wiki-harvest.py`
 
 Scans summarized conversations for HTTP(S) URLs, classifies them,
-fetches content, and compiles pending wiki pages.
+fetches content, and compiles pending wiki pages. Runs as Phase 1b in
+`wiki-maintain.sh`, after distill — URL content is treated as a
+supplement to conversation-driven knowledge, not the primary source.
 
 URL classification:
 - **Harvest** (Type A/B) — docs, articles, blogs → fetch and compile
@@ -254,13 +308,17 @@ full-mode runs can skip unchanged pages. Reports land in
 Top-level orchestrator:
 
 ```
-Phase 1: wiki-harvest.py (unless --hygiene-only)
-Phase 2: wiki-hygiene.py (--full for the weekly pass, else quick)
-Phase 3: qmd update && qmd embed (unless --no-reindex or --dry-run)
+Phase 1a: wiki-distill.py (unless --no-distill or --harvest-only / --hygiene-only)
+Phase 1b: wiki-harvest.py (unless --distill-only / --hygiene-only)
+Phase 2: wiki-hygiene.py (--full for the weekly pass, else quick)
+Phase 3: qmd update && qmd embed (unless --no-reindex or --dry-run)
 ```
 
-Flags pass through to child scripts. Error-tolerant: if one phase fails,
-the others still run. Logs to `scripts/.maintain.log`.
+Ordering is deliberate: distill runs before harvest so that
+conversation content drives the page shape, and URL harvesting only
+supplements what the conversations are already covering. Flags pass
+through to child scripts. Error-tolerant: if one phase fails, the
+others still run. Logs to `scripts/.maintain.log`.
 
 ---
@@ -289,6 +347,7 @@ Three JSON files track per-pipeline state:
 | File | Owner | Synced? | Purpose |
 |------|-------|---------|---------|
 | `.mine-state.json` | `extract-sessions.py`, `summarize-conversations.py` | No (gitignored) | Per-session byte offsets — local filesystem state, not portable |
+| `.distill-state.json` | `wiki-distill.py` | Yes (committed) | Processed conversations (content hash + topics seen), rejected topics, first-run flag |
 | `.harvest-state.json` | `wiki-harvest.py` | Yes (committed) | URL dedup — harvested/skipped/failed/rejected URLs |
 | `.hygiene-state.json` | `wiki-hygiene.py` | Yes (committed) | Page content hashes, deferred issues, last-run timestamps |
@@ -301,13 +360,15 @@ because Claude Code session files live at OS-specific paths.
 ## Module dependency graph
 
 ```
-wiki_lib.py ─┬─> wiki-harvest.py
+wiki_lib.py ─┬─> wiki-distill.py
+             ├─> wiki-harvest.py
              ├─> wiki-staging.py
              └─> wiki-hygiene.py
 
-wiki-maintain.sh ─> wiki-harvest.py
-                 ─> wiki-hygiene.py
-                 ─> qmd (external)
+wiki-maintain.sh ─> wiki-distill.py (Phase 1a — conversations → staging)
+                 ─> wiki-harvest.py (Phase 1b — URLs → staging)
+                 ─> wiki-hygiene.py (Phase 2)
+                 ─> qmd (external) (Phase 3)
 
 mine-conversations.sh ─> extract-sessions.py
                       ─> summarize-conversations.py
@@ -43,10 +43,13 @@ repo preserves all of them:
 
 Karpathy's gist is a concept pitch. He was explicit that he was sharing
 an "idea file" for others to build on, not publishing a working
-implementation. The analysis identified seven places where the core idea
-needs an engineering layer to become practical day-to-day — five have
-first-class answers in memex, and two remain scoped-out trade-offs that
-the architecture cleanly acknowledges.
+implementation. The analysis identified eight places where the core idea
+needs an engineering layer to become practical day-to-day. The first
+seven emerged from the original Signal & Noise review; the eighth
+(conversation distillation) surfaced after building the other layers
+and realizing that the conversations themselves were being mined,
+summarized, indexed, and scanned for URLs — but the knowledge *inside*
+them was never becoming wiki pages.
 
 ### 1. Claim freshness and reversibility
@@ -236,6 +239,71 @@ story. If you need any of that, you need a different architecture.
 This is for the personal and small-team case where git + Tailscale is
 the right amount of rigor.
 
+### 8. Closing the MemPalace loop — conversation distillation
+
+**The gap**: The mining pipeline extracts Claude Code sessions into
+transcripts, classifies them by memory type (fact/discovery/preference/
+advice/event/tooling), and tags them with topics. The URL harvester
+scans them for cited links. Hygiene refreshes `last_verified` on any
+wiki page that appears in a conversation's `related:` field. But none
+of those steps actually *compile the knowledge inside the conversations
+themselves into wiki pages.* A decision made in a session, a root cause
+found during debugging, a pattern spotted in review — these stay in the
+conversation summaries (searchable but not synthesized) until a human
+manually writes them up. That's the last piece of the MemPalace model
+that wasn't wired through: **closet content was never becoming the
+source for the wiki proper**.
+
+**How memex extends it**:
+
+- **`wiki-distill.py`** runs as Phase 1a of `wiki-maintain.sh`, before
+  URL harvesting. The ordering is deliberate: conversation content
+  should drive the page, and URL harvesting should only supplement
+  what the conversations are already covering.
+- **Narrow today-filter with historical rollup** — daily runs only
+  look at topics appearing in TODAY's summarized conversations, but
+  for each such topic the script pulls in ALL historical conversations
+  sharing that topic. Processing scope stays small; LLM context stays
+  wide. Old topics that resurface in new sessions automatically
+  trigger a re-distillation of the full history on that topic.
+- **First-run bootstrap** — the very first run uses a 7-day lookback
+  to seed the state. After that, daily runs stay narrow.
+- **High-signal halls only** — distill reads `hall_facts`,
+  `hall_discoveries`, and `hall_advice` bullets. Skips `hall_events`
+  (temporal, not knowledge), `hall_preferences` (user working style),
+  and `hall_tooling` (often low-signal). These are the halls the
+  MemPalace taxonomy treats as "canonical knowledge" vs "context."
+- **claude -p compile step** — each topic group (topic + all matching
+  conversations + their high-signal halls) is sent to `claude -p`
+  with the current wiki index. The model decides whether to create a
+  new page, update an existing one, emit both, or skip (topic not
+  substantive enough or already well-covered).
+- **Staging output with distill provenance** — new/updated pages land
+  in `staging/` with `staged_by: wiki-distill`, `distill_topic`, and
+  `distill_source_conversations` frontmatter fields. Every page traces
+  back to the exact conversations it was distilled from.
+- **State file `.distill-state.json`** tracks processed conversations
+  by content hash and topic set, so re-runs only process what actually
+  changed. A conversation gets re-distilled if its body changes OR if
+  it gains a new topic not seen at previous distill time.
+
+**Why this matters**: Without distillation, the MemPalace integration
+was incomplete — the closet summaries existed, the structural metadata
+existed, qmd could search them, but knowledge discovered during work
+never escaped the conversation archive. You could find "we had a
+debugging session about X last month" but couldn't find "here's the
+canonical page on X that captures what we learned." This extension
+turns the MemPalace layer from a searchable archive into a proper
+**ingest pipeline** for the wiki.
+
+**Residual consideration**: Summarization quality is now load-bearing.
+The distill step trusts the summarizer's classification of bullets
+into halls. If the summarizer puts a debugging dead-end in
+`hall_discoveries`, it may enter the wiki compilation pipeline. The
+`MIN_BULLETS_PER_TOPIC` filter (default 2) and the LLM's own
+substantiveness check (it can choose `skip` with a reason) together
+catch most noise, and the staging review catches the rest.
+
 ---
 
 ## The biggest layer — active upkeep
@@ -255,7 +323,7 @@ thing the human has to think about:
 | Every 2 hours | `wiki-sync.sh full` | Full sync + qmd reindex |
 | Every hour | `mine-conversations.sh --extract-only` | Capture new Claude Code sessions (no LLM) |
 | Daily 2am | `summarize-conversations.py --claude` + index | Classify + summarize (LLM) |
-| Daily 3am | `wiki-maintain.sh` | Harvest + quick hygiene + reindex |
+| Daily 3am | `wiki-maintain.sh` | Distill + harvest + quick hygiene + reindex |
 | Weekly Sun 4am | `wiki-maintain.sh --hygiene-only --full` | LLM-powered duplicate/contradiction/cross-ref detection |
 
 If you disable all of these, you get the same outcome as every
@@ -281,7 +281,13 @@ python3 scripts/summarize-conversations.py --claude
 # 3. Regenerate conversation index + wake-up context
 python3 scripts/update-conversation-index.py --reindex
 
-# 4. Dry-run the maintenance pipeline
+# 4. First-run distill bootstrap (7-day lookback, burns claude -p calls)
+# Only do this if you have summarized conversations from recent work.
+# Skip it if you're starting with a fresh wiki.
+python3 scripts/wiki-distill.py --first-run --dry-run  # plan
+python3 scripts/wiki-distill.py --first-run            # actually do it
+
+# 5. Dry-run the maintenance pipeline
 bash scripts/wiki-maintain.sh --dry-run --no-compile
 ```
@@ -322,7 +328,7 @@ PATH=/home/YOUR_USER/.nvm/versions/node/v22/bin:/home/YOUR_USER/.local/bin:/usr/
 0 2 * * * cd /home/YOUR_USER/projects/wiki && python3 scripts/summarize-conversations.py --claude >> /tmp/wiki-mine.log 2>&1 && python3 scripts/update-conversation-index.py --reindex >> /tmp/wiki-mine.log 2>&1
 
 # ─── Maintenance ───────────────────────────────────────────────────────────
-# Daily at 3am: harvest + quick hygiene + qmd reindex
+# Daily at 3am: distill conversations + harvest URLs + quick hygiene + qmd reindex
 0 3 * * * cd /home/YOUR_USER/projects/wiki && bash scripts/wiki-maintain.sh >> scripts/.maintain.log 2>&1
 
 # Weekly Sunday at 4am: full hygiene with LLM checks
@@ -424,8 +430,8 @@ cd tests && python3 -m pytest
 
 Expected:
 - `qmd collection list` shows all three collections: `wiki`, `wiki-archive [excluded]`, `wiki-conversations [excluded]`
-- `wiki-maintain.sh --dry-run` completes all three phases
-- `pytest` passes all 171 tests in ~1.3 seconds
+- `wiki-maintain.sh --dry-run` completes all four phases (distill, harvest, hygiene, reindex)
+- `pytest` passes all 192 tests in ~1.5 seconds
 
 ---
@@ -1209,6 +1209,7 @@
 <button class="tab-btn" onclick="switchTab(this, 'tab-signals')">Signal Breakdown</button>
 <button class="tab-btn" onclick="switchTab(this, 'tab-mitigations')">Mitigations ★</button>
 <button class="tab-btn" onclick="switchTab(this, 'tab-mempalace')" style="color:var(--accent-green);font-weight:600">MemPalace ⬡</button>
+<button class="tab-btn" onclick="switchTab(this, 'tab-distill')" style="color:var(--accent-amber);font-weight:600">Distill ⬣</button>
 </div>
 
 <!-- TAB: PROS & CONS -->
@@ -2259,6 +2260,255 @@
 
 </div><!-- /tab-mempalace -->
 
+<!-- TAB: DISTILL — the 8th extension, closing the MemPalace loop -->
+<div id="tab-distill" class="tab-panel">
+
+<div class="palace-hero" style="background:linear-gradient(135deg, #2a1810 0%, #1a1a10 50%, #0a1510 100%); border-color:#4a3a1a;">
+<div class="kicker" style="color:#f0c060">⬣ The 8th Extension — Closing the MemPalace Loop</div>
+<h3>Closet summaries <em>become</em> the source for the wiki itself.</h3>
+<p>The first seven extensions came out of the Signal & Noise review. The eighth surfaced only after the other layers were built — and it's the one that makes the MemPalace integration a real pipeline into the wiki instead of just a searchable archive beside it. The mining layer was extracting sessions, classifying bullets into halls, tagging topics, and making everything searchable via qmd. But the knowledge <em>inside</em> the conversations was never being compiled into wiki pages. A decision made in a session, a root cause found during debugging, a pattern spotted in review — these stayed in the conversation summaries forever, findable but not synthesized.</p>
+<p style="color:#f0c060;font-size:12.5px;font-family:'JetBrains Mono',monospace;letter-spacing:0.05em;">This is what the <code>wiki-distill.py</code> script solves. It's Phase 1a of <code>wiki-maintain.sh</code> and runs before URL harvesting because conversation content should drive the page, not the URLs the conversation cites.</p>
+<div class="hero-stats">
+<div class="hstat"><span class="hval">Phase 1a</span><span class="hlbl">Runs before harvest</span></div>
+<div class="hstat"><span class="hval">today</span><span class="hlbl">Narrow filter — today's topics</span></div>
+<div class="hstat"><span class="hval">∀ history</span><span class="hlbl">Rollup all past conversations on each topic</span></div>
+<div class="hstat"><span class="hval">3 halls</span><span class="hlbl">fact + discovery + advice</span></div>
+<div class="hstat"><span class="hval">haiku/sonnet</span><span class="hlbl">Auto-routed by topic size</span></div>
+</div>
+</div>
+
+<!-- FLOW DIAGRAM -->
+<div class="flow-diagram">
+<div class="flow-title">Distill Flow — Conversation Content → Wiki Pages</div>
+<div class="flow-label">Narrow: what topics to process today</div>
+<div class="flow-row">
+<div class="flow-node convo">Today's<br>conversations</div>
+<div class="flow-arrow">→</div>
+<div class="flow-node palace">Extract<br>topics[]</div>
+<div class="flow-arrow">=</div>
+<div class="flow-node wiki">Topics of<br>today set</div>
+</div>
+<div class="flow-label" style="margin-top:14px">Wide: pull full history for each today-topic</div>
+<div class="flow-row">
+<div class="flow-node wiki">Each<br>today-topic</div>
+<div class="flow-arrow">→</div>
+<div class="flow-node palace">Rollup ALL<br>historical convs</div>
+<div class="flow-arrow">→</div>
+<div class="flow-node palace">Extract<br>fact / discovery / advice</div>
+<div class="flow-arrow">→</div>
+<div class="flow-node llm">claude -p<br>distill prompt</div>
+</div>
+<div class="flow-label" style="margin-top:14px">Compile: model decides new / update / skip</div>
+<div class="flow-row">
+<div class="flow-node llm">JSON<br>actions[]</div>
+<div class="flow-arrow">→</div>
+<div class="flow-node wiki">new_page</div>
+<div class="flow-arrow">+</div>
+<div class="flow-node wiki">update_page<br>(modifies existing)</div>
+<div class="flow-arrow">→</div>
+<div class="flow-node raw">staging/<type>/<br>pending review</div>
+</div>
+</div>
+
+<!-- SECTION: WHY IT COMPLETES MEMPALACE -->
+<div class="section-header">
+<h2>Why This Completes MemPalace</h2>
+<span class="section-tag" style="border-color:var(--accent-amber);color:var(--accent-amber);background:#fff8e6">Pipeline Closure</span>
+</div>
+
+<div class="palace-map">
+<div class="palace-cell">
+<div class="pc-icon">📦</div>
+<div class="pc-term">Drawer — before</div>
+<div class="pc-name">Verbatim Archive</div>
+<div class="pc-desc">Full transcripts stored, searchable via qmd. No compilation — if you wanted canonical knowledge from them, you had to write it up manually.</div>
+<div class="pc-wiki-map">Status: already working</div>
+</div>
+<div class="palace-cell">
+<div class="pc-icon">🗂️</div>
+<div class="pc-term">Closet — before</div>
+<div class="pc-name">Summary Layer</div>
+<div class="pc-desc">Summaries with hall classification (fact / discovery / preference / advice / event / tooling) and topics. Searchable. Terminal: never fed forward into the wiki compiler.</div>
+<div class="pc-wiki-map">Status: terminal data, not flowing</div>
+</div>
+<div class="palace-cell">
+<div class="pc-icon">⬣</div>
+<div class="pc-term">Distill — NEW</div>
+<div class="pc-name">Compiler Bridge</div>
+<div class="pc-desc">Reads closet content by topic, rolls up all matching conversations across history, filters to high-signal halls only, sends to claude -p with the current wiki index, emits new or updated wiki pages to staging.</div>
+<div class="pc-wiki-map">Status: wiki-distill.py</div>
+</div>
+<div class="palace-cell">
+<div class="pc-icon">📄</div>
+<div class="pc-term">Wiki Pages — NEW</div>
+<div class="pc-name">Distilled Knowledge</div>
+<div class="pc-desc">Pages in staging/<type>/ with full distill provenance: distill_topic, distill_source_conversations, compilation_notes. Promote via staging review. Session knowledge becomes canonical knowledge.</div>
+<div class="pc-wiki-map">Status: origin=automated, staged_by=wiki-distill</div>
+</div>
+</div>
+
+<!-- HALL FILTERING -->
+<div class="section-header">
+<h2>Which Halls Get Distilled</h2>
+<span class="section-tag" style="border-color:var(--accent-green);color:var(--accent-green);background:#eaf5ee">High Signal Only</span>
+</div>
+
+<table class="compare-table">
+<thead>
+<tr>
+<th>Hall</th>
+<th style="text-align:center">Distilled?</th>
+<th>Why</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="row-label">hall_facts</td>
+<td style="text-align:center" class="cell-win">✦ YES</td>
+<td>Decisions locked in, choices made, specs agreed. Canonical knowledge.</td>
+</tr>
+<tr>
+<td class="row-label">hall_discoveries</td>
+<td style="text-align:center" class="cell-win">✦ YES</td>
+<td>Root causes, breakthroughs, non-obvious findings. The highest-signal content in any session.</td>
+</tr>
+<tr>
+<td class="row-label">hall_advice</td>
+<td style="text-align:center" class="cell-win">✦ YES</td>
+<td>Recommendations, lessons learned, "next time do X." Worth capturing as patterns.</td>
+</tr>
+<tr>
+<td class="row-label">hall_events</td>
+<td style="text-align:center" class="cell-mid">no</td>
+<td>Deployments, incidents, milestones. Temporal data — belongs in logs, not the wiki.</td>
+</tr>
+<tr>
+<td class="row-label">hall_preferences</td>
+<td style="text-align:center" class="cell-mid">no</td>
+<td>User working style notes. Belong in personal configs, not the shared wiki.</td>
+</tr>
+<tr>
+<td class="row-label">hall_tooling</td>
+<td style="text-align:center" class="cell-mid">no</td>
+<td>Script/command usage, failures, improvements. Usually low-signal or duplicates what's already in the wiki.</td>
+</tr>
+</tbody>
+</table>
+
<!-- HOW THE NARROW-TODAY + WIDE-HISTORY FILTER WORKS -->
|
||||
<div class="section-header">
|
||||
<h2>The Narrow-Today / Wide-History Filter</h2>
|
||||
<span class="section-tag" style="border-color:var(--accent-blue);color:var(--accent-blue);background:#e8eef5">Key Design</span>
|
||||
</div>
|
||||
|
||||
<div class="mitigation-intro">
|
||||
<strong>Processing scope stays narrow; LLM context stays wide.</strong> This is the key property that makes distill cheap enough to run daily and smart enough to produce good pages.
|
||||
</div>
|
||||
|
||||
<div class="mitigation-steps">
|
||||
|
||||
<div class="mitigation-step" onclick="toggleStep(this)">
|
||||
<div class="mitigation-step-header">
|
||||
<span class="step-num">01</span>
|
||||
<span class="step-title">Daily filter: only process topics appearing in TODAY's conversations</span>
|
||||
<span class="step-tool-tag">Scope</span>
|
||||
<span class="step-arrow">▶</span>
|
||||
</div>
|
||||
<div class="mitigation-step-body">
|
||||
<p>Each daily run only looks at conversations dated today. It extracts the <code>topics:</code> frontmatter from each — that union becomes the "topics of today" set. If you didn't discuss a topic today, it's not in the processing scope. This keeps the cron job cheap and predictable: if today was a light session day, distill runs fast. If today was a heavy architecture discussion, distill does real work.</p>
|
||||
<div class="tip-box"><strong>First run only:</strong> The very first run uses a 7-day lookback instead of today-only so the state file gets seeded. After that first bootstrap, daily runs stay narrow.</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="mitigation-step" onclick="toggleStep(this)">
|
||||
<div class="mitigation-step-header">
|
||||
<span class="step-num">02</span>
|
||||
<span class="step-title">Historical rollup: for each today-topic, pull ALL matching conversations</span>
|
||||
<span class="step-tool-tag">Context</span>
|
||||
<span class="step-arrow">▶</span>
|
||||
</div>
|
||||
<div class="mitigation-step-body">
|
||||
<p>Once the today-topic set is known, for each topic the script walks the entire conversation archive and pulls every summarized conversation that shares that topic. A discussion about <code>blue-green-deploy</code> today might roll up 16 conversations across the last 6 months. The claude -p call sees the full history, not just today's fragment.</p>
|
||||
<p>This is what makes the distilled pages <em>good</em>. The LLM isn't guessing what a pattern looks like from one session — it's synthesizing across everything you've ever discussed on the topic.</p>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="mitigation-step" onclick="toggleStep(this)">
|
||||
<div class="mitigation-step-header">
|
||||
<span class="step-num">03</span>
|
||||
<span class="step-title">Self-triggering: dormant topics wake up when they resurface</span>
|
||||
<span class="step-tool-tag">Emergent</span>
|
||||
<span class="step-arrow">▶</span>
|
||||
</div>
|
||||
<div class="mitigation-step-body">
|
||||
<p>The narrow-today/wide-history combination produces a useful emergent property: <strong>dormant topics wake up automatically.</strong> If you discussed <code>database-migrations</code> three months ago and it never came up again, it's not in the daily scope. But the day you mention it again in any new conversation, that topic enters today's set — and the rollup pulls in all three months of historical discussion. The wiki page gets updated with fresh synthesis across the full history without you having to manually trigger reprocessing.</p>
|
||||
<div class="tip-box"><strong>What this means in practice:</strong> Old knowledge gets distilled <em>when it becomes relevant again</em>. You don't need to remember to ask "hey, is there a wiki page for X?" — the next time X comes up in a session, distill will check the wiki state and either create or update the page for you.</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="mitigation-step" onclick="toggleStep(this)">
|
||||
<div class="mitigation-step-header">
|
||||
<span class="step-num">04</span>
|
||||
<span class="step-title">State tracking by content hash + topic set</span>
|
||||
<span class="step-tool-tag">.distill-state.json</span>
|
||||
<span class="step-arrow">▶</span>
|
||||
</div>
|
||||
<div class="mitigation-step-body">
|
||||
<p>A conversation is considered "already distilled" only if its body hash AND its topic set match what was seen at the last distill. If the body changes (summarizer re-ran and updated the bullets) OR a new topic is added, the conversation gets re-processed on the next run. Topics get tracked so rejected ones don't get reprocessed forever — if the LLM says "this topic doesn't deserve a wiki page" once, it stays rejected until something meaningful changes.</p>
</div>
</div>

<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num">05</span>
<span class="step-title">Distill runs BEFORE harvest — conversation content has priority</span>
<span class="step-tool-tag">Phase 1a</span>
<span class="step-arrow">▶</span>
</div>
<div class="mitigation-step-body">
<p>The orchestrator runs distill as Phase 1a and harvest as Phase 1b. The ordering is deliberate: if a topic is being actively discussed in your sessions, you want the wiki page to reflect <em>your</em> synthesis of what you've learned, not just the external URLs cited in passing. URL harvesting then fills in the gaps — it picks up the docs pages, blog posts, and references that your sessions didn't already cover.</p>
<div class="warn-box">Both phases can produce staging pages. If distill creates <code>patterns/docker-hardening.md</code> and harvest creates <code>patterns/docker-hardening.md</code>, the staging-unique-path helper appends a short hash suffix so they don't collide. The reviewer sees both in staging and picks the better one (usually distill, since it has historical context).</div>
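<p>A sketch of what a staging-unique-path helper like this could look like. The function name and hash length are assumptions for illustration, not the actual helper's signature:</p>

```python
import hashlib
from pathlib import Path

def staging_unique_path(staging_root: Path, rel_path: str, content: str) -> Path:
    # If distill and harvest both target e.g. patterns/docker-hardening.md,
    # suffix the later arrival with a short content hash so both pages
    # survive to review instead of one overwriting the other.
    candidate = staging_root / rel_path
    if not candidate.exists():
        return candidate
    suffix = hashlib.sha256(content.encode()).hexdigest()[:8]
    return candidate.with_name(f"{candidate.stem}-{suffix}{candidate.suffix}")
```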
</div>
</div>

</div>

<!-- STAGING FRONTMATTER -->
<div class="section-header">
<h2>Distill Staging Provenance</h2>
<span class="section-tag" style="border-color:var(--accent-green);color:var(--accent-green);background:#eaf5ee">Traceable</span>
</div>

<p style="font-size:13.5px;color:var(--muted);margin-bottom:20px;line-height:1.6;">Every distilled page lands in staging with full provenance in its frontmatter. When you review a page in staging, you can see exactly which conversations it came from and jump directly to those transcripts.</p>

<div class="flow-diagram" style="background:#0d0d0d; border-color:#2a2a2a;">
<div class="flow-title" style="color:#c4b99a">Example: staging/patterns/zoho-crm-integration.md frontmatter</div>
<pre style="font-family:'JetBrains Mono',monospace;font-size:11px;color:#c4b99a;line-height:1.6;margin:0;padding:14px 0;overflow-x:auto;">---
origin: automated
status: pending
staged_date: 2026-04-12
staged_by: wiki-distill
target_path: patterns/zoho-crm-integration.md
distill_topic: zoho-api
distill_source_conversations: conversations/general/2026-04-06-73d15650.md,conversations/mc/2026-03-30-64089d1d.md
compilation_notes: Two separate incidents discovered the same Zoho CRM v2 API limitations; documenting them as a pattern page prevents re-investigation and provides a canonical reference for future Zoho integrations.
title: Zoho CRM Integration
type: pattern
confidence: high
sources: [conversations/general/2026-04-06-73d15650.md, conversations/mc/2026-03-30-64089d1d.md]
related: [database-migrations.md, activity-event-auditing.md]
last_compiled: 2026-04-12
last_verified: 2026-04-12
---</pre>
</div>
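<p>During review, the provenance fields are plain key/value pairs, so jumping from a staged page back to its source transcripts needs nothing heavier than a frontmatter split. A minimal sketch under that assumption; a real reviewer tool would likely use a proper YAML parser:</p>

```python
def parse_frontmatter(text: str) -> dict:
    # Minimal frontmatter reader: "key: value" lines between the
    # opening and closing --- markers (no nested YAML handled).
    meta = {}
    lines = text.split("\n")
    assert lines[0] == "---", "expected frontmatter to open with ---"
    for line in lines[1:]:
        if line == "---":
            break
        key, _, value = line.partition(": ")
        meta[key] = value
    return meta

page = """---
staged_by: wiki-distill
distill_topic: zoho-api
distill_source_conversations: conversations/general/2026-04-06-73d15650.md,conversations/mc/2026-03-30-64089d1d.md
---"""
meta = parse_frontmatter(page)
# The comma-separated provenance list becomes jump targets for review.
sources = meta["distill_source_conversations"].split(",")
```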
</div>

<div class="pull-quote" style="border-left-color:var(--accent-amber)">
Without distillation, MemPalace was a searchable archive sitting beside the wiki. With distillation, it's a real ingest pipeline — closet content becomes the source material for the wiki proper, completing the eight-extension story.
<span class="attribution">— memex design rationale, April 2026</span>
</div>

</div><!-- /tab-distill -->

</div><!-- /page -->

<footer class="page-footer">