memex/docs/artifacts/signal-and-noise.html
Eric Turner 997aa837de feat(distill): close the MemPalace loop — conversations → wiki pages
Add wiki-distill.py as Phase 1a of the maintenance pipeline. This is
the 8th extension memex adds to Karpathy's pattern and the one that
makes the MemPalace integration a real ingest pipeline instead of
just a searchable archive beside the wiki.

## The gap distill closes

The mining layer was extracting Claude Code sessions, classifying
bullets into halls (fact/discovery/preference/advice/event/tooling),
and tagging topics. The URL harvester scanned conversations for cited
links. Hygiene refreshed last_verified on wiki pages referenced in
related: fields. But none of those steps compiled the knowledge
*inside* the conversations themselves into wiki pages. Decisions,
root causes, and patterns stayed in the summaries forever — findable
via qmd but never synthesized into canonical pages.

## What distill does

Narrow today-filter with historical rollup:

  1. Find all summarized conversations dated TODAY
  2. Extract their topics: — this is the "topics of today" set
  3. For each topic in that set, pull ALL summarized conversations
     across history that share that topic (full historical context)
  4. Extract hall_facts + hall_discoveries + hall_advice bullets
     (the high-signal hall types — skips event/preference/tooling)
  5. Send topic group + wiki index.md to claude -p
  6. Model emits JSON actions[]: new_page / update_page / skip
  7. Write each action to staging/<type>/ with distill provenance
     frontmatter (staged_by: wiki-distill, distill_topic,
     distill_source_conversations, compilation_notes)

First-run bootstrap: uses a 7-day lookback instead of today-only so
the state file is seeded with a reasonable baseline. After that,
daily runs stay narrow.
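The narrow-today / wide-history selection above can be sketched in a few
lines. This is a hedged illustration, not the real wiki-distill.py: the
record shapes, function names, and the exact hall-key spellings are
assumptions; only the hall types and the rollup rule come from the steps
listed here.

```python
from collections import defaultdict

# Assumed hall-key names; the real constant lives in scripts/wiki_lib.py.
HIGH_SIGNAL_HALLS = {"hall_facts", "hall_discoveries", "hall_advice"}

def rollup_groups(conversations, today):
    """Narrow today-filter with historical rollup: every topic seen in a
    conversation dated `today` pulls in ALL historical conversations
    that share that topic."""
    todays_topics = {
        t for c in conversations if c["date"] == today for t in c["topics"]
    }
    groups = defaultdict(list)
    for c in conversations:
        for t in todays_topics & set(c["topics"]):
            groups[t].append(c)
    return groups

def high_signal_bullets(conversation):
    """Keep only fact/discovery/advice bullets; skip event/preference/tooling."""
    return [
        b
        for hall, bullets in conversation["halls"].items()
        if hall in HIGH_SIGNAL_HALLS
        for b in bullets
    ]
```

Note how the self-triggering behavior falls out for free: a dormant topic
that shows up in one conversation today drags its whole history into the
topic group.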

Self-triggering: dormant topics that resurface in a new conversation
automatically pull in all historical conversations on that topic via
the rollup. Old knowledge gets re-distilled the moment it becomes
relevant again, with no manual intervention.

## Orchestration — distill BEFORE harvest

wiki-maintain.sh now has Phase 1a (distill) + Phase 1b (harvest):

  1a. wiki-distill.py    — conversations → staging (PRIORITY)
  1b. wiki-harvest.py    — URLs → raw/harvested → staging (supplement)
  2.  wiki-hygiene.py    — decay, archive, repair, checks
  3.  qmd reindex

Conversation content drives the page shape; URL harvesting fills
gaps for external references that conversations don't cover. New flags:
--distill-only, --no-distill, --distill-first-run.
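A minimal sketch of how the new flags might gate the phases — the variable
names and dispatch shape here are hypothetical; the real parsing lives in
scripts/wiki-maintain.sh:

```shell
# Hypothetical flag dispatch mirroring the Phase 1a/1b split described above.
plan_phases() {
  run_distill=1; run_harvest=1; run_hygiene=1; distill_args=""
  for arg in "$@"; do
    case "$arg" in
      --distill-only)      run_harvest=0; run_hygiene=0 ;;
      --no-distill)        run_distill=0 ;;
      --distill-first-run) distill_args="--first-run" ;;
    esac
  done
  [ "$run_distill" = 1 ] && echo "1a wiki-distill.py $distill_args"
  [ "$run_harvest" = 1 ] && echo "1b wiki-harvest.py"
  [ "$run_hygiene" = 1 ] && echo "2  wiki-hygiene.py"
  return 0
}

plan_phases --distill-only
```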

## Verified on real wiki

Tested end-to-end on the production wiki with 611 summarized
conversations across 14 wings. First-run dry-run found 116 topic
groups worth distilling (+ 3 too-thin). Tested single-topic compile
with --topic zoho-api: the LLM rolled up 2 conversations (34
bullets), synthesized a proper pattern page with "What / Why /
Known Limitations" structure, linked it to existing wiki pages,
and landed it in staging with full distill provenance. The LLM
correctly rejected claude-code-statusline (already well covered
by an existing live page), so the "skip" path works.

## Code additions

- scripts/wiki-distill.py (new, ~530 lines)
- scripts/wiki_lib.py: HIGH_SIGNAL_HALLS + parse_conversation_halls
  + high_signal_halls + _flatten_bullet helpers
- scripts/wiki-maintain.sh: Phase 1a distill, new flags
- tests/test_wiki_distill.py (21 new tests — hall parsing, rollup,
  state management, CLI smoke tests)
- tests/test_shell_scripts.py: updated phase-name assertion for
  the Phase 1a/1b split

## Docs additions

- README.md: 8th row in extensions table, updated compounding-loop
  diagram, new wiki-distill.py reference in architecture overview
- docs/DESIGN-RATIONALE.md: new section 8 "Closing the MemPalace
  loop" with full mempalace taxonomy mapping
- docs/ARCHITECTURE.md: wiki-distill.py section, updated phase
  order, updated state file table, updated dep graph
- docs/SETUP.md: updated cron comment, first-run distill guidance,
  verify section test count
- .gitignore: note distill-state.json is committed (sync across
  machines), not gitignored
- docs/artifacts/signal-and-noise.html: new "Distill ⬣" top-level
  tab with flow diagram, hall filter table, narrow-today/wide-
  history explanation, staging provenance example

## Tests

192 tests total (+21 new, +1 regression fix), all green in ~1.5s.
2026-04-12 22:34:33 -06:00

2542 lines
120 KiB
HTML
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>memex — Karpathy's Pattern — Signal & Noise</title>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link href="https://fonts.googleapis.com/css2?family=Playfair+Display:wght@400;700;900&family=JetBrains+Mono:wght@300;400;500&family=DM+Sans:wght@300;400;500&display=swap" rel="stylesheet">
<style>
:root {
--ink: #0d0d0d;
--paper: #f5f0e8;
--paper2: #ede7d8;
--accent-green: #1a6b45;
--accent-red: #8b2020;
--accent-amber: #b8820a;
--accent-blue: #1a3d6b;
--rule: #c4b99a;
--muted: #7a6e5f;
--highlight: #f0e2b6;
}
* { margin: 0; padding: 0; box-sizing: border-box; }
body {
background: var(--paper);
color: var(--ink);
font-family: 'DM Sans', sans-serif;
font-size: 15px;
line-height: 1.6;
min-height: 100vh;
background-image:
url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='4' height='4'%3E%3Crect width='4' height='4' fill='%23f5f0e8'/%3E%3Ccircle cx='1' cy='1' r='0.5' fill='%23c4b99a22'/%3E%3C/svg%3E");
}
/* MASTHEAD */
.masthead {
border-bottom: 3px double var(--ink);
padding: 28px 40px 20px;
display: grid;
grid-template-columns: 1fr auto 1fr;
align-items: center;
gap: 20px;
background: var(--paper2);
}
.masthead-left {
font-family: 'JetBrains Mono', monospace;
font-size: 10px;
font-weight: 300;
color: var(--muted);
letter-spacing: 0.15em;
text-transform: uppercase;
line-height: 1.8;
}
.masthead-center {
text-align: center;
}
.masthead-center h1 {
font-family: 'Playfair Display', serif;
font-size: clamp(28px, 5vw, 52px);
font-weight: 900;
letter-spacing: -0.02em;
line-height: 1;
text-transform: uppercase;
}
.masthead-center .subtitle {
font-family: 'JetBrains Mono', monospace;
font-size: 10px;
letter-spacing: 0.3em;
color: var(--muted);
text-transform: uppercase;
margin-top: 6px;
border-top: 1px solid var(--rule);
border-bottom: 1px solid var(--rule);
padding: 4px 0;
}
.masthead-right {
font-family: 'JetBrains Mono', monospace;
font-size: 10px;
font-weight: 300;
color: var(--muted);
letter-spacing: 0.1em;
text-align: right;
line-height: 1.8;
}
/* TICKER */
.ticker {
background: var(--ink);
color: var(--paper);
font-family: 'JetBrains Mono', monospace;
font-size: 10px;
letter-spacing: 0.15em;
padding: 7px 0;
white-space: nowrap;
overflow: hidden;
}
.ticker-track {
display: inline-block;
animation: ticker 30s linear infinite;
}
@keyframes ticker {
0% { transform: translateX(0); }
100% { transform: translateX(-50%); }
}
.ticker-item { margin: 0 40px; }
.ticker-dot { color: var(--accent-amber); margin: 0 4px; }
/* OVERVIEW STRIP */
.overview-strip {
display: grid;
grid-template-columns: repeat(4, 1fr);
border-bottom: 2px solid var(--ink);
}
.stat-cell {
padding: 18px 24px;
border-right: 1px solid var(--rule);
text-align: center;
}
.stat-cell:last-child { border-right: none; }
.stat-num {
font-family: 'Playfair Display', serif;
font-size: 32px;
font-weight: 900;
line-height: 1;
margin-bottom: 4px;
}
.stat-label {
font-family: 'JetBrains Mono', monospace;
font-size: 9px;
letter-spacing: 0.2em;
text-transform: uppercase;
color: var(--muted);
}
/* MAIN LAYOUT */
.page {
max-width: 1280px;
margin: 0 auto;
padding: 0 40px 60px;
}
/* SECTION HEADERS */
.section-header {
display: flex;
align-items: baseline;
gap: 16px;
margin: 36px 0 20px;
padding-bottom: 8px;
border-bottom: 2px solid var(--ink);
}
.section-header h2 {
font-family: 'Playfair Display', serif;
font-size: 22px;
font-weight: 700;
text-transform: uppercase;
letter-spacing: 0.05em;
}
.section-tag {
font-family: 'JetBrains Mono', monospace;
font-size: 9px;
letter-spacing: 0.2em;
text-transform: uppercase;
padding: 2px 8px;
border: 1px solid var(--ink);
border-radius: 2px;
}
.tag-green { border-color: var(--accent-green); color: var(--accent-green); background: #e8f5ee; }
.tag-red { border-color: var(--accent-red); color: var(--accent-red); background: #f5e8e8; }
/* CONCEPT EXPLAINER */
.concept-box {
background: var(--ink);
color: var(--paper);
padding: 28px 36px;
margin: 28px 0;
position: relative;
overflow: hidden;
}
.concept-box::before {
content: '"';
font-family: 'Playfair Display', serif;
font-size: 200px;
font-weight: 900;
position: absolute;
top: -40px;
left: 20px;
opacity: 0.07;
line-height: 1;
}
.concept-box .kicker {
font-family: 'JetBrains Mono', monospace;
font-size: 10px;
letter-spacing: 0.25em;
text-transform: uppercase;
color: var(--accent-amber);
margin-bottom: 10px;
}
.concept-box p {
font-family: 'Playfair Display', serif;
font-size: 18px;
font-weight: 400;
line-height: 1.5;
position: relative;
z-index: 1;
}
.concept-box em { color: #f0c060; font-style: normal; }
/* ARCHITECTURE DIAGRAM */
.arch-diagram {
display: flex;
align-items: center;
gap: 0;
margin: 24px 0 32px;
overflow-x: auto;
}
.arch-node {
flex: 1;
min-width: 160px;
padding: 18px 16px;
border: 2px solid var(--ink);
background: white;
text-align: center;
position: relative;
}
.arch-node.raw { border-color: var(--accent-blue); background: #e8eef5; }
.arch-node.wiki { border-color: var(--accent-green); background: #eaf5ee; }
.arch-node.schema { border-color: var(--accent-amber); background: #f5f0e0; }
.arch-node .node-label {
font-family: 'JetBrains Mono', monospace;
font-size: 11px;
font-weight: 500;
letter-spacing: 0.1em;
text-transform: uppercase;
margin-bottom: 6px;
}
.arch-node .node-name {
font-family: 'Playfair Display', serif;
font-size: 18px;
font-weight: 700;
}
.arch-node .node-desc {
font-size: 11px;
color: var(--muted);
margin-top: 6px;
line-height: 1.4;
}
.arch-arrow {
flex: 0 0 50px;
text-align: center;
font-size: 22px;
color: var(--muted);
}
/* PRO/CON GRID */
.procon-grid {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 0;
border: 2px solid var(--ink);
margin-bottom: 32px;
}
.procon-col { }
.procon-col-header {
padding: 14px 20px;
font-family: 'Playfair Display', serif;
font-size: 18px;
font-weight: 700;
text-transform: uppercase;
letter-spacing: 0.04em;
border-bottom: 2px solid var(--ink);
display: flex;
align-items: center;
gap: 10px;
}
.procon-col-header.pro { background: #1a6b45; color: white; }
.procon-col-header.con { background: #8b2020; color: white; border-left: 2px solid var(--ink); }
.procon-col-header .icon { font-size: 20px; }
.procon-items { padding: 0; }
.procon-item {
padding: 16px 20px;
border-bottom: 1px solid var(--rule);
cursor: pointer;
transition: background 0.15s;
position: relative;
}
.procon-item:last-child { border-bottom: none; }
.procon-item.con-item { border-left: 2px solid var(--ink); }
.procon-item:hover { background: var(--highlight); }
.procon-item.active { background: var(--highlight); }
.procon-item .item-title {
font-family: 'DM Sans', sans-serif;
font-weight: 500;
font-size: 14px;
margin-bottom: 4px;
display: flex;
align-items: flex-start;
gap: 8px;
}
.procon-item .item-title .bullet {
flex-shrink: 0;
width: 18px;
height: 18px;
border-radius: 50%;
display: inline-flex;
align-items: center;
justify-content: center;
font-size: 10px;
font-weight: 700;
margin-top: 1px;
}
.pro-item .bullet { background: var(--accent-green); color: white; }
.con-item .bullet { background: var(--accent-red); color: white; }
.procon-item .item-detail {
font-size: 12.5px;
color: var(--muted);
line-height: 1.5;
padding-left: 26px;
display: none;
margin-top: 6px;
}
.procon-item.active .item-detail { display: block; }
.procon-item .expand-hint {
position: absolute;
right: 16px;
top: 50%;
transform: translateY(-50%);
font-family: 'JetBrains Mono', monospace;
font-size: 16px;
color: var(--rule);
transition: transform 0.2s;
}
.procon-item.active .expand-hint { transform: translateY(-50%) rotate(45deg); color: var(--muted); }
/* VERDICT METER */
.verdict-section {
margin: 32px 0;
}
.verdict-header {
font-family: 'Playfair Display', serif;
font-size: 20px;
font-weight: 700;
text-transform: uppercase;
letter-spacing: 0.05em;
margin-bottom: 20px;
padding-bottom: 8px;
border-bottom: 2px solid var(--ink);
display: flex;
align-items: baseline;
gap: 14px;
}
.use-cases {
display: grid;
grid-template-columns: repeat(3, 1fr);
gap: 16px;
}
.use-case-card {
border: 2px solid var(--rule);
padding: 18px;
position: relative;
cursor: default;
transition: border-color 0.15s, transform 0.15s;
}
.use-case-card:hover { border-color: var(--ink); transform: translateY(-2px); }
.use-case-card .verdict-badge {
font-family: 'JetBrains Mono', monospace;
font-size: 9px;
font-weight: 500;
letter-spacing: 0.2em;
text-transform: uppercase;
padding: 3px 8px;
border-radius: 2px;
display: inline-block;
margin-bottom: 10px;
}
.badge-excellent { background: #1a6b45; color: white; }
.badge-good { background: #4a7a1a; color: white; }
.badge-poor { background: #b8820a; color: white; }
.badge-avoid { background: #8b2020; color: white; }
.use-case-card h4 {
font-family: 'Playfair Display', serif;
font-size: 16px;
font-weight: 700;
margin-bottom: 8px;
}
.use-case-card p { font-size: 12.5px; color: var(--muted); line-height: 1.5; }
.fit-meter {
margin-top: 12px;
height: 4px;
background: var(--rule);
border-radius: 2px;
overflow: hidden;
}
.fit-fill {
height: 100%;
border-radius: 2px;
transition: width 1s ease;
}
/* COMPARISON TABLE */
.compare-table {
width: 100%;
border-collapse: collapse;
margin: 20px 0 32px;
font-size: 13px;
}
.compare-table th {
font-family: 'JetBrains Mono', monospace;
font-size: 10px;
letter-spacing: 0.15em;
text-transform: uppercase;
padding: 12px 16px;
text-align: left;
border-bottom: 2px solid var(--ink);
background: var(--paper2);
}
.compare-table th:not(:first-child) { text-align: center; }
.compare-table td {
padding: 11px 16px;
border-bottom: 1px solid var(--rule);
vertical-align: middle;
}
.compare-table td:not(:first-child) { text-align: center; }
.compare-table tr:hover td { background: var(--highlight); }
.compare-table .row-label {
font-family: 'DM Sans', sans-serif;
font-weight: 500;
}
.compare-table .col-wiki { background: #eaf5ee44; }
.compare-table .col-rag { background: #e8eef544; }
.cell-win { color: var(--accent-green); font-weight: 500; }
.cell-lose { color: var(--accent-red); }
.cell-mid { color: var(--accent-amber); }
.cell-icon { font-size: 16px; }
/* QUOTE PULL */
.pull-quote {
border-left: 4px solid var(--ink);
margin: 28px 0;
padding: 16px 24px;
background: var(--paper2);
font-family: 'Playfair Display', serif;
font-size: 16px;
font-style: italic;
line-height: 1.6;
}
.pull-quote .attribution {
font-family: 'JetBrains Mono', monospace;
font-size: 10px;
letter-spacing: 0.15em;
text-transform: uppercase;
font-style: normal;
color: var(--muted);
margin-top: 8px;
display: block;
}
/* TABS */
.tab-row {
display: flex;
border-bottom: 2px solid var(--ink);
margin-bottom: 20px;
gap: 0;
overflow-x: auto;
}
.tab-btn {
font-family: 'JetBrains Mono', monospace;
font-size: 10px;
letter-spacing: 0.15em;
text-transform: uppercase;
padding: 10px 20px;
border: none;
background: transparent;
cursor: pointer;
border-bottom: 3px solid transparent;
margin-bottom: -2px;
white-space: nowrap;
color: var(--muted);
transition: color 0.15s;
}
.tab-btn.active {
border-bottom-color: var(--ink);
color: var(--ink);
font-weight: 500;
}
.tab-btn:hover { color: var(--ink); }
.tab-panel { display: none; }
.tab-panel.active { display: block; }
/* FOOTER */
.page-footer {
border-top: 3px double var(--ink);
padding: 20px 40px;
display: flex;
justify-content: space-between;
align-items: center;
background: var(--paper2);
font-family: 'JetBrains Mono', monospace;
font-size: 10px;
letter-spacing: 0.12em;
color: var(--muted);
text-transform: uppercase;
}
/* SIGNAL STRENGTH */
.signal-grid {
display: grid;
grid-template-columns: repeat(2, 1fr);
gap: 16px;
margin-bottom: 24px;
}
.signal-card {
border: 1px solid var(--rule);
padding: 16px 18px;
background: white;
}
.signal-card .signal-category {
font-family: 'JetBrains Mono', monospace;
font-size: 9px;
letter-spacing: 0.2em;
text-transform: uppercase;
color: var(--muted);
margin-bottom: 8px;
}
.signal-card h4 {
font-family: 'Playfair Display', serif;
font-size: 15px;
font-weight: 700;
margin-bottom: 6px;
}
.signal-card p { font-size: 12.5px; color: var(--muted); line-height: 1.5; }
.signal-bar {
display: flex;
align-items: center;
gap: 8px;
margin-top: 10px;
}
.signal-bar-track {
flex: 1;
height: 3px;
background: var(--rule);
border-radius: 2px;
}
.signal-bar-fill {
height: 100%;
border-radius: 2px;
}
.signal-bar-label {
font-family: 'JetBrains Mono', monospace;
font-size: 9px;
color: var(--muted);
min-width: 28px;
}
/* SCALE NOTE */
.scale-note {
background: #fff8e6;
border: 1px solid var(--accent-amber);
padding: 14px 18px;
margin-bottom: 20px;
display: flex;
gap: 12px;
font-size: 13px;
}
.scale-note .warn-icon { font-size: 18px; flex-shrink: 0; }
.scale-note strong { font-weight: 500; }
/* MITIGATION CARDS */
.mitigation-intro {
font-size: 13.5px;
color: var(--muted);
margin-bottom: 24px;
padding: 14px 18px;
border-left: 3px solid var(--accent-amber);
background: #fff8e6;
font-family: 'JetBrains Mono', monospace;
letter-spacing: 0.03em;
line-height: 1.6;
}
.mitigation-intro strong { color: var(--ink); }
.mitigation-category {
margin-bottom: 32px;
}
.mitigation-category-header {
display: flex;
align-items: center;
gap: 12px;
margin-bottom: 14px;
padding-bottom: 8px;
border-bottom: 1px solid var(--rule);
}
.mitigation-category-header .cat-icon {
width: 32px;
height: 32px;
border-radius: 50%;
display: flex;
align-items: center;
justify-content: center;
font-size: 15px;
flex-shrink: 0;
}
.mitigation-category-header h3 {
font-family: 'Playfair Display', serif;
font-size: 18px;
font-weight: 700;
}
.mitigation-category-header .severity-badge {
font-family: 'JetBrains Mono', monospace;
font-size: 9px;
letter-spacing: 0.2em;
text-transform: uppercase;
padding: 2px 8px;
border-radius: 2px;
margin-left: auto;
}
.sev-critical { background: #8b2020; color: white; }
.sev-high { background: #b8820a; color: white; }
.sev-medium { background: #1a3d6b; color: white; }
.mitigation-steps {
display: grid;
gap: 10px;
}
.mitigation-step {
background: white;
border: 1px solid var(--rule);
padding: 0;
overflow: hidden;
transition: border-color 0.15s;
}
.mitigation-step:hover { border-color: var(--ink); }
.mitigation-step-header {
display: flex;
align-items: center;
gap: 12px;
padding: 14px 18px;
cursor: pointer;
user-select: none;
}
.step-num {
font-family: 'JetBrains Mono', monospace;
font-size: 11px;
font-weight: 500;
color: var(--muted);
min-width: 24px;
}
.step-title {
font-family: 'DM Sans', sans-serif;
font-weight: 500;
font-size: 14px;
flex: 1;
}
.step-tool-tag {
font-family: 'JetBrains Mono', monospace;
font-size: 9px;
letter-spacing: 0.15em;
padding: 2px 7px;
background: var(--paper2);
border: 1px solid var(--rule);
border-radius: 2px;
color: var(--muted);
white-space: nowrap;
}
.step-arrow {
font-size: 12px;
color: var(--muted);
transition: transform 0.2s;
margin-left: 4px;
}
.mitigation-step.open .step-arrow { transform: rotate(90deg); }
.mitigation-step-body {
display: none;
padding: 0 18px 16px 54px;
font-size: 13px;
color: var(--muted);
line-height: 1.6;
border-top: 1px solid var(--paper2);
}
.mitigation-step.open .mitigation-step-body { display: block; }
.mitigation-step-body p { margin-top: 10px; }
.mitigation-step-body .code-pill {
display: inline-block;
font-family: 'JetBrains Mono', monospace;
font-size: 11px;
background: var(--ink);
color: var(--paper);
padding: 2px 8px;
border-radius: 3px;
margin: 2px;
white-space: nowrap;
}
.mitigation-step-body .tip-box {
background: #eaf5ee;
border: 1px solid var(--accent-green);
border-radius: 3px;
padding: 10px 14px;
margin-top: 10px;
font-size: 12.5px;
color: var(--accent-green);
}
.mitigation-step-body .tip-box strong { color: var(--accent-green); }
.mitigation-step-body .warn-box {
background: #fff8e6;
border: 1px solid var(--accent-amber);
border-radius: 3px;
padding: 10px 14px;
margin-top: 10px;
font-size: 12.5px;
color: #7a5500;
}
.effort-row {
display: flex;
align-items: center;
gap: 10px;
margin-top: 12px;
padding-top: 10px;
border-top: 1px solid var(--paper2);
}
.effort-label {
font-family: 'JetBrains Mono', monospace;
font-size: 9px;
letter-spacing: 0.15em;
text-transform: uppercase;
color: var(--muted);
min-width: 80px;
}
.effort-pips {
display: flex;
gap: 4px;
}
.pip {
width: 10px;
height: 10px;
border-radius: 50%;
border: 1px solid var(--rule);
}
.pip.filled { background: var(--ink); border-color: var(--ink); }
.pip.amber { background: var(--accent-amber); border-color: var(--accent-amber); }
.pip.green { background: var(--accent-green); border-color: var(--accent-green); }
.effort-desc {
font-size: 11px;
color: var(--muted);
}
/* UPKEEP SYSTEM */
.upkeep-system {
background: var(--ink);
color: var(--paper);
padding: 28px 32px;
margin-bottom: 24px;
position: relative;
overflow: hidden;
}
.upkeep-system::after {
content: 'UPKEEP';
font-family: 'Playfair Display', serif;
font-size: 100px;
font-weight: 900;
position: absolute;
right: -10px;
bottom: -20px;
opacity: 0.06;
line-height: 1;
letter-spacing: -0.05em;
}
.upkeep-system .kicker {
font-family: 'JetBrains Mono', monospace;
font-size: 10px;
letter-spacing: 0.25em;
text-transform: uppercase;
color: var(--accent-amber);
margin-bottom: 10px;
}
.upkeep-system h3 {
font-family: 'Playfair Display', serif;
font-size: 22px;
font-weight: 700;
margin-bottom: 14px;
}
.upkeep-system p {
font-size: 13.5px;
color: #c4b99a;
line-height: 1.6;
max-width: 680px;
margin-bottom: 20px;
position: relative;
z-index: 1;
}
.upkeep-cadence {
display: grid;
grid-template-columns: repeat(4, 1fr);
gap: 12px;
position: relative;
z-index: 1;
}
.cadence-cell {
border: 1px solid #3a3530;
padding: 14px;
background: #161410;
}
.cadence-cell .freq {
font-family: 'JetBrains Mono', monospace;
font-size: 10px;
letter-spacing: 0.2em;
text-transform: uppercase;
color: var(--accent-amber);
margin-bottom: 6px;
}
.cadence-cell .cadence-title {
font-family: 'Playfair Display', serif;
font-size: 14px;
font-weight: 700;
margin-bottom: 6px;
}
.cadence-cell ul {
list-style: none;
padding: 0;
}
.cadence-cell ul li {
font-size: 11px;
color: #c4b99a;
padding: 3px 0;
border-bottom: 1px solid #2a2520;
display: flex;
align-items: flex-start;
gap: 6px;
}
.cadence-cell ul li::before {
content: '→';
color: #5a5040;
flex-shrink: 0;
margin-top: 1px;
}
.cadence-cell ul li:last-child { border-bottom: none; }
/* MEMPALACE STYLES */
.palace-hero {
background: linear-gradient(135deg, #0d1a0f 0%, #0a1520 50%, #1a0d10 100%);
color: var(--paper);
padding: 32px 36px;
margin-bottom: 28px;
position: relative;
overflow: hidden;
border: 1px solid #2a3a2a;
}
.palace-hero::before {
content: '⬡';
font-size: 220px;
position: absolute;
right: -30px;
top: -40px;
opacity: 0.04;
line-height: 1;
}
.palace-hero .kicker {
font-family: 'JetBrains Mono', monospace;
font-size: 10px;
letter-spacing: 0.3em;
text-transform: uppercase;
color: #5aba7a;
margin-bottom: 10px;
}
.palace-hero h3 {
font-family: 'Playfair Display', serif;
font-size: 26px;
font-weight: 900;
margin-bottom: 12px;
line-height: 1.2;
}
.palace-hero h3 em { color: #5aba7a; font-style: normal; }
.palace-hero p {
font-size: 13.5px;
color: #a0b8a0;
max-width: 700px;
line-height: 1.7;
margin-bottom: 12px;
position: relative;
z-index: 1;
}
.palace-hero .hero-stats {
display: flex;
gap: 28px;
flex-wrap: wrap;
margin-top: 20px;
border-top: 1px solid #2a3a2a;
padding-top: 16px;
}
.palace-hero .hstat {
font-family: 'JetBrains Mono', monospace;
}
.palace-hero .hstat .hval {
font-size: 22px;
font-weight: 500;
color: #5aba7a;
display: block;
line-height: 1.2;
}
.palace-hero .hstat .hlbl {
font-size: 9px;
letter-spacing: 0.15em;
text-transform: uppercase;
color: #607060;
}
.palace-map {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 0;
border: 2px solid var(--ink);
margin-bottom: 28px;
}
.palace-cell {
padding: 20px;
border-right: 1px solid var(--rule);
border-bottom: 1px solid var(--rule);
}
.palace-cell:nth-child(2n) { border-right: none; }
.palace-cell:nth-last-child(-n+2) { border-bottom: none; }
.palace-cell .pc-icon {
font-size: 22px;
margin-bottom: 8px;
}
.palace-cell .pc-term {
font-family: 'JetBrains Mono', monospace;
font-size: 11px;
font-weight: 500;
letter-spacing: 0.1em;
text-transform: uppercase;
color: var(--accent-green);
margin-bottom: 4px;
}
.palace-cell .pc-name {
font-family: 'Playfair Display', serif;
font-size: 17px;
font-weight: 700;
margin-bottom: 6px;
}
.palace-cell .pc-desc {
font-size: 12.5px;
color: var(--muted);
line-height: 1.5;
}
.palace-cell .pc-wiki-map {
margin-top: 8px;
font-family: 'JetBrains Mono', monospace;
font-size: 10px;
color: var(--accent-amber);
background: #fff8e6;
padding: 4px 8px;
border-radius: 2px;
letter-spacing: 0.05em;
}
/* Impact cards */
.impact-grid {
display: grid;
grid-template-columns: repeat(3, 1fr);
gap: 14px;
margin-bottom: 28px;
}
.impact-card {
border: 2px solid var(--rule);
padding: 18px;
position: relative;
}
.impact-card .impact-verdict {
position: absolute;
top: 12px;
right: 12px;
font-family: 'JetBrains Mono', monospace;
font-size: 9px;
letter-spacing: 0.15em;
padding: 2px 7px;
border-radius: 2px;
text-transform: uppercase;
}
.verdict-solved { background: var(--accent-green); color: white; }
.verdict-reduced { background: #4a7a1a; color: white; }
.verdict-shifted { background: var(--accent-blue); color: white; }
.verdict-new { background: var(--accent-amber); color: white; }
.impact-card h4 {
font-family: 'Playfair Display', serif;
font-size: 15px;
font-weight: 700;
margin-bottom: 8px;
padding-right: 60px;
line-height: 1.3;
}
.impact-card p {
font-size: 12.5px;
color: var(--muted);
line-height: 1.5;
}
.impact-card .impact-before-after {
margin-top: 10px;
display: grid;
grid-template-columns: 1fr 1fr;
gap: 6px;
}
.impact-card .ba-cell {
font-size: 11px;
padding: 6px 8px;
border-radius: 2px;
line-height: 1.4;
}
.ba-before { background: #f5e8e8; color: var(--accent-red); }
.ba-after { background: #eaf5ee; color: var(--accent-green); }
.ba-label {
font-family: 'JetBrains Mono', monospace;
font-size: 8px;
letter-spacing: 0.15em;
text-transform: uppercase;
display: block;
margin-bottom: 3px;
opacity: 0.7;
}
.flow-diagram {
background: var(--paper2);
border: 2px solid var(--ink);
padding: 24px 28px;
margin-bottom: 28px;
font-family: 'JetBrains Mono', monospace;
}
.flow-diagram .flow-title {
font-size: 10px;
letter-spacing: 0.2em;
text-transform: uppercase;
color: var(--muted);
margin-bottom: 18px;
}
.flow-row {
display: flex;
align-items: center;
gap: 0;
margin-bottom: 8px;
flex-wrap: nowrap;
overflow-x: auto;
}
.flow-node {
font-size: 11px;
padding: 8px 12px;
border: 1px solid var(--ink);
background: white;
white-space: nowrap;
text-align: center;
line-height: 1.3;
}
.flow-node.palace { background: #e8f5ee; border-color: var(--accent-green); color: var(--accent-green); }
.flow-node.convo { background: #e8eef5; border-color: var(--accent-blue); color: var(--accent-blue); }
.flow-node.raw { background: #f0f0f0; border-color: #888; }
.flow-node.wiki { background: #f5f0e0; border-color: var(--accent-amber); }
.flow-node.qmd { background: #1a0d10; color: #f0c060; border-color: #4a2020; }
.flow-node.llm { background: var(--ink); color: var(--paper); }
.flow-arrow { padding: 0 6px; color: var(--muted); font-size: 14px; flex-shrink: 0; }
.flow-label {
font-size: 9px;
color: var(--muted);
letter-spacing: 0.1em;
text-align: center;
margin-bottom: 14px;
text-transform: uppercase;
}
.caveat-box {
background: #fff3cd;
border: 1px solid #b8820a;
padding: 16px 20px;
margin-bottom: 20px;
font-size: 13px;
line-height: 1.6;
}
.caveat-box .caveat-head {
font-family: 'JetBrains Mono', monospace;
font-size: 10px;
letter-spacing: 0.2em;
text-transform: uppercase;
color: var(--accent-amber);
margin-bottom: 6px;
}
.caveat-box strong { font-weight: 500; }
@media (max-width: 768px) {
.palace-map { grid-template-columns: 1fr; }
.impact-grid { grid-template-columns: 1fr; }
.masthead { grid-template-columns: 1fr; text-align: center; }
.masthead-right { text-align: center; }
.overview-strip { grid-template-columns: repeat(2, 1fr); }
.procon-grid { grid-template-columns: 1fr; }
.procon-col-header.con { border-left: none; border-top: 2px solid var(--ink); }
.use-cases { grid-template-columns: 1fr; }
.signal-grid { grid-template-columns: 1fr; }
.page { padding: 0 20px 40px; }
.masthead { padding: 20px; }
.arch-diagram { flex-direction: column; }
.arch-arrow { transform: rotate(90deg); }
}
</style>
</head>
<body>
<!-- MASTHEAD -->
<header class="masthead">
<div class="masthead-left">
Vol. I &bull; No. 1<br>
April 2026<br>
Special Report
</div>
<div class="masthead-center">
<h1>memex</h1>
<div class="subtitle">Karpathy's Pattern &mdash; Signal &amp; Noise</div>
</div>
<div class="masthead-right">
Source: github.com/karpathy<br>
17M+ Views &bull; 5K+ Stars<br>
Community Analysis
</div>
</header>
<!-- TICKER -->
<div class="ticker">
<span class="ticker-track">
<span class="ticker-item">PERSISTENT MEMORY <span class="ticker-dot">●</span></span>
<span class="ticker-item">RAG vs WIKI <span class="ticker-dot">●</span></span>
<span class="ticker-item">COMPILE ONCE · QUERY FOREVER <span class="ticker-dot">●</span></span>
<span class="ticker-item">~100 ARTICLES SWEET SPOT <span class="ticker-dot">●</span></span>
<span class="ticker-item">KNOWLEDGE COMPOUNDS <span class="ticker-dot">●</span></span>
<span class="ticker-item">PERSONAL SCALE ONLY <span class="ticker-dot">●</span></span>
<span class="ticker-item">HALLUCINATIONS PERSIST <span class="ticker-dot">●</span></span>
<span class="ticker-item">NO ENTERPRISE RBAC <span class="ticker-dot">●</span></span>
<span class="ticker-item">MARKDOWN IS FUTURE-PROOF <span class="ticker-dot">●</span></span>
<span class="ticker-item">PERSISTENT MEMORY <span class="ticker-dot">●</span></span>
<span class="ticker-item">RAG vs WIKI <span class="ticker-dot">●</span></span>
<span class="ticker-item">COMPILE ONCE · QUERY FOREVER <span class="ticker-dot">●</span></span>
<span class="ticker-item">~100 ARTICLES SWEET SPOT <span class="ticker-dot">●</span></span>
<span class="ticker-item">KNOWLEDGE COMPOUNDS <span class="ticker-dot">●</span></span>
<span class="ticker-item">PERSONAL SCALE ONLY <span class="ticker-dot">●</span></span>
<span class="ticker-item">HALLUCINATIONS PERSIST <span class="ticker-dot">●</span></span>
<span class="ticker-item">NO ENTERPRISE RBAC <span class="ticker-dot">●</span></span>
<span class="ticker-item">MARKDOWN IS FUTURE-PROOF <span class="ticker-dot">●</span></span>
</span>
</div>
<!-- STATS STRIP -->
<div class="overview-strip">
<div class="stat-cell">
<div class="stat-num">17M+</div>
<div class="stat-label">Tweet Views</div>
</div>
<div class="stat-cell">
<div class="stat-num" style="color:var(--accent-green)">~100</div>
<div class="stat-label">Articles · Sweet Spot</div>
</div>
<div class="stat-cell">
<div class="stat-num" style="color:var(--accent-amber)">400K</div>
<div class="stat-label">Words · Karpathy's Wiki</div>
</div>
<div class="stat-cell">
<div class="stat-num" style="color:var(--accent-red)">50K</div>
<div class="stat-label">Token Ceiling</div>
</div>
</div>
<div class="page">
<!-- CONCEPT BOX -->
<div class="concept-box">
<div class="kicker">The Core Idea</div>
<p>Instead of making the LLM <em>rediscover</em> knowledge from raw documents on every query — the RAG way — Karpathy proposes having the LLM <em>compile</em> a structured, interlinked wiki once at ingest time. Knowledge <em>accumulates</em>. The LLM maintains the wiki, not the human.</p>
</div>
<!-- ARCHITECTURE -->
<div class="section-header">
<h2>Architecture</h2>
<span class="section-tag">Three Layers</span>
</div>
<div class="arch-diagram">
<div class="arch-node raw">
<div class="node-label" style="color:var(--accent-blue)">Layer 1</div>
<div class="node-name">raw/</div>
<div class="node-desc">PDFs, articles, web clips. Immutable. Human adds, LLM never modifies.</div>
</div>
<div class="arch-arrow"></div>
<div class="arch-node" style="border-color:var(--ink); background:#f0f0f0">
<div class="node-label" style="color:var(--ink)">Process</div>
<div class="node-name">🤖 LLM</div>
<div class="node-desc">Reads sources. Synthesizes, links, and compiles structured pages. Runs lint checks.</div>
</div>
<div class="arch-arrow"></div>
<div class="arch-node wiki">
<div class="node-label" style="color:var(--accent-green)">Layer 2</div>
<div class="node-name">wiki/</div>
<div class="node-desc">Compiled markdown pages. Encyclopedia-style articles with cross-references.</div>
</div>
<div class="arch-arrow">+</div>
<div class="arch-node schema">
<div class="node-label" style="color:var(--accent-amber)">Layer 3</div>
<div class="node-name">schema</div>
<div class="node-desc">CLAUDE.md / AGENTS.md. Rules that discipline the LLM's behavior as maintainer.</div>
</div>
</div>
<!-- TABS -->
<div class="tab-row">
<button class="tab-btn active" onclick="switchTab(this, 'tab-procon')">Pros &amp; Cons</button>
<button class="tab-btn" onclick="switchTab(this, 'tab-vs-rag')">vs RAG</button>
<button class="tab-btn" onclick="switchTab(this, 'tab-usecases')">Best / Worst Fits</button>
<button class="tab-btn" onclick="switchTab(this, 'tab-signals')">Signal Breakdown</button>
<button class="tab-btn" onclick="switchTab(this, 'tab-mitigations')">Mitigations ★</button>
<button class="tab-btn" onclick="switchTab(this, 'tab-mempalace')" style="color:var(--accent-green);font-weight:600">MemPalace ⬡</button>
<button class="tab-btn" onclick="switchTab(this, 'tab-distill')" style="color:var(--accent-amber);font-weight:600">Distill ⬣</button>
</div>
<!-- TAB: PROS & CONS -->
<div id="tab-procon" class="tab-panel active">
<p style="font-size:13px;color:var(--muted);margin-bottom:16px;font-family:'JetBrains Mono',monospace;letter-spacing:0.05em;">↓ Tap any row to expand analysis</p>
<div class="procon-grid">
<!-- PROS -->
<div class="procon-col">
<div class="procon-col-header pro">
<span class="icon"></span> Strengths
</div>
<div class="procon-items">
<div class="procon-item pro-item" onclick="toggleItem(this)">
<div class="item-title"><span class="bullet"></span>Knowledge Compounds Over Time</div>
<div class="item-detail">Unlike RAG — where every query starts from scratch re-deriving connections — the LLM wiki is stateful. Each new source you add integrates into existing pages, strengthening existing connections and building new ones. The system gets more valuable with every addition, not just bigger.</div>
<span class="expand-hint">+</span>
</div>
<div class="procon-item pro-item" onclick="toggleItem(this)">
<div class="item-title"><span class="bullet"></span>Zero Maintenance Burden on Humans</div>
<div class="item-detail">The grunt work of knowledge management — cross-referencing, updating related pages, creating summaries, flagging contradictions — is what kills every personal wiki humans try to maintain. LLMs do this tirelessly. The human's job shrinks to: decide what to read, and what questions to ask.</div>
<span class="expand-hint">+</span>
</div>
<div class="procon-item pro-item" onclick="toggleItem(this)">
<div class="item-title"><span class="bullet"></span>Token-Efficient at Personal Scale</div>
<div class="item-detail">At ~100 articles, the wiki's index.md fits in context. The LLM reads the index, identifies relevant articles, and loads only those — no embedding, no vector search, no retrieval noise. This is faster and cheaper per query than a full RAG pipeline for this scale.</div>
<span class="expand-hint">+</span>
</div>
<div class="procon-item pro-item" onclick="toggleItem(this)">
<div class="item-title"><span class="bullet"></span>Human-Readable & Auditable</div>
<div class="item-detail">The wiki is just markdown. You can open it in any editor, read it yourself, version it in git, and inspect every claim. There's no black-box vector math. Every connection the LLM made is visible as a [[wikilink]]. This transparency is a genuine advantage over opaque embeddings.</div>
<span class="expand-hint">+</span>
</div>
<div class="procon-item pro-item" onclick="toggleItem(this)">
<div class="item-title"><span class="bullet"></span>Future-Proof & Portable</div>
<div class="item-detail">Plain markdown files work with any tool, any model, any era. No vendor lock-in. No proprietary database. When GPT-7 or Claude 5 releases, you point it at the same folder. The data outlives the tooling.</div>
<span class="expand-hint">+</span>
</div>
<div class="procon-item pro-item" onclick="toggleItem(this)">
<div class="item-title"><span class="bullet"></span>Self-Healing via Lint Passes</div>
<div class="item-detail">Karpathy describes periodic "health check" passes where the LLM scans the entire wiki for contradictions, orphaned pages (no links pointing to them), and concepts referenced but not yet given their own page. The wiki actively repairs itself rather than rotting silently.</div>
<span class="expand-hint">+</span>
</div>
<div class="procon-item pro-item" onclick="toggleItem(this)">
<div class="item-title"><span class="bullet"></span>Path to Fine-Tuning</div>
<div class="item-detail">As the wiki matures and gets "purified" through continuous lint passes, it becomes high-quality synthetic training data. Karpathy points to the possibility of fine-tuning a smaller, efficient model directly on the wiki — so the LLM "knows" your knowledge base in its own weights, not just its context.</div>
<span class="expand-hint">+</span>
</div>
</div>
</div>
<!-- CONS -->
<div class="procon-col">
<div class="procon-col-header con">
<span class="icon"></span> Weaknesses
</div>
<div class="procon-items">
<div class="procon-item con-item" onclick="toggleItem(this)">
<div class="item-title"><span class="bullet"></span>Errors Persist &amp; Compound</div>
<div class="item-detail">This is the most serious structural flaw. With RAG, hallucinations are ephemeral — wrong answer this query, clean slate next time. With an LLM wiki, if the LLM incorrectly links two concepts at ingest time, that mistake becomes a prior that future ingest passes build upon. Persistent errors are more dangerous than ephemeral ones.</div>
<span class="expand-hint">+</span>
</div>
<div class="procon-item con-item" onclick="toggleItem(this)">
<div class="item-title"><span class="bullet"></span>Hard Scale Ceiling (~50K tokens)</div>
<div class="item-detail">The wiki approach stops working reliably when the index can no longer fit in the model's context window — roughly 50,000–100,000 tokens. Karpathy's own wiki is ~100 articles / ~400K words on a single topic. A mid-size company has thousands of documents; a large one has millions. The architecture simply doesn't extend to that scale.</div>
<span class="expand-hint">+</span>
</div>
<div class="procon-item con-item" onclick="toggleItem(this)">
<div class="item-title"><span class="bullet"></span>No Access Control or Multi-User Support</div>
<div class="item-detail">It's a folder of markdown files. There is no Role-Based Access Control, no audit logging, no concurrency handling for simultaneous writes, no permissions model. Write conflicts between multiple users or agents simply go unmanaged. This is not a limitation that can be patched — it's a structural consequence of the architecture.</div>
<span class="expand-hint">+</span>
</div>
<div class="procon-item con-item" onclick="toggleItem(this)">
<div class="item-title"><span class="bullet"></span>Manual Cross-Checking Burden Returns</div>
<div class="item-detail">In precision-critical domains (API specs, version constraints, legal records), LLM-generated content requires human cross-checking against raw sources to catch subtle factual errors. At that point, the maintenance burden you thought you'd eliminated returns in a different form: verification overhead.</div>
<span class="expand-hint">+</span>
</div>
<div class="procon-item con-item" onclick="toggleItem(this)">
<div class="item-title"><span class="bullet"></span>Cognitive Outsourcing Risk</div>
<div class="item-detail">Critics on Hacker News argued that the bookkeeping Karpathy outsources — filing, cross-referencing, summarizing — is precisely where genuine understanding forms. By handing this to an LLM, you may end up with a comprehensive wiki you haven't internalized. You have a great reference; you may lack deep ownership of the knowledge.</div>
<span class="expand-hint">+</span>
</div>
<div class="procon-item con-item" onclick="toggleItem(this)">
<div class="item-title"><span class="bullet"></span>Knowledge Staleness Without Active Upkeep</div>
<div class="item-detail">Community reports show that most people who try this pattern get the folder structure right but end up with a wiki that slowly becomes unreliable or gets abandoned. The system requires consistent source ingestion and regular lint passes. If you stop feeding it, the wiki rots — its age relative to your domain's pace of change becomes a liability.</div>
<span class="expand-hint">+</span>
</div>
<div class="procon-item con-item" onclick="toggleItem(this)">
<div class="item-title"><span class="bullet"></span>Weaker Semantic Retrieval than RAG</div>
<div class="item-detail">Markdown wikilinks are explicit and manually created. Vector embeddings discover semantic connections across differently worded text that explicit linking misses — finding that an article titled "caching strategies" is semantically related to "performance bottlenecks" without any explicit link. At large corpora, RAG's fuzzy matching is the superior retrieval mechanism.</div>
<span class="expand-hint">+</span>
</div>
</div>
</div>
</div>
</div>
<!-- TAB: VS RAG -->
<div id="tab-vs-rag" class="tab-panel">
<div class="pull-quote">
RAG retrieves and forgets. A wiki accumulates and compounds.
<span class="attribution">— LLM Wiki v2, community extension of Karpathy's pattern</span>
</div>
<div class="scale-note">
<div class="warn-icon"></div>
<div><strong>Scale matters most here.</strong> The comparison is not absolute — it is highly scale-dependent. Below ~50K tokens, the wiki pattern wins. Above that threshold, RAG's architecture becomes necessary regardless of the storage format.</div>
</div>
<table class="compare-table">
<thead>
<tr>
<th>Dimension</th>
<th class="col-wiki">memex / LLM Wiki</th>
<th class="col-rag">RAG</th>
</tr>
</thead>
<tbody>
<tr>
<td class="row-label">Knowledge Accumulation</td>
<td class="col-wiki cell-win">✦ Compounds with each ingest</td>
<td class="col-rag cell-lose">Stateless — restarts every query</td>
</tr>
<tr>
<td class="row-label">Maintenance Cost</td>
<td class="col-wiki cell-win">✦ LLM does the filing</td>
<td class="col-rag cell-mid">Chunking pipelines need upkeep</td>
</tr>
<tr>
<td class="row-label">Scale Ceiling</td>
<td class="col-wiki cell-lose">~50–100K tokens hard limit</td>
<td class="col-rag cell-win">✦ Millions of documents, no ceiling</td>
</tr>
<tr>
<td class="row-label">Human Readability</td>
<td class="col-wiki cell-win">✦ Plain markdown, fully auditable</td>
<td class="col-rag cell-lose">Black-box vector space</td>
</tr>
<tr>
<td class="row-label">Semantic Retrieval</td>
<td class="col-wiki cell-mid">Explicit links only</td>
<td class="col-rag cell-win">✦ Fuzzy semantic matching</td>
</tr>
<tr>
<td class="row-label">Error Persistence</td>
<td class="col-wiki cell-lose">Errors compound into future pages</td>
<td class="col-rag cell-mid">Errors are ephemeral per query</td>
</tr>
<tr>
<td class="row-label">Multi-user / RBAC</td>
<td class="col-wiki cell-lose">None — flat file system</td>
<td class="col-rag cell-win">✦ Supported by most platforms</td>
</tr>
<tr>
<td class="row-label">Query Latency</td>
<td class="col-wiki cell-win">✦ Fast at personal scale</td>
<td class="col-rag cell-mid">Embedding search overhead</td>
</tr>
<tr>
<td class="row-label">Setup Complexity</td>
<td class="col-wiki cell-win">✦ Just folders &amp; markdown</td>
<td class="col-rag cell-lose">Vector DB, chunking, embeddings</td>
</tr>
<tr>
<td class="row-label">Vendor Lock-in</td>
<td class="col-wiki cell-win">✦ Zero — any model, any editor</td>
<td class="col-rag cell-lose">Often tied to embedding provider</td>
</tr>
<tr>
<td class="row-label">Cross-reference Quality</td>
<td class="col-wiki cell-win">✦ Rich, named wikilinks</td>
<td class="col-rag cell-mid">Implicit via similarity score</td>
</tr>
<tr>
<td class="row-label">Fine-tuning Pathway</td>
<td class="col-wiki cell-win">✦ Wiki becomes training data</td>
<td class="col-rag cell-lose">Raw chunks are poor training data</td>
</tr>
</tbody>
</table>
</div>
<!-- TAB: USE CASES -->
<div id="tab-usecases" class="tab-panel">
<div class="use-cases">
<div class="use-case-card">
<span class="verdict-badge badge-excellent">Excellent Fit</span>
<h4>Solo Deep Research</h4>
<p>Reading papers, articles, and reports over weeks or months on a single topic. Karpathy's primary use case — his ML research wiki has ~100 articles and 400K words, all compiled without writing a line manually.</p>
<div class="fit-meter"><div class="fit-fill" style="width:95%;background:var(--accent-green)"></div></div>
</div>
<div class="use-case-card">
<span class="verdict-badge badge-excellent">Excellent Fit</span>
<h4>Personal Knowledge Base</h4>
<p>Goals, health tracking, journal entries, podcast notes — building a structured picture of yourself over time. The LLM creates concept pages for recurring themes and connects them across months or years.</p>
<div class="fit-meter"><div class="fit-fill" style="width:90%;background:var(--accent-green)"></div></div>
</div>
<div class="use-case-card">
<span class="verdict-badge badge-good">Good Fit</span>
<h4>Small Team Wiki (&lt;500 articles)</h4>
<p>Engineering team internal docs, competitive analysis, trip planning. Works well if one person owns ingestion and the team reads via Obsidian. Breaks at concurrent writes or RBAC requirements.</p>
<div class="fit-meter"><div class="fit-fill" style="width:65%;background:#4a7a1a"></div></div>
</div>
<div class="use-case-card">
<span class="verdict-badge badge-good">Good Fit</span>
<h4>Agentic Pipeline Memory</h4>
<p>AI agent systems that need persistent memory between sessions. The wiki prevents agents from "waking up blank." Session context is compiled rather than re-derived, dramatically cutting token overhead.</p>
<div class="fit-meter"><div class="fit-fill" style="width:70%;background:#4a7a1a"></div></div>
</div>
<div class="use-case-card">
<span class="verdict-badge badge-poor">Poor Fit</span>
<h4>Mission-Critical Precision</h4>
<p>API parameter specs, version constraints, legal records, medical protocols. LLM-generated pages can silently misstate critical details. Manual cross-checking eliminates the maintenance savings that make this pattern attractive.</p>
<div class="fit-meter"><div class="fit-fill" style="width:25%;background:var(--accent-amber)"></div></div>
</div>
<div class="use-case-card">
<span class="verdict-badge badge-avoid">Avoid</span>
<h4>Enterprise Knowledge Management</h4>
<p>Millions of documents, hundreds of users, RBAC, audit trails, regulatory compliance. The flat file architecture cannot address concurrency, access control, or governance. This is a personal productivity hack, not enterprise infrastructure.</p>
<div class="fit-meter"><div class="fit-fill" style="width:8%;background:var(--accent-red)"></div></div>
</div>
</div>
</div>
<!-- TAB: SIGNALS -->
<div id="tab-signals" class="tab-panel">
<p style="font-size:13px;color:var(--muted);margin-bottom:20px;">A breakdown of where the pattern generates real signal vs. where the noise grows louder.</p>
<div class="signal-grid">
<div class="signal-card">
<div class="signal-category">Signal</div>
<h4>The Compile-Time Insight</h4>
<p>Moving synthesis from query-time (RAG) to ingest-time (wiki) is a genuinely novel architectural choice with real benefits for accumulation. This is the core innovation and it holds up to scrutiny.</p>
<div class="signal-bar">
<div class="signal-bar-track"><div class="signal-bar-fill" style="width:90%;background:var(--accent-green)"></div></div>
<div class="signal-bar-label" style="color:var(--accent-green)">Strong</div>
</div>
</div>
<div class="signal-card">
<div class="signal-category">Signal</div>
<h4>LLM as Librarian</h4>
<p>Offloading the maintenance bottleneck — the work that kills all human-maintained wikis — to an LLM is elegant and correct. The pattern solves a real problem people actually have.</p>
<div class="signal-bar">
<div class="signal-bar-track"><div class="signal-bar-fill" style="width:85%;background:var(--accent-green)"></div></div>
<div class="signal-bar-label" style="color:var(--accent-green)">Strong</div>
</div>
</div>
<div class="signal-card">
<div class="signal-category">Noise</div>
<h4>"RAG is Dead"</h4>
<p>Community hyperbole. RAG and the wiki pattern solve different problems at different scales. The wiki pattern is a personal productivity tool, not a replacement for enterprise-grade retrieval infrastructure.</p>
<div class="signal-bar">
<div class="signal-bar-track"><div class="signal-bar-fill" style="width:80%;background:var(--accent-red)"></div></div>
<div class="signal-bar-label" style="color:var(--accent-red)">High Noise</div>
</div>
</div>
<div class="signal-card">
<div class="signal-category">Noise</div>
<h4>Error Amplification Risk</h4>
<p>Real and underweighted by enthusiasts. The persistent-error problem is structural — not a bug to fix with better prompting. It's a genuine trade-off the pattern makes, and it's most dangerous in precision-critical domains.</p>
<div class="signal-bar">
<div class="signal-bar-track"><div class="signal-bar-fill" style="width:65%;background:var(--accent-amber)"></div></div>
<div class="signal-bar-label" style="color:var(--accent-amber)">Real Risk</div>
</div>
</div>
<div class="signal-card">
<div class="signal-category">Signal</div>
<h4>The Idea File Paradigm</h4>
<p>Karpathy's framing of sharing an "idea file" vs. a code repo — letting each person's agent instantiate a custom version — is genuinely forward-thinking about how patterns propagate in the agent era.</p>
<div class="signal-bar">
<div class="signal-bar-track"><div class="signal-bar-fill" style="width:75%;background:var(--accent-green)"></div></div>
<div class="signal-bar-label" style="color:var(--accent-green)">Solid</div>
</div>
</div>
<div class="signal-card">
<div class="signal-category">Noise</div>
<h4>"It'll Replace Enterprise RAG"</h4>
<p>Karpathy explicitly scoped this to individual researchers. The limitations (no RBAC, no concurrency, ~50K token ceiling) are not bugs — they are consequences of the design assumptions. Enterprise use requires entirely different infrastructure.</p>
<div class="signal-bar">
<div class="signal-bar-track"><div class="signal-bar-fill" style="width:88%;background:var(--accent-red)"></div></div>
<div class="signal-bar-label" style="color:var(--accent-red)">Pure Noise</div>
</div>
</div>
</div>
<div class="pull-quote">
The schema file is a wish, not a discipline. The lack of an actual security model structurally makes this a skill with a dedicated output directory and no guardrails.
<span class="attribution">— Threads community critique, April 2026</span>
</div>
<div class="pull-quote" style="border-left-color:var(--accent-green)">
The bottleneck for personal knowledge bases was never the reading. It was the boring maintenance work nobody wanted to do. LLMs eliminate that bottleneck.
<span class="attribution">— LLM Wiki v2 community extension</span>
</div>
</div>
<!-- TAB: MITIGATIONS -->
<div id="tab-mitigations" class="tab-panel">
<div class="mitigation-intro">
<strong>These are the real engineering answers.</strong> For each known limitation, the community has converged on concrete mitigations — some from Karpathy's own gist, others from production implementations. Click any row to expand the full approach. The Active Upkeep section at the bottom is the one that matters most.
</div>
<!-- SCALING -->
<div class="mitigation-category">
<div class="mitigation-category-header">
<div class="cat-icon" style="background:#e8eef5;font-size:16px">📈</div>
<h3>Scaling Past the Token Ceiling</h3>
<span class="severity-badge sev-high">High Priority</span>
</div>
<div class="mitigation-steps">
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num">01</span>
<span class="step-title">Add qmd as your search layer at 50–100+ articles</span>
<span class="step-tool-tag">qmd · CLI + MCP</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>The index.md breaks around 100–150 articles, when it stops fitting cleanly in context. The community-endorsed fix is <strong>qmd</strong> — built by Tobi Lütke (Shopify CEO) and explicitly recommended by Karpathy himself. It's a local, on-device search engine for markdown files using hybrid BM25 + vector search with LLM re-ranking. No API calls, no data leaves your machine.</p>
<p>Install and integrate:</p>
<p>
<span class="code-pill">npm install -g @tobilu/qmd</span>
<span class="code-pill">qmd collection add ./wiki --name my-research</span>
<span class="code-pill">qmd mcp</span>
</p>
<p>The <code>qmd mcp</code> command exposes it as an MCP server so Claude Code uses it as a native tool — no shell-out friction. Three search modes: keyword BM25 (<span class="code-pill">qmd search</span>), semantic vector (<span class="code-pill">qmd vsearch</span>), and hybrid re-ranked (<span class="code-pill">qmd query</span>). Use the JSON output flag to pipe results into agent workflows.</p>
<div class="tip-box"><strong>Sweet spot:</strong> Use plain index.md navigation up to ~50 articles. Introduce qmd around 50–100. At 200+, qmd becomes essential — not optional.</div>
<div class="effort-row">
<span class="effort-label">Setup Effort</span>
<div class="effort-pips">
<div class="pip filled"></div><div class="pip filled"></div><div class="pip"></div><div class="pip"></div><div class="pip"></div>
</div>
<span class="effort-desc">30 min one-time setup</span>
</div>
</div>
</div>
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num">02</span>
<span class="step-title">Shard the index — one sub-index per topic domain</span>
<span class="step-tool-tag">Schema · CLAUDE.md</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>Before reaching for qmd, a simpler scaling step is to split index.md into domain-specific sub-indexes: <span class="code-pill">wiki/ml-theory/index.md</span>, <span class="code-pill">wiki/infrastructure/index.md</span>, etc. A root index.md points to sub-indexes, keeping any single file within comfortable context window bounds.</p>
<p>Define this in your schema file (CLAUDE.md) so the LLM knows which sub-index to update on ingest and which to consult on query. The LLM reads only the relevant sub-index, not the full corpus.</p>
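<p>As a sketch, the root index could then shrink to a pointer file (topic names illustrative):</p>

```markdown
<!-- wiki/index.md: root index, points only at sub-indexes -->
# Wiki Index
- [[ml-theory/index]]: transformer scaling, optimization, evals
- [[infrastructure/index]]: homelab, networking, deployment
- [[reading-queue/index]]: unprocessed sources awaiting ingest
```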
<div class="warn-box">Sharding adds maintenance complexity to the schema. Document the domain boundaries clearly or the LLM will make inconsistent decisions about where new content lands.</div>
<div class="effort-row">
<span class="effort-label">Setup Effort</span>
<div class="effort-pips">
<div class="pip filled"></div><div class="pip"></div><div class="pip"></div><div class="pip"></div><div class="pip"></div>
</div>
<span class="effort-desc">15 min schema update</span>
</div>
</div>
</div>
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num">03</span>
<span class="step-title">Consolidation tiers — promote stable knowledge up the stack</span>
<span class="step-tool-tag">LLM Wiki v2 pattern</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>From the LLM Wiki v2 community extension: structure knowledge in tiers by confidence and stability. Raw observations live in low-confidence pages. After multi-source confirmation, the LLM promotes them to "established" pages. Core principles graduate to a high-confidence tier that rarely changes.</p>
<p>Each tier is more compressed, more confident, and longer-lived than the one below it. The LLM only loads lower tiers when deeper detail is needed. This naturally keeps context window usage lean as the wiki grows — you're querying the compressed tier first, the full tier only on demand.</p>
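<p>In frontmatter terms, a promotion might be recorded like this (field names are illustrative, not part of the v2 spec):</p>

```markdown
---
tier: established            # observation → established → principle
confidence: 0.9
promoted_from: observations/gpu-memory-notes.md
promoted_on: 2026-04-01
---
```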
<div class="tip-box"><strong>Payoff:</strong> This also solves the staleness problem. Lower-tier pages decay naturally; upper-tier facts are reinforced repeatedly and earn their permanence.</div>
<div class="effort-row">
<span class="effort-label">Setup Effort</span>
<div class="effort-pips">
<div class="pip filled"></div><div class="pip filled"></div><div class="pip filled"></div><div class="pip"></div><div class="pip"></div>
</div>
<span class="effort-desc">Schema design work, ongoing co-evolution</span>
</div>
</div>
</div>
</div>
</div>
<!-- ACCESS CONTROL -->
<div class="mitigation-category">
<div class="mitigation-category-header">
<div class="cat-icon" style="background:#eaf5ee;font-size:16px">🔐</div>
<h3>Access Control &amp; Multi-User</h3>
<span class="severity-badge sev-medium">Medium Priority</span>
</div>
<div class="mitigation-steps">
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num">01</span>
<span class="step-title">Host behind a lightweight wrapper — llmwiki.app or self-hosted MCP</span>
<span class="step-tool-tag">MCP · llmwiki · FastAPI</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>The flat-file architecture has no access control by default. The cleanest mitigation is to expose the wiki through an MCP server rather than as raw files. The open-source <strong>llmwiki</strong> project (lucasastorian/llmwiki) does exactly this: it wraps the Karpathy pattern with a FastAPI backend, Supabase auth, and MCP endpoints. Claude connects via MCP and has read/write tools — but only through the authenticated layer.</p>
<p>For self-hosted setups: build a minimal FastAPI wrapper that authenticates via JWT before allowing MCP tool calls. The markdown files stay on disk; the API layer enforces who can read and write. This pattern is already used in production implementations like Hjarni.</p>
<div class="tip-box"><strong>Eric's wheelhouse:</strong> Given your OPNsense VLAN setup and existing FastAPI work on TaskForge, a simple auth wrapper is well within reach. Expose via Tailscale to keep it off the public internet entirely — no RBAC needed if the network boundary does the work.</div>
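<p>A minimal stdlib sketch of the idea: an authenticated read layer in front of the markdown, with the token, port, and directory as placeholders (the real llmwiki project uses FastAPI and Supabase instead):</p>

```python
# Sketch: enforce a bearer token before any wiki file is served.
# Illustrative only — token, port, and directory are placeholders.
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

WIKI_ROOT = Path("wiki")     # markdown stays on disk
API_TOKEN = "change-me"      # placeholder; use real secrets management

class WikiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Auth check happens before any file is read
        if self.headers.get("Authorization") != f"Bearer {API_TOKEN}":
            self.send_response(401); self.end_headers(); return
        target = (WIKI_ROOT / self.path.lstrip("/")).resolve()
        # Refuse path traversal out of the wiki root
        if WIKI_ROOT.resolve() not in target.parents or not target.is_file():
            self.send_response(404); self.end_headers(); return
        self.send_response(200)
        self.send_header("Content-Type", "text/markdown; charset=utf-8")
        self.end_headers()
        self.wfile.write(target.read_bytes())

# Usage: HTTPServer(("127.0.0.1", 8080), WikiHandler).serve_forever()
```

<p>Swap the stdlib server for FastAPI and the static token for JWT validation to get the production shape described above.</p>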
<div class="effort-row">
<span class="effort-label">Setup Effort</span>
<div class="effort-pips">
<div class="pip filled"></div><div class="pip filled"></div><div class="pip filled"></div><div class="pip"></div><div class="pip"></div>
</div>
<span class="effort-desc">Weekend project for self-hosted</span>
</div>
</div>
</div>
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num">02</span>
<span class="step-title">Scoped directories for shared vs. private content</span>
<span class="step-tool-tag">Git · Directory structure</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>For small teams, a simpler pattern than full RBAC: separate <span class="code-pill">wiki/shared/</span> from <span class="code-pill">wiki/private/</span> directories, with git branch-level access control. The MCP server only exposes the <code>shared/</code> tree to team members; personal pages stay in <code>private/</code> on a branch only you merge from.</p>
<p>The LLM Wiki v2 pattern calls this "mesh sync with shared/private scoping." The schema file defines what can be promoted from private to shared and the conditions for that promotion.</p>
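<p>The resulting layout is simple (a sketch):</p>

```text
wiki/
├── shared/    # exposed through the MCP server; lives on main
└── private/   # personal pages; lives on a branch only you merge from
```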
<div class="warn-box">This is soft access control — it relies on disciplined git usage, not cryptographic enforcement. Fine for trusted small teams; not for anything requiring audit trails or compliance.</div>
<div class="effort-row">
<span class="effort-label">Setup Effort</span>
<div class="effort-pips">
<div class="pip filled"></div><div class="pip"></div><div class="pip"></div><div class="pip"></div><div class="pip"></div>
</div>
<span class="effort-desc">Git config + schema update</span>
</div>
</div>
</div>
</div>
</div>
<!-- CROSS-CHECK / ERROR PERSISTENCE -->
<div class="mitigation-category">
<div class="mitigation-category-header">
<div class="cat-icon" style="background:#f5e8e8;font-size:16px">⚠️</div>
<h3>Cross-Check &amp; Error Persistence</h3>
<span class="severity-badge sev-high">High Priority</span>
</div>
<div class="mitigation-steps">
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num">01</span>
<span class="step-title">Confidence scoring — every claim carries a decay score</span>
<span class="step-tool-tag">Frontmatter · Schema</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>The LLM Wiki v2 pattern solves persistent errors by making uncertainty explicit. Every factual claim in a wiki page carries metadata: how many sources support it, when it was last confirmed, and a confidence score (e.g., 0.85). Confidence decays with time and strengthens with reinforcement from new sources.</p>
<p>Implement this in YAML frontmatter on each page:</p>
<p>
<span class="code-pill">confidence: 0.85</span>
<span class="code-pill">sources: 2</span>
<span class="code-pill">last_confirmed: 2026-04-01</span>
</p>
<p>The lint pass checks for pages with decayed confidence scores and flags them for re-verification. The LLM can say "I'm fairly sure about X but less sure about Y" — it's no longer a flat collection of equally-weighted claims.</p>
<div class="tip-box"><strong>Key benefit:</strong> This turns errors from permanent silent landmines into visible, decaying warnings. A wrong claim doesn't compound forever — it eventually gets flagged by its own decaying score.</div>
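<p>A sketch of what the lint pass computes, assuming the frontmatter fields above; the decay curve, half-life, and threshold are illustrative choices, not part of the pattern:</p>

```python
# Sketch of a lint-pass confidence check over the frontmatter fields above.
from datetime import date

HALF_LIFE_DAYS = 180    # illustrative: confidence halves per ~6 unconfirmed months
FLAG_THRESHOLD = 0.5    # illustrative cutoff for queuing re-verification

def effective_confidence(base: float, last_confirmed: date, today: date) -> float:
    """Exponentially decay a claim's stored confidence with its age."""
    age_days = (today - last_confirmed).days
    return base * 0.5 ** (age_days / HALF_LIFE_DAYS)

def needs_reverification(page: dict, today: date) -> bool:
    """Flag a page whose decayed confidence has dropped below threshold."""
    score = effective_confidence(page["confidence"], page["last_confirmed"], today)
    return score < FLAG_THRESHOLD

page = {"confidence": 0.85, "sources": 2, "last_confirmed": date(2026, 4, 1)}
print(needs_reverification(page, date(2026, 5, 1)))   # one month old: still trusted (False)
print(needs_reverification(page, date(2027, 4, 1)))   # a year unconfirmed: flagged (True)
```

<p>The design choice that matters is the decay itself: a wrong claim that never gets reconfirmed eventually flags itself, which is exactly the property flat frontmatter lacks.</p>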
<div class="effort-row">
<span class="effort-label">Setup Effort</span>
<div class="effort-pips">
<div class="pip filled"></div><div class="pip filled"></div><div class="pip"></div><div class="pip"></div><div class="pip"></div>
</div>
<span class="effort-desc">Schema + frontmatter template update</span>
</div>
</div>
</div>
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num">02</span>
<span class="step-title">Typed supersession — new info explicitly replaces old claims</span>
<span class="step-tool-tag">Schema · log.md</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>When new information contradicts an existing wiki claim, the wrong pattern is leaving the old claim with an appended note. The right pattern: the new claim explicitly <strong>supersedes</strong> the old one. The old version is preserved but marked stale with a timestamp and link to what replaced it — version control for knowledge, not just for files.</p>
<p>Define supersession in your schema: the LLM's ingest instructions should check for contradictions against existing pages before writing, and when found, issue a formal supersession record rather than a quiet edit.</p>
<div class="tip-box"><strong>log.md discipline:</strong> Karpathy's second navigation file — the append-only audit log — is the mechanism for this. Every supersession event gets a log entry with timestamp, old claim, new claim, and source. The log is immutable context you can audit.</div>
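<p>A supersession entry in log.md might read like this (format and filenames illustrative; Karpathy's gist doesn't prescribe one):</p>

```markdown
## 2026-04-12 · supersession
- page: caching-strategies.md
- old: "Write-through is the recommended default."
- new: "Write-back with journaling is the recommended default."
- source: raw/2026-04-12-storage-notes.pdf
- action: old claim marked stale, linked to its replacement
```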
<div class="effort-row">
<span class="effort-label">Setup Effort</span>
<div class="effort-pips">
<div class="pip filled"></div><div class="pip filled"></div><div class="pip"></div><div class="pip"></div><div class="pip"></div>
</div>
<span class="effort-desc">Schema + ingest prompt engineering</span>
</div>
</div>
</div>
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num">03</span>
<span class="step-title">Typed entity system — prevent duplicate and conflicting concepts</span>
<span class="step-tool-tag">Schema · ELF / LLMWiki v2</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>Community implementation ELF (Eli's Lab Framework) uses a strict typed-entity system where every page is declared as a type (<em>library, project, person, concept, decision</em>) and every link between pages has a typed relationship (<em>uses, depends-on, contradicts, caused, fixed, supersedes</em>). This prevents the LLM from creating duplicate concept pages under different names.</p>
<p>A 5-step incremental ingest pass: <strong>diff → summarize → extract → write → image</strong>. The extract step enforces entity typing before the write step creates any new page — if a typed entity already exists, it merges rather than duplicates.</p>
<div class="warn-box">Typed entity systems add upfront schema design work. Start loose; only formalize types after you see which duplicates are actually causing problems.</div>
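<p>The merge-over-duplicate rule in the extract step can be sketched as a small registry check. ELF's actual implementation is not shown here, so the names and the in-memory registry are illustrative; the point is that a normalized key resolves differently-spelled names to one entity before any page is written:</p>

```python
import re

VALID_TYPES = {"library", "project", "person", "concept", "decision"}

def normalize(name):
    # "Auth Migration" and "auth-migration" resolve to the same key
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

def upsert_entity(registry, name, etype, facts):
    """Merge into an existing typed entity if one exists; only create otherwise."""
    if etype not in VALID_TYPES:
        raise ValueError(f"unknown entity type: {etype}")
    key = normalize(name)
    if key in registry:                      # merge, never duplicate
        registry[key]["facts"].extend(facts)
        return "merged"
    registry[key] = {"type": etype, "facts": list(facts)}
    return "created"
```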
<div class="effort-row">
<span class="effort-label">Setup Effort</span>
<div class="effort-pips">
<div class="pip filled"></div><div class="pip filled"></div><div class="pip filled"></div><div class="pip filled"></div><div class="pip"></div>
</div>
<span class="effort-desc">Significant schema design investment</span>
</div>
</div>
</div>
</div>
</div>
<!-- ACTIVE UPKEEP — FEATURED -->
<div class="upkeep-system">
<div class="kicker">★ Biggest Mitigation Challenge</div>
<h3>Active Upkeep — The Real Failure Mode</h3>
<p>Community analysis of 120+ comments on Karpathy's gist converged on a clear finding: most people who try this pattern get the folder structure right and still end up with a wiki that slowly becomes unreliable, redundant, or abandoned. The difference between a wiki that compounds and one that quietly rots comes down to operational discipline — not technical setup.</p>
<div class="upkeep-cadence">
<div class="cadence-cell">
<div class="freq">Daily</div>
<div class="cadence-title">Feed the Machine</div>
<ul>
<li>Drop new sources into raw/ via Obsidian Web Clipper</li>
<li>Ingest anything queued in the raw/ staging dir</li>
<li>Log questions answered by the wiki (reinforces confidence)</li>
</ul>
</div>
<div class="cadence-cell">
<div class="freq">Weekly</div>
<div class="cadence-title">Lint Pass</div>
<ul>
<li>Run health check — orphan pages, broken wikilinks</li>
<li>Flag contradictions for review</li>
<li>Identify concepts referenced but not yet given their own page</li>
<li>Review low-confidence / decayed pages</li>
</ul>
</div>
<div class="cadence-cell">
<div class="freq">Monthly</div>
<div class="cadence-title">Schema Evolution</div>
<ul>
<li>Review CLAUDE.md / AGENTS.md for outdated rules</li>
<li>Promote stable lower-tier pages up to established tier</li>
<li>Run qmd re-index if collection has grown significantly</li>
<li>Purge truly stale pages per retention curve</li>
</ul>
</div>
<div class="cadence-cell">
<div class="freq">As Needed</div>
<div class="cadence-title">Circuit Breakers</div>
<ul>
<li>Separate vault and agent working directories</li>
<li>Never let agent write directly to vault/verified/</li>
<li>Manually audit any page cited in high-stakes decisions</li>
<li>Keep raw/ as ground truth — always traceable back</li>
</ul>
</div>
</div>
</div>
<div class="mitigation-category">
<div class="mitigation-category-header">
<div class="cat-icon" style="background:#f5f0e0;font-size:16px">🔄</div>
<h3>Upkeep Automation — Making It Stick</h3>
<span class="severity-badge sev-critical">Critical</span>
</div>
<div class="mitigation-steps">
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num">01</span>
<span class="step-title">Separate vault from agent working directory — hard partition</span>
<span class="step-tool-tag">Directory structure</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>The instinct is to have the agent write directly into the wiki. This creates the rot. The principle: your curated/verified vault and the agent's working vault (speculative writes, messy drafts, exploratory connections still being tested) must be <strong>physically separate directories</strong>. Only the human promotes content from agent-working to vault.</p>
<p>Structure: <span class="code-pill">wiki/verified/</span> (human-promoted, high trust) vs <span class="code-pill">wiki/staging/</span> (agent writes here first). The lint pass reviews staging and proposes promotions. You approve them. The signal-to-noise ratio in your verified wiki stays high permanently.</p>
<div class="tip-box"><strong>Why this works:</strong> You're not adding friction to the agent — you're protecting the valuable layer. The agent still does all the work. You just gate what graduates to trusted status.</div>
<div class="effort-row">
<span class="effort-label">Setup Effort</span>
<div class="effort-pips">
<div class="pip filled"></div><div class="pip"></div><div class="pip"></div><div class="pip"></div><div class="pip"></div>
</div>
<span class="effort-desc">Directory rename + schema update</span>
</div>
</div>
</div>
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num">02</span>
<span class="step-title">Automate the ingest trigger — don't rely on memory to feed it</span>
<span class="step-tool-tag">Cron · Webhooks · Claude Code</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>The number one reason wikis rot: the human stops ingesting because life gets busy. The fix is removing the human from the trigger loop. Set up a cron job or a filesystem watcher on raw/ that automatically triggers the ingest command whenever a new file lands. The human's job shrinks to: drop file, walk away.</p>
<p>Implementations: <span class="code-pill">inotifywait</span> on Linux, <span class="code-pill">fswatch</span> on macOS, or a Node.js chokidar watcher. On drop, the watcher calls your ingest script which runs the LLM compilation pass. You get a notification when it completes.</p>
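<p>Where inotifywait or fswatch is unavailable, a portable polling fallback covers the same trigger loop. This is a sketch under stated assumptions: <span class="code-pill">ingest_cmd</span> is a placeholder for your own ingest script, and the 30-second interval is arbitrary:</p>

```python
import subprocess, time
from pathlib import Path

def scan_new_files(raw_dir, seen):
    """Return markdown files in raw_dir that have not been seen yet."""
    new = [p for p in sorted(raw_dir.glob("*.md")) if p.name not in seen]
    seen.update(p.name for p in new)
    return new

def watch(raw_dir, ingest_cmd, interval=30):
    """Poll raw_dir and run the ingest command once per newly dropped file."""
    seen = set()
    scan_new_files(raw_dir, seen)           # baseline: ignore pre-existing files
    while True:
        for path in scan_new_files(raw_dir, seen):
            subprocess.run(ingest_cmd + [str(path)], check=False)
        time.sleep(interval)
```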
<div class="tip-box"><strong>For your stack:</strong> This maps cleanly to your existing automation patterns — a simple Node-RED flow watching a directory, triggering a webhook to Claude Code, and notifying via Slack/Telegram through OpenClaw when ingest completes.</div>
<div class="effort-row">
<span class="effort-label">Setup Effort</span>
<div class="effort-pips">
<div class="pip filled"></div><div class="pip filled"></div><div class="pip"></div><div class="pip"></div><div class="pip"></div>
</div>
<span class="effort-desc">2–4 hours watcher + webhook</span>
</div>
</div>
</div>
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num">03</span>
<span class="step-title">Schedule the weekly lint as a non-negotiable calendar block</span>
<span class="step-tool-tag">Cron · Scheduler</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>Lint passes don't happen if you have to remember to run them. The solution is automating them on a schedule — a weekly cron job that runs the lint command, writes a report to a lint-reports/ directory, and sends you a summary notification. The report tells you: N orphan pages found, N contradictions flagged, N pages with decayed confidence.</p>
<p>You review the report (5 minutes), decide which flagged items to address, and optionally run the LLM to resolve them. The system tells you what needs attention instead of leaving you to inspect everything yourself.</p>
<div class="warn-box"><strong>What community data shows:</strong> People who automate the lint schedule have wikis that stay healthy at 6 months. People who rely on manual "I'll remember to lint" have wikis that are abandoned or unreliable at 6 weeks.</div>
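<p>One possible crontab entry for the weekly run; <span class="code-pill">lint.sh</span> and <span class="code-pill">notify.sh</span> are placeholders for your own lint and notification commands:</p>

```shell
# Sunday 07:00 lint run; note that % must be escaped as \% inside crontab
0 7 * * 0  cd "$HOME/wiki" && ./lint.sh > "lint-reports/$(date +\%F).md" 2>&1 && ./notify.sh "wiki lint report: $(date +\%F)"
```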
<div class="effort-row">
<span class="effort-label">Setup Effort</span>
<div class="effort-pips">
<div class="pip filled"></div><div class="pip filled"></div><div class="pip"></div><div class="pip"></div><div class="pip"></div>
</div>
<span class="effort-desc">Cron setup + notification routing</span>
</div>
</div>
</div>
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num">04</span>
<span class="step-title">Identity-aware filter — the schema knows who the wiki is for</span>
<span class="step-tool-tag">Schema · CLAUDE.md</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>A community-evolved enhancement to Karpathy's original: add an identity-aware filter to your schema, a prompt section that tells the LLM exactly who the wiki is for, what their goals are, and what "high-signal" means in that context. The LLM then scores sources before ingesting and rewrites the filter over time based on what has proven useful.</p>
<p>This prevents the wiki from becoming a neutral encyclopedia of everything you've read. It stays opinionated, relevant, and tuned to your actual work. Over months, the schema itself becomes a reflection of what you find worth knowing — a second-order artifact of the system.</p>
<div class="tip-box"><strong>Upkeep benefit:</strong> A well-tuned identity filter means the LLM rejects low-signal sources at ingest time rather than filling the wiki with noise you'll have to purge later. Garbage-in prevention beats garbage-out cleanup.</div>
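<p>A sketch of what the identity section in CLAUDE.md could contain. The wording is illustrative and should be rewritten in your own terms:</p>

```markdown
## Who this wiki is for
A solo developer running homelab automation and the taskforge project.
High-signal: root causes, architecture decisions, reusable patterns, tool trade-offs.
Low-signal: news, hot takes, anything re-findable with one web search.
Before ingesting a source, score it 1-5 against this profile; skip anything under 3
and log why, so this filter can be tuned against real outcomes over time.
```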
<div class="effort-row">
<span class="effort-label">Setup Effort</span>
<div class="effort-pips">
<div class="pip filled"></div><div class="pip"></div><div class="pip"></div><div class="pip"></div><div class="pip"></div>
</div>
<span class="effort-desc">10 min schema addition, self-evolving after</span>
</div>
</div>
</div>
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num">05</span>
<span class="step-title">Retention curve — build in structured forgetting</span>
<span class="step-tool-tag">Frontmatter · Lint pass</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>Not everything should live forever. A wiki that never forgets becomes noisy — important signals buried under outdated context. Implement a retention curve: facts that were important once but haven't been accessed or reinforced in months gradually fade to "archived" status. The lint pass executes this curve automatically.</p>
<p>Frontmatter fields to add: <span class="code-pill">last_accessed</span>, <span class="code-pill">access_count</span>, <span class="code-pill">status: active|fading|archived</span>. The lint pass updates status based on time-since-access and reinforcement count. Archived pages aren't deleted — they move to <span class="code-pill">wiki/archive/</span> where they're out of the active index but still traceable.</p>
<div class="tip-box"><strong>The payoff:</strong> Active upkeep gets easier over time as the wiki self-trims. After 6 months of running with a retention curve, your active wiki is denser and higher-signal than at month 1 — not bloated and harder to navigate.</div>
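<p>One way the lint pass might compute <span class="code-pill">status</span> from those fields. The 90-day base window and the per-access bonus are illustrative knobs, not values from the source pattern:</p>

```python
import datetime

def retention_status(last_accessed, access_count, today):
    """Map time-since-access and reinforcement count onto active|fading|archived."""
    age = (today - datetime.date.fromisoformat(last_accessed)).days
    # frequently reinforced pages earn a longer active window
    active_window = 90 + 30 * min(access_count, 6)
    if age <= active_window:
        return "active"
    if age <= active_window * 2:
        return "fading"
    return "archived"
```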
<div class="effort-row">
<span class="effort-label">Setup Effort</span>
<div class="effort-pips">
<div class="pip filled"></div><div class="pip filled"></div><div class="pip"></div><div class="pip"></div><div class="pip"></div>
</div>
<span class="effort-desc">Frontmatter + lint script update</span>
</div>
</div>
</div>
</div>
</div>
</div><!-- /tab-mitigations -->
<!-- TAB: MEMPALACE INTEGRATION -->
<div id="tab-mempalace" class="tab-panel">
<div class="palace-hero">
<div class="kicker">⬡ Your Stack Extension — MemPalace + qmd + Conversation Pipeline</div>
<h3>The wiki gains a <em>living feed</em> and a structural memory layer.</h3>
<p>Standard Karpathy wiki is fed by sources you manually drop into raw/. Your setup replaces that bottleneck with an automated conversation pipeline: every AI session gets mined into MemPalace, summarized, and fed into raw/ on a continuous basis. The wiki stops being a project you maintain and becomes an organism that grows from your daily work. Combined with qmd replacing ChromaDB for indexing, you have a genuinely novel hybrid that addresses the core limitations differently than any single pattern alone.</p>
<p style="color:#5aba7a;font-size:12.5px;font-family:'JetBrains Mono',monospace;letter-spacing:0.05em;">Note: You are skipping MemPalace's ChromaDB storage layer and using qmd for indexing instead. The implications of that choice are documented throughout this tab.</p>
<div class="hero-stats">
<div class="hstat"><span class="hval">96.6%</span><span class="hlbl">MemPalace R@5 Raw Mode</span></div>
<div class="hstat"><span class="hval">+34%</span><span class="hlbl">Retrieval via wing+room filtering</span></div>
<div class="hstat"><span class="hval">~170</span><span class="hlbl">Tokens on wake-up (L0+L1)</span></div>
<div class="hstat"><span class="hval">19</span><span class="hlbl">MCP Tools available</span></div>
<div class="hstat"><span class="hval">qmd</span><span class="hlbl">Replaces ChromaDB indexing</span></div>
</div>
</div>
<!-- FLOW DIAGRAM -->
<div class="flow-diagram">
<div class="flow-title">Your Architecture — Data Flow</div>
<div class="flow-label">Layer 0 — Conversation Capture</div>
<div class="flow-row">
<div class="flow-node convo">Claude / AI<br>Sessions</div>
<div class="flow-arrow"></div>
<div class="flow-node palace">MemPalace<br>mine --mode convos</div>
<div class="flow-arrow"></div>
<div class="flow-node palace">Wings / Rooms<br>Halls / Tunnels</div>
<div class="flow-arrow"></div>
<div class="flow-node palace">Closets<br>(summaries)</div>
<div class="flow-arrow"></div>
<div class="flow-node palace">Drawers<br>(verbatim)</div>
</div>
<div class="flow-label" style="margin-top:14px">Layer 1 — Wiki Compilation</div>
<div class="flow-row">
<div class="flow-node palace">Conversation<br>Summaries</div>
<div class="flow-arrow"></div>
<div class="flow-node raw">raw/<br>(staged)</div>
<div class="flow-arrow"></div>
<div class="flow-node llm">LLM<br>Compiler</div>
<div class="flow-arrow"></div>
<div class="flow-node wiki">wiki/<br>(compiled pages)</div>
<div class="flow-arrow"></div>
<div class="flow-node qmd">qmd<br>Index</div>
</div>
<div class="flow-label" style="margin-top:14px">Layer 2 — Query</div>
<div class="flow-row">
<div class="flow-node convo">Natural<br>Language Query</div>
<div class="flow-arrow"></div>
<div class="flow-node palace">MemPalace<br>wing+room filter</div>
<div class="flow-arrow">+</div>
<div class="flow-node qmd">qmd<br>BM25+vector</div>
<div class="flow-arrow"></div>
<div class="flow-node llm">LLM reads<br>wiki pages</div>
<div class="flow-arrow"></div>
<div class="flow-node wiki">Grounded<br>Answer</div>
</div>
</div>
<!-- PALACE CONCEPTS -->
<div class="section-header">
<h2>MemPalace Concepts</h2>
<span class="section-tag" style="border-color:var(--accent-green);color:var(--accent-green);background:#eaf5ee">Architecture Map</span>
</div>
<div class="palace-map">
<div class="palace-cell">
<div class="pc-icon">🏛️</div>
<div class="pc-term">Wing</div>
<div class="pc-name">Person or Project</div>
<div class="pc-desc">Top-level namespace — one per person you work with or project you run. Conversations and facts are scoped to their wing automatically via keyword detection on mining.</div>
<div class="pc-wiki-map">→ Maps to wiki domain sub-index (e.g. wiki/taskforge/)</div>
</div>
<div class="palace-cell">
<div class="pc-icon">🚪</div>
<div class="pc-term">Room</div>
<div class="pc-name">Topic / Concept</div>
<div class="pc-desc">Specific subject within a wing — auth-migration, ci-pipeline, database-decisions. When the same room exists across wings, a tunnel auto-connects them. Provides the +34% retrieval boost via wing+room filtering.</div>
<div class="pc-wiki-map">→ Maps to wiki concept page (e.g. wiki/taskforge/auth.md)</div>
</div>
<div class="palace-cell">
<div class="pc-icon">🗂️</div>
<div class="pc-term">Closet</div>
<div class="pc-name">Summary Layer</div>
<div class="pc-desc">Plain-text summaries that point the LLM to the right drawer. This is the layer you are feeding into raw/ — closet output becomes a high-quality, pre-structured input to the wiki compiler rather than raw transcript noise.</div>
<div class="pc-wiki-map">→ These summaries become your raw/ inputs</div>
</div>
<div class="palace-cell">
<div class="pc-icon">📦</div>
<div class="pc-term">Drawer</div>
<div class="pc-name">Verbatim Archive</div>
<div class="pc-desc">The exact original words — never summarized, never lost. This is your ground truth for cross-checking. When confidence scoring flags a wiki claim as decayed, you trace it back to the drawer for verification. Eliminates the "no original source" problem.</div>
<div class="pc-wiki-map">→ Ground truth for cross-check / error persistence mitigation</div>
</div>
<div class="palace-cell">
<div class="pc-icon">🏃</div>
<div class="pc-term">Hall</div>
<div class="pc-name">Memory Type Corridor</div>
<div class="pc-desc">Fixed corridors within every wing: hall_facts (decisions), hall_events (sessions/milestones), hall_discoveries (breakthroughs), hall_preferences (habits), hall_advice (recommendations). Memory typed at ingest time — no post-hoc categorization needed.</div>
<div class="pc-wiki-map">→ Maps to wiki page type in CLAUDE.md schema</div>
</div>
<div class="palace-cell">
<div class="pc-icon">🚇</div>
<div class="pc-term">Tunnel</div>
<div class="pc-name">Cross-Wing Connection</div>
<div class="pc-desc">Automatic links when the same room topic appears across different wings. "Auth-migration" in wing_kai and wing_taskforge creates a tunnel — the palace navigation finds cross-project connections that explicit wikilinks alone would miss.</div>
<div class="pc-wiki-map">→ Enriches wiki cross-references beyond manual [[wikilinks]]</div>
</div>
</div>
<!-- IMPACT ON LIMITATIONS -->
<div class="section-header">
<h2>Impact on Known Limitations</h2>
<span class="section-tag" style="border-color:var(--accent-blue);color:var(--accent-blue);background:#e8eef5">What Changes</span>
</div>
<div class="impact-grid">
<div class="impact-card">
<span class="impact-verdict verdict-solved">Largely Solved</span>
<h4>Active Upkeep — The #1 Failure Mode</h4>
<p>Conversation mining + auto-save hooks make the feed automatic. You no longer have to remember to drop files into raw/. Every Claude Code session is mined. The PreCompact hook fires before context compression. The Stop hook fires every 15 messages.</p>
<div class="impact-before-after">
<div class="ba-cell ba-before"><span class="ba-label">Before</span>Humans forget to ingest → wiki rots at 6 weeks</div>
<div class="ba-cell ba-after"><span class="ba-label">After</span>Hooks auto-mine every session → continuous feed</div>
</div>
</div>
<div class="impact-card">
<span class="impact-verdict verdict-solved">Largely Solved</span>
<h4>Error Persistence / Cross-Check</h4>
<p>Drawers preserve verbatim originals permanently. When a wiki claim is flagged as low-confidence, you have an exact traceable source to verify against — not just "raw/source-2026-04.md" but a wing-scoped, room-tagged original with a drawer ID.</p>
<div class="impact-before-after">
<div class="ba-cell ba-before"><span class="ba-label">Before</span>Errors persist silently, no clear original to check</div>
<div class="ba-cell ba-after"><span class="ba-label">After</span>Drawers = verbatim ground truth, always traceable</div>
</div>
</div>
<div class="impact-card">
<span class="impact-verdict verdict-reduced">Significantly Reduced</span>
<h4>Scale Ceiling</h4>
<p>MemPalace's wing+room metadata filtering means qmd doesn't have to search the entire corpus — it searches a pre-narrowed wing/room scope first. This extends the effective scale ceiling because retrieval is structurally guided before the BM25+vector pass fires.</p>
<div class="impact-before-after">
<div class="ba-cell ba-before"><span class="ba-label">Before</span>qmd searches entire wiki — token ceiling still binding</div>
<div class="ba-cell ba-after"><span class="ba-label">After</span>Wing+room filter → qmd works on relevant subset</div>
</div>
</div>
<div class="impact-card">
<span class="impact-verdict verdict-shifted">Character Shifted</span>
<h4>Knowledge Staleness</h4>
<p>Conversations are the primary source — they're inherently current. Every session you have becomes a potential ingest. Staleness now depends on how actively you use AI tools (which you do constantly), not on whether you remember to read and clip articles.</p>
<div class="impact-before-after">
<div class="ba-cell ba-before"><span class="ba-label">Before</span>Staleness from manual source curation gaps</div>
<div class="ba-cell ba-after"><span class="ba-label">After</span>Staleness from conversation coverage gaps (much smaller)</div>
</div>
</div>
<div class="impact-card">
<span class="impact-verdict verdict-reduced">Reduced</span>
<h4>Semantic Retrieval Gap vs RAG</h4>
<p>The combination of MemPalace structural navigation (wing → room → closet → drawer) plus qmd's BM25+vector search covers both explicit structural navigation and fuzzy semantic matching. You have the best of both retrieval patterns without a full vector database.</p>
<div class="impact-before-after">
<div class="ba-cell ba-before"><span class="ba-label">Before</span>Explicit wikilinks only — misses differently-worded concepts</div>
<div class="ba-cell ba-after"><span class="ba-label">After</span>Structural nav + qmd semantic fills the gap</div>
</div>
</div>
<div class="impact-card">
<span class="impact-verdict verdict-new">New Consideration</span>
<h4>Conversation Noise in raw/</h4>
<p>Not every conversation deserves to enter the wiki. Debugging rabbit holes, exploratory dead-ends, and casual exchanges are valuable in MemPalace's verbatim drawers but would pollute the wiki if compiled directly. The summarization/filtering step before raw/ is now load-bearing.</p>
<div class="impact-before-after">
<div class="ba-cell ba-before"><span class="ba-label">Old Risk</span>No raw/ source, hard to feed continuously</div>
<div class="ba-cell ba-after"><span class="ba-label">New Risk</span>Too much raw/ — summarization quality is critical</div>
</div>
</div>
</div>
<!-- QMD vs CHROMADB -->
<div class="section-header">
<h2>qmd vs ChromaDB — Your Trade-off</h2>
<span class="section-tag" style="border-color:var(--accent-amber);color:var(--accent-amber);background:#fff8e6">Deliberate Choice</span>
</div>
<div class="caveat-box">
<div class="caveat-head">⚠ Honest Assessment of the Trade-off</div>
MemPalace's benchmark-leading 96.6% R@5 score comes specifically from raw verbatim storage in ChromaDB. By replacing ChromaDB with qmd, you are choosing a different design point: simpler local infrastructure and tighter wiki integration over maximum semantic recall on conversation search. This is a defensible choice for your use case — but it's worth knowing what you're trading.
</div>
<table class="compare-table">
<thead>
<tr>
<th>Dimension</th>
<th style="text-align:center;background:#fff8e6">qmd (your choice)</th>
<th style="text-align:center;background:#eaf5ee">ChromaDB (MemPalace default)</th>
</tr>
</thead>
<tbody>
<tr>
<td class="row-label">Storage format</td>
<td style="text-align:center" class="cell-win">✦ Markdown files (same as wiki)</td>
<td style="text-align:center">Proprietary vector DB</td>
</tr>
<tr>
<td class="row-label">Semantic recall (LongMemEval)</td>
<td style="text-align:center" class="cell-mid">Not benchmarked on this task</td>
<td style="text-align:center" class="cell-win">✦ 96.6% R@5 raw mode</td>
</tr>
<tr>
<td class="row-label">Wiki integration</td>
<td style="text-align:center" class="cell-win">✦ Native — indexes wiki/ directly</td>
<td style="text-align:center">Separate store, no wiki awareness</td>
</tr>
<tr>
<td class="row-label">Single index to maintain</td>
<td style="text-align:center" class="cell-win">✦ Yes — one qmd collection</td>
<td style="text-align:center">No — wiki + ChromaDB separate</td>
</tr>
<tr>
<td class="row-label">MCP exposure</td>
<td style="text-align:center" class="cell-win">✦ qmd mcp — native tool for Claude</td>
<td style="text-align:center">Via MemPalace MCP server</td>
</tr>
<tr>
<td class="row-label">Hybrid search (BM25 + vector)</td>
<td style="text-align:center" class="cell-win">✦ Built in — qmd query</td>
<td style="text-align:center">ChromaDB semantic only</td>
</tr>
<tr>
<td class="row-label">Dependencies</td>
<td style="text-align:center" class="cell-win">✦ npm only, local GGUF model</td>
<td style="text-align:center">Python, chromadb, potential version pin issues</td>
</tr>
<tr>
<td class="row-label">Verbatim drawer retrieval</td>
<td style="text-align:center" class="cell-lose">Not designed for this</td>
<td style="text-align:center" class="cell-win">✦ Core feature — drawers are ChromaDB entries</td>
</tr>
<tr>
<td class="row-label">Architectural simplicity</td>
<td style="text-align:center" class="cell-win">✦ One search layer for everything</td>
<td style="text-align:center">Two parallel search systems</td>
</tr>
</tbody>
</table>
<div class="pull-quote" style="border-left-color:var(--accent-green)">
The key practical point: MemPalace's structural navigation (wing+room filtering) still provides the +34% retrieval boost regardless of what sits behind it. You retain the palace architecture's biggest advantage. The ChromaDB vs qmd choice only affects the semantic search layer, not the structural navigation layer.
<span class="attribution">— Analysis based on MemPalace architecture documentation, April 2026</span>
</div>
<!-- UPDATED MITIGATION STATUS -->
<div class="section-header">
<h2>Updated Mitigation Status</h2>
<span class="section-tag" style="border-color:var(--accent-blue);color:var(--accent-blue);background:#e8eef5">With MemPalace in Stack</span>
</div>
<table class="compare-table">
<thead>
<tr>
<th>Limitation</th>
<th style="text-align:center">Before MemPalace</th>
<th style="text-align:center">With MemPalace + qmd</th>
<th style="text-align:center">Residual Work</th>
</tr>
</thead>
<tbody>
<tr>
<td class="row-label">Active Upkeep</td>
<td style="text-align:center" class="cell-lose">Manual — wikis rot</td>
<td style="text-align:center" class="cell-win">✦ Auto-hooks feed continuously</td>
<td style="text-align:center">Summarization quality tuning</td>
</tr>
<tr>
<td class="row-label">Error Persistence</td>
<td style="text-align:center" class="cell-lose">No traceable ground truth</td>
<td style="text-align:center" class="cell-win">✦ Drawers = verbatim source</td>
<td style="text-align:center">Confidence scoring in schema</td>
</tr>
<tr>
<td class="row-label">Scale Ceiling</td>
<td style="text-align:center" class="cell-mid">~50–100K token hard limit</td>
<td style="text-align:center" class="cell-mid">Extended by wing+room filtering</td>
<td style="text-align:center">qmd still needed at 200+ articles</td>
</tr>
<tr>
<td class="row-label">Semantic Retrieval Gap</td>
<td style="text-align:center" class="cell-lose">Explicit links only</td>
<td style="text-align:center" class="cell-win">✦ Structure + qmd BM25+vector</td>
<td style="text-align:center">Some ChromaDB recall lost (see above)</td>
</tr>
<tr>
<td class="row-label">Knowledge Staleness</td>
<td style="text-align:center" class="cell-lose">Depends on manual curation</td>
<td style="text-align:center" class="cell-win">✦ Continuous from session mining</td>
<td style="text-align:center">Retention curve still needed</td>
</tr>
<tr>
<td class="row-label">Cross-check</td>
<td style="text-align:center" class="cell-lose">Raw docs only, imprecise</td>
<td style="text-align:center" class="cell-win">✦ Drawer-level verbatim traceability</td>
<td style="text-align:center">fact_checker.py not yet wired (v3)</td>
</tr>
<tr>
<td class="row-label">Access Control</td>
<td style="text-align:center" class="cell-lose">Flat file, none</td>
<td style="text-align:center" class="cell-mid">Still needs MCP wrapper layer</td>
<td style="text-align:center">Tailscale boundary is your fastest path</td>
</tr>
<tr>
<td class="row-label">Cognitive Outsourcing</td>
<td style="text-align:center" class="cell-mid">Valid concern</td>
<td style="text-align:center" class="cell-mid">Unchanged — wiki is still reference only</td>
<td style="text-align:center">Design intent: reference, not replacement</td>
</tr>
</tbody>
</table>
<!-- NEW RISKS -->
<div class="section-header">
<h2>New Risks Introduced</h2>
<span class="section-tag tag-red">Net New</span>
</div>
<div class="mitigation-steps" style="margin-bottom:28px">
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num" style="color:var(--accent-red)">!</span>
<span class="step-title">Summarization quality is now load-bearing</span>
<span class="step-tool-tag">Critical Path</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>In the original pattern, you curated sources manually — only deliberate, quality inputs entered raw/. With conversation mining, the filter is your summarization scripts. If those scripts surface debugging dead-ends, exploratory rabbit holes, or noise, it enters the wiki compilation pipeline. Garbage-in still applies — it's just at a different point in the flow.</p>
<p><strong>Mitigation:</strong> Tune your conversation scripts to filter by memory type (hall_facts and hall_discoveries are high-signal; hall_events is medium; raw session transcripts are low). Only promote closet summaries tagged as decisions, discoveries, or recommendations. Use MemPalace's <span class="code-pill">--extract general</span> mode to auto-classify before staging.</p>
<div class="tip-box"><strong>Practical rule:</strong> Only closets from hall_facts and hall_discoveries should auto-promote to raw/. Other halls should require a manual review step before staging.</div>
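<p>The routing rule above can be sketched as a small dispatcher. The hall names come from MemPalace; the destinations ("raw", "review", "skip") are illustrative labels for auto-promotion, the manual review queue, and staying in the palace:</p>

```python
AUTO_PROMOTE = {"hall_facts", "hall_discoveries"}
REVIEW_QUEUE = {"hall_advice", "hall_events", "hall_preferences"}

def route_closet(closet):
    """Decide where a closet summary goes before wiki compilation."""
    hall = closet.get("hall", "")
    if hall in AUTO_PROMOTE:
        return "raw"       # high-signal decisions and breakthroughs auto-promote
    if hall in REVIEW_QUEUE:
        return "review"    # a human looks before staging
    return "skip"          # transcripts and unclassified noise stay in the palace
```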
</div>
</div>
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num" style="color:var(--accent-red)">!</span>
<span class="step-title">MemPalace fact_checker.py is not yet wired into KG ops (v3.0.0)</span>
<span class="step-tool-tag">Known Gap · Issue #27</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>MemPalace's contradiction detection (fact_checker.py) exists as a standalone utility but is not currently called automatically during knowledge graph operations — the authors acknowledged this in their April 7 correction note. This means cross-wing contradictions won't be auto-flagged at ingest time yet.</p>
<p><strong>Mitigation:</strong> Call fact_checker.py manually as part of your lint pass script until Issue #27 is resolved. Wire it as a pre-commit hook on wiki/ changes: any new page goes through fact_checker before being promoted from staging to verified.</p>
<div class="warn-box">Track Issue #27 on the MemPalace repo. This is being actively fixed. Once wired, contradiction detection becomes a native part of your ingest pipeline — a major upgrade to the cross-check mitigation.</div>
</div>
</div>
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num" style="color:var(--accent-amber)">~</span>
<span class="step-title">Two memory systems need schema alignment</span>
<span class="step-tool-tag">Operational Risk</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>MemPalace's taxonomy (wings, rooms, halls) and the wiki's taxonomy (domains, concept pages, page types in CLAUDE.md) are separate schemas. If they drift — MemPalace calls something "wing_taskforge/hall_facts/auth" while the wiki calls it "infrastructure/auth-decisions" — the structural navigation loses coherence. Tunnels and wikilinks stop reinforcing each other.</p>
<p><strong>Mitigation:</strong> Define a canonical mapping document (a simple markdown table) that maps MemPalace wing/room names to wiki domain/page paths. Reference it in both CLAUDE.md and your MemPalace wing_config.json. Review quarterly — schemas co-evolve, but they need to co-evolve together.</p>
<div class="tip-box"><strong>Your advantage:</strong> You already have a discipline around CLAUDE.md management. Add a "Palace Map" section to your global CLAUDE.md that specifies the canonical wing→wiki-domain mapping. The LLM consults it on every ingest.</div>
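<p>A "Palace Map" section might look like the fragment below. The first row uses the drift example from this section; the other rows are invented purely for illustration:</p>

```markdown
## Palace Map — canonical wing → wiki-domain mapping

| MemPalace path                  | Wiki path                     |
|---------------------------------|-------------------------------|
| wing_taskforge/hall_facts/auth  | infrastructure/auth-decisions |
| wing_taskforge/hall_discoveries | patterns/                     |
| wing_general/hall_advice        | practices/                    |
```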
</div>
</div>
</div>
</div><!-- /tab-mempalace -->
<!-- TAB: DISTILL — the 8th extension, closing the MemPalace loop -->
<div id="tab-distill" class="tab-panel">
<div class="palace-hero" style="background:linear-gradient(135deg, #2a1810 0%, #1a1a10 50%, #0a1510 100%); border-color:#4a3a1a;">
<div class="kicker" style="color:#f0c060">⬣ The 8th Extension — Closing the MemPalace Loop</div>
<h3>Closet summaries <em>become</em> the source for the wiki itself.</h3>
<p>The first seven extensions came out of the Signal &amp; Noise review. The eighth surfaced only after the other layers were built — and it's the one that makes the MemPalace integration a real pipeline into the wiki instead of just a searchable archive beside it. The mining layer was extracting sessions, classifying bullets into halls, tagging topics, and making everything searchable via qmd. But the knowledge <em>inside</em> the conversations was never being compiled into wiki pages. A decision made in a session, a root cause found during debugging, a pattern spotted in review — these stayed in the conversation summaries forever, findable but not synthesized.</p>
<p style="color:#f0c060;font-size:12.5px;font-family:'JetBrains Mono',monospace;letter-spacing:0.05em;">This is what the <code>wiki-distill.py</code> script solves. It's Phase 1a of <code>wiki-maintain.sh</code> and runs before URL harvesting because conversation content should drive the page, not the URLs the conversation cites.</p>
<div class="hero-stats">
<div class="hstat"><span class="hval">Phase 1a</span><span class="hlbl">Runs before harvest</span></div>
<div class="hstat"><span class="hval">today</span><span class="hlbl">Narrow filter — today's topics</span></div>
<div class="hstat"><span class="hval">∀ history</span><span class="hlbl">Rollup all past conversations on each topic</span></div>
<div class="hstat"><span class="hval">3 halls</span><span class="hlbl">fact + discovery + advice</span></div>
<div class="hstat"><span class="hval">haiku/sonnet</span><span class="hlbl">Auto-routed by topic size</span></div>
</div>
</div>
<!-- FLOW DIAGRAM -->
<div class="flow-diagram">
<div class="flow-title">Distill Flow — Conversation Content → Wiki Pages</div>
<div class="flow-label">Narrow: what topics to process today</div>
<div class="flow-row">
<div class="flow-node convo">Today's<br>conversations</div>
<div class="flow-arrow"></div>
<div class="flow-node palace">Extract<br>topics[]</div>
<div class="flow-arrow">=</div>
<div class="flow-node wiki">Topics of<br>today set</div>
</div>
<div class="flow-label" style="margin-top:14px">Wide: pull full history for each today-topic</div>
<div class="flow-row">
<div class="flow-node wiki">Each<br>today-topic</div>
<div class="flow-arrow"></div>
<div class="flow-node palace">Rollup ALL<br>historical convs</div>
<div class="flow-arrow"></div>
<div class="flow-node palace">Extract<br>fact / discovery / advice</div>
<div class="flow-arrow"></div>
<div class="flow-node llm">claude -p<br>distill prompt</div>
</div>
<div class="flow-label" style="margin-top:14px">Compile: model decides new / update / skip</div>
<div class="flow-row">
<div class="flow-node llm">JSON<br>actions[]</div>
<div class="flow-arrow"></div>
<div class="flow-node wiki">new_page</div>
<div class="flow-arrow">+</div>
<div class="flow-node wiki">update_page<br>(modifies existing)</div>
<div class="flow-arrow"></div>
<div class="flow-node raw">staging/&lt;type&gt;/<br>pending review</div>
</div>
</div>
<!-- SECTION: WHY IT COMPLETES MEMPALACE -->
<div class="section-header">
<h2>Why This Completes MemPalace</h2>
<span class="section-tag" style="border-color:var(--accent-amber);color:var(--accent-amber);background:#fff8e6">Pipeline Closure</span>
</div>
<div class="palace-map">
<div class="palace-cell">
<div class="pc-icon">📦</div>
<div class="pc-term">Drawer — before</div>
<div class="pc-name">Verbatim Archive</div>
<div class="pc-desc">Full transcripts stored, searchable via qmd. No compilation — if you wanted canonical knowledge from them, you had to write it up manually.</div>
<div class="pc-wiki-map">Status: already working</div>
</div>
<div class="palace-cell">
<div class="pc-icon">🗂️</div>
<div class="pc-term">Closet — before</div>
<div class="pc-name">Summary Layer</div>
<div class="pc-desc">Summaries with hall classification (fact / discovery / preference / advice / event / tooling) and topics. Searchable. Terminal: never fed forward into the wiki compiler.</div>
<div class="pc-wiki-map">Status: terminal data, not flowing</div>
</div>
<div class="palace-cell">
<div class="pc-icon">⚗️</div>
<div class="pc-term">Distill — NEW</div>
<div class="pc-name">Compiler Bridge</div>
<div class="pc-desc">Reads closet content by topic, rolls up all matching conversations across history, filters to high-signal halls only, sends to claude -p with the current wiki index, emits new or updated wiki pages to staging.</div>
<div class="pc-wiki-map">Status: wiki-distill.py</div>
</div>
<div class="palace-cell">
<div class="pc-icon">📄</div>
<div class="pc-term">Wiki Pages — NEW</div>
<div class="pc-name">Distilled Knowledge</div>
<div class="pc-desc">Pages in staging/&lt;type&gt;/ with full distill provenance: distill_topic, distill_source_conversations, compilation_notes. Promote via staging review. Session knowledge becomes canonical knowledge.</div>
<div class="pc-wiki-map">Status: origin=automated, staged_by=wiki-distill</div>
</div>
</div>
<!-- HALL FILTERING -->
<div class="section-header">
<h2>Which Halls Get Distilled</h2>
<span class="section-tag" style="border-color:var(--accent-green);color:var(--accent-green);background:#eaf5ee">High Signal Only</span>
</div>
<table class="compare-table">
<thead>
<tr>
<th>Hall</th>
<th style="text-align:center">Distilled?</th>
<th>Why</th>
</tr>
</thead>
<tbody>
<tr>
<td class="row-label">hall_facts</td>
<td style="text-align:center" class="cell-win">✦ YES</td>
<td>Decisions locked in, choices made, specs agreed. Canonical knowledge.</td>
</tr>
<tr>
<td class="row-label">hall_discoveries</td>
<td style="text-align:center" class="cell-win">✦ YES</td>
<td>Root causes, breakthroughs, non-obvious findings. The highest-signal content in any session.</td>
</tr>
<tr>
<td class="row-label">hall_advice</td>
<td style="text-align:center" class="cell-win">✦ YES</td>
<td>Recommendations, lessons learned, "next time do X." Worth capturing as patterns.</td>
</tr>
<tr>
<td class="row-label">hall_events</td>
<td style="text-align:center" class="cell-mid">no</td>
<td>Deployments, incidents, milestones. Temporal data — belongs in logs, not the wiki.</td>
</tr>
<tr>
<td class="row-label">hall_preferences</td>
<td style="text-align:center" class="cell-mid">no</td>
<td>User working style notes. Belong in personal configs, not the shared wiki.</td>
</tr>
<tr>
<td class="row-label">hall_tooling</td>
<td style="text-align:center" class="cell-mid">no</td>
<td>Script/command usage, failures, improvements. Usually low-signal or duplicates what's already in the wiki.</td>
</tr>
</tbody>
</table>
<!-- HOW THE NARROW-TODAY + WIDE-HISTORY FILTER WORKS -->
<div class="section-header">
<h2>The Narrow-Today / Wide-History Filter</h2>
<span class="section-tag" style="border-color:var(--accent-blue);color:var(--accent-blue);background:#e8eef5">Key Design</span>
</div>
<div class="mitigation-intro">
<strong>Processing scope stays narrow; LLM context stays wide.</strong> This is the key property that makes distill cheap enough to run daily and smart enough to produce good pages.
</div>
<div class="mitigation-steps">
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num">01</span>
<span class="step-title">Daily filter: only process topics appearing in TODAY's conversations</span>
<span class="step-tool-tag">Scope</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>Each daily run only looks at conversations dated today. It extracts the <code>topics:</code> frontmatter from each — that union becomes the "topics of today" set. If you didn't discuss a topic today, it's not in the processing scope. This keeps the cron job cheap and predictable: if today was a light session day, distill runs fast. If today was a heavy architecture discussion, distill does real work.</p>
<div class="tip-box"><strong>First run only:</strong> The very first run uses a 7-day lookback instead of today-only so the state file gets seeded. After that first bootstrap, daily runs stay narrow.</div>
</div>
</div>
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num">02</span>
<span class="step-title">Historical rollup: for each today-topic, pull ALL matching conversations</span>
<span class="step-tool-tag">Context</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>Once the today-topic set is known, for each topic the script walks the entire conversation archive and pulls every summarized conversation that shares that topic. A discussion about <code>blue-green-deploy</code> today might roll up 16 conversations across the last 6 months. The <code>claude -p</code> call sees the full history, not just today's fragment.</p>
<p>This is what makes the distilled pages <em>good</em>. The LLM isn't guessing what a pattern looks like from one session — it's synthesizing across everything you've ever discussed on the topic.</p>
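<p>The rollup step can be sketched the same way, again assuming a <code>topics: [...]</code> frontmatter line rather than wiki-distill's real parser:</p>

```python
# Sketch of the wide historical rollup: given one today-topic, collect
# every archived conversation whose topics: list contains it.
import re
from pathlib import Path

def rollup_topic(topic: str, conv_root: str = "conversations") -> list[Path]:
    matches = []
    for conv in sorted(Path(conv_root).rglob("*.md")):
        text = conv.read_text(encoding="utf-8")
        m = re.search(r"^topics:\s*\[(.*?)\]", text, re.MULTILINE)
        if m and topic in [t.strip() for t in m.group(1).split(",")]:
            matches.append(conv)
    return matches
```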
</div>
</div>
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num">03</span>
<span class="step-title">Self-triggering: dormant topics wake up when they resurface</span>
<span class="step-tool-tag">Emergent</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>The narrow-today/wide-history combination produces a useful emergent property: <strong>dormant topics wake up automatically.</strong> If you discussed <code>database-migrations</code> three months ago and it never came up again, it's not in the daily scope. But the day you mention it again in any new conversation, that topic enters today's set — and the rollup pulls in all three months of historical discussion. The wiki page gets updated with fresh synthesis across the full history without you having to manually trigger reprocessing.</p>
<div class="tip-box"><strong>What this means in practice:</strong> Old knowledge gets distilled <em>when it becomes relevant again</em>. You don't need to remember to ask "hey, is there a wiki page for X?" — the next time X comes up in a session, distill will check the wiki state and either create or update the page for you.</div>
</div>
</div>
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num">04</span>
<span class="step-title">State tracking by content hash + topic set</span>
<span class="step-tool-tag">.distill-state.json</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>A conversation is considered "already distilled" only if its body hash AND its topic set match what was seen at the last distill. If the body changes (summarizer re-ran and updated the bullets) OR a new topic is added, the conversation gets re-processed on the next run. Rejected topics are tracked too, so they aren't reprocessed forever: if the LLM decides once that a topic doesn't deserve a wiki page, it stays rejected until something meaningful changes.</p>
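<p>The two-part staleness check might look like this sketch; the state-file layout shown is an assumption, not <code>.distill-state.json</code>'s actual schema:</p>

```python
# Sketch of the "already distilled" check: a conversation is skipped only
# when BOTH its body hash and its topic set match the recorded state.
import hashlib
import json
from pathlib import Path

def needs_redistill(conv_id: str, body: str, topics: set[str],
                    state_path: str = ".distill-state.json") -> bool:
    state = {}
    p = Path(state_path)
    if p.exists():
        state = json.loads(p.read_text(encoding="utf-8"))
    entry = state.get(conv_id)
    if entry is None:
        return True  # never distilled before
    body_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    # re-process if the summary body changed OR the topic set changed
    return entry["body_hash"] != body_hash or set(entry["topics"]) != topics
```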
</div>
</div>
<div class="mitigation-step" onclick="toggleStep(this)">
<div class="mitigation-step-header">
<span class="step-num">05</span>
<span class="step-title">Distill runs BEFORE harvest — conversation content has priority</span>
<span class="step-tool-tag">Phase 1a</span>
<span class="step-arrow"></span>
</div>
<div class="mitigation-step-body">
<p>The orchestrator runs distill as Phase 1a and harvest as Phase 1b. Deliberate: if a topic is being actively discussed in your sessions, you want the wiki page to reflect <em>your</em> synthesis of what you've learned, not just the external URL cited in passing. URL harvesting then fills in gaps — it picks up the docs pages, blog posts, and references that your sessions didn't already cover.</p>
<div class="warn-box">Both phases can produce staging pages. If distill creates <code>patterns/docker-hardening.md</code> and harvest creates <code>patterns/docker-hardening.md</code>, the staging-unique-path helper appends a short hash suffix so they don't collide. The reviewer sees both in staging and picks the better one (usually distill, since it has historical context).</div>
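<p>The collision-avoidance helper could be as small as the sketch below; the helper name and the 8-character suffix length are illustrative, not the actual implementation:</p>

```python
# Sketch of a staging-unique-path helper: if the target page already
# exists, append a short content-hash suffix so distill and harvest
# output for the same path never collide.
import hashlib
from pathlib import Path

def staging_unique_path(target: Path, content: str) -> Path:
    if not target.exists():
        return target
    suffix = hashlib.sha256(content.encode("utf-8")).hexdigest()[:8]
    return target.with_name(f"{target.stem}-{suffix}{target.suffix}")
```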
</div>
</div>
</div>
<!-- STAGING FRONTMATTER -->
<div class="section-header">
<h2>Distill Staging Provenance</h2>
<span class="section-tag" style="border-color:var(--accent-green);color:var(--accent-green);background:#eaf5ee">Traceable</span>
</div>
<p style="font-size:13.5px;color:var(--muted);margin-bottom:20px;line-height:1.6;">Every distilled page lands in staging with full provenance in its frontmatter. When you review a page in staging, you can see exactly which conversations it came from and jump directly to those transcripts.</p>
<div class="flow-diagram" style="background:#0d0d0d; border-color:#2a2a2a;">
<div class="flow-title" style="color:#c4b99a">Example: staging/patterns/zoho-crm-integration.md frontmatter</div>
<pre style="font-family:'JetBrains Mono',monospace;font-size:11px;color:#c4b99a;line-height:1.6;margin:0;padding:14px 0;overflow-x:auto;">---
origin: automated
status: pending
staged_date: 2026-04-12
staged_by: wiki-distill
target_path: patterns/zoho-crm-integration.md
distill_topic: zoho-api
distill_source_conversations: conversations/general/2026-04-06-73d15650.md,conversations/mc/2026-03-30-64089d1d.md
compilation_notes: Two separate incidents discovered the same Zoho CRM v2 API limitations, documenting them as a pattern page prevents re-investigation and provides a canonical reference for future Zoho integrations.
title: Zoho CRM Integration
type: pattern
confidence: high
sources: [conversations/general/2026-04-06-73d15650.md, conversations/mc/2026-03-30-64089d1d.md]
related: [database-migrations.md, activity-event-auditing.md]
last_compiled: 2026-04-12
last_verified: 2026-04-12
---</pre>
</div>
<div class="pull-quote" style="border-left-color:var(--accent-amber)">
Without distillation, MemPalace was a searchable archive sitting beside the wiki. With distillation, it's a real ingest pipeline — closet content becomes the source material for the wiki proper, completing the eight-extension story.
<span class="attribution">— memex design rationale, April 2026</span>
</div>
</div><!-- /tab-distill -->
</div><!-- /page -->
<footer class="page-footer">
<span>Sources: VentureBeat · Epsilla · Atlan · Medium · Starmorph · GitHub Gist Community · MemPalace README</span>
<span>memex · Karpathy's Pattern + MemPalace · April 2026</span>
<span>Compiled April 11, 2026</span>
</footer>
<script>
function toggleItem(el) {
const wasActive = el.classList.contains('active');
// Close siblings in same column
const siblings = el.parentElement.querySelectorAll('.procon-item');
siblings.forEach(s => s.classList.remove('active'));
if (!wasActive) el.classList.add('active');
}
function toggleStep(el) {
el.classList.toggle('open');
}
function switchTab(btn, panelId) {
document.querySelectorAll('.tab-btn').forEach(b => b.classList.remove('active'));
document.querySelectorAll('.tab-panel').forEach(p => p.classList.remove('active'));
btn.classList.add('active');
document.getElementById(panelId).classList.add('active');
}
</script>
</body>
</html>