Initial commit — memex

A compounding LLM-maintained knowledge wiki.

Synthesis of Andrej Karpathy's persistent-wiki gist and milla-jovovich's
mempalace, with an automation layer on top for conversation mining, URL
harvesting, human-in-the-loop staging, staleness decay, and hygiene.

Includes:
- 11 pipeline scripts (extract, summarize, index, harvest, stage,
  hygiene, maintain, sync, + shared library)
- Full docs: README, SETUP, ARCHITECTURE, DESIGN-RATIONALE, CUSTOMIZE
- Example CLAUDE.md files (wiki schema + global instructions) tuned for
  the three-collection qmd setup
- 171-test pytest suite (cross-platform, runs in ~1.3s)
- MIT licensed
This commit is contained in:
Eric Turner
2026-04-12 21:16:02 -06:00
commit ee54a2f5d4
31 changed files with 10792 additions and 0 deletions

107
tests/README.md Normal file

@@ -0,0 +1,107 @@
# Wiki Pipeline Test Suite
Pytest-based test suite covering all 11 scripts in `scripts/`. Runs on both
macOS and Linux/WSL, uses only the Python standard library + pytest.
## Running
```bash
# Full suite (from wiki root)
bash tests/run.sh
# Single test file
bash tests/run.sh test_wiki_lib.py
# Single test class or function
bash tests/run.sh test_wiki_hygiene.py::TestArchiveRestore
bash tests/run.sh test_wiki_hygiene.py::TestArchiveRestore::test_restore_reverses_archive
# Pattern matching
bash tests/run.sh -k "archive"
# Verbose
bash tests/run.sh -v
# Stop on first failure
bash tests/run.sh -x
# Or invoke pytest directly from the tests dir
cd tests && python3 -m pytest -v
```
## What's tested
| File | Coverage |
|------|----------|
| `test_wiki_lib.py` | YAML parser, frontmatter round-trip, page iterators, date parsing, content hashing, WIKI_DIR env override |
| `test_wiki_hygiene.py` | Backfill, confidence decay math, frontmatter repair, archive/restore round-trip, orphan detection, broken-xref fuzzy matching, index drift, empty stubs, conversation refresh signals, auto-restore, staging/archive sync, state drift, hygiene state file, full quick-run idempotency |
| `test_wiki_staging.py` | List, promote, reject, promote-with-modifies, dry-run, staging index regeneration, path resolution |
| `test_wiki_harvest.py` | URL classification (harvest/check/skip), private IP detection, URL extraction + filtering, filename derivation, content validation, state management, raw file writing, dry-run CLI smoke test |
| `test_conversation_pipeline.py` | CLI smoke tests for extract-sessions, summarize-conversations, update-conversation-index; dry-run behavior; help flags; integration test with fake conversation files |
| `test_shell_scripts.py` | wiki-maintain.sh / mine-conversations.sh / wiki-sync.sh: help, dry-run, mutex flags, bash syntax check, strict-mode check, shebang check, py_compile for all .py scripts |
## How it works
**Isolation**: Every test runs against a disposable `tmp_wiki` fixture
(pytest `tmp_path`). The fixture sets the `WIKI_DIR` environment variable
so all scripts resolve paths against the tmp directory instead of the real
wiki. No test ever touches `~/projects/wiki`.
**Hyphenated filenames**: Scripts like `wiki-harvest.py` use hyphens, which
Python's `import` can't handle directly. `conftest.py` has a
`_load_script_module` helper that loads a script file by path and exposes
it as a module object.
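The pattern behind that helper can be sketched with stdlib `importlib` machinery (the name `load_script` here is illustrative; the real helper lives in `conftest.py` and also handles cache clearing and `sys.path`):

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType

def load_script(name: str, path: Path) -> ModuleType:
    """Load a hyphen-named script file as an importable module object."""
    spec = importlib.util.spec_from_file_location(name, path)
    assert spec is not None and spec.loader is not None
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register so intra-module imports resolve
    spec.loader.exec_module(module)
    return module
```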
**Clean module state**: Each test that loads a module clears any cached
import first, so `WIKI_DIR` env overrides take effect correctly between
tests.
**Subprocess tests** (for CLI smoke tests): `conftest.py` provides a
`run_script` fixture that invokes a script via `python3` or `bash` with
`WIKI_DIR` set to the tmp wiki. Uses `subprocess.run` with `capture_output`
and a timeout.
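A minimal sketch of that pattern (the fixture's actual signature differs; `run_with_wiki_dir` is a hypothetical name):

```python
import os
import subprocess
import sys
from pathlib import Path

def run_with_wiki_dir(cmd: list[str], wiki_dir: Path) -> subprocess.CompletedProcess:
    """Run a command with WIKI_DIR overridden, capturing output with a timeout."""
    env = os.environ.copy()
    env["WIKI_DIR"] = str(wiki_dir)  # scripts resolve all paths through this
    return subprocess.run(cmd, capture_output=True, text=True, timeout=60, env=env)
```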
## Cross-platform
- `#!/usr/bin/env bash` shebangs (tested explicitly)
- `set -euo pipefail` in all shell scripts (tested explicitly)
- `bash -n` syntax check on all shell scripts
- `py_compile` on all Python scripts
- Uses `pathlib` everywhere — no hardcoded path separators
- Uses the Python stdlib only (except pytest itself)
## Requirements
- Python 3.11+
- `pytest` — install with `pip install --user pytest` or your distro's package manager
- `bash` (any version — scripts use only portable features)
The tests do NOT require:
- `claude` CLI (mocked / skipped)
- `trafilatura` or `crawl4ai` (only dry-run / classification paths tested)
- `qmd` (reindex phase is skipped in tests)
- Network access
- The real `~/projects/wiki` or `~/.claude/projects` directories
## Speed
Full suite runs in **~1 second** on a modern laptop. All tests are isolated
and independent so they can run in any order and in parallel.
## What's NOT tested
- **Real LLM calls** (`claude -p`): too expensive, non-deterministic.
Tested: CLI parsing, dry-run paths, mocked error handling.
- **Real web fetches** (trafilatura/crawl4ai): too slow, non-deterministic.
Tested: URL classification, filter logic, fetch-result validation.
- **Real git operations** (wiki-sync.sh): requires a git repo fixture.
Tested: script loads, handles non-git dir gracefully, --status exits clean.
- **Real qmd indexing**: tested elsewhere via `qmd collection list` in the
setup verification step.
- **Real Claude Code session JSONL parsing** with actual sessions: would
require fixture JSONL files. Tested: CLI parsing, empty-dir behavior,
`CLAUDE_PROJECTS_DIR` env override.
These are smoke-tested end-to-end via the integration tests in
`test_conversation_pipeline.py` and the dry-run paths in
`test_shell_scripts.py::TestWikiMaintainSh`.

300
tests/conftest.py Normal file

@@ -0,0 +1,300 @@
"""Shared test fixtures for the wiki pipeline test suite.
All tests run against a disposable `tmp_wiki` directory — no test ever
touches the real ~/projects/wiki. Cross-platform: uses pathlib, no
platform-specific paths, and runs on both macOS and Linux/WSL.
"""
from __future__ import annotations
import importlib
import importlib.util
import json
import os
import sys
from pathlib import Path
from typing import Any
import pytest
SCRIPTS_DIR = Path(__file__).resolve().parent.parent / "scripts"
# ---------------------------------------------------------------------------
# Module loading helpers
# ---------------------------------------------------------------------------
#
# The wiki scripts use hyphenated filenames (wiki-hygiene.py etc.) which
# can't be imported via normal `import` syntax. These helpers load a script
# file as a module object so tests can exercise its functions directly.
def _load_script_module(name: str, path: Path) -> Any:
    """Load a Python script file as a module. Clears any cached version first."""
    # Clear cached imports so WIKI_DIR env changes take effect between tests
    for key in list(sys.modules):
        if key in (name, "wiki_lib"):
            del sys.modules[key]
    # Make sure scripts/ is on sys.path so intra-script imports (wiki_lib) work
    scripts_str = str(SCRIPTS_DIR)
    if scripts_str not in sys.path:
        sys.path.insert(0, scripts_str)
    spec = importlib.util.spec_from_file_location(name, path)
    assert spec is not None and spec.loader is not None
    mod = importlib.util.module_from_spec(spec)
    sys.modules[name] = mod
    spec.loader.exec_module(mod)
    return mod
# ---------------------------------------------------------------------------
# tmp_wiki fixture — builds a realistic wiki tree under a tmp path
# ---------------------------------------------------------------------------
@pytest.fixture
def tmp_wiki(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> Path:
    """Set up a disposable wiki tree with all the directories the scripts expect.

    Sets the WIKI_DIR environment variable so all imported modules resolve
    paths against this tmp directory.
    """
    wiki = tmp_path / "wiki"
    wiki.mkdir()
    # Create the directory tree
    for sub in ["patterns", "decisions", "concepts", "environments"]:
        (wiki / sub).mkdir()
        (wiki / "staging" / sub).mkdir(parents=True)
        (wiki / "archive" / sub).mkdir(parents=True)
    (wiki / "raw" / "harvested").mkdir(parents=True)
    (wiki / "conversations").mkdir()
    (wiki / "reports").mkdir()
    # Create minimal index.md
    (wiki / "index.md").write_text(
        "# Wiki Index\n\n"
        "## Patterns\n\n"
        "## Decisions\n\n"
        "## Concepts\n\n"
        "## Environments\n\n"
    )
    # Empty state files
    (wiki / ".harvest-state.json").write_text(json.dumps({
        "harvested_urls": {},
        "skipped_urls": {},
        "failed_urls": {},
        "rejected_urls": {},
        "last_run": None,
    }))
    # Point all scripts at this tmp wiki
    monkeypatch.setenv("WIKI_DIR", str(wiki))
    return wiki
# ---------------------------------------------------------------------------
# Sample page factories
# ---------------------------------------------------------------------------
def make_page(
    wiki: Path,
    rel_path: str,
    *,
    title: str | None = None,
    ptype: str | None = None,
    confidence: str = "high",
    last_compiled: str = "2026-04-01",
    last_verified: str = "2026-04-01",
    origin: str = "manual",
    sources: list[str] | None = None,
    related: list[str] | None = None,
    body: str = "# Content\n\nA substantive page with real content so it is not a stub.\n",
    extra_fm: dict[str, Any] | None = None,
) -> Path:
    """Write a well-formed wiki page with all required frontmatter fields
    to the tmp wiki and return its path."""
    if sources is None:
        sources = []
    if related is None:
        related = []
    path = wiki / rel_path
    path.parent.mkdir(parents=True, exist_ok=True)
    if title is None:
        title = path.stem.replace("-", " ").title()
    if ptype is None:
        ptype = path.parent.name.rstrip("s")
    fm_lines = [
        "---",
        f"title: {title}",
        f"type: {ptype}",
        f"confidence: {confidence}",
        f"origin: {origin}",
        f"last_compiled: {last_compiled}",
        f"last_verified: {last_verified}",
    ]
    # sources/related were normalized to lists above, so only emptiness matters
    if sources:
        fm_lines.append("sources:")
        fm_lines.extend(f" - {s}" for s in sources)
    else:
        fm_lines.append("sources: []")
    if related:
        fm_lines.append("related:")
        fm_lines.extend(f" - {r}" for r in related)
    else:
        fm_lines.append("related: []")
    if extra_fm:
        for k, v in extra_fm.items():
            if isinstance(v, list):
                if v:
                    fm_lines.append(f"{k}:")
                    fm_lines.extend(f" - {item}" for item in v)
                else:
                    fm_lines.append(f"{k}: []")
            else:
                fm_lines.append(f"{k}: {v}")
    fm_lines.append("---")
    path.write_text("\n".join(fm_lines) + "\n" + body)
    return path
def make_conversation(
    wiki: Path,
    project: str,
    filename: str,
    *,
    date: str = "2026-04-10",
    status: str = "summarized",
    messages: int = 100,
    related: list[str] | None = None,
    body: str = "## Summary\n\nTest conversation summary.\n",
) -> Path:
    """Write a conversation file to the tmp wiki."""
    proj_dir = wiki / "conversations" / project
    proj_dir.mkdir(parents=True, exist_ok=True)
    path = proj_dir / filename
    fm_lines = [
        "---",
        "title: Test Conversation (unknown)",
        "type: conversation",
        f"project: {project}",
        f"date: {date}",
        f"status: {status}",
        f"messages: {messages}",
    ]
    if related:
        fm_lines.append("related:")
        fm_lines.extend(f" - {r}" for r in related)
    fm_lines.append("---")
    path.write_text("\n".join(fm_lines) + "\n" + body)
    return path
def make_staging_page(
    wiki: Path,
    rel_under_staging: str,
    *,
    title: str = "Pending Page",
    ptype: str = "pattern",
    staged_by: str = "wiki-harvest",
    staged_date: str = "2026-04-10",
    modifies: str | None = None,
    target_path: str | None = None,
    body: str = "# Pending\n\nStaged content body.\n",
) -> Path:
    """Write a pending page under staging/ and return its path."""
    path = wiki / "staging" / rel_under_staging
    path.parent.mkdir(parents=True, exist_ok=True)
    if target_path is None:
        target_path = rel_under_staging
    fm_lines = [
        "---",
        f"title: {title}",
        f"type: {ptype}",
        "confidence: medium",
        "origin: automated",
        "status: pending",
        f"staged_date: {staged_date}",
        f"staged_by: {staged_by}",
        f"target_path: {target_path}",
    ]
    if modifies:
        fm_lines.append(f"modifies: {modifies}")
    fm_lines.append("compilation_notes: test note")
    fm_lines.append("last_verified: 2026-04-10")
    fm_lines.append("---")
    path.write_text("\n".join(fm_lines) + "\n" + body)
    return path
# ---------------------------------------------------------------------------
# Module fixtures — each loads the corresponding script as a module
# ---------------------------------------------------------------------------
@pytest.fixture
def wiki_lib(tmp_wiki: Path) -> Any:
    """Load wiki_lib fresh against the tmp_wiki directory."""
    return _load_script_module("wiki_lib", SCRIPTS_DIR / "wiki_lib.py")


@pytest.fixture
def wiki_hygiene(tmp_wiki: Path) -> Any:
    """Load wiki-hygiene.py fresh. wiki_lib must be loaded first for its imports."""
    _load_script_module("wiki_lib", SCRIPTS_DIR / "wiki_lib.py")
    return _load_script_module("wiki_hygiene", SCRIPTS_DIR / "wiki-hygiene.py")


@pytest.fixture
def wiki_staging(tmp_wiki: Path) -> Any:
    _load_script_module("wiki_lib", SCRIPTS_DIR / "wiki_lib.py")
    return _load_script_module("wiki_staging", SCRIPTS_DIR / "wiki-staging.py")


@pytest.fixture
def wiki_harvest(tmp_wiki: Path) -> Any:
    _load_script_module("wiki_lib", SCRIPTS_DIR / "wiki_lib.py")
    return _load_script_module("wiki_harvest", SCRIPTS_DIR / "wiki-harvest.py")
# ---------------------------------------------------------------------------
# Subprocess helper — runs a script as if from the CLI, with WIKI_DIR set
# ---------------------------------------------------------------------------
@pytest.fixture
def run_script(tmp_wiki: Path):
    """Return a function that runs a script via subprocess with WIKI_DIR set."""
    import subprocess

    def _run(script_rel: str, *args: str, timeout: int = 60) -> subprocess.CompletedProcess:
        script = SCRIPTS_DIR / script_rel
        if script.suffix == ".py":
            cmd = ["python3", str(script), *args]
        else:
            cmd = ["bash", str(script), *args]
        env = os.environ.copy()
        env["WIKI_DIR"] = str(tmp_wiki)
        return subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout,
            env=env,
        )

    return _run

9
tests/pytest.ini Normal file

@@ -0,0 +1,9 @@
[pytest]
testpaths = .
python_files = test_*.py
python_classes = Test*
python_functions = test_*
addopts = -ra --strict-markers --tb=short
markers =
    slow: tests that take more than 1 second
    network: tests that hit the network (skipped by default)

31
tests/run.sh Executable file

@@ -0,0 +1,31 @@
#!/usr/bin/env bash
set -euo pipefail
# run.sh — Convenience wrapper for running the wiki pipeline test suite.
#
# Usage:
# bash tests/run.sh # Run the full suite
# bash tests/run.sh -v # Verbose output
# bash tests/run.sh test_wiki_lib # Run one file
# bash tests/run.sh -k "parse" # Run tests matching a pattern
#
# All arguments are passed through to pytest.
TESTS_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "${TESTS_DIR}"

# Verify pytest is available
if ! python3 -c "import pytest" 2>/dev/null; then
    echo "pytest not installed. Install with: pip install --user pytest"
    exit 2
fi

# Clear any previous test artifacts
rm -rf .pytest_cache 2>/dev/null || true

# Default args: short tracebacks
if [[ $# -eq 0 ]]; then
    exec python3 -m pytest --tb=short
else
    exec python3 -m pytest "$@"
fi

121
tests/test_conversation_pipeline.py Normal file

@@ -0,0 +1,121 @@
"""Smoke + integration tests for the conversation mining pipeline.
These scripts interact with external systems (Claude Code sessions dir,
claude CLI), so tests focus on CLI parsing, dry-run behavior, and error
handling rather than exercising the full extraction/summarization path.
"""
from __future__ import annotations
import json
from pathlib import Path
import pytest
# ---------------------------------------------------------------------------
# extract-sessions.py
# ---------------------------------------------------------------------------
class TestExtractSessions:
    def test_help_exits_clean(self, run_script) -> None:
        result = run_script("extract-sessions.py", "--help")
        assert result.returncode == 0
        assert "--project" in result.stdout
        assert "--dry-run" in result.stdout

    def test_dry_run_with_empty_sessions_dir(
        self, run_script, tmp_wiki: Path
    ) -> None:
        # Pointing CLAUDE_PROJECTS_DIR at an empty tmp dir via env is not
        # possible here (the script reads ~/.claude/projects directly), so use
        # --project with a code that has no sessions to verify a clean exit.
        result = run_script("extract-sessions.py", "--dry-run", "--project", "nonexistent")
        assert result.returncode == 0

    def test_rejects_unknown_flag(self, run_script) -> None:
        result = run_script("extract-sessions.py", "--bogus-flag")
        assert result.returncode != 0
        assert "error" in result.stderr.lower() or "unrecognized" in result.stderr.lower()
# ---------------------------------------------------------------------------
# summarize-conversations.py
# ---------------------------------------------------------------------------
class TestSummarizeConversations:
    def test_help_exits_clean(self, run_script) -> None:
        result = run_script("summarize-conversations.py", "--help")
        assert result.returncode == 0
        assert "--claude" in result.stdout
        assert "--dry-run" in result.stdout
        assert "--project" in result.stdout

    def test_dry_run_empty_conversations(
        self, run_script, tmp_wiki: Path
    ) -> None:
        result = run_script("summarize-conversations.py", "--claude", "--dry-run")
        assert result.returncode == 0

    def test_dry_run_with_extracted_conversation(
        self, run_script, tmp_wiki: Path
    ) -> None:
        from conftest import make_conversation

        make_conversation(
            tmp_wiki,
            "general",
            "2026-04-10-abc.md",
            status="extracted",  # Not yet summarized
            messages=50,
        )
        result = run_script("summarize-conversations.py", "--claude", "--dry-run")
        assert result.returncode == 0
        # Should mention the file or show it would be processed
        assert "2026-04-10-abc.md" in result.stdout or "1 conversation" in result.stdout
# ---------------------------------------------------------------------------
# update-conversation-index.py
# ---------------------------------------------------------------------------
class TestUpdateConversationIndex:
    def test_help_exits_clean(self, run_script) -> None:
        result = run_script("update-conversation-index.py", "--help")
        assert result.returncode == 0

    def test_runs_on_empty_conversations_dir(
        self, run_script, tmp_wiki: Path
    ) -> None:
        result = run_script("update-conversation-index.py")
        # Should not crash even with no conversations
        assert result.returncode == 0

    def test_builds_index_from_conversations(
        self, run_script, tmp_wiki: Path
    ) -> None:
        from conftest import make_conversation

        make_conversation(
            tmp_wiki,
            "general",
            "2026-04-10-one.md",
            status="summarized",
        )
        make_conversation(
            tmp_wiki,
            "general",
            "2026-04-11-two.md",
            status="summarized",
        )
        result = run_script("update-conversation-index.py")
        assert result.returncode == 0
        idx = tmp_wiki / "conversations" / "index.md"
        assert idx.exists()
        text = idx.read_text()
        assert "2026-04-10-one.md" in text or "one.md" in text
        assert "2026-04-11-two.md" in text or "two.md" in text

209
tests/test_shell_scripts.py Normal file

@@ -0,0 +1,209 @@
"""Smoke tests for the bash scripts.
Bash scripts are harder to unit-test in isolation — these tests verify
CLI parsing, help text, and dry-run/safe flags work correctly and that
scripts exit cleanly in all the no-op paths.
Cross-platform note: tests invoke scripts via `bash` explicitly, so they
work on both macOS (default /bin/bash) and Linux/WSL. They avoid anything
that requires external state (network, git, LLM).
"""
from __future__ import annotations
import os
import subprocess
from pathlib import Path
from typing import Any
import pytest
from conftest import make_conversation, make_page, make_staging_page
# ---------------------------------------------------------------------------
# wiki-maintain.sh
# ---------------------------------------------------------------------------
class TestWikiMaintainSh:
    def test_help_flag(self, run_script) -> None:
        result = run_script("wiki-maintain.sh", "--help")
        assert result.returncode == 0
        assert "usage:" in result.stdout.lower()
        assert "--full" in result.stdout
        assert "--harvest-only" in result.stdout
        assert "--hygiene-only" in result.stdout

    def test_rejects_unknown_flag(self, run_script) -> None:
        result = run_script("wiki-maintain.sh", "--bogus")
        assert result.returncode != 0
        assert "Unknown option" in result.stderr

    def test_harvest_only_and_hygiene_only_conflict(self, run_script) -> None:
        result = run_script(
            "wiki-maintain.sh", "--harvest-only", "--hygiene-only"
        )
        assert result.returncode != 0
        assert "mutually exclusive" in result.stderr

    def test_hygiene_only_dry_run_completes(
        self, run_script, tmp_wiki: Path
    ) -> None:
        make_page(tmp_wiki, "patterns/one.md")
        result = run_script(
            "wiki-maintain.sh", "--hygiene-only", "--dry-run", "--no-reindex"
        )
        assert result.returncode == 0
        assert "Phase 2: Hygiene checks" in result.stdout
        assert "finished" in result.stdout

    def test_phase_1_skipped_in_hygiene_only(
        self, run_script, tmp_wiki: Path
    ) -> None:
        result = run_script(
            "wiki-maintain.sh", "--hygiene-only", "--dry-run", "--no-reindex"
        )
        assert result.returncode == 0
        assert "Phase 1: URL harvesting (skipped)" in result.stdout

    def test_phase_3_skipped_in_dry_run(
        self, run_script, tmp_wiki: Path
    ) -> None:
        make_page(tmp_wiki, "patterns/one.md")
        result = run_script(
            "wiki-maintain.sh", "--hygiene-only", "--dry-run"
        )
        assert "Phase 3: qmd reindex (skipped)" in result.stdout

    def test_harvest_only_dry_run_completes(
        self, run_script, tmp_wiki: Path
    ) -> None:
        # Add a summarized conversation so harvest has something to scan
        make_conversation(
            tmp_wiki,
            "test",
            "2026-04-10-test.md",
            status="summarized",
            body="See https://docs.python.org/3/library/os.html for details.\n",
        )
        result = run_script(
            "wiki-maintain.sh",
            "--harvest-only",
            "--dry-run",
            "--no-compile",
            "--no-reindex",
        )
        assert result.returncode == 0
        assert "Phase 2: Hygiene checks (skipped)" in result.stdout
# ---------------------------------------------------------------------------
# wiki-sync.sh
# ---------------------------------------------------------------------------
class TestWikiSyncSh:
    def test_status_on_non_git_dir_exits_cleanly(self, run_script) -> None:
        """wiki-sync.sh --status against a non-git dir should fail gracefully.

        The tmp_wiki fixture is not a git repo, so git commands will fail.
        The script should report the problem without hanging or leaking stack
        traces. Any exit code is acceptable as long as it exits in reasonable
        time and prints something useful to stdout/stderr.
        """
        result = run_script("wiki-sync.sh", "--status", timeout=30)
        # Should have produced some output and exited (not hung)
        assert result.stdout or result.stderr
        assert "Wiki Sync Status" in result.stdout or "not a git" in result.stderr.lower()
# ---------------------------------------------------------------------------
# mine-conversations.sh
# ---------------------------------------------------------------------------
class TestMineConversationsSh:
    def test_extract_only_dry_run(self, run_script, tmp_wiki: Path) -> None:
        """mine-conversations.sh --extract-only --dry-run should complete without LLM."""
        result = run_script(
            "mine-conversations.sh", "--extract-only", "--dry-run", timeout=30
        )
        assert result.returncode == 0

    def test_rejects_unknown_flag(self, run_script) -> None:
        result = run_script("mine-conversations.sh", "--bogus-flag")
        assert result.returncode != 0
# ---------------------------------------------------------------------------
# Cross-platform sanity — scripts use portable bash syntax
# ---------------------------------------------------------------------------
class TestBashPortability:
    """Verify scripts don't use bashisms that break on macOS /bin/bash 3.2."""

    @pytest.mark.parametrize(
        "script",
        ["wiki-maintain.sh", "mine-conversations.sh", "wiki-sync.sh"],
    )
    def test_shebang_is_env_bash(self, script: str) -> None:
        """All shell scripts should use `#!/usr/bin/env bash` for portability."""
        path = Path(__file__).parent.parent / "scripts" / script
        first_line = path.read_text().splitlines()[0]
        assert first_line == "#!/usr/bin/env bash", (
            f"{script} has shebang {first_line!r}, expected #!/usr/bin/env bash"
        )

    @pytest.mark.parametrize(
        "script",
        ["wiki-maintain.sh", "mine-conversations.sh", "wiki-sync.sh"],
    )
    def test_uses_strict_mode(self, script: str) -> None:
        """All shell scripts should use `set -euo pipefail` for safe defaults."""
        path = Path(__file__).parent.parent / "scripts" / script
        text = path.read_text()
        assert "set -euo pipefail" in text, f"{script} missing strict mode"

    @pytest.mark.parametrize(
        "script",
        ["wiki-maintain.sh", "mine-conversations.sh", "wiki-sync.sh"],
    )
    def test_bash_syntax_check(self, script: str) -> None:
        """bash -n does a syntax-only parse and catches obvious errors."""
        path = Path(__file__).parent.parent / "scripts" / script
        result = subprocess.run(
            ["bash", "-n", str(path)],
            capture_output=True,
            text=True,
            timeout=10,
        )
        assert result.returncode == 0, f"{script} has bash syntax errors: {result.stderr}"
# ---------------------------------------------------------------------------
# Python script syntax check (smoke)
# ---------------------------------------------------------------------------
class TestPythonSyntax:
    @pytest.mark.parametrize(
        "script",
        [
            "wiki_lib.py",
            "wiki-harvest.py",
            "wiki-staging.py",
            "wiki-hygiene.py",
            "extract-sessions.py",
            "summarize-conversations.py",
            "update-conversation-index.py",
        ],
    )
    def test_py_compile(self, script: str) -> None:
        """py_compile catches syntax errors without executing the module."""
        import py_compile

        path = Path(__file__).parent.parent / "scripts" / script
        # py_compile.compile raises on error; success returns the .pyc path
        py_compile.compile(str(path), doraise=True)

323
tests/test_wiki_harvest.py Normal file

@@ -0,0 +1,323 @@
"""Unit + integration tests for scripts/wiki-harvest.py."""
from __future__ import annotations
import json
from pathlib import Path
from typing import Any
from unittest.mock import patch
import pytest
from conftest import make_conversation
# ---------------------------------------------------------------------------
# URL classification
# ---------------------------------------------------------------------------
class TestClassifyUrl:
    def test_regular_docs_site_harvest(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("https://docs.python.org/3/library/os.html") == "harvest"
        assert wiki_harvest.classify_url("https://blog.example.com/post") == "harvest"

    def test_github_issue_is_check(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("https://github.com/foo/bar/issues/42") == "check"

    def test_github_pr_is_check(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("https://github.com/foo/bar/pull/99") == "check"

    def test_stackoverflow_is_check(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url(
            "https://stackoverflow.com/questions/12345/title"
        ) == "check"

    def test_localhost_skip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("http://localhost:3000/path") == "skip"
        assert wiki_harvest.classify_url("http://localhost/foo") == "skip"

    def test_private_ip_skip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("http://10.0.0.1/api") == "skip"
        assert wiki_harvest.classify_url("http://172.30.224.1:8080/v1") == "skip"
        assert wiki_harvest.classify_url("http://192.168.1.1/test") == "skip"
        assert wiki_harvest.classify_url("http://127.0.0.1:8080/foo") == "skip"

    def test_local_and_internal_tld_skip(self, wiki_harvest: Any) -> None:
        # `.local` and `.internal` are baked into SKIP_DOMAIN_PATTERNS
        assert wiki_harvest.classify_url("https://router.local/admin") == "skip"
        assert wiki_harvest.classify_url("https://service.internal/api") == "skip"

    def test_custom_skip_pattern_runtime(self, wiki_harvest: Any) -> None:
        # Users can append their own patterns at runtime — verify the hook works
        wiki_harvest.SKIP_DOMAIN_PATTERNS.append(r"\.mycompany\.com$")
        try:
            assert wiki_harvest.classify_url("https://git.mycompany.com/foo") == "skip"
            assert wiki_harvest.classify_url("https://docs.mycompany.com/api") == "skip"
        finally:
            wiki_harvest.SKIP_DOMAIN_PATTERNS.pop()

    def test_atlassian_skip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("https://foo.atlassian.net/browse/BAR-1") == "skip"

    def test_slack_skip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("https://myteam.slack.com/archives/C123") == "skip"

    def test_github_repo_root_is_harvest(self, wiki_harvest: Any) -> None:
        # Not an issue/pr/discussion — just a repo root, might contain docs
        assert wiki_harvest.classify_url("https://github.com/foo/bar") == "harvest"

    def test_invalid_url_skip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("not a url") == "skip"
# ---------------------------------------------------------------------------
# Private IP detection
# ---------------------------------------------------------------------------
class TestPrivateIp:
    def test_10_range(self, wiki_harvest: Any) -> None:
        assert wiki_harvest._is_private_ip("10.0.0.1") is True
        assert wiki_harvest._is_private_ip("10.255.255.255") is True

    def test_172_16_to_31_range(self, wiki_harvest: Any) -> None:
        assert wiki_harvest._is_private_ip("172.16.0.1") is True
        assert wiki_harvest._is_private_ip("172.31.255.255") is True
        assert wiki_harvest._is_private_ip("172.15.0.1") is False
        assert wiki_harvest._is_private_ip("172.32.0.1") is False

    def test_192_168_range(self, wiki_harvest: Any) -> None:
        assert wiki_harvest._is_private_ip("192.168.0.1") is True
        assert wiki_harvest._is_private_ip("192.167.0.1") is False

    def test_loopback(self, wiki_harvest: Any) -> None:
        assert wiki_harvest._is_private_ip("127.0.0.1") is True

    def test_public_ip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest._is_private_ip("8.8.8.8") is False

    def test_hostname_not_ip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest._is_private_ip("example.com") is False
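The behavior pinned down by these tests can be reproduced with the stdlib `ipaddress` module. This is a hypothetical re-implementation (the script's actual `_is_private_ip` may differ in structure):

```python
import ipaddress

def is_private_ip(host: str) -> bool:
    """Return True for RFC 1918 and loopback addresses; False for hostnames."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal (e.g. "example.com")
    return addr.is_private or addr.is_loopback
```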
# ---------------------------------------------------------------------------
# URL extraction from files
# ---------------------------------------------------------------------------
class TestExtractUrls:
    def test_finds_urls_in_markdown(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        path = make_conversation(
            tmp_wiki,
            "test",
            "test.md",
            body="See https://docs.python.org/3/library/os.html for details.\n"
            "Also https://fastapi.tiangolo.com/tutorial/.\n",
        )
        urls = wiki_harvest.extract_urls_from_file(path)
        assert "https://docs.python.org/3/library/os.html" in urls
        assert "https://fastapi.tiangolo.com/tutorial/" in urls

    def test_filters_asset_extensions(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        path = make_conversation(
            tmp_wiki,
            "test",
            "assets.md",
            body=(
                "Real: https://example.com/docs/article.html\n"
                "Image: https://example.com/logo.png\n"
                "Script: https://cdn.example.com/lib.js\n"
                "Font: https://fonts.example.com/face.woff2\n"
            ),
        )
        urls = wiki_harvest.extract_urls_from_file(path)
        assert "https://example.com/docs/article.html" in urls
        assert not any(u.endswith(".png") for u in urls)
        assert not any(u.endswith(".js") for u in urls)
        assert not any(u.endswith(".woff2") for u in urls)

    def test_strips_trailing_punctuation(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        path = make_conversation(
            tmp_wiki,
            "test",
            "punct.md",
            body="See https://example.com/foo. Also https://example.com/bar, and more.\n",
        )
        urls = wiki_harvest.extract_urls_from_file(path)
        assert "https://example.com/foo" in urls
        assert "https://example.com/bar" in urls

    def test_deduplicates_within_file(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        path = make_conversation(
            tmp_wiki,
            "test",
            "dup.md",
            body=(
                "First mention: https://example.com/same\n"
                "Second mention: https://example.com/same\n"
            ),
        )
        urls = wiki_harvest.extract_urls_from_file(path)
        assert urls.count("https://example.com/same") == 1

    def test_returns_empty_for_missing_file(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        assert wiki_harvest.extract_urls_from_file(tmp_wiki / "nope.md") == []

    def test_filters_short_urls(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        # URLs shorter than 20 chars are skipped
        path = make_conversation(
            tmp_wiki,
            "test",
            "short.md",
            body="tiny http://a.b/ and https://example.com/long-path\n",
        )
        urls = wiki_harvest.extract_urls_from_file(path)
        assert "http://a.b/" not in urls
        assert "https://example.com/long-path" in urls
# ---------------------------------------------------------------------------
# Raw filename derivation
# ---------------------------------------------------------------------------
class TestRawFilename:
def test_basic_url(self, wiki_harvest: Any) -> None:
name = wiki_harvest.raw_filename_for_url("https://docs.docker.com/build/multi-stage/")
assert name.startswith("docs-docker-com-")
assert "build" in name and "multi-stage" in name
assert name.endswith(".md")
def test_strips_www(self, wiki_harvest: Any) -> None:
name = wiki_harvest.raw_filename_for_url("https://www.example.com/foo")
assert "www" not in name
def test_root_url_uses_index(self, wiki_harvest: Any) -> None:
name = wiki_harvest.raw_filename_for_url("https://example.com/")
assert name == "example-com-index.md"
def test_long_paths_truncated(self, wiki_harvest: Any) -> None:
long_url = "https://example.com/" + "a-very-long-segment/" * 20
name = wiki_harvest.raw_filename_for_url(long_url)
assert len(name) < 200
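The slug rules these tests exercise can be sketched as follows; the 180-character cap and the exact separator handling are assumptions, chosen only to satisfy the assertions above:

```python
from urllib.parse import urlparse

def raw_filename_for_url(url: str) -> str:
    parsed = urlparse(url)
    # Drop "www.", turn dots and slashes into hyphens
    host = parsed.netloc.removeprefix("www.").replace(".", "-")
    path = parsed.path.strip("/").replace("/", "-").replace(".", "-")
    slug = f"{host}-{path}" if path else f"{host}-index"
    return slug[:180].rstrip("-") + ".md"  # assumed length cap
```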
# ---------------------------------------------------------------------------
# Content validation
# ---------------------------------------------------------------------------
class TestValidateContent:
def test_accepts_clean_markdown(self, wiki_harvest: Any) -> None:
content = "# Title\n\n" + ("A clean paragraph of markdown content. " * 5)
assert wiki_harvest.validate_content(content) is True
def test_rejects_empty(self, wiki_harvest: Any) -> None:
assert wiki_harvest.validate_content("") is False
def test_rejects_too_short(self, wiki_harvest: Any) -> None:
assert wiki_harvest.validate_content("# Short") is False
def test_rejects_html_leak(self, wiki_harvest: Any) -> None:
content = "# Title\n\n<div class='nav'>Navigation</div>\n" + "content " * 30
assert wiki_harvest.validate_content(content) is False
def test_rejects_script_tag(self, wiki_harvest: Any) -> None:
content = "# Title\n\n<script>alert()</script>\n" + "content " * 30
assert wiki_harvest.validate_content(content) is False
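Content validation rejects pages that are empty, too short, or show signs of an HTML leak from the extractor. A sketch under stated assumptions: the 100-character minimum is a guess (the tests only pin it between 7 and roughly 200 characters), and the tag list is illustrative:

```python
import re

MIN_CHARS = 100  # assumed threshold
HTML_LEAK_RE = re.compile(r"<(div|span|nav|script|iframe|footer)\b", re.I)

def validate_content(content: str) -> bool:
    if len(content.strip()) < MIN_CHARS:
        return False   # empty or too short to be a real article
    if HTML_LEAK_RE.search(content):
        return False   # extractor leaked raw HTML into the markdown
    return True
```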
# ---------------------------------------------------------------------------
# State management
# ---------------------------------------------------------------------------
class TestStateManagement:
def test_load_returns_defaults_when_file_empty(
self, wiki_harvest: Any, tmp_wiki: Path
) -> None:
(tmp_wiki / ".harvest-state.json").write_text("{}")
state = wiki_harvest.load_state()
assert "harvested_urls" in state
assert "skipped_urls" in state
def test_save_and_reload(
self, wiki_harvest: Any, tmp_wiki: Path
) -> None:
state = wiki_harvest.load_state()
state["harvested_urls"]["https://example.com"] = {
"first_seen": "2026-04-12",
"seen_in": ["conversations/mc/foo.md"],
"raw_file": "raw/harvested/example.md",
"status": "raw",
"fetch_method": "trafilatura",
}
wiki_harvest.save_state(state)
reloaded = wiki_harvest.load_state()
assert "https://example.com" in reloaded["harvested_urls"]
assert reloaded["last_run"] is not None
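The state round-trip these tests check is a defaults-merge on load plus a timestamp on save. A sketch (the real functions take no path argument and resolve it from `WIKI_DIR`; the explicit parameter here is for illustration only):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def load_state(path: Path) -> dict:
    # Missing keys fall back to defaults, so an empty "{}" file still works
    state = {"harvested_urls": {}, "skipped_urls": {}, "last_run": None}
    if path.exists():
        state.update(json.loads(path.read_text() or "{}"))
    return state

def save_state(state: dict, path: Path) -> None:
    state["last_run"] = datetime.now(timezone.utc).isoformat()
    path.write_text(json.dumps(state, indent=2) + "\n")
```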
# ---------------------------------------------------------------------------
# Raw file writer
# ---------------------------------------------------------------------------
class TestWriteRawFile:
def test_writes_with_frontmatter(
self, wiki_harvest: Any, tmp_wiki: Path
) -> None:
conv = make_conversation(tmp_wiki, "test", "source.md")
raw_path = wiki_harvest.write_raw_file(
"https://example.com/article",
"# Article\n\nClean content.\n",
"trafilatura",
conv,
)
assert raw_path.exists()
text = raw_path.read_text()
assert "source_url: https://example.com/article" in text
assert "fetch_method: trafilatura" in text
assert "content_hash: sha256:" in text
assert "discovered_in: conversations/test/source.md" in text
# ---------------------------------------------------------------------------
# Dry-run CLI smoke test (no actual fetches)
# ---------------------------------------------------------------------------
class TestHarvestCli:
def test_dry_run_no_network_calls(
self, run_script, tmp_wiki: Path
) -> None:
make_conversation(
tmp_wiki,
"test",
"test.md",
body="See https://docs.python.org/3/ and https://github.com/foo/bar/issues/1.\n",
)
result = run_script("wiki-harvest.py", "--dry-run")
assert result.returncode == 0
# Dry-run should classify without fetching
assert "would-harvest" in result.stdout or "Summary" in result.stdout
def test_help_flag(self, run_script) -> None:
result = run_script("wiki-harvest.py", "--help")
assert result.returncode == 0
assert "--dry-run" in result.stdout
assert "--no-compile" in result.stdout

616
tests/test_wiki_hygiene.py Normal file

@@ -0,0 +1,616 @@
"""Integration tests for scripts/wiki-hygiene.py.
Uses the tmp_wiki fixture so tests never touch the real wiki.
"""
from __future__ import annotations
from datetime import date, timedelta
from pathlib import Path
from typing import Any
import pytest
from conftest import make_conversation, make_page, make_staging_page
# ---------------------------------------------------------------------------
# Backfill last_verified
# ---------------------------------------------------------------------------
class TestBackfill:
def test_sets_last_verified_from_last_compiled(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = make_page(tmp_wiki, "patterns/foo.md", last_compiled="2026-01-15")
# Strip last_verified from the fixture-built file
text = path.read_text()
text = text.replace("last_verified: 2026-04-01\n", "")
path.write_text(text)
changes = wiki_hygiene.backfill_last_verified()
assert len(changes) == 1
assert changes[0][1] == "last_compiled"
reparsed = wiki_hygiene.parse_page(path)
assert reparsed.frontmatter["last_verified"] == "2026-01-15"
def test_skips_pages_already_verified(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/done.md", last_verified="2026-04-01")
changes = wiki_hygiene.backfill_last_verified()
assert changes == []
def test_dry_run_does_not_write(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = make_page(tmp_wiki, "patterns/foo.md", last_compiled="2026-01-15")
text = path.read_text().replace("last_verified: 2026-04-01\n", "")
path.write_text(text)
changes = wiki_hygiene.backfill_last_verified(dry_run=True)
assert len(changes) == 1
reparsed = wiki_hygiene.parse_page(path)
assert "last_verified" not in reparsed.frontmatter
# ---------------------------------------------------------------------------
# Confidence decay math
# ---------------------------------------------------------------------------
class TestConfidenceDecay:
def test_recent_page_unchanged(self, wiki_hygiene: Any) -> None:
recent = wiki_hygiene.today() - timedelta(days=30)
assert wiki_hygiene.expected_confidence("high", recent, False) == "high"
def test_six_months_decays_high_to_medium(self, wiki_hygiene: Any) -> None:
old = wiki_hygiene.today() - timedelta(days=200)
assert wiki_hygiene.expected_confidence("high", old, False) == "medium"
def test_nine_months_decays_medium_to_low(self, wiki_hygiene: Any) -> None:
old = wiki_hygiene.today() - timedelta(days=280)
assert wiki_hygiene.expected_confidence("medium", old, False) == "low"
def test_twelve_months_decays_to_stale(self, wiki_hygiene: Any) -> None:
old = wiki_hygiene.today() - timedelta(days=400)
assert wiki_hygiene.expected_confidence("high", old, False) == "stale"
def test_superseded_is_always_stale(self, wiki_hygiene: Any) -> None:
recent = wiki_hygiene.today() - timedelta(days=1)
assert wiki_hygiene.expected_confidence("high", recent, True) == "stale"
def test_none_date_leaves_confidence_alone(self, wiki_hygiene: Any) -> None:
assert wiki_hygiene.expected_confidence("medium", None, False) == "medium"
def test_bump_confidence_ladder(self, wiki_hygiene: Any) -> None:
assert wiki_hygiene.bump_confidence("stale") == "low"
assert wiki_hygiene.bump_confidence("low") == "medium"
assert wiki_hygiene.bump_confidence("medium") == "high"
assert wiki_hygiene.bump_confidence("high") == "high"
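The decay math above can be read as an age-based cap on confidence plus a saturating one-step bump. A minimal sketch consistent with the assertions: the exact cutoffs (180/270/365 days, i.e. roughly 6/9/12 months) are assumptions, since the tests only probe 30, 200, 280, and 400 days:

```python
from datetime import date

LADDER = ["stale", "low", "medium", "high"]
CAPS = [(365, "stale"), (270, "low"), (180, "medium")]  # assumed cutoffs

def expected_confidence(current: str, last_verified, superseded: bool) -> str:
    if superseded:
        return "stale"          # superseded pages are always stale
    if last_verified is None:
        return current          # no date: leave confidence alone
    age = (date.today() - last_verified).days
    for days, cap in CAPS:
        if age > days:
            # Decay = cap confidence at the level this age allows
            return min(current, cap, key=LADDER.index)
    return current

def bump_confidence(current: str) -> str:
    # One step up the ladder, saturating at "high"
    return LADDER[min(LADDER.index(current) + 1, len(LADDER) - 1)]
```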
# ---------------------------------------------------------------------------
# Frontmatter repair
# ---------------------------------------------------------------------------
class TestFrontmatterRepair:
def test_adds_missing_confidence(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = tmp_wiki / "patterns" / "no-conf.md"
path.write_text(
"---\ntitle: No Confidence\ntype: pattern\n"
"last_compiled: 2026-04-01\nlast_verified: 2026-04-01\n---\n"
"# Body\n\nSubstantive content here for testing purposes.\n"
)
changes = wiki_hygiene.repair_frontmatter()
assert any("confidence" in fields for _, fields in changes)
reparsed = wiki_hygiene.parse_page(path)
assert reparsed.frontmatter["confidence"] == "medium"
def test_fixes_invalid_confidence(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = make_page(tmp_wiki, "patterns/bad-conf.md", confidence="wat")
changes = wiki_hygiene.repair_frontmatter()
assert any(p == path for p, _ in changes)
reparsed = wiki_hygiene.parse_page(path)
assert reparsed.frontmatter["confidence"] == "medium"
def test_leaves_valid_pages_alone(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/good.md")
changes = wiki_hygiene.repair_frontmatter()
assert changes == []
# ---------------------------------------------------------------------------
# Archive and restore round-trip
# ---------------------------------------------------------------------------
class TestArchiveRestore:
def test_archive_moves_file_and_updates_frontmatter(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = make_page(tmp_wiki, "patterns/doomed.md")
page = wiki_hygiene.parse_page(path)
wiki_hygiene.archive_page(page, "test archive")
assert not path.exists()
archived = tmp_wiki / "archive" / "patterns" / "doomed.md"
assert archived.exists()
reparsed = wiki_hygiene.parse_page(archived)
assert reparsed.frontmatter["archived_reason"] == "test archive"
assert reparsed.frontmatter["original_path"] == "patterns/doomed.md"
assert reparsed.frontmatter["confidence"] == "stale"
def test_restore_reverses_archive(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
original = make_page(tmp_wiki, "patterns/zombie.md")
page = wiki_hygiene.parse_page(original)
wiki_hygiene.archive_page(page, "test")
archived = tmp_wiki / "archive" / "patterns" / "zombie.md"
archived_page = wiki_hygiene.parse_page(archived)
wiki_hygiene.restore_page(archived_page)
assert original.exists()
assert not archived.exists()
reparsed = wiki_hygiene.parse_page(original)
assert reparsed.frontmatter["confidence"] == "medium"
assert "archived_date" not in reparsed.frontmatter
assert "archived_reason" not in reparsed.frontmatter
assert "original_path" not in reparsed.frontmatter
def test_archive_rejects_non_live_pages(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
# Page outside the live content dirs — should refuse to archive
weird = tmp_wiki / "raw" / "weird.md"
weird.parent.mkdir(parents=True, exist_ok=True)
weird.write_text("---\ntitle: Weird\n---\nBody\n")
page = wiki_hygiene.parse_page(weird)
result = wiki_hygiene.archive_page(page, "test")
assert result is None
def test_archive_dry_run_does_not_move(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = make_page(tmp_wiki, "patterns/safe.md")
page = wiki_hygiene.parse_page(path)
wiki_hygiene.archive_page(page, "test", dry_run=True)
assert path.exists()
assert not (tmp_wiki / "archive" / "patterns" / "safe.md").exists()
# ---------------------------------------------------------------------------
# Orphan detection
# ---------------------------------------------------------------------------
class TestOrphanDetection:
def test_finds_orphan_page(self, wiki_hygiene: Any, tmp_wiki: Path) -> None:
make_page(tmp_wiki, "patterns/lonely.md")
orphans = wiki_hygiene.find_orphan_pages()
assert len(orphans) == 1
assert orphans[0].path.stem == "lonely"
def test_page_referenced_in_index_is_not_orphan(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/linked.md")
idx = tmp_wiki / "index.md"
idx.write_text(idx.read_text() + "- [Linked](patterns/linked.md) — desc\n")
orphans = wiki_hygiene.find_orphan_pages()
assert not any(p.path.stem == "linked" for p in orphans)
def test_page_referenced_in_related_is_not_orphan(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/referenced.md")
make_page(
tmp_wiki,
"patterns/referencer.md",
related=["patterns/referenced.md"],
)
orphans = wiki_hygiene.find_orphan_pages()
stems = {p.path.stem for p in orphans}
assert "referenced" not in stems
def test_fix_orphan_adds_to_index(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = make_page(tmp_wiki, "patterns/orphan.md", title="Orphan Test")
page = wiki_hygiene.parse_page(path)
wiki_hygiene.fix_orphan_page(page)
idx_text = (tmp_wiki / "index.md").read_text()
assert "patterns/orphan.md" in idx_text
# ---------------------------------------------------------------------------
# Broken cross-references
# ---------------------------------------------------------------------------
class TestBrokenCrossRefs:
def test_detects_broken_link(self, wiki_hygiene: Any, tmp_wiki: Path) -> None:
make_page(
tmp_wiki,
"patterns/source.md",
body="See [nonexistent](patterns/does-not-exist.md) for details.\n",
)
broken = wiki_hygiene.find_broken_cross_refs()
assert len(broken) == 1
target, bad, suggested = broken[0]
assert bad == "patterns/does-not-exist.md"
def test_fuzzy_match_finds_near_miss(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/health-endpoint.md")
make_page(
tmp_wiki,
"patterns/source.md",
body="See [H](patterns/health-endpoints.md) — typo.\n",
)
broken = wiki_hygiene.find_broken_cross_refs()
assert len(broken) >= 1
_, bad, suggested = broken[0]
assert suggested == "patterns/health-endpoint.md"
def test_fix_broken_xref(self, wiki_hygiene: Any, tmp_wiki: Path) -> None:
make_page(tmp_wiki, "patterns/health-endpoint.md")
src = make_page(
tmp_wiki,
"patterns/source.md",
body="See [H](patterns/health-endpoints.md).\n",
)
broken = wiki_hygiene.find_broken_cross_refs()
for target, bad, suggested in broken:
wiki_hygiene.fix_broken_cross_ref(target, bad, suggested)
text = src.read_text()
assert "patterns/health-endpoints.md" not in text
assert "patterns/health-endpoint.md" in text
def test_archived_link_triggers_restore(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
# Page in archive, referenced by a live page
make_page(
tmp_wiki,
"archive/patterns/ghost.md",
confidence="stale",
extra_fm={
"archived_date": "2026-01-01",
"archived_reason": "test",
"original_path": "patterns/ghost.md",
},
)
make_page(
tmp_wiki,
"patterns/caller.md",
body="See [ghost](patterns/ghost.md).\n",
)
broken = wiki_hygiene.find_broken_cross_refs()
assert len(broken) >= 1
for target, bad, suggested in broken:
if suggested and suggested.startswith("__RESTORE__"):
wiki_hygiene.fix_broken_cross_ref(target, bad, suggested)
# After restore, ghost should be live again
assert (tmp_wiki / "patterns" / "ghost.md").exists()
# ---------------------------------------------------------------------------
# Index drift
# ---------------------------------------------------------------------------
class TestIndexDrift:
def test_finds_page_missing_from_index(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/missing.md")
missing, stale = wiki_hygiene.find_index_drift()
assert "patterns/missing.md" in missing
assert stale == []
def test_finds_stale_index_entry(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
idx = tmp_wiki / "index.md"
idx.write_text(
idx.read_text()
+ "- [Ghost](patterns/ghost.md) — page that no longer exists\n"
)
missing, stale = wiki_hygiene.find_index_drift()
assert "patterns/ghost.md" in stale
def test_fix_adds_missing_and_removes_stale(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/new.md")
idx = tmp_wiki / "index.md"
idx.write_text(
idx.read_text()
+ "- [Gone](patterns/gone.md) — deleted page\n"
)
missing, stale = wiki_hygiene.find_index_drift()
wiki_hygiene.fix_index_drift(missing, stale)
idx_text = idx.read_text()
assert "patterns/new.md" in idx_text
assert "patterns/gone.md" not in idx_text
# ---------------------------------------------------------------------------
# Empty stubs
# ---------------------------------------------------------------------------
class TestEmptyStubs:
def test_flags_small_body(self, wiki_hygiene: Any, tmp_wiki: Path) -> None:
make_page(tmp_wiki, "patterns/stub.md", body="# Stub\n\nShort.\n")
stubs = wiki_hygiene.find_empty_stubs()
assert len(stubs) == 1
assert stubs[0].path.stem == "stub"
def test_ignores_substantive_pages(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
body = "# Full\n\n" + ("This is substantive content. " * 20) + "\n"
make_page(tmp_wiki, "patterns/full.md", body=body)
stubs = wiki_hygiene.find_empty_stubs()
assert stubs == []
# ---------------------------------------------------------------------------
# Conversation refresh signals
# ---------------------------------------------------------------------------
class TestConversationRefreshSignals:
def test_picks_up_related_link(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/hot.md", last_verified="2026-01-01")
make_conversation(
tmp_wiki,
"test",
"2026-04-11-abc.md",
date="2026-04-11",
related=["patterns/hot.md"],
)
refs = wiki_hygiene.scan_conversation_references()
assert "patterns/hot.md" in refs
assert refs["patterns/hot.md"] == date(2026, 4, 11)
def test_apply_refresh_updates_last_verified(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = make_page(tmp_wiki, "patterns/hot.md", last_verified="2026-01-01")
make_conversation(
tmp_wiki,
"test",
"2026-04-11-abc.md",
date="2026-04-11",
related=["patterns/hot.md"],
)
refs = wiki_hygiene.scan_conversation_references()
changes = wiki_hygiene.apply_refresh_signals(refs)
assert len(changes) == 1
reparsed = wiki_hygiene.parse_page(path)
assert reparsed.frontmatter["last_verified"] == "2026-04-11"
def test_bumps_low_confidence_to_medium(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = make_page(
tmp_wiki,
"patterns/reviving.md",
confidence="low",
last_verified="2026-01-01",
)
make_conversation(
tmp_wiki,
"test",
"2026-04-11-ref.md",
date="2026-04-11",
related=["patterns/reviving.md"],
)
refs = wiki_hygiene.scan_conversation_references()
wiki_hygiene.apply_refresh_signals(refs)
reparsed = wiki_hygiene.parse_page(path)
assert reparsed.frontmatter["confidence"] == "medium"
# ---------------------------------------------------------------------------
# Auto-restore
# ---------------------------------------------------------------------------
class TestAutoRestore:
def test_restores_page_referenced_in_conversation(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
# Archive a page
path = make_page(tmp_wiki, "patterns/returning.md")
page = wiki_hygiene.parse_page(path)
wiki_hygiene.archive_page(page, "aging out")
assert (tmp_wiki / "archive" / "patterns" / "returning.md").exists()
# Reference it in a conversation
make_conversation(
tmp_wiki,
"test",
"2026-04-12-ref.md",
related=["patterns/returning.md"],
)
# Auto-restore
restored = wiki_hygiene.auto_restore_archived()
assert len(restored) == 1
assert (tmp_wiki / "patterns" / "returning.md").exists()
assert not (tmp_wiki / "archive" / "patterns" / "returning.md").exists()
# ---------------------------------------------------------------------------
# Staging / archive index sync
# ---------------------------------------------------------------------------
class TestIndexSync:
def test_staging_sync_regenerates_index(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/pending.md")
changed = wiki_hygiene.sync_staging_index()
assert changed is True
text = (tmp_wiki / "staging" / "index.md").read_text()
assert "pending.md" in text
def test_staging_sync_idempotent(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/pending.md")
wiki_hygiene.sync_staging_index()
changed_second = wiki_hygiene.sync_staging_index()
assert changed_second is False
def test_archive_sync_regenerates_index(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_page(
tmp_wiki,
"archive/patterns/old.md",
confidence="stale",
extra_fm={
"archived_date": "2026-01-01",
"archived_reason": "test",
"original_path": "patterns/old.md",
},
)
changed = wiki_hygiene.sync_archive_index()
assert changed is True
text = (tmp_wiki / "archive" / "index.md").read_text()
assert "old" in text.lower()
# ---------------------------------------------------------------------------
# State drift detection
# ---------------------------------------------------------------------------
class TestStateDrift:
def test_detects_missing_raw_file(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
import json
state = {
"harvested_urls": {
"https://example.com": {
"raw_file": "raw/harvested/missing.md",
"wiki_pages": [],
}
}
}
(tmp_wiki / ".harvest-state.json").write_text(json.dumps(state))
issues = wiki_hygiene.find_state_drift()
assert any("missing.md" in i for i in issues)
def test_empty_state_has_no_drift(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
# Fixture already creates an empty .harvest-state.json
issues = wiki_hygiene.find_state_drift()
assert issues == []
# ---------------------------------------------------------------------------
# Hygiene state file
# ---------------------------------------------------------------------------
class TestHygieneState:
def test_load_returns_defaults_when_missing(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
state = wiki_hygiene.load_hygiene_state()
assert state["last_quick_run"] is None
assert state["pages_checked"] == {}
def test_save_and_reload(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
state = wiki_hygiene.load_hygiene_state()
state["last_quick_run"] = "2026-04-12T00:00:00Z"
wiki_hygiene.save_hygiene_state(state)
reloaded = wiki_hygiene.load_hygiene_state()
assert reloaded["last_quick_run"] == "2026-04-12T00:00:00Z"
def test_mark_page_checked_stores_hash(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = make_page(tmp_wiki, "patterns/tracked.md")
page = wiki_hygiene.parse_page(path)
state = wiki_hygiene.load_hygiene_state()
wiki_hygiene.mark_page_checked(state, page, "quick")
entry = state["pages_checked"]["patterns/tracked.md"]
assert entry["content_hash"].startswith("sha256:")
assert "last_checked_quick" in entry
def test_page_changed_since_detects_body_change(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = make_page(tmp_wiki, "patterns/mutable.md", body="# One\n\nOne body.\n")
page = wiki_hygiene.parse_page(path)
state = wiki_hygiene.load_hygiene_state()
wiki_hygiene.mark_page_checked(state, page, "quick")
assert not wiki_hygiene.page_changed_since(state, page, "quick")
# Mutate the body
path.write_text(path.read_text().replace("One body", "Two body"))
new_page = wiki_hygiene.parse_page(path)
assert wiki_hygiene.page_changed_since(state, new_page, "quick")
# ---------------------------------------------------------------------------
# Full quick-hygiene run end-to-end (dry-run, idempotent)
# ---------------------------------------------------------------------------
class TestRunQuickHygiene:
def test_empty_wiki_produces_empty_report(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
report = wiki_hygiene.run_quick_hygiene(dry_run=True)
assert report.backfilled == []
assert report.archived == []
def test_real_run_is_idempotent(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/one.md")
make_page(tmp_wiki, "patterns/two.md")
report1 = wiki_hygiene.run_quick_hygiene()
        # Second run should find no remaining work
report2 = wiki_hygiene.run_quick_hygiene()
assert report2.backfilled == []
assert report2.decayed == []
assert report2.archived == []
assert report2.frontmatter_fixes == []

314
tests/test_wiki_lib.py Normal file

@@ -0,0 +1,314 @@
"""Unit tests for scripts/wiki_lib.py — the shared frontmatter library."""
from __future__ import annotations
from datetime import date
from pathlib import Path
from typing import Any
import pytest
from conftest import make_page, make_staging_page
# ---------------------------------------------------------------------------
# parse_yaml_lite
# ---------------------------------------------------------------------------
class TestParseYamlLite:
def test_simple_key_value(self, wiki_lib: Any) -> None:
result = wiki_lib.parse_yaml_lite("title: Hello\ntype: pattern\n")
assert result == {"title": "Hello", "type": "pattern"}
def test_quoted_values_are_stripped(self, wiki_lib: Any) -> None:
result = wiki_lib.parse_yaml_lite('title: "Hello"\nother: \'World\'\n')
assert result["title"] == "Hello"
assert result["other"] == "World"
def test_inline_list(self, wiki_lib: Any) -> None:
result = wiki_lib.parse_yaml_lite("tags: [a, b, c]\n")
assert result["tags"] == ["a", "b", "c"]
def test_empty_inline_list(self, wiki_lib: Any) -> None:
result = wiki_lib.parse_yaml_lite("sources: []\n")
assert result["sources"] == []
def test_block_list(self, wiki_lib: Any) -> None:
yaml = "related:\n - foo.md\n - bar.md\n - baz.md\n"
result = wiki_lib.parse_yaml_lite(yaml)
assert result["related"] == ["foo.md", "bar.md", "baz.md"]
def test_mixed_keys(self, wiki_lib: Any) -> None:
yaml = (
"title: Mixed\n"
"type: pattern\n"
"related:\n"
" - one.md\n"
" - two.md\n"
"confidence: high\n"
)
result = wiki_lib.parse_yaml_lite(yaml)
assert result["title"] == "Mixed"
assert result["related"] == ["one.md", "two.md"]
assert result["confidence"] == "high"
def test_empty_value(self, wiki_lib: Any) -> None:
result = wiki_lib.parse_yaml_lite("empty: \n")
assert result["empty"] == ""
def test_comment_lines_ignored(self, wiki_lib: Any) -> None:
result = wiki_lib.parse_yaml_lite("# this is a comment\ntitle: X\n")
assert result == {"title": "X"}
def test_blank_lines_ignored(self, wiki_lib: Any) -> None:
result = wiki_lib.parse_yaml_lite("\ntitle: X\n\ntype: pattern\n\n")
assert result == {"title": "X", "type": "pattern"}
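These tests fully pin the YAML subset the parser supports: scalar key/value pairs, quoted scalars, inline and block lists, empty values, comments, and blank lines. A sketch reconstructed from those assertions alone (the real parser in `wiki_lib.py` may differ in details):

```python
def parse_yaml_lite(text: str) -> dict:
    result: dict = {}
    pending_key = None  # key whose value may turn out to be a block list
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        if line.startswith("- ") and pending_key is not None:
            if not isinstance(result[pending_key], list):
                result[pending_key] = []  # promote "" to a block list
            result[pending_key].append(line[2:].strip())
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        pending_key = None
        if value == "":
            result[key] = ""              # stays "" unless "- item" follows
            pending_key = key
        elif value.startswith("[") and value.endswith("]"):
            inner = value[1:-1].strip()   # inline list, possibly empty
            result[key] = [v.strip() for v in inner.split(",")] if inner else []
        else:
            result[key] = value.strip("'\"")
    return result
```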
# ---------------------------------------------------------------------------
# parse_page
# ---------------------------------------------------------------------------
class TestParsePage:
def test_parses_valid_page(self, wiki_lib: Any, tmp_wiki: Path) -> None:
path = make_page(tmp_wiki, "patterns/foo.md", title="Foo", confidence="high")
page = wiki_lib.parse_page(path)
assert page is not None
assert page.frontmatter["title"] == "Foo"
assert page.frontmatter["confidence"] == "high"
assert "# Content" in page.body
def test_returns_none_without_frontmatter(
self, wiki_lib: Any, tmp_wiki: Path
) -> None:
path = tmp_wiki / "patterns" / "no-fm.md"
path.write_text("# Just a body\n\nNo frontmatter.\n")
assert wiki_lib.parse_page(path) is None
def test_returns_none_for_missing_file(self, wiki_lib: Any, tmp_wiki: Path) -> None:
assert wiki_lib.parse_page(tmp_wiki / "nonexistent.md") is None
def test_returns_none_for_truncated_frontmatter(
self, wiki_lib: Any, tmp_wiki: Path
) -> None:
path = tmp_wiki / "patterns" / "broken.md"
path.write_text("---\ntitle: Broken\n# never closed\n")
assert wiki_lib.parse_page(path) is None
def test_preserves_body_exactly(self, wiki_lib: Any, tmp_wiki: Path) -> None:
body = "# Heading\n\nLine 1\nLine 2\n\n## Sub\n\nMore.\n"
path = make_page(tmp_wiki, "patterns/body.md", body=body)
page = wiki_lib.parse_page(path)
assert page.body == body
# ---------------------------------------------------------------------------
# serialize_frontmatter
# ---------------------------------------------------------------------------
class TestSerializeFrontmatter:
def test_preferred_key_order(self, wiki_lib: Any) -> None:
fm = {
"related": ["a.md"],
"sources": ["raw/x.md"],
"title": "T",
"confidence": "high",
"type": "pattern",
}
yaml = wiki_lib.serialize_frontmatter(fm)
lines = yaml.split("\n")
# title/type/confidence should come before sources/related
assert lines[0].startswith("title:")
assert lines[1].startswith("type:")
assert lines[2].startswith("confidence:")
assert "sources:" in yaml
assert "related:" in yaml
# sources must come before related (both are in PREFERRED_KEY_ORDER)
assert yaml.index("sources:") < yaml.index("related:")
def test_list_formatted_as_block(self, wiki_lib: Any) -> None:
fm = {"title": "T", "related": ["one.md", "two.md"]}
yaml = wiki_lib.serialize_frontmatter(fm)
assert "related:\n - one.md\n - two.md" in yaml
def test_empty_list(self, wiki_lib: Any) -> None:
fm = {"title": "T", "sources": []}
yaml = wiki_lib.serialize_frontmatter(fm)
assert "sources: []" in yaml
def test_unknown_keys_appear_alphabetically_at_end(self, wiki_lib: Any) -> None:
fm = {"title": "T", "type": "pattern", "zoo": "z", "alpha": "a"}
yaml = wiki_lib.serialize_frontmatter(fm)
# alpha should come before zoo (alphabetical)
assert yaml.index("alpha:") < yaml.index("zoo:")
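Serialization is the inverse: known keys in a preferred order, lists as two-space block items (empty lists inline), and unknown keys alphabetical at the end. A sketch; the full `PREFERRED_KEY_ORDER` is an assumption beyond the five keys the tests order explicitly:

```python
PREFERRED_KEY_ORDER = [
    "title", "type", "confidence",
    "last_compiled", "last_verified",  # assumed middle entries
    "sources", "related",
]

def serialize_frontmatter(fm: dict) -> str:
    keys = [k for k in PREFERRED_KEY_ORDER if k in fm]
    keys += sorted(k for k in fm if k not in PREFERRED_KEY_ORDER)
    lines: list[str] = []
    for key in keys:
        value = fm[key]
        if isinstance(value, list):
            if not value:
                lines.append(f"{key}: []")         # empty list stays inline
            else:
                lines.append(f"{key}:")            # block list
                lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines)
```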
# ---------------------------------------------------------------------------
# Round-trip: parse_page → write_page → parse_page
# ---------------------------------------------------------------------------
class TestRoundTrip:
def test_round_trip_preserves_core_fields(
self, wiki_lib: Any, tmp_wiki: Path
) -> None:
path = make_page(
tmp_wiki,
"patterns/rt.md",
title="Round Trip",
sources=["raw/a.md", "raw/b.md"],
related=["patterns/other.md"],
)
page1 = wiki_lib.parse_page(path)
wiki_lib.write_page(page1)
page2 = wiki_lib.parse_page(path)
assert page2.frontmatter["title"] == "Round Trip"
assert page2.frontmatter["sources"] == ["raw/a.md", "raw/b.md"]
assert page2.frontmatter["related"] == ["patterns/other.md"]
assert page2.body == page1.body
def test_round_trip_preserves_mutation(
self, wiki_lib: Any, tmp_wiki: Path
) -> None:
path = make_page(tmp_wiki, "patterns/rt.md", confidence="high")
page = wiki_lib.parse_page(path)
page.frontmatter["confidence"] = "low"
wiki_lib.write_page(page)
page2 = wiki_lib.parse_page(path)
assert page2.frontmatter["confidence"] == "low"
# ---------------------------------------------------------------------------
# parse_date
# ---------------------------------------------------------------------------
class TestParseDate:
def test_iso_format(self, wiki_lib: Any) -> None:
assert wiki_lib.parse_date("2026-04-10") == date(2026, 4, 10)
def test_empty_string_returns_none(self, wiki_lib: Any) -> None:
assert wiki_lib.parse_date("") is None
def test_none_returns_none(self, wiki_lib: Any) -> None:
assert wiki_lib.parse_date(None) is None
def test_invalid_format_returns_none(self, wiki_lib: Any) -> None:
assert wiki_lib.parse_date("not-a-date") is None
assert wiki_lib.parse_date("2026/04/10") is None
assert wiki_lib.parse_date("04-10-2026") is None
def test_date_object_passthrough(self, wiki_lib: Any) -> None:
d = date(2026, 4, 10)
assert wiki_lib.parse_date(d) == d
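The date-parsing contract is small enough to state completely: accept a `date` object or an ISO `YYYY-MM-DD` string, and return `None` for anything else. A sketch matching every assertion above:

```python
from datetime import date, datetime

def parse_date(value):
    # Accepts an ISO "YYYY-MM-DD" string, a date object, or None/""
    if isinstance(value, date):
        return value
    if not value:
        return None
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except (TypeError, ValueError):
        return None
```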
# ---------------------------------------------------------------------------
# page_content_hash
# ---------------------------------------------------------------------------
class TestPageContentHash:
def test_deterministic(self, wiki_lib: Any, tmp_wiki: Path) -> None:
path = make_page(tmp_wiki, "patterns/h.md", body="# Same body\n\nLine.\n")
page = wiki_lib.parse_page(path)
h1 = wiki_lib.page_content_hash(page)
h2 = wiki_lib.page_content_hash(page)
assert h1 == h2
assert h1.startswith("sha256:")
def test_different_bodies_yield_different_hashes(
self, wiki_lib: Any, tmp_wiki: Path
) -> None:
p1 = make_page(tmp_wiki, "patterns/a.md", body="# A\n\nAlpha.\n")
p2 = make_page(tmp_wiki, "patterns/b.md", body="# B\n\nBeta.\n")
h1 = wiki_lib.page_content_hash(wiki_lib.parse_page(p1))
h2 = wiki_lib.page_content_hash(wiki_lib.parse_page(p2))
assert h1 != h2
def test_frontmatter_changes_dont_change_hash(
self, wiki_lib: Any, tmp_wiki: Path
) -> None:
"""Hash is body-only so mechanical frontmatter fixes don't churn it."""
path = make_page(tmp_wiki, "patterns/f.md", confidence="high")
page = wiki_lib.parse_page(path)
h1 = wiki_lib.page_content_hash(page)
page.frontmatter["confidence"] = "medium"
wiki_lib.write_page(page)
page2 = wiki_lib.parse_page(path)
h2 = wiki_lib.page_content_hash(page2)
assert h1 == h2
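The body-only hashing behavior these tests verify can be sketched in a few lines; hashing the body and ignoring the frontmatter is exactly what keeps mechanical frontmatter repairs from churning the hash:

```python
import hashlib

def page_content_hash(page) -> str:
    # Hash only the body, so frontmatter-only edits never change the hash
    digest = hashlib.sha256(page.body.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"
```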


# ---------------------------------------------------------------------------
# Iterators
# ---------------------------------------------------------------------------
class TestIterators:
def test_iter_live_pages_finds_all_types(
self, wiki_lib: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/p1.md")
make_page(tmp_wiki, "patterns/p2.md")
make_page(tmp_wiki, "decisions/d1.md")
make_page(tmp_wiki, "concepts/c1.md")
make_page(tmp_wiki, "environments/e1.md")
pages = wiki_lib.iter_live_pages()
assert len(pages) == 5
stems = {p.path.stem for p in pages}
assert stems == {"p1", "p2", "d1", "c1", "e1"}
def test_iter_live_pages_empty_wiki(
self, wiki_lib: Any, tmp_wiki: Path
) -> None:
assert wiki_lib.iter_live_pages() == []
def test_iter_staging_pages(self, wiki_lib: Any, tmp_wiki: Path) -> None:
make_staging_page(tmp_wiki, "patterns/s1.md")
make_staging_page(tmp_wiki, "decisions/s2.md", ptype="decision")
pages = wiki_lib.iter_staging_pages()
assert len(pages) == 2
assert all(p.frontmatter.get("status") == "pending" for p in pages)
def test_iter_archived_pages(self, wiki_lib: Any, tmp_wiki: Path) -> None:
make_page(
tmp_wiki,
"archive/patterns/old.md",
confidence="stale",
extra_fm={
"archived_date": "2026-01-01",
"archived_reason": "test",
"original_path": "patterns/old.md",
},
)
pages = wiki_lib.iter_archived_pages()
assert len(pages) == 1
assert pages[0].frontmatter["archived_reason"] == "test"
def test_iter_skips_malformed_pages(
self, wiki_lib: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/good.md")
(tmp_wiki / "patterns" / "no-fm.md").write_text("# Just a body\n")
pages = wiki_lib.iter_live_pages()
assert len(pages) == 1
assert pages[0].path.stem == "good"


# ---------------------------------------------------------------------------
# WIKI_DIR env var override
# ---------------------------------------------------------------------------
class TestWikiDirEnvVar:
def test_honors_env_var(self, wiki_lib: Any, tmp_wiki: Path) -> None:
"""The tmp_wiki fixture sets WIKI_DIR — verify wiki_lib picks it up."""
assert wiki_lib.WIKI_DIR == tmp_wiki
assert wiki_lib.STAGING_DIR == tmp_wiki / "staging"
assert wiki_lib.ARCHIVE_DIR == tmp_wiki / "archive"
assert wiki_lib.INDEX_FILE == tmp_wiki / "index.md"
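Taken together, the hash tests above pin down a contract: deterministic output, a `sha256:` prefix, and a digest computed from the page body only. A minimal sketch of a function satisfying that contract (an illustration, not the actual `wiki_lib` implementation) could look like:

```python
import hashlib


def page_content_hash(body: str) -> str:
    """Hash only the page body, so frontmatter-only edits never churn it."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"
```

Hashing the body rather than the whole file is what lets mechanical frontmatter repairs (confidence decay, backfill) run without invalidating change tracking.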

267
tests/test_wiki_staging.py Normal file
View File

@@ -0,0 +1,267 @@
"""Integration tests for scripts/wiki-staging.py."""
from __future__ import annotations
import json
from pathlib import Path
from typing import Any
import pytest
from conftest import make_page, make_staging_page


# ---------------------------------------------------------------------------
# List + page_summary
# ---------------------------------------------------------------------------
class TestListPending:
def test_empty_staging(self, wiki_staging: Any, tmp_wiki: Path) -> None:
assert wiki_staging.list_pending() == []
def test_finds_pages_in_all_type_subdirs(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/p.md", ptype="pattern")
make_staging_page(tmp_wiki, "decisions/d.md", ptype="decision")
make_staging_page(tmp_wiki, "concepts/c.md", ptype="concept")
pending = wiki_staging.list_pending()
assert len(pending) == 3
def test_skips_staging_index_md(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
(tmp_wiki / "staging" / "index.md").write_text(
"---\ntitle: Index\n---\n# staging index\n"
)
make_staging_page(tmp_wiki, "patterns/real.md")
pending = wiki_staging.list_pending()
assert len(pending) == 1
assert pending[0].path.stem == "real"
def test_page_summary_populates_all_fields(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(
tmp_wiki,
"patterns/sample.md",
title="Sample",
staged_by="wiki-harvest",
staged_date="2026-04-10",
target_path="patterns/sample.md",
)
pending = wiki_staging.list_pending()
summary = wiki_staging.page_summary(pending[0])
assert summary["title"] == "Sample"
assert summary["type"] == "pattern"
assert summary["staged_by"] == "wiki-harvest"
assert summary["target_path"] == "patterns/sample.md"
assert summary["modifies"] is None


# ---------------------------------------------------------------------------
# Promote
# ---------------------------------------------------------------------------
class TestPromote:
def test_moves_file_to_live(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/new.md", title="New Page")
page = wiki_staging.parse_page(tmp_wiki / "staging" / "patterns" / "new.md")
result = wiki_staging.promote(page)
assert result is not None
assert (tmp_wiki / "patterns" / "new.md").exists()
assert not (tmp_wiki / "staging" / "patterns" / "new.md").exists()
def test_strips_staging_only_fields(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/clean.md")
page = wiki_staging.parse_page(tmp_wiki / "staging" / "patterns" / "clean.md")
wiki_staging.promote(page)
promoted = wiki_staging.parse_page(tmp_wiki / "patterns" / "clean.md")
for field in ("status", "staged_date", "staged_by", "target_path", "compilation_notes"):
assert field not in promoted.frontmatter
def test_preserves_origin_automated(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/auto.md")
page = wiki_staging.parse_page(tmp_wiki / "staging" / "patterns" / "auto.md")
wiki_staging.promote(page)
promoted = wiki_staging.parse_page(tmp_wiki / "patterns" / "auto.md")
assert promoted.frontmatter["origin"] == "automated"
def test_updates_main_index(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/indexed.md", title="Indexed Page")
page = wiki_staging.parse_page(tmp_wiki / "staging" / "patterns" / "indexed.md")
wiki_staging.promote(page)
idx = (tmp_wiki / "index.md").read_text()
assert "patterns/indexed.md" in idx
def test_regenerates_staging_index(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/one.md")
make_staging_page(tmp_wiki, "patterns/two.md")
page = wiki_staging.parse_page(tmp_wiki / "staging" / "patterns" / "one.md")
wiki_staging.promote(page)
idx = (tmp_wiki / "staging" / "index.md").read_text()
assert "two.md" in idx
assert "1 pending" in idx
def test_dry_run_does_not_move(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/safe.md")
page = wiki_staging.parse_page(tmp_wiki / "staging" / "patterns" / "safe.md")
wiki_staging.promote(page, dry_run=True)
assert (tmp_wiki / "staging" / "patterns" / "safe.md").exists()
assert not (tmp_wiki / "patterns" / "safe.md").exists()


# ---------------------------------------------------------------------------
# Promote with modifies field
# ---------------------------------------------------------------------------
class TestPromoteUpdate:
def test_update_overwrites_existing_live_page(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
# Existing live page
make_page(
tmp_wiki,
"patterns/existing.md",
title="Old Title",
last_compiled="2026-01-01",
)
# Staging update with `modifies`
make_staging_page(
tmp_wiki,
"patterns/existing.md",
title="New Title",
modifies="patterns/existing.md",
target_path="patterns/existing.md",
)
page = wiki_staging.parse_page(
tmp_wiki / "staging" / "patterns" / "existing.md"
)
wiki_staging.promote(page)
live = wiki_staging.parse_page(tmp_wiki / "patterns" / "existing.md")
assert live.frontmatter["title"] == "New Title"


# ---------------------------------------------------------------------------
# Reject
# ---------------------------------------------------------------------------
class TestReject:
def test_deletes_file(self, wiki_staging: Any, tmp_wiki: Path) -> None:
path = make_staging_page(tmp_wiki, "patterns/bad.md")
page = wiki_staging.parse_page(path)
wiki_staging.reject(page, "duplicate")
assert not path.exists()
def test_records_rejection_in_harvest_state(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
# Create a raw harvested file with a source_url
raw = tmp_wiki / "raw" / "harvested" / "example-com-test.md"
raw.parent.mkdir(parents=True, exist_ok=True)
raw.write_text(
"---\n"
"source_url: https://example.com/test\n"
"fetched_date: 2026-04-10\n"
"fetch_method: trafilatura\n"
"discovered_in: conversations/mc/test.md\n"
"content_hash: sha256:abc\n"
"---\n"
"# Example\n"
)
# Create a staging page that references it
make_staging_page(tmp_wiki, "patterns/reject-me.md")
staging_path = tmp_wiki / "staging" / "patterns" / "reject-me.md"
# Inject sources so reject() finds the harvest_source
page = wiki_staging.parse_page(staging_path)
page.frontmatter["sources"] = ["raw/harvested/example-com-test.md"]
wiki_staging.write_page(page)
page = wiki_staging.parse_page(staging_path)
wiki_staging.reject(page, "test rejection")
state = json.loads((tmp_wiki / ".harvest-state.json").read_text())
assert "https://example.com/test" in state["rejected_urls"]
assert state["rejected_urls"]["https://example.com/test"]["reason"] == "test rejection"
def test_reject_dry_run_keeps_file(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
path = make_staging_page(tmp_wiki, "patterns/kept.md")
page = wiki_staging.parse_page(path)
wiki_staging.reject(page, "test", dry_run=True)
assert path.exists()


# ---------------------------------------------------------------------------
# Staging index regeneration
# ---------------------------------------------------------------------------
class TestStagingIndexRegen:
def test_empty_index_shows_none(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
wiki_staging.regenerate_staging_index()
idx = (tmp_wiki / "staging" / "index.md").read_text()
assert "0 pending" in idx
assert "No pending items" in idx
def test_lists_pending_items(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/a.md", title="A")
make_staging_page(tmp_wiki, "decisions/b.md", title="B", ptype="decision")
wiki_staging.regenerate_staging_index()
idx = (tmp_wiki / "staging" / "index.md").read_text()
assert "2 pending" in idx
assert "A" in idx and "B" in idx


# ---------------------------------------------------------------------------
# Path resolution
# ---------------------------------------------------------------------------
class TestResolvePage:
def test_resolves_staging_relative_path(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/foo.md")
page = wiki_staging.resolve_page("staging/patterns/foo.md")
assert page is not None
assert page.path.name == "foo.md"
def test_returns_none_for_missing(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
assert wiki_staging.resolve_page("staging/patterns/does-not-exist.md") is None
def test_resolves_bare_patterns_path_as_staging(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/bare.md")
page = wiki_staging.resolve_page("patterns/bare.md")
assert page is not None
assert "staging" in str(page.path)
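The resolution rules these tests encode (a `staging/`-prefixed reference resolves directly, a bare type-relative path is treated as staging-relative, and a missing file yields `None`) can be sketched as follows; this is a hypothetical standalone version, not the script's actual code:

```python
from pathlib import Path
from typing import Optional


def resolve_page(ref: str, wiki_dir: Path) -> Optional[Path]:
    """Resolve a CLI-supplied page reference to a staging file, or None."""
    # Bare paths like "patterns/foo.md" are assumed to mean staging pages,
    # since this command only ever operates on the staging area.
    rel = ref if ref.startswith("staging/") else f"staging/{ref}"
    path = wiki_dir / rel
    return path if path.exists() else None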