Initial commit — memex

A compounding LLM-maintained knowledge wiki.

Synthesis of Andrej Karpathy's persistent-wiki gist and milla-jovovich's
mempalace, with an automation layer on top for conversation mining, URL
harvesting, human-in-the-loop staging, staleness decay, and hygiene.

Includes:
- 11 pipeline scripts (extract, summarize, index, harvest, stage,
  hygiene, maintain, sync, + shared library)
- Full docs: README, SETUP, ARCHITECTURE, DESIGN-RATIONALE, CUSTOMIZE
- Example CLAUDE.md files (wiki schema + global instructions) tuned for
  the three-collection qmd setup
- 171-test pytest suite (cross-platform, runs in ~1.3s)
- MIT licensed
This commit is contained in:
Eric Turner
2026-04-12 21:16:02 -06:00
commit ee54a2f5d4
31 changed files with 10792 additions and 0 deletions

107
tests/README.md Normal file

@@ -0,0 +1,107 @@
# Wiki Pipeline Test Suite
Pytest-based test suite covering all 11 scripts in `scripts/`. Runs on both
macOS and Linux/WSL, uses only the Python standard library + pytest.
## Running
```bash
# Full suite (from wiki root)
bash tests/run.sh
# Single test file
bash tests/run.sh test_wiki_lib.py
# Single test class or function
bash tests/run.sh test_wiki_hygiene.py::TestArchiveRestore
bash tests/run.sh test_wiki_hygiene.py::TestArchiveRestore::test_restore_reverses_archive
# Pattern matching
bash tests/run.sh -k "archive"
# Verbose
bash tests/run.sh -v
# Stop on first failure
bash tests/run.sh -x
# Or invoke pytest directly from the tests dir
cd tests && python3 -m pytest -v
```
## What's tested
| File | Coverage |
|------|----------|
| `test_wiki_lib.py` | YAML parser, frontmatter round-trip, page iterators, date parsing, content hashing, WIKI_DIR env override |
| `test_wiki_hygiene.py` | Backfill, confidence decay math, frontmatter repair, archive/restore round-trip, orphan detection, broken-xref fuzzy matching, index drift, empty stubs, conversation refresh signals, auto-restore, staging/archive sync, state drift, hygiene state file, full quick-run idempotency |
| `test_wiki_staging.py` | List, promote, reject, promote-with-modifies, dry-run, staging index regeneration, path resolution |
| `test_wiki_harvest.py` | URL classification (harvest/check/skip), private IP detection, URL extraction + filtering, filename derivation, content validation, state management, raw file writing, dry-run CLI smoke test |
| `test_conversation_pipeline.py` | CLI smoke tests for extract-sessions, summarize-conversations, update-conversation-index; dry-run behavior; help flags; integration test with fake conversation files |
| `test_shell_scripts.py` | wiki-maintain.sh / mine-conversations.sh / wiki-sync.sh: help, dry-run, mutex flags, bash syntax check, strict-mode check, shebang check, py_compile for all .py scripts |
## How it works
**Isolation**: Every test runs against a disposable `tmp_wiki` fixture
(pytest `tmp_path`). The fixture sets the `WIKI_DIR` environment variable
so all scripts resolve paths against the tmp directory instead of the real
wiki. No test ever touches `~/projects/wiki`.
**Hyphenated filenames**: Scripts like `wiki-harvest.py` use hyphens, which
Python's `import` can't handle directly. `conftest.py` has a
`_load_script_module` helper that loads a script file by path and exposes
it as a module object.
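The pattern behind that helper can be sketched with stdlib `importlib` machinery (the name `load_script` here is illustrative; the real helper lives in `conftest.py` and also handles cache clearing and `sys.path`):

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType

def load_script(name: str, path: Path) -> ModuleType:
    """Load a hyphen-named script file as an importable module object."""
    spec = importlib.util.spec_from_file_location(name, path)
    assert spec is not None and spec.loader is not None
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register so intra-module imports resolve
    spec.loader.exec_module(module)
    return module
```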
**Clean module state**: Each test that loads a module clears any cached
import first, so `WIKI_DIR` env overrides take effect correctly between
tests.
**Subprocess tests** (for CLI smoke tests): `conftest.py` provides a
`run_script` fixture that invokes a script via `python3` or `bash` with
`WIKI_DIR` set to the tmp wiki. Uses `subprocess.run` with `capture_output`
and a timeout.
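A minimal sketch of that pattern (the fixture's actual signature differs; `run_with_wiki_dir` is a hypothetical name):

```python
import os
import subprocess
import sys
from pathlib import Path

def run_with_wiki_dir(cmd: list[str], wiki_dir: Path) -> subprocess.CompletedProcess:
    """Run a command with WIKI_DIR overridden, capturing output with a timeout."""
    env = os.environ.copy()
    env["WIKI_DIR"] = str(wiki_dir)  # scripts resolve all paths through this
    return subprocess.run(cmd, capture_output=True, text=True, timeout=60, env=env)
```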
## Cross-platform
- `#!/usr/bin/env bash` shebangs (tested explicitly)
- `set -euo pipefail` in all shell scripts (tested explicitly)
- `bash -n` syntax check on all shell scripts
- `py_compile` on all Python scripts
- Uses `pathlib` everywhere — no hardcoded path separators
- Uses the Python stdlib only (except pytest itself)
## Requirements
- Python 3.11+
- `pytest` — install with `pip install --user pytest` or your distro's package manager
- `bash` (any version — scripts use only portable features)
The tests do NOT require:
- `claude` CLI (mocked / skipped)
- `trafilatura` or `crawl4ai` (only dry-run / classification paths tested)
- `qmd` (reindex phase is skipped in tests)
- Network access
- The real `~/projects/wiki` or `~/.claude/projects` directories
## Speed
Full suite runs in **~1 second** on a modern laptop. All tests are isolated
and independent so they can run in any order and in parallel.
## What's NOT tested
- **Real LLM calls** (`claude -p`): too expensive, non-deterministic.
Tested: CLI parsing, dry-run paths, mocked error handling.
- **Real web fetches** (trafilatura/crawl4ai): too slow, non-deterministic.
Tested: URL classification, filter logic, fetch-result validation.
- **Real git operations** (wiki-sync.sh): requires a git repo fixture.
Tested: script loads, handles non-git dir gracefully, --status exits clean.
- **Real qmd indexing**: tested elsewhere via `qmd collection list` in the
setup verification step.
- **Real Claude Code session JSONL parsing** with actual sessions: would
require fixture JSONL files. Tested: CLI parsing, empty-dir behavior,
`CLAUDE_PROJECTS_DIR` env override.
These are smoke-tested end-to-end via the integration tests in
`test_conversation_pipeline.py` and the dry-run paths in
`test_shell_scripts.py::TestWikiMaintainSh`.

300
tests/conftest.py Normal file

@@ -0,0 +1,300 @@
"""Shared test fixtures for the wiki pipeline test suite.
All tests run against a disposable `tmp_wiki` directory — no test ever
touches the real ~/projects/wiki. Cross-platform: uses pathlib, no
platform-specific paths, and runs on both macOS and Linux/WSL.
"""
from __future__ import annotations
import importlib
import importlib.util
import json
import os
import sys
from pathlib import Path
from typing import Any
import pytest
SCRIPTS_DIR = Path(__file__).resolve().parent.parent / "scripts"
# ---------------------------------------------------------------------------
# Module loading helpers
# ---------------------------------------------------------------------------
#
# The wiki scripts use hyphenated filenames (wiki-hygiene.py etc.) which
# can't be imported via normal `import` syntax. These helpers load a script
# file as a module object so tests can exercise its functions directly.
def _load_script_module(name: str, path: Path) -> Any:
    """Load a Python script file as a module. Clears any cached version first."""
    # Clear cached imports so WIKI_DIR env changes take effect between tests
    for key in list(sys.modules):
        if key in (name, "wiki_lib"):
            del sys.modules[key]
    # Make sure scripts/ is on sys.path so intra-script imports (wiki_lib) work
    scripts_str = str(SCRIPTS_DIR)
    if scripts_str not in sys.path:
        sys.path.insert(0, scripts_str)
    spec = importlib.util.spec_from_file_location(name, path)
    assert spec is not None and spec.loader is not None
    mod = importlib.util.module_from_spec(spec)
    sys.modules[name] = mod
    spec.loader.exec_module(mod)
    return mod
# ---------------------------------------------------------------------------
# tmp_wiki fixture — builds a realistic wiki tree under a tmp path
# ---------------------------------------------------------------------------
@pytest.fixture
def tmp_wiki(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> Path:
    """Set up a disposable wiki tree with all the directories the scripts expect.

    Sets the WIKI_DIR environment variable so all imported modules resolve
    paths against this tmp directory.
    """
    wiki = tmp_path / "wiki"
    wiki.mkdir()
    # Create the directory tree
    for sub in ["patterns", "decisions", "concepts", "environments"]:
        (wiki / sub).mkdir()
        (wiki / "staging" / sub).mkdir(parents=True)
        (wiki / "archive" / sub).mkdir(parents=True)
    (wiki / "raw" / "harvested").mkdir(parents=True)
    (wiki / "conversations").mkdir()
    (wiki / "reports").mkdir()
    # Create minimal index.md
    (wiki / "index.md").write_text(
        "# Wiki Index\n\n"
        "## Patterns\n\n"
        "## Decisions\n\n"
        "## Concepts\n\n"
        "## Environments\n\n"
    )
    # Empty state files
    (wiki / ".harvest-state.json").write_text(json.dumps({
        "harvested_urls": {},
        "skipped_urls": {},
        "failed_urls": {},
        "rejected_urls": {},
        "last_run": None,
    }))
    # Point all scripts at this tmp wiki
    monkeypatch.setenv("WIKI_DIR", str(wiki))
    return wiki
# ---------------------------------------------------------------------------
# Sample page factories
# ---------------------------------------------------------------------------
def make_page(
    wiki: Path,
    rel_path: str,
    *,
    title: str | None = None,
    ptype: str | None = None,
    confidence: str = "high",
    last_compiled: str = "2026-04-01",
    last_verified: str = "2026-04-01",
    origin: str = "manual",
    sources: list[str] | None = None,
    related: list[str] | None = None,
    body: str = "# Content\n\nA substantive page with real content so it is not a stub.\n",
    extra_fm: dict[str, Any] | None = None,
) -> Path:
    """Write a well-formed wiki page with all required frontmatter fields
    to the tmp wiki and return its path."""
    if sources is None:
        sources = []
    if related is None:
        related = []
    path = wiki / rel_path
    path.parent.mkdir(parents=True, exist_ok=True)
    if title is None:
        title = path.stem.replace("-", " ").title()
    if ptype is None:
        ptype = path.parent.name.rstrip("s")
    fm_lines = [
        "---",
        f"title: {title}",
        f"type: {ptype}",
        f"confidence: {confidence}",
        f"origin: {origin}",
        f"last_compiled: {last_compiled}",
        f"last_verified: {last_verified}",
    ]
    # sources/related were normalized to lists above, so only emptiness matters
    if sources:
        fm_lines.append("sources:")
        fm_lines.extend(f" - {s}" for s in sources)
    else:
        fm_lines.append("sources: []")
    if related:
        fm_lines.append("related:")
        fm_lines.extend(f" - {r}" for r in related)
    else:
        fm_lines.append("related: []")
    if extra_fm:
        for k, v in extra_fm.items():
            if isinstance(v, list):
                if v:
                    fm_lines.append(f"{k}:")
                    fm_lines.extend(f" - {item}" for item in v)
                else:
                    fm_lines.append(f"{k}: []")
            else:
                fm_lines.append(f"{k}: {v}")
    fm_lines.append("---")
    path.write_text("\n".join(fm_lines) + "\n" + body)
    return path
def make_conversation(
    wiki: Path,
    project: str,
    filename: str,
    *,
    date: str = "2026-04-10",
    status: str = "summarized",
    messages: int = 100,
    related: list[str] | None = None,
    body: str = "## Summary\n\nTest conversation summary.\n",
) -> Path:
    """Write a conversation file to the tmp wiki."""
    proj_dir = wiki / "conversations" / project
    proj_dir.mkdir(parents=True, exist_ok=True)
    path = proj_dir / filename
    fm_lines = [
        "---",
        "title: Test Conversation (unknown)",
        "type: conversation",
        f"project: {project}",
        f"date: {date}",
        f"status: {status}",
        f"messages: {messages}",
    ]
    if related:
        fm_lines.append("related:")
        fm_lines.extend(f" - {r}" for r in related)
    fm_lines.append("---")
    path.write_text("\n".join(fm_lines) + "\n" + body)
    return path
def make_staging_page(
    wiki: Path,
    rel_under_staging: str,
    *,
    title: str = "Pending Page",
    ptype: str = "pattern",
    staged_by: str = "wiki-harvest",
    staged_date: str = "2026-04-10",
    modifies: str | None = None,
    target_path: str | None = None,
    body: str = "# Pending\n\nStaged content body.\n",
) -> Path:
    """Write a pending page under staging/ and return its path."""
    path = wiki / "staging" / rel_under_staging
    path.parent.mkdir(parents=True, exist_ok=True)
    if target_path is None:
        target_path = rel_under_staging
    fm_lines = [
        "---",
        f"title: {title}",
        f"type: {ptype}",
        "confidence: medium",
        "origin: automated",
        "status: pending",
        f"staged_date: {staged_date}",
        f"staged_by: {staged_by}",
        f"target_path: {target_path}",
    ]
    if modifies:
        fm_lines.append(f"modifies: {modifies}")
    fm_lines.append("compilation_notes: test note")
    fm_lines.append("last_verified: 2026-04-10")
    fm_lines.append("---")
    path.write_text("\n".join(fm_lines) + "\n" + body)
    return path
# ---------------------------------------------------------------------------
# Module fixtures — each loads the corresponding script as a module
# ---------------------------------------------------------------------------
@pytest.fixture
def wiki_lib(tmp_wiki: Path) -> Any:
    """Load wiki_lib fresh against the tmp_wiki directory."""
    return _load_script_module("wiki_lib", SCRIPTS_DIR / "wiki_lib.py")


@pytest.fixture
def wiki_hygiene(tmp_wiki: Path) -> Any:
    """Load wiki-hygiene.py fresh. wiki_lib must be loaded first for its imports."""
    _load_script_module("wiki_lib", SCRIPTS_DIR / "wiki_lib.py")
    return _load_script_module("wiki_hygiene", SCRIPTS_DIR / "wiki-hygiene.py")


@pytest.fixture
def wiki_staging(tmp_wiki: Path) -> Any:
    _load_script_module("wiki_lib", SCRIPTS_DIR / "wiki_lib.py")
    return _load_script_module("wiki_staging", SCRIPTS_DIR / "wiki-staging.py")


@pytest.fixture
def wiki_harvest(tmp_wiki: Path) -> Any:
    _load_script_module("wiki_lib", SCRIPTS_DIR / "wiki_lib.py")
    return _load_script_module("wiki_harvest", SCRIPTS_DIR / "wiki-harvest.py")
# ---------------------------------------------------------------------------
# Subprocess helper — runs a script as if from the CLI, with WIKI_DIR set
# ---------------------------------------------------------------------------
@pytest.fixture
def run_script(tmp_wiki: Path):
    """Return a function that runs a script via subprocess with WIKI_DIR set."""
    import subprocess

    def _run(script_rel: str, *args: str, timeout: int = 60) -> subprocess.CompletedProcess:
        script = SCRIPTS_DIR / script_rel
        if script.suffix == ".py":
            cmd = ["python3", str(script), *args]
        else:
            cmd = ["bash", str(script), *args]
        env = os.environ.copy()
        env["WIKI_DIR"] = str(tmp_wiki)
        return subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout,
            env=env,
        )

    return _run

9
tests/pytest.ini Normal file

@@ -0,0 +1,9 @@
[pytest]
testpaths = .
python_files = test_*.py
python_classes = Test*
python_functions = test_*
addopts = -ra --strict-markers --tb=short
markers =
    slow: tests that take more than 1 second
    network: tests that hit the network (skipped by default)

31
tests/run.sh Executable file

@@ -0,0 +1,31 @@
#!/usr/bin/env bash
set -euo pipefail
# run.sh — Convenience wrapper for running the wiki pipeline test suite.
#
# Usage:
# bash tests/run.sh # Run the full suite
# bash tests/run.sh -v # Verbose output
# bash tests/run.sh test_wiki_lib # Run one file
# bash tests/run.sh -k "parse" # Run tests matching a pattern
#
# All arguments are passed through to pytest.
TESTS_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "${TESTS_DIR}"

# Verify pytest is available
if ! python3 -c "import pytest" 2>/dev/null; then
    echo "pytest not installed. Install with: pip install --user pytest"
    exit 2
fi

# Clear any previous test artifacts
rm -rf .pytest_cache 2>/dev/null || true

# Default args: short tracebacks
if [[ $# -eq 0 ]]; then
    exec python3 -m pytest --tb=short
else
    exec python3 -m pytest "$@"
fi

121
tests/test_conversation_pipeline.py Normal file

@@ -0,0 +1,121 @@
"""Smoke + integration tests for the conversation mining pipeline.
These scripts interact with external systems (Claude Code sessions dir,
claude CLI), so tests focus on CLI parsing, dry-run behavior, and error
handling rather than exercising the full extraction/summarization path.
"""
from __future__ import annotations
import json
from pathlib import Path
import pytest
# ---------------------------------------------------------------------------
# extract-sessions.py
# ---------------------------------------------------------------------------
class TestExtractSessions:
    def test_help_exits_clean(self, run_script) -> None:
        result = run_script("extract-sessions.py", "--help")
        assert result.returncode == 0
        assert "--project" in result.stdout
        assert "--dry-run" in result.stdout

    def test_dry_run_with_empty_sessions_dir(
        self, run_script, tmp_wiki: Path
    ) -> None:
        # Pointing CLAUDE_PROJECTS_DIR at an empty tmp dir via env is not
        # possible here (the script reads ~/.claude/projects directly), so use
        # --project with a code that has no sessions to verify a clean exit.
        result = run_script("extract-sessions.py", "--dry-run", "--project", "nonexistent")
        assert result.returncode == 0

    def test_rejects_unknown_flag(self, run_script) -> None:
        result = run_script("extract-sessions.py", "--bogus-flag")
        assert result.returncode != 0
        assert "error" in result.stderr.lower() or "unrecognized" in result.stderr.lower()
# ---------------------------------------------------------------------------
# summarize-conversations.py
# ---------------------------------------------------------------------------
class TestSummarizeConversations:
    def test_help_exits_clean(self, run_script) -> None:
        result = run_script("summarize-conversations.py", "--help")
        assert result.returncode == 0
        assert "--claude" in result.stdout
        assert "--dry-run" in result.stdout
        assert "--project" in result.stdout

    def test_dry_run_empty_conversations(
        self, run_script, tmp_wiki: Path
    ) -> None:
        result = run_script("summarize-conversations.py", "--claude", "--dry-run")
        assert result.returncode == 0

    def test_dry_run_with_extracted_conversation(
        self, run_script, tmp_wiki: Path
    ) -> None:
        from conftest import make_conversation

        make_conversation(
            tmp_wiki,
            "general",
            "2026-04-10-abc.md",
            status="extracted",  # Not yet summarized
            messages=50,
        )
        result = run_script("summarize-conversations.py", "--claude", "--dry-run")
        assert result.returncode == 0
        # Should mention the file or show it would be processed
        assert "2026-04-10-abc.md" in result.stdout or "1 conversation" in result.stdout
# ---------------------------------------------------------------------------
# update-conversation-index.py
# ---------------------------------------------------------------------------
class TestUpdateConversationIndex:
    def test_help_exits_clean(self, run_script) -> None:
        result = run_script("update-conversation-index.py", "--help")
        assert result.returncode == 0

    def test_runs_on_empty_conversations_dir(
        self, run_script, tmp_wiki: Path
    ) -> None:
        result = run_script("update-conversation-index.py")
        # Should not crash even with no conversations
        assert result.returncode == 0

    def test_builds_index_from_conversations(
        self, run_script, tmp_wiki: Path
    ) -> None:
        from conftest import make_conversation

        make_conversation(
            tmp_wiki,
            "general",
            "2026-04-10-one.md",
            status="summarized",
        )
        make_conversation(
            tmp_wiki,
            "general",
            "2026-04-11-two.md",
            status="summarized",
        )
        result = run_script("update-conversation-index.py")
        assert result.returncode == 0
        idx = tmp_wiki / "conversations" / "index.md"
        assert idx.exists()
        text = idx.read_text()
        assert "2026-04-10-one.md" in text or "one.md" in text
        assert "2026-04-11-two.md" in text or "two.md" in text

209
tests/test_shell_scripts.py Normal file

@@ -0,0 +1,209 @@
"""Smoke tests for the bash scripts.
Bash scripts are harder to unit-test in isolation — these tests verify
CLI parsing, help text, and dry-run/safe flags work correctly and that
scripts exit cleanly in all the no-op paths.
Cross-platform note: tests invoke scripts via `bash` explicitly, so they
work on both macOS (default /bin/bash) and Linux/WSL. They avoid anything
that requires external state (network, git, LLM).
"""
from __future__ import annotations
import os
import subprocess
from pathlib import Path
from typing import Any
import pytest
from conftest import make_conversation, make_page, make_staging_page
# ---------------------------------------------------------------------------
# wiki-maintain.sh
# ---------------------------------------------------------------------------
class TestWikiMaintainSh:
    def test_help_flag(self, run_script) -> None:
        result = run_script("wiki-maintain.sh", "--help")
        assert result.returncode == 0
        assert "usage:" in result.stdout.lower()
        assert "--full" in result.stdout
        assert "--harvest-only" in result.stdout
        assert "--hygiene-only" in result.stdout

    def test_rejects_unknown_flag(self, run_script) -> None:
        result = run_script("wiki-maintain.sh", "--bogus")
        assert result.returncode != 0
        assert "Unknown option" in result.stderr

    def test_harvest_only_and_hygiene_only_conflict(self, run_script) -> None:
        result = run_script(
            "wiki-maintain.sh", "--harvest-only", "--hygiene-only"
        )
        assert result.returncode != 0
        assert "mutually exclusive" in result.stderr

    def test_hygiene_only_dry_run_completes(
        self, run_script, tmp_wiki: Path
    ) -> None:
        make_page(tmp_wiki, "patterns/one.md")
        result = run_script(
            "wiki-maintain.sh", "--hygiene-only", "--dry-run", "--no-reindex"
        )
        assert result.returncode == 0
        assert "Phase 2: Hygiene checks" in result.stdout
        assert "finished" in result.stdout

    def test_phase_1_skipped_in_hygiene_only(
        self, run_script, tmp_wiki: Path
    ) -> None:
        result = run_script(
            "wiki-maintain.sh", "--hygiene-only", "--dry-run", "--no-reindex"
        )
        assert result.returncode == 0
        assert "Phase 1: URL harvesting (skipped)" in result.stdout

    def test_phase_3_skipped_in_dry_run(
        self, run_script, tmp_wiki: Path
    ) -> None:
        make_page(tmp_wiki, "patterns/one.md")
        result = run_script(
            "wiki-maintain.sh", "--hygiene-only", "--dry-run"
        )
        assert "Phase 3: qmd reindex (skipped)" in result.stdout

    def test_harvest_only_dry_run_completes(
        self, run_script, tmp_wiki: Path
    ) -> None:
        # Add a summarized conversation so harvest has something to scan
        make_conversation(
            tmp_wiki,
            "test",
            "2026-04-10-test.md",
            status="summarized",
            body="See https://docs.python.org/3/library/os.html for details.\n",
        )
        result = run_script(
            "wiki-maintain.sh",
            "--harvest-only",
            "--dry-run",
            "--no-compile",
            "--no-reindex",
        )
        assert result.returncode == 0
        assert "Phase 2: Hygiene checks (skipped)" in result.stdout
# ---------------------------------------------------------------------------
# wiki-sync.sh
# ---------------------------------------------------------------------------
class TestWikiSyncSh:
    def test_status_on_non_git_dir_exits_cleanly(self, run_script) -> None:
        """wiki-sync.sh --status against a non-git dir should fail gracefully.

        The tmp_wiki fixture is not a git repo, so git commands will fail.
        The script should report the problem without hanging or leaking stack
        traces. Any exit code is acceptable as long as it exits in reasonable
        time and prints something useful to stdout/stderr.
        """
        result = run_script("wiki-sync.sh", "--status", timeout=30)
        # Should have produced some output and exited (not hung)
        assert result.stdout or result.stderr
        assert "Wiki Sync Status" in result.stdout or "not a git" in result.stderr.lower()
# ---------------------------------------------------------------------------
# mine-conversations.sh
# ---------------------------------------------------------------------------
class TestMineConversationsSh:
    def test_extract_only_dry_run(self, run_script, tmp_wiki: Path) -> None:
        """mine-conversations.sh --extract-only --dry-run should complete without LLM."""
        result = run_script(
            "mine-conversations.sh", "--extract-only", "--dry-run", timeout=30
        )
        assert result.returncode == 0

    def test_rejects_unknown_flag(self, run_script) -> None:
        result = run_script("mine-conversations.sh", "--bogus-flag")
        assert result.returncode != 0
# ---------------------------------------------------------------------------
# Cross-platform sanity — scripts use portable bash syntax
# ---------------------------------------------------------------------------
class TestBashPortability:
    """Verify scripts don't use bashisms that break on macOS /bin/bash 3.2."""

    @pytest.mark.parametrize(
        "script",
        ["wiki-maintain.sh", "mine-conversations.sh", "wiki-sync.sh"],
    )
    def test_shebang_is_env_bash(self, script: str) -> None:
        """All shell scripts should use `#!/usr/bin/env bash` for portability."""
        path = Path(__file__).parent.parent / "scripts" / script
        first_line = path.read_text().splitlines()[0]
        assert first_line == "#!/usr/bin/env bash", (
            f"{script} has shebang {first_line!r}, expected #!/usr/bin/env bash"
        )

    @pytest.mark.parametrize(
        "script",
        ["wiki-maintain.sh", "mine-conversations.sh", "wiki-sync.sh"],
    )
    def test_uses_strict_mode(self, script: str) -> None:
        """All shell scripts should use `set -euo pipefail` for safe defaults."""
        path = Path(__file__).parent.parent / "scripts" / script
        text = path.read_text()
        assert "set -euo pipefail" in text, f"{script} missing strict mode"

    @pytest.mark.parametrize(
        "script",
        ["wiki-maintain.sh", "mine-conversations.sh", "wiki-sync.sh"],
    )
    def test_bash_syntax_check(self, script: str) -> None:
        """bash -n does a syntax-only parse and catches obvious errors."""
        path = Path(__file__).parent.parent / "scripts" / script
        result = subprocess.run(
            ["bash", "-n", str(path)],
            capture_output=True,
            text=True,
            timeout=10,
        )
        assert result.returncode == 0, f"{script} has bash syntax errors: {result.stderr}"
# ---------------------------------------------------------------------------
# Python script syntax check (smoke)
# ---------------------------------------------------------------------------
class TestPythonSyntax:
    @pytest.mark.parametrize(
        "script",
        [
            "wiki_lib.py",
            "wiki-harvest.py",
            "wiki-staging.py",
            "wiki-hygiene.py",
            "extract-sessions.py",
            "summarize-conversations.py",
            "update-conversation-index.py",
        ],
    )
    def test_py_compile(self, script: str) -> None:
        """py_compile catches syntax errors without executing the module."""
        import py_compile

        path = Path(__file__).parent.parent / "scripts" / script
        # py_compile.compile raises on error; success returns the .pyc path
        py_compile.compile(str(path), doraise=True)

323
tests/test_wiki_harvest.py Normal file

@@ -0,0 +1,323 @@
"""Unit + integration tests for scripts/wiki-harvest.py."""
from __future__ import annotations
import json
from pathlib import Path
from typing import Any
from unittest.mock import patch
import pytest
from conftest import make_conversation
# ---------------------------------------------------------------------------
# URL classification
# ---------------------------------------------------------------------------
class TestClassifyUrl:
    def test_regular_docs_site_harvest(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("https://docs.python.org/3/library/os.html") == "harvest"
        assert wiki_harvest.classify_url("https://blog.example.com/post") == "harvest"

    def test_github_issue_is_check(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("https://github.com/foo/bar/issues/42") == "check"

    def test_github_pr_is_check(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("https://github.com/foo/bar/pull/99") == "check"

    def test_stackoverflow_is_check(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url(
            "https://stackoverflow.com/questions/12345/title"
        ) == "check"

    def test_localhost_skip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("http://localhost:3000/path") == "skip"
        assert wiki_harvest.classify_url("http://localhost/foo") == "skip"

    def test_private_ip_skip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("http://10.0.0.1/api") == "skip"
        assert wiki_harvest.classify_url("http://172.30.224.1:8080/v1") == "skip"
        assert wiki_harvest.classify_url("http://192.168.1.1/test") == "skip"
        assert wiki_harvest.classify_url("http://127.0.0.1:8080/foo") == "skip"

    def test_local_and_internal_tld_skip(self, wiki_harvest: Any) -> None:
        # `.local` and `.internal` are baked into SKIP_DOMAIN_PATTERNS
        assert wiki_harvest.classify_url("https://router.local/admin") == "skip"
        assert wiki_harvest.classify_url("https://service.internal/api") == "skip"

    def test_custom_skip_pattern_runtime(self, wiki_harvest: Any) -> None:
        # Users can append their own patterns at runtime — verify the hook works
        wiki_harvest.SKIP_DOMAIN_PATTERNS.append(r"\.mycompany\.com$")
        try:
            assert wiki_harvest.classify_url("https://git.mycompany.com/foo") == "skip"
            assert wiki_harvest.classify_url("https://docs.mycompany.com/api") == "skip"
        finally:
            wiki_harvest.SKIP_DOMAIN_PATTERNS.pop()

    def test_atlassian_skip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("https://foo.atlassian.net/browse/BAR-1") == "skip"

    def test_slack_skip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("https://myteam.slack.com/archives/C123") == "skip"

    def test_github_repo_root_is_harvest(self, wiki_harvest: Any) -> None:
        # Not an issue/pr/discussion — just a repo root, might contain docs
        assert wiki_harvest.classify_url("https://github.com/foo/bar") == "harvest"

    def test_invalid_url_skip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("not a url") == "skip"
# ---------------------------------------------------------------------------
# Private IP detection
# ---------------------------------------------------------------------------
class TestPrivateIp:
    def test_10_range(self, wiki_harvest: Any) -> None:
        assert wiki_harvest._is_private_ip("10.0.0.1") is True
        assert wiki_harvest._is_private_ip("10.255.255.255") is True

    def test_172_16_to_31_range(self, wiki_harvest: Any) -> None:
        assert wiki_harvest._is_private_ip("172.16.0.1") is True
        assert wiki_harvest._is_private_ip("172.31.255.255") is True
        assert wiki_harvest._is_private_ip("172.15.0.1") is False
        assert wiki_harvest._is_private_ip("172.32.0.1") is False

    def test_192_168_range(self, wiki_harvest: Any) -> None:
        assert wiki_harvest._is_private_ip("192.168.0.1") is True
        assert wiki_harvest._is_private_ip("192.167.0.1") is False

    def test_loopback(self, wiki_harvest: Any) -> None:
        assert wiki_harvest._is_private_ip("127.0.0.1") is True

    def test_public_ip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest._is_private_ip("8.8.8.8") is False

    def test_hostname_not_ip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest._is_private_ip("example.com") is False
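The behavior pinned down by these tests can be reproduced with the stdlib `ipaddress` module. This is a hypothetical re-implementation (the script's actual `_is_private_ip` may differ in structure):

```python
import ipaddress

def is_private_ip(host: str) -> bool:
    """Return True for RFC 1918 and loopback addresses; False for hostnames."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal (e.g. "example.com")
    return addr.is_private or addr.is_loopback
```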
# ---------------------------------------------------------------------------
# URL extraction from files
# ---------------------------------------------------------------------------
class TestExtractUrls:
    def test_finds_urls_in_markdown(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        path = make_conversation(
            tmp_wiki,
            "test",
            "test.md",
            body="See https://docs.python.org/3/library/os.html for details.\n"
            "Also https://fastapi.tiangolo.com/tutorial/.\n",
        )
        urls = wiki_harvest.extract_urls_from_file(path)
        assert "https://docs.python.org/3/library/os.html" in urls
        assert "https://fastapi.tiangolo.com/tutorial/" in urls

    def test_filters_asset_extensions(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        path = make_conversation(
            tmp_wiki,
            "test",
            "assets.md",
            body=(
                "Real: https://example.com/docs/article.html\n"
                "Image: https://example.com/logo.png\n"
                "Script: https://cdn.example.com/lib.js\n"
                "Font: https://fonts.example.com/face.woff2\n"
            ),
        )
        urls = wiki_harvest.extract_urls_from_file(path)
        assert "https://example.com/docs/article.html" in urls
        assert not any(u.endswith(".png") for u in urls)
        assert not any(u.endswith(".js") for u in urls)
        assert not any(u.endswith(".woff2") for u in urls)

    def test_strips_trailing_punctuation(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        path = make_conversation(
            tmp_wiki,
            "test",
            "punct.md",
            body="See https://example.com/foo. Also https://example.com/bar, and more.\n",
        )
        urls = wiki_harvest.extract_urls_from_file(path)
        assert "https://example.com/foo" in urls
        assert "https://example.com/bar" in urls

    def test_deduplicates_within_file(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        path = make_conversation(
            tmp_wiki,
            "test",
            "dup.md",
            body=(
                "First mention: https://example.com/same\n"
                "Second mention: https://example.com/same\n"
            ),
        )
        urls = wiki_harvest.extract_urls_from_file(path)
        assert urls.count("https://example.com/same") == 1

    def test_returns_empty_for_missing_file(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        assert wiki_harvest.extract_urls_from_file(tmp_wiki / "nope.md") == []

    def test_filters_short_urls(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        # URLs shorter than 20 chars are skipped
        path = make_conversation(
            tmp_wiki,
            "test",
            "short.md",
            body="tiny http://a.b/ and https://example.com/long-path\n",
        )
        urls = wiki_harvest.extract_urls_from_file(path)
        assert "http://a.b/" not in urls
        assert "https://example.com/long-path" in urls
# ---------------------------------------------------------------------------
# Raw filename derivation
# ---------------------------------------------------------------------------
class TestRawFilename:
def test_basic_url(self, wiki_harvest: Any) -> None:
name = wiki_harvest.raw_filename_for_url("https://docs.docker.com/build/multi-stage/")
assert name.startswith("docs-docker-com-")
assert "build" in name and "multi-stage" in name
assert name.endswith(".md")
def test_strips_www(self, wiki_harvest: Any) -> None:
name = wiki_harvest.raw_filename_for_url("https://www.example.com/foo")
assert "www" not in name
def test_root_url_uses_index(self, wiki_harvest: Any) -> None:
name = wiki_harvest.raw_filename_for_url("https://example.com/")
assert name == "example-com-index.md"
def test_long_paths_truncated(self, wiki_harvest: Any) -> None:
long_url = "https://example.com/" + "a-very-long-segment/" * 20
name = wiki_harvest.raw_filename_for_url(long_url)
assert len(name) < 200
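The slug rules these tests exercise can be sketched as follows; the 180-character cap and the exact separator handling are assumptions, chosen only to satisfy the assertions above:

```python
from urllib.parse import urlparse

def raw_filename_for_url(url: str) -> str:
    parsed = urlparse(url)
    # Drop "www.", turn dots and slashes into hyphens
    host = parsed.netloc.removeprefix("www.").replace(".", "-")
    path = parsed.path.strip("/").replace("/", "-").replace(".", "-")
    slug = f"{host}-{path}" if path else f"{host}-index"
    return slug[:180].rstrip("-") + ".md"  # assumed length cap
```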
# ---------------------------------------------------------------------------
# Content validation
# ---------------------------------------------------------------------------
class TestValidateContent:
def test_accepts_clean_markdown(self, wiki_harvest: Any) -> None:
content = "# Title\n\n" + ("A clean paragraph of markdown content. " * 5)
assert wiki_harvest.validate_content(content) is True
def test_rejects_empty(self, wiki_harvest: Any) -> None:
assert wiki_harvest.validate_content("") is False
def test_rejects_too_short(self, wiki_harvest: Any) -> None:
assert wiki_harvest.validate_content("# Short") is False
def test_rejects_html_leak(self, wiki_harvest: Any) -> None:
content = "# Title\n\n<div class='nav'>Navigation</div>\n" + "content " * 30
assert wiki_harvest.validate_content(content) is False
def test_rejects_script_tag(self, wiki_harvest: Any) -> None:
content = "# Title\n\n<script>alert()</script>\n" + "content " * 30
assert wiki_harvest.validate_content(content) is False
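Content validation rejects pages that are empty, too short, or show signs of an HTML leak from the extractor. A sketch under stated assumptions: the 100-character minimum is a guess (the tests only pin it between 7 and roughly 200 characters), and the tag list is illustrative:

```python
import re

MIN_CHARS = 100  # assumed threshold
HTML_LEAK_RE = re.compile(r"<(div|span|nav|script|iframe|footer)\b", re.I)

def validate_content(content: str) -> bool:
    if len(content.strip()) < MIN_CHARS:
        return False   # empty or too short to be a real article
    if HTML_LEAK_RE.search(content):
        return False   # extractor leaked raw HTML into the markdown
    return True
```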
# ---------------------------------------------------------------------------
# State management
# ---------------------------------------------------------------------------
class TestStateManagement:
def test_load_returns_defaults_when_file_empty(
self, wiki_harvest: Any, tmp_wiki: Path
) -> None:
(tmp_wiki / ".harvest-state.json").write_text("{}")
state = wiki_harvest.load_state()
assert "harvested_urls" in state
assert "skipped_urls" in state
def test_save_and_reload(
self, wiki_harvest: Any, tmp_wiki: Path
) -> None:
state = wiki_harvest.load_state()
state["harvested_urls"]["https://example.com"] = {
"first_seen": "2026-04-12",
"seen_in": ["conversations/mc/foo.md"],
"raw_file": "raw/harvested/example.md",
"status": "raw",
"fetch_method": "trafilatura",
}
wiki_harvest.save_state(state)
reloaded = wiki_harvest.load_state()
assert "https://example.com" in reloaded["harvested_urls"]
assert reloaded["last_run"] is not None
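The state round-trip these tests check is a defaults-merge on load plus a timestamp on save. A sketch (the real functions take no path argument and resolve it from `WIKI_DIR`; the explicit parameter here is for illustration only):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def load_state(path: Path) -> dict:
    # Missing keys fall back to defaults, so an empty "{}" file still works
    state = {"harvested_urls": {}, "skipped_urls": {}, "last_run": None}
    if path.exists():
        state.update(json.loads(path.read_text() or "{}"))
    return state

def save_state(state: dict, path: Path) -> None:
    state["last_run"] = datetime.now(timezone.utc).isoformat()
    path.write_text(json.dumps(state, indent=2) + "\n")
```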
# ---------------------------------------------------------------------------
# Raw file writer
# ---------------------------------------------------------------------------
class TestWriteRawFile:
def test_writes_with_frontmatter(
self, wiki_harvest: Any, tmp_wiki: Path
) -> None:
conv = make_conversation(tmp_wiki, "test", "source.md")
raw_path = wiki_harvest.write_raw_file(
"https://example.com/article",
"# Article\n\nClean content.\n",
"trafilatura",
conv,
)
assert raw_path.exists()
text = raw_path.read_text()
assert "source_url: https://example.com/article" in text
assert "fetch_method: trafilatura" in text
assert "content_hash: sha256:" in text
assert "discovered_in: conversations/test/source.md" in text
# ---------------------------------------------------------------------------
# Dry-run CLI smoke test (no actual fetches)
# ---------------------------------------------------------------------------
class TestHarvestCli:
def test_dry_run_no_network_calls(
self, run_script, tmp_wiki: Path
) -> None:
make_conversation(
tmp_wiki,
"test",
"test.md",
body="See https://docs.python.org/3/ and https://github.com/foo/bar/issues/1.\n",
)
result = run_script("wiki-harvest.py", "--dry-run")
assert result.returncode == 0
# Dry-run should classify without fetching
assert "would-harvest" in result.stdout or "Summary" in result.stdout
def test_help_flag(self, run_script) -> None:
result = run_script("wiki-harvest.py", "--help")
assert result.returncode == 0
assert "--dry-run" in result.stdout
assert "--no-compile" in result.stdout

616
tests/test_wiki_hygiene.py Normal file

@@ -0,0 +1,616 @@
"""Integration tests for scripts/wiki-hygiene.py.
Uses the tmp_wiki fixture so tests never touch the real wiki.
"""
from __future__ import annotations
from datetime import date, timedelta
from pathlib import Path
from typing import Any
import pytest
from conftest import make_conversation, make_page, make_staging_page
# ---------------------------------------------------------------------------
# Backfill last_verified
# ---------------------------------------------------------------------------
class TestBackfill:
def test_sets_last_verified_from_last_compiled(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = make_page(tmp_wiki, "patterns/foo.md", last_compiled="2026-01-15")
# Strip last_verified from the fixture-built file
text = path.read_text()
text = text.replace("last_verified: 2026-04-01\n", "")
path.write_text(text)
changes = wiki_hygiene.backfill_last_verified()
assert len(changes) == 1
assert changes[0][1] == "last_compiled"
reparsed = wiki_hygiene.parse_page(path)
assert reparsed.frontmatter["last_verified"] == "2026-01-15"
def test_skips_pages_already_verified(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/done.md", last_verified="2026-04-01")
changes = wiki_hygiene.backfill_last_verified()
assert changes == []
def test_dry_run_does_not_write(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = make_page(tmp_wiki, "patterns/foo.md", last_compiled="2026-01-15")
text = path.read_text().replace("last_verified: 2026-04-01\n", "")
path.write_text(text)
changes = wiki_hygiene.backfill_last_verified(dry_run=True)
assert len(changes) == 1
reparsed = wiki_hygiene.parse_page(path)
assert "last_verified" not in reparsed.frontmatter
# ---------------------------------------------------------------------------
# Confidence decay math
# ---------------------------------------------------------------------------
class TestConfidenceDecay:
def test_recent_page_unchanged(self, wiki_hygiene: Any) -> None:
recent = wiki_hygiene.today() - timedelta(days=30)
assert wiki_hygiene.expected_confidence("high", recent, False) == "high"
def test_six_months_decays_high_to_medium(self, wiki_hygiene: Any) -> None:
old = wiki_hygiene.today() - timedelta(days=200)
assert wiki_hygiene.expected_confidence("high", old, False) == "medium"
def test_nine_months_decays_medium_to_low(self, wiki_hygiene: Any) -> None:
old = wiki_hygiene.today() - timedelta(days=280)
assert wiki_hygiene.expected_confidence("medium", old, False) == "low"
def test_twelve_months_decays_to_stale(self, wiki_hygiene: Any) -> None:
old = wiki_hygiene.today() - timedelta(days=400)
assert wiki_hygiene.expected_confidence("high", old, False) == "stale"
def test_superseded_is_always_stale(self, wiki_hygiene: Any) -> None:
recent = wiki_hygiene.today() - timedelta(days=1)
assert wiki_hygiene.expected_confidence("high", recent, True) == "stale"
def test_none_date_leaves_confidence_alone(self, wiki_hygiene: Any) -> None:
assert wiki_hygiene.expected_confidence("medium", None, False) == "medium"
def test_bump_confidence_ladder(self, wiki_hygiene: Any) -> None:
assert wiki_hygiene.bump_confidence("stale") == "low"
assert wiki_hygiene.bump_confidence("low") == "medium"
assert wiki_hygiene.bump_confidence("medium") == "high"
assert wiki_hygiene.bump_confidence("high") == "high"
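The decay math above can be read as an age-based cap on confidence plus a saturating one-step bump. A minimal sketch consistent with the assertions: the exact cutoffs (180/270/365 days, i.e. roughly 6/9/12 months) are assumptions, since the tests only probe 30, 200, 280, and 400 days:

```python
from datetime import date

LADDER = ["stale", "low", "medium", "high"]
CAPS = [(365, "stale"), (270, "low"), (180, "medium")]  # assumed cutoffs

def expected_confidence(current: str, last_verified, superseded: bool) -> str:
    if superseded:
        return "stale"          # superseded pages are always stale
    if last_verified is None:
        return current          # no date: leave confidence alone
    age = (date.today() - last_verified).days
    for days, cap in CAPS:
        if age > days:
            # Decay = cap confidence at the level this age allows
            return min(current, cap, key=LADDER.index)
    return current

def bump_confidence(current: str) -> str:
    # One step up the ladder, saturating at "high"
    return LADDER[min(LADDER.index(current) + 1, len(LADDER) - 1)]
```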
# ---------------------------------------------------------------------------
# Frontmatter repair
# ---------------------------------------------------------------------------
class TestFrontmatterRepair:
def test_adds_missing_confidence(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = tmp_wiki / "patterns" / "no-conf.md"
path.write_text(
"---\ntitle: No Confidence\ntype: pattern\n"
"last_compiled: 2026-04-01\nlast_verified: 2026-04-01\n---\n"
"# Body\n\nSubstantive content here for testing purposes.\n"
)
changes = wiki_hygiene.repair_frontmatter()
assert any("confidence" in fields for _, fields in changes)
reparsed = wiki_hygiene.parse_page(path)
assert reparsed.frontmatter["confidence"] == "medium"
def test_fixes_invalid_confidence(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = make_page(tmp_wiki, "patterns/bad-conf.md", confidence="wat")
changes = wiki_hygiene.repair_frontmatter()
assert any(p == path for p, _ in changes)
reparsed = wiki_hygiene.parse_page(path)
assert reparsed.frontmatter["confidence"] == "medium"
def test_leaves_valid_pages_alone(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/good.md")
changes = wiki_hygiene.repair_frontmatter()
assert changes == []
# ---------------------------------------------------------------------------
# Archive and restore round-trip
# ---------------------------------------------------------------------------
class TestArchiveRestore:
def test_archive_moves_file_and_updates_frontmatter(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = make_page(tmp_wiki, "patterns/doomed.md")
page = wiki_hygiene.parse_page(path)
wiki_hygiene.archive_page(page, "test archive")
assert not path.exists()
archived = tmp_wiki / "archive" / "patterns" / "doomed.md"
assert archived.exists()
reparsed = wiki_hygiene.parse_page(archived)
assert reparsed.frontmatter["archived_reason"] == "test archive"
assert reparsed.frontmatter["original_path"] == "patterns/doomed.md"
assert reparsed.frontmatter["confidence"] == "stale"
def test_restore_reverses_archive(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
original = make_page(tmp_wiki, "patterns/zombie.md")
page = wiki_hygiene.parse_page(original)
wiki_hygiene.archive_page(page, "test")
archived = tmp_wiki / "archive" / "patterns" / "zombie.md"
archived_page = wiki_hygiene.parse_page(archived)
wiki_hygiene.restore_page(archived_page)
assert original.exists()
assert not archived.exists()
reparsed = wiki_hygiene.parse_page(original)
assert reparsed.frontmatter["confidence"] == "medium"
assert "archived_date" not in reparsed.frontmatter
assert "archived_reason" not in reparsed.frontmatter
assert "original_path" not in reparsed.frontmatter
def test_archive_rejects_non_live_pages(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
# Page outside the live content dirs — should refuse to archive
weird = tmp_wiki / "raw" / "weird.md"
weird.parent.mkdir(parents=True, exist_ok=True)
weird.write_text("---\ntitle: Weird\n---\nBody\n")
page = wiki_hygiene.parse_page(weird)
result = wiki_hygiene.archive_page(page, "test")
assert result is None
def test_archive_dry_run_does_not_move(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = make_page(tmp_wiki, "patterns/safe.md")
page = wiki_hygiene.parse_page(path)
wiki_hygiene.archive_page(page, "test", dry_run=True)
assert path.exists()
assert not (tmp_wiki / "archive" / "patterns" / "safe.md").exists()
# ---------------------------------------------------------------------------
# Orphan detection
# ---------------------------------------------------------------------------
class TestOrphanDetection:
def test_finds_orphan_page(self, wiki_hygiene: Any, tmp_wiki: Path) -> None:
make_page(tmp_wiki, "patterns/lonely.md")
orphans = wiki_hygiene.find_orphan_pages()
assert len(orphans) == 1
assert orphans[0].path.stem == "lonely"
def test_page_referenced_in_index_is_not_orphan(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/linked.md")
idx = tmp_wiki / "index.md"
idx.write_text(idx.read_text() + "- [Linked](patterns/linked.md) — desc\n")
orphans = wiki_hygiene.find_orphan_pages()
assert not any(p.path.stem == "linked" for p in orphans)
def test_page_referenced_in_related_is_not_orphan(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/referenced.md")
make_page(
tmp_wiki,
"patterns/referencer.md",
related=["patterns/referenced.md"],
)
orphans = wiki_hygiene.find_orphan_pages()
stems = {p.path.stem for p in orphans}
assert "referenced" not in stems
def test_fix_orphan_adds_to_index(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = make_page(tmp_wiki, "patterns/orphan.md", title="Orphan Test")
page = wiki_hygiene.parse_page(path)
wiki_hygiene.fix_orphan_page(page)
idx_text = (tmp_wiki / "index.md").read_text()
assert "patterns/orphan.md" in idx_text
# ---------------------------------------------------------------------------
# Broken cross-references
# ---------------------------------------------------------------------------
class TestBrokenCrossRefs:
def test_detects_broken_link(self, wiki_hygiene: Any, tmp_wiki: Path) -> None:
make_page(
tmp_wiki,
"patterns/source.md",
body="See [nonexistent](patterns/does-not-exist.md) for details.\n",
)
broken = wiki_hygiene.find_broken_cross_refs()
assert len(broken) == 1
target, bad, suggested = broken[0]
assert bad == "patterns/does-not-exist.md"
def test_fuzzy_match_finds_near_miss(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/health-endpoint.md")
make_page(
tmp_wiki,
"patterns/source.md",
body="See [H](patterns/health-endpoints.md) — typo.\n",
)
broken = wiki_hygiene.find_broken_cross_refs()
assert len(broken) >= 1
_, bad, suggested = broken[0]
assert suggested == "patterns/health-endpoint.md"
def test_fix_broken_xref(self, wiki_hygiene: Any, tmp_wiki: Path) -> None:
make_page(tmp_wiki, "patterns/health-endpoint.md")
src = make_page(
tmp_wiki,
"patterns/source.md",
body="See [H](patterns/health-endpoints.md).\n",
)
broken = wiki_hygiene.find_broken_cross_refs()
for target, bad, suggested in broken:
wiki_hygiene.fix_broken_cross_ref(target, bad, suggested)
text = src.read_text()
assert "patterns/health-endpoints.md" not in text
assert "patterns/health-endpoint.md" in text
def test_archived_link_triggers_restore(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
# Page in archive, referenced by a live page
make_page(
tmp_wiki,
"archive/patterns/ghost.md",
confidence="stale",
extra_fm={
"archived_date": "2026-01-01",
"archived_reason": "test",
"original_path": "patterns/ghost.md",
},
)
make_page(
tmp_wiki,
"patterns/caller.md",
body="See [ghost](patterns/ghost.md).\n",
)
broken = wiki_hygiene.find_broken_cross_refs()
assert len(broken) >= 1
for target, bad, suggested in broken:
if suggested and suggested.startswith("__RESTORE__"):
wiki_hygiene.fix_broken_cross_ref(target, bad, suggested)
# After restore, ghost should be live again
assert (tmp_wiki / "patterns" / "ghost.md").exists()
# ---------------------------------------------------------------------------
# Index drift
# ---------------------------------------------------------------------------
class TestIndexDrift:
def test_finds_page_missing_from_index(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/missing.md")
missing, stale = wiki_hygiene.find_index_drift()
assert "patterns/missing.md" in missing
assert stale == []
def test_finds_stale_index_entry(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
idx = tmp_wiki / "index.md"
idx.write_text(
idx.read_text()
+ "- [Ghost](patterns/ghost.md) — page that no longer exists\n"
)
missing, stale = wiki_hygiene.find_index_drift()
assert "patterns/ghost.md" in stale
def test_fix_adds_missing_and_removes_stale(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/new.md")
idx = tmp_wiki / "index.md"
idx.write_text(
idx.read_text()
+ "- [Gone](patterns/gone.md) — deleted page\n"
)
missing, stale = wiki_hygiene.find_index_drift()
wiki_hygiene.fix_index_drift(missing, stale)
idx_text = idx.read_text()
assert "patterns/new.md" in idx_text
assert "patterns/gone.md" not in idx_text
# ---------------------------------------------------------------------------
# Empty stubs
# ---------------------------------------------------------------------------
class TestEmptyStubs:
def test_flags_small_body(self, wiki_hygiene: Any, tmp_wiki: Path) -> None:
make_page(tmp_wiki, "patterns/stub.md", body="# Stub\n\nShort.\n")
stubs = wiki_hygiene.find_empty_stubs()
assert len(stubs) == 1
assert stubs[0].path.stem == "stub"
def test_ignores_substantive_pages(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
body = "# Full\n\n" + ("This is substantive content. " * 20) + "\n"
make_page(tmp_wiki, "patterns/full.md", body=body)
stubs = wiki_hygiene.find_empty_stubs()
assert stubs == []
# ---------------------------------------------------------------------------
# Conversation refresh signals
# ---------------------------------------------------------------------------
class TestConversationRefreshSignals:
def test_picks_up_related_link(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/hot.md", last_verified="2026-01-01")
make_conversation(
tmp_wiki,
"test",
"2026-04-11-abc.md",
date="2026-04-11",
related=["patterns/hot.md"],
)
refs = wiki_hygiene.scan_conversation_references()
assert "patterns/hot.md" in refs
assert refs["patterns/hot.md"] == date(2026, 4, 11)
def test_apply_refresh_updates_last_verified(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = make_page(tmp_wiki, "patterns/hot.md", last_verified="2026-01-01")
make_conversation(
tmp_wiki,
"test",
"2026-04-11-abc.md",
date="2026-04-11",
related=["patterns/hot.md"],
)
refs = wiki_hygiene.scan_conversation_references()
changes = wiki_hygiene.apply_refresh_signals(refs)
assert len(changes) == 1
reparsed = wiki_hygiene.parse_page(path)
assert reparsed.frontmatter["last_verified"] == "2026-04-11"
def test_bumps_low_confidence_to_medium(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = make_page(
tmp_wiki,
"patterns/reviving.md",
confidence="low",
last_verified="2026-01-01",
)
make_conversation(
tmp_wiki,
"test",
"2026-04-11-ref.md",
date="2026-04-11",
related=["patterns/reviving.md"],
)
refs = wiki_hygiene.scan_conversation_references()
wiki_hygiene.apply_refresh_signals(refs)
reparsed = wiki_hygiene.parse_page(path)
assert reparsed.frontmatter["confidence"] == "medium"
# ---------------------------------------------------------------------------
# Auto-restore
# ---------------------------------------------------------------------------
class TestAutoRestore:
def test_restores_page_referenced_in_conversation(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
# Archive a page
path = make_page(tmp_wiki, "patterns/returning.md")
page = wiki_hygiene.parse_page(path)
wiki_hygiene.archive_page(page, "aging out")
assert (tmp_wiki / "archive" / "patterns" / "returning.md").exists()
# Reference it in a conversation
make_conversation(
tmp_wiki,
"test",
"2026-04-12-ref.md",
related=["patterns/returning.md"],
)
# Auto-restore
restored = wiki_hygiene.auto_restore_archived()
assert len(restored) == 1
assert (tmp_wiki / "patterns" / "returning.md").exists()
assert not (tmp_wiki / "archive" / "patterns" / "returning.md").exists()
# ---------------------------------------------------------------------------
# Staging / archive index sync
# ---------------------------------------------------------------------------
class TestIndexSync:
def test_staging_sync_regenerates_index(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/pending.md")
changed = wiki_hygiene.sync_staging_index()
assert changed is True
text = (tmp_wiki / "staging" / "index.md").read_text()
assert "pending.md" in text
def test_staging_sync_idempotent(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/pending.md")
wiki_hygiene.sync_staging_index()
changed_second = wiki_hygiene.sync_staging_index()
assert changed_second is False
def test_archive_sync_regenerates_index(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_page(
tmp_wiki,
"archive/patterns/old.md",
confidence="stale",
extra_fm={
"archived_date": "2026-01-01",
"archived_reason": "test",
"original_path": "patterns/old.md",
},
)
changed = wiki_hygiene.sync_archive_index()
assert changed is True
text = (tmp_wiki / "archive" / "index.md").read_text()
assert "old" in text.lower()
# ---------------------------------------------------------------------------
# State drift detection
# ---------------------------------------------------------------------------
class TestStateDrift:
def test_detects_missing_raw_file(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
import json
state = {
"harvested_urls": {
"https://example.com": {
"raw_file": "raw/harvested/missing.md",
"wiki_pages": [],
}
}
}
(tmp_wiki / ".harvest-state.json").write_text(json.dumps(state))
issues = wiki_hygiene.find_state_drift()
assert any("missing.md" in i for i in issues)
def test_empty_state_has_no_drift(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
# Fixture already creates an empty .harvest-state.json
issues = wiki_hygiene.find_state_drift()
assert issues == []
# ---------------------------------------------------------------------------
# Hygiene state file
# ---------------------------------------------------------------------------
class TestHygieneState:
def test_load_returns_defaults_when_missing(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
state = wiki_hygiene.load_hygiene_state()
assert state["last_quick_run"] is None
assert state["pages_checked"] == {}
def test_save_and_reload(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
state = wiki_hygiene.load_hygiene_state()
state["last_quick_run"] = "2026-04-12T00:00:00Z"
wiki_hygiene.save_hygiene_state(state)
reloaded = wiki_hygiene.load_hygiene_state()
assert reloaded["last_quick_run"] == "2026-04-12T00:00:00Z"
def test_mark_page_checked_stores_hash(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = make_page(tmp_wiki, "patterns/tracked.md")
page = wiki_hygiene.parse_page(path)
state = wiki_hygiene.load_hygiene_state()
wiki_hygiene.mark_page_checked(state, page, "quick")
entry = state["pages_checked"]["patterns/tracked.md"]
assert entry["content_hash"].startswith("sha256:")
assert "last_checked_quick" in entry
def test_page_changed_since_detects_body_change(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
path = make_page(tmp_wiki, "patterns/mutable.md", body="# One\n\nOne body.\n")
page = wiki_hygiene.parse_page(path)
state = wiki_hygiene.load_hygiene_state()
wiki_hygiene.mark_page_checked(state, page, "quick")
assert not wiki_hygiene.page_changed_since(state, page, "quick")
# Mutate the body
path.write_text(path.read_text().replace("One body", "Two body"))
new_page = wiki_hygiene.parse_page(path)
assert wiki_hygiene.page_changed_since(state, new_page, "quick")
# ---------------------------------------------------------------------------
# Full quick-hygiene run end-to-end (dry-run, idempotent)
# ---------------------------------------------------------------------------
class TestRunQuickHygiene:
def test_empty_wiki_produces_empty_report(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
report = wiki_hygiene.run_quick_hygiene(dry_run=True)
assert report.backfilled == []
assert report.archived == []
def test_real_run_is_idempotent(
self, wiki_hygiene: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/one.md")
make_page(tmp_wiki, "patterns/two.md")
report1 = wiki_hygiene.run_quick_hygiene()
        # Second run should find no remaining work
report2 = wiki_hygiene.run_quick_hygiene()
assert report2.backfilled == []
assert report2.decayed == []
assert report2.archived == []
assert report2.frontmatter_fixes == []

314
tests/test_wiki_lib.py Normal file

@@ -0,0 +1,314 @@
"""Unit tests for scripts/wiki_lib.py — the shared frontmatter library."""
from __future__ import annotations
from datetime import date
from pathlib import Path
from typing import Any
import pytest
from conftest import make_page, make_staging_page
# ---------------------------------------------------------------------------
# parse_yaml_lite
# ---------------------------------------------------------------------------
class TestParseYamlLite:
def test_simple_key_value(self, wiki_lib: Any) -> None:
result = wiki_lib.parse_yaml_lite("title: Hello\ntype: pattern\n")
assert result == {"title": "Hello", "type": "pattern"}
def test_quoted_values_are_stripped(self, wiki_lib: Any) -> None:
result = wiki_lib.parse_yaml_lite('title: "Hello"\nother: \'World\'\n')
assert result["title"] == "Hello"
assert result["other"] == "World"
def test_inline_list(self, wiki_lib: Any) -> None:
result = wiki_lib.parse_yaml_lite("tags: [a, b, c]\n")
assert result["tags"] == ["a", "b", "c"]
def test_empty_inline_list(self, wiki_lib: Any) -> None:
result = wiki_lib.parse_yaml_lite("sources: []\n")
assert result["sources"] == []
def test_block_list(self, wiki_lib: Any) -> None:
yaml = "related:\n - foo.md\n - bar.md\n - baz.md\n"
result = wiki_lib.parse_yaml_lite(yaml)
assert result["related"] == ["foo.md", "bar.md", "baz.md"]
def test_mixed_keys(self, wiki_lib: Any) -> None:
yaml = (
"title: Mixed\n"
"type: pattern\n"
"related:\n"
" - one.md\n"
" - two.md\n"
"confidence: high\n"
)
result = wiki_lib.parse_yaml_lite(yaml)
assert result["title"] == "Mixed"
assert result["related"] == ["one.md", "two.md"]
assert result["confidence"] == "high"
def test_empty_value(self, wiki_lib: Any) -> None:
result = wiki_lib.parse_yaml_lite("empty: \n")
assert result["empty"] == ""
def test_comment_lines_ignored(self, wiki_lib: Any) -> None:
result = wiki_lib.parse_yaml_lite("# this is a comment\ntitle: X\n")
assert result == {"title": "X"}
def test_blank_lines_ignored(self, wiki_lib: Any) -> None:
result = wiki_lib.parse_yaml_lite("\ntitle: X\n\ntype: pattern\n\n")
assert result == {"title": "X", "type": "pattern"}
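These tests fully pin the YAML subset the parser supports: scalar key/value pairs, quoted scalars, inline and block lists, empty values, comments, and blank lines. A sketch reconstructed from those assertions alone (the real parser in `wiki_lib.py` may differ in details):

```python
def parse_yaml_lite(text: str) -> dict:
    result: dict = {}
    pending_key = None  # key whose value may turn out to be a block list
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        if line.startswith("- ") and pending_key is not None:
            if not isinstance(result[pending_key], list):
                result[pending_key] = []  # promote "" to a block list
            result[pending_key].append(line[2:].strip())
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        pending_key = None
        if value == "":
            result[key] = ""              # stays "" unless "- item" follows
            pending_key = key
        elif value.startswith("[") and value.endswith("]"):
            inner = value[1:-1].strip()   # inline list, possibly empty
            result[key] = [v.strip() for v in inner.split(",")] if inner else []
        else:
            result[key] = value.strip("'\"")
    return result
```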
# ---------------------------------------------------------------------------
# parse_page
# ---------------------------------------------------------------------------
class TestParsePage:
def test_parses_valid_page(self, wiki_lib: Any, tmp_wiki: Path) -> None:
path = make_page(tmp_wiki, "patterns/foo.md", title="Foo", confidence="high")
page = wiki_lib.parse_page(path)
assert page is not None
assert page.frontmatter["title"] == "Foo"
assert page.frontmatter["confidence"] == "high"
assert "# Content" in page.body
def test_returns_none_without_frontmatter(
self, wiki_lib: Any, tmp_wiki: Path
) -> None:
path = tmp_wiki / "patterns" / "no-fm.md"
path.write_text("# Just a body\n\nNo frontmatter.\n")
assert wiki_lib.parse_page(path) is None
def test_returns_none_for_missing_file(self, wiki_lib: Any, tmp_wiki: Path) -> None:
assert wiki_lib.parse_page(tmp_wiki / "nonexistent.md") is None
def test_returns_none_for_truncated_frontmatter(
self, wiki_lib: Any, tmp_wiki: Path
) -> None:
path = tmp_wiki / "patterns" / "broken.md"
path.write_text("---\ntitle: Broken\n# never closed\n")
assert wiki_lib.parse_page(path) is None
def test_preserves_body_exactly(self, wiki_lib: Any, tmp_wiki: Path) -> None:
body = "# Heading\n\nLine 1\nLine 2\n\n## Sub\n\nMore.\n"
path = make_page(tmp_wiki, "patterns/body.md", body=body)
page = wiki_lib.parse_page(path)
assert page.body == body
# ---------------------------------------------------------------------------
# serialize_frontmatter
# ---------------------------------------------------------------------------
class TestSerializeFrontmatter:
def test_preferred_key_order(self, wiki_lib: Any) -> None:
fm = {
"related": ["a.md"],
"sources": ["raw/x.md"],
"title": "T",
"confidence": "high",
"type": "pattern",
}
yaml = wiki_lib.serialize_frontmatter(fm)
lines = yaml.split("\n")
# title/type/confidence should come before sources/related
assert lines[0].startswith("title:")
assert lines[1].startswith("type:")
assert lines[2].startswith("confidence:")
assert "sources:" in yaml
assert "related:" in yaml
# sources must come before related (both are in PREFERRED_KEY_ORDER)
assert yaml.index("sources:") < yaml.index("related:")
def test_list_formatted_as_block(self, wiki_lib: Any) -> None:
fm = {"title": "T", "related": ["one.md", "two.md"]}
yaml = wiki_lib.serialize_frontmatter(fm)
assert "related:\n - one.md\n - two.md" in yaml
def test_empty_list(self, wiki_lib: Any) -> None:
fm = {"title": "T", "sources": []}
yaml = wiki_lib.serialize_frontmatter(fm)
assert "sources: []" in yaml
def test_unknown_keys_appear_alphabetically_at_end(self, wiki_lib: Any) -> None:
fm = {"title": "T", "type": "pattern", "zoo": "z", "alpha": "a"}
yaml = wiki_lib.serialize_frontmatter(fm)
# alpha should come before zoo (alphabetical)
assert yaml.index("alpha:") < yaml.index("zoo:")
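Serialization is the inverse: known keys in a preferred order, lists as two-space block items (empty lists inline), and unknown keys alphabetical at the end. A sketch; the full `PREFERRED_KEY_ORDER` is an assumption beyond the five keys the tests order explicitly:

```python
PREFERRED_KEY_ORDER = [
    "title", "type", "confidence",
    "last_compiled", "last_verified",  # assumed middle entries
    "sources", "related",
]

def serialize_frontmatter(fm: dict) -> str:
    keys = [k for k in PREFERRED_KEY_ORDER if k in fm]
    keys += sorted(k for k in fm if k not in PREFERRED_KEY_ORDER)
    lines: list[str] = []
    for key in keys:
        value = fm[key]
        if isinstance(value, list):
            if not value:
                lines.append(f"{key}: []")         # empty list stays inline
            else:
                lines.append(f"{key}:")            # block list
                lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines)
```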
# ---------------------------------------------------------------------------
# Round-trip: parse_page → write_page → parse_page
# ---------------------------------------------------------------------------
class TestRoundTrip:
def test_round_trip_preserves_core_fields(
self, wiki_lib: Any, tmp_wiki: Path
) -> None:
path = make_page(
tmp_wiki,
"patterns/rt.md",
title="Round Trip",
sources=["raw/a.md", "raw/b.md"],
related=["patterns/other.md"],
)
page1 = wiki_lib.parse_page(path)
wiki_lib.write_page(page1)
page2 = wiki_lib.parse_page(path)
assert page2.frontmatter["title"] == "Round Trip"
assert page2.frontmatter["sources"] == ["raw/a.md", "raw/b.md"]
assert page2.frontmatter["related"] == ["patterns/other.md"]
assert page2.body == page1.body
def test_round_trip_preserves_mutation(
self, wiki_lib: Any, tmp_wiki: Path
) -> None:
path = make_page(tmp_wiki, "patterns/rt.md", confidence="high")
page = wiki_lib.parse_page(path)
page.frontmatter["confidence"] = "low"
wiki_lib.write_page(page)
page2 = wiki_lib.parse_page(path)
assert page2.frontmatter["confidence"] == "low"
# ---------------------------------------------------------------------------
# parse_date
# ---------------------------------------------------------------------------
class TestParseDate:
def test_iso_format(self, wiki_lib: Any) -> None:
assert wiki_lib.parse_date("2026-04-10") == date(2026, 4, 10)
def test_empty_string_returns_none(self, wiki_lib: Any) -> None:
assert wiki_lib.parse_date("") is None
def test_none_returns_none(self, wiki_lib: Any) -> None:
assert wiki_lib.parse_date(None) is None
def test_invalid_format_returns_none(self, wiki_lib: Any) -> None:
assert wiki_lib.parse_date("not-a-date") is None
assert wiki_lib.parse_date("2026/04/10") is None
assert wiki_lib.parse_date("04-10-2026") is None
def test_date_object_passthrough(self, wiki_lib: Any) -> None:
d = date(2026, 4, 10)
assert wiki_lib.parse_date(d) == d
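The date-parsing contract is small enough to state completely: accept a `date` object or an ISO `YYYY-MM-DD` string, and return `None` for anything else. A sketch matching every assertion above:

```python
from datetime import date, datetime

def parse_date(value):
    # Accepts an ISO "YYYY-MM-DD" string, a date object, or None/""
    if isinstance(value, date):
        return value
    if not value:
        return None
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except (TypeError, ValueError):
        return None
```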
# ---------------------------------------------------------------------------
# page_content_hash
# ---------------------------------------------------------------------------
class TestPageContentHash:
def test_deterministic(self, wiki_lib: Any, tmp_wiki: Path) -> None:
path = make_page(tmp_wiki, "patterns/h.md", body="# Same body\n\nLine.\n")
page = wiki_lib.parse_page(path)
h1 = wiki_lib.page_content_hash(page)
h2 = wiki_lib.page_content_hash(page)
assert h1 == h2
assert h1.startswith("sha256:")
def test_different_bodies_yield_different_hashes(
self, wiki_lib: Any, tmp_wiki: Path
) -> None:
p1 = make_page(tmp_wiki, "patterns/a.md", body="# A\n\nAlpha.\n")
p2 = make_page(tmp_wiki, "patterns/b.md", body="# B\n\nBeta.\n")
h1 = wiki_lib.page_content_hash(wiki_lib.parse_page(p1))
h2 = wiki_lib.page_content_hash(wiki_lib.parse_page(p2))
assert h1 != h2
def test_frontmatter_changes_dont_change_hash(
self, wiki_lib: Any, tmp_wiki: Path
) -> None:
"""Hash is body-only so mechanical frontmatter fixes don't churn it."""
path = make_page(tmp_wiki, "patterns/f.md", confidence="high")
page = wiki_lib.parse_page(path)
h1 = wiki_lib.page_content_hash(page)
page.frontmatter["confidence"] = "medium"
wiki_lib.write_page(page)
page2 = wiki_lib.parse_page(path)
h2 = wiki_lib.page_content_hash(page2)
assert h1 == h2
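The body-only hashing behavior these tests verify can be sketched in a few lines; hashing the body and ignoring the frontmatter is exactly what keeps mechanical frontmatter repairs from churning the hash:

```python
import hashlib

def page_content_hash(page) -> str:
    # Hash only the body, so frontmatter-only edits never change the hash
    digest = hashlib.sha256(page.body.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"
```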


# ---------------------------------------------------------------------------
# Iterators
# ---------------------------------------------------------------------------
class TestIterators:
def test_iter_live_pages_finds_all_types(
self, wiki_lib: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/p1.md")
make_page(tmp_wiki, "patterns/p2.md")
make_page(tmp_wiki, "decisions/d1.md")
make_page(tmp_wiki, "concepts/c1.md")
make_page(tmp_wiki, "environments/e1.md")
pages = wiki_lib.iter_live_pages()
assert len(pages) == 5
stems = {p.path.stem for p in pages}
assert stems == {"p1", "p2", "d1", "c1", "e1"}
def test_iter_live_pages_empty_wiki(
self, wiki_lib: Any, tmp_wiki: Path
) -> None:
assert wiki_lib.iter_live_pages() == []
def test_iter_staging_pages(self, wiki_lib: Any, tmp_wiki: Path) -> None:
make_staging_page(tmp_wiki, "patterns/s1.md")
make_staging_page(tmp_wiki, "decisions/s2.md", ptype="decision")
pages = wiki_lib.iter_staging_pages()
assert len(pages) == 2
assert all(p.frontmatter.get("status") == "pending" for p in pages)
def test_iter_archived_pages(self, wiki_lib: Any, tmp_wiki: Path) -> None:
make_page(
tmp_wiki,
"archive/patterns/old.md",
confidence="stale",
extra_fm={
"archived_date": "2026-01-01",
"archived_reason": "test",
"original_path": "patterns/old.md",
},
)
pages = wiki_lib.iter_archived_pages()
assert len(pages) == 1
assert pages[0].frontmatter["archived_reason"] == "test"
def test_iter_skips_malformed_pages(
self, wiki_lib: Any, tmp_wiki: Path
) -> None:
make_page(tmp_wiki, "patterns/good.md")
(tmp_wiki / "patterns" / "no-fm.md").write_text("# Just a body\n")
pages = wiki_lib.iter_live_pages()
assert len(pages) == 1
assert pages[0].path.stem == "good"


# ---------------------------------------------------------------------------
# WIKI_DIR env var override
# ---------------------------------------------------------------------------
class TestWikiDirEnvVar:
def test_honors_env_var(self, wiki_lib: Any, tmp_wiki: Path) -> None:
"""The tmp_wiki fixture sets WIKI_DIR — verify wiki_lib picks it up."""
assert wiki_lib.WIKI_DIR == tmp_wiki
assert wiki_lib.STAGING_DIR == tmp_wiki / "staging"
assert wiki_lib.ARCHIVE_DIR == tmp_wiki / "archive"
assert wiki_lib.INDEX_FILE == tmp_wiki / "index.md"
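Taken together, the hash tests above pin down a contract: deterministic output, a `sha256:` prefix, and a digest computed from the page body only. A minimal sketch of a function satisfying that contract (an illustration, not the actual `wiki_lib` implementation) could look like:

```python
import hashlib


def page_content_hash(body: str) -> str:
    """Hash only the page body, so frontmatter-only edits never churn it."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"
```

Hashing the body rather than the whole file is what lets mechanical frontmatter repairs (confidence decay, backfill) run without invalidating change tracking.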

267
tests/test_wiki_staging.py Normal file
View File

@@ -0,0 +1,267 @@
"""Integration tests for scripts/wiki-staging.py."""
from __future__ import annotations
import json
from pathlib import Path
from typing import Any
import pytest
from conftest import make_page, make_staging_page


# ---------------------------------------------------------------------------
# List + page_summary
# ---------------------------------------------------------------------------
class TestListPending:
def test_empty_staging(self, wiki_staging: Any, tmp_wiki: Path) -> None:
assert wiki_staging.list_pending() == []
def test_finds_pages_in_all_type_subdirs(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/p.md", ptype="pattern")
make_staging_page(tmp_wiki, "decisions/d.md", ptype="decision")
make_staging_page(tmp_wiki, "concepts/c.md", ptype="concept")
pending = wiki_staging.list_pending()
assert len(pending) == 3
def test_skips_staging_index_md(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
(tmp_wiki / "staging" / "index.md").write_text(
"---\ntitle: Index\n---\n# staging index\n"
)
make_staging_page(tmp_wiki, "patterns/real.md")
pending = wiki_staging.list_pending()
assert len(pending) == 1
assert pending[0].path.stem == "real"
def test_page_summary_populates_all_fields(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(
tmp_wiki,
"patterns/sample.md",
title="Sample",
staged_by="wiki-harvest",
staged_date="2026-04-10",
target_path="patterns/sample.md",
)
pending = wiki_staging.list_pending()
summary = wiki_staging.page_summary(pending[0])
assert summary["title"] == "Sample"
assert summary["type"] == "pattern"
assert summary["staged_by"] == "wiki-harvest"
assert summary["target_path"] == "patterns/sample.md"
assert summary["modifies"] is None


# ---------------------------------------------------------------------------
# Promote
# ---------------------------------------------------------------------------
class TestPromote:
def test_moves_file_to_live(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/new.md", title="New Page")
page = wiki_staging.parse_page(tmp_wiki / "staging" / "patterns" / "new.md")
result = wiki_staging.promote(page)
assert result is not None
assert (tmp_wiki / "patterns" / "new.md").exists()
assert not (tmp_wiki / "staging" / "patterns" / "new.md").exists()
def test_strips_staging_only_fields(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/clean.md")
page = wiki_staging.parse_page(tmp_wiki / "staging" / "patterns" / "clean.md")
wiki_staging.promote(page)
promoted = wiki_staging.parse_page(tmp_wiki / "patterns" / "clean.md")
for field in ("status", "staged_date", "staged_by", "target_path", "compilation_notes"):
assert field not in promoted.frontmatter
def test_preserves_origin_automated(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/auto.md")
page = wiki_staging.parse_page(tmp_wiki / "staging" / "patterns" / "auto.md")
wiki_staging.promote(page)
promoted = wiki_staging.parse_page(tmp_wiki / "patterns" / "auto.md")
assert promoted.frontmatter["origin"] == "automated"
def test_updates_main_index(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/indexed.md", title="Indexed Page")
page = wiki_staging.parse_page(tmp_wiki / "staging" / "patterns" / "indexed.md")
wiki_staging.promote(page)
idx = (tmp_wiki / "index.md").read_text()
assert "patterns/indexed.md" in idx
def test_regenerates_staging_index(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/one.md")
make_staging_page(tmp_wiki, "patterns/two.md")
page = wiki_staging.parse_page(tmp_wiki / "staging" / "patterns" / "one.md")
wiki_staging.promote(page)
idx = (tmp_wiki / "staging" / "index.md").read_text()
assert "two.md" in idx
assert "1 pending" in idx
def test_dry_run_does_not_move(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/safe.md")
page = wiki_staging.parse_page(tmp_wiki / "staging" / "patterns" / "safe.md")
wiki_staging.promote(page, dry_run=True)
assert (tmp_wiki / "staging" / "patterns" / "safe.md").exists()
assert not (tmp_wiki / "patterns" / "safe.md").exists()


# ---------------------------------------------------------------------------
# Promote with modifies field
# ---------------------------------------------------------------------------
class TestPromoteUpdate:
def test_update_overwrites_existing_live_page(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
# Existing live page
make_page(
tmp_wiki,
"patterns/existing.md",
title="Old Title",
last_compiled="2026-01-01",
)
# Staging update with `modifies`
make_staging_page(
tmp_wiki,
"patterns/existing.md",
title="New Title",
modifies="patterns/existing.md",
target_path="patterns/existing.md",
)
page = wiki_staging.parse_page(
tmp_wiki / "staging" / "patterns" / "existing.md"
)
wiki_staging.promote(page)
live = wiki_staging.parse_page(tmp_wiki / "patterns" / "existing.md")
assert live.frontmatter["title"] == "New Title"


# ---------------------------------------------------------------------------
# Reject
# ---------------------------------------------------------------------------
class TestReject:
def test_deletes_file(self, wiki_staging: Any, tmp_wiki: Path) -> None:
path = make_staging_page(tmp_wiki, "patterns/bad.md")
page = wiki_staging.parse_page(path)
wiki_staging.reject(page, "duplicate")
assert not path.exists()
def test_records_rejection_in_harvest_state(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
# Create a raw harvested file with a source_url
raw = tmp_wiki / "raw" / "harvested" / "example-com-test.md"
raw.parent.mkdir(parents=True, exist_ok=True)
raw.write_text(
"---\n"
"source_url: https://example.com/test\n"
"fetched_date: 2026-04-10\n"
"fetch_method: trafilatura\n"
"discovered_in: conversations/mc/test.md\n"
"content_hash: sha256:abc\n"
"---\n"
"# Example\n"
)
# Create a staging page that references it
make_staging_page(tmp_wiki, "patterns/reject-me.md")
staging_path = tmp_wiki / "staging" / "patterns" / "reject-me.md"
# Inject sources so reject() finds the harvest_source
page = wiki_staging.parse_page(staging_path)
page.frontmatter["sources"] = ["raw/harvested/example-com-test.md"]
wiki_staging.write_page(page)
page = wiki_staging.parse_page(staging_path)
wiki_staging.reject(page, "test rejection")
state = json.loads((tmp_wiki / ".harvest-state.json").read_text())
assert "https://example.com/test" in state["rejected_urls"]
assert state["rejected_urls"]["https://example.com/test"]["reason"] == "test rejection"
def test_reject_dry_run_keeps_file(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
path = make_staging_page(tmp_wiki, "patterns/kept.md")
page = wiki_staging.parse_page(path)
wiki_staging.reject(page, "test", dry_run=True)
assert path.exists()


# ---------------------------------------------------------------------------
# Staging index regeneration
# ---------------------------------------------------------------------------
class TestStagingIndexRegen:
def test_empty_index_shows_none(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
wiki_staging.regenerate_staging_index()
idx = (tmp_wiki / "staging" / "index.md").read_text()
assert "0 pending" in idx
assert "No pending items" in idx
def test_lists_pending_items(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/a.md", title="A")
make_staging_page(tmp_wiki, "decisions/b.md", title="B", ptype="decision")
wiki_staging.regenerate_staging_index()
idx = (tmp_wiki / "staging" / "index.md").read_text()
assert "2 pending" in idx
assert "A" in idx and "B" in idx


# ---------------------------------------------------------------------------
# Path resolution
# ---------------------------------------------------------------------------
class TestResolvePage:
def test_resolves_staging_relative_path(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/foo.md")
page = wiki_staging.resolve_page("staging/patterns/foo.md")
assert page is not None
assert page.path.name == "foo.md"
def test_returns_none_for_missing(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
assert wiki_staging.resolve_page("staging/patterns/does-not-exist.md") is None
def test_resolves_bare_patterns_path_as_staging(
self, wiki_staging: Any, tmp_wiki: Path
) -> None:
make_staging_page(tmp_wiki, "patterns/bare.md")
page = wiki_staging.resolve_page("patterns/bare.md")
assert page is not None
assert "staging" in str(page.path)
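The resolution rules these tests encode (a `staging/`-prefixed reference resolves directly, a bare type-relative path is treated as staging-relative, and a missing file yields `None`) can be sketched as follows; this is a hypothetical standalone version, not the script's actual code:

```python
from pathlib import Path
from typing import Optional


def resolve_page(ref: str, wiki_dir: Path) -> Optional[Path]:
    """Resolve a CLI-supplied page reference to a staging file, or None."""
    # Bare paths like "patterns/foo.md" are assumed to mean staging pages,
    # since this command only ever operates on the staging area.
    rel = ref if ref.startswith("staging/") else f"staging/{ref}"
    path = wiki_dir / rel
    return path if path.exists() else None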