Initial commit — memex
A compounding LLM-maintained knowledge wiki. Synthesis of Andrej Karpathy's persistent-wiki gist and milla-jovovich's mempalace, with an automation layer on top for conversation mining, URL harvesting, human-in-the-loop staging, staleness decay, and hygiene.

Includes:
- 11 pipeline scripts (extract, summarize, index, harvest, stage, hygiene, maintain, sync, + shared library)
- Full docs: README, SETUP, ARCHITECTURE, DESIGN-RATIONALE, CUSTOMIZE
- Example CLAUDE.md files (wiki schema + global instructions) tuned for the three-collection qmd setup
- 171-test pytest suite (cross-platform, runs in ~1.3s)
- MIT licensed
tests/README.md (107 lines, new file)
# Wiki Pipeline Test Suite

Pytest-based test suite covering all 11 scripts in `scripts/`. Runs on both
macOS and Linux/WSL, uses only the Python standard library + pytest.

## Running

```bash
# Full suite (from wiki root)
bash tests/run.sh

# Single test file
bash tests/run.sh test_wiki_lib.py

# Single test class or function
bash tests/run.sh test_wiki_hygiene.py::TestArchiveRestore
bash tests/run.sh test_wiki_hygiene.py::TestArchiveRestore::test_restore_reverses_archive

# Pattern matching
bash tests/run.sh -k "archive"

# Verbose
bash tests/run.sh -v

# Stop on first failure
bash tests/run.sh -x

# Or invoke pytest directly from the tests dir
cd tests && python3 -m pytest -v
```

## What's tested

| File | Coverage |
|------|----------|
| `test_wiki_lib.py` | YAML parser, frontmatter round-trip, page iterators, date parsing, content hashing, WIKI_DIR env override |
| `test_wiki_hygiene.py` | Backfill, confidence decay math, frontmatter repair, archive/restore round-trip, orphan detection, broken-xref fuzzy matching, index drift, empty stubs, conversation refresh signals, auto-restore, staging/archive sync, state drift, hygiene state file, full quick-run idempotency |
| `test_wiki_staging.py` | List, promote, reject, promote-with-modifies, dry-run, staging index regeneration, path resolution |
| `test_wiki_harvest.py` | URL classification (harvest/check/skip), private IP detection, URL extraction + filtering, filename derivation, content validation, state management, raw file writing, dry-run CLI smoke test |
| `test_conversation_pipeline.py` | CLI smoke tests for extract-sessions, summarize-conversations, update-conversation-index; dry-run behavior; help flags; integration test with fake conversation files |
| `test_shell_scripts.py` | wiki-maintain.sh / mine-conversations.sh / wiki-sync.sh: help, dry-run, mutex flags, bash syntax check, strict-mode check, shebang check, py_compile for all .py scripts |

## How it works

**Isolation**: Every test runs against a disposable `tmp_wiki` fixture
(pytest `tmp_path`). The fixture sets the `WIKI_DIR` environment variable
so all scripts resolve paths against the tmp directory instead of the real
wiki. No test ever touches `~/projects/wiki`.

**Hyphenated filenames**: Scripts like `wiki-harvest.py` use hyphens, which
Python's `import` can't handle directly. `conftest.py` has a
`_load_script_module` helper that loads a script file by path and exposes
it as a module object.

**Clean module state**: Each test that loads a module clears any cached
import first, so `WIKI_DIR` env overrides take effect correctly between
tests.

**Subprocess tests** (for CLI smoke tests): `conftest.py` provides a
`run_script` fixture that invokes a script via `python3` or `bash` with
`WIKI_DIR` set to the tmp wiki. It uses `subprocess.run` with
`capture_output` and a timeout.

## Cross-platform

- `#!/usr/bin/env bash` shebangs (tested explicitly)
- `set -euo pipefail` in all shell scripts (tested explicitly)
- `bash -n` syntax check on all shell scripts
- `py_compile` on all Python scripts
- Uses `pathlib` everywhere — no hardcoded path separators
- Uses the Python stdlib only (except pytest itself)

## Requirements

- Python 3.11+
- `pytest` — install with `pip install --user pytest` or your distro's package manager
- `bash` (any version — scripts use only portable features)

The tests do NOT require:

- `claude` CLI (mocked / skipped)
- `trafilatura` or `crawl4ai` (only dry-run / classification paths tested)
- `qmd` (reindex phase is skipped in tests)
- Network access
- The real `~/projects/wiki` or `~/.claude/projects` directories

## Speed

The full suite runs in **~1 second** on a modern laptop. All tests are
isolated and independent, so they can run in any order and in parallel.

## What's NOT tested

- **Real LLM calls** (`claude -p`): too expensive, non-deterministic.
  Tested instead: CLI parsing, dry-run paths, mocked error handling.
- **Real web fetches** (trafilatura/crawl4ai): too slow, non-deterministic.
  Tested instead: URL classification, filter logic, fetch-result validation.
- **Real git operations** (wiki-sync.sh): requires a git repo fixture.
  Tested instead: the script loads, handles a non-git dir gracefully, and
  `--status` exits cleanly.
- **Real qmd indexing**: tested elsewhere via `qmd collection list` in the
  setup verification step.
- **Real Claude Code session JSONL parsing** with actual sessions: would
  require fixture JSONL files. Tested instead: CLI parsing, empty-dir
  behavior, `CLAUDE_PROJECTS_DIR` env override.

These paths are smoke-tested end-to-end via the integration tests in
`test_conversation_pipeline.py` and the dry-run paths in
`test_shell_scripts.py::TestWikiMaintainSh`.
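The `WIKI_DIR` isolation described above hinges on scripts reading the variable lazily. A minimal sketch of that pattern (the function name `resolve_wiki_dir` and the default path are illustrative assumptions, not `wiki_lib`'s actual code):

```python
import os
from pathlib import Path

def resolve_wiki_dir() -> Path:
    """Resolve the wiki root, preferring the WIKI_DIR env override.

    Reading the variable at call time (rather than caching it at import
    time) is what lets a test fixture that sets WIKI_DIR before loading
    a script redirect every path to a disposable tmp directory.
    """
    override = os.environ.get("WIKI_DIR")  # set by the tmp_wiki fixture in tests
    if override:
        return Path(override)
    return Path.home() / "projects" / "wiki"  # assumed production default
```

With this shape, `monkeypatch.setenv("WIKI_DIR", str(tmp_wiki))` in a fixture is enough to sandbox every script the tests load.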
tests/conftest.py (300 lines, new file)
"""Shared test fixtures for the wiki pipeline test suite.

All tests run against a disposable `tmp_wiki` directory — no test ever
touches the real ~/projects/wiki. Cross-platform: uses pathlib, no
platform-specific paths, and runs on both macOS and Linux/WSL.
"""

from __future__ import annotations

import importlib
import importlib.util
import json
import os
import sys
from pathlib import Path
from typing import Any

import pytest

SCRIPTS_DIR = Path(__file__).resolve().parent.parent / "scripts"


# ---------------------------------------------------------------------------
# Module loading helpers
# ---------------------------------------------------------------------------
#
# The wiki scripts use hyphenated filenames (wiki-hygiene.py etc.) which
# can't be imported via normal `import` syntax. These helpers load a script
# file as a module object so tests can exercise its functions directly.


def _load_script_module(name: str, path: Path) -> Any:
    """Load a Python script file as a module. Clears any cached version first."""
    # Clear cached imports so WIKI_DIR env changes take effect between tests
    for key in list(sys.modules):
        if key in (name, "wiki_lib"):
            del sys.modules[key]

    # Make sure scripts/ is on sys.path so intra-script imports (wiki_lib) work
    scripts_str = str(SCRIPTS_DIR)
    if scripts_str not in sys.path:
        sys.path.insert(0, scripts_str)

    spec = importlib.util.spec_from_file_location(name, path)
    assert spec is not None and spec.loader is not None
    mod = importlib.util.module_from_spec(spec)
    sys.modules[name] = mod
    spec.loader.exec_module(mod)
    return mod


# ---------------------------------------------------------------------------
# tmp_wiki fixture — builds a realistic wiki tree under a tmp path
# ---------------------------------------------------------------------------


@pytest.fixture
def tmp_wiki(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> Path:
    """Set up a disposable wiki tree with all the directories the scripts expect.

    Sets the WIKI_DIR environment variable so all imported modules resolve
    paths against this tmp directory.
    """
    wiki = tmp_path / "wiki"
    wiki.mkdir()

    # Create the directory tree
    for sub in ["patterns", "decisions", "concepts", "environments"]:
        (wiki / sub).mkdir()
        (wiki / "staging" / sub).mkdir(parents=True)
        (wiki / "archive" / sub).mkdir(parents=True)
    (wiki / "raw" / "harvested").mkdir(parents=True)
    (wiki / "conversations").mkdir()
    (wiki / "reports").mkdir()

    # Create minimal index.md
    (wiki / "index.md").write_text(
        "# Wiki Index\n\n"
        "## Patterns\n\n"
        "## Decisions\n\n"
        "## Concepts\n\n"
        "## Environments\n\n"
    )

    # Empty state files
    (wiki / ".harvest-state.json").write_text(json.dumps({
        "harvested_urls": {},
        "skipped_urls": {},
        "failed_urls": {},
        "rejected_urls": {},
        "last_run": None,
    }))

    # Point all scripts at this tmp wiki
    monkeypatch.setenv("WIKI_DIR", str(wiki))

    return wiki


# ---------------------------------------------------------------------------
# Sample page factories
# ---------------------------------------------------------------------------

def make_page(
    wiki: Path,
    rel_path: str,
    *,
    title: str | None = None,
    ptype: str | None = None,
    confidence: str = "high",
    last_compiled: str = "2026-04-01",
    last_verified: str = "2026-04-01",
    origin: str = "manual",
    sources: list[str] | None = None,
    related: list[str] | None = None,
    body: str = "# Content\n\nA substantive page with real content so it is not a stub.\n",
    extra_fm: dict[str, Any] | None = None,
) -> Path:
    """Write a well-formed wiki page with all required frontmatter fields and return its path."""
    if sources is None:
        sources = []
    if related is None:
        related = []

    path = wiki / rel_path
    path.parent.mkdir(parents=True, exist_ok=True)

    if title is None:
        title = path.stem.replace("-", " ").title()
    if ptype is None:
        ptype = path.parent.name.rstrip("s")

    fm_lines = [
        "---",
        f"title: {title}",
        f"type: {ptype}",
        f"confidence: {confidence}",
        f"origin: {origin}",
        f"last_compiled: {last_compiled}",
        f"last_verified: {last_verified}",
    ]
    # sources/related are always lists at this point; emit a block list or []
    if sources:
        fm_lines.append("sources:")
        fm_lines.extend(f" - {s}" for s in sources)
    else:
        fm_lines.append("sources: []")
    if related:
        fm_lines.append("related:")
        fm_lines.extend(f" - {r}" for r in related)
    else:
        fm_lines.append("related: []")
    if extra_fm:
        for k, v in extra_fm.items():
            if isinstance(v, list):
                if v:
                    fm_lines.append(f"{k}:")
                    fm_lines.extend(f" - {item}" for item in v)
                else:
                    fm_lines.append(f"{k}: []")
            else:
                fm_lines.append(f"{k}: {v}")
    fm_lines.append("---")

    path.write_text("\n".join(fm_lines) + "\n" + body)
    return path


def make_conversation(
    wiki: Path,
    project: str,
    filename: str,
    *,
    date: str = "2026-04-10",
    status: str = "summarized",
    messages: int = 100,
    related: list[str] | None = None,
    body: str = "## Summary\n\nTest conversation summary.\n",
) -> Path:
    """Write a conversation file to the tmp wiki."""
    proj_dir = wiki / "conversations" / project
    proj_dir.mkdir(parents=True, exist_ok=True)
    path = proj_dir / filename

    fm_lines = [
        "---",
        "title: Test Conversation (unknown)",
        "type: conversation",
        f"project: {project}",
        f"date: {date}",
        f"status: {status}",
        f"messages: {messages}",
    ]
    if related:
        fm_lines.append("related:")
        fm_lines.extend(f" - {r}" for r in related)
    fm_lines.append("---")

    path.write_text("\n".join(fm_lines) + "\n" + body)
    return path


def make_staging_page(
    wiki: Path,
    rel_under_staging: str,
    *,
    title: str = "Pending Page",
    ptype: str = "pattern",
    staged_by: str = "wiki-harvest",
    staged_date: str = "2026-04-10",
    modifies: str | None = None,
    target_path: str | None = None,
    body: str = "# Pending\n\nStaged content body.\n",
) -> Path:
    path = wiki / "staging" / rel_under_staging
    path.parent.mkdir(parents=True, exist_ok=True)

    if target_path is None:
        target_path = rel_under_staging

    fm_lines = [
        "---",
        f"title: {title}",
        f"type: {ptype}",
        "confidence: medium",
        "origin: automated",
        "status: pending",
        f"staged_date: {staged_date}",
        f"staged_by: {staged_by}",
        f"target_path: {target_path}",
    ]
    if modifies:
        fm_lines.append(f"modifies: {modifies}")
    fm_lines.append("compilation_notes: test note")
    fm_lines.append("last_verified: 2026-04-10")
    fm_lines.append("---")

    path.write_text("\n".join(fm_lines) + "\n" + body)
    return path


# ---------------------------------------------------------------------------
# Module fixtures — each loads the corresponding script as a module
# ---------------------------------------------------------------------------


@pytest.fixture
def wiki_lib(tmp_wiki: Path) -> Any:
    """Load wiki_lib fresh against the tmp_wiki directory."""
    return _load_script_module("wiki_lib", SCRIPTS_DIR / "wiki_lib.py")


@pytest.fixture
def wiki_hygiene(tmp_wiki: Path) -> Any:
    """Load wiki-hygiene.py fresh. wiki_lib must be loaded first for its imports."""
    _load_script_module("wiki_lib", SCRIPTS_DIR / "wiki_lib.py")
    return _load_script_module("wiki_hygiene", SCRIPTS_DIR / "wiki-hygiene.py")


@pytest.fixture
def wiki_staging(tmp_wiki: Path) -> Any:
    _load_script_module("wiki_lib", SCRIPTS_DIR / "wiki_lib.py")
    return _load_script_module("wiki_staging", SCRIPTS_DIR / "wiki-staging.py")


@pytest.fixture
def wiki_harvest(tmp_wiki: Path) -> Any:
    _load_script_module("wiki_lib", SCRIPTS_DIR / "wiki_lib.py")
    return _load_script_module("wiki_harvest", SCRIPTS_DIR / "wiki-harvest.py")


# ---------------------------------------------------------------------------
# Subprocess helper — runs a script as if from the CLI, with WIKI_DIR set
# ---------------------------------------------------------------------------


@pytest.fixture
def run_script(tmp_wiki: Path):
    """Return a function that runs a script via subprocess with WIKI_DIR set."""
    import subprocess

    def _run(script_rel: str, *args: str, timeout: int = 60) -> subprocess.CompletedProcess:
        script = SCRIPTS_DIR / script_rel
        if script.suffix == ".py":
            cmd = ["python3", str(script), *args]
        else:
            cmd = ["bash", str(script), *args]
        env = os.environ.copy()
        env["WIKI_DIR"] = str(tmp_wiki)
        return subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout,
            env=env,
        )

    return _run
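The factories above emit a deliberately simple frontmatter dialect: scalar `key: value` lines plus block lists of `- item` entries. Parsing it back takes only a few lines; this is a sketch of the idea under that assumption, not `wiki_lib`'s actual parser:

```python
def parse_frontmatter(text: str) -> tuple[dict, str]:
    """Split a page into (frontmatter dict, body).

    Handles scalars, `- item` block lists, and the empty-list marker `[]`.
    """
    lines = text.split("\n")
    assert lines[0] == "---", "page must start with a frontmatter fence"
    fm: dict = {}
    key = None
    i = 1
    while lines[i] != "---":
        line = lines[i]
        if line.lstrip().startswith("- ") and key is not None:
            fm[key].append(line.lstrip()[2:])  # continuation of a block list
        else:
            key, _, value = line.partition(":")
            value = value.strip()
            fm[key] = [] if value in ("", "[]") else value
        i += 1
    return fm, "\n".join(lines[i + 1:])
```

Round-tripping pages written by `make_page` through a parser of this shape is what the frontmatter tests in `test_wiki_lib.py` exercise.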
tests/pytest.ini (9 lines, new file)
[pytest]
testpaths = .
python_files = test_*.py
python_classes = Test*
python_functions = test_*
addopts = -ra --strict-markers --tb=short
markers =
    slow: tests that take more than 1 second
    network: tests that hit the network (skipped by default)
tests/run.sh (31 lines, new executable file)
#!/usr/bin/env bash
set -euo pipefail

# run.sh — Convenience wrapper for running the wiki pipeline test suite.
#
# Usage:
#   bash tests/run.sh                  # Run the full suite
#   bash tests/run.sh -v               # Verbose output
#   bash tests/run.sh test_wiki_lib    # Run one file
#   bash tests/run.sh -k "parse"       # Run tests matching a pattern
#
# All arguments are passed through to pytest.

TESTS_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "${TESTS_DIR}"

# Verify pytest is available
if ! python3 -c "import pytest" 2>/dev/null; then
  echo "pytest not installed. Install with: pip install --user pytest" >&2
  exit 2
fi

# Clear any previous test artifacts
rm -rf .pytest_cache 2>/dev/null || true

# No args: run the full suite with short tracebacks; otherwise pass through
if [[ $# -eq 0 ]]; then
  exec python3 -m pytest --tb=short
else
  exec python3 -m pytest "$@"
fi
tests/test_conversation_pipeline.py (121 lines, new file)
"""Smoke + integration tests for the conversation mining pipeline.

These scripts interact with external systems (Claude Code sessions dir,
claude CLI), so tests focus on CLI parsing, dry-run behavior, and error
handling rather than exercising the full extraction/summarization path.
"""

from __future__ import annotations

import json
from pathlib import Path

import pytest


# ---------------------------------------------------------------------------
# extract-sessions.py
# ---------------------------------------------------------------------------


class TestExtractSessions:
    def test_help_exits_clean(self, run_script) -> None:
        result = run_script("extract-sessions.py", "--help")
        assert result.returncode == 0
        assert "--project" in result.stdout
        assert "--dry-run" in result.stdout

    def test_dry_run_with_empty_sessions_dir(
        self, run_script, tmp_wiki: Path, tmp_path: Path, monkeypatch
    ) -> None:
        # Point CLAUDE_PROJECTS_DIR at an empty tmp dir via env (not currently
        # supported — script reads ~/.claude/projects directly). Instead, use
        # --project with a code that has no sessions to verify clean exit.
        result = run_script("extract-sessions.py", "--dry-run", "--project", "nonexistent")
        assert result.returncode == 0

    def test_rejects_unknown_flag(self, run_script) -> None:
        result = run_script("extract-sessions.py", "--bogus-flag")
        assert result.returncode != 0
        assert "error" in result.stderr.lower() or "unrecognized" in result.stderr.lower()


# ---------------------------------------------------------------------------
# summarize-conversations.py
# ---------------------------------------------------------------------------


class TestSummarizeConversations:
    def test_help_exits_clean(self, run_script) -> None:
        result = run_script("summarize-conversations.py", "--help")
        assert result.returncode == 0
        assert "--claude" in result.stdout
        assert "--dry-run" in result.stdout
        assert "--project" in result.stdout

    def test_dry_run_empty_conversations(self, run_script, tmp_wiki: Path) -> None:
        result = run_script("summarize-conversations.py", "--claude", "--dry-run")
        assert result.returncode == 0

    def test_dry_run_with_extracted_conversation(self, run_script, tmp_wiki: Path) -> None:
        from conftest import make_conversation

        make_conversation(
            tmp_wiki,
            "general",
            "2026-04-10-abc.md",
            status="extracted",  # Not yet summarized
            messages=50,
        )
        result = run_script("summarize-conversations.py", "--claude", "--dry-run")
        assert result.returncode == 0
        # Should mention the file or show it would be processed
        assert "2026-04-10-abc.md" in result.stdout or "1 conversation" in result.stdout


# ---------------------------------------------------------------------------
# update-conversation-index.py
# ---------------------------------------------------------------------------


class TestUpdateConversationIndex:
    def test_help_exits_clean(self, run_script) -> None:
        result = run_script("update-conversation-index.py", "--help")
        assert result.returncode == 0

    def test_runs_on_empty_conversations_dir(self, run_script, tmp_wiki: Path) -> None:
        result = run_script("update-conversation-index.py")
        # Should not crash even with no conversations
        assert result.returncode == 0

    def test_builds_index_from_conversations(self, run_script, tmp_wiki: Path) -> None:
        from conftest import make_conversation

        make_conversation(
            tmp_wiki,
            "general",
            "2026-04-10-one.md",
            status="summarized",
        )
        make_conversation(
            tmp_wiki,
            "general",
            "2026-04-11-two.md",
            status="summarized",
        )
        result = run_script("update-conversation-index.py")
        assert result.returncode == 0

        idx = tmp_wiki / "conversations" / "index.md"
        assert idx.exists()
        text = idx.read_text()
        assert "2026-04-10-one.md" in text or "one.md" in text
        assert "2026-04-11-two.md" in text or "two.md" in text
tests/test_shell_scripts.py (209 lines, new file)
"""Smoke tests for the bash scripts.

Bash scripts are harder to unit-test in isolation — these tests verify
that CLI parsing, help text, and dry-run/safe flags work correctly and
that the scripts exit cleanly in all the no-op paths.

Cross-platform note: tests invoke scripts via `bash` explicitly, so they
work on both macOS (default /bin/bash) and Linux/WSL. They avoid anything
that requires external state (network, git, LLM).
"""

from __future__ import annotations

import os
import subprocess
from pathlib import Path
from typing import Any

import pytest

from conftest import make_conversation, make_page, make_staging_page


# ---------------------------------------------------------------------------
# wiki-maintain.sh
# ---------------------------------------------------------------------------


class TestWikiMaintainSh:
    def test_help_flag(self, run_script) -> None:
        result = run_script("wiki-maintain.sh", "--help")
        assert result.returncode == 0
        assert "Usage:" in result.stdout or "usage:" in result.stdout.lower()
        assert "--full" in result.stdout
        assert "--harvest-only" in result.stdout
        assert "--hygiene-only" in result.stdout

    def test_rejects_unknown_flag(self, run_script) -> None:
        result = run_script("wiki-maintain.sh", "--bogus")
        assert result.returncode != 0
        assert "Unknown option" in result.stderr

    def test_harvest_only_and_hygiene_only_conflict(self, run_script) -> None:
        result = run_script("wiki-maintain.sh", "--harvest-only", "--hygiene-only")
        assert result.returncode != 0
        assert "mutually exclusive" in result.stderr

    def test_hygiene_only_dry_run_completes(self, run_script, tmp_wiki: Path) -> None:
        make_page(tmp_wiki, "patterns/one.md")
        result = run_script(
            "wiki-maintain.sh", "--hygiene-only", "--dry-run", "--no-reindex"
        )
        assert result.returncode == 0
        assert "Phase 2: Hygiene checks" in result.stdout
        assert "finished" in result.stdout

    def test_phase_1_skipped_in_hygiene_only(self, run_script, tmp_wiki: Path) -> None:
        result = run_script(
            "wiki-maintain.sh", "--hygiene-only", "--dry-run", "--no-reindex"
        )
        assert result.returncode == 0
        assert "Phase 1: URL harvesting (skipped)" in result.stdout

    def test_phase_3_skipped_in_dry_run(self, run_script, tmp_wiki: Path) -> None:
        make_page(tmp_wiki, "patterns/one.md")
        result = run_script("wiki-maintain.sh", "--hygiene-only", "--dry-run")
        assert "Phase 3: qmd reindex (skipped)" in result.stdout

    def test_harvest_only_dry_run_completes(self, run_script, tmp_wiki: Path) -> None:
        # Add a summarized conversation so harvest has something to scan
        make_conversation(
            tmp_wiki,
            "test",
            "2026-04-10-test.md",
            status="summarized",
            body="See https://docs.python.org/3/library/os.html for details.\n",
        )
        result = run_script(
            "wiki-maintain.sh",
            "--harvest-only",
            "--dry-run",
            "--no-compile",
            "--no-reindex",
        )
        assert result.returncode == 0
        assert "Phase 2: Hygiene checks (skipped)" in result.stdout


# ---------------------------------------------------------------------------
# wiki-sync.sh
# ---------------------------------------------------------------------------


class TestWikiSyncSh:
    def test_status_on_non_git_dir_exits_cleanly(self, run_script) -> None:
        """wiki-sync.sh --status against a non-git dir should fail gracefully.

        The tmp_wiki fixture is not a git repo, so git commands will fail.
        The script should report the problem without hanging or leaking stack
        traces. Any exit code is acceptable as long as it exits in reasonable
        time and prints something useful to stdout/stderr.
        """
        result = run_script("wiki-sync.sh", "--status", timeout=30)
        # Should have produced some output and exited (not hung)
        assert result.stdout or result.stderr
        assert "Wiki Sync Status" in result.stdout or "not a git" in result.stderr.lower()


# ---------------------------------------------------------------------------
# mine-conversations.sh
# ---------------------------------------------------------------------------


class TestMineConversationsSh:
    def test_extract_only_dry_run(self, run_script, tmp_wiki: Path) -> None:
        """mine-conversations.sh --extract-only --dry-run should complete without LLM."""
        result = run_script(
            "mine-conversations.sh", "--extract-only", "--dry-run", timeout=30
        )
        assert result.returncode == 0

    def test_rejects_unknown_flag(self, run_script) -> None:
        result = run_script("mine-conversations.sh", "--bogus-flag")
        assert result.returncode != 0


# ---------------------------------------------------------------------------
# Cross-platform sanity — scripts use portable bash syntax
# ---------------------------------------------------------------------------


class TestBashPortability:
    """Verify scripts don't use bashisms that break on macOS /bin/bash 3.2."""

    @pytest.mark.parametrize(
        "script",
        ["wiki-maintain.sh", "mine-conversations.sh", "wiki-sync.sh"],
    )
    def test_shebang_is_env_bash(self, script: str) -> None:
        """All shell scripts should use `#!/usr/bin/env bash` for portability."""
        path = Path(__file__).parent.parent / "scripts" / script
        first_line = path.read_text().splitlines()[0]
        assert first_line == "#!/usr/bin/env bash", (
            f"{script} has shebang {first_line!r}, expected #!/usr/bin/env bash"
        )

    @pytest.mark.parametrize(
        "script",
        ["wiki-maintain.sh", "mine-conversations.sh", "wiki-sync.sh"],
    )
    def test_uses_strict_mode(self, script: str) -> None:
        """All shell scripts should use `set -euo pipefail` for safe defaults."""
        path = Path(__file__).parent.parent / "scripts" / script
        text = path.read_text()
        assert "set -euo pipefail" in text, f"{script} missing strict mode"

    @pytest.mark.parametrize(
        "script",
        ["wiki-maintain.sh", "mine-conversations.sh", "wiki-sync.sh"],
    )
    def test_bash_syntax_check(self, script: str) -> None:
        """bash -n does a syntax-only parse and catches obvious errors."""
        path = Path(__file__).parent.parent / "scripts" / script
        result = subprocess.run(
            ["bash", "-n", str(path)],
            capture_output=True,
            text=True,
            timeout=10,
        )
        assert result.returncode == 0, f"{script} has bash syntax errors: {result.stderr}"


# ---------------------------------------------------------------------------
# Python script syntax check (smoke)
# ---------------------------------------------------------------------------


class TestPythonSyntax:
    @pytest.mark.parametrize(
        "script",
        [
            "wiki_lib.py",
            "wiki-harvest.py",
            "wiki-staging.py",
            "wiki-hygiene.py",
            "extract-sessions.py",
            "summarize-conversations.py",
            "update-conversation-index.py",
        ],
    )
    def test_py_compile(self, script: str) -> None:
        """py_compile catches syntax errors without executing the module."""
        import py_compile

        path = Path(__file__).parent.parent / "scripts" / script
        # py_compile.compile raises on error; success returns the .pyc path
        py_compile.compile(str(path), doraise=True)
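The private-address skips exercised in the harvest tests below map directly onto the stdlib `ipaddress` module. A sketch of the check (the helper name `is_private_host` is illustrative; `wiki-harvest.py` may implement it differently):

```python
import ipaddress
from urllib.parse import urlparse

def is_private_host(url: str) -> bool:
    """True if the URL's host is localhost or a private/loopback/link-local IP."""
    host = urlparse(url).hostname or ""
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; domain-pattern skips are handled separately
    return addr.is_private or addr.is_loopback or addr.is_link_local
```

`ipaddress.ip_address(...).is_private` already covers the RFC 1918 ranges (10/8, 172.16/12, 192.168/16) and loopback, so one check handles all of the IP-based skip cases the tests assert.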
tests/test_wiki_harvest.py (323 lines, new file)
@@ -0,0 +1,323 @@
"""Unit + integration tests for scripts/wiki-harvest.py."""

from __future__ import annotations

import json
from pathlib import Path
from typing import Any
from unittest.mock import patch

import pytest

from conftest import make_conversation


# ---------------------------------------------------------------------------
# URL classification
# ---------------------------------------------------------------------------


class TestClassifyUrl:
    def test_regular_docs_site_harvest(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("https://docs.python.org/3/library/os.html") == "harvest"
        assert wiki_harvest.classify_url("https://blog.example.com/post") == "harvest"

    def test_github_issue_is_check(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("https://github.com/foo/bar/issues/42") == "check"

    def test_github_pr_is_check(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("https://github.com/foo/bar/pull/99") == "check"

    def test_stackoverflow_is_check(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url(
            "https://stackoverflow.com/questions/12345/title"
        ) == "check"

    def test_localhost_skip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("http://localhost:3000/path") == "skip"
        assert wiki_harvest.classify_url("http://localhost/foo") == "skip"

    def test_private_ip_skip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("http://10.0.0.1/api") == "skip"
        assert wiki_harvest.classify_url("http://172.30.224.1:8080/v1") == "skip"
        assert wiki_harvest.classify_url("http://192.168.1.1/test") == "skip"
        assert wiki_harvest.classify_url("http://127.0.0.1:8080/foo") == "skip"

    def test_local_and_internal_tld_skip(self, wiki_harvest: Any) -> None:
        # `.local` and `.internal` are baked into SKIP_DOMAIN_PATTERNS
        assert wiki_harvest.classify_url("https://router.local/admin") == "skip"
        assert wiki_harvest.classify_url("https://service.internal/api") == "skip"

    def test_custom_skip_pattern_runtime(self, wiki_harvest: Any) -> None:
        # Users can append their own patterns at runtime — verify the hook works
        wiki_harvest.SKIP_DOMAIN_PATTERNS.append(r"\.mycompany\.com$")
        try:
            assert wiki_harvest.classify_url("https://git.mycompany.com/foo") == "skip"
            assert wiki_harvest.classify_url("https://docs.mycompany.com/api") == "skip"
        finally:
            wiki_harvest.SKIP_DOMAIN_PATTERNS.pop()

    def test_atlassian_skip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("https://foo.atlassian.net/browse/BAR-1") == "skip"

    def test_slack_skip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("https://myteam.slack.com/archives/C123") == "skip"

    def test_github_repo_root_is_harvest(self, wiki_harvest: Any) -> None:
        # Not an issue/pr/discussion — just a repo root, might contain docs
        assert wiki_harvest.classify_url("https://github.com/foo/bar") == "harvest"

    def test_invalid_url_skip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.classify_url("not a url") == "skip"

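Taken together, the assertions above pin down a three-way contract: private and internal hosts are skipped, point-in-time pages (issues, PRs, Q&A) are merely checked, and everything else is harvested. A minimal sketch consistent with those assertions — the names, regexes, and precedence here are inferred from the tests, not taken from the actual `scripts/wiki-harvest.py`:

```python
import ipaddress
import re
from urllib.parse import urlparse

# Assumed skip-list; the real SKIP_DOMAIN_PATTERNS is longer.
SKIP_DOMAIN_PATTERNS = [
    r"\.local$", r"\.internal$", r"\.atlassian\.net$", r"\.slack\.com$",
]


def classify_url(url: str) -> str:
    host = urlparse(url).hostname or ""
    if not host:
        return "skip"  # unparseable, e.g. "not a url"
    is_private = False
    try:
        is_private = ipaddress.ip_address(host).is_private
    except ValueError:
        pass  # hostname, not an IP literal
    if host == "localhost" or is_private or any(
        re.search(p, host) for p in SKIP_DOMAIN_PATTERNS
    ):
        return "skip"
    # Point-in-time pages are worth checking, not harvesting wholesale
    if re.search(r"github\.com/[^/]+/[^/]+/(issues|pull|discussions)/", url):
        return "check"
    if "stackoverflow.com/questions/" in url:
        return "check"
    return "harvest"
```

Note that skip rules run before the check rules, so a private GitHub Enterprise host would still be skipped even if its path looked like an issue URL.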
# ---------------------------------------------------------------------------
# Private IP detection
# ---------------------------------------------------------------------------


class TestPrivateIp:
    def test_10_range(self, wiki_harvest: Any) -> None:
        assert wiki_harvest._is_private_ip("10.0.0.1") is True
        assert wiki_harvest._is_private_ip("10.255.255.255") is True

    def test_172_16_to_31_range(self, wiki_harvest: Any) -> None:
        assert wiki_harvest._is_private_ip("172.16.0.1") is True
        assert wiki_harvest._is_private_ip("172.31.255.255") is True
        assert wiki_harvest._is_private_ip("172.15.0.1") is False
        assert wiki_harvest._is_private_ip("172.32.0.1") is False

    def test_192_168_range(self, wiki_harvest: Any) -> None:
        assert wiki_harvest._is_private_ip("192.168.0.1") is True
        assert wiki_harvest._is_private_ip("192.167.0.1") is False

    def test_loopback(self, wiki_harvest: Any) -> None:
        assert wiki_harvest._is_private_ip("127.0.0.1") is True

    def test_public_ip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest._is_private_ip("8.8.8.8") is False

    def test_hostname_not_ip(self, wiki_harvest: Any) -> None:
        assert wiki_harvest._is_private_ip("example.com") is False

# ---------------------------------------------------------------------------
# URL extraction from files
# ---------------------------------------------------------------------------


class TestExtractUrls:
    def test_finds_urls_in_markdown(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        path = make_conversation(
            tmp_wiki,
            "test",
            "test.md",
            body="See https://docs.python.org/3/library/os.html for details.\n"
            "Also https://fastapi.tiangolo.com/tutorial/.\n",
        )
        urls = wiki_harvest.extract_urls_from_file(path)
        assert "https://docs.python.org/3/library/os.html" in urls
        assert "https://fastapi.tiangolo.com/tutorial/" in urls

    def test_filters_asset_extensions(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        path = make_conversation(
            tmp_wiki,
            "test",
            "assets.md",
            body=(
                "Real: https://example.com/docs/article.html\n"
                "Image: https://example.com/logo.png\n"
                "Script: https://cdn.example.com/lib.js\n"
                "Font: https://fonts.example.com/face.woff2\n"
            ),
        )
        urls = wiki_harvest.extract_urls_from_file(path)
        assert "https://example.com/docs/article.html" in urls
        assert not any(u.endswith(".png") for u in urls)
        assert not any(u.endswith(".js") for u in urls)
        assert not any(u.endswith(".woff2") for u in urls)

    def test_strips_trailing_punctuation(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        path = make_conversation(
            tmp_wiki,
            "test",
            "punct.md",
            body="See https://example.com/foo. Also https://example.com/bar, and more.\n",
        )
        urls = wiki_harvest.extract_urls_from_file(path)
        assert "https://example.com/foo" in urls
        assert "https://example.com/bar" in urls

    def test_deduplicates_within_file(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        path = make_conversation(
            tmp_wiki,
            "test",
            "dup.md",
            body=(
                "First mention: https://example.com/same\n"
                "Second mention: https://example.com/same\n"
            ),
        )
        urls = wiki_harvest.extract_urls_from_file(path)
        assert urls.count("https://example.com/same") == 1

    def test_returns_empty_for_missing_file(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        assert wiki_harvest.extract_urls_from_file(tmp_wiki / "nope.md") == []

    def test_filters_short_urls(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        # URLs shorter than 20 chars are skipped
        path = make_conversation(
            tmp_wiki,
            "test",
            "short.md",
            body="tiny http://a.b/ and https://example.com/long-path\n",
        )
        urls = wiki_harvest.extract_urls_from_file(path)
        assert "http://a.b/" not in urls
        assert "https://example.com/long-path" in urls

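These tests together specify the extractor's pipeline: regex match, trailing-punctuation strip, asset-extension filter, 20-character minimum, and order-preserving dedup. A self-contained sketch under those assumptions (the regex and asset list are illustrative, not the real implementation's):

```python
import re
from pathlib import Path

ASSET_EXTS = (".png", ".jpg", ".gif", ".svg", ".css", ".js", ".woff", ".woff2")
MIN_URL_LEN = 20  # "URLs shorter than 20 chars are skipped"


def extract_urls_from_file(path: Path) -> list[str]:
    if not path.exists():
        return []
    found = re.findall(r"https?://[^\s)\]>\"']+", path.read_text())
    urls: list[str] = []
    for url in found:
        url = url.rstrip(".,;:!?")          # strip sentence punctuation
        if len(url) < MIN_URL_LEN:
            continue
        if url.lower().endswith(ASSET_EXTS):
            continue                        # images, scripts, fonts
        if url not in urls:
            urls.append(url)                # dedupe, preserve order
    return urls
```

A trailing `/` is deliberately kept (`https://fastapi.tiangolo.com/tutorial/` must survive), which is why only sentence punctuation is stripped.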
# ---------------------------------------------------------------------------
# Raw filename derivation
# ---------------------------------------------------------------------------


class TestRawFilename:
    def test_basic_url(self, wiki_harvest: Any) -> None:
        name = wiki_harvest.raw_filename_for_url("https://docs.docker.com/build/multi-stage/")
        assert name.startswith("docs-docker-com-")
        assert "build" in name and "multi-stage" in name
        assert name.endswith(".md")

    def test_strips_www(self, wiki_harvest: Any) -> None:
        name = wiki_harvest.raw_filename_for_url("https://www.example.com/foo")
        assert "www" not in name

    def test_root_url_uses_index(self, wiki_harvest: Any) -> None:
        name = wiki_harvest.raw_filename_for_url("https://example.com/")
        assert name == "example-com-index.md"

    def test_long_paths_truncated(self, wiki_harvest: Any) -> None:
        long_url = "https://example.com/" + "a-very-long-segment/" * 20
        name = wiki_harvest.raw_filename_for_url(long_url)
        assert len(name) < 200

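The filename rules the class above encodes — hyphenated host, stripped `www.`, `index` for bare roots, hard length cap — can be sketched in a few lines. The 180-character stem cap is an assumption chosen to satisfy `len(name) < 200`; the real helper may pick differently:

```python
from urllib.parse import urlparse


def raw_filename_for_url(url: str, max_stem: int = 180) -> str:
    parsed = urlparse(url)
    host = (parsed.hostname or "unknown").removeprefix("www.")
    host_part = host.replace(".", "-")
    # Bare root ("/" or empty path) becomes "index"
    path_part = parsed.path.strip("/").replace("/", "-") or "index"
    stem = f"{host_part}-{path_part}"[:max_stem]
    return stem + ".md"
```

So `https://docs.docker.com/build/multi-stage/` maps to `docs-docker-com-build-multi-stage.md`, matching the prefix and segment assertions above.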
# ---------------------------------------------------------------------------
# Content validation
# ---------------------------------------------------------------------------


class TestValidateContent:
    def test_accepts_clean_markdown(self, wiki_harvest: Any) -> None:
        content = "# Title\n\n" + ("A clean paragraph of markdown content. " * 5)
        assert wiki_harvest.validate_content(content) is True

    def test_rejects_empty(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.validate_content("") is False

    def test_rejects_too_short(self, wiki_harvest: Any) -> None:
        assert wiki_harvest.validate_content("# Short") is False

    def test_rejects_html_leak(self, wiki_harvest: Any) -> None:
        content = "# Title\n\n<div class='nav'>Navigation</div>\n" + "content " * 30
        assert wiki_harvest.validate_content(content) is False

    def test_rejects_script_tag(self, wiki_harvest: Any) -> None:
        content = "# Title\n\n<script>alert()</script>\n" + "content " * 30
        assert wiki_harvest.validate_content(content) is False

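Validation here is a cheap sanity gate, not a parser: long enough to be an article, and free of raw HTML that would mean the fetcher failed to produce clean markdown. A sketch under those assumptions — the 100-character floor and the tag list are guesses bracketed by the tests (`"# Short"` must fail, a five-sentence page must pass):

```python
import re

MIN_CONTENT_LEN = 100  # assumed floor, inferred from the tests


def validate_content(content: str) -> bool:
    if len(content.strip()) < MIN_CONTENT_LEN:
        return False  # empty or stub-sized fetch
    # Residual structural HTML means extraction leaked page chrome
    if re.search(r"<(div|span|script|nav|iframe)\b", content, re.IGNORECASE):
        return False
    return True
```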
# ---------------------------------------------------------------------------
# State management
# ---------------------------------------------------------------------------


class TestStateManagement:
    def test_load_returns_defaults_when_file_empty(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        (tmp_wiki / ".harvest-state.json").write_text("{}")
        state = wiki_harvest.load_state()
        assert "harvested_urls" in state
        assert "skipped_urls" in state

    def test_save_and_reload(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        state = wiki_harvest.load_state()
        state["harvested_urls"]["https://example.com"] = {
            "first_seen": "2026-04-12",
            "seen_in": ["conversations/mc/foo.md"],
            "raw_file": "raw/harvested/example.md",
            "status": "raw",
            "fetch_method": "trafilatura",
        }
        wiki_harvest.save_state(state)

        reloaded = wiki_harvest.load_state()
        assert "https://example.com" in reloaded["harvested_urls"]
        assert reloaded["last_run"] is not None

# ---------------------------------------------------------------------------
# Raw file writer
# ---------------------------------------------------------------------------


class TestWriteRawFile:
    def test_writes_with_frontmatter(
        self, wiki_harvest: Any, tmp_wiki: Path
    ) -> None:
        conv = make_conversation(tmp_wiki, "test", "source.md")
        raw_path = wiki_harvest.write_raw_file(
            "https://example.com/article",
            "# Article\n\nClean content.\n",
            "trafilatura",
            conv,
        )
        assert raw_path.exists()
        text = raw_path.read_text()
        assert "source_url: https://example.com/article" in text
        assert "fetch_method: trafilatura" in text
        assert "content_hash: sha256:" in text
        assert "discovered_in: conversations/test/source.md" in text

# ---------------------------------------------------------------------------
# Dry-run CLI smoke test (no actual fetches)
# ---------------------------------------------------------------------------


class TestHarvestCli:
    def test_dry_run_no_network_calls(
        self, run_script, tmp_wiki: Path
    ) -> None:
        make_conversation(
            tmp_wiki,
            "test",
            "test.md",
            body="See https://docs.python.org/3/ and https://github.com/foo/bar/issues/1.\n",
        )
        result = run_script("wiki-harvest.py", "--dry-run")
        assert result.returncode == 0
        # Dry-run should classify without fetching
        assert "would-harvest" in result.stdout or "Summary" in result.stdout

    def test_help_flag(self, run_script) -> None:
        result = run_script("wiki-harvest.py", "--help")
        assert result.returncode == 0
        assert "--dry-run" in result.stdout
        assert "--no-compile" in result.stdout
616 tests/test_wiki_hygiene.py Normal file
@@ -0,0 +1,616 @@
"""Integration tests for scripts/wiki-hygiene.py.

Uses the tmp_wiki fixture so tests never touch the real wiki.
"""

from __future__ import annotations

from datetime import date, timedelta
from pathlib import Path
from typing import Any

import pytest

from conftest import make_conversation, make_page, make_staging_page


# ---------------------------------------------------------------------------
# Backfill last_verified
# ---------------------------------------------------------------------------


class TestBackfill:
    def test_sets_last_verified_from_last_compiled(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        path = make_page(tmp_wiki, "patterns/foo.md", last_compiled="2026-01-15")
        # Strip last_verified from the fixture-built file
        text = path.read_text()
        text = text.replace("last_verified: 2026-04-01\n", "")
        path.write_text(text)

        changes = wiki_hygiene.backfill_last_verified()
        assert len(changes) == 1
        assert changes[0][1] == "last_compiled"

        reparsed = wiki_hygiene.parse_page(path)
        assert reparsed.frontmatter["last_verified"] == "2026-01-15"

    def test_skips_pages_already_verified(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        make_page(tmp_wiki, "patterns/done.md", last_verified="2026-04-01")
        changes = wiki_hygiene.backfill_last_verified()
        assert changes == []

    def test_dry_run_does_not_write(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        path = make_page(tmp_wiki, "patterns/foo.md", last_compiled="2026-01-15")
        text = path.read_text().replace("last_verified: 2026-04-01\n", "")
        path.write_text(text)

        changes = wiki_hygiene.backfill_last_verified(dry_run=True)
        assert len(changes) == 1

        reparsed = wiki_hygiene.parse_page(path)
        assert "last_verified" not in reparsed.frontmatter

# ---------------------------------------------------------------------------
# Confidence decay math
# ---------------------------------------------------------------------------


class TestConfidenceDecay:
    def test_recent_page_unchanged(self, wiki_hygiene: Any) -> None:
        recent = wiki_hygiene.today() - timedelta(days=30)
        assert wiki_hygiene.expected_confidence("high", recent, False) == "high"

    def test_six_months_decays_high_to_medium(self, wiki_hygiene: Any) -> None:
        old = wiki_hygiene.today() - timedelta(days=200)
        assert wiki_hygiene.expected_confidence("high", old, False) == "medium"

    def test_nine_months_decays_medium_to_low(self, wiki_hygiene: Any) -> None:
        old = wiki_hygiene.today() - timedelta(days=280)
        assert wiki_hygiene.expected_confidence("medium", old, False) == "low"

    def test_twelve_months_decays_to_stale(self, wiki_hygiene: Any) -> None:
        old = wiki_hygiene.today() - timedelta(days=400)
        assert wiki_hygiene.expected_confidence("high", old, False) == "stale"

    def test_superseded_is_always_stale(self, wiki_hygiene: Any) -> None:
        recent = wiki_hygiene.today() - timedelta(days=1)
        assert wiki_hygiene.expected_confidence("high", recent, True) == "stale"

    def test_none_date_leaves_confidence_alone(self, wiki_hygiene: Any) -> None:
        assert wiki_hygiene.expected_confidence("medium", None, False) == "medium"

    def test_bump_confidence_ladder(self, wiki_hygiene: Any) -> None:
        assert wiki_hygiene.bump_confidence("stale") == "low"
        assert wiki_hygiene.bump_confidence("low") == "medium"
        assert wiki_hygiene.bump_confidence("medium") == "high"
        assert wiki_hygiene.bump_confidence("high") == "high"

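The four data points above (30, 200, 280, 400 days) are consistent with a simple model: one rung of decay after six months, fully stale after a year, superseded always stale. The 180/365-day thresholds below are inferred from those assertions, not read from the real script — a sketch only:

```python
from __future__ import annotations

from datetime import date

LADDER = ["stale", "low", "medium", "high"]


def expected_confidence(current: str, last_verified: date | None,
                        superseded: bool) -> str:
    if superseded:
        return "stale"
    if last_verified is None:
        return current              # no signal: leave confidence alone
    age = (date.today() - last_verified).days
    if age >= 365:
        return "stale"              # a year unverified: fully stale
    if age >= 180:                  # six months: decay one rung
        return LADDER[max(LADDER.index(current) - 1, 0)]
    return current


def bump_confidence(current: str) -> str:
    # One rung up, capped at "high"
    return LADDER[min(LADDER.index(current) + 1, len(LADDER) - 1)]
```

Driving decay from `today()` (injectable in the real script) is what lets the tests construct relative dates instead of pinning the calendar.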
# ---------------------------------------------------------------------------
# Frontmatter repair
# ---------------------------------------------------------------------------


class TestFrontmatterRepair:
    def test_adds_missing_confidence(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        path = tmp_wiki / "patterns" / "no-conf.md"
        path.write_text(
            "---\ntitle: No Confidence\ntype: pattern\n"
            "last_compiled: 2026-04-01\nlast_verified: 2026-04-01\n---\n"
            "# Body\n\nSubstantive content here for testing purposes.\n"
        )
        changes = wiki_hygiene.repair_frontmatter()
        assert any("confidence" in fields for _, fields in changes)

        reparsed = wiki_hygiene.parse_page(path)
        assert reparsed.frontmatter["confidence"] == "medium"

    def test_fixes_invalid_confidence(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        path = make_page(tmp_wiki, "patterns/bad-conf.md", confidence="wat")
        changes = wiki_hygiene.repair_frontmatter()
        assert any(p == path for p, _ in changes)

        reparsed = wiki_hygiene.parse_page(path)
        assert reparsed.frontmatter["confidence"] == "medium"

    def test_leaves_valid_pages_alone(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        make_page(tmp_wiki, "patterns/good.md")
        changes = wiki_hygiene.repair_frontmatter()
        assert changes == []

# ---------------------------------------------------------------------------
# Archive and restore round-trip
# ---------------------------------------------------------------------------


class TestArchiveRestore:
    def test_archive_moves_file_and_updates_frontmatter(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        path = make_page(tmp_wiki, "patterns/doomed.md")
        page = wiki_hygiene.parse_page(path)

        wiki_hygiene.archive_page(page, "test archive")

        assert not path.exists()
        archived = tmp_wiki / "archive" / "patterns" / "doomed.md"
        assert archived.exists()

        reparsed = wiki_hygiene.parse_page(archived)
        assert reparsed.frontmatter["archived_reason"] == "test archive"
        assert reparsed.frontmatter["original_path"] == "patterns/doomed.md"
        assert reparsed.frontmatter["confidence"] == "stale"

    def test_restore_reverses_archive(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        original = make_page(tmp_wiki, "patterns/zombie.md")
        page = wiki_hygiene.parse_page(original)
        wiki_hygiene.archive_page(page, "test")

        archived = tmp_wiki / "archive" / "patterns" / "zombie.md"
        archived_page = wiki_hygiene.parse_page(archived)
        wiki_hygiene.restore_page(archived_page)

        assert original.exists()
        assert not archived.exists()

        reparsed = wiki_hygiene.parse_page(original)
        assert reparsed.frontmatter["confidence"] == "medium"
        assert "archived_date" not in reparsed.frontmatter
        assert "archived_reason" not in reparsed.frontmatter
        assert "original_path" not in reparsed.frontmatter

    def test_archive_rejects_non_live_pages(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        # Page outside the live content dirs — should refuse to archive
        weird = tmp_wiki / "raw" / "weird.md"
        weird.parent.mkdir(parents=True, exist_ok=True)
        weird.write_text("---\ntitle: Weird\n---\nBody\n")
        page = wiki_hygiene.parse_page(weird)
        result = wiki_hygiene.archive_page(page, "test")
        assert result is None

    def test_archive_dry_run_does_not_move(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        path = make_page(tmp_wiki, "patterns/safe.md")
        page = wiki_hygiene.parse_page(path)
        wiki_hygiene.archive_page(page, "test", dry_run=True)
        assert path.exists()
        assert not (tmp_wiki / "archive" / "patterns" / "safe.md").exists()

# ---------------------------------------------------------------------------
# Orphan detection
# ---------------------------------------------------------------------------


class TestOrphanDetection:
    def test_finds_orphan_page(self, wiki_hygiene: Any, tmp_wiki: Path) -> None:
        make_page(tmp_wiki, "patterns/lonely.md")
        orphans = wiki_hygiene.find_orphan_pages()
        assert len(orphans) == 1
        assert orphans[0].path.stem == "lonely"

    def test_page_referenced_in_index_is_not_orphan(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        make_page(tmp_wiki, "patterns/linked.md")
        idx = tmp_wiki / "index.md"
        idx.write_text(idx.read_text() + "- [Linked](patterns/linked.md) — desc\n")
        orphans = wiki_hygiene.find_orphan_pages()
        assert not any(p.path.stem == "linked" for p in orphans)

    def test_page_referenced_in_related_is_not_orphan(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        make_page(tmp_wiki, "patterns/referenced.md")
        make_page(
            tmp_wiki,
            "patterns/referencer.md",
            related=["patterns/referenced.md"],
        )
        orphans = wiki_hygiene.find_orphan_pages()
        stems = {p.path.stem for p in orphans}
        assert "referenced" not in stems

    def test_fix_orphan_adds_to_index(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        path = make_page(tmp_wiki, "patterns/orphan.md", title="Orphan Test")
        page = wiki_hygiene.parse_page(path)
        wiki_hygiene.fix_orphan_page(page)
        idx_text = (tmp_wiki / "index.md").read_text()
        assert "patterns/orphan.md" in idx_text

# ---------------------------------------------------------------------------
# Broken cross-references
# ---------------------------------------------------------------------------


class TestBrokenCrossRefs:
    def test_detects_broken_link(self, wiki_hygiene: Any, tmp_wiki: Path) -> None:
        make_page(
            tmp_wiki,
            "patterns/source.md",
            body="See [nonexistent](patterns/does-not-exist.md) for details.\n",
        )
        broken = wiki_hygiene.find_broken_cross_refs()
        assert len(broken) == 1
        target, bad, suggested = broken[0]
        assert bad == "patterns/does-not-exist.md"

    def test_fuzzy_match_finds_near_miss(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        make_page(tmp_wiki, "patterns/health-endpoint.md")
        make_page(
            tmp_wiki,
            "patterns/source.md",
            body="See [H](patterns/health-endpoints.md) — typo.\n",
        )
        broken = wiki_hygiene.find_broken_cross_refs()
        assert len(broken) >= 1
        _, bad, suggested = broken[0]
        assert suggested == "patterns/health-endpoint.md"

    def test_fix_broken_xref(self, wiki_hygiene: Any, tmp_wiki: Path) -> None:
        make_page(tmp_wiki, "patterns/health-endpoint.md")
        src = make_page(
            tmp_wiki,
            "patterns/source.md",
            body="See [H](patterns/health-endpoints.md).\n",
        )
        broken = wiki_hygiene.find_broken_cross_refs()
        for target, bad, suggested in broken:
            wiki_hygiene.fix_broken_cross_ref(target, bad, suggested)
        text = src.read_text()
        assert "patterns/health-endpoints.md" not in text
        assert "patterns/health-endpoint.md" in text

    def test_archived_link_triggers_restore(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        # Page in archive, referenced by a live page
        make_page(
            tmp_wiki,
            "archive/patterns/ghost.md",
            confidence="stale",
            extra_fm={
                "archived_date": "2026-01-01",
                "archived_reason": "test",
                "original_path": "patterns/ghost.md",
            },
        )
        make_page(
            tmp_wiki,
            "patterns/caller.md",
            body="See [ghost](patterns/ghost.md).\n",
        )
        broken = wiki_hygiene.find_broken_cross_refs()
        assert len(broken) >= 1
        for target, bad, suggested in broken:
            if suggested and suggested.startswith("__RESTORE__"):
                wiki_hygiene.fix_broken_cross_ref(target, bad, suggested)
        # After restore, ghost should be live again
        assert (tmp_wiki / "patterns" / "ghost.md").exists()

# ---------------------------------------------------------------------------
# Index drift
# ---------------------------------------------------------------------------


class TestIndexDrift:
    def test_finds_page_missing_from_index(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        make_page(tmp_wiki, "patterns/missing.md")
        missing, stale = wiki_hygiene.find_index_drift()
        assert "patterns/missing.md" in missing
        assert stale == []

    def test_finds_stale_index_entry(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        idx = tmp_wiki / "index.md"
        idx.write_text(
            idx.read_text()
            + "- [Ghost](patterns/ghost.md) — page that no longer exists\n"
        )
        missing, stale = wiki_hygiene.find_index_drift()
        assert "patterns/ghost.md" in stale

    def test_fix_adds_missing_and_removes_stale(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        make_page(tmp_wiki, "patterns/new.md")
        idx = tmp_wiki / "index.md"
        idx.write_text(
            idx.read_text()
            + "- [Gone](patterns/gone.md) — deleted page\n"
        )
        missing, stale = wiki_hygiene.find_index_drift()
        wiki_hygiene.fix_index_drift(missing, stale)
        idx_text = idx.read_text()
        assert "patterns/new.md" in idx_text
        assert "patterns/gone.md" not in idx_text

# ---------------------------------------------------------------------------
# Empty stubs
# ---------------------------------------------------------------------------


class TestEmptyStubs:
    def test_flags_small_body(self, wiki_hygiene: Any, tmp_wiki: Path) -> None:
        make_page(tmp_wiki, "patterns/stub.md", body="# Stub\n\nShort.\n")
        stubs = wiki_hygiene.find_empty_stubs()
        assert len(stubs) == 1
        assert stubs[0].path.stem == "stub"

    def test_ignores_substantive_pages(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        body = "# Full\n\n" + ("This is substantive content. " * 20) + "\n"
        make_page(tmp_wiki, "patterns/full.md", body=body)
        stubs = wiki_hygiene.find_empty_stubs()
        assert stubs == []

# ---------------------------------------------------------------------------
# Conversation refresh signals
# ---------------------------------------------------------------------------


class TestConversationRefreshSignals:
    def test_picks_up_related_link(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        make_page(tmp_wiki, "patterns/hot.md", last_verified="2026-01-01")
        make_conversation(
            tmp_wiki,
            "test",
            "2026-04-11-abc.md",
            date="2026-04-11",
            related=["patterns/hot.md"],
        )
        refs = wiki_hygiene.scan_conversation_references()
        assert "patterns/hot.md" in refs
        assert refs["patterns/hot.md"] == date(2026, 4, 11)

    def test_apply_refresh_updates_last_verified(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        path = make_page(tmp_wiki, "patterns/hot.md", last_verified="2026-01-01")
        make_conversation(
            tmp_wiki,
            "test",
            "2026-04-11-abc.md",
            date="2026-04-11",
            related=["patterns/hot.md"],
        )
        refs = wiki_hygiene.scan_conversation_references()
        changes = wiki_hygiene.apply_refresh_signals(refs)
        assert len(changes) == 1

        reparsed = wiki_hygiene.parse_page(path)
        assert reparsed.frontmatter["last_verified"] == "2026-04-11"

    def test_bumps_low_confidence_to_medium(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        path = make_page(
            tmp_wiki,
            "patterns/reviving.md",
            confidence="low",
            last_verified="2026-01-01",
        )
        make_conversation(
            tmp_wiki,
            "test",
            "2026-04-11-ref.md",
            date="2026-04-11",
            related=["patterns/reviving.md"],
        )
        refs = wiki_hygiene.scan_conversation_references()
        wiki_hygiene.apply_refresh_signals(refs)
        reparsed = wiki_hygiene.parse_page(path)
        assert reparsed.frontmatter["confidence"] == "medium"

# ---------------------------------------------------------------------------
# Auto-restore
# ---------------------------------------------------------------------------


class TestAutoRestore:
    def test_restores_page_referenced_in_conversation(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        # Archive a page
        path = make_page(tmp_wiki, "patterns/returning.md")
        page = wiki_hygiene.parse_page(path)
        wiki_hygiene.archive_page(page, "aging out")
        assert (tmp_wiki / "archive" / "patterns" / "returning.md").exists()

        # Reference it in a conversation
        make_conversation(
            tmp_wiki,
            "test",
            "2026-04-12-ref.md",
            related=["patterns/returning.md"],
        )

        # Auto-restore
        restored = wiki_hygiene.auto_restore_archived()
        assert len(restored) == 1
        assert (tmp_wiki / "patterns" / "returning.md").exists()
        assert not (tmp_wiki / "archive" / "patterns" / "returning.md").exists()

# ---------------------------------------------------------------------------
# Staging / archive index sync
# ---------------------------------------------------------------------------


class TestIndexSync:
    def test_staging_sync_regenerates_index(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        make_staging_page(tmp_wiki, "patterns/pending.md")
        changed = wiki_hygiene.sync_staging_index()
        assert changed is True
        text = (tmp_wiki / "staging" / "index.md").read_text()
        assert "pending.md" in text

    def test_staging_sync_idempotent(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        make_staging_page(tmp_wiki, "patterns/pending.md")
        wiki_hygiene.sync_staging_index()
        changed_second = wiki_hygiene.sync_staging_index()
        assert changed_second is False

    def test_archive_sync_regenerates_index(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        make_page(
            tmp_wiki,
            "archive/patterns/old.md",
            confidence="stale",
            extra_fm={
                "archived_date": "2026-01-01",
                "archived_reason": "test",
                "original_path": "patterns/old.md",
            },
        )
        changed = wiki_hygiene.sync_archive_index()
        assert changed is True
        text = (tmp_wiki / "archive" / "index.md").read_text()
        assert "old" in text.lower()

# ---------------------------------------------------------------------------
# State drift detection
# ---------------------------------------------------------------------------


class TestStateDrift:
    def test_detects_missing_raw_file(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        import json

        state = {
            "harvested_urls": {
                "https://example.com": {
                    "raw_file": "raw/harvested/missing.md",
                    "wiki_pages": [],
                }
            }
        }
        (tmp_wiki / ".harvest-state.json").write_text(json.dumps(state))
        issues = wiki_hygiene.find_state_drift()
        assert any("missing.md" in i for i in issues)

    def test_empty_state_has_no_drift(
        self, wiki_hygiene: Any, tmp_wiki: Path
    ) -> None:
        # Fixture already creates an empty .harvest-state.json
        issues = wiki_hygiene.find_state_drift()
        assert issues == []

# ---------------------------------------------------------------------------
|
||||
# Hygiene state file
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestHygieneState:
|
||||
def test_load_returns_defaults_when_missing(
|
||||
self, wiki_hygiene: Any, tmp_wiki: Path
|
||||
) -> None:
|
||||
state = wiki_hygiene.load_hygiene_state()
|
||||
assert state["last_quick_run"] is None
|
||||
assert state["pages_checked"] == {}
|
||||
|
||||
def test_save_and_reload(
|
||||
self, wiki_hygiene: Any, tmp_wiki: Path
|
||||
) -> None:
|
||||
state = wiki_hygiene.load_hygiene_state()
|
||||
state["last_quick_run"] = "2026-04-12T00:00:00Z"
|
||||
wiki_hygiene.save_hygiene_state(state)
|
||||
|
||||
reloaded = wiki_hygiene.load_hygiene_state()
|
||||
assert reloaded["last_quick_run"] == "2026-04-12T00:00:00Z"
|
||||
|
||||
def test_mark_page_checked_stores_hash(
|
||||
self, wiki_hygiene: Any, tmp_wiki: Path
|
||||
) -> None:
|
||||
path = make_page(tmp_wiki, "patterns/tracked.md")
|
||||
page = wiki_hygiene.parse_page(path)
|
||||
state = wiki_hygiene.load_hygiene_state()
|
||||
wiki_hygiene.mark_page_checked(state, page, "quick")
|
||||
entry = state["pages_checked"]["patterns/tracked.md"]
|
||||
assert entry["content_hash"].startswith("sha256:")
|
||||
assert "last_checked_quick" in entry
|
||||
|
||||
def test_page_changed_since_detects_body_change(
|
||||
self, wiki_hygiene: Any, tmp_wiki: Path
|
||||
) -> None:
|
||||
path = make_page(tmp_wiki, "patterns/mutable.md", body="# One\n\nOne body.\n")
|
||||
page = wiki_hygiene.parse_page(path)
|
||||
state = wiki_hygiene.load_hygiene_state()
|
||||
wiki_hygiene.mark_page_checked(state, page, "quick")
|
||||
|
||||
assert not wiki_hygiene.page_changed_since(state, page, "quick")
|
||||
|
||||
# Mutate the body
|
||||
path.write_text(path.read_text().replace("One body", "Two body"))
|
||||
new_page = wiki_hygiene.parse_page(path)
|
||||
assert wiki_hygiene.page_changed_since(state, new_page, "quick")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Full quick-hygiene run end-to-end (dry-run, idempotent)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestRunQuickHygiene:
|
||||
def test_empty_wiki_produces_empty_report(
|
||||
self, wiki_hygiene: Any, tmp_wiki: Path
|
||||
) -> None:
|
||||
report = wiki_hygiene.run_quick_hygiene(dry_run=True)
|
||||
assert report.backfilled == []
|
||||
assert report.archived == []
|
||||
|
||||
def test_real_run_is_idempotent(
|
||||
self, wiki_hygiene: Any, tmp_wiki: Path
|
||||
) -> None:
|
||||
make_page(tmp_wiki, "patterns/one.md")
|
||||
make_page(tmp_wiki, "patterns/two.md")
|
||||
|
||||
report1 = wiki_hygiene.run_quick_hygiene()
|
||||
# Second run should have 0 work
|
||||
report2 = wiki_hygiene.run_quick_hygiene()
|
||||
assert report2.backfilled == []
|
||||
assert report2.decayed == []
|
||||
assert report2.archived == []
|
||||
assert report2.frontmatter_fixes == []
|
||||
314	tests/test_wiki_lib.py	Normal file
@@ -0,0 +1,314 @@
"""Unit tests for scripts/wiki_lib.py — the shared frontmatter library."""

from __future__ import annotations

from datetime import date
from pathlib import Path
from typing import Any

import pytest

from conftest import make_page, make_staging_page


# ---------------------------------------------------------------------------
# parse_yaml_lite
# ---------------------------------------------------------------------------


class TestParseYamlLite:
    def test_simple_key_value(self, wiki_lib: Any) -> None:
        result = wiki_lib.parse_yaml_lite("title: Hello\ntype: pattern\n")
        assert result == {"title": "Hello", "type": "pattern"}

    def test_quoted_values_are_stripped(self, wiki_lib: Any) -> None:
        result = wiki_lib.parse_yaml_lite('title: "Hello"\nother: \'World\'\n')
        assert result["title"] == "Hello"
        assert result["other"] == "World"

    def test_inline_list(self, wiki_lib: Any) -> None:
        result = wiki_lib.parse_yaml_lite("tags: [a, b, c]\n")
        assert result["tags"] == ["a", "b", "c"]

    def test_empty_inline_list(self, wiki_lib: Any) -> None:
        result = wiki_lib.parse_yaml_lite("sources: []\n")
        assert result["sources"] == []

    def test_block_list(self, wiki_lib: Any) -> None:
        yaml = "related:\n - foo.md\n - bar.md\n - baz.md\n"
        result = wiki_lib.parse_yaml_lite(yaml)
        assert result["related"] == ["foo.md", "bar.md", "baz.md"]

    def test_mixed_keys(self, wiki_lib: Any) -> None:
        yaml = (
            "title: Mixed\n"
            "type: pattern\n"
            "related:\n"
            " - one.md\n"
            " - two.md\n"
            "confidence: high\n"
        )
        result = wiki_lib.parse_yaml_lite(yaml)
        assert result["title"] == "Mixed"
        assert result["related"] == ["one.md", "two.md"]
        assert result["confidence"] == "high"

    def test_empty_value(self, wiki_lib: Any) -> None:
        result = wiki_lib.parse_yaml_lite("empty: \n")
        assert result["empty"] == ""

    def test_comment_lines_ignored(self, wiki_lib: Any) -> None:
        result = wiki_lib.parse_yaml_lite("# this is a comment\ntitle: X\n")
        assert result == {"title": "X"}

    def test_blank_lines_ignored(self, wiki_lib: Any) -> None:
        result = wiki_lib.parse_yaml_lite("\ntitle: X\n\ntype: pattern\n\n")
        assert result == {"title": "X", "type": "pattern"}


# ---------------------------------------------------------------------------
# parse_page
# ---------------------------------------------------------------------------


class TestParsePage:
    def test_parses_valid_page(self, wiki_lib: Any, tmp_wiki: Path) -> None:
        path = make_page(tmp_wiki, "patterns/foo.md", title="Foo", confidence="high")
        page = wiki_lib.parse_page(path)
        assert page is not None
        assert page.frontmatter["title"] == "Foo"
        assert page.frontmatter["confidence"] == "high"
        assert "# Content" in page.body

    def test_returns_none_without_frontmatter(
        self, wiki_lib: Any, tmp_wiki: Path
    ) -> None:
        path = tmp_wiki / "patterns" / "no-fm.md"
        path.write_text("# Just a body\n\nNo frontmatter.\n")
        assert wiki_lib.parse_page(path) is None

    def test_returns_none_for_missing_file(self, wiki_lib: Any, tmp_wiki: Path) -> None:
        assert wiki_lib.parse_page(tmp_wiki / "nonexistent.md") is None

    def test_returns_none_for_truncated_frontmatter(
        self, wiki_lib: Any, tmp_wiki: Path
    ) -> None:
        path = tmp_wiki / "patterns" / "broken.md"
        path.write_text("---\ntitle: Broken\n# never closed\n")
        assert wiki_lib.parse_page(path) is None

    def test_preserves_body_exactly(self, wiki_lib: Any, tmp_wiki: Path) -> None:
        body = "# Heading\n\nLine 1\nLine 2\n\n## Sub\n\nMore.\n"
        path = make_page(tmp_wiki, "patterns/body.md", body=body)
        page = wiki_lib.parse_page(path)
        assert page.body == body


# ---------------------------------------------------------------------------
# serialize_frontmatter
# ---------------------------------------------------------------------------


class TestSerializeFrontmatter:
    def test_preferred_key_order(self, wiki_lib: Any) -> None:
        fm = {
            "related": ["a.md"],
            "sources": ["raw/x.md"],
            "title": "T",
            "confidence": "high",
            "type": "pattern",
        }
        yaml = wiki_lib.serialize_frontmatter(fm)
        lines = yaml.split("\n")
        # title/type/confidence should come before sources/related
        assert lines[0].startswith("title:")
        assert lines[1].startswith("type:")
        assert lines[2].startswith("confidence:")
        assert "sources:" in yaml
        assert "related:" in yaml
        # sources must come before related (both are in PREFERRED_KEY_ORDER)
        assert yaml.index("sources:") < yaml.index("related:")

    def test_list_formatted_as_block(self, wiki_lib: Any) -> None:
        fm = {"title": "T", "related": ["one.md", "two.md"]}
        yaml = wiki_lib.serialize_frontmatter(fm)
        assert "related:\n - one.md\n - two.md" in yaml

    def test_empty_list(self, wiki_lib: Any) -> None:
        fm = {"title": "T", "sources": []}
        yaml = wiki_lib.serialize_frontmatter(fm)
        assert "sources: []" in yaml

    def test_unknown_keys_appear_alphabetically_at_end(self, wiki_lib: Any) -> None:
        fm = {"title": "T", "type": "pattern", "zoo": "z", "alpha": "a"}
        yaml = wiki_lib.serialize_frontmatter(fm)
        # alpha should come before zoo (alphabetical)
        assert yaml.index("alpha:") < yaml.index("zoo:")


# ---------------------------------------------------------------------------
# Round-trip: parse_page → write_page → parse_page
# ---------------------------------------------------------------------------


class TestRoundTrip:
    def test_round_trip_preserves_core_fields(
        self, wiki_lib: Any, tmp_wiki: Path
    ) -> None:
        path = make_page(
            tmp_wiki,
            "patterns/rt.md",
            title="Round Trip",
            sources=["raw/a.md", "raw/b.md"],
            related=["patterns/other.md"],
        )
        page1 = wiki_lib.parse_page(path)
        wiki_lib.write_page(page1)
        page2 = wiki_lib.parse_page(path)
        assert page2.frontmatter["title"] == "Round Trip"
        assert page2.frontmatter["sources"] == ["raw/a.md", "raw/b.md"]
        assert page2.frontmatter["related"] == ["patterns/other.md"]
        assert page2.body == page1.body

    def test_round_trip_preserves_mutation(
        self, wiki_lib: Any, tmp_wiki: Path
    ) -> None:
        path = make_page(tmp_wiki, "patterns/rt.md", confidence="high")
        page = wiki_lib.parse_page(path)
        page.frontmatter["confidence"] = "low"
        wiki_lib.write_page(page)
        page2 = wiki_lib.parse_page(path)
        assert page2.frontmatter["confidence"] == "low"


# ---------------------------------------------------------------------------
# parse_date
# ---------------------------------------------------------------------------


class TestParseDate:
    def test_iso_format(self, wiki_lib: Any) -> None:
        assert wiki_lib.parse_date("2026-04-10") == date(2026, 4, 10)

    def test_empty_string_returns_none(self, wiki_lib: Any) -> None:
        assert wiki_lib.parse_date("") is None

    def test_none_returns_none(self, wiki_lib: Any) -> None:
        assert wiki_lib.parse_date(None) is None

    def test_invalid_format_returns_none(self, wiki_lib: Any) -> None:
        assert wiki_lib.parse_date("not-a-date") is None
        assert wiki_lib.parse_date("2026/04/10") is None
        assert wiki_lib.parse_date("04-10-2026") is None

    def test_date_object_passthrough(self, wiki_lib: Any) -> None:
        d = date(2026, 4, 10)
        assert wiki_lib.parse_date(d) == d


# ---------------------------------------------------------------------------
# page_content_hash
# ---------------------------------------------------------------------------


class TestPageContentHash:
    def test_deterministic(self, wiki_lib: Any, tmp_wiki: Path) -> None:
        path = make_page(tmp_wiki, "patterns/h.md", body="# Same body\n\nLine.\n")
        page = wiki_lib.parse_page(path)
        h1 = wiki_lib.page_content_hash(page)
        h2 = wiki_lib.page_content_hash(page)
        assert h1 == h2
        assert h1.startswith("sha256:")

    def test_different_bodies_yield_different_hashes(
        self, wiki_lib: Any, tmp_wiki: Path
    ) -> None:
        p1 = make_page(tmp_wiki, "patterns/a.md", body="# A\n\nAlpha.\n")
        p2 = make_page(tmp_wiki, "patterns/b.md", body="# B\n\nBeta.\n")
        h1 = wiki_lib.page_content_hash(wiki_lib.parse_page(p1))
        h2 = wiki_lib.page_content_hash(wiki_lib.parse_page(p2))
        assert h1 != h2

    def test_frontmatter_changes_dont_change_hash(
        self, wiki_lib: Any, tmp_wiki: Path
    ) -> None:
        """Hash is body-only so mechanical frontmatter fixes don't churn it."""
        path = make_page(tmp_wiki, "patterns/f.md", confidence="high")
        page = wiki_lib.parse_page(path)
        h1 = wiki_lib.page_content_hash(page)

        page.frontmatter["confidence"] = "medium"
        wiki_lib.write_page(page)
        page2 = wiki_lib.parse_page(path)
        h2 = wiki_lib.page_content_hash(page2)
        assert h1 == h2


# ---------------------------------------------------------------------------
# Iterators
# ---------------------------------------------------------------------------


class TestIterators:
    def test_iter_live_pages_finds_all_types(
        self, wiki_lib: Any, tmp_wiki: Path
    ) -> None:
        make_page(tmp_wiki, "patterns/p1.md")
        make_page(tmp_wiki, "patterns/p2.md")
        make_page(tmp_wiki, "decisions/d1.md")
        make_page(tmp_wiki, "concepts/c1.md")
        make_page(tmp_wiki, "environments/e1.md")
        pages = wiki_lib.iter_live_pages()
        assert len(pages) == 5
        stems = {p.path.stem for p in pages}
        assert stems == {"p1", "p2", "d1", "c1", "e1"}

    def test_iter_live_pages_empty_wiki(
        self, wiki_lib: Any, tmp_wiki: Path
    ) -> None:
        assert wiki_lib.iter_live_pages() == []

    def test_iter_staging_pages(self, wiki_lib: Any, tmp_wiki: Path) -> None:
        make_staging_page(tmp_wiki, "patterns/s1.md")
        make_staging_page(tmp_wiki, "decisions/s2.md", ptype="decision")
        pages = wiki_lib.iter_staging_pages()
        assert len(pages) == 2
        assert all(p.frontmatter.get("status") == "pending" for p in pages)

    def test_iter_archived_pages(self, wiki_lib: Any, tmp_wiki: Path) -> None:
        make_page(
            tmp_wiki,
            "archive/patterns/old.md",
            confidence="stale",
            extra_fm={
                "archived_date": "2026-01-01",
                "archived_reason": "test",
                "original_path": "patterns/old.md",
            },
        )
        pages = wiki_lib.iter_archived_pages()
        assert len(pages) == 1
        assert pages[0].frontmatter["archived_reason"] == "test"

    def test_iter_skips_malformed_pages(
        self, wiki_lib: Any, tmp_wiki: Path
    ) -> None:
        make_page(tmp_wiki, "patterns/good.md")
        (tmp_wiki / "patterns" / "no-fm.md").write_text("# Just a body\n")
        pages = wiki_lib.iter_live_pages()
        assert len(pages) == 1
        assert pages[0].path.stem == "good"


# ---------------------------------------------------------------------------
# WIKI_DIR env var override
# ---------------------------------------------------------------------------


class TestWikiDirEnvVar:
    def test_honors_env_var(self, wiki_lib: Any, tmp_wiki: Path) -> None:
        """The tmp_wiki fixture sets WIKI_DIR — verify wiki_lib picks it up."""
        assert wiki_lib.WIKI_DIR == tmp_wiki
        assert wiki_lib.STAGING_DIR == tmp_wiki / "staging"
        assert wiki_lib.ARCHIVE_DIR == tmp_wiki / "archive"
        assert wiki_lib.INDEX_FILE == tmp_wiki / "index.md"
267	tests/test_wiki_staging.py	Normal file
@@ -0,0 +1,267 @@
"""Integration tests for scripts/wiki-staging.py."""

from __future__ import annotations

import json
from pathlib import Path
from typing import Any

import pytest

from conftest import make_page, make_staging_page


# ---------------------------------------------------------------------------
# List + page_summary
# ---------------------------------------------------------------------------


class TestListPending:
    def test_empty_staging(self, wiki_staging: Any, tmp_wiki: Path) -> None:
        assert wiki_staging.list_pending() == []

    def test_finds_pages_in_all_type_subdirs(
        self, wiki_staging: Any, tmp_wiki: Path
    ) -> None:
        make_staging_page(tmp_wiki, "patterns/p.md", ptype="pattern")
        make_staging_page(tmp_wiki, "decisions/d.md", ptype="decision")
        make_staging_page(tmp_wiki, "concepts/c.md", ptype="concept")
        pending = wiki_staging.list_pending()
        assert len(pending) == 3

    def test_skips_staging_index_md(
        self, wiki_staging: Any, tmp_wiki: Path
    ) -> None:
        (tmp_wiki / "staging" / "index.md").write_text(
            "---\ntitle: Index\n---\n# staging index\n"
        )
        make_staging_page(tmp_wiki, "patterns/real.md")
        pending = wiki_staging.list_pending()
        assert len(pending) == 1
        assert pending[0].path.stem == "real"

    def test_page_summary_populates_all_fields(
        self, wiki_staging: Any, tmp_wiki: Path
    ) -> None:
        make_staging_page(
            tmp_wiki,
            "patterns/sample.md",
            title="Sample",
            staged_by="wiki-harvest",
            staged_date="2026-04-10",
            target_path="patterns/sample.md",
        )
        pending = wiki_staging.list_pending()
        summary = wiki_staging.page_summary(pending[0])
        assert summary["title"] == "Sample"
        assert summary["type"] == "pattern"
        assert summary["staged_by"] == "wiki-harvest"
        assert summary["target_path"] == "patterns/sample.md"
        assert summary["modifies"] is None


# ---------------------------------------------------------------------------
# Promote
# ---------------------------------------------------------------------------


class TestPromote:
    def test_moves_file_to_live(
        self, wiki_staging: Any, tmp_wiki: Path
    ) -> None:
        make_staging_page(tmp_wiki, "patterns/new.md", title="New Page")
        page = wiki_staging.parse_page(tmp_wiki / "staging" / "patterns" / "new.md")
        result = wiki_staging.promote(page)
        assert result is not None
        assert (tmp_wiki / "patterns" / "new.md").exists()
        assert not (tmp_wiki / "staging" / "patterns" / "new.md").exists()

    def test_strips_staging_only_fields(
        self, wiki_staging: Any, tmp_wiki: Path
    ) -> None:
        make_staging_page(tmp_wiki, "patterns/clean.md")
        page = wiki_staging.parse_page(tmp_wiki / "staging" / "patterns" / "clean.md")
        wiki_staging.promote(page)

        promoted = wiki_staging.parse_page(tmp_wiki / "patterns" / "clean.md")
        for field in ("status", "staged_date", "staged_by", "target_path", "compilation_notes"):
            assert field not in promoted.frontmatter

    def test_preserves_origin_automated(
        self, wiki_staging: Any, tmp_wiki: Path
    ) -> None:
        make_staging_page(tmp_wiki, "patterns/auto.md")
        page = wiki_staging.parse_page(tmp_wiki / "staging" / "patterns" / "auto.md")
        wiki_staging.promote(page)
        promoted = wiki_staging.parse_page(tmp_wiki / "patterns" / "auto.md")
        assert promoted.frontmatter["origin"] == "automated"

    def test_updates_main_index(
        self, wiki_staging: Any, tmp_wiki: Path
    ) -> None:
        make_staging_page(tmp_wiki, "patterns/indexed.md", title="Indexed Page")
        page = wiki_staging.parse_page(tmp_wiki / "staging" / "patterns" / "indexed.md")
        wiki_staging.promote(page)

        idx = (tmp_wiki / "index.md").read_text()
        assert "patterns/indexed.md" in idx

    def test_regenerates_staging_index(
        self, wiki_staging: Any, tmp_wiki: Path
    ) -> None:
        make_staging_page(tmp_wiki, "patterns/one.md")
        make_staging_page(tmp_wiki, "patterns/two.md")
        page = wiki_staging.parse_page(tmp_wiki / "staging" / "patterns" / "one.md")
        wiki_staging.promote(page)

        idx = (tmp_wiki / "staging" / "index.md").read_text()
        assert "two.md" in idx
        assert "1 pending" in idx

    def test_dry_run_does_not_move(
        self, wiki_staging: Any, tmp_wiki: Path
    ) -> None:
        make_staging_page(tmp_wiki, "patterns/safe.md")
        page = wiki_staging.parse_page(tmp_wiki / "staging" / "patterns" / "safe.md")
        wiki_staging.promote(page, dry_run=True)
        assert (tmp_wiki / "staging" / "patterns" / "safe.md").exists()
        assert not (tmp_wiki / "patterns" / "safe.md").exists()


# ---------------------------------------------------------------------------
# Promote with modifies field
# ---------------------------------------------------------------------------


class TestPromoteUpdate:
    def test_update_overwrites_existing_live_page(
        self, wiki_staging: Any, tmp_wiki: Path
    ) -> None:
        # Existing live page
        make_page(
            tmp_wiki,
            "patterns/existing.md",
            title="Old Title",
            last_compiled="2026-01-01",
        )
        # Staging update with `modifies`
        make_staging_page(
            tmp_wiki,
            "patterns/existing.md",
            title="New Title",
            modifies="patterns/existing.md",
            target_path="patterns/existing.md",
        )
        page = wiki_staging.parse_page(
            tmp_wiki / "staging" / "patterns" / "existing.md"
        )
        wiki_staging.promote(page)

        live = wiki_staging.parse_page(tmp_wiki / "patterns" / "existing.md")
        assert live.frontmatter["title"] == "New Title"


# ---------------------------------------------------------------------------
# Reject
# ---------------------------------------------------------------------------


class TestReject:
    def test_deletes_file(self, wiki_staging: Any, tmp_wiki: Path) -> None:
        path = make_staging_page(tmp_wiki, "patterns/bad.md")
        page = wiki_staging.parse_page(path)
        wiki_staging.reject(page, "duplicate")
        assert not path.exists()

    def test_records_rejection_in_harvest_state(
        self, wiki_staging: Any, tmp_wiki: Path
    ) -> None:
        # Create a raw harvested file with a source_url
        raw = tmp_wiki / "raw" / "harvested" / "example-com-test.md"
        raw.parent.mkdir(parents=True, exist_ok=True)
        raw.write_text(
            "---\n"
            "source_url: https://example.com/test\n"
            "fetched_date: 2026-04-10\n"
            "fetch_method: trafilatura\n"
            "discovered_in: conversations/mc/test.md\n"
            "content_hash: sha256:abc\n"
            "---\n"
            "# Example\n"
        )

        # Create a staging page that references it
        make_staging_page(tmp_wiki, "patterns/reject-me.md")
        staging_path = tmp_wiki / "staging" / "patterns" / "reject-me.md"
        # Inject sources so reject() finds the harvest_source
        page = wiki_staging.parse_page(staging_path)
        page.frontmatter["sources"] = ["raw/harvested/example-com-test.md"]
        wiki_staging.write_page(page)

        page = wiki_staging.parse_page(staging_path)
        wiki_staging.reject(page, "test rejection")

        state = json.loads((tmp_wiki / ".harvest-state.json").read_text())
        assert "https://example.com/test" in state["rejected_urls"]
        assert state["rejected_urls"]["https://example.com/test"]["reason"] == "test rejection"

    def test_reject_dry_run_keeps_file(
        self, wiki_staging: Any, tmp_wiki: Path
    ) -> None:
        path = make_staging_page(tmp_wiki, "patterns/kept.md")
        page = wiki_staging.parse_page(path)
        wiki_staging.reject(page, "test", dry_run=True)
        assert path.exists()


# ---------------------------------------------------------------------------
# Staging index regeneration
# ---------------------------------------------------------------------------


class TestStagingIndexRegen:
    def test_empty_index_shows_none(
        self, wiki_staging: Any, tmp_wiki: Path
    ) -> None:
        wiki_staging.regenerate_staging_index()
        idx = (tmp_wiki / "staging" / "index.md").read_text()
        assert "0 pending" in idx
        assert "No pending items" in idx

    def test_lists_pending_items(
        self, wiki_staging: Any, tmp_wiki: Path
    ) -> None:
        make_staging_page(tmp_wiki, "patterns/a.md", title="A")
        make_staging_page(tmp_wiki, "decisions/b.md", title="B", ptype="decision")
        wiki_staging.regenerate_staging_index()
        idx = (tmp_wiki / "staging" / "index.md").read_text()
        assert "2 pending" in idx
        assert "A" in idx and "B" in idx


# ---------------------------------------------------------------------------
# Path resolution
# ---------------------------------------------------------------------------


class TestResolvePage:
    def test_resolves_staging_relative_path(
        self, wiki_staging: Any, tmp_wiki: Path
    ) -> None:
        make_staging_page(tmp_wiki, "patterns/foo.md")
        page = wiki_staging.resolve_page("staging/patterns/foo.md")
        assert page is not None
        assert page.path.name == "foo.md"

    def test_returns_none_for_missing(
        self, wiki_staging: Any, tmp_wiki: Path
    ) -> None:
        assert wiki_staging.resolve_page("staging/patterns/does-not-exist.md") is None

    def test_resolves_bare_patterns_path_as_staging(
        self, wiki_staging: Any, tmp_wiki: Path
    ) -> None:
        make_staging_page(tmp_wiki, "patterns/bare.md")
        page = wiki_staging.resolve_page("patterns/bare.md")
        assert page is not None
        assert "staging" in str(page.path)