# Wiki Pipeline Test Suite

Pytest-based test suite covering all 11 scripts in `scripts/`. Runs on both macOS and Linux/WSL, uses only the Python standard library + pytest.

## Running

```bash
# Full suite (from wiki root)
bash tests/run.sh

# Single test file
bash tests/run.sh test_wiki_lib.py

# Single test class or function
bash tests/run.sh test_wiki_hygiene.py::TestArchiveRestore
bash tests/run.sh test_wiki_hygiene.py::TestArchiveRestore::test_restore_reverses_archive

# Pattern matching
bash tests/run.sh -k "archive"

# Verbose
bash tests/run.sh -v

# Stop on first failure
bash tests/run.sh -x

# Or invoke pytest directly from the tests dir
cd tests && python3 -m pytest -v
```

## What's tested

| File | Coverage |
|------|----------|
| `test_wiki_lib.py` | YAML parser, frontmatter round-trip, page iterators, date parsing, content hashing, WIKI_DIR env override |
| `test_wiki_hygiene.py` | Backfill, confidence decay math, frontmatter repair, archive/restore round-trip, orphan detection, broken-xref fuzzy matching, index drift, empty stubs, conversation refresh signals, auto-restore, staging/archive sync, state drift, hygiene state file, full quick-run idempotency |
| `test_wiki_staging.py` | List, promote, reject, promote-with-modifies, dry-run, staging index regeneration, path resolution |
| `test_wiki_harvest.py` | URL classification (harvest/check/skip), private IP detection, URL extraction + filtering, filename derivation, content validation, state management, raw file writing, dry-run CLI smoke test |
| `test_conversation_pipeline.py` | CLI smoke tests for extract-sessions, summarize-conversations, update-conversation-index; dry-run behavior; help flags; integration test with fake conversation files |
| `test_shell_scripts.py` | wiki-maintain.sh / mine-conversations.sh / wiki-sync.sh: help, dry-run, mutex flags, bash syntax check, strict-mode check, shebang check, py_compile for all .py scripts |

## How it works

**Isolation**: Every test runs against a disposable `tmp_wiki` fixture (pytest `tmp_path`). The fixture sets the `WIKI_DIR` environment variable so all scripts resolve paths against the tmp directory instead of the real wiki. No test ever touches `~/projects/wiki`.

**Hyphenated filenames**: Scripts like `wiki-harvest.py` use hyphens, which Python's `import` can't handle directly. `conftest.py` has a `_load_script_module` helper that loads a script file by path and exposes it as a module object.

**Clean module state**: Each test that loads a module clears any cached import first, so `WIKI_DIR` env overrides take effect correctly between tests.

**Subprocess tests** (for CLI smoke tests): `conftest.py` provides a `run_script` fixture that invokes a script via `python3` or `bash` with `WIKI_DIR` set to the tmp wiki. Uses `subprocess.run` with `capture_output` and a timeout.

## Cross-platform

- `#!/usr/bin/env bash` shebangs (tested explicitly)
- `set -euo pipefail` in all shell scripts (tested explicitly)
- `bash -n` syntax check on all shell scripts
- `py_compile` on all Python scripts
- Uses `pathlib` everywhere, with no hardcoded path separators
- Uses the Python stdlib only (except pytest itself)

## Requirements

- Python 3.11+
- `pytest`: install with `pip install --user pytest` or your distro's package manager
- `bash` (any version; scripts use only portable features)

The tests do NOT require:

- `claude` CLI (mocked / skipped)
- `trafilatura` or `crawl4ai` (only dry-run / classification paths tested)
- `qmd` (reindex phase is skipped in tests)
- Network access
- The real `~/projects/wiki` or `~/.claude/projects` directories

## Speed

Full suite runs in **~1 second** on a modern laptop. All tests are isolated and independent so they can run in any order and in parallel.

## What's NOT tested

- **Real LLM calls** (`claude -p`): too expensive, non-deterministic. Tested: CLI parsing, dry-run paths, mocked error handling.
- **Real web fetches** (trafilatura/crawl4ai): too slow, non-deterministic. Tested: URL classification, filter logic, fetch-result validation.
- **Real git operations** (wiki-sync.sh): requires a git repo fixture. Tested: script loads, handles non-git dir gracefully, --status exits clean.
- **Real qmd indexing**: tested elsewhere via `qmd collection list` in the setup verification step.
- **Real Claude Code session JSONL parsing** with actual sessions: would require fixture JSONL files. Tested: CLI parsing, empty-dir behavior, `CLAUDE_PROJECTS_DIR` env override.

These are smoke-tested end-to-end via the integration tests in `test_conversation_pipeline.py` and the dry-run paths in `test_shell_scripts.py::TestWikiMaintainSh`.
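
For readers who want a concrete picture of the mechanics described under "How it works", here is a minimal sketch of the two helpers: loading a hyphenated script by path (with the cached import cleared first) and running a script in a subprocess with `WIKI_DIR` overridden. It is not the actual `conftest.py` code. The function signatures, the default timeout, and the use of `sys.executable` in place of a literal `python3` are assumptions made for illustration.

```python
import importlib.util
import os
import subprocess
import sys
from pathlib import Path


def load_script_module(path, name=None):
    """Load a script such as wiki-harvest.py by path and return a module object.

    Hypothetical stand-in for the `_load_script_module` helper; the real
    signature may differ.
    """
    path = Path(path)
    # Hyphens are not valid in module names, so derive an importable name.
    module_name = name or path.stem.replace("-", "_")
    # Clear any cached copy so module-level reads of WIKI_DIR run again.
    sys.modules.pop(module_name, None)
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load script at {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module


def run_script(script, *args, wiki_dir, timeout=30.0):
    """Run a .py or .sh script with WIKI_DIR pointing at `wiki_dir`.

    Hypothetical stand-in for the `run_script` fixture; the 30 second
    timeout is an assumed value.
    """
    script = Path(script)
    # Assumption: the current interpreter stands in for `python3`.
    interpreter = [sys.executable] if script.suffix == ".py" else ["bash"]
    env = {**os.environ, "WIKI_DIR": str(wiki_dir)}
    return subprocess.run(
        [*interpreter, str(script), *args],
        env=env,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

In a test, `wiki_dir` would typically be the directory provided by the `tmp_wiki` fixture, and the returned `CompletedProcess` exposes `returncode`, `stdout`, and `stderr` for assertions.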