# Wiki Pipeline Test Suite
Pytest-based test suite covering all 11 scripts in `scripts/`. Runs on both
macOS and Linux/WSL, uses only the Python standard library + pytest.
## Running
```bash
# Full suite (from wiki root)
bash tests/run.sh
# Single test file
bash tests/run.sh test_wiki_lib.py
# Single test class or function
bash tests/run.sh test_wiki_hygiene.py::TestArchiveRestore
bash tests/run.sh test_wiki_hygiene.py::TestArchiveRestore::test_restore_reverses_archive
# Pattern matching
bash tests/run.sh -k "archive"
# Verbose
bash tests/run.sh -v
# Stop on first failure
bash tests/run.sh -x
# Or invoke pytest directly from the tests dir
cd tests && python3 -m pytest -v
```
## What's tested
| File | Coverage |
|------|----------|
| `test_wiki_lib.py` | YAML parser, frontmatter round-trip, page iterators, date parsing, content hashing, WIKI_DIR env override |
| `test_wiki_hygiene.py` | Backfill, confidence decay math, frontmatter repair, archive/restore round-trip, orphan detection, broken-xref fuzzy matching, index drift, empty stubs, conversation refresh signals, auto-restore, staging/archive sync, state drift, hygiene state file, full quick-run idempotency |
| `test_wiki_staging.py` | List, promote, reject, promote-with-modifies, dry-run, staging index regeneration, path resolution |
| `test_wiki_harvest.py` | URL classification (harvest/check/skip), private IP detection, URL extraction + filtering, filename derivation, content validation, state management, raw file writing, dry-run CLI smoke test |
| `test_conversation_pipeline.py` | CLI smoke tests for extract-sessions, summarize-conversations, update-conversation-index; dry-run behavior; help flags; integration test with fake conversation files |
| `test_shell_scripts.py` | wiki-maintain.sh / mine-conversations.sh / wiki-sync.sh: help, dry-run, mutex flags, bash syntax check, strict-mode check, shebang check, py_compile for all .py scripts |
## How it works
**Isolation**: Every test runs against a disposable `tmp_wiki` fixture
(pytest `tmp_path`). The fixture sets the `WIKI_DIR` environment variable
so all scripts resolve paths against the tmp directory instead of the real
wiki. No test ever touches `~/projects/wiki`.
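A minimal sketch of what such a fixture can look like (the fixture body and directory layout here are illustrative assumptions, not the exact contents of `conftest.py`):
```python
import pytest

@pytest.fixture
def tmp_wiki(tmp_path, monkeypatch):
    """Sketch: build a throwaway wiki tree and point WIKI_DIR at it."""
    wiki = tmp_path / "wiki"
    (wiki / "pages").mkdir(parents=True)       # directory layout is assumed
    monkeypatch.setenv("WIKI_DIR", str(wiki))  # scripts resolve all paths from this
    return wiki
```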
**Hyphenated filenames**: Scripts like `wiki-harvest.py` have hyphens in their
filenames, which Python's `import` statement can't handle directly. `conftest.py` has a
`_load_script_module` helper that loads a script file by path and exposes
it as a module object.
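Loading a file by path is the standard `importlib.util` pattern; a sketch of how such a helper might look (the real helper may differ in details):
```python
import importlib.util
from pathlib import Path

def _load_script_module(script_path: Path, name: str = "wiki_script"):
    """Load e.g. scripts/wiki-harvest.py as a module object despite the hyphen."""
    spec = importlib.util.spec_from_file_location(name, script_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the script's top-level code
    return module
```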
**Clean module state**: Each test that loads a module clears any cached
import first, so `WIKI_DIR` env overrides take effect correctly between
tests.
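In practice that amounts to dropping any previously loaded copy from `sys.modules` before re-executing the script, roughly (module name and path are illustrative, and `_load_script_module` is the sketch above):
```python
import sys
from pathlib import Path

sys.modules.pop("wiki_harvest", None)  # forget any cached copy so WIKI_DIR is re-read
module = _load_script_module(Path("scripts/wiki-harvest.py"), name="wiki_harvest")
```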
**Subprocess tests** (for CLI smoke tests): `conftest.py` provides a
`run_script` fixture that invokes a script via `python3` or `bash` with
`WIKI_DIR` set to the tmp wiki. Uses `subprocess.run` with `capture_output`
and a timeout.
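A sketch of such a fixture, assuming it is a thin wrapper around `subprocess.run` (names and defaults are illustrative):
```python
import os
import subprocess
import pytest

@pytest.fixture
def run_script(tmp_wiki):
    """Sketch: run a pipeline script in a subprocess against the tmp wiki."""
    def _run(*argv, timeout=30):
        env = {**os.environ, "WIKI_DIR": str(tmp_wiki)}
        return subprocess.run(
            list(argv),           # e.g. ("python3", "scripts/wiki-harvest.py", "--help")
            capture_output=True,
            text=True,
            timeout=timeout,
            env=env,
        )
    return _run
```
A test can then invoke, say, `run_script("python3", "scripts/wiki-harvest.py", "--help")` and assert on the returned `CompletedProcess`.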
## Cross-platform
- `#!/usr/bin/env bash` shebangs (tested explicitly)
- `set -euo pipefail` in all shell scripts (tested explicitly)
- `bash -n` syntax check on all shell scripts
- `py_compile` on all Python scripts (both checks are sketched after this list)
- Uses `pathlib` everywhere — no hardcoded path separators
- Uses the Python stdlib only (except pytest itself)
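These checks parse the scripts without running them, so they stay fast on both platforms. A minimal sketch of how such tests might look (the `scripts/` glob and test names are assumptions, not the real `test_shell_scripts.py`):
```python
import py_compile
import subprocess
from pathlib import Path

SCRIPTS = Path("scripts")  # assumed location of the pipeline scripts

def test_shell_scripts_parse():
    for sh in SCRIPTS.glob("*.sh"):
        # bash -n parses the script without executing it
        subprocess.run(["bash", "-n", str(sh)], check=True)

def test_python_scripts_compile():
    for py in SCRIPTS.glob("*.py"):
        # doraise=True turns syntax errors into PyCompileError
        py_compile.compile(str(py), doraise=True)
```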
## Requirements
- Python 3.11+
- `pytest` — install with `pip install --user pytest` or your distro's package manager
- `bash` (any version — scripts use only portable features)

The tests do NOT require:
- `claude` CLI (mocked / skipped)
- `trafilatura` or `crawl4ai` (only dry-run / classification paths tested)
- `qmd` (reindex phase is skipped in tests)
- Network access
- The real `~/projects/wiki` or `~/.claude/projects` directories
## Speed
Full suite runs in **~1 second** on a modern laptop. All tests are isolated
and independent so they can run in any order and in parallel.
## What's NOT tested
- **Real LLM calls** (`claude -p`): too expensive, non-deterministic.
Tested: CLI parsing, dry-run paths, mocked error handling.
- **Real web fetches** (trafilatura/crawl4ai): too slow, non-deterministic.
Tested: URL classification, filter logic, fetch-result validation.
- **Real git operations** (`wiki-sync.sh`): requires a git repo fixture.
Tested: script loads, handles non-git dir gracefully, `--status` exits clean.
- **Real qmd indexing**: tested elsewhere via `qmd collection list` in the
setup verification step.
- **Real Claude Code session JSONL parsing** with actual sessions: would
require fixture JSONL files. Tested: CLI parsing, empty-dir behavior,
`CLAUDE_PROJECTS_DIR` env override.

These are smoke-tested end-to-end via the integration tests in
`test_conversation_pipeline.py` and the dry-run paths in
`test_shell_scripts.py::TestWikiMaintainSh`.