Contributing
rlwatch is a monitoring library — if it has bugs, it costs someone a GPU budget. The test harness is the most load-bearing part of the repo. Two documents cover the contract every PR has to meet:
- CLAUDE.md — the authoritative spec. Cardinal rules, the five-tier test harness contract, code style, repository conventions. Read this first.
- TESTING.md — the practical "how to run, write, and debug tests" guide. Tier-by-tier breakdown, pytest invocation cookbook, "how to write a new detector test" checklist, "how to add a regression fixture" workflow, flaky-test policy.
Quickstart for a typical change
git clone https://github.com/varun1724/rlwatch
cd rlwatch
pip install -e ".[dev]"
# Make your change
# ...
pytest -v # all five tiers green
pytest --cov=rlwatch --cov-fail-under=90 # coverage gate
git commit -m "..."
git push
CI runs the same checks plus a few more (the cardinal-rule-#1 smoke test, the forbidden-pattern grep, the TRL integration test under the [trl] extra). All five tier jobs must be green for a PR to merge.
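To make the forbidden-pattern check more concrete, here is a minimal Python sketch of the idea. It is not the project's CI script (the guide says only that CI greps for network calls outside `src/rlwatch/alerts.py`). The pattern list, the function name `find_forbidden_network_imports`, and the choice to match on imports alone are assumptions made for illustration.

```python
import re
from pathlib import Path

# Hypothetical pattern list; the real CI grep may look for different strings.
NETWORK_IMPORT = re.compile(r"\s*(import|from)\s+(requests|httpx|urllib|socket)\b")


def find_forbidden_network_imports(package_dir: Path, allowed: str = "alerts.py") -> list[str]:
    """Return 'relative/path.py:lineno' for each network import outside the allowed file.

    Note: this sketch compares file names only, so any file called
    `alerts.py` anywhere under `package_dir` is treated as allowed.
    """
    hits = []
    for path in sorted(package_dir.rglob("*.py")):
        if path.name == allowed:
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if NETWORK_IMPORT.match(line):
                hits.append(f"{path.relative_to(package_dir)}:{lineno}")
    return hits
```

Running a check like this locally before pushing, for example against `src/rlwatch`, gives the same kind of signal as the CI job, though the CI job remains the source of truth.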
What every PR must include
- Unit tests for any new branch in the library code
- Integration tests if the change touches CLI, alert delivery, storage schema, or framework integration
- A simulation fixture if the change fixes or introduces a failure-mode detection (this is the regression moat — see TESTING.md)
- A benchmark if the change touches the hot path
- A CHANGELOG.md entry under the appropriate [Unreleased] category
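As an illustration of the simulation-fixture requirement, the sketch below replays deterministic metric traces through a detector and pins down the step at which it fires. Everything here is a hypothetical stand-in: `detect_entropy_collapse`, its threshold and patience parameters, and the two traces are invented for the example. Real fixtures and detectors should follow the workflow in TESTING.md.

```python
from typing import Optional, Sequence


def detect_entropy_collapse(
    entropies: Sequence[float], threshold: float = 0.1, patience: int = 3
) -> Optional[int]:
    """Hypothetical detector, not rlwatch's implementation.

    Returns the step index at which entropy has stayed below `threshold`
    for `patience` consecutive steps, or None if that never happens.
    """
    below = 0
    for step, value in enumerate(entropies):
        below = below + 1 if value < threshold else 0
        if below >= patience:
            return step
    return None


# Simulation fixtures: fixed traces, so the tests are fully deterministic.
COLLAPSING_TRACE = [1.0, 0.8, 0.5, 0.2, 0.05, 0.04, 0.03, 0.02]
TRANSIENT_DIP_TRACE = [1.0, 0.9, 0.85, 0.05, 0.8, 0.82, 0.8, 0.79]


def test_collapse_is_detected():
    # Steps 4, 5, and 6 are below the threshold, so the detector fires at step 6.
    assert detect_entropy_collapse(COLLAPSING_TRACE) == 6


def test_transient_dip_is_ignored():
    # A single low step must not trigger the detector.
    assert detect_entropy_collapse(TRANSIENT_DIP_TRACE) is None
```

Pairing a positive trace with a near-miss negative trace is what makes the fixture useful as a regression guard: it catches both a detector that stops firing and one that starts firing too eagerly.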
What will get bounced
- Mocking DetectorSuite in a test for RLWatch.log_step
- Asserting on exact alert messages (assert on alert.detector and alert.severity instead, because messages will change)
- Tests that touch the real filesystem outside tmp_path
- Tests that hit the real network
- Tests that depend on wall-clock time
- Flaky tests "fixed" with retries
- Network calls outside src/rlwatch/alerts.py (CI greps for this)
See CLAUDE.md's "Anti-patterns to refuse" section for the full list with rationale.
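A test that avoids several of the patterns above might look like the following sketch. The `Alert` dataclass, `make_alert`, and `write_alert` are hypothetical stand-ins so the example runs on its own. Only the attribute names `detector` and `severity` and the `tmp_path` fixture come from this guide; the detector name `kl_spike` is made up.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class Alert:
    """Hypothetical stand-in for an rlwatch alert object."""
    detector: str
    severity: str
    message: str
    timestamp: float


def make_alert(clock: Callable[[], float]) -> Alert:
    # The clock is injected, so tests never depend on wall-clock time.
    return Alert("kl_spike", "warning", "KL divergence rose sharply", clock())


def write_alert(alert: Alert, directory: Path) -> Path:
    """Hypothetical helper: persist an alert as JSON inside `directory`."""
    out = directory / "alert.json"
    out.write_text(json.dumps(asdict(alert)), encoding="utf-8")
    return out


def test_alert_is_persisted(tmp_path):
    # tmp_path is pytest's built-in per-test temporary directory,
    # so nothing is written outside it.
    alert = make_alert(clock=lambda: 1000.0)
    saved = json.loads(write_alert(alert, tmp_path).read_text(encoding="utf-8"))

    # Assert on structured fields, not on the message wording.
    assert saved["detector"] == "kl_spike"
    assert saved["severity"] == "warning"
    assert saved["timestamp"] == 1000.0
```

Because the clock is a plain callable and the output directory is a parameter, the test is deterministic and needs neither retries nor real I/O outside the temporary directory.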