rlwatch

Catch broken RL training runs before they waste your GPU budget.

If you train language models with GRPO or PPO, you already know the pain: you kick off a run on 8 H100s, go to sleep, and wake up to find the policy collapsed into repeating the same token 12 hours ago. Nobody saw it. Nothing paged. The run just quietly rotted.

rlwatch is a tiny Python library that watches your training metrics in real time and pings you on Slack, Discord, email, or any HTTP endpoint the moment things start going wrong — before the run is ruined.

The 30-second pitch

Install with `pip install rlwatch`, then add two lines to your training script:
```
import rlwatch
rlwatch.attach()
```
Keep training. If something breaks, you get a message like:

🚨 rlwatch CRITICAL: entropy_collapse
Run: grpo_v3_exp12 | Step: 340
Policy entropy dropped from 2.8 to 0.4 over 50 steps (threshold: 1.0).
Recommended action: reduce learning rate by 5× or increase KL penalty.

You open the dashboard, confirm the curve, kill the run, fix the config, and you've just saved ~30 GPU-hours.
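To make the alert above concrete, here is a minimal sketch of the kind of check that could produce it. It is not rlwatch's actual implementation: the `EntropyCollapseCheck` class, the `token_entropy` helper, and their parameters are hypothetical names chosen for illustration, and the rule (fire when entropy falls below the threshold after being at or above it within the last `window` steps) is an assumption based only on the numbers shown in the example message.

```python
import math
from collections import deque


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class EntropyCollapseCheck:
    """Hypothetical detector, for illustration only (not the rlwatch API)."""

    def __init__(self, threshold=1.0, window=50):
        self.threshold = threshold
        # Keep the current value plus the previous `window` values.
        self.history = deque(maxlen=window + 1)

    def update(self, step, entropy):
        """Record one entropy reading; return an alert string or None."""
        self.history.append(entropy)
        start = self.history[0]  # oldest reading still inside the window
        # Fire only when entropy crossed the threshold within the window.
        if entropy < self.threshold <= start:
            return (
                f"entropy_collapse at step {step}: entropy dropped from "
                f"{start:.1f} to {entropy:.1f} (threshold: {self.threshold})"
            )
        return None


if __name__ == "__main__":
    check = EntropyCollapseCheck(threshold=1.0, window=50)
    # Synthetic run: entropy decays linearly from 2.8 to 0.4 over 50 steps.
    for i in range(51):
        message = check.update(290 + i, 2.8 - 2.4 * i / 50)
        if message:
            print(message)
            break
```

This sketch fires on every step while the crossing is still inside the window, so a real detector would also need de-duplication or a cooldown. Because it compares against the oldest reading in the window, a run that starts out below the threshold never triggers it.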

What's in these docs

| Page | What it covers |
| --- | --- |
| Getting started | Install, two-line attach, see your first alert fire |
| Detectors | Every detector: what it watches for, default thresholds, when to tune them |
| Configuration | YAML schema, environment variables, resolution order |
| CLI | `rlwatch init / runs / diagnose / dashboard` |
| Alerts | Slack, email, Discord, generic webhook: setup and payload formats |
| TRL + GRPO end-to-end tutorial | Catch a real entropy collapse on a real GPT-2 + TRL GRPO run in under 5 minutes on a laptop CPU |
| FAQ | Does it work offline? Does it upload anything? Why no telemetry? |
| Contributing | The development workflow and the testing harness contract |

Project direction

rlwatch is heading toward a hosted, team-oriented product. The local-first open-source library will stay free and useful on its own. See ROADMAP.md on GitHub for the full plan.