Orchestration
Pause & recovery
Pausing work, surviving rate limits, and reclaiming orphaned runs.
Orchestration is interruptible and self-healing: a plan can be paused at any time, providers that hit rate limits back off instead of failing, and runs orphaned by a dead session are reclaimed automatically.
Pause & resume
- Pausing a plan (status = paused) halts all new dispatch, auto-review, and merge until the plan is resumed.
- A task paused mid-flight moves to waiting_for_resume; Resume returns it to draft for clean re-dispatch.
- A provider that hits a rate limit pauses its run via POST /api/v1/cli-runs/{uuid}/pause (with a resume time) instead of burning the task as failed.
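As an illustration of the rate-limit pause, the sketch below builds (but does not send) the pause request. Only the method and path come from the text above; the JSON field name resume_at, its ISO 8601 encoding, and the helper name build_pause_request are assumptions made for this example, so check the actual API schema before relying on them.

```python
import json
from datetime import datetime, timezone
from urllib.request import Request


def build_pause_request(base_url: str, run_uuid: str, resume_at: datetime) -> Request:
    """Build a POST to /api/v1/cli-runs/{uuid}/pause carrying a resume time.

    Assumption: the resume time travels as a JSON field named "resume_at"
    holding an ISO 8601 UTC timestamp. The real payload shape may differ.
    """
    payload = {"resume_at": resume_at.astimezone(timezone.utc).isoformat()}
    return Request(
        url=f"{base_url.rstrip('/')}/api/v1/cli-runs/{run_uuid}/pause",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host and run id, used only to show the resulting request.
    req = build_pause_request(
        "https://example.invalid",
        "00000000-0000-0000-0000-000000000000",
        datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc),
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending the request (for example with urllib.request.urlopen) would also need whatever authentication the API expects, which is not described here.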
Recovery
- Dead-session reclaim — runs left by a vanished session are re-emitted to the spawn queue.
- On lead connect/resume, orphaned queued runs are recovered and re-dispatched.
- A spawn loop guard + in-flight cap stop a lead from fanning out unboundedly.
- Idle workers (no chunk for ~45s) are pruned from the lead digest; the SessionEnd hook reaps spawned worker processes by pid.
- The reject → revision loop is capped at 3 cycles (REVISION_CAP); past that, an operator must intervene.
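Two of the guards above can be sketched in a few lines. This is a minimal illustration, not the actual implementation: REVISION_CAP = 3 comes from the text, while the function names, the returned labels, the exact 45-second threshold (the text says roughly 45s), and the comparison at the boundary are all assumptions.

```python
REVISION_CAP = 3            # from the text: reject -> revision loop is capped at 3 cycles
IDLE_AFTER_SECONDS = 45.0   # assumption: the text only says "~45s"


def next_step_after_reject(completed_revisions: int) -> str:
    """Decide what happens when a review rejects a task.

    completed_revisions counts reject -> revision cycles already done.
    The returned labels are hypothetical, chosen for this sketch.
    """
    if completed_revisions < REVISION_CAP:
        return "revise"
    return "needs_operator"


def prune_idle_workers(last_chunk_at: dict[str, float], now: float) -> dict[str, float]:
    """Keep only workers that produced a chunk within the idle window.

    last_chunk_at maps a worker id to the timestamp (in seconds) of its
    most recent chunk; workers silent for IDLE_AFTER_SECONDS or longer
    are dropped from the returned mapping.
    """
    return {
        worker: seen
        for worker, seen in last_chunk_at.items()
        if now - seen < IDLE_AFTER_SECONDS
    }


if __name__ == "__main__":
    for done in range(5):
        print(done, next_step_after_reject(done))
    print(prune_idle_workers({"w1": 100.0, "w2": 40.0}, now=110.0))
```

Keeping both rules as pure functions of their inputs (a counter, a timestamp map) makes them straightforward to test without a running lead or worker process.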