Operations Checklist
Verify shell health first, then follow the core operator path. Use the runbook or endpoint map when something stops working.
Core operator modes
- Healthy: project pipeline state updates and event stream are current.
- Degraded: health is up but event stream/pipeline updates are delayed.
- Blocked: missing permissions, budget hard stop, or unresolved backend contract gaps.
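The three modes above can be read as a small decision rule. The following is a minimal sketch only: the `ShellStatus` fields, the 60-second lag threshold, and the `unknown` result for a failed health check are all assumptions for illustration, since the checklist does not define them.

```python
from dataclasses import dataclass

# Hypothetical threshold: the checklist does not say how late counts as "delayed".
MAX_EVENT_LAG_SECONDS = 60.0


@dataclass
class ShellStatus:
    """Snapshot an operator might assemble by hand (field names are illustrative)."""
    health_up: bool            # result of the health check
    event_lag_seconds: float   # age of the newest event or pipeline update
    blockers: tuple = ()       # e.g. ("missing permissions", "budget hard stop")


def classify_mode(status: ShellStatus) -> str:
    """Map a status snapshot onto the operator modes listed above."""
    if status.blockers:
        return "blocked"
    if not status.health_up:
        # A failed health check is not one of the three listed modes.
        return "unknown"
    if status.event_lag_seconds > MAX_EVENT_LAG_SECONDS:
        return "degraded"
    return "healthy"


if __name__ == "__main__":
    print(classify_mode(ShellStatus(health_up=True, event_lag_seconds=5.0)))
```

Checking blockers first reflects the idea that a blocked project needs intervention regardless of how fresh the event stream is; that ordering is a design choice of this sketch, not something the checklist specifies.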
Pre-check before shift
- Open dashboard and verify health status.
- Open projects and confirm at least one project loads.
- Open one project and verify pipeline state endpoint responds.
- Open endpoint map and verify deployment contract version.
Core operator flow
- Start triage: open files route (/projects/[projectId]/files), upload files, and verify pipeline enters running phase triage.
- Review plan: open plan route (/projects/[projectId]/plan), commit/resume, and verify plan version increments.
- Run evaluation: open calibration route (/projects/[projectId]/runs/[runId]/calibration), launch full run, and verify redirect to live route with new run id.
- Review results: open results route (/projects/[projectId]/runs/[runId]/results) and inspect filters, evidence, and applicant receipts.
Incident triage checklist
- Capture project id, run id, route, request id, timestamp.
- Check run state, stream/events, and budget endpoints.
- Classify as auth, validation, transient infra, budget stop, or orchestration failure.
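One way to make the capture step hard to skip is to record the incident in a small structure and check it before escalating. This is a sketch under assumptions: the `IncidentRecord` type, its field names, and the ISO 8601 timestamp format are hypothetical; only the list of captured items and the five categories come from the checklist.

```python
from dataclasses import dataclass, fields

# Categories taken from the checklist above.
CATEGORIES = {
    "auth",
    "validation",
    "transient infra",
    "budget stop",
    "orchestration failure",
}


@dataclass(frozen=True)
class IncidentRecord:
    """Hypothetical record of the items the checklist asks operators to capture."""
    project_id: str
    run_id: str
    route: str
    request_id: str
    timestamp: str   # assumed to be an ISO 8601 string
    category: str


def validate(record: IncidentRecord) -> list:
    """Return a list of problems; an empty list means the record is complete."""
    problems = [
        f"missing {f.name}" for f in fields(record) if not getattr(record, f.name)
    ]
    if record.category and record.category not in CATEGORIES:
        problems.append(f"unknown category: {record.category}")
    return problems


if __name__ == "__main__":
    record = IncidentRecord(
        project_id="proj-1",
        run_id="run-7",
        route="/projects/proj-1/runs/run-7/results",
        request_id="",  # left blank on purpose to show the check
        timestamp="2024-01-01T09:00:00Z",
        category="auth",
    )
    print(validate(record))
```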
Escalation path
1. Start here for the core operator sequence.
2. Endpoint map — contract or permissions failures.
3. Runbook — route-specific recovery.
GET /api/v1/projects/{project_id}/runs/{run_id}/state
GET /api/v1/projects/{project_id}/runs/{run_id}/events/stream
GET /api/v1/projects/{project_id}/budget?run_id={run_id}
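When checking these endpoints by hand, it can help to build all three URLs from the captured project id and run id in one step. The sketch below only assembles strings; the `run_endpoints` helper and the example base URL are hypothetical, and the paths are copied from the endpoint list above.

```python
from urllib.parse import quote, urlencode


def run_endpoints(base_url: str, project_id: str, run_id: str) -> dict:
    """Build the state, event stream, and budget URLs for one run."""
    project = quote(project_id, safe="")   # escape ids before placing them in a path
    run = quote(run_id, safe="")
    root = f"{base_url.rstrip('/')}/api/v1/projects/{project}"
    query = urlencode({"run_id": run_id})
    return {
        "state": f"{root}/runs/{run}/state",
        "events": f"{root}/runs/{run}/events/stream",
        "budget": f"{root}/budget?{query}",
    }


if __name__ == "__main__":
    # The host name is a placeholder, not a real deployment.
    for name, url in run_endpoints("https://ops.example.test", "proj-1", "run-7").items():
        print(f"{name}: GET {url}")
```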
Troubleshooting matrix
- 401/403: verify user role and org membership.
- 409: plan version conflict; reload plan and retry.
- 429: throttled or budget-limited; wait or request approval.
- 5xx: capture request id and escalate.
- stale UI: compare event freshness against SLO targets.
- cancelled run: confirm cancellation reason and route to run history/new run action.
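The HTTP rows of the matrix can be expressed as a lookup, for example to annotate failed requests in a log review. This is a minimal sketch: the `triage_action` function is hypothetical, the action strings restate the matrix, and the fallback message for other codes is an assumption. The stale UI and cancelled run rows are not status codes and are left out.

```python
def triage_action(status_code: int) -> str:
    """Return the first-response action the matrix gives for an HTTP status."""
    if status_code in (401, 403):
        return "verify user role and org membership"
    if status_code == 409:
        return "plan version conflict: reload plan and retry"
    if status_code == 429:
        return "throttled or budget-limited: wait or request approval"
    if 500 <= status_code <= 599:
        return "capture request id and escalate"
    # Assumption: anything else falls outside the matrix.
    return "not covered by the matrix: follow the incident triage checklist"


if __name__ == "__main__":
    for code in (403, 409, 429, 503):
        print(code, "->", triage_action(code))
```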
Release readiness artifacts
- Troubleshooting runbook
- Release readiness templates: docs/v2-architecture/40-frontend/operations/05-RELEASE-READINESS-TEMPLATES.md