When to run evals

Agent evals are most useful when output quality matters as much as code correctness.

Good moments to run them

On pull requests

Run evals in CI for PRs that change prompts, agent logic, tools, retrieval, or model settings.

Why:

  • Catch behavior regressions before merge
  • Keep expected quality stable over time
  • Make prompt changes reviewable with pass/fail feedback
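A minimal sketch of such a CI job, assuming GitHub Actions and a Composer-based project (the workflow name, watched paths, and PHP version are illustrative, not prescribed by the eval tool):

```yaml
# Hypothetical GitHub Actions workflow: run agent evals on PRs that
# touch prompts, agent logic, or model settings. Paths are examples.
name: agent-evals
on:
  pull_request:
    paths:
      - "app/Prompts/**"
      - "app/Agents/**"
      - "config/ai.php"
jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: shivammathur/setup-php@v2
        with:
          php-version: "8.3"
      - run: composer install --no-interaction
      # Fails the check (and blocks merge) if any eval regresses.
      - run: php artisan ai-evals:run
```

Scoping the trigger with a paths filter keeps the job off unrelated PRs, so the pass/fail signal stays tied to prompt and agent changes.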

Before releases

Run the full eval suite before deployments (for example, the suite in tests/AgentEvals).

Why:

  • Validate critical flows after dependency/model updates
  • Confirm outputs still meet product expectations
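One way to gate a deployment on the full suite, as a sketch (GitHub Actions assumed; the tag trigger, test invocation, and deploy script are hypothetical, each step runs only if the previous one passed):

```yaml
# Hypothetical release gate: the deploy step runs only when the
# full eval suite in tests/AgentEvals passes.
on:
  push:
    tags: ["v*"]
jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: composer install --no-interaction
      # One possible invocation of the suite; adjust to your runner.
      - run: php artisan test tests/AgentEvals
      - run: ./deploy.sh   # hypothetical deploy script, gated on the evals above
```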

During prompt iteration

Run a subset locally while editing prompt logic.

Why:

  • Tight feedback loop for faster prompt tuning
  • Immediate signal when a change breaks a known behavior

Suggested cadence

  • Small local run while developing: php artisan ai-evals:run --filter="..."
  • Full suite on PR and on merge to main
  • Optional scheduled nightly run for broader confidence
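The optional nightly run above can be wired up with a cron trigger; a minimal sketch, assuming GitHub Actions (the schedule time is illustrative):

```yaml
# Hypothetical nightly run: broader confidence without blocking PRs.
on:
  schedule:
    - cron: "0 3 * * *"   # every night at 03:00 UTC
jobs:
  nightly-evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: composer install --no-interaction
      - run: php artisan ai-evals:run   # full suite, no --filter
```

Scheduled runs also catch drift that no PR caused, such as behavior changes after a hosted model update.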

Cases where evals are especially valuable

  • Customer support answers (policy, compliance, tone)
  • Structured outputs that must include key facts
  • Agent workflows where wrong wording creates support or legal risk
  • High-impact user journeys (checkout, cancellation, refunds)