Dealing with rate limits
If your eval runs fail with errors like 429, rate limit, or too many requests, the provider is rejecting request bursts.
This guide helps you make eval runs stable in local development and CI.
Why this happens
Live evals call real model APIs. If too many calls happen in a short window, providers throttle requests.
Common causes:
- Running evals in parallel test workers
- Running multiple CI jobs against the same API key at the same time
- Using retries with no pause between attempts
Recommended baseline
Use standalone eval runs for live model checks and keep them serial:
php artisan ai-evals:run
Then add conservative retry settings:
AI_EVAL_RETRIES=2
AI_EVAL_RETRY_SLEEP_MS=500
If 429 responses still happen, increase AI_EVAL_RETRY_SLEEP_MS to 750 or 1000.
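To make the two settings concrete, here is a minimal sketch of what a retry count and a fixed pause between attempts mean in general. It is not the eval runner's own implementation: `retry_with_pause` and its arguments are hypothetical names. It assumes bash and a `sleep` that accepts fractional seconds (GNU coreutils and macOS both do). It also assumes the retry count means extra attempts after the first one, which the settings above do not spell out.

```shell
#!/usr/bin/env bash
# Illustrative only (hypothetical helper): run a command, retrying up to
# $1 extra times with a fixed pause of $2 milliseconds between attempts.
retry_with_pause() {
  local retries=$1 sleep_ms=$2
  shift 2
  local attempt=0
  local pause
  # Convert milliseconds to a fractional-seconds string, e.g. 500 -> 0.500
  pause=$(printf '%d.%03d' $((sleep_ms / 1000)) $((sleep_ms % 1000)))
  until "$@"; do
    if [ "$attempt" -ge "$retries" ]; then
      return 1   # every allowed attempt has failed
    fi
    attempt=$((attempt + 1))
    sleep "$pause"   # the pause keeps retries from arriving as another burst
  done
  return 0
}
```

Called as `retry_with_pause 2 500 some_command`, this makes at most three attempts and waits 0.5 s between them, so the worst case adds about one second of waiting per failing call. Raising the pause to 750 or 1000 spaces the attempts further apart at the cost of a slower failing run.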
CI setup that usually works
Use a dedicated serial job for evals:
- name: Run AI evals (serial)
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
    AI_EVAL_RETRIES: 2
    AI_EVAL_RETRY_SLEEP_MS: 500
    AI_EVAL_SUMMARY: true
  run: php artisan ai-evals:run
Also avoid matrix fan-out for live eval jobs unless each job has its own provider key and quota.
If you run evals in Pest
Prefer a non-parallel run for eval tests:
vendor/bin/pest tests/AgentEvals
Avoid --parallel for live eval suites.
Fast troubleshooting checklist
- Confirm failures include 429 or rate-limit wording
- Verify live eval suites are not parallelized
- Increase AI_EVAL_RETRY_SLEEP_MS
- Reduce suite size with --filter during local iteration
- Run full suites in CI only when needed (PRs touching prompts/agents, pre-release, or nightly)
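The first checklist item can be scripted when you have a saved copy of the run output. The sketch below is one way to do it, assuming bash and a standard `grep`. `looks_rate_limited` and the `eval.log` file name are hypothetical, and the patterns only cover the wording mentioned at the top of this guide, so a provider that phrases its errors differently would need its own pattern.

```shell
#!/usr/bin/env bash
# Illustrative only (hypothetical helper): succeed if the given log file
# contains rate-limit wording.
looks_rate_limited() {
  local log_file=$1
  # -E: extended regex, -i: case-insensitive, -q: no output, exit status only.
  # Matches a standalone "429" (not part of a longer number),
  # "rate limit" / "rate-limit" / "ratelimit", or "too many requests".
  grep -Eiq '(^|[^0-9])429([^0-9]|$)|rate[ -]?limit|too many requests' "$log_file"
}

# Possible usage, assuming the run output was captured to eval.log first:
#   php artisan ai-evals:run 2>&1 | tee eval.log
#   looks_rate_limited eval.log && echo "Failures look like rate limiting"
```

If the check does not match, the failures probably have another cause, and raising the retry pause is unlikely to help.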
Practical strategy for teams
- Keep deterministic tests (unit/feature with mocks) fast and parallel
- Keep live evals small on PRs and serial
- Run broader live eval coverage on schedule (for example nightly)
This split keeps developer feedback fast while making live eval quality checks reliable.