Run in CI
You can run agent evals in CI with the standalone command.
GitHub Actions example
```yaml
name: ai-evals
on:
  pull_request:
    branches: [main]
  push:
    branches: [main]
jobs:
  agent-evals:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup PHP
        uses: shivammathur/setup-php@v2
        with:
          php-version: '8.4'
          tools: composer:v2
      - name: Install dependencies
        run: composer install --no-interaction --prefer-dist --no-progress
      - name: Run AI evals
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          AI_EVAL_RETRIES: 1
          AI_EVAL_RETRY_SLEEP_MS: 250
          AI_EVAL_REPORT_INCLUDE_INPUT: false
          AI_EVAL_REPORT_MAX_OUTPUT_LENGTH: 2000
        run: php artisan ai-evals:run --format=github
```

Upload report artifacts
For CI test report UIs and debugging artifacts, write JUnit and JSON reports to files and upload them.
```yaml
- name: Run AI evals with reports
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
    AI_EVAL_REPORT_INCLUDE_INPUT: false
    AI_EVAL_REPORT_MAX_OUTPUT_LENGTH: 2000
  run: |
    # Keep going after a failure so both reports are written, then fail the step.
    status=0
    php artisan ai-evals:run --format=junit --output=storage/ai-evals/junit.xml || status=$?
    php artisan ai-evals:run --format=json --output=storage/ai-evals/results.json || status=$?
    exit $status
- name: Upload AI eval reports
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: ai-eval-reports
    path: storage/ai-evals
```

Use `--format=github` when you want failures shown inline as GitHub annotations:
```bash
php artisan ai-evals:run --format=github
```

Report formats
The standalone runner supports four report formats.
For complete examples of each format, see Output formats.
| Format | Best for | Example |
|---|---|---|
| `text` | Local terminal runs | `php artisan ai-evals:run` |
| `github` | Inline GitHub Actions annotations | `php artisan ai-evals:run --format=github` |
| `junit` | CI test report UIs | `php artisan ai-evals:run --format=junit --output=storage/ai-evals/junit.xml` |
| `json` | Artifacts, dashboards, post-processing | `php artisan ai-evals:run --format=json --output=storage/ai-evals/results.json` |
If you want a browser-readable HTML report from the JUnit file, use a JUnit/XUnit viewer in CI or locally:
```bash
npx xunit-viewer --results=storage/ai-evals/junit.xml --output=storage/ai-evals/junit.html --title="AI Eval Report"
```

Optional: run only matching cases
```bash
php artisan ai-evals:run --filter="refund"
```

Recommended strategy
Start with a small high-signal eval suite on pull requests. Run broader eval coverage before releases or on a schedule.
Pull requests are good candidates for evals when:
- Prompts or system instructions changed
- Agent tools changed
- Retrieval or knowledge-base logic changed
- Model, provider, or temperature settings changed
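
The triggers listed above can be approximated in GitHub Actions with a path filter, so the eval job only runs when relevant files change. This is a minimal sketch: apart from `tests/AgentEvals`, every path below is a hypothetical placeholder, so substitute the locations where your project actually keeps prompts, tools, retrieval code, and model settings.

```yaml
on:
  pull_request:
    branches: [main]
    paths:
      # Hypothetical locations; adjust to your project layout.
      - 'app/Ai/**'              # agents, tools, retrieval logic (assumed path)
      - 'resources/prompts/**'   # prompts and system instructions (assumed path)
      - 'config/ai.php'          # model, provider, temperature (assumed path)
      - 'tests/AgentEvals/**'    # the eval cases themselves
```

If the eval workflow is a required status check, note that a run skipped by a path filter leaves the check pending, so this approach fits best when the check is optional.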
For larger suites, use a dedicated scheduled workflow:
```yaml
on:
  schedule:
    - cron: '0 3 * * *'
```

Keep live eval jobs serial unless each job has its own provider key and quota.
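
One way to keep runs serial in GitHub Actions is a workflow-level `concurrency` group. The sketch below assumes a group name of `ai-evals-live`, which is a hypothetical label; any stable string shared by the workflows that use the same provider key would work.

```yaml
concurrency:
  # Hypothetical group name; share it across workflows that use the same provider key.
  group: ai-evals-live
  # Let the running eval finish instead of cancelling it when a new run is queued.
  cancel-in-progress: false
```

GitHub keeps at most one pending run per concurrency group, so a newly queued run replaces an older one that is still waiting. That is usually acceptable for evals, since the newest commit is the one that matters.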
Important notes
- The command exits non-zero on failure, so CI will fail automatically.
- Keep API keys in CI secrets, never in the repository.
- Prefer a dedicated API key for eval jobs (separate from production) with limited quota/budget.
- Keep `AI_EVAL_REPORT_INCLUDE_INPUT=false` unless prompts are safe to publish as CI artifacts.
- Use `AI_EVAL_REPORT_MAX_OUTPUT_LENGTH` and `AI_EVAL_REPORT_MAX_FAILURE_LENGTH` to keep reports concise.
- Keep eval jobs serial to reduce `429` bursts when using a shared provider key.
- Start with a small `tests/AgentEvals` standalone `*.eval.php` set and expand gradually.
- Standalone report formats support `text`, `json`, `junit`, and `github`; see Output formats for examples.
- If CI hits `429`/rate limits, follow the dedicated guide: Dealing with rate limits.