
Run in CI

You can run agent evals in CI with the standalone ai-evals:run Artisan command.

GitHub Actions example

```yaml
name: ai-evals

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  agent-evals:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup PHP
        uses: shivammathur/setup-php@v2
        with:
          php-version: '8.4'
          tools: composer:v2

      - name: Install dependencies
        run: composer install --no-interaction --prefer-dist --no-progress

      - name: Run AI evals
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          AI_EVAL_RETRIES: 1
          AI_EVAL_RETRY_SLEEP_MS: 250
          AI_EVAL_REPORT_INCLUDE_INPUT: false
          AI_EVAL_REPORT_MAX_OUTPUT_LENGTH: 2000
        run: php artisan ai-evals:run --format=github
```

Upload report artifacts

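To feed CI test report UIs and keep artifacts for debugging, write JUnit and JSON reports to files and upload them. Each ai-evals:run invocation runs the suite again, so this step makes two rounds of live provider calls. Recording the exit status lets both reports be written before the step fails.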

```yaml
      - name: Run AI evals with reports
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          AI_EVAL_REPORT_INCLUDE_INPUT: false
          AI_EVAL_REPORT_MAX_OUTPUT_LENGTH: 2000
        run: |
          mkdir -p storage/ai-evals
          # Record a failure but keep going so both reports are written.
          status=0
          php artisan ai-evals:run --format=junit --output=storage/ai-evals/junit.xml || status=$?
          php artisan ai-evals:run --format=json --output=storage/ai-evals/results.json || status=$?
          exit $status

      - name: Upload AI eval reports
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: ai-eval-reports
          path: storage/ai-evals
```
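One option for showing the JUnit file in GitHub's checks UI, rather than only as a downloadable artifact, is the third-party dorny/test-reporter action. This sketch assumes that action (it is not part of this package), the junit.xml path from the step above, and a job that can be granted checks: write. Pull requests from forks normally get a read-only token, so the check run may not be created there.

```yaml
    # Job-level permissions: the reporter creates a check run (third-party action requirement).
    permissions:
      contents: read
      checks: write

    steps:
      # ... Checkout, Setup PHP, Install dependencies, "Run AI evals with reports" ...

      - name: Publish JUnit report
        if: always()
        uses: dorny/test-reporter@v1
        with:
          name: AI evals
          path: storage/ai-evals/junit.xml
          reporter: java-junit
```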

Use --format=github when you want failures shown inline as GitHub annotations:

```bash
php artisan ai-evals:run --format=github
```

Report formats

The standalone runner supports four report formats.

For complete examples of each format, see Output formats.

| Format | Best for | Example |
| --- | --- | --- |
| text | Local terminal runs | `php artisan ai-evals:run` |
| github | Inline GitHub Actions annotations | `php artisan ai-evals:run --format=github` |
| junit | CI test report UIs | `php artisan ai-evals:run --format=junit --output=storage/ai-evals/junit.xml` |
| json | Artifacts, dashboards, post-processing | `php artisan ai-evals:run --format=json --output=storage/ai-evals/results.json` |

If you want a browser-readable HTML report from the JUnit file, use a JUnit/XUnit viewer in CI or locally:

```bash
npx xunit-viewer --results=storage/ai-evals/junit.xml --output=storage/ai-evals/junit.html --title="AI Eval Report"
```
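The viewer can also run inside the workflow. A minimal sketch, assuming the Node.js that ships on ubuntu-latest runners and that npx may download xunit-viewer at run time; the step name is illustrative. The hashFiles() guard skips rendering when the JUnit file was never written.

```yaml
      # Place between "Run AI evals with reports" and "Upload AI eval reports"
      # so junit.html is uploaded along with the rest of storage/ai-evals.
      - name: Render HTML eval report
        if: always() && hashFiles('storage/ai-evals/junit.xml') != ''
        run: npx --yes xunit-viewer --results=storage/ai-evals/junit.xml --output=storage/ai-evals/junit.html --title="AI Eval Report"
```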

Optional: run only matching cases

```bash
php artisan ai-evals:run --filter="refund"
```
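To re-run a subset of cases by hand from the Actions tab, the filter can be exposed as a manual trigger input. This is a sketch; the input name filter and the variable EVAL_FILTER are hypothetical. The input is passed through env rather than interpolated into the script, which avoids shell injection from the input value.

```yaml
on:
  workflow_dispatch:
    inputs:
      filter:
        description: 'Only run eval cases matching this value (leave empty to run all)'
        type: string
        required: false
        default: ''

jobs:
  agent-evals:
    runs-on: ubuntu-latest
    steps:
      # ... Checkout, Setup PHP, Install dependencies ...

      - name: Run AI evals
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          EVAL_FILTER: ${{ inputs.filter }}   # hypothetical variable name
        run: |
          if [ -n "$EVAL_FILTER" ]; then
            php artisan ai-evals:run --format=github --filter="$EVAL_FILTER"
          else
            php artisan ai-evals:run --format=github
          fi
```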

Start with a small, high-signal eval suite on pull requests, and run broader eval coverage before releases or on a schedule.

Pull requests are good candidates for evals when:

  • Prompts or system instructions changed
  • Agent tools changed
  • Retrieval or knowledge-base logic changed
  • Model, provider, or temperature settings changed
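GitHub Actions path filters can restrict PR eval runs to changes like those listed above. The paths here are hypothetical placeholders; point them at wherever your prompts, tools, retrieval code, and model config actually live. If this workflow is a required status check, note that a run skipped by a path filter leaves the check pending and can block merging.

```yaml
on:
  pull_request:
    branches: [main]
    # Hypothetical paths: adjust to your project layout.
    paths:
      - 'app/Ai/**'              # agents, tools, retrieval logic
      - 'resources/prompts/**'   # prompts and system instructions
      - 'config/ai.php'          # model, provider, temperature
      - 'tests/AgentEvals/**'    # the eval cases themselves
      - 'composer.lock'          # SDK or provider package upgrades
```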

For larger suites, use a dedicated scheduled workflow:

```yaml
on:
  schedule:
    - cron: '0 3 * * *'  # daily at 03:00 UTC
```
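A fuller sketch of such a scheduled workflow, reusing the report step from above. The workflow name, job name, artifact name, and timeout are illustrative assumptions. GitHub runs scheduled workflows against the latest commit on the default branch, and workflow_dispatch adds a manual trigger for ad-hoc full runs.

```yaml
name: ai-evals-nightly

on:
  schedule:
    - cron: '0 3 * * *'   # 03:00 UTC daily
  workflow_dispatch:       # allow manual runs from the Actions tab

jobs:
  agent-evals-full:
    runs-on: ubuntu-latest
    timeout-minutes: 60    # illustrative cap for a long live suite
    steps:
      # Checkout, Setup PHP, and Install dependencies steps as in the PR workflow.

      - name: Run full AI eval suite
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          AI_EVAL_REPORT_INCLUDE_INPUT: false
        run: |
          mkdir -p storage/ai-evals
          php artisan ai-evals:run --format=junit --output=storage/ai-evals/junit.xml

      - name: Upload AI eval reports
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: ai-eval-reports-nightly
          path: storage/ai-evals
```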

Keep live eval jobs serial unless each job has its own provider key and quota.
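In GitHub Actions, a shared concurrency group is one way to keep eval runs serial. The group name ai-evals is an arbitrary choice; putting the same block at the top level of both the PR and scheduled workflows serializes runs across them, since group names are shared within a repository. GitHub keeps at most one pending run per group, so a newer queued run replaces an older pending one.

```yaml
# Top level of each eval workflow: only one run in this group executes at a time.
concurrency:
  group: ai-evals
  cancel-in-progress: false   # let an in-progress eval run finish instead of cancelling it
```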

Important notes

  • The command exits non-zero on failure, so CI will fail automatically.
  • Keep API keys in CI secrets, never in the repository.
  • Prefer a dedicated API key for eval jobs, separate from production, with a limited quota or budget.
  • Keep AI_EVAL_REPORT_INCLUDE_INPUT=false unless prompts are safe to publish as CI artifacts.
  • Use AI_EVAL_REPORT_MAX_OUTPUT_LENGTH and AI_EVAL_REPORT_MAX_FAILURE_LENGTH to keep reports concise.
  • Keep eval jobs serial to avoid bursts of 429 (rate limit) responses when using a shared provider key.
  • Start with a small set of standalone *.eval.php files in tests/AgentEvals and expand gradually.
  • The standalone runner supports the text, json, junit, and github report formats; see Output formats for examples.
  • If CI hits 429 rate-limit errors, follow the dedicated guide: Dealing with rate limits.
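Several of these notes meet in the eval step's env block. In this sketch, the secret name OPENAI_EVALS_API_KEY is hypothetical (a dedicated, budget-capped key stored separately from production), and the length limits are example values, not documented defaults.

```yaml
        env:
          # Hypothetical dedicated eval key, exposed under the variable the app already reads.
          OPENAI_API_KEY: ${{ secrets.OPENAI_EVALS_API_KEY }}
          # Keep prompts out of uploaded artifacts.
          AI_EVAL_REPORT_INCLUDE_INPUT: false
          # Example truncation limits to keep reports concise.
          AI_EVAL_REPORT_MAX_OUTPUT_LENGTH: 2000
          AI_EVAL_REPORT_MAX_FAILURE_LENGTH: 1000
```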