Run standalone
If you want to run AI evals without Pest or PHPUnit, use the built-in Artisan command.
Run all standalone evals
php artisan ai-evals:run
Run a different folder
php artisan ai-evals:run tests/AgentEvals/Billing
Useful filters
Run matching eval cases:
php artisan ai-evals:run --filter="refund policy"
By default, the command runs the evals in tests/AgentEvals.
To customize the default standalone folder, set:
// config/laravel-ai-evaluation.php
'standalone' => [
    'path' => 'tests/AgentEvals',
],
Standalone eval file format
Each standalone eval file should use the *.eval.php suffix and return a callable that receives a StandaloneEvalSuite instance and registers one or more eval cases on it.
<?php
use App\Ai\Agents\SupportAgent;
use LaravelAIEvaluation\AIEval;
use LaravelAIEvaluation\Standalone\StandaloneEvalSuite;
return static function (StandaloneEvalSuite $suite): void {
    $suite->eval('refund-policy', static function () {
        return AIEval::agent(SupportAgent::class)
            ->input('What is your refund policy?')
            ->expectContains(['refund', '30 days'])
            ->run();
    });
};
Dataset evals
Standalone evals can also return dataset results. The runner expands each dataset row into its own case in the console output and in reports:
<?php
use App\Ai\Agents\SupportAgent;
use LaravelAIEvaluation\AIEval;
use LaravelAIEvaluation\Standalone\StandaloneEvalSuite;
return static function (StandaloneEvalSuite $suite): void {
    $suite->eval('refund-policy', static function () {
        return AIEval::agent(SupportAgent::class)
            ->dataset('tests/AgentEvals/datasets/refunds.json')
            ->expectContainsFrom('required_terms')
            ->run();
    });
};
See Dataset evals for the dataset file format.
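The file contract above (a *.eval.php file that returns a callable taking a suite) can be illustrated without the package. The sketch below is a minimal, hypothetical runner: FakeSuite is an invented stand-in for StandaloneEvalSuite, the file name greeting.eval.php is made up, and the case returns a plain string instead of calling AIEval. It only shows the idea that requiring the file yields a closure, and calling that closure registers cases before anything runs; the real runner's internals may differ.

```php
<?php

// Hypothetical stand-in for StandaloneEvalSuite, used only to illustrate the
// registration pattern. It is not the package's class.
final class FakeSuite
{
    /** @var array<string, callable> Registered cases, keyed by name. */
    public array $cases = [];

    public function eval(string $name, callable $callback): void
    {
        // Registration only stores the callback; nothing runs yet.
        $this->cases[$name] = $callback;
    }
}

// Write a tiny *.eval.php file to a temporary directory. A real eval would
// return AIEval::agent(...)->run(); this one returns a plain string.
$dir = sys_get_temp_dir() . '/ai-evals-sketch-' . bin2hex(random_bytes(4));
mkdir($dir);
$file = $dir . '/greeting.eval.php';
file_put_contents($file, <<<'PHP'
<?php
return static function ($suite): void {
    $suite->eval('greeting', static fn () => 'hello');
};
PHP);

// Requiring the file yields the returned closure; calling it with a suite
// registers the cases.
$suite = new FakeSuite();
$register = require $file;
$register($suite);

// Executing the registered cases is a separate step.
foreach ($suite->cases as $name => $case) {
    echo $name . ': ' . $case() . PHP_EOL;
}

// Clean up the temporary file and directory.
unlink($file);
rmdir($dir);
```

Separating registration from execution is what lets a runner list, filter, or skip cases by name before spending any tokens on them.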
Use real provider keys safely
Live evals call real model APIs, so keep credentials outside your repository.
- Set provider API keys in .env for local development and in secret stores for CI.
- Do not commit keys to eval files, config files, or source control.
- Prefer a dedicated eval key (separate from production) with quota and spend limits.
- Keep live eval runs serial (php artisan ai-evals:run) to avoid burst traffic.
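A cheap safeguard that follows from these rules is a pre-flight check that fails fast when a key is missing and never prints the full key. The sketch below is hypothetical: missingKeys() and maskSecret() are invented helpers, not part of the package, and the AI_EVAL_SKETCH_* variable names exist only for the demo. It uses plain getenv(), which sees variables exported by the shell or CI runner; a standalone PHP script does not read .env on its own.

```php
<?php

// Hypothetical helper: show only the last four characters of a secret so a
// log line can confirm which key is in use without leaking it.
function maskSecret(string $value): string
{
    if (strlen($value) <= 4) {
        return str_repeat('*', strlen($value));
    }
    return str_repeat('*', strlen($value) - 4) . substr($value, -4);
}

/**
 * Hypothetical helper: report which required variables are missing or empty.
 *
 * @param list<string> $names Environment variable names that must be set.
 * @return list<string> Names that are missing or empty.
 */
function missingKeys(array $names): array
{
    $missing = [];
    foreach ($names as $name) {
        $value = getenv($name);
        if ($value === false || $value === '') {
            $missing[] = $name;
        }
    }
    return $missing;
}

// Demo with throwaway variable names so no real key is touched.
putenv('AI_EVAL_SKETCH_KEY=sk-test-1234');
$missing = missingKeys(['AI_EVAL_SKETCH_KEY', 'AI_EVAL_SKETCH_ABSENT']);
echo 'Missing: ' . implode(', ', $missing) . PHP_EOL;
echo 'Using key ' . maskSecret((string) getenv('AI_EVAL_SKETCH_KEY')) . PHP_EOL;
```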
Example local .env setup:
# Use the provider key names expected by your Laravel AI configuration.
OPENAI_API_KEY=your-openai-key
# ANTHROPIC_API_KEY=your-anthropic-key
AI_EVAL_RETRIES=1
AI_EVAL_RETRY_SLEEP_MS=250
AI_EVAL_SUMMARY=true
AI_EVAL_SUMMARY_FORMAT=text
AI_EVAL_SUMMARY_CURRENCY=USD
Output formats
For a complete format-by-format walkthrough with sample output, see Output formats.
Text output is the default:
php artisan ai-evals:run
For CI artifacts and dashboards, write machine-readable reports:
php artisan ai-evals:run --format=json --output=storage/ai-evals/results.json
php artisan ai-evals:run --format=junit --output=storage/ai-evals/junit.xml
php artisan ai-evals:run --format=github
Supported standalone report formats are text, json, junit, and github.
Use text for local development and quick terminal feedback.
php artisan ai-evals:run --format=text
Use json when you want a complete machine-readable artifact for dashboards, debugging, or post-processing.
php artisan ai-evals:run --format=json --output=storage/ai-evals/results.json
Use junit when your CI provider can ingest test reports. This works well with GitHub Actions test reporters, GitLab test reports, Jenkins, and Azure DevOps.
php artisan ai-evals:run --format=junit --output=storage/ai-evals/junit.xml
You can turn the JUnit XML into a local browser report with a viewer such as xunit-viewer:
npx xunit-viewer --results=storage/ai-evals/junit.xml --output=storage/ai-evals/junit.html --title="AI Eval Report"
Use github when running inside GitHub Actions and you want failed evals to appear as inline annotations.
php artisan ai-evals:run --format=github
The --output option writes the report in the selected format to a file. When --output is omitted, the report is written to the console.
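The github format builds on GitHub Actions workflow commands: lines such as ::error title=...::message printed to stdout inside a workflow step become annotations. As a rough, hypothetical sketch of that mechanism (githubErrorAnnotation() is invented here, and the package's own formatter may produce different fields), the helper below formats one failed eval, applying GitHub's escaping rules for message text and for property values.

```php
<?php

// Escaping for the message part of a workflow command. '%' is replaced first
// so the '%' introduced by later replacements is not encoded twice.
function escapeData(string $value): string
{
    return str_replace(['%', "\r", "\n"], ['%25', '%0D', '%0A'], $value);
}

// Property values (file=..., title=...) additionally escape ':' and ','.
function escapeProperty(string $value): string
{
    return str_replace(
        ['%', "\r", "\n", ':', ','],
        ['%25', '%0D', '%0A', '%3A', '%2C'],
        $value
    );
}

// Hypothetical helper: format one failed eval as an error annotation line.
function githubErrorAnnotation(string $evalName, string $message, ?string $file = null): string
{
    $properties = [];
    if ($file !== null) {
        $properties[] = 'file=' . escapeProperty($file);
    }
    $properties[] = 'title=' . escapeProperty('AI eval failed: ' . $evalName);

    return '::error ' . implode(',', $properties) . '::' . escapeData($message);
}

// Printing the line to stdout inside a GitHub Actions step creates the annotation.
echo githubErrorAnnotation('refund-policy', "Expected output to contain: 30 days\nGot: We accept returns.") . PHP_EOL;
```

Escaping newlines matters here: an unescaped line break would end the command early and leave the rest of a multi-line model output as plain log text.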
Report safety
Use report config to avoid leaking full prompts or secrets into CI artifacts:
AI_EVAL_REPORT_INCLUDE_INPUT=false
AI_EVAL_REPORT_INCLUDE_OUTPUT=true
AI_EVAL_REPORT_MAX_INPUT_LENGTH=500
AI_EVAL_REPORT_MAX_OUTPUT_LENGTH=2000
AI_EVAL_REPORT_MAX_FAILURE_LENGTH=1000
Inputs are omitted from reports by default. Outputs are included by default, but they are truncated to the configured maximum length and passed through the configured redaction patterns.
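To make the truncation and redaction behaviour concrete, here is a minimal sketch assuming regex-based redaction patterns. sanitizeForReport(), the [REDACTED] and [truncated] markers, and the sample key pattern are all hypothetical; the package's own redaction configuration and output format may differ.

```php
<?php

/**
 * Hypothetical sketch, not the package's implementation: mask secrets, then
 * cap the length of a string before it is written to a report.
 *
 * @param list<string> $patterns PCRE patterns whose matches are masked.
 */
function sanitizeForReport(string $text, int $maxLength, array $patterns): string
{
    // Redact first, so truncation cannot cut a secret into a fragment that
    // the patterns no longer match.
    foreach ($patterns as $pattern) {
        $text = preg_replace($pattern, '[REDACTED]', $text) ?? $text;
    }

    if (mb_strlen($text) <= $maxLength) {
        return $text;
    }

    // Keep the first $maxLength characters and append a marker.
    return mb_substr($text, 0, $maxLength) . '... [truncated]';
}

// Assumed pattern for OpenAI-style keys; adjust it to your providers.
$patterns = ['/sk-[A-Za-z0-9_-]{8,}/'];

echo sanitizeForReport('Calling the API with sk-abcdefgh12345', 2000, $patterns) . PHP_EOL;
echo sanitizeForReport(str_repeat('x', 30), 10, $patterns) . PHP_EOL;
```

The order is the important design choice: if the text were truncated first, a key sliced at the boundary could fall below the pattern's minimum length and leak as a partial secret.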
Verbose output and summaries
The standalone runner can print verbose output for each eval. Enable it and choose the dump format with:
AI_EVAL_VERBOSE=true
AI_EVAL_FORMAT=json
Verbose per-eval dump formats are text and json.
For transient provider or network issues, you can add lightweight retries:
AI_EVAL_RETRIES=1
AI_EVAL_RETRY_SLEEP_MS=250
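One plausible reading of the retry settings is sketched below, assuming AI_EVAL_RETRIES counts extra attempts after the first failure and AI_EVAL_RETRY_SLEEP_MS is the pause between attempts in milliseconds. withRetries() is a hypothetical helper, and treating RuntimeException as the "transient" error type is an assumption made for the example.

```php
<?php

/**
 * Hypothetical retry helper mirroring AI_EVAL_RETRIES and
 * AI_EVAL_RETRY_SLEEP_MS. It is not the package's implementation.
 *
 * @param callable $attempt The work to run, e.g. one live eval.
 * @param int      $retries Extra attempts allowed after the first failure.
 * @param int      $sleepMs Pause between attempts, in milliseconds.
 * @return mixed The first successful result.
 */
function withRetries(callable $attempt, int $retries, int $sleepMs)
{
    for ($try = 0; ; $try++) {
        try {
            return $attempt();
        } catch (RuntimeException $e) {
            // Out of retries: surface the last error instead of hiding it.
            if ($try >= $retries) {
                throw $e;
            }
            usleep($sleepMs * 1000);
        }
    }
}

// Simulate a provider call that times out once, then succeeds.
$calls = 0;
$flaky = static function () use (&$calls): string {
    $calls++;
    if ($calls === 1) {
        throw new RuntimeException('simulated timeout');
    }
    return 'ok';
};

// Same values as AI_EVAL_RETRIES=1 and AI_EVAL_RETRY_SLEEP_MS=250.
$result = withRetries($flaky, 1, 250);
echo $result . ' after ' . $calls . ' calls' . PHP_EOL;
```

Rethrowing once the budget is spent keeps a persistent outage visible as a failed eval rather than masking it behind endless retries.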
You can configure end-of-run summaries with:
AI_EVAL_SUMMARY=true
AI_EVAL_SUMMARY_FORMAT=text
AI_EVAL_SUMMARY_CURRENCY=USD
Example text summary output:
AI Eval Summary
Total: 13
Passed: 12
Failed: 1
Prompt tokens: 7842
Completion tokens: 1966
Total tokens: 9808
Estimated cost: USD 0.070000
Example JSON summary output (AI_EVAL_SUMMARY_FORMAT=json):
{"type":"ai_eval_summary","total":13,"passed":12,"failed":1,"prompt_tokens":7842,"completion_tokens":1966,"total_tokens":9808,"estimated_cost":0.07,"currency":"USD"}
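If you post-process the JSON summary, for example to chart pass rate and token usage or to gate a CI job, a minimal sketch follows. It assumes you have already captured the single summary line shown above; the pass-rate calculation and the $shouldFailBuild flag are illustrative, not features of the package.

```php
<?php

// The JSON summary line from the example above, e.g. captured from the
// command's output with AI_EVAL_SUMMARY_FORMAT=json.
$line = '{"type":"ai_eval_summary","total":13,"passed":12,"failed":1,"prompt_tokens":7842,"completion_tokens":1966,"total_tokens":9808,"estimated_cost":0.07,"currency":"USD"}';

// Throw on malformed JSON instead of silently getting null.
$summary = json_decode($line, true, 512, JSON_THROW_ON_ERROR);

if (($summary['type'] ?? null) !== 'ai_eval_summary') {
    throw new UnexpectedValueException('Not an AI eval summary line.');
}

// Derived values for a dashboard or a CI gate.
$passRate = $summary['total'] > 0 ? $summary['passed'] / $summary['total'] : 0.0;
$shouldFailBuild = $summary['failed'] > 0;

printf("Pass rate: %.1f%%\n", $passRate * 100);
printf(
    "Tokens: %d (%d prompt + %d completion), estimated cost %s %.2f\n",
    $summary['total_tokens'],
    $summary['prompt_tokens'],
    $summary['completion_tokens'],
    $summary['currency'],
    $summary['estimated_cost']
);
```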