Dataset evals
Dataset evals run the same agent checks across multiple rows from a JSON, PHP, or CSV file.
Use them when you want broad coverage without hand-writing one eval case per prompt.
JSON dataset
Store datasets under tests/AgentEvals/datasets:
[
    {
        "name": "refund inside window",
        "input": "I bought this last week. Can I get a refund?",
        "required_terms": ["refund", "30 days"],
        "forbidden_terms": ["guaranteed"]
    }
]

Each row should be an object. The optional name column is used in failures and standalone reports.
PHP dataset
PHP datasets work like Pest dataset files: return an array of rows from a PHP file.
<?php

return [
    [
        'name' => 'refund inside window',
        'input' => 'I bought this last week. Can I get a refund?',
        'required_terms' => ['refund', '30 days'],
        'forbidden_terms' => ['guaranteed'],
    ],
];

You can also return ['rows' => [...]] if you want to keep metadata next to the rows.
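As a rough sketch of the wrapped form, a dataset file might look like the one below. The file path and the description key are hypothetical; only the rows key is described above, and how other keys are treated is an assumption.

```php
<?php

// Hypothetical file: tests/AgentEvals/datasets/refunds.php
return [
    // Illustrative metadata key (an assumption, not a documented field).
    'description' => 'Refund policy cases',

    // The rows themselves have the same shape as in the plain-array form.
    'rows' => [
        [
            'name' => 'refund inside window',
            'input' => 'I bought this last week. Can I get a refund?',
            'required_terms' => ['refund', '30 days'],
            'forbidden_terms' => ['guaranteed'],
        ],
    ],
];
```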
CSV dataset
CSV datasets use the first row as column headers:
name,input,required_term,forbidden_term
refund inside window,I bought this last week. Can I get a refund?,refund,guaranteed

CSV cells are strings. Use JSON or PHP datasets when a row needs arrays such as multiple required_terms.
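To make the "cells are strings" point concrete, here is a minimal sketch that parses the sample file with PHP's built-in str_getcsv (PHP 7.4 or later). It is not the package's own loader, which may behave differently; it only shows what a header-keyed CSV row looks like.

```php
<?php

declare(strict_types=1);

// Sample file contents, copied from the CSV dataset above.
$csv = <<<DATA
name,input,required_term,forbidden_term
refund inside window,I bought this last week. Can I get a refund?,refund,guaranteed
DATA;

// Split on any line ending; the first line holds the column headers.
$lines = preg_split('/\R/', trim($csv));

// Parse one CSV line with an explicit separator, enclosure, and (empty) escape.
$parse = static fn (string $line): array => str_getcsv($line, ',', '"', '');

$headers = $parse(array_shift($lines));

// Key every remaining line by the header names.
$rows = array_map(
    static fn (string $line): array => array_combine($headers, $parse($line)),
    $lines,
);

// Every value is a plain string, so a cell cannot carry a list of terms.
var_dump($rows[0]['required_term']);
```

Because each cell holds one string, an eval over this file would reference the singular header names, for example expectContainsFrom('required_term').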
Pest example
use LaravelAIEvaluation\AIEval;

it('answers refund dataset cases', function () {
    AIEval::agent(App\Ai\Agents\SupportAgent::class)
        ->name('refund-policy')
        ->dataset('tests/AgentEvals/datasets/refunds.json')
        ->inputColumn('input')
        ->expectContainsFrom('required_terms')
        ->expectNotContainsFrom('forbidden_terms')
        ->run()
        ->assertPasses();
});

run() returns a dataset result with one EvalResult per row. assertPasses() fails if any row fails.
Standalone example
use LaravelAIEvaluation\AIEval;
use LaravelAIEvaluation\Standalone\StandaloneEvalSuite;

return static function (StandaloneEvalSuite $suite): void {
    $suite->eval('refund-policy', static function () {
        return AIEval::agent(App\Ai\Agents\SupportAgent::class)
            ->dataset('tests/AgentEvals/datasets/refunds.json')
            ->inputColumn('input')
            ->expectContainsFrom('required_terms')
            ->expectNotContainsFrom('forbidden_terms')
            ->run();
    });
};

The standalone runner expands dataset results into separate report cases, such as refund-policy / refund inside window.
Scaffold a dataset eval
php artisan make:ai-evals refund-policy --type=standalone --dataset

This creates both an eval file and a starter dataset file in tests/AgentEvals/datasets.
Column methods
dataset('path/to/file.json') loads a JSON dataset.
dataset('path/to/file.php') loads a PHP dataset that returns rows.
dataset('path/to/file.csv') loads a CSV dataset with a header row.
inputColumn('input') chooses the row value sent to the agent. The default is input.
nameColumn('name') chooses the row label. Pass null to use row 1, row 2, and so on.
expectContainsFrom('required_terms') reads a string or array of required substrings from each row.
expectNotContainsFrom('forbidden_terms') reads a string or array of forbidden substrings from each row.
Column paths support dot notation, such as expected.required_terms or cases.0.input.
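To illustrate what a dot-notation path selects, the sketch below walks a nested row one segment at a time. The resolveColumnPath helper is hypothetical and is not the package's implementation; returning null for a missing path is also an assumption made for this example. It needs PHP 8.0 or later for the mixed return type.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative only: resolve a dot-notation path against one dataset row.
 */
function resolveColumnPath(array $row, string $path): mixed
{
    $current = $row;

    foreach (explode('.', $path) as $segment) {
        // Numeric segments such as "0" address list entries; PHP treats
        // the numeric string as the matching integer key.
        if (!is_array($current) || !array_key_exists($segment, $current)) {
            return null; // Missing path: this sketch simply returns null.
        }

        $current = $current[$segment];
    }

    return $current;
}

// A row with nested values, shaped to match the two example paths.
$row = [
    'name' => 'refund inside window',
    'expected' => ['required_terms' => ['refund', '30 days']],
    'cases' => [
        ['input' => 'I bought this last week. Can I get a refund?'],
    ],
];

$terms = resolveColumnPath($row, 'expected.required_terms');
$input = resolveColumnPath($row, 'cases.0.input');
$missing = resolveColumnPath($row, 'expected.forbidden_terms');
```

With a row shaped like this, expected.required_terms selects the nested array of terms and cases.0.input selects the input of the first entry under cases.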