Dataset evals

Dataset evals run the same agent checks against each row of a JSON, PHP, or CSV file.

Use them when you want broad coverage without hand-writing one eval case per prompt.

JSON dataset

Store datasets under tests/AgentEvals/datasets:

```json
[
    {
        "name": "refund inside window",
        "input": "I bought this last week. Can I get a refund?",
        "required_terms": ["refund", "30 days"],
        "forbidden_terms": ["guaranteed"]
    }
]
```

Each row should be a JSON object whose keys are column names. The optional name column labels the row in failure messages and standalone reports.
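
To make the row rule concrete, here is a plain-PHP sketch that decodes a JSON dataset and rejects rows that are not objects. The helper decodeJsonRows() is hypothetical and is not the package's loader; it only illustrates the expected shape.

```php
<?php

// Hypothetical helper (not part of LaravelAIEvaluation): decode a JSON dataset
// and make sure every row is a JSON object.
function decodeJsonRows(string $json): array
{
    // Decode objects as stdClass so an object row can be told apart from a list.
    $decoded = json_decode($json, false, 512, JSON_THROW_ON_ERROR);

    if (!is_array($decoded)) {
        throw new InvalidArgumentException('A JSON dataset must be an array of rows.');
    }

    $rows = [];
    foreach ($decoded as $index => $row) {
        if (!$row instanceof stdClass) {
            throw new InvalidArgumentException("Row {$index} is not a JSON object.");
        }
        // Round-trip through JSON to get a plain associative array, nested values included.
        $rows[] = json_decode(json_encode($row), true);
    }

    return $rows;
}

$rows = decodeJsonRows('[{"name": "refund inside window", "required_terms": ["refund", "30 days"]}]');
echo $rows[0]['name'], PHP_EOL; // prints: refund inside window
```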

PHP dataset

PHP datasets work like Pest dataset files: return an array of rows from a PHP file.

```php
<?php

return [
    [
        'name' => 'refund inside window',
        'input' => 'I bought this last week. Can I get a refund?',
        'required_terms' => ['refund', '30 days'],
        'forbidden_terms' => ['guaranteed'],
    ],
];
```

You can also return ['rows' => [...]] if you want to keep metadata next to the rows.
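
Because a PHP dataset file simply returns a value, require hands that value back directly, and a loader only has to recognize the two shapes above. The sketch assumes a hypothetical normalizePhpDataset() helper that treats a top-level rows key as the metadata form; the package's own handling may differ.

```php
<?php

// Hypothetical helper (not the package's loader): accept either shape a PHP
// dataset may return and always produce a plain list of rows.
function normalizePhpDataset(array $data): array
{
    // Metadata form: ['rows' => [...], ...other keys...].
    if (array_key_exists('rows', $data) && is_array($data['rows'])) {
        return array_values($data['rows']);
    }

    // Plain form: the array itself is the list of rows.
    return array_values($data);
}

// In a real loader, $data would come from `$data = require $path;`.
$plain = [
    ['name' => 'refund inside window', 'input' => 'I bought this last week. Can I get a refund?'],
];
// 'description' is a hypothetical metadata key used only for illustration.
$withMetadata = ['description' => 'Refund policy cases', 'rows' => $plain];

var_dump(normalizePhpDataset($plain) === normalizePhpDataset($withMetadata)); // bool(true)
```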

CSV dataset

CSV datasets use the first row as column headers:

```csv
name,input,required_term,forbidden_term
refund inside window,I bought this last week. Can I get a refund?,refund,guaranteed
```

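CSV cells are always strings, so each column holds a single value per row. With the file above, point the checks at the singular columns, for example expectContainsFrom('required_term'). Use JSON or PHP datasets when a row needs arrays, such as multiple required_terms.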
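
The header-row convention can be sketched with PHP's built-in fgetcsv(). parseCsvRows() is a hypothetical helper, not the package's CSV loader; it shows why every value arrives as a string.

```php
<?php

// Hypothetical helper (not the package's CSV loader): treat the first record
// as headers and map every later record onto them.
function parseCsvRows(string $csv): array
{
    $handle = fopen('php://temp', 'r+');
    fwrite($handle, $csv);
    rewind($handle);

    // All arguments are passed explicitly; '' disables the backslash escape character.
    $headers = fgetcsv($handle, null, ',', '"', '');
    if ($headers === false) {
        fclose($handle);
        return [];
    }

    $rows = [];
    while (($cells = fgetcsv($handle, null, ',', '"', '')) !== false) {
        if ($cells === [null]) {
            continue; // fgetcsv() returns [null] for a blank line.
        }
        // Every cell is a string, so a cell cannot hold an array of terms.
        $rows[] = array_combine($headers, $cells);
    }

    fclose($handle);

    return $rows;
}

$csv = "name,input,required_term,forbidden_term\n"
    . "refund inside window,I bought this last week. Can I get a refund?,refund,guaranteed\n";

$rows = parseCsvRows($csv);
var_dump($rows[0]['required_term']); // string(6) "refund"
```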

Pest example

```php
use LaravelAIEvaluation\AIEval;

it('answers refund dataset cases', function () {
    AIEval::agent(App\Ai\Agents\SupportAgent::class)
        ->name('refund-policy')
        ->dataset('tests/AgentEvals/datasets/refunds.json')
        ->inputColumn('input')
        ->expectContainsFrom('required_terms')
        ->expectNotContainsFrom('forbidden_terms')
        ->run()
        ->assertPasses();
});
```

run() returns a dataset result containing one EvalResult per row, and assertPasses() fails the test if any row fails.
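
The pass/fail rule for the whole dataset can be modelled in a few lines. RowOutcome, failingRows(), and assertAllRowsPass() are hypothetical stand-ins for the package's EvalResult and dataset result, used only to show that one failing row fails the run.

```php
<?php

// Hypothetical stand-in for a per-row EvalResult; the package's real class differs.
final class RowOutcome
{
    public function __construct(
        public string $name,
        public bool $passed,
    ) {
    }
}

// Names of the rows that failed, in dataset order.
function failingRows(array $outcomes): array
{
    $failed = array_filter($outcomes, fn (RowOutcome $outcome) => !$outcome->passed);

    return array_values(array_map(fn (RowOutcome $outcome) => $outcome->name, $failed));
}

// Mirrors the documented rule: one failing row fails the whole dataset.
function assertAllRowsPass(array $outcomes): void
{
    $failed = failingRows($outcomes);

    if ($failed !== []) {
        throw new RuntimeException('Failing rows: ' . implode(', ', $failed));
    }
}

$outcomes = [
    new RowOutcome('refund inside window', true),
    new RowOutcome('refund after 90 days', false), // hypothetical second row
];

try {
    assertAllRowsPass($outcomes);
} catch (RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL; // Failing rows: refund after 90 days
}
```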

Standalone example

```php
use LaravelAIEvaluation\AIEval;
use LaravelAIEvaluation\Standalone\StandaloneEvalSuite;

return static function (StandaloneEvalSuite $suite): void {
    $suite->eval('refund-policy', static function () {
        return AIEval::agent(App\Ai\Agents\SupportAgent::class)
            ->dataset('tests/AgentEvals/datasets/refunds.json')
            ->inputColumn('input')
            ->expectContainsFrom('required_terms')
            ->expectNotContainsFrom('forbidden_terms')
            ->run();
    });
};
```

The standalone runner expands dataset results into separate report cases, such as refund-policy / refund inside window.
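
A sketch of how such report labels could be composed, assuming a hypothetical reportCaseLabels() helper. It uses the name column when present and falls back to row 1, row 2, and so on; that fallback for rows which simply lack a name is an assumption, since only nameColumn(null) is documented to produce positional labels.

```php
<?php

// Hypothetical sketch (not the package's runner): compose "eval / row" labels.
// Rows are numbered from 1, matching the "row 1, row 2" labels of nameColumn(null).
function reportCaseLabels(string $evalName, array $rows, ?string $nameColumn = 'name'): array
{
    $labels = [];

    foreach (array_values($rows) as $index => $row) {
        $hasName = $nameColumn !== null
            && isset($row[$nameColumn])
            && $row[$nameColumn] !== '';

        // Assumption: a row without a usable name also gets a positional label.
        $rowLabel = $hasName ? (string) $row[$nameColumn] : 'row ' . ($index + 1);

        $labels[] = "{$evalName} / {$rowLabel}";
    }

    return $labels;
}

$rows = [
    ['name' => 'refund inside window', 'input' => 'I bought this last week. Can I get a refund?'],
    ['input' => 'A hypothetical row with no name'],
];

print_r(reportCaseLabels('refund-policy', $rows));
```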

Scaffold a dataset eval

```bash
php artisan make:ai-evals refund-policy --type=standalone --dataset
```

This creates an eval file and a starter dataset file; the dataset file is placed in tests/AgentEvals/datasets.

Column methods

- dataset('path/to/file.json') loads a JSON dataset.
- dataset('path/to/file.php') loads a PHP dataset that returns rows.
- dataset('path/to/file.csv') loads a CSV dataset with a header row.
- inputColumn('input') chooses which row value is sent to the agent. Defaults to input.
- nameColumn('name') chooses the row label. Pass null to label rows as row 1, row 2, and so on.
- expectContainsFrom('required_terms') reads a string or an array of required substrings from each row.
- expectNotContainsFrom('forbidden_terms') reads a string or an array of forbidden substrings from each row.
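
As an illustration of the two expect*From checks, the sketch below reads a column that may hold a string or an array and applies substring checks to an agent's output. The helpers are hypothetical, and case-sensitive matching with str_contains() (PHP 8.0+) is an assumption; the package may match differently.

```php
<?php

// Hypothetical helpers (not the package's API). Matching is case-sensitive
// via str_contains(), which requires PHP 8.0+; this is an assumption.

// Read a column that may hold one string or an array of strings.
function termsFromColumn(array $row, string $column): array
{
    $value = $row[$column] ?? [];
    $terms = is_array($value) ? $value : [$value];

    // Ignore empty values, such as a blank CSV cell.
    return array_values(array_filter(
        array_map('strval', $terms),
        fn (string $term) => $term !== ''
    ));
}

// Return a list of problems; an empty list means the row passed.
function rowProblems(string $output, array $row, string $requiredColumn, string $forbiddenColumn): array
{
    $problems = [];

    foreach (termsFromColumn($row, $requiredColumn) as $term) {
        if (!str_contains($output, $term)) {
            $problems[] = "missing required term \"{$term}\"";
        }
    }

    foreach (termsFromColumn($row, $forbiddenColumn) as $term) {
        if (str_contains($output, $term)) {
            $problems[] = "contains forbidden term \"{$term}\"";
        }
    }

    return $problems;
}

$row = ['required_terms' => ['refund', '30 days'], 'forbidden_terms' => 'guaranteed'];
$output = 'You can request a refund within 30 days of purchase.'; // hypothetical agent reply

var_dump(rowProblems($output, $row, 'required_terms', 'forbidden_terms'));
```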

Column paths support dot notation, such as expected.required_terms or cases.0.input.
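
Dot-notation lookup walks the path one segment at a time, treating numeric segments as list indexes. resolveColumn() below is a hypothetical helper, similar in spirit to Laravel's data_get(); it is not the package's implementation.

```php
<?php

// Hypothetical helper, similar in spirit to Laravel's data_get(); not the
// package's implementation. Wildcards such as "*" are not supported here.
function resolveColumn(array $row, string $path, mixed $default = null): mixed
{
    $value = $row;

    foreach (explode('.', $path) as $segment) {
        // PHP normalizes the key "0" to 0, so numeric segments index into lists.
        if (!is_array($value) || !array_key_exists($segment, $value)) {
            return $default;
        }
        $value = $value[$segment];
    }

    return $value;
}

$row = [
    'expected' => ['required_terms' => ['refund', '30 days']],
    'cases' => [['input' => 'I bought this last week. Can I get a refund?']],
];

var_dump(resolveColumn($row, 'cases.0.input'));
```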
