API reference
This page summarizes the fluent eval API.
AIEval::agent()
Creates an eval builder for an agent class string or object instance.
AIEval::agent(App\Ai\Agents\SupportAgent::class)
The resolved agent must implement Laravel\Ai\Contracts\Agent or expose a prompt(string $prompt) method.
name()
Sets the eval name shown in output and failure messages.
->name('refund-policy')
If omitted, the package tries to infer the Pest test name or standalone suite name.
input()
Sets the prompt sent to the agent.
->input('What is your refund policy?')
expectContains()
Requires one or more substrings to appear in the agent output.
->expectContains('refund')
->expectContains(['refund', '30 days'])
All provided strings must be present. Matching is case-sensitive.
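The "all strings, case-sensitive" rule can be pictured in plain PHP. This is only an illustration of the documented behavior, not the package's implementation; containsAll() is a hypothetical helper.

```php
<?php

// Hypothetical helper that mirrors the documented rule:
// every needle must appear in the output, compared case-sensitively.
function containsAll(string $output, string|array $needles): bool
{
    foreach ((array) $needles as $needle) {
        if (! str_contains($output, $needle)) {
            return false;
        }
    }

    return true;
}

$output = 'Refunds are available within 30 days.';

var_dump(containsAll($output, ['Refunds', '30 days'])); // bool(true)
var_dump(containsAll($output, 'refund'));               // bool(false): "Refunds" has a capital R
```

If case should not matter, expectRegex() with the i flag may be a better fit than expectContains().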
expectExact()
Requires the full output to match exactly after trimming both values.
->expectExact('OK')
expectRegex()
Requires the output to match a regular expression.
->expectRegex('/refunds? within \d+ days/i')
expectNotContains()
Requires one or more substrings to be absent from the agent output.
->expectNotContains('always approved')
->expectNotContains(['legal guarantee', 'always approved'])
Matching is case-sensitive.
expectJson()
Requires the output to be valid JSON.
->expectJson()
expectJsonPath()
Requires a JSON path to exist, or to equal an expected value when provided.
->expectJsonPath('status')
->expectJsonPath('status', 'eligible')
->expectJsonPath('policy.days', 30)
Paths use dot notation and may include array indexes, such as items.0.name.
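To make the dot-notation rule concrete, the sketch below walks a decoded JSON document one segment at a time. resolvePath() is a hypothetical helper written for this page; the package's own path handling may differ in details such as error reporting.

```php
<?php

// Hypothetical resolver, shown only to illustrate dot-notation lookups.
function resolvePath(array $data, string $path): mixed
{
    $current = $data;

    foreach (explode('.', $path) as $segment) {
        // Numeric segments such as "0" address list items; PHP array keys
        // treat the string "0" and the integer 0 as the same key.
        if (! is_array($current) || ! array_key_exists($segment, $current)) {
            throw new OutOfBoundsException("Path [{$path}] does not exist.");
        }

        $current = $current[$segment];
    }

    return $current;
}

$decoded = json_decode(
    '{"status":"eligible","policy":{"days":30},"items":[{"name":"Headphones"}]}',
    true,
    flags: JSON_THROW_ON_ERROR,
);

var_dump(resolvePath($decoded, 'policy.days'));  // int(30)
var_dump(resolvePath($decoded, 'items.0.name')); // string(10) "Headphones"
```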
expectLength()
Requires the output length to be within the provided bounds.
->expectLength(min: 20)
->expectLength(max: 500)
->expectLength(min: 20, max: 500)
expectStartsWith()
Requires the output to start with a string.
->expectStartsWith('{')
expectEndsWith()
Requires the output to end with a string.
->expectEndsWith('}')
expect()
Adds a custom expectation. Accepts closures, expectation objects, invokable objects, or container-resolvable class strings.
->expect(fn (string $output): bool => str_contains($output, '30 days'))
->expect(new RefundPolicyExpectation)
->expect(RefundPolicyExpectation::class)
Reusable expectation classes may implement LaravelAIEvaluation\Contracts\EvalExpectation and return LaravelAIEvaluation\Evaluation\ExpectationResult.
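For a check that needs a little configuration but not the full contract, an invokable class may be enough. The sketch below assumes invokable objects are called the same way as the closure form, with the output string as the only argument and a bool return; this page does not spell that out. MentionsRefundWindow is a hypothetical class name.

```php
<?php

// Hypothetical invokable expectation. It assumes invokable objects are
// called like the closure form: (string $output): bool.
final class MentionsRefundWindow
{
    public function __construct(private int $days = 30)
    {
    }

    public function __invoke(string $output): bool
    {
        // Case-sensitive substring check, e.g. "30 days".
        return str_contains($output, "{$this->days} days");
    }
}

$expectation = new MentionsRefundWindow(days: 30);

var_dump($expectation('Refunds are available within 30 days.')); // bool(true)
var_dump($expectation('Refunds are never available.'));          // bool(false)
```

Under that assumption, an instance could be passed to the builder as ->expect(new MentionsRefundWindow(30)).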
dataset()
Runs the builder once for each row in a JSON, PHP, or CSV dataset.
->dataset('tests/AgentEvals/datasets/refunds.json')
->inputColumn('input')
->expectContainsFrom('required_terms')
->expectNotContainsFrom('forbidden_terms')
Dataset rows should be JSON objects. The default input column is input, and the default row name column is name.
PHP datasets should return the same row array shape from a PHP file, similar to Pest dataset files.
CSV datasets should include a header row. CSV cell values are strings.
Column paths support dot notation.
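A PHP dataset file might look like the sketch below. The input and name keys are the documented defaults; required_terms and forbidden_terms are simply the column names used in the example above. The file path is hypothetical, and the use of list values in those columns is an assumption based on that example.

```php
<?php

// tests/AgentEvals/datasets/refunds.php (hypothetical path)
// Each row is an associative array, the PHP equivalent of a JSON object.
return [
    [
        'name' => 'standard refund window',
        'input' => 'What is your refund policy?',
        'required_terms' => ['refund', '30 days'],
        'forbidden_terms' => ['always approved'],
    ],
    [
        'name' => 'late refund request',
        'input' => 'Can I get a refund after 60 days?',
        'required_terms' => ['refund'],
        'forbidden_terms' => ['legal guarantee'],
    ],
];
```

A CSV version of the same data would need a header row with these column names, and every cell would arrive as a string.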
conversation()
Starts a multi-turn conversation eval. The transcript is flattened into one prompt and the next assistant response is evaluated.
->conversation()
->user('I bought this last week.')
->assistantShouldContain('order number')
->user('The order is #123.')
->expectContains('refund')
Conversation evals also support datasets with dataset(), turnsColumn(), inputColumn(), expectContainsFrom(), and expectNotContainsFrom().
See Conversation evals for examples and limitations.
expectJudge()
Scores the output with an LLM judge using criteria and an optional threshold.
->expectJudge(
    criteria: 'The answer should be accurate, concise, and polite.',
    threshold: 0.8,
)
expectJudgeAgainst()
Scores the output with an LLM judge using criteria plus a reference answer.
->expectJudgeAgainst(
    reference: 'Refunds are available within 30 days of purchase.',
    criteria: 'The answer should mention the refund window.',
    threshold: 0.8,
)
useJudge()
Sets a judge agent for all judge expectations on the builder.
->useJudge(App\Ai\Agents\JudgeAgent::class)
You can also pass judge: directly to expectJudge() or expectJudgeAgainst().
run()
Runs the eval and returns an EvalResult.
$result = AIEval::agent(SupportAgent::class)
    ->input('What is your refund policy?')
    ->expectContains('refund')
    ->run();
At least one expectation is required.
For dataset evals, run() returns a dataset result containing one EvalResult per row.
assertPasses()
Fails the current Pest/PHPUnit test, or throws a runtime exception outside PHPUnit, if the eval failed.
->run()
->assertPasses();
dump()
Writes eval details in text or json format.
->run()
->dump(format: 'json');
EvalResult
Useful methods on the result object:
passed() returns true when every expectation passed.
failures() returns failure messages.
output() returns the normalized agent output.
expectationResults() returns details for each expectation.
usage() returns token and cost usage when the provider response exposes it.
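As a rough illustration of how these methods fit together, the sketch below builds a short summary from a result object. summarizeEval() is a hypothetical helper. It assumes failures() returns an iterable of strings and output() returns a string, and it types the parameter as object because this page does not give the EvalResult namespace. The anonymous class is a stand-in so the example runs on its own.

```php
<?php

// Hypothetical helper: summarizes a result using passed(), failures(), and output().
function summarizeEval(object $result): string
{
    if ($result->passed()) {
        return 'PASS';
    }

    $lines = ['FAIL'];

    foreach ($result->failures() as $message) {
        $lines[] = " - {$message}";
    }

    $lines[] = 'Output: ' . $result->output();

    return implode(PHP_EOL, $lines);
}

// Stand-in object exposing the same method names; not the real EvalResult.
$fake = new class {
    public function passed(): bool
    {
        return false;
    }

    public function failures(): array
    {
        return ['Expected output to contain "refund".'];
    }

    public function output(): string
    {
        return 'Please contact support.';
    }
};

echo summarizeEval($fake), PHP_EOL;
```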
php artisan ai-evals:run
Runs standalone eval files.
php artisan ai-evals:run {path?} --filter="refund" --format=json --output=storage/ai-evals/results.json
Options:
--filter= runs eval cases whose names contain the filter.
--format= supports text, json, junit, and github.
--output= writes the formatted report to a file.
For examples of each report format, see Output formats.
Formats:
text prints the human-readable terminal report and is the default.
json prints or writes the full standalone run report as JSON.
junit prints or writes JUnit XML for CI test report UIs.
github prints GitHub Actions ::error file=...,line=...::... annotations for failed evals.