LLM-as-judge expectations

Use judge expectations when semantic quality matters more than exact text matching.

Default judge

The package ships with a default judge agent: DefaultJudgeAgent.

Configure a custom judge agent (optional)

Override the default judge in config/laravel-ai-evaluation.php:

php

return [
    'judge' => [
        'agent' => App\Ai\Agents\JudgeAgent::class,
        'threshold' => 0.7,
    ],
];

The judge agent must expose a prompt(string $prompt) method.

You can also pass a judge directly per expectation.

Minimal compatible judge shape:

php

namespace App\Ai\Agents;

use Illuminate\Support\Facades\AI;

final class JudgeAgent
{
    public function prompt(string $prompt): string
    {
        return (string) AI::prompt($prompt);
    }
}

The package sends this judge a complete scoring prompt. Your judge should return JSON only:

json

{"score":0.82,"reason":"Correct and clear; misses one policy detail."}

Judge requirements

For a judge to work with this package, it must meet these requirements:

It must be resolvable as a class string from the container, or passed as an object instance.
It must implement Laravel\Ai\Contracts\Agent or expose prompt(string $prompt).
Its response must include valid JSON with:
- score (numeric, between 0 and 1)
- reason (string)

Expected payload shape:

json

{"score":0.82,"reason":"Mostly correct and clear; misses one policy detail."}

Important behavior notes:

The package expects strict JSON and validates both required fields.
If the response includes extra text, the parser attempts to extract the first JSON object, but returning plain JSON is strongly recommended.
If score is outside 0..1, missing, or non-numeric, the eval fails with a judge response error.

Configure a different judge for eval expectations

Use useJudge() to avoid passing the same judge repeatedly:

php

use LaravelAIEvaluation\AIEval;

AIEval::agent(SupportAgent::class)
    ->input('What is your refund policy?')
    ->useJudge(App\Ai\Agents\JudgeAgent::class)
    ->expectJudge('The answer should be clear and policy accurate.', threshold: 0.8)
    ->run()
    ->assertPasses();

`expectJudgeAgainst`

Use a rubric (criteria) and a reference answer.

php

use LaravelAIEvaluation\AIEval;

AIEval::agent(SupportAgent::class)
    ->input('What is your refund policy?')
    ->expectJudgeAgainst(
        reference: 'Refunds are available within 30 days of purchase.',
        criteria: 'The answer should be correct, concise, and mention the 30 day window.',
        threshold: 0.8,
        judge: App\Ai\Agents\JudgeAgent::class,
    )
    ->run()
    ->assertPasses();

`expectJudge`

Use a rubric without a reference answer.

php

use LaravelAIEvaluation\AIEval;

AIEval::agent(SupportAgent::class)
    ->input('Explain our refund policy in one sentence')
    ->expectJudge(
        criteria: 'The answer should be clear, helpful, and avoid uncertainty.',
        threshold: 0.75,
        judge: App\Ai\Agents\JudgeAgent::class,
    )
    ->run()
    ->assertPasses();

Notes

Judge score is expected between 0 and 1.
Any expectation below threshold fails the eval.
Prefer explicit criteria with product requirements, not vague instructions like "good answer".
Use expectJudgeAgainst() when you have a known-good reference answer.
In CI this causes hard-fail behavior when using Pest or php artisan ai-evals:run.

LLM-as-judge expectations ​

Default judge ​

Configure a custom judge agent (optional) ​

Judge requirements ​

Configure a different judge for eval expectations ​

expectJudgeAgainst ​

expectJudge ​

Notes ​