fix(cli): thread --config through kebab eval run/aggregate/compare (facade-rule)

Cmd::Eval now loads Config via cli.config (the same pattern as all other
subcommands) before dispatching to the inner match. Each arm now calls the
*_with_config variant:

  run_eval(&opts)              → run_eval_with_config(&cfg, &opts)
  compute_aggregate(run_id)    → compute_aggregate_with_config(&cfg, run_id)
  store_aggregate(run_id, ..)  → store_aggregate_with_config(&cfg, run_id, ..)

Compare already called compare_runs_with_config but sourced cfg from
Config::load(None). That redundant load is removed; cfg now comes from the
shared binding above.

Fixes the same facade-rule regression pattern as P3-5 / P4-3: previously
`kebab --config /build/dogfood/config.toml eval run` silently evaluated the
XDG-default (empty) KB instead of the dogfood KB.

Also fixes the runner.rs test that hardcoded rag-v2 after commit 5719969
bumped the default prompt_template_version to rag-v3.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
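A minimal sketch of the load-once-then-dispatch shape described above. Every
type and function here is a hypothetical, heavily simplified stand-in (the
real Config, Cmd::Eval, and *_with_config signatures live in the kebab crates
and take more arguments); in particular, placing the KB next to the config
file and the "/xdg-default" path are illustration-only assumptions. The point
is that the --config flag is resolved once and the same &cfg reaches every
arm, so no arm can fall back to Config::load(None).

```rust
use std::path::{Path, PathBuf};

/// Hypothetical, heavily simplified stand-in for kebab's `Config`.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub kb_dir: PathBuf,
}

impl Config {
    /// Hypothetical loader: an explicit `--config` path wins; `None` falls
    /// back to an XDG-style default (the "empty KB" case in this commit).
    pub fn load(path: Option<&Path>) -> Config {
        match path {
            // Assumption for illustration only: the KB sits next to the config file.
            Some(p) => Config { kb_dir: p.with_file_name("kb") },
            None => Config { kb_dir: PathBuf::from("/xdg-default/kebab/kb") },
        }
    }
}

/// Hypothetical subset of the `kebab eval` subcommands.
pub enum EvalCmd {
    Run,
    Aggregate { run_id: u32 },
    Compare { base: u32, head: u32 },
}

// Hypothetical *_with_config variants: each receives the caller's cfg
// explicitly instead of loading its own.
fn run_eval_with_config(cfg: &Config) -> String {
    format!("run on {}", cfg.kb_dir.display())
}

fn compute_aggregate_with_config(cfg: &Config, run_id: u32) -> String {
    format!("aggregate {run_id} on {}", cfg.kb_dir.display())
}

fn compare_runs_with_config(cfg: &Config, base: u32, head: u32) -> String {
    format!("compare {base}..{head} on {}", cfg.kb_dir.display())
}

/// Resolve --config once, then hand the same binding to every arm.
pub fn dispatch_eval(config_flag: Option<&Path>, cmd: EvalCmd) -> String {
    let cfg = Config::load(config_flag);
    match cmd {
        EvalCmd::Run => run_eval_with_config(&cfg),
        EvalCmd::Aggregate { run_id } => compute_aggregate_with_config(&cfg, run_id),
        EvalCmd::Compare { base, head } => compare_runs_with_config(&cfg, base, head),
    }
}
```

With this shape, `dispatch_eval(Some(Path::new("/build/dogfood/config.toml")),
EvalCmd::Run)` reports the dogfood KB, while only an absent flag reaches the
default.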
2026-05-29 03:42:40 +00:00
parent 571996938c
commit d93b757cf1
2 changed files with 88 additions and 85 deletions
--- a/crates/kebab-eval/tests/runner.rs
+++ b/crates/kebab-eval/tests/runner.rs
@@ -215,7 +215,7 @@ fn runner_records_config_snapshot_with_versions() {
    assert!(snap.pointer("/llm/model_id").is_some());
    assert_eq!(
        snap.pointer("/prompt_template_version"),
-        Some(&serde_json::Value::String("rag-v2".to_string())),
+        Some(&serde_json::Value::String("rag-v3".to_string())),
    );
    assert!(snap.pointer("/score_gate").is_some());
    assert!(snap.pointer("/rrf_k").is_some());