OMC team `post-pr9-refactor` 의 architectural cleanup. architect priorities 분석 후 executor + test-engineer 가 file edits, system-architect 가 component-level review 로 *pre-cut nothing — all v0.18.1+ defer* 결론.
## Executor 작업 (H1/H2/H3/D/E)
- **H1** (kebab-nli/src/onnx.rs): `[models.nli]` config wire 활성화. `DEFAULT_MODEL_ID` const 제거 (kebab-config 의 NliCfg::defaults 가 single source). OnnxNliVerifier::new 가 config.models.nli.model 읽고 config.models.nli.provider 가 "onnx" 아니면 anyhow::bail. 3 stale "PR-9c-1 will wire this" 코멘트 제거. 2 unit test 추가 (`new_uses_config_model_id`, `new_rejects_unsupported_provider`).
- **H2** (kebab-rag/src/pipeline.rs): `truncate_for_nli(premise: &str, _hypothesis: &str)` → `truncate_for_nli(premise: &str)`. v0.18.1 placeholder doc 제거. 4 callsite (tests/multi_hop.rs) 갱신 + test rename `multi_hop_truncate_for_nli_preserves_hypothesis` → `multi_hop_truncate_for_nli_char_budget` (contract 정합).
- **H3** (kebab-rag/src/pipeline.rs:1041): `was_truncated` 가 tracing::debug! 으로 surface (observability 추가, signature 보존 — caller logging contract).
- **D** (kebab-mcp/tests/tools_call_ask_multi_hop.rs): request_timeout_secs 2 → 5 (slow CI 안정성), `mh_code` discriminator 제거. dispatch contract = `mh.is_error.unwrap_or(false)` (기존 assertion 으로 충분).
- **E** (tasks/HOTFIXES.md + pipeline.rs:1633-1638): fb-41 PR-9 closure entry 의 sibling 으로 "### PR-9 NLI refusal: terminal Synthesize hop omitted from hops trace" subsection 추가. pipeline 의 "cleanup deferred to a follow-up" → "// See tasks/HOTFIXES.md ... for follow-up" cross-link.
## Test-engineer 작업 (T1/T2/T3/T4, 9 new tests)
- **T1** (kebab-nli/src/onnx.rs::tests): sanitize_model_id 3 unit (replaces_slash / idempotent / leaves_other_chars).
- **T2** (kebab-rag/tests/multi_hop_nli_panic.rs 신규): 2 panic-path tests — facade invariant (`expect("verifier must be Some when nli_threshold > 0.0")`) 의 #[should_panic] + threshold=0 의 companion.
- **T3** (kebab-rag/tests/multi_hop_nli_stream.rs 신규): 2 StreamEvent::Final tests — refuse_nli_verification + refuse_nli_model_unavailable 의 stream_sink Final 분기 wire shape pinning.
- **T4** (kebab-app/tests/open_with_config_nli.rs 신규): 2 NLI failure path — model_dir 가 unwritable 일 때 App::open_with_config 의 Result<App> Err (with "OnnxNliVerifier" in chain) + threshold=0 일 때 graceful skip.
## System-architect 결론
3 lenses (absorption / duplication / under-engineered interface) 분석 결과 — *pre-cut nothing*. Top-3 items 모두 v0.18.1+ defer:
- Lens 1: kebab-normalize + kebab-parse-types 흡수 가능 (parse-md 만 사용, 5 parsers 우회) → v0.18.1+.
- Lens 3: Extractor + Chunker trait 의 dead polymorphism (모든 callsite 가 hardcoded) → v0.18.1+.
- Lens 1 bundled: kebab-source-fs 가 kebab-parse-code 의 9 tree-sitter grammars drag → low-risk dep-graph win, v0.18.1+ bundled.
- Defer-with-intent: LanguageModel async refactor (cloud-LLM 시), NliVerifier::score_batch + typed NliError (2nd impl 시), compute_stale → kebab-core::stale.
보고서: /build/cache/tmp/post-pr9-refactor-priorities.md, /build/cache/tmp/system-architecture-priorities.md (둘 다 repo 외 — analysis 보존).
## 검증
- cargo test -p kebab-nli -j 1 → 11/11 pass.
- cargo test -p kebab-rag -j 1 → 75/75 pass (5 NLI multi-hop + 4 신규 T2/T3 포함).
- cargo test -p kebab-app -j 1 → 23 pass + 2 ignored (T4 의 2 포함).
- cargo test -p kebab-mcp --test tools_call_ask_multi_hop -j 1 → 1 pass + 1 pre-existing flaky (HOTFIX #15, no_chunks short-circuit, executor D fix 와 무관 — line 86 의 base assertion 이 fixture 없어서 fail).
- cargo clippy --workspace --all-targets -j 1 -- -D warnings clean.
- cargo test --workspace --no-fail-fast -j 1 → 1304 passed (+11 new) + 1 pre-existing flaky 동일.
- **Post-refactor dogfood retest byte-identical** (PR-9d / post-cleanup / post-refactor 3번 모두): S7 0.0035389824770390987, S1 0.058334656059741974, S10 0.0027875436935573816, S3 nli_model_unavailable.
docs/dogfood/v0.18.0/SUMMARY.md 에 "Post-architectural-refactor retest" section 추가.
Wire 영향: 없음.
Behavior 영향: 없음 (H1 의 config wiring 가 default 와 같은 model → byte-identical).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
149 lines
6.3 KiB
Rust
149 lines
6.3 KiB
Rust
//! p9-fb-41 PR-5: MCP `ask` tool with `multi_hop: true` argument.
|
|
//!
|
|
//! Two Ollama-free pins:
|
|
//!
|
|
//! 1. `ask_tool_routes_multi_hop_true_to_decompose_first` — multi-hop
|
|
//! dispatch differs from single-pass on dispatch shape. Single-pass
|
|
//! retrieves *first* (empty KB → `NoChunks` refusal, no LLM call,
|
|
//! `grounded=false`). Multi-hop calls *decompose first* (no
|
|
//! retrieval yet), so an empty KB + no Ollama yields `error.v1`
|
|
//! with `code=model_unreachable` — different wire shape than the
|
|
//! refusal envelope. The two surfaces' divergence is the signal
|
|
//! that the `multi_hop` arg actually routed the dispatch.
|
|
//! 2. `ask_input_schema_advertises_multi_hop_field` — `AskInput`'s
|
|
//! `JsonSchema` exposes the new field so MCP host capability
|
|
//! discovery (tools/list) renders it for agents.
|
|
//!
|
|
//! A live-Ollama end-to-end multi-hop pin lands in a follow-up
|
|
//! `#[ignore]` test (same pattern as `wire_ask_stale.rs`).
|
|
|
|
use kebab_config::Config;
|
|
use kebab_core::SourceScope;
|
|
use kebab_mcp::{KebabAppState, KebabHandler};
|
|
use rmcp::model::RawContent;
|
|
|
|
fn minimal_config(data_dir: &std::path::Path, workspace_root: &std::path::Path) -> Config {
|
|
let mut cfg = Config::defaults();
|
|
cfg.storage.data_dir = data_dir.to_string_lossy().into_owned();
|
|
cfg.storage.model_dir = data_dir.join("models").to_string_lossy().into_owned();
|
|
cfg.workspace.root = workspace_root.to_string_lossy().into_owned();
|
|
cfg.workspace.exclude.clear();
|
|
cfg.models.embedding.provider = "none".to_string();
|
|
cfg.models.embedding.dimensions = 0;
|
|
// Force the LLM endpoint to a known-unreachable port so this test
|
|
// is robust against whether a real Ollama happens to be running
|
|
// on 127.0.0.1:11434 (the developer's box; CI; etc.). The
|
|
// `request_timeout_secs = 5` gives slow CI / Docker network stacks
|
|
// enough headroom that *some* error fires deterministically — the
|
|
// dispatch contract below only cares that `is_error` flipped, not
|
|
// which specific error code surfaced.
|
|
cfg.models.llm.endpoint = "http://127.0.0.1:1".to_string();
|
|
cfg.models.llm.request_timeout_secs = 5;
|
|
cfg
|
|
}
|
|
|
|
/// The dispatch contract: with an empty KB, single-pass `ask` short-
|
|
/// circuits at retrieval (no LLM call) and returns a refusal Answer
|
|
/// (`grounded=false`, `isError=false`). Multi-hop calls *decompose
|
|
/// first*, so the same empty KB + unreachable LLM yields `error.v1`
|
|
/// with `code=model_unreachable` (`isError=true`). The divergence
|
|
/// confirms the `multi_hop` arg actually rerouted the dispatch.
|
|
#[tokio::test]
|
|
async fn ask_tool_routes_multi_hop_true_to_decompose_first() {
|
|
let dir = tempfile::tempdir().unwrap();
|
|
let data_dir = dir.path().join("data");
|
|
let workspace_root = dir.path().join("notes");
|
|
std::fs::create_dir_all(&data_dir).unwrap();
|
|
std::fs::create_dir_all(&workspace_root).unwrap();
|
|
let cfg = minimal_config(&data_dir, &workspace_root);
|
|
|
|
let scope = SourceScope {
|
|
root: workspace_root.clone(),
|
|
include: vec![],
|
|
exclude: vec![],
|
|
};
|
|
let _ = kebab_app::ingest_with_config(cfg.clone(), scope, false).unwrap();
|
|
|
|
let state = KebabAppState::new(cfg, None);
|
|
let handler = KebabHandler::new(state);
|
|
|
|
// Multi-hop branch — decompose runs first, hits the unreachable
|
|
// endpoint, MCP wraps as error.v1.
|
|
let state_mh = handler.state().clone();
|
|
let mh = tokio::task::spawn_blocking(move || {
|
|
kebab_mcp::tools::ask::handle(
|
|
&state_mh,
|
|
kebab_mcp::tools::ask::AskInput {
|
|
query: "compound about X and Y".to_string(),
|
|
session_id: None,
|
|
mode: Some("lexical".to_string()),
|
|
multi_hop: Some(true),
|
|
},
|
|
)
|
|
})
|
|
.await
|
|
.unwrap();
|
|
assert!(
|
|
mh.is_error.unwrap_or(false),
|
|
"multi_hop=true must reach the LLM (decompose first) — got {mh:?}"
|
|
);
|
|
let mh_text = match &mh.content.first().unwrap().raw {
|
|
RawContent::Text(t) => t.text.clone(),
|
|
other => panic!("expected text, got {other:?}"),
|
|
};
|
|
let mh_v: serde_json::Value = serde_json::from_str(&mh_text).unwrap();
|
|
assert_eq!(mh_v["schema_version"], "error.v1");
|
|
// The dispatch contract is "multi-hop reached the LLM" — i.e.
|
|
// `is_error` fires because decompose tried to talk to the LLM and
|
|
// failed. Which *specific* error code lands (`model_unreachable`
|
|
// on fast ECONNREFUSED hosts, `timeout` on slow connect-timeout
|
|
// stacks, etc.) is implementation detail of the host TCP/HTTP
|
|
// path; pinning it here would just produce flakes on slow CI.
|
|
|
|
// Single-pass branch — empty KB short-circuits at retrieve, no LLM
|
|
// call happens, refusal Answer comes back as isError=false.
|
|
let state_sp = handler.state().clone();
|
|
let sp = tokio::task::spawn_blocking(move || {
|
|
kebab_mcp::tools::ask::handle(
|
|
&state_sp,
|
|
kebab_mcp::tools::ask::AskInput {
|
|
query: "anything".to_string(),
|
|
session_id: None,
|
|
mode: Some("lexical".to_string()),
|
|
multi_hop: Some(false),
|
|
},
|
|
)
|
|
})
|
|
.await
|
|
.unwrap();
|
|
assert!(
|
|
!sp.is_error.unwrap_or(false),
|
|
"single-pass empty-KB refusal must NOT be isError — got {sp:?}"
|
|
);
|
|
let sp_text = match &sp.content.first().unwrap().raw {
|
|
RawContent::Text(t) => t.text.clone(),
|
|
other => panic!("expected text, got {other:?}"),
|
|
};
|
|
let sp_v: serde_json::Value = serde_json::from_str(&sp_text).unwrap();
|
|
assert_eq!(sp_v["schema_version"], "answer.v1");
|
|
assert_eq!(sp_v["grounded"], false);
|
|
}
|
|
|
|
/// AskInput's JSON-schema (rendered for tools/list) advertises the
|
|
/// new `multi_hop` field. Pins agent / MCP host capability discovery
|
|
/// against accidental schema-rename or omission.
|
|
#[test]
|
|
fn ask_input_schema_advertises_multi_hop_field() {
|
|
let schema = schemars::schema_for!(kebab_mcp::tools::ask::AskInput);
|
|
let v = serde_json::to_value(&schema).unwrap();
|
|
let props = v
|
|
.get("properties")
|
|
.and_then(|p| p.as_object())
|
|
.expect("AskInput schema must declare properties");
|
|
assert!(
|
|
props.contains_key("multi_hop"),
|
|
"AskInput.multi_hop must surface in the JsonSchema — got keys: {:?}",
|
|
props.keys().collect::<Vec<_>>()
|
|
);
|
|
}
|