feat(p4-3): rag-pipeline - kb-rag crate + kb-app::ask wiring #23

altair823 · 2026-05-01T15:07:21Z

altair823 commented

2026-05-01 15:07:21 +00:00

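Change summary

Final task of P4, and the biggest payoff from the user's point of view: once this PR is merged, kb ask actually works (when an Ollama backend is connected). The new crate kb-rag holds the nine-stage pipeline (retrieve → score gate → pack → render → generate → cite-validate → persist, where cite-validate spans the citation extract, citation validate, and build Answer stages), and the stub in kb-app::ask is replaced with a real body.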

What was done

New crate `kb-rag`

RagPipeline: Arc<dyn Retriever + Send + Sync> + Arc<dyn LanguageModel + Send + Sync> + Arc<SqliteStore> + Config. It is itself Send + Sync (pinned by a compile-time check).
AskOpts { k, explain, mode, temperature, seed, stream_sink: Option<mpsc::Sender<String>> }. k is floored at config.search.default_k, so a caller cannot starve retrieval by accidentally passing 0.
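The floor on k can be illustrated with a minimal sketch. The helper name `effective_k` and the `usize` types are assumptions made for illustration; the PR only states that `config.search.default_k` acts as a lower bound on `AskOpts.k`.

```rust
/// Hypothetical helper (not from the PR): resolve the retrieval depth.
/// `requested_k` stands in for `AskOpts.k`, `default_k` for
/// `config.search.default_k`; both types are assumed to be `usize`.
fn effective_k(requested_k: usize, default_k: usize) -> usize {
    // The configured default is a lower bound, so a caller passing 0
    // (or any small value) still retrieves at least `default_k` hits.
    requested_k.max(default_k)
}
```

With this shape a caller can only widen retrieval, never narrow it below the configured default.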

Nine-stage pipeline

1. Retrieve: runs the SearchQuery through the injected dyn Retriever.
2. Score gate: empty hits → NoChunks refusal (no LLM call). top-1 < config.rag.score_gate → ScoreGate refusal (no LLM call; the top-3 candidates are listed in the answer text).
3. Pack: budget = max_context_tokens.saturating_sub(prompt overhead). Each hit is formatted as [#n] doc=… heading=… span=…\n<text>. If every hit's chunk is unfetchable from the store (deleted between search and pack), the pipeline falls back to a NoChunks refusal instead of passing an empty [근거] (evidence) block to the LLM (review MUST-FIX applied).
4. Render rag-v1 prompt: the spec's Korean system string verbatim + the [질문]/[근거] (question/evidence) user template.
5. Generate: a single-threaded token loop owns the iterator. Tokens are forwarded to opts.stream_sink when it is set; a SendError is silently dropped (generation continues without panicking even when the caller cancels). max_completion is llm.context_tokens().saturating_sub(used_for_input).max(64) and is not capped by rag.max_context_tokens (that is the packing budget for [근거], not the LM output budget; review MUST-FIX applied).
6. Citation extract: STRICT regex \[#(\d{1,3})\] (compiled once via OnceLock). The loose forms [1], [ #1 ], [#foo], [#1234], and vec![1] are all rejected, which blocks false positives from prose, Markdown footnotes, and code blocks.
7. Citation validate, four refusal cases plus the grounded case:
- Unknown marker [#7] (when 3 chunks are packed) → LlmSelfJudge.
- Empty answer with hits present → LlmSelfJudge.
- Non-empty answer + no marker + matches 근거 (가|이) 부족 ("evidence is insufficient") → LlmSelfJudge (canonical Korean refusal phrase). The phrase match is recorded via tracing::debug.
- Non-empty answer + no marker + no refusal phrase → LlmSelfJudge (a silently ungrounded answer is still a refusal).
- Non-empty answer + at least one valid marker → grounded = true.
8. Build Answer:
- citations: the packed list filtered down to the markers the model actually cited. Wire format marker: Some("[1]") (square-bracketed bare index, design §2.3), kept separate from the prompt-side [#n] grammar (review MUST-FIX applied).
- embedding ModelRef: Vector/Hybrid modes read it from config.models.embedding; Lexical gets None. On ScoreGate/NoChunks refusals Some(...) is kept as well, because the vector retriever was still "consulted" (documented in a code comment).
- TraceId: ret_<8-hex> from blake3(query, top_score, model_id, ns).
- usage: taken from the final Done chunk. Latency falls back to wall-clock time when the LLM reports 0.
9. Persist: the newly added SqliteStore::put_answer(&Answer, query, packed_chunks_json). Every path (refusals included) INSERTs a row into answers. packed_chunks_json is populated only when opts.explain=true.
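The refusal decisions in stages 2 and 7 can be sketched as two small pure functions. This is a simplified illustration, not the code in crates/kb-rag/src/pipeline.rs: the `Refusal` enum, the function names, the `f32` score type, and treating marker index 0 as unknown are all assumptions.

```rust
/// Assumed shape of the three refusal kinds named in the PR text.
#[derive(Debug, PartialEq)]
enum Refusal {
    NoChunks,
    ScoreGate,
    LlmSelfJudge,
}

/// Stage 2 sketch: decide from retrieval scores alone, before any LLM call.
/// `scores` is assumed to be sorted best-first.
fn score_gate(scores: &[f32], threshold: f32) -> Option<Refusal> {
    match scores.first() {
        None => Some(Refusal::NoChunks),
        Some(&top) if top < threshold => Some(Refusal::ScoreGate),
        Some(_) => None, // gate passed, continue to pack
    }
}

/// Stage 7 sketch: `cited` holds the 1-based indices extracted in stage 6,
/// `packed` is the number of chunks that were packed into the prompt.
/// The refusal phrase is not an input here because, per the PR text, a
/// marker-less answer is refused whether or not the phrase matched.
fn validate(answer: &str, cited: &[usize], packed: usize) -> Result<(), Refusal> {
    let unknown = cited.iter().any(|&n| n == 0 || n > packed);
    if answer.trim().is_empty() || cited.is_empty() || unknown {
        return Err(Refusal::LlmSelfJudge);
    }
    Ok(()) // grounded = true
}
```

Keeping both checks free of I/O makes each refusal path easy to cover with plain unit tests.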

`kb-app::ask` wiring

bail!("not yet wired (P4-3)") is replaced with a real body. It builds a LexicalRetriever / VectorRetriever / HybridRetriever depending on opts.mode, instantiates OllamaLanguageModel, calls RagPipeline::new, and then ask(). kb-app::AskOpts moves to kb-rag and is re-exported with pub use kb_rag::AskOpts (kb-cli's kb_app::AskOpts import is unchanged).

kb-rag, kb-llm, and kb-llm-local are added to kb-app/Cargo.toml. The entries that were forbidden as of P3-5 are lifted by the P4-3 spec: kb-app is the orchestrator, and ask needs both the trait crate and the Ollama adapter.

stream_sink: None is added to the AskOpts literal in kb-cli/src/main.rs (the real sink will be connected by the P9 TUI).

Tests

kb-rag (18 tests): 3 unit (marker regex strictness, verifying with byte-equal expectations that every loose form is rejected; Send+Sync compile check; embedding_ref_for behavior per mode) + 15 integration (every spec test row + the new guard test unfetchable_chunks_fall_back_to_no_chunks).
kb-app (1 #[ignore] test): an end-to-end ask smoke test that requires a real Ollama on localhost:11434. Opt in via cargo test -p kb-app -- --ignored.

Workspace default: 319 passed / 26 ignored / 0 failed. cargo clippy --workspace --all-targets -- -D warnings is clean.

Dependencies

Allowed deps respected: kb-core, kb-config, kb-search, kb-llm, kb-store-sqlite, serde, serde_json, regex, time, tracing, thiserror, plus the forced waivers anyhow (trait return types) and blake3 (TraceId hashing).

Forbidden deps (kb-source-fs, kb-parse-md, kb-normalize, kb-chunk, kb-store-vector, kb-embed*, kb-llm-local, kb-tui, kb-desktop): zero direct dependencies on any of them. Concrete adapters are reached only through trait objects.

Changed files

crates/kb-rag/Cargo.toml (new)
crates/kb-rag/src/lib.rs (new)
crates/kb-rag/src/pipeline.rs (new, the nine-stage pipeline)
crates/kb-rag/tests/common/mod.rs (new, RagEnv + MockRetriever)
crates/kb-rag/tests/pipeline.rs (new)
crates/kb-store-sqlite/src/answers.rs (new, put_answer)
crates/kb-store-sqlite/src/lib.rs (mod registration)
crates/kb-app/Cargo.toml (kb-rag/kb-llm/kb-llm-local added)
crates/kb-app/src/lib.rs (ask body wiring)
crates/kb-app/tests/ask_smoke.rs (new, #[ignore])
crates/kb-cli/src/main.rs (stream_sink: None added)
Cargo.toml (workspace member + regex workspace dep)
Cargo.lock

Out of scope

Reranker (P+).
Multi-turn chat memory (P+).
LLM-as-judge evaluation: P5 uses rule-based must_contain.
--json streaming: buffered, per design §0 Q5 hybrid.

Usage example after merge

$ ollama serve &
$ ollama pull qwen2.5:7b-instruct
$ kb --config /path/to/config.toml ask "Rust ownership 모델이 뭐야?" --mode hybrid

See design §0 Q4 (refusal), §0 Q7 (footer), §1.1-1.4 (ask scenes), §2.3 (Answer wire), §3.8 (internal Answer), §6.4 [rag], and §10 errors. This PR closes out the P4 stage; next is P5-1 golden-fixture-runner.


altair823 added 1 commit 2026-05-01 15:07:21 +00:00

feat(p4-3): kb-rag crate — full RAG pipeline + kb-app::ask wired e35b06d0d0

P4 terminal task. Implements the user-facing payoff: retrieve →
score gate → pack → render → generate → cite-validate → persist.
After this commit, `kb ask` actually works against an Ollama
backend; the pipeline grounds the answer in retrieved chunks and
refuses cleanly when the gate trips or the model self-judges.

New crate kb-rag:
- pub struct RagPipeline { retriever, llm, docs, config }: the trait
  objects are Arc<dyn Trait + Send + Sync>, so the pipeline is itself
  Send + Sync.
- pub fn ask(query, opts) -> Result<Answer> drives the nine-stage
  flow per spec §1.
- pub struct AskOpts { k, explain, mode, temperature, seed,
  stream_sink: Option<mpsc::Sender<String>> }. k acts as a floor
  over config.search.default_k so a low-k caller can't starve
  retrieval (documented in field doc).

Pipeline stages:
1. Retrieve via the injected dyn Retriever.
2. Score gate: empty hits → NoChunks refusal (no LLM call); top-1 <
   config.rag.score_gate → ScoreGate refusal (no LLM call) with
   top-3 candidates listed in the synthesized answer text.
3. Pack: budget = config.rag.max_context_tokens.saturating_sub
   (prompt overhead). Per-hit `[#n] doc=… heading=… span=…\n<text>`
   with deterministic enumeration. If every hit's chunk is
   unfetchable from the store (deleted between search and pack),
   fall back to NoChunks refusal with a tracing::warn rather than
   feeding an empty [근거] to the LLM.
4. Render rag-v1 prompt with the spec's verbatim Korean system
   string + `[질문]/[근거]` user template.
5. Generate via dyn LanguageModel. Single-thread token loop owns
   the iterator; tokens optionally forward to opts.stream_sink (a
   `mpsc::Sender<String>`). SendError silently dropped — caller
   cancellation never panics the pipeline. After Done the loop
   reads (acc, finish_reason, usage) in lockstep with no race.
   max_completion = llm.context_tokens().saturating_sub
   (used_for_input).max(64) — explicitly NOT capped by
   config.rag.max_context_tokens (that's the packing budget for
   [근거], not the LM completion ceiling).
6. Citation extract via STRICT regex `\[#(\d{1,3})\]` (compiled
   once via OnceLock). Loose forms `[1]`, `[ #1 ]`, `[#foo]`,
   `[#1234]`, `vec![1]` are all rejected to prevent prose
   false-positives.
7. Citation validate covers four refusal cases plus the grounded case:
   - unknown marker (e.g. `[#7]` when only 3 packed) →
     LlmSelfJudge refusal.
   - empty answer with hits → LlmSelfJudge.
   - non-empty + no marker + matches `근거 (가|이) 부족` regex →
     LlmSelfJudge (model self-refused with the canonical phrase;
     phrase match logged via tracing::debug for observability).
   - non-empty + no marker + no refusal phrase → LlmSelfJudge
     (silent ungrounded answers are still refusals).
   - non-empty + ≥1 valid marker → grounded = true.
8. Build Answer per kb_core::Answer shape:
   - citations: filter packed list to exactly the markers cited.
     Wire format `marker: Some("[1]")` (square-bracketed bare
     index) per design §2.3, distinct from the prompt-side
     `[#n]` grammar.
   - embedding ModelRef: read from config.models.embedding for
     Vector/Hybrid; None for Lexical. Documented deviation since
     the Retriever trait doesn't expose the embedder. For
     ScoreGate/NoChunks refusals on Vector/Hybrid the embedding
     model is still recorded — the vector retriever WAS consulted
     even when the gate tripped.
   - TraceId minted as `ret_<8-hex>` from blake3(query, top_score,
     model_id, ns).
   - retrieval AnswerRetrievalSummary populated.
   - usage from the final Done chunk; latency_ms wall-clock
     fallback when the LLM reports zero.
   - created_at OffsetDateTime::now_utc().
9. Persist via SqliteStore::put_answer (new inherent method on
   SqliteStore, not on the DocumentStore trait — answers aren't
   documents and adding to kb-core was forbidden). Always inserts,
   refusals included. packed_chunks_json is null unless
   opts.explain == true.

kb-store-sqlite extension:
- pub fn put_answer(&Answer, query, packed_chunks_json) ->
  Result<AnswerId>. Maps all 22 fields of the answers table per
  V001 schema in a single INSERT under a transaction.

kb-app::ask wired:
- bail!("not yet wired (P4-3)") replaced with a real body that
  builds the retriever per opts.mode (Lexical | Vector | Hybrid),
  instantiates OllamaLanguageModel from config, constructs
  RagPipeline, calls pipeline.ask. AskOpts moves to kb-rag and is
  re-exported via `pub use kb_rag::AskOpts` so kb-cli's
  `use kb_app::AskOpts` keeps working.
- kb-app/Cargo.toml gains kb-rag, kb-llm, kb-llm-local. P3-5's
  forbids on these are lifted by P4-3 spec — kb-app is the
  orchestrator and ask requires both the trait crate and the
  Ollama adapter.
- kb-cli/main.rs's AskOpts literal updated with stream_sink: None
  for the CLI path (TUI in P9 will plumb a real sink).

Tests (kb-rag: 18; kb-app: 1 ignored):
- 3 unit in src/pipeline.rs: marker regex strictness (rejects all
  loose forms with byte-equal expectations), Send+Sync compile
  check, embedding_ref_for behavior across modes.
- 15 integration in tests/pipeline.rs covering every spec test row
  + the new "all chunks unfetchable falls back to NoChunks" guard:
  empty-hits, score-gate, grounded happy path, unknown-marker,
  prose-`[1]` rejection, `vec![1]` rejection, refusal-phrase,
  packing-budget overflow, streaming-forwards-to-mpsc, dropped-
  receiver-no-panic, usage-from-final-Done, answers-row-inserted-
  for-each-refusal-kind, determinism temp=0 seed=0, Answer JSON
  shape, unfetchable-chunks-fall-back-to-no-chunks (the new
  M3 test).
- kb-app/tests/ask_smoke.rs: 1 #[ignore]'d real-Ollama smoke that
  drives the wired ask end-to-end against `localhost:11434`.

Workspace: 319 passed / 26 ignored / 0 failed. cargo clippy
--workspace --all-targets -- -D warnings clean.

Allowed deps respected (kb-core, kb-config, kb-search, kb-llm,
kb-store-sqlite, serde, serde_json, regex, time, tracing,
thiserror) plus forced waivers anyhow (Retriever / LanguageModel
trait return types) and blake3 (TraceId minting). Forbidden
(kb-source-fs, kb-parse-md, kb-normalize, kb-chunk,
kb-store-vector direct, kb-embed* direct, kb-llm-local direct, kb-tui,
kb-desktop) all absent from `cargo tree -p kb-rag` — concrete
adapters reach the pipeline only through trait objects.

Out of scope: reranker between retrieve and pack (P+), multi-turn
chat memory (P+), LLM-as-judge eval (P5 uses rule-based
must_contain), --json streaming (buffers per §0 Q5 hybrid).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude-reviewer-01 reviewed 2026-05-01 15:08:16 +00:00

claude-reviewer-01 left a comment

P4-3 code review. COMMENT only because of the self-merge gate.

Because this is the last P4 task, both review passes turned up a number of meaningful findings:

spec compliance: 1 MUST-FIX (citation marker wire format "#1" vs spec "[1]") + several NITs.
code quality: 0 BLOCKER + 4 MUST-FIX + 7 NIT.

All 5 MUST-FIX items and 3 key NITs have been applied to the PR:

1. Citation marker wire format "#1" → "[1]" (complies with spec §2.3, kept separate from the prompt-side [#n]).
2. Removed .min(rag.max_context_tokens) from max_completion: the packing budget and the LM output budget are separate things.
3. NoChunks fallback when chunks are unfetchable: an empty [근거] (evidence) block is never passed to the LLM. Pinned by a new test.
4. Wired the dead _matched_refusal_phrase into a tracing::debug field, a foothold for future metric collection.
5. Documented in a doc-comment that AskOpts.k is a floor over default_k.
6-8. embedding_ref DRY-up, removal of the empty fixtures directory, and a comment explaining why embedding=Some(...) is kept on refusals.

Key points:

The nine-stage pipeline is a straight-line flow that follows the spec: retrieve → two-step score gate (neither step calls the LLM) → pack (NoChunks fallback when the packed list is empty) → render → generate (single thread, sink forward, SendError swallowed) → STRICT regex extract (OnceLock cached) → validate (four refusal cases plus grounded) → build Answer → persist.
SqliteStore::put_answer is added as an inherent method, which keeps the DocumentStore trait from being polluted.
The kb-app::ask wiring makes kb ask actually work (when an Ollama backend is connected).
The Send + Sync compile-time check prepares for future async UI integration.

Workspace default: 319 passed / 26 ignored / 0 failed. clippy clean.

The inline comments are all notes on key decisions. The merge can go ahead. After merging, a kb ask smoke run against a real Ollama can be done in the /tmp/kb-smoke/ workspace created earlier (prerequisites: ollama serve + ollama pull qwen2.5:7b-instruct).

P4 stage complete; next is P5-1 golden-fixture-runner.


crates/kb-app/src/lib.rs

						
@@ -818,0 +867,4 @@
            let emb_dyn: Arc<dyn Embedder> = emb;
            let vec_retr = Arc::new(VectorRetriever::with_settings(
                vec_dyn,
                emb_dyn,

claude-reviewer-01 commented

2026-05-01 15:08:16 +00:00

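ask_with_config builds the retriever according to opts.mode, instantiates OllamaLanguageModel, and assembles everything with RagPipeline::new. The kb-rag / kb-llm / kb-llm-local dependencies, forbidden as of P3-5 and lifted by the P4-3 spec, become necessary at exactly this point. Thanks to pub use kb_rag::AskOpts, kb-cli's use kb_app::AskOpts import is unchanged, so backward compatibility is preserved.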


crates/kb-rag/src/pipeline.rs

						
@@ -0,0 +162,4 @@
        if packed_entries.is_empty() {
            tracing::warn!(
                target: "kb-rag",
                chunks_returned = hits.len(),

claude-reviewer-01 commented

2026-05-01 15:08:16 +00:00

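Guard for the unfetchable chunks → NoChunks fallback. Even when search returns hits, if every get_chunk returns None (chunks deleted between search and pack), the pipeline issues a NoChunks refusal rather than passing an empty [근거] (evidence) block to the LLM. Diagnostic accuracy matters a great deal here: this path was previously misclassified as LlmSelfJudge. tracing::warn surfaces why the fallback happened, and a new test pins the behavior.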
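A minimal sketch of this guard, under loud assumptions: the `HashMap` store, the `get_chunk` signature, and the shortened `[#n] <text>` entry format are stand-ins invented for illustration, since the PR does not show the real store API or the full `doc=… heading=… span=…` header.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the store lookup; the real `get_chunk`
/// signature is not shown in this PR.
fn get_chunk<'a>(store: &'a HashMap<u32, String>, id: u32) -> Option<&'a str> {
    store.get(&id).map(String::as_str)
}

/// Packs the hits that can still be fetched, numbering them 1-based.
/// Returns `None` when every chunk has vanished since the search, which
/// the caller would turn into a NoChunks refusal without calling the LLM.
fn pack(store: &HashMap<u32, String>, hit_ids: &[u32]) -> Option<Vec<String>> {
    let packed: Vec<String> = hit_ids
        .iter()
        .filter_map(|&id| get_chunk(store, id)) // skip deleted chunks
        .enumerate()
        .map(|(i, text)| format!("[#{}] {}", i + 1, text))
        .collect();
    if packed.is_empty() {
        None
    } else {
        Some(packed)
    }
}
```

Returning `None` rather than an empty list forces the caller to handle the refusal path explicitly.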


crates/kb-rag/src/pipeline.rs

						
@@ -0,0 +177,4 @@
        // Completion budget is bounded only by what the LM context window
        // has left after the input. NOTE: `rag.max_context_tokens` is the
        // *packing budget* for the [근거] block (used by `pack_context`)
        // — it is intentionally NOT used here as a completion cap.

claude-reviewer-01 commented

2026-05-01 15:08:16 +00:00

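The decision to remove .min(rag.max_context_tokens) from max_completion. The two budgets mean different things (rag.max_context_tokens caps [근거] packing; llm.context_tokens() caps LM input and output combined), and a code comment now keeps them apart. This closes the bug where a small packing cap (e.g. 50 in tests) squeezed the LM output down to 64.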
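The arithmetic behind this fix can be shown in isolation. In the sketch below only the expression `context_tokens.saturating_sub(used_for_input).max(64)` comes from the PR; the function names, the `usize` types, and the exact placement of `.min(...)` in the pre-fix variant are assumptions reconstructed from the review comment.

```rust
/// Completion budget after the fix: whatever the LM context window has
/// left after the input, with a floor of 64 tokens.
fn max_completion(llm_context_tokens: usize, used_for_input: usize) -> usize {
    llm_context_tokens.saturating_sub(used_for_input).max(64)
}

/// Assumed shape of the pre-fix variant, shown only for contrast: the
/// packing budget wrongly capped the completion as well.
fn max_completion_before_fix(
    llm_context_tokens: usize,
    used_for_input: usize,
    rag_max_context_tokens: usize,
) -> usize {
    llm_context_tokens
        .saturating_sub(used_for_input)
        .min(rag_max_context_tokens)
        .max(64)
}
```

With a 4096-token window, 1000 input tokens, and a packing cap of 50, the pre-fix variant yields 64 while the fixed one yields 3096.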


crates/kb-rag/src/pipeline.rs

						
@@ -0,0 +192,4 @@
            system: system.clone(),
            user: user.clone(),
            stop: vec!["\n\n[질문]".to_string()],
            max_tokens: max_completion,

claude-reviewer-01 commented

2026-05-01 15:08:16 +00:00

A single-threaded token loop owns the iterator and forwards tokens to opts.stream_sink when it is set; a SendError is silently dropped. Even if the caller drops the receiver, answer collection and persistence proceed without a panic, which the dropped_receiver_does_not_abort_generation test verifies. If a user cancels ask from the UI / TUI, the answers row is still preserved.
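The forwarding behavior can be sketched with the standard library channel. This assumes `mpsc` means `std::sync::mpsc` (the PR only writes `mpsc::Sender<String>`), and `collect_tokens` is a hypothetical name; the real loop also tracks finish reason and usage, which are omitted here.

```rust
use std::sync::mpsc;

/// Drains a token iterator into the accumulated answer, optionally
/// forwarding each token to a sink. A failed send is ignored on purpose.
fn collect_tokens<I>(tokens: I, sink: Option<&mpsc::Sender<String>>) -> String
where
    I: Iterator<Item = String>,
{
    let mut acc = String::new();
    for tok in tokens {
        if let Some(tx) = sink {
            // A SendError only means the receiver is gone (for example the
            // caller cancelled); keep generating so the answer still persists.
            let _ = tx.send(tok.clone());
        }
        acc.push_str(&tok);
    }
    acc
}
```

Because the send result is discarded, a dropped receiver changes nothing about the accumulated text that is later validated and stored.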


crates/kb-rag/src/pipeline.rs

						
@@ -0,0 +254,4 @@
        });
        let trimmed_answer = acc.trim();
        let matched_refusal_phrase = refusal_phrase.is_match(&acc);
        let grounded = !trimmed_answer.is_empty()

claude-reviewer-01 commented

2026-05-01 15:08:16 +00:00

The STRICT marker regex \[#(\d{1,3})\] is compiled only once via OnceLock. All loose forms ([1], vec![1], [ #1 ], [#foo], [#1234]) are rejected. Blocking false positives from prose, Markdown footnotes, and code blocks directly determines how accurate the grounded judgment is: if the model emits only [1], that is treated as "did not cite" and recorded as an LlmSelfJudge refusal.
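To make the accepted grammar concrete without pulling in the `regex` crate, here is a hand-rolled scanner that is intended to accept the same strings as `\[#(\d{1,3})\]` for ASCII digits. It is a dependency-free stand-in, not the PR's implementation (which uses a compiled regex), and `extract_markers` / `wire_marker` are hypothetical names; only the `format!("[{n}]")` conversion appears in the diff.

```rust
/// Extracts the indices of strict `[#n]` markers (1 to 3 ASCII digits).
fn extract_markers(text: &str) -> Vec<u16> {
    let bytes = text.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i + 1 < bytes.len() {
        if bytes[i] == b'[' && bytes[i + 1] == b'#' {
            let start = i + 2;
            let mut end = start;
            // Consume at most three digits.
            while end < bytes.len() && end - start < 3 && bytes[end].is_ascii_digit() {
                end += 1;
            }
            if end > start && end < bytes.len() && bytes[end] == b']' {
                // The slice holds 1 to 3 ASCII digits, so parsing cannot fail.
                out.push(text[start..end].parse().unwrap());
                i = end + 1;
                continue;
            }
        }
        i += 1;
    }
    out
}

/// Prompt-side `[#n]` becomes wire-side `[n]`, as in the PR's `format!("[{n}]")`.
fn wire_marker(n: u16) -> String {
    format!("[{n}]")
}
```

Scanning bytes is safe for Korean text because every byte of a multi-byte UTF-8 sequence is at least 0x80 and can never match `[`, `#`, `]`, or a digit.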


crates/kb-rag/src/pipeline.rs

						
@@ -0,0 +275,4 @@
                // `[1]`. The `[#1]` form is the *prompt-side* citation
                // grammar (what the LLM emits in its text); the wire-side
                // `AnswerCitation.marker` strips the `#`.
                marker: Some(format!("[{n}]")),

claude-reviewer-01 commented

2026-05-01 15:08:16 +00:00

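The decision to correct the citation marker wire format from "#1" to "[1]". The two grammars are now clearly separated: the prompt side uses [#n] (what the LLM emits in its text), and the wire side uses [n] (design §2.3 line 228). A code comment spells out the difference. This cleanly closes the item that the first review flagged as a spec violation.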


crates/kb-rag/src/pipeline.rs

						
@@ -0,0 +540,4 @@
/// Build the `ModelRef` recorded in `Answer.embedding` for a given
/// retrieval mode. `Lexical` paths leave it `None`; vector / hybrid
/// paths attach the configured embedding model so `kb explain` can
/// later identify which embedder shaped the retrieval (even on

claude-reviewer-01 commented

2026-05-01 15:08:16 +00:00

The private helper embedding_ref_for(mode, cfg) removes the duplication between the main path and the score-gate refusal path. Even on a ScoreGate refusal the vector retriever was "consulted", so Some(ModelRef) is semantically correct. Stating this in a comment prevents a future regression where someone asks "why isn't embedding None on a refusal?" and "fixes" it.
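A rough sketch of what such a helper might look like. The `Mode` and `ModelRef` definitions below are simplified assumptions, and the second parameter is reduced to the configured embedding model itself instead of the full config; only the Lexical → None, Vector/Hybrid → Some mapping comes from the PR.

```rust
/// Assumed retrieval modes, matching the three named in the PR.
#[derive(Clone, Copy)]
enum Mode {
    Lexical,
    Vector,
    Hybrid,
}

/// Simplified stand-in for the real `ModelRef` type.
#[derive(Debug, Clone, PartialEq)]
struct ModelRef {
    id: String,
}

/// Records which embedder shaped retrieval. The result depends only on
/// the mode, so refusal paths and the grounded path share one answer.
fn embedding_ref_for(mode: Mode, configured: &ModelRef) -> Option<ModelRef> {
    match mode {
        Mode::Lexical => None,
        Mode::Vector | Mode::Hybrid => Some(configured.clone()),
    }
}
```

Since the helper never looks at the refusal kind, a ScoreGate or NoChunks outcome cannot accidentally clear the embedding field.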


crates/kb-store-sqlite/src/answers.rs

						
@@ -0,0 +1,113 @@
//! `answers` row writer (P4-3 — design §5.7).

claude-reviewer-01 commented

2026-05-01 15:08:16 +00:00

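SqliteStore::put_answer is added as an inherent method; keeping it off the DocumentStore trait is the right call. An answer is not a document, and extending kb-core::DocumentStore would (a) require adding a new type to kb-core, which violates the spec, and (b) force every DocumentStore implementation to implement a method it may never use. With the functionality living in kb-store-sqlite as an inherent method, only the caller (kb-rag) needs to know about it explicitly.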


altair823 merged commit 6210c52efd into main

2026-05-01 15:17:10 +00:00

altair823 deleted branch feat/p4-3-rag-pipeline

2026-05-01 15:17:12 +00:00

altair823 referenced this issue from a commit

2026-05-01 15:17:12 +00:00

Merge pull request 'feat(p4-3): rag-pipeline - kb-rag crate + kb-app::ask wiring' (#23) from feat/p4-3-rag-pipeline into main


Reference: altair823-org/kebab#23
