Commit Graph

5 Commits

Author SHA1 Message Date
e35b06d0d0 feat(p4-3): kb-rag crate — full RAG pipeline + kb-app::ask wired
P4 terminal task. Implements the user-facing payoff: retrieve →
score gate → pack → render → generate → cite-validate → persist.
After this commit, `kb ask` actually works against an Ollama
backend; the pipeline grounds the answer in retrieved chunks and
refuses cleanly when the gate trips or the model self-judges.

New crate kb-rag:
- pub struct RagPipeline { retriever, llm, docs, config } — all
  Arc<dyn Trait + Send + Sync> so the pipeline shares + Sync.
- pub fn ask(query, opts) -> Result<Answer> drives the nine-stage
  flow per spec §1.
- pub struct AskOpts { k, explain, mode, temperature, seed,
  stream_sink: Option<mpsc::Sender<String>> }. k acts as a floor
  over config.search.default_k so a low-k caller can't starve
  retrieval (documented in field doc).

Pipeline stages:
1. Retrieve via the injected dyn Retriever.
2. Score gate: empty hits → NoChunks refusal (no LLM call); top-1 <
   config.rag.score_gate → ScoreGate refusal (no LLM call) with
   top-3 candidates listed in the synthesized answer text.
3. Pack: budget = config.rag.max_context_tokens.saturating_sub
   (prompt overhead). Per-hit `[#n] doc=… heading=… span=…\n<text>`
   with deterministic enumeration. If every hit's chunk is
   unfetchable from the store (deleted between search and pack),
   fall back to NoChunks refusal with a tracing::warn rather than
   feeding an empty [근거] to the LLM.
4. Render rag-v1 prompt with the spec's verbatim Korean system
   string + `[질문]/[근거]` user template.
5. Generate via dyn LanguageModel. Single-thread token loop owns
   the iterator; tokens optionally forward to opts.stream_sink (a
   `mpsc::Sender<String>`). SendError silently dropped — caller
   cancellation never panics the pipeline. After Done the loop
   reads (acc, finish_reason, usage) in lockstep with no race.
   max_completion = llm.context_tokens().saturating_sub
   (used_for_input).max(64) — explicitly NOT capped by
   config.rag.max_context_tokens (that's the packing budget for
   [근거], not the LM completion ceiling).
6. Citation extract via STRICT regex `\[#(\d{1,3})\]` (compiled
   once via OnceLock). Loose forms `[1]`, `[ #1 ]`, `[#foo]`,
   `[#1234]`, `vec![1]` are all rejected to prevent prose
   false-positives.
7. Citation validate covers four cases:
   - unknown marker (e.g. `[#7]` when only 3 packed) →
     LlmSelfJudge refusal.
   - empty answer with hits → LlmSelfJudge.
   - non-empty + no marker + matches `근거 (가|이) 부족` regex →
     LlmSelfJudge (model self-refused with the canonical phrase;
     phrase match logged via tracing::debug for observability).
   - non-empty + no marker + no refusal phrase → LlmSelfJudge
     (silent ungrounded answers are still refusals).
   - non-empty + ≥1 valid marker → grounded = true.
8. Build Answer per kb_core::Answer shape:
   - citations: filter packed list to exactly the markers cited.
     Wire format `marker: Some("[1]")` (square-bracketed bare
     index) per design §2.3, distinct from the prompt-side
     `[#n]` grammar.
   - embedding ModelRef: read from config.models.embedding for
     Vector/Hybrid; None for Lexical. Documented deviation since
     the Retriever trait doesn't expose the embedder. For
     ScoreGate/NoChunks refusals on Vector/Hybrid the embedding
     model is still recorded — the vector retriever WAS consulted
     even when the gate tripped.
   - TraceId minted as `ret_<8-hex>` from blake3(query, top_score,
     model_id, ns).
   - retrieval AnswerRetrievalSummary populated.
   - usage from the final Done chunk; latency_ms wall-clock
     fallback when the LLM reports zero.
   - created_at OffsetDateTime::now_utc().
9. Persist via SqliteStore::put_answer (new inherent method on
   SqliteStore, not on the DocumentStore trait — answers aren't
   documents and adding to kb-core was forbidden). Always inserts,
   refusals included. packed_chunks_json is null unless
   opts.explain == true.

kb-store-sqlite extension:
- pub fn put_answer(&Answer, query, packed_chunks_json) ->
  Result<AnswerId>. Maps all 22 fields of the answers table per
  V001 schema in a single INSERT under a transaction.

kb-app::ask wired:
- bail!("not yet wired (P4-3)") replaced with a real body that
  builds the retriever per opts.mode (Lexical | Vector | Hybrid),
  instantiates OllamaLanguageModel from config, constructs
  RagPipeline, calls pipeline.ask. AskOpts moves to kb-rag and is
  re-exported via `pub use kb_rag::AskOpts` so kb-cli's
  `use kb_app::AskOpts` keeps working.
- kb-app/Cargo.toml gains kb-rag, kb-llm, kb-llm-local. P3-5's
  forbids on these are lifted by P4-3 spec — kb-app is the
  orchestrator and ask requires both the trait crate and the
  Ollama adapter.
- kb-cli/main.rs's AskOpts literal updated with stream_sink: None
  for the CLI path (TUI in P9 will plumb a real sink).

Tests (kb-rag: 18; kb-app: 1 ignored):
- 3 unit in src/pipeline.rs: marker regex strictness (rejects all
  loose forms with byte-equal expectations), Send+Sync compile
  check, embedding_ref_for behavior across modes.
- 15 integration in tests/pipeline.rs covering every spec test row
  + the new "all chunks unfetchable falls back to NoChunks" guard:
  empty-hits, score-gate, grounded happy path, unknown-marker,
  prose-`[1]` rejection, `vec![1]` rejection, refusal-phrase,
  packing-budget overflow, streaming-forwards-to-mpsc, dropped-
  receiver-no-panic, usage-from-final-Done, answers-row-inserted-
  for-each-refusal-kind, determinism temp=0 seed=0, Answer JSON
  shape, unfetchable-chunks-fall-back-to-no-chunks (the new
  M3 test).
- kb-app/tests/ask_smoke.rs: 1 #[ignore]'d real-Ollama smoke that
  drives the wired ask end-to-end against `localhost:11434`.

Workspace: 319 passed / 26 ignored / 0 failed. cargo clippy
--workspace --all-targets -- -D warnings clean.

Allowed deps respected (kb-core, kb-config, kb-search, kb-llm,
kb-store-sqlite, serde, serde_json, regex, time, tracing,
thiserror) plus forced waivers anyhow (Retriever / LanguageModel
trait return types) and blake3 (TraceId minting). Forbidden
(kb-source-fs, kb-parse-md, kb-normalize, kb-chunk, kb-store-
vector direct, kb-embed* direct, kb-llm-local direct, kb-tui,
kb-desktop) all absent from `cargo tree -p kb-rag` — concrete
adapters reach the pipeline only through trait objects.

Out of scope: reranker between retrieve and pack (P+), multi-turn
chat memory (P+), LLM-as-judge eval (P5 uses rule-based
must_contain), --json streaming (buffers per §0 Q5 hybrid).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:06:10 +00:00
a08f61a242 fix(cli): honor --config flag + improve search output legibility
Two issues surfaced during the post-P3-5 manual smoke test against a
six-document workspace:

1. --config flag was silently ignored. kb-cli read cli.config only
   while building SourceScope inside the Ingest arm, then called
   kb_app::ingest(scope, summary_only) which internally re-loads
   Config::load(None) — falling back to ~/.config/kb/config.toml
   regardless of what the user passed. Same pattern in search,
   list, inspect, doctor. Users had to rely on KB_* env vars to
   point at a non-default config.

2. Search output collapsed RRF hybrid scores to "0.02" because
   `{:.2}` truncated the (0, 0.033]-bounded fused score, and
   chunks from the same document showed up as identical lines
   ("3. 0.02  arch/rag-architecture.md") since heading_path was
   never printed.

Fix:

- kb-app: doctor/ingest/search/list/inspect already had
  *_with_config(Config, ...) seams introduced for integration tests
  (#[doc(hidden)] pub). Repurpose them as the official "config-explicit"
  API — kb-cli now builds the Config once via
  Config::load(cli.config.as_deref()) at the top of every subcommand
  and threads it into the *_with_config variant. Module doc-comment
  updated to reflect three callers (CLI --config, integration tests,
  TUI session) instead of "test-only seam".
- kb-app: doctor() rewritten as doctor_with_config_path(Option<&Path>)
  that respects an explicit path. config_loaded probe now reports the
  actual path checked, returning a clear hard error if --config points
  at a non-existent or malformed file (defaults would silently mask
  user intent). data_dir_writable resolves storage.data_dir from the
  loaded config (with env overrides applied via Config::apply_env) so
  --config users see their custom paths reflected. Original doctor()
  signature kept as a None-passing wrapper.
- kb-cli: ingest/search/list/inspect/doctor each call the
  *_with_config* companion. Search printer switches to {:.4} score
  formatting (RRF hybrid range bounded by ~2/k_rrf ≈ 0.033 at k_rrf=60
  default) and appends `> head1 / head2` when heading_path is non-
  empty so chunks from the same document are visually distinguishable.

Verified manually:
- `kb --config /tmp/kb-smoke/config.toml doctor` reports the
  custom config path + custom data_dir, not the XDG defaults.
- `kb --config /tmp/kb-smoke/config.toml search "..." --mode hybrid`
  returns hits with distinct 4-digit scores and heading paths
  ("rust/ownership.md > Rust 소유권 모델 / Borrow checker").

Workspace 269 passed / 24 ignored / 0 failed; cargo clippy
--workspace --all-targets -- -D warnings clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:46:37 +00:00
d91b60325e p0-1: address review (apply_env full schema map, drop dead Option in logging::init)
- kb-config::apply_env now covers every leaf key in `Config` via an
  explicit grep-friendly match block (one arm per leaf), keyed
  `KB_<SECTION>_<KEY>`. Booleans flow through a shared `parse_bool` helper.
  Numeric leaves silently keep their prior value on parse failure so a
  malformed env entry can't crash startup.
- New tests: env_unknown_key_is_ignored,
  env_overrides_chunking_target_tokens,
  env_overrides_models_llm_endpoint_and_temperature,
  env_overrides_indexing_watch_filesystem_bool.
- kb-app::logging::init now returns `Result<WorkerGuard>` instead of
  `Result<Option<WorkerGuard>>` — the inner `Option` was always `Some` so
  the wrapper was dead. kb-cli/main.rs collapses the call from
  `.ok().flatten()` to `.ok()`, preserving fail-soft semantics on logging
  init.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 08:53:59 +00:00
5af07c174d p0-1: address quality review (wire convention, IngestItemKind re-export, clippy)
Three follow-ups from the code-quality review pass on P0-1:

- Re-export `IngestItemKind` from `kb-core` so downstream tasks
  constructing `IngestItem` don't need `kb_core::ingest::IngestItemKind`.
- Document the `--json` wire-schema convention by introducing
  `kb-cli/src/wire.rs` with `wire_*` helpers paralleling the existing
  inline `wire_ingest`. Each Ok-path `--json` branch now routes through
  these helpers so future P1-5/P3/P4/P5 implementations slot the
  `schema_version` envelope in automatically. `DoctorReport` keeps its
  struct-field `schema_version` (the documented exception), and the
  helper round-trips it idempotently. Records the convention in
  `kb-app/src/lib.rs`'s top docstring.
- Fix clippy `single_char_add_str` in `kb_core::normalize` (replace
  `out.push_str(".")` with `out.push('.')`).

Verified: `cargo check`, `cargo test` (5 new wire-helper tests),
`cargo clippy -D warnings`, and `RUSTFLAGS=-D warnings cargo build` all
clean. Smoke-tested `kb doctor --json` still emits
`{"schema_version":"doctor.v1",...}`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 05:33:31 +00:00
ec8a4ddb1b p0-1: kb-cli clap entry with §10 exit-code mapping
Adds the kb binary with clap v4 derive subcommands mapping 1:1 to
kb-app facade functions:

  init | ingest | list docs | inspect (doc|chunk) | search | ask
  | doctor | eval run

Global flags: --config, --verbose, --debug, --json. On --json, output
conforms to wire schema v1 (e.g. doctor.v1 emitted by kb-app::doctor).

Exit-code mapping per design §10:
  0 success
  1 RefusalSignal / NoHitSignal (kb ask refusal, kb search no-hit)
  2 any other anyhow::Error
  3 DoctorUnhealthy

Tracing initialized at startup with the file appender from kb-app.

Verified via:
  XDG_*=… cargo run -p kb-cli -- init    → idempotent
  XDG_*=… cargo run -p kb-cli -- doctor --json
    → {"schema_version":"doctor.v1","ok":true,…} exit 0
  XDG_*=… cargo run -p kb-cli -- doctor   (human form, ✓ marks)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 05:17:18 +00:00