e35b06d0d0fc21e43a1af3db479611cf6f45a004
5 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
| e35b06d0d0 |
feat(p4-3): kb-rag crate — full RAG pipeline + kb-app::ask wired
P4 terminal task. Implements the user-facing payoff: retrieve →
score gate → pack → render → generate → cite-validate → persist.
After this commit, `kb ask` actually works against an Ollama
backend; the pipeline grounds the answer in retrieved chunks and
refuses cleanly when the gate trips or the model self-judges.
New crate kb-rag:
- pub struct RagPipeline { retriever, llm, docs, config } — all
Arc<dyn Trait + Send + Sync> so the pipeline itself is Send + Sync.
- pub fn ask(query, opts) -> Result<Answer> drives the nine-stage
flow per spec §1.
- pub struct AskOpts { k, explain, mode, temperature, seed,
stream_sink: Option<mpsc::Sender<String>> }. k acts as a floor
over config.search.default_k so a low-k caller can't starve
retrieval (documented in field doc).
Pipeline stages:
1. Retrieve via the injected dyn Retriever.
2. Score gate: empty hits → NoChunks refusal (no LLM call); top-1 <
config.rag.score_gate → ScoreGate refusal (no LLM call) with
top-3 candidates listed in the synthesized answer text.
3. Pack: budget = config.rag.max_context_tokens.saturating_sub
(prompt overhead). Per-hit `[#n] doc=… heading=… span=…\n<text>`
with deterministic enumeration. If every hit's chunk is
unfetchable from the store (deleted between search and pack),
fall back to NoChunks refusal with a tracing::warn rather than
feeding an empty [근거] to the LLM.
4. Render rag-v1 prompt with the spec's verbatim Korean system
string + `[질문]/[근거]` user template.
5. Generate via dyn LanguageModel. Single-thread token loop owns
the iterator; tokens optionally forward to opts.stream_sink (a
`mpsc::Sender<String>`). SendError silently dropped — caller
cancellation never panics the pipeline. After Done the loop
reads (acc, finish_reason, usage) in lockstep with no race.
max_completion = llm.context_tokens().saturating_sub
(used_for_input).max(64) — explicitly NOT capped by
config.rag.max_context_tokens (that's the packing budget for
[근거], not the LM completion ceiling).
6. Citation extract via STRICT regex `\[#(\d{1,3})\]` (compiled
once via OnceLock). Loose forms `[1]`, `[ #1 ]`, `[#foo]`,
`[#1234]`, `vec![1]` are all rejected to prevent prose
false-positives.
7. Citation validate covers five cases:
- unknown marker (e.g. `[#7]` when only 3 packed) →
LlmSelfJudge refusal.
- empty answer with hits → LlmSelfJudge.
- non-empty + no marker + matches `근거 (가|이) 부족` regex →
LlmSelfJudge (model self-refused with the canonical phrase;
phrase match logged via tracing::debug for observability).
- non-empty + no marker + no refusal phrase → LlmSelfJudge
(silent ungrounded answers are still refusals).
- non-empty + ≥1 valid marker → grounded = true.
8. Build Answer per kb_core::Answer shape:
- citations: filter packed list to exactly the markers cited.
Wire format `marker: Some("[1]")` (square-bracketed bare
index) per design §2.3, distinct from the prompt-side
`[#n]` grammar.
- embedding ModelRef: read from config.models.embedding for
Vector/Hybrid; None for Lexical. Documented deviation since
the Retriever trait doesn't expose the embedder. For
ScoreGate/NoChunks refusals on Vector/Hybrid the embedding
model is still recorded — the vector retriever WAS consulted
even when the gate tripped.
- TraceId minted as `ret_<8-hex>` from blake3(query, top_score,
model_id, ns).
- retrieval AnswerRetrievalSummary populated.
- usage from the final Done chunk; latency_ms wall-clock
fallback when the LLM reports zero.
- created_at OffsetDateTime::now_utc().
9. Persist via SqliteStore::put_answer (new inherent method on
SqliteStore, not on the DocumentStore trait — answers aren't
documents and adding to kb-core was forbidden). Always inserts,
refusals included. packed_chunks_json is null unless
opts.explain == true.
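Stages 6 and 7 can be sketched as follows. This is a minimal, dependency-free illustration, not the kb-rag source: the commit compiles the regex `\[#(\d{1,3})\]` once via OnceLock, whereas this sketch swaps in a hand-rolled byte scanner that accepts the same strings. `Verdict`, `extract_markers`, and `validate` are hypothetical names, 1-based marker numbering is an assumption, and the refusal-phrase case is folded into the generic "no marker" branch because both end in LlmSelfJudge.

```rust
use std::collections::BTreeSet;

/// Outcome of citation validation (illustrative type, not kb-core's).
#[derive(Debug, PartialEq)]
enum Verdict {
    /// At least one marker, all of them pointing at packed chunks.
    Grounded(BTreeSet<u32>),
    /// Empty answer, no marker, or a marker outside the packed range.
    LlmSelfJudge,
}

/// Hand-rolled equivalent of the strict pattern `\[#(\d{1,3})\]`.
/// Scanning bytes is safe here: UTF-8 continuation bytes never equal
/// ASCII `[`, `#`, `]`, or a digit.
fn extract_markers(text: &str) -> Vec<u32> {
    let bytes = text.as_bytes();
    let mut found = Vec::new();
    let mut i = 0;
    while i + 1 < bytes.len() {
        if bytes[i] == b'[' && bytes[i + 1] == b'#' {
            let start = i + 2;
            let mut end = start;
            while end < bytes.len() && bytes[end].is_ascii_digit() {
                end += 1;
            }
            let digits = end - start;
            // Exactly 1 to 3 digits, immediately followed by `]`.
            if (1..=3).contains(&digits) && end < bytes.len() && bytes[end] == b']' {
                if let Ok(n) = text[start..end].parse::<u32>() {
                    found.push(n);
                }
                i = end + 1;
                continue;
            }
        }
        i += 1;
    }
    found
}

/// `packed` is the number of chunks rendered into the prompt
/// (assumed to be numbered 1..=packed).
fn validate(answer: &str, packed: u32) -> Verdict {
    let markers = extract_markers(answer);
    if answer.trim().is_empty() || markers.is_empty() {
        return Verdict::LlmSelfJudge;
    }
    if markers.iter().any(|&n| n == 0 || n > packed) {
        return Verdict::LlmSelfJudge; // unknown marker, e.g. `[#7]` with 3 packed
    }
    Verdict::Grounded(markers.into_iter().collect())
}
```

Collecting into a set mirrors the "filter packed list to exactly the markers cited" step in stage 8: a marker cited twice still selects its chunk once.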
kb-store-sqlite extension:
- pub fn put_answer(&Answer, query, packed_chunks_json) ->
Result<AnswerId>. Maps all 22 fields of the answers table per
V001 schema in a single INSERT under a transaction.
kb-app::ask wired:
- bail!("not yet wired (P4-3)") replaced with a real body that
builds the retriever per opts.mode (Lexical | Vector | Hybrid),
instantiates OllamaLanguageModel from config, constructs
RagPipeline, calls pipeline.ask. AskOpts moves to kb-rag and is
re-exported via `pub use kb_rag::AskOpts` so kb-cli's
`use kb_app::AskOpts` keeps working.
- kb-app/Cargo.toml gains kb-rag, kb-llm, kb-llm-local. P3-5's
forbids on these are lifted by P4-3 spec — kb-app is the
orchestrator and ask requires both the trait crate and the
Ollama adapter.
- kb-cli/main.rs's AskOpts literal updated with stream_sink: None
for the CLI path (TUI in P9 will plumb a real sink).
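The optional stream_sink behaviour (forward each token, never fail when the receiver has gone away) can be sketched as below. This assumes `std::sync::mpsc`; the commit only says `mpsc::Sender<String>`, so the real channel type may differ. `drain_tokens` is a hypothetical helper name, and the token source is reduced to a plain iterator of strings.

```rust
use std::sync::mpsc;

/// Accumulate all tokens into the final answer text, optionally
/// forwarding each one to a sink as it arrives.
fn drain_tokens<I>(tokens: I, sink: Option<&mpsc::Sender<String>>) -> String
where
    I: IntoIterator<Item = String>,
{
    let mut acc = String::new();
    for tok in tokens {
        acc.push_str(&tok);
        if let Some(tx) = sink {
            // A SendError only means the receiver was dropped (the caller
            // cancelled); ignore it and keep accumulating.
            let _ = tx.send(tok);
        }
    }
    acc
}
```

Passing `None`, as the CLI path does, simply skips the forwarding branch; the accumulated text is the same in all three cases (live receiver, dropped receiver, no sink).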
Tests (kb-rag: 18; kb-app: 1 ignored):
- 3 unit in src/pipeline.rs: marker regex strictness (rejects all
loose forms with byte-equal expectations), Send+Sync compile
check, embedding_ref_for behavior across modes.
- 15 integration in tests/pipeline.rs covering every spec test row
+ the new "all chunks unfetchable falls back to NoChunks" guard:
empty-hits, score-gate, grounded happy path, unknown-marker,
prose-`[1]` rejection, `vec![1]` rejection, refusal-phrase,
packing-budget overflow, streaming-forwards-to-mpsc, dropped-
receiver-no-panic, usage-from-final-Done, answers-row-inserted-
for-each-refusal-kind, determinism temp=0 seed=0, Answer JSON
shape, unfetchable-chunks-fall-back-to-no-chunks (the new
M3 test).
- kb-app/tests/ask_smoke.rs: 1 #[ignore]'d real-Ollama smoke that
drives the wired ask end-to-end against `localhost:11434`.
Workspace: 319 passed / 26 ignored / 0 failed. cargo clippy
--workspace --all-targets -- -D warnings clean.
Allowed deps respected (kb-core, kb-config, kb-search, kb-llm,
kb-store-sqlite, serde, serde_json, regex, time, tracing,
thiserror) plus forced waivers anyhow (Retriever / LanguageModel
trait return types) and blake3 (TraceId minting). Forbidden
(kb-source-fs, kb-parse-md, kb-normalize, kb-chunk, kb-store-
vector direct, kb-embed* direct, kb-llm-local direct, kb-tui,
kb-desktop) all absent from `cargo tree -p kb-rag` — concrete
adapters reach the pipeline only through trait objects.
Out of scope: reranker between retrieve and pack (P+), multi-turn
chat memory (P+), LLM-as-judge eval (P5 uses rule-based
must_contain), --json streaming (buffers per §0 Q5 hybrid).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
|||
| a08f61a242 |
fix(cli): honor --config flag + improve search output legibility
Two issues surfaced during the post-P3-5 manual smoke test against a
six-document workspace:
1. --config flag was silently ignored. kb-cli read cli.config only
while building SourceScope inside the Ingest arm, then called
kb_app::ingest(scope, summary_only) which internally re-loads
Config::load(None) — falling back to ~/.config/kb/config.toml
regardless of what the user passed. Same pattern in search,
list, inspect, doctor. Users had to rely on KB_* env vars to
point at a non-default config.
2. Search output collapsed RRF hybrid scores to "0.02" because
`{:.2}` rounded the (0, 0.033]-bounded fused score to two decimals, and
chunks from the same document showed up as identical lines
("3. 0.02 arch/rag-architecture.md") since heading_path was
never printed.
Fix:
- kb-app: doctor/ingest/search/list/inspect already had
*_with_config(Config, ...) seams introduced for integration tests
(#[doc(hidden)] pub). Repurpose them as the official "config-explicit"
API — kb-cli now builds the Config once via
Config::load(cli.config.as_deref()) at the top of every subcommand
and threads it into the *_with_config variant. Module doc-comment
updated to reflect three callers (CLI --config, integration tests,
TUI session) instead of "test-only seam".
- kb-app: doctor() rewritten as doctor_with_config_path(Option<&Path>)
that respects an explicit path. config_loaded probe now reports the
actual path checked, returning a clear hard error if --config points
at a non-existent or malformed file (defaults would silently mask
user intent). data_dir_writable resolves storage.data_dir from the
loaded config (with env overrides applied via Config::apply_env) so
--config users see their custom paths reflected. Original doctor()
signature kept as a None-passing wrapper.
- kb-cli: ingest/search/list/inspect/doctor each call the
*_with_config companion. Search printer switches to {:.4} score
formatting (RRF hybrid range bounded by ~2/k_rrf ≈ 0.033 at k_rrf=60
default) and appends `> head1 / head2` when heading_path is non-
empty so chunks from the same document are visually distinguishable.
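A small sketch of why the precision change matters, assuming the standard reciprocal-rank-fusion term 1 / (k_rrf + rank) with 1-based ranks. `rrf_term` and `render_hit` are hypothetical names, and the exact spacing of the real CLI line is not taken from the source.

```rust
/// Contribution of one ranked list to the fused score (rank is 1-based).
fn rrf_term(rank: u32, k_rrf: f64) -> f64 {
    1.0 / (k_rrf + f64::from(rank))
}

/// Render one hit line with four decimals, appending the heading path
/// when the chunk has one.
fn render_hit(pos: usize, score: f64, path: &str, heading_path: &[&str]) -> String {
    let mut line = format!("{pos}. {score:.4} {path}");
    if !heading_path.is_empty() {
        line.push_str(" > ");
        line.push_str(&heading_path.join(" / "));
    }
    line
}
```

With k_rrf = 60, ranks 1 and 2 give 1/61 and 1/62. Both print as 0.02 under `{:.2}`, but as 0.0164 and 0.0161 under `{:.4}`, so neighbouring hits stay distinguishable.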
Verified manually:
- `kb --config /tmp/kb-smoke/config.toml doctor` reports the
custom config path + custom data_dir, not the XDG defaults.
- `kb --config /tmp/kb-smoke/config.toml search "..." --mode hybrid`
returns hits with distinct 4-decimal scores and heading paths
("rust/ownership.md > Rust 소유권 모델 / Borrow checker").
Workspace 269 passed / 24 ignored / 0 failed; cargo clippy
--workspace --all-targets -- -D warnings clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
|||
| d91b60325e |
p0-1: address review (apply_env full schema map, drop dead Option in logging::init)
- kb-config::apply_env now covers every leaf key in `Config` via an
explicit grep-friendly match block (one arm per leaf), keyed
`KB_<SECTION>_<KEY>`. Booleans flow through a shared `parse_bool`
helper. Numeric leaves silently keep their prior value on parse
failure so a malformed env entry can't crash startup.
- New tests: env_unknown_key_is_ignored,
env_overrides_chunking_target_tokens,
env_overrides_models_llm_endpoint_and_temperature,
env_overrides_indexing_watch_filesystem_bool.
- kb-app::logging::init now returns `Result<WorkerGuard>` instead of
`Result<Option<WorkerGuard>>` (the inner `Option` was always `Some`,
so the wrapper was dead). kb-cli/main.rs collapses the call from
`.ok().flatten()` to `.ok()`, preserving fail-soft semantics on
logging init.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
|||
| 5af07c174d |
p0-1: address quality review (wire convention, IngestItemKind re-export, clippy)
Three follow-ups from the code-quality review pass on P0-1:
- Re-export `IngestItemKind` from `kb-core` so downstream tasks
constructing `IngestItem` don't need `kb_core::ingest::IngestItemKind`.
- Document the `--json` wire-schema convention by introducing
`kb-cli/src/wire.rs` with `wire_*` helpers paralleling the existing
inline `wire_ingest`. Each Ok-path `--json` branch now routes through
these helpers so future P1-5/P3/P4/P5 implementations slot the
`schema_version` envelope in automatically. `DoctorReport` keeps its
struct-field `schema_version` (the documented exception), and the
helper round-trips it idempotently. Records the convention in
`kb-app/src/lib.rs`'s top docstring.
- Fix clippy `single_char_add_str` in `kb_core::normalize` (replace
`out.push_str(".")` with `out.push('.')`).
Verified: `cargo check`, `cargo test` (5 new wire-helper tests),
`cargo clippy -D warnings`, and `RUSTFLAGS=-D warnings cargo build` all
clean. Smoke-tested `kb doctor --json` still emits
`{"schema_version":"doctor.v1",...}`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
|||
| ec8a4ddb1b |
p0-1: kb-cli clap entry with §10 exit-code mapping
Adds the kb binary with clap v4 derive subcommands mapping 1:1 to
kb-app facade functions:
init | ingest | list docs | inspect (doc|chunk) | search | ask
| doctor | eval run
Global flags: --config, --verbose, --debug, --json. On --json, output
conforms to wire schema v1 (e.g. doctor.v1 emitted by kb-app::doctor).
Exit-code mapping per design §10:
0 success
1 RefusalSignal / NoHitSignal (kb ask refusal, kb search no-hit)
2 any other anyhow::Error
3 DoctorUnhealthy
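The mapping above can be sketched with std only. The commit works with anyhow::Error; this illustration substitutes `Box<dyn Error>` to stay dependency-free, and the three unit structs are stand-ins for the real signal types (only their names come from the commit). `exit_code` is a hypothetical function name.

```rust
use std::error::Error;
use std::fmt;

// Declares a unit-struct error type with a fixed message.
macro_rules! signal {
    ($name:ident, $msg:literal) => {
        #[derive(Debug)]
        struct $name;
        impl fmt::Display for $name {
            fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
                f.write_str($msg)
            }
        }
        impl Error for $name {}
    };
}

signal!(RefusalSignal, "ask refused");
signal!(NoHitSignal, "search returned no hits");
signal!(DoctorUnhealthy, "doctor reported unhealthy");

/// Translate a command result into the process exit code.
fn exit_code(result: &Result<(), Box<dyn Error>>) -> i32 {
    match result {
        Ok(()) => 0,
        Err(e) if e.is::<RefusalSignal>() || e.is::<NoHitSignal>() => 1,
        Err(e) if e.is::<DoctorUnhealthy>() => 3,
        Err(_) => 2, // any other error
    }
}
```

Checking the specific signals before the catch-all arm keeps code 2 as the default for errors that carry no special meaning.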
Tracing initialized at startup with the file appender from kb-app.
Verified via:
XDG_*=… cargo run -p kb-cli -- init → idempotent
XDG_*=… cargo run -p kb-cli -- doctor --json
→ {"schema_version":"doctor.v1","ok":true,…} exit 0
XDG_*=… cargo run -p kb-cli -- doctor (human form, ✓ marks)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|