f9714aa5cb
docs(rename): kb → kebab — README, tasks/, docs/, design doc, report
...
마지막 commit. 모든 .md 안의 `kb` 단어 일괄 갱신.
- 19 개 crate 이름 (`kb-core`, `kb-app`, …) → `kebab-*` (Rust 모듈
path 표기 `kb_*` → `kebab_*` 포함).
- 미래 component (`kb-tui`, `kb-desktop`, `kb-asr-whisper`, `kb-ocr`,
`kb-mcp`, `kb-vlm`, `kb-rerank`, `kb-vision-ocr`, `kb-index`,
`kb-smoke`, `kb-architecture`) → `kebab-*` (P6+ 가 시작될 때
같은 prefix 사용).
- CLI 명령 예제: `kb ingest` / `kb search` / `kb ask` / `kb init` /
`kb doctor` / `kb inspect` / `kb list` / `kb eval` →
`kebab <verb>`. fenced code block + 인라인 backtick 모두.
- XDG paths + env vars + binary 경로 (`target/release/kb` →
`target/release/kebab`) 동기화.
- design doc / 최초 보고서 / SMOKE / HOTFIXES / phase epic / task
spec 모든 reference 통일.
- task-decomposition.md 의 `git -c user.name=kb` 는 과거 git history
기록용 author 정보라 그대로 유지 (실제 git history 의 author 는
변경 불가).
- `tasks/phase-5-evaluation.md` 의 `status: planned` →
`completed` 도 같이 (P5-1 + P5-2 PR 머지 후 미반영분).
## 검증
- `grep -rEn "\bkb-[a-z]|\bkb_[a-z]|\.config/kb\b|kb\.sqlite|\bKB_[A-Z]"
--include="*.md"` 0 hits (task-decomposition.md 의 git author
제외).
- 모든 file path reference 살아있음 (renamed file 들 모두 새 path
로 update).
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-02 04:01:55 +00:00
ee1f2339dd
fix(p5-2): apply push-time review items — citation/refusal correctness + nits
...
두 reviewer 의 should-fix 4 건 + nit 5 건 push 전 반영.
## should-fix
- `citation_coverage`: 빈 citations[] 가 `Iterator::all` vacuous-true 로
1.0 새는 거 차단 — `!is_empty() && all(non-empty path)` 로 변경.
또한 `_store: &SqliteStore` dead 인자 시그니처에서 제거 (호출 사이트
+ 테스트 helper 정리).
- `refusal_correctness`: lexical-only run 에서 `answer == None` 인 경우
분모 증가 안 함 (NaN/null 출력) — 자동 fail 처리하던 게 metric 의미를
왜곡함. 새 unit test `refusal_correctness_nan_for_non_rag_run` 추가.
- `groundedness`: `must_contain.is_empty() && forbidden.is_empty()`인
golden 은 분모에서 제외. unconfigured entry 가 free 1.0 받지 않게.
새 unit test `groundedness_skips_unconfigured_goldens` 추가.
- `kb-cli/Cargo.toml` rationale 코멘트 사실 오류 정정 — kb-eval →
kb-app 의존이지 그 반대 아님.
## nits
- `KB_EVAL_GOLDEN` / `DEFAULT_GOLDEN_PATH` 중복 — `metrics::` 의
`pub(crate)` 로 단일화, `runner` 가 import.
- `render_report_md` 의 `{:?}` `ComparisonKind` → 명시적 lowercase
매핑 함수 (`win`/`loss`/`draw`/`regression`) — JSON 직렬화 컨벤션과
통일.
- `extract_chunker_version` `None == None` 매치 silent 위험에 대한
defensive 코멘트.
- `delta_null_when_either_nan` 테스트의 `let mut` suppress hack →
struct update syntax 로 정리.
- `empty_store` test helper + 매번 `mem::forget(tmp)` 죽은 코드 제거.
## 추가 spec doc
`tasks/p5/p5-2-metrics-compare.md` deviations 섹션 4 항목 추가:
- `kb-eval` crate-level `kb-app` dep — P5-1 inheritance, 새 모듈 surface
는 import 안 함.
- `citation_coverage` 약화된 resolver — `document_exists_by_path` 기다리는
중.
- `refusal_correctness` non-RAG 런 NaN.
- `groundedness` no-check golden skip.
## 검증
- `cargo test -p kb-eval` 35/35 (18 unit + 2 loader + 8 integration + 7
runner; 새 3 unit test).
- `cargo clippy --workspace --all-targets -- -D warnings` clean.
- `compare_report_snapshot_matches_fixture` 변경 없이 통과 — 새 동작이
스냅샷 입력 (lexical-only, no must_contain, no should-refuse) 영향 없음.
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-02 03:17:32 +00:00
d9a5b88d27
feat(p5-2): kb-eval metrics + compare — AggregateMetrics, CompareReport, kb eval CLI
...
P5-2 구현. 저장된 eval_runs / eval_query_results 위에서:
- `kb_eval::metrics`: hit@k / MRR / recall@k_doc / citation_coverage /
groundedness / empty_result_rate / refusal_correctness 계산. NaN
metrics (분모 0)는 JSON null. 4-decimal round + Deserialize 추가로
aggregate_json 라운드트립.
- `kb_eval::compare`: 두 run 비교 → CompareReport (per-metric Δ + per-
query Win/Loss/Draw/Regression). chunker_version drift 시 graceful
doc-id fallback (chunker_version_match: "fallback_doc"), `strict`
옵션이면 refuse.
- `render_report_md`: 인간용 Markdown (집계 + Wins/Losses/Regressions
표).
- `SqliteStore::{load_eval_run, load_eval_query_results,
update_eval_run_aggregate}` + owned `EvalRunRecord` /
`EvalQueryResultRecord` 추가 — write 측 borrow-shape는 그대로.
- `kb eval` CLI: `run` (P5-1 위임), `aggregate <id>`, `compare <a> <b>
[--strict-chunker-version] [--write-report]`. `--json` 으로 raw
CompareReport, 기본은 Markdown 출력.
## Spec deviations (intentional, doc 명시)
- Graceful 매칭은 doc-id-only (chunker_version_match: "fallback_doc")
— 50% span overlap은 chunker re-index 후 양쪽 chunks 동시 보존이
현실적으로 안 돼서 P6+ 로 deferred.
- `*_with_config` 헬퍼 추가: 통합 테스트가 TempDir Config 로 드라이브.
no-arg 형태는 Config::load(None) 로 위임.
- CLI 는 kb-cli → kb-eval 직접 wire (kb-app cycle 회피). DoD 의 "via
kb-app" 의도는 facade 단일화였지만 cycle 발생.
- `AggregateMetrics: Deserialize` 추가 — aggregate_json 라운드트립.
## 검증
- `cargo test -p kb-eval` 30/30 (15 unit + 2 loader + 8 metrics+compare
통합 + 7 runner). 8 통합 중 snapshot 1 건 (`compare-1.json`).
- `cargo test -p kb-store-sqlite` 33/33.
- `cargo clippy --workspace --all-targets -- -D warnings` clean.
- forbidden imports 부재 (`kb-source-fs|kb-parse|kb-normalize|kb-chunk|
kb-store-vector|kb-embed|kb-search|kb-llm|kb-rag|kb-tui|kb-desktop|
kb-app` — kb-app 는 metrics/compare 모듈에 부재; runner 만 사용).
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-02 03:05:13 +00:00
58a11cc2b8
feat(p5-1): kb-eval crate — golden-fixture runner + eval persistence
...
- new kb-eval crate: load_golden_set (YAML) + run_eval (per-query search/ask + persistence)
- new kb-store-sqlite::eval module: record_eval_run_with_results (transactional), document_exists / chunk_exists probes
- fixtures/golden_queries.yaml: 5-entry KO+EN template
- tests: 13 pass (loader: parse, dup-id, missing chunk_id; runner: elapsed, snapshot, error capture, JSONL, determinism, persistence, config_snapshot)
- per_query.jsonl mirror written to runs_dir/<run_id>/
- temperature=0 + fixed seed → byte-identical per_query.jsonl (lexical)
deviations from spec (documented in code):
- run_id uses uuid::Uuid::now_v7().simple() (timestamp-ordered hex) instead of ULID — uuid already in workspace deps
- load_golden_set_validated kept #[cfg(test)] pub(crate) — production inlines validate_against_db
- snapshot fixture uses normalized projection (id/query/mode/first_hit) — full byte-determinism covered by separate test
- index_version in config_snapshot left null (composed per call by kb-app, not config-level)
deferred to follow-up:
- App reuse across queries (currently rebuilds App per query)
- expand_path hoist to kb-config (3 crate clones now)
- --max-queries flag (deferred to P5-2 per updated spec)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-01 18:01:09 +00:00
kb
b999a12ab5
tasks: address PR #1 review
...
- p3-3: SQLite-first/Lance-second + status marker (V003__embedding_status); drop "best-effort 2PC" misnomer
- p4-3: replace print_stream FnMut closure with mpsc::Sender<String> (RagPipeline stays Send+Sync)
- p4-3: tighten citation regex to strict [#<n>] only — reject [n]/prose/code-block false positives
- p5-2: compare_runs across chunker_version is graceful (doc + span overlap fallback) with chunker_version_match audit field; --strict-chunker-version restores refusal
- p7-1: per-page text via lopdf (pdf-extract has no per-page Rust API); use char count for spans
- p8-1: explicit rubato (FftFixedIn) for 16 kHz mono resample; symphonia decode only
- p9-5: drop cmd_read_pdf_page + pdfium native dep; cmd_read_file_bytes + frontend pdfjs; add traversal tests
2026-04-27 13:10:31 +00:00
kb
597a848af9
tasks: add P5 component specs (runner, metrics)
2026-04-27 12:04:06 +00:00