Commit Graph

6 Commits

Author SHA1 Message Date
f9714aa5cb docs(rename): kb → kebab — README, tasks/, docs/, design doc, report
마지막 commit. 모든 .md 안의 `kb` 단어 일괄 갱신.

- 19 개 crate 이름 (`kb-core`, `kb-app`, …) → `kebab-*` (Rust 모듈
  path 표기 `kb_*` → `kebab_*` 포함).
- 미래 component (`kb-tui`, `kb-desktop`, `kb-asr-whisper`, `kb-ocr`,
  `kb-mcp`, `kb-vlm`, `kb-rerank`, `kb-vision-ocr`, `kb-index`,
  `kb-smoke`, `kb-architecture`) → `kebab-*` (P6+ 가 시작될 때
  같은 prefix 사용).
- CLI 명령 예제: `kb ingest` / `kb search` / `kb ask` / `kb init` /
  `kb doctor` / `kb inspect` / `kb list` / `kb eval` →
  `kebab <verb>`. fenced code block + 인라인 backtick 모두.
- XDG paths + env vars + binary 경로 (`target/release/kb` →
  `target/release/kebab`) 동기화.
- design doc / 최초 보고서 / SMOKE / HOTFIXES / phase epic / task
  spec 모든 reference 통일.
- task-decomposition.md 의 `git -c user.name=kb` 는 과거 git history
  기록용 author 정보라 그대로 유지 (실제 git history 의 author 는
  변경 불가).
- `tasks/phase-5-evaluation.md` 의 `status: planned` →
  `completed` 도 같이 (P5-1 + P5-2 PR 머지 후 미반영분).

## 검증

- `grep -rEn "\bkb-[a-z]|\bkb_[a-z]|\.config/kb\b|kb\.sqlite|\bKB_[A-Z]"
   --include="*.md"` 0 hits (task-decomposition.md 의 git author
  제외).
- 모든 file path reference 살아있음 (renamed file 들 모두 새 path
  로 update).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:01:55 +00:00
ee1f2339dd fix(p5-2): apply push-time review items — citation/refusal correctness + nits
두 reviewer 의 should-fix 4 건 + nit 5 건 push 전 반영.

## should-fix

- `citation_coverage`: 빈 citations[] 가 `Iterator::all` vacuous-true 로
  1.0 새는 거 차단 — `!is_empty() && all(non-empty path)` 로 변경.
  또한 `_store: &SqliteStore` dead 인자 시그니처에서 제거 (호출 사이트
  + 테스트 helper 정리).
- `refusal_correctness`: lexical-only run 에서 `answer == None` 인 경우
  분모 증가 안 함 (NaN/null 출력) — 자동 fail 처리하던 게 metric 의미를
  왜곡함. 새 unit test `refusal_correctness_nan_for_non_rag_run` 추가.
- `groundedness`: `must_contain.is_empty() && forbidden.is_empty()`인
  golden 은 분모에서 제외. unconfigured entry 가 free 1.0 받지 않게.
  새 unit test `groundedness_skips_unconfigured_goldens` 추가.
- `kb-cli/Cargo.toml` rationale 코멘트 사실 오류 정정 — kb-eval →
  kb-app 의존이지 그 반대 아님.

## nits

- `KB_EVAL_GOLDEN` / `DEFAULT_GOLDEN_PATH` 중복 — `metrics::` 의
  `pub(crate)` 로 단일화, `runner` 가 import.
- `render_report_md` 의 `{:?}` `ComparisonKind` → 명시적 lowercase
  매핑 함수 (`win`/`loss`/`draw`/`regression`) — JSON 직렬화 컨벤션과
  통일.
- `extract_chunker_version` `None == None` 매치 silent 위험에 대한
  defensive 코멘트.
- `delta_null_when_either_nan` 테스트의 `let mut` suppress hack →
  struct update syntax 로 정리.
- `empty_store` test helper + 매번 `mem::forget(tmp)` 죽은 코드 제거.

## 추가 spec doc

`tasks/p5/p5-2-metrics-compare.md` deviations 섹션 4 항목 추가:

- `kb-eval` crate-level `kb-app` dep — P5-1 inheritance, 새 모듈 surface
  는 import 안 함.
- `citation_coverage` 약화된 resolver — `document_exists_by_path` 기다리는
  중.
- `refusal_correctness` non-RAG 런 NaN.
- `groundedness` no-check golden skip.

## 검증

- `cargo test -p kb-eval` 35/35 (18 unit + 2 loader + 8 integration + 7
  runner; 새 3 unit test).
- `cargo clippy --workspace --all-targets -- -D warnings` clean.
- `compare_report_snapshot_matches_fixture` 변경 없이 통과 — 새 동작이
  스냅샷 입력 (lexical-only, no must_contain, no should-refuse) 영향 없음.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 03:17:32 +00:00
d9a5b88d27 feat(p5-2): kb-eval metrics + compare — AggregateMetrics, CompareReport, kb eval CLI
P5-2 구현. 저장된 eval_runs / eval_query_results 위에서:

- `kb_eval::metrics`: hit@k / MRR / recall@k_doc / citation_coverage /
  groundedness / empty_result_rate / refusal_correctness 계산. NaN
  metrics (분모 0)는 JSON null. 4-decimal round + Deserialize 추가로
  aggregate_json 라운드트립.
- `kb_eval::compare`: 두 run 비교 → CompareReport (per-metric Δ + per-
  query Win/Loss/Draw/Regression). chunker_version drift 시 graceful
  doc-id fallback (chunker_version_match: "fallback_doc"), `strict`
  옵션이면 refuse.
- `render_report_md`: 인간용 Markdown (집계 + Wins/Losses/Regressions
  표).
- `SqliteStore::{load_eval_run, load_eval_query_results,
  update_eval_run_aggregate}` + owned `EvalRunRecord` /
  `EvalQueryResultRecord` 추가 — write 측 borrow-shape는 그대로.
- `kb eval` CLI: `run` (P5-1 위임), `aggregate <id>`, `compare <a> <b>
  [--strict-chunker-version] [--write-report]`. `--json` 으로 raw
  CompareReport, 기본은 Markdown 출력.

## Spec deviations (intentional, doc 명시)

- Graceful 매칭은 doc-id-only (chunker_version_match: "fallback_doc")
  — 50% span overlap은 chunker re-index 후 양쪽 chunks 동시 보존이
  현실적으로 안 돼서 P6+ 로 deferred.
- `*_with_config` 헬퍼 추가: 통합 테스트가 TempDir Config 로 드라이브.
  no-arg 형태는 Config::load(None) 로 위임.
- CLI 는 kb-cli → kb-eval 직접 wire (kb-app cycle 회피). DoD 의 "via
  kb-app" 의도는 facade 단일화였지만 cycle 발생.
- `AggregateMetrics: Deserialize` 추가 — aggregate_json 라운드트립.

## 검증

- `cargo test -p kb-eval` 30/30 (15 unit + 2 loader + 8 metrics+compare
  통합 + 7 runner). 8 통합 중 snapshot 1 건 (`compare-1.json`).
- `cargo test -p kb-store-sqlite` 33/33.
- `cargo clippy --workspace --all-targets -- -D warnings` clean.
- forbidden imports 부재 (`kb-source-fs|kb-parse|kb-normalize|kb-chunk|
  kb-store-vector|kb-embed|kb-search|kb-llm|kb-rag|kb-tui|kb-desktop|
  kb-app` — kb-app 는 metrics/compare 모듈에 부재; runner 만 사용).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 03:05:13 +00:00
58a11cc2b8 feat(p5-1): kb-eval crate — golden-fixture runner + eval persistence
- new kb-eval crate: load_golden_set (YAML) + run_eval (per-query search/ask + persistence)
- new kb-store-sqlite::eval module: record_eval_run_with_results (transactional), document_exists / chunk_exists probes
- fixtures/golden_queries.yaml: 5-entry KO+EN template
- tests: 13 pass (loader: parse, dup-id, missing chunk_id; runner: elapsed, snapshot, error capture, JSONL, determinism, persistence, config_snapshot)
- per_query.jsonl mirror written to runs_dir/<run_id>/
- temperature=0 + fixed seed → byte-identical per_query.jsonl (lexical)

deviations from spec (documented in code):
- run_id uses uuid::Uuid::now_v7().simple() (timestamp-ordered hex) instead of ULID — uuid already in workspace deps
- load_golden_set_validated kept #[cfg(test)] pub(crate) — production inlines validate_against_db
- snapshot fixture uses normalized projection (id/query/mode/first_hit) — full byte-determinism covered by separate test
- index_version in config_snapshot left null (composed per call by kb-app, not config-level)

deferred to follow-up:
- App reuse across queries (currently rebuilds App per query)
- expand_path hoist to kb-config (3 crate clones now)
- --max-queries flag (deferred to P5-2 per updated spec)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 18:01:09 +00:00
kb
b999a12ab5 tasks: address PR #1 review
- p3-3: SQLite-first/Lance-second + status marker (V003__embedding_status); drop "best-effort 2PC" misnomer
- p4-3: replace print_stream FnMut closure with mpsc::Sender<String> (RagPipeline stays Send+Sync)
- p4-3: tighten citation regex to strict [#<n>] only — reject [n]/prose/code-block false positives
- p5-2: compare_runs across chunker_version is graceful (doc + span overlap fallback) with chunker_version_match audit field; --strict-chunker-version restores refusal
- p7-1: per-page text via lopdf (pdf-extract has no per-page Rust API); use char count for spans
- p8-1: explicit rubato (FftFixedIn) for 16 kHz mono resample; symphonia decode only
- p9-5: drop cmd_read_pdf_page + pdfium native dep; cmd_read_file_bytes + frontend pdfjs; add traversal tests
2026-04-27 13:10:31 +00:00
kb
597a848af9 tasks: add P5 component specs (runner, metrics) 2026-04-27 12:04:06 +00:00