Commit Graph

7 Commits

Author SHA1 Message Date
f9714aa5cb docs(rename): kb → kebab — README, tasks/, docs/, design doc, report
마지막 commit. 모든 .md 안의 `kb` 단어 일괄 갱신.

- 19 개 crate 이름 (`kb-core`, `kb-app`, …) → `kebab-*` (Rust 모듈
  path 표기 `kb_*` → `kebab_*` 포함).
- 미래 component (`kb-tui`, `kb-desktop`, `kb-asr-whisper`, `kb-ocr`,
  `kb-mcp`, `kb-vlm`, `kb-rerank`, `kb-vision-ocr`, `kb-index`,
  `kb-smoke`, `kb-architecture`) → `kebab-*` (P6+ 가 시작될 때
  같은 prefix 사용).
- CLI 명령 예제: `kb ingest` / `kb search` / `kb ask` / `kb init` /
  `kb doctor` / `kb inspect` / `kb list` / `kb eval` →
  `kebab <verb>`. fenced code block + 인라인 backtick 모두.
- XDG paths + env vars + binary 경로 (`target/release/kb` →
  `target/release/kebab`) 동기화.
- design doc / 최초 보고서 / SMOKE / HOTFIXES / phase epic / task
  spec 모든 reference 통일.
- task-decomposition.md 의 `git -c user.name=kb` 는 과거 git history
  기록용 author 정보라 그대로 유지 (실제 git history 의 author 는
  변경 불가).
- `tasks/phase-5-evaluation.md` 의 `status: planned` →
  `completed` 도 같이 (P5-1 + P5-2 PR 머지 후 미반영분).

## 검증

- `grep -rEn "\bkb-[a-z]|\bkb_[a-z]|\.config/kb\b|kb\.sqlite|\bKB_[A-Z]"
   --include="*.md"` 0 hits (task-decomposition.md 의 git author
  제외).
- 모든 file path reference 살아있음 (renamed file 들 모두 새 path
  로 update).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:01:55 +00:00
9dde01eb9f fix(rag): normalize RRF fusion_score to [0,1] + log post-merge hotfixes
## Bug

config.rag.score_gate default 0.05 was incompatible with hybrid RRF
fusion_score: raw RRF tops out at num_retrievers / (k_rrf + 1) ≈
0.0328 at the default k_rrf=60, so every hybrid `kb ask` tripped
ScoreGate refusal even when the top hit was perfectly aligned across
both retrievers. Symptomatic on the post-P4-3 manual smoke at
/tmp/kb-smoke/ pointed at 192.168.0.47 Ollama:

    $ kb ask "Rust ownership 모델의 핵심 규칙은 뭐야?" --mode hybrid
    근거 부족. KB에 해당 내용 없음.        # top fusion_score = 0.0164

Per-mode score_gate (lexical_score_gate / vector_score_gate /
hybrid_score_gate) was rejected because it forces every consumer
(CLI, eval, TUI) to know which mode picks which threshold. Score
normalization solves it at the source.

## Fix

crates/kb-search/src/hybrid.rs divides every fused score by
2 / (k_rrf + 1), the theoretical RRF maximum with two retrievers
each contributing rank 1. After normalization:

- both retrievers agree on rank 1 → fusion_score = 1.0
- only one retriever finds the chunk → caps near 0.5
- typical mixed ranks → falls between 0 and 0.5

RRF's rank-ordering invariants are preserved (every score divides
by the same positive constant), so sort + tiebreak behaviour is
unchanged. Wire schema label `fusion_score` keeps its slot in
RetrievalDetail; only the magnitude shifts, and only for hybrid
mode (lexical / vector were already in [0, 1]).

Verification: re-ran the four-scenario smoke at /tmp/kb-smoke/ with
default score_gate = 0.05 — all four (Korean→Korean, English→
English, cross-language Korean↔English, out-of-corpus) succeed
with the expected grounded / refusal classification, top
fusion_score now ≈ 0.5.

## Tests

One unit test (rrf_formula_matches_known_value) updated to expect
the normalized value `(1/61 + 1/62) / (2/61) ≈ 0.9919` instead of
the raw `1/61 + 1/62 ≈ 0.0325`. The integration snapshot fixture
crates/kb-search/tests/fixtures/search/hybrid/run-1.json already
used presence checks (fusion_score_positive: true) rather than
absolute values, so it doesn't need regeneration. Workspace 319
tests pass; clippy clean across both feature configs.

## Docs

This commit also adds tasks/HOTFIXES.md as a dated post-merge log
covering this fix and the two earlier --config-flag regressions
(P3-5 hotfix #20 across ingest/search/list/inspect/doctor; P4-3
follow-up #24 for kb ask). Original task specs in tasks/p<N>/
*.md stay frozen as the historical contract; HOTFIXES.md is the
live source of truth for post-merge deltas. Each affected task
spec gets a "Risks/notes" addendum pointing back to HOTFIXES.md
so a reader landing on the spec sees the active behaviour:

- tasks/INDEX.md gains a "Post-merge 핫픽스" section.
- tasks/phase-3-vector-hybrid.md updates the RRF formula text to
  show the normalized form.
- tasks/p3/p3-4-hybrid-fusion.md "Behavior contract" RRF bullet
  notes the normalization and reason.
- tasks/p3/p3-5-app-wiring.md "Risks/notes" notes the --config
  fix.
- tasks/p4/p4-3-rag-pipeline.md "Risks/notes" notes the kb-ask
  --config fix and the score_gate-RRF incompatibility (closed by
  the normalization in p3-4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:16:01 +00:00
78bf78c8be docs(p3-5): add app-wiring task spec; INDEX + phase-3 updates
P1–P3 shipped libraries but kb-app facade is still all `bail!("not yet
wired")` stubs, so the CLI is structurally complete but unusable.
Insert p3-5 between P3-4 and P4-1 to swap the facade bodies — ingest,
search, list_docs, inspect_doc, inspect_chunk — into real
compositions of the libraries shipped through P3-4. `ask` stays
stubbed; P4-3 owns it.

After p3-5 merges:
- `kb index` walks a workspace and persists chunks (SQLite +
  optionally LanceDB).
- `kb search --mode {lexical,vector,hybrid}` returns real SearchHits
  with citations.
- `kb list` / `kb inspect doc|chunk` round-trip from the store.

Updates:
- New task spec at tasks/p3/p3-5-app-wiring.md (depends_on
  p1-6/p2-2/p3-2/p3-3/p3-4; unblocks p4-3/p9-1/p9-2/p9-4).
- tasks/INDEX.md bumps P3 component count 4 → 5 and adds the link.
- tasks/phase-3-vector-hybrid.md replaces the speculative
  `embed_index` facade signature with the actual frozen kb-app
  surface and updates the phase completion checklist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 11:32:42 +00:00
kb
9fa38543a8 refactor(spec): introduce kb-parse-types thin crate
PR #1 review left a design-debt note: ParsedBlock landing in kb-core would
(a) force every crate to recompile on parser-internal changes, and
(b) cause namespace pollution when P6/P7/P8 parsers add their own variants.

Resolution: a new thin crate kb-parse-types sits between kb-core and parsers.
Owns ParsedBlock + ParsedPayload + Warning + forward-refs for image/pdf/audio
parser intermediates. Depends on kb-core only (for SourceSpan / Inline).

Updates:
- design §3.7b: add new section defining kb-parse-types
- design §8: add kb-parse-types to module-boundary diagram + forbidden list
- design §3.4 Inline stays in kb-core; kb-parse-types references it (no duplication)
- p0-1 skeleton: workspace + Cargo deps + public surface block
- p1-3 parse-md-blocks: outputs Vec<kb_parse_types::ParsedBlock> directly
- p1-4 normalize: Allowed gains kb-parse-types, drops cross-coupling note
- INDEX + phase-0 epic: list kb-parse-types in P0 deliverables
2026-04-27 20:41:35 +00:00
kb
eedaeccff2 tasks: update INDEX with full component-task tree (30 specs) 2026-04-27 12:14:58 +00:00
kb
3f1ef86ee6 tasks: prepare P1 component decomposition skeleton 2026-04-27 11:42:22 +00:00
kb
b565b330d9 add frozen design doc and task index
- design: docs/superpowers/specs/2026-04-27-kb-final-form-design.md
- locks UX shape, wire schema v1, domain model, ID recipe, DDL, layout, traits, module boundaries, versioning, errors
- tasks/INDEX.md + 10 phase docs derived from kb_local_rust_report.md
2026-04-27 11:17:24 +00:00