th-kim0823
d5321701ea
plan(fb-39b): embedding upgrade implementation plan
...
5 tasks: kebab-embed-local resolve_model arm + check_dim test,
kebab-config defaults + TOML template flip, cross-crate fixture
sweep (likely no-op since most tests use provider=none), docs
(design + HOTFIXES + new task spec + INDEX), README + SMOKE
walkthrough.
Post-merge: 0.6 → 0.7 binary bump per CLAUDE.md cascade rule.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 23:02:37 +09:00
th-kim0823
2c3461c465
spec(fb-39b): embedding model upgrade design
...
- multilingual-e5-small (384 dim) → multilingual-e5-large (1024 dim)
- Cascade: embedding_version bump → fb-23 incremental ingest
re-embeds all chunks
- Migration policy: dim mismatch detection at LanceVectorStore::open
→ error.v1 (code = embedding_dim_mismatch) + hint
"kebab reset --vector-only && kebab ingest"
- Config defaults flip (model + dimensions). User TOML pinning small
preserves backwards-compat
- bge-m3 deferred (not in the fastembed enum; the
UserDefinedEmbeddingModel ONNX path is separate work)
- Release trigger: 0.6 → 0.7 minor bump per CLAUDE.md cascade rule
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 22:59:03 +09:00
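Note on the fb-39b migration policy above: a minimal sketch of a dimension check that fails with the embedding_dim_mismatch code and the reset hint. Only the code string and the hint text come from the spec. The DimMismatch struct, its fields, and this check_dim signature are hypothetical and are not the actual LanceVectorStore::open implementation.

```rust
/// Hypothetical error payload; only `code` and `hint` values come from the spec.
#[derive(Debug, PartialEq, Eq)]
pub struct DimMismatch {
    pub code: &'static str,
    pub configured: usize,
    pub stored: usize,
    pub hint: &'static str,
}

/// Compare the configured embedding dimension with the one found in the
/// existing vector store. Assumed signature, for illustration only.
pub fn check_dim(configured: usize, stored: usize) -> Result<(), DimMismatch> {
    if configured == stored {
        return Ok(());
    }
    Err(DimMismatch {
        code: "embedding_dim_mismatch",
        configured,
        stored,
        hint: "kebab reset --vector-only && kebab ingest",
    })
}
```

Under this sketch, a store built with multilingual-e5-small (384 dim) opened under the new 1024-dim default returns the error, while a user TOML that still pins the small model passes the check.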
240120ee80
Merge pull request 'feat(fb-39): eval foundation — precision_at_k_chunk metric' ( #136 ) from feat/fb-39-eval-foundation into main
...
Reviewed-on: #136
2026-05-10 13:41:04 +00:00
th-kim0823
5870a1de15
fix(fb-39): address PR #136 round 1 review
...
kebab eval compare now surfaces precision_at_k_chunk delta in both
human-readable table + deltas JSON. Snapshot fixture regenerated
additively.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 22:39:11 +09:00
th-kim0823
f00fb376fe
docs(fb-39): golden header + design §10.3 eval + spec status + INDEX
...
Strengthen fixtures/golden_queries.yaml header with precision_at_k_chunk
explanation + measurement guidance. Add §10.3 Eval metrics section to
frozen design documenting retrieval metrics (hit@k, MRR, recall@k_doc,
P@k_chunk) + groundedness metrics. Flip p9-fb-39 spec status from open
→ completed (eval foundation only, lever deferral noted). Update
tasks/INDEX.md fb-39 row mirror to fb-42 (merged, deferred note).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 22:35:15 +09:00
th-kim0823
bb0ec0469f
feat(eval): precision_at_k_chunk metric (P@5, P@10) (fb-39)
2026-05-10 22:26:21 +09:00
th-kim0823
f303c76f52
plan(fb-39): eval foundation implementation plan
...
4 tasks: AggregateMetrics.precision_at_k_chunk field + serde
backwards-compat, compute aggregation in loop with 5 unit tests,
golden YAML header doc strengthening, design §11 + INDEX + status
flip.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 22:19:44 +09:00
th-kim0823
cd5b1e3bfc
spec(fb-39): eval foundation design (P@k metric)
...
- Add precision_at_k_chunk: BTreeMap<u32, f32> (P@5, P@10) to
AggregateMetrics; binary relevance via expected_chunk_ids
- Denominator fixed at k (hits.len() < k also counts as a precision loss)
- Queries with empty expected_chunk_ids are skipped (same policy as hit_at_k)
- Applying levers (chunk policy / RRF / cross-encoder / embedding) is
out of scope for this spec; separate tasks from fb-39b onward
- Golden set schema unchanged; only the header comments of the shipped
fixtures are strengthened
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 22:05:09 +09:00
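Note on the fb-39 spec above: a minimal sketch of the P@k rules it lists (binary relevance, denominator fixed at k, queries without expected chunk ids skipped). The function names and the plain string chunk ids are hypothetical. The sketch also assumes hit lists contain no duplicate chunk ids and that the aggregate is a plain mean over scored queries, neither of which the spec states.

```rust
use std::collections::{BTreeMap, HashSet};

/// Precision@k for one query. The denominator is always k, so a hit list
/// shorter than k lowers the score. Returns None when the query has no
/// expected chunk ids (the query is skipped).
pub fn precision_at_k(hits: &[&str], expected: &HashSet<&str>, k: usize) -> Option<f32> {
    if expected.is_empty() || k == 0 {
        return None;
    }
    let relevant = hits
        .iter()
        .take(k)
        .filter(|id| expected.contains(*id))
        .count();
    Some(relevant as f32 / k as f32)
}

/// Mean P@k over all non-skipped queries, keyed by k (for example 5 and 10).
/// A k with no scored queries is left out of the map.
pub fn aggregate_precision(
    queries: &[(Vec<&str>, HashSet<&str>)],
    ks: &[u32],
) -> BTreeMap<u32, f32> {
    let mut out = BTreeMap::new();
    for &k in ks {
        let scores: Vec<f32> = queries
            .iter()
            .filter_map(|(hits, expected)| precision_at_k(hits, expected, k as usize))
            .collect();
        if !scores.is_empty() {
            out.insert(k, scores.iter().sum::<f32>() / scores.len() as f32);
        }
    }
    out
}
```

With three returned hits of which two are expected, P@5 in this sketch is 2/5 rather than 2/3, which is the "short hit list counts as a precision loss" rule.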
3a9a52326d
Merge pull request 'feat(fb-42): bulk multi-query — kebab search --bulk + mcp__kebab__bulk_search' ( #134 ) from feat/fb-42-bulk-multi-query into main
...
Reviewed-on: #134
2026-05-10 12:27:11 +00:00
th-kim0823
b53376e96e
fix(fb-42): address PR #134 round 1 review
...
- print_schema_text plain mode: include bulk_search capability row
- README: tool count 7 → 8, fetch added to MCP tool name lists
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 21:19:20 +09:00
th-kim0823
441f1192ee
docs(fb-42): wire schema + README + SMOKE + design + SKILL + INDEX
...
- Add bulk_search_item.v1 + bulk_search_response.v1 wire schemas
- Register both in WIRE_SCHEMAS const
- README: --bulk flag mention + MCP tool list 7→8 (bulk_search)
- SMOKE: bulk multi-query walkthrough (CLI + MCP equivalent)
- Design §2.2: Bulk multi-query (fb-42) subsection (additive minor)
- SKILL: mcp__kebab__bulk_search section + tool table row
- Task spec status open→completed, banner replaced
- INDEX: fb-42 row marked merged (rerank hint deferred)
- Fix: missed Capabilities {bulk_search} in cli wire.rs test (Task 7 leftover)
- Fix: missed tools.len() 7→8 in cli_mcp_smoke (Task 5 leftover)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 21:07:36 +09:00
th-kim0823
e8da415624
feat(schema): bulk_search capability flag (fb-42)
...
- Capabilities.bulk_search: true (snapshot)
- schema.v1 wire required list updated
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:49:09 +09:00
th-kim0823
d8e5f35601
test(mcp): integration tests for bulk_search tool (fb-42)
2026-05-10 20:33:32 +09:00
th-kim0823
6ab0d782ef
feat(mcp): kebab__bulk_search tool (fb-42)
...
Exposes bulk multi-query search via MCP `bulk_search` tool:
- Input: { queries: [SearchInput shapes...] }, capped at 100
- Output: bulk_search_response.v1 with per-query results + summary
- Sequential execution reuses App instance for cache amortization
- Per-query errors embed error.v1 JSON; never aborts bulk call
Updates tool count from 7 to 8 in lib.rs comment + tools_list test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:31:20 +09:00
th-kim0823
2bbe94eb05
test(cli): integration tests for kebab search --bulk (fb-42)
2026-05-10 20:26:07 +09:00
th-kim0823
9ac13fa256
fix(cli): make query optional when --bulk is set (fb-42)
2026-05-10 20:26:03 +09:00
th-kim0823
67f2c16cc2
feat(cli): kebab search --bulk flag + stdin ndjson + output stream (fb-42)
2026-05-10 20:22:45 +09:00
th-kim0823
1ebbd6b711
feat(app): bulk_search_with_config facade (fb-42)
2026-05-10 20:18:49 +09:00
th-kim0823
892175d009
feat(core): BulkSearchItem / Summary / Response types (fb-42)
2026-05-10 20:12:31 +09:00
th-kim0823
de9016fe16
plan(fb-42): bulk multi-query implementation plan
...
8 tasks: kebab-core types, kebab-app bulk_search_with_config facade
(cap 100 + per-query error policy), CLI --bulk flag + stdin ndjson +
output stream, CLI integration tests, MCP bulk_search tool +
registration + tools_list count bump, MCP integration tests,
capability flag, wire schemas + README + SMOKE + design + SKILL +
status flip.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:10:39 +09:00
th-kim0823
35df15df99
spec(fb-42): bulk multi-query design (rerank hint deferred)
...
- CLI: kebab search --bulk + stdin ndjson → stdout per-query ndjson
- MCP: new kebab__bulk_search tool + JSON envelope (results + summary)
- Sequential for-loop, reusing the App instance (cache amortization)
- Per-query error policy: continue + per-item error.v1
- Limits: queries.len() <= 100
- New capability flag bulk_search
- Rerank hint is a separate task (after the fb-39 cross-encoder design)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:05:27 +09:00
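Note on the fb-42 spec above: a minimal sketch of the stated policy (sequential loop, cap of 100 queries, per-query errors recorded without aborting the call, summary at the end). The BulkItem, Summary, and bulk_search names are hypothetical stand-ins, not the kebab-core types or the bulk_search_with_config facade. The search closure stands in for a reused App instance. Rejecting an over-limit call with an error is an assumption, since the spec only states the limit.

```rust
/// Cap on queries per bulk call, as stated in the spec.
pub const MAX_QUERIES: usize = 100;

/// Hypothetical per-query outcome: either hits or an embedded error.
#[derive(Debug, PartialEq)]
pub enum BulkItem {
    Hits { query: String, hits: Vec<String> },
    Error { query: String, message: String },
}

/// Hypothetical summary of one bulk call.
#[derive(Debug, Default, PartialEq)]
pub struct Summary {
    pub total: usize,
    pub succeeded: usize,
    pub failed: usize,
}

/// Run every query in order with the same `search` closure. A failing
/// query becomes a BulkItem::Error and the loop continues.
pub fn bulk_search<F>(queries: &[String], mut search: F) -> Result<(Vec<BulkItem>, Summary), String>
where
    F: FnMut(&str) -> Result<Vec<String>, String>,
{
    if queries.len() > MAX_QUERIES {
        return Err(format!("too many queries: {} > {}", queries.len(), MAX_QUERIES));
    }
    let mut items = Vec::with_capacity(queries.len());
    let mut summary = Summary { total: queries.len(), ..Summary::default() };
    for q in queries {
        match search(q.as_str()) {
            Ok(hits) => {
                summary.succeeded += 1;
                items.push(BulkItem::Hits { query: q.clone(), hits });
            }
            Err(message) => {
                summary.failed += 1;
                items.push(BulkItem::Error { query: q.clone(), message });
            }
        }
    }
    Ok((items, summary))
}
```

Each input query yields exactly one output item in input order, so a caller can line results up with inputs by position even when some queries fail.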
b0becf43b8
Merge pull request 'chore(handoff): sync release roadmap with shipped state' ( #133 ) from chore/sync-handoff into main
...
Reviewed-on: #133
2026-05-10 10:49:23 +00:00
21ecbb00d4
Merge pull request 'feat(fb-40): fact-grounded answer — rag-v2 prompt template' ( #132 ) from feat/fb-40-fact-grounded-answer into main
...
Reviewed-on: #132
2026-05-10 10:49:06 +00:00
th-kim0823
8cd21e8342
chore(handoff): sync release roadmap with shipped state
...
- 0.3.0 batch (fb-26/27/28 + fb-29 deferral) marked cut
- 0.4.0 batch (fb-30 MCP + fb-31 single-file) marked cut
- 0.5.0 batch (fb-32..37) marked cut on 2026-05-10
- 0.6.0 in progress: fb-38 + fb-40 merged today, fb-39 pending
- fb-41/42 reframed as 0.7.0+ candidates
Note: PR #132 (fb-40) merge updates roadmap header in spec status
table (already flipped via fb-40 PR).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 19:46:28 +09:00
th-kim0823
b35f163f56
fix(fb-40): address PR #132 round 1 review
...
Module doc still pinned "rag-v1" — update to reflect dispatched
template via system_prompt_for (rag-v1 legacy / rag-v2 default).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 19:42:57 +09:00
th-kim0823
600c6182fc
docs(fb-40): rag-v2 prompt + README + design + SKILL + INDEX
...
- README: [rag] prompt_template_version default rag-v2 + the 3 strengthened V2 rules
- design §7: rag-v2 prompt body + V1 legacy note
- SKILL.md: note on the change in mcp__kebab__ask response behavior
- task spec: status open → completed, links to design + plan
- INDEX: fb-40 ✅ merged (2026-05-10)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 19:37:28 +09:00
th-kim0823
0e8b800b6b
test(rag): integration tests for rag-v1/v2/unknown dispatch (fb-40)
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 19:18:36 +09:00
th-kim0823
126559ce7a
fix(fb-40): update test fixtures for rag-v2 default
2026-05-10 19:15:15 +09:00
th-kim0823
137fc4ee31
feat(config): default prompt_template_version rag-v1 → rag-v2 (fb-40)
2026-05-10 19:04:55 +09:00
th-kim0823
59f01f8185
feat(rag): pipeline reads prompt_template_version via helper (fb-40)
2026-05-10 19:02:39 +09:00
th-kim0823
9f70681b77
feat(rag): SYSTEM_PROMPT_RAG_V2 + system_prompt_for dispatch helper (fb-40)
2026-05-10 19:01:05 +09:00
th-kim0823
6d6eb442be
plan(fb-40): fact-grounded answer implementation plan
...
6 tasks: SYSTEM_PROMPT_RAG_V2 + system_prompt_for helper, pipeline
dispatch wiring, config default flip rag-v1 → rag-v2, test fixture
cleanup, integration tests (rag-v1 / rag-v2 / unknown via
CapturingLm wrapper around MockLanguageModel), docs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 18:58:35 +09:00
th-kim0823
28d3250546
spec(fb-40): fact-grounded answer design
...
- rag-v1 → rag-v2 system prompt with 3 new rules (quote verbatim spans /
do not draw on training knowledge / no speculation)
- system_prompt_for(version) helper dispatch in pipeline
- config default prompt_template_version "rag-v1" → "rag-v2", V1 legacy
kept for backwards-compat
- Lever C (pre-LLM gate) already shipped (RefusalReason::ScoreGate),
out of scope here
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 18:55:05 +09:00
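Note on the fb-40 spec above: a minimal sketch of a version-keyed prompt dispatch. SYSTEM_PROMPT_RAG_V2 and system_prompt_for are named in the log. The SYSTEM_PROMPT_RAG_V1 constant name, the placeholder prompt strings, the Option return type, and returning None for an unknown version are all assumptions, since the log does not say how the real helper treats unknown versions.

```rust
/// Placeholder texts; the real prompt bodies are not part of this log.
pub const SYSTEM_PROMPT_RAG_V1: &str = "<rag-v1 prompt text not shown>";
pub const SYSTEM_PROMPT_RAG_V2: &str = "<rag-v2 prompt text not shown>";

/// Map a prompt_template_version string to a system prompt.
/// Returning None for an unknown version is an assumption of this sketch.
pub fn system_prompt_for(version: &str) -> Option<&'static str> {
    match version {
        "rag-v1" => Some(SYSTEM_PROMPT_RAG_V1),
        "rag-v2" => Some(SYSTEM_PROMPT_RAG_V2),
        _ => None,
    }
}
```

Keeping the rag-v1 arm is what lets a config that pins "rag-v1" keep its old behavior after the default flips to "rag-v2".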
945319ae93
Merge pull request 'feat(fb-38): score semantics — score_kind on search_hit.v1 + RRF formula docs' ( #131 ) from feat/fb-38-score-semantics into main
...
Reviewed-on: #131
2026-05-10 09:38:24 +00:00
th-kim0823
c864bd007f
docs(fb-38): wire schema + README + design + SKILL + INDEX
2026-05-10 18:21:55 +09:00
th-kim0823
67aee9f480
test(cli): integration tests for score_kind on lexical mode (fb-38)
2026-05-10 18:12:14 +09:00
th-kim0823
4440fa6659
fix(fb-38): add score_kind to remaining SearchHit literals
...
Add missing score_kind field to SearchHit constructors in:
- kebab-tui/tests/search.rs::make_hit()
- kebab-eval/tests/metrics_and_compare.rs::hit()
- kebab-eval/src/metrics.rs::hit()
All test fixtures default to Rrf (hybrid mode), matching the field's
Default impl and the test semantics.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 18:08:29 +09:00
th-kim0823
b51cdb9e8f
feat(search/hybrid): fuse hits override score_kind to Rrf (fb-38)
2026-05-10 17:56:56 +09:00
th-kim0823
4e739f3cd8
feat(search): add score_kind to VectorRetriever (Cosine) and hybrid test helpers (Rrf)
...
This commit unblocks Tasks 3 and 4 of fb-38:
- VectorRetriever::build_hit now labels hits with ScoreKind::Cosine
- Hybrid retriever test helpers (mk_hit functions) label synthetic hits with ScoreKind::Rrf
- Updated lexical snapshot fixture to reflect new score_kind field in output
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 17:54:16 +09:00
th-kim0823
3a621bba0d
feat(search/lexical): label hits with ScoreKind::Bm25 (fb-38 task 2)
...
- Add ScoreKind::Bm25 to LexicalRetriever::build_hit SearchHit construction
- Import ScoreKind from kebab_core in lexical.rs
- Add integration test lexical_retriever_hits_carry_bm25_score_kind to verify all
hits from LexicalRetriever carry score_kind == ScoreKind::Bm25
- Update lexical snapshot test baseline to include new score_kind field
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 17:54:11 +09:00
th-kim0823
3c605b1a5d
feat(core): ScoreKind enum + SearchHit.score_kind (fb-38)
2026-05-10 17:49:02 +09:00
th-kim0823
56f20b7235
plan(fb-38): score semantics implementation plan
...
7 tasks: kebab-core ScoreKind enum + SearchHit field, lexical Bm25
labeling, vector Cosine, hybrid Rrf + search_with_trace pass-through,
cross-crate SearchHit literal cleanup, CLI integration test, docs
(wire schema + README + design + SKILL + INDEX).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 17:45:57 +09:00
th-kim0823
0359bd9682
spec(fb-38): score semantics design
...
- Optional score_kind field on search_hit.v1 (rrf | bm25 | cosine)
- LexicalRetriever → Bm25, VectorRetriever → Cosine, HybridRetriever → Rrf
- Mode-dispatch hits from fb-37 search_with_trace keep the underlying
retriever's score_kind unchanged
- README + design §4 + SKILL carry the full RRF formula + a "ranking signal,
NOT confidence" note; agents should base trust thresholds on the nested
retrieval.{lexical,vector}_score
- Additive minor wire change; no schema bump
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 17:40:47 +09:00
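Note on the fb-38 spec above: a minimal sketch of why a fused score is a ranking signal and not a confidence. It uses the textbook reciprocal rank fusion form, score = sum over lists of 1 / (k + rank) with 1-based ranks. The Hit struct, the rrf_fuse function, and the k = 60 used in the usage note are assumptions; kebab's actual formula and constant are documented in its README and design §4, not here.

```rust
use std::collections::HashMap;

/// Score kinds named in the spec; Rrf as the default follows the log.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ScoreKind {
    #[default]
    Rrf,
    Bm25,
    Cosine,
}

/// Hypothetical, trimmed-down hit type for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub chunk_id: String,
    pub score: f32,
    pub score_kind: ScoreKind,
}

/// Fuse ranked lists with textbook RRF. Input scores are ignored; only
/// the rank positions matter. Every fused hit is relabeled as Rrf.
pub fn rrf_fuse(lists: &[Vec<Hit>], k: f32) -> Vec<Hit> {
    let mut acc: HashMap<String, f32> = HashMap::new();
    for list in lists {
        for (rank0, hit) in list.iter().enumerate() {
            let rank = rank0 as f32 + 1.0; // 1-based rank
            *acc.entry(hit.chunk_id.clone()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<Hit> = acc
        .into_iter()
        .map(|(chunk_id, score)| Hit { chunk_id, score, score_kind: ScoreKind::Rrf })
        .collect();
    // Highest fused score first; ties broken by chunk id for determinism.
    fused.sort_by(|a, b| {
        b.score.total_cmp(&a.score).then_with(|| a.chunk_id.cmp(&b.chunk_id))
    });
    fused
}
```

With k = 60, a chunk ranked first in both lists scores only 2/61, about 0.033, so the fused value is small and unrelated to BM25 or cosine magnitudes. This is why the spec points agents at the nested per-retriever scores for trust thresholds.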
cf3acfc136
Merge pull request 'chore: bump version 0.4 → 0.5' ( #130 ) from chore/bump-v0.5.0 into main
...
Reviewed-on: #130
v0.5.0
2026-05-10 08:08:06 +00:00
th-kim0823
668e1174cc
chore: bump version 0.4 → 0.5
...
v0.5.0 batches fb-32 (stale doc indicator) + fb-33 (streaming ask)
+ fb-34 (output budget controls) + fb-35 (verbatim fetch) + fb-36
(search filter args) + fb-37 (trace + stats).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 17:04:51 +09:00
745a75a82b
Merge pull request 'feat(fb-37): trace + stats — search debug + KB health surface' ( #129 ) from feat/fb-37-trace-and-stats into main
...
Reviewed-on: #129
2026-05-10 07:59:56 +00:00
th-kim0823
6a33d08aea
fix(fb-37): address PR #129 round 1 review
...
- doc TraceFusionInput.fusion_score semantics (single-mode vs hybrid)
- comment why total_ms vs stage sum can drift (millis truncation)
- TODO marker on TUI trace popup filter passthrough
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 16:26:34 +09:00
th-kim0823
a40593590b
docs(fb-37): wire schema + README + SMOKE + INDEX + SKILL
2026-05-10 14:13:47 +09:00
th-kim0823
5687cbc0e2
feat(tui): search pane t-key opens TracePopup (fb-37)
2026-05-10 13:39:11 +09:00
th-kim0823
653e432a30
feat(mcp): kebab__search trace input + output mirror (fb-37)
...
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 13:32:30 +09:00