kebab

Author	SHA1	Message	Date
altair823	e613236d60	feat(cli): kebab ingest progress display (p9-fb-02) + p9-fb-01 status flip Adds two surfaces through which `kebab ingest` shows its progress to the user: - Human mode (TTY): indicatif `ProgressBar` on stderr; a spinner while scanning, switching to a bar after ScanCompleted, with the message updated for every asset. - Human mode (non-TTY, CI/pipe): the indicatif draw target is set to hidden and stderr gets one line per event (`ingest: scanning`, `ingest: 1/N path`, `ingest: complete (...)`). - `--json` mode: stderr stays empty and stdout emits line-delimited `ingest_progress.v1` JSON. The last line is the existing `ingest_report.v1`, unchanged (backward compatibility for external wrappers). Implementation: - New `crates/kebab-cli/src/progress.rs`: `ProgressMode::{Json, Human { tty }}`, `ProgressDisplay` (a background thread drains the channel and renders per mode), and a `now_rfc3339` helper. Whatever the mode, `ts` is stamped at wire-emit time. - Added `wire_ingest_progress` to `crates/kebab-cli/src/wire.rs`. It adds two fields, `schema_version` and `ts`, on top of the serde tag (`kind`) to complete the spec §2.4a wire shape. - `Cmd::Ingest` handler: creates an mpsc channel; while a background thread runs the display, main calls `ingest_with_config_progress`. When ingest returns, the Sender is dropped and the display thread exits cleanly. After the join, the final ingest_report is printed. - New dep: `indicatif` 0.17 (progress bar on TTY only; hidden draw target for non-TTY/--json). Test: - 3 lib unit (mode resolution + RFC 3339 round-trip). - 3 integration (--json line-delimited / non-TTY stderr text / ts+kind validation). 16 PASS, 0 regressions overall. Plan updates: - p9-fb-01: status `in_progress` → `completed` (follow-up to the PR #52 merge). - p9-fb-02: status `planned` → `in_progress`. Flipped to `completed` in a separate one-line commit after merge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 19:57:02 +00:00
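The channel-plus-display-thread shape from this commit can be sketched with only `std::sync::mpsc`. This is a minimal sketch and not the code in `progress.rs`. `Event`, `render_line` and `spawn_display` are hypothetical names. The indicatif bar is omitted, only the non-TTY line style is rendered, and the `(...)` counts on the completion line are left out.

```rust
use std::sync::mpsc::Receiver;
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the real progress event; only the shape matters.
#[derive(Debug)]
pub enum Event {
    ScanStarted,
    AssetFinished { index: usize, total: usize, path: String },
    Completed,
}

/// Render one event in the non-TTY stderr line style (counts omitted).
pub fn render_line(ev: &Event) -> String {
    match ev {
        Event::ScanStarted => "ingest: scanning".to_string(),
        Event::AssetFinished { index, total, path } => format!("ingest: {index}/{total} {path}"),
        Event::Completed => "ingest: complete".to_string(),
    }
}

/// Spawn the display thread. It drains `rx` until every Sender is dropped,
/// prints each line to stderr, and returns the lines for inspection.
pub fn spawn_display(rx: Receiver<Event>) -> JoinHandle<Vec<String>> {
    thread::spawn(move || {
        let mut lines = Vec::new();
        // The loop ends once all Senders (e.g. the one handed to the ingest
        // call) have been dropped and the channel buffer is empty.
        for ev in rx {
            let line = render_line(&ev);
            eprintln!("{line}");
            lines.push(line);
        }
        lines
    })
}
```

In the real command, the main thread would run the ingest call between spawning and joining. Dropping its Sender is what lets `join` return.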
altair823	eb331f9b29	feat(app): add IngestEvent + ingest_with_config_progress (p9-fb-01) Streaming progress channel for ingest. The facade emits one IngestEvent per step boundary into an optional `mpsc::Sender<IngestEvent>` injected by the caller. CLI (p9-fb-02), TUI (p9-fb-03), and a future desktop UI all consume the same stream. New: - crates/kebab-app/src/ingest_progress.rs: `IngestEvent` enum (`#[serde(tag = "kind", rename_all = "snake_case")]` matching wire schema ingest_progress.v1) + `AggregateCounts` struct + `media_label` helper + best-effort `emit` helper. - ingest_with_config_progress(cfg, scope, summary_only, progress): when a `mpsc::Sender<IngestEvent>` is present, sends ScanStarted → ScanCompleted → (AssetStarted < AssetFinished)* → Completed. A dropped receiver is silently absorbed (the hot path must never stall). - The existing ingest_with_config becomes a forwarding wrapper that passes `progress=None`. Not yet applied (future tasks fill these in per the contract): - IngestEvent::Aborted: cancel token wiring is p9-fb-04. - embed_batch_started / embed_batch_finished: these correspond to the spec's "anywhere between asset events". v1 simplification: asset-level resolution is enough for the CLI / TUI. Test: - 6 lib unit (media_label / serde discriminator / emit corner cases). - 3 integration (event sequence obeys the §2.4a invariant / forwarding wrapper / dropped receiver tolerance). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 19:44:34 +00:00
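The "best-effort emit" contract and the `None`-forwarding wrapper can be sketched as below. This is a hypothetical illustration, not the crate's `emit` or `IngestEvent`. `ProgressEvent`, `emit_progress`, `run_ingest` and the placeholder asset count are all assumptions.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical, trimmed-down event type for illustration.
#[derive(Debug, PartialEq)]
pub enum ProgressEvent {
    ScanStarted,
    Completed,
}

/// Best-effort emit: send only if a Sender was injected, and swallow the
/// error a disconnected receiver produces so ingest never fails or blocks.
pub fn emit_progress(progress: Option<&Sender<ProgressEvent>>, ev: ProgressEvent) {
    if let Some(tx) = progress {
        // `send` on an unbounded mpsc channel never blocks; it only errors
        // when the receiver has been dropped, which is deliberately ignored.
        let _ = tx.send(ev);
    }
}

/// Progress-aware entry point (hypothetical). Returns a placeholder count.
pub fn run_ingest(progress: Option<&Sender<ProgressEvent>>) -> usize {
    emit_progress(progress, ProgressEvent::ScanStarted);
    let assets = 3; // placeholder for the real scan + per-asset work
    emit_progress(progress, ProgressEvent::Completed);
    assets
}

/// The old entry point simply forwards with no progress channel.
pub fn run_ingest_without_progress() -> usize {
    run_ingest(None)
}
```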
altair823	8879bd4de2	chore(p9-fb-06): sync Cargo.lock for tempfile dev-dep + flip spec to completed Two follow-ups after PR #49 (kebab reset) merged: - Cargo.lock: kebab-cli's new dev-dep `tempfile` was committed in the feature PR but the lockfile entry was not regenerated, leaving main with a stale lock. `cargo metadata` regenerates the one-line addition to kebab-cli's dependency list. - tasks/p9/p9-fb-06-data-reset-command.md: status `in_progress` → `completed`, per the plan's task 6 commitment to flip in a separate one-line commit only after the implementation PR merges. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 18:47:23 +00:00
altair823	9d96d603b0	chore(tasks): mark p9-fb-06 in_progress Final flip to completed lands in a separate one-line commit AFTER PR merges so spec history reflects reality. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 18:39:12 +00:00
altair823	d445df515d	review(round 2): 2 nits; clarify how the recommended execution order is written Applies round 2's readability nits: - '#5 debounce' → 'p9-fb-08 debounce' (task ID made explicit) - 'can batch with 12, 11 prerequisite' → 'can batch with p9-fb-12, prerequisite of p9-fb-11' - item 13 'prerequisite for 18' → 'prerequisite for p9-fb-18' (consistent pattern) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 18:02:54 +00:00
altair823	a757e2cdb3	review(round 1): apply the 5 round-1 findings - p9-dogfooding-feedback.md item 14: README typo (READE → README) - p9-fb-11.md frontmatter: add depends_on=[p9-fb-14] (bidirectionally consistent with 14.unblocks) - p9-fb-01.md Behavior contract: fix the ambiguous cross-ref 'wiring with #14'; cancel wiring is p9-fb-04, the TUI signal is p9-fb-03 - plan File Structure: remove the self-contradictory "n/a (skip)" entry for tasks/HOTFIXES.md → split out into a separate HOTFIXES section - plan task 4 handler: remove `let _ = data_only;` and change the pattern binding itself to `data_only: _` (idiomatic) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 18:01:14 +00:00
altair823	5428412688	docs(p9): decompose dogfooding feedback into 20 task specs + reset plan After P9-1~P9-4 merged, the 16 UX feedback items the user collected while dogfooding are decomposed into 20 single-PR-sized task specs. Each spec includes frontmatter (depends_on / unblocks / source_feedback) plus Goal, Allowed deps, Public surface, Behavior contract, Test plan, DoD, and Out of scope sections. Added: - p9-fb-01 ~ 20-*.md: the 20 decomposed task specs - p9-dogfooding-feedback.md: master index + priorities + recommended execution order + spec PR vs impl PR section - INDEX.md: links for p9-fb-01 ~ 20 - docs/superpowers/plans/2026-05-02-p9-fb-06-reset-command.md: 6-task implementation plan for the first follow-up (the kebab reset command) - .gitignore: add .worktrees/ (for the superpowers worktree skill) For the feedback item → task spec mapping, see the table in p9-dogfooding-feedback.md. First task to execute: p9-fb-06 (reset command), ranked #1 for how badly it blocks dogfooding. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 17:54:15 +00:00
altair823	893287a5a3	fix(config + tilde): LLM default → gemma4:e4b + consistent workspace.root ~ expansion User decision during dogfooding (2026-05-02): unify the default text LLM on the gemma4 family. The OCR/caption adapters (P6-2/P6-3) already use gemma4:e4b, so the user can pull a single family and have both ingest and ask work. ~ expansion inconsistency found along the way: - kebab-source-fs::connector uses expand_tilde (walk works) - kebab-app::ingest_one_image_asset / ingest_one_pdf_asset call PathBuf::from directly → ~ not expanded → ~/KnowledgeBase passed verbatim into ExtractContext - kebab-tui::search::handle_key_search's editor jump has the same problem → spawns a meaningless path Fix: - Config::defaults().models.llm.model = "gemma4:e4b". Added a comment about sharing the OCR/caption family. - Both image / pdf branches in kebab-app now call expand_tilde. - The kebab-tui::search jump uses kebab_config::expand_path(.., "") (expand_path is the canonical helper that handles ~ / ${XDG_DATA_HOME} / {data_dir}). Caveat: kebab-app::expand_tilde and kebab-config::expand_path are defined separately. Consolidating them is a P+ task. Docs (sync rule): - README prerequisites section: gemma4:e4b default + guidance for overriding with a larger variant. - docs/ARCHITECTURE key decisions table: LLM default qwen2.5:7b-instruct → gemma4:e4b. - docs/SMOKE: ollama pull example + KEBAB_MODELS_LLM_MODEL env example qwen2.5:32b → gemma4:26b. - HOTFIXES: new entry ("Config defaults: LLM = gemma4:e4b + workspace.root tilde expansion"). - Memory: new project_llm_default.md, added to the MEMORY.md index. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 16:34:24 +00:00
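The tilde bug in this commit comes down to whether a leading `~` is expanded before building a `PathBuf`. The sketch below is a hypothetical helper, not `kebab-app::expand_tilde` or `kebab_config::expand_path`. It assumes only `~` and `~/...` forms are expanded, and it takes the home directory as a parameter so it stays testable. It does not handle `${XDG_DATA_HOME}` or `{data_dir}` the way `expand_path` does.

```rust
use std::path::PathBuf;

/// Expand a leading `~` or `~/` against `home`. Anything else, including
/// `~user/...` or any input when no home is known, is returned unchanged.
pub fn expand_leading_tilde(raw: &str, home: Option<&str>) -> PathBuf {
    match home {
        Some(h) if raw == "~" => PathBuf::from(h),
        // Strip the two-byte "~/" prefix and join the remainder onto home.
        Some(h) if raw.starts_with("~/") => PathBuf::from(h).join(&raw[2..]),
        _ => PathBuf::from(raw),
    }
}
```

A caller would typically pass `std::env::var("HOME").ok().as_deref()` as `home`.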
altair823	b6e0ab352f	feat(kebab-tui): P9-4 Inspect pane; doc/chunk detail with collapsible sections Enter in Library / 'i' in Search enters Inspect. A single Doc or Chunk view shows 6 collapsible sections: metadata / provenance / blocks (doc) or spans / text / embeddings (chunk). Esc/q returns to the originating pane. Key points: - InspectTarget enum (`Doc(DocumentId) | Chunk(ChunkId)`). - InspectState body (`app.rs`): target / doc / chunk / collapsed HashSet / scroll / return_to / needs_fetch / loading. - `src/inspect.rs`: - `render_inspect`: branches to render_doc / render_chunk by target kind; section headers show a collapse marker (▾/▸). metadata.user JSON is pretty-printed. - `handle_key_inspect`: j/k / Down/Up scroll. PageDown/PageUp move 10 rows. c = toggle all sections (v1 simplification). Esc/q = SwitchPane(return_to). - `enter_inspect(state, target, return_to)` helper: shared entry point for Library and Search. - run-loop hook `refresh_inspect`: when needs_fetch is set, lazily calls inspect_doc_with_config / inspect_chunk_with_config. - run.rs: the Pane::Inspect arm uses handle_key_inspect + render_inspect. refresh_inspect runs on every idle tick. SwitchPane(Inspect) lazy-inits. - Library: Enter calls enter_inspect(Doc(selected)), then SwitchPane. - Search: 'i' (plain modifier) calls enter_inspect(Chunk(selected_hit)), then SwitchPane. Guarded against clashing with typing 'i' ("instance"). 12 tests (`tests/inspect.rs`, TestBackend): Esc uses return_to / q also works / j/k scroll bounds / PgUp PgDn ±10 / c toggles all / no target hint / loading / doc view header+metadata+provenance+blocks / collapse hides body / chunk view text+block_ids / no slot → SwitchPane(Library) / enter_inspect helper sets fields. Spec deviation (HOTFIXES `2026-05-02 P9-4`): - Removed the `render_inspect<B: Backend>` generic (same as P9-1/2/3). - Added the Search `i` key (not in the P9-2 spec; added retroactively in P9-4). - `c` collapses everything at once; the spec's "focus-based selective collapse" is P+. Docs (sync rule): - README: TUI row "4 panes" + Quick start comment. - HANDOFF: one-line summary + Phase status (P9 3/5 → 4/5) + one-line deviation. - HOTFIXES: P9-4 entry. - tasks/p9/p9-4 status: completed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 15:41:11 +00:00
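The scroll bounds and the v1 "toggle all" key can be sketched as a small state struct. This is hypothetical: `Section`, `InspectView` and the exact toggle rule are assumptions (here: if any section is expanded, collapse all, otherwise expand all). The real `InspectState` has more fields and six sections.

```rust
use std::collections::HashSet;

/// Hypothetical section identifiers for a doc view.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Section {
    Metadata,
    Provenance,
    Blocks,
}

const ALL: [Section; 3] = [Section::Metadata, Section::Provenance, Section::Blocks];

pub struct InspectView {
    pub scroll: usize,
    pub collapsed: HashSet<Section>,
}

impl InspectView {
    /// j/k (delta ±1) and PgDn/PgUp (delta ±10), clamped to [0, max_scroll].
    pub fn scroll_by(&mut self, delta: isize, max_scroll: usize) {
        let next = self.scroll as isize + delta;
        self.scroll = next.clamp(0, max_scroll as isize) as usize;
    }

    /// `c`: toggle every section at once. If anything is still expanded,
    /// collapse all; otherwise expand all.
    pub fn toggle_all(&mut self) {
        if self.collapsed.len() < ALL.len() {
            self.collapsed.extend(ALL);
        } else {
            self.collapsed.clear();
        }
    }
}
```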
altair823	f08fefec1d	feat(kebab-tui): P9-3 Ask pane; streaming answer + citation panel + explain toggle Activates the ? key in the P9-1 Library and fills the App.ask slot (parallel-safety contract unchanged). A worker thread calls kebab-app::ask_with_config and sends tokens to an mpsc channel through AskOpts.stream_sink. The main thread (TUI) drains the channel on every render frame and appends to a string, so the answer area updates token by token. Key points: - AskState body (`app.rs`): input / explain / streaming / partial / answer / thread JoinHandle / rx Receiver / scroll / last_error. - `src/ask.rs`: - `render_ask`: input bar / answer area (▍ cursor while streaming) / bottom split (status: grounded/model/prompt/k/refusal · citations or explain panel). - `handle_key_ask`: typing → input. Enter → spawn_ask_worker (when input is non-empty and not streaming). e (when input is empty) → toggle explain. j/k (when input is empty) → scroll. Esc → SwitchPane(Library) + clear streaming/rx/thread (best-effort cancel). - `spawn_ask_worker`: mpsc::channel + thread::spawn(|| ask_with_config). - run-loop hooks: `drain_stream` (try_iter → partial), `poll_worker` (handle.is_finished → take + join → fill answer or ErrorOverlay). - run.rs: the Pane::Ask arm uses handle_key_ask + render_ask. drain_stream + poll_worker run on every idle tick. SwitchPane(Ask) lazy-inits. 13 tests (`tests/ask.rs`): Esc/typing/backspace/e toggle (input empty)/e typed (input nonempty)/Enter empty/Enter while streaming no-op/render pre-submission hint/streaming partial+cursor/grounded answer + citation [1]/refusal score_gate panel no panic/explain panel title flip/no slot. Spec deviation (HOTFIXES `2026-05-02 P9-3`): - Removed the `render_ask<B: Backend>` generic; the ratatui 0.28 Frame is backend-agnostic (same as P9-1/P9-2). - e/j/k act as command keys only while the input is empty; otherwise they are typed (a variant of vim's "command vs insert"). The spec's literal plain "e=toggle" would break typing words such as "explain" or "javascript". Docs (sync rule): - README: TUI row "Library + Search + Ask panes" + Quick start comment. - HANDOFF: one-line summary + Phase status (P9 2/5 → 3/5) + one-line deviation. - HOTFIXES: P9-3 entry. - tasks/p9/p9-3 status: completed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 15:24:26 +00:00
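The worker/drain/poll split can be sketched with `std` alone. This is a hypothetical illustration. `spawn_fake_ask_worker` streams canned tokens instead of calling `ask_with_config`. The field names echo `AskState`, but their types here are assumptions.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread::{self, JoinHandle};

/// Hypothetical worker state mirroring the AskState fields named above.
pub struct AskWorker {
    pub partial: String,
    pub rx: Option<Receiver<String>>,
    pub handle: Option<JoinHandle<String>>,
    pub answer: Option<String>,
}

/// Spawn a worker that streams `tokens`, then returns the full answer.
/// The canned tokens stand in for the real LLM call (an assumption).
pub fn spawn_fake_ask_worker(tokens: Vec<&'static str>) -> AskWorker {
    let (tx, rx) = channel();
    let handle = thread::spawn(move || {
        let mut full = String::new();
        for t in tokens {
            let _ = tx.send(t.to_string()); // stream_sink equivalent
            full.push_str(t);
        }
        full
    });
    AskWorker { partial: String::new(), rx: Some(rx), handle: Some(handle), answer: None }
}

impl AskWorker {
    /// Every frame: pull whatever tokens have arrived, without blocking.
    pub fn drain_stream(&mut self) {
        if let Some(rx) = &self.rx {
            for tok in rx.try_iter() {
                self.partial.push_str(&tok);
            }
        }
    }

    /// Every idle tick: once the thread has finished, join and store the answer.
    pub fn poll_worker(&mut self) {
        if self.handle.as_ref().map_or(false, |h| h.is_finished()) {
            let h = self.handle.take().unwrap();
            self.answer = h.join().ok();
            self.drain_stream(); // pick up tokens sent just before the thread exited
            self.rx = None;
        }
    }
}
```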
altair823	0490b6a126	feat(kebab-tui): P9-2 Search pane; input + dense hits + preview + editor jump Activates the / key in Library. The App.search slot is filled lazily (when the run loop receives SwitchPane(Search)). After a 200 ms debounce it calls kebab-app::search and shows the selected hit's chunk in the preview pane. The g key opens the citation location in $EDITOR (vim/nvim/code/cursor auto-detected). Key points: - SearchState body (fills the forward decl in `app.rs`): input / mode / hits / selected_hit / input_dirty_at / last_query / searching / preview. - `src/search.rs` (new): - `render_search(f, area, state)`: 3-pane layout (input bar / result list / preview). Each hit uses the §1.5 dense 4-line format (rank.score URI / heading / snippet). - `handle_key_search`: typing → input + dirty mark. Tab → cycle mode. Enter → immediate refresh. j/k → move selection + invalidate preview. g → editor jump (RAII raw-mode suspend). Esc → back to Library. - `build_jump_command(citation, editor_env, workspace_root)` auto-branches between vim-style `+<line> path` / VS Code `code -g path:line` / cursor `cursor -g`. Locked down by unit tests. - `jump_to_citation` suspends/restores raw mode + AltScreen via RAII (panic-safe). - 4 run-loop hook functions: `debounce_due` / `fire_search` / `refresh_preview` (private to crate). - run.rs: - the Pane::Search arm dispatches to `handle_key_search` + `render_search`. - On SwitchPane(Search), lazy-init `app.search = Some(SearchState::default())`. - On every idle tick, debounce_due → fire_search; preview None → refresh_preview. - 13 tests (`tests/search.rs`): Esc/typing/backspace/Tab cycle/Enter refresh/j-k movement/jump cmd vim+code+args/render w/hits/empty render/no slot. Spec deviation (HOTFIXES `2026-05-02 P9-2`): - Removed the `render_search<B: Backend>` generic (same reason as P9-1: the ratatui 0.28 Frame is backend-agnostic). - `jump_to_citation` gains a `workspace_root: &Path` argument. Citation.path is workspace-relative, so the editor needs an absolute path; the spec's literal signature is unimplementable. Docs (sync rule): - README: TUI row "Library + Search panes, ask/inspect in progress" + updated comment on `kebab tui` in Quick start. - HANDOFF: one-line summary + Phase status (P9 1/5 → 2/5) + one-line deviation. - HOTFIXES: P9-2 entry added. - tasks/p9/p9-2 status: completed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 14:38:17 +00:00
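The editor branching described for `build_jump_command` can be sketched as a pure function over strings. This is a hypothetical re-creation. It takes a plain relative path and line instead of a `Citation`, ignores extra arguments inside `$EDITOR`, and treats every non-VS-Code-style editor as accepting `+<line> path`.

```rust
use std::path::Path;

/// Build (program, args) to open `rel_path` at `line` in the user's editor.
pub fn jump_command(editor_env: &str, workspace_root: &Path, rel_path: &str, line: u32) -> (String, Vec<String>) {
    // Citation paths are workspace-relative; resolve to an absolute path first.
    let abs = workspace_root.join(rel_path).display().to_string();
    // $EDITOR may be a full path such as "/usr/bin/nvim"; branch on the file name.
    let program = Path::new(editor_env)
        .file_name()
        .and_then(|s| s.to_str())
        .unwrap_or(editor_env);
    match program {
        // VS Code and Cursor take `-g path:line`.
        "code" | "cursor" => (editor_env.to_string(), vec!["-g".into(), format!("{abs}:{line}")]),
        // vim, nvim and most terminal editors accept `+<line> path`.
        _ => (editor_env.to_string(), vec![format!("+{line}"), abs]),
    }
}
```

Keeping this a pure function means it can be unit-tested without spawning a process. The raw-mode suspend/restore stays in the caller.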
altair823	43ff4048e8	feat(kebab-tui): P9-1 Ratatui shell + Library pane The new crate `kebab-tui` imports only `kebab-app`, per the §8 facade rule. A shell built on Ratatui 0.28 + crossterm 0.28 provides: - `App` struct: config + focus + library + 3 Option sub-state slots (search/ask/inspect), under a parallel-safety contract in which p9-2/3/4 fill them from their own modules. Nothing outside p9-1 touches the App definition. - `Pane` enum (Library/Search/Ask/Inspect/Jobs). - `KeyOutcome` (Continue/Quit/SwitchPane/Refresh). - `LibraryState` + inner: docs / list_state / filter / filter_edit / needs_refresh / loading / pending_g. - `render_library` (Frame, area, &App): heading/body, toggleable filter overlay, Korean/wide-char widths computed with unicode-width. - `handle_key_library`: j/k/Down/Up move, gg/G jump to either end, f filter overlay, /=>Search ?=>Ask Enter=>Inspect, q/Esc quit. While the error overlay is shown, any key dismisses it. - Filter overlay: two fields, tags_any (CSV) + lang; Tab cycles, Enter applies→Refresh, Esc cancels. - `ErrorOverlay`: captures the anyhow chain and renders a popup (Clear + red border). - Terminal lifecycle: `TuiTerminal` enters raw mode + alt screen; Drop restores them on exit (including panic) so the user's shell is not left broken. - No async: facade calls run synchronously on the main thread. v1 accepts brief hangs. CLI: adds the `kebab tui` subcommand, which takes --config and runs App::new + run. 10 tests (`tests/library.rs`, using TestBackend): - empty library / 3-doc render / q,Esc quit / "/" switches to Search / "?" switches to Ask - Enter on empty list does nothing / Enter switches to Inspect / j movement (3-step clamp) / f filter overlay → input → Enter Refresh. Test seam: `App::populate_library_for_testing` (#[doc(hidden)]) bypasses the `pub(crate)` inner, keeping the spec's parallel-safety contract intact. Spec deviation (HOTFIXES `2026-05-02 P9-1`): - Removed the `<B: Backend>` generic from `render_library`; ratatui 0.28's Frame is backend-agnostic. - Added `populate_library_for_testing` (test seam, not official API). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 13:26:24 +00:00
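The j/k clamp and the two-key `gg` sequence tracked by `pending_g` can be sketched as a tiny state machine. This is hypothetical: `ListNav` and `on_key` are illustrative names, and keys are plain `char`s rather than crossterm events.

```rust
/// Hypothetical list navigation state; `pending_g` remembers a first `g`.
pub struct ListNav {
    pub selected: usize,
    pub len: usize,
    pub pending_g: bool,
}

impl ListNav {
    pub fn on_key(&mut self, key: char) {
        // Any key consumes a pending `g`; only a second `g` acts on it.
        let was_pending = std::mem::replace(&mut self.pending_g, false);
        match key {
            // Clamp at the last row instead of wrapping.
            'j' => {
                if self.selected + 1 < self.len {
                    self.selected += 1;
                }
            }
            'k' => self.selected = self.selected.saturating_sub(1),
            'G' => self.selected = self.len.saturating_sub(1),
            'g' if was_pending => self.selected = 0, // `gg`: jump to top
            'g' => self.pending_g = true,
            _ => {}
        }
    }
}
```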
altair823	0c8821f857	fix(kebab-store-vector): close P7-3 vector orphan caveat; delete_by_chunk_ids The P7-3 storage UNIQUE bug fix only swept the SQLite side (documents → blocks / chunks / embedding_records). LanceDB vectors live in a separate store, so rows carrying old chunk_ids stayed on disk. Search is unaffected, but disk usage grows without bound. This closes, within the same follow-up PR, the "P+ task" promised in the HOTFIXES `2026-05-02 P7-3` caveat. Changes: - Added the trait method `VectorStore::delete_by_chunk_ids(&[ChunkId])` (with a default no-op, so test fakes / existing impls still compile). - `LanceVectorStore::delete_by_chunk_ids` iterates over every `chunk_embeddings_` table in the connection and runs `Table::delete("chunk_id IN (...)")` in batches of 200. Safe even for multi-model workspaces (e.g. mid-migration). - `SqliteStore::stale_chunk_ids_at(workspace_path, new_asset_id)` returns the old chunk_ids via a read-only SELECT. The caller invokes it *before* the CASCADE runs. - `kebab-app::purge_vector_orphans_for_workspace_path` orchestrates the two steps above. It is called with one line right before the `put_asset_with_bytes` call in each of the three ingest paths (markdown / image / pdf). Smoke verification (release binary, fastembed enabled): - first ingest of whitepaper.pdf → chunk_ids = {f616…, 4e0f…}; rows for both IDs exist in the vector store. - re-ingest after changing bytes → new doc_id (3741…) + new chunk_ids (ed0c…, e13c…). Vector search "REWRITTEN chapter two" → only the new chunk_ids hit. Even when querying the old text "Edited page two body", the old chunk_ids are no longer in the vector store (the semantically closest new chunks hit instead). The HOTFIXES `2026-05-02 P7-3` "vector store cleanup" item was updated from "deferred" → "closed by follow-up PR". The known behavior in SMOKE.md ("old vectors linger") was also updated to "both stores consistent". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 12:32:29 +00:00
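The batch-of-200 delete can be sketched as building one `chunk_id IN (...)` predicate per batch. This is a hypothetical helper. It only renders predicate strings and does not call LanceDB. The single-quote escaping is an assumption about how ids are quoted, and `batch` must be non-zero (`slice::chunks` panics on 0).

```rust
/// Split ids into batches of at most `batch` and render one SQL-style
/// `chunk_id IN (...)` predicate per batch. An empty id list yields no
/// predicates, so the caller issues no delete at all.
pub fn in_predicates(ids: &[&str], batch: usize) -> Vec<String> {
    ids.chunks(batch)
        .map(|group| {
            let quoted: Vec<String> = group
                .iter()
                // Double any single quote so an id cannot break out of the literal.
                .map(|id| format!("'{}'", id.replace('\'', "''")))
                .collect();
            format!("chunk_id IN ({})", quoted.join(", "))
        })
        .collect()
}
```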
altair823	3a57cab1eb	fix(kebab-store-sqlite): purge stale assets row on workspace_path orphan + smoke Fixes a storage-layer bug exposed by the P7-3 integration tests. The gap is between the UNIQUE constraint on `assets.workspace_path` and `upsert_asset_row`, which only handles `ON CONFLICT(asset_id)`. Re-ingesting an asset whose bytes changed produces a new asset_id that hits the secondary UNIQUE constraint at the same workspace_path. md / image / pdf are all affected. Fix: - The new helper `purge_orphan_at_workspace_path` checks for a different `asset_id` at the same `workspace_path` and, if found, sweeps documents → assets in that order. This avoids documents' ON DELETE RESTRICT, and CASCADE cleans up blocks / chunks / embedding_records. In copied mode, the byte file at storage_path is also deleted on a best-effort basis. - Called from both branches of `put_asset_with_bytes` (copy / reference) and from `DocumentStore::put_asset`. - Regression test `put_asset_with_bytes_sweeps_workspace_path_orphan` (replaces the earlier "clean up orphans on UPSERT failure" test, which is no longer doable). - Removed `#[ignore]` from the `re_ingest_edited_pdf_produces_new_doc_id` integration test → all 9 integration tests pass by default. Vector store orphans are a separate P+ task: LanceDB runs independently of the SQLite cascade, so stale chunk_id vectors remain on disk. Search is unaffected (results surface through a SQLite join). Smoke verification (release binary, markdown 2 + image 1 + PDF 2): - doctor pass - first ingest: 5 new - list docs: 5 docs, all media types - search lexical "pdf-page-v1 chunker" → whitepaper.pdf hit - search hybrid → cross-media results - inspect doc PDF: parser_version=pdf-text-v1, blocks are SourceSpan::Page - re-ingest with identical bytes: 5 updated, 0 errors (P1 idempotency) - re-ingest after a byte edit: 1 new (that PDF) + 4 updated, 0 errors (storage fix) - add a corrupt PDF: errors+=1 + accurate IngestItem.error message, no impact on other assets - clean up and ingest again: errors=0 - RAG ask: cites the PDF, and `citations[].citation` correctly exposes `kind: "page"` + `page: <N>` + `path: <pdf_path>` Operational fixture helpers: - `crates/kebab-parse-pdf/examples/gen_smoke_pdf.rs`: generates an in-tree PDF without reportlab/qpdf via `cargo run --release --example gen_smoke_pdf -p kebab-parse-pdf -- <out.pdf> <text-pages>`. - `crates/kebab-parse-image/examples/gen_smoke_png.rs`: generates a PNG fixture the same way. - SMOKE.md documents both examples + the updated HOTFIXES behavior (byte edit: errors+=1 → new+=1). The HOTFIXES `2026-05-02 P7-3` entry was updated from "deferred" → "fixed in same PR"; only the vector store orphan caveat remains. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 11:41:23 +00:00
altair823	5f3a37cafa	feat(kebab-app): P7-3 PDF ingest wiring; kebab ingest now also processes PDF assets Wires the P7-1 (`PdfTextExtractor`) + P7-2 (`PdfPageV1Chunker`) libraries into `kebab-app::ingest_with_config`. Assets that `kebab-source-fs` already classified from `*.pdf` as `MediaType::Pdf` are now indexed as searchable docs. Mirrors the P6-4 image wiring pattern: adds a `MediaType::Pdf` arm to `ingest_one_asset` that branches into the new private fn `ingest_one_pdf_asset`. Core behavior: - Per-medium chunker selection: PDF assets hardcode `PdfPageV1Chunker` (compile-time match). `config.chunking.chunker_version` represents markdown only; PDF is always `pdf-page-v1`. Deviation recorded in HOTFIXES entry `2026-05-02 P7-3`. - Encrypted PDF / corrupt PDF → `errors+=1`, with P7-1's `qpdf --decrypt` hint preserved verbatim in `IngestItem.error`. - Empty/scanned-candidate page → 0 chunks; P7-1's `Provenance::Warning` passes through unchanged. Not searchable in v1; waits on the P+ scanned-PDF OCR fallback. - Determinism stress: no extra `now()` call between extract → chunk (inherits the P6-4 invariant). PDF doc_id and chunk_id are both deterministic. Integration tests (`tests/pdf_pipeline.rs`, 8 passed + 1 ignored): - 3-page text PDF → 1 doc + 3 chunks + Page span verified - identical re-ingest → Updated, same doc_id - encrypted PDF → Error + `qpdf` hint preserved - corrupt header PDF → Error + not stored - mixed pages (page 2 empty) → 2 chunks + 1 Warning - IngestReport arithmetic invariant - 50-page long PDF → ≥50 chunks - inspect doc → SourceSpan::Page round-trip - (ignored) re-ingest with edited bytes → exposes the storage UNIQUE bug, waiting on a P+ fix Additional finding (HOTFIXES `2026-05-02 P7-3`): there is a gap between the UNIQUE constraint on `assets.workspace_path` and `upsert_asset_row`, which only handles `ON CONFLICT(asset_id)`. When bytes change, the new asset_id collides at the same workspace_path. md / image / pdf are all affected. First exposed by the P7-3 integration tests. This PR does not fix it; it is a P+ storage task. Added a PDF section + verification checklist + 4 known behaviors to `docs/SMOKE.md`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 09:28:06 +00:00
altair823	a6e536d3c4	review(p7-3 spec): apply round-1 findings - RAG `kebab ask` integration test → moved to out of scope. Verifying the RAG citation round trip is P4-3's responsibility, and PDF chunks have the same Citation shape. Building wiremock+RAG infrastructure for the P7-3 integration tests would cost far more than this task's invariants justify. - Added a byte-edit re-ingest case. Besides identical bytes (P1 idempotency), it explicitly locks down the byte edit → new doc_id → `new+=1` scenario. - Made the "Embedding call fails" row explicit: `Embedder::embed Err → errors+=1`; the doc + chunk rows are already stored and are retried on rerun. Matches the md path code (lib.rs:621+). - Added one line to the Dispatch section on the operational jump: "PDFs were previously Skipped → after merge they move from skipped to scanned". - Made the `PdfPageV1Chunker::chunk Err` note precise: "P7-1 contract drift OR future routing bug; defensive validation either way". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 09:11:17 +00:00
altair823	aebf900b2f	docs(tasks): P7-3 pdf ingest wiring task spec The P7-1 (`PdfTextExtractor`) + P7-2 (`PdfPageV1Chunker`) libraries are complete, but `kebab-app::ingest` does not dispatch `MediaType::Pdf`, so PDFs are invisible from the CLI. P7-3 covers that wiring, mirroring P6-4's image wiring pattern. Key decisions (spec body): - New private fn `ingest_one_pdf_asset` (mirroring P6-4's `ingest_one_image_asset`). Add a `MediaType::Pdf` arm to the `ingest_one_asset` match. - Per-medium chunker selection: PDF hardcodes `PdfPageV1Chunker` (md keeps `MdHeadingV1Chunker`). `config.chunking.chunker_version` is ignored for PDF ingest (deviation, to be added to HOTFIXES). - Encrypted PDF / corrupt PDF → `errors+=1`, with P7-1's `qpdf --decrypt` guidance preserved as-is in `IngestItem.error`. - Empty/scanned-candidate page → the asset is indexed, empty pages yield 0 chunks, and the `Provenance::Warning` emitted by P7-1 passes through unchanged. Not searchable until a future OCR fallback (out of scope). - Determinism stress: no extra `now()` call between extract → chunk (same invariant as P6-4). - 11 integration tests + smoke update (separate implementation PR). `tasks/INDEX.md` updated: P7 components 2 → 3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 09:07:03 +00:00
altair823	7ee0ac9894	feat(kebab-chunk): P7-2 pdf-page-v1 chunker; page-aware splitting `PdfPageV1Chunker` takes the `CanonicalDocument` emitted by `kebab-parse-pdf` (one block per page, all `SourceSpan::Page`) and produces `Chunk`s that never cross a page boundary. `chunker_version = "pdf-page-v1"`. Core behavior: - If a page's text fits within `target_tokens × BYTES_PER_TOKEN` bytes (`BYTES_PER_TOKEN` = 3), it becomes a single chunk. Otherwise, `\n\n` (paragraph) boundaries and sentence-end punctuation followed by whitespace delimit segments, which are accumulated greedily; each chunk contains at least one segment. - Each subsequent chunk is prefixed with the preceding chunk's last `overlap_tokens × BYTES_PER_TOKEN` characters (char-based; never backtracks past the start of the previous chunk). - Empty/whitespace-only pages yield 0 chunks (already flagged at the `kebab-parse-pdf` stage by the page's `Provenance::Warning`). - Non-PDF doc (a block other than Block::Paragraph, or a SourceSpan other than Page) → explicit error. Spec deviation (HOTFIXES 2026-05-02 P7-2): - `chunk_id` collision guard: when one page yields several chunks, their `block_ids` are identical and the §4.2 recipe collides. This is avoided by varying the `policy_hash` input to `id_for_chunk` per chunk as `format!("{base}#c{char_start}")`. The recipe itself is unchanged, and the `Chunk.policy_hash` field keeps the base value. - `BYTES_PER_TOKEN = 3` (matches the actual md-heading-v1 code). The spec body said "/ 4", but that already disagreed with md-heading-v1's real code, so consistency won. Locked down by a unit test asserting cross-chunker `policy_hash` equality. Tests (10 new): - chunker_version label, 3-page small, 1-page huge + overlap + chunk_id uniqueness, empty page skip, whitespace-only skip, non-PDF error, never produces a cross-page chunk, determinism (1000 runs), stable snapshot shape, same policy_hash as md-heading-v1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 08:51:44 +00:00
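The page-splitting rules above can be sketched on plain text. This is a hypothetical, text-only sketch, not `PdfPageV1Chunker`: it ignores blocks and spans, and it only recognizes ASCII `.`/`!`/`?` + whitespace and `\n\n` as boundaries. It does follow the described rules: byte budget = `target_tokens × 3`, greedy packing with at least one segment per chunk, a char-based overlap capped at the previous chunk's length, and the `{base}#c{char_start}` policy-hash salt.

```rust
const BYTES_PER_TOKEN: usize = 3;

/// Split into segments ending after "\n\n" or after `.`/`!`/`?` + whitespace.
/// All boundary bytes are ASCII, so slicing at them is UTF-8 safe.
fn segments(text: &str) -> Vec<&str> {
    let b = text.as_bytes();
    let (mut out, mut start, mut i) = (Vec::new(), 0, 0);
    while i + 1 < b.len() {
        let boundary = (b[i] == b'\n' && b[i + 1] == b'\n')
            || (matches!(b[i], b'.' | b'!' | b'?') && b[i + 1].is_ascii_whitespace());
        if boundary {
            out.push(&text[start..i + 2]); // keep the terminator in the segment
            start = i + 2;
            i += 2;
        } else {
            i += 1;
        }
    }
    if start < text.len() {
        out.push(&text[start..]);
    }
    out
}

/// Greedy page chunking with a char-based overlap prefix.
pub fn chunk_page(text: &str, target_tokens: usize, overlap_tokens: usize) -> Vec<String> {
    if text.trim().is_empty() {
        return Vec::new(); // empty/whitespace-only page: 0 chunks
    }
    let budget = target_tokens * BYTES_PER_TOKEN;
    if text.len() <= budget {
        return vec![text.to_string()];
    }
    let mut bodies: Vec<String> = Vec::new();
    let mut current = String::new();
    for seg in segments(text) {
        // Flush only if `current` already has a segment: one oversized
        // segment still becomes its own chunk.
        if !current.is_empty() && current.len() + seg.len() > budget {
            bodies.push(std::mem::take(&mut current));
        }
        current.push_str(seg);
    }
    if !current.is_empty() {
        bodies.push(current);
    }
    let overlap = overlap_tokens * BYTES_PER_TOKEN;
    let mut out = Vec::with_capacity(bodies.len());
    for (i, body) in bodies.iter().enumerate() {
        if i == 0 {
            out.push(body.clone());
            continue;
        }
        // Tail of the previous body, never longer than that body itself.
        let prev = &bodies[i - 1];
        let n = prev.chars().count();
        let tail: String = prev.chars().skip(n.saturating_sub(overlap)).collect();
        out.push(format!("{tail}{body}"));
    }
    out
}

/// Per-chunk policy-hash input that keeps chunk_ids unique within a page.
pub fn salted_policy_input(base: &str, char_start: usize) -> String {
    format!("{base}#c{char_start}")
}
```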
altair823	5a158d7343	feat(kebab-parse-pdf): P7-1 text PDF extractor; per-page CanonicalDocument `PdfTextExtractor` (MediaType::Pdf): lopdf-based per-page text extraction. Each page emits a `Block::Paragraph` + `SourceSpan::Page { page, char_start, char_end }`. A page whose body is empty, or whose extraction panics, is marked with an empty paragraph + `Provenance::Warning` ("scanned candidate"); these pages are the input for a later OCR fallback (separate task). Core behavior: - `lopdf::Document::load_mem` + `is_encrypted()` → encrypted PDFs get an explicit error (with `qpdf --decrypt` guidance). - Per-page `extract_text(&[page])` is wrapped in `catch_unwind`, turning malformed-page panics into recoverable warnings. - Best-effort extraction of Title/Producer/Creator from the `/Info` dict. UTF-16BE BOM-prefixed strings are also decoded, so non-ASCII Titles (e.g. Korean) are handled correctly. - 9 integration tests: 3-page emit, scanned-mixed warning, encrypted refuse, corrupt header error, page_count meta, UTF-16BE Title, filename fallback, determinism, snapshot. `parser_version = "pdf-text-v1"`. Allowed deps: `lopdf 0.32` + `pdf-extract 0.7` (as in the original spec). Multilingual OCR fallback for body text is the §9.2 follow-up task (out of scope). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 08:34:55 +00:00
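Decoding a BOM-prefixed `/Info` string can be sketched with `String::from_utf16`. This is a hypothetical helper, not the extractor's code. Strings without the `FE FF` BOM are treated as Latin-1 here. That is a simplification: PDFDocEncoding, which real PDFs use for such strings, differs from Latin-1 in a few code points.

```rust
/// Decode a PDF text string: UTF-16BE when it starts with the BOM FE FF,
/// otherwise bytes are mapped one-to-one to chars (Latin-1 approximation).
pub fn decode_pdf_text_string(bytes: &[u8]) -> Option<String> {
    if let Some(rest) = bytes.strip_prefix(&[0xFE, 0xFF]) {
        if rest.len() % 2 != 0 {
            return None; // truncated code unit
        }
        let units: Vec<u16> = rest
            .chunks_exact(2)
            .map(|p| u16::from_be_bytes([p[0], p[1]]))
            .collect();
        // Rejects unpaired surrogates instead of guessing.
        String::from_utf16(&units).ok()
    } else {
        Some(bytes.iter().map(|&b| b as char).collect())
    }
}
```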
altair823	ca0567c72b	feat(kebab-app): P6-4 image ingest wiring; kebab ingest now also processes PNG/JPEG assets Completes the unfinished stretch where the P6-1/P6-2/P6-3 libraries (`ImageExtractor`, `OllamaVisionOcr`, `apply_caption`) were not reachable from the CLI. `kebab ingest` now indexes image assets end to end in addition to markdown, and `kebab search` / `kebab ask` match and cite images through their OCR text + caption. ## kebab-app - Add `kebab-parse-image` to `[dependencies]`. - On entry to `ingest_with_config`, build `OllamaVisionOcr` / `OllamaLanguageModel` once per ingest session according to the `image.ocr.enabled` / `image.caption.enabled` flags, and share them as trait objects across the asset loop. Thanks to reqwest::blocking::Client's internal Arc, allocation cost does not grow with the number of assets. - Bundle the two adapters + ImageExtractor into an `ImagePipeline` struct to keep `ingest_one_asset`'s parameter list from exploding (addresses clippy::too_many_arguments). - Replace `ingest_one_asset`'s markdown-only guard with `match media_type`: Markdown keeps the existing path, Image(_) branches to the new `ingest_one_image_asset`, and PDF/Audio/Other stay skipped as before. - New `ingest_one_image_asset`: - read bytes → `ImageExtractor::extract` (on failure the caller does errors+=1) - `apply_ocr` (Lenient: on failure, a ProvenanceKind::Warning event + "ocr_failed: ..." in `IngestItem.warnings`; `block.ocr` stays None) - `apply_caption` (same Lenient policy) - call the existing `MdHeadingV1Chunker`; the chunker already emits `Block::ImageRef` as a single chunk - the existing persist + embed sequence, unchanged (byte-identical to markdown) - `lang_hint_from_doc`: maps `Lang("und")` or an empty string to None (done up front on the caller side so the image-pipeline adapters' build_prompt does not silently drop "und"). ## kebab-chunk - Replace the `Block::ImageRef` branch of `render_block_text` with the P6-4 (β) plain-concat policy: join `[alt, ocr.joined, caption.text]` with `\n\n`, dropping empty parts. If alt is empty, fall back to the basename of `src` (a defensive guard on the P6-1 contract). - New unit test `image_ref_p6_4_plain_concat_drops_empty_parts` covers all five cases: alt-only / alt+ocr / alt+caption / alt+ocr+caption / empty alt → src fallback. - Existing `image_ref_emits_own_chunk_zero_tokens` still passes; the chunker's per-block dispatch is unchanged, only text rendering was updated. ## Integration tests (kebab-app/tests/image_pipeline.rs) Ollama is stubbed with wiremock. 5 cases: 1. OCR-only happy path: 1 PNG + ocr.enabled → 1 doc + 1 chunk emitted; `block.ocr.joined` is the mock's "Hello World 2026". 2. OCR + caption both enabled: both fields are filled, and the chunk text contains all three parts (alt + ocr + caption). 3. Lenient failure check: on OCR 503 the asset is still indexed (kind=New), `errors=0`, there is a ProvenanceKind::Warning attributed to "kb-app", and `IngestItem.warnings` has an "ocr_failed:" note. 4. Both disabled: even with `image.ocr.enabled=false && image.caption.enabled=false`, the asset is indexed as 1 chunk (chunk text=filename), with EXIF + dimensions filled as usual. 5. Determinism (re-ingest): ingesting the same PNG twice yields `Updated` + the same `doc_id` on the second run. ## SMOKE.md Added the `kebab search --mode lexical "Hello World"` step to the command sequence. Added example `[image.ocr]` / `[image.caption]` config sections + an ingest time estimate (~5-10 s per asset). Put the "books go through the P7 PDF line" guidance into both the verification checklist and "known behaviors". ## Real Ollama integration check Against 192.168.0.47 + gemma4:e4b: ``` $ kebab --config /tmp/kebab-smoke/config.toml ingest scanned 2 new 2 updated 0 skipped 0 errors 0 (18395 ms) $ kebab inspect doc <image_doc_id> parser_version: image-meta-v1 blocks: [{ alt: "hello.png", ocr: "Hello World 2026", caption: "The image displays the text \"Hello World 2026\" in a large, black, sans-serif font." }] $ kebab --json ask "Hello World 텍스트가 어디에 있나?" --mode hybrid grounded: true citations: [{marker: "[1]", doc_path: "hello.png"}] ``` ## Verification - `cargo test --workspace --no-fail-fast -j 1`: all pass - `cargo clippy --workspace --all-targets -- -D warnings`: pass - `cargo test -p kebab-chunk image_ref`: 2 pass (P1-5 regression + new P6-4 unit) - `cargo test -p kebab-app --test image_pipeline`: 5 pass ## Dependency boundaries - `kebab-app` adds `kebab-parse-image`, as allowed by the spec. - No new forbidden edges (`kebab-tui` / `kebab-desktop` / `kebab-eval` remain unreferenced). - This task adds 0 lines of image-specific business logic; all of it is delegated to `kebab-parse-image`. `tasks/p6/p6-4-image-ingest-wiring.md` status: planned → completed. contract: docs/superpowers/specs/2026-04-27-kebab-final-form-design.md sections: §3.4 ImageRefBlock, §6.1 ingest pipeline, §7.2 Extractor/Chunker traits, §9.1 image extraction policy.	2026-05-02 07:37:56 +00:00
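The (β) plain-concat rendering and the `"und"` → `None` mapping can be sketched as two free functions. These are hypothetical, not `render_block_text` or `lang_hint_from_doc`. They take plain `&str`/`Option<&str>` arguments instead of `Block::ImageRef` fields, and they treat whitespace-only parts as empty (an assumption).

```rust
use std::path::Path;

/// Join alt / OCR / caption with blank lines, dropping empty parts; fall
/// back to the basename of `src` when `alt` is empty.
pub fn render_image_ref_text(alt: &str, src: &str, ocr: Option<&str>, caption: Option<&str>) -> String {
    let alt = if alt.trim().is_empty() {
        Path::new(src)
            .file_name()
            .and_then(|n| n.to_str())
            .unwrap_or(src)
            .to_string()
    } else {
        alt.to_string()
    };
    vec![Some(alt.as_str()), ocr, caption]
        .into_iter()
        .flatten() // drop parts that are absent
        .filter(|part| !part.trim().is_empty()) // drop parts that are blank
        .collect::<Vec<_>>()
        .join("\n\n")
}

/// Map "und" or an empty tag to None so prompt builders never see "und".
pub fn lang_hint(tag: &str) -> Option<&str> {
    match tag.trim() {
        "" | "und" => None,
        t => Some(t),
    }
}
```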
altair823	f98f9a27ab	review(p6-4-spec): apply round-1 findings - Expanded Allowed dependencies to match kebab-app's current Cargo.toml (added the missing kebab-search / kebab-llm / kebab-rag / kebab-embed). Only `kebab-parse-image`, the one dep this task newly adds, is highlighted with a "NEW" label. - Replaced the abstract one-line Forbidden dependencies with an explicit list: `kebab-tui` / `kebab-desktop` (UI layering), `kebab-eval` (cycle), and image-specific business logic inside this crate (kebab-parse-image already handles it). Aligned with the conventions of the P6-1/2/3 specs. - Corrected factual errors about `Chunk` in Public surface: • removed the `chunk.section_label = None` line (no such field) • `chunk.source_span = ...` → `chunk.source_spans = vec![...]` (actual field name + Vec type) • added the fill policy for `token_estimate` / `policy_hash`. - Generalized the LM construction section to "LM / OCR engine construction" and specified that the OCR adapter is also built once per ingest session. - Added a new "Parallelism" section to Behavior contract: the markdown branch is currently sequential, this task stays sequential too, and a time estimate for OCR over 5000 images is included. Consistent with the decision to move books to P7. - Split the Definition of Done into the spec PR (this PR; all items done) and the implementation PR (follow-up), making clear when the spec PR can merge. - Added a doc comment to `is_image_only_document`: the P6-1 contract already guarantees a single ImageRef block, and the comment states the defensive intent of the chunker-side guard. This PR is spec only; the implementation follows in a later PR.	2026-05-02 07:07:19 +00:00
altair823	d643f9fd1a	docs(tasks): P6-4 image ingest wiring task spec The P6-1 / P6-2 / P6-3 library pipeline (`ImageExtractor`, `OllamaVisionOcr`, `apply_caption`) is fully merged, but the dispatch in `kebab-app::ingest_with_config` handles only markdown, so image assets are still not indexed from the CLI. This spec carves that wiring out as a separate component task: the frozen contracts of P6-1/2/3 stay untouched and only the integration proceeds under this task's contract. Key decisions (reflecting the user brainstorming): - Chunking option A: add an image-only document branch to `kebab-chunk::md_heading_v1` that emits a single synthetic chunk. - Chunk text format (β): `<alt>\n\n<ocr.joined>\n\n<caption.text>` plain concatenation. No labels; empty parts dropped. - Failure policy (b) Lenient: if extraction succeeds the doc is stored; partial OCR/caption failures become Provenance Warnings and do not increment the `errors` counter. - LM instance: built once per ingest session and shared as `&dyn LanguageModel`. - Books / scanned PDFs: outside P6-4's scope; the P7 PDF line owns them. - P6-5 (image-scale-hardening) not started: unnecessary now that the user scenario has narrowed to "diagrams / screenshots / camera photos". INDEX.md: P6 "3 components" → "4 components". contract: docs/superpowers/specs/2026-04-27-kebab-final-form-design.md sections: §3.4 ImageRefBlock, §6.1 ingest pipeline, §7.2 Extractor/Chunker traits, §9.1 image extraction policy.	2026-05-02 07:01:56 +00:00
altair823	9c644245fb	review(p6-3): apply round-1 findings - New module `crates/kebab-parse-image/src/image_prep.rs`: extracts a single downscale helper (`downscale_to_png`) shared by OCR + caption + future PDF/video, consolidating the two nearly identical copies of the algorithm in ocr.rs / caption.rs. The trailing 1px clamp, the PNG passthrough hot path, and the error message pattern now live in one place. - src/ocr.rs: removed `downscale_to_long_edge` in favor of calling `image_prep::downscale_to_png`. Also cleaned up the `image::ImageReader / ImageFormat / Cursor` imports. - src/caption.rs: • Resolved the asymmetry in disabled handling between `caption_image` and `apply_caption`. `caption_image` is the raw operation (no gate); only `apply_caption` checks the `cfg.image.caption.enabled` gate, so a given function always has the same semantics for its callers. • `apply_caption` now makes 0 `String::clone` calls for caption.model / model_version instead of 2, by building ProvenanceEvent.note before caption is moved. • Downscale logic fully delegated to image_prep. • `MIN_CAPTION_LONG_EDGE` / `MAX_CAPTION_LONG_EDGE` exposed as `pub const` (consistent with the visibility convention of P6-2's `MAX_DECODE_DIM`). - tests/caption.rs: • `caption_image_errors_when_feature_disabled` replaced by `caption_image_runs_regardless_of_enabled_flag`, which verifies the new split of responsibilities. • `caption_image_clamps_oversized_max_pixels` references the constant `kebab_parse_image::caption::MAX_CAPTION_LONG_EDGE` instead of the literal 1536. - tasks/HOTFIXES.md: added a paragraph on the `model_version` format deviation (the spec's literal `provider` extended to `<provider>/<prompt_template_version>`, with rationale). cargo test -p kebab-parse-image: 42 pass + 2 ignored (13 unit + 12 P6-1 + 8 P6-2 + 9 P6-3). cargo clippy --workspace --all-targets -- -D warnings: pass.	2026-05-02 06:11:56 +00:00
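The long-edge clamp shared by OCR and caption can be sketched as pure size arithmetic. This is hypothetical. `target_dims` and the constant names are illustrative; only the `[128, 1536]` range and the 1px floor come from the commits. Decoding and PNG re-encoding (the `image` crate's job) are omitted.

```rust
pub const MIN_LONG_EDGE: u32 = 128;
pub const MAX_LONG_EDGE: u32 = 1536;

/// Clamp the requested long edge into [MIN, MAX], then compute the scaled
/// (width, height) with the aspect ratio preserved. Images already within
/// the limit keep their size (the passthrough case). Each side is floored
/// at 1 px so very thin images never round to zero.
pub fn target_dims(width: u32, height: u32, requested_long_edge: u32) -> (u32, u32) {
    let limit = requested_long_edge.clamp(MIN_LONG_EDGE, MAX_LONG_EDGE);
    let long = width.max(height);
    if long == 0 || long <= limit {
        return (width, height);
    }
    let scale = |side: u32| -> u32 {
        // u64 math avoids overflow; adding long/2 rounds to nearest.
        let scaled = (side as u64 * limit as u64 + long as u64 / 2) / long as u64;
        (scaled as u32).max(1)
    };
    (scale(width), scale(height))
}
```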
altair823	cd2213e48d	feat(kebab-parse-image): P6-3 caption adapter — vision LM via trait - New module `crates/kebab-parse-image/src/caption.rs`: • `caption_image(llm, bytes, lang_hint, cfg)`: runs on top of `&dyn LanguageModel`. A vision LM (e.g. gemma4:e4b) outputs a one-sentence objective description; temperature=0 / seed=0 keep it deterministic. • `apply_caption(llm, bytes, block, lang_hint, cfg, events)`: sets `block.caption = Some(...)` and appends one ProvenanceKind::CaptionApplied event. With `image.caption.enabled = false` it is a clean no-op (`Ok(())`). On LM failure, block.caption stays None and no event is recorded. • The downscale long edge is clamped to `[128, 1536]`. The PNG passthrough hot path is preserved; everything else is decoded once and re-encoded as PNG. • Korean / English prompt branch (lang_hint="ko"/"kor" → Korean). • `ModelCaption.model_version = "<provider>/<prompt_template_version>"` (e.g. "ollama/caption-v1"), so prompt or model regressions can be audited. ## kebab-core / kebab-llm-local changes - Added an `images: Vec<String>` field to `kebab_core::GenerateRequest`; `#[serde(default)]` keeps existing wire payloads / snapshots compatible. - `kebab-llm-local::OllamaLanguageModel` routes req.images into Ollama's `images: [base64, ...]` wire field. With `#[serde(skip_serializing_if = is_empty)]`, the wire shape is byte-identical to pre-P6-3 when the list is empty. ## kebab-config - New `ImageCfg.caption: CaptionCfg`: - `enabled: bool` (default false) - `max_pixels: u32` (default 768, clamped to [128, 1536]) - `prompt_template_version: String` (default "caption-v1") - Added three environment variables, `KEBAB_IMAGE_CAPTION_{ENABLED,MAX_PIXELS,PROMPT_TEMPLATE_VERSION}`. ## Spec deviations Added a 2026-05-02 entry to `tasks/HOTFIXES.md`: - Symptom 1: the p6-3 spec signature takes `&dyn LanguageModel`, but the frozen trait and GenerateRequest do not support vision. → Extended the trait. - Symptom 2: the spec's cargo feature `caption` (default OFF at compile time) was folded into a single runtime gate. With no extra deps beyond base64/image/kebab-llm, a cargo feature would save almost nothing in binary size. Amendments to the p4-1 / p4-2 / p6-3 specs are recorded. ## Tests `cargo test -p kebab-parse-image --test caption`: 9 + 1 ignored: - feature gate (disabled → no-op / Err on direct call) - happy path (block.caption Some + Provenance CaptionApplied) - empty token stream → empty text + caption.is_some() - req.images routing verified with CapturingMock (one base64 image, decodable) - Korean / English prompt branch (CapturingMock captures the system prompt) - LM Err → block.caption stays None, no events recorded - determinism (same mock input → same caption) - max_pixels clamp (99999 → 1536; 4000×3000 PNG downscale verified) - opt-in integration (real Ollama at 192.168.0.47 / gemma4:e4b → "The image is a solid red color." verified, 4.3 s) `cargo test --workspace --no-fail-fast -j 1`: all pass. `cargo clippy --workspace --all-targets -- -D warnings`: pass. ## Dependency boundaries - New deps: `kebab-llm` (trait only), `base64` (already added in P6-2). - dev-deps: `kebab-llm/mock` for `MockLanguageModel`, and `kebab-llm-local` (integration tests only; not a runtime dep). - No forbidden-dependency violations: `kebab-source-fs / parse-md / normalize / chunk / store-* / embed* / search / rag / UI` are not referenced. contract: docs/superpowers/specs/2026-04-27-kebab-final-form-design.md sections: §3.4 ImageRefBlock.caption, §3.7a ModelCaption, §9.1 caption (model-generated, low trust).	2026-05-02 06:05:39 +00:00
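Three small rules in the P6-3 caption adapter can be sketched directly: the `[128, 1536]` long-edge clamp, the `ko`/`kor` prompt-language branch, and the `<provider>/<prompt_template_version>` model-version string. The constant names come from the commits; the function names, the `PromptLang` enum, and the `Option<&str>` hint type are assumptions for illustration.

```rust
/// Long-edge bounds for caption downscaling (names as in the commits).
pub const MIN_CAPTION_LONG_EDGE: u32 = 128;
pub const MAX_CAPTION_LONG_EDGE: u32 = 1536;

/// Clamp a configured `image.caption.max_pixels` value into [128, 1536].
fn effective_caption_long_edge(max_pixels: u32) -> u32 {
    max_pixels.clamp(MIN_CAPTION_LONG_EDGE, MAX_CAPTION_LONG_EDGE)
}

#[derive(Debug, PartialEq)]
enum PromptLang {
    Korean,
    English,
}

/// "ko" / "kor" select the Korean prompt; any other hint (or none) selects English.
fn prompt_lang(lang_hint: Option<&str>) -> PromptLang {
    match lang_hint {
        Some("ko") | Some("kor") => PromptLang::Korean,
        _ => PromptLang::English,
    }
}

/// Build `ModelCaption.model_version` as "<provider>/<prompt_template_version>".
fn caption_model_version(provider: &str, prompt_template_version: &str) -> String {
    format!("{}/{}", provider, prompt_template_version)
}
```

Encoding the prompt template version next to the provider is what lets a later audit tell a prompt regression apart from a model regression.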
altair823	4ed5536c92	feat(kebab-parse-image): P6-2 OCR adapter — Ollama-vision default - New module `crates/kebab-parse-image/src/ocr.rs`: the spec's `OcrEngine` trait as-is, plus an `OllamaVisionOcr` default implementation and an `apply_ocr` helper. - `OllamaVisionOcr`: non-streaming call to `<endpoint>/api/generate`, passing the image in the `images: [base64]` field; the prompt includes the language hint and the whitelisted language list. The response prose becomes `OcrText.joined`, wrapped as a single region covering the whole prepared image (confidence 1.0). Default model `gemma4:e4b`. An empty endpoint falls back to `models.llm.endpoint`. - Image preprocessing: if the long edge exceeds `config.image.ocr.max_pixels` (default 1600, clamped to 256–4096), the image is resized and re-encoded as PNG (image::imageops::resize, Triangle filter). PNG input within the limit is passed through zero-copy. - On OCR success, `apply_ocr` sets block.ocr to Some and appends a ProvenanceKind::OcrApplied event. On failure, block.ocr stays None and no provenance is recorded (no partial state leaks). - `kebab-config`: new `ImageCfg.ocr: OcrCfg` block (enabled/engine/model/endpoint/languages/max_pixels). `#[serde(default)]` keeps pre-P6 TOML compatible. Added five `KEBAB_IMAGE_OCR_*` environment variables. ## Spec deviation The original P6-2 spec named Tesseract as the default OCR engine, but the default was switched to Ollama-vision to avoid installing the `libtesseract-dev` system package on dev / CI hosts. The `OcrEngine` trait abstraction is preserved exactly as specified, so Tesseract / Apple Vision / PaddleOCR adapters can be added later behind feature gates using the same trait. See the 2026-05-02 entry in `tasks/HOTFIXES.md` for details. On trust: a vision LM can hallucinate. The `OcrText.engine = "ollama-vision"` field lets consumers apply per-engine trust decisions. ## Tests - New (`tests/ocr.rs`, 8 + 1 ignored): - 200 happy path → OcrText decoding (joined / engine / engine_version / region count / bbox / confidence) - empty response → empty regions - 5xx → Err including status + body - 200 error envelope → Err - apply_ocr → block.ocr Some + one Provenance OcrApplied - apply_ocr error → block.ocr stays None, no events recorded - 4000×3000 PNG → downscaled to max_pixels=1024 with aspect ratio preserved - from_parts max_pixels clamp - opt-in `KEBAB_OCR_INTEGRATION=1` integration (transcription of "Hello World 2026" verified against the real Ollama at 192.168.0.47 with `gemma4:e4b`) - New (`src/ocr.rs` unit): truncate, build_prompt language/hint handling - `kebab-config` tests +3: defaults, env override, pre-P6 TOML compatibility Overall: `cargo test -p kebab-parse-image` 28 pass + 1 ignored, `cargo test -p kebab-config` 20 pass, `cargo clippy --workspace --all-targets -- -D warnings` pass. contract: docs/superpowers/specs/2026-04-27-kebab-final-form-design.md sections: §3.4 ImageRefBlock.ocr, §3.7a OcrText / OcrRegion, §9.1 OCR vs caption provenance.	2026-05-02 05:38:24 +00:00
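The "no partial state" rule for `apply_ocr`, together with the endpoint fallback and the 256–4096 clamp, can be sketched as below. The types are simplified stand-ins (hypothetical, not the `kebab-core` definitions), and the OCR call is abstracted as a closure so the sketch stays std-only.

```rust
/// Simplified stand-ins for the real types (hypothetical shapes).
#[derive(Debug, Clone, PartialEq)]
struct OcrText {
    joined: String,
    engine: String,
}

#[derive(Debug, Default)]
struct ImageRefBlock {
    ocr: Option<OcrText>,
}

#[derive(Debug, PartialEq)]
enum ProvenanceKind {
    OcrApplied,
}

/// Run OCR and commit the result only on success: on `Err` the early return
/// leaves `block.ocr` as `None` and records no provenance event.
fn apply_ocr<F>(run_ocr: F, block: &mut ImageRefBlock, events: &mut Vec<ProvenanceKind>) -> Result<(), String>
where
    F: FnOnce() -> Result<OcrText, String>,
{
    let text = run_ocr()?;
    block.ocr = Some(text);
    events.push(ProvenanceKind::OcrApplied);
    Ok(())
}

/// An empty OCR endpoint falls back to `models.llm.endpoint`.
fn resolve_ocr_endpoint<'a>(ocr_endpoint: &'a str, llm_endpoint: &'a str) -> &'a str {
    if ocr_endpoint.is_empty() {
        llm_endpoint
    } else {
        ocr_endpoint
    }
}

/// `config.image.ocr.max_pixels` is clamped to 256..=4096.
fn clamp_ocr_max_pixels(max_pixels: u32) -> u32 {
    max_pixels.clamp(256, 4096)
}
```

Doing the fallible call first and mutating only afterwards is what guarantees that a failed OCR never leaves a block with an event but no text, or text but no event.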
altair823	d11a810119	feat(kebab-parse-image): P6-1 image extractor + EXIF whitelist - Added the new crate kebab-parse-image (the 19th crate in the workspace). Implements ImageExtractor, which converts MediaType::Image(_) assets into a single-block CanonicalDocument. - parser_version "image-meta-v1" (§9 versioning). - The body contains only one Block::ImageRef; the OCR / caption fields are left as None, to be filled in by P6-2 / P6-3. - EXIF whitelist (§9.1, minimizing the PII surface): Make / Model / Software / DateTimeOriginal / Orientation / GPSLatitude(+Ref) / GPSLongitude(+Ref). MakerNote / Thumbnail / all other tags are discarded. DateTime is converted from EXIF "YYYY:MM:DD HH:MM:SS" to ISO-8601. GPS DMS triple + N/S/E/W ref → signed decimal degrees. - Dimensions: (w, h, format) are obtained by reading only the header via image::ImageReader. Exceeding the 16k×16k cap or a decode failure → metadata.user.dimensions = null plus a Provenance Warning event (not an Err). Failure to recognize the format at all → anyhow::Error (the caller skips the asset). - SourceSpan::Region { 0, 0, w, h } marks the whole image area. Determinism: same bytes + same parser_version → same doc_id + block_id (using the §4.2 ID recipe unchanged). - metadata.source_type = Reference, trust_level = Primary, lang = "und". title = file name without extension, alt = file name. - Dependency boundaries (§8): only kebab-core, plus image 0.25 (default features off; png/jpeg/webp/gif/tiff only), kamadak-exif 0.6, anyhow / serde / serde_json / time / tracing / thiserror. The kebab-source-fs · parse-md · store-* · embed* · llm* · rag · UI crates are not referenced. - 14 tests (4 unit + 10 integration): • PNG dimension extraction, JPEG EXIF GPS extraction (DMS → decimal accurate to 1e-6), PNG without EXIF → empty map, corrupt PNG → warning + null dims (no panic), unrecognizable bytes → Err, determinism, snapshot, supports() matching, rejection of a mismatched media_type. • Fixtures are generated in memory (PNG via the image crate; the EXIF JPEG by building an EXIF blob with the kamadak Writer and splicing it in as APP1 right after SOI), so no binary fixtures are committed. - HEIC / RAW are out of scope for v1 per the spec (not supported by the image crate; an Apple Vision sidecar will cover them later, in P+). - tasks/p6/p6-1-image-extractor-exif.md status: planned → completed. contract: docs/superpowers/specs/2026-04-27-kebab-final-form-design.md sections: §3.4 Block::ImageRef + ImageRefBlock, §3.7a OcrText / ModelCaption stubs, §9.1 image extraction policy, §9 versioning.	2026-05-02 05:05:47 +00:00
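The two EXIF normalizations in P6-1 are simple enough to sketch directly. This is an illustrative, std-only version, not the crate's code: the function names are hypothetical, the DMS components are assumed to be already converted from EXIF rationals to `f64`, and the ISO-8601 output carries no UTC offset because EXIF `DateTimeOriginal` does not include one.

```rust
/// Convert an EXIF GPS degrees/minutes/seconds triple plus its N/S/E/W
/// reference into signed decimal degrees (south and west are negative).
/// Returns `None` for an unknown reference letter.
fn dms_to_decimal(deg: f64, min: f64, sec: f64, reference: char) -> Option<f64> {
    let magnitude = deg + min / 60.0 + sec / 3600.0;
    match reference {
        'N' | 'E' => Some(magnitude),
        'S' | 'W' => Some(-magnitude),
        _ => None,
    }
}

/// Convert EXIF "YYYY:MM:DD HH:MM:SS" to ISO-8601 "YYYY-MM-DDTHH:MM:SS".
/// Returns `None` unless the input has exactly that shape (digits only).
fn exif_datetime_to_iso8601(raw: &str) -> Option<String> {
    let (date, time) = raw.trim().split_once(' ')?;
    let d: Vec<&str> = date.split(':').collect();
    let t: Vec<&str> = time.split(':').collect();
    // `&&` short-circuits, so d[0] is only indexed once d has 3 parts.
    let shape_ok = d.len() == 3
        && t.len() == 3
        && d[0].len() == 4
        && d[1..].iter().chain(t.iter()).all(|p| p.len() == 2);
    let digits_ok = d.iter().chain(t.iter()).all(|p| p.bytes().all(|b| b.is_ascii_digit()));
    if !(shape_ok && digits_ok) {
        return None;
    }
    Some(format!("{}-{}-{}T{}", d[0], d[1], d[2], time))
}
```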
altair823	f9714aa5cb	docs(rename): kb → kebab — README, tasks/, docs/, design doc, report Final commit of the rename: bulk-updates the word `kb` in every .md file. - The 19 crate names (`kb-core`, `kb-app`, …) → `kebab-` (including the Rust module path notation `kb_` → `kebab_`). - Future components (`kb-tui`, `kb-desktop`, `kb-asr-whisper`, `kb-ocr`, `kb-mcp`, `kb-vlm`, `kb-rerank`, `kb-vision-ocr`, `kb-index`, `kb-smoke`, `kb-architecture`) → `kebab-` (the same prefix will be used when P6+ starts). - CLI command examples: `kb ingest` / `kb search` / `kb ask` / `kb init` / `kb doctor` / `kb inspect` / `kb list` / `kb eval` → `kebab <verb>`, in both fenced code blocks and inline backticks. - XDG paths, env vars, and the binary path (`target/release/kb` → `target/release/kebab`) synchronized. - Every reference unified across the design doc / initial report / SMOKE / HOTFIXES / phase epics / task specs. - `git -c user.name=kb` in task-decomposition.md is left as-is because it is author information recording past git history (the author in actual git history cannot be changed). - Also flips `status: planned` → `completed` in `tasks/phase-5-evaluation.md` (missed after the P5-1 + P5-2 PRs merged). ## Verification - `grep -rEn "\bkb-[a-z]\|\bkb_[a-z]\|\.config/kb\b\|kb\.sqlite\|\bKB_[A-Z]" --include="*.md"`: 0 hits (excluding the git author in task-decomposition.md). - All file path references remain valid (every renamed file is updated to its new path). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 04:01:55 +00:00
altair823	911fb49550	refactor(rename): kb crates → kebab — Cargo packages, folders, Rust modules First step of renaming the project from `kb` to `kebab`. - workspace `Cargo.toml`: members `crates/kb-` → `crates/kebab-`, repository URL `altair823/kb` → `altair823/kebab`. - Renamed 18 crate folders via `git mv` (history preserved). - Each crate's `Cargo.toml`: `name = "kb-"` → `"kebab-"`, path deps `../kb-` → `../kebab-`. - All `.rs` files: the 18 snake-case module paths `kb_<id>` (`kb_core`, `kb_config`, `kb_app`, `kb_cli`, `kb_eval`, `kb_search`, `kb_chunk`, `kb_normalize`, `kb_source_fs`, `kb_parse_md`, `kb_parse_types`, `kb_store_sqlite`, `kb_store_vector`, `kb_embed`, `kb_embed_local`, `kb_llm`, `kb_llm_local`, `kb_rag`) → `kebab_<id>` via a bulk sed (using the word boundary `\b` so the "kb" abbreviation in English sentences is left untouched). The CLI binary name (`[[bin]] name = "kb"`), the `KB_*` environment variables, XDG paths, the tracing target, and the docs sweep follow in the next commit. ## Verification - `cargo check --workspace` clean; committed only after every crate built. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 03:28:08 +00:00
altair823	ee1f2339dd	fix(p5-2): apply push-time review items — citation/refusal correctness + nits Applies 4 should-fix items and 5 nits from two reviewers before push. ## should-fix - `citation_coverage`: stops an empty citations[] from leaking a 1.0 through the vacuous truth of `Iterator::all`; the check is now `!is_empty() && all(non-empty path)`. Also removes the dead `_store: &SqliteStore` argument from the signature (call sites and test helpers cleaned up). - `refusal_correctness`: in a lexical-only run, queries with `answer == None` no longer increment the denominator (output is NaN/null); counting them as automatic failures distorted what the metric means. New unit test `refusal_correctness_nan_for_non_rag_run`. - `groundedness`: goldens with `must_contain.is_empty() && forbidden.is_empty()` are excluded from the denominator so unconfigured entries do not get a free 1.0. New unit test `groundedness_skips_unconfigured_goldens`. - Corrected a factual error in the rationale comment in `kb-cli/Cargo.toml`: kb-eval depends on kb-app, not the other way around. ## nits - Deduplicated `KB_EVAL_GOLDEN` / `DEFAULT_GOLDEN_PATH` into a single `pub(crate)` definition in `metrics::`, imported by `runner`. - `render_report_md` now renders `ComparisonKind` through an explicit lowercase mapping function (`win`/`loss`/`draw`/`regression`) instead of `{:?}`, matching the JSON serialization convention. - Added a defensive comment on the silent risk of a `None == None` match in `extract_chunker_version`. - Replaced the `let mut` suppression hack in the `delta_null_when_either_nan` test with struct update syntax. - Removed dead code: the `empty_store` test helper and the per-call `mem::forget(tmp)`. ## Spec doc additions Added 4 items to the deviations section of `tasks/p5/p5-2-metrics-compare.md`: - `kb-eval`'s crate-level `kb-app` dep: inherited from P5-1; the new module surface does not import it. - `citation_coverage` uses a weakened resolver while waiting for `document_exists_by_path`. - `refusal_correctness` is NaN for non-RAG runs. - `groundedness` skips goldens with no checks. ## Verification - `cargo test -p kb-eval` 35/35 (18 unit + 2 loader + 8 integration + 7 runner; 3 new unit tests). - `cargo clippy --workspace --all-targets -- -D warnings` clean. - `compare_report_snapshot_matches_fixture` passes unchanged: the new behavior does not affect the snapshot inputs (lexical-only, no must_contain, no should-refuse). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 03:17:32 +00:00
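The three denominator fixes in this review can be illustrated with a std-only sketch. The function and struct names are hypothetical, and so are the exact metric definitions: refusal correctness is taken here as "the refusal decision matches `should_refuse`", and a configured golden counts as grounded when the answer contains every `must_contain` string and no `forbidden` string.

```rust
/// Per-answer citation check: `Iterator::all` is vacuously true on an empty
/// slice, so the non-empty guard stops an answer with no citations from scoring 1.0.
fn citations_covered(citation_paths: &[&str]) -> bool {
    !citation_paths.is_empty() && citation_paths.iter().all(|p| !p.is_empty())
}

/// Ratio with a zero denominator mapped to NaN (reported as JSON null).
fn ratio(numerator: usize, denominator: usize) -> f64 {
    if denominator == 0 {
        f64::NAN
    } else {
        numerator as f64 / denominator as f64
    }
}

struct QueryOutcome {
    should_refuse: bool,
    /// `None` when the run produced no answer (e.g. a lexical-only run).
    refused: Option<bool>,
}

/// Queries without an answer are left out of the denominator entirely.
fn refusal_correctness(outcomes: &[QueryOutcome]) -> f64 {
    let answered: Vec<&QueryOutcome> = outcomes.iter().filter(|o| o.refused.is_some()).collect();
    let correct = answered.iter().filter(|o| o.refused == Some(o.should_refuse)).count();
    ratio(correct, answered.len())
}

struct GroundednessCheck<'a> {
    must_contain: Vec<&'a str>,
    forbidden: Vec<&'a str>,
    answer: &'a str,
}

/// Goldens with neither `must_contain` nor `forbidden` are skipped, so they
/// cannot contribute a free 1.0.
fn groundedness(checks: &[GroundednessCheck<'_>]) -> f64 {
    let configured: Vec<&GroundednessCheck<'_>> = checks
        .iter()
        .filter(|c| !(c.must_contain.is_empty() && c.forbidden.is_empty()))
        .collect();
    let grounded = configured
        .iter()
        .filter(|c| {
            c.must_contain.iter().all(|m| c.answer.contains(*m))
                && !c.forbidden.iter().any(|f| c.answer.contains(*f))
        })
        .count();
    ratio(grounded, configured.len())
}
```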
altair823	d9a5b88d27	feat(p5-2): kb-eval metrics + compare — AggregateMetrics, CompareReport, kb eval CLI Implements P5-2 on top of the stored eval_runs / eval_query_results: - `kb_eval::metrics`: computes hit@k / MRR / recall@k_doc / citation_coverage / groundedness / empty_result_rate / refusal_correctness. NaN metrics (zero denominator) become JSON null. Rounding to 4 decimals plus an added Deserialize lets aggregate_json round-trip. - `kb_eval::compare`: compares two runs → CompareReport (per-metric Δ + per-query Win/Loss/Draw/Regression). On chunker_version drift it falls back gracefully to doc-id matching (chunker_version_match: "fallback_doc"); with the `strict` option it refuses. - `render_report_md`: human-readable Markdown (aggregates + Wins/Losses/Regressions tables). - Added `SqliteStore::{load_eval_run, load_eval_query_results, update_eval_run_aggregate}` plus owned `EvalRunRecord` / `EvalQueryResultRecord`; the borrowed shape on the write side is unchanged. - `kb eval` CLI: `run` (delegates to P5-1), `aggregate <id>`, `compare <a> <b> [--strict-chunker-version] [--write-report]`. `--json` prints the raw CompareReport; the default output is Markdown. ## Spec deviations (intentional, documented) - Graceful matching is doc-id-only (chunker_version_match: "fallback_doc"). The 50% span-overlap match is deferred to P6+ because chunks from both sides cannot realistically be kept at the same time after a chunker re-index. - Added `*_with_config` helpers so integration tests can drive them with a TempDir Config; the no-arg forms delegate to Config::load(None). - The CLI is wired directly kb-cli → kb-eval (avoiding a kb-app cycle). The DoD's "via kb-app" intent was a single facade, but that would create a dependency cycle. - Added `AggregateMetrics: Deserialize` for the aggregate_json round-trip. ## Verification - `cargo test -p kb-eval` 30/30 (15 unit + 2 loader + 8 metrics+compare integration + 7 runner). One of the 8 integration tests is a snapshot (`compare-1.json`). - `cargo test -p kb-store-sqlite` 33/33. - `cargo clippy --workspace --all-targets -- -D warnings` clean. - No forbidden imports (`kb-source-fs\|kb-parse\|kb-normalize\|kb-chunk\|kb-store-vector\|kb-embed\|kb-search\|kb-llm\|kb-rag\|kb-tui\|kb-desktop\|kb-app`; kb-app is absent from the metrics/compare modules and used only by runner). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 03:05:13 +00:00
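A minimal sketch of hit@k, MRR, and the "NaN becomes JSON null after 4-decimal rounding" rule. It assumes each query yields a ranked list of ids and a set of relevant ids; `QueryRun`, `first_relevant_rank`, and `round4` are hypothetical names, and the real `kb_eval::metrics` may differ in what it ranks (chunks vs. documents).

```rust
struct QueryRun<'a> {
    ranked: Vec<&'a str>,   // retrieved ids, best first
    relevant: Vec<&'a str>, // expected ids from the golden set
}

/// 1-based rank of the first relevant id, if any was retrieved.
fn first_relevant_rank(run: &QueryRun<'_>) -> Option<usize> {
    run.ranked.iter().position(|id| run.relevant.contains(id)).map(|i| i + 1)
}

/// Fraction of queries with a relevant id in the top `k`; NaN with no queries.
fn hit_at_k(runs: &[QueryRun<'_>], k: usize) -> f64 {
    if runs.is_empty() {
        return f64::NAN;
    }
    let hits = runs
        .iter()
        .filter(|r| matches!(first_relevant_rank(r), Some(rank) if rank <= k))
        .count();
    hits as f64 / runs.len() as f64
}

/// Mean reciprocal rank: 1/rank of the first relevant id, 0 when none; NaN with no queries.
fn mrr(runs: &[QueryRun<'_>]) -> f64 {
    if runs.is_empty() {
        return f64::NAN;
    }
    let total: f64 = runs
        .iter()
        .map(|r| first_relevant_rank(r).map_or(0.0, |rank| 1.0 / rank as f64))
        .sum();
    total / runs.len() as f64
}

/// Round to 4 decimals; NaN becomes `None`, which serializes as JSON null.
fn round4(x: f64) -> Option<f64> {
    if x.is_nan() {
        None
    } else {
        Some((x * 10_000.0).round() / 10_000.0)
    }
}
```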
altair823	58a11cc2b8	feat(p5-1): kb-eval crate — golden-fixture runner + eval persistence - new kb-eval crate: load_golden_set (YAML) + run_eval (per-query search/ask + persistence) - new kb-store-sqlite::eval module: record_eval_run_with_results (transactional), document_exists / chunk_exists probes - fixtures/golden_queries.yaml: 5-entry KO+EN template - tests: 13 pass (loader: parse, dup-id, missing chunk_id; runner: elapsed, snapshot, error capture, JSONL, determinism, persistence, config_snapshot) - per_query.jsonl mirror written to runs_dir/<run_id>/ - temperature=0 + fixed seed → byte-identical per_query.jsonl (lexical) deviations from spec (documented in code): - run_id uses uuid::Uuid::now_v7().simple() (timestamp-ordered hex) instead of ULID — uuid already in workspace deps - load_golden_set_validated kept #[cfg(test)] pub(crate) — production inlines validate_against_db - snapshot fixture uses normalized projection (id/query/mode/first_hit) — full byte-determinism covered by separate test - index_version in config_snapshot left null (composed per call by kb-app, not config-level) deferred to follow-up: - App reuse across queries (currently rebuilds App per query) - expand_path hoist to kb-config (3 crate clones now) - --max-queries flag (deferred to P5-2 per updated spec) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 18:01:09 +00:00
altair823	d1b99b2994	docs: mark P0–P4 done, add SMOKE recipe, refresh README State drift after P0–P4 completion + 3 post-merge hotfixes (PR #20 --config across subcommands, PR #24 --config in kb ask, PR #25 RRF fusion_score normalization). README still framed the project as "spec frozen, code 0 lines"; phase docs and task specs all carried status: planned. Sweep: - README.md: top banner now "P0–P4 done (17/31 tasks) + 3 hotfixes applied"; command table marks each subcommand's owning phase and current status (kb ask = ✅ via P4-3, kb eval = ⏳ P5); phase roadmap table grew a Status column (P0–P4 completed, P5 next, P6–P9 pending); component count bumped 30 → 31 to reflect P3-5 (app-wiring, post-spec); core decisions table notes the RRF [0,1] normalization invariant; the "build+실행" (build + run) section drops the "P0 미시작" (P0 not started) caveat; new pointers to HOTFIXES.md and SMOKE.md. - docs/SMOKE.md (new): ~80-line recipe for running the full pipeline against an isolated /tmp/kb-smoke/ workspace via --config, without polluting ~/.config/kb/ or ~/.local/share/kb/. Covers fixture seeding, sample config.toml with the post-merge defaults, doctor → ingest → list → search × 3 modes → inspect → ask sequence, verification checklist, and known-behaviour notes (fastembed model download, RAG response time, --config hard-fail on missing path). - tasks/phase-{0..4}-*.md: status frontmatter flipped planned → completed. - tasks/p0/, tasks/p1/, tasks/p2/, tasks/p3/, tasks/p4/: same status flip across all 17 component task specs (1+6+2+5+3). P5–P9 stay planned. cargo test --workspace: 319 passed; clippy clean (no source changes in this commit, just docs + frontmatter). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:32:28 +00:00
altair823	9dde01eb9f	fix(rag): normalize RRF fusion_score to [0,1] + log post-merge hotfixes ## Bug config.rag.score_gate default 0.05 was incompatible with hybrid RRF fusion_score: raw RRF tops out at num_retrievers / (k_rrf + 1) ≈ 0.0328 at the default k_rrf=60, so every hybrid `kb ask` tripped ScoreGate refusal even when the top hit was perfectly aligned across both retrievers. Symptomatic on the post-P4-3 manual smoke at /tmp/kb-smoke/ pointed at 192.168.0.47 Ollama: $ kb ask "Rust ownership 모델의 핵심 규칙은 뭐야?" --mode hybrid (query: "What are the core rules of the Rust ownership model?") 근거 부족. KB에 해당 내용 없음. (refusal: "Insufficient evidence. The KB has no matching content.") # top fusion_score = 0.0164 Per-mode score_gate (lexical_score_gate / vector_score_gate / hybrid_score_gate) was rejected because it forces every consumer (CLI, eval, TUI) to know which mode picks which threshold. Score normalization solves it at the source. ## Fix crates/kb-search/src/hybrid.rs divides every fused score by 2 / (k_rrf + 1), the theoretical RRF maximum with two retrievers each contributing rank 1. After normalization: - both retrievers agree on rank 1 → fusion_score = 1.0 - only one retriever finds the chunk → caps near 0.5 - typical mixed ranks → falls between 0 and 0.5 RRF's rank-ordering invariants are preserved (every score divides by the same positive constant), so sort + tiebreak behaviour is unchanged. Wire schema label `fusion_score` keeps its slot in RetrievalDetail; only the magnitude shifts, and only for hybrid mode (lexical / vector were already in [0, 1]). Verification: re-ran the four-scenario smoke at /tmp/kb-smoke/ with default score_gate = 0.05: all four (Korean→Korean, English→English, cross-language Korean↔English, out-of-corpus) succeed with the expected grounded / refusal classification, top fusion_score now ≈ 0.5. ## Tests One unit test (rrf_formula_matches_known_value) updated to expect the normalized value `(1/61 + 1/62) / (2/61) ≈ 0.9919` instead of the raw `1/61 + 1/62 ≈ 0.0325`. The integration snapshot fixture crates/kb-search/tests/fixtures/search/hybrid/run-1.json already used presence checks (fusion_score_positive: true) rather than absolute values, so it doesn't need regeneration. Workspace 319 tests pass; clippy clean across both feature configs. ## Docs This commit also adds tasks/HOTFIXES.md as a dated post-merge log covering this fix and the two earlier --config-flag regressions (P3-5 hotfix #20 across ingest/search/list/inspect/doctor; P4-3 follow-up #24 for kb ask). Original task specs in tasks/p<N>/*.md stay frozen as the historical contract; HOTFIXES.md is the live source of truth for post-merge deltas. Each affected task spec gets a "Risks/notes" addendum pointing back to HOTFIXES.md so a reader landing on the spec sees the active behaviour: - tasks/INDEX.md gains a "Post-merge 핫픽스" (post-merge hotfixes) section. - tasks/phase-3-vector-hybrid.md updates the RRF formula text to show the normalized form. - tasks/p3/p3-4-hybrid-fusion.md "Behavior contract" RRF bullet notes the normalization and reason. - tasks/p3/p3-5-app-wiring.md "Risks/notes" notes the --config fix. - tasks/p4/p4-3-rag-pipeline.md "Risks/notes" notes the kb-ask --config fix and the score_gate-RRF incompatibility (closed by the normalization in p3-4). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:16:01 +00:00
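The normalization can be checked numerically with a short sketch. The formula (`1 / (k_rrf + rank)` per retriever, divided by `2 / (k_rrf + 1)`) is the one described in this commit; the function names and the `Option<usize>` rank representation (`None` = that retriever did not return the chunk) are assumptions.

```rust
/// One retriever's raw RRF contribution at a 1-based rank.
fn rrf_term(rank: usize, k_rrf: f64) -> f64 {
    1.0 / (k_rrf + rank as f64)
}

/// Hybrid fusion score for two retrievers, divided by the theoretical RRF
/// maximum 2 / (k_rrf + 1) (both retrievers at rank 1). `None` means that
/// retriever did not return the chunk. Dividing every score by the same
/// positive constant keeps the ranking order unchanged.
fn normalized_fusion_score(lexical_rank: Option<usize>, vector_rank: Option<usize>, k_rrf: f64) -> f64 {
    let raw: f64 = [lexical_rank, vector_rank]
        .iter()
        .flatten()
        .map(|&rank| rrf_term(rank, k_rrf))
        .sum();
    raw / (2.0 / (k_rrf + 1.0))
}
```

With the default `k_rrf = 60`, the raw maximum is 2/61 ≈ 0.0328, below the default `score_gate` of 0.05; after normalization a chunk ranked first by both retrievers scores 1.0 and one ranked first by a single retriever scores 0.5.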
altair823	78bf78c8be	docs(p3-5): add app-wiring task spec; INDEX + phase-3 updates P1–P3 shipped libraries, but the kb-app facade still consists entirely of `bail!("not yet wired")` stubs, so the CLI is structurally complete but unusable. Insert p3-5 between P3-4 and P4-1 to swap the facade bodies (ingest, search, list_docs, inspect_doc, inspect_chunk) for real compositions of the libraries shipped through P3-4. `ask` stays stubbed; P4-3 owns it. After p3-5 merges: - `kb index` walks a workspace and persists chunks (SQLite + optionally LanceDB). - `kb search --mode {lexical,vector,hybrid}` returns real SearchHits with citations. - `kb list` / `kb inspect doc|chunk` round-trip from the store. Updates: - New task spec at tasks/p3/p3-5-app-wiring.md (depends_on p1-6/p2-2/p3-2/p3-3/p3-4; unblocks p4-3/p9-1/p9-2/p9-4). - tasks/INDEX.md bumps P3 component count 4 → 5 and adds the link. - tasks/phase-3-vector-hybrid.md replaces the speculative `embed_index` facade signature with the actual frozen kb-app surface and updates the phase completion checklist. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 11:32:42 +00:00
kb	bc1b3147cd	refactor(spec): cleanup pass over component specs Address 8 issues found in spec audit (post PR #2): 1. §refs label: distinguish design vs report sections in p3-1 / p3-2 / p4-2 / p9-1 / p9-5 contract_sections (e.g., "report §11.2 Ollama" not "§11.2"). 2. mock feature gate: gate MockEmbedder (p3-1) and MockLanguageModel (p4-1) behind `mock` cargo feature, default OFF; add CI symbol-scan as DoD item. 3. Warning type unification: p1-2 frontmatter now emits `kb_parse_types::Warning` (matches p1-3 / p1-4); drops crate-internal type. 4. p4-3 streaming thread: explicitly single-threaded inside RagPipeline::ask; collection + sink.send share the calling thread, no race. UI concurrency is the caller's responsibility (TUI worker thread pattern in p9-3). 5. p6-2 tesseract version: noted that `tesseract` 0.13 has no stable Rust `version()` accessor; use TessVersion FFI or shell-out + cache approach. 6. p9-* App struct extensions: introduce `kb_tui::{Library,Search,Ask,Inspect}State` slots in p9-1 forward-decl form; p9-2/3/4 fill bodies in their own crate without editing `App`. Parallel-safety contract added. 7. p3-3 cosine score: shift `(sim+1)/2` instead of clamp; preserve ranking signal between unrelated and opposite vectors. Clamp reserved for NaN. 8. fixtures/ root: p0-1 DoD now creates all fixture subdirs with .gitkeep so downstream tasks have a stable target path.	2026-04-27 23:38:13 +00:00
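Item 7's score mapping can be sketched in a few lines. The function name is hypothetical, and mapping NaN to 0.0 is an assumption; the spec only says that clamping is reserved for NaN.

```rust
/// Map cosine similarity in [-1, 1] to a [0, 1] score by shifting with
/// (sim + 1) / 2 instead of clamping negatives to 0, so "unrelated" (0.0 -> 0.5)
/// still ranks above "opposite" (-1.0 -> 0.0). NaN is the only case that is
/// clamped; sending it to 0.0 is an assumption of this sketch.
fn cosine_to_score(sim: f32) -> f32 {
    if sim.is_nan() {
        return 0.0;
    }
    (sim + 1.0) / 2.0
}
```

A plain `sim.max(0.0)` clamp would give both an unrelated vector (0.0) and an opposite vector (-1.0) the same score of 0, which is exactly the lost ranking signal item 7 describes.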
kb	9fa38543a8	refactor(spec): introduce kb-parse-types thin crate PR #1 review left a design-debt note: ParsedBlock landing in kb-core would (a) force every crate to recompile on parser-internal changes, and (b) cause namespace pollution when P6/P7/P8 parsers add their own variants. Resolution: a new thin crate kb-parse-types sits between kb-core and parsers. Owns ParsedBlock + ParsedPayload + Warning + forward-refs for image/pdf/audio parser intermediates. Depends on kb-core only (for SourceSpan / Inline). Updates: - design §3.7b: add new section defining kb-parse-types - design §8: add kb-parse-types to module-boundary diagram + forbidden list - design §3.4 Inline stays in kb-core; kb-parse-types references it (no duplication) - p0-1 skeleton: workspace + Cargo deps + public surface block - p1-3 parse-md-blocks: outputs Vec<kb_parse_types::ParsedBlock> directly - p1-4 normalize: Allowed gains kb-parse-types, drops cross-coupling note - INDEX + phase-0 epic: list kb-parse-types in P0 deliverables	2026-04-27 20:41:35 +00:00
kb	b999a12ab5	tasks: address PR #1 review - p3-3: SQLite-first/Lance-second + status marker (V003__embedding_status); drop "best-effort 2PC" misnomer - p4-3: replace print_stream FnMut closure with mpsc::Sender<String> (RagPipeline stays Send+Sync) - p4-3: tighten citation regex to strict [#<n>] only — reject [n]/prose/code-block false positives - p5-2: compare_runs across chunker_version is graceful (doc + span overlap fallback) with chunker_version_match audit field; --strict-chunker-version restores refusal - p7-1: per-page text via lopdf (pdf-extract has no per-page Rust API); use char count for spans - p8-1: explicit rubato (FftFixedIn) for 16 kHz mono resample; symphonia decode only - p9-5: drop cmd_read_pdf_page + pdfium native dep; cmd_read_file_bytes + frontend pdfjs; add traversal tests	2026-04-27 13:10:31 +00:00
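The strict p4-3 citation rule can be sketched without a regex dependency. This is a hypothetical illustration: it accepts only `[#<digits>]`, ignores plain `[n]`, and skips lines inside triple-backtick fenced code blocks; inline code spans are not handled, which is a simplification.

```rust
/// Extract citation indices written strictly as `[#<n>]` (n = one or more
/// ASCII digits). Plain `[n]` and `[# n]` are ignored, and lines inside
/// triple-backtick fenced code blocks are skipped entirely.
fn extract_citations(answer: &str) -> Vec<usize> {
    let mut citations = Vec::new();
    let mut in_fence = false;
    for line in answer.lines() {
        if line.trim_start().starts_with("```") {
            in_fence = !in_fence; // opening or closing fence
            continue;
        }
        if in_fence {
            continue;
        }
        let mut rest = line;
        while let Some(start) = rest.find("[#") {
            let after = &rest[start + 2..];
            let digits = after.bytes().take_while(|b| b.is_ascii_digit()).count();
            // Digits are ASCII, so slicing at `digits` stays on a char boundary.
            if digits > 0 && after[digits..].starts_with(']') {
                if let Ok(n) = after[..digits].parse::<usize>() {
                    citations.push(n);
                }
            }
            rest = after;
        }
    }
    citations
}
```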
kb	eedaeccff2	tasks: update INDEX with full component-task tree (30 specs)	2026-04-27 12:14:58 +00:00
kb	f8b9f51d94	tasks: add P9 component specs (tui x4, desktop)	2026-04-27 12:14:16 +00:00
kb	7c10b15ad7	tasks: add P8 component specs (whisper, audio-chunker)	2026-04-27 12:09:51 +00:00
kb	d96d9cc56c	tasks: add P7 component specs (pdf-extractor, pdf-chunker)	2026-04-27 12:07:56 +00:00
kb	c84ab03404	tasks: add P6 component specs (image-exif, ocr, caption)	2026-04-27 12:06:20 +00:00
kb	597a848af9	tasks: add P5 component specs (runner, metrics)	2026-04-27 12:04:06 +00:00
kb	ab7f6f110e	tasks: add P4 component specs (llm-trait, ollama, rag-pipeline)	2026-04-27 12:02:18 +00:00
kb	5b813ce39e	tasks: add P3 component specs (embedder, fastembed, lancedb, hybrid)	2026-04-27 11:59:46 +00:00
kb	c044b97a34	tasks: add P2 component specs (fts-schema, lexical-retriever)	2026-04-27 11:57:00 +00:00
kb	1cffed25ff	tasks: add p0-1 skeleton component spec	2026-04-27 11:55:26 +00:00
kb	46f146584f	tasks: add p1-6 store-sqlite component spec	2026-04-27 11:47:49 +00:00
kb	b711cfe5fd	tasks: add p1-5 chunk component spec	2026-04-27 11:46:43 +00:00
kb	d4315dc602	tasks: add p1-4 normalize component spec	2026-04-27 11:45:53 +00:00

206 Commits