kebab

Author	SHA1	Message	Date
altair823	28f513795e	fix(config): emit error.v1 code=config_not_found for missing --config path (Bug #10) Before: `kebab search "rust" --config /tmp/nonexistent.toml --json` exited 0 with `{"hits":[]}`, silently falling back to the XDG default. A typo or wrong path surfaced only as zero hits, which made debugging a nightmare. After: added kebab_config::ConfigNotFound (thiserror::Error); the `Some(p) if !p.exists()` arm of Config::load now returns anyhow::Error::new(ConfigNotFound { path }). kebab_app::error_wire::classify downcasts it to ErrorV1 code=config_not_found, fills in hint and details.path, and emits it to stderr as ndjson. R-1 (relative path): std::path::Path::exists() is cwd-relative, so both absolute and relative paths are covered without extra work. Verified by two integration tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 23:14:54 +00:00
altair823	760eee89c8	fix(app): flip streaming_ask + single_file_ingest capabilities to actual surface (Bug #9) capabilities_snapshot() reported streaming_ask and single_file_ingest as hardcoded false, but the actual implementation was production-grade in the v0.20 final dogfood: - kebab ask --stream → emits 191 answer_event.v1 ndjson events correctly - kebab ingest-file <path> / kebab ingest-stdin --title <T> → emits ingest_report.v1 correctly. When agents such as the MCP host and the Claude Code skill route on schema.capabilities, this was a false negative: users were led to believe that working features were unavailable. http_daemon stays false (not implemented; tracked as a separate sub-item). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 23:13:57 +00:00
altair823	241ded59df	test(app): multi-scanned PDF chunk_id collision-free integration test (Bug #3 regression) v0.20.0 sub-item 1 bugfix Step 3 (Group C): integration-level regression for Bug #3 (intra-doc chunk_id collision under aggressive overlap). - `crates/kebab-app/tests/common/mod.rs`: appended 1 line, `pub mod mock_ocr;`. - `crates/kebab-app/tests/common/mock_ocr.rs` (new): lifted MockOcrEngine + `single` / `per_page` ctors (backward-compat single + per-page cursor). - `crates/kebab-app/tests/pdf_ocr_apply.rs`: removed the inline MockOcrEngine and added the `mod common; use common::mock_ocr::MockOcrEngine;` import. Migrated 10 ctor call sites (`MockOcrEngine { .. }` → `MockOcrEngine::single(...)`). - `crates/kebab-app/tests/multi_scanned_pdf_ingest_no_chunk_id_collision.rs` (new): F1 + F2 scanned PDFs + Bug #3 trigger shape (10 char "가" + ". " + 500 char "나") via mock OCR. Assertions: chunk_id global uniqueness (HashSet dedup) across F1 + F2; the F2 trigger text produces ≥2 chunks (collision shape). - C1 decision: Option A (share via tests/common/mock_ocr.rs). Facade mock injection is unavailable (OllamaVisionOcr hardcoded), so a helper-level chain test (apply_ocr_to_pdf_pages → PdfPageV1Chunker) adds value beyond unit B5. spec: docs/superpowers/specs/2026-05-27-v0.20-sub1-bugfix-spec.md (§4.5) plan: docs/superpowers/plans/2026-05-27-v0.20-sub1-bugfix-plan.md (Step 3) prior: `436fd01` (Step 2 Bug #3 chunk_id fix) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 13:45:38 +00:00
altair823	436fd015a2	fix(chunk): chunk_id collision under aggressive overlap; bump pdf-page-v1 → pdf-page-v1.1 (Bug #3) Bug #3 (Critical) from the v0.20.0 sub-item 1 dogfood report. Ingesting scanned_page2.pdf (1580 chars of OCR text) violated the `chunks.chunk_id` PRIMARY KEY: `per_chunk_hash = #c{char_start}` used the post-overlap `actual_start`, and the overlap walk floor collapsed to `prev_min`, so segments 1 and 2 both got `#c0`. - `crates/kebab-chunk/src/pdf_page_v1.rs`: `chunk_page` returns a 4-tuple (segment_start, actual_start, chunk_end, slice); the caller's `per_chunk_hash` suffix uses `segment_start` (pre-overlap boundary, strictly increasing) instead of `char_start` (post-overlap, may collapse to prev_min). - VERSION_LABEL `"pdf-page-v1"` → `"pdf-page-v1.1"` (design §9 cascade, explicit user-facing audit trail). The hardcoded literals at `crates/kebab-app/tests/pdf_pipeline.rs: 168, 368` are also updated to v1.1. - module docs (`pdf_page_v1.rs:47-60`): updated the `#c{char_start}` reference in the workaround description to `#c{segment_start}`, documented the segment_start invariant, and added a HOTFIXES.md cross-ref. - `pdf_page_v1.rs::tests`: `multi_chunk_page_with_aggressive_overlap_produces_unique_chunk_ids` regression pin (10 char "가" + ". " + 500 char "나": multi-chunk + overlap walk collapse trigger). - `tasks/HOTFIXES.md`: 2026-05-27 entry (symptom F2 1580 char OCR, intra-doc collision root cause, second-iteration patch rationale). spec: docs/superpowers/specs/2026-05-27-v0.20-sub1-bugfix-spec.md (§4) plan: docs/superpowers/plans/2026-05-27-v0.20-sub1-bugfix-plan.md (Step 2) prior: `d9acda5` (Step 1 Bug #2 walker fix) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 13:32:09 +00:00
altair823	48197687b7	test(pdf): integration smoke (w/ search + cancel) + vector regression + alnum e2e (#[ignore]) for v0.20 sub-item 1 Step 9 (Group I) of v0.20.0 sub-item 1 (scanned PDF OCR) plan. I3: crates/kebab-app/tests/ingest_pdf_ocr_smoke.rs (new): - ingest_with_mock_ocr_yields_pdf_ocr_summary: `#[ignore]`, real Ollama; verifies the ingest_with_config production path + IngestItem.pdf_ocr_pages. - ocr_text_indexed_and_searchable: `#[ignore]`, real Ollama; verifies via app.search that OCR text is indexed (§ Acceptance #2). - ingest_with_cancel_aborts_mid_pdf: production cancel chain (pre-set cancel=true + dummy endpoint; verifies no panic/deadlock). I4: crates/kebab-parse-pdf/tests/text_extractor_regression.rs (new): - vector_pdf_extract_byte_identical_to_baseline: the canonical output of the vector PDF path for F4 mojibake.pdf stays byte-identical (an invariant before and after all Step 1-8 changes). - new baseline = tests/snapshots/vector_pdf_canonical.json (created on first run). - normalize_provenance_timestamps inline helper (R-3 mitigation; no such helper existed anywhere in the workspace, so a new 12-line one was added). I5: crates/kebab-parse-pdf/tests/ocr_e2e.rs (new): - f1_alnum_accuracy_ge_85 / f2_alnum_accuracy_ge_70: `#[ignore]`, real Ollama qwen2.5vl:3b; implements § Acceptance §9 #3. - alnum metric = strsim::levenshtein (added as a dev-dep). - truth files copied from PoC scratch (page1.txt + page2-batchim.txt) → scanned_page1_truth.txt + scanned_page2_truth.txt. - added kebab-parse-image as a dev-dep (to call OllamaVisionOcr::from_parts). This is a dev-dep exception to the parser isolation invariant (spec §3.1; the `-e normal` dep graph baseline is preserved). spec: docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md plan: docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 9 I3+I4+I5) prior: `c9e0594` (Step 8 CLI printer) contract: §9 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 10:10:58 +00:00
altair823	c9e05941c5	feat(cli): activate per-page PDF OCR progress printer + test(app): ingest_progress emit verify + spec(pdf-ocr): align §4.6.1 literal with option_A (ms/chars) Step 8 (Group H) of v0.20.0 sub-item 1 (scanned PDF OCR) plan + Step 7 reviewer concern fix (spec literal deviation). H1: kebab-cli/src/progress.rs printer activation: - Replaced the old no-op stub `IngestEvent::PdfOcr* { .. } => {}` (Step 6 placeholder) with a human-friendly stderr line printer. - Wording follows spec §4.6.1 lines 1085-1086 verbatim: - PdfOcrStarted → ` 📷 OCR page {page}...` - PdfOcrFinished (skipped=false) → ` ✓ OCR page {page} ({chars} chars, {ms}ms via {ocr_engine})` - PdfOcrFinished (skipped=true) → ` ⊘ OCR page {page} skipped (no DCTDecode or engine fail, {ms}ms)` (uses the skipped field carried by M-4) - Consistent with the `!quiet` gate (mirrors the AssetStarted/Finished pattern). H2: new test in crates/kebab-app/tests/ingest_progress.rs: - pdf_ocr_progress_emits_started_finished_events (depends on real Ollama, `#[ignore]`). - Verifies that ingesting the F1 fixture (scanned_page1.pdf) emits pdf_ocr_started + pdf_ocr_finished events. Invariant: Started count == Finished count. - Manual invoke: `KEBAB_PDF_OCR_ENABLED=true cargo test -p kebab-app --test ingest_progress --ignored`. - No mock OcrEngine injection path exists (because of the eager build in Step 6), so this follows the same pattern as ocr_e2e in Step 9 I5 (real Ollama + `#[ignore]`). Step 7 reviewer concern fix, spec §4.6.1 literal: - Updated the `ocr_ms` / `ocr_chars` literals on lines 1076-1077 to the actual wire schema field names `ms` / `chars` (option_A, consistent with Rust serde). - Updated the printer wording on line 1087 as well: `{ocr_chars}` / `{ocr_ms}` → `{chars}` / `{ms}`. - Updated the rationale reference on line 1556: `pdf_ocr_finished.ocr_ms` → `.ms`. - Also documented the `skipped` field (result of Step 6 reviewer M-4). 
spec: docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md (§4.6.1) plan: docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 8 H1+H2) prior: `4c5ccd5` (Step 7 wire schema); fixes Step 7 reviewer concern 1 contract: §9 (additive minor wire bump, completed in the Step 7 commit) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 09:18:49 +00:00
altair823	4c5ccd5447	feat(wire): additive minor (pdf_ocr_* IngestEvent kinds + pdf_ocr_pages/ms_total in ingest_report.items[] + skipped field carry) (Step 6 M-4/M-2) Step 7 (Group G) of v0.20.0 sub-item 1 (scanned PDF OCR) plan + fixes for Step 6 code reviewer Important M-4 (skipped field carry) and Minor M-2 (ordering invariant doc). G3: JSON Schema sync (additive minor, schema_version preserved): ingest_progress.schema.json: - 2 values added to the kind enum: pdf_ocr_started + pdf_ocr_finished. - new fields: page (1-based PDF page), ocr_engine (engine_name), skipped (bool). - updated descriptions of the existing ms / chars fields (now also carried by pdf_ocr_finished). ingest_report.schema.json: - items.items.properties newly defined (previously only the stub ["array", "null"]). - pdf_ocr_pages + pdf_ocr_ms_total (nullable integer). - all existing IngestItem fields are now spelled out too (kind, doc_path, byte_len, ...). Step 6 reviewer M-4 (Important), skipped field carry: - added skipped: bool to IngestEvent::PdfOcrFinished. - the emit closure in ingest_one_pdf_asset (lib.rs:~1864) now propagates the source PdfOcrProgress::Finished { skipped } instead of discarding it. Step 6 reviewer M-2 (Minor), ordering invariant doc: - updated the ordering text in crates/kebab-app/src/ingest_progress.rs: ScanStarted < ScanCompleted < (AssetStarted [< (PdfOcrStarted < PdfOcrFinished)] < AssetFinished) < (Completed | Aborted). No .md docs exist (docs/wire-schema/v1/*.md), so the .md deliverable of plan §3 Step 7 G3 is retroactively N/A (0 such files). spec: docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md plan: docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 7 G3) prior: `b9ee09f` (Step 6 wiring) + Step 6 reviewer M-4/M-2 recommendations contract: §9 (additive minor wire bump, schema_version preserved) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 08:51:51 +00:00
altair823	b9ee09f176	feat(app): wire PDF OCR enrichment + cancel propagation into ingest_one_pdf_asset (H-5 eager init + post-extract hook + per-page cancel) + workspace lopdf dep (Step 4 M-4) Step 6 (Group E) of v0.20.0 sub-item 1 (scanned PDF OCR) plan + Step 7 spillover (IngestEvent variant + IngestItem field for compile boundary) + Step 4 reviewer Minor M-4 fix. E1: eager PDF OCR engine build at `ingest_with_config_opts` entry, mirroring the image OCR pattern (lib.rs:338-347). When `pdf.ocr.enabled || always_on`, calls `OllamaVisionOcr::from_parts(endpoint, model, ...)` and fails fast via `?`. No App fields added (local var only, consistent with spec L-1 / Step 1 A1 cosmetic fix). E2: `ingest_one_pdf_asset` signature extension: +3 params (`pdf_ocr_engine: Option<&OllamaVisionOcr>`, `progress: Option<&mpsc::Sender<IngestEvent>>`, `cancel: Option<&Arc<AtomicBool>>`). Updated the `ingest_one_asset` dispatch wrapper + caller (dispatch loop). E3: post-extract enrichment block right after `extract_for` (line 1779). When `pdf.ocr.enabled || always_on`, calls `apply_ocr_to_pdf_pages`, emits PdfOcrProgress → IngestEvent (PdfOcrStarted / PdfOcrFinished with ocr_engine), and carries the summary's pages_ocrd/ms_total into IngestItem fields. The PR #187 registry dispatch invariant is preserved (`extract_for(&asset.media_type, ...)` unchanged). E4: cancel handle propagation: ingest_with_config_cancellable → IngestOpts.cancel → ingest_with_config_opts → ingest_one_asset → ingest_one_pdf_asset (new `cancel` param) → PdfOcrOpts.cancel chain. Production wiring for spec §4.8 line 1159. Step 7 spillover (compile boundary): - `kebab_app::ingest_progress::IngestEvent`: PdfOcrStarted { page } + PdfOcrFinished { page, ms, chars, ocr_engine }. serde discriminants `pdf_ocr_started` / `pdf_ocr_finished` (matching the Step 7 G3 wire schema). - `kebab_core::IngestItem`: pdf_ocr_pages: Option<u32> + pdf_ocr_ms_total: Option<u64> (placed between warnings and error). The 11 non-PDF IngestItem construction sites fill in `None`. 
- `kebab-cli/src/progress.rs` + `kebab-tui/src/ingest_progress.rs`: no-op handlers for the new variants (per-page progress is not exposed in v1; it can be activated in a future refinement). - `kebab-store-sqlite/tests/ingest_report_snapshot.rs` + snapshot `ingest_report.snapshot.json`: added the new fields to the 2 IngestItem fixtures. - The Step 7 JSON Schema update + CLI printer activation + snapshot regeneration are left to separate commits (G3/H1/H2 deliverables). M-4 (Step 4 reviewer Minor), lopdf consolidated as a workspace dep: - workspace `Cargo.toml [workspace.dependencies] lopdf = "0.32"`. - direct deps in kebab-app + kebab-parse-pdf → `{ workspace = true }`. Verifier evidence: - workspace test (`cargo test --workspace --no-fail-fast -j 1`): 175 test result summary lines, 0 failures, 0 FAILED. - workspace clippy (`-D warnings`): exit 0, 0 warnings. - dep graph baseline (`.omc/state/pdf-ocr-{parse-pdf,app-parse}-deps.baseline.txt`): empty diff for both. spec: docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md (§4.4 + §4.6 + §4.8) plan: docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 6 E1-E4 + Step 7 partial G1+G2) prior: `4672cba` (Step 5 fix) + `fd918a6` (Step 5) + `9f003ef` (Step 4 helper) contract: §9 (additive minor wire bump, once the Step 7 JSON Schema lands) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 08:18:34 +00:00
altair823	fd918a60ce	feat(config): add [pdf.ocr] section (qwen2.5vl:3b default, opt-in + env overrides) + doc(app): PdfOcrOpts field doc (Step 4 I-1) Step 5 (Group F) of v0.20.0 sub-item 1 (scanned PDF OCR) plan, bundled with the fix for Step 4 reviewer Important I-1 (PdfOcrOpts field doc). F1: `kebab-config::PdfCfg` + `PdfOcrCfg` + 4 default fns: - PdfCfg { ocr: PdfOcrCfg }. - PdfOcrCfg with 11 fields (enabled/always_on/engine/model/endpoint/languages/max_pixels/request_timeout_secs/valid_ratio_threshold/min_char_count/lang_hint). - defaults: opt-in (enabled=false), qwen2.5vl:3b, 0.5 threshold, 20 chars. - mirrors the image OCR cfg pattern (spec §4.5). Config struct extension: - `pdf: PdfCfg` field with `#[serde(default = "PdfCfg::defaults")]`. 11 env var overrides (parallel to KEBAB_IMAGE_OCR_*): KEBAB_PDF_OCR_{ENABLED,ALWAYS_ON,ENGINE,MODEL,ENDPOINT,LANGUAGES,MAX_PIXELS,REQUEST_TIMEOUT_SECS,VALID_RATIO_THRESHOLD,MIN_CHAR_COUNT,LANG_HINT}. F2: `crates/kebab-config/tests/pdf_ocr.rs` (new): - toml roundtrip (11 fields). - defaults (opt-in + qwen2.5vl:3b). - env override (4-key sample + default preservation). F3 (Step 4 I-1): doc comments for 4 public items in `pdf_ocr_apply.rs`: - PdfOcrOpts struct + 6 fields. - PdfOcrSummary struct + 2 fields. - apply_ocr_to_pdf_pages fn (including an Errors block). - PdfOcrProgress enum + 2 variants + 5 fields. No body changes; doc-only. spec: docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md (§4.5) plan: docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 5 F1+F2) prior: `9f003ef` (Step 4); resolves code reviewer Important I-1 contract: §9 (additive minor wire bump, Step 7) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 07:07:18 +00:00
altair823	9f003ef1cd	feat(app): add pdf_ocr_apply helper (10 tests, F7 split + cancel): post-extract OCR enrichment for PDF (H-1 resolution) Step 4 (Group D) of v0.20.0 sub-item 1 (scanned PDF OCR) plan. D1: `apply_ocr_to_pdf_pages(&mut canonical, &dyn OcrEngine, &bytes, &opts, emit_progress)` in `kebab-app::pdf_ocr_apply`. The body follows spec §4.1 lines 381-599 verbatim, plus the PdfOcrOpts.cancel field and a per-page cancel check (verifier LOW L-1). Post-extract enrichment pattern (H-1 resolution): kebab-parse-pdf does not import kebab-parse-image::OcrEngine (parser isolation preserved). The helper lives inside the kebab-app facade, which avoids cross-imports between the two parser crates. Per-page decision matrix (spec §4.1 lines 459-464): - always_on=true → OCR every page (dual-block, ordinal = page-1 + page_count). - always_on=false + needs_ocr → in-place OCR (text-detect block mutated). - needs_ocr=false → skip. DCTDecode-only v1 (H-3): for FlateDecode / CCITTFaxDecode pages, extract_dctdecode_page_image returns None → Warning event + skip + emit_progress(skipped=true). OcrEngine.recognize failure → Warning event + skip + emit_progress(skipped=true). D3: per-page cancel handle (verifier LOW L-1 + spec §4.8 line 1159): PdfOcrOpts.cancel: Option<Arc<AtomicBool>>. When set to true, `anyhow::bail!("PDF OCR cancelled mid-PDF at page N")`. lopdf = "0.32" added to [dependencies] (already transitive via kebab-parse-pdf; no new crate introduced, and the kebab-parse-* dep graph baseline is unchanged). Integration test (`tests/pdf_ocr_apply.rs`, 10 tests): - f1_input_with_ocr_enabled_replaces_empty_block: in-place mutate. - f3_input_with_ocr_enabled_keeps_text_detect_blocks: vector PDF skip. - f1_input_with_ocr_disabled_keeps_empty_block: disabled no-op. - f4_input_with_ocr_enabled_replaces_mojibake_block: mojibake → in-place mutate. - f3_input_with_always_on_pushes_dual_blocks: always_on dual-block. - f6_flatedecode_skipped_with_warning: FlateDecode skip + Warning event. - f7_ccittfax_skipped_with_warning: CCITTFax skip + Warning event (verifier M-4 split). 
- ocr_engine_failure_surfaces_as_warning: OCR failure → Warning event. - dual_block_ordinals_are_deterministic_and_unique: ordinal invariant. - cancel_handle_aborts_mid_pdf: production source of the cancel handle (D3). MockOcrEngine fixture: spec §5.5 lines 1284-1299. No F3 fixture exists, so the tests use mock CanonicalDocument construction plus the F1 bytes reuse pattern (Option B: generate the canonical via the real production path through PdfTextExtractor::extract). spec: docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md (§4.1 + §5.5) plan: docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 4 D1+D2+D3) prior: `c2cd3a7` (Step 3) + `8d81bc1` (Step 3 clippy fix) contract: §9 (additive minor wire bump, in a later step) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 06:42:01 +00:00
altair823	2c05dbd0dd	refactor(app): extract dispatch polymorphism: App.extract_for(...) + 11 Extractor registry Consolidates kebab-app's hardcoded extract dispatch (11 `::new().extract(…)` callsites across `ImageExtractor` + `PdfTextExtractor` + 9 AST `*Extractor`s, plus the 9 AST arm match) into a single polymorphic call, `App::extract_for(&MediaType, &ExtractContext, &[u8])`. No trait changes, no parser source changes, no wire schema changes (success path). Key changes: - App struct gains a `pub(crate) extractors: Vec<Box<dyn Extractor + Send + Sync>>` field + a `pub(crate) fn extract_for(...)` helper method. - Registry init in App::open_with_config = 11 Extractors (image + pdf + 9 AST). - Removed the `extractor: &'a ImageExtractor` field from the ImagePipeline struct and deleted the lib.rs:356 local + lib.rs:1235 alias (atomic block). - 9 AST arms (the 12 arms at lib.rs:2012-2047 = 11 explicit + 1 wildcard) → 4 arms (9 AST grouped + 7 manifest + 1 shell + 1 other-bail). - in-crate unit tests (`mod tests_extractor_dispatch` in app.rs), 3 classes: registry length 11 / mutually-exclusive supports() grid (16 sample MediaTypes) / extract_for error path (Audio). Scope = AST 9-arm + image + pdf extract callsites only. MarkdownExtractor / Tier 2/3 / outer 4-arm / inner 4 match / Chunker dispatch are all deferred to future work (separate PR, spec §11). No wire schema changes on the success path: ingest_report.v1 / search_response.v1 / answer.v1 are byte-identical (verified by a 4-medium SMOKE comparison). The wording change in the internal context string of error.v1.message (e.g. `kb-parse-image::ImageExtractor::extract` → `kb-app::extract_for (image)`) is an accepted risk under spec §5.5: `error.v1.code` + `error.v1.schema_version` are preserved, and the string is outside the user-visible surface. No Cargo workspace.version bump. Refs: - docs/superpowers/specs/2026-05-26-extractor-dispatch-unification-spec.md (2 round APPROVE) - docs/superpowers/plans/2026-05-26-extractor-dispatch-unification-plan.md (3 round APPROVE) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-26 17:43:44 +00:00
altair823	710945c4b0	refactor(parse-md): absorb kebab-normalize + kebab-parse-types (24 → 22 crates) + §3.7b rewrite In practice, only 1 of the 4 parsers (markdown) goes through the lift of the thin layer in design §3.7b (ParsedBlock and friends); with fan-in and fan-out both at 1, the layer has lost its purpose. Both kebab-normalize (1097 LOC) and kebab-parse-types (98 LOC) are absorbed into kebab-parse-md. Design: docs/superpowers/specs/2026-05-26-normalize-absorption-spec.md Plan: docs/superpowers/plans/2026-05-26-normalize-absorption-plan.md HOTFIXES: 2026-05-26 entry in tasks/HOTFIXES.md (design deviation) - 5 used types + 3 forward-declared structs → explicit pub re-exports from the kebab-parse-md::types module. - build_canonical_document + derive_title + warning_agent → kebab-parse-md::normalize module. - The 4 hard-coded agent literals (lib.rs:122/128/134/153), the warning_agent body return, and the tracing target literal are all preserved for stage label consistency. - Removed the kebab-app callsites (lib.rs:51 use + :1119 context string) + 2 deps in Cargo.toml (regular + dead). - Removed kebab-normalize from the [dev-dependencies] of kebab-chunk + kebab-store-sqlite (replaced by kebab-parse-md); shifted the `use` paths in integration test sources. - Moved the test file (kebab-normalize/tests/normalize_snapshot.rs → kebab-parse-md/tests/). - workspace Cargo.toml: Hunk (a) deletes 2 members entries + Hunk (b) bumps version 0.18.0 → 0.19.0 (frozen contract change). - design §3.7b rewritten as 4 paragraphs (original intent preserved + current state + preserved surface + future re-extraction trigger). - design §8 graph updated (3 edges removed + meaning of 2 forbidden bullets updated + commentary). - ARCHITECTURE.md crate graph + directory tree updated mechanically. - tasks/INDEX.md L169 closure mention + new "Future work / deferred" section (image/pdf normalize integration entry). - New tasks/HOTFIXES.md entry (4-block, design deviation Symptom). - One-line cross-link in HANDOFF.md. - The 3 dead structs (ParsedImageRegion / ParsedPdfPage / ParsedAudioSegment) are kept as the future surface for v0.20+ image/pdf normalize integration (spec §11). Wire / surface impact: none. CLI / TUI / MCP / --json output / config / XDG path / parser_version are all unchanged. 
The wire-invisible provenance.events[].agent + tracing target literal "kb-normalize" are also preserved, so audit logs stay consistent between old and new DB rows. Verification: cargo test --workspace --no-fail-fast -j 1 → 1313 passed / 0 failed (172 result blocks). cargo clippy --workspace --all-targets -j 1 -- -D warnings → 0 warnings (5m 46s). cargo metadata --no-deps --format-version 1 | jq '.workspace_members | length' = 22. cargo tree -p kebab-app --depth 2 | grep -E "kebab_(parse_types|normalize)" = 0 lines.	2026-05-26 15:00:59 +00:00
altair823	7c27633df2	chore(rag): post-PR9 refactor: H1/H2/H3/D/E + test coverage + post-refactor dogfood retest Architectural cleanup by the OMC team `post-pr9-refactor`. After the architect priorities analysis, the executor and test-engineer made the file edits; the system-architect's component-level review concluded that nothing should be pre-cut and everything defers to v0.18.1+. ## Executor work (H1/H2/H3/D/E) - H1 (kebab-nli/src/onnx.rs): activated the `[models.nli]` config wiring. Removed the `DEFAULT_MODEL_ID` const (NliCfg::defaults in kebab-config is the single source). OnnxNliVerifier::new reads config.models.nli.model and calls anyhow::bail if config.models.nli.provider is not "onnx". Removed 3 stale "PR-9c-1 will wire this" comments. Added 2 unit tests (`new_uses_config_model_id`, `new_rejects_unsupported_provider`). - H2 (kebab-rag/src/pipeline.rs): `truncate_for_nli(premise: &str, _hypothesis: &str)` → `truncate_for_nli(premise: &str)`. Removed the v0.18.1 placeholder doc. Updated 4 callsites (tests/multi_hop.rs) and renamed the test `multi_hop_truncate_for_nli_preserves_hypothesis` → `multi_hop_truncate_for_nli_char_budget` (to match the contract). - H3 (kebab-rag/src/pipeline.rs:1041): `was_truncated` is surfaced via tracing::debug! (adds observability; signature preserved as the caller logging contract). - D (kebab-mcp/tests/tools_call_ask_multi_hop.rs): request_timeout_secs 2 → 5 (stability on slow CI); removed the `mh_code` discriminator. dispatch contract = `mh.is_error.unwrap_or(false)` (the existing assertion is sufficient). - E (tasks/HOTFIXES.md + pipeline.rs:1633-1638): added the subsection "### PR-9 NLI refusal: terminal Synthesize hop omitted from hops trace" as a sibling of the fb-41 PR-9 closure entry. Cross-linked the pipeline comment: "cleanup deferred to a follow-up" → "// See tasks/HOTFIXES.md ... for follow-up". ## Test-engineer work (T1/T2/T3/T4, 9 new tests) - T1 (kebab-nli/src/onnx.rs::tests): 3 unit tests for sanitize_model_id (replaces_slash / idempotent / leaves_other_chars). - T2 (kebab-rag/tests/multi_hop_nli_panic.rs, new): 2 panic-path tests: #[should_panic] for the facade invariant (`expect("verifier must be Some when nli_threshold > 0.0")`) + a companion for threshold=0. 
- T3 (kebab-rag/tests/multi_hop_nli_stream.rs, new): 2 StreamEvent::Final tests that pin the wire shape of the stream_sink Final branch for refuse_nli_verification + refuse_nli_model_unavailable. - T4 (kebab-app/tests/open_with_config_nli.rs, new): 2 NLI failure paths: App::open_with_config returns Result<App> Err (with "OnnxNliVerifier" in chain) when model_dir is unwritable + graceful skip when threshold=0. ## System-architect conclusion Analysis through 3 lenses (absorption / duplication / under-engineered interface) concluded that nothing should be pre-cut. All top-3 items defer to v0.18.1+: - Lens 1: kebab-normalize + kebab-parse-types can be absorbed (used only by parse-md; 5 parsers bypass them) → v0.18.1+. - Lens 3: dead polymorphism in the Extractor + Chunker traits (every callsite is hardcoded) → v0.18.1+. - Lens 1 bundled: kebab-source-fs drags in the 9 tree-sitter grammars of kebab-parse-code → low-risk dep-graph win, bundled into v0.18.1+. - Defer-with-intent: LanguageModel async refactor (when a cloud LLM arrives), NliVerifier::score_batch + typed NliError (when a 2nd impl arrives), compute_stale → kebab-core::stale. Reports: /build/cache/tmp/post-pr9-refactor-priorities.md, /build/cache/tmp/system-architecture-priorities.md (both outside the repo; kept as analysis records). ## Verification - cargo test -p kebab-nli -j 1 → 11/11 pass. - cargo test -p kebab-rag -j 1 → 75/75 pass (including 5 NLI multi-hop + 4 new T2/T3). - cargo test -p kebab-app -j 1 → 23 pass + 2 ignored (including the 2 from T4). - cargo test -p kebab-mcp --test tools_call_ask_multi_hop -j 1 → 1 pass + 1 pre-existing flaky (HOTFIX #15, no_chunks short-circuit, unrelated to the executor D fix: the base assertion on line 86 fails because the fixture is missing). - cargo clippy --workspace --all-targets -j 1 -- -D warnings clean. - cargo test --workspace --no-fail-fast -j 1 → 1304 passed (+11 new) + the same 1 pre-existing flaky. - Post-refactor dogfood retest byte-identical (across all three runs: PR-9d / post-cleanup / post-refactor): S7 0.0035389824770390987, S1 0.058334656059741974, S10 0.0027875436935573816, S3 nli_model_unavailable. Added a "Post-architectural-refactor retest" section to docs/dogfood/v0.18.0/SUMMARY.md. Wire impact: none. 
Behavior impact: none (the H1 config wiring resolves to the same model as the default → byte-identical). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 04:42:37 +00:00
altair823	7c85de065a	chore: workspace-wide cleanup: clippy::pedantic baseline + auto-fix cut Final cleanup before PR v0.18.0. User request: "make the whole codebase clean and easy to read". ## Workspace lints - Added `pedantic = "warn"` (priority -1) to `[workspace.lints.clippy]` in `Cargo.toml`, plus a deliberate allow-list: - cast_possible_truncation / cast_possible_wrap / cast_sign_loss / cast_precision_loss: intentional truncation (ONNX i64, hash modular reduction, etc.). - doc_markdown / missing_errors_doc / missing_panics_doc: cosmetic doc style. - too_many_lines / module_name_repetitions / must_use_candidate / needless_pass_by_value / manual_let_else / items_after_statements / similar_names: informational only. - format_collect / match_wildcard_for_single_variants / trivially_copy_pass_by_ref / unnecessary_wraps: intentional patterns (exhaustive match, future Result variants, etc.). - default_trait_access: `Foo::default()` is idiomatic. - float_cmp: explicit threshold comparisons on NLI / RRF scores are intentional. - struct_excessive_bools / case_sensitive_file_extension_comparisons / naive_bytecount / ignore_without_reason: intentional, domain-specific. - format_push_string / return_self_not_must_use / match_same_arms: builder / wire-label / hot-path patterns preserved. - needless_continue / used_underscore_binding / nonminimal_bool / unreadable_literal / many_single_char_names / doc_link_with_quotes / assigning_clones / collapsible_str_replace / trivial_regex / elidable_lifetime_names / range_plus_one / explicit_iter_loop / implicit_hasher / ref_option: remaining low-value style. - Added `[lints] workspace = true` to the `Cargo.toml` of each of the 24 crates. ## Auto-fix Applied `cargo clippy --workspace --all-targets --fix`: 128 files changed, 552 insertions / 472 deletions. Mainly: - uninlined_format_args (~18): `format!("{}", x)` → `format!("{x}")`. - redundant_closure_for_method_calls (~33): `.map(|x| x.foo())` → `.map(T::foo)`. - other mechanical refactors. ## Verification - `cargo clippy --workspace --all-targets -j 1 -- -D warnings` clean (pedantic + all lint groups). 
- `cargo test --workspace --no-fail-fast -j 1`: 1293 tests pass + 1 pre-existing flaky fail (`kebab-mcp::tools_call_ask_multi_hop::ask_tool_routes_multi_hop_true_to_decompose_first`, HOTFIX candidate, unrelated to this cleanup). 0 regressions. Wire impact: none. Behavior impact: none (mechanical refactor only). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 03:01:58 +00:00
altair823	00ffe9c792	feat(rag): fb-41 PR-9c-2: pipeline integration + mock test + SKILL.md (★ NLI actually activated) Activates behavior on top of the PR-9c-1 wire surface: the step 8.5 hook in `ask_multi_hop` actually performs NLI verification when `[rag] nli_threshold > 0`. This is the first user-visible behavior change in PR-9. - crates/kebab-rag/src/pipeline.rs: - ask_multi_hop step 8.5 NLI hook (empty-answer guard + truncate_for_nli + verifier.score + verification field + refusal branch). - refuse_nli_verification helper (verification: Some(...) + RefusalReason::NliVerificationFailed). - refuse_nli_model_unavailable helper (verification: None + RefusalReason::NliModelUnavailable). - truncate_for_nli helper (module-level pub fn; char-based budget of MAX_NLI_PREMISE_CHARS = 4 * 400 = 1600 chars; _hypothesis is an unused placeholder and a candidate for a token-budget update in v0.18.1). - Removed the two #[allow(dead_code)] attributes from PR-9c-1 (verifier field + with_verifier builder; also cleaned up the transitional sentence in the doc). Closes the N1 carry-forward from the round-1 PR-9c-1 review. - crates/kebab-app/src/app.rs: - NliVerifier construction in App::open_with_config: config.rag.nli_threshold > 0 → OnnxNliVerifier::new + Arc::new wrap + with_verifier call during the subsequent RagPipeline initialization. Failures propagate via ? (the signature stays Result<Self>, so no caller cascading). - Added a kebab-nli path dependency to kebab-app/Cargo.toml. - crates/kebab-rag/tests/multi_hop.rs + tests/common/mod.rs: - MockNliVerifier (pass / fail / err constructors + instrumented score call_count). - multi_hop_nli_pass_keeps_grounded: entailment 0.9 → grounded=true, verification.nli_passed=true. - multi_hop_nli_fail_refuses: entailment 0.1 → refusal=NliVerificationFailed. - multi_hop_nli_disabled_skip_verify: threshold 0.0 → verify skipped, verification=None. - multi_hop_nli_model_unavailable_refuses: verifier Err → refusal=NliModelUnavailable. - multi_hop_truncate_for_nli_preserves_hypothesis: long premise truncated + hypothesis preserved. 
- integrations/claude-code/kebab/SKILL.md: one paragraph of NLI guidance in the mcp__kebab__ask section (meaning of verification.nli_passed + threshold tuning + handling of nli_verification_failed/nli_model_unavailable refusals). Verification: cargo test --workspace -j 1: 5 new multi-hop tests pass + 0 regressions (the same pre-existing flaky kebab-mcp::tools_call_ask_multi_hop). cargo clippy --workspace --all-targets -j 1 -- -D warnings clean. Wire impact: behavior wiring on top of the PR-9c-1 schema change; the answer.v1.verification field is now populated on both the multi-hop happy path and the refuse path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 00:55:02 +00:00
altair823	cf35f36f88	feat(rag): fb-41 PR-2 — RagPipeline::ask_multi_hop skeleton (fixed depth=2) PR-2 of fb-41 multi-hop RAG. Decompose + retrieve + synthesize 3-stage pipeline가 `opts.multi_hop=true` 일 때 dispatch. Dynamic decide loop 는 PR-3. - `AskOpts.multi_hop: bool` 필드 추가 + `impl Default for AskOpts` 도입 (HOTFIXES 2026-05-07 의 known limitation 해소). 9 explicit init site 모두 `multi_hop: false` 추가 — Default 도입으로 향후 `..Default::default()` 점진 migrate 가능. - `RagPipeline::ask` 의 entry 에 dispatcher 한 줄 (`if opts.multi_hop { return self.ask_multi_hop(...) }`). - `RagPipeline::ask_multi_hop` 신규 method. 1) decompose LLM call → JSON array of strings parse, 2) 각 sub-query 로 retrieve + chunk_id dedup pool, 3) score gate / no-chunks 가드, 4) pack_context (single-pass 와 helper 공유), 5) synthesize LLM call w/ MULTI_HOP_SYNTHESIZE_SYSTEM_PROMPT, 6) citation extract + Answer build. `prompt_template_version` = "rag-multi-hop-v1" 로 stamp — eval `compare` 가 single-pass vs multi-hop 분리. - Prompt const 신규: MULTI_HOP_DECOMPOSE_SYSTEM_PROMPT + MULTI_HOP_DECOMPOSE_USER_TEMPLATE + MULTI_HOP_SYNTHESIZE_SYSTEM_PROMPT + PROMPT_TEMPLATE_VERSION_MULTI_HOP + MULTI_HOP_MAX_SUB_QUERIES_DEFAULT. - `kebab_core::RefusalReason::MultiHopDecomposeFailed` variant 신규. Cascade: kebab-store-sqlite `refusal_reason_label` + kebab-tui `ask refusal render` exhaustive match 갱신. - `parse_decompose_response` + `strip_markdown_json_fence` helper — markdown code fence (```json / ```) strip + JSON array of strings parse + trim + drop empty + cap at MULTI_HOP_MAX_SUB_QUERIES_DEFAULT. None 반환 시 caller 가 `MultiHopDecomposeFailed` refusal. Tests (55 passing total, 8 신규): - 6 unit (parse_decompose_response 의 bare array / fence variants / garbage / cap / trim 회귀 핀). - 2 integration: `ask_multi_hop_dispatches_and_decompose_garbage_refuses` (decompose garbage → MultiHopDecomposeFailed + 정확히 1 LLM call) + `ask_with_multi_hop_false_keeps_single_pass_path` (회귀 핀, 기존 caller 자동 backwards-compat). 
The integration test for happy-path multi-hop (decompose succeeds → synthesize) will be added when the ScriptedLm helper is introduced alongside the PR-3 decide loop. The current `MockLanguageModel` returns a single canned response, so it cannot pin a 2-LLM-call sequence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 06:45:32 +00:00
altair823	13a3361ba2	docs(v0.17.0/PR-C): rustdoc: state that code_lang_breakdown / repo_breakdown are actually doc counts (addresses PR #161 worker review, MEDIUM) The JSON schema description was corrected from 'code chunk count' to 'doc count' in the main PR-C change, but the rustdoc on the Rust struct fields still carried the same misstatement; Gemini round 2 only looked at the JSON schema and missed the rustdoc. Both workers reported the same finding (MEDIUM). No implementation change: the meaning has consistently been doc count from the start; only the wording is aligned. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 20:35:01 +00:00
altair823	0def913abd	feat(v0.17.0/PR-C): code_lang_chunk_breakdown additive wire field closure of HOTFIXES 2026-05-22 "code_lang_breakdown chunk granularity" LOW. Chunk-level companion of the existing doc-count metric. - crates/kebab-store-sqlite/src/store.rs: code_lang_chunk_breakdown() method. chunks INNER JOIN documents → COUNT(c.chunk_id) GROUP BY metadata_json.code_lang, NULL skipped. BTreeMap<String, u32>. + lib unit test code_lang_chunk_breakdown_counts_chunks_not_docs (1 rust doc + 3 chunks → rust=3 chunks vs rust=1 doc). - crates/kebab-app/src/schema.rs: Stats.code_lang_chunk_breakdown additive field + collect_stats builder. stats_includes_code_lang_and_repo_breakdown_fields in tests_stats_ext now also verifies the new field. - docs/wire-schema/v1/schema.schema.json: specifies the new additive field + corrects the existing code_lang_breakdown / repo_breakdown descriptions ("code chunk count" → "doc count", per the Gemini round 2 recommendation). - tasks/HOTFIXES.md: 2026-05-24 PR-C closure entry. The wire change is additive; no schema_version bump needed. Compatible with v0.16.x callers. cargo test --workspace --no-fail-fast -j 1 + clippy pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 20:35:01 +00:00
altair823	93ddece111	feat(v0.17.0/PR-B/B1): C typedef extractor + parser_version bump + orphan purge cascade closure of HOTFIXES 2026-05-21. Typedef-wrapped anonymous C struct/enum/union now emit a symbol unit named after the typedef alias. - crates/kebab-parse-code/src/c.rs: adds a type_definition branch. Detects an inner anonymous struct_specifier / enum_specifier / union_specifier → recursively extracts the type_identifier from the declarator field → synthetic unit (typedef alias). Named inner aggregates / plain aliases remain glue as before. PARSER_VERSION code-c-v1 → code-c-v2. Adds recover_typedef_alias + extract_typedef_alias_name helpers. - crates/kebab-store-sqlite/src/store.rs: two new helpers (doc-id-based orphan purge for the parser_version bump cascade). - stale_chunk_ids_for_workspace_path_except_doc_id(workspace_path, keep_doc_id): sister of stale_chunk_ids_at, keyed on doc_id. - purge_document_at_workspace_path_except_doc_id(workspace_path, keep_doc_id): removes document/chunks via CASCADE, preserves assets. keep_doc_id="" is used to mean "remove all docs". - crates/kebab-app/src/lib.rs: the parser_mismatch branch of try_skip_unchanged calls purge_workspace_path_for_parser_bump. The helper accesses the vector store lazily via app.vector(), calls delete_by_chunk_ids, and removes the SQLite document row. Cleanup finishes before Ok(None) is returned, so the caller's new INSERT avoids an idx_docs_workspace_path UNIQUE conflict. - tests: - 4 new c.rs unit tests: typedef_struct_emits_unit / typedef_enum_emits_unit / typedef_union_emits_unit / typedef_to_existing_type_stays_glue (negative). - tier1_c_ingest_searchable: parser_version assertion code-c-v1 → code-c-v2. - Regression: the existing purge_orphan_at_workspace_path + purge_vector_orphans_for_workspace_path on the bytes-edit path (asset_id change) are unchanged; they coexist with the new branch and all existing tests PASS. Open (Risks): for a nested typedef (typedef struct { struct {...} inner; } Outer;), the inner anonymous struct is still glue; the initial scope of v2 covers top-level typedef aliases only. cargo test --workspace --no-fail-fast -j 1 + clippy pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 20:30:57 +00:00
altair823	0ee18149e7	test(v0.17.0/A5 follow-up): trigram tokenizer downstream test fixes The trigram tokenizer changed snippet units, word boundaries, and the BM25 raw score distribution, so 3 tests built on unicode61 assumptions regressed. - wire_search_response::search_json_truncates_with_max_tokens + search_plain_emits_truncated_hint_to_stderr: with a single doc and a small max_tokens the snippet is too short to trip the budget loop. Switched to a multi-doc fixture (5 docs) with a 30-token budget so the hit-pop path guarantees truncated=true. - fetch_integration::fetch_chunk_with_context_returns_neighbors: the 2-char tokens in the fixture body (A1/A3 etc.) are incompatible with trigram matching and produced 0 hits. Updated to unique words of 5+ characters (apples/banana/cherry/durian/elder), and the query is pinned deterministically to cherry. - eval/runner::runner_per_query_snapshot_matches_fixture: BM25 raw scores shifted under the trigram token stream. Regenerated with UPDATE_SNAPSHOTS=1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 12:21:34 +00:00
altair823	6ac7fea7b9	feat(v0.17.0/A5): trigram-aware build_match_string + SearchResponse.hint Main body of PR-A. Plan Task A4 Step 1c + A5. - lexical.rs::build_match_string redesigned: whole-phrase and token-AND clauses combined with OR, tokens shorter than 3 characters dropped, None when no candidates remain (avoids an empty MATCH). Raw single-quote mode retained. - SearchResponse.hint (additive): set by the short_query_hint helper when the result is empty, the trimmed query is < 3 chars, and the mode is non-raw. - CLI 'kebab search' prints a one-line [hint] to stderr (text mode). - TUI SearchState.short_query_hint + poll_worker stale-aware set + fire_search/mark_input_changed reset + shown in dynamic_status. - docs/wire-schema/v1/search_response.schema.json hint additive. - New unit tests (lexical: 9 PASS, 2 existing expectations updated) + integration regression (search_korean: multi_token + mixed, 3 PASS) + BM25 snapshot regen (trigram token stream). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 11:54:25 +00:00
altair823	1969c8e3b5	fix(dogfood): k8s multi-resource YAML chunk_id collision P10 dogfooding found that a k8s manifest with 2+ documents (e.g. Deployment + Service in one file) fails to ingest: UNIQUE constraint failed: chunks.chunk_id Root cause: tier2_shared::push_chunks_with_oversize's non-oversize branch hardcoded split_key = None. K8sManifestResourceV1Chunker calls it once per resource; with split_key None every resource from the same document gets the same id_hash (= base_policy_hash) → identical chunk_id. p10-3's code_text_paragraph_v1 had the same bug (fixed in `df3c5b8`) but it calls build_chunk_no_symbol directly — the push_chunks_with_oversize path was never fixed. Fix: push_chunks_with_oversize gains a base_split_key parameter for the non-oversize single-chunk case. k8s chunker passes Some(resource.line_start) so each resource gets a distinct chunk_id; dockerfile / manifest pass None (1 chunk per file — no sibling collision, chunk_id stays stable). Regression coverage: k8s_multi_doc_emits_one_chunk_per_resource now asserts chunk_id distinctness; new integration test tier2_k8s_multi_resource_yaml_ingests_without_collision ingests a real 2-document YAML end-to-end. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 23:49:37 +00:00
altair823	192835e5bf	test(p10-1d): integration smoke tests for C + C++ Verifies end-to-end ingest + search + Citation::Code shape: - tier1_c_ingest_searchable: .c file → --code-lang c search → symbol = function name (no nesting), lang = "c", chunker_version = "code-c-ast-v1". - tier1_cpp_ingest_searchable: .cpp file → --code-lang cpp search → symbol starts with namespace::Class prefix, lang = "cpp", chunker_version = "code-cpp-ast-v1". Brings code_ingest_smoke to 18 tests (Tier 1: 9 → 11, Tier 2: 3, Tier 3: 4). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 14:31:35 +00:00
altair823	1034de25a2	fix(p10-3+p10-1d): land the missing try_skip_unchanged fallback-aware fix PR #155 (p10-3) merged WITHOUT the reviewer's required Option B1 fix — the implementer reported a commit SHA (2a39513) that never made it to main. Result: every reingest of a Tier 3-fallback file (non-k8s YAML, invalid YAML, AST extractor failure) re-runs full extract + chunk + embed because the parser/chunker version comparison can never match (stored is code-text-paragraph-v1 / none-v1, but caller uses Tier 1/2 dispatch values). This commit: 1. Adds the 7th param `fallback_chunker_version: Option<&ChunkerVersion>` to try_skip_unchanged + the stored_is_tier3_fallback detection branch (skip parser/chunker equality, keep embedder check). 2. Threads `None` through non-code call sites (md / image / pdf). 3. Code call site computes tier3_fallback_cv covering all Tier 1/2 langs that can fall back: rust / python / ts / js / go / java / kotlin / yaml / dockerfile / toml / json / xml / groovy / go-mod / c / cpp (p10-1D additions). 4. Adds tier3_yaml_fallback_reingest_is_unchanged + tier3_shell_reingest_is_unchanged regression tests (the originally-promised PR #155 regression coverage that also never made it to main). Smoke tests: 14 + 2 = 16 PASS. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 14:19:17 +00:00
altair823	d1560be80d	feat(p10-1d): activate C + C++ in ingest_one_code_asset dispatch Extends 4-arm match (parser_version / chunker_version / extract / chunks) + allowlist + tier3_fallback_cv with "c" + "cpp" arms. C uses CAstExtractor + CodeCAstV1Chunker; C++ uses CppAstExtractor + CodeCppAstV1Chunker. Both langs are Tier 3-fallback-eligible (e.g. .h file with C++ syntax may fail tree-sitter-c parse → Tier 3 paragraph fallback per p10-3 wrapper). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 13:56:45 +00:00
altair823	df3c5b8caf	test(p10-3): integration smoke tests for Tier 3 (shell + yaml fallback) Two new tests verify end-to-end Tier 3 wiring: - tier3_shell_ingest_searchable: .sh file → --code-lang shell search → Citation::Code { symbol: None, lang: "shell" }, chunker_version "code-text-paragraph-v1". - tier3_yaml_fallback_picks_up_non_k8s_yaml: docker-compose-shaped yaml (no apiVersion/kind) triggers k8s chunker's Ok(vec![]) result, fallback retries with Tier 3 → Citation::Code { symbol: None, lang: "yaml" } and chunker_version "code-text-paragraph-v1". Also fixes a bug in CodeTextParagraphV1Chunker (Task B): short paragraphs (≤80 lines) were emitted with split_key=None, causing all paragraphs from the same document to share the same chunk_id (UNIQUE constraint violation at put_chunks). Fix: always use para.line_start as split_key so every paragraph gets a distinct id regardless of size. Brings code_ingest_smoke to 14 tests (Tier 1: 9, Tier 2: 3, Tier 3: 2). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 11:37:44 +00:00
altair823	5051ea7534	feat(p10-3): Tier 1/2 → Tier 3 fallback wrapper in ingest_one_code_asset After the chunks match resolves, an Ok(empty) result (Tier 2 invalid YAML / non-k8s YAML / similar) or Err (Tier 1 extractor / chunker failure) is retried against CodeTextParagraphV1Chunker. On retry, chunker_version is swapped to "code-text-paragraph-v1" and canonical.parser_version to "none-v1" so downstream stamping + try_skip_unchanged remain consistent. Extract failure is handled similarly — when a Tier 1 extractor errors (e.g. tree-sitter parse failure), a synthesize_tier2_document-shaped fallback doc is built from raw bytes and routed through Tier 3 chunker directly (extract_fell_back guard). shell direct path + Tier 2 extract synthesize_tier2_document failures are exempted from the fallback chain (they ARE Tier 3 already, or the error is real). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 11:32:49 +00:00
altair823	88d7fbc182	feat(p10-3): activate shell direct routing through Tier 3 chunker Extends ingest_one_code_asset's allowlist + 4-arm match (parser_version / chunker_version / extract / chunks) to admit code_lang "shell" and route it to CodeTextParagraphV1Chunker. parser_version "none-v1" + synthesize_tier2_document reused. Tier 1/2 fallback wrapper lands in the next commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 11:28:41 +00:00
altair823	b5c12ecb6f	docs(p10-2-followup): clarify synthesize_tier2_document path resolution comment Earlier comment claimed the function "mirrors RustAstExtractor pattern" but the two differ: RustAstExtractor joins ctx.workspace_root to handle relative paths, while Tier 2 trusts FsSourceConnector's absolute-path invariant. Rephrase to document the actual rationale + the Kb URI fallback. Reviewer nit #3 (PR #153 code-reviewer). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 14:39:02 +00:00
altair823	166e1ddfaf	test(p10-2): integration smoke tests for Tier 2 (k8s yaml + Dockerfile + Cargo.toml) Three new tests in code_ingest_smoke.rs verifying isolated-TempDir ingest + --code-lang filter + Citation::Code.lang / .symbol shape for each Tier 2 chunker. Brings the suite to 12 tests (Rust 3 + Python 1 + TS 1 + JS 1 + Go 1 + Java 1 + Kotlin 1 + yaml 1 + dockerfile 1 + manifest 1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 13:23:01 +00:00
altair823	226ce8b744	feat(p10-2): activate Tier 2 chunkers in ingest_one_code_asset dispatch Adds yaml / dockerfile / toml / json / xml / groovy / go-mod arms to the existing 7-arm AST match. parser_version unified to "none-v1" for Tier 2. synthesize_tier2_document builds a minimal Document (single Block::Code with raw file text) since Tier 2 has no parse step. allowlist in ingest_one_asset extended to admit Tier 2 langs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 13:19:54 +00:00
altair823	ff1bedbef5	feat(p10-1c-jk): activate Kotlin in ingest_one_code_asset dispatch Replaces Kotlin bail! arms with KotlinAstExtractor + CodeKotlinAstV1Chunker. Adds kotlin_file_ingests_and_searches_as_code_citation integration test — asserts citation.lang=kotlin, symbol=com.foo.Foo.bar, code_lang=kotlin. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 10:54:55 +00:00
altair823	ebc4ef2eea	feat(p10-1c-jk): activate Java in ingest_one_code_asset dispatch Replaces Java bail! arms with JavaAstExtractor + CodeJavaAstV1Chunker. Adds java_file_ingests_and_searches_as_code_citation integration test — asserts citation.lang=java, symbol=com.foo.Foo.bar, code_lang=java. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 10:44:05 +00:00
altair823	f4c840b994	refactor(p10-1c-jk): add java + kotlin to dispatch allowlist (bail until Tasks F/I) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 10:33:27 +00:00
altair823	c19aa006d0	feat(p10-1c-go): activate Go in ingest_one_code_asset dispatch Replaces Go bail! arms with GoAstExtractor + CodeGoAstV1Chunker. Adds go_file_ingests_and_searches_as_code_citation integration test — asserts citation.lang=go, symbol=chunk.ParseDoc, code_lang=go. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 09:13:47 +00:00
altair823	2559d0d95a	refactor(p10-1c-go): add go to ingest dispatch allowlist (bail until Task F) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 09:03:28 +00:00
altair823	453ec15df4	fix(dogfood): document-centric fetch_span + remove get_asset_by_workspace_path assets.workspace_path is INTENTIONALLY 'last-registered path' for twin files (identical content at different paths share one asset row PK'd by blake3 content hash). PR #146 made try_skip_unchanged document-centric; PR #149 made reset --orphans-only document-centric; this PR removes the last caller of get_asset_by_workspace_path (fetch.rs:193 in fetch_span, which used it to reject PDF/audio media — for twins this could read the wrong asset's media_type and pick the wrong branch). Replaced with the natural 2-step lookup: get_document_by_workspace_path (PR #146) → doc.source_asset_id → get_asset (NEW trait method, asset_id is PRIMARY KEY so flip-flop-immune by construction). Then removed get_asset_by_workspace_path trait method + SqliteStore impl — 0 callers after the refactor. UPSERT doc-comment refreshed in store.rs to make the 'last-registered' semantics explicit so future readers don't try to 'fix' the flip-flop. Dogfood follow-up (PR #142 1B + multi-root corpus). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 08:03:38 +00:00
altair823	5f2bd9e97e	feat(dogfood): kebab reset --orphans-only — purge stored docs outside walker scope PR #148 auto-purges only filesystem-missing files (conservative — leaves on-disk-but-out-of-scope docs alone for data safety). This is the explicit complement: when the user has narrowed include / widened exclude / removed a sub-directory from the workspace and WANTS the stored docs reconciled, they invoke 'kebab reset --orphans-only'. Confirm prompt with orphan count + sample paths; --yes required in non-TTY. SQLite purge via existing purge_deleted_workspace_path (PR #148) + vector store delete_by_chunk_ids when configured. No fs existence check — orphans-only is the explicit 'I know what I'm doing' variant. dogfood follow-up to PR #148 (file deletion auto-purge). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 07:38:10 +00:00
altair823	d6d165df01	docs(dogfood): sync sweep_deleted_files algorithm doc with try_exists (PR #148 nit) Round 2 review found the function-level doc-comment still referenced the old fs::exists() (now replaced by try_exists().unwrap_or(true) in commit `2baa846`). One-line clarification — describes the conservative-on-Err semantics so future readers don't reintroduce the data-safety bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 07:10:27 +00:00
altair823	2baa846c6b	fix(dogfood): conservative try_exists() in sweep_deleted_files (PR #148 review) Round 1 review found a data-safety bug: fs::exists() returns false on errors like EACCES / EPERM / NFS-hiccup / ownership-change, which would trigger purge on a file that is in fact still on disk (just unreadable this moment). Switched to try_exists().unwrap_or(true) so transient FS errors are CONSERVATIVELY treated as 'file present' — never purge on uncertain signal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 07:04:03 +00:00
altair823	27baec82ea	fix(dogfood): auto-purge stored docs for filesystem-deleted files Files deleted from disk (rm a.md) were leaving stale documents + chunks + embeddings in the store, surfacing as ghost citations in search/ask. Existing purge_orphan_at_workspace_path only handled content-changed stale (WHERE workspace_path=? AND asset_id != ?) — file deletion has no new asset_id. Fix: post-walker-scan sweep. Compute (stored_paths - scanned_paths), for each candidate check filesystem existence — only purge when the file is TRULY missing. Scope-narrowing case (file on disk but outside include glob) is explicitly NOT purged to protect users from accidental data loss via config edits. Adds: - DocumentStore::all_workspace_paths trait method + SqliteStore impl - purge_deleted_workspace_path in store-sqlite (returns chunk_ids for vector delete; deletes doc CASCADE + asset row + copied storage file) - sweep_deleted_files in kebab-app::ingest path; called once per ingest before the per-asset loop - IngestReport.purged_deleted_files counter (additive, serde default) - CLI ingest summary mentions purge count when > 0 - 2 integration tests: file_deletion_auto_purge + include_scope_narrowing_does_NOT_purge dogfood discovery (PR #142 1B + multi-root: kebab-docs + httpx + zod + lodash). Per user decision: only filesystem deletion auto-purges; scope narrowing requires explicit kebab reset. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 06:51:07 +00:00
altair823	360f825f3a	docs(dogfood): refresh try_skip_unchanged doc-comment to match new flow (PR #146 review) Round 1 review found the function-level doc-comment still described the old asset-side algorithm (item 2 asset-row checksum, item 3 id_for_doc miss). Updated to the document-centric flow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 05:35:17 +00:00
altair823	641b92af7d	fix(dogfood): document-centric try_skip_unchanged for twin-file idempotency Identical-content files at different workspace paths share one assets row (assets.asset_id = blake3 content hash, PRIMARY KEY). The UPSERT `ON CONFLICT(asset_id) DO UPDATE SET workspace_path = excluded` made twin files overwrite each other's workspace_path on every ingest, so `get_asset_by_workspace_path(path1)` returned the OTHER twin's row (or None), breaking idempotent unchanged-detection for both files. Fix: switch try_skip_unchanged to document-centric lookup. `documents.workspace_path` is already UNIQUE (V001) and `id_for_doc(path, ...)` includes path, so each twin has its own stable document row. Compare `doc.source_asset_id` with the new asset's checksum instead of going through the assets table. Dogfood (multi-root: kebab-docs + httpx + zod + lodash) showed 27 of 726 docs marked Updated on every idempotent re-ingest; all 27 are twin-file victims (empty `__init__.py` ×3, AGENTS.md ↔ CLAUDE.md same content, duplicate logo PDFs/JPGs). After: re-ingest reports 0 new / 0 updated / 726 unchanged. No schema migration needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 05:27:21 +00:00
altair823	4e8b84c4e0	fix(dogfood): populate schema.v1.repo_breakdown (Task 9 follow-up) Dogfooding (PR #142 1B + multi-root corpus: kebab-docs + httpx + zod + lodash) revealed schema.v1.repo_breakdown is always {} despite the 1A-2 Task 9 having added the code_lang_breakdown sibling. The schema.rs:171 placeholder `BTreeMap::new()` was left in place. Mirror Task 9's code_lang_breakdown query for the repo field — same metadata_json JSON-path pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 05:09:19 +00:00
altair823	d53995a6d4	feat(p10-1b): code-js-ast-v1 chunker + activate JavaScript in app dispatch Chunker: duplicate-with-substitution from code-ts-ast-v1 / code-rust-ast-v1. Dispatch: replaces JS bail! arms with JavascriptAstExtractor + CodeJsAstV1Chunker. Integration test javascript_file_ingests_and_searches_as_code_citation asserts citation.lang=javascript, symbol=src/Bar.Bar.baz, code_lang=javascript. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 01:16:07 +00:00
altair823	31245a4328	fix(p10-1b): TS parser_version code-typescript-v1 → code-ts-v1 (naming consistency) Task H implementer chose code-typescript-v1 but plan + design §3.3 use the short form (chunker is code-ts-ast-v1 / code-js-ast-v1). Aligning parser versions to match: rust=code-rust-v1 / python=code-python-v1 / ts=code-ts-v1 / js=code-js-v1 (Task K). Fixes 2 sites: const PARSER_VERSION + integration test assertion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 01:05:17 +00:00
altair823	acb61b6830	feat(p10-1b): activate TypeScript in ingest_one_code_asset dispatch Replaces TS bail! arms with TypescriptAstExtractor + CodeTsAstV1Chunker. Adds typescript_file_ingests_and_searches_as_code_citation integration test — asserts citation.lang=typescript, symbol=src/Foo.Foo.bar, code_lang=typescript. JS arms remain bail!() (Task L). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 00:59:41 +00:00
altair823	1815091247	feat(p10-1b): activate Python in ingest_one_code_asset dispatch Replaces Python bail! arms with PythonAstExtractor + CodePythonAstV1Chunker. Adds python_file_ingests_and_searches_as_code_citation integration test — asserts citation.lang=python, symbol=kebab_eval.metrics.compute_mrr, code_lang=python. TS/JS arms remain bail!() (Tasks J/L). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 00:49:01 +00:00
altair823	8bdb3e8090	refactor(p10-1b): generalize ingest_one_code_asset for multi-language dispatch Rust path observably unchanged (verified by existing code_ingest_smoke tests). Python/TS/JS arms bail with TODO; per-lang extractor + chunker land in subsequent tasks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 00:35:53 +00:00
altair823	b1d5047399	fix(p10-1a-2): PR review round 1: doc inconsistencies + observable backfill error path Addresses the 7 actionable items from PR #140 round 1: - docs/SMOKE.md: parser_version "code-rust-ast-v1" → "code-rust-v1" (was confused with chunker_version); jq path .citation.code_lang → .citation.lang (on the wire, code_lang is a top-level SearchHit field) - docs/ARCHITECTURE.md: corrected the wrong Mermaid edge pcode→ptypes to pcode→core (matches the actual deps in kebab-parse-code's Cargo.toml); in the directory tree, moved the code-rust-ast-v1 chunker entry from kebab-parse-code to kebab-chunk - crates/kebab-app/src/app.rs: backfill_repo's .ok().flatten() silently swallowed failures; they are now observable via tracing::warn, preserving the non-abort intent - crates/kebab-parse-code/src/rust.rs: added a one-line comment at the top of the impl_item arm so the 1A scope limit ("only function_item creates a unit") is visible from outside the arm (the inner comment is kept) verify: kebab-parse-code 7/7 / kebab-app --lib 51/51 / code_ingest_smoke 3/3 green; touched-crate clippy clean (verified before reboot). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 23:24:20 +00:00
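
Note on the deletion sweep in `27baec82ea` / `2baa846c6b`: a minimal sketch of the "stored minus scanned, purge only when truly missing" rule, assuming plain path sets. The helper names `is_truly_missing` and `deletion_candidates` are hypothetical and are not the project's `sweep_deleted_files`; only `Path::try_exists().unwrap_or(true)` is taken from the commit message.

```rust
use std::collections::BTreeSet;
use std::path::{Path, PathBuf};

/// Returns true only when the filesystem positively reports the path as absent.
/// `try_exists()` returns Err on failures such as permission errors;
/// `unwrap_or(true)` treats those as "still present", so an uncertain
/// signal never leads to a purge.
fn is_truly_missing(path: &Path) -> bool {
    !path.try_exists().unwrap_or(true)
}

/// Hypothetical helper: candidates are stored paths the walker did not see,
/// narrowed to those that are confirmed missing on disk. A file that is on
/// disk but outside the scan scope is therefore never returned.
fn deletion_candidates(
    stored: &BTreeSet<PathBuf>,
    scanned: &BTreeSet<PathBuf>,
) -> Vec<PathBuf> {
    stored
        .difference(scanned)
        .filter(|p| is_truly_missing(p.as_path()))
        .cloned()
        .collect()
}
```

In this sketch a path that exists but was not scanned (the scope-narrowing case) is kept, which mirrors the commit's decision to leave such documents for an explicit `kebab reset --orphans-only`.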


134 Commits
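
Note on the chunk_id collisions fixed in `1969c8e3b5` and `df3c5b8caf`: a minimal sketch of why a `split_key` of `None` makes sibling chunks of one document collide, and why passing each chunk's start line separates them. The `chunk_id` function below is a hypothetical stand-in that uses the standard library's `DefaultHasher`; the real id derivation, its inputs, and its hash function are not shown in the log and are assumptions here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical id derivation: hash of document id, chunking policy,
/// and an optional split key (for example the chunk's first line number).
fn chunk_id(doc_id: &str, policy: &str, split_key: Option<u32>) -> u64 {
    let mut hasher = DefaultHasher::new();
    doc_id.hash(&mut hasher);
    policy.hash(&mut hasher);
    split_key.hash(&mut hasher);
    hasher.finish()
}

/// True when no id repeats, the same HashSet-dedup idea the regression
/// tests in the log describe.
fn all_distinct(ids: &[u64]) -> bool {
    let mut seen = HashSet::new();
    ids.iter().all(|id| seen.insert(*id))
}
```

With `split_key = None`, two resources from the same file hash identical inputs and get the same id, which is the shape that would trip a UNIQUE constraint on `chunks.chunk_id`; with `Some(line_start)` per resource the inputs differ. Files that produce a single chunk can keep `None`, so their id stays stable.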