feat(kebab-app): P6-4 image ingest wiring - kebab ingest now also processes PNG/JPEG assets

Completes the missing piece that kept the P6-1/P6-2/P6-3 libraries (`ImageExtractor`, `OllamaVisionOcr`, `apply_caption`) invisible from the CLI. `kebab ingest` now indexes image assets end-to-end alongside markdown, and `kebab search` / `kebab ask` match and cite images through their OCR text and captions.

## kebab-app

- Add `kebab-parse-image` to `[dependencies]`.
- On entry to `ingest_with_config`, build `OllamaVisionOcr` / `OllamaLanguageModel` **once per ingest session**, gated on the `image.ocr.enabled` / `image.caption.enabled` flags, and share them across the asset loop as trait objects. Thanks to the internal `Arc` in `reqwest::blocking::Client`, the allocation cost is independent of the number of assets.
- Bundle the two adapters plus `ImageExtractor` into an `ImagePipeline` struct so `ingest_one_asset` does not accumulate parameters (addresses `clippy::too_many_arguments`).
- Replace the markdown-only guard in `ingest_one_asset` with `match media_type`: Markdown keeps the existing path, `Image(_)` branches to the new `ingest_one_image_asset`, and PDF/Audio/Other are skipped as before.
- New `ingest_one_image_asset`:
  - read bytes → `ImageExtractor::extract` (on failure the caller does `errors += 1`)
  - `apply_ocr` (Lenient: on failure, emits a `ProvenanceKind::Warning` event and adds "ocr_failed: ..." to `IngestItem.warnings`; `block.ocr` stays `None`)
  - `apply_caption` (same Lenient policy)
  - the existing `MdHeadingV1Chunker` call; the chunker already emits a `Block::ImageRef` as a single chunk
  - the existing persist + embed sequence, unchanged (byte-identical to the markdown path)
- `lang_hint_from_doc`: maps `Lang("und")` or an empty string to `None`. This is done up front on the caller side rather than relying on the image-pipeline adapters' `build_prompt` to silently drop "und".

## kebab-chunk

- Replace the `Block::ImageRef` arm of `render_block_text` with the P6-4 (β) plain-concat policy: join `[alt, ocr.joined, caption.text]` with `\n\n`, dropping empty parts. If alt is empty, fall back to the basename of `src` (a defensive guard for the P6-1 contract).
- New unit test `image_ref_p6_4_plain_concat_drops_empty_parts` covers all five cases: alt-only / alt+ocr / alt+caption / alt+ocr+caption / empty alt → `src` fallback.
- The existing `image_ref_emits_own_chunk_zero_tokens` still passes unchanged: the chunker's per-block dispatch is untouched, only the text rendering changed.

## Integration tests (kebab-app/tests/image_pipeline.rs)

Ollama is stubbed with wiremock. Five cases:

1. OCR-only happy path: 1 PNG + `ocr.enabled` → 1 doc + 1 chunk emitted, and `block.ocr.joined` equals the mock's "Hello World 2026".
2. OCR + caption both enabled: both fields are populated, and the chunk text contains all three parts (alt + ocr + caption).
3. Lenient failure: when OCR returns 503, the asset is still indexed (`kind=New`), `errors=0`, a `ProvenanceKind::Warning` attributed to "kb-app" is recorded, and `IngestItem.warnings` carries an "ocr_failed:" note.
4. Both disabled: even with `image.ocr.enabled=false && image.caption.enabled=false`, the asset is indexed as one chunk (chunk text = filename), with EXIF + dimensions still populated.
5. Determinism (re-ingest): ingesting the same PNG twice yields `Updated` on the second pass with the same `doc_id`.

## SMOKE.md

Add a `kebab search --mode lexical "Hello World"` step to the command sequence. Add example `[image.ocr]` / `[image.caption]` config sections plus an ingest-time estimate (~5-10 s per asset). Put the "books go through the P7 PDF line" guidance into both the verification checklist and "Known behavior".

## Real Ollama integration check

Against 192.168.0.47 + gemma4:e4b:

```
$ kebab --config /tmp/kebab-smoke/config.toml ingest
scanned 2 new 2 updated 0 skipped 0 errors 0 (18395 ms)

$ kebab inspect doc <image_doc_id>
parser_version: image-meta-v1
blocks: [{ alt: "hello.png", ocr: "Hello World 2026", caption: "The image displays the text \"Hello World 2026\" in a large, black, sans-serif font." }]

$ kebab --json ask "Hello World 텍스트가 어디에 있나?" --mode hybrid
grounded: true
citations: [{marker: "[1]", doc_path: "hello.png"}]
```

## Verification

- `cargo test --workspace --no-fail-fast -j 1`: all pass
- `cargo clippy --workspace --all-targets -- -D warnings`: pass
- `cargo test -p kebab-chunk image_ref`: 2 pass (P1-5 regression + new P6-4 unit test)
- `cargo test -p kebab-app --test image_pipeline`: 5 pass

## Dependency boundaries

- `kebab-app` adds `kebab-parse-image`, a dependency the spec already lists as Allowed.
- No new forbidden edges (still no references to `kebab-tui` / `kebab-desktop` / `kebab-eval`).
- This task introduces zero lines of image-specific business logic; all of it is delegated to `kebab-parse-image`.

`tasks/p6/p6-4-image-ingest-wiring.md` status: planned → completed.
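The `match media_type` routing in `ingest_one_asset` can be pictured with a small sketch. Everything in it is a hypothetical stand-in: `MediaType`, `ImageFormat`, `Outcome`, and this `ImagePipeline` shape are illustrative rather than the actual kebab-app/kebab-core definitions, and the OCR/caption adapters (trait objects such as `OllamaVisionOcr` in the real code) are reduced to boxed closures. It shows two ideas from the message: the per-session adapters travel as one bundle instead of separate parameters, and only `Image(_)` takes the new path.

```rust
/// Hypothetical image subtypes (illustrative, not the real kebab-core enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ImageFormat {
    Png,
    Jpeg,
}

/// Hypothetical media classification mirroring the arms named in the commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MediaType {
    Markdown,
    Image(ImageFormat),
    Pdf,
    Audio,
    Other,
}

/// What the dispatcher did with one asset (illustrative only).
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// Existing markdown path (body elided in this sketch).
    Markdown,
    /// New image path; a field is `None` when that feature is disabled.
    Image {
        ocr: Option<String>,
        caption: Option<String>,
    },
    /// PDF / audio / other: skipped, unchanged from before.
    Skipped,
}

/// One bundle for the adapters built once per ingest session, so the
/// per-asset function takes a single `&ImagePipeline` argument.
/// Boxed closures stand in for the real OCR / caption trait objects.
pub struct ImagePipeline {
    pub ocr: Option<Box<dyn Fn(&[u8]) -> String>>,
    pub caption: Option<Box<dyn Fn(&[u8]) -> String>>,
}

/// Image path: run whichever enrichments are enabled on the raw bytes.
fn ingest_one_image_asset(bytes: &[u8], pipeline: &ImagePipeline) -> Outcome {
    Outcome::Image {
        ocr: pipeline.ocr.as_ref().map(|run| run(bytes)),
        caption: pipeline.caption.as_ref().map(|run| run(bytes)),
    }
}

/// Markdown keeps its path, any `Image(_)` branches to the image path,
/// everything else is skipped.
pub fn ingest_one_asset(media_type: MediaType, bytes: &[u8], pipeline: &ImagePipeline) -> Outcome {
    match media_type {
        MediaType::Markdown => Outcome::Markdown,
        MediaType::Image(_) => ingest_one_image_asset(bytes, pipeline),
        MediaType::Pdf | MediaType::Audio | MediaType::Other => Outcome::Skipped,
    }
}
```

A disabled feature is just a `None` field in the bundle, so the same dispatch also covers the "both disabled" case, where the chunk text falls back to the filename.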
contract: docs/superpowers/specs/2026-04-27-kebab-final-form-design.md sections: §3.4 ImageRefBlock, §6.1 ingest pipeline, §7.2 Extractor/Chunker traits, §9.1 image extraction policy.
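The Lenient policy for `apply_ocr` / `apply_caption` boils down to converting an enrichment failure into a warning note instead of an ingest error. A minimal sketch, assuming a hypothetical `OcrEngine` trait and simplified `ImageBlock` / `IngestItem` records (these are not the real kebab-parse-image or kebab-app types); the real pipeline additionally emits a `ProvenanceKind::Warning` event, which is omitted here.

```rust
use std::fmt;

/// Hypothetical OCR error type.
#[derive(Debug)]
pub struct OcrError(pub String);

impl fmt::Display for OcrError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Hypothetical OCR adapter trait (stand-in for a vision-model client).
pub trait OcrEngine {
    fn recognize(&self, bytes: &[u8]) -> Result<String, OcrError>;
}

/// Simplified stand-in for the image block: only the OCR field.
#[derive(Debug, Default)]
pub struct ImageBlock {
    pub ocr: Option<String>,
}

/// Simplified stand-in for the per-asset ingest record.
#[derive(Debug, Default)]
pub struct IngestItem {
    pub warnings: Vec<String>,
}

/// Lenient enrichment: success fills `block.ocr`; failure leaves it `None`
/// and records an "ocr_failed: ..." note instead of returning an error,
/// so the asset is still indexed and the caller has no error to count.
pub fn apply_ocr_lenient(
    engine: Option<&dyn OcrEngine>,
    bytes: &[u8],
    block: &mut ImageBlock,
    item: &mut IngestItem,
) {
    // OCR disabled in config: leave the block untouched.
    let Some(engine) = engine else { return };
    match engine.recognize(bytes) {
        Ok(text) => block.ocr = Some(text),
        Err(e) => item.warnings.push(format!("ocr_failed: {e}")),
    }
}

/// Test double that always succeeds with a fixed string.
pub struct FixedOcr(pub &'static str);

impl OcrEngine for FixedOcr {
    fn recognize(&self, _bytes: &[u8]) -> Result<String, OcrError> {
        Ok(self.0.to_string())
    }
}

/// Test double that mimics an upstream HTTP 503.
pub struct FailingOcr;

impl OcrEngine for FailingOcr {
    fn recognize(&self, _bytes: &[u8]) -> Result<String, OcrError> {
        Err(OcrError("HTTP 503".into()))
    }
}
```

Because the function returns `()`, there is nothing for the caller to add to its error count, which is the behavior the OCR-503 integration case checks with `errors=0`.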
2026-05-02 07:37:56 +00:00
parent 5453829195
commit ca0567c72b
7 changed files with 807 additions and 32 deletions
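The `lang_hint_from_doc` normalization is small enough to show as a sketch. `Lang` here is a hypothetical newtype over `String` and the signature is an assumption; the real kebab-core type may differ. `und` is the ISO 639 / BCP 47 code for "undetermined", so both it and an empty tag collapse to `None` before any prompt is built.

```rust
/// Hypothetical stand-in for the document language tag.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Lang(pub String);

/// Sketch of the caller-side mapping: "und" (undetermined) and an empty
/// tag both mean "no hint"; any other tag is passed through unchanged.
pub fn lang_hint_from_doc(lang: &Lang) -> Option<&str> {
    match lang.0.as_str() {
        "" | "und" => None,
        tag => Some(tag),
    }
}
```

Doing this once at the call site keeps the adapters' prompt builders free of special cases for placeholder tags.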
--- a/crates/kebab-chunk/src/md_heading_v1.rs
+++ b/crates/kebab-chunk/src/md_heading_v1.rs
@@ -381,17 +381,41 @@ fn render_block_text(b: &Block) -> String {
            }
            s
        }
-        // ImageRef text portion = alt (per task spec). Fall back to
-        // model caption text if alt is empty.
+        // ImageRef text portion follows the P6-4 (β) plain-concat
+        // contract — `[alt, ocr.joined, caption.text]` joined by
+        // `\n\n`, dropping empty parts. Filename fallback for empty
+        // alt keeps lexical search hits on filenames working even when
+        // P6-1's filename auto-fill is bypassed.
        Block::ImageRef(i) => {
-            if !i.alt.is_empty() {
+            let alt = if !i.alt.is_empty() {
                i.alt.clone()
            } else {
-                i.caption
-                    .as_ref()
-                    .map(|c| c.text.clone())
-                    .unwrap_or_default()
-            }
+                // P6-1 falls back to filename so this branch is
+                // defensive — keep it lest a future test fixture or
+                // synthetic block path skip the auto-fill.
+                i.src
+                    .rsplit('/')
+                    .next()
+                    .filter(|s| !s.is_empty())
+                    .unwrap_or("[image]")
+                    .to_string()
+            };
+            let ocr = i
+                .ocr
+                .as_ref()
+                .map(|o| o.joined.as_str())
+                .unwrap_or("");
+            let cap = i
+                .caption
+                .as_ref()
+                .map(|c| c.text.as_str())
+                .unwrap_or("");
+            [alt.as_str(), ocr, cap]
+                .iter()
+                .filter(|s| !s.is_empty())
+                .copied()
+                .collect::<Vec<_>>()
+                .join("\n\n")
        }
        // AudioRef has no caption preview yet (transcript joins land
        // in P8). Empty string per task spec.
@@ -700,6 +724,63 @@ mod tests {
        }
    }

+    /// P6-4 (β) plain concatenation: alt + ocr.joined + caption.text
+    /// joined by `\n\n`, dropping empty parts. Covers alt-only, alt+ocr,
+    /// alt+caption, alt+ocr+caption, and the empty-alt `src` fallback.
+    #[test]
+    fn image_ref_p6_4_plain_concat_drops_empty_parts() {
+        use kebab_core::{ModelCaption, OcrText};
+
+        let mk = |alt: &str, ocr: Option<&str>, cap: Option<&str>| {
+            Block::ImageRef(ImageRefBlock {
+                common: common_for("imageref", &[], 0, span(1, 1)),
+                asset_id: None,
+                src: "img.png".into(),
+                alt: alt.into(),
+                ocr: ocr.map(|t| OcrText {
+                    joined: t.into(),
+                    regions: vec![],
+                    engine: "test".into(),
+                    engine_version: "v1".into(),
+                }),
+                caption: cap.map(|t| ModelCaption {
+                    text: t.into(),
+                    model: "m".into(),
+                    model_version: "v".into(),
+                }),
+            })
+        };
+
+        // alt-only — no separators between empty parts.
+        assert_eq!(render_block_text(&mk("photo.png", None, None)), "photo.png");
+
+        // alt + ocr — joined by exactly one `\n\n`.
+        assert_eq!(
+            render_block_text(&mk("photo.png", Some("Hello"), None)),
+            "photo.png\n\nHello"
+        );
+
+        // alt + caption.
+        assert_eq!(
+            render_block_text(&mk("photo.png", None, Some("a red square"))),
+            "photo.png\n\na red square"
+        );
+
+        // alt + ocr + caption — three parts joined by `\n\n` each.
+        assert_eq!(
+            render_block_text(&mk("photo.png", Some("Hello"), Some("a red square"))),
+            "photo.png\n\nHello\n\na red square"
+        );
+
+        // empty alt — falls back to filename derived from `src`.
+        let blk = mk("", Some("text from image"), None);
+        assert_eq!(
+            render_block_text(&blk),
+            "img.png\n\ntext from image",
+            "empty alt must fall back to the basename of `src`"
+        );
+    }
+
    /// ImageRef → own chunk, token_estimate=0.
    #[test]
    fn image_ref_emits_own_chunk_zero_tokens() {