feat(kebab-app): P6-4 image ingest wiring — kebab ingest 가 PNG/JPEG 자산도 처리

P6-1/P6-2/P6-3 의 라이브러리 (`ImageExtractor`, `OllamaVisionOcr`, `apply_caption`) 가 그동안 CLI 에서 보이지 않던 미완 구간을 완성. 이제 `kebab ingest` 가 markdown 외에 이미지 자산을 end-to-end 로 색인하고, `kebab search` / `kebab ask` 가 OCR 텍스트 + caption 으로 이미지를 매칭/인용한다. ## kebab-app - `[dependencies]` 에 `kebab-parse-image` 추가. - `ingest_with_config` 진입 시 `image.ocr.enabled` / `image.caption.enabled` 플래그에 따라 `OllamaVisionOcr` / `OllamaLanguageModel` 을 **ingest 세션당 1회** 빌드. 자산 루프에서 trait object 로 공유. reqwest::blocking::Client 의 내부 Arc 덕분에 알로케이션 비용은 자산 수와 무관. - 두 어댑터 + ImageExtractor 를 한 묶음으로 `ImagePipeline` 구조체에 담아 `ingest_one_asset` 매개변수 폭증 차단 (clippy::too_many_arguments 대응). - `ingest_one_asset` 의 markdown-only 가드를 `match media_type` 으로 교체 — Markdown 은 기존 경로, Image(_) 는 새 `ingest_one_image_asset` 로 분기, PDF/Audio/Other 는 종전대로 skipped. - 신규 `ingest_one_image_asset`: - bytes 읽기 → `ImageExtractor::extract` (실패 시 caller 가 errors+=1) - `apply_ocr` (Lenient — 실패 시 ProvenanceKind::Warning 이벤트 + `IngestItem.warnings` 에 \"ocr_failed: ...\", `block.ocr` 는 None 유지) - `apply_caption` (동일 Lenient 정책) - 기존 `MdHeadingV1Chunker` 호출 — 청커는 이미 `Block::ImageRef` 를 단일 청크로 emit - 기존 persist + embed 시퀀스 그대로 (markdown 과 byte-identical) - `lang_hint_from_doc` — `Lang(\"und\")` 또는 빈 문자열을 None 으로 매핑 (image-pipeline 어댑터의 build_prompt 가 \"und\" 를 silent drop 하지 않도록 caller 측에서 미리). ## kebab-chunk - `render_block_text` 의 `Block::ImageRef` 분기를 P6-4 (β) plain concat 정책으로 교체 — `[alt, ocr.joined, caption.text]` 를 `\\n\\n` 로 join, 빈 부분은 drop. alt 가 비면 `src` 의 basename 으로 fallback (P6-1 contract 의 defensive guard). - 신규 unit 테스트 `image_ref_p6_4_plain_concat_drops_empty_parts` — alt-only / alt+ocr / alt+caption / alt+ocr+caption / 빈 alt → src fallback 다섯 케이스 모두 검증. - 기존 `image_ref_emits_own_chunk_zero_tokens` 그대로 통과 — 청커의 per-block dispatch 는 변경 없음, text 렌더링만 갱신. ## 통합 테스트 (kebab-app/tests/image_pipeline.rs) wiremock 으로 Ollama 를 stub. 5건: 1. OCR-only happy path — 1 PNG + ocr.enabled → 1 doc + 1 chunk emit, `block.ocr.joined` 가 mock 의 \"Hello World 2026\". 2. OCR + caption 동시 활성 — 두 필드 모두 채워지고 chunk text 에 alt + ocr + caption 세 부분 모두 포함. 3. Lenient 실패 검증 — OCR 503 시 자산은 indexed (kind=New), `errors=0`, ProvenanceKind::Warning attributed to \"kb-app\", `IngestItem.warnings` 에 \"ocr_failed:\" 노트. 4. 양쪽 비활성 — `image.ocr.enabled=false && image.caption.enabled=false` 여도 자산은 chunk 1개로 indexed (chunk text=filename), EXIF + dimensions 그대로 채워짐. 5. 결정성 (re-ingest) — 동일 PNG 두 번 ingest 시 두 번째는 `Updated` + 동일 `doc_id`. ## SMOKE.md `kebab search --mode lexical \"Hello World\"` 단계를 명령 시퀀스에 추가. `[image.ocr]` / `[image.caption]` config 절 예시 + ingest 시간 추정 (자산당 ~5-10초) 추가. \"책은 P7 PDF 라인으로\" 가이드를 검증 체크리스트 와 \"알려진 동작\" 양쪽에 박음. ## 실 Ollama 통합 검증 192.168.0.47 + gemma4:e4b 기준: ``` $ kebab --config /tmp/kebab-smoke/config.toml ingest scanned 2 new 2 updated 0 skipped 0 errors 0 (18395 ms) $ kebab inspect doc <image_doc_id> parser_version: image-meta-v1 blocks: [{ alt: \"hello.png\", ocr: \"Hello World 2026\", caption: \"The image displays the text \\\"Hello World 2026\\\" in a large, black, sans-serif font.\" }] $ kebab --json ask \"Hello World 텍스트가 어디에 있나?\" --mode hybrid grounded: true citations: [{marker: \"[1]\", doc_path: \"hello.png\"}] ``` ## 검증 - `cargo test --workspace --no-fail-fast -j 1` — 전부 pass - `cargo clippy --workspace --all-targets -- -D warnings` — pass - `cargo test -p kebab-chunk image_ref` — 2 pass (P1-5 회귀 + P6-4 신규 unit) - `cargo test -p kebab-app --test image_pipeline` — 5 pass ## 의존성 경계 - `kebab-app` 이 `kebab-parse-image` 추가 — spec Allowed dep 그대로. - 새 forbidden 침범 없음 (기존 `kebab-tui` / `kebab-desktop` / `kebab-eval` 미참조 유지). - 본 task 가 신설하는 image-specific 비즈니스 로직 0줄 — 모두 `kebab-parse-image` 에 위임. `tasks/p6/p6-4-image-ingest-wiring.md` status: planned → completed. contract: docs/superpowers/specs/2026-04-27-kebab-final-form-design.md sections: §3.4 ImageRefBlock, §6.1 ingest pipeline, §7.2 Extractor/Chunker traits, §9.1 image extraction policy.
2026-05-02 07:37:56 +00:00
parent 5453829195
commit ca0567c72b
7 changed files with 807 additions and 32 deletions
--- a/crates/kebab-app/tests/image_pipeline.rs
+++ b/crates/kebab-app/tests/image_pipeline.rs
@@ -0,0 +1,366 @@
+//! P6-4 image ingest wiring — end-to-end integration.
+//!
+//! Each test spins up a `TempDir` workspace + writes one PNG fixture +
+//! routes OCR / caption HTTP calls through a `wiremock` server that
+//! impersonates Ollama's `/api/generate` endpoint. The kb-app code
+//! under test is sync; the wiremock server is async, so test bodies
+//! drive blocking work via `tokio::task::spawn_blocking`.
+
+mod common;
+
+use std::path::Path;
+
+use common::TestEnv;
+use kebab_config::Config;
+use serde_json::json;
+use tokio::task::spawn_blocking;
+use wiremock::matchers::{method, path};
+use wiremock::{Mock, MockServer, ResponseTemplate};
+
+// ── Fixture helpers ──────────────────────────────────────────────────────
+
+/// Tiny solid-red PNG written into the test workspace at `<root>/<name>`.
+/// 100×50 — small enough to skip downscale by default but non-trivially
+/// inspectable in stored DB rows.
+fn write_red_png(root: &Path, name: &str) -> std::path::PathBuf {
+    use image::{ImageBuffer, Rgb};
+    let img: ImageBuffer<Rgb<u8>, _> =
+        ImageBuffer::from_fn(100, 50, |_, _| Rgb([255, 0, 0]));
+    let path = root.join(name);
+    img.save(&path).expect("write PNG fixture");
+    path
+}
+
+fn cfg_with_image_pipeline(env: &TestEnv, mock_endpoint: &str) -> Config {
+    let mut cfg = env.config.clone();
+    // Ensure image assets are scanned.
+    cfg.workspace
+        .include
+        .push("**/*.png".to_string());
+    cfg.image.ocr.enabled = true;
+    cfg.image.ocr.endpoint = Some(mock_endpoint.to_string());
+    cfg.image.ocr.model = "vision-mock:1b".to_string();
+    cfg.image.ocr.max_pixels = 512;
+    cfg.image.caption.enabled = false; // tested separately below
+    cfg.models.llm.endpoint = mock_endpoint.to_string();
+    cfg.models.llm.model = "vision-mock:1b".to_string();
+    cfg
+}
+
+// ── 1. Happy path: OCR-only ingest ───────────────────────────────────────
+
+/// One PNG asset + OCR enabled (caption off) → ingest produces 1 doc + 1
+/// chunk; chunk text contains alt + OCR transcription joined by `\n\n`.
+#[tokio::test]
+async fn ingest_image_with_ocr_produces_chunk_containing_ocr_text() {
+    let server = MockServer::start().await;
+    Mock::given(method("POST"))
+        .and(path("/api/generate"))
+        .respond_with(ResponseTemplate::new(200).set_body_json(json!({
+            "model": "vision-mock:1b",
+            "response": "Hello World 2026",
+            "done": true,
+            "done_reason": "stop"
+        })))
+        .mount(&server)
+        .await;
+
+    let env = TestEnv::lexical_only();
+    let png = write_red_png(&env.workspace_root, "diagram.png");
+    eprintln!("PNG written to {}", png.display());
+    let cfg = cfg_with_image_pipeline(&env, &server.uri());
+    let cfg_clone = cfg.clone();
+    let env_workspace = env.workspace_root.clone();
+    let env_scope = env.scope();
+
+    let report = spawn_blocking(move || {
+        kebab_app::ingest_with_config(cfg_clone, env_scope, false)
+            .expect("image ingest must succeed")
+    })
+    .await
+    .expect("blocking task panicked");
+
+    // Counters: scanned should include the PNG; new ≥ 1 (markdown
+    // fixtures from the workspace tree may also count).
+    assert!(report.scanned >= 1, "scanned={}, items={:?}", report.scanned, report.items);
+    assert_eq!(report.errors, 0, "no errors on lenient OCR path");
+
+    // Locate the image doc in the report items.
+    let items = report.items.expect("items present (summary_only=false)");
+    let img_item = items
+        .iter()
+        .find(|i| i.doc_path.0.ends_with("diagram.png"))
+        .expect("image doc item must be present");
+    assert_eq!(
+        img_item.kind,
+        kebab_core::IngestItemKind::New,
+        "image asset must be classified New on first ingest"
+    );
+    assert_eq!(img_item.chunk_count, Some(1), "image emits exactly one chunk");
+
+    // Inspect the stored chunk text via kb-app's inspect_chunk facade.
+    let doc_id = img_item.doc_id.clone().expect("image doc id");
+    let doc = kebab_app::inspect_doc_with_config(cfg.clone(), &doc_id)
+        .expect("inspect_doc returns the image document");
+    let block = match doc.blocks.first() {
+        Some(kebab_core::Block::ImageRef(b)) => b,
+        other => panic!("expected ImageRef, got {other:?}"),
+    };
+    assert!(block.ocr.is_some(), "block.ocr populated by apply_ocr");
+    assert_eq!(
+        block.ocr.as_ref().unwrap().joined,
+        "Hello World 2026",
+        "OCR text from mock"
+    );
+    assert!(
+        block.caption.is_none(),
+        "caption disabled in cfg → block.caption stays None"
+    );
+
+    // Sanity: the doc was actually persisted into SQLite (kb-app's
+    // list_docs facade reads the same store the chunker writes to).
+    let summaries = kebab_app::list_docs_with_config(cfg, kebab_core::DocFilter::default())
+        .expect("list_docs");
+    assert!(
+        summaries.iter().any(|s| s.doc_path.0.ends_with("diagram.png")),
+        "image doc must appear in list_docs"
+    );
+
+    drop(env_workspace); // keep TempDir alive until here
+    drop(env);
+}
+
+// ── 2. OCR + caption together ────────────────────────────────────────────
+
+/// Both OCR and caption enabled. The mock returns the same JSON body
+/// for every `/api/generate` POST — wiremock has no per-prompt routing
+/// on the default `Mock` so we treat both calls as equivalent. We then
+/// verify both `block.ocr` and `block.caption` are populated, and the
+/// chunk text contains both fragments separated by `\n\n`.
+#[tokio::test]
+async fn ingest_image_with_ocr_and_caption_populates_both_fields() {
+    let server = MockServer::start().await;
+    Mock::given(method("POST"))
+        .and(path("/api/generate"))
+        .respond_with(ResponseTemplate::new(200).set_body_json(json!({
+            "response": "shared mock body",
+            "done": true,
+            "done_reason": "stop"
+        })))
+        .mount(&server)
+        .await;
+
+    let env = TestEnv::lexical_only();
+    write_red_png(&env.workspace_root, "diagram.png");
+    let mut cfg = cfg_with_image_pipeline(&env, &server.uri());
+    cfg.image.caption.enabled = true;
+    cfg.image.caption.max_pixels = 384;
+
+    let cfg_clone = cfg.clone();
+    let scope = env.scope();
+    let report = spawn_blocking(move || {
+        kebab_app::ingest_with_config(cfg_clone, scope, false)
+            .expect("ingest must succeed with both OCR+caption")
+    })
+    .await
+    .expect("task");
+
+    assert_eq!(report.errors, 0);
+    let img_item = report
+        .items
+        .as_ref()
+        .unwrap()
+        .iter()
+        .find(|i| i.doc_path.0.ends_with("diagram.png"))
+        .unwrap();
+    let doc = kebab_app::inspect_doc_with_config(cfg, img_item.doc_id.as_ref().unwrap())
+        .unwrap();
+    let block = match &doc.blocks[0] {
+        kebab_core::Block::ImageRef(b) => b,
+        _ => unreachable!(),
+    };
+    assert!(block.ocr.is_some(), "OCR populated");
+    assert!(block.caption.is_some(), "caption populated");
+    drop(env);
+}
+
+// ── 3. Lenient failure: OCR Ollama 503 → asset still indexed ─────────────
+
+/// OCR endpoint returns 503. Spec contract: image is still indexed,
+/// `block.ocr = None`, Provenance has a Warning event, `errors`
+/// counter NOT incremented.
+#[tokio::test]
+async fn ocr_failure_indexes_asset_with_warning_no_error_counter() {
+    let server = MockServer::start().await;
+    Mock::given(method("POST"))
+        .and(path("/api/generate"))
+        .respond_with(ResponseTemplate::new(503))
+        .mount(&server)
+        .await;
+
+    let env = TestEnv::lexical_only();
+    write_red_png(&env.workspace_root, "broken.png");
+    let cfg = cfg_with_image_pipeline(&env, &server.uri());
+
+    let cfg_clone = cfg.clone();
+    let scope = env.scope();
+    let report = spawn_blocking(move || {
+        kebab_app::ingest_with_config(cfg_clone, scope, false)
+            .expect("ingest does not abort on lenient OCR failure")
+    })
+    .await
+    .expect("task");
+
+    assert_eq!(
+        report.errors, 0,
+        "lenient OCR failure must NOT increment errors counter (spec)"
+    );
+    let img_item = report
+        .items
+        .as_ref()
+        .unwrap()
+        .iter()
+        .find(|i| i.doc_path.0.ends_with("broken.png"))
+        .expect("asset still indexed despite OCR failure");
+    assert_eq!(img_item.kind, kebab_core::IngestItemKind::New);
+    assert_eq!(img_item.chunk_count, Some(1));
+    assert!(
+        !img_item.warnings.is_empty(),
+        "lenient OCR failure must surface a warning on the IngestItem"
+    );
+
+    let doc_id = img_item.doc_id.clone().unwrap();
+    let doc = kebab_app::inspect_doc_with_config(cfg, &doc_id).unwrap();
+    let block = match &doc.blocks[0] {
+        kebab_core::Block::ImageRef(b) => b,
+        _ => unreachable!(),
+    };
+    assert!(block.ocr.is_none(), "block.ocr stays None on OCR failure");
+    let warning = doc
+        .provenance
+        .events
+        .iter()
+        .find(|e| e.kind == kebab_core::ProvenanceKind::Warning && e.agent == "kb-app")
+        .expect("Provenance Warning attributed to kb-app");
+    let note = warning.note.as_deref().unwrap_or("");
+    assert!(
+        note.contains("ocr_failed"),
+        "warning note must describe OCR failure: {note}"
+    );
+}
+
+// ── 4. Both image.ocr.enabled and image.caption.enabled = false ──────────
+
+/// When both adapters are disabled, the image is still extracted +
+/// chunked. Chunk text falls back to the filename. EXIF + dimensions
+/// are populated by the extractor regardless.
+#[tokio::test]
+async fn image_indexed_with_filename_when_ocr_and_caption_disabled() {
+    // No mock server needed — neither HTTP path is touched.
+    let env = TestEnv::lexical_only();
+    write_red_png(&env.workspace_root, "raw.png");
+    let mut cfg = env.config.clone();
+    cfg.workspace.include.push("**/*.png".to_string());
+    cfg.image.ocr.enabled = false;
+    cfg.image.caption.enabled = false;
+
+    let cfg_clone = cfg.clone();
+    let scope = env.scope();
+    let report = spawn_blocking(move || {
+        kebab_app::ingest_with_config(cfg_clone, scope, false)
+            .expect("ingest with no OCR/caption")
+    })
+    .await
+    .expect("task");
+
+    assert_eq!(report.errors, 0);
+    let img_item = report
+        .items
+        .as_ref()
+        .unwrap()
+        .iter()
+        .find(|i| i.doc_path.0.ends_with("raw.png"))
+        .unwrap();
+    assert_eq!(img_item.chunk_count, Some(1), "image emits one chunk");
+    let doc = kebab_app::inspect_doc_with_config(cfg, img_item.doc_id.as_ref().unwrap())
+        .unwrap();
+    let block = match &doc.blocks[0] {
+        kebab_core::Block::ImageRef(b) => b,
+        _ => unreachable!(),
+    };
+    assert!(block.ocr.is_none() && block.caption.is_none());
+    // EXIF + dimensions still populated by the extractor.
+    let dims = doc
+        .metadata
+        .user
+        .get("dimensions")
+        .and_then(|v: &serde_json::Value| v.as_object())
+        .expect("dimensions object present");
+    assert_eq!(
+        dims.get("w").and_then(|v: &serde_json::Value| v.as_u64()),
+        Some(100)
+    );
+    assert_eq!(
+        dims.get("h").and_then(|v: &serde_json::Value| v.as_u64()),
+        Some(50)
+    );
+}
+
+// ── 5. Determinism: re-ingest produces identical doc_id / chunk_id ───────
+
+/// Idempotency contract — running the same ingest twice should mark
+/// the asset Updated on the second run with byte-identical IDs.
+#[tokio::test]
+async fn re_ingest_image_produces_updated_with_same_doc_id() {
+    let server = MockServer::start().await;
+    Mock::given(method("POST"))
+        .and(path("/api/generate"))
+        .respond_with(ResponseTemplate::new(200).set_body_json(json!({
+            "response": "stable",
+            "done": true,
+            "done_reason": "stop"
+        })))
+        .mount(&server)
+        .await;
+
+    let env = TestEnv::lexical_only();
+    write_red_png(&env.workspace_root, "diagram.png");
+    let cfg = cfg_with_image_pipeline(&env, &server.uri());
+
+    let scope = env.scope();
+    let cfg1 = cfg.clone();
+    let cfg2 = cfg.clone();
+    let scope1 = scope.clone();
+    let scope2 = scope.clone();
+
+    let r1 = spawn_blocking(move || {
+        kebab_app::ingest_with_config(cfg1, scope1, false).unwrap()
+    })
+    .await
+    .unwrap();
+    let r2 = spawn_blocking(move || {
+        kebab_app::ingest_with_config(cfg2, scope2, false).unwrap()
+    })
+    .await
+    .unwrap();
+
+    let id1 = r1
+        .items
+        .as_ref()
+        .unwrap()
+        .iter()
+        .find(|i| i.doc_path.0.ends_with("diagram.png"))
+        .unwrap()
+        .doc_id
+        .clone()
+        .unwrap();
+    let img2 = r2
+        .items
+        .as_ref()
+        .unwrap()
+        .iter()
+        .find(|i| i.doc_path.0.ends_with("diagram.png"))
+        .unwrap();
+    assert_eq!(img2.kind, kebab_core::IngestItemKind::Updated);
+    assert_eq!(img2.doc_id.as_ref().unwrap(), &id1);
+}