feat(kebab-app): P6-4 image ingest wiring — kebab ingest 가 PNG/JPEG 자산도 처리
P6-1/P6-2/P6-3 의 라이브러리 (`ImageExtractor`, `OllamaVisionOcr`,
`apply_caption`) 가 그동안 CLI 에서 보이지 않던 미완 구간을 완성.
이제 `kebab ingest` 가 markdown 외에 이미지 자산을 end-to-end 로
색인하고, `kebab search` / `kebab ask` 가 OCR 텍스트 + caption 으로
이미지를 매칭/인용한다.
## kebab-app
- `[dependencies]` 에 `kebab-parse-image` 추가.
- `ingest_with_config` 진입 시 `image.ocr.enabled` / `image.caption.enabled`
플래그에 따라 `OllamaVisionOcr` / `OllamaLanguageModel` 을 **ingest
세션당 1회** 빌드. 자산 루프에서 trait object 로 공유.
reqwest::blocking::Client 의 내부 Arc 덕분에 알로케이션 비용은
자산 수와 무관.
- 두 어댑터 + ImageExtractor 를 한 묶음으로 `ImagePipeline` 구조체에
담아 `ingest_one_asset` 매개변수 폭증 차단 (clippy::too_many_arguments
대응).
- `ingest_one_asset` 의 markdown-only 가드를 `match media_type` 으로
교체 — Markdown 은 기존 경로, Image(_) 는 새 `ingest_one_image_asset`
로 분기, PDF/Audio/Other 는 종전대로 skipped.
- 신규 `ingest_one_image_asset`:
- bytes 읽기 → `ImageExtractor::extract` (실패 시 caller 가 errors+=1)
- `apply_ocr` (Lenient — 실패 시 ProvenanceKind::Warning 이벤트 +
`IngestItem.warnings` 에 \"ocr_failed: ...\", `block.ocr` 는 None
유지)
- `apply_caption` (동일 Lenient 정책)
- 기존 `MdHeadingV1Chunker` 호출 — 청커는 이미 `Block::ImageRef` 를
단일 청크로 emit
- 기존 persist + embed 시퀀스 그대로 (markdown 과 byte-identical)
- `lang_hint_from_doc` — `Lang(\"und\")` 또는 빈 문자열을 None 으로
매핑 (image-pipeline 어댑터의 build_prompt 가 \"und\" 를 silent drop
하지 않도록 caller 측에서 미리).
## kebab-chunk
- `render_block_text` 의 `Block::ImageRef` 분기를 P6-4 (β) plain
concat 정책으로 교체 — `[alt, ocr.joined, caption.text]` 를 `\\n\\n`
로 join, 빈 부분은 drop. alt 가 비면 `src` 의 basename 으로 fallback
(P6-1 contract 의 defensive guard).
- 신규 unit 테스트 `image_ref_p6_4_plain_concat_drops_empty_parts` —
alt-only / alt+ocr / alt+caption / alt+ocr+caption / 빈 alt → src
fallback 다섯 케이스 모두 검증.
- 기존 `image_ref_emits_own_chunk_zero_tokens` 그대로 통과 — 청커의
per-block dispatch 는 변경 없음, text 렌더링만 갱신.
## 통합 테스트 (kebab-app/tests/image_pipeline.rs)
wiremock 으로 Ollama 를 stub. 5건:
1. OCR-only happy path — 1 PNG + ocr.enabled → 1 doc + 1 chunk emit,
`block.ocr.joined` 가 mock 의 \"Hello World 2026\".
2. OCR + caption 동시 활성 — 두 필드 모두 채워지고 chunk text 에
alt + ocr + caption 세 부분 모두 포함.
3. Lenient 실패 검증 — OCR 503 시 자산은 indexed (kind=New),
`errors=0`, ProvenanceKind::Warning attributed to \"kb-app\",
`IngestItem.warnings` 에 \"ocr_failed:\" 노트.
4. 양쪽 비활성 — `image.ocr.enabled=false && image.caption.enabled=false`
여도 자산은 chunk 1개로 indexed (chunk text=filename), EXIF +
dimensions 그대로 채워짐.
5. 결정성 (re-ingest) — 동일 PNG 두 번 ingest 시 두 번째는
`Updated` + 동일 `doc_id`.
## SMOKE.md
`kebab search --mode lexical \"Hello World\"` 단계를 명령 시퀀스에
추가. `[image.ocr]` / `[image.caption]` config 절 예시 + ingest 시간
추정 (자산당 ~5-10초) 추가. \"책은 P7 PDF 라인으로\" 가이드를 검증
체크리스트 와 \"알려진 동작\" 양쪽에 박음.
## 실 Ollama 통합 검증
192.168.0.47 + gemma4:e4b 기준:
```
$ kebab --config /tmp/kebab-smoke/config.toml ingest
scanned 2 new 2 updated 0 skipped 0 errors 0 (18395 ms)
$ kebab inspect doc <image_doc_id>
parser_version: image-meta-v1
blocks: [{
alt: \"hello.png\",
ocr: \"Hello World 2026\",
caption: \"The image displays the text \\\"Hello World 2026\\\" in a large, black, sans-serif font.\"
}]
$ kebab --json ask \"Hello World 텍스트가 어디에 있나?\" --mode hybrid
grounded: true
citations: [{marker: \"[1]\", doc_path: \"hello.png\"}]
```
## 검증
- `cargo test --workspace --no-fail-fast -j 1` — 전부 pass
- `cargo clippy --workspace --all-targets -- -D warnings` — pass
- `cargo test -p kebab-chunk image_ref` — 2 pass (P1-5 회귀 + P6-4
신규 unit)
- `cargo test -p kebab-app --test image_pipeline` — 5 pass
## 의존성 경계
- `kebab-app` 이 `kebab-parse-image` 추가 — spec Allowed dep 그대로.
- 새 forbidden 침범 없음 (기존 `kebab-tui` / `kebab-desktop` /
`kebab-eval` 미참조 유지).
- 본 task 가 신설하는 image-specific 비즈니스 로직 0줄 — 모두
`kebab-parse-image` 에 위임.
`tasks/p6/p6-4-image-ingest-wiring.md` status: planned → completed.
contract: docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
sections: §3.4 ImageRefBlock, §6.1 ingest pipeline, §7.2
Extractor/Chunker traits, §9.1 image extraction policy.
This commit is contained in:
4
Cargo.lock
generated
4
Cargo.lock
generated
@@ -3397,6 +3397,7 @@ dependencies = [
|
|||||||
"anyhow",
|
"anyhow",
|
||||||
"blake3",
|
"blake3",
|
||||||
"dirs 5.0.1",
|
"dirs 5.0.1",
|
||||||
|
"image",
|
||||||
"kebab-chunk",
|
"kebab-chunk",
|
||||||
"kebab-config",
|
"kebab-config",
|
||||||
"kebab-core",
|
"kebab-core",
|
||||||
@@ -3405,6 +3406,7 @@ dependencies = [
|
|||||||
"kebab-llm",
|
"kebab-llm",
|
||||||
"kebab-llm-local",
|
"kebab-llm-local",
|
||||||
"kebab-normalize",
|
"kebab-normalize",
|
||||||
|
"kebab-parse-image",
|
||||||
"kebab-parse-md",
|
"kebab-parse-md",
|
||||||
"kebab-parse-types",
|
"kebab-parse-types",
|
||||||
"kebab-rag",
|
"kebab-rag",
|
||||||
@@ -3417,10 +3419,12 @@ dependencies = [
|
|||||||
"serde_json",
|
"serde_json",
|
||||||
"tempfile",
|
"tempfile",
|
||||||
"time",
|
"time",
|
||||||
|
"tokio",
|
||||||
"toml",
|
"toml",
|
||||||
"tracing",
|
"tracing",
|
||||||
"tracing-appender",
|
"tracing-appender",
|
||||||
"tracing-subscriber",
|
"tracing-subscriber",
|
||||||
|
"wiremock",
|
||||||
]
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
|
|||||||
@@ -23,6 +23,11 @@ kebab-embed-local = { path = "../kebab-embed-local" }
|
|||||||
kebab-llm = { path = "../kebab-llm" }
|
kebab-llm = { path = "../kebab-llm" }
|
||||||
kebab-llm-local = { path = "../kebab-llm-local" }
|
kebab-llm-local = { path = "../kebab-llm-local" }
|
||||||
kebab-rag = { path = "../kebab-rag" }
|
kebab-rag = { path = "../kebab-rag" }
|
||||||
|
# P6-4: image extractor + OCR + caption adapters live here. App
|
||||||
|
# threads them into the per-asset dispatch (see `ingest_one_asset`
|
||||||
|
# image branch). Trait-only consumption — no `kebab-parse-image`
|
||||||
|
# internals leak into kb-app code.
|
||||||
|
kebab-parse-image = { path = "../kebab-parse-image" }
|
||||||
anyhow = { workspace = true }
|
anyhow = { workspace = true }
|
||||||
blake3 = { workspace = true }
|
blake3 = { workspace = true }
|
||||||
serde = { workspace = true }
|
serde = { workspace = true }
|
||||||
@@ -37,3 +42,9 @@ dirs = "5"
|
|||||||
[dev-dependencies]
|
[dev-dependencies]
|
||||||
rusqlite = { workspace = true }
|
rusqlite = { workspace = true }
|
||||||
tempfile = { workspace = true }
|
tempfile = { workspace = true }
|
||||||
|
# Image-pipeline integration tests use wiremock to stub Ollama for OCR
|
||||||
|
# / caption HTTP calls. Async runtime to host the mock server only;
|
||||||
|
# the kb-app code under test stays sync.
|
||||||
|
wiremock = { workspace = true }
|
||||||
|
tokio = { workspace = true, features = ["rt-multi-thread"] }
|
||||||
|
image = { version = "0.25", default-features = false, features = ["png"] }
|
||||||
|
|||||||
@@ -41,12 +41,15 @@ use serde::{Deserialize, Serialize};
|
|||||||
|
|
||||||
use kebab_chunk::MdHeadingV1Chunker;
|
use kebab_chunk::MdHeadingV1Chunker;
|
||||||
use kebab_core::{
|
use kebab_core::{
|
||||||
Answer, CanonicalDocument, Chunk, ChunkId, ChunkPolicy, ChunkerVersion, Chunker,
|
Answer, Block, CanonicalDocument, Chunk, ChunkId, ChunkPolicy, ChunkerVersion, Chunker,
|
||||||
DocFilter, DocSummary, DocumentId, DocumentStore, Embedder, EmbeddingInput,
|
DocFilter, DocSummary, DocumentId, DocumentStore, Embedder, EmbeddingInput,
|
||||||
EmbeddingKind, IngestReport, ParserVersion, RawAsset, SearchHit, SearchQuery,
|
EmbeddingKind, ExtractContext, Extractor, IngestReport, Lang, LanguageModel, MediaType,
|
||||||
SourceConnector, SourceScope, SourceUri, VectorRecord, VectorStore,
|
ParserVersion, RawAsset, SearchHit, SearchQuery, SourceConnector, SourceScope,
|
||||||
|
SourceUri, VectorRecord, VectorStore,
|
||||||
};
|
};
|
||||||
|
use kebab_llm_local::OllamaLanguageModel;
|
||||||
use kebab_normalize::build_canonical_document;
|
use kebab_normalize::build_canonical_document;
|
||||||
|
use kebab_parse_image::{ImageExtractor, OllamaVisionOcr, apply_caption, apply_ocr};
|
||||||
use kebab_parse_md::{BodyHints, parse_blocks, parse_frontmatter};
|
use kebab_parse_md::{BodyHints, parse_blocks, parse_frontmatter};
|
||||||
use kebab_source_fs::FsSourceConnector;
|
use kebab_source_fs::FsSourceConnector;
|
||||||
|
|
||||||
@@ -190,6 +193,35 @@ pub fn ingest_with_config(
|
|||||||
let parser_version = ParserVersion(KEBAB_PARSE_MD_VERSION.to_string());
|
let parser_version = ParserVersion(KEBAB_PARSE_MD_VERSION.to_string());
|
||||||
let chunk_policy = chunk_policy_from_config(&app.config);
|
let chunk_policy = chunk_policy_from_config(&app.config);
|
||||||
|
|
||||||
|
// P6-4: build OCR / caption adapters once per ingest invocation,
|
||||||
|
// gated on their respective `enabled` flags. `reqwest::blocking::Client`
|
||||||
|
// is internally Arc-shared so reusing one instance across the asset
|
||||||
|
// loop is correct and cheap. Construction failure (e.g. invalid
|
||||||
|
// endpoint) aborts ingest fail-fast — better than silently disabling
|
||||||
|
// OCR/caption mid-run.
|
||||||
|
let ocr_engine: Option<OllamaVisionOcr> = if app.config.image.ocr.enabled {
|
||||||
|
Some(
|
||||||
|
OllamaVisionOcr::new(&app.config)
|
||||||
|
.context("kb-app::ingest: build OllamaVisionOcr")?,
|
||||||
|
)
|
||||||
|
} else {
|
||||||
|
None
|
||||||
|
};
|
||||||
|
let caption_llm: Option<Box<dyn LanguageModel>> = if app.config.image.caption.enabled {
|
||||||
|
Some(Box::new(
|
||||||
|
OllamaLanguageModel::new(&app.config)
|
||||||
|
.context("kb-app::ingest: build OllamaLanguageModel for caption")?,
|
||||||
|
))
|
||||||
|
} else {
|
||||||
|
None
|
||||||
|
};
|
||||||
|
let image_extractor = ImageExtractor::new();
|
||||||
|
let image_pipeline = ImagePipeline {
|
||||||
|
extractor: &image_extractor,
|
||||||
|
ocr_engine: ocr_engine.as_ref(),
|
||||||
|
caption_llm: caption_llm.as_deref(),
|
||||||
|
};
|
||||||
|
|
||||||
// Pre-load every existing doc_id so we can label `IngestItem.kind`
|
// Pre-load every existing doc_id so we can label `IngestItem.kind`
|
||||||
// as `New` vs `Updated` correctly. `list_documents` returns one
|
// as `New` vs `Updated` correctly. `list_documents` returns one
|
||||||
// row per `(workspace_path, asset_id)` — index by the deterministic
|
// row per `(workspace_path, asset_id)` — index by the deterministic
|
||||||
@@ -230,6 +262,7 @@ pub fn ingest_with_config(
|
|||||||
embedder.as_ref(),
|
embedder.as_ref(),
|
||||||
vector_store.as_ref(),
|
vector_store.as_ref(),
|
||||||
&existing_doc_ids,
|
&existing_doc_ids,
|
||||||
|
&image_pipeline,
|
||||||
);
|
);
|
||||||
|
|
||||||
let item = match item {
|
let item = match item {
|
||||||
@@ -438,6 +471,16 @@ type SqliteStoreAlias = kebab_store_sqlite::SqliteStore;
|
|||||||
/// persist, embed. Per-asset failures bubble up to the caller for
|
/// persist, embed. Per-asset failures bubble up to the caller for
|
||||||
/// labelling as `IngestItemKind::Error` — they do NOT abort the
|
/// labelling as `IngestItemKind::Error` — they do NOT abort the
|
||||||
/// whole run.
|
/// whole run.
|
||||||
|
/// P6-4: borrowed bundle of the three image-pipeline components built
|
||||||
|
/// once per ingest invocation. Threaded through `ingest_one_asset` so
|
||||||
|
/// the dispatch does not need ten separate parameters.
|
||||||
|
struct ImagePipeline<'a> {
|
||||||
|
extractor: &'a ImageExtractor,
|
||||||
|
ocr_engine: Option<&'a OllamaVisionOcr>,
|
||||||
|
caption_llm: Option<&'a dyn LanguageModel>,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[allow(clippy::too_many_arguments)]
|
||||||
fn ingest_one_asset(
|
fn ingest_one_asset(
|
||||||
app: &App,
|
app: &App,
|
||||||
asset: &RawAsset,
|
asset: &RawAsset,
|
||||||
@@ -446,27 +489,47 @@ fn ingest_one_asset(
|
|||||||
embedder: Option<&Arc<dyn Embedder + Send + Sync>>,
|
embedder: Option<&Arc<dyn Embedder + Send + Sync>>,
|
||||||
vector_store: Option<&Arc<kebab_store_vector::LanceVectorStore>>,
|
vector_store: Option<&Arc<kebab_store_vector::LanceVectorStore>>,
|
||||||
existing_doc_ids: &std::collections::HashSet<String>,
|
existing_doc_ids: &std::collections::HashSet<String>,
|
||||||
|
image_pipeline: &ImagePipeline<'_>,
|
||||||
) -> anyhow::Result<kebab_core::IngestItem> {
|
) -> anyhow::Result<kebab_core::IngestItem> {
|
||||||
tracing::debug!(
|
tracing::debug!(
|
||||||
target: "kebab-app::ingest",
|
target: "kebab-app::ingest",
|
||||||
path = %asset.workspace_path.0,
|
path = %asset.workspace_path.0,
|
||||||
|
media_type = ?asset.media_type,
|
||||||
"processing asset"
|
"processing asset"
|
||||||
);
|
);
|
||||||
// Only handle Markdown for now; other media types are P6+ work.
|
// P6-4: dispatch on media_type. Markdown takes the existing
|
||||||
if asset.media_type != kebab_core::MediaType::Markdown {
|
// parse-md / normalize path; image takes the new
|
||||||
return Ok(kebab_core::IngestItem {
|
// ImageExtractor + (optional) OCR + (optional) caption path.
|
||||||
kind: kebab_core::IngestItemKind::Skipped,
|
// Anything else (PDF, audio, unknown) is skipped — the
|
||||||
doc_id: None,
|
// respective phases (P7 / P8) wire them in later.
|
||||||
doc_path: asset.workspace_path.clone(),
|
match &asset.media_type {
|
||||||
asset_id: Some(asset.asset_id.clone()),
|
MediaType::Markdown => { /* fall through to markdown path */ }
|
||||||
byte_len: Some(asset.byte_len),
|
MediaType::Image(_) => {
|
||||||
block_count: None,
|
return ingest_one_image_asset(
|
||||||
chunk_count: None,
|
app,
|
||||||
parser_version: None,
|
asset,
|
||||||
chunker_version: None,
|
chunk_policy,
|
||||||
warnings: Vec::new(),
|
embedder,
|
||||||
error: None,
|
vector_store,
|
||||||
});
|
existing_doc_ids,
|
||||||
|
image_pipeline,
|
||||||
|
);
|
||||||
|
}
|
||||||
|
_ => {
|
||||||
|
return Ok(kebab_core::IngestItem {
|
||||||
|
kind: kebab_core::IngestItemKind::Skipped,
|
||||||
|
doc_id: None,
|
||||||
|
doc_path: asset.workspace_path.clone(),
|
||||||
|
asset_id: Some(asset.asset_id.clone()),
|
||||||
|
byte_len: Some(asset.byte_len),
|
||||||
|
block_count: None,
|
||||||
|
chunk_count: None,
|
||||||
|
parser_version: None,
|
||||||
|
chunker_version: None,
|
||||||
|
warnings: Vec::new(),
|
||||||
|
error: None,
|
||||||
|
});
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
let path = match &asset.source_uri {
|
let path = match &asset.source_uri {
|
||||||
@@ -612,6 +675,228 @@ fn ingest_one_asset(
|
|||||||
})
|
})
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// P6-4: process one `MediaType::Image(_)` asset end-to-end.
|
||||||
|
///
|
||||||
|
/// Pipeline: read bytes → `ImageExtractor::extract` → optional
|
||||||
|
/// `apply_ocr` → optional `apply_caption` → existing chunker / embedder
|
||||||
|
/// / store path (the same one markdown uses, which already handles
|
||||||
|
/// `Block::ImageRef` per P1-5).
|
||||||
|
///
|
||||||
|
/// Failure semantics (per P6-4 spec):
|
||||||
|
/// - `ImageExtractor::extract` Err → propagate (caller increments
|
||||||
|
/// `errors`).
|
||||||
|
/// - OCR / caption Err → log + `Provenance::Warning` event, continue.
|
||||||
|
/// `block.ocr` / `block.caption` stay `None`. `errors` NOT incremented.
|
||||||
|
#[allow(clippy::too_many_arguments)]
|
||||||
|
fn ingest_one_image_asset(
|
||||||
|
app: &App,
|
||||||
|
asset: &RawAsset,
|
||||||
|
chunk_policy: &ChunkPolicy,
|
||||||
|
embedder: Option<&Arc<dyn Embedder + Send + Sync>>,
|
||||||
|
vector_store: Option<&Arc<kebab_store_vector::LanceVectorStore>>,
|
||||||
|
existing_doc_ids: &std::collections::HashSet<String>,
|
||||||
|
image_pipeline: &ImagePipeline<'_>,
|
||||||
|
) -> anyhow::Result<kebab_core::IngestItem> {
|
||||||
|
let image_extractor = image_pipeline.extractor;
|
||||||
|
let ocr_engine = image_pipeline.ocr_engine;
|
||||||
|
let caption_llm = image_pipeline.caption_llm;
|
||||||
|
let path = match &asset.source_uri {
|
||||||
|
SourceUri::File(p) => p.clone(),
|
||||||
|
SourceUri::Kb(_) => {
|
||||||
|
return Ok(kebab_core::IngestItem {
|
||||||
|
kind: kebab_core::IngestItemKind::Skipped,
|
||||||
|
doc_id: None,
|
||||||
|
doc_path: asset.workspace_path.clone(),
|
||||||
|
asset_id: Some(asset.asset_id.clone()),
|
||||||
|
byte_len: Some(asset.byte_len),
|
||||||
|
block_count: None,
|
||||||
|
chunk_count: None,
|
||||||
|
parser_version: None,
|
||||||
|
chunker_version: None,
|
||||||
|
warnings: vec![
|
||||||
|
"kb:// source URIs are not supported by the fs ingester".into(),
|
||||||
|
],
|
||||||
|
error: None,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
};
|
||||||
|
let bytes = std::fs::read(&path)
|
||||||
|
.with_context(|| format!("read image asset bytes from {}", path.display()))?;
|
||||||
|
|
||||||
|
// 1. Decode + EXIF + dimensions. ExtractContext.config carries
|
||||||
|
// nothing the image extractor reads today; we pass a default
|
||||||
|
// instance per the trait shape.
|
||||||
|
let extract_config = kebab_core::ExtractConfig::default();
|
||||||
|
let workspace_root = std::path::PathBuf::from(&app.config.workspace.root);
|
||||||
|
let ctx = ExtractContext {
|
||||||
|
asset,
|
||||||
|
workspace_root: &workspace_root,
|
||||||
|
config: &extract_config,
|
||||||
|
};
|
||||||
|
let mut canonical = image_extractor
|
||||||
|
.extract(&ctx, &bytes)
|
||||||
|
.context("kb-parse-image::ImageExtractor::extract")?;
|
||||||
|
|
||||||
|
// 2 + 3. Apply OCR / caption when their adapters exist. Both are
|
||||||
|
// Lenient — failure is captured into Provenance Warning,
|
||||||
|
// `block.ocr` / `block.caption` stay `None`. P6-4 spec
|
||||||
|
// explicitly: such partial failures do NOT increment the
|
||||||
|
// `errors` counter.
|
||||||
|
let lang_hint = lang_hint_from_doc(&canonical);
|
||||||
|
let mut warning_notes: Vec<String> = Vec::new();
|
||||||
|
if !canonical.blocks.is_empty() {
|
||||||
|
// P6-1 contract: image documents always have exactly one
|
||||||
|
// `Block::ImageRef`. Defensive match keeps us forward-compatible.
|
||||||
|
if let Some(Block::ImageRef(block)) = canonical.blocks.first_mut() {
|
||||||
|
if let Some(engine) = ocr_engine {
|
||||||
|
if let Err(e) = apply_ocr(
|
||||||
|
engine,
|
||||||
|
&bytes,
|
||||||
|
block,
|
||||||
|
lang_hint.as_ref(),
|
||||||
|
&mut canonical.provenance.events,
|
||||||
|
) {
|
||||||
|
let note = format!("ocr_failed: {e:#}");
|
||||||
|
tracing::warn!(
|
||||||
|
target: "kebab-app",
|
||||||
|
path = %asset.workspace_path.0,
|
||||||
|
"{}",
|
||||||
|
note
|
||||||
|
);
|
||||||
|
canonical.provenance.events.push(kebab_core::ProvenanceEvent {
|
||||||
|
at: time::OffsetDateTime::now_utc(),
|
||||||
|
agent: "kb-app".to_string(),
|
||||||
|
kind: kebab_core::ProvenanceKind::Warning,
|
||||||
|
note: Some(note.clone()),
|
||||||
|
});
|
||||||
|
warning_notes.push(note);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if let Some(llm) = caption_llm {
|
||||||
|
if let Err(e) = apply_caption(
|
||||||
|
llm,
|
||||||
|
&bytes,
|
||||||
|
block,
|
||||||
|
lang_hint.as_ref(),
|
||||||
|
&app.config,
|
||||||
|
&mut canonical.provenance.events,
|
||||||
|
) {
|
||||||
|
let note = format!("caption_failed: {e:#}");
|
||||||
|
tracing::warn!(
|
||||||
|
target: "kebab-app",
|
||||||
|
path = %asset.workspace_path.0,
|
||||||
|
"{}",
|
||||||
|
note
|
||||||
|
);
|
||||||
|
canonical.provenance.events.push(kebab_core::ProvenanceEvent {
|
||||||
|
at: time::OffsetDateTime::now_utc(),
|
||||||
|
agent: "kb-app".to_string(),
|
||||||
|
kind: kebab_core::ProvenanceKind::Warning,
|
||||||
|
note: Some(note.clone()),
|
||||||
|
});
|
||||||
|
warning_notes.push(note);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// 4. Chunk via the same `MdHeadingV1Chunker` markdown uses — its
|
||||||
|
// `Block::ImageRef` arm already produces a single chunk per
|
||||||
|
// image (P1-5). The chunk text now follows the (β) plain-concat
|
||||||
|
// contract per the kebab-chunk render_block_text update.
|
||||||
|
let chunks = MdHeadingV1Chunker
|
||||||
|
.chunk(&canonical, chunk_policy)
|
||||||
|
.context("kb-chunk::MdHeadingV1Chunker::chunk (image)")?;
|
||||||
|
|
||||||
|
// 5. Persist + embed — identical sequence to markdown.
|
||||||
|
app.sqlite
|
||||||
|
.put_asset_with_bytes(asset, &bytes)
|
||||||
|
.context("DocumentStore::put_asset_with_bytes (image)")?;
|
||||||
|
app.sqlite
|
||||||
|
.put_document(&canonical)
|
||||||
|
.context("DocumentStore::put_document (image)")?;
|
||||||
|
app.sqlite
|
||||||
|
.put_blocks(&canonical.doc_id, &canonical.blocks)
|
||||||
|
.context("DocumentStore::put_blocks (image)")?;
|
||||||
|
app.sqlite
|
||||||
|
.put_chunks(&canonical.doc_id, &chunks)
|
||||||
|
.context("DocumentStore::put_chunks (image)")?;
|
||||||
|
|
||||||
|
if let (Some(emb), Some(vec_store)) = (embedder, vector_store)
|
||||||
|
&& !chunks.is_empty()
|
||||||
|
{
|
||||||
|
let inputs: Vec<EmbeddingInput<'_>> = chunks
|
||||||
|
.iter()
|
||||||
|
.map(|c| EmbeddingInput {
|
||||||
|
text: c.text.as_str(),
|
||||||
|
kind: EmbeddingKind::Document,
|
||||||
|
})
|
||||||
|
.collect();
|
||||||
|
let vectors = emb
|
||||||
|
.embed(&inputs)
|
||||||
|
.context("Embedder::embed (image chunks)")?;
|
||||||
|
let model_id = emb.model_id();
|
||||||
|
let model_version = emb.model_version();
|
||||||
|
let dimensions = emb.dimensions();
|
||||||
|
let records: Vec<VectorRecord> = chunks
|
||||||
|
.iter()
|
||||||
|
.zip(vectors)
|
||||||
|
.map(|(c, v)| VectorRecord {
|
||||||
|
embedding_id: kebab_core::id_for_embedding(
|
||||||
|
&c.chunk_id,
|
||||||
|
&model_id,
|
||||||
|
&model_version,
|
||||||
|
dimensions,
|
||||||
|
),
|
||||||
|
chunk_id: c.chunk_id.clone(),
|
||||||
|
vector: v,
|
||||||
|
doc_id: canonical.doc_id.clone(),
|
||||||
|
text: c.text.clone(),
|
||||||
|
heading_path: c.heading_path.clone(),
|
||||||
|
model_id: model_id.clone(),
|
||||||
|
model_version: model_version.clone(),
|
||||||
|
dimensions,
|
||||||
|
})
|
||||||
|
.collect();
|
||||||
|
vec_store
|
||||||
|
.upsert(&records)
|
||||||
|
.context("VectorStore::upsert (image)")?;
|
||||||
|
}
|
||||||
|
|
||||||
|
let kind = if existing_doc_ids.contains(&canonical.doc_id.0) {
|
||||||
|
kebab_core::IngestItemKind::Updated
|
||||||
|
} else {
|
||||||
|
kebab_core::IngestItemKind::New
|
||||||
|
};
|
||||||
|
|
||||||
|
Ok(kebab_core::IngestItem {
|
||||||
|
kind,
|
||||||
|
doc_id: Some(canonical.doc_id.clone()),
|
||||||
|
doc_path: asset.workspace_path.clone(),
|
||||||
|
asset_id: Some(asset.asset_id.clone()),
|
||||||
|
byte_len: Some(asset.byte_len),
|
||||||
|
block_count: u32::try_from(canonical.blocks.len()).ok(),
|
||||||
|
chunk_count: u32::try_from(chunks.len()).ok(),
|
||||||
|
parser_version: Some(canonical.parser_version.clone()),
|
||||||
|
chunker_version: Some(MdHeadingV1Chunker.chunker_version()),
|
||||||
|
warnings: warning_notes,
|
||||||
|
error: None,
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Pull the BCP-47 language hint from the canonical document. P6-1
|
||||||
|
/// stamps `Lang("und")` by default; image-pipeline OCR / caption
|
||||||
|
/// adapters special-case "und" so the hint is intentionally dropped
|
||||||
|
/// from prompts.
|
||||||
|
fn lang_hint_from_doc(doc: &CanonicalDocument) -> Option<Lang> {
|
||||||
|
let s = doc.lang.0.as_str();
|
||||||
|
if s.is_empty() || s == "und" {
|
||||||
|
None
|
||||||
|
} else {
|
||||||
|
Some(doc.lang.clone())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
/// Convenience: end byte of the frontmatter region (or 0 when absent).
|
/// Convenience: end byte of the frontmatter region (or 0 when absent).
|
||||||
fn fm_span_end(span: Option<kebab_parse_md::FrontmatterSpan>) -> usize {
|
fn fm_span_end(span: Option<kebab_parse_md::FrontmatterSpan>) -> usize {
|
||||||
span.map(|s| s.end).unwrap_or(0)
|
span.map(|s| s.end).unwrap_or(0)
|
||||||
|
|||||||
366
crates/kebab-app/tests/image_pipeline.rs
Normal file
366
crates/kebab-app/tests/image_pipeline.rs
Normal file
@@ -0,0 +1,366 @@
|
|||||||
|
//! P6-4 image ingest wiring — end-to-end integration.
|
||||||
|
//!
|
||||||
|
//! Each test spins up a `TempDir` workspace + writes one PNG fixture +
|
||||||
|
//! routes OCR / caption HTTP calls through a `wiremock` server that
|
||||||
|
//! impersonates Ollama's `/api/generate` endpoint. The kb-app code
|
||||||
|
//! under test is sync; the wiremock server is async, so test bodies
|
||||||
|
//! drive blocking work via `tokio::task::spawn_blocking`.
|
||||||
|
|
||||||
|
mod common;
|
||||||
|
|
||||||
|
use std::path::Path;
|
||||||
|
|
||||||
|
use common::TestEnv;
|
||||||
|
use kebab_config::Config;
|
||||||
|
use serde_json::json;
|
||||||
|
use tokio::task::spawn_blocking;
|
||||||
|
use wiremock::matchers::{method, path};
|
||||||
|
use wiremock::{Mock, MockServer, ResponseTemplate};
|
||||||
|
|
||||||
|
// ── Fixture helpers ──────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
/// Tiny solid-red PNG written into the test workspace at `<root>/<name>`.
|
||||||
|
/// 100×50 — small enough to skip downscale by default but non-trivially
|
||||||
|
/// inspectable in stored DB rows.
|
||||||
|
fn write_red_png(root: &Path, name: &str) -> std::path::PathBuf {
|
||||||
|
use image::{ImageBuffer, Rgb};
|
||||||
|
let img: ImageBuffer<Rgb<u8>, _> =
|
||||||
|
ImageBuffer::from_fn(100, 50, |_, _| Rgb([255, 0, 0]));
|
||||||
|
let path = root.join(name);
|
||||||
|
img.save(&path).expect("write PNG fixture");
|
||||||
|
path
|
||||||
|
}
|
||||||
|
|
||||||
|
fn cfg_with_image_pipeline(env: &TestEnv, mock_endpoint: &str) -> Config {
|
||||||
|
let mut cfg = env.config.clone();
|
||||||
|
// Ensure image assets are scanned.
|
||||||
|
cfg.workspace
|
||||||
|
.include
|
||||||
|
.push("**/*.png".to_string());
|
||||||
|
cfg.image.ocr.enabled = true;
|
||||||
|
cfg.image.ocr.endpoint = Some(mock_endpoint.to_string());
|
||||||
|
cfg.image.ocr.model = "vision-mock:1b".to_string();
|
||||||
|
cfg.image.ocr.max_pixels = 512;
|
||||||
|
cfg.image.caption.enabled = false; // tested separately below
|
||||||
|
cfg.models.llm.endpoint = mock_endpoint.to_string();
|
||||||
|
cfg.models.llm.model = "vision-mock:1b".to_string();
|
||||||
|
cfg
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── 1. Happy path: OCR-only ingest ───────────────────────────────────────
|
||||||
|
|
||||||
|
/// One PNG asset + OCR enabled (caption off) → ingest produces 1 doc + 1
|
||||||
|
/// chunk; chunk text contains alt + OCR transcription joined by `\n\n`.
|
||||||
|
#[tokio::test]
|
||||||
|
async fn ingest_image_with_ocr_produces_chunk_containing_ocr_text() {
|
||||||
|
let server = MockServer::start().await;
|
||||||
|
Mock::given(method("POST"))
|
||||||
|
.and(path("/api/generate"))
|
||||||
|
.respond_with(ResponseTemplate::new(200).set_body_json(json!({
|
||||||
|
"model": "vision-mock:1b",
|
||||||
|
"response": "Hello World 2026",
|
||||||
|
"done": true,
|
||||||
|
"done_reason": "stop"
|
||||||
|
})))
|
||||||
|
.mount(&server)
|
||||||
|
.await;
|
||||||
|
|
||||||
|
let env = TestEnv::lexical_only();
|
||||||
|
let png = write_red_png(&env.workspace_root, "diagram.png");
|
||||||
|
eprintln!("PNG written to {}", png.display());
|
||||||
|
let cfg = cfg_with_image_pipeline(&env, &server.uri());
|
||||||
|
let cfg_clone = cfg.clone();
|
||||||
|
let env_workspace = env.workspace_root.clone();
|
||||||
|
let env_scope = env.scope();
|
||||||
|
|
||||||
|
let report = spawn_blocking(move || {
|
||||||
|
kebab_app::ingest_with_config(cfg_clone, env_scope, false)
|
||||||
|
.expect("image ingest must succeed")
|
||||||
|
})
|
||||||
|
.await
|
||||||
|
.expect("blocking task panicked");
|
||||||
|
|
||||||
|
// Counters: scanned should include the PNG; new ≥ 1 (markdown
|
||||||
|
// fixtures from the workspace tree may also count).
|
||||||
|
assert!(report.scanned >= 1, "scanned={}, items={:?}", report.scanned, report.items);
|
||||||
|
assert_eq!(report.errors, 0, "no errors on lenient OCR path");
|
||||||
|
|
||||||
|
// Locate the image doc in the report items.
|
||||||
|
let items = report.items.expect("items present (summary_only=false)");
|
||||||
|
let img_item = items
|
||||||
|
.iter()
|
||||||
|
.find(|i| i.doc_path.0.ends_with("diagram.png"))
|
||||||
|
.expect("image doc item must be present");
|
||||||
|
assert_eq!(
|
||||||
|
img_item.kind,
|
||||||
|
kebab_core::IngestItemKind::New,
|
||||||
|
"image asset must be classified New on first ingest"
|
||||||
|
);
|
||||||
|
assert_eq!(img_item.chunk_count, Some(1), "image emits exactly one chunk");
|
||||||
|
|
||||||
|
// Inspect the stored chunk text via kb-app's inspect_chunk facade.
|
||||||
|
let doc_id = img_item.doc_id.clone().expect("image doc id");
|
||||||
|
let doc = kebab_app::inspect_doc_with_config(cfg.clone(), &doc_id)
|
||||||
|
.expect("inspect_doc returns the image document");
|
||||||
|
let block = match doc.blocks.first() {
|
||||||
|
Some(kebab_core::Block::ImageRef(b)) => b,
|
||||||
|
other => panic!("expected ImageRef, got {other:?}"),
|
||||||
|
};
|
||||||
|
assert!(block.ocr.is_some(), "block.ocr populated by apply_ocr");
|
||||||
|
assert_eq!(
|
||||||
|
block.ocr.as_ref().unwrap().joined,
|
||||||
|
"Hello World 2026",
|
||||||
|
"OCR text from mock"
|
||||||
|
);
|
||||||
|
assert!(
|
||||||
|
block.caption.is_none(),
|
||||||
|
"caption disabled in cfg → block.caption stays None"
|
||||||
|
);
|
||||||
|
|
||||||
|
// Sanity: the doc was actually persisted into SQLite (kb-app's
|
||||||
|
// list_docs facade reads the same store the chunker writes to).
|
||||||
|
let summaries = kebab_app::list_docs_with_config(cfg, kebab_core::DocFilter::default())
|
||||||
|
.expect("list_docs");
|
||||||
|
assert!(
|
||||||
|
summaries.iter().any(|s| s.doc_path.0.ends_with("diagram.png")),
|
||||||
|
"image doc must appear in list_docs"
|
||||||
|
);
|
||||||
|
|
||||||
|
drop(env_workspace); // keep TempDir alive until here
|
||||||
|
drop(env);
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── 2. OCR + caption together ────────────────────────────────────────────
|
||||||
|
|
||||||
|
/// Both OCR and caption enabled. The mock returns the same JSON body
|
||||||
|
/// for every `/api/generate` POST — wiremock has no per-prompt routing
|
||||||
|
/// on the default `Mock` so we treat both calls as equivalent. We then
|
||||||
|
/// verify both `block.ocr` and `block.caption` are populated, and the
|
||||||
|
/// chunk text contains both fragments separated by `\n\n`.
|
||||||
|
#[tokio::test]
|
||||||
|
async fn ingest_image_with_ocr_and_caption_populates_both_fields() {
|
||||||
|
let server = MockServer::start().await;
|
||||||
|
Mock::given(method("POST"))
|
||||||
|
.and(path("/api/generate"))
|
||||||
|
.respond_with(ResponseTemplate::new(200).set_body_json(json!({
|
||||||
|
"response": "shared mock body",
|
||||||
|
"done": true,
|
||||||
|
"done_reason": "stop"
|
||||||
|
})))
|
||||||
|
.mount(&server)
|
||||||
|
.await;
|
||||||
|
|
||||||
|
let env = TestEnv::lexical_only();
|
||||||
|
write_red_png(&env.workspace_root, "diagram.png");
|
||||||
|
let mut cfg = cfg_with_image_pipeline(&env, &server.uri());
|
||||||
|
cfg.image.caption.enabled = true;
|
||||||
|
cfg.image.caption.max_pixels = 384;
|
||||||
|
|
||||||
|
let cfg_clone = cfg.clone();
|
||||||
|
let scope = env.scope();
|
||||||
|
let report = spawn_blocking(move || {
|
||||||
|
kebab_app::ingest_with_config(cfg_clone, scope, false)
|
||||||
|
.expect("ingest must succeed with both OCR+caption")
|
||||||
|
})
|
||||||
|
.await
|
||||||
|
.expect("task");
|
||||||
|
|
||||||
|
assert_eq!(report.errors, 0);
|
||||||
|
let img_item = report
|
||||||
|
.items
|
||||||
|
.as_ref()
|
||||||
|
.unwrap()
|
||||||
|
.iter()
|
||||||
|
.find(|i| i.doc_path.0.ends_with("diagram.png"))
|
||||||
|
.unwrap();
|
||||||
|
let doc = kebab_app::inspect_doc_with_config(cfg, img_item.doc_id.as_ref().unwrap())
|
||||||
|
.unwrap();
|
||||||
|
let block = match &doc.blocks[0] {
|
||||||
|
kebab_core::Block::ImageRef(b) => b,
|
||||||
|
_ => unreachable!(),
|
||||||
|
};
|
||||||
|
assert!(block.ocr.is_some(), "OCR populated");
|
||||||
|
assert!(block.caption.is_some(), "caption populated");
|
||||||
|
drop(env);
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── 3. Lenient failure: OCR Ollama 503 → asset still indexed ─────────────
|
||||||
|
|
||||||
|
/// OCR endpoint returns 503. Spec contract: image is still indexed,
|
||||||
|
/// `block.ocr = None`, Provenance has a Warning event, `errors`
|
||||||
|
/// counter NOT incremented.
|
||||||
|
#[tokio::test]
|
||||||
|
async fn ocr_failure_indexes_asset_with_warning_no_error_counter() {
|
||||||
|
let server = MockServer::start().await;
|
||||||
|
Mock::given(method("POST"))
|
||||||
|
.and(path("/api/generate"))
|
||||||
|
.respond_with(ResponseTemplate::new(503))
|
||||||
|
.mount(&server)
|
||||||
|
.await;
|
||||||
|
|
||||||
|
let env = TestEnv::lexical_only();
|
||||||
|
write_red_png(&env.workspace_root, "broken.png");
|
||||||
|
let cfg = cfg_with_image_pipeline(&env, &server.uri());
|
||||||
|
|
||||||
|
let cfg_clone = cfg.clone();
|
||||||
|
let scope = env.scope();
|
||||||
|
let report = spawn_blocking(move || {
|
||||||
|
kebab_app::ingest_with_config(cfg_clone, scope, false)
|
||||||
|
.expect("ingest does not abort on lenient OCR failure")
|
||||||
|
})
|
||||||
|
.await
|
||||||
|
.expect("task");
|
||||||
|
|
||||||
|
assert_eq!(
|
||||||
|
report.errors, 0,
|
||||||
|
"lenient OCR failure must NOT increment errors counter (spec)"
|
||||||
|
);
|
||||||
|
let img_item = report
|
||||||
|
.items
|
||||||
|
.as_ref()
|
||||||
|
.unwrap()
|
||||||
|
.iter()
|
||||||
|
.find(|i| i.doc_path.0.ends_with("broken.png"))
|
||||||
|
.expect("asset still indexed despite OCR failure");
|
||||||
|
assert_eq!(img_item.kind, kebab_core::IngestItemKind::New);
|
||||||
|
assert_eq!(img_item.chunk_count, Some(1));
|
||||||
|
assert!(
|
||||||
|
!img_item.warnings.is_empty(),
|
||||||
|
"lenient OCR failure must surface a warning on the IngestItem"
|
||||||
|
);
|
||||||
|
|
||||||
|
let doc_id = img_item.doc_id.clone().unwrap();
|
||||||
|
let doc = kebab_app::inspect_doc_with_config(cfg, &doc_id).unwrap();
|
||||||
|
let block = match &doc.blocks[0] {
|
||||||
|
kebab_core::Block::ImageRef(b) => b,
|
||||||
|
_ => unreachable!(),
|
||||||
|
};
|
||||||
|
assert!(block.ocr.is_none(), "block.ocr stays None on OCR failure");
|
||||||
|
let warning = doc
|
||||||
|
.provenance
|
||||||
|
.events
|
||||||
|
.iter()
|
||||||
|
.find(|e| e.kind == kebab_core::ProvenanceKind::Warning && e.agent == "kb-app")
|
||||||
|
.expect("Provenance Warning attributed to kb-app");
|
||||||
|
let note = warning.note.as_deref().unwrap_or("");
|
||||||
|
assert!(
|
||||||
|
note.contains("ocr_failed"),
|
||||||
|
"warning note must describe OCR failure: {note}"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── 4. Both image.ocr.enabled and image.caption.enabled = false ──────────
|
||||||
|
|
||||||
|
/// When both adapters are disabled, the image is still extracted +
|
||||||
|
/// chunked. Chunk text falls back to the filename. EXIF + dimensions
|
||||||
|
/// are populated by the extractor regardless.
|
||||||
|
#[tokio::test]
|
||||||
|
async fn image_indexed_with_filename_when_ocr_and_caption_disabled() {
|
||||||
|
// No mock server needed — neither HTTP path is touched.
|
||||||
|
let env = TestEnv::lexical_only();
|
||||||
|
write_red_png(&env.workspace_root, "raw.png");
|
||||||
|
let mut cfg = env.config.clone();
|
||||||
|
cfg.workspace.include.push("**/*.png".to_string());
|
||||||
|
cfg.image.ocr.enabled = false;
|
||||||
|
cfg.image.caption.enabled = false;
|
||||||
|
|
||||||
|
let cfg_clone = cfg.clone();
|
||||||
|
let scope = env.scope();
|
||||||
|
let report = spawn_blocking(move || {
|
||||||
|
kebab_app::ingest_with_config(cfg_clone, scope, false)
|
||||||
|
.expect("ingest with no OCR/caption")
|
||||||
|
})
|
||||||
|
.await
|
||||||
|
.expect("task");
|
||||||
|
|
||||||
|
assert_eq!(report.errors, 0);
|
||||||
|
let img_item = report
|
||||||
|
.items
|
||||||
|
.as_ref()
|
||||||
|
.unwrap()
|
||||||
|
.iter()
|
||||||
|
.find(|i| i.doc_path.0.ends_with("raw.png"))
|
||||||
|
.unwrap();
|
||||||
|
assert_eq!(img_item.chunk_count, Some(1), "image emits one chunk");
|
||||||
|
let doc = kebab_app::inspect_doc_with_config(cfg, img_item.doc_id.as_ref().unwrap())
|
||||||
|
.unwrap();
|
||||||
|
let block = match &doc.blocks[0] {
|
||||||
|
kebab_core::Block::ImageRef(b) => b,
|
||||||
|
_ => unreachable!(),
|
||||||
|
};
|
||||||
|
assert!(block.ocr.is_none() && block.caption.is_none());
|
||||||
|
// EXIF + dimensions still populated by the extractor.
|
||||||
|
let dims = doc
|
||||||
|
.metadata
|
||||||
|
.user
|
||||||
|
.get("dimensions")
|
||||||
|
.and_then(|v: &serde_json::Value| v.as_object())
|
||||||
|
.expect("dimensions object present");
|
||||||
|
assert_eq!(
|
||||||
|
dims.get("w").and_then(|v: &serde_json::Value| v.as_u64()),
|
||||||
|
Some(100)
|
||||||
|
);
|
||||||
|
assert_eq!(
|
||||||
|
dims.get("h").and_then(|v: &serde_json::Value| v.as_u64()),
|
||||||
|
Some(50)
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── 5. Determinism: re-ingest produces identical doc_id / chunk_id ───────
|
||||||
|
|
||||||
|
/// Idempotency contract — running the same ingest twice should mark
|
||||||
|
/// the asset Updated on the second run with byte-identical IDs.
|
||||||
|
#[tokio::test]
|
||||||
|
async fn re_ingest_image_produces_updated_with_same_doc_id() {
|
||||||
|
let server = MockServer::start().await;
|
||||||
|
Mock::given(method("POST"))
|
||||||
|
.and(path("/api/generate"))
|
||||||
|
.respond_with(ResponseTemplate::new(200).set_body_json(json!({
|
||||||
|
"response": "stable",
|
||||||
|
"done": true,
|
||||||
|
"done_reason": "stop"
|
||||||
|
})))
|
||||||
|
.mount(&server)
|
||||||
|
.await;
|
||||||
|
|
||||||
|
let env = TestEnv::lexical_only();
|
||||||
|
write_red_png(&env.workspace_root, "diagram.png");
|
||||||
|
let cfg = cfg_with_image_pipeline(&env, &server.uri());
|
||||||
|
|
||||||
|
let scope = env.scope();
|
||||||
|
let cfg1 = cfg.clone();
|
||||||
|
let cfg2 = cfg.clone();
|
||||||
|
let scope1 = scope.clone();
|
||||||
|
let scope2 = scope.clone();
|
||||||
|
|
||||||
|
let r1 = spawn_blocking(move || {
|
||||||
|
kebab_app::ingest_with_config(cfg1, scope1, false).unwrap()
|
||||||
|
})
|
||||||
|
.await
|
||||||
|
.unwrap();
|
||||||
|
let r2 = spawn_blocking(move || {
|
||||||
|
kebab_app::ingest_with_config(cfg2, scope2, false).unwrap()
|
||||||
|
})
|
||||||
|
.await
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
let id1 = r1
|
||||||
|
.items
|
||||||
|
.as_ref()
|
||||||
|
.unwrap()
|
||||||
|
.iter()
|
||||||
|
.find(|i| i.doc_path.0.ends_with("diagram.png"))
|
||||||
|
.unwrap()
|
||||||
|
.doc_id
|
||||||
|
.clone()
|
||||||
|
.unwrap();
|
||||||
|
let img2 = r2
|
||||||
|
.items
|
||||||
|
.as_ref()
|
||||||
|
.unwrap()
|
||||||
|
.iter()
|
||||||
|
.find(|i| i.doc_path.0.ends_with("diagram.png"))
|
||||||
|
.unwrap();
|
||||||
|
assert_eq!(img2.kind, kebab_core::IngestItemKind::Updated);
|
||||||
|
assert_eq!(img2.doc_id.as_ref().unwrap(), &id1);
|
||||||
|
}
|
||||||
@@ -381,17 +381,41 @@ fn render_block_text(b: &Block) -> String {
|
|||||||
}
|
}
|
||||||
s
|
s
|
||||||
}
|
}
|
||||||
// ImageRef text portion = alt (per task spec). Fall back to
|
// ImageRef text portion follows the P6-4 (β) plain-concat
|
||||||
// model caption text if alt is empty.
|
// contract — `[alt, ocr.joined, caption.text]` joined by
|
||||||
|
// `\n\n`, dropping empty parts. Filename fallback for empty
|
||||||
|
// alt keeps lexical search hits on filenames working even when
|
||||||
|
// P6-1's filename auto-fill is bypassed.
|
||||||
Block::ImageRef(i) => {
|
Block::ImageRef(i) => {
|
||||||
if !i.alt.is_empty() {
|
let alt = if !i.alt.is_empty() {
|
||||||
i.alt.clone()
|
i.alt.clone()
|
||||||
} else {
|
} else {
|
||||||
i.caption
|
// P6-1 falls back to filename so this branch is
|
||||||
.as_ref()
|
// defensive — keep it lest a future test fixture or
|
||||||
.map(|c| c.text.clone())
|
// synthetic block path skip the auto-fill.
|
||||||
.unwrap_or_default()
|
i.src
|
||||||
}
|
.rsplit('/')
|
||||||
|
.next()
|
||||||
|
.filter(|s| !s.is_empty())
|
||||||
|
.unwrap_or("[image]")
|
||||||
|
.to_string()
|
||||||
|
};
|
||||||
|
let ocr = i
|
||||||
|
.ocr
|
||||||
|
.as_ref()
|
||||||
|
.map(|o| o.joined.as_str())
|
||||||
|
.unwrap_or("");
|
||||||
|
let cap = i
|
||||||
|
.caption
|
||||||
|
.as_ref()
|
||||||
|
.map(|c| c.text.as_str())
|
||||||
|
.unwrap_or("");
|
||||||
|
[alt.as_str(), ocr, cap]
|
||||||
|
.iter()
|
||||||
|
.filter(|s| !s.is_empty())
|
||||||
|
.copied()
|
||||||
|
.collect::<Vec<_>>()
|
||||||
|
.join("\n\n")
|
||||||
}
|
}
|
||||||
// AudioRef has no caption preview yet (transcript joins land
|
// AudioRef has no caption preview yet (transcript joins land
|
||||||
// in P8). Empty string per task spec.
|
// in P8). Empty string per task spec.
|
||||||
@@ -700,6 +724,63 @@ mod tests {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// P6-4 (β) plain concatenation — alt + ocr.joined + caption.text
|
||||||
|
/// joined by `\n\n`, dropping empty parts. Verifies all four
|
||||||
|
/// (alt-only, alt+ocr, alt+caption, alt+ocr+caption) shapes.
|
||||||
|
#[test]
|
||||||
|
fn image_ref_p6_4_plain_concat_drops_empty_parts() {
|
||||||
|
use kebab_core::{ModelCaption, OcrText};
|
||||||
|
|
||||||
|
let mk = |alt: &str, ocr: Option<&str>, cap: Option<&str>| {
|
||||||
|
Block::ImageRef(ImageRefBlock {
|
||||||
|
common: common_for("imageref", &[], 0, span(1, 1)),
|
||||||
|
asset_id: None,
|
||||||
|
src: "img.png".into(),
|
||||||
|
alt: alt.into(),
|
||||||
|
ocr: ocr.map(|t| OcrText {
|
||||||
|
joined: t.into(),
|
||||||
|
regions: vec![],
|
||||||
|
engine: "test".into(),
|
||||||
|
engine_version: "v1".into(),
|
||||||
|
}),
|
||||||
|
caption: cap.map(|t| ModelCaption {
|
||||||
|
text: t.into(),
|
||||||
|
model: "m".into(),
|
||||||
|
model_version: "v".into(),
|
||||||
|
}),
|
||||||
|
})
|
||||||
|
};
|
||||||
|
|
||||||
|
// alt-only — no separators between empty parts.
|
||||||
|
assert_eq!(render_block_text(&mk("photo.png", None, None)), "photo.png");
|
||||||
|
|
||||||
|
// alt + ocr — joined by exactly one `\n\n`.
|
||||||
|
assert_eq!(
|
||||||
|
render_block_text(&mk("photo.png", Some("Hello"), None)),
|
||||||
|
"photo.png\n\nHello"
|
||||||
|
);
|
||||||
|
|
||||||
|
// alt + caption.
|
||||||
|
assert_eq!(
|
||||||
|
render_block_text(&mk("photo.png", None, Some("a red square"))),
|
||||||
|
"photo.png\n\na red square"
|
||||||
|
);
|
||||||
|
|
||||||
|
// alt + ocr + caption — three parts joined by `\n\n` each.
|
||||||
|
assert_eq!(
|
||||||
|
render_block_text(&mk("photo.png", Some("Hello"), Some("a red square"))),
|
||||||
|
"photo.png\n\nHello\n\na red square"
|
||||||
|
);
|
||||||
|
|
||||||
|
// empty alt — falls back to filename derived from `src`.
|
||||||
|
let blk = mk("", Some("text from image"), None);
|
||||||
|
assert_eq!(
|
||||||
|
render_block_text(&blk),
|
||||||
|
"img.png\n\ntext from image",
|
||||||
|
"empty alt must fall back to the basename of `src`"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
/// ImageRef → own chunk, token_estimate=0.
|
/// ImageRef → own chunk, token_estimate=0.
|
||||||
#[test]
|
#[test]
|
||||||
fn image_ref_emits_own_chunk_zero_tokens() {
|
fn image_ref_emits_own_chunk_zero_tokens() {
|
||||||
|
|||||||
@@ -118,16 +118,41 @@ max_context_tokens = 6000
|
|||||||
KEBAB() { ./target/debug/kebab --config /tmp/kebab-smoke/config.toml "$@"; }
|
KEBAB() { ./target/debug/kebab --config /tmp/kebab-smoke/config.toml "$@"; }
|
||||||
|
|
||||||
KB doctor # 1. health check
|
KB doctor # 1. health check
|
||||||
KB ingest # 2. 워크스페이스 색인
|
KB ingest # 2. 워크스페이스 색인 (markdown + image)
|
||||||
KB list docs # 3. 색인 결과 목록
|
KB list docs # 3. 색인 결과 목록 (markdown + image 모두 표시)
|
||||||
KB search --mode lexical "코루틴" --k 3 # 4. lexical 검색
|
KB search --mode lexical "코루틴" --k 3 # 4. lexical 검색
|
||||||
KB search --mode vector "memory safety" --k 3 # 5. vector 검색
|
KB search --mode vector "memory safety" --k 3 # 5. vector 검색
|
||||||
KB search --mode hybrid "Cargo workspace" --k 3 # 6. hybrid 검색
|
KB search --mode hybrid "Cargo workspace" --k 3 # 6. hybrid 검색
|
||||||
KB inspect chunk <chunk_id> # 7. raw chunk 보기
|
KB search --mode lexical "Hello World" --k 3 # 7. image OCR 텍스트 검색 (P6-4)
|
||||||
KB ask "이 KB 안에서 ..." --mode hybrid --k 5 # 8. RAG 답변 (Ollama 필요)
|
KB inspect chunk <chunk_id> # 8. raw chunk 보기
|
||||||
KB --json ask "..." --mode hybrid # 9. 기계 친화 출력 검증
|
KB ask "이 KB 안에서 ..." --mode hybrid --k 5 # 9. RAG 답변 (Ollama 필요)
|
||||||
|
KB --json ask "..." --mode hybrid # 10. 기계 친화 출력 검증
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## P6-4 이미지 ingestion 옵션
|
||||||
|
|
||||||
|
`config.toml` 에 다음 절을 추가하면 `kebab ingest` 가 `**/*.png` / `**/*.jpg` 등 이미지 자산도 함께 색인합니다 (텍스트만 색인하려면 생략):
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[workspace]
|
||||||
|
include = ["**/*.md", "**/*.png", "**/*.jpg"]
|
||||||
|
|
||||||
|
[image.ocr]
|
||||||
|
enabled = true # vision LM 으로 이미지 안 텍스트 전사
|
||||||
|
engine = "ollama-vision"
|
||||||
|
model = "gemma4:e4b" # 사용자 환경의 비전 모델
|
||||||
|
endpoint = "http://192.168.0.47:11434" # 비우면 models.llm.endpoint fallback
|
||||||
|
languages = ["eng", "kor"]
|
||||||
|
max_pixels = 1600 # long-edge cap
|
||||||
|
|
||||||
|
[image.caption]
|
||||||
|
enabled = true # vision LM 으로 한 문장 객관 설명 생성
|
||||||
|
max_pixels = 768
|
||||||
|
prompt_template_version = "caption-v1"
|
||||||
|
```
|
||||||
|
|
||||||
|
이미지 자산 한 장당 OCR 1 호출 + Caption 1 호출 → ~3-6초 (`gemma4:e4b` 기준). 다이어그램 / 카메라 사진 / 스크린샷 위주 워크스페이스에 권장. 책 / 스캔본은 P7 PDF 라인으로 (P7 머지 후).
|
||||||
|
|
||||||
각 명령은 0 종료 코드면 정상. `kebab ask` 는 거절 시 종료 코드 1 (`RefusalSignal`) — 의도된 동작.
|
각 명령은 0 종료 코드면 정상. `kebab ask` 는 거절 시 종료 코드 1 (`RefusalSignal`) — 의도된 동작.
|
||||||
|
|
||||||
## 검증 체크리스트
|
## 검증 체크리스트
|
||||||
@@ -138,6 +163,8 @@ KB --json ask "..." --mode hybrid # 9. 기계 친화 출력 검
|
|||||||
- `kebab search --mode hybrid` 의 `fusion_score` 가 `[0, 1]` 범위 (top-1 종종 1.0 — 두 retriever 모두 rank 1 일 때).
|
- `kebab search --mode hybrid` 의 `fusion_score` 가 `[0, 1]` 범위 (top-1 종종 1.0 — 두 retriever 모두 rank 1 일 때).
|
||||||
- `kebab ask` JSON 응답에 `model.id` 가 config 의 모델 (`gemma4:26b` 등) 과 일치, `embedding.id = multilingual-e5-small`, `citations[].marker` 가 `[1]` / `[2]` 형식 (square-bracketed bare index).
|
- `kebab ask` JSON 응답에 `model.id` 가 config 의 모델 (`gemma4:26b` 등) 과 일치, `embedding.id = multilingual-e5-small`, `citations[].marker` 가 `[1]` / `[2]` 형식 (square-bracketed bare index).
|
||||||
- 코퍼스에 없는 주제로 `kebab ask` → `refusal_reason: "llm_self_judge"` (또는 `no_chunks` / `score_gate`) + `grounded: false`.
|
- 코퍼스에 없는 주제로 `kebab ask` → `refusal_reason: "llm_self_judge"` (또는 `no_chunks` / `score_gate`) + `grounded: false`.
|
||||||
|
- (P6-4) `image.ocr.enabled = true` 로 PNG 자산을 ingest 하면 `kebab list docs` 가 markdown 옆에 image doc 도 출력 (`workspace_path` 가 `*.png`). `kebab inspect doc <image_doc_id>` 의 `block.ocr.joined` 가 vision LM 의 OCR 결과 (예: 스크린샷 안의 텍스트). `kebab search --mode lexical "<OCR text>"` 가 그 image chunk 를 반환하면 wiring 정상.
|
||||||
|
- OCR / caption 부분 실패는 `errors` 카운터 미증가 — `kebab inspect doc <id>` 의 Provenance Warning 이벤트 또는 `--debug` 로그에서만 확인.
|
||||||
|
|
||||||
## 정리
|
## 정리
|
||||||
|
|
||||||
@@ -154,5 +181,6 @@ rm -rf /tmp/kebab-smoke # 통째로 정리
|
|||||||
- `kebab ask` 응답 시간 = LLM 토큰 throughput 에 종속. M4 Pro 48GB + gemma4:26b 기준 답변 50–100 토큰에 20–55초.
|
- `kebab ask` 응답 시간 = LLM 토큰 throughput 에 종속. M4 Pro 48GB + gemma4:26b 기준 답변 50–100 토큰에 20–55초.
|
||||||
- `--config` path 가 존재하지 않거나 malformed 면 `kebab doctor` 가 hard fail (defaults 가 silently mask 하지 않게 하는 hotfix 동작).
|
- `--config` path 가 존재하지 않거나 malformed 면 `kebab doctor` 가 hard fail (defaults 가 silently mask 하지 않게 하는 hotfix 동작).
|
||||||
- 매 CLI invocation 마다 fastembed 모델 init 비용 (~4초) — process-level 캐시 부재 때문. P9 TUI 진입 시 `App` 의 `OnceLock` 으로 세션 동안 한 번만 init.
|
- 매 CLI invocation 마다 fastembed 모델 init 비용 (~4초) — process-level 캐시 부재 때문. P9 TUI 진입 시 `App` 의 `OnceLock` 으로 세션 동안 한 번만 init.
|
||||||
|
- (P6-4) `image.ocr.enabled = true` + `image.caption.enabled = true` 인 워크스페이스에 PNG 가 N장 있으면 ingest 시간 ≈ markdown_time + N × (OCR + Caption latency). `gemma4:e4b` + 192.168.0.47 로 자산당 ~5-10초. 다수의 책 페이지를 이미지로 넣지 말 것 — 책은 P7 PDF 라인 사용 권장 (P7 머지 후).
|
||||||
|
|
||||||
자세한 history 와 발견된 버그는 [tasks/HOTFIXES.md](../tasks/HOTFIXES.md) 참조.
|
자세한 history 와 발견된 버그는 [tasks/HOTFIXES.md](../tasks/HOTFIXES.md) 참조.
|
||||||
|
|||||||
@@ -3,7 +3,7 @@ phase: P6
|
|||||||
component: kebab-app (image ingest dispatch + chunking)
|
component: kebab-app (image ingest dispatch + chunking)
|
||||||
task_id: p6-4
|
task_id: p6-4
|
||||||
title: "Wire ImageExtractor + OCR + caption into kebab-app::ingest end-to-end"
|
title: "Wire ImageExtractor + OCR + caption into kebab-app::ingest end-to-end"
|
||||||
status: planned
|
status: completed
|
||||||
depends_on: [p6-1, p6-2, p6-3, p1-6, p3-5]
|
depends_on: [p6-1, p6-2, p6-3, p1-6, p3-5]
|
||||||
unblocks: []
|
unblocks: []
|
||||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||||
|
|||||||
Reference in New Issue
Block a user