kebab/docs/superpowers/handoffs/2026-05-26-v0.20-image-pdf-normalize-handoff.md
altair823 574e1b1ca1 docs: v0.20 image+pdf handoff + sub-item 3 spec/plan backfill
Handoff document for the next session after the v0.19.0 release, plus a retroactive backfill.

- docs/superpowers/handoffs/2026-05-26-v0.20-image-pdf-normalize-handoff.md (540 lines, 9 sections)
  - sub-item 1/2/3 merge results + dogfooding baseline (1781 docs / 9050 chunks) + user memory + OMC workflow + build environment
  - current implementation state (v0.19.0, image + pdf): exact file:line + struct/fn signatures + flow
  - details of the 8 TODOs (problem + scope + affected files + risk + trigger condition)
  - priorities + recommended sequencing + suggested first steps for the new session

- docs/superpowers/specs/2026-05-26-extractor-dispatch-unification-spec.md (sub-item 3 spec)
- docs/superpowers/plans/2026-05-26-extractor-dispatch-unification-plan.md (sub-item 3 plan)

When PR #187 was merged, only the source code went in and the spec/plan were left out, so the reference links in that PR return 404 on main. This commit backfills them.

Assisted-by: Claude Code
2026-05-26 23:34:17 +00:00


title: "v0.20.0: Image + PDF normalize integration handoff"
created: 2026-05-26
status: ready-for-spec
target_version: 0.20.0
related_specs:
  - docs/superpowers/specs/2026-05-26-normalize-absorption-spec.md (sub-item 2, §11 future-work)
  - docs/superpowers/specs/2026-05-26-extractor-dispatch-unification-spec.md (sub-item 3, §11 future-work)
  - docs/superpowers/specs/2026-04-27-kebab-final-form-design.md (frozen design, after the §3.7b rewrite)

v0.20.0 handoff: Image + PDF normalize integration

This document is a self-contained context for handing the v0.20.0 work over to a new Claude session. It conveys the state as of the point where PR #185 / #186 / #187 were merged and dogfooding was complete.


1. Context summary (what the new session needs to know)

1.1 The three merged v0.19.0 sub-item PRs

| PR | sub-item | scope | impact |
| --- | --- | --- | --- |
| #185 | 1 | Removed the kebab-source-fs → kebab-parse-code dep, cleaning up the drag of 9 tree-sitter grammars. Moved 4 leaf helpers to kebab-source-fs::code_meta. | dep graph cleanup. No change to the user-visible surface. |
| #186 | 2 | Absorbed kebab-normalize + kebab-parse-types into kebab-parse-md. 24 → 22 crates. Rewrote design §3.7b as 4 paragraphs. Bumped workspace.version 0.18.0 → 0.19.0. | Changed the frozen design contract (added the "Future work / deferred" section to tasks/INDEX.md). |
| #187 | 3 | Extractor dispatch polymorphism: App.extractors: Vec<Box<dyn Extractor + Send + Sync>> registry + App::extract_for(...) helper. 11 hardcoded callsites + 9 AST arms → 1 callsite + 4 arms. | Base for sub-item 4. The registry is the starting point for all dispatch-unification work in v0.20.0. |

main HEAD = c1e82cc (after the PR #187 merge). New branches are based on main.

1.2 Dogfooding results (v0.19.0)

  • Combined KB: 1781 docs / 9050 chunks (1770 existing + 11 new fixtures). This KB is the v0.20.0 dogfooding baseline.
  • Location: /home/altair823/KnowledgeBase/ (user workspace root) + the new fixture _dogfood-v0.19.0/
  • markdown 1004 + code in 7 languages (rust 13 + python 13 + go 10 + java 10 + kotlin 8 + ts 6 + js 5)
  • image: 770 files (existing KB), with OCR + caption disabled (config default)
  • pdf: 0 files
  • workspace.version 0.19.0 cascade applied

1.3 User memory (must follow)

Check ~/.claude/projects/-home-altair823-kebab/memory/MEMORY.md:

  • PR workflow: gitea-pr + review-loop mode by default (do not ask about single-shot mode)
  • Phase priorities: P8 audio on hold, P9 UI first (mainly books + PDF)
  • LLM default: gemma4 family (gemma4:e4b local + gemma4:26b remote)
  • Docs split: update the three siblings README / HANDOFF / ARCHITECTURE together (implementation PRs)
  • Ranking bias: automatic heuristics deferred (re-brainstorm after 1+ week of real use)
  • Skip user review gates: skip the brainstorming / writing-plans confirm steps. Use AskUserQuestion only for key trade-offs.
  • Serial cargo, -j 4 default: never run cargo concurrently in the background. -j 4 default, -j 8 fast mode, -j 1 as OOM fallback only
  • No caveman style: write natural Korean prose
  • Teammate model routing: executor + initial draft + round 1 review = opus, closure verify / micro-patch round = sonnet

1.4 OMC team workflow pattern

Sub-items 1/2/3 all used the following pattern:

  • Phase A (spec): planner (opus) drafts the spec → critic (opus) round 1 thorough → critic (sonnet) round 2+ closure verify → APPROVE
  • Phase B (plan): planner (opus) decomposes the plan → critic-plan + verifier-plan (opus) round 1 in parallel → round 2+ sonnet closure verify → ACCEPT
  • Phase C (executor): executor (opus) implements the plan steps exactly + 1 clean commit
  • Phase D (PR): team-lead creates the gitea-pr + round 1 self-review APPROVE

Convergence-failure detection: if a round N finding reappears at the same location in the round N+1 closure check, mark it NEEDS_DISCUSSION.
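
The convergence-failure rule can be written down as a small check. This is only a sketch: the `Finding` and `Verdict` types and the idea of comparing findings by a location string are assumptions made for illustration, not part of the OMC tooling.

```rust
use std::collections::HashSet;

/// Hypothetical shape of a review finding; `location` stands in for a
/// "file:line" or spec-section anchor.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Finding {
    pub location: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Continue,
    NeedsDiscussion,
}

/// Flags a convergence failure when any location raised in round N
/// is raised again in the round N+1 closure check.
pub fn check_convergence(round_n: &[Finding], round_n_plus_1: &[Finding]) -> Verdict {
    let previous: HashSet<&str> = round_n.iter().map(|f| f.location.as_str()).collect();
    if round_n_plus_1
        .iter()
        .any(|f| previous.contains(f.location.as_str()))
    {
        Verdict::NeedsDiscussion
    } else {
        Verdict::Continue
    }
}
```

A closure round with no findings, or with findings only at new locations, keeps the loop going; a repeat at the same location escalates.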

1.5 Build / dogfooding environment

  • working directory: /home/altair823/kebab
  • branch convention: refactor/<sub-item-name> or feat/<feature-name>
  • CARGO_TARGET_DIR=/build/out/cargo-target/target
  • release binary: /build/out/cargo-target/target/release/kebab (v0.19.0)
  • dogfood KB: /home/altair823/KnowledgeBase/ (user XDG default workspace)
  • merged config (include expanded): /tmp/kebab-dogfood-merged.toml
  • Ollama: http://localhost:11434 (gemma3:4b + gemma4:e4b local), http://192.168.0.47:11434 (remote, user config default)
  • dogfooding fixture: /home/altair823/KnowledgeBase/_dogfood-v0.19.0/ (4 markdown + 7 code languages)

1.6 Current implementation state (as of v0.19.0, image + pdf)

Use this as a reference before the new session starts making changes. All file:line references are relative to main HEAD = c1e82cc.

1.6.1 Image: implemented features

crates/kebab-parse-image/src/lib.rs:

  • pub const PARSER_VERSION: &str = "image-meta-v1"; (line 47)
  • pub const MAX_DECODE_DIM: u32 = 16_384; (line 51)
  • pub struct ImageExtractor; (line 55, unit struct)
  • impl ImageExtractor (line 57-61): pub fn new() -> Self { Self }
  • impl Default for ImageExtractor (line 64)
  • impl Extractor for ImageExtractor (line 69-120):
    • fn supports(&self, m: &MediaType) -> bool { matches!(m, MediaType::Image(_)) }
    • fn parser_version(&self) -> ParserVersion { ParserVersion("image-meta-v1".to_string()) }
    • fn extract(&self, ctx: &ExtractContext<'_>, bytes: &[u8]) -> Result<CanonicalDocument>:
      • dims::probe(bytes): extracts width / height / format
      • exif_extract::extract_whitelisted(bytes): whitelisted fields only, such as DateTimeOriginal / Make / Model / GPS (privacy)
      • SourceSpan::Region { x: 0, y: 0, w: width, h: height } span
      • doc_id is generated with id_for_doc(&asset.workspace_path, &asset.asset_id, &parser_version)
      • returns 1 image-meta block + warnings (corrupt EXIF etc.)

crates/kebab-parse-image/src/ocr.rs:

  • pub fn apply_ocr(...) (line 82): calls a vision LLM to turn image → OCR text. Adds an OCR text block to the doc (mutating helper).
  • pub struct OllamaVisionOcr: Ollama vision endpoint binding. Built with OllamaVisionOcr::new(&app.config) (lib.rs:340).
  • pub trait OcrEngine: trait for other future OCR backends (Tesseract / PaddleOCR). Currently only OllamaVisionOcr implements it.

crates/kebab-parse-image/src/caption.rs:

  • pub fn apply_caption(...) (line 162): calls a vision LLM to turn image → descriptive caption. Adds a caption block to the doc.
  • pub fn caption_image(...): internal helper.

crates/kebab-parse-image/src/lib.rs re-exports:

  • pub mod caption; (line 31)
  • pub mod ocr; (line 32)
  • pub use caption::{apply_caption, caption_image}; (line 34)
  • pub use ocr::{OcrEngine, OllamaVisionOcr, apply_ocr}; (line 35)

crates/kebab-app/src/lib.rs (image ingest wiring):

  • use kebab_parse_image::{OllamaVisionOcr, apply_caption, apply_ocr}; (line 51)
  • let ocr_engine: Option<OllamaVisionOcr> = if app.config.image.ocr.enabled { Some(OllamaVisionOcr::new(&app.config)?) } else { None }; (line 338-347)
  • let image_pipeline = ImagePipeline { ocr_engine: ocr_engine.as_ref(), caption_llm: caption_llm.as_ref() }; (line 354-361)
  • struct ImagePipeline<'a> { ocr_engine: Option<&'a OllamaVisionOcr>, caption_llm: Option<...> } (line 756-758)
  • After PR #187 the ImagePipeline.extractor field was removed (Option c of sub-item 3)
  • fn ingest_one_image_asset(..., image_pipeline: &ImagePipeline<'_>) -> ... (line 1227)
  • ingest flow (line 1283-1336):
    1. let ctx = ExtractContext { asset: &asset, ... }; (1283)
    2. let mut canonical = app.extract_for(&asset.media_type, &ctx, &bytes)?; (1296, applied in PR #187)
    3. if image_pipeline.ocr_engine.is_some() { apply_ocr(&mut canonical, ocr_engine, &bytes)? } (1308)
    4. if image_pipeline.caption_llm.is_some() { apply_caption(&mut canonical, caption_llm, &bytes)? } (1326)
    5. chunker (kebab-chunk::image_meta_v1) → embedder

crates/kebab-app/src/app.rs:227:

  • Box::new(ImageExtractor::new()) is entry 1 of 11 in the App.extractors registry (the polymorphic dispatch from PR #187).

crates/kebab-config/src/lib.rs (image config):

  • image.ocr.enabled (default false, opt-in) — line 1194 assertion
  • image.ocr.engine = "ollama-vision" (default) — line 1195
  • image.ocr.model = gemma4:e4b (presumed to be the user config default; env override KEBAB_IMAGE_OCR_MODEL)
  • image.ocr.languages = ["eng", "kor"] (default)
  • image.ocr.max_pixels = 1600 (default)
  • image.caption.enabled (default false, opt-in) — line 1406
  • image.caption.max_pixels = 768 (default) — line 1407
  • image.caption.prompt_template_version = "caption-v1"

crates/kebab-chunk/src/image_meta_v1.rs (chunker):

  • chunker_version = "image-meta-v1"
  • converts 1 image-meta block + (optional) 1 OCR text block + (optional) 1 caption block into chunks

1.6.2 PDF: implemented features

crates/kebab-parse-pdf/src/lib.rs:

  • pub const PARSER_VERSION: &str = "pdf-text-v1"; (line 32)
  • pub struct PdfTextExtractor; (line 37, unit struct)
  • impl PdfTextExtractor (line 39-43): pub fn new() -> Self { Self }
  • impl Default for PdfTextExtractor (line 45)
  • impl Extractor for PdfTextExtractor (line 51-200+):
    • fn supports(&self, m: &MediaType) -> bool { matches!(m, MediaType::Pdf) }
    • fn parser_version(&self) -> ParserVersion { ParserVersion("pdf-text-v1".to_string()) }
    • fn extract(&self, ctx: &ExtractContext<'_>, bytes: &[u8]) -> Result<CanonicalDocument>:
      • lopdf::Document::load_mem(bytes)?: catastrophic decode guard (rejects corrupt PDFs)
      • rejects encrypted PDFs: anyhow::bail!("encrypted PDF; remove encryption (e.g. qpdf --decrypt) before ingest")
      • info::extract_info(&pdf_doc): PDF metadata such as Title / Author / Subject / Keywords
      • pdf_doc.get_pages() BTreeMap (1-based, deterministic ordering)
      • per-page text extraction + per-page ProvenanceEvent
      • returns Vec<Block> (1 page = 1 block, text-only)

crates/kebab-parse-pdf/src/info.rs:

  • pub fn extract_info(pdf_doc: &lopdf::Document) -> Map<String, Value>: helper that extracts PDF metadata.

crates/kebab-app/src/lib.rs (pdf ingest wiring):

  • use kebab_parse_pdf::PdfTextExtractor; (line 54)
  • fn ingest_one_pdf_asset(...) (line ~1699)
  • ingest flow (line 1777-1850):
    1. let mut canonical = app.extract_for(&asset.media_type, &ctx, &bytes)?; (1783, applied in PR #187)
    2. chunker (kebab-chunk::pdf_page_v1) → embedder
  • OCR / caption / image extraction / table extraction for PDF are all unimplemented (the targets of TODO #1, #2, #4)

crates/kebab-app/src/app.rs:228:

  • Box::new(PdfTextExtractor::new()) is entry 2 of 11 in the App.extractors registry (the polymorphic dispatch from PR #187).

crates/kebab-chunk/src/pdf_page_v1.rs (chunker):

  • chunker_version = "pdf-page-v1" (verified: matches ParserVersion("pdf-text-v1".into()) at line 303)
  • per-page chunks (1 page = 1 chunk; a large page may be split by token budget)
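
To make the "split by token budget" idea concrete, here is a minimal sketch. It assumes whitespace-separated words as a stand-in for tokens and a hypothetical `split_by_token_budget` helper; the actual pdf_page_v1 chunker may count tokens and place boundaries differently.

```rust
/// Splits one page's text into chunks of at most `max_tokens`
/// whitespace-separated tokens. A page that fits the budget stays a
/// single chunk; a page with no tokens yields no chunk in this sketch.
pub fn split_by_token_budget(page_text: &str, max_tokens: usize) -> Vec<String> {
    assert!(max_tokens > 0, "max_tokens must be positive");
    let tokens: Vec<&str> = page_text.split_whitespace().collect();
    tokens
        .chunks(max_tokens)
        .map(|window| window.join(" "))
        .collect()
}
```

Joining with a single space normalizes the original whitespace, which is acceptable for a sketch but would matter if byte-identical chunk text is required.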

1.6.3 Polymorphic dispatch for image + PDF (applied in PR #187)

crates/kebab-app/src/app.rs (registry init):

// crates/kebab-app/src/app.rs:225-240 (Step 3 of PR #187 plan)
let extractors: Vec<Box<dyn Extractor + Send + Sync>> = vec![
    Box::new(ImageExtractor::new()),         // entry 1
    Box::new(PdfTextExtractor::new()),       // entry 2
    Box::new(RustAstExtractor::new()),       // entry 3
    Box::new(PythonAstExtractor::new()),     // entry 4
    Box::new(TypescriptAstExtractor::new()), // entry 5
    Box::new(JavascriptAstExtractor::new()), // entry 6
    Box::new(GoAstExtractor::new()),         // entry 7
    Box::new(JavaAstExtractor::new()),       // entry 8
    Box::new(KotlinAstExtractor::new()),     // entry 9
    Box::new(CAstExtractor::new()),          // entry 10
    Box::new(CppAstExtractor::new()),        // entry 11
];

crates/kebab-app/src/app.rs (App.extract_for helper):

// pub(crate) fn extract_for(...)
pub(crate) fn extract_for(
    &self,
    media: &MediaType,
    ctx: &ExtractContext<'_>,
    bytes: &[u8],
) -> anyhow::Result<CanonicalDocument> {
    self.extractors
        .iter()
        .find(|e| e.supports(media))
        .ok_or_else(|| anyhow::anyhow!("no matching extractor for media {:?}", media))?
        .extract(ctx, bytes)
}

crates/kebab-app/src/app.rs (in-crate unit test, mod tests_extractor_dispatch):

  • registry length = 11 test
  • mutually-exclusive supports() grid 16 sample test
  • extract_for "no matching extractor" error path (Audio(Wav) MediaType)
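
The first-match dispatch and the mutual-exclusivity property that these tests check can be reproduced in a stand-alone sketch. Everything below is simplified for illustration: the `MediaType` variants are trimmed, the trait carries only `supports` and a version string, and `dispatch` / `supporter_count` are hypothetical helper names rather than the functions in app.rs.

```rust
/// Simplified stand-in for the real MediaType (variants trimmed).
#[derive(Debug, Clone, PartialEq)]
pub enum MediaType {
    Markdown,
    Pdf,
    Image,
    Code(String),
    Audio,
}

/// Reduced Extractor trait: only what is needed to show dispatch.
pub trait Extractor {
    fn supports(&self, media: &MediaType) -> bool;
    fn parser_version(&self) -> &'static str;
}

pub struct ImageExtractor;
pub struct PdfTextExtractor;
pub struct RustAstExtractor;

impl Extractor for ImageExtractor {
    fn supports(&self, m: &MediaType) -> bool { matches!(m, MediaType::Image) }
    fn parser_version(&self) -> &'static str { "image-meta-v1" }
}
impl Extractor for PdfTextExtractor {
    fn supports(&self, m: &MediaType) -> bool { matches!(m, MediaType::Pdf) }
    fn parser_version(&self) -> &'static str { "pdf-text-v1" }
}
impl Extractor for RustAstExtractor {
    fn supports(&self, m: &MediaType) -> bool {
        matches!(m, MediaType::Code(lang) if lang == "rust")
    }
    // Placeholder version string, not the real one.
    fn parser_version(&self) -> &'static str { "rust-ast-sketch" }
}

pub fn registry() -> Vec<Box<dyn Extractor + Send + Sync>> {
    vec![
        Box::new(ImageExtractor),
        Box::new(PdfTextExtractor),
        Box::new(RustAstExtractor),
    ]
}

/// First-match dispatch: returns the parser version of the first
/// extractor whose supports() accepts the media type.
pub fn dispatch(
    extractors: &[Box<dyn Extractor + Send + Sync>],
    media: &MediaType,
) -> Result<&'static str, String> {
    extractors
        .iter()
        .find(|e| e.supports(media))
        .map(|e| e.parser_version())
        .ok_or_else(|| format!("no matching extractor for media {:?}", media))
}

/// Number of extractors claiming a media type; at most 1 when the
/// registry is mutually exclusive.
pub fn supporter_count(
    extractors: &[Box<dyn Extractor + Send + Sync>],
    media: &MediaType,
) -> usize {
    extractors.iter().filter(|e| e.supports(media)).count()
}
```

Because `find` stops at the first match, registry order only matters if two extractors overlap, which is why an exclusivity grid test is a useful guard.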

1.6.4 Wire schema for image + PDF (current v0.19.0)

  • ingest_progress.v1: the media field states "image" / "pdf". The asset_started / asset_finished events state the chunk count.
  • ingest_report.v1.IngestItem: carries parser_version ("image-meta-v1" / "pdf-text-v1"), chunker_version ("image-meta-v1" / "pdf-page-v1"), block_count, chunk_count, and warnings.
  • search_response.v1.SearchHit.chunker_version: exact dispatch ("image-meta-v1" / "pdf-page-v1").
  • citation.v1.citation: kind: "image" / "pdf", path, line_start/line_end (the page number for PDF).
  • chunk_inspection.v1.canonical_document: dumps parser_version, last_chunker_version, last_embedding_version, blocks, and provenance.events.

1.6.5 MediaType / dispatch boundary

crates/kebab-core/src/media.rs:

  • pub enum MediaType { Markdown, Pdf, Image(ImageType), Audio(AudioType), Code(String), Other }
  • ImageType variants: Jpeg / Png / Webp / Heic / Gif / Bmp / Tiff / Svg, etc. (needs checking against the actual enum)
  • AudioType: deferred to P8, currently 0 production callers

crates/kebab-source-fs/src/media.rs (file extension → MediaType detect):

  • image_extensions (jpg / jpeg / png / webp / heic / heif / gif / bmp / tiff / svg)
  • pdf_extension (pdf only)
  • other extensions are dispatched as markdown / code / other
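
The extension-based detection listed above can be pictured with the following sketch. The `DetectedMedia` enum, the `detect_media` function, and the case-insensitive comparison are assumptions for illustration; only the extension lists come from this section, and the markdown / code branches are folded into a single `Unclassified` case.

```rust
use std::path::Path;

/// Hypothetical, reduced result type; the real code returns MediaType.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DetectedMedia {
    Image,
    Pdf,
    /// Everything this sketch does not classify (markdown / code / other).
    Unclassified,
}

const IMAGE_EXTENSIONS: &[&str] = &[
    "jpg", "jpeg", "png", "webp", "heic", "heif", "gif", "bmp", "tiff", "svg",
];

/// Classifies a path by its extension. Lowercasing the extension is an
/// assumption, not something this document confirms.
pub fn detect_media(path: &Path) -> DetectedMedia {
    let ext = match path.extension().and_then(|e| e.to_str()) {
        Some(e) => e.to_ascii_lowercase(),
        None => return DetectedMedia::Unclassified,
    };
    if IMAGE_EXTENSIONS.contains(&ext.as_str()) {
        DetectedMedia::Image
    } else if ext == "pdf" {
        DetectedMedia::Pdf
    } else {
        DetectedMedia::Unclassified
    }
}
```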

1.6.6 Three dead structs: future surface (preserved after the PR #186 merge)

crates/kebab-parse-md/src/types.rs (after the sub-item 2 absorption):

  • pub struct ParsedImageRegion; (line ~85, 0 production callers; future surface for TODO #2 multi-region)
  • pub struct ParsedPdfPage { ... } (line ~88, 0 production callers; future surface for TODO #3 PDF normalize integration)
  • pub struct ParsedAudioSegment { ... } (line ~94, 0 production callers; future surface for the P8 audio parser)

All three struct definitions are preserved intact inside kebab-parse-md. Once the first production caller appears in TODO #2 / #3 of v0.20.0, the raison d'être described in spec §11 is reactivated.


2. TODOs for v0.20.0 (in priority order)

2.1 TODO #1: PDF scanned OCR path (HIGHEST PRIORITY; aligned with the user's P9 focus on books + PDF)

Problem: the current PdfTextExtractor (crates/kebab-parse-pdf/src/lib.rs) is lopdf-based and text-only, so a scanned PDF with no embedded text (book scan / paper scan / receipt scan) yields empty chunks → 0 search results.

Scope:

  • Scanned PDF detection: a page is judged scanned if its extracted text falls below a threshold (e.g. 50 chars / page)
  • Render scanned page → image (dependency to be evaluated: pdfium-render, mupdf-rs, etc.)
  • OCR the rendered image with OllamaVisionOcr → text block
  • The pdf-page-v1 chunker also covers OCR text
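
The detection step in the scope above could look roughly like this. It is a sketch under stated assumptions: the function names are hypothetical, the 50-character default mirrors the example threshold, and counting only non-whitespace characters is a choice made here so that pages consisting of layout whitespace still count as scanned.

```rust
/// Example threshold from the scope above (chars per page).
pub const DEFAULT_MIN_CHARS_PER_PAGE: usize = 50;

/// A page counts as scanned when its extracted text has fewer than
/// `min_chars` non-whitespace characters (Unicode scalar values).
pub fn is_scanned_page(page_text: &str, min_chars: usize) -> bool {
    let visible = page_text.chars().filter(|c| !c.is_whitespace()).count();
    visible < min_chars
}

/// Returns the page numbers that should take the render + OCR path.
/// `pages` pairs a 1-based page number with its extracted text.
pub fn scanned_pages(pages: &[(u32, String)], min_chars: usize) -> Vec<u32> {
    pages
        .iter()
        .filter(|(_, text)| is_scanned_page(text, min_chars))
        .map(|(number, _)| *number)
        .collect()
}
```

Counting characters rather than bytes keeps the threshold meaningful for Korean text, where one character occupies several UTF-8 bytes. A per-page decision also lets a mixed PDF (some text pages, some scanned inserts) send only the scanned pages through OCR.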

Affected files:

  • crates/kebab-parse-pdf/src/lib.rs: scanned-detection logic in PdfTextExtractor
  • crates/kebab-parse-pdf/Cargo.toml: add the pdfium-render or mupdf dependency
  • crates/kebab-parse-pdf/src/scanned_ocr.rs (new): render + OCR pipeline
  • crates/kebab-app/src/lib.rs:1696-1850: scanned-path branch in ingest_one_pdf_asset, or integration inside PdfTextExtractor
  • config: pdf.ocr.enabled + pdf.ocr.engine (siblings of image.ocr)

Risk: binary size + cross-platform support of the PDF rendering dependency (pdfium etc.).

Trigger condition: a book / paper PDF has been indexed but returns 0 search results (to be measured during user dogfooding).

2.2 TODO #2: Multi-region image dispatch (medium priority)

Problem: the current ImageExtractor treats one image as 1 image-meta block + (optional) 1 OCR text block + (optional) 1 caption block. There is no separation of multiple regions inside an image (distinct text bboxes / face bboxes / figure bboxes, etc.).

Scope:

  • ParsedImageRegion struct (already defined at crates/kebab-parse-md/src/types.rs:85, 0 production callers)
  • Change ImageExtractor::extract to emit Vec<ParsedImageRegion> (multi-region detection)
  • Region source: OCR bounding boxes (Tesseract or PaddleOCR layout detection)
  • Add a new lift, kebab-parse-md::build_canonical_document_from_image_regions(...)
  • Per-region chunks → finer search granularity

Affected files:

  • crates/kebab-parse-image/src/lib.rs: change what ImageExtractor emits
  • crates/kebab-parse-image/src/regions.rs (new): multi-region detection logic
  • crates/kebab-parse-md/src/normalize.rs: add build_canonical_document_from_image_regions
  • crates/kebab-chunk/src/image_region_v1.rs (new): per-region chunker
  • crates/kebab-app/src/lib.rs:1209-1336: changes to ingest_one_image_asset

Risk: false positives in region detection (overlapping regions detected multiple times; the chunker needs dedup).

Trigger condition: a multi-region image use case (e.g. searching the name region and the email region of a scanned business card separately).
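
One possible way to handle the overlap risk noted above is a greedy intersection-over-union (IoU) filter. This is a swapped-in illustration, not a decided design: the `Region` struct borrows its x / y / w / h fields from SourceSpan::Region, while `iou`, `dedup_regions`, and the threshold are hypothetical.

```rust
/// Hypothetical axis-aligned region in pixel coordinates.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Region {
    pub x: u32,
    pub y: u32,
    pub w: u32,
    pub h: u32,
}

impl Region {
    fn area(&self) -> u64 {
        self.w as u64 * self.h as u64
    }
}

/// Intersection-over-union of two regions, in [0.0, 1.0].
pub fn iou(a: &Region, b: &Region) -> f64 {
    let left = a.x.max(b.x) as u64;
    let top = a.y.max(b.y) as u64;
    let right = (a.x as u64 + a.w as u64).min(b.x as u64 + b.w as u64);
    let bottom = (a.y as u64 + a.h as u64).min(b.y as u64 + b.h as u64);
    if right <= left || bottom <= top {
        return 0.0; // no overlap
    }
    let inter = (right - left) * (bottom - top);
    let union = a.area() + b.area() - inter;
    inter as f64 / union as f64
}

/// Greedy dedup: keeps a region only if its IoU with every region kept
/// so far is at most `max_iou`. Earlier regions win ties.
pub fn dedup_regions(regions: &[Region], max_iou: f64) -> Vec<Region> {
    let mut kept: Vec<Region> = Vec::new();
    for r in regions {
        if kept.iter().all(|k| iou(k, r) <= max_iou) {
            kept.push(*r);
        }
    }
    kept
}
```

Since earlier regions win, sorting the input by detector confidence before calling `dedup_regions` would keep the strongest detection of each overlapping group.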

2.3 TODO #3: PDF normalize integration (medium priority)

Problem: the current PdfTextExtractor emits a CanonicalDocument directly (bypassing normalize). This makes doc-level aggregation of cross-page references and page-level metadata difficult.

Scope:

  • ParsedPdfPage struct (already defined at crates/kebab-parse-md/src/types.rs:88, 0 production callers)
  • Change PdfTextExtractor::extract to emit Vec<ParsedPdfPage>
  • Add a new lift, kebab-parse-md::build_canonical_document_from_pdf_pages(...): per-page provenance + cross-page reference graph + page-level doc-summary
  • The pdf-page-v1 chunker also covers page-level metadata

Affected files:

  • crates/kebab-parse-pdf/src/lib.rs: change what PdfTextExtractor emits
  • crates/kebab-parse-md/src/normalize.rs: add build_canonical_document_from_pdf_pages
  • crates/kebab-chunk/src/pdf_page_v1.rs: multi-page handling in the chunker
  • crates/kebab-app/src/lib.rs:1696-1850: lift path in ingest_one_pdf_asset

Risk: complexity of the cross-page reference graph (page N → a figure on page M). On false positives the doc-summary gains spurious entries.

Trigger condition: when search/ask needs PDF cross-page navigation (e.g. "Figure 3 on page 7 is explained on page 12").

2.4 TODO #4: Per-page image / table extraction (low-medium priority)

Problem: images / tables embedded in PDFs are not extracted at all. The flow is text-only.

Scope:

  • Extract figures / tables within a PDF page (lopdf image stream extraction + table detection)
  • Figure: convert to an image block → ParsedImageRegion (depends on TODO #2)
  • Table: convert to a structured Block (e.g. a markdown table or CSV-like)
  • pdf-table-v1 chunker (new)

Affected files:

  • crates/kebab-parse-pdf/src/figure.rs + table.rs (new)
  • crates/kebab-chunk/src/pdf_table_v1.rs (new)

Risk: PDF figures / tables come in varied encodings (vector vs raster, merged table cells), so robust detection is hard.

Trigger condition: when figure / table search is needed for paper / report PDFs (aligned with the book-PDF focus of the user's current P9 priority).

2.5 TODO #5: Integrate OCR / caption into the Extractor trait (low priority)

Problem: apply_ocr / apply_caption (crates/kebab-parse-image/src/{ocr,caption}.rs) are currently separate free functions that do not use the Extractor trait. app.extract_for(image, ...) calls only the base ImageExtractor, and OCR/caption run as post-processing at the callsite (ingest_one_image_asset).

Scope (Option A): model OCR / caption as a separate Enricher trait. Add an App.enrichers: Vec<Box<dyn Enricher + Send + Sync>> registry and a polymorphic app.enrich_for(doc, &media_type, &bytes) call.

Scope (Option B): fold OCR / caption into Extractor as well. OcrEnricherExtractor + CaptionEnricherExtractor dispatch via Extractor::supports(), and Extractor::extract() modifies only part of the doc. This may violate the spec contract.
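
Option A might take a shape like the sketch below. The trait methods, the `Doc` placeholder, and the stub enrichers are all assumptions for illustration; only the names Enricher, App.enrichers, and enrich_for come from the scope text, and the real signatures would be settled in the spec.

```rust
/// Simplified stand-ins for the real types.
#[derive(Debug, Clone, PartialEq)]
pub enum MediaType {
    Image,
    Pdf,
}

#[derive(Debug, Default)]
pub struct Doc {
    pub blocks: Vec<String>,
}

/// Hypothetical Enricher trait: mutates an already-extracted document.
pub trait Enricher {
    fn supports(&self, media: &MediaType) -> bool;
    fn enrich(&self, doc: &mut Doc, bytes: &[u8]) -> Result<(), String>;
}

pub struct OcrEnricher;
pub struct CaptionEnricher;

impl Enricher for OcrEnricher {
    fn supports(&self, m: &MediaType) -> bool { matches!(m, MediaType::Image) }
    fn enrich(&self, doc: &mut Doc, _bytes: &[u8]) -> Result<(), String> {
        doc.blocks.push("ocr-text".to_string()); // stub for apply_ocr
        Ok(())
    }
}

impl Enricher for CaptionEnricher {
    fn supports(&self, m: &MediaType) -> bool { matches!(m, MediaType::Image) }
    fn enrich(&self, doc: &mut Doc, _bytes: &[u8]) -> Result<(), String> {
        doc.blocks.push("caption".to_string()); // stub for apply_caption
        Ok(())
    }
}

pub struct App {
    pub enrichers: Vec<Box<dyn Enricher + Send + Sync>>,
}

impl App {
    /// Unlike extract_for (first match wins), every supporting
    /// enricher runs, in registry order.
    pub fn enrich_for(
        &self,
        doc: &mut Doc,
        media: &MediaType,
        bytes: &[u8],
    ) -> Result<(), String> {
        for e in self.enrichers.iter().filter(|e| e.supports(media)) {
            e.enrich(doc, bytes)?;
        }
        Ok(())
    }
}
```

The config gates (image.ocr.enabled, image.caption.enabled) could then decide which enrichers are registered, instead of being checked at the callsite.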

Affected files:

  • crates/kebab-parse-image/src/ocr.rs + caption.rs: add Enricher trait impls
  • crates/kebab-core/src/traits.rs: define the Enricher trait (Option A)
  • crates/kebab-app/src/app.rs: enrichers field + enrich_for helper
  • crates/kebab-app/src/lib.rs:1209-1336: enrichment dispatch in ingest_one_image_asset

Risk: a new trait requires updating design §7.2 → changes the frozen contract → triggers a workspace.version bump.

Trigger condition: when adding a new enrichment (e.g. embedding-based image classification, depth estimation, OCR + translate) makes a polymorphic registry truly necessary.

2.6 TODO #6: Add MarkdownExtractor (low priority, sub-item 3 §11 future-work)

Problem: markdown is currently the only free-function path (parse_frontmatter + parse_blocks + build_canonical_document). The other parsers (pdf / image / code) have Extractor impls. Only markdown calls free functions from the markdown arm of the outer 4-arm match.

Scope:

  • kebab-parse-md::MarkdownExtractor struct + Extractor trait impl
  • supports(MediaType::Markdown)
  • extract(ctx, bytes) -> CanonicalDocument calls parse_frontmatter + parse_blocks + build_canonical_document together internally
  • 12 entries in the App.extractors registry (markdown added), so extract_for dispatches markdown too

Challenge (found by the round 1 critic of sub-item 3): the warning-channel handover on the markdown path. Currently the Vec<Warning> from parse_frontmatter + parse_blocks is captured separately inside ingest_one_markdown_asset and flows into IngestItem.warnings (wire ingest_report.v1). The MarkdownExtractor::extract(&ctx, &bytes) -> Result<CanonicalDocument> signature has no separate channel → possible wire schema impact.

Affected files:

  • crates/kebab-parse-md/src/extractor.rs (new)
  • crates/kebab-parse-md/src/lib.rs: pub mod extractor; + pub use crate::extractor::MarkdownExtractor;
  • crates/kebab-app/src/app.rs:225-240: add the registry entry
  • crates/kebab-app/src/lib.rs:1083-1170: measure exactly how far the helper signatures on the markdown ingest path need to change

Risk: the wire schema (ingest_report.v1.IngestItem.warnings) may change. Either an additive minor wire schema bump (v1.x) or a trait signature change (v2).

Trigger condition: a use case that makes polymorphic dispatch for markdown truly necessary (e.g. a new markdown variant such as kramdown / CommonMark + extensions).

2.7 TODO #7: Chunker dispatch unification (low priority, sub-item 3 §11 future-work)

Problem: the Chunker trait currently has no supports() method. The chunker dispatch in kebab-app/src/lib.rs (parser_version 11-arm + chunker_version 11-arm + tier3_fallback_cv 2-arm + chunk dispatch 13-arm) is done with inner matches.

Scope:

  • Add a supports(media: &MediaType) -> bool or supports(chunker_version: &str) -> bool method to the Chunker trait
  • App.chunkers: Vec<Box<dyn Chunker + Send + Sync>> registry
  • app.chunk_for(...) polymorphic call
  • Remove the 4 inner matches

Affected files:

  • crates/kebab-core/src/traits.rs: add the method to the Chunker trait
  • Add supports() to every Chunker impl (15 places in kebab-chunk: md-heading / pdf-page / 9 code-ast / code-text-paragraph / manifest-file / dockerfile-file / k8s-manifest-resource)
  • crates/kebab-app/src/app.rs: chunkers field + chunk_for helper
  • crates/kebab-app/src/lib.rs:1935-2128: remove the 4 inner matches

Risk: requires updating design §7.2 (frozen contract change, workspace.version bump).

Trigger condition: when adding a new chunker variant (e.g. semantic chunker, sentence-level chunker) makes a polymorphic registry truly necessary.

2.8 TODO #8: Unify the outer 4-arm match (low priority, sub-item 3 §11)

Problem: the 4-arm match &asset.media_type (markdown / pdf / image / code) in ingest_one_asset (lib.rs:961-1040) calls hardcoded helpers (ingest_one_markdown_asset / ingest_one_pdf_asset / ingest_one_image_asset / ingest_one_code_asset).

Scope:

  • Introduce a unified signature for the per-medium helpers, using a context object such as IngestEnv
  • app.ingest_one(...) polymorphic call

Risk: the helpers carry different enrichments (OCR/caption for image + lopdf for pdf + lang detect for code), which are hard to unify → a large refactor.

Trigger condition: when adding a new medium (e.g. audio in P8 or video in P+) makes unified dispatch truly necessary.


3. Priorities + recommended sequencing

| Order | TODO | Effect | Dependencies |
| --- | --- | --- | --- |
| 1 | TODO #1 (PDF scanned OCR) | Directly unblocks the user's P9 book + PDF use case | OllamaVisionOcr (already implemented) |
| 2 | TODO #3 (PDF normalize integration) | Enables cross-page references + doc-summary | Can be done separately from TODO #1 |
| 3 | TODO #2 (multi-region image) | Finer image search granularity | Requires OCR bbox detection |
| 4 | TODO #4 (PDF figure/table) | figure/table search for paper/report PDFs | TODO #2 + #3 |
| 5 | TODO #5 (OCR/caption Enricher trait) | architecture cleanup | After the enrichment from TODO #1, #2 is stable |
| 6 | TODO #6 (add MarkdownExtractor) | dispatch unification | Requires a wire schema change |
| 7 | TODO #7 (Chunker dispatch) | dispatch unification | design §7.2 update |
| 8 | TODO #8 (unify outer 4-arm match) | full polymorphic dispatch | TODO #6 + #7 |

Recommendation: start with TODO #1 + #3 as the first two sub-items of v0.20.0 (they directly unblock the user's use case). TODO #2 + #4 become sub-items 3 and 4 of v0.20.0. Defer TODO #5 through #8 to v0.21+ (the §11 future-work of sub-item 3 already marks them as deferred to separate PRs).


4. First steps for the new session (suggested)

  1. Check state:

    • Check git status + git log --oneline -5 (HEAD = c1e82cc or later)
    • cat ~/.claude/projects/-home-altair823-kebab/memory/MEMORY.md
    • cat docs/superpowers/handoffs/2026-05-26-v0.20-image-pdf-normalize-handoff.md (this document)
  2. Confirm the first sub-item with the user:

    • "Start with TODO #1 (PDF scanned OCR)?" This maps most directly to the user's use case
    • or "A different priority?"
  3. Draft the spec for the chosen TODO:

    • Create a new team (following the branch convention of the gitea-pr workflow)
    • Spawn planner (opus)
    • Follow the spec/plan pattern of sub-items 1/2/3 as is
    • critic round 1 = opus, round 2+ = sonnet (model routing policy)
  4. Proceed through spec → plan → executor → PR + review loop (the pattern of the previous sub-items).


5. Key files / materials / dependency locations

Codebase locations (change targets)

  • crates/kebab-parse-image/src/lib.rs: ImageExtractor (line 55)
  • crates/kebab-parse-image/src/ocr.rs: OllamaVisionOcr + apply_ocr
  • crates/kebab-parse-image/src/caption.rs: apply_caption
  • crates/kebab-parse-pdf/src/lib.rs: PdfTextExtractor (line 37)
  • crates/kebab-parse-md/src/types.rs: ParsedImageRegion + ParsedPdfPage + ParsedAudioSegment (the three dead structs, future surface)
  • crates/kebab-parse-md/src/normalize.rs: build_canonical_document (line 60)
  • crates/kebab-core/src/traits.rs: Extractor + Chunker traits (line 115-132)
  • crates/kebab-app/src/app.rs:225-240: App.extractors registry (the 11 entries from sub-item 3)
  • crates/kebab-app/src/lib.rs:961-1040: ingest_one_asset outer 4-arm match
  • crates/kebab-app/src/lib.rs:1209-1336: ingest_one_image_asset (OCR / caption flow)
  • crates/kebab-app/src/lib.rs:1696-1850: ingest_one_pdf_asset
  • crates/kebab-chunk/src/pdf_page_v1.rs: PDF chunker
  • crates/kebab-config/src/lib.rs:861-1430: image.ocr / image.caption config

Spec / plan / doc locations (reference material)

  • docs/superpowers/specs/2026-04-27-kebab-final-form-design.md: frozen design contract (§3.7b rewritten as 4 paragraphs, §7.2 traits, §8 dep graph)
  • docs/superpowers/specs/2026-05-26-normalize-absorption-spec.md: §11 future-work of sub-item 2
  • docs/superpowers/specs/2026-05-26-extractor-dispatch-unification-spec.md: §11 future-work of sub-item 3
  • docs/superpowers/plans/2026-05-26-*-plan.md: plans for the 3 sub-items
  • tasks/INDEX.md: "Future work / deferred" section (added when sub-item 2 was merged)
  • tasks/HOTFIXES.md: design deviation entry (sub-item 2)
  • docs/SMOKE.md: dogfooding procedure
  • docs/ARCHITECTURE.md: crate dependency graph + locked-in decisions
  • HANDOFF.md: phase-level progress dashboard
  • README.md: end-user facing surface

External dependency candidates (PDF rendering for TODO #1)

  • pdfium-render crate: Chrome PDFium binding, robust
  • mupdf crate: MuPDF binding, lighter
  • image stream extraction with lopdf (already in use), for TODO #4

6. Model routing policy (applying the user's decision)

For every teammate spawn in the new session:

| Phase | Role | Model |
| --- | --- | --- |
| Phase A | planner (spec drafter) | opus |
| Phase A | critic round 1 (thorough) | opus |
| Phase A | critic round 2+ (closure verify) | sonnet |
| Phase B | planner (plan decompose) | opus |
| Phase B | critic-plan + verifier-plan round 1 | opus |
| Phase B | critic-plan + verifier-plan round 2+ | sonnet |
| Phase C | executor | opus |
| Phase D | team-lead | (self) |

See the memory entry feedback_teammate_model_routing.


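7. Release plan

  • The current v0.19.0 binary is built from main HEAD c1e82cc (after the PR #185 + #186 + #187 merges). Dogfooding is complete.
  • Running gitea-release v0.19.0 is recommended (combining sub-items 1/2/3). If the user has not explicitly addressed the release, cut the release tag before the session starts the v0.20.0 sub-items.
  • Consider the next release after TODO #1 and #3 of v0.20.0 are merged.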

8. Verification invariants (acceptance for every v0.20.0 sub-item)

  • 0 workspace test regressions (baseline = 1316, plus a delta equal to the new tests each sub-item adds)
  • 0 wire schema changes, or a minor additive bump (*.v1)
  • On a design contract change, update all frozen task specs or add a HOTFIXES live source
  • v0.20.0 minor bump (on a frozen design contract change) or patch bump (refactor only)
  • Preserve byte-identical results for the dogfooding KB (success path)
  • cargo clippy --workspace --all-targets -- -D warnings clean
  • cargo build --release -p kebab-cli -j 4 clean

9. Location + intent of this handoff document

docs/superpowers/handoffs/2026-05-26-v0.20-image-pdf-normalize-handoff.md

A new Claude session can start a sub-item by reading this file alone. All context from this session (sub-item 1/2/3 results + dogfooding + user memory + OMC workflow) is self-contained here.

At the start of the new session:

cat /home/altair823/kebab/docs/superpowers/handoffs/2026-05-26-v0.20-image-pdf-normalize-handoff.md

Then confirm the priority of the first sub-item with the user.