
altair823 f9714aa5cb docs(rename): kb → kebab — README, tasks/, docs/, design doc, report

Final commit. Bulk-updates every `kb` token in all .md files.

- 19 crate names (`kb-core`, `kb-app`, …) → `kebab-*` (including the Rust
  module path form `kb_*` → `kebab_*`).
- Future components (`kb-tui`, `kb-desktop`, `kb-asr-whisper`, `kb-ocr`,
  `kb-mcp`, `kb-vlm`, `kb-rerank`, `kb-vision-ocr`, `kb-index`,
  `kb-smoke`, `kb-architecture`) → `kebab-*` (the same prefix will be
  used when P6+ starts).
- CLI command examples: `kb ingest` / `kb search` / `kb ask` / `kb init` /
  `kb doctor` / `kb inspect` / `kb list` / `kb eval` →
  `kebab <verb>`, in both fenced code blocks and inline backticks.
- XDG paths, env vars, and binary paths (`target/release/kb` →
  `target/release/kebab`) synchronized.
- All references unified across the design doc, the initial report, SMOKE,
  HOTFIXES, phase epics, and task specs.
- `git -c user.name=kb` in task-decomposition.md is kept as-is: it is
  author information recording past git history (authors in the actual
  git history cannot be changed).
- Also updates `status: planned` → `completed` in
  `tasks/phase-5-evaluation.md` (missed after the P5-1 and P5-2 PRs
  were merged).

## Verification

- `grep -rEn "\bkb-[a-z]|\bkb_[a-z]|\.config/kb\b|kb\.sqlite|\bKB_[A-Z]"
   --include="*.md"` returns 0 hits (excluding the git author in
  task-decomposition.md).
- Every file path reference still resolves (all renamed files are
  updated to their new paths).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-02 04:01:55 +00:00

6.5 KiB


field	value
phase	
component	kebab-parse-image (OCR adapter)
task_id	p6-2
title	OcrEngine trait + Tesseract adapter (Apple Vision feature-gated)
status	planned
depends_on	p6-1
unblocks	p6-3
contract_source	../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections	§3.4 ImageRefBlock.ocr; §3.7a OcrText/OcrRegion; §9.1 OCR vs caption provenance

p6-2 — OCR adapter

Goal

Define the OcrEngine trait and a Tesseract-backed default implementation. Populate ImageRefBlock.ocr with OcrText { joined, regions, engine, engine_version }. Provide an apple-vision feature gate that switches OCR to a sidecar binary on macOS.

Why now / why this size

OCR (observed text) must stay strictly separate from captions (model-generated). Confining the engine choice to a single trait + adapter lets us swap in Apple Vision or PaddleOCR without touching the extractor or chunker.

Allowed dependencies

kebab-core
kebab-config
kebab-parse-image (consumes its types)
tesseract = "0.13" (feature tesseract, default ON)
For feature apple-vision: std::process::Command only (sidecar binary, not a Rust dep)
serde, serde_json
image
tracing
thiserror

Forbidden dependencies

kebab-source-fs, kebab-parse-md, kebab-normalize, kebab-chunk, kebab-store-*, kebab-embed*, kebab-search, kebab-llm*, kebab-rag, kebab-tui, kebab-desktop

Inputs

input	type	source
image bytes	`&[u8]`	from extractor
optional language hint	`kebab_core::Lang`	metadata
`kebab-config` OCR settings	engine name, languages	runtime

Outputs

output	type	downstream
`OcrText`	`kebab_core::OcrText`	merged into `ImageRefBlock.ocr`

Public surface (signatures only — no new types)

pub trait OcrEngine: Send + Sync {
    fn engine_name(&self) -> &'static str;
    fn engine_version(&self) -> String;
    fn recognize(&self, image_bytes: &[u8], lang_hint: Option<&kebab_core::Lang>) -> anyhow::Result<kebab_core::OcrText>;
}

pub struct TesseractOcr { /* internal: lazy api handle */ }
impl TesseractOcr { pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>; }
impl OcrEngine for TesseractOcr { /* per trait */ }

#[cfg(feature = "apple-vision")]
pub struct AppleVisionOcr { /* sidecar path */ }
#[cfg(feature = "apple-vision")]
impl OcrEngine for AppleVisionOcr { /* per trait */ }

pub fn apply_ocr(
    engine: &dyn OcrEngine,
    image_bytes: &[u8],
    block: &mut kebab_core::ImageRefBlock,
    lang_hint: Option<&kebab_core::Lang>,
) -> anyhow::Result<()>;

Behavior contract

Tesseract:
- Languages from config.ocr.languages (default ["eng", "kor"]).
- Recognition produces OcrRegion { bbox: (x, y, w, h), text, confidence } for each "word" or "line" (configurable; default "line").
- Drop regions with confidence < config.ocr.min_confidence (default 60.0). If all dropped, return OcrText { joined: "", regions: vec![], engine, engine_version }.
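- joined = regions.iter().map(|r| r.text.as_str()).collect::<Vec<_>>().join(" "), i.e. the surviving region texts in recognition order, separated by single spaces (no smart layout reconstruction in v1).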
- engine = "tesseract", engine_version = <tesseract version string>. The tesseract crate (0.13+) does NOT expose a stable Rust version() accessor, so use one of: (a) call libtesseract's TessVersion() via the bundled FFI surface, OR (b) at adapter construction, shell out to tesseract --version once and cache the parsed "5.3.4"-style string. Both are deterministic for a fixed install. Pin the chosen approach in the implementation PR.
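The Tesseract post-processing rules above can be sketched in plain Rust. This is a minimal, std-only sketch, not the crate's code: `OcrRegion` and `OcrText` are local stand-ins that mirror the field names in the spec (the real `kebab_core` definitions may differ, e.g. in the type of `engine`), and `build_ocr_text`, `tesseract_lang_arg`, and `parse_tesseract_version` are hypothetical helper names. The version parser implements option (b) and assumes the first line of `tesseract --version` has the form `tesseract 5.3.4`.

```rust
// Local stand-ins mirroring the spec's field names; the real
// kebab_core::OcrRegion / kebab_core::OcrText may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct OcrRegion {
    pub bbox: (u32, u32, u32, u32), // (x, y, w, h)
    pub text: String,
    pub confidence: f32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct OcrText {
    pub joined: String,
    pub regions: Vec<OcrRegion>,
    pub engine: String,
    pub engine_version: String,
}

/// Hypothetical helper: drop regions below `min_confidence`, then join the
/// survivors' text with single spaces (no layout reconstruction in v1).
/// If every region is dropped, `joined` is "" and `regions` is empty.
pub fn build_ocr_text(
    raw: Vec<OcrRegion>,
    min_confidence: f32,
    engine: &str,
    engine_version: &str,
) -> OcrText {
    let regions: Vec<OcrRegion> = raw
        .into_iter()
        .filter(|r| r.confidence >= min_confidence)
        .collect();
    let joined = regions
        .iter()
        .map(|r| r.text.as_str())
        .collect::<Vec<_>>()
        .join(" ");
    OcrText {
        joined,
        regions,
        engine: engine.to_string(),
        engine_version: engine_version.to_string(),
    }
}

/// Hypothetical helper: Tesseract takes several languages as one
/// `+`-separated string, so ["eng", "kor"] becomes "eng+kor".
pub fn tesseract_lang_arg(languages: &[&str]) -> String {
    languages.join("+")
}

/// Hypothetical helper for option (b): extract "5.3.4" from output whose
/// first line looks like "tesseract 5.3.4" (a leading 'v' is tolerated).
pub fn parse_tesseract_version(version_output: &str) -> Option<String> {
    let first_line = version_output.lines().next()?.trim();
    let rest = first_line.strip_prefix("tesseract")?;
    // Require a separator so e.g. "tesseractx 1.0" is rejected.
    if !rest.starts_with(char::is_whitespace) {
        return None;
    }
    let token = rest.split_whitespace().next()?;
    Some(token.trim_start_matches('v').to_string())
}
```

The threshold is inclusive (`>=`), matching the spec's wording, which drops only regions whose confidence is strictly below `min_confidence`.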
Apple Vision sidecar (feature apple-vision):
- Spawn a small Swift binary, kebab-vision-ocr (path from config.ocr.apple_vision_binary), feed it the image via stdin, and read JSON { regions: [{x,y,w,h,text,confidence}, ...] } from its stdout.
- Apply the same threshold and joined rules as Tesseract. engine = "apple-vision", engine_version = the output of the sidecar's --version.
- This subagent task does NOT write the Swift sidecar; it only wires the Rust side. Document the expected sidecar interface in docs/spec/sidecar-vision.md (separate doc spec stub, optional).
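The Rust side of the sidecar protocol needs only `std::process::Command`, as the allowed dependencies require. A minimal sketch follows. `run_sidecar` is a hypothetical name, and decoding the returned `{ regions: [...] }` JSON (via serde_json) is left out so the sketch stays std-only. The stdin write happens on a scoped thread so that a large image cannot deadlock against a sidecar that starts writing stdout before it has consumed all of its input.

```rust
use std::io::{self, Write};
use std::path::Path;
use std::process::{Command, Stdio};

/// Hypothetical helper: pipe `image_bytes` into the sidecar's stdin and
/// return its stdout as UTF-8 (expected to be the `{ regions: [...] }` JSON).
pub fn run_sidecar(binary: &Path, image_bytes: &[u8]) -> io::Result<String> {
    let mut child = Command::new(binary)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;
    let mut stdin = child.stdin.take().expect("stdin is piped");

    let (write_result, output) = std::thread::scope(|s| {
        // Feed the image on a scoped thread; `stdin` is dropped when the
        // closure finishes, closing the pipe so the sidecar sees EOF.
        let writer = s.spawn(move || stdin.write_all(image_bytes));
        // Meanwhile drain stdout/stderr and wait for the sidecar to exit.
        let output = child.wait_with_output();
        (writer.join().expect("stdin writer thread panicked"), output)
    });

    // A sidecar that exits without reading stdin surfaces as a BrokenPipe here.
    write_result?;
    let output = output?;
    if !output.status.success() {
        let stderr = String::from_utf8_lossy(&output.stderr);
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("sidecar exited with {}: {}", output.status, stderr.trim()),
        ));
    }
    String::from_utf8(output.stdout).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

This also fits the test plan's mock-binary approach: any program that echoes its stdin (for example `cat`) can stand in for the sidecar.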
apply_ocr calls engine.recognize and sets block.ocr = Some(text). Appending a Provenance::OcrApplied event to the CanonicalDocument remains the caller's responsibility; this task only exposes the helper.
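apply_ocr itself is thin. The sketch below is illustrative only: it swaps anyhow::Result for a std-only boxed-error alias (`OcrResult`, a hypothetical name) and uses local stand-ins for `ImageRefBlock`, `OcrText` (regions omitted), and `Lang`, whose real definitions live in kebab_core. `FixedEngine` is a hypothetical test double of the kind the "apply_ocr mutates block.ocr" unit test would need.

```rust
use std::error::Error;

/// Std-only stand-in for anyhow::Result.
pub type OcrResult<T> = Result<T, Box<dyn Error + Send + Sync>>;

/// Stand-ins for kebab_core types (OcrText regions omitted for brevity).
#[derive(Debug, Clone, PartialEq)]
pub struct OcrText {
    pub joined: String,
    pub engine: String,
    pub engine_version: String,
}

#[derive(Debug, Default)]
pub struct ImageRefBlock {
    pub ocr: Option<OcrText>,
}

/// Stand-in for kebab_core::Lang, whose definition is not in this spec.
pub struct Lang(pub String);

pub trait OcrEngine: Send + Sync {
    fn engine_name(&self) -> &'static str;
    fn engine_version(&self) -> String;
    fn recognize(&self, image_bytes: &[u8], lang_hint: Option<&Lang>) -> OcrResult<OcrText>;
}

/// Run OCR and store the result on the block. Appending
/// Provenance::OcrApplied is left to the caller, which owns the
/// CanonicalDocument.
pub fn apply_ocr(
    engine: &dyn OcrEngine,
    image_bytes: &[u8],
    block: &mut ImageRefBlock,
    lang_hint: Option<&Lang>,
) -> OcrResult<()> {
    let text = engine.recognize(image_bytes, lang_hint)?;
    block.ocr = Some(text);
    Ok(())
}

/// Hypothetical test double that always returns the same text.
pub struct FixedEngine;

impl OcrEngine for FixedEngine {
    fn engine_name(&self) -> &'static str {
        "fixed"
    }
    fn engine_version(&self) -> String {
        "0.0.0".to_string()
    }
    fn recognize(&self, _image_bytes: &[u8], _lang_hint: Option<&Lang>) -> OcrResult<OcrText> {
        Ok(OcrText {
            joined: "hello world".to_string(),
            engine: self.engine_name().to_string(),
            engine_version: self.engine_version(),
        })
    }
}
```

Because `?` returns before the assignment, a failed recognize leaves block.ocr untouched rather than half-populated.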
Streaming / large images: cap decoded image size at 8192×8192 before passing to OCR; downscale with image::imageops::resize if larger.
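The spec names image::imageops::resize but not the target size. One reasonable reading, assumed here, is to scale so the longer side fits the 8192 cap while preserving the aspect ratio. `fit_within` is a hypothetical std-only helper that computes only the target dimensions.

```rust
/// Largest (width, height) that fits inside a `max_side` x `max_side` box
/// while keeping the aspect ratio. Images already inside the box are
/// returned unchanged.
pub fn fit_within(width: u32, height: u32, max_side: u32) -> (u32, u32) {
    if width <= max_side && height <= max_side {
        return (width, height);
    }
    // Scale both sides by max_side / longer_side. u64 math avoids overflow,
    // flooring keeps the result inside the cap, and max(1) avoids a 0 side.
    let longer = u64::from(width.max(height));
    let scale = |side: u32| ((u64::from(side) * u64::from(max_side)) / longer).max(1) as u32;
    (scale(width), scale(height))
}
```

The resulting pair would then be passed as `nwidth`/`nheight` to `image::imageops::resize(&img, w, h, FilterType::Lanczos3)`. The filter choice is also an assumption. Taking the cap as a parameter keeps it tunable from configuration.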
Trust: OcrText is observed text (high trust). Captions (ModelCaption) are NOT generated here.
Determinism: Tesseract is deterministic for a fixed input + fixed page-segmentation mode; the dev tests assert this by running recognize twice on the same input and comparing the results. Apple Vision is also deterministic in practice but may vary across macOS versions; document this and accept it.
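A determinism check like this one needs nothing beyond PartialEq on the result. The sketch below is generic over the recognizer, so it does not depend on the real OcrEngine trait. `is_deterministic` is a hypothetical helper name.

```rust
/// Run `recognize` twice on the same bytes and report whether the two
/// results compare equal. An error from either run is propagated.
pub fn is_deterministic<T, E, F>(recognize: F, image_bytes: &[u8]) -> Result<bool, E>
where
    T: PartialEq,
    F: Fn(&[u8]) -> Result<T, E>,
{
    let first = recognize(image_bytes)?;
    let second = recognize(image_bytes)?;
    Ok(first == second)
}
```

In a dev test the closure would wrap a real engine, e.g. `|b: &[u8]| engine.recognize(b, None)`, with OcrText deriving PartialEq.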

Storage / wire effects

None.

Test plan

kind	description	fixture / data
unit	Tesseract recognizes English on `fixtures/image/hello-world.png` (joined contains "hello world")	fixture
unit	confidence threshold drops noise regions	fixture with low-quality text
unit	Korean text recognized when `kor` language enabled	`fixtures/image/안녕.png`
unit	empty result returns `OcrText { joined: "", regions: [], .. }` not error	`fixtures/image/no-text.png`
unit	`apply_ocr` mutates block.ocr from None → Some	inline
determinism	two runs of recognize on same input → identical OcrText	fixture
`#[cfg(feature = "apple-vision")]` smoke	sidecar invocation captured (mock binary echoes fixed JSON)	inline mock

All tests run under cargo test -p kebab-parse-image ocr. The CI host must have Tesseract installed, with the eng and kor language data the tests use.

Definition of Done

cargo check -p kebab-parse-image --features tesseract passes
cargo test -p kebab-parse-image ocr passes
apple-vision feature compiles on macOS and gracefully no-ops on Linux
No imports outside Allowed dependencies
PR links design §3.4, §3.7a, §9.1
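For the DoD item that requires the apple-vision feature to no-op gracefully on Linux, one possible shape is a resolver from the configured engine name (the "engine name" OCR setting listed under Inputs) to an engine choice. `EngineChoice` and `select_engine` are hypothetical names, and the real crate may instead construct `AppleVisionOcr` directly behind `#[cfg(...)]`.

```rust
/// Outcome of resolving the configured OCR engine name on this build.
#[derive(Debug, PartialEq)]
pub enum EngineChoice {
    Tesseract,
    AppleVision,
    /// OCR is skipped (a no-op), with a reason suitable for a log line.
    Disabled(String),
}

/// Hypothetical resolver: "apple-vision" is honored only when the crate is
/// built with the apple-vision feature AND targets macOS. Otherwise it
/// degrades to a no-op instead of failing the build or the ingest.
pub fn select_engine(configured: &str) -> EngineChoice {
    match configured {
        // Assumes the default-on `tesseract` feature is enabled.
        "tesseract" => EngineChoice::Tesseract,
        "apple-vision" => {
            if cfg!(all(feature = "apple-vision", target_os = "macos")) {
                EngineChoice::AppleVision
            } else {
                EngineChoice::Disabled(
                    "apple-vision OCR needs macOS and the apple-vision feature".to_string(),
                )
            }
        }
        other => EngineChoice::Disabled(format!("unknown OCR engine {other:?}")),
    }
}
```

`cfg!` evaluates to a compile-time boolean, so this function compiles on every target. Code that actually constructs `AppleVisionOcr` would still need a `#[cfg(feature = "apple-vision")]` attribute, because that type only exists under the feature.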

Out of scope

Caption (p6-3).
Visual embedding (P+).
Layout-aware reading order (P+).
PaddleOCR / EasyOCR adapters.

Risks / notes

Tesseract results vary widely with image quality; document min_confidence and the default page-segmentation mode.
Apple Vision sidecar requires code signing for distribution; for v1 dev builds, accept an unsigned binary at ~/.local/bin/kebab-vision-ocr.
Downscaling large images can lose small text; expose config.ocr.max_pixels so power users can tune the cap.
