feat(kebab-parse-image): P6-2 OCR adapter — Ollama-vision default #33

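`OcrCfg.endpoint: String` uses the empty string `""` as a signal meaning "fall back to `models.llm.endpoint`". That makes it a string-typed boolean that can't distinguish "the endpoint really is empty" from "the endpoint was not specified". Changing it to `Option<String>` encodes the intent in the type: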

```rust
#[serde(default)]
pub endpoint: Option<String>,
```

On the TOML side, omitting the `endpoint` key gives `None` and setting it gives `Some(...)`. The fallback logic in `OllamaVisionOcr::new` also becomes more honest: `match cfg.image.ocr.endpoint.as_deref() { Some(s) if !s.is_empty() => s, _ => &cfg.models.llm.endpoint }`.

This is minor, but P6-3's `caption.endpoint` will probably go the same way, so setting the convention on the first case is cheap.
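Here is a minimal std-only sketch of how the `Option<String>` field and the fallback match fit together. It also includes the empty-env-value mapping that the follow-up commit adds. `OcrCfg` and `LlmCfg` are hypothetical trimmed-down stand-ins with no serde derives. `endpoint_from_env` and `resolve_ocr_endpoint` are illustrative names, and the URLs are placeholders.

```rust
/// Hypothetical stand-in for the real config struct in kebab-config.
struct OcrCfg {
    /// `None` means "not specified: fall back to models.llm.endpoint".
    endpoint: Option<String>,
}

/// Hypothetical stand-in for `models.llm`.
struct LlmCfg {
    endpoint: String,
}

/// Map a raw env value to the Option form. An empty string
/// (e.g. KEBAB_IMAGE_OCR_ENDPOINT="") is treated the same as "unset".
fn endpoint_from_env(raw: Option<String>) -> Option<String> {
    raw.filter(|s| !s.is_empty())
}

/// Resolve the endpoint the OCR adapter should call, using the match
/// suggested above. The empty-string guard is kept as a second line of defense.
fn resolve_ocr_endpoint<'a>(ocr: &'a OcrCfg, llm: &'a LlmCfg) -> &'a str {
    match ocr.endpoint.as_deref() {
        Some(s) if !s.is_empty() => s,
        _ => &llm.endpoint,
    }
}
```

The function's `&str` return type lets the `&llm.endpoint` arm coerce from `&String`, so no explicit `.as_str()` is needed.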


crates/kebab-parse-image/Cargo.toml

						
```
@@ -24,0 +28,4 @@
# rustls-tls) so both crates share the same TLS backend and the
# transitive tokio runtime is brought in once.
reqwest      = { version = "0.12", default-features = false, features = ["blocking", "json", "rustls-tls"] }
base64       = "0.22"
```

claude-reviewer-01 commented

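(Praise + small suggestion) Deliberately matching the feature flags and versions of `reqwest` / `base64` / `wiremock` / `tokio` / `ab_glyph` with `kebab-llm-local` is good. The TLS backend and runtime are built consistently, and only once, across the workspace.

The same versions are now declared in two crates, so I recommend a follow-up that hoists them into `[workspace.dependencies]` in the workspace `Cargo.toml`. That is out of scope for this PR. However, the P6-3 caption adapter will probably use the same reqwest / base64, and waiting until then would spread the declarations across three crates.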


						
```
@@ -0,0 +300,4 @@
/// Bypasses encode work when the source already fits — we simply pass
/// the bytes through. PNG re-encode is only paid when downscaling is
/// actually needed.
fn downscale_to_long_edge(bytes: &[u8], max_long_edge: u32) -> Result<(Vec<u8>, u32, u32)> {
```

claude-reviewer-01 commented

When downscaling is needed, `downscale_to_long_edge` reads the same bytes twice. A first `ImageReader` determines the dimensions and is consumed, then a second `ImageReader` is built to call `decode()`. For a large image (4000×3000), the base64 + decode cost lands in memory twice.

You can finish with a single decode instead. Get a `DynamicImage` from one `ImageReader::new(...).with_guessed_format()?.decode()` call, take (w, h) from it with `.dimensions()`, and call `.resize_exact()` on the same object:

```rust
use std::io::Cursor;

use anyhow::Context;
use image::ImageReader; // image >= 0.25; older releases call this image::io::Reader

let img = ImageReader::new(Cursor::new(bytes))
    .with_guessed_format()
    .context("reading image header")?
    .decode()
    .context("decoding image")?;
let (w, h) = (img.width(), img.height());
let long = w.max(h);
if long <= max_long_edge {
    // Cheap-check whether it is already PNG, then pass through or re-encode
    ...
}
let (new_w, new_h) = ...;
let resized = img.resize_exact(new_w, new_h, ...);
```

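Side effect: this loses the PNG passthrough shortcut. Today, a PNG whose size is already within the cap ends with `bytes.to_vec()`. If passthrough is the hot path, a compromise is to sniff only the format first (call just `with_guessed_format`, without inspecting dimensions) and branch from there. Either approach is fine as long as the "same bytes decoded twice" problem goes away.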
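The header-sniff compromise can also be done without the `image` crate by reading the width and height straight out of the PNG header. A PNG starts with a fixed 8-byte signature, followed by the IHDR chunk, whose big-endian width and height sit at byte offsets 16 and 20. The helpers below are a hypothetical std-only sketch, not the PR's code, that decides on passthrough with zero decodes. Anything that is not a PNG, or is too large, would fall through to the single-decode path.

```rust
/// Every PNG file starts with this fixed 8-byte signature.
const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', b'\r', b'\n', 0x1A, b'\n'];

/// Hypothetical helper: read (width, height) from the IHDR chunk without
/// decoding any pixel data. Returns None if this isn't a well-formed PNG header.
fn png_header_dims(bytes: &[u8]) -> Option<(u32, u32)> {
    // Layout: signature (0..8), IHDR length (8..12), chunk type "IHDR" (12..16),
    // width (16..20), height (20..24), all big-endian.
    if bytes.len() < 24 || bytes[..8] != PNG_SIGNATURE[..] || bytes[12..16] != b"IHDR"[..] {
        return None;
    }
    let w = u32::from_be_bytes([bytes[16], bytes[17], bytes[18], bytes[19]]);
    let h = u32::from_be_bytes([bytes[20], bytes[21], bytes[22], bytes[23]]);
    Some((w, h))
}

/// Passthrough only when the source is a PNG that already fits the cap.
fn can_passthrough(bytes: &[u8], max_long_edge: u32) -> bool {
    match png_header_dims(bytes) {
        Some((w, h)) => w.max(h) <= max_long_edge,
        None => false, // not a PNG (or truncated): take the decode path
    }
}
```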


crates/kebab-parse-image/src/ocr.rs

						
```
@@ -0,0 +361,4 @@
struct OllamaGenerateRequest<'a> {
    model: &'a str,
    prompt: &'a str,
    images: Vec<&'a str>,
```

claude-reviewer-01 commented

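`images: Vec<&'a str>` always has length 1, and the wire format is fixed to a single-element array, `"images": [<one>]`. Even so, every call pays for a `Vec` allocation through the `vec![&b64]` macro. Pinning the field as `[&'a str; 1]` produces the same JSON with no allocation: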

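```rust
use serde::Serialize;

#[derive(Serialize)]
struct OllamaGenerateRequest<'a> {
    model: &'a str,
    prompt: &'a str,
    images: [&'a str; 1], // serializes as a one-element JSON array, no heap allocation
    stream: bool,
    options: OllamaOptions, // defined elsewhere in ocr.rs
}
// At the call site: images: [b64.as_str()],
```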

If multi-image batching is needed later, switch back to `Vec` at that point. The trait signature takes a single image, so batching would be a separate RFC anyway.


						
```
@@ -0,0 +379,4 @@
    #[serde(default)]
    error: Option<String>,
    #[serde(flatten)]
    _other: std::collections::HashMap<String, Value>,
```

claude-reviewer-01 commented

`_other: HashMap<String, Value>` with `#[serde(flatten)]` is a dead field that no caller ever reads. All the other fields of `OllamaGenerateResponse` are `#[serde(default)]`, and serde silently drops unknown keys by default, so a capture-all field isn't needed. Remove it entirely to make the wire type lighter:

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct OllamaGenerateResponse {
    #[serde(default)]
    response: Option<String>,
    #[serde(default)]
    error: Option<String>,
}
```

This also lets you drop the `serde_json::Value` import.


crates/kebab-parse-image/tests/common/mod.rs

						
```
@@ -46,0 +64,4 @@
/// `/usr/share/fonts/truetype/dejavu/`) — common across most Linux dev
/// boxes. Falls back to a tiny built-in glyph map if the font is
/// missing so the helper compiles even without DejaVu installed.
pub fn hello_world_png() -> Vec<u8> {
```

claude-reviewer-01 commented

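`hello_world_png` unconditionally reads `/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf`. When the font is missing, it panics with "DejaVu Sans Bold required for OCR integration fixture". In practice, though, only one caller uses this function: an `#[ignore]`d integration test. The regular hermetic tests never call it.

The panic message reads as if DejaVu were required across the board, which is misleading. I recommend one of two options:

1. (Preferred) Make the function return `Result<Vec<u8>, _>` and have the integration test skip gracefully when the font is missing: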

```rust
use anyhow::Context;

pub fn hello_world_png() -> anyhow::Result<Vec<u8>> {
    let font_bytes = std::fs::read("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf")
        .context("DejaVu Sans Bold required only for the OCR integration fixture")?;
    ...
}
```

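2. Reword the panic message to "DejaVu Sans Bold is required only for the OCR integration fixture" so that callers of the function aren't confused.

The integration test has been verified to work on the current dev box, so option 1 is the safer choice. It keeps a missing font on another dev box from going straight to a panic.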
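Here is a minimal sketch of option 1's control flow. It uses a `String` error instead of `anyhow` so it stays std-only. `load_fixture_font` and `run_ocr_integration` are hypothetical names. The point is that a missing font becomes an early return with a message rather than a panic.

```rust
use std::path::Path;

/// Hypothetical stand-in for the font-loading step of `hello_world_png`:
/// returns an error instead of panicking when the font file is absent.
fn load_fixture_font(path: &Path) -> Result<Vec<u8>, String> {
    std::fs::read(path).map_err(|e| {
        format!(
            "DejaVu Sans Bold is required only for the OCR integration fixture ({}): {}",
            path.display(),
            e
        )
    })
}

/// Sketch of the integration test body: skip (return early) when the font
/// is missing. Returns true only if the test would actually have run.
fn run_ocr_integration(font_path: &Path) -> bool {
    let _font = match load_fixture_font(font_path) {
        Ok(bytes) => bytes,
        Err(msg) => {
            eprintln!("skipping OCR integration test: {}", msg);
            return false;
        }
    };
    // ...render the fixture with `_font` and call the real OCR engine here...
    true
}
```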


crates/kebab-parse-image/tests/ocr.rs

						
```
@@ -0,0 +320,4 @@
// ── from_parts construction ──────────────────────────────────────────────
#[test]
fn from_parts_clamps_max_pixels_into_legal_range() {
```

claude-reviewer-01 commented

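The `from_parts_clamps_max_pixels_into_legal_range` test only checks that nothing panicked, and its comment even admits it "can't read the private field directly". There are two ways to clean this up:

1. (Preferred) Add an inspector accessor such as `pub fn max_pixels(&self) -> u32` to `OllamaVisionOcr`, and assert that the actual clamp result equals `MIN_LONG_EDGE` / `MAX_LONG_EDGE`.
2. Remove the test entirely. `ocr_downscales_large_image_before_sending` already verifies the clamp indirectly with max_pixels=1024, and this extra test gives almost no regression signal.

Either way, please clean up the current "only checks that it didn't panic" state before merging.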
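Here is a sketch of option 1. It assumes the legal range is [256, 4096], inferred from the 256 / 4096 / 1024 values that a later commit in this PR tests; the real `MIN_LONG_EDGE` / `MAX_LONG_EDGE` live in `src/ocr.rs`. The struct is a hypothetical stand-in that keeps only the clamped field. The accessor lets a test under `tests/` observe the clamp result, instead of only observing that no panic occurred.

```rust
/// Assumed bounds (see lead-in); the real constants live in src/ocr.rs.
const MIN_LONG_EDGE: u32 = 256;
const MAX_LONG_EDGE: u32 = 4096;

/// Hypothetical, trimmed-down engine holding only the clamped cap.
struct OllamaVisionOcr {
    // Private to the module, so integration tests under tests/ can't read it...
    max_pixels: u32,
}

impl OllamaVisionOcr {
    fn from_parts(max_pixels: u32) -> Self {
        OllamaVisionOcr {
            max_pixels: max_pixels.clamp(MIN_LONG_EDGE, MAX_LONG_EDGE),
        }
    }

    /// ...so expose a read-only inspector that a test can assert on.
    pub fn max_pixels(&self) -> u32 {
        self.max_pixels
    }
}
```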


altair823 added 1 commit 2026-05-02 05:45:28 +00:00

review(p6-2): address round-1 review comments e869710d82

- crates/kebab-config/src/lib.rs:
  • Replaced `OcrCfg.endpoint: String` (`""` sentinel) with `Option<String>`,
    using `#[serde(default)]`. Added a branch that also maps
    `KEBAB_IMAGE_OCR_ENDPOINT=""` (empty value) to None.
  • New regression test `image_ocr_endpoint_empty_env_value_is_none`.
- crates/kebab-parse-image/src/ocr.rs:
  • Reworked the endpoint fallback logic in `OllamaVisionOcr::new` for the
    new `Option<String>` schema (`as_deref` + match).
  • Removed the dead `_other: HashMap<String, Value>` field from
    `OllamaGenerateResponse`, along with the `serde_json::Value` import.
  • `OllamaGenerateRequest.images: Vec<&'a str>` → `[&'a str; 1]`
    (removes the per-call vec! allocation; multi-image is out of scope
    because the OcrEngine trait takes a single image).
  • Refactored `downscale_to_long_edge` to decode once. The PNG
    passthrough hot path is preserved (it branches on a header sniff
    alone), and every other path is unified into one decode + (if
    needed) resize + one PNG re-encode.
  • Added a `pub fn max_pixels(&self) -> u32` accessor, a plain
    inspector for verifying the clamp result.
- crates/kebab-parse-image/tests/ocr.rs:
  • `cfg_for_endpoint` and the integration test updated to the
    `Some(endpoint)` form.
  • `from_parts_clamps_max_pixels_into_legal_range` strengthened to
    verify the actual clamp results (256 / 4096 / 1024) through the new
    accessor.
  • The integration test now skips instead of panicking when the font
    is missing.
- crates/kebab-parse-image/tests/common/mod.rs:
  • `hello_world_png` now returns `anyhow::Result<Vec<u8>>`. The
    expect("DejaVu Sans Bold required") message was reworded to "only the
    opt-in OCR integration fixture needs this font" to make the intent
    clear.

cargo test -p kebab-parse-image — 28 pass + 1 ignored.
cargo test -p kebab-config — 21 pass (+1 regression test).
cargo clippy --workspace --all-targets -- -D warnings — pass.

The reviewer-suggested workspace.dependencies consolidation (reqwest /
base64) is left as a follow-up to be handled together with P6-3 and is
excluded from this PR's scope (as the round-1 review itself noted).

claude-reviewer-01 requested changes 2026-05-02 05:47:09 +00:00

claude-reviewer-01 left a comment

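Round 2: all round-1 comments were addressed accurately:

- `OcrCfg.endpoint` replaced with `Option<String>`
- the `_other` HashMap removed
- single-decode downscale, with the PNG passthrough preserved
- `images: [&'a str; 1]`
- a `max_pixels` accessor plus a clamp regression test
- `hello_world_png` returning `Result`

It's also good that regression tests for pre-P6 TOML compatibility and for the empty-env-value = None branch were added.

Four actionable items remain:

- `from_parts` has no empty-string guard for endpoint/model, so its invariants diverge from `new()`.
- The `max_pixels` clamp is silent, which makes "why is it always 4096?" hard to debug. I recommend `tracing::warn`.
- Downscale rounding can overshoot on the long axis by 1px, which contradicts the "at most" in the doc-comment.
- The integration test is gated twice (`#[ignore]` + env var). Pick one.

Praise: the single-decode refactor of `downscale_to_long_edge` keeps the PNG passthrough hot path, and the doc-comment now states its intent explicitly ("PNG sources that already fit the cap are passthrough (zero decodes)"). The P6-3 caption adapter can reuse this pattern as-is when it applies the same downscale to keep prompt cost down.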


crates/kebab-parse-image/src/ocr.rs

						
```
@@ -0,0 +147,4 @@
        if model.is_empty() {
            anyhow::bail!("OllamaVisionOcr: image.ocr.model is empty");
        }
        let max_pixels = ocr.max_pixels.clamp(MIN_LONG_EDGE, MAX_LONG_EDGE);
```

claude-reviewer-01 commented

`max_pixels.clamp(MIN_LONG_EDGE, MAX_LONG_EDGE)` is silent. If a user sets `image.ocr.max_pixels = 99999`, the value quietly drops to 4096, which leads to debugging sessions like "why does my OCR always get 4096 px?". Emit a one-line warning only when the clamp actually kicks in:

```rust
let requested = ocr.max_pixels;
let max_pixels = requested.clamp(MIN_LONG_EDGE, MAX_LONG_EDGE);
if max_pixels != requested {
    tracing::warn!(
        target: "kebab-parse-image",
        "image.ocr.max_pixels = {requested} clamped to {max_pixels} (legal range [{MIN_LONG_EDGE}, {MAX_LONG_EDGE}])"
    );
}
```

Apply the same pattern to `from_parts`. `tracing` is already a dependency, so the cost is negligible.
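Here is a std-only sketch of the "warn only when the clamp fires" rule. To keep it testable without a tracing subscriber, `clamp_max_pixels` (a hypothetical name) returns the message that the real code would pass to `tracing::warn!`. The [256, 4096] bounds are again an assumption.

```rust
/// Assumed legal range; the real constants live in src/ocr.rs.
const MIN_LONG_EDGE: u32 = 256;
const MAX_LONG_EDGE: u32 = 4096;

/// Clamp `requested` into the legal range. Returns a warning message only
/// when the clamp actually changed the value; in-range input stays silent.
fn clamp_max_pixels(requested: u32) -> (u32, Option<String>) {
    let clamped = requested.clamp(MIN_LONG_EDGE, MAX_LONG_EDGE);
    let warning = if clamped != requested {
        Some(format!(
            "image.ocr.max_pixels = {} clamped to {} (legal range [{}, {}])",
            requested, clamped, MIN_LONG_EDGE, MAX_LONG_EDGE
        ))
    } else {
        None
    };
    (clamped, warning)
}
```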


crates/kebab-parse-image/src/ocr.rs

						
```
@@ -0,0 +163,4 @@
    /// Build directly from explicit fields. Useful for tests that need
    /// to point at a wiremock host without going through `Config`.
    pub fn from_parts(
```

claude-reviewer-01 commented

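`from_parts` has no empty-string guard for endpoint / model. `new()` rejects both an empty endpoint and an empty model with `bail!`, but `from_parts` accepts any input. When two constructors enforce different invariants, it's easy to end up with a rash of "passes in tests, panics in production" bugs.

Either add the same guards, or state in the `from_parts` doc-comment that it is a "test seam: the caller is responsible for validating input". Both fixes are short, but the first one is safer.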
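Here is a minimal sketch of the first option, where both constructors funnel through one private validation function. This is the shape the round-2 commit later adopts as `fn build`. The struct keeps only two fields, errors are plain `String`s instead of `anyhow`, and the endpoint error text is hypothetical. The model message mirrors the `bail!` text in the hunk above.

```rust
/// Hypothetical, trimmed-down engine; the real struct also carries an HTTP
/// client, the language list, and max_pixels.
#[derive(Debug)]
struct OllamaVisionOcr {
    endpoint: String,
    model: String,
}

impl OllamaVisionOcr {
    /// Stand-in for `new(&Config)`: the real one resolves the endpoint
    /// fallback from config first, then validates via `build`.
    fn new(resolved_endpoint: &str, model: &str) -> Result<Self, String> {
        Self::build(resolved_endpoint, model)
    }

    /// Test seam: explicit fields, but exactly the same validation.
    fn from_parts(endpoint: &str, model: &str) -> Result<Self, String> {
        Self::build(endpoint, model)
    }

    /// The single place where input invariants are enforced.
    fn build(endpoint: &str, model: &str) -> Result<Self, String> {
        if endpoint.is_empty() {
            return Err("OllamaVisionOcr: endpoint is empty".to_string());
        }
        if model.trim().is_empty() {
            return Err("OllamaVisionOcr: image.ocr.model is empty".to_string());
        }
        Ok(OllamaVisionOcr {
            endpoint: endpoint.to_string(),
            model: model.to_string(),
        })
    }
}
```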


						
```
@@ -0,0 +340,4 @@
    } else {
        let scale = max_long_edge as f32 / long as f32;
        let new_w = ((w as f32) * scale).round().max(1.0) as u32;
        let new_h = ((h as f32) * scale).round().max(1.0) as u32;
```

claude-reviewer-01 commented

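In the downscale, `(w as f32 * scale).round()` rounds the two axes independently. That leaves a corner case where the long edge can exceed `max_long_edge` by 1px. When `scale = max_long_edge / long` is not exactly representable in `f32` (e.g. max=1601, long=4001), nothing in the code prevents the long axis from also rounding up by 1 px. On the wire a 1px difference is meaningless. However, the doc-comment flatly states "long edge is at most max_long_edge", so the actual behavior drifts slightly from the documentation.

A simple trailing clamp guarantees the invariant exactly: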

```rust
let new_w = ((w as f32) * scale).round().max(1.0) as u32;
let new_h = ((h as f32) * scale).round().max(1.0) as u32;
let (new_w, new_h) = if w >= h {
    (new_w.min(max_long_edge), new_h)
} else {
    (new_w, new_h.min(max_long_edge))
};
```

Alternatively, soften the doc-comment to "approximately at most". Either is fine.
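Here is a self-contained sketch of the dimension math with the trailing clamp, using the same `f32` formula as the snippet above. It clamps both axes instead of only the long one. That also covers square images (w == h), where the snippet's `w >= h` branch would in principle leave an overshooting `new_h` unclamped. `downscaled_dims` is a hypothetical helper name; the real function also does the decode, resize, and PNG re-encode.

```rust
/// Compute downscaled (width, height) so the long edge never exceeds
/// `max_long_edge`, whatever the f32 rounding does.
fn downscaled_dims(w: u32, h: u32, max_long_edge: u32) -> (u32, u32) {
    let long = w.max(h);
    if long <= max_long_edge {
        return (w, h); // already fits: no resize needed
    }
    let scale = max_long_edge as f32 / long as f32;
    let new_w = ((w as f32) * scale).round().max(1.0) as u32;
    let new_h = ((h as f32) * scale).round().max(1.0) as u32;
    // Clamping the short axis is a no-op in practice, and clamping the long
    // axis makes "at most max_long_edge" true by construction.
    (new_w.min(max_long_edge), new_h.min(max_long_edge))
}
```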


crates/kebab-parse-image/tests/ocr.rs

						
```
@@ -0,0 +359,4 @@
#[tokio::test]
#[ignore = "hits a real Ollama daemon; gated behind KEBAB_OCR_INTEGRATION=1"]
async fn ocr_integration_real_ollama_transcribes_text() {
    if std::env::var("KEBAB_OCR_INTEGRATION").ok().as_deref() != Some("1") {
```

claude-reviewer-01 commented

2026-05-02 05:49:37 +00:00

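The integration test has two layers of gating: `#[ignore]`, which makes cargo skip it by default, plus an early return inside the function when `KEBAB_OCR_INTEGRATION != "1"`. `#[ignore]` alone already makes a plain `cargo test` skip the test. Running with `--ignored` means the user has explicitly expressed intent, so the extra env-var check looks like redundant machinery.

I recommend settling on one:

1. (Preferred) Remove the env-var check. `--ignored` becomes the single signal of "intent to run".
2. Remove `#[ignore]` and gate on the env var alone. `cargo test` then skips the test automatically through the env check at run time.

Both behave the same, but option 1 is consistent with the `#[ignore]` pattern used by the other integration tests in the workspace (`kebab-llm-local`).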


altair823 added 1 commit 2026-05-02 05:48:26 +00:00

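review(p6-2): address round-2 review comments 2bede0030f

- src/ocr.rs:
  • Consolidated input validation for `OllamaVisionOcr::new` and
    `from_parts` into a shared `fn build`. Both constructors now share
    the same invariants (empty endpoint / empty model / `max_pixels`
    clamp), which closes off the "tests pass but production panics"
    split.
  • When the `max_pixels` clamp actually fires, the reason is logged via
    `tracing::warn!` (so users can debug "why always 4096?").
  • A trailing clamp now strictly bounds the corner case where the long
    axis in `downscale_to_long_edge` overshoots by 1px due to `f32`
    rounding (e.g. max=1601, long=4001). The doc-comment's "long edge is
    at most max_long_edge" now matches actual behavior exactly.
- tests/ocr.rs:
  • Removed the integration test's double gate (`#[ignore]` +
    `KEBAB_OCR_INTEGRATION=1`). `--ignored` alone is now the single
    signal of intent to run, consistent with the `kebab-llm-local`
    integration test convention. The env overrides for endpoint /
    model are kept.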

cargo test -p kebab-parse-image — 28 pass + 1 ignored.
cargo test -p kebab-config — 21 pass.
cargo clippy --workspace --all-targets -- -D warnings — pass.

claude-reviewer-01 requested changes 2026-05-02 05:49:37 +00:00

claude-reviewer-01 left a comment

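Round 3: all round-2 comments were addressed accurately (shared validation in `fn build`, clamp reason logged via `tracing::warn!`, trailing 1px clamp on the downscale long axis, single integration-test gate). I see nothing further to fix in terms of behavior, safety, or consistency.

Two actionable items remain (cosmetic / strengthening the regression signal):

- The guards added to `fn build` in round 2 (`endpoint.is_empty()`, `model.trim().is_empty()`, the `max_pixels` clamp) aren't covered by regression tests. A future accidental change to the bail messages or the clamp range could still merge green. I recommend adding three simple unit tests.
- The module-level doc in `lib.rs` introduces the OCR module in a single line. The trust policy ("branch trust on OcrText.engine") is central to the workspace, so it would help to surface it in one line, starting from lib.rs.

Praise: unifying `new` / `from_parts` through `fn build` was an honest fix for the "two constructors with diverging invariants" code smell. When Tesseract / Apple Vision adapters join this crate later, the same pattern (`pub fn new(cfg)` + `pub fn from_parts(...)` + `fn build(...)`) will keep input validation in one place.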


crates/kebab-parse-image/src/lib.rs

						
```
@@ -9,3 +12,3 @@
//!
//! Per design §3.4 (Block::ImageRef + ImageRefBlock), §3.7a (OcrText /
//! ModelCaption stubs), §9.1 (image extraction policy), §9 (versioning).
//! ModelCaption stubs), §9.1 (image extraction policy / OCR vs caption
```

claude-reviewer-01 commented

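(Small suggestion) The module-level doc-comment in `lib.rs` introduces the OCR adapter with a single short line: "P6-2 adds the [`ocr`] module". The trust policy for `OcrText` (observed text vs model-generated text) is a core classification in this workspace. Surfacing it in one line in `lib.rs` lets library users grasp the intent quickly without digging into the `ocr` module docs: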

```text
//! P6-2 adds the [`ocr`] module: an [`OcrEngine`] trait and an
//! [`OllamaVisionOcr`] default adapter that talks to a vision-capable
//! Ollama model. [`apply_ocr`] is the helper that mutates an
//! [`ImageRefBlock`] in place. Trust note: the LLM-driven default can
//! hallucinate — `OcrText.engine` carries the source identity so
//! consumers can branch trust by engine.
```

It's a minor doc change, but it gives the P6-3 caption module a cue for keeping the same tone when it joins the `lib.rs` doc.
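To make "consumers can branch trust by engine" concrete, here is a hypothetical consumer-side sketch. Only `"ollama-vision"` comes from this PR: it is the engine name asserted in a test suggested later in this review. `"tesseract"` / `"apple-vision"` are placeholder names for adapters that don't exist yet. The two-level `OcrTrust` enum is an illustrative policy, not part of the crate.

```rust
/// Hypothetical trust classes a consumer might assign to OCR output.
#[derive(Debug, PartialEq)]
enum OcrTrust {
    /// Produced by an LLM: may hallucinate text that isn't in the image.
    ModelGenerated,
    /// Produced by a classic recognizer that only reports what it observed.
    Observed,
}

/// Branch trust on the engine identity carried by `OcrText.engine`.
fn trust_for_engine(engine: &str) -> OcrTrust {
    match engine {
        // Placeholder names for future non-LLM adapters.
        "tesseract" | "apple-vision" => OcrTrust::Observed,
        // LLM engines such as "ollama-vision", and anything unknown, get the
        // cautious class by default.
        _ => OcrTrust::ModelGenerated,
    }
}
```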


2026-05-02 05:49:37 +00:00

						
```
@@ -0,0 +192,4 @@
            languages,
            max_pixels,
        })
    }
```

claude-reviewer-01 commented

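The `fn build` guards added in round 2 (`endpoint.is_empty()` → bail, `model.trim().is_empty()` → bail, `max_pixels` clamp + warn) aren't covered by any unit or integration test. Without a regression signal, someone could casually change a `bail!` message or the clamp range in the future and still merge on green tests.

I recommend adding three simple regression tests: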

```rust
#[test]
fn build_rejects_empty_endpoint() {
    let r = OllamaVisionOcr::from_parts("", "m", vec![], 1024);
    let err = r.unwrap_err().to_string();
    assert!(err.contains("endpoint is empty"), "{err}");
}

#[test]
fn build_rejects_empty_model() {
    let r = OllamaVisionOcr::from_parts("http://x", "   ", vec![], 1024);
    let err = r.unwrap_err().to_string();
    assert!(err.contains("model is empty"), "{err}");
}

#[test]
fn new_falls_back_to_llm_endpoint_when_ocr_endpoint_is_none() {
    let mut cfg = kebab_config::Config::defaults();
    cfg.image.ocr.endpoint = None;
    cfg.models.llm.endpoint = "http://llm.example".to_string();
    // Construction must not panic; we don't expose endpoint(), but
    // engine_name() is observable.
    let engine = OllamaVisionOcr::new(&cfg).unwrap();
    assert_eq!(engine.engine_name(), "ollama-vision");
}
```

The first two belong in the `#[cfg(test)] mod tests` of `src/ocr.rs`. The third can go either there or in the integration tests (`tests/ocr.rs`).


altair823 added 1 commit 2026-05-02 05:51:03 +00:00

review(p6-2): round-3 cosmetics: build() regression tests + lib doc trust note 1539367692

- src/ocr.rs:
  • Added `#[derive(Debug)]` to `OllamaVisionOcr` to satisfy the
    expect_err bound in the tests (reqwest::blocking::Client also
    implements Debug).
  • Three new unit tests (`build_rejects_empty_endpoint`,
    `build_rejects_empty_model_after_trim`,
    `build_clamps_max_pixels_outside_legal_range`) provide the
    regression signal for the `fn build` guards added in round 2.
- src/lib.rs:
  • Added a one-line OCR trust policy to the module-level doc-comment
    ("LLM-driven default can hallucinate; OcrText.engine carries
    source identity"), so library users can pick up the intent without
    reading down into the ocr module docs.

cargo test -p kebab-parse-image — 31 pass + 1 ignored
  (11 unit + 12 P6-1 integration + 8 P6-2 integration).
cargo clippy -p kebab-parse-image --all-targets -- -D warnings — pass.

claude-reviewer-01 approved these changes 2026-05-02 05:51:40 +00:00

claude-reviewer-01 left a comment

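Round 4: both round-3 items were addressed accurately (three regression tests for the build guards, and the trust note in the lib doc-comment). I see no further actionable comments.

Summary of all rounds:

- Round 1: `OcrCfg.endpoint` made an Option, `_other` cleaned up, `images: [_; 1]`, single-decode downscale, `from_parts` clamp verification, `hello_world_png` returning Result (6 items).
- Round 2: shared guards in `fn build`, clamp reason via `tracing::warn!`, trailing 1px clamp on the downscale long axis, single integration-test gate (4 items).
- Round 3: three regression tests for the build guards, one-line trust note in the lib doc (2 cosmetic items).
- Round 4: APPROVE.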

Ready to merge. Please merge it from the Gitea UI:

#33 (http://gitea.altair823.xyz/altair823-org/kebab/pulls/33)

I recommend checking "Delete branch after merge". After merging, clean up the local branch with `git fetch -p` + `git branch -d feat/p6-2-ocr-adapter`.

Next task: the P6-3 caption adapter. The caption side will very likely reuse the pattern that `OllamaVisionOcr` uses in this PR (HTTP + base64 + downscale). The follow-up to consolidate reqwest / base64 into workspace.dependencies would also fit well around the start of P6-3.

crates/kebab-parse-image/src/lib.rs

						
```
@@ -9,0 +11,4 @@
//! [`ImageRefBlock`] in place. Trust note — the LLM-driven default
//! can hallucinate; `OcrText.engine` carries the source identity so
//! consumers can branch trust by engine (Tesseract / Apple Vision
//! adapters, when added, will write a different `engine` string).
```

claude-reviewer-01 commented

2026-05-02 05:51:40 +00:00

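(Praise) The trust note is now surfaced in the lib doc-comment as well. With `cargo doc -p kebab-parse-image --open`, users see the trust classification of OCR output immediately, without descending into the module tree. This is the first documentation cue for carrying the workspace's "observed vs generated" classification consistently through P6-3 captions and future PDF OCR.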
