cut PR v0.18.0 전 마지막 정리. 사용자 요청: "전체 코드베이스를 깔끔하고 알아보기 쉽게".
## Workspace lints
- `Cargo.toml` 의 `[workspace.lints.clippy]` 에 `pedantic = "warn"` (priority -1) + 의도적 allow-list 추가:
- cast_possible_truncation / cast_possible_wrap / cast_sign_loss / cast_precision_loss — ONNX i64 / hash modular reduction 등 의도적 truncation.
- doc_markdown / missing_errors_doc / missing_panics_doc — cosmetic doc style.
- too_many_lines / module_name_repetitions / must_use_candidate / needless_pass_by_value / manual_let_else / items_after_statements / similar_names — informational only.
- format_collect / match_wildcard_for_single_variants / trivially_copy_pass_by_ref / unnecessary_wraps — intentional patterns (exhaustive match, future Result variants 등).
- default_trait_access — `Foo::default()` 가 idiomatic.
- float_cmp — NLI / RRF score 의 explicit threshold 비교 의도.
- struct_excessive_bools / case_sensitive_file_extension_comparisons / naive_bytecount / ignore_without_reason — domain-specific 의도.
- format_push_string / return_self_not_must_use / match_same_arms — builder / wire-label / hot-path 패턴 보존.
- needless_continue / used_underscore_binding / nonminimal_bool / unreadable_literal / many_single_char_names / doc_link_with_quotes / assigning_clones / collapsible_str_replace / trivial_regex / elidable_lifetime_names / range_plus_one / explicit_iter_loop / implicit_hasher / ref_option — remaining low-value style.
- 각 24 crate `Cargo.toml` 에 `[lints] workspace = true` 추가.
## Auto-fix
`cargo clippy --workspace --all-targets --fix` 적용 — 128 files changed, 552 insertions / 472 deletions. 주로:
- uninlined_format_args (~18): `format!("{}", x)` → `format!("{x}")`.
- redundant_closure_for_method_calls (~33): `.map(|x| x.foo())` → `.map(T::foo)`.
- 그 외 mechanical refactor.
## 검증
- `cargo clippy --workspace --all-targets -j 1 -- -D warnings` clean (pedantic + 모든 lint group).
- `cargo test --workspace --no-fail-fast -j 1` — **1293 tests pass + 1 pre-existing flaky fail** (`kebab-mcp::tools_call_ask_multi_hop::ask_tool_routes_multi_hop_true_to_decompose_first`, HOTFIX candidate, cleanup 무관). 회귀 0.
Wire 영향: 없음.
Behavior 영향: 없음 (mechanical refactor only).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
188 lines
7.2 KiB
Rust
188 lines
7.2 KiB
Rust
//! Shared image preparation for any image-to-LM pipeline.
|
||
//!
|
||
//! P6-2 OCR and P6-3 caption both need the same pre-LM step: clamp
|
||
//! the long edge to a configured max, re-encode as PNG (the wire
|
||
//! format vision channels expect — Ollama's `images: [base64, ...]`
|
||
//! takes PNG/JPEG, but PNG keeps the alpha + lossless invariant we
|
||
//! prefer for hand-drawn / screenshot inputs), pass through the
|
||
//! source bytes when they already satisfy both constraints.
|
||
//! Centralising this here keeps the 1px-rounding fix, the PNG
|
||
//! passthrough hot path, and the error messages in one place —
|
||
//! future image-to-LM channels (PDF page thumbnails, video
|
||
//! keyframes, …) plug in without re-deriving the algorithm.
|
||
|
||
use std::io::Cursor;
|
||
|
||
use anyhow::{Context, Result};
|
||
use image::{ImageFormat, ImageReader};
|
||
|
||
/// Decode `bytes`, downscale so the long edge is at most `max_long_edge`,
|
||
/// and re-encode as PNG. Returns `(png_bytes, final_w, final_h)` so
|
||
/// callers that care about the final dimensions (e.g. OCR's
|
||
/// `SourceSpan::Region`) get them without re-decoding.
|
||
///
|
||
/// PNG sources that already fit the cap pass through (zero decodes,
|
||
/// just a `Vec` clone). Every other path decodes the image exactly
|
||
/// once: a cheap header sniff peeks at the format / dimensions before
|
||
/// committing to a decode, so non-PNG passthrough and downscale share
|
||
/// the same `decode → optionally resize → re-encode` tail.
|
||
pub(crate) fn downscale_to_png(
|
||
bytes: &[u8],
|
||
max_long_edge: u32,
|
||
) -> Result<(Vec<u8>, u32, u32)> {
|
||
let reader = ImageReader::new(Cursor::new(bytes))
|
||
.with_guessed_format()
|
||
.context("reading image header")?;
|
||
let format = reader.format();
|
||
let (w, h) = reader
|
||
.into_dimensions()
|
||
.context("reading image dimensions")?;
|
||
|
||
let long = w.max(h);
|
||
|
||
// Hot path — PNG within budget already matches the wire format we
|
||
// send to vision models, so we ship the bytes verbatim without
|
||
// paying for a decode + re-encode round-trip.
|
||
if long <= max_long_edge && format == Some(ImageFormat::Png) {
|
||
return Ok((bytes.to_vec(), w, h));
|
||
}
|
||
|
||
// Every remaining branch needs the pixels — either to re-encode as
|
||
// PNG (non-PNG within budget) or to resize first (over budget).
|
||
// One decode covers both.
|
||
let img = ImageReader::new(Cursor::new(bytes))
|
||
.with_guessed_format()
|
||
.context("re-reading image for decode")?
|
||
.decode()
|
||
.context("decoding image")?;
|
||
|
||
let (final_w, final_h, final_img) = if long <= max_long_edge {
|
||
(w, h, img)
|
||
} else {
|
||
let scale = max_long_edge as f32 / long as f32;
|
||
let mut new_w = ((w as f32) * scale).round().max(1.0) as u32;
|
||
let mut new_h = ((h as f32) * scale).round().max(1.0) as u32;
|
||
// Independent rounding of the two axes can let `f32`'s
|
||
// round-to-nearest push the long axis one pixel past
|
||
// `max_long_edge` for irrational scales (e.g. `max=1601,
|
||
// long=4001`). Pin the long axis to exactly `max_long_edge`
|
||
// so the doc-comment's "long edge is at most max_long_edge"
|
||
// stays a strict bound.
|
||
if w >= h {
|
||
new_w = new_w.min(max_long_edge);
|
||
} else {
|
||
new_h = new_h.min(max_long_edge);
|
||
}
|
||
let resized =
|
||
img.resize_exact(new_w, new_h, image::imageops::FilterType::Triangle);
|
||
(new_w, new_h, resized)
|
||
};
|
||
|
||
let mut out = Cursor::new(Vec::new());
|
||
final_img
|
||
.write_to(&mut out, ImageFormat::Png)
|
||
.context("encoding image as PNG")?;
|
||
Ok((out.into_inner(), final_w, final_h))
|
||
}
|
||
|
||
#[cfg(test)]
|
||
mod tests {
|
||
use super::*;
|
||
|
||
use std::io::Cursor;
|
||
|
||
use image::{ImageBuffer, Rgb};
|
||
|
||
/// Solid-colour PNG of the given dimensions. Solid colour
|
||
/// compresses aggressively so even 4001×3001 stays under a few
|
||
/// kilobytes.
|
||
fn solid_png(w: u32, h: u32) -> Vec<u8> {
|
||
let img: ImageBuffer<Rgb<u8>, _> =
|
||
ImageBuffer::from_pixel(w, h, Rgb([0, 0, 255]));
|
||
let mut buf = Cursor::new(Vec::new());
|
||
img.write_to(&mut buf, ImageFormat::Png)
|
||
.expect("encoding solid PNG must not fail");
|
||
buf.into_inner()
|
||
}
|
||
|
||
fn solid_jpeg(w: u32, h: u32) -> Vec<u8> {
|
||
let img: ImageBuffer<Rgb<u8>, _> =
|
||
ImageBuffer::from_pixel(w, h, Rgb([255, 255, 255]));
|
||
let mut buf = Cursor::new(Vec::new());
|
||
img.write_to(&mut buf, ImageFormat::Jpeg)
|
||
.expect("encoding solid JPEG must not fail");
|
||
buf.into_inner()
|
||
}
|
||
|
||
/// PNG within budget skips the decode + re-encode round-trip
|
||
/// entirely. Source bytes survive byte-for-byte.
|
||
#[test]
|
||
fn png_within_cap_passes_through_zero_decode() {
|
||
let bytes = solid_png(100, 50);
|
||
let (out, w, h) =
|
||
downscale_to_png(&bytes, 1024).expect("PNG passthrough must succeed");
|
||
assert_eq!((w, h), (100, 50));
|
||
assert_eq!(out, bytes, "PNG passthrough must return source bytes verbatim");
|
||
}
|
||
|
||
/// JPEG within budget gets re-encoded as PNG (the wire format)
|
||
/// while preserving dimensions.
|
||
#[test]
|
||
fn jpeg_within_cap_reencodes_as_png() {
|
||
let bytes = solid_jpeg(100, 50);
|
||
let (out, w, h) =
|
||
downscale_to_png(&bytes, 1024).expect("JPEG re-encode must succeed");
|
||
assert_eq!((w, h), (100, 50));
|
||
// Byte stream must now start with the PNG magic.
|
||
assert_eq!(
|
||
&out[..8],
|
||
&[0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A],
|
||
"output must be PNG-encoded after JPEG input"
|
||
);
|
||
}
|
||
|
||
/// Pathological irrational scale — `max=1601, long=4001` would let
|
||
/// independent f32 round-to-nearest push the long axis to 1602.
|
||
/// The post-resize clamp pins it back to `max_long_edge`.
|
||
#[test]
|
||
fn long_edge_clamped_strictly_to_max_for_irrational_scale() {
|
||
let bytes = solid_png(4001, 3001);
|
||
let (_out, w, h) =
|
||
downscale_to_png(&bytes, 1601).expect("downscale must succeed");
|
||
let long = w.max(h);
|
||
assert!(long <= 1601, "long edge must be ≤ max, got {long}");
|
||
}
|
||
|
||
/// Aspect ratio survives the downscale within 2%.
|
||
#[test]
|
||
fn aspect_ratio_preserved_within_rounding() {
|
||
let bytes = solid_png(4000, 3000);
|
||
let (_out, w, h) =
|
||
downscale_to_png(&bytes, 1024).expect("downscale must succeed");
|
||
let ratio = w as f32 / h as f32;
|
||
assert!(
|
||
(ratio - 4.0 / 3.0).abs() < 0.02,
|
||
"aspect drift: in=4/3 out={w}/{h}={ratio}"
|
||
);
|
||
}
|
||
|
||
/// Truncated PNG header — format guess succeeds (8-byte signature
|
||
/// intact) but `into_dimensions` fails. Surfaced as Err so
|
||
/// callers can route to "skip + warning" without confusing the
|
||
/// downstream pipeline with a zero-size image.
|
||
#[test]
|
||
fn corrupt_bytes_return_err() {
|
||
let truncated = vec![0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A];
|
||
let r = downscale_to_png(&truncated, 1024);
|
||
assert!(r.is_err(), "corrupt PNG must surface as Err");
|
||
}
|
||
|
||
/// Unrecognised bytes (not any image format) — header sniff fails
|
||
/// before dimension read.
|
||
#[test]
|
||
fn unrecognised_bytes_return_err() {
|
||
let r = downscale_to_png(b"definitely not an image", 1024);
|
||
assert!(r.is_err(), "non-image bytes must surface as Err");
|
||
}
|
||
}
|