style: cargo fmt --all (round 4 ingest log feature follow-up)

The final `fix(test): clippy + fmt fixes` commit from the Phase C4
executor applied fmt only to the test files. A workspace-wide fmt pass
was found missing, so `cargo fmt --all` is applied here: all imports
reordered alphabetically and line wrapping normalized.

Untracked artifacts committed together with this change:
- docs/superpowers/specs/2026-05-28-v0.20-ingest-log-spec.md (491 line, ACCEPT)
- docs/superpowers/plans/2026-05-28-v0.20-ingest-log-plan.md (616 line, ACCEPT)

workspace test: 1370 passed / 0 failed / 50 ignored, ingest_log_smoke green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 04:18:40 +00:00
parent 445b096215
commit 685007789a
235 changed files with 6520 additions and 3955 deletions

@@ -0,0 +1,616 @@
---
title: "v0.20.x ingest log feature — plan"
date: 2026-05-28
status: "DRAFT (round 0)"
phase: B4 (plan drafter)
target_spec: ../specs/2026-05-28-v0.20-ingest-log-spec.md
parent_task: ../../../tasks/p10/p10-1A-5-ingest-failure-log.md
plan_for_version: 0.20.x
target_branch: feat/pdf-scanned-ocr
step_count: 6
commit_count: 5
estimated_loc_delta: "+650 / -25"
---
# v0.20.x ingest log feature — plan
## §0 Overview
This plan decomposes the accepted spec (`docs/superpowers/specs/2026-05-28-v0.20-ingest-log-spec.md`, 491 lines) into 6 steps and 5 commits. The acceptance criteria AC-1 through AC-10 from spec §5 are mapped to a verifier at each step boundary.
**Key deliverables**:
1. A `[logging]` section in `kebab-config` (2 fields: `ingest_log_enabled` / `ingest_log_dir`).
2. A new module `kebab-app/src/ingest_log.rs` (`IngestLogWriter` + `LogEvent` enum, 5 kinds).
3. **4 additive fields** on `PdfOcrProgress::Finished` and `IngestEvent::PdfOcrFinished` (`image_byte_size` / `image_width` / `image_height` / `failure_reason`), cascading into wire schema `ingest_progress.v1` as an additive minor change.
4. Integration of the 5 emit hooks in `kebab-app` (init / flush / OCR / parse_error+skip / fatal error).
5. integration test `ingest_log_smoke.rs` (AC-9).
6. workspace test + clippy + dogfood smoke (AC-8).
**Effort estimate**: +650 LOC (300 new module, 250 hooks + config, 100 tests) / -25 LOC (PdfOcrProgress callsite refactor). No branch change; 0 doc-only commits.
**Spec-driven invariants**:
- **wire schema** = additive minor (4 optional fields added, `required` unchanged, 0 regressions for existing consumers).
- **backward compat** = `#[serde(default)]` initializes pre-v0.20 configs automatically (AC-10).
- **subagent skip** = direct in-session execution (worker protocol).
---
## §1 Step table
| # | Step | Files (primary) | Commit (after step) | AC covered |
|---|------|-----------------|---------------------|------------|
| 1 | LoggingCfg + Config integration | `crates/kebab-config/src/lib.rs`, `crates/kebab-config/tests/*.rs` | `feat(config): add [logging] section (ingest_log_enabled + ingest_log_dir)` | AC-1, AC-10 |
| 2 | IngestLogWriter module + LogEvent enum | `crates/kebab-app/src/ingest_log.rs` (new), `crates/kebab-app/src/lib.rs` (mod declaration) | `feat(app): IngestLogWriter + LogEvent enum (per-ingest-run ndjson log)` | AC-3 (struct) |
| 3 | PdfOcrProgress::Finished extend + wire cascade | `crates/kebab-app/src/pdf_ocr_apply.rs`, `crates/kebab-app/src/ingest_progress.rs`, `docs/wire-schema/v1/ingest_progress.schema.json`, `integrations/claude-code/kebab/SKILL.md` | `feat(wire): PdfOcrProgress.Finished + ingest_progress.v1 additive 4 fields (image_byte_size/width/height + failure_reason)` | AC-3 (ocr fields), AC-5 (failure_reason carry) |
| 4 | 5 emit hook integration | `crates/kebab-app/src/lib.rs` (Hook 1/3/5), `crates/kebab-app/src/pdf_ocr_apply.rs` (Hook 2 metric capture), `crates/kebab-source-fs/src/connector.rs` (Hook 4 skip emit) | `feat(app): wire IngestLogWriter into 5 ingest emit hooks (Arc<Mutex> sync)` | AC-2, AC-4, AC-5, AC-6, AC-7 |
| 5 | Integration test (ingest_log_smoke) | `crates/kebab-app/tests/ingest_log_smoke.rs` (new) | `test(app): ingest_log_smoke integration test (AC-9)` | AC-9 |
| 6 | Final sanity (workspace test + clippy + optional dogfood) | n/a (verifier only) | no commit | AC-8 |
5 commits across 6 steps. Step 6 is verifier-only (no commit) and checks for accumulated regressions.
---
## §2 Per-step detail
### §2.1 Step 1 — LoggingCfg + Config integration
**Goal**: spec §3.1 + §4.4 — `LoggingCfg` struct + Config field + backward compat.
#### §2.1.1 Files affected
| Path | Action | Approx LOC | Notes |
|------|--------|------------|-------|
| `crates/kebab-config/src/lib.rs` | edit | +55 / -0 | Config struct (line 37+), new LoggingCfg struct + Default + default fns |
| `crates/kebab-config/tests/integration.rs` (or a new `tests/logging_roundtrip.rs`) | edit / new | +35 | 1 TOML roundtrip test (default load + override load + pre-v0.20 backward compat) |
Line ranges in the existing file `crates/kebab-config/src/lib.rs`:
- line 37-81: the `Config` struct (the `pdf` field currently sits at line 62-66). Insert the new `logging` field after `pdf`, around line 67.
- Add the new `LoggingCfg` struct and the `default_ingest_log_*` fns around line 416 (after `PdfCfg::defaults`) or at the cfg-grouping spot near the end of the file. Placement is at the executor's discretion; keep the same visual layout as the existing cfg structs (NliCfg/PdfOcrCfg/PdfCfg).
#### §2.1.2 Action diff outline
```rust
// crates/kebab-config/src/lib.rs (line 37+, Config struct)
pub struct Config {
    // ... existing fields ...
    #[serde(default = "PdfCfg::defaults")]
    pub pdf: PdfCfg,
    /// v0.20.x sub-item: ingest log surface. With `#[serde(default)]`, a
    /// pre-v0.20 config (no `[logging]` section) initializes to the default.
    #[serde(default)]
    pub logging: LoggingCfg,
    #[serde(skip)]
    pub(crate) source_dir: Option<PathBuf>,
}

// New struct (cfg grouping spot)
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct LoggingCfg {
    /// Auto-write a structured ndjson log during ingest. default = true.
    /// When false, no log file is created (AC-6).
    #[serde(default = "default_ingest_log_enabled")]
    pub ingest_log_enabled: bool,
    /// per-ingest-run log file directory. default = `{state_dir}/logs`.
    /// `{state_dir}` placeholder = XDG state dir (e.g. `~/.local/state/kebab`).
    /// Cumulative disk usage of log files is user-managed (no rotation
    /// policy is provided; spec §6 R-1).
    #[serde(default = "default_ingest_log_dir")]
    pub ingest_log_dir: PathBuf,
}

fn default_ingest_log_enabled() -> bool {
    true
}

fn default_ingest_log_dir() -> PathBuf {
    PathBuf::from("{state_dir}/logs")
}

impl Default for LoggingCfg {
    fn default() -> Self {
        Self {
            ingest_log_enabled: default_ingest_log_enabled(),
            ingest_log_dir: default_ingest_log_dir(),
        }
    }
}
```
Tests to add (`crates/kebab-config/tests/logging_roundtrip.rs`):
```rust
// 1. Round-trip the logging section of the default Config (TOML → Config → TOML).
// 2. Verify that the override
//    `[logging]\ningest_log_enabled = false\ningest_log_dir = "/tmp/x"`
//    is reflected exactly on deserialize.
// 3. A pre-v0.20 fixture (no [logging] anywhere in the config) initializes
//    to the default LoggingCfg (AC-10).
```
#### §2.1.3 Acceptance
- `cargo test -p kebab-config -j 4`: all existing tests pass, plus the 1 new test.
- `cargo build -p kebab-config -j 4` clean.
- `cargo clippy -p kebab-config -- -D warnings` 0 warning.
#### §2.1.4 Commit
```
feat(config): add [logging] section (ingest_log_enabled + ingest_log_dir)

The config side of the v0.20.x ingest log surface. Adds a `LoggingCfg`
struct:
* ingest_log_enabled (bool, default true)
* ingest_log_dir (PathBuf, default "{state_dir}/logs")

With the #[serde(default)] tag, a pre-v0.20 config without a [logging]
section initializes to LoggingCfg::default() automatically (AC-10
backward compat).

The actual expansion of the `{state_dir}` placeholder is handled by the
expand_log_dir helper in step 2 (IngestLogWriter); expand_path_with_base
in kebab-config does not support `{state_dir}` (spec §6 R-3).
```
---
### §2.2 Step 2 — IngestLogWriter module + LogEvent enum
**Goal**: spec §4.1. The new module `crates/kebab-app/src/ingest_log.rs` provides `IngestLogWriter`, the `LogEvent` enum, the `IngestSummary` struct, run_id generation, and path expansion. *Callsite wiring happens in step 4*; this step delivers only the self-contained writer and its unit tests.
#### §2.2.1 Files affected
| Path | Action | Approx LOC | Notes |
|------|--------|------------|-------|
| `crates/kebab-app/src/ingest_log.rs` | new | +280 | writer struct + LogEvent enum + IngestSummary + Drop impl + 5 unit tests |
| `crates/kebab-app/src/lib.rs` | edit | +3 | `mod ingest_log;` declaration (around line 63, after `pub mod ingest_progress;`). `pub use ingest_log::{IngestLogWriter, LogEvent, IngestSummary};` |
#### §2.2.2 Action diff outline
`crates/kebab-app/src/ingest_log.rs` (new module body):
- module doc (5 lines): states the writer's role, the run_id format, and the emit ordering.
- imports: `std::fs::File`, `std::io::{BufWriter,Write}`, `std::path::{Path,PathBuf}`, `std::time::SystemTime`, `serde::{Serialize,Deserialize}`, `time::OffsetDateTime`, `time::format_description::well_known::Rfc3339`, `uuid::Uuid`.
- **`pub struct IngestLogWriter`** (`file: BufWriter<File>`, `path: PathBuf`, `run_id: String`, `started_at: SystemTime`).
  - `pub fn open(cfg: &kebab_config::LoggingCfg) -> anyhow::Result<Option<Self>>`: returns `Ok(None)` when `cfg.ingest_log_enabled == false`; when true, creates the log dir, creates the file, and issues a run_id. An open failure returns `Err` (the caller swallows it and emits `tracing::warn`).
  - `pub fn write_event(&mut self, event: &LogEvent<'_>) -> anyhow::Result<()>`: `serde_json::to_writer` + `writeln`.
  - `pub fn write_summary(&mut self, summary: &IngestSummary) -> anyhow::Result<()>`: same pattern.
  - `pub fn flush(&mut self) -> anyhow::Result<()>`.
  - getters: `run_id()` / `path()` / `started_at()`.
- **`impl Drop`**: best-effort `self.file.flush()` (spec §6 R-4, panic unwind path).
- **`fn generate_run_id() -> String`**: an ISO 8601 compact prefix from `OffsetDateTime::now_utc().format(time::macros::format_description!("[year][month][day]T[hour][minute][second]Z"))`, a `-` separator, and the last 8 hex chars of `Uuid::now_v7().simple().to_string()`. Adds no `rand` dependency (spec §6 R-5).
- **`fn expand_log_dir(path: &Path) -> PathBuf`**: string-replaces `{state_dir}` with `kebab_config::Config::xdg_state_dir()`. Tilde/env expansion is delegated to `kebab_config::expand_path`.
- **`pub(crate) fn now_ts() -> String`**: RFC 3339 formatted UTC. Called by the hooks in step 4.
- **`pub enum LogEvent<'a>`**: `#[serde(tag="kind", rename_all="snake_case")]`. 4 variants:
  - `Ocr { ts, doc_path, page, image_byte_size: Option<u64>, image_width: Option<u32>, image_height: Option<u32>, ms, chars, success, reason: Option<&'a str>, ocr_engine }`.
  - `ParseError { ts, doc_path, reason, message }`.
  - `Skip { ts, doc_path, reason, detail: Option<&'a str> }`.
  - `Error { ts, code, message }`.
- **`pub struct IngestSummary`**: owned fields (`ts: String`, `run_id: String`, `scanned/new/errors/ocr_pages/ocr_failures: u32`, `ocr_p50_ms/p90_ms/max_ms: Option<u64>`, `duration_ms: u64`). The wire shape's `kind: "summary"` is enforced either with `#[serde(tag = "kind", rename = "summary")]` or with a separate literal field `kind: &'static str = "summary"`. **Recommended**: rather than a separate `IngestSummary` enum variant, use a tagged struct (serde's `#[serde(rename = "summary")]` plus an explicit kind field), so that the summary line in the wire output always starts with `{"kind":"summary",…}`.
**Unit tests (5 fns, in the `#[cfg(test)] mod tests` of ingest_log.rs)**:
1. `generate_run_id_has_iso_prefix_and_8_hex_suffix`: matches the regex `^\d{8}T\d{6}Z-[0-9a-f]{8}$`.
2. `expand_log_dir_substitutes_state_dir_placeholder`: `"{state_dir}/logs"` → xdg_state_dir + "/logs".
3. `writer_disabled_returns_none`: with `LoggingCfg { ingest_log_enabled: false, .. }`, `IngestLogWriter::open()` returns `Ok(None)`.
4. `writer_writes_one_event_per_line_with_kind_discriminator`: write_event ×3 to a temp file → 3 lines, each starting with `{` and containing the substring `"kind":`.
5. `drop_flushes_pending_buffer`: drop after write_event without an explicit flush, then verify via read_to_string that the line count is ≥ 1.
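The writer contract exercised by tests 4 and 5 (one JSON record per line, best-effort flush on drop) can be sketched with the standard library alone. This is a minimal sketch, not the planned module: `NdjsonWriter`, `create`, and `write_line` are hypothetical names, and the caller passes pre-serialized JSON where the plan uses `serde_json::to_writer`.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Hypothetical stand-in for `IngestLogWriter`: one JSON record per line.
pub struct NdjsonWriter {
    file: BufWriter<File>,
}

impl NdjsonWriter {
    /// Creates (or truncates) the log file at `path`.
    pub fn create(path: &Path) -> io::Result<Self> {
        Ok(Self {
            file: BufWriter::new(File::create(path)?),
        })
    }

    /// Appends one record plus a newline. The caller supplies valid JSON
    /// (assumption: serialization happens elsewhere).
    pub fn write_line(&mut self, json: &str) -> io::Result<()> {
        self.file.write_all(json.as_bytes())?;
        self.file.write_all(b"\n")
    }
}

impl Drop for NdjsonWriter {
    fn drop(&mut self) {
        // Best effort: drop cannot report errors, so the result is ignored.
        let _ = self.file.flush();
    }
}
```

`BufWriter` already attempts a flush in its own `Drop`, so the explicit impl mainly documents the intent; neither runs on an abort or SIGKILL.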
**OQ-2 (p50 / p90 computation)**: the workspace has no `quantiles` crate, so a simple sorted Vec is used. The IngestSummary in this step only provides the numeric fields; the actual p50/p90 computation is done by the step 4 emit hook, which maintains an ms accumulator (`Vec<u64>`), then sorts it and picks the percentile indices at the final stage.
**OQ-3 (where to state the log cleanup policy)**: one line in the doc-comment of LoggingCfg::ingest_log_dir, stating that cumulative log-file disk usage is user-managed, is sufficient; no README/SMOKE changes. The step 1 commit body states it in one line.
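The `{state_dir}` substitution planned for `expand_log_dir` amounts to a plain string replace. The sketch below takes the state dir as a parameter, which is an assumption made to keep it self-contained; the plan's version has the signature `expand_log_dir(path: &Path)`, resolves the directory via `kebab_config::Config::xdg_state_dir()`, and delegates tilde/env expansion to `kebab_config::expand_path`.

```rust
use std::path::{Path, PathBuf};

/// Replaces the literal `{state_dir}` placeholder in `path`.
/// `state_dir` is passed in explicitly here (hypothetical parameter).
pub fn expand_log_dir(path: &Path, state_dir: &Path) -> PathBuf {
    let raw = path.to_string_lossy();
    let expanded = raw.replace("{state_dir}", &state_dir.to_string_lossy());
    PathBuf::from(expanded)
}
```

A path without the placeholder comes back unchanged, so a user override such as `/tmp/x` passes through as-is.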
#### §2.2.3 Acceptance
- `cargo test -p kebab-app --lib ingest_log -j 4`: 5 passed.
- `cargo build -p kebab-app -j 4` clean.
- `cargo clippy -p kebab-app -- -D warnings` 0 warning.
#### §2.2.4 Commit
```
feat(app): IngestLogWriter + LogEvent enum (per-ingest-run ndjson log)

The module side of the v0.20.x ingest log surface. New file
crates/kebab-app/src/ingest_log.rs:
* IngestLogWriter: open/write_event/write_summary/flush + flush on Drop
* LogEvent enum, 4 variants (ocr / parse_error / skip / error)
* IngestSummary struct (kind="summary" literal + 11 fields)
* generate_run_id (ISO 8601 prefix + last 8 hex of uuid v7)
* expand_log_dir (hand-rolled expansion of the {state_dir} placeholder)

uuid v7 is a workspace dep (Cargo.toml line 132); avoids a new rand
dependency (spec §6 R-5).

This step is the self-contained writer + 5 unit tests. Wiring the 5
emit hooks into the ingest pipeline is step 4.
```
---
### §2.3 Step 3 — PdfOcrProgress::Finished extend + wire cascade
**Goal**: spec §4.2 HIGH-1 + §3.3 ocr fields. Add 4 additive fields to `PdfOcrProgress::Finished` and `IngestEvent::PdfOcrFinished`, cascading into the wire schema as an additive minor change.
#### §2.3.1 Files affected
| Path | Action | Approx LOC | Notes |
|------|--------|------------|-------|
| `crates/kebab-app/src/pdf_ocr_apply.rs` | edit | +25 / -5 | add 4 fields to the `PdfOcrProgress::Finished` variant; update measurement + emit at the 3 emit_progress callsites (line 145, 173, 247) |
| `crates/kebab-app/src/ingest_progress.rs` | edit | +15 / -5 | add 4 fields to `IngestEvent::PdfOcrFinished`; preserve existing tests such as `ingest_event_serializes_with_discriminator` |
| `crates/kebab-app/src/lib.rs` | edit | +10 / -3 | carry the 4 fields through the `PdfOcrProgress::Finished { … } => IngestEvent::PdfOcrFinished { … }` mapping at line 1865-1882 |
| `docs/wire-schema/v1/ingest_progress.schema.json` | edit | +12 | add `image_byte_size` / `image_width` / `image_height` / `failure_reason` properties (all optional, `required` unchanged) |
| `integrations/claude-code/kebab/SKILL.md` | edit | +5 | sync the wire schema description (mention the added optional fields) |
#### §2.3.2 Action diff outline
`PdfOcrProgress::Finished` (pdf_ocr_apply.rs line 283):
```rust
Finished {
    page: u32,
    ms: u64,
    chars: u32,
    skipped: bool,
    // NEW (4 fields, optional):
    image_byte_size: Option<u64>,
    image_width: Option<u32>,
    image_height: Option<u32>,
    // "timeout" | "ocr_error" | "network_error" | "other"; None when OCR did not fail
    failure_reason: Option<String>,
},
```
Update the 3 emit_progress callsites (pdf_ocr_apply.rs line 145 / 173 / 247):
- **line 145** (success path, OCR completed normally): `image_byte_size: Some(<image bytes>)`, `image_width: Some(<w>)`, `image_height: Some(<h>)`, `failure_reason: None`. `<image bytes/w/h>` are measurements of the raster image (reuse the measurement spot from the Bug #11 follow-up, or an adjacent variable).
- **line 173** (engine failure → skip): `failure_reason: Some("ocr_error".into())` (or "timeout" if it can be classified). Emit image metrics when available, `None` otherwise.
- **line 247** (validation/threshold skip, OCR not performed): `failure_reason: None`; emit image metrics when possible.
`IngestEvent::PdfOcrFinished` (ingest_progress.rs line 96-102):
```rust
PdfOcrFinished {
    page: u32,
    ms: u64,
    chars: u32,
    ocr_engine: String,
    skipped: bool,
    // NEW (4 fields, optional):
    image_byte_size: Option<u64>,
    image_width: Option<u32>,
    image_height: Option<u32>,
    failure_reason: Option<String>,
},
```
The mapping at `crates/kebab-app/src/lib.rs` line 1865-1882:
```rust
crate::pdf_ocr_apply::PdfOcrProgress::Finished {
    page, ms, chars, skipped,
    image_byte_size, image_width, image_height, failure_reason,
} => {
    if let Some(sender) = progress {
        let _ = sender.send(
            crate::ingest_progress::IngestEvent::PdfOcrFinished {
                page, ms, chars,
                ocr_engine: engine.engine_name().to_string(),
                skipped,
                image_byte_size, image_width, image_height,
                // Cloned so the step 4 hook can still read failure_reason below.
                failure_reason: failure_reason.clone(),
            },
        );
    }
    // Hook 2 in step 4 additionally writes to the log writer at this point.
}
```
Add to `properties` (line 8+) of `docs/wire-schema/v1/ingest_progress.schema.json`:
```jsonc
"image_byte_size": { "type": "integer", "minimum": 0, "description": "pdf_ocr_finished (optional): raster image byte size." },
"image_width": { "type": "integer", "minimum": 0, "description": "pdf_ocr_finished (optional): raster image width px." },
"image_height": { "type": "integer", "minimum": 0, "description": "pdf_ocr_finished (optional): raster image height px." },
"failure_reason": { "type": "string", "enum": ["timeout", "ocr_error", "network_error", "other"], "description": "pdf_ocr_finished (optional): present iff OCR failed." }
```
**Important**: the `required` array is unchanged (currently `["schema_version", "kind", "ts"]`). All 4 fields are optional → **additive minor** = backward compatible.
Update `integrations/claude-code/kebab/SKILL.md`: add one line describing the 4 added fields to the `pdf_ocr_finished` description in the wire schema section (after the existing paragraph).
#### §2.3.3 Acceptance
- `cargo test -p kebab-app pdf_ocr_apply -j 4`: all pass.
- `cargo test -p kebab-app ingest_progress -j 4`: all pass.
- `cargo test -p kebab-cli -j 4 -- wire_search wire_ask`: regression check (existing PdfOcrFinished consumers still deserialize when the 4 added fields are `Option::None`). The two filters go after `--` because `cargo test` itself accepts only one positional test-name filter.
- `cargo build -p kebab-app -p kebab-cli -j 4` clean.
- wire schema validation: `jq '.properties | keys' docs/wire-schema/v1/ingest_progress.schema.json` includes the 4 new keys, and `.required` is unchanged.
#### §2.3.4 Commit
```
feat(wire): PdfOcrProgress.Finished + ingest_progress.v1 additive 4 fields

The wire side of the v0.20.x ingest log feature. Additive minor cascade:
* 4 fields on PdfOcrProgress::Finished + IngestEvent::PdfOcrFinished:
  - image_byte_size: Option<u64>
  - image_width: Option<u32>
  - image_height: Option<u32>
  - failure_reason: Option<String>
* docs/wire-schema/v1/ingest_progress.schema.json: 4 added properties
  (all optional, required unchanged = additive minor)
* integrations/claude-code/kebab/SKILL.md: wire schema description synced

Existing ingest_progress.v1 consumers (CLI wire dump, integration test
fixtures, kebab-cli wire_search/wire_ask) stay backward-compat because
the 4 added fields may be Option::None. No version bump (additive minor =
not a binary-version cascade trigger, per CLAUDE.md §Versioning cascade).
```
---
### §2.4 Step 4 — 5 emit hook integration (Arc<Mutex<IngestLogWriter>>)
**Goal**: spec §4.2. Call IngestLogWriter at the 5 hook locations. Ownership = `Option<Arc<Mutex<IngestLogWriter>>>` (bound once in `ingest_with_config_opts`; the 5 hooks clone + lock + write).
#### §2.4.1 Files affected
| Path | Action | Approx LOC | Hook |
|------|--------|------------|------|
| `crates/kebab-app/src/lib.rs` | edit | +110 / -10 | Hook 1 (init + flush), Hook 3 (parse_error path), Hook 5 (fatal error), percentile computation + write_summary in the summary stage |
| `crates/kebab-app/src/pdf_ocr_apply.rs` | edit | +35 / -3 | Hook 2: image metric capture (existing raster decode spot) + carry the 4 fields in emit_progress. *No signature change*: adding the `Finished` fields alone lets the caller carry the image metrics |
| `crates/kebab-source-fs/src/connector.rs` | edit | +25 | Hook 4: a callback per skip event in scan_with_skips (or an `events: Vec<SkipEvent>` accumulator on `FsScanSkips`); kebab-app enumerates and writes after the scan |
#### §2.4.2 Hook detail
**Hook 1 — `ingest_with_config_opts` (lib.rs line 281)**:
Right after function entry, bind `log_writer: Option<Arc<Mutex<IngestLogWriter>>>` from `IngestLogWriter::open(&config.logging)`: `Ok(Some(w))` → `Some(Arc::new(Mutex::new(w)))`, `Ok(None)` → `None`, `Err(e)` → `tracing::warn` + `None`. At function exit (just before the Completed / Aborted paths), compute the summary, then `write_summary` + `flush`. The summary's `ocr_p50_ms / p90_ms / max_ms` come from the success-only OCR duration accumulator `Vec<u64>`: after `sort_unstable()`, take the elements at index `len*50/100` and `len*90/100`, with `samples.last()` as the max.
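The percentile rule described for the summary stage can be sketched as follows. `ocr_percentiles` is a hypothetical helper name; the empty-input guard is an assumption that matches the `Option<u64>` summary fields.

```rust
/// Returns (p50, p90, max) of OCR durations in ms, or all `None` when no
/// successful OCR sample was recorded. Sorts `samples` in place.
pub fn ocr_percentiles(samples: &mut [u64]) -> (Option<u64>, Option<u64>, Option<u64>) {
    if samples.is_empty() {
        return (None, None, None);
    }
    samples.sort_unstable();
    let len = samples.len();
    // Truncating index: len*p/100 is always < len for p < 100 and len > 0.
    let p50 = samples[len * 50 / 100];
    let p90 = samples[len * 90 / 100];
    let max = samples[len - 1];
    (Some(p50), Some(p90), Some(max))
}
```

With the truncating index, an even-length input yields the upper of the two middle elements for p50, which is adequate for a diagnostic log but is not an interpolated percentile.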
**Hook 2 — `apply_ocr_to_pdf_pages` (pdf_ocr_apply.rs) → caller closure in lib.rs line 1855**:
Since step 3 already added the 4 fields to `PdfOcrProgress::Finished`, this step only extends the closure's `Finished` arm: capture `log_writer.clone()`, lock, and call `write_event(&LogEvent::Ocr { ts: now_ts(), doc_path, page, image_*, ms, chars, success: !skipped && failure_reason.is_none(), reason: failure_reason.as_deref(), ocr_engine: engine.engine_name() })`. On the success path, also `ocr_ms_samples.lock().push(ms)`.
**Ownership note (MEDIUM-1)**: emit_progress is `F: FnMut(PdfOcrProgress)` (pdf_ocr_apply.rs line 88), so the closure can capture a clone of the `Arc<Mutex<_>>`. The per-asset loop is single-threaded, so there is no deadlock risk.
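The capture pattern in the ownership note can be illustrated with a reduced model. Everything here is a stand-in: `Progress`, `apply_pages`, and `run_ocr_loop` are hypothetical, the log is a `Vec<String>` instead of an `IngestLogWriter`, and pages are given as `(page, ms, skipped)` tuples.

```rust
use std::sync::{Arc, Mutex};

/// Reduced stand-in for `PdfOcrProgress` (only the `Finished` variant).
pub enum Progress {
    Finished { page: u32, ms: u64, skipped: bool },
}

/// Stand-in for `apply_ocr_to_pdf_pages`: emits one event per page.
pub fn apply_pages<F: FnMut(Progress)>(pages: &[(u32, u64, bool)], mut emit: F) {
    for &(page, ms, skipped) in pages {
        emit(Progress::Finished { page, ms, skipped });
    }
}

/// Runs the loop and returns (log lines, success-only ms samples).
pub fn run_ocr_loop(pages: &[(u32, u64, bool)]) -> (Vec<String>, Vec<u64>) {
    let log: Arc<Mutex<Vec<String>>> = Arc::new(Mutex::new(Vec::new()));
    let samples: Arc<Mutex<Vec<u64>>> = Arc::new(Mutex::new(Vec::new()));

    // The closure owns clones of both handles; the originals stay usable.
    let log_for_hook = Arc::clone(&log);
    let samples_for_hook = Arc::clone(&samples);
    apply_pages(pages, move |event| {
        let Progress::Finished { page, ms, skipped } = event;
        // Each guard is dropped at the end of its statement, so the two
        // locks are never held at the same time.
        log_for_hook
            .lock()
            .unwrap()
            .push(format!("ocr page={page} ms={ms} success={}", !skipped));
        if !skipped {
            samples_for_hook.lock().unwrap().push(ms);
        }
    });

    let log_out = log.lock().unwrap().clone();
    let samples_out = samples.lock().unwrap().clone();
    (log_out, samples_out)
}
```

Because no lock is held across another lock acquisition and the loop runs on one thread, the sketch cannot deadlock, which is the property the note relies on.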
**Hook 3 — `parse_error` (parse Err arms in lib.rs: `ingest_one_pdf_asset` line 1770 + `ingest_one_code_asset` line 2002)**:
In each `Err(e)` arm of `kebab_parse_pdf::extract(...)` (or the code parser), add one line: `log_writer.lock().write_event(&LogEvent::ParseError { ts: now_ts(), doc_path: asset.path_str(), reason: classify_parse_error(&e), message: &format!("{e}") })`. `classify_parse_error` maps `kebab_core::Error::PdfFormat → "lopdf_error"`, `Error::ImageFormat → "image_format"`, and falls back to `"other"`; it lives as a helper in pdf_ocr_apply.rs or ingest_log.rs.
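A sketch of the classification helper, under the assumption that the error type exposes distinguishable variants. `ParseFailure` is a hypothetical stand-in; the real shape of `kebab_core::Error` is not shown in this plan.

```rust
/// Hypothetical stand-in for the relevant `kebab_core::Error` variants.
#[derive(Debug)]
pub enum ParseFailure {
    PdfFormat(String),
    ImageFormat(String),
    Io(String),
}

/// Maps an error to the stable `reason` string written to the log.
pub fn classify_parse_error(err: &ParseFailure) -> &'static str {
    match err {
        ParseFailure::PdfFormat(_) => "lopdf_error",
        ParseFailure::ImageFormat(_) => "image_format",
        // Anything not explicitly mapped falls back to "other".
        _ => "other",
    }
}
```

Returning `&'static str` keeps the reason set closed and cheap to borrow into `LogEvent::ParseError`.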
**Hook 4 — skip event (kebab-source-fs/src/connector.rs)**:
The current `FsSourceConnector::scan_with_skips` (line 100) only emits `tracing::debug` and increments a counter per skip. Two options: A (add an `events: Vec<FsSkipEvent>` field to `FsScanSkips`) vs B (inject `Arc<Mutex<IngestLogWriter>>` into the connector). **A is adopted** (B would create a kebab-source-fs → kebab-app dependency cycle).
Add `pub events: Vec<FsSkipEvent>` to `FsScanSkips` (around line 207) and define a new struct `FsSkipEvent { doc_path: String, reason: &'static str, detail: Option<String> }`. Add `fs_skips.events.push(FsSkipEvent { ... })` to each of the 5 skip arms (line 113 builtin_blacklist / 122 gitignore / 131 kebabignore / 154 generated / 179 size_exceeded). kebab-app/lib.rs enumerates right after the scan (before entering the asset loop): `for ev in &fs_skips.events { log_writer.lock().write_event(&LogEvent::Skip { ts: now_ts(), doc_path: &ev.doc_path, reason: ev.reason, detail: ev.detail.as_deref() }) }`.
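Option A can be illustrated with a toy scan that accumulates skip events next to the existing counter. Only `FsSkipEvent` and the `events` field come from the plan; the `size_exceeded` counter name, the `(path, size)` input, and the single size-based skip arm are assumptions made for the sketch.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct FsSkipEvent {
    pub doc_path: String,
    pub reason: &'static str,
    pub detail: Option<String>,
}

#[derive(Debug, Default)]
pub struct FsScanSkips {
    /// Existing-style counter (hypothetical field name).
    pub size_exceeded: u32,
    /// New accumulator: one entry per skipped file.
    pub events: Vec<FsSkipEvent>,
}

/// Toy scan: keeps files up to `max_bytes` and records a skip for the rest.
pub fn scan_with_skips(files: &[(&str, u64)], max_bytes: u64) -> (Vec<String>, FsScanSkips) {
    let mut kept = Vec::new();
    let mut skips = FsScanSkips::default();
    for &(path, size) in files {
        if size > max_bytes {
            skips.size_exceeded += 1;
            skips.events.push(FsSkipEvent {
                doc_path: path.to_string(),
                reason: "size_exceeded",
                detail: Some(format!("{size} > {max_bytes} bytes")),
            });
            continue;
        }
        kept.push(path.to_string());
    }
    (kept, skips)
}
```

The connector stays free of any logging dependency: the caller walks `skips.events` after the scan and decides what to write.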
**Hook 5 — fatal error (error return path of `ingest_with_config_opts` in lib.rs)**:
Errors bubble via the `?` operator, so there is no explicit catch spot. **Recommended location**: wrap the whole body of `ingest_with_config_opts` in an inner closure `(|| -> anyhow::Result<IngestReport> { ... })()` and, in the outer scope, `match result { Err(e) => { log_writer.lock().write_event(&LogEvent::Error { ts: now_ts(), code: "ingest_fatal", message: &format!("{e:#}") }); flush; Err(e) }, Ok(r) => { write_summary + flush; Ok(r) } }`. This pattern is mutually exclusive with the existing Completed / Aborted emits of ingest_progress: Aborted is a normal termination on cancel (not an Err), and only the Err arm triggers LogEvent::Error. This is consistent with spec §4.2's "0 changes to error_wire::classify itself": classify is called from kebab-cli wire.rs, while this hook is handled generically inside the facade.
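The wrap-and-match pattern can be sketched in isolation. `run_logged` is a hypothetical helper, errors are plain `String`s instead of `anyhow::Error`, and the log is a `Vec<String>` standing in for the writer.

```rust
/// Runs `body` (which may bail out early via `?`) and records exactly one
/// terminal record: a summary on success, an error record on failure.
pub fn run_logged<T>(
    log: &mut Vec<String>,
    body: impl FnOnce() -> Result<T, String>,
) -> Result<T, String> {
    let result = body();
    match &result {
        Ok(_) => log.push("summary".to_string()),
        Err(e) => log.push(format!("error code=ingest_fatal message={e}")),
    }
    // The original result is returned untouched, so callers see the same
    // Ok/Err they would have seen without the wrapper.
    result
}
```

Every `?` inside the closure funnels into the single `match`, which gives the fatal-error hook one well-defined spot without touching the individual call sites.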
#### §2.4.3 ownership wiring 요약
```
ingest_with_config_opts:
  log_writer: Option<Arc<Mutex<IngestLogWriter>>>
    ├─ Hook 1: init at entry + write_summary at exit
    ├─ apply_ocr_to_pdf_pages closure:
    │     captures log_writer.clone() → Hook 2 write_event(LogEvent::Ocr)
    │     captures ocr_ms_samples.clone() → pushes success-only ms
    ├─ parse Err arm of ingest_one_pdf_asset / _code_asset: Hook 3 write_event(LogEvent::ParseError)
    ├─ enumerate fs_skips.events right after the scan: Hook 4 write_event(LogEvent::Skip)
    └─ error_wire::classify call site: Hook 5 write_event(LogEvent::Error)
```
#### §2.4.4 Acceptance
- `cargo test --workspace -j 1 --no-fail-fast`: all pass (the existing 1358 tests plus any new tests, 0 regressions).
- `cargo build --workspace -j 4` clean.
- `cargo clippy --workspace --all-targets -j 4 -- -D warnings` 0 warning.
- No new module-level tests in this step; the integration test comes in step 5.
#### §2.4.5 Commit
```
feat(app): wire IngestLogWriter into 5 ingest emit hooks (Arc<Mutex> sync)

Ingest pipeline wiring for the v0.20.x ingest log feature. 5 emit hooks:
  Hook 1: ingest_with_config_opts entry/exit (writer init + summary write + flush)
  Hook 2: apply_ocr_to_pdf_pages closure (PdfOcrProgress::Finished → LogEvent::Ocr)
  Hook 3: parse Err arm of ingest_one_*_asset (LogEvent::ParseError)
  Hook 4: enumerate fs_skips.events right after the scan (LogEvent::Skip)
  Hook 5: error_wire::classify call site (LogEvent::Error)

To carry skip events for Hook 4, add an events: Vec<FsSkipEvent> field
to FsScanSkips in kebab-source-fs (kebab-source-fs does not call back
into kebab-app, which avoids a cycle).

Ownership: Option<Arc<Mutex<IngestLogWriter>>> is bound in 1 place; the
5 hooks clone+lock+write. ocr_ms_samples (Vec<u64>, success-only) is
shared via Arc<Mutex>, and the summary stage sorts it and computes
p50/p90/max. The per-asset loop is single-threaded, so there is no
deadlock/contention risk.

A writer failure does not fail the ingest itself (tracing::warn + continue).
```
---
### §2.5 Step 5 — Integration test `ingest_log_smoke`
**Goal**: spec §5 AC-9: an integration test with a 6-step body.
#### §2.5.1 Files affected
| Path | Action | Approx LOC | Notes |
|------|--------|------------|-------|
| `crates/kebab-app/tests/ingest_log_smoke.rs` | new | +160 | 2 test fns (`ingest_log_smoke`, `ingest_log_disabled_emits_no_file`) + 1 supporting fn (minimal corpus generator) |
| `crates/kebab-app/Cargo.toml` | edit (optional) | +0 / -0 | No change if `tempfile` is already a dev-dep. (Verify that the existing tests under `crates/kebab-app/tests/` use it; almost certainly yes) |
#### §2.5.2 Action diff outline
New file `crates/kebab-app/tests/ingest_log_smoke.rs`, 2 `#[test]` fns:
**`fn ingest_log_smoke` (AC-9, 6-step body)**:
1. `TempDir::new()`, then create the workspace `tmp/kb` and the log_dir `tmp/logs`.
2. Minimal corpus: `kb/hello.md` (plain text) + `kb/scanned.pdf` (a copy of fixture `tests/fixtures/scanned-1page.pdf`; the fallback fixture decision is §5 OQ-A).
3. `Config::test_default(&workspace)`, then `cfg.logging = LoggingCfg { ingest_log_enabled: true, ingest_log_dir: log_dir }`.
4. `ingest_with_config_opts(cfg, SourceScope::Workspace, false, IngestOpts::default())` followed by `.expect("ingest")`.
5. `read_dir(&log_dir)` → assert exactly 1 `ingest-*.ndjson` file. Read the body with `read_to_string`.
6. Parse each line of `body.lines()` with `serde_json::from_str` and assert that the `kind` field ∈ {"ocr","parse_error","skip","error","summary"} (matches! macro). Assert that the last line has `kind == "summary"`, `scanned > 0`, and `ocr_pages > 0`.
**`fn ingest_log_disabled_emits_no_file` (AC-6, 4-step body)**:
1. TempDir + workspace + only `hello.md`.
2. `cfg.logging = LoggingCfg { ingest_log_enabled: false, .. }`.
3. Run ingest_with_config_opts.
4. Assert 0 `ingest-*.ndjson` files in `log_dir` (the log_dir itself may have been created, but it holds 0 files).
**imports**: `tempfile::TempDir`, `kebab_app::{ingest_with_config_opts, IngestOpts, SourceScope}`, `kebab_config::{Config, LoggingCfg}`, `serde_json::Value`.
**Fixture fallback**: if `tests/fixtures/scanned-1page.pdf` does not exist (likely, since it is outside this PR's scope), use an existing PDF fixture (e.g. `tests/fixtures/*.pdf`) that is a 1-page raster-only document if there is one; otherwise reduce the test scope to a plain text PDF with OCR skipped (verify only the `summary` kind instead of `ocr_pages > 0`).
→ The executor decides after checking the fixture location. This plan assumes `scanned-1page.pdf`.
#### §2.5.3 Acceptance
- `cargo test -p kebab-app --test ingest_log_smoke -j 4 2>&1 | tail -3` → `2 passed; 0 failed` (both `#[test]` fns run without a filter).
- `cargo test -p kebab-app --test ingest_log_smoke ingest_log_disabled_emits_no_file -j 4``1 passed; 0 failed`.
#### §2.5.4 Commit
```
test(app): ingest_log_smoke integration test (AC-9)

New file crates/kebab-app/tests/ingest_log_smoke.rs:
* ingest_log_smoke (AC-9): tempdir + 1 md + 1 scanned PDF →
  ingest → assert the log file exists + each line is valid JSON +
  each kind ∈ {ocr,parse_error,skip,error,summary} + the last
  line has kind=summary + scanned>0 + ocr_pages>0.
* ingest_log_disabled_emits_no_file (AC-6): with enabled=false,
  verify 0 ingest-*.ndjson files in log_dir.

fixture: tests/fixtures/scanned-1page.pdf (the executor reuses the
scanned PDF fixture added in an earlier fb-* PR; if it does not exist,
take the fallback path, with the fixture-adding commit prepended
separately).
```
---
### §2.6 Step 6 — Final sanity (no commit)
**Goal**: accumulated workspace test + clippy + (optional) dogfood.
#### §2.6.1 Verifier
- **full workspace test**: `CARGO_TARGET_DIR=/build/out/cargo-target/target cargo test --workspace --no-fail-fast -j 1 2>&1 | tail -20` → `test result: ok`.
- **clippy**: `cargo clippy --workspace --all-targets -j 4 -- -D warnings 2>&1 | tail -10` → exit 0.
- **format**: `cargo fmt --all --check` → exit 0.
- **(optional) dogfood smoke**: `target/release/kebab ingest --config /tmp/kebab-smoke/config.toml --json 2>/dev/null | tail -3` → success + `ls /tmp/kebab-smoke/logs/ingest-*.ndjson | wc -l` ≥ 1.
#### §2.6.2 Commit
This step has 0 commits. If a regression is detected, go back to the relevant step among steps 1-5 and fix it → `git commit --amend` or `git commit --fixup` (CLAUDE.md §Git Hygiene: "create NEW commits rather than amending", so fixup is recommended).
---
## §3 Verifier checklist (cumulative)
Maps each AC in spec §5 to a step and a verifier command. The executor of this plan runs the cumulative verifiers at the end of every step:
| AC | Spec text summary | Verifier | Step |
|----|--------------|----------|------|
| AC-1 | `[logging]` default emit | the `[logging]` block is added automatically on TOML serialize (`Config::default() \| toml::to_string`) | Step 1 |
| AC-2 | `ingest-{run_id}.ndjson` file created | `ls {log_dir}/ingest-*.ndjson` ≥ 1 (smoke test) | Step 5 (verified inside the smoke test) |
| AC-3 | each line valid JSON + kind enum | `jq -c 'select(.kind \| IN("ocr","parse_error","skip","error","summary"))' < log.ndjson \| wc -l` = line count | Step 5 |
| AC-4 | OCR per-page + summary record | `grep -c '"kind":"ocr"' < log.ndjson` ≥ 1 + last line kind=summary | Step 5 |
| AC-5 | every failure type recorded (size_exceeded / parse_error / ocr timeout) | grep when the smoke test fixture triggers one size_exceeded or ocr-fail | Step 5 (optional fixture extension) |
| AC-6 | `ingest_log_enabled = false` → 0 files | `ingest_log_disabled_emits_no_file` test | Step 5 |
| AC-7 | `ingest_log_dir` override → custom path emit | covered by the smoke test's tempdir (the file is created at a non-default path) | Step 5 |
| AC-8 | workspace test + clippy | `cargo test --workspace -j 1` + `cargo clippy --workspace --all-targets -- -D warnings` | Step 6 |
| AC-9 | integration test | `cargo test -p kebab-app --test ingest_log_smoke -j 4` | Step 5 |
| AC-10 | pre-v0.20 config (no [logging]) load with defaults | the new Step 1 test loads a fixture toml without [logging] via Config::load and asserts logging == LoggingCfg::default() | Step 1 |
**Cumulative invariants**:
- after step 1: AC-1, AC-10.
- after step 2: AC-1, AC-10 (writer struct unit tests only).
- after step 3: same + wire schema additive change verified (0 consumer regressions).
- after step 4: same + 0 workspace test regressions.
- after step 5: AC-1, AC-2, AC-3, AC-4, AC-6, AC-7, AC-9, AC-10. AC-5 depends on fixture coverage.
- after step 6: AC-8 + the full cumulative set.
---
## §4 Risks resolution
Plan resolutions for spec §6 R-1 through R-5 and OQ-1 through OQ-3:
- **R-1 (log rotation cleanup)**: one line in the doc-comment of `LoggingCfg::ingest_log_dir` in step 1, stating that cumulative log-file disk usage is user-managed. 0 changes to README/SMOKE/ARCHITECTURE (the user-facing surface is the config field itself, and typical users are served by the default).
- **R-2 (run_id collision across concurrent ingests)**: `generate_run_id` in step 2 = ISO 8601 second-precision prefix + last 8 hex of uuid v7. uuid v7 has ms precision + 74 random bits; even 8 hex chars (32 bits) keep the collision probability for a pair of runs within the same ms below 1e-9 (1/2^32 ≈ 2.3e-10). Concurrent ingest is not an intended use case (single-user, local-first KB), so this mitigation is sufficient.
- **R-3 (`{state_dir}` placeholder expansion)**: `expand_log_dir` in step 2 is a hand-rolled string-replace. The existing `kebab_config::expand_path` handles only tilde/env and does not support `{state_dir}`. Follow-up: generalizing `expand_path_with_base` to also handle `{state_dir}` is outside this PR's scope (LOW-2 deferred).
- **R-4 (no flush on panic/abort)**: `Drop for IngestLogWriter` in step 2 does `let _ = self.file.flush()`; on panic unwind, BufWriter::drop also attempts a flush (a kernel write call). On abort (libc::abort, SIGKILL) drop does not run; this case cannot be mitigated (OS-level limitation).
- **R-5 (avoiding a new `rand` dependency)**: `generate_run_id` in step 2 uses the last 8 hex of `uuid::Uuid::now_v7().simple().to_string()`. uuid v7 is a workspace dep, so no `rand` is added.
OQ:
- **OQ-1 (source of image_byte_size + dimensions)**: the accepted spec adopts carrying them on `PdfOcrProgress::Finished` (Option A). Step 3 is the wire cascade of this patch.
- **OQ-2 (p50 / p90 computation)**: the summary stage in step 4 sorts the success-only `Vec<u64>` and indexes at `len*50/100` (truncating). No `quantiles` crate is added.
- **OQ-3 (location of the log cleanup doc)**: only the doc-comment of `LoggingCfg::ingest_log_dir` in step 1; 0 README/SMOKE changes. If a user-facing statement becomes necessary, add 1 line to the `Configuration` section of the README in a follow-up commit.
Additional OQ (the spec line 22 vs 414 inconsistency from closure r2 LOW-3):
- **OQ-4**: spec line 22's "0 wire schema changes" and line 414's "additive minor (backward compat)" are semantically equivalent (additive minor = no wire-schema major bump = the intent of "0 changes"). The step 3 commit body states this explicitly: additive minor is not a binary-version cascade trigger, which matches CLAUDE.md §Versioning cascade, where additive minor wire changes (...) are described as backward-compat and outside that trigger. A 1-line fix to the spec body itself can go in a separate prepended commit, or the statement in the step 3 commit body can be considered sufficient (executor's discretion).
---
## §5 Open questions for executor
In-step open questions that the executor (Phase C round 0) must decide:
- **OQ-A (fixture availability)**: check whether `crates/kebab-app/tests/fixtures/scanned-1page.pdf` exists. If it does not: (a) reuse an existing fixture (e.g. the fb-04 PDF) and change only the fixture path, (b) narrow the test scope to a plain-text PDF, or (c) prepend a commit that adds a new fixture. **Recommended**: (a) or (b). An extra fixture commit would exceed the 5-commit budget.
- **OQ-B (pinning down the Hook 5 location)**: spec §4.2 Hook 5 specifies "the error return path of `ingest_with_config_opts` (per-asset catch + final Err arm)". The actual `lib.rs` has no explicit `match err { ... }` pattern; errors bubble up through a `?` operator chain. The executor should locate the `error_wire::classify` call itself and add one line at the spot right before it. The classify call currently lives in `crates/kebab-app/src/error_wire.rs` or its caller (kebab-cli `wire.rs`). This plan assumes classify is called inside the kebab-app facade. If classify is called only from kebab-cli, Hook 5 no longer matches the spec's "writer lifecycle" (the writer lives inside kebab-app). In that case, handle it generically in the final Err arm of the kebab-app facade, along the lines of `let _ = log_writer.lock().map(|mut w| w.write_event(LogEvent::Error { code: "ingest_fatal", message: &format!("{e}") }))`. The executor decides after grepping.
- **OQ-C (sharing pattern for the OCR ms accumulator)**: if the closure is `FnMut`, a `RefCell<Vec<u64>>` suffices; if it is `FnOnce`/`Fn`, use `Arc<Mutex<Vec<u64>>>`. emit_progress appears to be `FnMut` (line 88, `F: FnMut(PdfOcrProgress)`), so `RefCell` would work, but this plan uses the same pattern as the locked writer (`Arc<Mutex<_>>`) for consistency.
- **OQ-D (missing skip events)**: if `FsScanSkips.events` is not populated in even one of the 5 skip arms, AC-5 fails. The executor must verify that a push was added at all 5 skip spots in connector.rs (builtin_blacklist / gitignore / kebabignore / generated / size_exceeded).
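
One way to make the OQ-D check mechanical is to enumerate the five skip reasons and report any that never appear in the recorded events. This is only a sketch: `SkipReason`, `Skips`, `record`, and `missing_reasons` are hypothetical names, not the actual `FsScanSkips` API.

```rust
/// Hypothetical enum mirroring the five skip arms named in OQ-D.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SkipReason {
    BuiltinBlacklist,
    Gitignore,
    Kebabignore,
    Generated,
    SizeExceeded,
}

impl SkipReason {
    const ALL: [SkipReason; 5] = [
        SkipReason::BuiltinBlacklist,
        SkipReason::Gitignore,
        SkipReason::Kebabignore,
        SkipReason::Generated,
        SkipReason::SizeExceeded,
    ];

    /// The `reason` string as it would appear in a `skip` log record.
    fn as_str(self) -> &'static str {
        match self {
            SkipReason::BuiltinBlacklist => "builtin_blacklist",
            SkipReason::Gitignore => "gitignore",
            SkipReason::Kebabignore => "kebabignore",
            SkipReason::Generated => "generated",
            SkipReason::SizeExceeded => "size_exceeded",
        }
    }
}

/// Hypothetical stand-in for the skip collector: (path, reason) pairs.
#[derive(Default)]
struct Skips {
    events: Vec<(String, SkipReason)>,
}

impl Skips {
    fn record(&mut self, path: &str, reason: SkipReason) {
        self.events.push((path.to_string(), reason));
    }
}

/// Reasons that never produced an event; empty when all five arms push.
fn missing_reasons(skips: &Skips) -> Vec<&'static str> {
    SkipReason::ALL
        .iter()
        .filter(|r| !skips.events.iter().any(|(_, seen)| seen == *r))
        .map(|r| r.as_str())
        .collect()
}
```

The exhaustive `match` in `as_str` also means that adding a sixth skip arm later fails to compile until its log string is defined.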
---
## §6 References
- **Spec**: `docs/superpowers/specs/2026-05-28-v0.20-ingest-log-spec.md` (491 line, ACCEPT 7/7 + 1 LOW)
- **Closure critic r2**: `.omc/reviews/2026-05-28-v0.20-ingest-log-spec-closure-r2-result.md`
- **Brief**: `.omc/reviews/2026-05-28-v0.20-ingest-log-plan-drafter-brief.md`
- **Parent task**: `tasks/p10/p10-1A-5-ingest-failure-log.md`
- **Parent design**: `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` §8 (wire schema), §9 (versioning cascade)
- **Bug #11 follow-up**: OCR raster image metric capture (pdf_ocr_apply.rs line 145 vicinity)
- **Existing wire schema**: `docs/wire-schema/v1/ingest_progress.schema.json` (57 line)
- **Existing IngestEvent**: `crates/kebab-app/src/ingest_progress.rs` lines 61-103
- **Existing PdfOcrProgress**: `crates/kebab-app/src/pdf_ocr_apply.rs` lines 276-294
- **Existing fs skip detection**: `crates/kebab-source-fs/src/connector.rs::scan_with_skips` (around line 100, 5 skip arms)
- **xdg_state_dir**: `crates/kebab-config/src/lib.rs` line 1112
- **uuid v7 workspace dep**: `Cargo.toml` line 132 (`uuid = { version = "1", features = ["v7", "serde"] }`)
- **time crate workspace dep**: `Cargo.toml` line 131 (`time = { version = "0.3", features = ["serde", "macros", "formatting", "parsing"] }`)
---
## §7 Constraints (worker protocol + spec)
1. **No branch change**: every commit is a direct descendant of the `feat/pdf-scanned-ocr` HEAD (`6a9551e`). The PR targets main.
2. **No subagents**: the executor does not spawn nested subagents; edits are made directly in-session.
3. **No changes to the frozen accepted spec**: the 1-line LOW-3 fix to the spec body (reconciling line 22 with line 414) goes in a separate spec-edit commit (outside this PR if needed).
4. **wire schema = additive minor**: all 4 fields added to `ingest_progress.v1` are optional, and the `required` array is unchanged. No regression for existing consumers (`kebab-cli wire_search` / `wire_ask` / Claude Code skill).
5. **No regressions**: the existing 1358 workspace tests plus +6 new tests (config roundtrip 1, ingest_log unit 5, integration 2). The cumulative `cargo test --workspace -j 1` run must pass in full.
6. **5 commits**: commit boundaries follow the spec acceptance scope. Step 6 is verifier-only, with no commit.
7. **Plan length 500-700 lines**: this file targets about 670 lines.
8. **No dogfood impact**: if the dogfood smoke fails after these commits are mass-merged, report to the user and revert. dogfood = the isolated TempDir KB pipeline in `docs/SMOKE.md`.
9. **No binary version bump**: wire schema additive minor + no design contract change → the bump trigger in CLAUDE.md §Versioning cascade does not fire (assuming the current version is `0.19.x`; the executor checks the workspace `Cargo.toml` version).
10. **HANDOFF/ARCHITECTURE unchanged**: the only change to the user-facing surface (CLI flags, TUI keys, user-exposed config fields) is one config section (logging) → one line in the README `Configuration` section (the "when the user-visible surface changes" clause of feedback_readme_sync_rule triggers strongly). Adding the README line in the step 1 or step 4 commit is optional; to stay under 700 lines this plan does not spell out the wording and leaves it to the executor.
---
## §8 Plan-level estimate
- **drafter (current task)**: 30 min — read brief + spec + 8 source spot grep + write plan.
- **executor (next phase)**: 4-6 h — step 1 (30 min) + step 2 (90 min) + step 3 (60 min) + step 4 (120 min) + step 5 (90 min) + step 6 (30 min) + commit drafting + dogfood smoke.
- **review (final phase)**: 30-60 min — 5 commit diff scan + AC verifier reproduce + dogfood log file 1 spot check.
Total 5-7 h end to end; this PR alone completes the dogfood user flow (Phase B4 → B4-execute → review).


@@ -0,0 +1,491 @@
---
title: "v0.20.x ingest log feature — spec"
date: 2026-05-28
status: "DRAFT (round r1c)"
target_version: 0.20.x
phase: A4 (spec drafter rewrite)
parent_spec: ../../../tasks/p10/p10-1A-5-ingest-failure-log.md
contract_sections: []
references: ["2026-04-27-kebab-final-form-design.md", "pre-v0.20 dogfood Bug #15 (ocr timeout + log analysis)", "2026-05-28 closure critic result"]
---
# v0.20.x ingest log feature — spec
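## Motivation
After 4 rounds of dogfooding, the user explicitly requested:
> I'd like to be able to see detailed logs whenever any failure occurs, including OCR failures. I need those statistics to find the right configuration sweet spot and design direction.
**Core problem**: there is no structured log surface that records ingest failures such as OCR timeouts, parse errors, and skips as individual events plus summary stats. To find the sweet spot (OCR timeout, model choice), the user needs per-page statistics and aggregate stats.
**Surface choice**: a structured ndjson file (no wire schema change, internal write-only). The user can analyze it directly with grep/jq, without a dedicated query CLI.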
---
## Scope
### In scope
- `[logging]` config section (2 fields: `ingest_log_enabled`, `ingest_log_dir`)
- Per-ingest-run ndjson log file (`ingest-{run_id}.ndjson`, run_id = ISO 8601 + random suffix)
- 5 kinds of log record: `ocr`, `parse_error`, `skip`, `error`, `summary`
- OCR per-page events include the measured image byte size + dimensions
- Run ID + timestamp in ISO 8601 UTC format
- Backward compat: pre-v0.20 config files without a `[logging]` section load with defaults
### Out of scope
- SQLite 영구 저장 (future enhancement)
- Log level switches (verbose / minimal): this round has a single level
- Log rotation / cleanup policy (user-managed)
- Query CLI (`kebab logs`)
- Wire schema changes (internal use only)
---
## Design decisions
### §3.1 Config schema — `[logging]` section
**File**: `crates/kebab-config/src/lib.rs` (the existing `Config` struct is at line 37+)
New struct `LoggingCfg`:
```rust
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct LoggingCfg {
    /// Write a structured ndjson log automatically during ingest. Default = true.
    /// When false, no log file is created.
    #[serde(default = "default_ingest_log_enabled")]
    pub ingest_log_enabled: bool,

    /// Directory for per-ingest-run log files. Default = `{state_dir}/logs`.
    /// The `{state_dir}` placeholder expands to the XDG state dir.
    #[serde(default = "default_ingest_log_dir")]
    pub ingest_log_dir: PathBuf,
}

fn default_ingest_log_enabled() -> bool {
    true
}

fn default_ingest_log_dir() -> PathBuf {
    // After expansion: ~/.local/state/kebab/logs
    PathBuf::from("{state_dir}/logs")
}

impl Default for LoggingCfg {
    fn default() -> Self {
        Self {
            ingest_log_enabled: default_ingest_log_enabled(),
            ingest_log_dir: default_ingest_log_dir(),
        }
    }
}
```
**Added to the `Config` struct** (line 37+):
```rust
pub struct Config {
// ... existing fields ...
#[serde(default)]
pub logging: LoggingCfg,
}
```
`#[serde(default)]` guarantees backward compat: when a pre-v0.20 config file has no `[logging]` section, the field is initialized to its default automatically.
### §3.2 Per-ingest-run filename + ID generation
**Filename format**: `ingest-{run_id}.ndjson`
**Run ID**: ISO 8601 timestamp + 8-char random suffix
- Example: `20260528T013000Z-abc123de`
- Format: `YYYYMMDDTHHmmssZ-{8-char random alphanumeric}`
- Sortable + readable + concurrent collision unlikely
**Path expansion**: the `{state_dir}` placeholder expands to `~/.local/state/kebab`
- The `{state_dir}` substitution is hand-rolled separately (`expand_path_with_base` in kebab-config does not support `{state_dir}`)
- The directory is created automatically if missing
### §3.3 Log content — ndjson schema
Each line is one JSON object (ndjson format). The `kind` field of each record is the union variant discriminator.
#### Record kind: `ocr`
OCR per-page attempt (success / failure). **HIGH-1 (PdfOcrProgress::Finished extend)**: the image byte size, dimensions, and failure_reason are carried as new fields on the `PdfOcrProgress::Finished` variant.
```jsonc
{
"ts": "2026-05-28T01:30:01.123Z",
"kind": "ocr",
"doc_path": "metro-korea.pdf",
"page": 8,
"image_byte_size": 989512,
"image_width": 2480,
"image_height": 3508,
"ms": 180000,
"chars": 0,
"success": false,
"reason": "timeout",
"ocr_engine": "ollama-vision"
}
```
**Fields**:
- `ts`: ISO 8601 UTC (milliseconds, using `time` crate)
- `kind`: literal `"ocr"`
- `doc_path`: relative or absolute source path
- `page`: page number (0-indexed or 1-indexed — confirm in impl)
- `image_byte_size`: raster image byte size (from PdfOcrProgress::Finished.image_byte_size)
- `image_width`, `image_height`: raster dimensions pixel (from PdfOcrProgress::Finished.image_width/height)
- `ms`: OCR duration milliseconds
- `chars`: extracted character count (0 if failed)
- `success`: boolean, `true` exactly when `PdfOcrProgress::Finished.failure_reason` is `None` (i.e. `failure_reason.is_none()`)
- `reason`: "timeout" | "network_error" | "malformed_image" | "other" (from PdfOcrProgress::Finished.failure_reason)
- `ocr_engine`: "ollama-vision" or config'd name
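
The relationship between `failure_reason`, `success`, `reason`, and `chars` can be sketched as below. The `Finished` struct is a hypothetical subset of the extended progress payload, and folding unrecognized reasons into `"other"` is an assumption of this sketch, not something the spec states.

```rust
/// Hypothetical subset of the extended `PdfOcrProgress::Finished` payload.
struct Finished {
    chars: u32,
    failure_reason: Option<String>,
}

/// Returns (success, reason, chars) as they would appear in an `ocr` record.
fn ocr_outcome(ev: &Finished) -> (bool, Option<&'static str>, u32) {
    match ev.failure_reason.as_deref() {
        // No failure reason means the page succeeded.
        None => (true, None, ev.chars),
        // Known reasons pass through; a failed page reports 0 chars.
        Some("timeout") => (false, Some("timeout"), 0),
        Some("network_error") => (false, Some("network_error"), 0),
        Some("malformed_image") => (false, Some("malformed_image"), 0),
        // Assumption: anything else is folded into "other".
        Some(_) => (false, Some("other"), 0),
    }
}
```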
#### Record kind: `parse_error`
PDF / other format parse failure.
```jsonc
{
"ts": "2026-05-28T01:30:02.456Z",
"kind": "parse_error",
"doc_path": "weird.pdf",
"reason": "lopdf_error",
"message": "unexpected EOF in xref table"
}
```
#### Record kind: `skip`
Document skip (size_exceeded / gitignore / builtin_blacklist).
```jsonc
{
"ts": "2026-05-28T01:30:03.789Z",
"kind": "skip",
"doc_path": "large.zip",
"reason": "builtin_blacklist",
"detail": ".zip extension"
}
```
#### Record kind: `error`
Fatal error (written to the log at the same moment error.v1 is emitted).
```jsonc
{
"ts": "2026-05-28T01:30:04.012Z",
"kind": "error",
"code": "config_not_found",
"message": "config file does not exist: /tmp/config.toml"
}
```
#### Record kind: `summary`
Final aggregate stats for the ingest run (last line).
```jsonc
{
"ts": "2026-05-28T01:31:00.000Z",
"kind": "summary",
"run_id": "20260528T013000Z-abc123de",
"scanned": 11,
"new": 11,
"errors": 0,
"ocr_pages": 21,
"ocr_failures": 2,
"ocr_p50_ms": 1500,
"ocr_p90_ms": 63000,
"ocr_max_ms": 180007,
"duration_ms": 555550
}
```
**Fields**:
- `run_id`: unique ID of this ingest run
- `scanned`: total number of docs attempted
- `new`: number of new docs actually ingested
- `errors`: fatal error count
- `ocr_pages`: total number of pages on which OCR was attempted
- `ocr_failures`: number of pages on which OCR failed
- `ocr_p50_ms`, `ocr_p90_ms`, `ocr_max_ms`: percentiles + max (successful pages only; may be null)
- `duration_ms`: elapsed time of the entire run
---
## §4 Implementation
### §4.1 New module: `crates/kebab-app/src/ingest_log.rs`
```rust
use std::fs::File;
use std::io::{BufWriter, Write};
use std::path::{Path, PathBuf};
use std::time::SystemTime;

use serde::Serialize;

pub struct IngestLogWriter {
    file: Option<BufWriter<File>>,
    run_id: String,
    started_at: SystemTime,
}

impl IngestLogWriter {
    /// Open the log file. Returns `None` when `cfg.ingest_log_enabled == false`.
    pub fn open(cfg: &kebab_config::LoggingCfg) -> anyhow::Result<Option<Self>> {
        if !cfg.ingest_log_enabled {
            return Ok(None);
        }
        let run_id = generate_run_id();
        let log_dir = expand_log_dir(&cfg.ingest_log_dir)?;
        std::fs::create_dir_all(&log_dir)?;
        let path = log_dir.join(format!("ingest-{run_id}.ndjson"));
        let file = BufWriter::new(File::create(&path)?);
        Ok(Some(Self {
            file: Some(file),
            run_id,
            started_at: SystemTime::now(),
        }))
    }

    /// Append one event as a single ndjson line.
    pub fn write_event(&mut self, event: &LogEvent<'_>) -> anyhow::Result<()> {
        if let Some(ref mut f) = self.file {
            serde_json::to_writer(&mut *f, event)?;
            writeln!(f)?;
        }
        Ok(())
    }

    /// Append the summary record (expected to be the last line).
    pub fn write_summary(&mut self, summary: &IngestSummary) -> anyhow::Result<()> {
        if let Some(ref mut f) = self.file {
            serde_json::to_writer(&mut *f, summary)?;
            writeln!(f)?;
        }
        Ok(())
    }

    pub fn flush(&mut self) -> anyhow::Result<()> {
        if let Some(ref mut f) = self.file {
            f.flush()?;
        }
        Ok(())
    }

    pub fn run_id(&self) -> &str {
        &self.run_id
    }
}

impl Drop for IngestLogWriter {
    fn drop(&mut self) {
        // Best-effort flush; errors cannot be reported from drop.
        let _ = self.flush();
    }
}

fn generate_run_id() -> String {
    use time::format_description;
    use time::OffsetDateTime;

    let now = OffsetDateTime::now_utc();
    // Compact ISO 8601 format, e.g. 20260528T013000Z.
    let fmt = format_description::parse("[year][month][day]T[hour][minute][second]Z")
        .expect("format description is valid");
    let ts = now.format(&fmt).expect("formatting a UTC timestamp should succeed");

    // 8-char lowercase alphanumeric suffix (uses the `rand` crate, see R-5).
    const CHARS: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789";
    let suffix: String = (0..8)
        .map(|_| CHARS[rand::random::<usize>() % CHARS.len()] as char)
        .collect();
    format!("{ts}-{suffix}")
}

fn expand_log_dir(path: &Path) -> anyhow::Result<PathBuf> {
    // expand_path / expand_path_with_base do not support `{state_dir}`,
    // so the placeholder is substituted by hand with the XDG state dir.
    let path_str = path.to_string_lossy();
    if path_str.contains("{state_dir}") {
        let state_dir = kebab_config::Config::xdg_state_dir();
        let state_str = state_dir
            .to_str()
            .ok_or_else(|| anyhow::anyhow!("state dir is not valid UTF-8"))?;
        Ok(PathBuf::from(path_str.replace("{state_dir}", state_str)))
    } else {
        Ok(path.to_path_buf())
    }
}

// Write-only log records: only `Serialize` is derived, because the
// borrowed `&'a str` fields are never deserialized.
#[derive(Serialize)]
#[serde(tag = "kind")]
pub enum LogEvent<'a> {
    #[serde(rename = "ocr")]
    Ocr {
        ts: String,
        doc_path: &'a str,
        page: u32,
        image_byte_size: Option<u64>,
        image_width: Option<u32>,
        image_height: Option<u32>,
        ms: u64,
        chars: u32,
        success: bool,
        reason: Option<&'a str>,
        ocr_engine: &'a str,
    },
    #[serde(rename = "parse_error")]
    ParseError {
        ts: String,
        doc_path: &'a str,
        reason: &'a str,
        message: &'a str,
    },
    #[serde(rename = "skip")]
    Skip {
        ts: String,
        doc_path: &'a str,
        reason: &'a str,
        detail: Option<&'a str>,
    },
    #[serde(rename = "error")]
    Error {
        ts: String,
        code: &'a str,
        message: &'a str,
    },
    #[serde(rename = "summary")]
    Summary {
        ts: String,
        run_id: String,
        scanned: u32,
        new: u32,
        errors: u32,
        ocr_pages: u32,
        ocr_failures: u32,
        ocr_p50_ms: Option<u64>,
        ocr_p90_ms: Option<u64>,
        ocr_max_ms: Option<u64>,
        duration_ms: u64,
    },
}

// The struct-level tag makes `write_summary` emit `"kind": "summary"`,
// matching the §3.3 record schema.
#[derive(Serialize)]
#[serde(tag = "kind", rename = "summary")]
pub struct IngestSummary {
    pub ts: String,
    pub run_id: String,
    pub scanned: u32,
    pub new: u32,
    pub errors: u32,
    pub ocr_pages: u32,
    pub ocr_failures: u32,
    pub ocr_p50_ms: Option<u64>,
    pub ocr_p90_ms: Option<u64>,
    pub ocr_max_ms: Option<u64>,
    pub duration_ms: u64,
}
```
### §4.2 Emit hook integration points
**Hook 1**: `crates/kebab-app/src/lib.rs::ingest_with_config` (line 234)
- `IngestLogWriter::open(&cfg.logging)` at function entry → `Option<Arc<Mutex<IngestLogWriter>>>`
- `IngestLogWriter::flush()` at function exit (success / error)
- Wrap the `IngestLogWriter` in `Arc<Mutex<_>>` and carry it on the `IngestOpts` passed to `ingest_with_config_opts()`
- **MEDIUM-1 (ownership)**: the emit_progress closure of `apply_pdf_ocr` captures a clone of the `Arc<Mutex<IngestLogWriter>>`, locks it, and writes. The per-asset loop is single-threaded and synchronous, so a blocking lock is safe.
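
The ownership pattern in MEDIUM-1 can be sketched with the standard library alone. Here a `Vec<u8>` stands in for the real `IngestLogWriter`, and `emit_pages` / `log_pages` are hypothetical names; the JSON line is written by hand only to keep the sketch free of external crates.

```rust
use std::io::Write;
use std::sync::{Arc, Mutex};

/// Hypothetical driver: calls the progress callback once per page,
/// like a synchronous per-asset OCR loop would.
fn emit_pages<F: FnMut(u32, u64)>(mut emit_progress: F) {
    for (page, ms) in [(1u32, 1500u64), (2, 900)] {
        emit_progress(page, ms);
    }
}

/// Shares one writer between the caller and the progress closure.
fn log_pages() -> String {
    // Vec<u8> stands in for the real log writer.
    let writer: Arc<Mutex<Vec<u8>>> = Arc::new(Mutex::new(Vec::new()));
    let for_closure = Arc::clone(&writer);

    emit_pages(move |page, ms| {
        // Blocking lock is fine here: the loop is single-threaded.
        let mut w = for_closure.lock().expect("log writer lock poisoned");
        writeln!(w, "{{\"kind\":\"ocr\",\"page\":{page},\"ms\":{ms}}}")
            .expect("writing to a Vec cannot fail");
    });

    // The caller still owns a handle and can read or flush afterwards.
    let bytes = writer.lock().expect("log writer lock poisoned").clone();
    String::from_utf8(bytes).expect("log lines are valid UTF-8")
}
```

Because the closure owns only a clone of the `Arc`, the caller keeps its own handle for the final summary write and flush.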
**Hook 2**: OCR progress event (pdf_ocr_apply.rs)
- **HIGH-1 (PdfOcrProgress::Finished extend)**: extend the `PdfOcrProgress::Finished` variant with additive fields:
```rust
PdfOcrProgress::Finished {
page: u32,
ms: u64,
chars: u32,
skipped: bool,
// NEW:
image_byte_size: Option<u64>,
image_width: Option<u32>,
image_height: Option<u32>,
failure_reason: Option<String>, // "timeout" | "ocr_error" | "network_error" | None
}
```
- Inside the emit_progress closure, pass the ocr event to the log writer (`Arc<Mutex<IngestLogWriter>>`)
- Cascade impact on ingest_progress.v1 (`IngestEvent::PdfOcrFinished`): additive minor (backward compat)
**Hook 3**: Parse error (app.rs / ingest pipeline error path)
- Emit a `parse_error` record when `Error::PdfFormat` / `Error::ImageFormat` occurs
**Hook 4**: Skip event (kebab-source-fs/src/connector.rs)
- On a size_exceeded / gitignore / builtin_blacklist skip, pass a skip event to the log writer
**Hook 5**: Fatal error (**HIGH-4 location correction**)
- **Location**: the error return path of `crates/kebab-app/src/lib.rs::ingest_with_config_opts` (per-asset catch + final Err arm)
- Emit `LogEvent::Error` to the log writer right after `classify(err, verbose)` is invoked
- `error_wire::classify` itself stays unchanged (a pure conversion with no side effects)
### §4.3 Timestamp format
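**HIGH-3 (chrono → time crate)**: ISO 8601 UTC, millisecond precision:
- Example: `2026-05-28T01:30:01.123Z`
- Use `time::OffsetDateTime::now_utc()` with a custom format description such as `[year]-[month]-[day]T[hour]:[minute]:[second].[subsecond digits:3]Z`. The well-known `time::format_description::well_known::Rfc3339` format also yields valid timestamps, but it does not pin the fraction to three digits.
- Always UTC (independent of the system timezone)
- The workspace already uses the `time` crate (this avoids a duplicate `chrono` dependency)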
### §4.4 Backward compat
The `logging` field of the `Config` struct carries the `#[serde(default)]` attribute:
```rust
#[serde(default)]
pub logging: LoggingCfg,
```
→ when a pre-v0.20 config file has no `[logging]` section, the field is initialized with `LoggingCfg::default()` (enabled=true, dir=~/.local/state/kebab/logs)
---
## §5 Acceptance criteria
- **AC-1**: the `[logging]` section is emitted with defaults (new config or `config init`).
- **AC-2**: after running `kebab ingest`, the file `{log_dir}/ingest-{run_id}.ndjson` exists.
- **AC-3**: every line is valid JSON with a `kind` enum value and an ISO 8601 `ts`.
- **AC-4**: OCR per-page records plus a summary record (last line).
- **AC-5**: every failure type (size_exceeded / parse_error / ocr timeout) is recorded.
- **AC-6**: with `ingest_log_enabled = false`, no log file is created.
- **AC-7**: with `ingest_log_dir = "/tmp/custom"` overridden, the file is emitted at that path.
- **AC-8**: `cargo test -p kebab-app --lib ingest_log` + `cargo clippy` green.
- **AC-9** (**MEDIUM-2 actionability**): integration test: `cargo test -p kebab-app --test ingest_log_smoke -j 4 2>&1 | tail -3` → 1 passed; 0 failed. Test body:
   1. tempdir + minimal corpus (1 markdown + 1 image PDF).
   2. ingest with `[logging] ingest_log_dir = tempdir/logs`.
   3. assert: log file `tempdir/logs/ingest-{run_id}.ndjson` exists.
   4. parse each line as JSON, assert kinds = [ocr, summary] or more.
   5. last line kind = "summary" + scanned > 0 && ocr_pages > 0.
- **AC-10**: a pre-v0.20 config works with the v0.20 binary (a missing `[logging]` section falls back to defaults).
---
## §6 Risks + open questions
### Risks
- **R-1**: disk usage from accumulated log files. The user cleans them up manually; this is stated in the doc comment.
- **R-2**: run_id collision between concurrent ingests. Practically impossible given the 8-char random suffix plus timestamp. Mitigated.
- **R-3**: `{state_dir}` placeholder expansion is hand-rolled via xdg_state_dir + string replace. The existing `expand_path_with_base` does not support `{state_dir}` (LOW-2).
- **R-4**: if an error path panics, the log writer is dropped without an explicit flush. Mitigated by calling `flush()` in the `Drop` impl.
- **R-5**: workspace dependency additions: `rand` (run_id suffix); `time` (timestamp) is already a kebab-app dependency. Adding `chrono` is prohibited (HIGH-3).
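
The R-4 mitigation relies on `Drop` running during panic unwinding. The std-only sketch below illustrates that behaviour; `SharedSink`, `SketchWriter`, and the helper functions are hypothetical stand-ins for the log file and `IngestLogWriter`. It does not cover aborts (e.g. SIGKILL), where no destructor runs.

```rust
use std::io::{self, BufWriter, Write};
use std::panic::{self, AssertUnwindSafe};
use std::sync::{Arc, Mutex};

/// Hypothetical in-memory sink standing in for the log file.
#[derive(Clone, Default)]
struct SharedSink(Arc<Mutex<Vec<u8>>>);

impl Write for SharedSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let mut inner = self.0.lock().unwrap_or_else(|e| e.into_inner());
        inner.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical writer with the same Drop-flush shape as the spec.
struct SketchWriter {
    out: BufWriter<SharedSink>,
}

impl Drop for SketchWriter {
    fn drop(&mut self) {
        // Best-effort flush; runs on normal exit and during unwinding.
        let _ = self.out.flush();
    }
}

fn write_then_panic(sink: SharedSink) {
    let mut w = SketchWriter { out: BufWriter::new(sink) };
    // The line stays in the BufWriter's buffer until a flush happens.
    w.out.write_all(b"{\"kind\":\"error\"}\n").unwrap();
    panic!("simulated failure on the error path");
}

/// Returns what reached the sink after the panic was caught.
fn flushed_after_unwind() -> String {
    let sink = SharedSink::default();
    let result = panic::catch_unwind(AssertUnwindSafe(|| write_then_panic(sink.clone())));
    assert!(result.is_err());
    let bytes = sink.0.lock().unwrap_or_else(|e| e.into_inner()).clone();
    String::from_utf8(bytes).unwrap()
}
```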
### Open questions
- **OQ-1** (**promoted to the HIGH-1 design decision**): source of image_byte_size + dimensions: carried on `PdfOcrProgress::Finished` (Option A adopted). Raster image measurement already started in the Bug #11 follow-up.
- **OQ-2**: computing `ocr_p50_ms`, `ocr_p90_ms`: use the `quantiles` crate or a simple sorted vec? (decided by the plan drafter)
- **OQ-3**: where in the user-facing docs should the manual log cleanup policy be stated? (README / SMOKE / config example; decided by the plan drafter + executor)
---
## §7 References
- **Parent task**: `tasks/p10/p10-1A-5-ingest-failure-log.md`
- **Parent design**: `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` (§8 wire schema, §9 versioning cascade; both unchanged)
- **Bug #11 follow-up**: OCR image metric capture (raster path in kebab-parse-pdf)
- **Related**: `crates/kebab-app/src/error_wire.rs` (ErrorV1 emit), `crates/kebab-app/src/ingest_progress.rs` (IngestEvent)
- **Config**: `crates/kebab-config/src/lib.rs` (Config struct, expand_path helpers, xdg_state_dir)