Files
kebab/docs/superpowers/plans/2026-05-28-v0.20.x-logging-r2-plan.md
altair823 7c24734cc7 docs(superpowers): v0.20.x logging r2 spec + plan artifacts
logging round 2 (4 enhancement: image_w/h + V008 SQLite mirror + CLI inspect + retention) 의 spec/plan ACCEPT 후 round artifacts.

- spec: 751 line (ACCEPT, 7/7 critic round 1 finding + 7/7 closure r2 traceability)
- plan: 576 line (ACCEPT, 6/6 step + 13/13 AC + G1/G2/G3 plan-level resolve)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 08:04:32 +00:00

34 KiB
Raw Blame History

title, created, status, phase, target_spec, critic_r1, critic_r2, branch, step_count, commit_count
title created status phase target_spec critic_r1 critic_r2 branch step_count commit_count
v0.20.x ingest log round 2 — implementation plan 2026-05-28 DRAFT round 0 B5 plan drafter docs/superpowers/specs/2026-05-28-v0.20.x-logging-r2-spec.md .omc/reviews/2026-05-28-v0.20.x-logging-r2-spec-closure-result.md .omc/reviews/2026-05-28-v0.20.x-logging-r2-spec-closure-r2-result.md feat/ingest-log-round2-enhancements 6 5

v0.20.x ingest log round 2 — implementation plan

For agentic workers: REQUIRED SUB-SKILL — superpowers:executing-plans (or superpowers:subagent-driven-development) to implement task-by-task. Steps use - [ ] checkbox syntax. Each Step block carries its own commit boundary; Step 6 is verify-only (no commit).

Goal. Extend v0.20.0 round 1 ingest log (file-only ndjson) with four additive enhancements: (1) raster image dimensions, (2) SQLite mirror via V008 + dual-write, (3) CLI inspect ocr-stats / inspect ocr-failures + two wire schemas, (4) automatic file + SQLite retention. All non-breaking; wire schema cascades as additive minor; existing 1370 workspace test → 1375+.

Architecture. Hook the dual-write at the existing OCR emit_progress closure inside ingest_with_config_opts. File write first (durable), SQLite second (non-critical — tracing::warn! on failure). The closure pre-captures doc_id and clones Arc<SqliteStore> from App.sqlite before apply_ocr_to_pdf_pages runs (F1 + G1 resolution). Two App::inspect_* methods follow the facade rule via *_with_config companions (G2). Runtime introspection emit WIRE_SCHEMAS gets two entries; the JSON Schema file schema.schema.json is untouched (G3: it's pattern-based).

Tech stack. Rust 2024, rusqlite (existing), image crate (jpeg feature add), time (existing). No new workspace deps.

Spec contract. docs/superpowers/specs/2026-05-28-v0.20.x-logging-r2-spec.md (751 line). Implements 13 AC + resolves 3 closure-r2 findings (G1/G2/G3) + 6 closure-r1 findings (F1F6).


File map

Modify (10):

  • crates/kebab-app/Cargo.tomlimage features += "jpeg" (F3).
  • crates/kebab-app/src/pdf_ocr_apply.rsextract_image_dimensions helper + fill 6 emit points (Enhancement 1).
  • crates/kebab-app/src/ingest_log.rsLogEvent::Ocr adds doc_id; IngestLogWriter::open runs cleanup_old_logs; percentiles returns (p50, p90, p99, max).
  • crates/kebab-app/src/lib.rs::ingest_with_config_opts — pre-capture doc_id_for_log, clone Arc<SqliteStore>, dual-write inside closure (F1 + G1).
  • crates/kebab-app/src/schema.rsWIRE_SCHEMAS += "ocr_stats.v1", "ocr_failures.v1" (G3 runtime emit).
  • crates/kebab-app/src/app.rsinspect_ocr_stats / inspect_ocr_failures + *_with_config companion pair (G2) + 4 wire structs.
  • crates/kebab-config/src/lib.rsLoggingCfg += keep_recent_runs: u32, retention_days: u32 (#[serde(default=...)], defaults 100 + 30, AC-9).
  • crates/kebab-store-sqlite/src/store.rsrecord_pdf_ocr_event + prune_pdf_ocr_events (Mutex lock pattern, F2).
  • crates/kebab-cli/src/main.rsInspectWhat::OcrStats + InspectWhat::OcrFailures variants + dispatch arms calling *_with_config (G2).
  • integrations/claude-code/kebab/SKILL.md — list ocr_stats.v1 + ocr_failures.v1 (AC-13).

Create (5):

  • migrations/V008__pdf_ocr_events.sql — table + 3 indices.
  • docs/wire-schema/v1/ocr_stats.schema.jsonocr_stats.v1 JSON Schema (verbatim from spec §4.3).
  • docs/wire-schema/v1/ocr_failures.schema.jsonocr_failures.v1 JSON Schema (verbatim from spec §4.3).
  • crates/kebab-app/tests/ocr_inspect_smoke.rs — AC-4/5/6/13.
  • crates/kebab-store-sqlite/tests/pdf_ocr_events_insert_smoke.rs — AC-2/3/8.

Do NOT modify:

  • docs/wire-schema/v1/schema.schema.jsonwire.schemas is pattern-based (G3).
  • Spec ACCEPT frozen. Round 1 spec / PDF OCR spec / parent design doc untouched.

Closure r2 plan-level resolution

G1 (MEDIUM) — SqliteStore::open signature + double-open risk

Spec §4.7 snippet SqliteStore::open(&cfg.storage.sqlite)? fails to compile — real signature at crates/kebab-store-sqlite/src/store.rs:123 is pub fn open(config: &kebab_config::Config) -> Result<Self>. Worse, a fresh open() would create a second connection contending with App::sqlite for the single-writer lock and re-running migrations.

Resolution (Step 3.3). Inside ingest_with_config_opts, beside the existing lw_for_ocr pre-clone at crates/kebab-app/src/lib.rs:1931, add:

let store_for_ocr: Arc<SqliteStore> = Arc::clone(&app.sqlite);

App.sqlite is pub(crate) sqlite: Arc<SqliteStore> at crates/kebab-app/src/app.rs:123 — visible inside the crate. The closure moves store_for_ocr. No new SqliteStore::open call anywhere.

G2 (LOWMEDIUM) — facade rule for inspect_ocr_*

Spec §4.4 declares only App::inspect_ocr_stats(&self). CLAUDE.md "the facade rule" requires a *_with_config(cfg, …) companion. HOTFIXES P3-5 and P4-3 record two regressions of this exact shape.

Resolution (Step 4.5). Both methods ship as a pair:

pub fn inspect_ocr_stats(&self) -> Result<OcrStatsV1> {
    self.inspect_ocr_stats_with_config(&self.config)
}
#[doc(hidden)]
pub fn inspect_ocr_stats_with_config(&self, _cfg: &Config) -> Result<OcrStatsV1> {  }

Same shape for inspect_ocr_failures. The CLI dispatch (Step 4.6) calls the explicit _with_config form so --config <path> is honored — current ingest path already threads cfg via App::open_with_config(cfg.clone()).

G3 (LOW) — wire schema runtime emit, not JSON Schema file

Spec §4.3 line 299-301 says "schema.schema.json 갱신" but docs/wire-schema/v1/schema.schema.json declares "schemas": { "type": "array", "items": { "type": "string", "pattern": "^[a-z_]+\\.v[0-9]+$" } } — pattern-based, not enum. The two new strings already match.

Resolution (Step 4.2). Extend WIRE_SCHEMAS const at crates/kebab-app/src/schema.rs:104-119 (runtime emit source for kebab schema --json). Two new schema files (ocr_stats.schema.json + ocr_failures.schema.json) are still created as the external contract; the meta schema.schema.json is untouched.


Closure r1 finding resolution recap

Finding Severity Plan step Resolution
F1 doc_id NULL in dual-write HIGH Step 3 pre-capture canonical.doc_id.0.clone()
F2 SqliteStore conn lock pattern MEDIUM Step 2 self.conn.lock().expect("sqlite lock poisoned")
F3 image crate jpeg feature MEDIUM Step 1 features = ["png", "jpeg"]
F4 percentile + p99 LOW Step 4 extend ingest_log::percentiles to 4-tuple
F5 OR-on-stale wording LOW Step 5 inline comment in cleanup_old_logs
F6 V008 rollback LOW (spec §6 R-2) already in spec, no plan action

Step 1 — image_width / image_height capture (Enhancement 1)

Implements: AC-1. Commit 1/5.

Files: Modify crates/kebab-app/Cargo.toml, crates/kebab-app/src/pdf_ocr_apply.rs.

  • 1.1 Cargo feature. image = { version = "0.25", default-features = false, features = ["png", "jpeg"] } (was ["png"] only). Rationale F3.

  • 1.2 Failing unit tests in pdf_ocr_apply::tests:

    • extract_image_dimensions_valid_jpeg — load 16x12 fixture JPEG, assert Some((16, 12)).
    • extract_image_dimensions_corrupt_returns_noneb"not a jpeg"None.

    If fixtures/pdf_ocr/sample_16x12.jpg doesn't exist, generate via convert -size 16x12 xc:white … and commit. If a comparable small JPEG already lives under fixtures/, reuse it.

  • 1.3 Helper impl. Add near the top of pdf_ocr_apply.rs:

    fn extract_image_dimensions(jpeg_bytes: &[u8]) -> Option<(u32, u32)> {
        use image::ImageReader;
        ImageReader::new(std::io::Cursor::new(jpeg_bytes))
            .with_guessed_format().ok()?
            .into_dimensions().ok()
    }
    

    Returns Option so corrupt JPEG falls back to (None, None) (R-4 mitigation).

  • 1.4 Fill 6 emit points at lines 149, 155, 181, 188, 259, 265 (probe-confirmed). Each currently hardcodes image_width: None, image_height: None. Pattern:

    let (image_width, image_height) = extract_image_dimensions(&page_image_bytes)
        .map(|(w, h)| (Some(w), Some(h)))
        .unwrap_or((None, None));
    

    Then pass image_width / image_height (and image_byte_size: Some(page_image_bytes.len() as u64)) into the PdfOcrProgress::Finished constructor. For skip / error paths where page_image_bytes isn't in scope, leave None — AC-1 only requires non-null on successful raster decode.

  • 1.5 Verify. cargo test -p kebab-app --lib pdf_ocr_apply::tests + cargo test -p kebab-app --test pdf_ocr_roundtrip.

  • 1.6 Commit. Use HEREDOC form so the body formats correctly:

    feat(app): capture image_width/height in PDF OCR raster decode (Enhancement 1)
    
    Add extract_image_dimensions(jpeg_bytes) helper using image::ImageReader
    and fill the 6 PdfOcrProgress::Finished emit points in pdf_ocr_apply.rs.
    Result: LogEvent::Ocr now carries non-null image_width/image_height on
    successful raster decode, enabling future size-conditioned timeout tuning.
    
    Closure r1 F3: kebab-app/Cargo.toml image features += "jpeg" (explicit
    rather than relying on feature unification via kebab-parse-image).
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    

Step 2 — V008 migration + SqliteStore record/prune (Enhancement 2 — schema half)

Implements: AC-2, AC-8 prune logic. Commit 2/5.

Files: Create migrations/V008__pdf_ocr_events.sql, crates/kebab-store-sqlite/tests/pdf_ocr_events_insert_smoke.rs. Modify crates/kebab-store-sqlite/src/store.rs.

  • 2.1 V008 SQL per spec §4.2:

    CREATE TABLE pdf_ocr_events (
      id INTEGER PRIMARY KEY,
      run_id TEXT NOT NULL, ts TEXT NOT NULL,
      doc_id TEXT, doc_path TEXT NOT NULL, page INTEGER NOT NULL,
      image_byte_size INTEGER, image_width INTEGER, image_height INTEGER,
      ms INTEGER NOT NULL, chars INTEGER NOT NULL,
      success INTEGER NOT NULL, reason TEXT, ocr_engine TEXT NOT NULL
    );
    CREATE INDEX idx_pdf_ocr_events_doc_id ON pdf_ocr_events(doc_id);
    CREATE INDEX idx_pdf_ocr_events_run_id ON pdf_ocr_events(run_id);
    CREATE INDEX idx_pdf_ocr_events_ts ON pdf_ocr_events(ts);
    
  • 2.2 Failing smoke tests in new pdf_ocr_events_insert_smoke.rs:

    • v008_pdf_ocr_events_table_exists — open TempDir SqliteStore, run_migrations, SELECT name FROM sqlite_master WHERE name='pdf_ocr_events'.
    • record_and_prune_pdf_ocr_event — insert 2 rows (ts = 1970-… and ts = 2026-…), prune_pdf_ocr_events(0) returns 1, only the future-stamped row survives.

    Use existing per-test Config builder (probe crates/kebab-store-sqlite/tests/*.rs for prior smoke pattern; if absent, build Config::default() with data_dir = tempdir.path()).

  • 2.3 Implement two pub fn on impl SqliteStore in store.rs, signatures per spec §4.2 line 186-228:

    #[allow(clippy::too_many_arguments)]
    pub fn record_pdf_ocr_event(&self,
        run_id: &str, ts: &str, doc_id: Option<&str>, doc_path: &str, page: u32,
        image_byte_size: Option<u64>, image_width: Option<u32>, image_height: Option<u32>,
        ms: u64, chars: u32, success: bool, reason: Option<&str>, ocr_engine: &str,
    ) -> anyhow::Result<()>;
    
    pub fn prune_pdf_ocr_events(&self, retention_days: u32) -> anyhow::Result<u64>;
    

    Body: F2 lock pattern let conn = self.conn.lock().expect("sqlite lock poisoned"); followed by INSERT INTO pdf_ocr_events (…) VALUES (…) / DELETE FROM pdf_ocr_events WHERE ts < ? (cutoff from time::OffsetDateTime::now_utc() - Duration::days(retention_days as i64), RFC 3339 format). prune returns the deleted row count (u64).

  • 2.4 Verify. cargo test -p kebab-store-sqlite --test pdf_ocr_events_insert_smoke.

  • 2.5 Commit.

    feat(store): V008 pdf_ocr_events migration + record/prune API (Enhancement 2)
    
    Add migrations/V008__pdf_ocr_events.sql with the events table + 3
    indices (doc_id, run_id, ts). SqliteStore gains two pub fn:
    record_pdf_ocr_event (insert one OCR sample) and prune_pdf_ocr_events
    (delete rows older than retention_days; returns the affected row
    count). Both follow the existing Mutex<Connection> lock pattern.
    
    Wiring into ingest path lands in the next commit.
    
    Closure r1 F2: explicit lock acquisition in both methods.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    

Step 3 — Dual-write integration (Enhancement 2 — wiring half)

Implements: AC-3. Resolves F1 + G1. Commit 3/5.

Files: Modify crates/kebab-app/src/ingest_log.rs, crates/kebab-app/src/lib.rs.

  • 3.1 Extend LogEvent::Ocr at ingest_log.rs:116 — add doc_id: Option<&'a str> as the second field (after ts, before doc_path). Additive Serde — round 1 ndjson logs deserialize with doc_id = None (Serde's default Option handling, no #[serde(default)] needed because round 1 readers don't exist outside the test suite).

  • 3.2 Failing integration test appended to pdf_ocr_events_insert_smoke.rs:

    • ingest_dual_write_doc_id_matches_ndjson — run scanned-PDF ingest via App::open_with_config, then for every emitted OCR event assert pdf_ocr_events.doc_id equals the ndjson LogEvent::Ocr.doc_id, and both equal canonical.doc_id. Reuse the pdf_ocr_roundtrip test scaffold for the scanned fixture.
  • 3.3 Pre-capture in ingest_with_config_opts. Immediately after the existing let doc_path_for_log = asset.workspace_path.0.clone(); at lib.rs:1935, add:

    let doc_id_for_log: String = canonical.doc_id.0.clone();
    let store_for_ocr: Arc<SqliteStore> = Arc::clone(&app.sqlite);  // G1
    let run_id_for_log: String = lw_for_ocr.as_ref()
        .and_then(|lw| lw.lock().ok().map(|w| w.run_id().to_string()))
        .unwrap_or_default();
    

    canonical is bound at lib.rs:1912 via app.extract_for(...) — in scope before the closure. All three captures are owned values / Arc → Send + 'static (rusqlite Connection is not Send, but Mutex<Connection> inside Arc<SqliteStore> is Send + Sync).

  • 3.4 Wire into closure. Inside the existing PdfOcrProgress::Finished arm at lib.rs:1950-2000, replace the current file-write block with: (a) bind ts_for_event = ingest_log::now_ts(), (b) pass doc_id: Some(&doc_id_for_log) into the existing LogEvent::Ocr construction, (c) immediately after the file write, call store_for_ocr.record_pdf_ocr_event(&run_id_for_log, &ts_for_event, Some(&doc_id_for_log), &doc_path_for_log, page, image_byte_size, image_width, image_height, ms, chars, success, reason_str, engine.engine_name()) wrapped in if let Err(e) = … { tracing::warn!(target: "kebab-app", "sqlite ocr event insert failed: {e}"); }. File write first → SQLite second (R-1 ordering).

  • 3.5 Update existing tests that consume LogEvent::Ocr:

    • ingest_log_smoke (and any sibling smoke that round-trips ndjson) — assertions must accept the new doc_id field. Either ignore or assert Some(_).
    • logging_roundtrip integration test — same.
  • 3.6 Verify. cargo test -p kebab-app --test pdf_ocr_roundtrip ingest_log_smoke logging_roundtrip + cargo test -p kebab-store-sqlite --test pdf_ocr_events_insert_smoke ingest_dual_write_doc_id_matches_ndjson.

  • 3.7 Commit.

    feat(app): dual-write PDF OCR events to SQLite + ndjson (Enhancement 2 wiring)
    
    Pre-capture canonical.doc_id and Arc<SqliteStore> before the OCR
    emit_progress closure so both the ndjson file and the SQLite mirror
    carry the same doc_id for every event. File write is durable
    (errors propagate); SQLite insert is non-critical (tracing::warn on
    failure, ingest does not abort) per spec R-1.
    
    LogEvent::Ocr gains a doc_id: Option<&str> field as an additive
    Serde change — round 1 ndjson logs deserialize with doc_id=None.
    
    Closure r1 F1: doc_id NULL in dual-write resolved via
    let doc_id_for_log = canonical.doc_id.0.clone() pre-capture.
    Closure r2 G1: Arc::clone(&app.sqlite) reused instead of opening a
    second SqliteStore — eliminates double-open lock contention and
    duplicate migration runs.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    

Step 4 — CLI inspect commands + wire schemas (Enhancement 3)

Implements: AC-4, AC-5, AC-6, AC-11 (ocr_inspect_smoke), AC-13. Resolves G2 + G3 + F4. Commit 4/5.

Files: Modify crates/kebab-app/src/ingest_log.rs, crates/kebab-app/src/schema.rs, crates/kebab-app/src/app.rs, crates/kebab-cli/src/main.rs, integrations/claude-code/kebab/SKILL.md. Create docs/wire-schema/v1/ocr_stats.schema.json, docs/wire-schema/v1/ocr_failures.schema.json, crates/kebab-app/tests/ocr_inspect_smoke.rs.

  • 4.1 JSON Schema files. Author the two files verbatim from spec §4.3 line 234-297. These are the external contract for downstream consumers (claude-code skill, MCP).

  • 4.2 WIRE_SCHEMAS runtime emit (G3). In crates/kebab-app/src/schema.rs:104, append two entries to the &[&str] literal: "ocr_stats.v1", "ocr_failures.v1". Do NOT edit docs/wire-schema/v1/schema.schema.json — its wire.schemas is pattern-based.

  • 4.3 Percentile helper (F4). Change crates/kebab-app/src/ingest_log.rs:200 pub(crate) fn percentiles from returning (Option<u64>, Option<u64>, Option<u64>) (p50, p90, max) to (Option<u64>, Option<u64>, Option<u64>, Option<u64>) (p50, p90, p99, max). Implementation: sort in-memory, pick at index ((n-1) * q).round() for q ∈ {0.50, 0.90, 0.99}, *sorted.last() for max. Update all callers inside kebab-app (1-2 sites) to destructure 4 elements. IngestSummary (round 1, 3-percentile) stays as-is — p99 surfaces only via inspect ocr-stats.

  • 4.4 Wire types in crates/kebab-app/src/app.rs (or new sibling inspect_ocr.rs under the same module umbrella):

    • OcrStatsV1: schema_version: &'static str, total_events, total_runs, success_count, failure_count: u64; success_rate: f64; p50_ms, p90_ms, p99_ms, max_ms: Option<u64>; by_engine: BTreeMap<String, u64>; by_doc: Vec<OcrStatsByDoc>.
    • OcrStatsByDoc: doc_id: String, failure_count, success_count: u64, p90_ms: Option<u64>.
    • OcrFailuresV1: schema_version: &'static str, doc_id: Option<String>, failure_count: u64, failures: Vec<OcrFailureRow>.
    • OcrFailureRow: ts: String, page: u32, ms: u64, reason: String, image_byte_size: Option<u64>.

    All #[derive(serde::Serialize)]. schema_version uses string literal "ocr_stats.v1" / "ocr_failures.v1" at construction.

  • 4.5 Facade pair (G2) on impl App:

    pub fn inspect_ocr_stats(&self) -> Result<OcrStatsV1> { self.inspect_ocr_stats_with_config(&self.config) }
    #[doc(hidden)]
    pub fn inspect_ocr_stats_with_config(&self, _cfg: &Config) -> Result<OcrStatsV1>;
    
    pub fn inspect_ocr_failures(&self, doc_id: Option<&str>, limit: usize) -> Result<OcrFailuresV1> {
        self.inspect_ocr_failures_with_config(&self.config, doc_id, limit)
    }
    #[doc(hidden)]
    pub fn inspect_ocr_failures_with_config(&self, _cfg: &Config, doc_id: Option<&str>, limit: usize) -> Result<OcrFailuresV1>;
    

    Implementation reads from self.sqlite (the App's Arc<SqliteStore>):

    • inspect_ocr_stats_with_config:

      1. SELECT COUNT(*), SUM(CASE WHEN success=1 …), SUM(CASE WHEN success=0 …), COUNT(DISTINCT run_id) FROM pdf_ocr_events → 4 counters.
      2. SELECT ms FROM pdf_ocr_events WHERE success=1 ORDER BY msVec<u64>ingest_log::percentiles(&samples) → 4-tuple.
      3. SELECT ocr_engine, COUNT(*) FROM pdf_ocr_events GROUP BY ocr_engineBTreeMap.
      4. SELECT doc_id, SUM(success=0), SUM(success=1) FROM pdf_ocr_events WHERE doc_id IS NOT NULL GROUP BY doc_id ORDER BY 2 DESC LIMIT 10Vec<OcrStatsByDoc> with p90_ms = None (per-doc p90 deferred — see open question #3).
      5. success_rate = success_count / total_events (guard zero-division).
    • inspect_ocr_failures_with_config: SELECT ts, page, ms, reason, image_byte_size FROM pdf_ocr_events WHERE success=0 [AND doc_id=?] ORDER BY ts DESC LIMIT ?. Map to OcrFailureRow. failure_count = failures.len().

    Connection access: use existing pub(crate) Mutex<Connection> (either via an existing closure accessor on SqliteStore or by adding a pub(crate) fn conn(&self) -> &Mutex<Connection> — probe store.rs for prior pattern and follow it).

  • 4.6 CLI variants in crates/kebab-cli/src/main.rs:356 (extend InspectWhat):

    OcrStats,
    OcrFailures {
        #[arg(long)] doc_id: Option<String>,
        #[arg(long, default_value_t = 10)] limit: usize,
    },
    

    Dispatch arms at main.rs:671 after the existing Doc / Chunk arms — for both, construct App::open_with_config(cfg.clone())?, call the *_with_config(&cfg, …) form (G2 explicit cfg threading), then route through the existing print_json_or_text helper (or whatever the file's pattern is — probe before naming).

  • 4.7 Integration test in new crates/kebab-app/tests/ocr_inspect_smoke.rs:

    • ocr_stats_after_scanned_pdf_ingest — TempDir KB, ingest the existing scanned-PDF fixture, assert stats.schema_version == "ocr_stats.v1", stats.total_events >= 1, 0.0 ≤ stats.success_rate ≤ 1.0.
    • ocr_failures_filter_by_doc_id — requires a fixture that produces at least one failure (probe fixtures/pdf/ for a corrupt-page or timeout-inducing PDF; if none exists, mark the test #[ignore] with a // TODO(v0.20.y): fixture and rely on record_and_prune_pdf_ocr_event for the synthetic-row coverage).
    • skill_md_lists_new_schemasread_to_string("integrations/claude-code/kebab/SKILL.md").contains("ocr_stats.v1") (AC-13 mechanical).
  • 4.8 SKILL.md sync — append ocr_stats.v1 and ocr_failures.v1 to the wire schema enumeration block (currently at SKILL.md lines 37-147 per probe), each with a one-line description.

  • 4.9 Verify. cargo test -p kebab-app --test ocr_inspect_smoke + cargo build -p kebab-cli (compile-only confirms variant + dispatch wire-up).

  • 4.10 Commit.

    feat(cli): kebab inspect ocr-stats + ocr-failures (Enhancement 3 + wire schema additive minor)
    
    Two new wire schemas land as additive minor: ocr_stats.v1 (corpus-wide
    aggregate — total_events, success_rate, p50/p90/p99/max_ms, by_engine,
    top-10 by_doc by failure count) and ocr_failures.v1 (per-doc or
    corpus-wide recent failures, with --doc-id + --limit). Both ship via
    new CLI subcommands `kebab inspect ocr-stats` / `inspect ocr-failures`.
    
    App gains four facade methods: inspect_ocr_stats /
    inspect_ocr_failures plus their *_with_config companions — required by
    CLAUDE.md "the facade rule" so `--config <path>` is honored. The CLI
    dispatch arms thread cfg explicitly into the _with_config form.
    
    Runtime introspection emit (WIRE_SCHEMAS in schema.rs) gains two
    entries; the meta JSON Schema (schema.schema.json) is untouched
    because its wire.schemas is pattern-based, not enum-based.
    
    ingest_log::percentiles extended to (p50, p90, p99, max). p99 surfaces
    only via inspect ocr-stats; IngestSummary (round 1) stays 3-percentile.
    
    SKILL.md synced with the two new schemas (AC-13).
    
    Closure r2 G2 (facade *_with_config pair) + G3 (runtime emit, not
    meta schema file) + closure r1 F4 (p99) resolved.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    

Step 5 — Log retention (Enhancement 4)

Implements: AC-7, AC-8 wire-up, AC-9, AC-12. Resolves F5. Commit 5/5.

Files: Modify crates/kebab-config/src/lib.rs, crates/kebab-app/src/ingest_log.rs, crates/kebab-app/src/lib.rs, docs/SMOKE.md.

  • 5.1 Failing backward-compat test in crates/kebab-config tests:

    • old_logging_config_parses_with_defaults — toml input with only ingest_log_enabled + ingest_log_dir, assert cfg.logging.keep_recent_runs == 100, retention_days == 30.
  • 5.2 LoggingCfg extension at crates/kebab-config/src/lib.rs:438-462:

    pub struct LoggingCfg {
        pub ingest_log_enabled: bool,
        pub ingest_log_dir: PathBuf,
        #[serde(default = "default_keep_recent_runs")] pub keep_recent_runs: u32,
        #[serde(default = "default_retention_days")] pub retention_days: u32,
    }
    fn default_keep_recent_runs() -> u32 { 100 }
    fn default_retention_days() -> u32 { 30 }
    

    Plus impl Default for LoggingCfg returning the same defaults. AC-12 automatic via Config::default()toml::to_string_pretty at crates/kebab-app/src/lib.rs:142, 178 (kebab init path).

  • 5.3 Failing cleanup tests in ingest_log::tests:

    • cleanup_keeps_recent_n_drops_old — 5 files, 3 fresh + 2 sixty-days-old; cleanup_old_logs(dir, 3, 30) → 3 freshest survive.
    • cleanup_drops_stale_even_within_count — 2 files, both keep_recent=10 but 90 days old; cleanup_old_logs(dir, 10, 30) → all dropped (F5 OR-on-stale).

    Use filetime crate (or std set_file_mtime polyfill) for backdated mtimes.

  • 5.4 Implement cleanup_old_logs at module-level in ingest_log.rs:

    pub(crate) fn cleanup_old_logs(log_dir: &Path, keep_recent: u32, retention_days: u32) -> Result<()>;
    

    Body: read_dir → filter ingest-*.ndjson → sort by modified() descending → walk with index; delete iff (idx >= keep_recent) OR (modified <= cutoff); equivalently keep iff (idx < keep_recent) AND (modified > cutoff). Inline that as a comment (F5 wording fix).

  • 5.5 Hook into IngestLogWriter::open. Before creating the new ingest-*.ndjson file, call cleanup_old_logs(&log_dir, cfg.keep_recent_runs, cfg.retention_days). On error, tracing::warn! and continue — cleanup is non-critical (R-6 mitigation).

  • 5.6 SQLite prune hook in crates/kebab-app/src/lib.rs::ingest_with_config_opts (near the existing IngestLogWriter::open call site):

    let _pruned = app.sqlite
        .prune_pdf_ocr_events(app.config.logging.retention_days)
        .unwrap_or_else(|e| {
            tracing::warn!(target: "kebab-app", "pdf_ocr_events prune failed: {e}");
            0
        });
    

    One prune per ingest run — matches file-side cadence.

  • 5.7 Doc sync. docs/SMOKE.md config example [logging] block gets the two new fields with their defaults. README / HANDOFF / ARCHITECTURE: skip — no user-visible surface beyond CLI inspect (covered in Step 4) and config keys (SMOKE-only).

  • 5.8 Verify. cargo test -p kebab-config -- logging + cargo test -p kebab-app --lib ingest_log::tests + cargo test -p kebab-store-sqlite --test pdf_ocr_events_insert_smoke record_and_prune.

  • 5.9 Commit.

    feat(app): log retention — keep_recent_runs + retention_days (Enhancement 4)
    
    LoggingCfg gains two fields with serde defaults: keep_recent_runs
    (default 100, top-N file retention) and retention_days (default 30,
    time-based retention for both ndjson files and the SQLite mirror).
    
    IngestLogWriter::open now runs cleanup_old_logs before creating a new
    ingest-*.ndjson — delete iff (idx >= keep_recent) OR (modified <=
    cutoff). ingest_with_config_opts also calls
    SqliteStore::prune_pdf_ocr_events(retention_days) at ingest start so
    the SQLite mirror tracks the same retention window.
    
    Backward compat (AC-9): both new fields use #[serde(default = ...)],
    so a pre-v0.20.x config with only [logging] ingest_log_enabled +
    ingest_log_dir parses unchanged. kebab init writes the new defaults
    automatically via Config::default() → toml::to_string_pretty (AC-12).
    
    docs/SMOKE.md config example synced.
    
    Closure r1 F5: explicit OR-on-stale comment inside cleanup_old_logs.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    

Step 6 — Final sanity (no commit)

Implements: AC-10, AC-11 cumulative.

  • 6.1 Workspace test.

    export CARGO_TARGET_DIR=/build/out/cargo-target/target
    cargo test --workspace --no-fail-fast -j 1 > /tmp/wstest.out 2>&1
    grep -E "^test result:" /tmp/wstest.out | awk '
      { for(i=1;i<=NF;i++){
          if($i=="passed;") pass+=$(i-1);
          if($i=="failed;") fail+=$(i-1);
        } }
      END { print "passed:", pass, "failed:", fail }
    '
    

    Expected: passed >= 1375, failed = 0.

  • 6.2 Clippy. cargo clippy --workspace --all-targets -- -D warnings → 0 warnings.

  • 6.3 Wire schema additive check. ./target/release/kebab schema --json | jq '.wire.schemas[]' | grep -E "ocr_stats|ocr_failures" returns both strings.

  • 6.4 Dogfood smoke (optional). Per docs/SMOKE.md against TempDir KB: ingest a scanned PDF → kebab inspect ocr-stats --json | jq .schema_version returns "ocr_stats.v1"sqlite3 .../kebab.sqlite 'SELECT COUNT(*) FROM pdf_ocr_events' non-zero.

  • 6.5 cargo fmt --all + git status confirms all intended files are staged. No commit at Step 6.


§3 Cumulative verifier checklist (13 row)

AC Description Verifier Step
AC-1 image_w/h non-null on raster decode extract_image_dimensions_* units + pdf_ocr_roundtrip (extended) Step 1
AC-2 V008 table exists post-migration v008_pdf_ocr_events_table_exists Step 2
AC-3 ndjson ↔ SQLite row count + doc_id match ingest_dual_write_doc_id_matches_ndjson Step 3
AC-4 inspect ocr-stats --json schema_version ocr_stats_after_scanned_pdf_ingest Step 4
AC-5 inspect ocr-failures --doc-id <id> returns rows ocr_failures_filter_by_doc_id (F1 makes this passable) Step 3 + Step 4
AC-6 inspect ocr-failures --json corpus-wide sibling assertion in inspect smoke Step 4
AC-7 keep_recent_runs=N → oldest deleted cleanup_keeps_recent_n_drops_old Step 5
AC-8 retention_days=0 → SQLite old rows deleted record_and_prune_pdf_ocr_event Step 2 + Step 5 hook
AC-9 old config parses with defaults old_logging_config_parses_with_defaults Step 5
AC-10 workspace test + clippy green Step 6 final sanity Step 6
AC-11 new integration test binaries ocr_inspect_smoke + pdf_ocr_events_insert_smoke Step 3 + Step 4
AC-12 kebab init writes new defaults auto via Config::default()toml::to_string_pretty Step 5
AC-13 SKILL.md lists new schemas skill_md_lists_new_schemas Step 4

§4 Risk → resolution map

Risk / finding Resolution location Plan step
R-1 dual-write race file first, SQLite warn-only Step 3.4
R-2 V008 rollback spec §6 R-2 documents manual SQL (no plan action)
R-3 concurrent cleanup cleanup only at IngestLogWriter::open Step 5.4
R-4 corrupt JPEG decode Option fallback Step 1.3
R-5 old consumer reads new schema schema_version skip (schema design)
R-6 concurrent cleanup vs tail cleanup only at ingest start, top-N recent safe Step 5.4
R-7 doc_id NULL (F1) pre-capture canonical.doc_id.0.clone() Step 3.3
G1 SqliteStore::open + double-open Arc::clone(&app.sqlite) Step 3.3
G2 facade rule violation *_with_config companion pair Step 4.5 + 4.6
G3 schema.schema.json wording extend WIRE_SCHEMAS const Step 4.2

§5 Open questions for executor

LOW priority — judgment calls, none block the spec contract.

  1. SqliteStore connection accessor. Step 4.5 reads from self.sqlite's inner Mutex<Connection>. Probe store.rs for an existing with_conn(|c| …) closure-form accessor and prefer it (tight lock scope). If none exists, add a minimal pub(crate) fn conn(&self) -> &Mutex<Connection> — visibility unchanged.
  2. p99 in IngestSummary. percentiles now returns 4-tuple. Round 1 IngestSummary carries p50/p90/max only. Adding ocr_p99_ms is out of scope (no AC); leave it surfaced only via inspect ocr-stats. Executor may fold it in if the diff is one line.
  3. OcrStatsByDoc.p90_ms = None. Per-doc p90 deferred — wire schema allows ["integer", "null"]. If a clean O(n)-per-doc sort fits, executor may implement; otherwise leave None.
  4. JPEG fixture. Step 1.2 commits a 16x12 JPEG. If fixtures/ already carries a comparable small JPEG, reuse + adjust assertions.
  5. ingest_log_smoke adjustment. Step 3.5 likely needs assertion tweaks for the new doc_id field — light touch (Option field omitted when None).
  6. Failure-producing PDF fixture for AC-5. If no fixture reliably produces an OCR failure, mark ocr_failures_filter_by_doc_id as #[ignore] with a TODO; the synthetic-row coverage in record_and_prune_pdf_ocr_event still exercises the WHERE-doc_id path.

§6 References

  • Spec contract: docs/superpowers/specs/2026-05-28-v0.20.x-logging-r2-spec.md (751 line, ACCEPT, frozen).
  • Critic r1: .omc/reviews/2026-05-28-v0.20.x-logging-r2-spec-closure-result.md (6 finding, applied in r1c).
  • Critic r2: .omc/reviews/2026-05-28-v0.20.x-logging-r2-spec-closure-r2-result.md (3 G-finding, plan-level resolution above).
  • Parent design: docs/superpowers/specs/2026-04-27-kebab-final-form-design.md (§8 dep graph, §9 version cascade).
  • Round 1 spec: docs/superpowers/specs/2026-05-28-v0.20-ingest-log-spec.md (frozen).
  • PDF OCR spec: docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md (frozen).
  • Branch: feat/ingest-log-round2-enhancements (HEAD 89d334a).
  • HOTFIXES: tasks/HOTFIXES.md — any post-merge deviations land here.
  • CLAUDE.md sections: §The facade rule (G2), §Versioning cascade (additive minor → no release trigger), §Naming + paths.

§7 Constraints

  1. No branch change. All 5 commits land on feat/ingest-log-round2-enhancements.
  2. Spec frozen. No edits to the ACCEPT spec from this plan. Deviations → tasks/HOTFIXES.md.
  3. Wire schema additive minor. Two new *.v1 schemas + two new WIRE_SCHEMAS entries. No major bump.
  4. Regression budget 0. Baseline 1370 workspace tests stay green; round 2 adds ≥5 new.
  5. Worker protocol — subagent skip. Executor runs in a single subagent-driven-development pass without spawning nested workers.
  6. Length budget. 500-700 line plan (this file ≈ 600).
  7. Build path. export CARGO_TARGET_DIR=/build/out/cargo-target/target; -j 4 default, -j 1 for the workspace sanity pass.
  8. Commit cadence. 5 commits (Steps 1-5). Step 6 verify-only.
  9. Doc sync. README untouched (no surface beyond inspect subcommands — append to README's 명령 table only if it already lists inspect); HANDOFF gets one line post-merge if load-bearing; ARCHITECTURE untouched; SMOKE config example updated in Step 5.7.
  10. Release trigger. Wire additive minor → NOT a release trigger per CLAUDE.md §Versioning cascade. Defer bump to a dogfood-driven chore: bump 0.20.x → 0.20.y later.