Round artifacts after spec/plan ACCEPT for logging round 2 (4 enhancements: image_w/h + V008 SQLite mirror + CLI inspect + retention).
- spec: 751 lines (ACCEPT, 7/7 critic round 1 findings + 7/7 closure r2 traceability)
- plan: 576 lines (ACCEPT, 6/6 steps + 13/13 AC + G1/G2/G3 plan-level resolve)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| title | created | status | phase | target_spec | critic_r1 | critic_r2 | branch | step_count | commit_count |
|---|---|---|---|---|---|---|---|---|---|
| v0.20.x ingest log round 2 — implementation plan | 2026-05-28 | DRAFT round 0 | B5 plan drafter | docs/superpowers/specs/2026-05-28-v0.20.x-logging-r2-spec.md | .omc/reviews/2026-05-28-v0.20.x-logging-r2-spec-closure-result.md | .omc/reviews/2026-05-28-v0.20.x-logging-r2-spec-closure-r2-result.md | feat/ingest-log-round2-enhancements | 6 | 5 |
v0.20.x ingest log round 2 — implementation plan
For agentic workers: REQUIRED SUB-SKILL: use superpowers:executing-plans (or superpowers:subagent-driven-development) to implement task-by-task. Steps use - [ ] checkbox syntax. Each Step block carries its own commit boundary; Step 6 is verify-only (no commit).
Goal. Extend the v0.20.0 round 1 ingest log (file-only ndjson) with four additive enhancements: (1) raster image dimensions, (2) SQLite mirror via V008 + dual-write, (3) CLI inspect ocr-stats / inspect ocr-failures + two wire schemas, (4) automatic file + SQLite retention. All are non-breaking; the wire schema cascades as an additive minor; the existing 1370 workspace tests grow to 1375+.
Architecture. Hook the dual-write at the existing OCR emit_progress closure inside ingest_with_config_opts. File write first (durable), SQLite second (non-critical — tracing::warn! on failure). The closure pre-captures doc_id and clones Arc<SqliteStore> from App.sqlite before apply_ocr_to_pdf_pages runs (F1 + G1 resolution). Two App::inspect_* methods follow the facade rule via *_with_config companions (G2). Runtime introspection emit WIRE_SCHEMAS gets two entries; the JSON Schema file schema.schema.json is untouched (G3: it's pattern-based).
Tech stack. Rust 2024, rusqlite (existing), image crate (jpeg feature add), time (existing). No new workspace deps.
Spec contract. docs/superpowers/specs/2026-05-28-v0.20.x-logging-r2-spec.md (751 lines). Implements 13 AC + resolves 3 closure-r2 findings (G1/G2/G3) + 6 closure-r1 findings (F1–F6).
File map
Modify (10):
- crates/kebab-app/Cargo.toml: image features += "jpeg" (F3).
- crates/kebab-app/src/pdf_ocr_apply.rs: extract_image_dimensions helper + fill 6 emit points (Enhancement 1).
- crates/kebab-app/src/ingest_log.rs: LogEvent::Ocr adds doc_id; IngestLogWriter::open runs cleanup_old_logs; percentiles returns (p50, p90, p99, max).
- crates/kebab-app/src/lib.rs::ingest_with_config_opts: pre-capture doc_id_for_log, clone Arc<SqliteStore>, dual-write inside closure (F1 + G1).
- crates/kebab-app/src/schema.rs: WIRE_SCHEMAS += "ocr_stats.v1", "ocr_failures.v1" (G3 runtime emit).
- crates/kebab-app/src/app.rs: inspect_ocr_stats / inspect_ocr_failures + *_with_config companion pair (G2) + 4 wire structs.
- crates/kebab-config/src/lib.rs: LoggingCfg += keep_recent_runs: u32, retention_days: u32 (#[serde(default=...)], defaults 100 + 30, AC-9).
- crates/kebab-store-sqlite/src/store.rs: record_pdf_ocr_event + prune_pdf_ocr_events (Mutex lock pattern, F2).
- crates/kebab-cli/src/main.rs: InspectWhat::OcrStats + InspectWhat::OcrFailures variants + dispatch arms calling *_with_config (G2).
- integrations/claude-code/kebab/SKILL.md: list ocr_stats.v1 + ocr_failures.v1 (AC-13).
Create (5):
- migrations/V008__pdf_ocr_events.sql: table + 3 indices.
- docs/wire-schema/v1/ocr_stats.schema.json: ocr_stats.v1 JSON Schema (verbatim from spec §4.3).
- docs/wire-schema/v1/ocr_failures.schema.json: ocr_failures.v1 JSON Schema (verbatim from spec §4.3).
- crates/kebab-app/tests/ocr_inspect_smoke.rs: AC-4/5/6/13.
- crates/kebab-store-sqlite/tests/pdf_ocr_events_insert_smoke.rs: AC-2/3/8.
Do NOT modify:
- docs/wire-schema/v1/schema.schema.json: wire.schemas is pattern-based (G3).
- Spec ACCEPT frozen. Round 1 spec / PDF OCR spec / parent design doc untouched.
Closure r2 plan-level resolution
G1 (MEDIUM) — SqliteStore::open signature + double-open risk
Spec §4.7 snippet SqliteStore::open(&cfg.storage.sqlite)? fails to compile — real signature at crates/kebab-store-sqlite/src/store.rs:123 is pub fn open(config: &kebab_config::Config) -> Result<Self>. Worse, a fresh open() would create a second connection contending with App::sqlite for the single-writer lock and re-running migrations.
Resolution (Step 3.3). Inside ingest_with_config_opts, beside the existing lw_for_ocr pre-clone at crates/kebab-app/src/lib.rs:1931, add:
let store_for_ocr: Arc<SqliteStore> = Arc::clone(&app.sqlite);
App.sqlite is pub(crate) sqlite: Arc<SqliteStore> at crates/kebab-app/src/app.rs:123 — visible inside the crate. The closure moves store_for_ocr. No new SqliteStore::open call anywhere.
G2 (LOW–MEDIUM) — facade rule for inspect_ocr_*
Spec §4.4 declares only App::inspect_ocr_stats(&self). CLAUDE.md "the facade rule" requires a *_with_config(cfg, …) companion. HOTFIXES P3-5 and P4-3 record two regressions of this exact shape.
Resolution (Step 4.5). Both methods ship as a pair:
pub fn inspect_ocr_stats(&self) -> Result<OcrStatsV1> {
self.inspect_ocr_stats_with_config(&self.config)
}
#[doc(hidden)]
pub fn inspect_ocr_stats_with_config(&self, _cfg: &Config) -> Result<OcrStatsV1> { … }
Same shape for inspect_ocr_failures. The CLI dispatch (Step 4.6) calls the explicit _with_config form so --config <path> is honored — current ingest path already threads cfg via App::open_with_config(cfg.clone()).
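As a rough sketch of why the companion form matters, the pair can be reduced to a toy example. Config, App, the data_dir field and inspect_target are hypothetical and exist only for illustration; the real methods return OcrStatsV1 and read from the store.

```rust
/// Hypothetical, trimmed-down stand-in for the real Config.
struct Config {
    data_dir: String,
}

/// Hypothetical stand-in for App, holding the config it was opened with.
struct App {
    config: Config,
}

impl App {
    /// Convenience form: delegates with the App's own config.
    fn inspect_target(&self) -> String {
        self.inspect_target_with_config(&self.config)
    }

    /// Explicit form: the caller decides which config applies, which is
    /// what lets a CLI override flow all the way through.
    fn inspect_target_with_config(&self, cfg: &Config) -> String {
        format!("{}/kebab.sqlite", cfg.data_dir)
    }
}
```

A caller holding an override config gets a different answer from the explicit form than from the convenience form, which is the regression shape the facade rule guards against.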
G3 (LOW) — wire schema runtime emit, not JSON Schema file
Spec §4.3 lines 299-301 say "update schema.schema.json", but docs/wire-schema/v1/schema.schema.json declares "schemas": { "type": "array", "items": { "type": "string", "pattern": "^[a-z_]+\\.v[0-9]+$" } }, which is pattern-based, not an enum. The two new strings already match.
Resolution (Step 4.2). Extend WIRE_SCHEMAS const at crates/kebab-app/src/schema.rs:104-119 (runtime emit source for kebab schema --json). Two new schema files (ocr_stats.schema.json + ocr_failures.schema.json) are still created as the external contract; the meta schema.schema.json is untouched.
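The claim that the two new names already satisfy the pattern can be checked mechanically. The sketch below hand-rolls a matcher for ^[a-z_]+\.v[0-9]+$ with std only; matches_wire_schema_pattern is a hypothetical helper, not something in the codebase.

```rust
/// Returns true when `name` matches `^[a-z_]+\.v[0-9]+$`.
///
/// The stem may not contain '.', so the last ".v" in the string is the only
/// possible separator between stem and version.
fn matches_wire_schema_pattern(name: &str) -> bool {
    match name.rsplit_once(".v") {
        Some((stem, version)) => {
            !stem.is_empty()
                && stem.bytes().all(|b| b.is_ascii_lowercase() || b == b'_')
                && !version.is_empty()
                && version.bytes().all(|b| b.is_ascii_digit())
        }
        None => false,
    }
}
```

Both ocr_stats.v1 and ocr_failures.v1 pass, while a name with uppercase letters or a missing version number does not.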
Closure r1 finding resolution recap
| Finding | Severity | Plan step | Resolution |
|---|---|---|---|
| F1 doc_id NULL in dual-write | HIGH | Step 3 | pre-capture canonical.doc_id.0.clone() |
| F2 SqliteStore conn lock pattern | MEDIUM | Step 2 | self.conn.lock().expect("sqlite lock poisoned") |
| F3 image crate jpeg feature | MEDIUM | Step 1 | features = ["png", "jpeg"] |
| F4 percentile + p99 | LOW | Step 4 | extend ingest_log::percentiles to 4-tuple |
| F5 OR-on-stale wording | LOW | Step 5 | inline comment in cleanup_old_logs |
| F6 V008 rollback | LOW | (spec §6 R-2) | already in spec, no plan action |
Step 1 — image_width / image_height capture (Enhancement 1)
Implements: AC-1. Commit 1/5.
Files: Modify crates/kebab-app/Cargo.toml, crates/kebab-app/src/pdf_ocr_apply.rs.
- [ ] 1.1 Cargo feature. image = { version = "0.25", default-features = false, features = ["png", "jpeg"] } (was ["png"] only). Rationale: F3.
- [ ] 1.2 Failing unit tests in pdf_ocr_apply::tests:
  - extract_image_dimensions_valid_jpeg: load the 16x12 fixture JPEG, assert Some((16, 12)).
  - extract_image_dimensions_corrupt_returns_none: b"not a jpeg" → None.

  If fixtures/pdf_ocr/sample_16x12.jpg doesn't exist, generate it via convert -size 16x12 xc:white … and commit it. If a comparable small JPEG already lives under fixtures/, reuse it.
- [ ] 1.3 Helper impl. Add near the top of pdf_ocr_apply.rs:

      fn extract_image_dimensions(jpeg_bytes: &[u8]) -> Option<(u32, u32)> {
          use image::ImageReader;
          ImageReader::new(std::io::Cursor::new(jpeg_bytes))
              .with_guessed_format().ok()?
              .into_dimensions().ok()
      }

  Returns Option so a corrupt JPEG falls back to (None, None) (R-4 mitigation).
- [ ] 1.4 Fill 6 emit points at lines 149, 155, 181, 188, 259, 265 (probe-confirmed). Each currently hardcodes image_width: None, image_height: None. Pattern:

      let (image_width, image_height) = extract_image_dimensions(&page_image_bytes)
          .map(|(w, h)| (Some(w), Some(h)))
          .unwrap_or((None, None));

  Then pass image_width / image_height (and image_byte_size: Some(page_image_bytes.len() as u64)) into the PdfOcrProgress::Finished constructor. For skip / error paths where page_image_bytes isn't in scope, leave None: AC-1 only requires non-null on successful raster decode.
- [ ] 1.5 Verify. cargo test -p kebab-app --lib pdf_ocr_apply::tests + cargo test -p kebab-app --test pdf_ocr_roundtrip.
- [ ] 1.6 Commit. Use HEREDOC form so the body formats correctly:

      feat(app): capture image_width/height in PDF OCR raster decode (Enhancement 1)

      Add extract_image_dimensions(jpeg_bytes) helper using image::ImageReader and fill the 6 PdfOcrProgress::Finished emit points in pdf_ocr_apply.rs. Result: LogEvent::Ocr now carries non-null image_width/image_height on successful raster decode, enabling future size-conditioned timeout tuning.

      Closure r1 F3: kebab-app/Cargo.toml image features += "jpeg" (explicit rather than relying on feature unification via kebab-parse-image).

      Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
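The Option-splitting pattern from step 1.4 can be exercised in isolation. In this sketch split_dimensions and split_dimensions_unzip are hypothetical names; the second variant assumes a toolchain where Option::unzip is available (stable since Rust 1.66) and is offered only as a possible shorthand.

```rust
/// Splits an optional (width, height) pair into two optional fields,
/// mirroring the map / unwrap_or pattern from step 1.4.
fn split_dimensions(dims: Option<(u32, u32)>) -> (Option<u32>, Option<u32>) {
    dims.map(|(w, h)| (Some(w), Some(h))).unwrap_or((None, None))
}

/// Equivalent shorthand using Option::unzip.
fn split_dimensions_unzip(dims: Option<(u32, u32)>) -> (Option<u32>, Option<u32>) {
    dims.unzip()
}
```

Either form yields (None, None) when the decode fails, which is the fallback the corrupt-JPEG test in step 1.2 relies on.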
Step 2 — V008 migration + SqliteStore record/prune (Enhancement 2 — schema half)
Implements: AC-2, AC-8 prune logic. Commit 2/5.
Files: Create migrations/V008__pdf_ocr_events.sql, crates/kebab-store-sqlite/tests/pdf_ocr_events_insert_smoke.rs. Modify crates/kebab-store-sqlite/src/store.rs.
- [ ] 2.1 V008 SQL per spec §4.2:

      CREATE TABLE pdf_ocr_events (
          id INTEGER PRIMARY KEY,
          run_id TEXT NOT NULL,
          ts TEXT NOT NULL,
          doc_id TEXT,
          doc_path TEXT NOT NULL,
          page INTEGER NOT NULL,
          image_byte_size INTEGER,
          image_width INTEGER,
          image_height INTEGER,
          ms INTEGER NOT NULL,
          chars INTEGER NOT NULL,
          success INTEGER NOT NULL,
          reason TEXT,
          ocr_engine TEXT NOT NULL
      );
      CREATE INDEX idx_pdf_ocr_events_doc_id ON pdf_ocr_events(doc_id);
      CREATE INDEX idx_pdf_ocr_events_run_id ON pdf_ocr_events(run_id);
      CREATE INDEX idx_pdf_ocr_events_ts ON pdf_ocr_events(ts);

- [ ] 2.2 Failing smoke tests in new pdf_ocr_events_insert_smoke.rs:
  - v008_pdf_ocr_events_table_exists: open TempDir SqliteStore, run_migrations, SELECT name FROM sqlite_master WHERE name='pdf_ocr_events'.
  - record_and_prune_pdf_ocr_event: insert 2 rows (ts = 1970-… and ts = 2026-…), prune_pdf_ocr_events(0) returns 1, only the future-stamped row survives.

  Use the existing per-test Config builder (probe crates/kebab-store-sqlite/tests/*.rs for the prior smoke pattern; if absent, build Config::default() with data_dir = tempdir.path()).
- [ ] 2.3 Implement two pub fn on impl SqliteStore in store.rs, signatures per spec §4.2 lines 186-228:

      #[allow(clippy::too_many_arguments)]
      pub fn record_pdf_ocr_event(
          &self,
          run_id: &str,
          ts: &str,
          doc_id: Option<&str>,
          doc_path: &str,
          page: u32,
          image_byte_size: Option<u64>,
          image_width: Option<u32>,
          image_height: Option<u32>,
          ms: u64,
          chars: u32,
          success: bool,
          reason: Option<&str>,
          ocr_engine: &str,
      ) -> anyhow::Result<()>;

      pub fn prune_pdf_ocr_events(&self, retention_days: u32) -> anyhow::Result<u64>;

  Body: F2 lock pattern let conn = self.conn.lock().expect("sqlite lock poisoned"); followed by INSERT INTO pdf_ocr_events (…) VALUES (…) / DELETE FROM pdf_ocr_events WHERE ts < ? (cutoff from time::OffsetDateTime::now_utc() - Duration::days(retention_days as i64), RFC 3339 format). prune returns the deleted row count (u64).
- [ ] 2.4 Verify. cargo test -p kebab-store-sqlite --test pdf_ocr_events_insert_smoke.
- [ ] 2.5 Commit.

      feat(store): V008 pdf_ocr_events migration + record/prune API (Enhancement 2)

      Add migrations/V008__pdf_ocr_events.sql with the events table + 3 indices (doc_id, run_id, ts). SqliteStore gains two pub fn: record_pdf_ocr_event (insert one OCR sample) and prune_pdf_ocr_events (delete rows older than retention_days; returns the affected row count). Both follow the existing Mutex<Connection> lock pattern. Wiring into ingest path lands in the next commit.

      Closure r1 F2: explicit lock acquisition in both methods.

      Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
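Because ts is stored as TEXT, the DELETE … WHERE ts < ? in step 2.3 is a string comparison. The in-memory sketch below mimics that comparison; it assumes (this is not confirmed by the plan) that every timestamp is written in one fixed-width UTC RFC 3339 shape such as 2026-05-28T12:00:00Z, which is what makes lexicographic order agree with chronological order. prune_older_than is a hypothetical helper.

```rust
/// Removes every timestamp strictly older than `cutoff` and returns how many
/// were removed, mirroring `DELETE ... WHERE ts < ?` on a TEXT column.
///
/// Assumption: all values share one fixed-width UTC RFC 3339 layout, so plain
/// string ordering matches time ordering.
fn prune_older_than(rows: &mut Vec<String>, cutoff: &str) -> u64 {
    let before = rows.len();
    rows.retain(|ts| ts.as_str() >= cutoff);
    (before - rows.len()) as u64
}
```

If timestamps were ever written with varying offsets or varying fractional-second widths, string order and time order could diverge, so it may be worth pinning the format in one place.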
Step 3 — Dual-write integration (Enhancement 2 — wiring half)
Implements: AC-3. Resolves F1 + G1. Commit 3/5.
Files: Modify crates/kebab-app/src/ingest_log.rs, crates/kebab-app/src/lib.rs.
- [ ] 3.1 Extend LogEvent::Ocr at ingest_log.rs:116: add doc_id: Option<&'a str> as the second field (after ts, before doc_path). Additive Serde change: round 1 ndjson logs deserialize with doc_id = None (Serde's default Option handling, no #[serde(default)] needed because round 1 readers don't exist outside the test suite).
- [ ] 3.2 Failing integration test appended to pdf_ocr_events_insert_smoke.rs:
  - ingest_dual_write_doc_id_matches_ndjson: run scanned-PDF ingest via App::open_with_config, then for every emitted OCR event assert pdf_ocr_events.doc_id equals the ndjson LogEvent::Ocr.doc_id, and both equal canonical.doc_id. Reuse the pdf_ocr_roundtrip test scaffold for the scanned fixture.
- [ ] 3.3 Pre-capture in ingest_with_config_opts. Immediately after the existing let doc_path_for_log = asset.workspace_path.0.clone(); at lib.rs:1935, add:

      let doc_id_for_log: String = canonical.doc_id.0.clone();
      let store_for_ocr: Arc<SqliteStore> = Arc::clone(&app.sqlite); // G1
      let run_id_for_log: String = lw_for_ocr.as_ref()
          .and_then(|lw| lw.lock().ok().map(|w| w.run_id().to_string()))
          .unwrap_or_default();

  canonical is bound at lib.rs:1912 via app.extract_for(...), so it is in scope before the closure. All three captures are owned values / Arc → Send + 'static (rusqlite Connection is Send but not Sync; wrapping it as Mutex<Connection> inside Arc<SqliteStore> makes the shared handle Send + Sync).
- [ ] 3.4 Wire into closure. Inside the existing PdfOcrProgress::Finished arm at lib.rs:1950-2000, replace the current file-write block with:
  - (a) bind ts_for_event = ingest_log::now_ts();
  - (b) pass doc_id: Some(&doc_id_for_log) into the existing LogEvent::Ocr construction;
  - (c) immediately after the file write, call store_for_ocr.record_pdf_ocr_event(&run_id_for_log, &ts_for_event, Some(&doc_id_for_log), &doc_path_for_log, page, image_byte_size, image_width, image_height, ms, chars, success, reason_str, engine.engine_name()) wrapped in if let Err(e) = … { tracing::warn!(target: "kebab-app", "sqlite ocr event insert failed: {e}"); }.

  File write first → SQLite second (R-1 ordering).
- [ ] 3.5 Update existing tests that consume LogEvent::Ocr:
  - ingest_log_smoke (and any sibling smoke that round-trips ndjson): assertions must accept the new doc_id field. Either ignore it or assert Some(_).
  - logging_roundtrip integration test: same.
- [ ] 3.6 Verify. cargo test -p kebab-app --test pdf_ocr_roundtrip --test ingest_log_smoke --test logging_roundtrip + cargo test -p kebab-store-sqlite --test pdf_ocr_events_insert_smoke ingest_dual_write_doc_id_matches_ndjson. (cargo test accepts a single positional name filter, so each integration test target needs its own --test flag.)
- [ ] 3.7 Commit.

      feat(app): dual-write PDF OCR events to SQLite + ndjson (Enhancement 2 wiring)

      Pre-capture canonical.doc_id and Arc<SqliteStore> before the OCR emit_progress closure so both the ndjson file and the SQLite mirror carry the same doc_id for every event. File write is durable (errors propagate); SQLite insert is non-critical (tracing::warn on failure, ingest does not abort) per spec R-1.

      LogEvent::Ocr gains a doc_id: Option<&str> field as an additive Serde change: round 1 ndjson logs deserialize with doc_id=None.

      Closure r1 F1: doc_id NULL in dual-write resolved via let doc_id_for_log = canonical.doc_id.0.clone() pre-capture.

      Closure r2 G1: Arc::clone(&app.sqlite) reused instead of opening a second SqliteStore, which eliminates double-open lock contention and duplicate migration runs.

      Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
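The R-1 ordering from step 3.4 (durable file write first, best-effort mirror second) can be sketched without any project types. EventMirror, FailingMirror and dual_write are hypothetical names, and eprintln! stands in for tracing::warn!; this only illustrates the error-handling shape under those assumptions.

```rust
use std::io::{self, Write};

/// Hypothetical mirror interface standing in for record_pdf_ocr_event.
trait EventMirror {
    fn record(&self, line: &str) -> Result<(), String>;
}

/// Writes `line` to the durable log first, then mirrors it.
///
/// A log-write failure propagates as Err. A mirror failure is only reported
/// (eprintln! here stands in for tracing::warn!) and surfaces as Ok(false).
fn dual_write<W: Write, M: EventMirror>(
    log: &mut W,
    mirror: &M,
    line: &str,
) -> io::Result<bool> {
    writeln!(log, "{line}")?;
    match mirror.record(line) {
        Ok(()) => Ok(true),
        Err(e) => {
            eprintln!("sqlite ocr event insert failed: {e}");
            Ok(false)
        }
    }
}

/// A mirror that always fails, useful for checking the non-critical path.
struct FailingMirror;

impl EventMirror for FailingMirror {
    fn record(&self, _line: &str) -> Result<(), String> {
        Err("disk full".to_string())
    }
}
```

With a failing mirror the log still receives the line and the call succeeds, which is the behavior the plan describes as "ingest does not abort".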
Step 4 — CLI inspect commands + wire schemas (Enhancement 3)
Implements: AC-4, AC-5, AC-6, AC-11 (ocr_inspect_smoke), AC-13. Resolves G2 + G3 + F4. Commit 4/5.
Files: Modify crates/kebab-app/src/ingest_log.rs, crates/kebab-app/src/schema.rs, crates/kebab-app/src/app.rs, crates/kebab-cli/src/main.rs, integrations/claude-code/kebab/SKILL.md. Create docs/wire-schema/v1/ocr_stats.schema.json, docs/wire-schema/v1/ocr_failures.schema.json, crates/kebab-app/tests/ocr_inspect_smoke.rs.
- [ ] 4.1 JSON Schema files. Author the two files verbatim from spec §4.3 lines 234-297. These are the external contract for downstream consumers (claude-code skill, MCP).
- [ ] 4.2 WIRE_SCHEMAS runtime emit (G3). In crates/kebab-app/src/schema.rs:104, append two entries to the &[&str] literal: "ocr_stats.v1", "ocr_failures.v1". Do NOT edit docs/wire-schema/v1/schema.schema.json: its wire.schemas is pattern-based.
- [ ] 4.3 Percentile helper (F4). Change pub(crate) fn percentiles at crates/kebab-app/src/ingest_log.rs:200 from returning (Option<u64>, Option<u64>, Option<u64>) (p50, p90, max) to (Option<u64>, Option<u64>, Option<u64>, Option<u64>) (p50, p90, p99, max). Implementation: sort in-memory, pick at index ((n-1) * q).round() for q ∈ {0.50, 0.90, 0.99}, *sorted.last() for max. Update all callers inside kebab-app (1-2 sites) to destructure 4 elements. IngestSummary (round 1, 3-percentile) stays as-is; p99 surfaces only via inspect ocr-stats.
- [ ] 4.4 Wire types in crates/kebab-app/src/app.rs (or a new sibling inspect_ocr.rs under the same module umbrella):
  - OcrStatsV1: schema_version: &'static str, total_events, total_runs, success_count, failure_count: u64; success_rate: f64; p50_ms, p90_ms, p99_ms, max_ms: Option<u64>; by_engine: BTreeMap<String, u64>; by_doc: Vec<OcrStatsByDoc>.
  - OcrStatsByDoc: doc_id: String, failure_count, success_count: u64, p90_ms: Option<u64>.
  - OcrFailuresV1: schema_version: &'static str, doc_id: Option<String>, failure_count: u64, failures: Vec<OcrFailureRow>.
  - OcrFailureRow: ts: String, page: u32, ms: u64, reason: String, image_byte_size: Option<u64>.

  All #[derive(serde::Serialize)]. schema_version uses the string literal "ocr_stats.v1" / "ocr_failures.v1" at construction.
- [ ] 4.5 Facade pair (G2) on impl App:

      pub fn inspect_ocr_stats(&self) -> Result<OcrStatsV1> {
          self.inspect_ocr_stats_with_config(&self.config)
      }
      #[doc(hidden)]
      pub fn inspect_ocr_stats_with_config(&self, _cfg: &Config) -> Result<OcrStatsV1> { … }

      pub fn inspect_ocr_failures(&self, doc_id: Option<&str>, limit: usize) -> Result<OcrFailuresV1> {
          self.inspect_ocr_failures_with_config(&self.config, doc_id, limit)
      }
      #[doc(hidden)]
      pub fn inspect_ocr_failures_with_config(&self, _cfg: &Config, doc_id: Option<&str>, limit: usize) -> Result<OcrFailuresV1> { … }

  Implementation reads from self.sqlite (the App's Arc<SqliteStore>):
  - inspect_ocr_stats_with_config:
    - SELECT COUNT(*), SUM(CASE WHEN success=1 …), SUM(CASE WHEN success=0 …), COUNT(DISTINCT run_id) FROM pdf_ocr_events → 4 counters.
    - SELECT ms FROM pdf_ocr_events WHERE success=1 ORDER BY ms → Vec<u64> → ingest_log::percentiles(&samples) → 4-tuple.
    - SELECT ocr_engine, COUNT(*) FROM pdf_ocr_events GROUP BY ocr_engine → BTreeMap.
    - SELECT doc_id, SUM(success=0), SUM(success=1) FROM pdf_ocr_events WHERE doc_id IS NOT NULL GROUP BY doc_id ORDER BY 2 DESC LIMIT 10 → Vec<OcrStatsByDoc> with p90_ms = None (per-doc p90 deferred, see open question #3).
    - success_rate = success_count / total_events (guard zero-division).
  - inspect_ocr_failures_with_config: SELECT ts, page, ms, reason, image_byte_size FROM pdf_ocr_events WHERE success=0 [AND doc_id=?] ORDER BY ts DESC LIMIT ?. Map to OcrFailureRow. failure_count = failures.len().

  Connection access: use the existing pub(crate) Mutex<Connection> (either via an existing closure accessor on SqliteStore or by adding a pub(crate) fn conn(&self) -> &Mutex<Connection>; probe store.rs for the prior pattern and follow it).
- [ ] 4.6 CLI variants in crates/kebab-cli/src/main.rs:356 (extend InspectWhat):

      OcrStats,
      OcrFailures {
          #[arg(long)]
          doc_id: Option<String>,
          #[arg(long, default_value_t = 10)]
          limit: usize,
      },

  Dispatch arms at main.rs:671 after the existing Doc / Chunk arms. For both, construct App::open_with_config(cfg.clone())?, call the *_with_config(&cfg, …) form (G2 explicit cfg threading), then route through the existing print_json_or_text helper (or whatever the file's pattern is; probe before naming).
- [ ] 4.7 Integration test in new crates/kebab-app/tests/ocr_inspect_smoke.rs:
  - ocr_stats_after_scanned_pdf_ingest: TempDir KB, ingest the existing scanned-PDF fixture, assert stats.schema_version == "ocr_stats.v1", stats.total_events >= 1, 0.0 ≤ stats.success_rate ≤ 1.0.
  - ocr_failures_filter_by_doc_id: requires a fixture that produces at least one failure (probe fixtures/pdf/ for a corrupt-page or timeout-inducing PDF; if none exists, mark the test #[ignore] with a // TODO(v0.20.y): fixture and rely on record_and_prune_pdf_ocr_event for the synthetic-row coverage).
  - skill_md_lists_new_schemas: read_to_string("integrations/claude-code/kebab/SKILL.md").contains("ocr_stats.v1") (AC-13 mechanical).
- [ ] 4.8 SKILL.md sync. Append ocr_stats.v1 and ocr_failures.v1 to the wire schema enumeration block (currently at SKILL.md lines 37-147 per probe), each with a one-line description.
- [ ] 4.9 Verify. cargo test -p kebab-app --test ocr_inspect_smoke + cargo build -p kebab-cli (compile-only confirms variant + dispatch wire-up).
- [ ] 4.10 Commit.

      feat(cli): kebab inspect ocr-stats + ocr-failures (Enhancement 3 + wire schema additive minor)

      Two new wire schemas land as additive minor: ocr_stats.v1 (corpus-wide aggregate: total_events, success_rate, p50/p90/p99/max_ms, by_engine, top-10 by_doc by failure count) and ocr_failures.v1 (per-doc or corpus-wide recent failures, with --doc-id + --limit). Both ship via new CLI subcommands `kebab inspect ocr-stats` / `inspect ocr-failures`.

      App gains four facade methods: inspect_ocr_stats / inspect_ocr_failures plus their *_with_config companions, required by CLAUDE.md "the facade rule" so `--config <path>` is honored. The CLI dispatch arms thread cfg explicitly into the _with_config form.

      Runtime introspection emit (WIRE_SCHEMAS in schema.rs) gains two entries; the meta JSON Schema (schema.schema.json) is untouched because its wire.schemas is pattern-based, not enum-based.

      ingest_log::percentiles extended to (p50, p90, p99, max). p99 surfaces only via inspect ocr-stats; IngestSummary (round 1) stays 3-percentile.

      SKILL.md synced with the two new schemas (AC-13).

      Closure r2 G2 (facade *_with_config pair) + G3 (runtime emit, not meta schema file) + closure r1 F4 (p99) resolved.

      Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
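The index rule from step 4.3 can be written out as a small std-only function. This is a sketch of the described rule, not the existing helper: the real percentiles lives in ingest_log.rs and its exact body is not shown in this plan.

```rust
/// Percentiles over unsorted samples, returned as (p50, p90, p99, max).
///
/// Each percentile picks the element at index round((n - 1) * q) of the
/// sorted samples; an empty input yields None for every slot.
fn percentiles(samples: &[u64]) -> (Option<u64>, Option<u64>, Option<u64>, Option<u64>) {
    if samples.is_empty() {
        return (None, None, None, None);
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let pick = |q: f64| {
        let idx = ((sorted.len() - 1) as f64 * q).round() as usize;
        Some(sorted[idx])
    };
    (pick(0.50), pick(0.90), pick(0.99), sorted.last().copied())
}
```

For eleven samples 10, 20, …, 110 the indices are 5, 9 and 10, so the result is (60, 100, 110, 110). Note that f64::round rounds halves away from zero, so with an even number of samples p50 lands on the upper of the two middle values.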
Step 5 — Log retention (Enhancement 4)
Implements: AC-7, AC-8 wire-up, AC-9, AC-12. Resolves F5. Commit 5/5.
Files: Modify crates/kebab-config/src/lib.rs, crates/kebab-app/src/ingest_log.rs, crates/kebab-app/src/lib.rs, docs/SMOKE.md.
- [ ] 5.1 Failing backward-compat test in crates/kebab-config tests:
  - old_logging_config_parses_with_defaults: toml input with only ingest_log_enabled + ingest_log_dir, assert cfg.logging.keep_recent_runs == 100, retention_days == 30.
- [ ] 5.2 LoggingCfg extension at crates/kebab-config/src/lib.rs:438-462:

      pub struct LoggingCfg {
          pub ingest_log_enabled: bool,
          pub ingest_log_dir: PathBuf,
          #[serde(default = "default_keep_recent_runs")]
          pub keep_recent_runs: u32,
          #[serde(default = "default_retention_days")]
          pub retention_days: u32,
      }
      fn default_keep_recent_runs() -> u32 { 100 }
      fn default_retention_days() -> u32 { 30 }

  Plus impl Default for LoggingCfg returning the same defaults. AC-12 is automatic via Config::default() → toml::to_string_pretty at crates/kebab-app/src/lib.rs:142, 178 (kebab init path).
- [ ] 5.3 Failing cleanup tests in ingest_log::tests:
  - cleanup_keeps_recent_n_drops_old: 5 files, 3 fresh + 2 sixty-days-old; cleanup_old_logs(dir, 3, 30) → 3 freshest survive.
  - cleanup_drops_stale_even_within_count: 2 files, both within keep_recent=10 but 90 days old; cleanup_old_logs(dir, 10, 30) → all dropped (F5 OR-on-stale).

  Use the filetime crate (or a std-based polyfill such as File::set_modified) for backdated mtimes.
- [ ] 5.4 Implement cleanup_old_logs at module level in ingest_log.rs:

      pub(crate) fn cleanup_old_logs(log_dir: &Path, keep_recent: u32, retention_days: u32) -> Result<()>;

  Body: read_dir → filter ingest-*.ndjson → sort by modified() descending → walk with index; delete iff (idx >= keep_recent) OR (modified <= cutoff); equivalently keep iff (idx < keep_recent) AND (modified > cutoff). Inline that as a comment (F5 wording fix).
- [ ] 5.5 Hook into IngestLogWriter::open. Before creating the new ingest-*.ndjson file, call cleanup_old_logs(&log_dir, cfg.keep_recent_runs, cfg.retention_days). On error, tracing::warn! and continue: cleanup is non-critical (R-6 mitigation).
- [ ] 5.6 SQLite prune hook in crates/kebab-app/src/lib.rs::ingest_with_config_opts (near the existing IngestLogWriter::open call site):

      let _pruned = app.sqlite
          .prune_pdf_ocr_events(app.config.logging.retention_days)
          .unwrap_or_else(|e| {
              tracing::warn!(target: "kebab-app", "pdf_ocr_events prune failed: {e}");
              0
          });

  One prune per ingest run, matching the file-side cadence.
- [ ] 5.7 Doc sync. The docs/SMOKE.md config example [logging] block gets the two new fields with their defaults. README / HANDOFF / ARCHITECTURE: skip, since there is no user-visible surface beyond CLI inspect (covered in Step 4) and config keys (SMOKE-only).
- [ ] 5.8 Verify. cargo test -p kebab-config -- logging + cargo test -p kebab-app --lib ingest_log::tests + cargo test -p kebab-store-sqlite --test pdf_ocr_events_insert_smoke record_and_prune.
- [ ] 5.9 Commit.

      feat(app): log retention: keep_recent_runs + retention_days (Enhancement 4)

      LoggingCfg gains two fields with serde defaults: keep_recent_runs (default 100, top-N file retention) and retention_days (default 30, time-based retention for both ndjson files and the SQLite mirror).

      IngestLogWriter::open now runs cleanup_old_logs before creating a new ingest-*.ndjson: delete iff (idx >= keep_recent) OR (modified <= cutoff). ingest_with_config_opts also calls SqliteStore::prune_pdf_ocr_events(retention_days) at ingest start so the SQLite mirror tracks the same retention window.

      Backward compat (AC-9): both new fields use #[serde(default = ...)], so a pre-v0.20.x config with only [logging] ingest_log_enabled + ingest_log_dir parses unchanged. kebab init writes the new defaults automatically via Config::default() → toml::to_string_pretty (AC-12). docs/SMOKE.md config example synced.

      Closure r1 F5: explicit OR-on-stale comment inside cleanup_old_logs.

      Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
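The retention predicate from step 5.4 is easy to get backwards, so it may help to isolate it from the filesystem. In the sketch below indices_to_delete is a hypothetical pure function over modification times that are already sorted newest-first; the real cleanup_old_logs would additionally read the directory and remove files.

```rust
use std::time::{Duration, SystemTime};

/// Given modification times sorted newest-first, returns the indices that the
/// retention rule would delete.
///
/// delete iff (idx >= keep_recent) OR (modified <= cutoff);
/// equivalently keep iff (idx < keep_recent) AND (modified > cutoff).
fn indices_to_delete(
    mtimes_newest_first: &[SystemTime],
    keep_recent: u32,
    retention_days: u32,
    now: SystemTime,
) -> Vec<usize> {
    let cutoff = now - Duration::from_secs(u64::from(retention_days) * 86_400);
    mtimes_newest_first
        .iter()
        .enumerate()
        .filter(|(idx, modified)| *idx >= keep_recent as usize || **modified <= cutoff)
        .map(|(idx, _)| idx)
        .collect()
}
```

Taking now as a parameter keeps the rule deterministic under test: the two scenarios from step 5.3 (three fresh plus two sixty-day-old files with keep_recent = 3, and two ninety-day-old files with keep_recent = 10) can both be checked without touching mtimes on disk.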
Step 6 — Final sanity (no commit)
Implements: AC-10, AC-11 cumulative.
- [ ] 6.1 Workspace test.

      export CARGO_TARGET_DIR=/build/out/cargo-target/target
      cargo test --workspace --no-fail-fast -j 1 > /tmp/wstest.out 2>&1
      grep -E "^test result:" /tmp/wstest.out | awk '
        { for(i=1;i<=NF;i++){
            if($i=="passed;") pass+=$(i-1);
            if($i=="failed;") fail+=$(i-1);
        } }
        END { print "passed:", pass, "failed:", fail }
      '

  Expected: passed >= 1375, failed = 0.
- [ ] 6.2 Clippy. cargo clippy --workspace --all-targets -- -D warnings → 0 warnings.
- [ ] 6.3 Wire schema additive check. ./target/release/kebab schema --json | jq '.wire.schemas[]' | grep -E "ocr_stats|ocr_failures" returns both strings.
- [ ] 6.4 Dogfood smoke (optional). Per docs/SMOKE.md against a TempDir KB: ingest a scanned PDF → kebab inspect ocr-stats --json | jq .schema_version returns "ocr_stats.v1" → sqlite3 .../kebab.sqlite 'SELECT COUNT(*) FROM pdf_ocr_events' is non-zero.
- [ ] 6.5 cargo fmt --all + git status confirms all intended files are staged. No commit at Step 6.
§3 Cumulative verifier checklist (13 rows)
| AC | Description | Verifier | Step |
|---|---|---|---|
| AC-1 | image_w/h non-null on raster decode | extract_image_dimensions_* units + pdf_ocr_roundtrip (extended) | Step 1 |
| AC-2 | V008 table exists post-migration | v008_pdf_ocr_events_table_exists | Step 2 |
| AC-3 | ndjson ↔ SQLite row count + doc_id match | ingest_dual_write_doc_id_matches_ndjson | Step 3 |
| AC-4 | inspect ocr-stats --json schema_version | ocr_stats_after_scanned_pdf_ingest | Step 4 |
| AC-5 | inspect ocr-failures --doc-id <id> returns rows | ocr_failures_filter_by_doc_id (F1 makes this passable) | Step 3 + Step 4 |
| AC-6 | inspect ocr-failures --json corpus-wide | sibling assertion in inspect smoke | Step 4 |
| AC-7 | keep_recent_runs=N → oldest deleted | cleanup_keeps_recent_n_drops_old | Step 5 |
| AC-8 | retention_days=0 → SQLite old rows deleted | record_and_prune_pdf_ocr_event | Step 2 + Step 5 hook |
| AC-9 | old config parses with defaults | old_logging_config_parses_with_defaults | Step 5 |
| AC-10 | workspace test + clippy green | Step 6 final sanity | Step 6 |
| AC-11 | new integration test binaries | ocr_inspect_smoke + pdf_ocr_events_insert_smoke | Step 3 + Step 4 |
| AC-12 | kebab init writes new defaults | auto via Config::default() → toml::to_string_pretty | Step 5 |
| AC-13 | SKILL.md lists new schemas | skill_md_lists_new_schemas | Step 4 |
§4 Risk → resolution map
| Risk / finding | Resolution location | Plan step |
|---|---|---|
| R-1 dual-write race | file first, SQLite warn-only | Step 3.4 |
| R-2 V008 rollback | spec §6 R-2 documents manual SQL | (no plan action) |
| R-3 concurrent cleanup | cleanup only at IngestLogWriter::open | Step 5.4 |
| R-4 corrupt JPEG decode | Option fallback | Step 1.3 |
| R-5 old consumer reads new schema | schema_version skip | (schema design) |
| R-6 concurrent cleanup vs tail | cleanup only at ingest start, top-N recent safe | Step 5.4 |
| R-7 doc_id NULL (F1) | pre-capture canonical.doc_id.0.clone() | Step 3.3 |
| G1 SqliteStore::open + double-open | Arc::clone(&app.sqlite) | Step 3.3 |
| G2 facade rule violation | *_with_config companion pair | Step 4.5 + 4.6 |
| G3 schema.schema.json wording | extend WIRE_SCHEMAS const | Step 4.2 |
§5 Open questions for executor
LOW priority — judgment calls, none block the spec contract.
1. SqliteStore connection accessor. Step 4.5 reads from self.sqlite's inner Mutex<Connection>. Probe store.rs for an existing with_conn(|c| …) closure-form accessor and prefer it (tight lock scope). If none exists, add a minimal pub(crate) fn conn(&self) -> &Mutex<Connection>; visibility unchanged.
2. p99 in IngestSummary. percentiles now returns a 4-tuple. Round 1 IngestSummary carries p50/p90/max only. Adding ocr_p99_ms is out of scope (no AC); leave it surfaced only via inspect ocr-stats. Executor may fold it in if the diff is one line.
3. OcrStatsByDoc.p90_ms = None. Per-doc p90 deferred; the wire schema allows ["integer", "null"]. If a clean O(n)-per-doc sort fits, executor may implement it; otherwise leave None.
4. JPEG fixture. Step 1.2 commits a 16x12 JPEG. If fixtures/ already carries a comparable small JPEG, reuse it + adjust assertions.
5. ingest_log_smoke adjustment. Step 3.5 likely needs assertion tweaks for the new doc_id field; light touch (Option field omitted when None).
6. Failure-producing PDF fixture for AC-5. If no fixture reliably produces an OCR failure, mark ocr_failures_filter_by_doc_id as #[ignore] with a TODO; the synthetic-row coverage in record_and_prune_pdf_ocr_event still exercises the WHERE-doc_id path.
§6 References
- Spec contract: docs/superpowers/specs/2026-05-28-v0.20.x-logging-r2-spec.md (751 lines, ACCEPT, frozen).
- Critic r1: .omc/reviews/2026-05-28-v0.20.x-logging-r2-spec-closure-result.md (6 findings, applied in r1c).
- Critic r2: .omc/reviews/2026-05-28-v0.20.x-logging-r2-spec-closure-r2-result.md (3 G-findings, plan-level resolution above).
- Parent design: docs/superpowers/specs/2026-04-27-kebab-final-form-design.md (§8 dep graph, §9 version cascade).
- Round 1 spec: docs/superpowers/specs/2026-05-28-v0.20-ingest-log-spec.md (frozen).
- PDF OCR spec: docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md (frozen).
- Branch: feat/ingest-log-round2-enhancements (HEAD 89d334a).
- HOTFIXES: tasks/HOTFIXES.md. Any post-merge deviations land here.
- CLAUDE.md sections: §The facade rule (G2), §Versioning cascade (additive minor → no release trigger), §Naming + paths.
§7 Constraints
- No branch change. All 5 commits land on feat/ingest-log-round2-enhancements.
- Spec frozen. No edits to the ACCEPT spec from this plan. Deviations → tasks/HOTFIXES.md.
- Wire schema additive minor. Two new *.v1 schemas + two new WIRE_SCHEMAS entries. No major bump.
- Regression budget 0. Baseline 1370 workspace tests stay green; round 2 adds ≥5 new.
- Worker protocol: subagent skip. Executor runs in a single subagent-driven-development pass without spawning nested workers.
- Length budget. 500-700 line plan (this file ≈ 600).
- Build path. export CARGO_TARGET_DIR=/build/out/cargo-target/target; -j 4 default, -j 1 for the workspace sanity pass.
- Commit cadence. 5 commits (Steps 1-5). Step 6 verify-only.
- Doc sync. README untouched (no surface beyond inspect subcommands; append to README's commands table only if it already lists inspect); HANDOFF gets one line post-merge if load-bearing; ARCHITECTURE untouched; SMOKE config example updated in Step 5.7.
- Release trigger. Wire additive minor → NOT a release trigger per CLAUDE.md §Versioning cascade. Defer bump to a dogfood-driven chore: bump 0.20.x → 0.20.y later.