kebab/docs/wire-schema/v1/ingest_report.schema.json
altair823 4c5ccd5447 feat(wire): additive minor, adds pdf_ocr_* to IngestEvent kind + pdf_ocr_pages/ms_total to ingest_report.items[] + skipped field carry (Step 6 M-4/M-2)
Step 7 (Group G) of v0.20.0 sub-item 1 (scanned PDF OCR) plan +
Step 6 code reviewer Important M-4 (skipped field carry) + Minor M-2
(ordering invariant doc) fix.

G3: JSON Schema sync (additive minor, schema_version preserved):

ingest_progress.schema.json:
- Added 2 kind enum values: pdf_ocr_started + pdf_ocr_finished.
- New fields: page (1-based PDF page), ocr_engine (engine_name), skipped (bool).
- Updated the descriptions of the existing ms / chars fields (pdf_ocr_finished now carries them too).

ingest_report.schema.json:
- items.items.properties newly defined (previously only the stub type ["array", "null"]).
- pdf_ocr_pages + pdf_ocr_ms_total (nullable integer).
- All existing IngestItem fields are now spelled out as well (kind, doc_path, byte_len, ...).

Step 6 reviewer M-4 (Important): skipped field carry:
- Added skipped: bool to IngestEvent::PdfOcrFinished.
- The emit closure in ingest_one_pdf_asset (lib.rs:~1864) now propagates
  the source PdfOcrProgress::Finished { skipped } instead of discarding it.
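
The M-4 fix above can be pictured with a minimal sketch. Only the names PdfOcrProgress::Finished, IngestEvent::PdfOcrFinished and the skipped flag come from this commit message; the remaining fields (page, ms, chars), the Started variants and the to_ingest_event helper are assumptions made for illustration, and the real emit closure in lib.rs may look quite different.

```rust
// Hypothetical, trimmed-down mirrors of the types named in the commit message.
#[derive(Debug, Clone, PartialEq)]
pub enum PdfOcrProgress {
    Started { page: u32 },
    Finished { page: u32, ms: u64, chars: u64, skipped: bool },
}

#[derive(Debug, Clone, PartialEq)]
pub enum IngestEvent {
    PdfOcrStarted { page: u32 },
    PdfOcrFinished { page: u32, ms: u64, chars: u64, skipped: bool },
}

/// Maps a source progress value to the wire-facing event (assumed helper name).
/// Destructuring `skipped` explicitly, rather than matching with `..`, means a
/// newly added source field cannot be dropped silently at this point.
pub fn to_ingest_event(progress: PdfOcrProgress) -> IngestEvent {
    match progress {
        PdfOcrProgress::Started { page } => IngestEvent::PdfOcrStarted { page },
        PdfOcrProgress::Finished { page, ms, chars, skipped } => {
            IngestEvent::PdfOcrFinished { page, ms, chars, skipped }
        }
    }
}
```

Matching without `..` is a deliberate choice in this sketch: if the source variant gains a field later, the match stops compiling until the new field is handled.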

Step 6 reviewer M-2 (Minor): ordering invariant doc:
- Updated the ordering text in crates/kebab-app/src/ingest_progress.rs:
  ScanStarted < ScanCompleted < (AssetStarted [< (PdfOcrStarted <
  PdfOcrFinished)*] < AssetFinished)* < (Completed | Aborted).
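
One way to read the ordering text above is as a small state machine. The sketch below is an assumption-laden illustration, not code from the repository: the Kind enum carries only the event names from the invariant, and State and is_valid_order are hypothetical names. It accepts exactly the sequences described by the documented grammar.

```rust
// Event kinds named in the ordering invariant (payloads omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    ScanStarted,
    ScanCompleted,
    AssetStarted,
    PdfOcrStarted,
    PdfOcrFinished,
    AssetFinished,
    Completed,
    Aborted,
}

// Hypothetical checker states; not part of the kebab codebase.
#[derive(Debug, Clone, Copy)]
enum State {
    Start,
    Scanning,
    BetweenAssets,
    InAsset,
    InOcr,
    Done,
}

/// Returns true if `events` follows
/// ScanStarted < ScanCompleted < (AssetStarted [< (PdfOcrStarted <
/// PdfOcrFinished)*] < AssetFinished)* < (Completed | Aborted).
pub fn is_valid_order(events: &[Kind]) -> bool {
    let mut state = State::Start;
    for &kind in events {
        state = match (state, kind) {
            (State::Start, Kind::ScanStarted) => State::Scanning,
            (State::Scanning, Kind::ScanCompleted) => State::BetweenAssets,
            (State::BetweenAssets, Kind::AssetStarted) => State::InAsset,
            (State::InAsset, Kind::PdfOcrStarted) => State::InOcr,
            (State::InOcr, Kind::PdfOcrFinished) => State::InAsset,
            (State::InAsset, Kind::AssetFinished) => State::BetweenAssets,
            (State::BetweenAssets, Kind::Completed)
            | (State::BetweenAssets, Kind::Aborted) => State::Done,
            // Any other transition violates the documented ordering.
            _ => return false,
        };
    }
    matches!(state, State::Done)
}
```

A checker like this could back a test that replays recorded events. Note that it follows the text literally, so a stream that aborts in the middle of an asset is rejected here.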

No .md docs (docs/wire-schema/v1/*.md) exist, so the .md deliverable of
plan §3 Step 7 G3 is retroactively N/A (0 such files).

spec:  docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md
plan:  docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 7 G3)
prior: b9ee09f (Step 6 wiring) + Step 6 reviewer M-4/M-2 recommendations
contract: §9 (additive minor wire bump, schema_version preserved)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 08:51:51 +00:00


{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://kb.local/wire/v1/ingest_report.schema.json",
  "title": "IngestReport v1",
  "description": "Stub schema — declares the schema_version label and the required fields per design §2.4.",
  "type": "object",
  "required": [
    "schema_version",
    "scope",
    "scanned",
    "new",
    "updated",
    "skipped",
    "unchanged",
    "errors",
    "duration_ms",
    "skipped_by_extension"
  ],
  "properties": {
    "schema_version": { "const": "ingest_report.v1" },
    "scope": { "type": "object" },
    "scanned": { "type": "integer", "minimum": 0 },
    "new": { "type": "integer", "minimum": 0 },
    "updated": { "type": "integer", "minimum": 0 },
    "skipped": { "type": "integer", "minimum": 0 },
    "unchanged": {
      "type": "integer",
      "minimum": 0,
      "description": "p9-fb-23: assets whose checksum + parser_version + chunker_version + embedding_version all matched the existing record. Parse / chunk / embed / vector upsert all skipped."
    },
    "errors": { "type": "integer", "minimum": 0 },
    "duration_ms": { "type": "integer", "minimum": 0 },
    "skipped_by_extension": {
      "type": "object",
      "additionalProperties": {
        "type": "integer",
        "minimum": 0
      },
      "description": "p9-fb-25: per-extension skip count. Key = lowercase extension without leading dot (e.g. 'docx'). Files without extension key under '<no-ext>'."
    },
    "items": {
      "type": ["array", "null"],
      "items": {
        "type": "object",
        "required": ["kind", "doc_path"],
        "properties": {
          "kind": { "type": "string", "enum": ["new", "updated", "skipped", "unchanged", "error"] },
          "doc_id": { "type": ["string", "null"] },
          "doc_path": { "type": "string" },
          "asset_id": { "type": ["string", "null"] },
          "byte_len": { "type": ["integer", "null"], "minimum": 0 },
          "block_count": { "type": ["integer", "null"], "minimum": 0 },
          "chunk_count": { "type": ["integer", "null"], "minimum": 0 },
          "parser_version": { "type": ["string", "null"] },
          "chunker_version": { "type": ["string", "null"] },
          "warnings": { "type": "array", "items": { "type": "string" } },
          "pdf_ocr_pages": { "type": ["integer", "null"], "minimum": 0, "description": "v0.20.0 sub-item 1: number of PDF pages that went through the OCR pipeline. null = OCR disabled or non-PDF asset." },
          "pdf_ocr_ms_total": { "type": ["integer", "null"], "minimum": 0, "description": "v0.20.0 sub-item 1: cumulative OCR engine wall-clock duration (ms). null = OCR disabled or non-PDF asset." },
          "error": { "type": ["string", "null"] }
        }
      }
    },
    "skipped_gitignore": { "type": "integer", "minimum": 0 },
    "skipped_kebabignore": { "type": "integer", "minimum": 0 },
    "skipped_builtin_blacklist": { "type": "integer", "minimum": 0 },
    "skipped_generated": { "type": "integer", "minimum": 0 },
    "skipped_size_exceeded": { "type": "integer", "minimum": 0 },
    "skip_examples": {
      "type": "object",
      "properties": {
        "generated": { "type": "array", "items": { "type": "string" }, "maxItems": 5 },
        "size_exceeded": { "type": "array", "items": { "type": "string" }, "maxItems": 5 },
        "builtin_blacklist": { "type": "array", "items": { "type": "string" }, "maxItems": 5 },
        "gitignore": { "type": "array", "items": { "type": "string" }, "maxItems": 5 }
      }
    }
  }
}
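
The key rule in the skipped_by_extension description (lowercase extension without the leading dot, '<no-ext>' for files without one) could be sketched as follows. This is an illustration under assumptions, not the kebab implementation: extension_key and tally are hypothetical names, and treating dotfiles such as .gitignore or a trailing-dot name as '<no-ext>' is simply what std::path::Path::extension yields here, which the schema does not specify.

```rust
use std::collections::BTreeMap;
use std::path::Path;

/// Derives the map key for one path (hypothetical helper).
pub fn extension_key(path: &str) -> String {
    match Path::new(path).extension().and_then(|ext| ext.to_str()) {
        // Lowercase so "DOCX" and "docx" share one counter.
        Some(ext) if !ext.is_empty() => ext.to_lowercase(),
        // No extension, an empty one ("name."), or a non-UTF-8 extension.
        _ => "<no-ext>".to_string(),
    }
}

/// Counts skipped paths per extension key, mirroring the shape of the
/// skipped_by_extension object (string key, non-negative integer value).
pub fn tally(paths: &[&str]) -> BTreeMap<String, u64> {
    let mut counts = BTreeMap::new();
    for path in paths {
        *counts.entry(extension_key(path)).or_insert(0) += 1;
    }
    counts
}
```

A BTreeMap is used in the sketch only so that the keys come out in a stable order; the schema itself places no ordering requirement on the object.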