refactor(app): extract dispatch polymorphism — App.extract_for(...) + 11 Extractor registry #187
Summary

Consolidates the 11 hardcoded `*Extractor::new().extract(...)` call sites in kebab-app, together with the 9-arm AST `match code_lang` branch, into a single polymorphic call: `App::extract_for(&MediaType, &ExtractContext, &[u8])`. This resolves the dead polymorphism of the `Extractor` trait (the trait existed, but vtable dispatch was never used).

Scope of this PR: the AST 9-arm, image, and pdf extract call sites only. The following are all deferred to separate future PRs (spec §11): a new MarkdownExtractor, the Tier 2/3 free functions, the outer 4-arm `match &asset.media_type`, the inner 4-arm matches (parser_version / chunker_version / tier3_fallback_cv / chunk dispatch), and Chunker dispatch unification.

Design: docs/superpowers/specs/2026-05-26-extractor-dispatch-unification-spec.md (APPROVE after 2 rounds: round 1 opus thorough review + round 2 sonnet closure verification)
Plan: docs/superpowers/plans/2026-05-26-extractor-dispatch-unification-plan.md (APPROVE after 3 rounds: round 1 opus + rounds 2/3 sonnet)
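The registry dispatch described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `MediaType`, `ExtractContext`, the `String` return type, the `NoMatchingExtractor` error, and `App::new` are hypothetical stand-ins, and only two of the 11 extractors are modelled.

```rust
use std::fmt;

// Hypothetical stand-ins: the real MediaType, ExtractContext and Extractor
// live in kebab-core and are not shown in this PR description.
#[derive(Debug, Clone, PartialEq)]
enum MediaType {
    Image,
    Pdf,
    Audio,
}

struct ExtractContext;

// Hypothetical error for the "no matching extractor" path.
#[derive(Debug, PartialEq)]
struct NoMatchingExtractor(MediaType);

impl fmt::Display for NoMatchingExtractor {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "no matching extractor for {:?}", self.0)
    }
}

trait Extractor {
    fn supports(&self, media_type: &MediaType) -> bool;
    // The real return type is unknown here; a String keeps the sketch small.
    fn extract(&self, ctx: &ExtractContext, bytes: &[u8]) -> String;
}

struct ImageExtractor;
struct PdfTextExtractor;

impl Extractor for ImageExtractor {
    fn supports(&self, media_type: &MediaType) -> bool {
        *media_type == MediaType::Image
    }
    fn extract(&self, _ctx: &ExtractContext, bytes: &[u8]) -> String {
        format!("image:{}", bytes.len())
    }
}

impl Extractor for PdfTextExtractor {
    fn supports(&self, media_type: &MediaType) -> bool {
        *media_type == MediaType::Pdf
    }
    fn extract(&self, _ctx: &ExtractContext, bytes: &[u8]) -> String {
        format!("pdf:{}", bytes.len())
    }
}

struct App {
    extractors: Vec<Box<dyn Extractor + Send + Sync>>,
}

impl App {
    // Stand-in for the registry init done in App::open_with_config.
    fn new() -> Self {
        App {
            extractors: vec![Box::new(ImageExtractor), Box::new(PdfTextExtractor)],
        }
    }

    // Single polymorphic entry point: the first extractor whose supports()
    // accepts the media type handles the bytes through vtable dispatch.
    fn extract_for(
        &self,
        media_type: &MediaType,
        ctx: &ExtractContext,
        bytes: &[u8],
    ) -> Result<String, NoMatchingExtractor> {
        self.extractors
            .iter()
            .find(|e| e.supports(media_type))
            .map(|e| e.extract(ctx, bytes))
            .ok_or_else(|| NoMatchingExtractor(media_type.clone()))
    }
}

fn main() {
    let app = App::new();
    let ctx = ExtractContext;
    println!("{:?}", app.extract_for(&MediaType::Pdf, &ctx, b"%PDF"));
    match app.extract_for(&MediaType::Audio, &ctx, b"") {
        Ok(text) => println!("{text}"),
        Err(err) => println!("{err}"),
    }
}
```

With this shape, call sites no longer name a concrete extractor type; adding an extractor means adding one registry entry.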
Key changes

- New field `App.extractors: Vec<Box<dyn Extractor + Send + Sync>>` plus a `pub(crate) fn extract_for(...)` helper method.
- Registry init in `App::open_with_config` = 11 Extractors (image + pdf + 9 AST: rust / python / typescript / javascript / go / java / kotlin / c / cpp).
- Removed the `extractor: &'a ImageExtractor` field, the local `image_extractor` at lib.rs:356, and the alias at lib.rs:1235. ImagePipeline now carries only `ocr_engine` + `caption_llm`.
- `image_extractor.extract(...)` → `app.extract_for(&asset.media_type, &ctx, &bytes)?`
- `PdfTextExtractor::new().extract(...)` → `app.extract_for(...)`
- `*Extractor::new().extract(...)`: 9 call sites → 1 call site (12 arms, explicit + wildcard → 4 arms: 9 AST grouped + 7 manifest grouped + 1 shell + 1 other-bail)
- `#[cfg(test)] mod tests_extractor_dispatch` block in crates/kebab-app/src/app.rs: `supports()` grid, 16 samples

OMC review history
Phase A (spec, 2 rounds):
Phase B (plan, 3 rounds):
- `pub(crate)` inaccessible (integration test location) / test code workspace_root lifetime / missing test_fixtures / Step 4 retroactive fix / missing wire baseline cmd / missing Step 3 destructure / error.v1 wire scope
Implementation (executor):
2c05dbd)

Verification
11 acceptance gates (plan §4 + Step 11 exit gate)

- `git diff main..HEAD -- crates/kebab-parse-*` = 0 lines (trait surface + parser source: 0 changes)
- `Box::new(...Extractor::new(...))` × 11 matches (image + pdf + 9 AST)
- `app.extract_for(...)` in lib.rs = 3 (image + pdf + AST grouped)
- `*Extractor::new().extract` remaining = 0
- `image_extractor` remaining = 0
- `image_pipeline.extractor` remaining = 0
- `extractor: &'a ImageExtractor` remaining = 0
- `ingest_report.v1` + `search_response.v1` byte-identical (diff = 0 after stripping `indexed_at` / `duration_ms` with a jq filter)

Wire schema changes: 0 (success path)
- `ingest_report.v1` / `search_response.v1` / `answer.v1` remain byte-identical
- The wording change in the internal context string of `error.v1.message` is covered by the spec §5.5 risk acceptance (`error.v1.code` + `error.v1.schema_version` are preserved; the message wording is outside the defined user-visible surface)

v2 → v3 race condition (post-mortem)
The 3 findings from the plan round 2 critic + verifier (CRITICAL #1 test location / MAJOR #5 lib.rs:1235 / RawAsset field) were false negatives: the reviewers were spawned while the planner was revising v2 (754 lines) into v3 (920 lines), so they read the v2 baseline by mistake. The round 3 grep cross-check confirmed that the actual v3 content had already closed all three.
Lesson from this race condition: in future spec/plan revisions, state the latest line count and a version snapshot explicitly before spawning reviewers.
Wire / no changes

- `error.v1.message` internal context wording changed (spec §5.5 risk acceptance, `error.v1.code` preserved)
- kebab-core/src/traits.rs): 0 changes
- kebab-parse-md / -pdf / -image / -code): 0 changes
- docs/superpowers/specs/2026-04-27-kebab-final-form-design.md): 0 changes (`contract_sections: []`)
- `parser_version` cascade: 0 changes

Test Plan
- `cargo test --workspace --no-fail-fast -j 1` → 1316 PASS (baseline 1313 + 3 new)
- `cargo build --release -p kebab-cli -j 4` → green
- `cargo clippy --workspace --all-targets -j 1 -- -D warnings` → clean
- `cargo tree -p kebab-app -e normal | grep "kebab-parse-" | wc -l` → 4
- `app.extract_for` ≥ 3 / 9 AST `*Extractor::new().extract` = 0 / `image_extractor` = 0
- `kebab ingest --json` output schema preserved

Assisted-by: Claude Code
Round 1: review of the Extractor dispatch polymorphism refactor.
OMC team
All closure results from the 2-round spec APPROVE of extractor-dispatch-unification (planner spec drafter + opus thorough critic in round 1 + sonnet closure verification in round 2) and the 3-round plan ACCEPT (planner → 3 critic-plan + 3 verifier-plan) are accurately reflected in this PR's code. All 11 verification gates are green, the workspace has 1316 tests (baseline 1313 + 3 new in-crate unit tests, delta = +3), and the wire schema success path is byte-identical.

Praise (prose):
Precise alignment of the Pattern β scope. MAJOR #2 from the spec round 1 critic (the warning channel handover risk in Pattern β, which could break the zero wire schema diff) was closed by adopting Option (ii) (defer the new MarkdownExtractor). The PR scope is aligned exactly to "AST 9-arm + image + pdf extract callsite only": 0 changes to the markdown path, and the warning channel handover is avoided. As a result, `ingest_report.v1.IngestItem.warnings` remains byte-identical.

Single-owner pattern for the App.extractors registry + extract_for helper. Option A was adopted (a field on App rather than a separate ExtractorRegistry struct). Initializing the 11 stateless Extractor entries has an init cost of 0, and the migration path for a future plugin system (an OnceLock wrapper, or wrapping only the Vec elements) is preserved. The `pub(crate)` visibility conforms to the facade rule and is accessible from in-crate unit tests.

Ordering invariant of atomic block 1. The four edits of the image dispatch migration in Steps 3-4-5-6 (delete the lib.rs:1235 alias → delete the lib.rs:356 local → remove the ImagePipeline.extractor field → replace the lib.rs:1289 call site) are handled as one atomic block. The finding from plan round 1 critic MAJOR #5 + verifier GAP #4 (the obligation to explicitly delete the lib.rs:1235 alias was not stated) was written precisely into Step 3 (b) in the round 2 revision. The executor's build-red intermediate states do not enter the git history (single clean commit).
Atomic-unit verification of the 9 AST arm hoist. The 12 arms at lib.rs:2012-2047 (11 explicit + 1 wildcard) become 4 arms (9 AST grouped + 7 manifest grouped + 1 shell + 1 other-bail). The partial closure of the arm count in plan round 2 (correcting spec §3.7 to "13 → 12") is consistent across both plan and spec. The control flow of Tier1 fail → Tier3 fallback is preserved, as traced: adding an outer context to the anyhow chain has no effect on root-cause variant matching.
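The arm grouping described above can be illustrated with or-patterns. This is only a sketch under assumptions: `CodeLang`, `Route`, and `route` are hypothetical names, and the manifest group here has two made-up members because the 7 real manifest languages are not listed in this PR.

```rust
// Hypothetical language enum; only the 9 AST languages come from the PR text.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CodeLang {
    Rust,
    Python,
    TypeScript,
    JavaScript,
    Go,
    Java,
    Kotlin,
    C,
    Cpp,
    Toml, // assumed manifest member, for illustration only
    Json, // assumed manifest member, for illustration only
    Shell,
    Other,
}

// Hypothetical routing result standing in for the real per-arm bodies.
#[derive(Debug, PartialEq)]
enum Route {
    Ast,
    Manifest,
    Shell,
}

fn route(lang: CodeLang) -> Result<Route, String> {
    match lang {
        // One grouped arm replaces nine per-language arms.
        CodeLang::Rust
        | CodeLang::Python
        | CodeLang::TypeScript
        | CodeLang::JavaScript
        | CodeLang::Go
        | CodeLang::Java
        | CodeLang::Kotlin
        | CodeLang::C
        | CodeLang::Cpp => Ok(Route::Ast),
        CodeLang::Toml | CodeLang::Json => Ok(Route::Manifest),
        CodeLang::Shell => Ok(Route::Shell),
        // Everything else bails with an error instead of being handled.
        other => Err(format!("unsupported code language: {other:?}")),
    }
}

fn main() {
    for lang in [CodeLang::Rust, CodeLang::Toml, CodeLang::Shell, CodeLang::Other] {
        println!("{lang:?} -> {:?}", route(lang));
    }
}
```

In the real code the grouped AST arm would presumably forward to `app.extract_for(...)` rather than return a marker value.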
`pub(crate)` access closure for the in-crate unit tests. CRITICAL #1 from the plan round 1 critic (`pub(crate)` fields cannot be accessed from the integration test location tests/extract_for_dispatch.rs) was closed by adopting Option α (an in-crate `#[cfg(test)] mod tests_extractor_dispatch` in app.rs). All 3 test classes PASS: registry length 11 / mutually exclusive supports() grid with 16 samples / the extract_for "no matching extractor" error path (Audio(Wav) MediaType).

RawAsset 8 fields byte-identical. The findings from plan round 1 critic MAJOR #2 + verifier Blocker 2 (missing test fixture + content_hash/ContentHash misnaming + missing `stored` field) were closed by matching the 8 actual fields in crates/kebab-core/src/asset.rs:63-73 (asset_id / source_uri / workspace_path / media_type / byte_len / checksum / discovered_at / stored) byte-for-byte. Following the delegation at plan line 600, the executor reconciled it to `AssetStorage::Copied { path }`.

Wire schema stability verification. The wire diff check in Step 11 shows that `ingest_report.v1` + `search_response.v1` are byte-identical (diff = 0 after stripping `indexed_at` / `duration_ms` / `scope.workspace_root` with a jq filter). The wording change in the internal context string of `error.v1.message` is covered by the spec §5.5 risk acceptance (`error.v1.code` + `error.v1.schema_version` are preserved; the message wording is outside the defined user-visible surface).

Dead polymorphism resolved (partially). The vtable dispatch of the Extractor trait is used for the first time. 11 hardcoded `*Extractor::new().extract(...)` call sites + the 9-arm AST match → 1 `app.extract_for(...)` call site + 1 grouped AST arm. 0 changes to the trait surface, 0 changes to parser source, and the existing 11 Extractor impls are preserved. Spec §6.5 explicitly acknowledges the partial polymorphism (the outer 4-arm + Tier 2/3 + Chunker are deferred to separate future PRs).

Frozen contract preserved. `contract_sections: []` (0 design changes) + `git diff main..HEAD --name-only | grep "^tasks/p"` = 0 lines (the ~25 referencing task specs stay frozen) + 0 wire schema changes + 0 workspace.version bumps. The cleanest pattern for a minimal-surface refactor.

Lesson from the v2 → v3 race condition. The 3 findings from the plan round 2 critic + verifier were false negatives (a race between the planner's v2 → v3 revision and the reviewer spawn). The round 3 grep cross-check confirmed that the actual v3 content had already closed them. Recommendation: in future spec/plan revisions, state the latest line count and a version snapshot before spawning reviewers.
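The in-crate test placement and the mutually exclusive `supports()` grid mentioned in the review could look roughly like the sketch below. It assumes a simplified `Extractor` trait and a hypothetical `Only` extractor, `demo_app`, and `supporters` helper; the real registry has 11 entries and the real grid has 16 samples.

```rust
// Hypothetical, simplified media types; AudioWav stands for the
// Audio(Wav) sample used for the "no matching extractor" path.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MediaType {
    Image,
    Pdf,
    Rust,
    Python,
    AudioWav,
}

trait Extractor {
    fn supports(&self, media_type: MediaType) -> bool;
}

// Hypothetical extractor that supports exactly one media type.
struct Only(MediaType);

impl Extractor for Only {
    fn supports(&self, media_type: MediaType) -> bool {
        self.0 == media_type
    }
}

struct App {
    // pub(crate): reachable from a #[cfg(test)] module in the same crate,
    // but not from an integration test under tests/.
    pub(crate) extractors: Vec<Box<dyn Extractor + Send + Sync>>,
}

fn demo_app() -> App {
    App {
        extractors: vec![
            Box::new(Only(MediaType::Image)),
            Box::new(Only(MediaType::Pdf)),
            Box::new(Only(MediaType::Rust)),
            Box::new(Only(MediaType::Python)),
        ],
    }
}

// Counts how many registry entries claim a media type.
fn supporters(app: &App, media_type: MediaType) -> usize {
    app.extractors.iter().filter(|e| e.supports(media_type)).count()
}

#[cfg(test)]
mod tests_extractor_dispatch {
    use super::*;

    #[test]
    fn supports_grid_is_mutually_exclusive() {
        let app = demo_app();
        for mt in [
            MediaType::Image,
            MediaType::Pdf,
            MediaType::Rust,
            MediaType::Python,
            MediaType::AudioWav,
        ] {
            // At most one extractor may claim any sample.
            assert!(supporters(&app, mt) <= 1, "overlap for {mt:?}");
        }
    }
}

fn main() {
    let app = demo_app();
    println!("registry length: {}", app.extractors.len());
    println!("AudioWav supporters: {}", supporters(&app, MediaType::AudioWav));
}
```

Keeping the test module in the same file as `App` is what makes the `pub(crate)` field visible without widening its visibility.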
No further actionable items. 2 files changed (app.rs +186/-2, lib.rs +16/-43), workspace 1316 tests PASS, release binary clean, wire schema success path stable. Next step = dogfooding (direct verification by the user: byte-identical results for markdown / pdf / image / code ingest) → if dogfooding succeeds, proceed with `gitea-release v0.19.0` (a single release tag combining the normalize-absorption of sub-item 2 and the extractor-dispatch of this sub-item 3).

OK to merge. Recommended: dogfood after merging, then cut the release tag.
altair823 referenced this pull request 2026-05-27 11:04:37 +00:00