Commit Graph

167 Commits

Author SHA1 Message Date
a48b055358 feat(ingest): asset 내부 phase 진행 로깅 (asset_chunked/expansion_progress/asset_timings) + v0.24.0
asset(문서) 단위뿐이던 ingest 진행 이벤트에 문서 내부 phase 가시성을 추가.
큰 문서가 expansion(별칭 LLM, 청크당 순차)으로 수십 분 걸려도 진행바가
1/N 에 멈춘 듯 보이던 문제 해결.

wire ingest_progress.v1 additive (backward-compat):
- asset_chunked {idx,total,chunks} — 청킹 직후, markdown/image/pdf 전 경로
- expansion_progress {idx,total,done,chunks} — expansion 루프 스로틀
  (25청크 또는 1s, 종료 시 done==chunks). 캐시 히트도 done 에 포함
- asset_timings {idx,total,parse_ms,chunk_ms,expansion_ms,embed_ms,store_ms}
  — markdown 경로 phase별 wall-clock

설계: timing 은 kebab_core::IngestItem(wire-stable) 변경을 피해 신규
AssetTimings 이벤트로 ingest_one_asset 가 직접 emit (AssetFinished 무변경).

CLI(progress.rs): 진행바 sub-message(→ N chunks / 별칭 확장 done/chunks) +
asset 종료 시 phase timing 한 줄(fmt_ms). TUI reducer no-op arm.

검증: clippy -D warnings exit 0; cargo test -p kebab-app -p kebab-cli
312 passed/0 failed. ordering-invariant 테스트 재작성 + 신규 직렬화 테스트.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 13:58:27 +00:00
369aeb3d24 feat(embed): candle Metal (Apple Silicon GPU) opt-in build feature + v0.23.0
- kebab-embed-candle: `metal` feature → candle metal backend; select_device()
  picks Device::new_metal(0) (CPU fallback) under the feature, else Device::Cpu.
  .contiguous() before to_vec2 (Metal rejects strided views; CPU tolerates).
- feature passthrough: kebab-app/embed_metal → kebab-cli/embed_metal.
  Build on macOS: cargo build --release --features embed_metal.
- default (non-metal) path unchanged: clippy 0, candle units + thread_cap + parity pass.
- README + HOTFIXES: Mac-GPU-ingest → copy sqlite+lancedb → server CPU-query workflow.
- version 0.22.0 → 0.23.0 (opt-in build surface).

macOS-only compile; Metal execution/speed/parity validated by user on M4 Pro
(not buildable on the Linux CI/dev machine).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 11:37:08 +00:00
8f7b6ee538 feat(embed): candle 임베딩 provider (NUMA-안전, opt-in) + v0.22.0
duo-socket NUMA 서버에서 fastembed(onnxruntime)가 intra-op 스레드를 48개로
하드코딩해 NUMA 힙 손상 → double-free 로 ingest 가 죽는 문제를 회피하기 위해,
같은 multilingual-e5-large 모델을 순수 Rust(candle)로 돌리는 opt-in 임베딩
provider 를 추가한다.

- 신규 crate kebab-embed-candle: CandleEmbedder (kebab_core::Embedder).
  hf-hub safetensors → XLMRobertaModel forward → mask mean-pool → L2 → e5
  prefix. candle 의존성 트리를 이 crate 에 격리 (core/config 외 kebab-* 의존 0).
- 스레드 캡: [models.embedding].num_threads + env KEBAB_EMBED_THREADS →
  글로벌 rayon 풀 1회 캡 (NUMA-안전 레버).
- kebab-app::embedder() 가 provider 분기 (fastembed/onnx/"" → 기존 경로 불변,
  candle → CandleEmbedder, 미지값 → 에러).
- Phase 0 스파이크 crate 제거 (production 흡수).
- 버전 0.21.1 → 0.22.0 (신규 config surface, pre-1.0 minor bump).

패리티: cosine_min=1.000000, max abs diff=2.01e-7 (< 1e-5) → embedding_version
유지, 재색인 0. fastembed default 동작/벡터 불변. wire schema 변경 없음.

검증(파일+exit code): clippy -D warnings EXIT=0(warning 0), test EXIT=0
(candle unit 5 + thread_cap rayon=4 + config 68), parity #[ignore] EXIT=0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 14:52:25 +00:00
f2cc325cf3 feat(cli): kebab config migrate 서브커맨드 + wire config_migration.v1
- Cmd::Config { Migrate { --dry-run } }, --json 시 config_migration.v1.
- wire_config_migration (ConfigMigrationReport 가 schema_version 자체 보유).
- schema.rs WIRE_SCHEMAS 에 config_migration.v1 등록 + JSON schema 파일.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 12:09:31 +00:00
b7e022a5e3 feat(app): config migrate facade + init 주석 공유 + doctor 체크
- config_migrate_with_config_path: 백업(.bak)+atomic write(tmp→rename)+dry-run,
  round-trip 검증으로 실패 시 원본 보존. ConfigMigrationReport 반환.
- init_workspace 가 annotated_default_document() 사용(섹션 주석 포함).
- doctor 에 config_migration 체크 추가(미동기 시 ok=false + hint).
- tests/config_migrate.rs 4개(백업/atomic/dry-run/멱등/doctor) 통과.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 12:09:31 +00:00
e9b520216e fix(expansion): per-alias sentinel orphan cleanup + 캐시 견고성 (PR #195 리뷰)
MAJOR: 별칭 dense 벡터의 chunk_id 가 레거시 단일 `{id}#alias` 에서 줄별
`{id}#alias#0`, `#alias#1`, … 로 바뀌었으나 orphan cleanup 이 단일 sentinel
하나만 삭제해 `#alias#N` 벡터가 LanceDB / embedding_records 에 누수됐다.

- kebab-app: `alias_sentinel_ids_to_delete` 헬퍼 추가(접근법 A) — 본문 +
  legacy `{id}#alias` + `{id}#alias#0`..`{id}#alias#{max-1}` 를 모두 delete-set
  에 포함. max=expansion.max_aliases_per_chunk(= parse_aliases 의 하드 cap)와
  일치. parser-bump / edited-asset / deleted-file 세 LanceDB cleanup 경로 모두
  이 헬퍼를 사용.
- kebab-store-sqlite: embedding_records 명시 DELETE 4 경로(put_chunks /
  purge_*_except_doc_id / purge_orphan_at_workspace_path /
  purge_deleted_workspace_path)를 정확 일치(`|| '#alias'`)에서 `{id}#alias%`
  프리픽스 LIKE 로 전환. 본문 chunk_id 는 32자 hex 라 LIKE 와일드카드 없음.

MINOR 1: alias 캐시 히트 시 비-UTF8 payload 를 미스로 강등(재생성 분기로)
— embedding 경로의 decode-실패→미스 강등과 동작 일치.
MINOR 2: embedding version_key 맨 앞에 kind 토큰("doc") 추가 — 임베더가
kind 별 프리픽스를 붙이므로 미래에 query 임베딩이 같은 캐시를 타도 충돌 방지.

회귀 테스트:
- kebab-app: alias_sentinel_ids_to_delete 단위 테스트 2건.
- kebab-store-sqlite: per-alias sentinel embedding_records 가 세 cleanup
  경로 모두에서 사라지는지 핀하는 통합 테스트 3건.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 09:14:34 +00:00
a8fd76499c feat(expansion): doc-side expansion 별칭 개별 dense 벡터 + 파생물 캐시(V012)
별칭을 줄별 개별 dense 벡터(sentinel `{chunk}#alias#N`)로 색인하고
boilerplate 청크는 별칭 생성을 skip. 묶음 1벡터 방식은 평균화로 특정
표현이 희석돼 오히려 회귀(13/18)했던 것을 폐기. 변형 일관성 14/18 →
16/18, mean_spread@10 0.222 → 0.111 (나무위키 ~1000 문서 CS corpus).
`kebab-core::strip_alias_suffix` 가 suffix 형과 per-alias 형 둘 다 처리.

파생물 캐시(V012): embedding 벡터 + 별칭 LLM 결과를 청크 내용 해시
키로 캐싱해 재색인 시 내용 불변 청크의 재계산을 skip. cache_key =
blake3(kind ‖ text_blake3 ‖ version_key)[:32], version_key 에
model/prompt/dimensions 포함 → §9 cascade 와 정합(버전 bump 시 자동
miss). 측정: 정답 3개 cold 1879s → warm 13s ≈ 145배. 순수 가산이라
corpus_revision bump 없음. search/ask 는 kebab.sqlite+lancedb 만으로
동작 → 외부 서버 색인 후 DB 만 복사하는 이식 워크플로 가능.

V012 schema migration + 신규 surface 로 workspace version 0.20.2 →
0.21.0 (minor) bump. README/HANDOFF/ARCHITECTURE/HOTFIXES sync.
known limitation: stack·svm 설명형 2개 잔존 + grounded 판정이 부분
인용을 grounded 로 오분류(후속 후보).

측정 상세: docs/superpowers/handoffs/2026-05-31-namu-wiki-alias-cache-study.md

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 08:24:04 +00:00
afa8af0f88 feat(app): 별칭 dense 별도 벡터 색인 + purge (sentinel) 2026-05-30 10:48:58 +00:00
116b3e6377 fix(app): clippy unused_self — build_request 를 associated fn 으로
CI 게이트(clippy --workspace --all-targets -D warnings) 통과. 동작 동일.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 03:47:06 +00:00
cde4d75f6b feat(app): ingest 별칭 생성 hook (flag off 기본, fail-soft) 2026-05-30 03:03:09 +00:00
bddcd53688 fix(app): parse_aliases 접두 제거가 숫자/하이픈 선두 별칭 손상 (Task 4 리뷰 MAJOR-1)
탐욕적 trim_start_matches → 명시적 strip_list_marker(마커+공백 패턴만 1회).
"3D 렌더링"/"2단계"/"-fast" 보존, "- "/"1. " 마커만 제거. 회귀 테스트 2개.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 02:49:25 +00:00
2a207f9868 feat(app): ExpansionGenerator — 청크당 별칭 생성 (fail-soft)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 02:36:20 +00:00
dece5e89fc feat(bulk): document bulk search input schema + error shape hint (Todo #2)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 02:16:21 +00:00
21b52bc285 style: cargo fmt --all (S3+S4+S5+S7 follow-up)
V009 morphological tokenizer 작업 (S3 chunk + S4 backfill + S5
short_query_hint 제거 + S7 신규 tests) 의 형식 정리. 동작 변경 없음.

Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md
Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S11)
2026-05-28 12:06:01 +00:00
c5de5f812b test(fts,app): V009 morphological tokenizer integration tests
신규 4 test 추가:

- crates/kebab-store-sqlite/tests/fts.rs:
  - fts_v009_korean_morphological_2char_query_hits: tokenized_korean_text
    column 이 채워진 chunk 의 '한국' 2-char query hit.
  - fts_v009_english_whole_token_only: V007 trigram substring 매칭
    회귀 (Path A) — 'token' query 가 'tokenizer' chunk 에서 0-hit.
- crates/kebab-app/tests/search_korean.rs:
  - korean_morphological_2char_query_lexical_mode: end-to-end
    한국어 wiki fixture ingest → '한국' / '서울' query hit.
  - korean_morphological_mixed_english_korean_query: 'Rust' English
    whole-token + '최적화' Korean morpheme hit.

crates/kebab-search/src/lexical.rs:
  - build_match_string() 의 MIN_TRIGRAM_CHARS(3) → MIN_QUERY_CHARS(2).
    V009 unicode61 은 최소 token 길이 제한 없어 2자 한국어 morpheme
    query 가 통과되어야 함. 1자 단독은 여전히 필터.
  - 관련 unit test 2개 V009 동작으로 갱신.

fixture text 는 lindera ko-dic 의 실제 segmentation 동작에 의존
(spec Appendix B prior-knowledge 예측). 실측 시 fixture 조정 가능.

Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §9.1, §9.2
Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S7)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 11:38:52 +00:00
f94e0c4a9b feat(app): bump lexical_index_version to V009 (fts5-v009-korean-morphological)
V009 의 FTS5 tokenizer 가 trigram → unicode61 + 한국어 형태소 분해
column 로 갱신됨. lexical_index_version 의 format 에
`fts5-v009-korean-morphological` suffix 추가하여 V007 baseline 과
구별. eval runner 의 config_snapshot 및 search cache 무효화에
자동 picks up.

기존 format: lex:{chunker_version}
신규 format: lex:{chunker_version}:fts5-v009-korean-morphological

Wire schema shape 변경 없음 (SearchHit.index_version 의 string
content 만 변화). lexical_index_version_is_returned_unchanged test
는 IndexVersion 의 임의 string 을 사용해 unchanged.

Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §11.1, §11.3
Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S6)
2026-05-28 11:23:13 +00:00
923b959610 refactor(app): retire short_query_hint helper, keep wire field as None
V009 unicode61 + 형태소 tokenizer 환경에서 2-char 한국어 query 가
hit 가능해졌으므로 V007 시기의 "3자 이상 권장" hint 가 obsolete.
SearchResponse.hint field 는 wire schema 보존 위해 struct 에 유지 +
항상 None.

- kebab-app/src/app.rs: short_query_hint 함수 + doc-comment 삭제.
  2 호출 site 가 hint = None 으로 정리.
- kebab-app/src/lib.rs: re-export 에서 short_query_hint 제거.
- kebab-tui/{app.rs,search.rs,run.rs}: short_query_hint field + 4
  호출 cascade 제거.
- kebab-cli/tests/wire_search_response.rs:
  search_plain_emits_short_query_hint_to_stderr test 삭제.
  search_json_emits_hint_field_for_short_query →
  search_json_hint_absent_for_short_query_v009 으로 교체
  (hint 항상 None 검증).
- kebab-search/src/lexical.rs::build_match_string: V007 의 trigram
  multi-token OR-combine 분기는 V009 환경에서 redundant 하나 보존
  (future 확장성) — doc-comment 1 줄 추가.

Wire schema shape 변경 없음 (search_response.schema.json:33 의 hint
field 보존, struct 에 None 으로 항상 셋팅).

Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §7.2, §7.3, §11.3
Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S5)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 11:13:45 +00:00
b63af20b72 feat(app): first-boot eager backfill for tokenized_korean_text
V007 → V009 업그레이드 시 기존 chunks 의 tokenized_korean_text 가
NULL — 첫 App::open_with_config 호출 시 자동으로 lindera ko-dic
으로 분해 후 UPDATE. chunks_au trigger 가 chunks_fts 를 자동 재-index.
사용자 재-ingest 불필요.

- crates/kebab-store-sqlite/src/store.rs:
  backfill_tokenized_korean_text(progress_cb, tokenize) API. 1000 row 마다
  commit + progress 콜백. idempotent (IS NULL 필터로 partial
  completion 재실행 안전). tokenizer 를 파라미터로 받아 §8 dep 경계 유지.
- crates/kebab-app/src/app.rs::open_with_config: run_migrations 직후
  backfill 호출. 실패 시 warn log 만 (App open 은 성공 — vector/hybrid
  mode 계속 가능). 500 row 마다 info log progress.
- crates/kebab-store-sqlite/tests/fts.rs:
  backfill_tokenized_korean_text_populates_nullable_rows 단위 test
  (idempotency 포함).
- clippy pre-existing 오류 수정 (redundant_closure, map_unwrap_or,
  cast_lossless, uninlined_format_args — kebab-app/ingest_log.rs,
  pdf_ocr_apply.rs, app.rs, tests/ocr_inspect_smoke.rs).

Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §8.1, §8.2
Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S4)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 11:01:00 +00:00
e8f44a57e3 fix(test): update first_ingest_bumps_corpus_revision baseline for V009
V004 seeds corpus_revision=0, V009 migration bumps to 1 (spec §5.2 —
LRU cache invalidation). Test previously asserted fresh store = 0;
now reads post-migration baseline dynamically and verifies that the
ingest commit increments past it.

Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §5.2
Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S3 follow-up)
2026-05-28 10:47:09 +00:00
597d8b70ad feat(deps): add lindera + lindera-ko-dic for korean morphological tokenizer
Workspace dependency 만 추가 — 실제 사용은 S3 의 kebab-chunk
tokenize_korean_morphological() helper.

- Cargo.toml (workspace): lindera = "3", lindera-ko-dic = "3" 추가.
- crates/kebab-chunk/Cargo.toml: per-crate dep (lindera-ko-dic 에
  embed-ko-dic feature 로 KO-DIC 딕셔너리 embedded blob 활성화).
- crates/kebab-app/Cargo.toml: [features] 에 fts_korean_morphological
  (spec §6.3 Option A — marker role only, disable path 없음).

License: lindera = MIT, lindera-ko-dic = MIT
(cargo info 로 확인). cargo deny 도입은 P9 follow-up.

Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §6.1, §10.1
Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 10:03:58 +00:00
9a36a06f97 style: cargo fmt --all (v0.20.x logging r2 feature follow-up)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 06:34:01 +00:00
35c987df1c feat(app): log retention — keep_recent_runs + retention_days (Enhancement 4)
LoggingCfg gains two fields with serde defaults: keep_recent_runs
(default 100, top-N file retention) and retention_days (default 30,
time-based retention for both ndjson files and the SQLite mirror).

IngestLogWriter::open now runs cleanup_old_logs before creating a new
ingest-*.ndjson — delete iff (idx >= keep_recent) OR (modified <=
cutoff). ingest_with_config_opts also calls
SqliteStore::prune_pdf_ocr_events(retention_days) at ingest start so
the SQLite mirror tracks the same retention window.

Backward compat (AC-9): both new fields use #[serde(default = ...)],
so a pre-v0.20.x config with only [logging] ingest_log_enabled +
ingest_log_dir parses unchanged. kebab init writes the new defaults
automatically via Config::default() -> toml::to_string_pretty (AC-12).

docs/SMOKE.md config example synced.

Closure r1 F5: explicit OR-on-stale comment inside cleanup_old_logs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 06:17:47 +00:00
d9ec7b8dc3 feat(cli): kebab inspect ocr-stats + ocr-failures (Enhancement 3 + wire schema additive minor)
Two new wire schemas land as additive minor: ocr_stats.v1 (corpus-wide
aggregate — total_events, success_rate, p50/p90/p99/max_ms, by_engine,
top-10 by_doc by failure count) and ocr_failures.v1 (per-doc or
corpus-wide recent failures, with --doc-id + --limit). Both ship via
new CLI subcommands `kebab inspect ocr-stats` / `inspect ocr-failures`.

App gains four facade methods: inspect_ocr_stats /
inspect_ocr_failures plus their *_with_config companions — required by
CLAUDE.md "the facade rule" so `--config <path>` is honored. The CLI
dispatch arms thread cfg explicitly into the _with_config form.

Runtime introspection emit (WIRE_SCHEMAS in schema.rs) gains two
entries; the meta JSON Schema (schema.schema.json) is untouched
because its wire.schemas is pattern-based, not enum-based.

ingest_log::percentiles extended to (p50, p90, p99, max). p99 surfaces
only via inspect ocr-stats; IngestSummary (round 1) stays 3-percentile.

SKILL.md synced with the two new schemas (AC-13).

Closure r2 G2 (facade *_with_config pair) + G3 (runtime emit, not
meta schema file) + closure r1 F4 (p99) resolved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 06:13:08 +00:00
4e451c9f7c feat(app): dual-write PDF OCR events to SQLite + ndjson (Enhancement 2 wiring)
Pre-capture canonical.doc_id and Arc<SqliteStore> before the OCR
emit_progress closure so both the ndjson file and the SQLite mirror
carry the same doc_id for every event. File write is durable
(errors propagate); SQLite insert is non-critical (tracing::warn on
failure, ingest does not abort) per spec R-1.

LogEvent::Ocr gains a doc_id: Option<&str> field as an additive
Serde change — round 1 ndjson logs deserialize with doc_id=None.

Closure r1 F1: doc_id NULL in dual-write resolved via
let doc_id_for_log = canonical.doc_id.0.clone() pre-capture.
Closure r2 G1: Arc::clone(&app.sqlite) reused instead of opening a
second SqliteStore — eliminates double-open lock contention and
duplicate migration runs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 06:06:03 +00:00
5977c8cdf1 feat(app): capture image_width/height in PDF OCR raster decode (Enhancement 1)
Add extract_image_dimensions(bytes) helper using image::ImageReader
and fill the 2 PdfOcrProgress::Finished emit points in pdf_ocr_apply.rs
where page_image_bytes is in scope (OCR error path + success path).
The no-DCTDecode skip path leaves None as page_image_bytes is absent.
Result: LogEvent::Ocr carries non-null image_width/image_height on
successful raster decode, enabling future size-conditioned timeout tuning.

Closure r1 F3: kebab-app/Cargo.toml image features += "jpeg" added as
direct [dependencies] entry (not relying on feature unification via
kebab-parse-image).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 05:54:55 +00:00
685007789a style: cargo fmt --all (round 4 ingest log feature follow-up)
Phase C4 executor 의 마지막 `fix(test): clippy + fmt fixes` commit 이
test file 부분만 fmt 적용. workspace 전체 fmt 누락 발견 → cargo fmt --all
적용. 모든 import alphabetical reorder + line wrapping 정합.

추가 untracked artifact 동시 commit:
- docs/superpowers/specs/2026-05-28-v0.20-ingest-log-spec.md (491 line, ACCEPT)
- docs/superpowers/plans/2026-05-28-v0.20-ingest-log-plan.md (616 line, ACCEPT)

workspace test: 1370 passed / 0 failed / 50 ignored, ingest_log_smoke green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 04:18:40 +00:00
445b096215 fix(test): clippy + fmt fixes for logging_roundtrip and ingest_log_smoke
* kebab-config/tests/logging_roundtrip.rs: r#"..."# → plain string
    (clippy::unnecessary_hashes).
  * kebab-app/tests/ingest_log_smoke.rs: |e| e.ok() → Result::ok,
    |s| s.as_u64() → Value::as_u64 (clippy::redundant_closure).
  * cargo fmt --all applied to pre-existing formatting drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 03:26:42 +00:00
415227bf76 test(app): ingest_log_smoke integration test (AC-9)
crates/kebab-app/tests/ingest_log_smoke.rs 신규:

  * ingest_log_smoke (AC-9): tempdir + 1 md + 1 scanned PDF →
    ingest → assert log file exists + 각 line valid JSON +
    각 kind ∈ {ocr,parse_error,skip,error,summary} + last
    line kind=summary + scanned>0.

  * ingest_log_disabled_emits_no_file (AC-6): enabled=false 일
    때 log_dir 안 ingest-*.ndjson 파일 0개 verify.

fixture: ../kebab-parse-pdf/tests/fixtures/scanned_page1.pdf
재사용 (OCR disabled — Ollama 없이 smoke 실행).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 03:06:43 +00:00
f9dc0f749f feat(app): wire IngestLogWriter into 5 ingest emit hooks (Arc<Mutex> sync)
v0.20.x ingest log feature 의 ingest pipeline wiring. 5 emit hook:

  Hook 1: ingest_with_config_opts entry/exit (writer init + summary write + flush)
  Hook 2: apply_ocr_to_pdf_pages closure (PdfOcrProgress::Finished → LogEvent::Ocr)
  Hook 3: ingest_one_*_asset Err arm (LogEvent::Error)
  Hook 4: scan 직후 fs_skips.events enumerate (LogEvent::Skip)
  Hook 5: (Hook 3 통합) per-asset fatal error → LogEvent::Error

Hook 4 의 skip event carry 위해 kebab-source-fs 의 FsScanSkips 에
events: Vec<FsSkipEvent> field 추가 (kebab-source-fs 가 kebab-app
재호출 안 함 — cycle 회피).

Ownership: Option<Arc<Mutex<IngestLogWriter>>> binding 1 곳, 5 hook 이
clone+lock+write. ocr_ms_samples (Vec<u64> success-only) 는 Arc<Mutex>
로 share, summary stage 가 sort+p50/p90/max 계산. single-threaded
per-asset loop 라 deadlock/contention 위험 없음.

Writer 실패는 ingest 자체 fail 시키지 않음 (tracing::warn + 진행).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 03:05:07 +00:00
bef0c98867 feat(wire): PdfOcrProgress.Finished + ingest_progress.v1 additive 4 fields
v0.20.x ingest log feature 의 wire side. additive minor cascade:

  * PdfOcrProgress::Finished + IngestEvent::PdfOcrFinished 의 4 field:
      - image_byte_size: Option<u64>
      - image_width:     Option<u32>
      - image_height:    Option<u32>
      - failure_reason:  Option<String>
  * docs/wire-schema/v1/ingest_progress.schema.json — 4 추가 property
    (모두 optional, required 변경 없음 = additive minor)
  * integrations/claude-code/kebab/SKILL.md — wire schema description 동기

기존 ingest_progress.v1 consumer (CLI wire dump, integration test
fixture, kebab-cli wire_search/wire_ask) 는 4 추가 field 의
Option::None 으로 backward-compat. version bump 0 (additive minor =
binary-version cascade trigger 아님 per CLAUDE.md §Versioning cascade).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 02:57:59 +00:00
f8a4c79727 feat(app): IngestLogWriter + LogEvent enum (per-ingest-run ndjson log)
v0.20.x ingest log surface 의 module side. crates/kebab-app/src/
ingest_log.rs 신규:
  * IngestLogWriter — open/write_event/write_summary/flush + Drop flush
  * LogEvent enum 4 variant (ocr / parse_error / skip / error)
  * IngestSummary struct (kind="summary" literal + 11 stat field)
  * generate_run_id (ISO 8601 prefix + uuid v7 마지막 8 hex)
  * expand_log_dir ({state_dir} placeholder 의 hand-roll expand)
  * now_ts (Rfc3339 UTC helper)
  * percentiles helper (sorted Vec p50/p90/max)

uuid v7 = workspace dep, rand 신규 의존 회피 (spec §6 R-5).

본 step 은 self-contained writer + 5 unit test. ingest pipeline 의
emit hook 5개 wiring 은 step 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 02:53:09 +00:00
9b44e27dfe test(app): update schema_report assertion for streaming_ask=true (Bug #9 follow-up)
schema_report_reflects_freshly_ingested_kb 가 `!streaming_ask` 를 assert 했으나
Bug #9 fix (760eee8) 로 streaming_ask 가 true 로 정정됨. assertion 을 반전.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 23:58:10 +00:00
d9c7aabce1 feat(schema): add active_parsers + active_chunkers arrays to schema.v1.models (Bug #13)
이전: schema.v1.models 가 parser_version / chunker_version 단일 값만 보고 →
multi-medium corpus (md + pdf + code Rust/Python + dockerfile + k8s + manifest)
의 version cascade audit 누락 risk.

이후: additive minor — Models struct 에 active_parsers + active_chunkers Vec<String>
추가. backward compat: 기존 단일 field 보존 (markdown default), 신규 array 는
optional (#[serde(default)] + JSON schema required 미포함).

source:
- kebab_store_sqlite::fetch_distinct_parser_versions() 가
  documents.parser_version DISTINCT + ORDER BY 반환.
- fetch_distinct_chunker_versions() 가 chunks.chunker_version 동일 pattern.
- collect_models 가 매 schema 호출마다 재계산 (cache 없음 — R-3 자동 해결).

wire schema additive only — 메이저 bump 불필요. v0.20.1 minor 로 충분.
integrations/claude-code/kebab/SKILL.md 동기 갱신.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 23:15:58 +00:00
28f513795e fix(config): emit error.v1 code=config_not_found for missing --config path (Bug #10)
이전: `kebab search "rust" --config /tmp/nonexistent.toml --json` 가 exit=0 +
`{"hits":[]}` silent fallback to XDG default. typo / wrong path 가 0-hit 으로만
surface — debugging nightmare.

이후: kebab_config::ConfigNotFound thiserror::Error 추가, Config::load 의
`Some(p) if !p.exists()` arm 이 anyhow::Error::new(ConfigNotFound { path })
return. kebab_app::error_wire::classify 가 downcast → ErrorV1 code=config_not_found,
hint, details.path 채워서 stderr 에 ndjson 으로 emit.

R-1 (relative path): std::path::Path::exists() 는 cwd-relative — 별도 작업 없이
absolute + relative 모두 cover. integration test 두 개로 검증.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 23:14:54 +00:00
760eee89c8 fix(app): flip streaming_ask + single_file_ingest capabilities to actual surface (Bug #9)
capabilities_snapshot() 가 streaming_ask + single_file_ingest 를 hardcoded false 로
보고했으나 실제 구현은 v0.20 final-dogfood 에서 production-grade:
- kebab ask --stream → answer_event.v1 ndjson 191 event 정상 emit
- kebab ingest-file <path> / kebab ingest-stdin --title <T> → ingest_report.v1 정상

MCP host + Claude Code skill 등 agent 가 schema.capabilities 로 routing 결정 시
false negative → 사용자가 실제 동작 feature 를 사용 불가능하다고 오인.

http_daemon 은 false 유지 (별도 sub-item 의 non-impl).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 23:13:57 +00:00
241ded59df test(app): multi-scanned PDF chunk_id collision-free integration test (Bug #3 regression)
v0.20.0 sub-item 1 bugfix Step 3 (Group C) — integration-level regression
for Bug #3 (intra-doc chunk_id collision under aggressive overlap).

- `crates/kebab-app/tests/common/mod.rs`: `pub mod mock_ocr;` 1 line append.
- `crates/kebab-app/tests/common/mock_ocr.rs` (new): MockOcrEngine lift +
  `single` / `per_page` ctor (backward-compat single + per-page cursor).
- `crates/kebab-app/tests/pdf_ocr_apply.rs`: inline MockOcrEngine 제거 +
  `mod common; use common::mock_ocr::MockOcrEngine;` import. 10 ctor call
  site migration (`MockOcrEngine { .. }` → `MockOcrEngine::single(...)`).
- `crates/kebab-app/tests/multi_scanned_pdf_ingest_no_chunk_id_collision.rs`
  (new): F1 + F2 scanned PDF + Bug #3 trigger shape (10 char "가" + ". " +
  500 char "나") via mock OCR. assertion: chunk_id global uniqueness (HashSet
  dedup) across F1 + F2; F2 trigger text produces ≥2 chunks (collision shape).
- C1 decision: Option A (share via tests/common/mock_ocr.rs). Facade mock
  injection unavailable (OllamaVisionOcr hardcoded) — helper-level chain test
  (apply_ocr_to_pdf_pages → PdfPageV1Chunker) adds value beyond unit B5.

spec: docs/superpowers/specs/2026-05-27-v0.20-sub1-bugfix-spec.md (§4.5)
plan: docs/superpowers/plans/2026-05-27-v0.20-sub1-bugfix-plan.md (Step 3)
prior: 436fd01 (Step 2 Bug #3 chunk_id fix)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 13:45:38 +00:00
436fd015a2 fix(chunk): chunk_id collision under aggressive overlap; bump pdf-page-v1 → pdf-page-v1.1 (Bug #3)
v0.20.0 sub-item 1 dogfood report 의 Bug #3 (Critical). scanned_page2.pdf
(1580 char OCR text) ingest 시 `chunks.chunk_id` PRIMARY KEY violation —
`per_chunk_hash = #c{char_start}` 가 post-overlap `actual_start` 사용 +
overlap walk floor 가 `prev_min` 으로 collapse → segment 1/2 동일 `#c0`.

- `crates/kebab-chunk/src/pdf_page_v1.rs`: `chunk_page` returns 4-tuple
  (segment_start, actual_start, chunk_end, slice); caller `per_chunk_hash`
  suffix uses `segment_start` (pre-overlap boundary, strictly increasing)
  instead of `char_start` (post-overlap, may collapse to prev_min).
- VERSION_LABEL `"pdf-page-v1"` → `"pdf-page-v1.1"` (design §9 cascade,
  explicit user-facing audit trail). `crates/kebab-app/tests/pdf_pipeline.rs:
  168, 368` 의 hardcoded literal 도 v1.1 로 갱신.
- module docs (`pdf_page_v1.rs:47-60`): workaround description 의
  `#c{char_start}` reference 를 `#c{segment_start}` 로 갱신 + segment_start
  invariant 명문 + HOTFIXES.md cross-ref.
- `pdf_page_v1.rs::tests`: `multi_chunk_page_with_aggressive_overlap_produces_unique_chunk_ids`
  regression pin (10 char "가" + ". " + 500 char "나" — multi-chunk +
  overlap walk collapse trigger).
- `tasks/HOTFIXES.md`: 2026-05-27 entry (symptom F2 1580 char OCR,
  intra-doc collision root cause, second-iteration patch rationale).

spec:  docs/superpowers/specs/2026-05-27-v0.20-sub1-bugfix-spec.md (§4)
plan:  docs/superpowers/plans/2026-05-27-v0.20-sub1-bugfix-plan.md (Step 2)
prior: d9acda5 (Step 1 Bug #2 walker fix)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 13:32:09 +00:00
48197687b7 test(pdf): integration smoke (w/ search + cancel) + vector regression + alnum e2e (#[ignore]) for v0.20 sub-item 1
Step 9 (Group I) of v0.20.0 sub-item 1 (scanned PDF OCR) plan.

I3 — crates/kebab-app/tests/ingest_pdf_ocr_smoke.rs (신규):
- ingest_with_mock_ocr_yields_pdf_ocr_summary — `#[ignore]` real Ollama,
  ingest_with_config production path + IngestItem.pdf_ocr_pages verify.
- ocr_text_indexed_and_searchable — `#[ignore]` real Ollama, app.search
  의 OCR text indexed verify (§ Acceptance #2).
- ingest_with_cancel_aborts_mid_pdf — production cancel chain (pre-set
  cancel=true + dummy endpoint, no panic/deadlock verify).

I4 — crates/kebab-parse-pdf/tests/text_extractor_regression.rs (신규):
- vector_pdf_extract_byte_identical_to_baseline — F4 mojibake.pdf 의 vector
  PDF path canonical 의 byte-identical 보존 (Step 1-8 모든 변경 전후 invariant).
- baseline 신규 = tests/snapshots/vector_pdf_canonical.json (first run create).
- normalize_provenance_timestamps inline helper (R-3 mitigation, workspace
  전체 부재 — 신규 12-line).

I5 — crates/kebab-parse-pdf/tests/ocr_e2e.rs (신규):
- f1_alnum_accuracy_ge_85 / f2_alnum_accuracy_ge_70 — `#[ignore]` real
  Ollama qwen2.5vl:3b, § Acceptance §9 #3 의 implementation.
- alnum metric = strsim::levenshtein (dev-dep 추가).
- truth file copy from PoC scratch (page1.txt + page2-batchim.txt) →
  scanned_page1_truth.txt + scanned_page2_truth.txt.
- kebab-parse-image dev-dep 추가 (OllamaVisionOcr::from_parts 호출용).
  parser isolation invariant 의 dev-dep exception (spec §3.1, dep graph
  baseline -e normal 보존).

spec:  docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md
plan:  docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 9 I3+I4+I5)
prior: c9e0594 (Step 8 CLI printer)
contract: §9

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 10:10:58 +00:00
c9e05941c5 feat(cli): activate per-page PDF OCR progress printer + test(app): ingest_progress emit verify + spec(pdf-ocr): align §4.6.1 literal with option_A (ms/chars)
Step 8 (Group H) of v0.20.0 sub-item 1 (scanned PDF OCR) plan +
Step 7 reviewer concern fix (spec literal deviation).

H1 — kebab-cli/src/progress.rs printer activation:
- 구 no-op stub `IngestEvent::PdfOcr* { .. } => {}` (Step 6 placeholder)
  를 사람-친화 stderr line printer 로 활성화.
- spec §4.6.1 line 1085-1086 wording 그대로:
  - PdfOcrStarted → `  📷 OCR page {page}...`
  - PdfOcrFinished (skipped=false) → `  ✓ OCR page {page} ({chars} chars, {ms}ms via {ocr_engine})`
  - PdfOcrFinished (skipped=true)  → `  ⊘ OCR page {page} skipped (no DCTDecode or engine fail, {ms}ms)` (M-4 의 skipped field carry 활용)
- `!quiet` gate 정합 (AssetStarted/Finished pattern mirror).

H2 — crates/kebab-app/tests/ingest_progress.rs 의 새 test:
- pdf_ocr_progress_emits_started_finished_events (real Ollama 의존, `#[ignore]`).
- F1 fixture (scanned_page1.pdf) ingest 시 pdf_ocr_started + pdf_ocr_finished
  event 가 emit 됨을 verify. Started count == Finished count invariant.
- Manual invoke: `KEBAB_PDF_OCR_ENABLED=true cargo test -p kebab-app --test
  ingest_progress --ignored`.
- mock OcrEngine inject path 부재 (Step 6 의 eager build), Step 9 I5 의
  ocr_e2e pattern (real Ollama + `#[ignore]`) 와 동일.

Step 7 reviewer concern fix — spec §4.6.1 literal:
- line 1076-1077 의 `ocr_ms` / `ocr_chars` literal 을 wire schema 의 실제
  field name `ms` / `chars` (option_A, Rust serde 와 정합) 로 갱신.
- line 1087 의 printer wording 도 `{ocr_chars}` / `{ocr_ms}` → `{chars}` / `{ms}`.
- line 1556 의 rationale 참조 `pdf_ocr_finished.ocr_ms` → `.ms`.
- `skipped` field 도 명시 (Step 6 reviewer M-4 결과).

spec:  docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md (§4.6.1)
plan:  docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 8 H1+H2)
prior: 4c5ccd5 (Step 7 wire schema) — Step 7 reviewer concern 1 의 fix
contract: §9 (additive minor wire bump — Step 7 commit 에서 완료)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 09:18:49 +00:00
4c5ccd5447 feat(wire): additive minor — IngestEvent kind 의 pdf_ocr_* + ingest_report.items[] 의 pdf_ocr_pages/ms_total + skipped field carry (Step 6 M-4/M-2)
Step 7 (Group G) of v0.20.0 sub-item 1 (scanned PDF OCR) plan +
Step 6 code reviewer Important M-4 (skipped field carry) + Minor M-2
(ordering invariant doc) fix.

G3 — JSON Schema sync (additive minor — schema_version 보존):

ingest_progress.schema.json:
- kind enum 2 추가: pdf_ocr_started + pdf_ocr_finished.
- 새 field: page (1-based PDF page), ocr_engine (engine_name), skipped (bool).
- 기존 ms / chars field 의 description 갱신 (pdf_ocr_finished carry 추가).

ingest_report.schema.json:
- items.items.properties 신규 정의 (이전 stub ["array", "null"] 만).
- pdf_ocr_pages + pdf_ocr_ms_total (nullable integer).
- 모든 기존 IngestItem field 도 명시화 (kind, doc_path, byte_len, ...).

Step 6 reviewer M-4 (Important) — skipped field carry:
- IngestEvent::PdfOcrFinished 에 skipped: bool 추가.
- ingest_one_pdf_asset 의 emit closure (lib.rs:~1864) 가 source
  PdfOcrProgress::Finished { skipped } 를 discard 않고 propagate.

Step 6 reviewer M-2 (Minor) — ordering invariant doc:
- crates/kebab-app/src/ingest_progress.rs 의 ordering text 갱신:
  ScanStarted < ScanCompleted < (AssetStarted [< (PdfOcrStarted <
  PdfOcrFinished)*] < AssetFinished)* < (Completed | Aborted).

.md doc (docs/wire-schema/v1/*.md) 부재 — plan §3 Step 7 G3 의 .md
deliverable retro N/A (해당 file 0).

spec:  docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md
plan:  docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 7 G3)
prior: b9ee09f (Step 6 wiring) + Step 6 reviewer M-4/M-2 권고
contract: §9 (additive minor wire bump — schema_version 보존)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 08:51:51 +00:00
b9ee09f176 feat(app): wire PDF OCR enrichment + cancel propagation into ingest_one_pdf_asset (H-5 eager init + post-extract hook + per-page cancel) + workspace lopdf dep (Step 4 M-4)
Step 6 (Group E) of v0.20.0 sub-item 1 (scanned PDF OCR) plan +
Step 7 spillover (IngestEvent variant + IngestItem field for compile
boundary) + Step 4 reviewer Minor M-4 fix.

E1 — eager PDF OCR engine build at `ingest_with_config_opts` entry,
mirror of image OCR pattern (lib.rs:338-347). `pdf.ocr.enabled ||
always_on` 시 `OllamaVisionOcr::from_parts(endpoint, model, ...)` 호출
+ fail-fast `?`. App field 추가 0 (local var only, spec L-1 / Step 1
A1 cosmetic fix 정합).

E2 — `ingest_one_pdf_asset` signature extension: +3 param
(`pdf_ocr_engine: Option<&OllamaVisionOcr>`, `progress: Option<&
mpsc::Sender<IngestEvent>>`, `cancel: Option<&Arc<AtomicBool>>`).
`ingest_one_asset` dispatch wrapper + caller (dispatch loop) update.

E3 — post-extract enrichment block at `extract_for` 직후 (line 1779).
`pdf.ocr.enabled || always_on` 시 `apply_ocr_to_pdf_pages` 호출,
PdfOcrProgress → IngestEvent emit (PdfOcrStarted / PdfOcrFinished
with ocr_engine), summary 의 pages_ocrd/ms_total 을 IngestItem field
로 carry. PR #187 registry dispatch invariant 보존
(`extract_for(&asset.media_type, ...)` 그대로).

E4 — cancel handle propagation: ingest_with_config_cancellable →
IngestOpts.cancel → ingest_with_config_opts → ingest_one_asset →
ingest_one_pdf_asset (new `cancel` param) → PdfOcrOpts.cancel chain.
spec §4.8 line 1159 production wiring.

Step 7 spillover (compile boundary):
- `kebab_app::ingest_progress::IngestEvent`: PdfOcrStarted { page } +
  PdfOcrFinished { page, ms, chars, ocr_engine }. serde discriminant
  `pdf_ocr_started` / `pdf_ocr_finished` (Step 7 G3 wire schema 와 일치).
- `kebab_core::IngestItem`: pdf_ocr_pages: Option<u32> +
  pdf_ocr_ms_total: Option<u64> (warnings/error 사이). 11 non-PDF
  IngestItem construct site 가 `None` 채움.
- `kebab-cli/src/progress.rs` + `kebab-tui/src/ingest_progress.rs`:
  새 variant no-op handler (v1에서 per-page progress 미노출, future
  refinement 시 활성화 가능).
- `kebab-store-sqlite/tests/ingest_report_snapshot.rs` + snapshot
  `ingest_report.snapshot.json`: 2 IngestItem fixture 의 새 field 추가.
- Step 7 의 JSON Schema 갱신 + CLI printer activation + snapshot
  regenerate 는 별 commit (G3/H1/H2 deliverable).

M-4 (Step 4 reviewer Minor) — lopdf workspace dep 통합:
- workspace `Cargo.toml [workspace.dependencies] lopdf = "0.32"`.
- kebab-app + kebab-parse-pdf 의 direct dep → `{ workspace = true }`.

Verifier evidence:
- workspace test (`cargo test --workspace --no-fail-fast -j 1`):
  175 test result summary lines, 0 failures, 0 FAILED.
- workspace clippy (`-D warnings`): exit 0, 0 warning.
- dep graph baseline (`.omc/state/pdf-ocr-{parse-pdf,app-parse}-deps.baseline.txt`):
  empty diff for both.

spec:  docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md (§4.4 + §4.6 + §4.8)
plan:  docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 6 E1-E4 + Step 7 partial G1+G2)
prior: 4672cba (Step 5 fix) + fd918a6 (Step 5) + 9f003ef (Step 4 helper)
contract: §9 (additive minor wire bump — Step 7 JSON Schema 완료 시)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 08:18:34 +00:00
fd918a60ce feat(config): add [pdf.ocr] section — qwen2.5vl:3b default, opt-in + env overrides + doc(app): PdfOcrOpts field doc (Step 4 I-1)
Step 5 (Group F) of v0.20.0 sub-item 1 (scanned PDF OCR) plan +
Step 4 reviewer Important I-1 fix (PdfOcrOpts field doc) 동봉.

F1 — `kebab-config::PdfCfg` + `PdfOcrCfg` + 4 default fn:
- PdfCfg { ocr: PdfOcrCfg }.
- PdfOcrCfg with 11 field (enabled/always_on/engine/model/endpoint/
  languages/max_pixels/request_timeout_secs/valid_ratio_threshold/
  min_char_count/lang_hint).
- defaults: opt-in (enabled=false), qwen2.5vl:3b, 0.5 threshold, 20 char.
- mirror of image OCR cfg pattern (spec §4.5).

Config struct extension:
- `pdf: PdfCfg` field with `#[serde(default = "PdfCfg::defaults")]`.

11 env var override (parallel to KEBAB_IMAGE_OCR_*):
KEBAB_PDF_OCR_{ENABLED,ALWAYS_ON,ENGINE,MODEL,ENDPOINT,LANGUAGES,
MAX_PIXELS,REQUEST_TIMEOUT_SECS,VALID_RATIO_THRESHOLD,MIN_CHAR_COUNT,
LANG_HINT}.

F2 — `crates/kebab-config/tests/pdf_ocr.rs` (신규):
- toml roundtrip (11 field).
- defaults (opt-in + qwen2.5vl:3b).
- env override (4 key sample + default preservation).

F3 (Step 4 I-1) — `pdf_ocr_apply.rs` 4 public item 의 doc comment:
- PdfOcrOpts struct + 6 field.
- PdfOcrSummary struct + 2 field.
- apply_ocr_to_pdf_pages fn (Errors block 포함).
- PdfOcrProgress enum + 2 variant + 5 field.
body 변경 0, doc-only.

spec:  docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md (§4.5)
plan:  docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 5 F1+F2)
prior: 9f003ef (Step 4) — code reviewer Important I-1 resolution
contract: §9 (additive minor wire bump — Step 7)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 07:07:18 +00:00
9f003ef1cd feat(app): add pdf_ocr_apply helper (10 test, F7 split + cancel) — post-extract OCR enrichment for PDF (H-1 resolution)
Step 4 (Group D) of v0.20.0 sub-item 1 (scanned PDF OCR) plan.

D1 — `apply_ocr_to_pdf_pages(&mut canonical, &dyn OcrEngine, &bytes, &opts, emit_progress)`
in `kebab-app::pdf_ocr_apply`. spec §4.1 line 381-599 body 그대로 +
PdfOcrOpts.cancel field + per-page cancel check (verifier LOW L-1).

post-extract enrichment pattern (H-1 resolution): kebab-parse-pdf 가
kebab-parse-image::OcrEngine 을 import 하지 않음 (parser isolation 보존).
helper 가 kebab-app 의 facade 안 — both parser crate 의 cross-import 회피.

Per-page decision matrix (spec §4.1 line 459-464):
- always_on=true → 모든 page OCR (dual-block, ordinal = page-1 + page_count).
- always_on=false + needs_ocr → in-place OCR (text-detect block mutate).
- needs_ocr=false → skip.

DCTDecode-only v1 (H-3): FlateDecode / CCITTFaxDecode page 는
extract_dctdecode_page_image=None → Warning event + skip + emit_progress(skipped=true).

OcrEngine.recognize 실패 → Warning event + skip + emit_progress(skipped=true).

D3 — per-page cancel handle (verifier LOW L-1 + spec §4.8 line 1159):
PdfOcrOpts.cancel: Option<Arc<AtomicBool>>. set→true 시
`anyhow::bail!("PDF OCR cancelled mid-PDF at page N")`.

lopdf = "0.32" added to [dependencies] (already transitive via kebab-parse-pdf;
no new crate introduced — dep graph kebab-parse-* baseline unchanged).

Integration test (`tests/pdf_ocr_apply.rs`, 10 test):
- f1_input_with_ocr_enabled_replaces_empty_block — in-place mutate.
- f3_input_with_ocr_enabled_keeps_text_detect_blocks — vector PDF skip.
- f1_input_with_ocr_disabled_keeps_empty_block — disabled no-op.
- f4_input_with_ocr_enabled_replaces_mojibake_block — mojibake → in-place mutate.
- f3_input_with_always_on_pushes_dual_blocks — always_on dual-block.
- f6_flatedecode_skipped_with_warning — FlateDecode skip + Warning event.
- f7_ccittfax_skipped_with_warning — CCITTFax skip + Warning event (verifier M-4 split).
- ocr_engine_failure_surfaces_as_warning — OCR failure → Warning event.
- dual_block_ordinals_are_deterministic_and_unique — ordinal invariant.
- cancel_handle_aborts_mid_pdf — cancel handle 의 production source (D3).

MockOcrEngine fixture: spec §5.5 line 1284-1299. F3 fixture 부재 →
mock CanonicalDocument construction + F1 bytes reuse pattern (Option B:
PdfTextExtractor::extract 를 통한 실제 production path canonical 생성).

spec:  docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md (§4.1 + §5.5)
plan:  docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 4 D1+D2+D3)
prior: c2cd3a7 (Step 3) + 8d81bc1 (Step 3 clippy fix)
contract: §9 (additive minor wire bump — 후속 step)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 06:42:01 +00:00
2c05dbd0dd refactor(app): extract dispatch polymorphism — App.extract_for(...) + 11 Extractor registry
kebab-app 의 hardcoded extract dispatch (`ImageExtractor` + `PdfTextExtractor` + 9 AST `*Extractor` 의 `::new().extract(…)` callsite 11곳 + 9 AST arm match) 를 `App::extract_for(&MediaType, &ExtractContext, &[u8])` 단일 polymorphic call 로 통합. trait 변경 0, parser source 변경 0, wire schema 변경 0 (success path).

핵심 변경:
- App struct 에 `pub(crate) extractors: Vec<Box<dyn Extractor + Send + Sync>>` field + `pub(crate) fn extract_for(...)` helper method.
- App::open_with_config 의 registry init = 11 Extractor (image + pdf + 9 AST).
- ImagePipeline struct 의 `extractor: &'a ImageExtractor` field 제거 + lib.rs:356 local + lib.rs:1235 alias 삭제 (atomic block).
- 9 AST arm (lib.rs:2012-2047 의 12 arm = 11 explicit + 1 wildcard) → 4 arm (9 AST grouped + 7 manifest + 1 shell + 1 other-bail).
- in-crate unit test (app.rs 의 `mod tests_extractor_dispatch`) 3 class: registry length 11 / mutually-exclusive supports() grid (16 sample MediaType) / extract_for error path (Audio).

scope = AST 9-arm + image + pdf extract callsite only. MarkdownExtractor / Tier 2/3 / outer 4-arm / inner 4 match / Chunker dispatch 모두 future-defer (별 PR — spec §11).

Wire schema (success path) 변경 0 — ingest_report.v1 / search_response.v1 / answer.v1 byte-identical (4-medium SMOKE 비교 검증). error.v1.message 의 internal context string wording 변경 (예: `kb-parse-image::ImageExtractor::extract` → `kb-app::extract_for (image)`) 은 spec §5.5 risk acceptance — `error.v1.code` + `error.v1.schema_version` 보존, user-visible surface 외. Cargo workspace.version bump 0.

Refs:
- docs/superpowers/specs/2026-05-26-extractor-dispatch-unification-spec.md (2 round APPROVE)
- docs/superpowers/plans/2026-05-26-extractor-dispatch-unification-plan.md (3 round APPROVE)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 17:43:44 +00:00
710945c4b0 refactor(parse-md): absorb kebab-normalize + kebab-parse-types — 24 → 22 crates + §3.7b 재작성
design §3.7b 의 thin layer (ParsedBlock 류) 가 4 parser 중 1개 (markdown) 만 lift 를
경유하는 현실 — fan-in/fan-out 모두 1 → layer 의미 잃음. kebab-normalize (1097 LOC)
+ kebab-parse-types (98 LOC) 둘을 kebab-parse-md 로 흡수.

설계: docs/superpowers/specs/2026-05-26-normalize-absorption-spec.md
플랜: docs/superpowers/plans/2026-05-26-normalize-absorption-plan.md
HOTFIXES: tasks/HOTFIXES.md 의 2026-05-26 entry (design deviation)

- 5 사용 type + 3 forward-declared struct → kebab-parse-md::types module 의 pub explicit re-export.
- build_canonical_document + derive_title + warning_agent → kebab-parse-md::normalize module.
- 4 hard-coded agent literal (lib.rs:122/128/134/153) + warning_agent body return + tracing target literal 모두 보존 — stage label 일관성.
- kebab-app callsite (lib.rs:51 use + :1119 context string) + Cargo.toml 의 2 dep (regular + dead) 제거.
- kebab-chunk + kebab-store-sqlite 의 [dev-dependencies] kebab-normalize → 제거 (kebab-parse-md 로 갈음). 통합 test source 의 use shift.
- test file 이동 (kebab-normalize/tests/normalize_snapshot.rs → kebab-parse-md/tests/).
- workspace Cargo.toml: Hunk (a) members 2 entry 삭제 + Hunk (b) version 0.18.0 → 0.19.0 (frozen contract 변경).
- design §3.7b 4-단락 재작성 (원래 intent 보존 + 현재 상태 + 보존된 surface + future re-extraction trigger).
- design §8 graph 갱신 (3 edge 제거 + 2 forbidden bullet 의미 갱신 + commentary).
- ARCHITECTURE.md crate graph + directory tree mechanical 갱신.
- tasks/INDEX.md L169 closure mention + "Future work / deferred" 섹션 신설 (image/pdf normalize integration entry).
- tasks/HOTFIXES.md 신규 entry (4-block — design deviation Symptom).
- HANDOFF.md cross-link 한 줄.
- 3 dead struct (ParsedImageRegion / ParsedPdfPage / ParsedAudioSegment) 는 보존 — v0.20+ image/pdf normalize integration 의 future surface (spec §11).

Wire / surface impact: 0건. CLI / TUI / MCP / --json 출력 / config / XDG path /
parser_version 모두 unchanged. wire-invisible provenance.events[].agent + tracing target
literal "kb-normalize" 도 보존 — old DB row 와 new DB row 의 audit log 일관성.

Verification: cargo test --workspace --no-fail-fast -j 1 → 1313 passed / 0 failed (172 result blocks).
cargo clippy --workspace --all-targets -j 1 -- -D warnings → 0 warning (5m 46s).
cargo metadata --no-deps --format-version 1 | jq '.workspace_members | length' = 22.
cargo tree -p kebab-app --depth 2 | grep -E "kebab_(parse_types|normalize)" = 0 줄.
2026-05-26 15:00:59 +00:00
7c27633df2 chore(rag): post-PR9 refactor — H1/H2/H3/D/E + test coverage + post-refactor dogfood retest
OMC team `post-pr9-refactor` 의 architectural cleanup. architect priorities 분석 후 executor + test-engineer 가 file edits, system-architect 가 component-level review 로 *pre-cut nothing — all v0.18.1+ defer* 결론.

## Executor 작업 (H1/H2/H3/D/E)

- **H1** (kebab-nli/src/onnx.rs): `[models.nli]` config wire 활성화. `DEFAULT_MODEL_ID` const 제거 (kebab-config 의 NliCfg::defaults 가 single source). OnnxNliVerifier::new 가 config.models.nli.model 읽고 config.models.nli.provider 가 "onnx" 아니면 anyhow::bail. 3 stale "PR-9c-1 will wire this" 코멘트 제거. 2 unit test 추가 (`new_uses_config_model_id`, `new_rejects_unsupported_provider`).
- **H2** (kebab-rag/src/pipeline.rs): `truncate_for_nli(premise: &str, _hypothesis: &str)` → `truncate_for_nli(premise: &str)`. v0.18.1 placeholder doc 제거. 4 callsite (tests/multi_hop.rs) 갱신 + test rename `multi_hop_truncate_for_nli_preserves_hypothesis` → `multi_hop_truncate_for_nli_char_budget` (contract 정합).
- **H3** (kebab-rag/src/pipeline.rs:1041): `was_truncated` 가 tracing::debug! 으로 surface (observability 추가, signature 보존 — caller logging contract).
- **D** (kebab-mcp/tests/tools_call_ask_multi_hop.rs): request_timeout_secs 2 → 5 (slow CI 안정성), `mh_code` discriminator 제거. dispatch contract = `mh.is_error.unwrap_or(false)` (기존 assertion 으로 충분).
- **E** (tasks/HOTFIXES.md + pipeline.rs:1633-1638): fb-41 PR-9 closure entry 의 sibling 으로 "### PR-9 NLI refusal: terminal Synthesize hop omitted from hops trace" subsection 추가. pipeline 의 "cleanup deferred to a follow-up" → "// See tasks/HOTFIXES.md ... for follow-up" cross-link.

## Test-engineer 작업 (T1/T2/T3/T4, 9 new tests)

- **T1** (kebab-nli/src/onnx.rs::tests): sanitize_model_id 3 unit (replaces_slash / idempotent / leaves_other_chars).
- **T2** (kebab-rag/tests/multi_hop_nli_panic.rs 신규): 2 panic-path tests — facade invariant (`expect("verifier must be Some when nli_threshold > 0.0")`) 의 #[should_panic] + threshold=0 의 companion.
- **T3** (kebab-rag/tests/multi_hop_nli_stream.rs 신규): 2 StreamEvent::Final tests — refuse_nli_verification + refuse_nli_model_unavailable 의 stream_sink Final 분기 wire shape pinning.
- **T4** (kebab-app/tests/open_with_config_nli.rs 신규): 2 NLI failure path — model_dir 가 unwritable 일 때 App::open_with_config 의 Result<App> Err (with "OnnxNliVerifier" in chain) + threshold=0 일 때 graceful skip.

## System-architect 결론

3 lenses (absorption / duplication / under-engineered interface) 분석 결과 — *pre-cut nothing*. Top-3 items 모두 v0.18.1+ defer:
- Lens 1: kebab-normalize + kebab-parse-types 흡수 가능 (parse-md 만 사용, 5 parsers 우회) → v0.18.1+.
- Lens 3: Extractor + Chunker trait 의 dead polymorphism (모든 callsite 가 hardcoded) → v0.18.1+.
- Lens 1 bundled: kebab-source-fs 가 kebab-parse-code 의 9 tree-sitter grammars drag → low-risk dep-graph win, v0.18.1+ bundled.
- Defer-with-intent: LanguageModel async refactor (cloud-LLM 시), NliVerifier::score_batch + typed NliError (2nd impl 시), compute_stale → kebab-core::stale.

보고서: /build/cache/tmp/post-pr9-refactor-priorities.md, /build/cache/tmp/system-architecture-priorities.md (둘 다 repo 외 — analysis 보존).

## 검증

- cargo test -p kebab-nli -j 1 → 11/11 pass.
- cargo test -p kebab-rag -j 1 → 75/75 pass (5 NLI multi-hop + 4 신규 T2/T3 포함).
- cargo test -p kebab-app -j 1 → 23 pass + 2 ignored (T4 의 2 포함).
- cargo test -p kebab-mcp --test tools_call_ask_multi_hop -j 1 → 1 pass + 1 pre-existing flaky (HOTFIX #15, no_chunks short-circuit, executor D fix 와 무관 — line 86 의 base assertion 이 fixture 없어서 fail).
- cargo clippy --workspace --all-targets -j 1 -- -D warnings clean.
- cargo test --workspace --no-fail-fast -j 1 → 1304 passed (+11 new) + 1 pre-existing flaky 동일.
- **Post-refactor dogfood retest byte-identical** (PR-9d / post-cleanup / post-refactor 3번 모두): S7 0.0035389824770390987, S1 0.058334656059741974, S10 0.0027875436935573816, S3 nli_model_unavailable.

docs/dogfood/v0.18.0/SUMMARY.md 에 "Post-architectural-refactor retest" section 추가.

Wire 영향: 없음.
Behavior 영향: 없음 (H1 의 config wiring 가 default 와 같은 model → byte-identical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 04:42:37 +00:00
7c85de065a chore: workspace-wide cleanup — clippy::pedantic baseline + auto-fix
cut PR v0.18.0 전 마지막 정리. 사용자 요청: "전체 코드베이스를 깔끔하고 알아보기 쉽게".

## Workspace lints

- `Cargo.toml` 의 `[workspace.lints.clippy]` 에 `pedantic = "warn"` (priority -1) + 의도적 allow-list 추가:
  - cast_possible_truncation / cast_possible_wrap / cast_sign_loss / cast_precision_loss — ONNX i64 / hash modular reduction 등 의도적 truncation.
  - doc_markdown / missing_errors_doc / missing_panics_doc — cosmetic doc style.
  - too_many_lines / module_name_repetitions / must_use_candidate / needless_pass_by_value / manual_let_else / items_after_statements / similar_names — informational only.
  - format_collect / match_wildcard_for_single_variants / trivially_copy_pass_by_ref / unnecessary_wraps — intentional patterns (exhaustive match, future Result variants 등).
  - default_trait_access — `Foo::default()` 가 idiomatic.
  - float_cmp — NLI / RRF score 의 explicit threshold 비교 의도.
  - struct_excessive_bools / case_sensitive_file_extension_comparisons / naive_bytecount / ignore_without_reason — domain-specific 의도.
  - format_push_string / return_self_not_must_use / match_same_arms — builder / wire-label / hot-path 패턴 보존.
  - needless_continue / used_underscore_binding / nonminimal_bool / unreadable_literal / many_single_char_names / doc_link_with_quotes / assigning_clones / collapsible_str_replace / trivial_regex / elidable_lifetime_names / range_plus_one / explicit_iter_loop / implicit_hasher / ref_option — remaining low-value style.
- 각 24 crate `Cargo.toml` 에 `[lints] workspace = true` 추가.

## Auto-fix

`cargo clippy --workspace --all-targets --fix` 적용 — 128 files changed, 552 insertions / 472 deletions. 주로:
- uninlined_format_args (~18): `format!("{}", x)` → `format!("{x}")`.
- redundant_closure_for_method_calls (~33): `.map(|x| x.foo())` → `.map(T::foo)`.
- 그 외 mechanical refactor.

## 검증

- `cargo clippy --workspace --all-targets -j 1 -- -D warnings` clean (pedantic + 모든 lint group).
- `cargo test --workspace --no-fail-fast -j 1` — **1293 tests pass + 1 pre-existing flaky fail** (`kebab-mcp::tools_call_ask_multi_hop::ask_tool_routes_multi_hop_true_to_decompose_first`, HOTFIX candidate, cleanup 무관). 회귀 0.

Wire 영향: 없음.
Behavior 영향: 없음 (mechanical refactor only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 03:01:58 +00:00
00ffe9c792 feat(rag): fb-41 PR-9c-2 — pipeline integration + mock test + SKILL.md (★ NLI 실 활성화)
PR-9c-1 의 wire surface 위에 behavior 활성화 — `ask_multi_hop` 의 step 8.5 hook 가 `[rag] nli_threshold > 0` 일 때 NLI 검증 실 수행. **첫 user-visible behavior change** in PR-9.

- crates/kebab-rag/src/pipeline.rs:
  - ask_multi_hop step 8.5 NLI hook (empty answer 가드 + truncate_for_nli + verifier.score + verification field + refusal 분기).
  - refuse_nli_verification helper (verification: Some(...) + RefusalReason::NliVerificationFailed).
  - refuse_nli_model_unavailable helper (verification: None + RefusalReason::NliModelUnavailable).
  - truncate_for_nli helper (module-level pub fn, MAX_NLI_PREMISE_CHARS = 4 * 400 = 1600 chars 의 chars-based budget, _hypothesis 미사용 placeholder — v0.18.1 token-budget 갱신 candidate).
  - PR-9c-1 의 #[allow(dead_code)] 두 곳 제거 (verifier field + with_verifier builder; doc 의 transitional sentence 도 정리). round-1 PR-9c-1 review N1 carry-forward closure.
- crates/kebab-app/src/app.rs:
  - App::open_with_config 의 NliVerifier construction — config.rag.nli_threshold > 0 → OnnxNliVerifier::new + Arc::new wrap + 후속 RagPipeline 초기화 시 with_verifier 호출. 실패 시 ? 전파 (시그니처 Result<Self> 그대로 — caller cascading 0).
  - kebab-app/Cargo.toml 에 kebab-nli path 의존 추가.
- crates/kebab-rag/tests/multi_hop.rs + tests/common/mod.rs:
  - MockNliVerifier (pass / fail / err 생성자 + score call_count instrumented).
  - multi_hop_nli_pass_keeps_grounded — entailment 0.9 → grounded=true, verification.nli_passed=true.
  - multi_hop_nli_fail_refuses — entailment 0.1 → refusal=NliVerificationFailed.
  - multi_hop_nli_disabled_skip_verify — threshold 0.0 → verify skip, verification=None.
  - multi_hop_nli_model_unavailable_refuses — verifier Err → refusal=NliModelUnavailable.
  - multi_hop_truncate_for_nli_preserves_hypothesis — long premise truncation + hypothesis 보전.
- integrations/claude-code/kebab/SKILL.md: mcp__kebab__ask 절에 NLI 안내 한 단락 (verification.nli_passed 의미 + threshold tuning + nli_verification_failed/nli_model_unavailable refusal handling).

검증: cargo test --workspace -j 1 — 5 신규 multi-hop pass + 회귀 0 (pre-existing kebab-mcp::tools_call_ask_multi_hop 동일 flaky). cargo clippy --workspace --all-targets -j 1 -- -D warnings clean.
Wire 영향: PR-9c-1 의 schema 변경에 *behavior wiring* — answer.v1.verification field 가 multi-hop happy path + refuse path 양쪽에서 채움.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 00:55:02 +00:00
cf35f36f88 feat(rag): fb-41 PR-2 — RagPipeline::ask_multi_hop skeleton (fixed depth=2)
PR-2 of fb-41 multi-hop RAG. Decompose + retrieve + synthesize 3-stage
pipeline가 `opts.multi_hop=true` 일 때 dispatch. Dynamic decide loop
는 PR-3.

- `AskOpts.multi_hop: bool` 필드 추가 + `impl Default for AskOpts`
  도입 (HOTFIXES 2026-05-07 의 known limitation 해소). 9 explicit
  init site 모두 `multi_hop: false` 추가 — Default 도입으로 향후
  `..Default::default()` 점진 migrate 가능.
- `RagPipeline::ask` 의 entry 에 dispatcher 한 줄
  (`if opts.multi_hop { return self.ask_multi_hop(...) }`).
- `RagPipeline::ask_multi_hop` 신규 method. 1) decompose LLM call
  → JSON array of strings parse, 2) 각 sub-query 로 retrieve +
  chunk_id dedup pool, 3) score gate / no-chunks 가드, 4)
  pack_context (single-pass 와 helper 공유), 5) synthesize LLM
  call w/ MULTI_HOP_SYNTHESIZE_SYSTEM_PROMPT, 6) citation extract
  + Answer build. `prompt_template_version` = "rag-multi-hop-v1"
  로 stamp — eval `compare` 가 single-pass vs multi-hop 분리.
- Prompt const 신규: MULTI_HOP_DECOMPOSE_SYSTEM_PROMPT +
  MULTI_HOP_DECOMPOSE_USER_TEMPLATE + MULTI_HOP_SYNTHESIZE_SYSTEM_PROMPT
  + PROMPT_TEMPLATE_VERSION_MULTI_HOP + MULTI_HOP_MAX_SUB_QUERIES_DEFAULT.
- `kebab_core::RefusalReason::MultiHopDecomposeFailed` variant 신규.
  Cascade: kebab-store-sqlite `refusal_reason_label` + kebab-tui `ask
  refusal render` exhaustive match 갱신.
- `parse_decompose_response` + `strip_markdown_json_fence` helper —
  markdown code fence (```json / ```) strip + JSON array of strings
  parse + trim + drop empty + cap at MULTI_HOP_MAX_SUB_QUERIES_DEFAULT.
  None 반환 시 caller 가 `MultiHopDecomposeFailed` refusal.

Tests (55 passing total, 8 신규):
- 6 unit (parse_decompose_response 의 bare array / fence variants /
  garbage / cap / trim 회귀 핀).
- 2 integration: `ask_multi_hop_dispatches_and_decompose_garbage_refuses`
  (decompose garbage → MultiHopDecomposeFailed + 정확히 1 LLM call) +
  `ask_with_multi_hop_false_keeps_single_pass_path` (회귀 핀, 기존
  caller 자동 backwards-compat).

Happy-path multi-hop (decompose 성공 → synthesize) 의 integration
test 는 ScriptedLm helper 가 PR-3 의 decide loop 와 함께 도입될
때 같이 추가. 현 `MockLanguageModel` 는 canned single response 라
2-LLM-call sequence 핀 불가.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 06:45:32 +00:00
13a3361ba2 docs(v0.17.0/PR-C): rustdoc — code_lang_breakdown / repo_breakdown 가
실제로 doc count 임을 명시 (PR #161 워커 리뷰 MEDIUM 반영)

JSON schema description 은 PR-C 본체에서 'code chunk count' →
'doc count' 로 정정했으나 Rust struct field 의 rustdoc 은 같은
오기재를 그대로 carry — Gemini round 2 가 JSON schema 만 봤고
rustdoc 은 miss. 워커 둘 다 동일 finding (MEDIUM).

implementation 변경 없음 — 의미가 doc count 였던 사실이 처음부터
일관. wording 만 맞춤.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 20:35:01 +00:00