kebab

Author	SHA1	Message	Date
altair823	35c987df1c	feat(app): log retention — keep_recent_runs + retention_days (Enhancement 4) LoggingCfg gains two fields with serde defaults: keep_recent_runs (default 100, top-N file retention) and retention_days (default 30, time-based retention for both ndjson files and the SQLite mirror). IngestLogWriter::open now runs cleanup_old_logs before creating a new ingest-*.ndjson — delete iff (idx >= keep_recent) OR (modified <= cutoff). ingest_with_config_opts also calls SqliteStore::prune_pdf_ocr_events(retention_days) at ingest start so the SQLite mirror tracks the same retention window. Backward compat (AC-9): both new fields use #[serde(default = ...)], so a pre-v0.20.x config with only [logging] ingest_log_enabled + ingest_log_dir parses unchanged. kebab init writes the new defaults automatically via Config::default() -> toml::to_string_pretty (AC-12). docs/SMOKE.md config example synced. Closure r1 F5: explicit OR-on-stale comment inside cleanup_old_logs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 06:17:47 +00:00
altair823	d9ec7b8dc3	feat(cli): kebab inspect ocr-stats + ocr-failures (Enhancement 3 + wire schema additive minor) Two new wire schemas land as additive minor: ocr_stats.v1 (corpus-wide aggregate — total_events, success_rate, p50/p90/p99/max_ms, by_engine, top-10 by_doc by failure count) and ocr_failures.v1 (per-doc or corpus-wide recent failures, with --doc-id + --limit). Both ship via new CLI subcommands `kebab inspect ocr-stats` / `inspect ocr-failures`. App gains four facade methods: inspect_ocr_stats / inspect_ocr_failures plus their _with_config companions — required by CLAUDE.md "the facade rule" so `--config <path>` is honored. The CLI dispatch arms thread cfg explicitly into the _with_config form. Runtime introspection emit (WIRE_SCHEMAS in schema.rs) gains two entries; the meta JSON Schema (schema.schema.json) is untouched because its wire.schemas is pattern-based, not enum-based. ingest_log::percentiles extended to (p50, p90, p99, max). p99 surfaces only via inspect ocr-stats; IngestSummary (round 1) stays 3-percentile. SKILL.md synced with the two new schemas (AC-13). Closure r2 G2 (facade _with_config pair) + G3 (runtime emit, not meta schema file) + closure r1 F4 (p99) resolved. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 06:13:08 +00:00
altair823	4e451c9f7c	feat(app): dual-write PDF OCR events to SQLite + ndjson (Enhancement 2 wiring) Pre-capture canonical.doc_id and Arc<SqliteStore> before the OCR emit_progress closure so both the ndjson file and the SQLite mirror carry the same doc_id for every event. File write is durable (errors propagate); SQLite insert is non-critical (tracing::warn on failure, ingest does not abort) per spec R-1. LogEvent::Ocr gains a doc_id: Option<&str> field as an additive Serde change — round 1 ndjson logs deserialize with doc_id=None. Closure r1 F1: doc_id NULL in dual-write resolved via let doc_id_for_log = canonical.doc_id.0.clone() pre-capture. Closure r2 G1: Arc::clone(&app.sqlite) reused instead of opening a second SqliteStore — eliminates double-open lock contention and duplicate migration runs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 06:06:03 +00:00
altair823	685007789a	style: cargo fmt --all (round 4 ingest log feature follow-up) The Phase C4 executor's last `fix(test): clippy + fmt fixes` commit applied fmt only to the test files. The workspace-wide fmt was found to be missing → applied cargo fmt --all. All imports are reordered alphabetically and line wrapping is made consistent. Additional untracked artifacts committed at the same time: - docs/superpowers/specs/2026-05-28-v0.20-ingest-log-spec.md (491 lines, ACCEPT) - docs/superpowers/plans/2026-05-28-v0.20-ingest-log-plan.md (616 lines, ACCEPT) workspace test: 1370 passed / 0 failed / 50 ignored, ingest_log_smoke green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 04:18:40 +00:00
altair823	445b096215	fix(test): clippy + fmt fixes for logging_roundtrip and ingest_log_smoke * kebab-config/tests/logging_roundtrip.rs: r#"..."# → plain string (clippy::unnecessary_hashes). * kebab-app/tests/ingest_log_smoke.rs: |e| e.ok() → Result::ok, |s| s.as_u64() → Value::as_u64 (clippy::redundant_closure). * cargo fmt --all applied to pre-existing formatting drift. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 03:26:42 +00:00
altair823	415227bf76	test(app): ingest_log_smoke integration test (AC-9) New file crates/kebab-app/tests/ingest_log_smoke.rs: * ingest_log_smoke (AC-9): tempdir + 1 md + 1 scanned PDF → ingest → assert log file exists + every line is valid JSON + every kind ∈ {ocr,parse_error,skip,error,summary} + last line kind=summary + scanned>0. * ingest_log_disabled_emits_no_file (AC-6): verifies that with enabled=false there are 0 ingest-*.ndjson files in log_dir. fixture: reuses ../kebab-parse-pdf/tests/fixtures/scanned_page1.pdf (OCR disabled, so the smoke test runs without Ollama). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 03:06:43 +00:00
altair823	9b44e27dfe	test(app): update schema_report assertion for streaming_ask=true (Bug #9 follow-up) schema_report_reflects_freshly_ingested_kb asserted `!streaming_ask`, but the Bug #9 fix (`760eee8`) corrected streaming_ask to true. The assertion is inverted accordingly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 23:58:10 +00:00
altair823	d9c7aabce1	feat(schema): add active_parsers + active_chunkers arrays to schema.v1.models (Bug #13) Before: schema.v1.models reported only a single parser_version / chunker_version value → risk of missing the version cascade audit for a multi-medium corpus (md + pdf + code Rust/Python + dockerfile + k8s + manifest). After: additive minor. The Models struct gains active_parsers + active_chunkers (Vec<String>). backward compat: the existing single fields are preserved (markdown default); the new arrays are optional (#[serde(default)] + not listed in the JSON schema's required). source: - kebab_store_sqlite::fetch_distinct_parser_versions() returns documents.parser_version with DISTINCT + ORDER BY. - fetch_distinct_chunker_versions() follows the same pattern for chunks.chunker_version. - collect_models recomputes on every schema call (no cache, which resolves R-3 automatically). The wire schema change is additive only, so no major bump is needed; a v0.20.1 minor is sufficient. integrations/claude-code/kebab/SKILL.md updated in sync. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 23:15:58 +00:00
altair823	241ded59df	test(app): multi-scanned PDF chunk_id collision-free integration test (Bug #3 regression) v0.20.0 sub-item 1 bugfix Step 3 (Group C): integration-level regression for Bug #3 (intra-doc chunk_id collision under aggressive overlap). - `crates/kebab-app/tests/common/mod.rs`: `pub mod mock_ocr;` 1 line append. - `crates/kebab-app/tests/common/mock_ocr.rs` (new): MockOcrEngine lift + `single` / `per_page` ctor (backward-compat single + per-page cursor). - `crates/kebab-app/tests/pdf_ocr_apply.rs`: inline MockOcrEngine removed + `mod common; use common::mock_ocr::MockOcrEngine;` import. 10 ctor call site migration (`MockOcrEngine { .. }` → `MockOcrEngine::single(...)`). - `crates/kebab-app/tests/multi_scanned_pdf_ingest_no_chunk_id_collision.rs` (new): F1 + F2 scanned PDF + Bug #3 trigger shape (10 char "가" + ". " + 500 char "나") via mock OCR. assertion: chunk_id global uniqueness (HashSet dedup) across F1 + F2; F2 trigger text produces ≥2 chunks (collision shape). - C1 decision: Option A (share via tests/common/mock_ocr.rs). Facade mock injection is unavailable (OllamaVisionOcr hardcoded), so a helper-level chain test (apply_ocr_to_pdf_pages → PdfPageV1Chunker) adds value beyond unit B5. spec: docs/superpowers/specs/2026-05-27-v0.20-sub1-bugfix-spec.md (§4.5) plan: docs/superpowers/plans/2026-05-27-v0.20-sub1-bugfix-plan.md (Step 3) prior: `436fd01` (Step 2 Bug #3 chunk_id fix) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 13:45:38 +00:00
altair823	436fd015a2	fix(chunk): chunk_id collision under aggressive overlap; bump pdf-page-v1 → pdf-page-v1.1 (Bug #3) Bug #3 (Critical) from the v0.20.0 sub-item 1 dogfood report. Ingesting scanned_page2.pdf (1580 char OCR text) hit a `chunks.chunk_id` PRIMARY KEY violation: `per_chunk_hash = #c{char_start}` used the post-overlap `actual_start`, and the overlap walk floor collapsed to `prev_min` → segments 1 and 2 both got `#c0`. - `crates/kebab-chunk/src/pdf_page_v1.rs`: `chunk_page` returns 4-tuple (segment_start, actual_start, chunk_end, slice); caller `per_chunk_hash` suffix uses `segment_start` (pre-overlap boundary, strictly increasing) instead of `char_start` (post-overlap, may collapse to prev_min). - VERSION_LABEL `"pdf-page-v1"` → `"pdf-page-v1.1"` (design §9 cascade, explicit user-facing audit trail). The hardcoded literals at `crates/kebab-app/tests/pdf_pipeline.rs: 168, 368` are also updated to v1.1. - module docs (`pdf_page_v1.rs:47-60`): the `#c{char_start}` reference in the workaround description is updated to `#c{segment_start}`, the segment_start invariant is stated explicitly, and a HOTFIXES.md cross-ref is added. - `pdf_page_v1.rs::tests`: `multi_chunk_page_with_aggressive_overlap_produces_unique_chunk_ids` regression pin (10 char "가" + ". " + 500 char "나": multi-chunk + overlap walk collapse trigger). - `tasks/HOTFIXES.md`: 2026-05-27 entry (symptom F2 1580 char OCR, intra-doc collision root cause, second-iteration patch rationale). spec: docs/superpowers/specs/2026-05-27-v0.20-sub1-bugfix-spec.md (§4) plan: docs/superpowers/plans/2026-05-27-v0.20-sub1-bugfix-plan.md (Step 2) prior: `d9acda5` (Step 1 Bug #2 walker fix) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 13:32:09 +00:00
altair823	48197687b7	test(pdf): integration smoke (w/ search + cancel) + vector regression + alnum e2e (#[ignore]) for v0.20 sub-item 1 Step 9 (Group I) of v0.20.0 sub-item 1 (scanned PDF OCR) plan. I3: crates/kebab-app/tests/ingest_pdf_ocr_smoke.rs (new): - ingest_with_mock_ocr_yields_pdf_ocr_summary: `#[ignore]` real Ollama, verifies the ingest_with_config production path + IngestItem.pdf_ocr_pages. - ocr_text_indexed_and_searchable: `#[ignore]` real Ollama, verifies via app.search that OCR text is indexed (§ Acceptance #2). - ingest_with_cancel_aborts_mid_pdf: production cancel chain (pre-set cancel=true + dummy endpoint, verifies no panic/deadlock). I4: crates/kebab-parse-pdf/tests/text_extractor_regression.rs (new): - vector_pdf_extract_byte_identical_to_baseline: the canonical output of the vector PDF path for F4 mojibake.pdf stays byte-identical (an invariant before and after all Step 1-8 changes). - new baseline = tests/snapshots/vector_pdf_canonical.json (created on first run). - normalize_provenance_timestamps inline helper (R-3 mitigation; no such helper existed anywhere in the workspace, so this is a new 12-line one). I5: crates/kebab-parse-pdf/tests/ocr_e2e.rs (new): - f1_alnum_accuracy_ge_85 / f2_alnum_accuracy_ge_70: `#[ignore]` real Ollama qwen2.5vl:3b, the implementation of § Acceptance §9 #3. - alnum metric = strsim::levenshtein (dev-dep added). - truth file copy from PoC scratch (page1.txt + page2-batchim.txt) → scanned_page1_truth.txt + scanned_page2_truth.txt. - kebab-parse-image dev-dep added (to call OllamaVisionOcr::from_parts). This is a dev-dep exception to the parser isolation invariant (spec §3.1; the dep graph baseline under -e normal is preserved). spec: docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md plan: docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 9 I3+I4+I5) prior: `c9e0594` (Step 8 CLI printer) contract: §9 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 10:10:58 +00:00
altair823	c9e05941c5	feat(cli): activate per-page PDF OCR progress printer + test(app): ingest_progress emit verify + spec(pdf-ocr): align §4.6.1 literal with option_A (ms/chars) Step 8 (Group H) of v0.20.0 sub-item 1 (scanned PDF OCR) plan + Step 7 reviewer concern fix (spec literal deviation). H1: kebab-cli/src/progress.rs printer activation: - The old no-op stub `IngestEvent::PdfOcr* { .. } => {}` (Step 6 placeholder) is activated as a human-friendly stderr line printer. - Wording taken verbatim from spec §4.6.1 line 1085-1086: - PdfOcrStarted → ` 📷 OCR page {page}...` - PdfOcrFinished (skipped=false) → ` ✓ OCR page {page} ({chars} chars, {ms}ms via {ocr_engine})` - PdfOcrFinished (skipped=true) → ` ⊘ OCR page {page} skipped (no DCTDecode or engine fail, {ms}ms)` (uses the skipped field carried per M-4) - Consistent with the `!quiet` gate (mirrors the AssetStarted/Finished pattern). H2: new test in crates/kebab-app/tests/ingest_progress.rs: - pdf_ocr_progress_emits_started_finished_events (depends on real Ollama, `#[ignore]`). - Verifies that ingesting the F1 fixture (scanned_page1.pdf) emits pdf_ocr_started + pdf_ocr_finished events. Started count == Finished count invariant. - Manual invoke: `KEBAB_PDF_OCR_ENABLED=true cargo test -p kebab-app --test ingest_progress -- --ignored`. - There is no path for injecting a mock OcrEngine (eager build from Step 6), so this follows the same ocr_e2e pattern as Step 9 I5 (real Ollama + `#[ignore]`). Step 7 reviewer concern fix, spec §4.6.1 literal: - The `ocr_ms` / `ocr_chars` literals at line 1076-1077 are updated to the wire schema's actual field names `ms` / `chars` (option_A, consistent with Rust serde). - The printer wording at line 1087 likewise changes `{ocr_chars}` / `{ocr_ms}` → `{chars}` / `{ms}`. - The rationale reference at line 1556 changes `pdf_ocr_finished.ocr_ms` → `.ms`. - The `skipped` field is now stated explicitly as well (result of Step 6 reviewer M-4).
spec: docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md (§4.6.1) plan: docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 8 H1+H2) prior: `4c5ccd5` (Step 7 wire schema); this commit fixes Step 7 reviewer concern 1 contract: §9 (additive minor wire bump, completed in the Step 7 commit) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 09:18:49 +00:00
altair823	9f003ef1cd	feat(app): add pdf_ocr_apply helper (10 test, F7 split + cancel): post-extract OCR enrichment for PDF (H-1 resolution) Step 4 (Group D) of v0.20.0 sub-item 1 (scanned PDF OCR) plan. D1: `apply_ocr_to_pdf_pages(&mut canonical, &dyn OcrEngine, &bytes, &opts, emit_progress)` in `kebab-app::pdf_ocr_apply`. The body follows spec §4.1 line 381-599 as written, plus the PdfOcrOpts.cancel field + per-page cancel check (verifier LOW L-1). post-extract enrichment pattern (H-1 resolution): kebab-parse-pdf does not import kebab-parse-image::OcrEngine (parser isolation preserved). The helper lives inside the kebab-app facade, which avoids a cross-import between the two parser crates. Per-page decision matrix (spec §4.1 line 459-464): - always_on=true → OCR every page (dual-block, ordinal = page-1 + page_count). - always_on=false + needs_ocr → in-place OCR (text-detect block mutate). - needs_ocr=false → skip. DCTDecode-only v1 (H-3): for FlateDecode / CCITTFaxDecode pages, extract_dctdecode_page_image=None → Warning event + skip + emit_progress(skipped=true). OcrEngine.recognize failure → Warning event + skip + emit_progress(skipped=true). D3: per-page cancel handle (verifier LOW L-1 + spec §4.8 line 1159): PdfOcrOpts.cancel: Option<Arc<AtomicBool>>. When it is set to true, `anyhow::bail!("PDF OCR cancelled mid-PDF at page N")`. lopdf = "0.32" added to [dependencies] (already transitive via kebab-parse-pdf; no new crate introduced, dep graph kebab-parse-* baseline unchanged). Integration test (`tests/pdf_ocr_apply.rs`, 10 test): - f1_input_with_ocr_enabled_replaces_empty_block: in-place mutate. - f3_input_with_ocr_enabled_keeps_text_detect_blocks: vector PDF skip. - f1_input_with_ocr_disabled_keeps_empty_block: disabled no-op. - f4_input_with_ocr_enabled_replaces_mojibake_block: mojibake → in-place mutate. - f3_input_with_always_on_pushes_dual_blocks: always_on dual-block. - f6_flatedecode_skipped_with_warning: FlateDecode skip + Warning event. - f7_ccittfax_skipped_with_warning: CCITTFax skip + Warning event (verifier M-4 split).
- ocr_engine_failure_surfaces_as_warning: OCR failure → Warning event. - dual_block_ordinals_are_deterministic_and_unique: ordinal invariant. - cancel_handle_aborts_mid_pdf: production source of the cancel handle (D3). MockOcrEngine fixture: spec §5.5 line 1284-1299. No F3 fixture exists → mock CanonicalDocument construction + F1 bytes reuse pattern (Option B: the canonical is produced on the real production path via PdfTextExtractor::extract). spec: docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md (§4.1 + §5.5) plan: docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 4 D1+D2+D3) prior: `c2cd3a7` (Step 3) + `8d81bc1` (Step 3 clippy fix) contract: §9 (additive minor wire bump, in a later step) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 06:42:01 +00:00
altair823	7c27633df2	chore(rag): post-PR9 refactor: H1/H2/H3/D/E + test coverage + post-refactor dogfood retest Architectural cleanup by the OMC team `post-pr9-refactor`. After the architect priorities analysis, the executor + test-engineer made the file edits, and the system-architect's component-level review concluded: pre-cut nothing, defer everything to v0.18.1+. ## Executor work (H1/H2/H3/D/E) - H1 (kebab-nli/src/onnx.rs): `[models.nli]` config wiring activated. `DEFAULT_MODEL_ID` const removed (kebab-config's NliCfg::defaults is the single source). OnnxNliVerifier::new reads config.models.nli.model and calls anyhow::bail if config.models.nli.provider is not "onnx". 3 stale "PR-9c-1 will wire this" comments removed. 2 unit tests added (`new_uses_config_model_id`, `new_rejects_unsupported_provider`). - H2 (kebab-rag/src/pipeline.rs): `truncate_for_nli(premise: &str, _hypothesis: &str)` → `truncate_for_nli(premise: &str)`. v0.18.1 placeholder doc removed. 4 callsites (tests/multi_hop.rs) updated + test rename `multi_hop_truncate_for_nli_preserves_hypothesis` → `multi_hop_truncate_for_nli_char_budget` (matches the contract). - H3 (kebab-rag/src/pipeline.rs:1041): `was_truncated` is surfaced via tracing::debug! (adds observability, signature preserved: caller logging contract). - D (kebab-mcp/tests/tools_call_ask_multi_hop.rs): request_timeout_secs 2 → 5 (stability on slow CI), `mh_code` discriminator removed. dispatch contract = `mh.is_error.unwrap_or(false)` (the existing assertion is sufficient). - E (tasks/HOTFIXES.md + pipeline.rs:1633-1638): subsection "### PR-9 NLI refusal: terminal Synthesize hop omitted from hops trace" added as a sibling of the fb-41 PR-9 closure entry. The pipeline comment "cleanup deferred to a follow-up" → "// See tasks/HOTFIXES.md ... for follow-up" cross-link. ## Test-engineer work (T1/T2/T3/T4, 9 new tests) - T1 (kebab-nli/src/onnx.rs::tests): 3 unit tests for sanitize_model_id (replaces_slash / idempotent / leaves_other_chars). - T2 (kebab-rag/tests/multi_hop_nli_panic.rs, new): 2 panic-path tests: a #[should_panic] for the facade invariant (`expect("verifier must be Some when nli_threshold > 0.0")`) + its threshold=0 companion.
- T3 (kebab-rag/tests/multi_hop_nli_stream.rs, new): 2 StreamEvent::Final tests pinning the wire shape of the stream_sink Final branch for refuse_nli_verification + refuse_nli_model_unavailable. - T4 (kebab-app/tests/open_with_config_nli.rs, new): 2 NLI failure paths: when model_dir is unwritable, App::open_with_config returns Result<App> Err (with "OnnxNliVerifier" in chain) + graceful skip when threshold=0. ## System-architect conclusion Result of the 3-lens analysis (absorption / duplication / under-engineered interface): pre-cut nothing. All top-3 items are deferred to v0.18.1+: - Lens 1: kebab-normalize + kebab-parse-types can be absorbed (used only by parse-md; 5 parsers bypass them) → v0.18.1+. - Lens 3: dead polymorphism in the Extractor + Chunker traits (every callsite is hardcoded) → v0.18.1+. - Lens 1 bundled: kebab-source-fs drags in kebab-parse-code's 9 tree-sitter grammars → low-risk dep-graph win, v0.18.1+ bundled. - Defer-with-intent: LanguageModel async refactor (when a cloud LLM arrives), NliVerifier::score_batch + typed NliError (when a 2nd impl arrives), compute_stale → kebab-core::stale. Reports: /build/cache/tmp/post-pr9-refactor-priorities.md, /build/cache/tmp/system-architecture-priorities.md (both outside the repo, kept as analysis). ## Verification - cargo test -p kebab-nli -j 1 → 11/11 pass. - cargo test -p kebab-rag -j 1 → 75/75 pass (including 5 NLI multi-hop + 4 new T2/T3). - cargo test -p kebab-app -j 1 → 23 pass + 2 ignored (including the 2 from T4). - cargo test -p kebab-mcp --test tools_call_ask_multi_hop -j 1 → 1 pass + 1 pre-existing flaky (HOTFIX #15, no_chunks short-circuit, unrelated to the executor D fix: the base assertion at line 86 fails because the fixture is missing). - cargo clippy --workspace --all-targets -j 1 -- -D warnings clean. - cargo test --workspace --no-fail-fast -j 1 → 1304 passed (+11 new) + the same 1 pre-existing flaky. - Post-refactor dogfood retest byte-identical (all 3 runs: PR-9d / post-cleanup / post-refactor): S7 0.0035389824770390987, S1 0.058334656059741974, S10 0.0027875436935573816, S3 nli_model_unavailable. "Post-architectural-refactor retest" section added to docs/dogfood/v0.18.0/SUMMARY.md. Wire impact: none.
Behavior impact: none (H1's config wiring selects the same model as the default → byte-identical). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 04:42:37 +00:00
altair823	7c85de065a	chore: workspace-wide cleanup: clippy::pedantic baseline + auto-fix Final cleanup before the cut PR for v0.18.0. User request: "make the whole codebase clean and easy to read". ## Workspace lints - In `Cargo.toml`, `[workspace.lints.clippy]` gains `pedantic = "warn"` (priority -1) + a deliberate allow-list: - cast_possible_truncation / cast_possible_wrap / cast_sign_loss / cast_precision_loss: intentional truncation (ONNX i64, hash modular reduction, etc.). - doc_markdown / missing_errors_doc / missing_panics_doc: cosmetic doc style. - too_many_lines / module_name_repetitions / must_use_candidate / needless_pass_by_value / manual_let_else / items_after_statements / similar_names: informational only. - format_collect / match_wildcard_for_single_variants / trivially_copy_pass_by_ref / unnecessary_wraps: intentional patterns (exhaustive match, future Result variants, etc.). - default_trait_access: `Foo::default()` is idiomatic. - float_cmp: explicit threshold comparisons on NLI / RRF scores are intended. - struct_excessive_bools / case_sensitive_file_extension_comparisons / naive_bytecount / ignore_without_reason: domain-specific intent. - format_push_string / return_self_not_must_use / match_same_arms: preserves builder / wire-label / hot-path patterns. - needless_continue / used_underscore_binding / nonminimal_bool / unreadable_literal / many_single_char_names / doc_link_with_quotes / assigning_clones / collapsible_str_replace / trivial_regex / elidable_lifetime_names / range_plus_one / explicit_iter_loop / implicit_hasher / ref_option: remaining low-value style. - `[lints] workspace = true` added to the `Cargo.toml` of each of the 24 crates. ## Auto-fix Applied `cargo clippy --workspace --all-targets --fix`: 128 files changed, 552 insertions / 472 deletions. Mainly: - uninlined_format_args (~18): `format!("{}", x)` → `format!("{x}")`. - redundant_closure_for_method_calls (~33): `.map(|x| x.foo())` → `.map(T::foo)`. - Other mechanical refactors. ## Verification - `cargo clippy --workspace --all-targets -j 1 -- -D warnings` clean (pedantic + all lint groups).
- `cargo test --workspace --no-fail-fast -j 1`: 1293 tests pass + 1 pre-existing flaky fail (`kebab-mcp::tools_call_ask_multi_hop::ask_tool_routes_multi_hop_true_to_decompose_first`, HOTFIX candidate, unrelated to the cleanup). 0 regressions. Wire impact: none. Behavior impact: none (mechanical refactor only). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 03:01:58 +00:00
altair823	cf35f36f88	feat(rag): fb-41 PR-2: RagPipeline::ask_multi_hop skeleton (fixed depth=2) PR-2 of fb-41 multi-hop RAG. The 3-stage decompose + retrieve + synthesize pipeline is dispatched when `opts.multi_hop=true`. The dynamic decide loop comes in PR-3. - `AskOpts.multi_hop: bool` field added + `impl Default for AskOpts` introduced (resolves the known limitation from HOTFIXES 2026-05-07). `multi_hop: false` added at all 9 explicit init sites; with Default in place they can migrate gradually to `..Default::default()`. - One dispatcher line at the entry of `RagPipeline::ask` (`if opts.multi_hop { return self.ask_multi_hop(...) }`). - New method `RagPipeline::ask_multi_hop`. 1) decompose LLM call → parse JSON array of strings, 2) retrieve for each sub-query + chunk_id dedup pool, 3) score gate / no-chunks guard, 4) pack_context (helper shared with single-pass), 5) synthesize LLM call w/ MULTI_HOP_SYNTHESIZE_SYSTEM_PROMPT, 6) citation extract + Answer build. `prompt_template_version` is stamped as "rag-multi-hop-v1" so that eval `compare` separates single-pass from multi-hop. - New prompt consts: MULTI_HOP_DECOMPOSE_SYSTEM_PROMPT + MULTI_HOP_DECOMPOSE_USER_TEMPLATE + MULTI_HOP_SYNTHESIZE_SYSTEM_PROMPT + PROMPT_TEMPLATE_VERSION_MULTI_HOP + MULTI_HOP_MAX_SUB_QUERIES_DEFAULT. - New variant `kebab_core::RefusalReason::MultiHopDecomposeFailed`. Cascade: exhaustive matches updated in kebab-store-sqlite `refusal_reason_label` + kebab-tui `ask refusal render`. - `parse_decompose_response` + `strip_markdown_json_fence` helpers: markdown code fence (```json / ```) strip + JSON array of strings parse + trim + drop empty + cap at MULTI_HOP_MAX_SUB_QUERIES_DEFAULT. When they return None, the caller refuses with `MultiHopDecomposeFailed`. Tests (55 passing total, 8 new): - 6 unit (regression pins for parse_decompose_response: bare array / fence variants / garbage / cap / trim). - 2 integration: `ask_multi_hop_dispatches_and_decompose_garbage_refuses` (decompose garbage → MultiHopDecomposeFailed + exactly 1 LLM call) + `ask_with_multi_hop_false_keeps_single_pass_path` (regression pin; existing callers stay backwards-compatible automatically).
The integration test for happy-path multi-hop (decompose succeeds → synthesize) will be added when the ScriptedLm helper is introduced together with the PR-3 decide loop. The current `MockLanguageModel` returns a single canned response, so it cannot pin a 2-LLM-call sequence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 06:45:32 +00:00
altair823	93ddece111	feat(v0.17.0/PR-B/B1): C typedef extractor + parser_version bump + orphan purge cascade closure of HOTFIXES 2026-05-21. C typedef-wrapped anonymous struct/enum/union is now emitted as a symbol unit under the typedef alias name. - crates/kebab-parse-code/src/c.rs: type_definition branch added. Detects an inner anonymous struct_specifier / enum_specifier / union_specifier → recursively extracts the type_identifier from the declarator field → synthetic unit (typedef alias). Named inner aggregates / plain aliases remain glue as before. PARSER_VERSION code-c-v1 → code-c-v2. recover_typedef_alias + extract_typedef_alias_name helpers added. - crates/kebab-store-sqlite/src/store.rs: two new helpers (doc-id based orphan purge for the parser_version bump cascade). - stale_chunk_ids_for_workspace_path_except_doc_id(workspace_path, keep_doc_id): sister of stale_chunk_ids_at, doc_id based. - purge_document_at_workspace_path_except_doc_id(workspace_path, keep_doc_id): removes document/chunks via CASCADE, preserves assets. keep_doc_id="" means "remove every doc". - crates/kebab-app/src/lib.rs: the parser_mismatch branch of try_skip_unchanged calls purge_workspace_path_for_parser_bump. The helper accesses the vector store lazily via app.vector(), calls delete_by_chunk_ids, and removes the SQLite document row. Cleanup finishes before Ok(None) is returned, so the caller's new INSERT avoids an idx_docs_workspace_path UNIQUE conflict. - tests: - 4 new c.rs unit tests: typedef_struct_emits_unit / typedef_enum_emits_unit / typedef_union_emits_unit / typedef_to_existing_type_stays_glue (negative). - tier1_c_ingest_searchable: parser_version assertion code-c-v1 → code-c-v2. - regression: the existing purge_orphan_at_workspace_path + purge_vector_orphans_for_workspace_path on the bytes-edit path (asset_id change) are unchanged; they coexist with the new branch and all existing tests PASS. Unresolved (Risks): for a nested typedef (typedef struct { struct {...} inner; } Outer;) the inner anonymous struct is still glue; the initial scope of v2 covers top-level typedef aliases only. cargo test --workspace --no-fail-fast -j 1 + clippy pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 20:30:57 +00:00
altair823	0ee18149e7	test(v0.17.0/A5 follow-up): trigram tokenizer downstream test fixes The trigram tokenizer changed snippet units, word boundaries, and the BM25 raw score distribution all at once, so 3 tests built on unicode61 assumptions regressed. - wire_search_response::search_json_truncates_with_max_tokens + search_plain_emits_truncated_hint_to_stderr: with a single doc + small max_tokens the snippet is so short that the budget loop never trips. A multi-doc fixture (5 docs) + a 30-token budget now guarantees truncated=true through the hit-pop path. - fetch_integration::fetch_chunk_with_context_returns_neighbors: the 2-char tokens in the fixture body (A1/A3 etc.) are incompatible with trigram and produced 0 hits. Updated to the 5-char unique words apples/banana/cherry/durian/elder, with the query pinned deterministically to cherry. - eval/runner::runner_per_query_snapshot_matches_fixture: BM25 raw scores shifted with the trigram token stream. Regenerated with UPDATE_SNAPSHOTS=1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 12:21:34 +00:00
altair823	6ac7fea7b9	feat(v0.17.0/A5): trigram-aware build_match_string + SearchResponse.hint Main body of PR-A. plan Task A4 Step 1c + A5. - lexical.rs::build_match_string redesigned: whole-phrase + token-AND OR-combined, tokens shorter than 3 characters dropped, None when no candidate remains (avoids an empty MATCH). raw single-quote mode retained. - SearchResponse.hint is additive: the short_query_hint helper sets it for the empty result + trimmed < 3 chars + non-raw case. - CLI 'kebab search' prints one [hint] line to stderr (text mode). - TUI SearchState.short_query_hint + poll_worker stale-aware set + fire_search/mark_input_changed reset + shown in dynamic_status. - docs/wire-schema/v1/search_response.schema.json hint additive. - New unit tests (lexical 9 PASS, 2 existing expectations updated) + integration regression (search_korean: multi_token + mixed, 3 PASS) + BM25 snapshot regen (trigram token stream). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 11:54:25 +00:00
altair823	1969c8e3b5	fix(dogfood): k8s multi-resource YAML chunk_id collision P10 dogfooding found that a k8s manifest with 2+ documents (e.g. Deployment + Service in one file) fails to ingest: UNIQUE constraint failed: chunks.chunk_id Root cause: tier2_shared::push_chunks_with_oversize's non-oversize branch hardcoded split_key = None. K8sManifestResourceV1Chunker calls it once per resource; with split_key None every resource from the same document gets the same id_hash (= base_policy_hash) → identical chunk_id. p10-3's code_text_paragraph_v1 had the same bug (fixed in `df3c5b8`) but it calls build_chunk_no_symbol directly — the push_chunks_with_oversize path was never fixed. Fix: push_chunks_with_oversize gains a base_split_key parameter for the non-oversize single-chunk case. k8s chunker passes Some(resource.line_start) so each resource gets a distinct chunk_id; dockerfile / manifest pass None (1 chunk per file — no sibling collision, chunk_id stays stable). Regression coverage: k8s_multi_doc_emits_one_chunk_per_resource now asserts chunk_id distinctness; new integration test tier2_k8s_multi_resource_yaml_ingests_without_collision ingests a real 2-document YAML end-to-end. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 23:49:37 +00:00
altair823	192835e5bf	test(p10-1d): integration smoke tests for C + C++ Verifies end-to-end ingest + search + Citation::Code shape: - tier1_c_ingest_searchable: .c file → --code-lang c search → symbol = function name (no nesting), lang = "c", chunker_version = "code-c-ast-v1". - tier1_cpp_ingest_searchable: .cpp file → --code-lang cpp search → symbol starts with namespace::Class prefix, lang = "cpp", chunker_version = "code-cpp-ast-v1". Brings code_ingest_smoke to 18 tests (Tier 1: 9 → 11, Tier 2: 3, Tier 3: 4). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 14:31:35 +00:00
altair823	1034de25a2	fix(p10-3+p10-1d): land the missing try_skip_unchanged fallback-aware fix PR #155 (p10-3) merged WITHOUT the reviewer's required Option B1 fix — the implementer reported a commit SHA (2a39513) that never made it to main. Result: every reingest of a Tier 3-fallback file (non-k8s YAML, invalid YAML, AST extractor failure) re-runs full extract + chunk + embed because the parser/chunker version comparison can never match (stored is code-text-paragraph-v1 / none-v1, but caller uses Tier 1/2 dispatch values). This commit: 1. Adds the 7th param `fallback_chunker_version: Option<&ChunkerVersion>` to try_skip_unchanged + the stored_is_tier3_fallback detection branch (skip parser/chunker equality, keep embedder check). 2. Threads `None` through non-code call sites (md / image / pdf). 3. Code call site computes tier3_fallback_cv covering all Tier 1/2 langs that can fall back: rust / python / ts / js / go / java / kotlin / yaml / dockerfile / toml / json / xml / groovy / go-mod / c / cpp (p10-1D additions). 4. Adds tier3_yaml_fallback_reingest_is_unchanged + tier3_shell_reingest_is_unchanged regression tests (the originally-promised PR #155 regression coverage that also never made it to main). Smoke tests: 14 + 2 = 16 PASS. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 14:19:17 +00:00
altair823	df3c5b8caf	test(p10-3): integration smoke tests for Tier 3 (shell + yaml fallback) Two new tests verify end-to-end Tier 3 wiring: - tier3_shell_ingest_searchable: .sh file → --code-lang shell search → Citation::Code { symbol: None, lang: "shell" }, chunker_version "code-text-paragraph-v1". - tier3_yaml_fallback_picks_up_non_k8s_yaml: docker-compose-shaped yaml (no apiVersion/kind) triggers k8s chunker's Ok(vec![]) result, fallback retries with Tier 3 → Citation::Code { symbol: None, lang: "yaml" } and chunker_version "code-text-paragraph-v1". Also fixes a bug in CodeTextParagraphV1Chunker (Task B): short paragraphs (≤80 lines) were emitted with split_key=None, causing all paragraphs from the same document to share the same chunk_id (UNIQUE constraint violation at put_chunks). Fix: always use para.line_start as split_key so every paragraph gets a distinct id regardless of size. Brings code_ingest_smoke to 14 tests (Tier 1: 9, Tier 2: 3, Tier 3: 2). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 11:37:44 +00:00
altair823	166e1ddfaf	test(p10-2): integration smoke tests for Tier 2 (k8s yaml + Dockerfile + Cargo.toml) Three new tests in code_ingest_smoke.rs verifying isolated-TempDir ingest + --code-lang filter + Citation::Code.lang / .symbol shape for each Tier 2 chunker. Brings the suite to 12 tests (Rust 3 + Python 1 + TS 1 + JS 1 + Go 1 + Java 1 + Kotlin 1 + yaml 1 + dockerfile 1 + manifest 1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 13:23:01 +00:00
altair823	ff1bedbef5	feat(p10-1c-jk): activate Kotlin in ingest_one_code_asset dispatch Replaces Kotlin bail! arms with KotlinAstExtractor + CodeKotlinAstV1Chunker. Adds kotlin_file_ingests_and_searches_as_code_citation integration test — asserts citation.lang=kotlin, symbol=com.foo.Foo.bar, code_lang=kotlin. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 10:54:55 +00:00
altair823	ebc4ef2eea	feat(p10-1c-jk): activate Java in ingest_one_code_asset dispatch Replaces Java bail! arms with JavaAstExtractor + CodeJavaAstV1Chunker. Adds java_file_ingests_and_searches_as_code_citation integration test — asserts citation.lang=java, symbol=com.foo.Foo.bar, code_lang=java. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 10:44:05 +00:00
altair823	c19aa006d0	feat(p10-1c-go): activate Go in ingest_one_code_asset dispatch Replaces Go bail! arms with GoAstExtractor + CodeGoAstV1Chunker. Adds go_file_ingests_and_searches_as_code_citation integration test — asserts citation.lang=go, symbol=chunk.ParseDoc, code_lang=go. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 09:13:47 +00:00
altair823	453ec15df4	fix(dogfood): document-centric fetch_span + remove get_asset_by_workspace_path assets.workspace_path is INTENTIONALLY 'last-registered path' for twin files (identical content at different paths share one asset row PK'd by blake3 content hash). PR #146 made try_skip_unchanged document-centric; PR #149 made reset --orphans-only document-centric; this PR removes the last caller of get_asset_by_workspace_path (fetch.rs:193 in fetch_span, which used it to reject PDF/audio media — for twins this could read the wrong asset's media_type and pick the wrong branch). Replaced with the natural 2-step lookup: get_document_by_workspace_path (PR #146) → doc.source_asset_id → get_asset (NEW trait method, asset_id is PRIMARY KEY so flip-flop-immune by construction). Then removed get_asset_by_workspace_path trait method + SqliteStore impl — 0 callers after the refactor. UPSERT doc-comment refreshed in store.rs to make the 'last-registered' semantics explicit so future readers don't try to 'fix' the flip-flop. Dogfood follow-up (PR #142 1B + multi-root corpus). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 08:03:38 +00:00
altair823	5f2bd9e97e	feat(dogfood): kebab reset --orphans-only — purge stored docs outside walker scope PR #148 auto-purges only filesystem-missing files (conservative — leaves on-disk-but-out-of-scope docs alone for data safety). This is the explicit complement: when the user has narrowed include / widened exclude / removed a sub-directory from the workspace and WANTS the stored docs reconciled, they invoke 'kebab reset --orphans-only'. Confirm prompt with orphan count + sample paths; --yes required in non-TTY. SQLite purge via existing purge_deleted_workspace_path (PR #148) + vector store delete_by_chunk_ids when configured. No fs existence check — orphans-only is the explicit 'I know what I'm doing' variant. dogfood follow-up to PR #148 (file deletion auto-purge). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 07:38:10 +00:00
altair823	27baec82ea	fix(dogfood): auto-purge stored docs for filesystem-deleted files Files deleted from disk (rm a.md) were leaving stale documents + chunks + embeddings in the store, surfacing as ghost citations in search/ask. Existing purge_orphan_at_workspace_path only handled content-changed stale (WHERE workspace_path=? AND asset_id != ?) — file deletion has no new asset_id. Fix: post-walker-scan sweep. Compute (stored_paths - scanned_paths), for each candidate check filesystem existence — only purge when the file is TRULY missing. Scope-narrowing case (file on disk but outside include glob) is explicitly NOT purged to protect users from accidental data loss via config edits. Adds: - DocumentStore::all_workspace_paths trait method + SqliteStore impl - purge_deleted_workspace_path in store-sqlite (returns chunk_ids for vector delete; deletes doc CASCADE + asset row + copied storage file) - sweep_deleted_files in kebab-app::ingest path; called once per ingest before the per-asset loop - IngestReport.purged_deleted_files counter (additive, serde default) - CLI ingest summary mentions purge count when > 0 - 2 integration tests: file_deletion_auto_purge + include_scope_narrowing_does_NOT_purge dogfood discovery (PR #142 1B + multi-root: kebab-docs + httpx + zod + lodash). Per user decision: only filesystem deletion auto-purges; scope narrowing requires explicit kebab reset. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 06:51:07 +00:00
altair823	641b92af7d	fix(dogfood): document-centric try_skip_unchanged for twin-file idempotency Identical-content files at different workspace paths share one assets row (assets.asset_id = blake3 content hash, PRIMARY KEY). The UPSERT `ON CONFLICT(asset_id) DO UPDATE SET workspace_path = excluded` made twin files overwrite each other's workspace_path on every ingest, so `get_asset_by_workspace_path(path1)` returned the OTHER twin's row (or None), breaking idempotent unchanged-detection for both files. Fix: switch try_skip_unchanged to document-centric lookup. `documents.workspace_path` is already UNIQUE (V001) and `id_for_doc(path, ...)` includes path, so each twin has its own stable document row. Compare `doc.source_asset_id` with the new asset's checksum instead of going through the assets table. Dogfood (multi-root: kebab-docs + httpx + zod + lodash) showed 27 of 726 docs marked Updated on every idempotent re-ingest — all 27 are twin-file victims (empty `__init__.py` ×3, AGENTS.md ↔ CLAUDE.md same content, duplicate logo PDFs/JPGs). After: re-ingest reports 0 new / 0 updated / 726 unchanged. No schema migration needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 05:27:21 +00:00
altair823	d53995a6d4	feat(p10-1b): code-js-ast-v1 chunker + activate JavaScript in app dispatch Chunker: duplicate-with-substitution from code-ts-ast-v1 / code-rust-ast-v1. Dispatch: replaces JS bail! arms with JavascriptAstExtractor + CodeJsAstV1Chunker. Integration test javascript_file_ingests_and_searches_as_code_citation asserts citation.lang=javascript, symbol=src/Bar.Bar.baz, code_lang=javascript. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 01:16:07 +00:00
altair823	31245a4328	fix(p10-1b): TS parser_version code-typescript-v1 → code-ts-v1 (naming consistency) Task H implementer chose code-typescript-v1 but plan + design §3.3 use the short form (chunker is code-ts-ast-v1 / code-js-ast-v1). Aligning parser versions to match: rust=code-rust-v1 / python=code-python-v1 / ts=code-ts-v1 / js=code-js-v1 (Task K). Fixes 2 sites: const PARSER_VERSION + integration test assertion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 01:05:17 +00:00
altair823	acb61b6830	feat(p10-1b): activate TypeScript in ingest_one_code_asset dispatch Replaces TS bail! arms with TypescriptAstExtractor + CodeTsAstV1Chunker. Adds typescript_file_ingests_and_searches_as_code_citation integration test — asserts citation.lang=typescript, symbol=src/Foo.Foo.bar, code_lang=typescript. JS arms remain bail!() (Task L). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 00:59:41 +00:00
altair823	1815091247	feat(p10-1b): activate Python in ingest_one_code_asset dispatch Replaces Python bail! arms with PythonAstExtractor + CodePythonAstV1Chunker. Adds python_file_ingests_and_searches_as_code_citation integration test — asserts citation.lang=python, symbol=kebab_eval.metrics.compute_mrr, code_lang=python. TS/JS arms remain bail!() (Tasks J/L). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 00:49:01 +00:00
altair823	b5d1fe8c1e	feat(p10-1a-2): backfill SearchHit.repo from doc metadata (Task 8b) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 21:13:01 +00:00
altair823	580576c2c6	feat(p10-1a-2): wire code ingest dispatch (ingest_one_code_asset) Add `MediaType::Code("rust")` dispatch arm in `ingest_one_asset`, `ingest_one_code_asset` fn (faithful mirror of `ingest_one_pdf_asset`), and `backfill_code_lang` post-processing in `App::search_uncached`. Integration test `code_ingest_smoke.rs` verifies full pipeline: ingest `.rs` → Citation::Code hit with lang/symbol/line_start. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 20:14:59 +00:00
th-kim0823	231d80e82d	feat(stats): media/lang/bytes/stale fields on schema.v1.stats (fb-37) Extends CountSummary with media_breakdown, lang_breakdown, stale_doc_count fields populated via stats_ext::breakdowns(). Adds count_summary_with_threshold for callers that need real stale counts. Mirrors all new fields onto the wire-bound Stats struct in kebab-app::schema with #[serde(default)] for backwards-compat. Also fixes search_budget_integration.rs for the trace field added to SearchOpts in Task 1. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 12:34:57 +09:00
th-kim0823	b86b763dfb	fix(fb-35): address PR #126 round 2 review - wire schema: relax effective_end.minimum 1 → 0 + expand description to cover line-clamp + out-of-range sentinel (panic-fix R1 emits Some(0) when line_start=1 and range is beyond doc end — schema must accept it) - tests: tighten first-chunk-target boundary test to assert ≤ 2 total neighbors (3-chunk doc, N=2). Strict "first chunk → context_before empty" not assertable until chunks.ordinal column lands (R1 #9 architectural caveat) - store: trim contradiction in list_chunk_ids_for_doc warning comment — drop "good enough for sequentially chunked markdown" phrase that conflicts with "hash sort dominates" paragraph above Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 00:55:29 +09:00
th-kim0823	7dddc1d706	fix(fb-35): address PR #126 round 1 review - fetch_span: panic-fix on line_start > total / empty doc (return empty text + effective_end = line_start - 1 instead of out-of-bounds slice) - truncated: reserved for budget-driven truncation only; line range clamp signaled via effective_end < line_end - spec / SKILL.md / README: align rejection wording to "PDF / audio" (matches code; Image OCR allowed for span) - store: warning comment on list_chunk_ids_for_doc — chunk_id hash sort does NOT preserve document position; real fix is a chunks.ordinal column, tracked as follow-up - surrounding_chunks: saturating_add to defend against u32::MAX context arg on 32-bit targets - tests: line_start > total returns empty + chunk context at doc boundary clamps lower bound Deferred nits (follow-up): table-separator strict CommonMark form; MCP per-mode strict validation; CLI chunk_id truncation in plain output. None block correctness. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 00:45:29 +09:00
th-kim0823	1b9d89eb3a	feat(app): App::fetch span mode + PDF/audio rejection (fb-35) Line-based slice over fmt_canonical_to_markdown output. PDF / audio source_type → span_not_supported StructuredError. Out-of-range line_end clamps to total; effective_end reflects post-budget trim. invalid_input on zero / inverted bounds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 23:54:22 +09:00
th-kim0823	7d1f855f7e	feat(app): App::fetch doc mode with budget (fb-35) Walks CanonicalDocument blocks, serializes to markdown, applies chars/4 budget when opts.max_tokens is set. doc_not_found preserved through StructuredError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 23:48:40 +09:00
th-kim0823	610d29f053	feat(app): App::fetch chunk mode + markdown serializer (fb-35) Chunk mode + +-N context. doc / span modes return placeholder errors (filled by subsequent tasks). fmt_canonical_to_markdown helper introduced now since doc mode (Task 4) consumes it. Errors are typed StructuredError so classify preserves chunk_not_found / doc_not_found through the wire layer. Adds SqliteStore::list_chunk_ids_for_doc so the facade can derive +-N neighbors without leaking direct rusqlite usage into kebab-app. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 23:44:51 +09:00
th-kim0823	e084b306e5	fix(fb-34): align next_cursor semantics with docs (PR #125 round 2) Previous round-1 fix dropped the speculative cursor branch on the truncated path, leaving a contradiction with the docs: - snippet-only shrunk → cursor emitted (returned == k_effective) - k-popped → cursor null (returned < k_effective) But docs promised the opposite. R2 resolution: emit cursor whenever more hits may be reachable (either retriever filled the page OR budget popped hits — the popped ones remain fetchable from offset+returned). Drop the artificial "widen vs paginate" copy; truncated and next_cursor are now independent signals — caller may do either or both. Updates: app.rs::search_with_opts logic + SearchResponse doc + schema description + SKILL.md two bullets + max_tokens=0 test asserts cursor IS emitted on k-pop case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 21:07:04 +09:00
th-kim0823	f485608108	fix(fb-34): address PR #125 round 1 review - error_wire: StructuredError wrapper preserves ErrorV1 through anyhow → classify pipeline. Adds downcast short-circuit so cursor::decode's typed code = "stale_cursor" reaches the wire instead of being string-formatted to code = "generic". - app: search_with_opts now wraps cursor::decode error in StructuredError instead of anyhow! string format. - test: error_wire pins both negative (bare anyhow → not stale_cursor) AND positive (StructuredError → stale_cursor) invariants. CLI integration test runs end-to-end and asserts error.v1.code on stderr. - app: next_cursor only emitted on full-page (k-pop) path; drop speculative emit on snippet-only truncation that would point at a different page than the agent expected. - cursor: differentiate malformed-base64 / malformed-payload / revision-mismatch error messages; all keep code = stale_cursor. - test: cursor_rejected fixture uses .expect() to fail loud on cursor non-emission instead of silent skip. - test: max_tokens=0 → 1-hit floor + truncated=true. - docs: SKILL.md + schema description distinguish snippet-shrink (widen) vs k-pop (paginate) truncated cases. HOTFIXES notes --no-cache semantic shift (cached path + clear vs uncached path). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 20:49:27 +09:00
th-kim0823	af80cedd81	feat(app): App::search_with_opts + SearchResponse (fb-34) Budget loop: snippet shorten → k pop → ≥1 hit floor. Cursor encode/decode threads corpus_revision; mismatch surfaces as stale_cursor anyhow error. App::search retained as thin wrapper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 17:59:48 +09:00
th-kim0823	ebbc3a46ae	feat(app): cursor encode/decode for paginated search (fb-34) Opaque base64(JSON{offset, corpus_revision}). Mismatch or malformed input returns ErrorV1 with code = stale_cursor. base64 promoted to workspace dep. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 17:49:23 +09:00
th-kim0823	dfef65f196	feat(app): staleness module + post-process search hits (fb-32) compute_stale: strict > boundary, threshold=0 disables, future timestamps treated as fresh (clock skew safety). App::search re-stamps on cache hit so config threshold changes take effect without flushing the cache. Also unblocks the workspace build by plugging placeholder indexed_at/stale into the two AnswerCitation construction sites in kebab-rag/pipeline.rs (the score-gate refusal path forwards from SearchHit; the LLM-citation path uses UNIX_EPOCH/false until Task 7 wires the real values through pack_context). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 01:30:10 +09:00
th-kim0823	7f5739d8fb	🏗️ refactor(fb-31): apply round 1 review nits - ingest_file_with_config: lowercase normalize ext (caller-side) + early error on unsupported extension (`.docx` etc. now Err with helpful message instead of silent skipped_by_extension counter). New test ingest_file_errors_on_unsupported_extension. - ingest_stdin_with_config: doc comment explaining intentional double-call of ensure helpers (idempotent + ~ms negligible). - external::inject_frontmatter: simplify precheck via single trim_start binding + add CR-only line ending edge case. - external::inject_frontmatter: doc note on yaml_quote escape contract (agent-supplied titles with special chars are safe). Round 1 review summary: #111 (comment) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 18:46:55 +09:00
th-kim0823	a42f907640	🧪 test(kebab-app): ingest_stdin_with_config integration (fb-31) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 18:04:52 +09:00

75 Commits