kebab

Author	SHA1	Message	Date
altair823	c19aa006d0	feat(p10-1c-go): activate Go in ingest_one_code_asset dispatch Replaces Go bail! arms with GoAstExtractor + CodeGoAstV1Chunker. Adds go_file_ingests_and_searches_as_code_citation integration test — asserts citation.lang=go, symbol=chunk.ParseDoc, code_lang=go. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 09:13:47 +00:00
altair823	2559d0d95a	refactor(p10-1c-go): add go to ingest dispatch allowlist (bail until Task F) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 09:03:28 +00:00
altair823	453ec15df4	fix(dogfood): document-centric fetch_span + remove get_asset_by_workspace_path assets.workspace_path is INTENTIONALLY 'last-registered path' for twin files (identical content at different paths share one asset row PK'd by blake3 content hash). PR #146 made try_skip_unchanged document-centric; PR #149 made reset --orphans-only document-centric; this PR removes the last caller of get_asset_by_workspace_path (fetch.rs:193 in fetch_span, which used it to reject PDF/audio media — for twins this could read the wrong asset's media_type and pick the wrong branch). Replaced with the natural 2-step lookup: get_document_by_workspace_path (PR #146) → doc.source_asset_id → get_asset (NEW trait method, asset_id is PRIMARY KEY so flip-flop-immune by construction). Then removed get_asset_by_workspace_path trait method + SqliteStore impl — 0 callers after the refactor. UPSERT doc-comment refreshed in store.rs to make the 'last-registered' semantics explicit so future readers don't try to 'fix' the flip-flop. Dogfood follow-up (PR #142 1B + multi-root corpus). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 08:03:38 +00:00
altair823	5f2bd9e97e	feat(dogfood): kebab reset --orphans-only — purge stored docs outside walker scope PR #148 auto-purges only filesystem-missing files (conservative — leaves on-disk-but-out-of-scope docs alone for data safety). This is the explicit complement: when the user has narrowed include / widened exclude / removed a sub-directory from the workspace and WANTS the stored docs reconciled, they invoke 'kebab reset --orphans-only'. Confirm prompt with orphan count + sample paths; --yes required in non-TTY. SQLite purge via existing purge_deleted_workspace_path (PR #148) + vector store delete_by_chunk_ids when configured. No fs existence check — orphans-only is the explicit 'I know what I'm doing' variant. dogfood follow-up to PR #148 (file deletion auto-purge). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 07:38:10 +00:00
altair823	d6d165df01	docs(dogfood): sync sweep_deleted_files algorithm doc with try_exists (PR #148 nit) Round 2 review found the function-level doc-comment still referenced the old fs::exists() (now replaced by try_exists().unwrap_or(true) in commit `2baa846`). One-line clarification — describes the conservative-on-Err semantics so future readers don't reintroduce the data-safety bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 07:10:27 +00:00
altair823	2baa846c6b	fix(dogfood): conservative try_exists() in sweep_deleted_files (PR #148 review) Round 1 review found a data-safety bug: fs::exists() returns false on errors like EACCES / EPERM / NFS-hiccup / ownership-change, which would trigger purge on a file that is in fact still on disk (just unreadable this moment). Switched to try_exists().unwrap_or(true) so transient FS errors are CONSERVATIVELY treated as 'file present' — never purge on uncertain signal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 07:04:03 +00:00
altair823	27baec82ea	fix(dogfood): auto-purge stored docs for filesystem-deleted files Files deleted from disk (rm a.md) were leaving stale documents + chunks + embeddings in the store, surfacing as ghost citations in search/ask. Existing purge_orphan_at_workspace_path only handled content-changed stale (WHERE workspace_path=? AND asset_id != ?) — file deletion has no new asset_id. Fix: post-walker-scan sweep. Compute (stored_paths - scanned_paths), for each candidate check filesystem existence — only purge when the file is TRULY missing. Scope-narrowing case (file on disk but outside include glob) is explicitly NOT purged to protect users from accidental data loss via config edits. Adds: - DocumentStore::all_workspace_paths trait method + SqliteStore impl - purge_deleted_workspace_path in store-sqlite (returns chunk_ids for vector delete; deletes doc CASCADE + asset row + copied storage file) - sweep_deleted_files in kebab-app::ingest path; called once per ingest before the per-asset loop - IngestReport.purged_deleted_files counter (additive, serde default) - CLI ingest summary mentions purge count when > 0 - 2 integration tests: file_deletion_auto_purge + include_scope_narrowing_does_NOT_purge dogfood discovery (PR #142 1B + multi-root: kebab-docs + httpx + zod + lodash). Per user decision: only filesystem deletion auto-purges; scope narrowing requires explicit kebab reset. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 06:51:07 +00:00
altair823	360f825f3a	docs(dogfood): refresh try_skip_unchanged doc-comment to match new flow (PR #146 review) Round 1 review found the function-level doc-comment still described the old asset-side algorithm (item 2 asset-row checksum, item 3 id_for_doc miss). Updated to the document-centric flow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 05:35:17 +00:00
altair823	641b92af7d	fix(dogfood): document-centric try_skip_unchanged for twin-file idempotency Identical-content files at different workspace paths share one assets row (assets.asset_id = blake3 content hash, PRIMARY KEY). The UPSERT `ON CONFLICT(asset_id) DO UPDATE SET workspace_path = excluded` made twin files overwrite each other's workspace_path on every ingest, so `get_asset_by_workspace_path(path1)` returned the OTHER twin's row (or None) — break idempotent unchanged-detection for both files. Fix: switch try_skip_unchanged to document-centric lookup. `documents. workspace_path` is already UNIQUE (V001) and `id_for_doc(path, ...)` includes path, so each twin has its own stable document row. Compare `doc.source_asset_id` with the new asset's checksum instead of going through the assets table. Dogfood (multi-root: kebab-docs + httpx + zod + lodash) showed 27 of 726 docs marked Updated on every idempotent re-ingest — all 27 are twin-file victims (empty `__init__.py` ×3, AGENTS.md ↔ CLAUDE.md same content, duplicate logo PDFs/JPGs). After: re-ingest reports 0 new / 0 updated / 726 unchanged. No schema migration needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 05:27:21 +00:00
altair823	4e8b84c4e0	fix(dogfood): populate schema.v1.repo_breakdown (Task 9 follow-up) Dogfooding (PR #142 1B + multi-root corpus: kebab-docs + httpx + zod + lodash) revealed schema.v1.repo_breakdown is always {} despite the 1A-2 Task 9 having added the code_lang_breakdown sibling. The schema.rs:171 placeholder `BTreeMap::new()` was left in place. Mirror Task 9's code_lang_breakdown query for the repo field — same metadata_json JSON-path pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 05:09:19 +00:00
altair823	d53995a6d4	feat(p10-1b): code-js-ast-v1 chunker + activate JavaScript in app dispatch Chunker: duplicate-with-substitution from code-ts-ast-v1 / code-rust-ast-v1. Dispatch: replaces JS bail! arms with JavascriptAstExtractor + CodeJsAstV1Chunker. Integration test javascript_file_ingests_and_searches_as_code_citation asserts citation.lang=javascript, symbol=src/Bar.Bar.baz, code_lang=javascript. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 01:16:07 +00:00
altair823	31245a4328	fix(p10-1b): TS parser_version code-typescript-v1 → code-ts-v1 (naming consistency) Task H implementer chose code-typescript-v1 but plan + design §3.3 use the short form (chunker is code-ts-ast-v1 / code-js-ast-v1). Aligning parser versions to match: rust=code-rust-v1 / python=code-python-v1 / ts=code-ts-v1 / js=code-js-v1 (Task K). Fixes 2 sites: const PARSER_VERSION + integration test assertion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 01:05:17 +00:00
altair823	acb61b6830	feat(p10-1b): activate TypeScript in ingest_one_code_asset dispatch Replaces TS bail! arms with TypescriptAstExtractor + CodeTsAstV1Chunker. Adds typescript_file_ingests_and_searches_as_code_citation integration test — asserts citation.lang=typescript, symbol=src/Foo.Foo.bar, code_lang=typescript. JS arms remain bail!() (Task L). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 00:59:41 +00:00
altair823	1815091247	feat(p10-1b): activate Python in ingest_one_code_asset dispatch Replaces Python bail! arms with PythonAstExtractor + CodePythonAstV1Chunker. Adds python_file_ingests_and_searches_as_code_citation integration test — asserts citation.lang=python, symbol=kebab_eval.metrics.compute_mrr, code_lang=python. TS/JS arms remain bail!() (Tasks J/L). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 00:49:01 +00:00
altair823	8bdb3e8090	refactor(p10-1b): generalize ingest_one_code_asset for multi-language dispatch Rust path observably unchanged (verified by existing code_ingest_smoke tests). Python/TS/JS arms bail with TODO; per-lang extractor + chunker land in subsequent tasks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 00:35:53 +00:00
altair823	b1d5047399	fix(p10-1a-2): PR review round 1: doc inconsistencies + observable backfill error path Addresses the 7 actionable items from PR #140 round 1: - docs/SMOKE.md: parser_version "code-rust-ast-v1" → "code-rust-v1" (had been confused with chunker_version); jq path .citation.code_lang → .citation.lang (on the wire, code_lang is a SearchHit top-level field) - docs/ARCHITECTURE.md: corrected the wrong Mermaid edge pcode→ptypes to pcode→core (matches the actual deps in kebab-parse-code Cargo.toml); in the directory tree, moved the code-rust-ast-v1 chunker entry from kebab-parse-code to kebab-chunk - crates/kebab-app/src/app.rs: backfill_repo's .ok().flatten() silently swallowed failures → now observable via tracing::warn, non-abort intent preserved - crates/kebab-parse-code/src/rust.rs: added a one-line comment at the top of the impl_item arm stating the 1A scope limit ("only function_item produces a unit") so it is visible from outside the arm (inner comment kept) verify: kebab-parse-code 7/7 / kebab-app --lib 51/51 / code_ingest_smoke 3/3 green; touched-crate clippy clean (verified before reboot). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 23:24:20 +00:00
altair823	da51e59081	feat(p10-1a-2): populate schema.v1 code_lang_breakdown Add `SqliteStore::code_lang_breakdown()` that queries `json_extract(metadata_json, '$.code_lang')`, groups by it, and skips NULL rows — returning `BTreeMap<String, u32>`. Wire it into `collect_stats` in `kebab-app::schema`, replacing the `BTreeMap::new()` placeholder inserted by 1A-1. Test: `store::tests::code_lang_breakdown_counts_by_code_lang` asserts rust=1 and that a null-code_lang doc does NOT appear in the map. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 21:41:52 +00:00
altair823	11a0fc758f	docs(p10-1a-2): note backfill invariant at search_with_opts non-trace path (Task 8 review) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 21:20:13 +00:00
altair823	b5d1fe8c1e	feat(p10-1a-2): backfill SearchHit.repo from doc metadata (Task 8b) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 21:13:01 +00:00
altair823	580576c2c6	feat(p10-1a-2): wire code ingest dispatch (ingest_one_code_asset) Add `MediaType::Code("rust")` dispatch arm in `ingest_one_asset`, `ingest_one_code_asset` fn (faithful mirror of `ingest_one_pdf_asset`), and `backfill_code_lang` post-processing in `App::search_uncached`. Integration test `code_ingest_smoke.rs` verifies full pipeline: ingest `.rs` → Citation::Code hit with lang/symbol/line_start. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 20:14:59 +00:00
altair823	7a6a24ad10	feat(p10-1a-2): add MediaType::Code(lang) variant TDD: red → green cycle confirmed. New `Code(String)` variant serializes as `{"code":"rust"}` via serde `rename_all = "lowercase"`. All exhaustive `match` sites updated (`media_label`, `ingest_one_asset` catch-all → explicit or-pattern). Design §3.5 enum listing synced. Also fix `/target` symlink gitignore pattern so integration-test binary lookup via workspace-relative path works with CARGO_TARGET_DIR redirect. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 17:14:45 +00:00
th-kim0823	298f4adc81	feat(p10-1a-1): CLI filter flags + SchemaStats breakdowns + regression tests Task 13: add wire regression tests proving markdown SearchHit omits repo/code_lang when None, and all 5 original Citation variants serialize byte-identically without spurious Code-variant keys. Task 15: add --repo (repeatable) and --code-lang (repeatable, comma-separated) flags to `kebab search`; propagate both into SearchFilters instead of the previous vec![] stub. Add #[allow(clippy::large_enum_variant)] — Cmd is short-lived, boxing buys nothing. Task 16: add code_lang_breakdown and repo_breakdown BTreeMap fields to Stats (schema.v1); derive Default on Stats; populate both as empty in collect_stats (1A-2 fills them when code chunks land). Add unit test asserting both keys are present in the serialized object. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 17:21:59 +09:00
th-kim0823	4e8b70a04b	feat(p10-1a-1): apply generated-header + size-cap skip per file Wire kebab_parse_code::is_generated_file and is_oversized into FsSourceConnector::scan_with_skips. Files that pass gitignore/builtin/ kebabignore matching are now checked for generated-file markers (config-gated via ingest.code.skip_generated_header) and byte/line caps (ingest.code.max_file_bytes / max_file_lines). FsScanSkips gains skipped_generated + skipped_size_exceeded counters; kebab-app threads them into IngestReport. Also fixes a pre-existing clippy::derivable_impls warning in IngestCfg. Three new connector tests cover all three paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 17:06:59 +09:00
th-kim0823	9fce24b106	feat(p10-1a-1): wire IngestReport skip counters by category (gitignore/builtin/kebabignore) Refactor walker to expose WalkOverrides (combined + per-source matchers), add walk_files_with_skips that returns accepted files alongside skip attribution, wire FsSourceConnector::scan_with_skips into kebab-app so IngestReport.skipped_gitignore, skipped_kebabignore, skipped_builtin_blacklist, and skip_examples are populated instead of left at zero. Priority order per spec §5.2 (builtin > gitignore > kebabignore) enforced in classify_skip, with a directory-aware builtin matcher so pruned directory entries are correctly attributed to builtin rather than a coincident gitignore entry. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 16:42:28 +09:00
th-kim0823	351c7a0826	feat(p10-1a-1): add IngestReport skip counters + SkipExamples Adds five new u32 counters (skipped_gitignore, skipped_kebabignore, skipped_builtin_blacklist, skipped_generated, skipped_size_exceeded) and a SkipExamples struct (≤5 sample paths per category) to IngestReport. All new fields are #[serde(default)] for backward-compat deserialization. Downstream literal construction sites patched with zeros/empty; snapshot re-baked. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 15:28:19 +09:00
th-kim0823	fa4eeb5a87	feat(p10-1a-1): add SearchHit.repo / code_lang + SearchFilters.repo / code_lang Wire two new optional fields onto SearchHit (skip_serializing_if = None) and two Vec<String> filter fields onto SearchFilters (serde default). Add RetrievalDetail::Default impl (manual, uses SearchMode::Hybrid as sentinel). Patch all downstream SearchHit / SearchFilters literal constructors with repo: None / code_lang: None / vec![] as appropriate. Also covers Citation::Code arm in kebab-eval metrics match.	2026-05-15 15:04:23 +09:00
th-kim0823	441f1192ee	docs(fb-42): wire schema + README + SMOKE + design + SKILL + INDEX - Add bulk_search_item.v1 + bulk_search_response.v1 wire schemas - Register both in WIRE_SCHEMAS const - README: --bulk flag mention + MCP tool list 7→8 (bulk_search) - SMOKE: bulk multi-query walkthrough (CLI + MCP equivalent) - Design §2.2: Bulk multi-query (fb-42) subsection (additive minor) - SKILL: mcp__kebab__bulk_search section + tool table row - Task spec status open→completed, banner replaced - INDEX: fb-42 row merged (rerank hint deferred) - Fix: missed Capabilities {bulk_search} in cli wire.rs test (Task 7 leftover) - Fix: missed tools.len() 7→8 in cli_mcp_smoke (Task 5 leftover) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 21:07:36 +09:00
th-kim0823	e8da415624	feat(schema): bulk_search capability flag (fb-42) - Capabilities.bulk_search: true (snapshot) - schema.v1 wire required list updated Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 20:49:09 +09:00
th-kim0823	1ebbd6b711	feat(app): bulk_search_with_config facade (fb-42)	2026-05-10 20:18:49 +09:00
th-kim0823	f7e2072d66	test(cli): integration tests for --trace + schema breakdowns (fb-37) Also fixes App::search_with_opts trace branch to use NoopRetriever for SearchMode::Lexical, removing the embeddings requirement when the user only wants lexical-mode trace.	2026-05-10 13:21:33 +09:00
th-kim0823	69037c313a	feat(app): SearchResponse.trace + opts.trace threading (fb-37) Adds the `trace: Option<SearchTrace>` field to `SearchResponse` and threads `SearchOpts.trace` through `App::search_with_opts`. When the caller sets `opts.trace = true` the path bypasses the LRU search cache and runs through `HybridRetriever::search_with_trace`, which dispatches all 3 SearchModes internally; this means `--trace` requires embeddings (same constraint as `--mode hybrid`). The non-trace path keeps its exact prior behavior with `trace: None` stamped on the response. Picked up Task 1 / Task 3 follow-ups in the same commit so the workspace compiles: SearchOpts struct-literals in kebab-cli/main.rs + kebab-mcp/tools/search.rs default the new `trace` field to false, and the schema-wrapper test in kebab-cli/wire.rs fills the new media_breakdown / lang_breakdown / index_bytes / stale_doc_count fields on Stats with `Default::default()`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 13:01:18 +09:00
th-kim0823	231d80e82d	feat(stats): media/lang/bytes/stale fields on schema.v1.stats (fb-37) Extends CountSummary with media_breakdown, lang_breakdown, stale_doc_count fields populated via stats_ext::breakdowns(). Adds count_summary_with_threshold for callers that need real stale counts. Mirrors all new fields onto the wire-bound Stats struct in kebab-app::schema with #[serde(default)] for backwards-compat. Also fixes search_budget_integration.rs for the trace field added to SearchOpts in Task 1. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 12:34:57 +09:00
th-kim0823	b86b763dfb	fix(fb-35): address PR #126 round 2 review - wire schema: relax effective_end.minimum 1 → 0 + expand description to cover line-clamp + out-of-range sentinel (panic-fix R1 emits Some(0) when line_start=1 and range is beyond doc end — schema must accept it) - tests: tighten first-chunk-target boundary test to assert ≤ 2 total neighbors (3-chunk doc, N=2). Strict "first chunk → context_before empty" not assertable until chunks.ordinal column lands (R1 #9 architectural caveat) - store: trim contradiction in list_chunk_ids_for_doc warning comment — drop "good enough for sequentially chunked markdown" phrase that conflicts with "hash sort dominates" paragraph above Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 00:55:29 +09:00
th-kim0823	7dddc1d706	fix(fb-35): address PR #126 round 1 review - fetch_span: panic-fix on line_start > total / empty doc (return empty text + effective_end = line_start - 1 instead of out-of-bounds slice) - truncated: reserved for budget-driven truncation only; line range clamp signaled via effective_end < line_end - spec / SKILL.md / README: align rejection wording to "PDF / audio" (matches code; Image OCR allowed for span) - store: warning comment on list_chunk_ids_for_doc — chunk_id hash sort does NOT preserve document position; real fix is a chunks.ordinal column, tracked as follow-up - surrounding_chunks: saturating_add to defend against u32::MAX context arg on 32-bit targets - tests: line_start > total returns empty + chunk context at doc boundary clamps lower bound Deferred nits (follow-up): table-separator strict CommonMark form; MCP per-mode strict validation; CLI chunk_id truncation in plain output. None block correctness. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 00:45:29 +09:00
th-kim0823	1b9d89eb3a	feat(app): App::fetch span mode + PDF/audio rejection (fb-35) Line-based slice over fmt_canonical_to_markdown output. PDF / audio source_type → span_not_supported StructuredError. Out-of-range line_end clamps to total; effective_end reflects post-budget trim. invalid_input on zero / inverted bounds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 23:54:22 +09:00
th-kim0823	7d1f855f7e	feat(app): App::fetch doc mode with budget (fb-35) Walks CanonicalDocument blocks, serializes to markdown, applies chars/4 budget when opts.max_tokens is set. doc_not_found preserved through StructuredError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 23:48:40 +09:00
th-kim0823	610d29f053	feat(app): App::fetch chunk mode + markdown serializer (fb-35) Chunk mode + +-N context. doc / span modes return placeholder errors (filled by subsequent tasks). fmt_canonical_to_markdown helper introduced now since doc mode (Task 4) consumes it. Errors are typed StructuredError so classify preserves chunk_not_found / doc_not_found through the wire layer. Adds SqliteStore::list_chunk_ids_for_doc so the facade can derive +-N neighbors without leaking direct rusqlite usage into kebab-app. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 23:44:51 +09:00
th-kim0823	e084b306e5	fix(fb-34): align next_cursor semantics with docs (PR #125 round 2) Previous round-1 fix dropped the speculative cursor branch on the truncated path, leaving a contradiction with the docs: - snippet-only shrunk → cursor emitted (returned == k_effective) - k-popped → cursor null (returned < k_effective) But docs promised the opposite. R2 resolution: emit cursor whenever more hits may be reachable (either retriever filled the page OR budget popped hits — the popped ones remain fetchable from offset+returned). Drop the artificial "widen vs paginate" copy; truncated and next_cursor are now independent signals — caller may do either or both. Updates: app.rs::search_with_opts logic + SearchResponse doc + schema description + SKILL.md two bullets + max_tokens=0 test asserts cursor IS emitted on k-pop case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 21:07:04 +09:00
th-kim0823	f485608108	fix(fb-34): address PR #125 round 1 review - error_wire: StructuredError wrapper preserves ErrorV1 through anyhow → classify pipeline. Adds downcast short-circuit so cursor::decode's typed code = "stale_cursor" reaches the wire instead of being string-formatted to code = "generic". - app: search_with_opts now wraps cursor::decode error in StructuredError instead of anyhow! string format. - test: error_wire pins both negative (bare anyhow → not stale_cursor) AND positive (StructuredError → stale_cursor) invariants. CLI integration test runs end-to-end and asserts error.v1.code on stderr. - app: next_cursor only emitted on full-page (k-pop) path; drop speculative emit on snippet-only truncation that would point at a different page than the agent expected. - cursor: differentiate malformed-base64 / malformed-payload / revision-mismatch error messages; all keep code = stale_cursor. - test: cursor_rejected fixture uses .expect() to fail loud on cursor non-emission instead of silent skip. - test: max_tokens=0 → 1-hit floor + truncated=true. - docs: SKILL.md + schema description distinguish snippet-shrink (widen) vs k-pop (paginate) truncated cases. HOTFIXES notes --no-cache semantic shift (cached path + clear vs uncached path). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 20:49:27 +09:00
th-kim0823	e1fcea6313	chore: clippy fix for fb-34 — allow result_large_err on cursor::decode ErrorV1 is the workspace wire error struct; boxing here would force every call site to deref through a Box for no win — the err-path is rare. Single allow at the function level. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 20:20:36 +09:00
th-kim0823	21220f6d39	feat(cli): kebab search --max-tokens / --snippet-chars / --cursor (fb-34) JSON output wrapped in search_response.v1 (breaking — agent must adapt). Plain output unchanged + [truncated; use --cursor X] stderr hint when budget tripped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 20:02:50 +09:00
th-kim0823	af80cedd81	feat(app): App::search_with_opts + SearchResponse (fb-34) Budget loop: snippet shorten → k pop → ≥1 hit floor. Cursor encode/decode threads corpus_revision; mismatch surfaces as stale_cursor anyhow error. App::search retained as thin wrapper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 17:59:48 +09:00
th-kim0823	aabe66f5e2	docs(error_wire): note stale_cursor convention (fb-34) stale_cursor is built by cursor::decode, not classify. Test locks the invariant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 17:50:39 +09:00
th-kim0823	ebbc3a46ae	feat(app): cursor encode/decode for paginated search (fb-34) Opaque base64(JSON{offset, corpus_revision}). Mismatch or malformed input returns ErrorV1 with code = stale_cursor. base64 promoted to workspace dep. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 17:49:23 +09:00
th-kim0823	e5c99f5b80	feat(tui): adapt ask worker to StreamEvent sink (fb-33) Worker channel now carries kebab_app::StreamEvent. drain_stream matches on Token { delta }; RetrievalDone and Final are ignored (citations render from last_answer, Final is redundant with worker join). app::AskState.rx type widened to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 14:57:46 +09:00
th-kim0823	16db60f7bd	docs(fb-32): mirror back-reference + fix pipeline doc-comment ordering - kebab-app::staleness::compute_stale gains note pointing at the kebab-rag mirror so future modifiers know to update both copies. - kebab-rag::pipeline: doc comments adjacent to compute_stale and embedding_ref_for were positioned such that rustdoc would misattribute them. Reorder/separate so each comment hugs its own function. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 01:51:40 +09:00
th-kim0823	dfef65f196	feat(app): staleness module + post-process search hits (fb-32) compute_stale: strict > boundary, threshold=0 disables, future timestamps treated as fresh (clock skew safety). App::search re-stamps on cache hit so config threshold changes take effect without flushing the cache. Also unblocks the workspace build by plugging placeholder indexed_at/stale into the two AnswerCitation construction sites in kebab-rag/pipeline.rs (the score-gate refusal path forwards from SearchHit; the LLM-citation path uses UNIX_EPOCH/false until Task 7 wires the real values through pack_context). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 01:30:10 +09:00
th-kim0823	7f5739d8fb	🏗️ refactor(fb-31): apply round 1 review nits - ingest_file_with_config: lowercase normalize ext (caller-side) + early error on unsupported extension (`.docx` etc. now Err with helpful message instead of silent skipped_by_extension counter). New test ingest_file_errors_on_unsupported_extension. - ingest_stdin_with_config: doc comment explaining intentional double-call of ensure helpers (idempotent + ~ms negligible). - external::inject_frontmatter: simplify precheck via single trim_start binding + add CR-only line ending edge case. - external::inject_frontmatter: doc note on yaml_quote escape contract (agent-supplied titles with special chars are safe). Round 1 review summary: #111 (comment) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 18:46:55 +09:00
th-kim0823	dc24cb34b1	🚑 fix(fb-31): apply final review nits - kebab-app: add #[doc(hidden)] to ingest_stdin_with_config (CLAUDE.md convention — all *_with_config functions should have this attribute; fb-31's first impl missed it on the second facade fn). - SKILL.md: "Since v0.4.0" → "Since v0.3.1" (MCP shipped in fb-30 release v0.3.1; the wrong version claim was introduced in fb-30 doc sync and carried forward into fb-31). - tools_call_ingest_file: add idempotency test (second call with same content → unchanged=1, new=0). Spec called for two tests; first impl shipped only the happy path. Version bump 0.3.1 → 0.3.2 deferred to separate `chore/bump-v0.3.2` PR mirroring fb-27 + fb-30 precedent (commits `73f5d73` / `5495d96`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 18:38:19 +09:00
th-kim0823	a42f907640	🧪 test(kebab-app): ingest_stdin_with_config integration (fb-31) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 18:04:52 +09:00
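
The deleted-file sweep described in commits `27baec82ea` and `2baa846c6b` (compute stored paths minus scanned paths, then purge only when the file is truly missing, treating any filesystem error as "still present") might look roughly like the sketch below. This is not the repository's code: `is_truly_missing` and `purge_candidates` are hypothetical names, and representing paths as strings relative to a workspace root is an assumption made for illustration.

```rust
use std::collections::BTreeSet;
use std::path::Path;

/// Hypothetical helper: true only when the filesystem positively reports
/// the path as absent. An Err from try_exists (permission denied, transient
/// network-filesystem failure, ...) is conservatively read as "present".
fn is_truly_missing(path: &Path) -> bool {
    !path.try_exists().unwrap_or(true)
}

/// Hypothetical helper: stored paths the walker did not see AND that are
/// confirmed missing on disk. Files that still exist but fell outside the
/// walker's scope are left alone.
fn purge_candidates(
    stored: &BTreeSet<String>,
    scanned: &BTreeSet<String>,
    root: &Path,
) -> Vec<String> {
    stored
        .difference(scanned) // stored_paths - scanned_paths
        .filter(|rel| is_truly_missing(&root.join(rel)))
        .cloned()
        .collect()
}
```

With `stored = {a.md, gone.md}`, an empty `scanned` set, and only `a.md` present on disk, the sketch returns just `gone.md`; `a.md` is out of scope but still on disk, so it is not a purge candidate.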

100 Commits