kebab

Author	SHA1	Message	Date
th-kim0823	67f2c16cc2	feat(cli): kebab search --bulk flag + stdin ndjson + output stream (fb-42)	2026-05-10 20:22:45 +09:00
th-kim0823	1ebbd6b711	feat(app): bulk_search_with_config facade (fb-42)	2026-05-10 20:18:49 +09:00
th-kim0823	892175d009	feat(core): BulkSearchItem / Summary / Response types (fb-42)	2026-05-10 20:12:31 +09:00
th-kim0823	b35f163f56	fix(fb-40): address PR #132 round 1 review Module doc still pinned "rag-v1" — update to reflect dispatched template via system_prompt_for (rag-v1 legacy / rag-v2 default). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:42:57 +09:00
th-kim0823	0e8b800b6b	test(rag): integration tests for rag-v1/v2/unknown dispatch (fb-40) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:18:36 +09:00
th-kim0823	126559ce7a	fix(fb-40): update test fixtures for rag-v2 default	2026-05-10 19:15:15 +09:00
th-kim0823	137fc4ee31	feat(config): default prompt_template_version rag-v1 → rag-v2 (fb-40)	2026-05-10 19:04:55 +09:00
th-kim0823	59f01f8185	feat(rag): pipeline reads prompt_template_version via helper (fb-40)	2026-05-10 19:02:39 +09:00
th-kim0823	9f70681b77	feat(rag): SYSTEM_PROMPT_RAG_V2 + system_prompt_for dispatch helper (fb-40)	2026-05-10 19:01:05 +09:00
th-kim0823	67aee9f480	test(cli): integration tests for score_kind on lexical mode (fb-38)	2026-05-10 18:12:14 +09:00
th-kim0823	4440fa6659	fix(fb-38): add score_kind to remaining SearchHit literals Add missing score_kind field to SearchHit constructors in: - kebab-tui/tests/search.rs::make_hit() - kebab-eval/tests/metrics_and_compare.rs::hit() - kebab-eval/src/metrics.rs::hit() All test fixtures default to Rrf (hybrid mode), matching the field's Default impl and the test semantics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 18:08:29 +09:00
th-kim0823	b51cdb9e8f	feat(search/hybrid): fuse hits override score_kind to Rrf (fb-38)	2026-05-10 17:56:56 +09:00
th-kim0823	4e739f3cd8	feat(search): add score_kind to VectorRetriever (Cosine) and hybrid test helpers (Rrf) This commit unblocks Tasks 3 and 4 of fb-38: - VectorRetriever::build_hit now labels hits with ScoreKind::Cosine - Hybrid retriever test helpers (mk_hit functions) label synthetic hits with ScoreKind::Rrf - Updated lexical snapshot fixture to reflect new score_kind field in output Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 17:54:16 +09:00
th-kim0823	3a621bba0d	feat(search/lexical): label hits with ScoreKind::Bm25 (fb-38 task 2) - Add ScoreKind::Bm25 to LexicalRetriever::build_hit SearchHit construction - Import ScoreKind from kebab_core in lexical.rs - Add integration test lexical_retriever_hits_carry_bm25_score_kind to verify all hits from LexicalRetriever carry score_kind == ScoreKind::Bm25 - Update lexical snapshot test baseline to include new score_kind field Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 17:54:11 +09:00
th-kim0823	3c605b1a5d	feat(core): ScoreKind enum + SearchHit.score_kind (fb-38)	2026-05-10 17:49:02 +09:00
th-kim0823	6a33d08aea	fix(fb-37): address PR #129 round 1 review - doc TraceFusionInput.fusion_score semantics (single-mode vs hybrid) - comment why total_ms vs stage sum can drift (millis truncation) - TODO marker on TUI trace popup filter passthrough Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 16:26:34 +09:00
th-kim0823	a40593590b	docs(fb-37): wire schema + README + SMOKE + INDEX + SKILL	2026-05-10 14:13:47 +09:00
th-kim0823	5687cbc0e2	feat(tui): search pane t-key opens TracePopup (fb-37)	2026-05-10 13:39:11 +09:00
th-kim0823	653e432a30	feat(mcp): kebab__search trace input + output mirror (fb-37) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 13:32:30 +09:00
th-kim0823	f7e2072d66	test(cli): integration tests for --trace + schema breakdowns (fb-37) Also fixes App::search_with_opts trace branch to use NoopRetriever for SearchMode::Lexical, removing the embeddings requirement when the user only wants lexical-mode trace.	2026-05-10 13:21:33 +09:00
th-kim0823	72c227af23	feat(cli): kebab search --trace flag + wire trace + pretty print (fb-37) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 13:08:48 +09:00
th-kim0823	69037c313a	feat(app): SearchResponse.trace + opts.trace threading (fb-37) Adds the `trace: Option<SearchTrace>` field to `SearchResponse` and threads `SearchOpts.trace` through `App::search_with_opts`. When the caller sets `opts.trace = true` the path bypasses the LRU search cache and runs through `HybridRetriever::search_with_trace`, which dispatches all 3 SearchModes internally; this means `--trace` requires embeddings (same constraint as `--mode hybrid`). The non-trace path keeps its exact prior behavior with `trace: None` stamped on the response. Picked up Task 1 / Task 3 follow-ups in the same commit so the workspace compiles: SearchOpts struct-literals in kebab-cli/main.rs + kebab-mcp/tools/search.rs default the new `trace` field to false, and the schema-wrapper test in kebab-cli/wire.rs fills the new media_breakdown / lang_breakdown / index_bytes / stale_doc_count fields on Stats with `Default::default()`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 13:01:18 +09:00
th-kim0823	6a067e3ab1	feat(search): HybridRetriever::search_with_trace (fb-37)	2026-05-10 12:38:53 +09:00
th-kim0823	231d80e82d	feat(stats): media/lang/bytes/stale fields on schema.v1.stats (fb-37) Extends CountSummary with media_breakdown, lang_breakdown, stale_doc_count fields populated via stats_ext::breakdowns(). Adds count_summary_with_threshold for callers that need real stale counts. Mirrors all new fields onto the wire-bound Stats struct in kebab-app::schema with #[serde(default)] for backwards-compat. Also fixes search_budget_integration.rs for the trace field added to SearchOpts in Task 1. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 12:34:57 +09:00
th-kim0823	69c6e23432	feat(store): breakdowns + index_bytes helpers (fb-37) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 12:24:43 +09:00
th-kim0823	1e943f21dc	feat(core): SearchTrace + IndexBytes types + SearchOpts.trace (fb-37) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 12:17:04 +09:00
th-kim0823	84287d0ef6	fix(fb-36): address PR #127 round 1 review - ingested_after: convert OffsetDateTime to UTC before formatting so non-Z offsets compare correctly against UTC TEXT storage (lexical.rs + filters.rs) - README: --tag is repeatable-only, not csv (only --media is csv) - test(cli): add multi-value --tag OR-within IN-list coverage - test(store): add UTC-offset regression test for ingested_after - mcp: use ERROR_V1_ID const instead of hardcoded "error.v1" Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 04:47:55 +09:00
th-kim0823	b06f4654e7	feat(mcp): kebab__search filter inputs (fb-36) 7 new optional inputs on SearchInput: tags, lang, path_glob, trust_min, media, ingested_after, doc_id. Validation surfaces as error.v1 code = invalid_input via StructuredError. Dispatch builds SearchFilters from the inputs and forwards through the existing search_with_opts_with_config facade. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 04:11:27 +09:00
th-kim0823	4e0379c04f	test(cli): wire_search_filters — lexical-only integration tests (fb-36) Cover: --doc-id scoping, --ingested-after validation error, --media md alias, --tag repeatable + frontmatter parsing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 04:06:21 +09:00
th-kim0823	6a18847892	feat(cli): kebab search filter flags (fb-36) 7 new flags: --tag (repeatable), --lang, --path-glob, --trust-min (value_enum), --media (csv with `md` alias), --ingested-after (RFC3339; config_invalid on parse fail), --doc-id. Dispatch translates clap values into SearchFilters and propagates structured errors through the existing StructuredError wrapper from fb-34. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 03:57:55 +09:00
th-kim0823	c6cc1e2bfe	feat(search/vector): media / ingested_after / doc_id filters (fb-36) filter_chunks helper in kebab-store-sqlite extended with the same 3 WHERE clauses as lexical. Vector still over-fetches k*2 then post-filters via SqliteStore::filter_chunks; small k can return < k hits when filters drop a lot — agent is expected to widen k or paginate. AND combinator with existing filters. - kebab-store-sqlite/src/filters.rs: media IN-list subquery, ingested_after lexicographic >= compare, doc_id equality; mirrors lexical SQL arms - 3 direct unit tests (filter_chunks_media_type/ingested_after/doc_id) that run without AVX/Lance - common/mod.rs: insert_doc / insert_doc_with_media / run_vector_search helpers on HybridEnv for integration-test use - hybrid.rs: 2 new #[ignore = "requires AVX..."] integration tests (vector_filter_by_media, vector_filter_by_doc_id) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 03:50:56 +09:00
th-kim0823	86475e5ba2	fix(search/lexical): use std::iter::repeat_n (clippy) Per code review on `2c80e2a`. manual-repeat-n lint triggers for Rust 1.94+ when repeat().take() can be expressed as repeat_n directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 03:43:51 +09:00
th-kim0823	2c80e2ad91	feat(search/lexical): media / ingested_after / doc_id filters (fb-36) SQL WHERE clause extension. media uses CASE WHEN json_type='text' to handle both unit (`"markdown"`) and tuple (`{"image":"png"}`) MediaType serde shapes. ingested_after relies on RFC3339 lexicographic ordering with UTC Z (per fb-32 ingest invariant). doc_id is a simple equality. AND combinator with existing tags / lang / trust filters. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 03:41:02 +09:00
th-kim0823	d3f38c76e9	feat(core): SearchFilters gains media / ingested_after / doc_id (fb-36) 3 additive optional fields. #[serde(default)] preserves backwards compat for older JSON without the new keys. MEDIA_KINDS const exposes canonical "markdown"/"pdf"/"image"/ "audio"/"other" labels for downstream alias normalization. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 03:36:45 +09:00
th-kim0823	b86b763dfb	fix(fb-35): address PR #126 round 2 review - wire schema: relax effective_end.minimum 1 → 0 + expand description to cover line-clamp + out-of-range sentinel (panic-fix R1 emits Some(0) when line_start=1 and range is beyond doc end — schema must accept it) - tests: tighten first-chunk-target boundary test to assert ≤ 2 total neighbors (3-chunk doc, N=2). Strict "first chunk → context_before empty" not assertable until chunks.ordinal column lands (R1 #9 architectural caveat) - store: trim contradiction in list_chunk_ids_for_doc warning comment — drop "good enough for sequentially chunked markdown" phrase that conflicts with "hash sort dominates" paragraph above Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 00:55:29 +09:00
th-kim0823	7dddc1d706	fix(fb-35): address PR #126 round 1 review - fetch_span: panic-fix on line_start > total / empty doc (return empty text + effective_end = line_start - 1 instead of out-of-bounds slice) - truncated: reserved for budget-driven truncation only; line range clamp signaled via effective_end < line_end - spec / SKILL.md / README: align rejection wording to "PDF / audio" (matches code; Image OCR allowed for span) - store: warning comment on list_chunk_ids_for_doc — chunk_id hash sort does NOT preserve document position; real fix is a chunks.ordinal column, tracked as follow-up - surrounding_chunks: saturating_add to defend against u32::MAX context arg on 32-bit targets - tests: line_start > total returns empty + chunk context at doc boundary clamps lower bound Deferred nits (follow-up): table-separator strict CommonMark form; MCP per-mode strict validation; CLI chunk_id truncation in plain output. None block correctness. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 00:45:29 +09:00
th-kim0823	8d8f1c0294	test(cli): bump expected MCP tool count 6 → 7 for fb-35 fetch cli_mcp_initialize_then_tools_list asserts the exact tools[] count returned by tools/list. fb-35 added kebab__fetch as the 7th tool — bump the assertion accordingly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 00:20:59 +09:00
th-kim0823	77bf19566c	feat(mcp): kebab__fetch tool — chunk / doc / span (fb-35) Mirrors CLI surface: same input shape, same fetch_result.v1 output. invalid_input error for missing kind-specific fields. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 00:11:37 +09:00
th-kim0823	beb40249a3	test(cli): wire_fetch — chunk/doc + chunk_not_found integration (fb-35) 3 lexical-only integration tests: chunk JSON shape, doc truncated with --max-tokens, unknown chunk_id returns error.v1 with code = chunk_not_found. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 00:06:14 +09:00
th-kim0823	0fffd69071	feat(cli): kebab fetch chunk / doc / span (fb-35) JSON output is fetch_result.v1; plain output is human-friendly labeled sections (chunk: before / target / after; doc/span: full text + stderr truncated hint). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 00:01:56 +09:00
th-kim0823	1b9d89eb3a	feat(app): App::fetch span mode + PDF/audio rejection (fb-35) Line-based slice over fmt_canonical_to_markdown output. PDF / audio source_type → span_not_supported StructuredError. Out-of-range line_end clamps to total; effective_end reflects post-budget trim. invalid_input on zero / inverted bounds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 23:54:22 +09:00
th-kim0823	7d1f855f7e	feat(app): App::fetch doc mode with budget (fb-35) Walks CanonicalDocument blocks, serializes to markdown, applies chars/4 budget when opts.max_tokens is set. doc_not_found preserved through StructuredError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 23:48:40 +09:00
th-kim0823	610d29f053	feat(app): App::fetch chunk mode + markdown serializer (fb-35) Chunk mode + +-N context. doc / span modes return placeholder errors (filled by subsequent tasks). fmt_canonical_to_markdown helper introduced now since doc mode (Task 4) consumes it. Errors are typed StructuredError so classify preserves chunk_not_found / doc_not_found through the wire layer. Adds SqliteStore::list_chunk_ids_for_doc so the facade can derive +-N neighbors without leaking direct rusqlite usage into kebab-app. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 23:44:51 +09:00
th-kim0823	9653592c16	feat(core): FetchQuery / FetchOpts / FetchResult / FetchKind (fb-35) Domain types for `kebab fetch` 3 modes (chunk / doc / span). All types Serialize so wire layers hand them through serde_json directly. FetchKind is snake_case-renamed to match the wire discriminator literal in fetch_result.v1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 23:35:21 +09:00
th-kim0823	e084b306e5	fix(fb-34): align next_cursor semantics with docs (PR #125 round 2) Previous round-1 fix dropped the speculative cursor branch on the truncated path, leaving a contradiction with the docs: - snippet-only shrunk → cursor emitted (returned == k_effective) - k-popped → cursor null (returned < k_effective) But docs promised the opposite. R2 resolution: emit cursor whenever more hits may be reachable (either retriever filled the page OR budget popped hits — the popped ones remain fetchable from offset+returned). Drop the artificial "widen vs paginate" copy; truncated and next_cursor are now independent signals — caller may do either or both. Updates: app.rs::search_with_opts logic + SearchResponse doc + schema description + SKILL.md two bullets + max_tokens=0 test asserts cursor IS emitted on k-pop case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 21:07:04 +09:00
th-kim0823	f485608108	fix(fb-34): address PR #125 round 1 review - error_wire: StructuredError wrapper preserves ErrorV1 through anyhow → classify pipeline. Adds downcast short-circuit so cursor::decode's typed code = "stale_cursor" reaches the wire instead of being string-formatted to code = "generic". - app: search_with_opts now wraps cursor::decode error in StructuredError instead of anyhow! string format. - test: error_wire pins both negative (bare anyhow → not stale_cursor) AND positive (StructuredError → stale_cursor) invariants. CLI integration test runs end-to-end and asserts error.v1.code on stderr. - app: next_cursor only emitted on full-page (k-pop) path; drop speculative emit on snippet-only truncation that would point at a different page than the agent expected. - cursor: differentiate malformed-base64 / malformed-payload / revision-mismatch error messages; all keep code = stale_cursor. - test: cursor_rejected fixture uses .expect() to fail loud on cursor non-emission instead of silent skip. - test: max_tokens=0 → 1-hit floor + truncated=true. - docs: SKILL.md + schema description distinguish snippet-shrink (widen) vs k-pop (paginate) truncated cases. HOTFIXES notes --no-cache semantic shift (cached path + clear vs uncached path). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 20:49:27 +09:00
th-kim0823	e1fcea6313	chore: clippy fix for fb-34 — allow result_large_err on cursor::decode ErrorV1 is the workspace wire error struct; boxing here would force every call site to deref through a Box for no win — the err-path is rare. Single allow at the function level. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 20:20:36 +09:00
th-kim0823	5e0cff1b92	feat(mcp): search tool emits search_response.v1 + budget inputs (fb-34) SearchInput gains max_tokens / snippet_chars / cursor (all optional). Output wrapped in search_response.v1 to match CLI; existing tools_call_search test updated to read v["hits"] instead of the bare array. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 20:12:05 +09:00
th-kim0823	603061fb86	test(cli): wire_search_response + budget integration (fb-34) 4 lexical-only tests covering search_response.v1 wrapper shape, --max-tokens truncation, --cursor pagination, plain stderr hint. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 20:09:01 +09:00
th-kim0823	21220f6d39	feat(cli): kebab search --max-tokens / --snippet-chars / --cursor (fb-34) JSON output wrapped in search_response.v1 (breaking — agent must adapt). Plain output unchanged + [truncated; use --cursor X] stderr hint when budget tripped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 20:02:50 +09:00

327 Commits