kebab

Author	SHA1	Message	Date
th-kim0823	2a8451c033	fix(p10-1a-1): tighten kebab-parse-code manifest + tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 16:05:34 +09:00
th-kim0823	ff11f81f7f	feat(p10-1a-1): kebab-parse-code crate (lang + repo + skip) Tasks 5-8: new `kebab-parse-code` crate with three infrastructure modules for the code ingest framework. Ships lang.rs (extension→language identifier mapping), repo.rs (.git walk-up via gix 0.70 for RepoMeta), and skip.rs (BUILTIN_BLACKLIST, is_generated_file, is_oversized). 14 integration tests across three test files, all passing; clippy -D warnings clean. Note: gix pinned to 0.70 (not 0.83 as originally suggested) because 0.83 fails to compile against Rust 1.94.1 due to non-exhaustive match patterns in gix-hash. 0.70 resolves cleanly and has identical head_name/head_id API. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 15:57:59 +09:00
th-kim0823	bf4ebf8d2a	feat(p10-1a-1): add Metadata.repo / git_branch / git_commit / code_lang Four optional, serde-skipped-when-None fields added to `Metadata` for code ingest context. All 11 downstream construction sites patched with `repo: None, git_branch: None, git_commit: None, code_lang: None`. Full workspace check (`--tests`) and per-crate test suite pass clean. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 15:44:18 +09:00
th-kim0823	351c7a0826	feat(p10-1a-1): add IngestReport skip counters + SkipExamples Adds five new u32 counters (skipped_gitignore, skipped_kebabignore, skipped_builtin_blacklist, skipped_generated, skipped_size_exceeded) and a SkipExamples struct (≤5 sample paths per category) to IngestReport. All new fields are #[serde(default)] for backward-compat deserialization. Downstream literal construction sites patched with zeros/empty; snapshot re-baked. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 15:28:19 +09:00
th-kim0823	7329ba96ee	fix(p10-1a-1): patch missed SearchHit test-only construction sites Add repo: None, code_lang: None to the 3 SearchHit struct literals inside #[cfg(test)] blocks that were missed by the `fa4eeb5` sweep.	2026-05-15 15:17:10 +09:00
th-kim0823	fa4eeb5a87	feat(p10-1a-1): add SearchHit.repo / code_lang + SearchFilters.repo / code_lang Wire two new optional fields onto SearchHit (skip_serializing_if = None) and two Vec<String> filter fields onto SearchFilters (serde default). Add RetrievalDetail::Default impl (manual, uses SearchMode::Hybrid as sentinel). Patch all downstream SearchHit / SearchFilters literal constructors with repo: None / code_lang: None / vec![] as appropriate. Also covers Citation::Code arm in kebab-eval metrics match.	2026-05-15 15:04:23 +09:00
th-kim0823	3b1e878aed	feat(p10-1a-1): add Citation::Code variant Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 14:39:18 +09:00
th-kim0823	b954e9ce66	fix(fb-39b): address PR #137 round 1 review - CI-only embed_model.rs tests updated 384 → 1024 + e5-small → e5-large references (incl. file header download size, snapshot dim assert, L2 norm comment) - kebab-embed-local module docs + Cargo.toml description list both models (small + large) - Stale tracing message expanded with both model sizes - Task spec Post-merge deviation section: record dropped embedding_dim_mismatch ErrorV1 + reason (LanceDB (model, dim) namespacing makes hard-error redundant) - Task spec + HOTFIXES version bump 0.6→0.7 corrected to 0.5→0.6 (current Cargo.toml = 0.5.0; fb-42 0.6 cut deferred per user direction) - HOTFIXES "not an embedding_version bump" line corrected: cascade rule DOES trigger release bump, plus deviation note for the dropped error Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 23:45:55 +09:00
th-kim0823	69c94b6692	feat(embed,config): add multilingual-e5-large + flip default config (fb-39b) Task 1: Add multilingual-e5-large arm to kebab-embed-local::resolve_model with tests for 1024-dim variants and error cases. Task 2: Flip kebab-config defaults from e5-small (384-dim) to e5-large (1024-dim) across defaults(), test assertions, and TOML template. All tests pass; clippy clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 23:05:36 +09:00
th-kim0823	5870a1de15	fix(fb-39): address PR #136 round 1 review kebab eval compare now surfaces precision_at_k_chunk delta in both human-readable table + deltas JSON. Snapshot fixture regenerated additively. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 22:39:11 +09:00
th-kim0823	bb0ec0469f	feat(eval): precision_at_k_chunk metric (P@5, P@10) (fb-39)	2026-05-10 22:26:21 +09:00
th-kim0823	b53376e96e	fix(fb-42): address PR #134 round 1 review - print_schema_text plain mode: include bulk_search capability row - README: tool count 7 → 8, fetch added to MCP tool name lists Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 21:19:20 +09:00
th-kim0823	441f1192ee	docs(fb-42): wire schema + README + SMOKE + design + SKILL + INDEX - Add bulk_search_item.v1 + bulk_search_response.v1 wire schemas - Register both in WIRE_SCHEMAS const - README: --bulk flag mention + MCP tool list 7→8 (bulk_search) - SMOKE: bulk multi-query walkthrough (CLI + MCP equivalent) - Design §2.2: Bulk multi-query (fb-42) subsection (additive minor) - SKILL: mcp__kebab__bulk_search section + tool table row - Task spec status open→completed, banner replaced - INDEX: fb-42 row merged (rerank hint deferred) - Fix: missed Capabilities {bulk_search} in cli wire.rs test (Task 7 leftover) - Fix: missed tools.len() 7→8 in cli_mcp_smoke (Task 5 leftover) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 21:07:36 +09:00
th-kim0823	e8da415624	feat(schema): bulk_search capability flag (fb-42) - Capabilities.bulk_search: true (snapshot) - schema.v1 wire required list updated Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 20:49:09 +09:00
th-kim0823	d8e5f35601	test(mcp): integration tests for bulk_search tool (fb-42)	2026-05-10 20:33:32 +09:00
th-kim0823	6ab0d782ef	feat(mcp): kebab__bulk_search tool (fb-42) Exposes bulk multi-query search via MCP `bulk_search` tool: - Input: { queries: [SearchInput shapes...] }, capped at 100 - Output: bulk_search_response.v1 with per-query results + summary - Sequential execution reuses App instance for cache amortization - Per-query errors embed error.v1 JSON; never aborts bulk call Updates tool count from 7 to 8 in lib.rs comment + tools_list test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 20:31:20 +09:00
th-kim0823	2bbe94eb05	test(cli): integration tests for kebab search --bulk (fb-42)	2026-05-10 20:26:07 +09:00
th-kim0823	9ac13fa256	fix(cli): make query optional when --bulk is set (fb-42)	2026-05-10 20:26:03 +09:00
th-kim0823	67f2c16cc2	feat(cli): kebab search --bulk flag + stdin ndjson + output stream (fb-42)	2026-05-10 20:22:45 +09:00
th-kim0823	1ebbd6b711	feat(app): bulk_search_with_config facade (fb-42)	2026-05-10 20:18:49 +09:00
th-kim0823	892175d009	feat(core): BulkSearchItem / Summary / Response types (fb-42)	2026-05-10 20:12:31 +09:00
th-kim0823	b35f163f56	fix(fb-40): address PR #132 round 1 review Module doc still pinned "rag-v1" — update to reflect dispatched template via system_prompt_for (rag-v1 legacy / rag-v2 default). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:42:57 +09:00
th-kim0823	0e8b800b6b	test(rag): integration tests for rag-v1/v2/unknown dispatch (fb-40) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:18:36 +09:00
th-kim0823	126559ce7a	fix(fb-40): update test fixtures for rag-v2 default	2026-05-10 19:15:15 +09:00
th-kim0823	137fc4ee31	feat(config): default prompt_template_version rag-v1 → rag-v2 (fb-40)	2026-05-10 19:04:55 +09:00
th-kim0823	59f01f8185	feat(rag): pipeline reads prompt_template_version via helper (fb-40)	2026-05-10 19:02:39 +09:00
th-kim0823	9f70681b77	feat(rag): SYSTEM_PROMPT_RAG_V2 + system_prompt_for dispatch helper (fb-40)	2026-05-10 19:01:05 +09:00
th-kim0823	67aee9f480	test(cli): integration tests for score_kind on lexical mode (fb-38)	2026-05-10 18:12:14 +09:00
th-kim0823	4440fa6659	fix(fb-38): add score_kind to remaining SearchHit literals Add missing score_kind field to SearchHit constructors in: - kebab-tui/tests/search.rs::make_hit() - kebab-eval/tests/metrics_and_compare.rs::hit() - kebab-eval/src/metrics.rs::hit() All test fixtures default to Rrf (hybrid mode), matching the field's Default impl and the test semantics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 18:08:29 +09:00
th-kim0823	b51cdb9e8f	feat(search/hybrid): fuse hits override score_kind to Rrf (fb-38)	2026-05-10 17:56:56 +09:00
th-kim0823	4e739f3cd8	feat(search): add score_kind to VectorRetriever (Cosine) and hybrid test helpers (Rrf) This commit unblocks Tasks 3 and 4 of fb-38: - VectorRetriever::build_hit now labels hits with ScoreKind::Cosine - Hybrid retriever test helpers (mk_hit functions) label synthetic hits with ScoreKind::Rrf - Updated lexical snapshot fixture to reflect new score_kind field in output Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 17:54:16 +09:00
th-kim0823	3a621bba0d	feat(search/lexical): label hits with ScoreKind::Bm25 (fb-38 task 2) - Add ScoreKind::Bm25 to LexicalRetriever::build_hit SearchHit construction - Import ScoreKind from kebab_core in lexical.rs - Add integration test lexical_retriever_hits_carry_bm25_score_kind to verify all hits from LexicalRetriever carry score_kind == ScoreKind::Bm25 - Update lexical snapshot test baseline to include new score_kind field Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 17:54:11 +09:00
th-kim0823	3c605b1a5d	feat(core): ScoreKind enum + SearchHit.score_kind (fb-38)	2026-05-10 17:49:02 +09:00
th-kim0823	6a33d08aea	fix(fb-37): address PR #129 round 1 review - doc TraceFusionInput.fusion_score semantics (single-mode vs hybrid) - comment why total_ms vs stage sum can drift (millis truncation) - TODO marker on TUI trace popup filter passthrough Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 16:26:34 +09:00
th-kim0823	a40593590b	docs(fb-37): wire schema + README + SMOKE + INDEX + SKILL	2026-05-10 14:13:47 +09:00
th-kim0823	5687cbc0e2	feat(tui): search pane t-key opens TracePopup (fb-37)	2026-05-10 13:39:11 +09:00
th-kim0823	653e432a30	feat(mcp): kebab__search trace input + output mirror (fb-37) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 13:32:30 +09:00
th-kim0823	f7e2072d66	test(cli): integration tests for --trace + schema breakdowns (fb-37) Also fixes App::search_with_opts trace branch to use NoopRetriever for SearchMode::Lexical, removing the embeddings requirement when the user only wants lexical-mode trace.	2026-05-10 13:21:33 +09:00
th-kim0823	72c227af23	feat(cli): kebab search --trace flag + wire trace + pretty print (fb-37) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 13:08:48 +09:00
th-kim0823	69037c313a	feat(app): SearchResponse.trace + opts.trace threading (fb-37) Adds the `trace: Option<SearchTrace>` field to `SearchResponse` and threads `SearchOpts.trace` through `App::search_with_opts`. When the caller sets `opts.trace = true` the path bypasses the LRU search cache and runs through `HybridRetriever::search_with_trace`, which dispatches all 3 SearchModes internally; this means `--trace` requires embeddings (same constraint as `--mode hybrid`). The non-trace path keeps its exact prior behavior with `trace: None` stamped on the response. Picked up Task 1 / Task 3 follow-ups in the same commit so the workspace compiles: SearchOpts struct-literals in kebab-cli/main.rs + kebab-mcp/tools/search.rs default the new `trace` field to false, and the schema-wrapper test in kebab-cli/wire.rs fills the new media_breakdown / lang_breakdown / index_bytes / stale_doc_count fields on Stats with `Default::default()`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 13:01:18 +09:00
th-kim0823	6a067e3ab1	feat(search): HybridRetriever::search_with_trace (fb-37)	2026-05-10 12:38:53 +09:00
th-kim0823	231d80e82d	feat(stats): media/lang/bytes/stale fields on schema.v1.stats (fb-37) Extends CountSummary with media_breakdown, lang_breakdown, stale_doc_count fields populated via stats_ext::breakdowns(). Adds count_summary_with_threshold for callers that need real stale counts. Mirrors all new fields onto the wire-bound Stats struct in kebab-app::schema with #[serde(default)] for backwards-compat. Also fixes search_budget_integration.rs for the trace field added to SearchOpts in Task 1. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 12:34:57 +09:00
th-kim0823	69c6e23432	feat(store): breakdowns + index_bytes helpers (fb-37) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 12:24:43 +09:00
th-kim0823	1e943f21dc	feat(core): SearchTrace + IndexBytes types + SearchOpts.trace (fb-37) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 12:17:04 +09:00
th-kim0823	84287d0ef6	fix(fb-36): address PR #127 round 1 review - ingested_after: convert OffsetDateTime to UTC before formatting so non-Z offsets compare correctly against UTC TEXT storage (lexical.rs + filters.rs) - README: --tag is repeatable-only, not csv (only --media is csv) - test(cli): add multi-value --tag OR-within IN-list coverage - test(store): add UTC-offset regression test for ingested_after - mcp: use ERROR_V1_ID const instead of hardcoded "error.v1" Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 04:47:55 +09:00
th-kim0823	b06f4654e7	feat(mcp): kebab__search filter inputs (fb-36) 7 new optional inputs on SearchInput: tags, lang, path_glob, trust_min, media, ingested_after, doc_id. Validation surfaces as error.v1 code = invalid_input via StructuredError. Dispatch builds SearchFilters from the inputs and forwards through the existing search_with_opts_with_config facade. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 04:11:27 +09:00
th-kim0823	4e0379c04f	test(cli): wire_search_filters — lexical-only integration tests (fb-36) Cover: --doc-id scoping, --ingested-after validation error, --media md alias, --tag repeatable + frontmatter parsing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 04:06:21 +09:00
th-kim0823	6a18847892	feat(cli): kebab search filter flags (fb-36) 7 new flags: --tag (repeatable), --lang, --path-glob, --trust-min (value_enum), --media (csv with `md` alias), --ingested-after (RFC3339; config_invalid on parse fail), --doc-id. Dispatch translates clap values into SearchFilters and propagates structured errors through the existing StructuredError wrapper from fb-34. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 03:57:55 +09:00
th-kim0823	c6cc1e2bfe	feat(search/vector): media / ingested_after / doc_id filters (fb-36) filter_chunks helper in kebab-store-sqlite extended with the same 3 WHERE clauses as lexical. Vector still over-fetches k*2 then post-filters via SqliteStore::filter_chunks; small k can return < k hits when filters drop a lot — agent is expected to widen k or paginate. AND combinator with existing filters. - kebab-store-sqlite/src/filters.rs: media IN-list subquery, ingested_after lexicographic >= compare, doc_id equality; mirrors lexical SQL arms - 3 direct unit tests (filter_chunks_media_type/ingested_after/doc_id) that run without AVX/Lance - common/mod.rs: insert_doc / insert_doc_with_media / run_vector_search helpers on HybridEnv for integration-test use - hybrid.rs: 2 new #[ignore = "requires AVX..."] integration tests (vector_filter_by_media, vector_filter_by_doc_id) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 03:50:56 +09:00
th-kim0823	86475e5ba2	fix(search/lexical): use std::iter::repeat_n (clippy) Per code review on `2c80e2a`. manual-repeat-n lint triggers for Rust 1.94+ when repeat().take() can be expressed as repeat_n directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 03:43:51 +09:00

345 Commits