Tasks 5-8: new `kebab-parse-code` crate with three infrastructure modules
for the code ingest framework. Ships lang.rs (extension→language identifier
mapping), repo.rs (.git walk-up via gix 0.70 for RepoMeta), and skip.rs
(BUILTIN_BLACKLIST, is_generated_file, is_oversized). 14 integration tests
across three test files, all passing; clippy -D warnings clean.
Note: gix pinned to 0.70 (not 0.83 as originally suggested) because 0.83
fails to compile against Rust 1.94.1 due to non-exhaustive match patterns
in gix-hash. 0.70 resolves cleanly and has identical head_name/head_id API.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Four optional, serde-skipped-when-None fields added to `Metadata` for
code ingest context. All 11 downstream construction sites patched with
`repo: None, git_branch: None, git_commit: None, code_lang: None`.
Full workspace check (`--tests`) and per-crate test suite pass clean.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds five new u32 counters (skipped_gitignore, skipped_kebabignore,
skipped_builtin_blacklist, skipped_generated, skipped_size_exceeded)
and a SkipExamples struct (≤5 sample paths per category) to
IngestReport. All new fields are #[serde(default)] for backward-compat
deserialization. Downstream literal construction sites patched with
zeros/empty; snapshot re-baked.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wire two new optional fields onto SearchHit (skip_serializing_if = None)
and two Vec<String> filter fields onto SearchFilters (serde default).
Add RetrievalDetail::Default impl (manual, uses SearchMode::Hybrid as
sentinel). Patch all downstream SearchHit / SearchFilters literal
constructors with repo: None / code_lang: None / vec![] as appropriate.
Also covers Citation::Code arm in kebab-eval metrics match.
Task 1: Add multilingual-e5-large arm to kebab-embed-local::resolve_model with tests for 1024-dim variants and error cases.
Task 2: Flip kebab-config defaults from e5-small (384-dim) to e5-large (1024-dim) across defaults(), test assertions, and TOML template.
All tests pass; clippy clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kebab eval compare now surfaces precision_at_k_chunk delta in both
human-readable table + deltas JSON. Snapshot fixture regenerated
additively.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Module doc still pinned "rag-v1" — update to reflect dispatched
template via system_prompt_for (rag-v1 legacy / rag-v2 default).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add missing score_kind field to SearchHit constructors in:
- kebab-tui/tests/search.rs::make_hit()
- kebab-eval/tests/metrics_and_compare.rs::hit()
- kebab-eval/src/metrics.rs::hit()
All test fixtures default to Rrf (hybrid mode), matching the field's
Default impl and the test semantics.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit unblocks Tasks 3 and 4 of fb-38:
- VectorRetriever::build_hit now labels hits with ScoreKind::Cosine
- Hybrid retriever test helpers (mk_hit functions) label synthetic hits with ScoreKind::Rrf
- Updated lexical snapshot fixture to reflect new score_kind field in output
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add ScoreKind::Bm25 to LexicalRetriever::build_hit SearchHit construction
- Import ScoreKind from kebab_core in lexical.rs
- Add integration test lexical_retriever_hits_carry_bm25_score_kind to verify all
hits from LexicalRetriever carry score_kind == ScoreKind::Bm25
- Update lexical snapshot test baseline to include new score_kind field
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- doc TraceFusionInput.fusion_score semantics (single-mode vs hybrid)
- comment why total_ms vs stage sum can drift (millis truncation)
- TODO marker on TUI trace popup filter passthrough
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Also fixes App::search_with_opts trace branch to use NoopRetriever
for SearchMode::Lexical, removing the embeddings requirement when
the user only wants lexical-mode trace.
Adds the `trace: Option<SearchTrace>` field to `SearchResponse` and
threads `SearchOpts.trace` through `App::search_with_opts`. When the
caller sets `opts.trace = true` the path bypasses the LRU search cache
and runs through `HybridRetriever::search_with_trace`, which dispatches
all 3 SearchModes internally; this means `--trace` requires embeddings
(same constraint as `--mode hybrid`). The non-trace path keeps its
exact prior behavior with `trace: None` stamped on the response.
Picked up Task 1 / Task 3 follow-ups in the same commit so the
workspace compiles: SearchOpts struct-literals in kebab-cli/main.rs +
kebab-mcp/tools/search.rs default the new `trace` field to false, and
the schema-wrapper test in kebab-cli/wire.rs fills the new
media_breakdown / lang_breakdown / index_bytes / stale_doc_count fields
on Stats with `Default::default()`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends CountSummary with media_breakdown, lang_breakdown, stale_doc_count
fields populated via stats_ext::breakdowns(). Adds count_summary_with_threshold
for callers that need real stale counts. Mirrors all new fields onto the
wire-bound Stats struct in kebab-app::schema with #[serde(default)] for
backwards-compat. Also fixes search_budget_integration.rs for the trace field
added to SearchOpts in Task 1.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ingested_after: convert OffsetDateTime to UTC before formatting
so non-Z offsets compare correctly against UTC TEXT storage
(lexical.rs + filters.rs)
- README: --tag is repeatable-only, not csv (only --media is csv)
- test(cli): add multi-value --tag OR-within IN-list coverage
- test(store): add UTC-offset regression test for ingested_after
- mcp: use ERROR_V1_ID const instead of hardcoded "error.v1"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7 new optional inputs on SearchInput: tags, lang, path_glob,
trust_min, media, ingested_after, doc_id. Validation surfaces as
error.v1 code = invalid_input via StructuredError. Dispatch builds
SearchFilters from the inputs and forwards through the existing
search_with_opts_with_config facade.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7 new flags: --tag (repeatable), --lang, --path-glob,
--trust-min (value_enum), --media (csv with `md` alias),
--ingested-after (RFC3339; config_invalid on parse fail),
--doc-id. Dispatch translates clap values into SearchFilters
and propagates structured errors through the existing
StructuredError wrapper from fb-34.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
filter_chunks helper in kebab-store-sqlite extended with the same 3
WHERE clauses as lexical. Vector still over-fetches k*2 then
post-filters via SqliteStore::filter_chunks; small k can return < k
hits when filters drop a lot — agent is expected to widen k or
paginate. AND combinator with existing filters.
- kebab-store-sqlite/src/filters.rs: media IN-list subquery, ingested_after
lexicographic >= compare, doc_id equality; mirrors lexical SQL arms
- 3 direct unit tests (filter_chunks_media_type/ingested_after/doc_id)
that run without AVX/Lance
- common/mod.rs: insert_doc / insert_doc_with_media / run_vector_search
helpers on HybridEnv for integration-test use
- hybrid.rs: 2 new #[ignore = "requires AVX..."] integration tests
(vector_filter_by_media, vector_filter_by_doc_id)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per code review on 2c80e2a. manual-repeat-n lint triggers
for Rust 1.94+ when repeat().take() can be expressed as
repeat_n directly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>