Merge pull request 'feat(fb-36): search filter args (--media / --ingested-after / --doc-id + 4 existing)' (#127) from feat/fb-36-search-filters into main
Reviewed-on: #127
This commit was merged in pull request #127.
@@ -71,7 +71,7 @@ kebab doctor
|------|------|
| `kebab init` | Creates the data directory and config.toml under the XDG paths |
| `kebab ingest [<path>]` | Indexes Markdown / images / PDF (idempotent). On a TTY, a progress bar on stderr; on non-TTY (CI / pipe), one line at a time on stderr; with `--json`, `ingest_progress.v1` lines are streamed to stdout, followed by a final `ingest_report.v1`. A single Ctrl-C finishes the current asset and then aborts (partial commits are preserved, re-running is idempotent); a second Ctrl-C is a hard exit. If a Markdown title is missing from the frontmatter, it is filled in automatically in the order first H1 → H2 → first 80 characters of the first paragraph → file name (parser_version `md-frontmatter-v2`); docs that were already indexed also pick up the new title on the next ingest. **Incremental** (p9-fb-23): from the second ingest onward, parse/chunk/embed/vector upsert is skipped automatically for unchanged docs (same blake3 and same parser/chunker/embedder versions). The final summary shows an `N unchanged` count. `--force-reingest` ignores the skip and forces reprocessing. **Supported formats** (the extractor is chosen automatically and cannot be set in config): Markdown (`.md`), images (`.png` / `.jpg` / `.jpeg`, OCR + caption), PDF (`.pdf`). Other extensions are skipped automatically: the reason is recorded in `IngestItem.warnings` (e.g. `"unsupported media type: .docx"`), the counts are classified in `IngestReport.skipped_by_extension`, and the CLI / TUI summary shows the breakdown. |
| `kebab search --mode {lexical,vector,hybrid} "<query>" [--no-cache] [--max-tokens N] [--snippet-chars N] [--cursor <opaque>]` | Search. hybrid uses RRF fusion and includes citations. Repeating the same query (normalized with NFKC + trim + lowercase) within one process hits an in-process LRU cache (capacity = `[search] cache_capacity`, default 256). `--no-cache` forces a bypass, for debugging. When an ingest commit occurs, the `kv['corpus_revision']` bump makes every entry stale automatically. **`--max-tokens` / `--snippet-chars` / `--cursor` (p9-fb-34)**: agent budget controls. `--json` output is the `search_response.v1` wrapper (`{hits, next_cursor, truncated}`), which is not compatible with the pre-fb-34 bare array. mismatched cursor → `error.v1.code = stale_cursor` |
| `kebab search --mode {lexical,vector,hybrid} "<query>" [--no-cache] [--max-tokens N] [--snippet-chars N] [--cursor <opaque>] [--tag T] [--lang L] [--path-glob G] [--trust-min LEVEL] [--media TYPE] [--ingested-after RFC3339] [--doc-id ID]` | Search. hybrid uses RRF fusion and includes citations. Repeating the same query (normalized with NFKC + trim + lowercase) within one process hits an in-process LRU cache (capacity = `[search] cache_capacity`, default 256). `--no-cache` forces a bypass, for debugging. When an ingest commit occurs, the `kv['corpus_revision']` bump makes every entry stale automatically. **`--max-tokens` / `--snippet-chars` / `--cursor` (p9-fb-34)**: agent budget controls. `--json` output is the `search_response.v1` wrapper (`{hits, next_cursor, truncated}`), which is not compatible with the pre-fb-34 bare array. mismatched cursor → `error.v1.code = stale_cursor`. **filter flags (p9-fb-36):** `--tag` is a repeatable flag (`--tag rust --tag async`) with OR matching; `--media` takes multiple `,`-separated values with OR matching; the different flags are combined with AND. `--trust-min` is one of `primary\|secondary\|generated` (includes that level and above). `--ingested-after` is RFC3339 UTC; on parse failure, `error.v1.code = config_invalid` (exit 2). `--media md` is normalized as an alias of `markdown`. An unknown `--media` value always yields empty hits (not an error). |
| `kebab list docs` | Lists the indexed documents |
| `kebab inspect doc <id>` / `kebab inspect chunk <id>` | Shows the raw record |
| `kebab fetch chunk <id> [--context N]` / `kebab fetch doc <id> [--max-tokens N]` / `kebab fetch span <doc_id> <ls> <le> [--max-tokens N]` | (p9-fb-35) verbatim text fetch from indexed corpus. wire = `fetch_result.v1` (kind discriminator). chunk: target + ±N ordinal-context chunks. doc: full normalized markdown. span: 1-based line range (PDF/audio rejected as `error.v1.code = span_not_supported`). chars/4 budget on doc/span. |
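The combination rule stated in the new `kebab search` row (OR within `--tag` and within `--media`, AND across different flags) could be sketched roughly as below. `Filters`, `Doc`, and `matches` are hypothetical stand-ins for illustration, not kebab's actual types, and only three of the seven filters are modelled.

```rust
/// Hypothetical, simplified filter set; not the real `kebab_core::SearchFilters`.
#[derive(Debug, Default)]
struct Filters {
    tags_any: Vec<String>, // from repeated --tag values; OR within
    media: Vec<String>,    // from --media a,b; OR within
    lang: Option<String>,  // from --lang
}

/// Hypothetical document record used only for this sketch.
#[derive(Debug)]
struct Doc {
    tags: Vec<String>,
    media: String,
    lang: String,
}

/// Returns true when `doc` passes every active filter (AND across flags).
/// An empty list or `None` means that particular filter is inactive.
fn matches(f: &Filters, doc: &Doc) -> bool {
    let tag_ok = f.tags_any.is_empty() || f.tags_any.iter().any(|t| doc.tags.contains(t));
    let media_ok = f.media.is_empty() || f.media.iter().any(|m| *m == doc.media);
    let lang_ok = f.lang.as_deref().map_or(true, |l| l == doc.lang);
    tag_ok && media_ok && lang_ok
}
```

With `tags_any = ["rust", "async"]`, a document tagged only `rust` passes; adding `media = ["pdf"]` then rejects it if it is a Markdown document, since the two flags are ANDed.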
@@ -131,6 +131,38 @@ enum Cmd {
        /// `corpus_revision` returns `error.v1.code = stale_cursor`.
        #[arg(long)]
        cursor: Option<String>,

        /// p9-fb-36: filter by `metadata.tags`. Repeatable; OR-within (any tag).
        #[arg(long)]
        tag: Vec<String>,

        /// p9-fb-36: filter by `documents.lang` (ISO code).
        #[arg(long)]
        lang: Option<String>,

        /// p9-fb-36: filter by `documents.workspace_path` glob.
        #[arg(long)]
        path_glob: Option<String>,

        /// p9-fb-36: filter by minimum `documents.trust_level`.
        #[arg(long, value_enum)]
        trust_min: Option<TrustLevelFlag>,

        /// p9-fb-36: filter by `assets.media_type` kind. Comma-separated.
        /// Aliases: `md` → `markdown`. Other accepted: `markdown`, `pdf`,
        /// `image`, `audio`, `other`. Unknown values match nothing.
        #[arg(long, value_delimiter = ',')]
        media: Vec<String>,

        /// p9-fb-36: filter to docs whose `updated_at` is >= this RFC3339
        /// timestamp (UTC). Invalid format → exit 2 with error.v1
        /// code = config_invalid.
        #[arg(long)]
        ingested_after: Option<String>,

        /// p9-fb-36: filter to a single doc by id.
        #[arg(long)]
        doc_id: Option<String>,
    },

    /// Retrieval-augmented question answering.
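The `--trust-min` doc comment describes a "minimum" level, which implies an ordering on trust levels. The diff does not show how `kebab_core::TrustLevel` is ordered, so the sketch below assumes Primary > Secondary > Generated; the `Trust` enum and `passes_trust_min` are hypothetical names used only for illustration.

```rust
/// Illustrative only. The ordering Generated < Secondary < Primary is an
/// assumption; derived `Ord` follows declaration order.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Trust {
    Generated, // lowest (assumed)
    Secondary,
    Primary, // highest (assumed)
}

/// Keeps a document when no minimum is set, or when its level is at least `min`.
fn passes_trust_min(doc: Trust, min: Option<Trust>) -> bool {
    min.map_or(true, |m| doc >= m)
}
```

Under that assumption, `--trust-min secondary` would keep primary and secondary documents and drop generated ones.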
@@ -351,6 +383,25 @@ impl From<ModeFlag> for kebab_core::SearchMode {
    }
}

/// p9-fb-36: clap value enum for `--trust-min`. Maps to
/// `kebab_core::TrustLevel` via `From`.
#[derive(clap::ValueEnum, Clone, Debug)]
enum TrustLevelFlag {
    Primary,
    Secondary,
    Generated,
}

impl From<TrustLevelFlag> for kebab_core::TrustLevel {
    fn from(f: TrustLevelFlag) -> Self {
        match f {
            TrustLevelFlag::Primary => kebab_core::TrustLevel::Primary,
            TrustLevelFlag::Secondary => kebab_core::TrustLevel::Secondary,
            TrustLevelFlag::Generated => kebab_core::TrustLevel::Generated,
        }
    }
}

/// Parse boolean env var accepting "1", "true", "yes", "on" (case-insensitive)
/// as truthy; "0", "false", "no", "off" as falsy. Used for `KEBAB_READONLY`.
fn parse_bool_env(s: &str) -> Result<bool, String> {
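The body of `parse_bool_env` falls outside this hunk. A minimal sketch that is consistent with its doc comment might look like the following; the error message text is an assumption, and the real implementation may differ (for example in whitespace handling).

```rust
/// Sketch of the behaviour described in the doc comment above: accepts
/// "1", "true", "yes", "on" as truthy and "0", "false", "no", "off" as
/// falsy, case-insensitively. The error wording is hypothetical.
fn parse_bool_env(s: &str) -> Result<bool, String> {
    match s.to_ascii_lowercase().as_str() {
        "1" | "true" | "yes" | "on" => Ok(true),
        "0" | "false" | "no" | "off" => Ok(false),
        other => Err(format!("invalid boolean value: {other:?}")),
    }
}
```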
@@ -611,13 +662,71 @@ fn run(cli: &Cli) -> anyhow::Result<()> {
            max_tokens,
            snippet_chars,
            cursor,
            tag,
            lang,
            path_glob,
            trust_min,
            media,
            ingested_after,
            doc_id,
        } => {
            let cfg = kebab_config::Config::load(cli.config.as_deref())?;

            // p9-fb-36: normalize --media aliases (md → markdown).
            fn normalize_media_alias(s: &str) -> String {
                match s.to_ascii_lowercase().as_str() {
                    "md" => "markdown".to_string(),
                    other => other.to_string(),
                }
            }
            let media_norm: Vec<String> =
                media.iter().map(|s| normalize_media_alias(s)).collect();

            // p9-fb-36: parse --ingested-after as RFC3339; structured error on failure.
            let ingested_after_parsed: Option<time::OffsetDateTime> =
                match ingested_after.as_deref() {
                    Some(s) => {
                        match time::OffsetDateTime::parse(
                            s,
                            &time::format_description::well_known::Rfc3339,
                        ) {
                            Ok(ts) => Some(ts),
                            Err(e) => {
                                return Err(anyhow::Error::new(
                                    kebab_app::StructuredError(kebab_app::ErrorV1 {
                                        schema_version: kebab_app::ERROR_V1_ID.to_string(),
                                        code: "config_invalid".to_string(),
                                        message: format!(
                                            "--ingested-after: invalid RFC3339 timestamp '{s}': {e}"
                                        ),
                                        details: serde_json::Value::Null,
                                        hint: Some(
                                            "expected format like 2026-04-01T00:00:00Z".to_string(),
                                        ),
                                    }),
                                ));
                            }
                        }
                    }
                    None => None,
                };

            // p9-fb-36: build SearchFilters from the 7 new flags.
            let filters = kebab_core::SearchFilters {
                tags_any: tag.clone(),
                lang: lang.as_ref().map(|s| kebab_core::Lang(s.clone())),
                path_glob: path_glob.clone(),
                trust_min: trust_min.clone().map(Into::into),
                media: media_norm,
                ingested_after: ingested_after_parsed,
                doc_id: doc_id.as_ref().map(|s| kebab_core::DocumentId(s.clone())),
            };

            let q = kebab_core::SearchQuery {
                text: query.clone(),
                mode: (*mode).into(),
                k: *k,
                filters: kebab_core::SearchFilters::default(),
                filters,
            };
            let opts = kebab_core::SearchOpts {
                max_tokens: *max_tokens,
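The `--media` handling above combines comma splitting (done by clap through `value_delimiter = ','`) with alias normalization. A stdlib-only sketch of the same flow follows, assuming the split is done by hand. `normalize_media_alias` and `MEDIA_KINDS` are copied from the diff, while `parse_media_arg` and `is_known_kind` are hypothetical helpers added only for illustration.

```rust
/// Canonical kind labels, copied from `MEDIA_KINDS` in the diff.
const MEDIA_KINDS: &[&str] = &["markdown", "pdf", "image", "audio", "other"];

/// Same alias rule as the diff: lowercase, then `md` becomes `markdown`.
fn normalize_media_alias(s: &str) -> String {
    match s.to_ascii_lowercase().as_str() {
        "md" => "markdown".to_string(),
        other => other.to_string(),
    }
}

/// Hypothetical helper: splits a raw `--media` value on commas and
/// normalizes each part. In the real CLI, clap performs the split.
fn parse_media_arg(raw: &str) -> Vec<String> {
    raw.split(',').map(normalize_media_alias).collect()
}

/// Hypothetical helper: true when a normalized value is a canonical kind.
/// Per the README row, an unknown kind is not an error; it matches nothing.
fn is_known_kind(kind: &str) -> bool {
    MEDIA_KINDS.iter().any(|k| *k == kind)
}
```

Note that unknown values pass through normalization unchanged, which fits the documented behaviour that they yield empty hits rather than an error.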
306
crates/kebab-cli/tests/wire_search_filters.rs
Normal file
@@ -0,0 +1,306 @@
//! p9-fb-36: CLI integration tests for search filter flags.
//!
//! Lexical-only — no fastembed / no Ollama. Each test builds its own
//! TempDir KB via `common::write_config` + `common::ingest` and drives
//! `kebab search` through `common::run_search_with_args` or direct
//! `Command` invocations. Verifies:
//!
//! - `--doc-id <id>` restricts all returned hits to the target document.
//! - `--ingested-after <bad>` exits non-zero and emits `error.v1` on
//!   stderr with `code = "config_invalid"`.
//! - `--media md` (alias) normalises to `markdown` and matches `.md` docs.
//! - `--tag <tag>` (repeatable, OR-within) filters by frontmatter tags.

mod common;

use serde_json::Value;
use std::fs;
use std::process::Command;

// ---------------------------------------------------------------------------
// Test 1: --doc-id restricts hits to a single document
// ---------------------------------------------------------------------------

#[test]
fn search_with_doc_id_filter_returns_only_target_doc() {
    let dir = tempfile::tempdir().unwrap();
    let (cfg, workspace, _data) = common::write_config(dir.path(), 30);

    // Two docs that both contain the search term.
    fs::write(workspace.join("a.md"), "# Alpha\n\nrust ownership rules\n").unwrap();
    fs::write(workspace.join("b.md"), "# Beta\n\nrust borrow checker\n").unwrap();
    common::ingest(&cfg, &workspace);

    // First, search without a doc-id filter to find what doc_ids exist.
    let (stdout, _) = common::run_search_with_args(
        &cfg,
        &["--json", "--mode", "lexical", "rust"],
    );
    let resp: Value = serde_json::from_str(stdout.trim())
        .unwrap_or_else(|e| panic!("not JSON: {stdout:?}: {e}"));
    let hits = resp["hits"].as_array().expect("hits array");
    assert!(
        hits.len() >= 2,
        "expected ≥2 hits from two docs before filter: {resp}"
    );

    // Grab one doc_id from the results.
    let target_doc_id = hits[0]["doc_id"]
        .as_str()
        .expect("doc_id string")
        .to_string();

    // Re-search with --doc-id set to the first hit's doc_id.
    let (stdout2, _) = common::run_search_with_args(
        &cfg,
        &[
            "--json",
            "--mode",
            "lexical",
            "--doc-id",
            &target_doc_id,
            "rust",
        ],
    );
    let resp2: Value = serde_json::from_str(stdout2.trim())
        .unwrap_or_else(|e| panic!("not JSON after filter: {stdout2:?}: {e}"));
    let filtered_hits = resp2["hits"].as_array().expect("hits array (filtered)");

    assert!(
        !filtered_hits.is_empty(),
        "expected at least one hit for the target doc"
    );
    for hit in filtered_hits {
        let got = hit["doc_id"].as_str().expect("doc_id string in hit");
        assert_eq!(
            got, target_doc_id,
            "--doc-id filter must restrict all hits to target doc, got {got}"
        );
    }
}

// ---------------------------------------------------------------------------
// Test 2: --ingested-after with bad RFC3339 → exit non-zero + error.v1
// ---------------------------------------------------------------------------

#[test]
fn search_with_invalid_ingested_after_emits_config_invalid() {
    let dir = tempfile::tempdir().unwrap();
    let (cfg, workspace, _data) = common::write_config(dir.path(), 30);
    fs::write(workspace.join("a.md"), "# T\n\nrust stuff\n").unwrap();
    common::ingest(&cfg, &workspace);

    let bin = env!("CARGO_BIN_EXE_kebab");
    let out = Command::new(bin)
        .args([
            "--config",
            cfg.to_str().unwrap(),
            "--json",
            "search",
            "--mode",
            "lexical",
            "--ingested-after",
            "not-a-date",
            "rust",
        ])
        .output()
        .expect("kebab search --ingested-after bad");

    assert!(
        !out.status.success(),
        "expected non-zero exit for invalid --ingested-after, got: status={} stderr={}",
        out.status,
        String::from_utf8_lossy(&out.stderr)
    );

    let stderr = String::from_utf8_lossy(&out.stderr);
    // Find the error.v1 ndjson line on stderr (one JSON event per line).
    let err_line = stderr
        .lines()
        .find(|l| {
            serde_json::from_str::<Value>(l)
                .ok()
                .and_then(|v| {
                    v.get("schema_version")
                        .and_then(|s| s.as_str())
                        .map(String::from)
                })
                .as_deref()
                == Some("error.v1")
        })
        .unwrap_or_else(|| panic!("no error.v1 line on stderr: {stderr:?}"));

    let v: Value = serde_json::from_str(err_line).expect("error.v1 json");
    assert_eq!(
        v["code"], "config_invalid",
        "code must be config_invalid for bad RFC3339: {err_line}"
    );
}

// ---------------------------------------------------------------------------
// Test 3: --media md (alias) normalises to markdown and matches .md docs
// ---------------------------------------------------------------------------

#[test]
fn search_with_media_filter_md_alias_normalizes_to_markdown() {
    let dir = tempfile::tempdir().unwrap();
    let (cfg, workspace, _data) = common::write_config(dir.path(), 30);

    // Only a markdown file — the `md` alias should match it.
    fs::write(workspace.join("notes.md"), "# Notes\n\nrust async programming\n").unwrap();
    common::ingest(&cfg, &workspace);

    let (stdout, _) = common::run_search_with_args(
        &cfg,
        &["--json", "--mode", "lexical", "--media", "md", "rust"],
    );
    let resp: Value = serde_json::from_str(stdout.trim())
        .unwrap_or_else(|e| panic!("not JSON: {stdout:?}: {e}"));
    let hits = resp["hits"].as_array().expect("hits array");

    assert!(
        !hits.is_empty(),
        "--media md must match the markdown doc; got 0 hits: {resp}"
    );
}

// ---------------------------------------------------------------------------
// Test 4: --tag (repeatable, OR-within) filters by frontmatter tags
// ---------------------------------------------------------------------------

#[test]
fn search_with_tag_filter_matches_frontmatter_tags() {
    let dir = tempfile::tempdir().unwrap();
    let (cfg, workspace, _data) = common::write_config(dir.path(), 30);

    // Doc with `rust` tag.
    fs::write(
        workspace.join("rust_doc.md"),
        "---\ntags: [rust, systems]\n---\n# Rust\n\nrust ownership\n",
    )
    .unwrap();
    // Doc without the tag (but same keyword in body so it appears in
    // unfiltered results — the tag filter must exclude it).
    fs::write(
        workspace.join("other_doc.md"),
        "# Other\n\nrust programming\n",
    )
    .unwrap();
    common::ingest(&cfg, &workspace);

    // Without filter — both docs must produce hits.
    let (unfiltered, _) = common::run_search_with_args(
        &cfg,
        &["--json", "--mode", "lexical", "rust"],
    );
    let uresp: Value = serde_json::from_str(unfiltered.trim())
        .unwrap_or_else(|e| panic!("not JSON (unfiltered): {unfiltered:?}: {e}"));
    let uhits = uresp["hits"].as_array().expect("unfiltered hits array");
    assert!(
        uhits.len() >= 2,
        "expected ≥2 hits before tag filter: {uresp}"
    );

    // With --tag rust — only the tagged doc's hits should appear.
    let (filtered, _) = common::run_search_with_args(
        &cfg,
        &["--json", "--mode", "lexical", "--tag", "rust", "rust"],
    );
    let fresp: Value = serde_json::from_str(filtered.trim())
        .unwrap_or_else(|e| panic!("not JSON (tag-filtered): {filtered:?}: {e}"));
    let fhits = fresp["hits"].as_array().expect("filtered hits array");

    assert!(
        !fhits.is_empty(),
        "--tag rust must match the tagged doc; got 0 hits: {fresp}"
    );

    // Every returned hit must come from rust_doc.md (the tagged file).
    for hit in fhits {
        let path = hit["doc_path"].as_str().unwrap_or("");
        assert!(
            path.ends_with("rust_doc.md"),
            "--tag rust must only return hits from the tagged doc, got path={path}"
        );
    }
}

// ---------------------------------------------------------------------------
// Test 5: --tag is repeatable (OR-within); two --tag values form an IN-list
// ---------------------------------------------------------------------------

#[test]
fn search_with_two_tag_filters_returns_or_within_tags() {
    // Two docs with different tag sets:
    // a.md → tags: [rust]
    // b.md → tags: [async]
    // c.md → no tags (but same keyword in body)
    // Search with --tag rust --tag async (OR within --tag).
    // Expect a.md and b.md, not c.md.
    let dir = tempfile::tempdir().unwrap();
    let (cfg, workspace, _data) = common::write_config(dir.path(), 30);

    fs::write(
        workspace.join("a.md"),
        "---\ntags: [rust]\n---\n# A\n\nrust systems programming\n",
    )
    .unwrap();
    fs::write(
        workspace.join("b.md"),
        "---\ntags: [async]\n---\n# B\n\nrust async programming\n",
    )
    .unwrap();
    fs::write(workspace.join("c.md"), "# C\n\nrust programming\n").unwrap();
    common::ingest(&cfg, &workspace);

    // Without filter: all three docs produce hits.
    let (unfiltered, _) = common::run_search_with_args(
        &cfg,
        &["--json", "--mode", "lexical", "rust"],
    );
    let uresp: Value = serde_json::from_str(unfiltered.trim())
        .unwrap_or_else(|e| panic!("not JSON (unfiltered): {unfiltered:?}: {e}"));
    let uhits = uresp["hits"].as_array().expect("unfiltered hits array");
    assert!(
        uhits.len() >= 3,
        "expected ≥3 hits before tag filter: {uresp}"
    );

    // With --tag rust --tag async: only a.md and b.md should appear.
    let (filtered, _) = common::run_search_with_args(
        &cfg,
        &[
            "--json", "--mode", "lexical",
            "--tag", "rust",
            "--tag", "async",
            "rust",
        ],
    );
    let fresp: Value = serde_json::from_str(filtered.trim())
        .unwrap_or_else(|e| panic!("not JSON (two-tag-filtered): {filtered:?}: {e}"));
    let fhits = fresp["hits"].as_array().expect("filtered hits array");

    assert!(
        !fhits.is_empty(),
        "--tag rust --tag async must return hits from tagged docs; got 0: {fresp}"
    );

    // c.md must not appear — it has no tags.
    for hit in fhits {
        let path = hit["doc_path"].as_str().unwrap_or("");
        assert!(
            path.ends_with("a.md") || path.ends_with("b.md"),
            "--tag rust --tag async must only return a.md or b.md, got path={path}"
        );
    }

    // Both a.md and b.md must appear (OR, not AND).
    let paths: Vec<&str> = fhits
        .iter()
        .filter_map(|h| h["doc_path"].as_str())
        .collect();
    let has_a = paths.iter().any(|p| p.ends_with("a.md"));
    let has_b = paths.iter().any(|p| p.ends_with("b.md"));
    assert!(has_a, "--tag rust must include a.md (rust-tagged): paths={paths:?}");
    assert!(has_b, "--tag async must include b.md (async-tagged): paths={paths:?}");
}
@@ -26,12 +26,30 @@ pub struct SearchQuery {
    pub filters: SearchFilters,
}

/// p9-fb-36: canonical kind labels for `SearchFilters.media`. Mirrors
/// `MediaType` variant tags; CLI / MCP normalize aliases (`md` → `markdown`)
/// before populating this Vec.
pub const MEDIA_KINDS: &[&str] = &["markdown", "pdf", "image", "audio", "other"];

#[derive(Clone, Debug, Default, PartialEq, Serialize, Deserialize)]
pub struct SearchFilters {
    pub tags_any: Vec<String>,
    pub lang: Option<Lang>,
    pub path_glob: Option<String>,
    pub trust_min: Option<TrustLevel>,
    /// p9-fb-36: media_type filter — IN-list of `MediaType.kind`
    /// strings (`"markdown"`, `"pdf"`, `"image"`, `"audio"`, `"other"`).
    /// Empty Vec = no filter. Match is on the variant tag only;
    /// e.g. `["image"]` matches `Image(Png)` and `Image(Jpeg)`.
    #[serde(default)]
    pub media: Vec<String>,
    /// p9-fb-36: hits whose source doc's `documents.updated_at` is at
    /// or after this timestamp. None = no filter. RFC3339 / UTC.
    #[serde(default, with = "time::serde::rfc3339::option")]
    pub ingested_after: Option<OffsetDateTime>,
    /// p9-fb-36: restrict hits to a single document. None = no filter.
    #[serde(default)]
    pub doc_id: Option<DocumentId>,
}

#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
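The doc comment on `SearchFilters.media` says the match is on the variant tag only, so `["image"]` matches both `Image(Png)` and `Image(Jpeg)`, and an empty Vec disables the filter. One way to picture that rule is sketched below. The `MediaType` and `ImageFormat` enums here are hypothetical mirrors written for this example (the diff does not show the real definitions), and `kind` / `media_matches` are illustrative names.

```rust
/// Hypothetical image sub-format; the payload is ignored when matching.
#[derive(Debug, Clone, Copy)]
enum ImageFormat {
    Png,
    Jpeg,
}

/// Hypothetical mirror of a media type enum, not the real `MediaType`.
#[derive(Debug, Clone, Copy)]
enum MediaType {
    Markdown,
    Pdf,
    Image(ImageFormat),
    Audio,
    Other,
}

impl MediaType {
    /// Variant tag only: every `Image(_)` maps to "image".
    fn kind(&self) -> &'static str {
        match self {
            MediaType::Markdown => "markdown",
            MediaType::Pdf => "pdf",
            MediaType::Image(_) => "image",
            MediaType::Audio => "audio",
            MediaType::Other => "other",
        }
    }
}

/// Empty filter means "no filter"; otherwise the kind must be in the list.
fn media_matches(filter: &[String], mt: MediaType) -> bool {
    filter.is_empty() || filter.iter().any(|k| k.as_str() == mt.kind())
}
```

Because unknown strings never equal any kind, a filter such as `["docx"]` matches nothing, which lines up with the "unknown values match nothing" note on the CLI flag.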
@@ -155,4 +173,24 @@ mod tests {
        assert!(opts.snippet_chars.is_none());
        assert!(opts.cursor.is_none());
    }

    #[test]
    fn search_filters_default_includes_new_fb36_fields() {
        let f = SearchFilters::default();
        assert!(f.media.is_empty(), "media default empty");
        assert!(f.ingested_after.is_none(), "ingested_after default None");
        assert!(f.doc_id.is_none(), "doc_id default None");
        assert!(f.tags_any.is_empty());
        assert!(f.lang.is_none());
        assert!(f.path_glob.is_none());
        assert!(f.trust_min.is_none());
    }

    #[test]
    fn search_filters_serialize_with_serde_default_compat() {
        let old: SearchFilters = serde_json::from_str(r#"{"tags_any":[],"lang":null,"path_glob":null,"trust_min":null}"#).unwrap();
        assert!(old.media.is_empty());
        assert!(old.ingested_after.is_none());
        assert!(old.doc_id.is_none());
    }
}
@@ -19,6 +19,8 @@ tracing = { workspace = true }
# /dependencies endpoint — rmcp declares optional schemars = "^1.0").
schemars = "1"

time = { workspace = true }

kebab-app = { path = "../kebab-app" }
kebab-config = { path = "../kebab-config" }
kebab-core = { path = "../kebab-core" }
@@ -1,5 +1,7 @@
//! `search` tool — wraps `kebab_app::search_with_opts_with_config`.
//! Input: { query, mode?, k?, max_tokens?, snippet_chars?, cursor? }.
//! Input: { query, mode?, k?, max_tokens?, snippet_chars?, cursor?,
//!          tags?, lang?, path_glob?, trust_min?, media?,
//!          ingested_after?, doc_id? }.
//! Output: search_response.v1 envelope (hits + next_cursor + truncated).
//!
//! First tool with a non-empty `inputSchema`: `SearchInput` derives
@@ -10,6 +12,8 @@ use rmcp::model::CallToolResult;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

use kebab_app::ERROR_V1_ID;

use crate::error::{to_tool_error, to_tool_success};
use crate::state::KebabAppState;

@@ -27,6 +31,22 @@ pub struct SearchInput {
    pub snippet_chars: Option<usize>,
    /// p9-fb-34: opaque cursor from a previous response.
    pub cursor: Option<String>,
    /// p9-fb-36: filter by `metadata.tags` (OR-within).
    pub tags: Option<Vec<String>>,
    /// p9-fb-36: filter by `documents.lang` (ISO code).
    pub lang: Option<String>,
    /// p9-fb-36: filter by `documents.workspace_path` glob.
    pub path_glob: Option<String>,
    /// p9-fb-36: filter by minimum `documents.trust_level`.
    /// Accepts: `"primary"`, `"secondary"`, `"generated"`.
    pub trust_min: Option<String>,
    /// p9-fb-36: filter by `assets.media_type` kind. IN-list. Accepts:
    /// `"markdown"`, `"pdf"`, `"image"`, `"audio"`, `"other"`. Aliases: `md` → `markdown`.
    pub media: Option<Vec<String>>,
    /// p9-fb-36: RFC3339 UTC timestamp. Invalid format → invalid_input.
    pub ingested_after: Option<String>,
    /// p9-fb-36: filter to a single doc.
    pub doc_id: Option<String>,
}

pub fn handle(state: &KebabAppState, input: SearchInput) -> CallToolResult {
@@ -37,11 +57,62 @@ pub fn handle(state: &KebabAppState, input: SearchInput) -> CallToolResult {
        "vector" => kebab_core::SearchMode::Vector,
        _ => kebab_core::SearchMode::Hybrid,
    };

    // p9-fb-36: parse filter inputs, returning invalid_input on bad values.
    let trust_min = match input.trust_min.as_deref() {
        Some(s) => match s.to_ascii_lowercase().as_str() {
            "primary" => Some(kebab_core::TrustLevel::Primary),
            "secondary" => Some(kebab_core::TrustLevel::Secondary),
            "generated" => Some(kebab_core::TrustLevel::Generated),
            other => {
                return invalid_input(&format!(
                    "trust_min: unknown level '{other}'; expected primary|secondary|generated"
                ));
            }
        },
        None => None,
    };

    let ingested_after = match input.ingested_after.as_deref() {
        Some(s) => {
            match time::OffsetDateTime::parse(
                s,
                &time::format_description::well_known::Rfc3339,
            ) {
                Ok(ts) => Some(ts),
                Err(e) => {
                    return invalid_input(&format!(
                        "ingested_after: invalid RFC3339 '{s}': {e}"
                    ));
                }
            }
        }
        None => None,
    };

    let media: Vec<String> = input
        .media
        .clone()
        .unwrap_or_default()
        .iter()
        .map(|s| normalize_media_alias(s))
        .collect();

    let filters = kebab_core::SearchFilters {
        tags_any: input.tags.clone().unwrap_or_default(),
        lang: input.lang.clone().map(kebab_core::Lang),
        path_glob: input.path_glob.clone(),
        trust_min,
        media,
        ingested_after,
        doc_id: input.doc_id.clone().map(kebab_core::DocumentId),
    };

    let query = kebab_core::SearchQuery {
        text: input.query,
        mode,
        k,
        filters: kebab_core::SearchFilters::default(),
        filters,
    };
    let opts = kebab_core::SearchOpts {
        max_tokens: input.max_tokens,
@@ -81,3 +152,22 @@ pub fn handle(state: &KebabAppState, input: SearchInput) -> CallToolResult {
        Err(e) => to_tool_error(&e),
    }
}

fn normalize_media_alias(s: &str) -> String {
    match s.to_ascii_lowercase().as_str() {
        "md" => "markdown".to_string(),
        other => other.to_string(),
    }
}

fn invalid_input(msg: &str) -> CallToolResult {
    use kebab_app::{ErrorV1, StructuredError};
    let err = anyhow::Error::new(StructuredError(ErrorV1 {
        schema_version: ERROR_V1_ID.to_string(),
        code: "invalid_input".to_string(),
        message: msg.to_string(),
        details: serde_json::Value::Null,
        hint: None,
    }));
    to_tool_error(&err)
}
@@ -62,6 +62,13 @@ async fn fetch_tool_chunk_returns_fetch_result_v1() {
            max_tokens: None,
            snippet_chars: None,
            cursor: None,
            tags: None,
            lang: None,
            path_glob: None,
            trust_min: None,
            media: None,
            ingested_after: None,
            doc_id: None,
        },
    );
    let search_text = match &search_result.content.first().unwrap().raw {
@@ -58,6 +58,13 @@ async fn search_tool_returns_search_response_v1() {
            max_tokens: None,
            snippet_chars: None,
            cursor: None,
            tags: None,
            lang: None,
            path_glob: None,
            trust_min: None,
            media: None,
            ingested_after: None,
            doc_id: None,
        },
    );

@@ -108,3 +115,175 @@ async fn search_tool_returns_search_response_v1() {
        "envelope should carry next_cursor (possibly null)"
    );
}

/// p9-fb-36: search with doc_id filter — only hits from the target doc.
#[tokio::test]
async fn search_with_doc_id_filter_returns_only_target() {
    let dir = tempfile::tempdir().unwrap();
    let data_dir = dir.path().join("data");
    let workspace_root = dir.path().join("notes");
    fs::create_dir_all(&data_dir).unwrap();
    fs::create_dir_all(&workspace_root).unwrap();

    let config = minimal_config(&data_dir, &workspace_root);

    // Write two markdown documents, both containing the query term.
    fs::write(
        workspace_root.join("a.md"),
        "# Alpha\n\nThis document mentions kebab and flatbread.",
    )
    .unwrap();
    fs::write(
        workspace_root.join("b.md"),
        "# Beta\n\nAnother document about kebab wraps and fillings.",
    )
    .unwrap();

    let scope = SourceScope {
        root: workspace_root.clone(),
        include: vec![],
        exclude: vec![],
    };
    let _ = kebab_app::ingest_with_config(config.clone(), scope, false).unwrap();

    let state = KebabAppState::new(config, None);
    let handler = KebabHandler::new(state);

    // First: unfiltered search to discover a doc_id from one of the docs.
    let unfiltered = kebab_mcp::tools::search::handle(
        handler.state(),
        kebab_mcp::tools::search::SearchInput {
            query: "kebab".to_string(),
            mode: Some("lexical".to_string()),
            k: Some(10),
            max_tokens: None,
            snippet_chars: None,
            cursor: None,
            tags: None,
            lang: None,
            path_glob: None,
            trust_min: None,
            media: None,
            ingested_after: None,
            doc_id: None,
        },
    );
    assert!(
        !unfiltered.is_error.unwrap_or(false),
        "unfiltered search failed: {:?}",
        unfiltered
    );
    let unfiltered_text = match &unfiltered.content.first().unwrap().raw {
        RawContent::Text(t) => t.text.clone(),
        other => panic!("expected text content, got {other:?}"),
    };
    let unfiltered_v: serde_json::Value = serde_json::from_str(&unfiltered_text).unwrap();
    let hits = unfiltered_v["hits"].as_array().expect("hits must be array");
    assert!(hits.len() >= 2, "expected hits from both docs");

    // Pick the doc_id of the first hit.
    let target_doc_id = hits[0]["doc_id"]
        .as_str()
        .expect("doc_id on first hit")
        .to_string();

    // Now search with doc_id filter — all results must belong to that doc.
    let filtered = kebab_mcp::tools::search::handle(
        handler.state(),
        kebab_mcp::tools::search::SearchInput {
            query: "kebab".to_string(),
            mode: Some("lexical".to_string()),
            k: Some(10),
            max_tokens: None,
            snippet_chars: None,
            cursor: None,
            tags: None,
            lang: None,
            path_glob: None,
            trust_min: None,
            media: None,
            ingested_after: None,
            doc_id: Some(target_doc_id.clone()),
        },
    );
    assert!(
        !filtered.is_error.unwrap_or(false),
        "filtered search failed: {:?}",
        filtered
    );
    let filtered_text = match &filtered.content.first().unwrap().raw {
        RawContent::Text(t) => t.text.clone(),
        other => panic!("expected text content, got {other:?}"),
    };
    let filtered_v: serde_json::Value = serde_json::from_str(&filtered_text).unwrap();
    let filtered_hits = filtered_v["hits"].as_array().expect("hits must be array");

    assert!(
        !filtered_hits.is_empty(),
        "expected at least one hit for target doc"
    );
    for hit in filtered_hits {
        assert_eq!(
            hit["doc_id"].as_str(),
            Some(target_doc_id.as_str()),
            "all filtered hits must belong to the target doc"
        );
    }
}

|
||||
/// p9-fb-36: invalid RFC3339 for ingested_after → invalid_input error.v1.
|
||||
#[tokio::test]
|
||||
async fn search_with_invalid_ingested_after_returns_invalid_input() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let data_dir = dir.path().join("data");
|
||||
let workspace_root = dir.path().join("notes");
|
||||
fs::create_dir_all(&data_dir).unwrap();
|
||||
fs::create_dir_all(&workspace_root).unwrap();
|
||||
|
||||
let config = minimal_config(&data_dir, &workspace_root);
|
||||
let state = KebabAppState::new(config, None);
|
||||
let handler = KebabHandler::new(state);
|
||||
|
||||
let result = kebab_mcp::tools::search::handle(
|
||||
handler.state(),
|
||||
kebab_mcp::tools::search::SearchInput {
|
||||
query: "kebab".to_string(),
|
||||
mode: None,
|
||||
k: None,
|
||||
max_tokens: None,
|
||||
snippet_chars: None,
|
||||
cursor: None,
|
||||
tags: None,
|
||||
lang: None,
|
||||
path_glob: None,
|
||||
trust_min: None,
|
||||
media: None,
|
||||
ingested_after: Some("garbage".to_string()),
|
||||
doc_id: None,
|
||||
},
|
||||
);
|
||||
|
||||
assert!(
|
||||
result.is_error.unwrap_or(false),
|
||||
"expected isError=true for invalid ingested_after"
|
||||
);
|
||||
let content = result
|
||||
.content
|
||||
.first()
|
||||
.expect("expected at least one content item");
|
||||
let text = match &content.raw {
|
||||
RawContent::Text(t) => &t.text,
|
||||
other => panic!("expected text content, got {other:?}"),
|
||||
};
|
||||
let v: serde_json::Value = serde_json::from_str(text).unwrap();
|
||||
assert_eq!(
|
||||
v.get("schema_version").and_then(|s| s.as_str()),
|
||||
Some("error.v1"),
|
||||
"must carry error.v1 envelope"
|
||||
);
|
||||
assert_eq!(
|
||||
v.get("code").and_then(|s| s.as_str()),
|
||||
Some("invalid_input"),
|
||||
"code must be invalid_input for bad RFC3339"
|
||||
);
|
||||
}
|
||||
|
||||
@@ -319,6 +319,54 @@ fn run_query(
|
||||
};
|
||||
params.push(Box::new(rank));
|
||||
}
|
||||
// p9-fb-36: media_type filter (IN-list).
|
||||
// `assets.media_type` JSON has two shapes:
|
||||
// - unit variant (Markdown / Pdf): JSON text, e.g. `"markdown"`
|
||||
// - tuple variant (Image(Png) / Audio(Mp3) / Other(s)): JSON object,
|
||||
// e.g. `{"image": "png"}`
|
||||
// Extract a unified "kind" string for both shapes via:
|
||||
// CASE WHEN json_type = 'text' THEN json_extract($)
|
||||
// ELSE (first object key)
|
||||
// END IN (?, ...)
|
||||
if !filters.media.is_empty() {
|
||||
let placeholders: Vec<&str> =
|
||||
std::iter::repeat_n("?", filters.media.len()).collect();
|
||||
let placeholders = placeholders.join(",");
|
||||
sql.push_str(&format!(
|
||||
" AND f.doc_id IN (\
|
||||
SELECT d2.doc_id FROM documents d2 \
|
||||
JOIN assets a ON a.asset_id = d2.asset_id \
|
||||
WHERE CASE \
|
||||
WHEN json_type(a.media_type) = 'text' THEN json_extract(a.media_type, '$') \
|
||||
ELSE (SELECT key FROM json_each(a.media_type) LIMIT 1) \
|
||||
END IN ({placeholders}))"
|
||||
));
|
||||
for kind in &filters.media {
|
||||
params.push(Box::new(kind.clone()));
|
||||
}
|
||||
}
|
||||
|
||||
// p9-fb-36: ingested_after filter.
|
||||
// `documents.updated_at` is RFC3339 stored as TEXT (always UTC `Z` per
// fb-32 ingest path), so lexicographic >= compare is correct, but only
// when the filter instant is also formatted as UTC `Z`. A non-UTC offset
// (e.g. `+09:00`) keeps its local wall-clock digits, so a byte-wise
// compare no longer orders by instant and produces wrong results.
// Convert to UTC before formatting.
|
||||
if let Some(after) = &filters.ingested_after {
|
||||
let formatted = after
|
||||
.to_offset(time::UtcOffset::UTC)
|
||||
.format(&time::format_description::well_known::Rfc3339)
|
||||
.expect("OffsetDateTime (UTC) formats to RFC3339");
|
||||
sql.push_str(" AND d.updated_at >= ?");
|
||||
params.push(Box::new(formatted));
|
||||
}
|
||||
|
||||
// p9-fb-36: doc_id filter — single-doc scoping.
|
||||
if let Some(id) = &filters.doc_id {
|
||||
sql.push_str(" AND d.doc_id = ?");
|
||||
params.push(Box::new(id.0.clone()));
|
||||
}
|
||||
|
||||
// path_glob is intentionally NOT applied here — see module comment
|
||||
// on PATH_GLOB_OVERFETCH and the post-filter in `LexicalRetriever::search`.
|
||||
|
||||
|
||||
@@ -19,7 +19,9 @@ use std::sync::Arc;
|
||||
use kebab_config::Config;
|
||||
use kebab_core::{
|
||||
ChunkId, DocumentId, EmbeddingId, EmbeddingInput, EmbeddingKind,
|
||||
-EmbeddingModelId, EmbeddingVersion, IndexVersion, VectorRecord, VectorStore,
+EmbeddingModelId, EmbeddingVersion, IndexVersion, MediaType,
+Retriever, SearchFilters, SearchHit, SearchMode, SearchQuery,
+VectorRecord, VectorStore,
|
||||
};
|
||||
use kebab_embed::{Embedder, MockEmbedder};
|
||||
use kebab_search::{LexicalRetriever, VectorRetriever};
|
||||
@@ -173,6 +175,93 @@ impl HybridEnv {
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
/// High-level helper: seed a doc with the default media type
|
||||
/// (Markdown) and embed its text. Returns the `DocumentId` so
|
||||
/// callers can use it in `doc_id` filter tests.
|
||||
pub fn insert_doc(&self, path: &str, text: &str) -> DocumentId {
|
||||
self.insert_doc_with_media(path, text, MediaType::Markdown)
|
||||
}
|
||||
|
||||
/// High-level helper: seed a doc with an explicit `MediaType`.
|
||||
/// The `media_type` is serialized to JSON (mirrors how
|
||||
/// `DocumentStore::put_document` writes it) and stored in `assets`.
|
||||
pub fn insert_doc_with_media(
|
||||
&self,
|
||||
path: &str,
|
||||
text: &str,
|
||||
media: MediaType,
|
||||
) -> DocumentId {
|
||||
// Derive deterministic IDs from the path so repeated calls with
|
||||
// the same path are idempotent (INSERT OR IGNORE).
|
||||
let path_hash: String = {
|
||||
use std::collections::hash_map::DefaultHasher;
|
||||
use std::hash::{Hash, Hasher};
|
||||
let mut h = DefaultHasher::new();
|
||||
path.hash(&mut h);
|
||||
format!("{:032x}", h.finish())
|
||||
};
|
||||
let doc_id = format!("d{}", &path_hash[..31]);
|
||||
let chunk_id = format!("c{}", &path_hash[..31]);
|
||||
let asset_id = format!("a{}", &path_hash[..31]);
|
||||
|
||||
let media_json = serde_json::to_string(&media).expect("serialize MediaType");
|
||||
let conn = self.sqlite.read_conn();
|
||||
conn.execute(
|
||||
"INSERT OR IGNORE INTO assets (
|
||||
asset_id, source_uri, workspace_path, media_type, byte_len,
|
||||
checksum, storage_kind, storage_path, discovered_at
|
||||
) VALUES (?, ?, ?, ?, 0,
|
||||
'deadbeefdeadbeefdeadbeefdeadbeef',
|
||||
'reference', ?, '1970-01-01T00:00:00Z')",
|
||||
params![
|
||||
asset_id,
|
||||
format!("file:///{path}"),
|
||||
path,
|
||||
media_json,
|
||||
path,
|
||||
],
|
||||
)
|
||||
.unwrap();
|
||||
conn.execute(
|
||||
"INSERT OR IGNORE INTO documents (
|
||||
doc_id, asset_id, workspace_path, title, lang, source_type,
|
||||
trust_level, parser_version, doc_version, schema_version,
|
||||
metadata_json, provenance_json, created_at, updated_at
|
||||
) VALUES (?, ?, ?, NULL, 'en', 'markdown', 'primary', 'v1', 1, 1,
|
||||
'{}', '{}', '1970-01-01T00:00:00Z', '1970-01-01T00:00:00Z')",
|
||||
params![doc_id, asset_id, path],
|
||||
)
|
||||
.unwrap();
|
||||
let heading_json = "[]";
|
||||
conn.execute(
|
||||
"INSERT OR IGNORE INTO chunks (
|
||||
chunk_id, doc_id, text, heading_path_json, section_label,
|
||||
source_spans_json, token_estimate, chunker_version,
|
||||
policy_hash, block_ids_json, created_at
|
||||
) VALUES (?, ?, ?, ?, NULL,
|
||||
'[{\"kind\":\"line\",\"start\":1,\"end\":1}]',
|
||||
1, 'v1', 'h', '[]', '1970-01-01T00:00:00Z')",
|
||||
params![chunk_id, doc_id, text, heading_json],
|
||||
)
|
||||
.unwrap();
|
||||
drop(conn);
|
||||
self.embed_and_upsert(&chunk_id, &doc_id, text, &[]);
|
||||
DocumentId(doc_id)
|
||||
}
|
||||
|
||||
/// Run a `SearchMode::Vector` query against the seeded corpus and
|
||||
/// return the resulting `Vec<SearchHit>`.
|
||||
pub fn run_vector_search(&self, query: &str, filters: &SearchFilters) -> Vec<SearchHit> {
|
||||
let r = self.vector_retriever();
|
||||
let q = SearchQuery {
|
||||
text: query.to_string(),
|
||||
mode: SearchMode::Vector,
|
||||
k: 10,
|
||||
filters: filters.clone(),
|
||||
};
|
||||
r.search(&q).expect("vector search")
|
||||
}
|
||||
|
||||
/// Embed `text` as a Document and upsert it as the embedding for
|
||||
/// `chunk_id`. Drives the same code path production uses:
|
||||
/// MockEmbedder → VectorRecord → LanceVectorStore::upsert →
|
||||
|
||||
@@ -15,7 +15,7 @@ use common::{
|
||||
HybridEnv, id32, require_avx_or_panic, TEST_LEX_INDEX_VERSION, TEST_VEC_INDEX_VERSION,
|
||||
};
|
||||
use kebab_core::{
|
||||
-Retriever, SearchFilters, SearchHit, SearchMode, SearchQuery,
+MediaType, Retriever, SearchFilters, SearchHit, SearchMode, SearchQuery,
|
||||
};
|
||||
use kebab_search::{FusionPolicy, HybridRetriever};
|
||||
use rusqlite::params;
|
||||
@@ -213,6 +213,57 @@ fn hybrid_snapshot_run_1() {
|
||||
}
|
||||
}
|
||||
|
||||
/// p9-fb-36: vector post-filter must pass `media` through `filter_chunks`.
|
||||
/// Seeding two docs (markdown + pdf) and filtering for pdf-only must
|
||||
/// return only the pdf chunk, proving `LanceVectorStore::search` →
|
||||
/// `SqliteStore::filter_chunks` correctly applies the media arm.
|
||||
#[test]
|
||||
#[ignore = "requires AVX-capable hardware (LanceDB)"]
|
||||
fn vector_filter_by_media() {
|
||||
require_avx_or_panic();
|
||||
let env = HybridEnv::new();
|
||||
env.insert_doc_with_media("md1.md", "rust ownership", MediaType::Markdown);
|
||||
env.insert_doc_with_media("doc.pdf", "rust pdf body", MediaType::Pdf);
|
||||
|
||||
let filters = SearchFilters {
|
||||
media: vec!["pdf".to_string()],
|
||||
..Default::default()
|
||||
};
|
||||
let hits = env.run_vector_search("rust", &filters);
|
||||
assert_eq!(hits.len(), 1, "media filter must keep only pdf chunk");
|
||||
assert!(
|
||||
hits[0].doc_path.0.ends_with(".pdf"),
|
||||
"expected .pdf path, got: {}",
|
||||
hits[0].doc_path.0
|
||||
);
|
||||
}
|
||||
|
||||
/// p9-fb-36: vector post-filter must pass `doc_id` through `filter_chunks`.
|
||||
/// Seeding two docs with shared text, filtering by one doc_id must return
|
||||
/// only chunks from that doc.
|
||||
#[test]
|
||||
#[ignore = "requires AVX-capable hardware (LanceDB)"]
|
||||
fn vector_filter_by_doc_id() {
|
||||
require_avx_or_panic();
|
||||
let env = HybridEnv::new();
|
||||
let target = env.insert_doc("a.md", "shared knowledge");
|
||||
env.insert_doc("b.md", "shared knowledge");
|
||||
|
||||
let filters = SearchFilters {
|
||||
doc_id: Some(target.clone()),
|
||||
..Default::default()
|
||||
};
|
||||
let hits = env.run_vector_search("shared", &filters);
|
||||
assert!(
|
||||
!hits.is_empty(),
|
||||
"doc_id filter must return hits for the target doc"
|
||||
);
|
||||
assert!(
|
||||
hits.iter().all(|h| h.doc_id == target),
|
||||
"all hits must belong to the target doc_id"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
#[ignore = "requires AVX-capable hardware (LanceDB)"]
|
||||
fn vector_hit_carries_indexed_at() {
|
||||
|
||||
@@ -8,11 +8,15 @@
|
||||
use std::sync::Arc;
|
||||
|
||||
use kebab_config::Config;
|
||||
-use kebab_core::{IndexVersion, Lang, Retriever, SearchFilters, SearchMode, SearchQuery, TrustLevel};
+use kebab_core::{
+DocumentId, IndexVersion, Lang, MediaType, Retriever, SearchFilters, SearchHit, SearchMode,
+SearchQuery, TrustLevel,
+};
|
||||
use kebab_search::LexicalRetriever;
|
||||
use kebab_store_sqlite::SqliteStore;
|
||||
use rusqlite::Connection;
|
||||
use tempfile::TempDir;
|
||||
use time::OffsetDateTime;
|
||||
|
||||
// ── Test scaffolding ─────────────────────────────────────────────────────
|
||||
|
||||
@@ -679,6 +683,210 @@ fn search_hit_carries_indexed_at_from_documents_updated_at() {
|
||||
assert!(!hit.stale, "lexical retriever must default stale=false");
|
||||
}
|
||||
|
||||
// ── TestEnv helper for fb-36 filter tests ───────────────────────────────
|
||||
|
||||
/// Convenience wrapper over `Env` that exposes higher-level fixture helpers
|
||||
/// for the fb-36 filter tests. Intentionally kept separate from `Env` so
|
||||
/// the original tests are untouched.
|
||||
struct TestEnv {
|
||||
inner: Env,
|
||||
counter: std::cell::Cell<u32>,
|
||||
}
|
||||
|
||||
impl TestEnv {
|
||||
fn new() -> Self {
|
||||
Self {
|
||||
inner: Env::new(),
|
||||
counter: std::cell::Cell::new(0),
|
||||
}
|
||||
}
|
||||
|
||||
/// Allocate a fresh monotone counter suffix so every inserted doc / chunk
|
||||
/// gets a unique 32-hex ID without the caller worrying about collisions.
|
||||
fn next_id(&self, prefix: &str) -> String {
|
||||
let n = self.counter.get();
|
||||
self.counter.set(n + 1);
|
||||
let suffix = format!("{prefix}{n:04}");
|
||||
id32(&suffix)
|
||||
}
|
||||
|
||||
/// Insert a markdown doc with the given `body` and return its `DocumentId`.
|
||||
fn insert_doc(&self, path: &str, body: &str) -> DocumentId {
|
||||
self.insert_doc_with_media(path, body, MediaType::Markdown)
|
||||
}
|
||||
|
||||
/// Insert a doc whose `assets.media_type` JSON is set to the serialized
|
||||
/// form of `media`. The `documents.updated_at` defaults to now.
|
||||
fn insert_doc_with_media(&self, path: &str, body: &str, media: MediaType) -> DocumentId {
|
||||
self.insert_doc_full(path, body, media, OffsetDateTime::now_utc())
|
||||
}
|
||||
|
||||
/// Insert a doc with an explicit `updated_at` timestamp (for
|
||||
/// `ingested_after` filter tests).
|
||||
fn insert_doc_with_updated_at(
|
||||
&self,
|
||||
path: &str,
|
||||
body: &str,
|
||||
updated_at: OffsetDateTime,
|
||||
) -> DocumentId {
|
||||
self.insert_doc_full(path, body, MediaType::Markdown, updated_at)
|
||||
}
|
||||
|
||||
fn insert_doc_full(
|
||||
&self,
|
||||
path: &str,
|
||||
body: &str,
|
||||
media: MediaType,
|
||||
updated_at: OffsetDateTime,
|
||||
) -> DocumentId {
|
||||
use time::format_description::well_known::Rfc3339;
|
||||
let doc_id = self.next_id("doc");
|
||||
let chunk_id = self.next_id("chk");
|
||||
let asset_id = self.next_id("ast");
|
||||
let media_json = serde_json::to_string(&media).expect("serialize MediaType");
|
||||
let updated_at_str = updated_at.format(&Rfc3339).expect("format updated_at");
|
||||
|
||||
let conn = self.inner.raw_conn();
|
||||
conn.execute(
|
||||
"INSERT OR IGNORE INTO assets (
|
||||
asset_id, source_uri, workspace_path, media_type, byte_len,
|
||||
checksum, storage_kind, storage_path, discovered_at
|
||||
) VALUES (?, ?, ?, ?, 0,
|
||||
'd0', 'reference', ?, '2024-01-01T00:00:00Z')",
|
||||
rusqlite::params![asset_id, format!("file:///{path}"), path, media_json, path],
|
||||
)
|
||||
.expect("insert asset");
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO documents (
|
||||
doc_id, asset_id, workspace_path, title, lang,
|
||||
source_type, trust_level, parser_version,
|
||||
doc_version, schema_version, metadata_json,
|
||||
provenance_json, created_at, updated_at
|
||||
) VALUES (?, ?, ?, NULL, 'en', 'markdown', 'primary', 'pv1', 1, 1,
|
||||
'{}', '{\"events\":[]}',
|
||||
'2024-01-01T00:00:00Z', ?)",
|
||||
rusqlite::params![doc_id, asset_id, path, updated_at_str],
|
||||
)
|
||||
.expect("insert document");
|
||||
|
||||
let empty_headings: Vec<&str> = vec![];
|
||||
let heading_json = serde_json::to_string(&empty_headings).unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO chunks (
|
||||
chunk_id, doc_id, text, heading_path_json, section_label,
|
||||
source_spans_json, token_estimate, chunker_version,
|
||||
policy_hash, block_ids_json, created_at
|
||||
) VALUES (?, ?, ?, ?, NULL,
|
||||
'[{\"kind\":\"line\",\"start\":1,\"end\":1}]',
|
||||
1, 'v1', 'h', '[]', '2024-01-01T00:00:00Z')",
|
||||
rusqlite::params![chunk_id, doc_id, body, heading_json],
|
||||
)
|
||||
.expect("insert chunk");
|
||||
|
||||
DocumentId(doc_id)
|
||||
}
|
||||
|
||||
fn run_search(&self, query: &str, filters: &SearchFilters) -> Vec<SearchHit> {
|
||||
let r = self.inner.retriever();
|
||||
let q = SearchQuery {
|
||||
text: query.to_string(),
|
||||
mode: SearchMode::Lexical,
|
||||
k: 10,
|
||||
filters: filters.clone(),
|
||||
};
|
||||
r.search(&q).expect("search")
|
||||
}
|
||||
}
|
||||
|
||||
// ── fb-36 filter tests ───────────────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn lexical_filter_by_media() {
|
||||
let env = TestEnv::new();
|
||||
env.insert_doc_with_media("md1.md", "rust ownership", MediaType::Markdown);
|
||||
env.insert_doc_with_media("doc.pdf", "rust pdf body", MediaType::Pdf);
|
||||
let filters = SearchFilters {
|
||||
media: vec!["pdf".to_string()],
|
||||
..Default::default()
|
||||
};
|
||||
let hits = env.run_search("rust", &filters);
|
||||
assert_eq!(hits.len(), 1, "only pdf doc should match");
|
||||
assert!(hits[0].doc_path.0.ends_with(".pdf"), "got: {}", hits[0].doc_path.0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn lexical_filter_by_ingested_after() {
|
||||
let env = TestEnv::new();
|
||||
env.insert_doc_with_updated_at(
|
||||
"old.md",
|
||||
"ingest test",
|
||||
time::macros::datetime!(2020-01-01 00:00:00 UTC),
|
||||
);
|
||||
env.insert_doc_with_updated_at(
|
||||
"new.md",
|
||||
"ingest test",
|
||||
time::macros::datetime!(2026-01-01 00:00:00 UTC),
|
||||
);
|
||||
let filters = SearchFilters {
|
||||
ingested_after: Some(time::macros::datetime!(2025-01-01 00:00:00 UTC)),
|
||||
..Default::default()
|
||||
};
|
||||
let hits = env.run_search("ingest", &filters);
|
||||
assert_eq!(hits.len(), 1, "only post-2025 doc matches");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn lexical_filter_by_doc_id() {
|
||||
let env = TestEnv::new();
|
||||
let target = env.insert_doc("a.md", "shared term");
|
||||
env.insert_doc("b.md", "shared term");
|
||||
let filters = SearchFilters {
|
||||
doc_id: Some(target.clone()),
|
||||
..Default::default()
|
||||
};
|
||||
let hits = env.run_search("shared", &filters);
|
||||
assert!(!hits.is_empty(), "should get at least one hit for target doc");
|
||||
for h in &hits {
|
||||
assert_eq!(h.doc_id, target, "all hits must be from target doc");
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn lexical_filter_combinator_is_and() {
|
||||
let env = TestEnv::new();
|
||||
let target = env.insert_doc_with_media("a.md", "rust", MediaType::Markdown);
|
||||
env.insert_doc_with_media("b.pdf", "rust", MediaType::Pdf);
|
||||
let filters = SearchFilters {
|
||||
media: vec!["markdown".to_string()],
|
||||
doc_id: Some(target.clone()),
|
||||
..Default::default()
|
||||
};
|
||||
let hits = env.run_search("rust", &filters);
|
||||
assert!(!hits.is_empty(), "target doc should match combined filter");
|
||||
assert!(hits.iter().all(|h| h.doc_id == target));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn lexical_filter_unknown_media_returns_empty() {
|
||||
let env = TestEnv::new();
|
||||
env.insert_doc("a.md", "rust");
|
||||
let filters = SearchFilters {
|
||||
media: vec!["nonexistent_kind".to_string()],
|
||||
..Default::default()
|
||||
};
|
||||
let hits = env.run_search("rust", &filters);
|
||||
assert!(hits.is_empty(), "unknown media → no hits, no error");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn lexical_empty_filters_match_default_behavior() {
|
||||
let env = TestEnv::new();
|
||||
env.insert_doc("a.md", "rust");
|
||||
let with_default = env.run_search("rust", &SearchFilters::default());
|
||||
assert!(!with_default.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn lexical_snapshot_run_1() {
|
||||
// Pinned snapshot. A small, deterministic corpus; the JSON shape of
|
||||
|
||||
@@ -129,6 +129,51 @@ impl SqliteStore {
|
||||
}
|
||||
}
|
||||
|
||||
// p9-fb-36: media_type filter (IN-list).
|
||||
// `assets.media_type` JSON has two shapes:
|
||||
// - unit variant (Markdown / Pdf / …): JSON text, e.g. `"markdown"`
|
||||
// - tuple variant (Image(Png) / Audio(Mp3) / Other(s)): JSON object,
|
||||
// e.g. `{"image": "png"}`
|
||||
// Extract a unified "kind" string for both shapes; mirrors lexical.
|
||||
if !filters.media.is_empty() {
|
||||
let media_ph = std::iter::repeat_n("?", filters.media.len())
|
||||
.collect::<Vec<_>>()
|
||||
.join(",");
|
||||
sql.push_str(&format!(
|
||||
" AND d.doc_id IN (\
|
||||
SELECT d2.doc_id FROM documents d2 \
|
||||
JOIN assets a ON a.asset_id = d2.asset_id \
|
||||
WHERE CASE \
|
||||
WHEN json_type(a.media_type) = 'text' THEN json_extract(a.media_type, '$') \
|
||||
ELSE (SELECT key FROM json_each(a.media_type) LIMIT 1) \
|
||||
END IN ({media_ph}))"
|
||||
));
|
||||
for kind in &filters.media {
|
||||
bind.push(Box::new(kind.clone()));
|
||||
}
|
||||
}
|
||||
|
||||
// p9-fb-36: ingested_after filter.
|
||||
// `documents.updated_at` is RFC3339 TEXT (UTC `Z` per fb-32);
// lexicographic >= compare is correct, but only when the filter
// instant is also formatted as UTC `Z`. A non-UTC offset (e.g.
// `+09:00`) keeps its local wall-clock digits, so a byte-wise compare
// no longer orders by instant and produces wrong results. Convert to
// UTC before formatting.
|
||||
if let Some(after) = &filters.ingested_after {
|
||||
let formatted = after
|
||||
.to_offset(time::UtcOffset::UTC)
|
||||
.format(&time::format_description::well_known::Rfc3339)
|
||||
.expect("OffsetDateTime (UTC) formats to RFC3339");
|
||||
sql.push_str(" AND d.updated_at >= ?");
|
||||
bind.push(Box::new(formatted));
|
||||
}
|
||||
|
||||
// p9-fb-36: doc_id filter — single-doc scoping.
|
||||
if let Some(id) = &filters.doc_id {
|
||||
sql.push_str(" AND d.doc_id = ?");
|
||||
bind.push(Box::new(id.0.clone()));
|
||||
}
|
||||
|
||||
// Optional path_glob: applied in Rust on the rows we get back,
|
||||
// not in SQL — matching `kb-search::lexical`'s post-filter so
|
||||
// the glob semantics are byte-identical between retrievers.
|
||||
@@ -280,6 +325,89 @@ mod tests {
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
/// Variant of `seed_committed` that accepts an explicit `media_type`
|
||||
/// JSON string (e.g. `r#""markdown""#` or `r#""pdf""#`) and an
|
||||
/// explicit `updated_at` RFC3339 string so the fb-36 filter tests can
|
||||
/// exercise `media` and `ingested_after` without going through the full
|
||||
/// ingest pipeline.
|
||||
#[allow(clippy::too_many_arguments)]
|
||||
fn seed_committed_full(
|
||||
store: &SqliteStore,
|
||||
chunk_id: &str,
|
||||
doc_id: &str,
|
||||
workspace_path: &str,
|
||||
lang: &str,
|
||||
tags: &[&str],
|
||||
trust: &str,
|
||||
media_type_json: &str,
|
||||
updated_at: &str,
|
||||
) {
|
||||
let asset_id = format!("a{}", &doc_id[..31]);
|
||||
{
|
||||
let conn = store.lock_conn();
|
||||
conn.execute(
|
||||
"INSERT INTO assets (
|
||||
asset_id, source_uri, workspace_path, media_type, byte_len,
|
||||
checksum, storage_kind, storage_path, discovered_at
|
||||
) VALUES (?, ?, ?, ?, 0, 'deadbeefdeadbeefdeadbeefdeadbeef',
|
||||
'reference', ?, '1970-01-01T00:00:00Z')",
|
||||
params![
|
||||
asset_id,
|
||||
format!("file://{workspace_path}"),
|
||||
workspace_path,
|
||||
media_type_json,
|
||||
workspace_path,
|
||||
],
|
||||
)
|
||||
.unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO documents (
|
||||
doc_id, asset_id, workspace_path, title, lang, source_type,
|
||||
trust_level, parser_version, doc_version, schema_version,
|
||||
metadata_json, provenance_json, created_at, updated_at
|
||||
) VALUES (?, ?, ?, NULL, ?, 'markdown', ?, 'v1', 1, 1,
|
||||
'{}', '{}', '1970-01-01T00:00:00Z', ?)",
|
||||
params![doc_id, asset_id, workspace_path, lang, trust, updated_at],
|
||||
)
|
||||
.unwrap();
|
||||
for t in tags {
|
||||
conn.execute(
|
||||
"INSERT INTO document_tags (doc_id, tag) VALUES (?, ?)",
|
||||
params![doc_id, t],
|
||||
)
|
||||
.unwrap();
|
||||
}
|
||||
conn.execute(
|
||||
"INSERT INTO chunks (
|
||||
chunk_id, doc_id, text, heading_path_json, section_label,
|
||||
source_spans_json, token_estimate, chunker_version,
|
||||
policy_hash, block_ids_json, created_at
|
||||
) VALUES (?, ?, 'hi', '[]', NULL, '[]', 1, 'v1', 'h', '[]',
|
||||
'1970-01-01T00:00:00Z')",
|
||||
params![chunk_id, doc_id],
|
||||
)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
let embed_row = EmbeddingRecordRow {
|
||||
embedding_id: format!("e{}", &chunk_id[..31]),
|
||||
chunk_id: chunk_id.to_string(),
|
||||
model_id: "m".to_string(),
|
||||
model_version: "v1".to_string(),
|
||||
dimensions: 4,
|
||||
lance_table: "t".to_string(),
|
||||
created_at: OffsetDateTime::UNIX_EPOCH,
|
||||
};
|
||||
store
|
||||
.put_embedding_records_pending(std::slice::from_ref(&embed_row))
|
||||
.unwrap();
|
||||
store
|
||||
.mark_embedding_records_committed(std::slice::from_ref(
|
||||
&embed_row.embedding_id,
|
||||
))
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
fn cid(s: &str) -> ChunkId {
|
||||
ChunkId(s.to_string())
|
||||
}
|
||||
@@ -449,4 +577,147 @@ mod tests {
|
||||
let out = store.filter_chunks(&[], &SearchFilters::default()).unwrap();
|
||||
assert!(out.is_empty());
|
||||
}
|
||||
|
||||
// ── p9-fb-36 new filter arms ─────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn filter_chunks_media_type_keeps_matching_kind() {
|
||||
// c1 = markdown, c2 = pdf. Filter for pdf → only c2 survives.
|
||||
let tmp = TempDir::new().unwrap();
|
||||
let store = open_store(&tmp);
|
||||
let c1 = "11111111111111111111111111111111";
|
||||
let c2 = "22222222222222222222222222222222";
|
||||
seed_committed_full(
|
||||
&store, c1, "d1d1d1d1d1d1d1d1d1d1d1d1d1d1d1d1",
|
||||
"notes/a.md", "en", &[], "primary",
|
||||
r#""markdown""#,
|
||||
"1970-01-01T00:00:00Z",
|
||||
);
|
||||
seed_committed_full(
|
||||
&store, c2, "d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2",
|
||||
"notes/b.pdf", "en", &[], "primary",
|
||||
r#""pdf""#,
|
||||
"1970-01-01T00:00:00Z",
|
||||
);
|
||||
|
||||
let f = SearchFilters {
|
||||
media: vec!["pdf".to_string()],
|
||||
..Default::default()
|
||||
};
|
||||
let out = store
|
||||
.filter_chunks(&[cid(c1), cid(c2)], &f)
|
||||
.unwrap();
|
||||
assert_eq!(out, vec![cid(c2)], "only pdf chunk should survive media filter");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn filter_chunks_ingested_after_excludes_old_docs() {
|
||||
// c1 ingested 2020, c2 ingested 2026. filter ingested_after=2025 → only c2.
|
||||
let tmp = TempDir::new().unwrap();
|
||||
let store = open_store(&tmp);
|
||||
let c1 = "11111111111111111111111111111111";
|
||||
let c2 = "22222222222222222222222222222222";
|
||||
seed_committed_full(
|
||||
&store, c1, "d1d1d1d1d1d1d1d1d1d1d1d1d1d1d1d1",
|
||||
"old.md", "en", &[], "primary",
|
||||
r#""markdown""#,
|
||||
"2020-01-01T00:00:00Z",
|
||||
);
|
||||
seed_committed_full(
|
||||
&store, c2, "d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2",
|
||||
"new.md", "en", &[], "primary",
|
||||
r#""markdown""#,
|
||||
"2026-01-01T00:00:00Z",
|
||||
);
|
||||
|
||||
let f = SearchFilters {
|
||||
ingested_after: Some(time::macros::datetime!(2025-01-01 00:00:00 UTC)),
|
||||
..Default::default()
|
||||
};
|
||||
let out = store
|
||||
.filter_chunks(&[cid(c1), cid(c2)], &f)
|
||||
.unwrap();
|
||||
assert_eq!(out, vec![cid(c2)], "only post-2025 chunk should survive ingested_after filter");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn filter_chunks_doc_id_scopes_to_single_doc() {
|
||||
// c1 belongs to d1, c2 belongs to d2. filter doc_id=d1 → only c1.
|
||||
let tmp = TempDir::new().unwrap();
|
||||
let store = open_store(&tmp);
|
||||
let c1 = "11111111111111111111111111111111";
|
||||
let c2 = "22222222222222222222222222222222";
|
||||
let d1 = "d1d1d1d1d1d1d1d1d1d1d1d1d1d1d1d1";
|
||||
seed_committed_full(
|
||||
&store, c1, d1,
|
||||
"a.md", "en", &[], "primary",
|
||||
r#""markdown""#,
|
||||
"1970-01-01T00:00:00Z",
|
||||
);
|
||||
seed_committed_full(
|
||||
&store, c2, "d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2",
|
||||
"b.md", "en", &[], "primary",
|
||||
r#""markdown""#,
|
||||
"1970-01-01T00:00:00Z",
|
||||
);
|
||||
|
||||
let f = SearchFilters {
|
||||
doc_id: Some(kebab_core::DocumentId(d1.to_string())),
|
||||
..Default::default()
|
||||
};
|
||||
let out = store
|
||||
.filter_chunks(&[cid(c1), cid(c2)], &f)
|
||||
.unwrap();
|
||||
assert_eq!(out, vec![cid(c1)], "doc_id filter must scope to the target doc only");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn filter_chunks_ingested_after_non_utc_offset_compares_as_instant() {
|
||||
// Regression test for the non-UTC offset lex-compare bug.
|
||||
//
|
||||
// Scenario (from PR #127 review):
|
||||
// - doc stored at `2026-04-01T01:00:00Z`
|
||||
// - filter: `2026-04-01T05:00:00+09:00` == `2026-03-31T20:00:00Z` instant
|
||||
//
|
||||
// The doc instant (01:00 UTC on Apr 1) is AFTER the filter instant
|
||||
// (20:00 UTC on Mar 31), so the doc SHOULD match.
|
||||
//
|
||||
// Buggy code: formats `+09:00` as-is → lex compare
|
||||
// `2026-04-01T01:00:00Z` vs `2026-04-01T05:00:00+09:00`
|
||||
// `01` < `05` → doc dropped incorrectly.
|
||||
//
|
||||
// Fixed code: converts to UTC first → compares
|
||||
// `2026-04-01T01:00:00Z` vs `2026-03-31T20:00:00Z`
|
||||
// Apr 1 > Mar 31 → doc correctly included.
|
||||
let tmp = TempDir::new().unwrap();
|
||||
let store = open_store(&tmp);
|
||||
let c1 = "11111111111111111111111111111111";
|
||||
seed_committed_full(
|
||||
&store, c1, "d1d1d1d1d1d1d1d1d1d1d1d1d1d1d1d1",
|
||||
"doc.md", "en", &[], "primary",
|
||||
r#""markdown""#,
|
||||
"2026-04-01T01:00:00Z",
|
||||
);
|
||||
|
||||
// Filter instant: 2026-04-01T05:00:00+09:00 == 2026-03-31T20:00:00 UTC.
|
||||
// Doc (2026-04-01T01:00:00Z) is after the filter instant → should match.
|
||||
let filter_instant = time::OffsetDateTime::parse(
|
||||
"2026-04-01T05:00:00+09:00",
|
||||
&time::format_description::well_known::Rfc3339,
|
||||
)
|
||||
.expect("valid RFC3339 with +09:00 offset");
|
||||
|
||||
let f = SearchFilters {
|
||||
ingested_after: Some(filter_instant),
|
||||
..Default::default()
|
||||
};
|
||||
let out = store
|
||||
.filter_chunks(&[cid(c1)], &f)
|
||||
.unwrap();
|
||||
assert_eq!(
|
||||
out,
|
||||
vec![cid(c1)],
|
||||
"doc ingested at 01:00Z should match filter 05:00+09:00 (== 20:00Z previous day)"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -190,6 +190,22 @@ kebab fetch span "$DOC_ID" 1 5 --json | jq '{line_start, line_end, effective_end
|
||||
|
||||
PDF / audio docs reject `fetch span` with `error.v1.code = span_not_supported` — use `fetch chunk` (PDF chunks are page-aligned) or `fetch doc` instead.
|
||||
|
||||
### Filter args (fb-36)
|
||||
|
||||
````bash
|
||||
# Filter by media kind (md alias normalizes to markdown).
|
||||
kebab search "rust" --media md --json | jq '.hits | length'
|
||||
|
||||
# Filter by ingest timestamp (RFC3339).
|
||||
kebab search "rust" --ingested-after 2026-04-01T00:00:00Z --json
|
||||
|
||||
# Combine: doc-id scope + tag (AND across flags).
|
||||
kebab search "rust" --doc-id "<doc-id>" --tag rust --json
|
||||
````
|
||||
|
||||
Bad `--ingested-after` → `error.v1.code = config_invalid`, exit 2.
|
||||
Unknown `--media` value → silently empty (no error).
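
The alias and the lenient handling of unknown kinds can be sketched roughly as follows. This is a minimal illustration and not the project's actual parser: `normalize_media_kind` and `parse_media_csv` are hypothetical names, only the `md` to `markdown` alias is taken from the example above, and trimming whitespace and dropping empty segments are assumptions. Any other value is passed through unchanged, so an unknown kind simply matches nothing later.

```rust
/// Hypothetical helper: map one user-supplied media token to the stored kind.
/// Only the `md` alias comes from the docs; other tokens pass through as-is.
fn normalize_media_kind(token: &str) -> String {
    match token.trim() {
        "md" => "markdown".to_string(),
        other => other.to_string(),
    }
}

/// Hypothetical helper: split a `--media` CSV value into normalized kinds,
/// dropping empty segments (for example from a trailing comma).
fn parse_media_csv(value: &str) -> Vec<String> {
    value
        .split(',')
        .map(normalize_media_kind)
        .filter(|kind| !kind.is_empty())
        .collect()
}
```

Under these assumptions `--media md,pdf` becomes the list `markdown`, `pdf`, while `--media foo` stays `foo` and is left for the filter layer to not match.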
|
||||
|
||||
## P6-4 image ingestion options
|
||||
|
||||
Adding the following section to `config.toml` makes `kebab ingest` also index image assets such as `**/*.png` / `**/*.jpg` (omit it to index text only):
|
||||
|
||||
1304
docs/superpowers/plans/2026-05-10-p9-fb-36-search-filters.md
Normal file
File diff suppressed because it is too large
@@ -0,0 +1,213 @@
|
||||
---
|
||||
title: "p9-fb-36 — Search filter args design"
|
||||
phase: P9
|
||||
component: kebab-core + kebab-search + kebab-cli + kebab-mcp
|
||||
task_id: p9-fb-36
|
||||
status: design
|
||||
target_version: 0.5.0
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [§4 search]
|
||||
date: 2026-05-10
|
||||
---
|
||||
|
||||
# p9-fb-36 — Search filter args
|
||||
|
||||
## Goal
|
||||
|
||||
Add filter flags to the CLI / MCP so that agents and users can narrow the search scope. Expose the 4 existing fields of the `SearchFilters` domain type (tags_any / lang / path_glob / trust_min) on the CLI surface and add 3 new fields (media / ingested_after / doc_id). No wire schema change (input-only). Filter layer = SQLite WHERE (lexical) + over-fetch + post-filter (vector). Combination semantics are fixed to AND.
|
||||
|
||||
## Behavior contract
|
||||
|
||||
### CLI flags on `kebab search`
|
||||
|
||||
Adds 7 flags, all optional. An unset flag applies no filter (existing behavior preserved):
|
||||
|
||||
| flag | meaning | repeat? |
|
||||
|------|------|---------|
|
||||
| `--tag <name>` | matches within the doc's `metadata.tags` (OR-within) | yes (`--tag rust --tag async` = `tag IN (rust,async)`) |
|
||||
| `--lang <iso>` | exact match on `documents.lang` | no |
|
||||
| `--path-glob <pattern>` | glob match on `documents.workspace_path` | no |
|
||||
| `--trust-min <level>` | `documents.trust_level >= level` (enum order) | no |
|
||||
| `--media <csv>` | `assets.media_type.kind` IN list (e.g. `--media md,pdf`) | csv |
|
||||
| `--ingested-after <RFC3339>` | `documents.updated_at >= timestamp` | no |
|
||||
| `--doc-id <id>` | `documents.doc_id = id` | no |
|
||||
|
||||
Multiple flags combine with AND. Multiple values within one flag (--tag, --media) combine with OR.
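
The AND-across-flags / OR-within-flag rule can be sketched as a plain predicate. This is a minimal illustration and not the kebab implementation: `DocMeta`, `Filters` and `matches` are hypothetical names, and only three of the seven filters are modeled.

```rust
/// Hypothetical, simplified view of the metadata the filters inspect.
struct DocMeta<'a> {
    tags: &'a [&'a str],
    media_kind: &'a str,
    doc_id: &'a str,
}

/// Hypothetical subset of the filter set. Empty Vec / None = filter not applied.
#[derive(Default)]
struct Filters {
    tags_any: Vec<String>,
    media: Vec<String>,
    doc_id: Option<String>,
}

/// OR within a single filter's values, AND across the filters.
fn matches(f: &Filters, d: &DocMeta) -> bool {
    let tag_ok = f.tags_any.is_empty()
        || f.tags_any.iter().any(|t| d.tags.iter().any(|dt| *dt == t.as_str()));
    let media_ok = f.media.is_empty()
        || f.media.iter().any(|m| m.as_str() == d.media_kind);
    let doc_ok = f.doc_id.as_deref().map_or(true, |id| id == d.doc_id);
    tag_ok && media_ok && doc_ok
}
```

With this shape, an empty `Filters::default()` accepts every document, which mirrors the "unset flag applies no filter" rule.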
|
||||
|
||||
### Filter validation
|
||||
|
||||
- `--ingested-after` RFC3339 parse failure → `error.v1.code = config_invalid` at CLI entry, exit 2.
|
||||
- Unknown `--media` value (e.g. `--media foo`) → zero matches (filter unmatch). Not rejected explicitly (lenient).
|
||||
- `--trust-min` is validated by clap value_enum (values outside the enum are rejected).
|
||||
- `--doc-id` format is not validated (DocumentId is a plain string wrapper). A nonexistent id yields zero matches.
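
The lenient `--media` handling can be illustrated with a small normalizer. This is a sketch under assumptions: `normalize_media` is a hypothetical helper, and the whitespace trimming is an assumption rather than documented behavior. The only alias taken from the surrounding docs is `md` → `markdown`; unknown values pass through unchanged and simply match nothing later.

```rust
/// Split a `--media` csv value and normalize the `md` alias.
/// Unknown kinds are kept as-is (lenient): they are not rejected here.
fn normalize_media(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim) // assumption: surrounding whitespace is ignored
        .filter(|s| !s.is_empty())
        .map(|s| {
            let kind = if s == "md" { "markdown" } else { s };
            kind.to_string()
        })
        .collect()
}
```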
|
||||
|
||||
### Filter layer
|
||||
|
||||
**Lexical (lexical.rs)**:
|
||||
- Extends the WHERE clause of the existing SQL builder. `media` / `ingested_after` / `doc_id` can all be expressed in SQL.
|
||||
- `media`: `JOIN assets a ON a.asset_id = d.asset_id` + `json_extract(a.media_type, '$.kind') IN (?, ?)` (multiple values).
|
||||
- `ingested_after`: `d.updated_at >= ?` (RFC3339 lexicographic compare; assumes UTC `Z`).
|
||||
- `doc_id`: `d.doc_id = ?`.
|
||||
- path_glob stays a post-filter, as before.
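
The three SQL-expressible filters above can be assembled into one AND-joined fragment with positional parameters. The sketch below is illustrative only: `build_filter_sql` is a hypothetical function, not the actual builder in lexical.rs, and it assumes the `d` / `a` table aliases and the `$.kind` JSON path quoted in the bullets.

```rust
/// Build the extra WHERE fragment and its bind parameters.
/// Returns an empty fragment when no filter is set.
fn build_filter_sql(
    media: &[&str],
    ingested_after: Option<&str>,
    doc_id: Option<&str>,
) -> (String, Vec<String>) {
    let mut clauses: Vec<String> = Vec::new();
    let mut params: Vec<String> = Vec::new();

    if !media.is_empty() {
        // One `?` placeholder per media kind (OR within the flag).
        let marks = vec!["?"; media.len()].join(", ");
        clauses.push(format!("json_extract(a.media_type, '$.kind') IN ({marks})"));
        params.extend(media.iter().map(|m| m.to_string()));
    }
    if let Some(ts) = ingested_after {
        clauses.push("d.updated_at >= ?".to_string());
        params.push(ts.to_string());
    }
    if let Some(id) = doc_id {
        clauses.push("d.doc_id = ?".to_string());
        params.push(id.to_string());
    }

    // AND across the different filters.
    (clauses.join(" AND "), params)
}
```

Binding values as parameters rather than interpolating them keeps user-supplied filter values out of the SQL text.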
|
||||
|
||||
**Vector (vector.rs)**:
|
||||
- Existing over-fetch (k * 2) + the `filter_chunks` helper, which JOINs SQLite chunks with documents and assets.
|
||||
- Applies the same WHERE conditions. If k cannot be filled, the result is truncated.
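
The over-fetch + post-filter pattern for the vector path can be sketched as follows. This is an assumption-laden illustration: `over_fetch_then_filter` is a hypothetical name, chunk ids are modeled as `u32`, and the `keep` closure stands in for the SQLite-backed `filter_chunks` check.

```rust
/// Fetch `k * 2` nearest candidates, drop those failing the filter,
/// then cut the survivors back to at most `k`.
fn over_fetch_then_filter(
    k: usize,
    fetch_nearest: impl Fn(usize) -> Vec<u32>,
    keep: impl Fn(u32) -> bool,
) -> Vec<u32> {
    let candidates = fetch_nearest(k * 2);
    let mut hits: Vec<u32> = candidates.into_iter().filter(|id| keep(*id)).collect();
    hits.truncate(k);
    // May hold fewer than `k` entries when the filter is selective.
    hits
}
```

A selective filter can leave fewer than `k` survivors even after doubling the fetch size, which is the shortfall case mentioned in the bullet above.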
|
||||
|
||||
### Wire shape
|
||||
|
||||
No change to the existing wire schema.
|
||||
|
||||
- `search_response.v1` (output): unchanged.
|
||||
- `search_hit.v1` (individual hit): unchanged.
|
||||
- Only the input side (CLI args / MCP `SearchInput`) is extended.
|
||||
|
||||
The MCP `SearchInput` schema is regenerated automatically by the `schemars` derive. No hand-written schema file.
|
||||
|
||||
### MCP `SearchInput` extension
|
||||
|
||||
```rust
|
||||
pub struct SearchInput {
|
||||
pub query: String,
|
||||
pub mode: Option<String>,
|
||||
pub k: Option<usize>,
|
||||
pub max_tokens: Option<usize>, // fb-34
|
||||
pub snippet_chars: Option<usize>, // fb-34
|
||||
pub cursor: Option<String>, // fb-34
|
||||
// p9-fb-36 new (all optional)
|
||||
pub tags: Option<Vec<String>>,
|
||||
pub lang: Option<String>,
|
||||
pub path_glob: Option<String>,
|
||||
pub trust_min: Option<String>, // "low" | "medium" | "high"
|
||||
pub media: Option<Vec<String>>,
|
||||
pub ingested_after: Option<String>, // RFC3339
|
||||
pub doc_id: Option<String>,
|
||||
}
|
||||
```
|
||||
|
||||
The input → `SearchFilters` conversion applies the same validation as above (RFC3339 parsing, trust_level enum). On failure it returns an `invalid_input` ErrorV1.
|
||||
|
||||
## Allowed / forbidden dependencies
|
||||
|
||||
- `kebab-core`: no new deps. Only extends existing types.
|
||||
- `kebab-search`: no change (only adds WHERE clauses inside the SQL builder).
|
||||
- `kebab-cli`: adds clap flags and the dispatch conversion.
|
||||
- `kebab-mcp`: extends SearchInput.
|
||||
- `kebab-tui`: no change.
|
||||
|
||||
The rule that `kebab-core` must not depend on any other `kebab-*` crate stays as is.
|
||||
|
||||
## Public surface delta
|
||||
|
||||
### kebab-core
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Debug, Default, PartialEq, Serialize, Deserialize)]
|
||||
pub struct SearchFilters {
|
||||
pub tags_any: Vec<String>,
|
||||
pub lang: Option<Lang>,
|
||||
pub path_glob: Option<String>,
|
||||
pub trust_min: Option<TrustLevel>,
|
||||
/// p9-fb-36: media_type filter — IN-list of `MediaType.kind` strings
|
||||
/// (e.g. `["markdown", "pdf"]`). Empty Vec = no filter.
|
||||
#[serde(default)]
|
||||
pub media: Vec<String>,
|
||||
/// p9-fb-36: hits whose source doc's `documents.updated_at` is at
|
||||
/// or after this timestamp. None = no filter. RFC3339 / UTC.
|
||||
#[serde(default, with = "time::serde::rfc3339::option")]
|
||||
pub ingested_after: Option<OffsetDateTime>,
|
||||
/// p9-fb-36: restrict hits to a single document. None = no filter.
|
||||
#[serde(default)]
|
||||
pub doc_id: Option<DocumentId>,
|
||||
}
|
||||
```
|
||||
|
||||
`#[serde(default)]` on each new field = backwards-compat (older JSON without these keys deserializes as defaults).
|
||||
|
||||
### kebab-search (lexical + vector)
|
||||
|
||||
Internal SQL builder extension only. No public API change.
|
||||
|
||||
### kebab-cli (`Cmd::Search`)
|
||||
|
||||
```rust
|
||||
Cmd::Search {
|
||||
// existing
|
||||
query, k, mode, explain, no_cache,
|
||||
max_tokens, snippet_chars, cursor, // fb-34
|
||||
// p9-fb-36 new
|
||||
#[arg(long)] tag: Vec<String>,
|
||||
#[arg(long)] lang: Option<String>,
|
||||
#[arg(long)] path_glob: Option<String>,
|
||||
#[arg(long, value_enum)] trust_min: Option<TrustLevelFlag>,
|
||||
#[arg(long, value_delimiter = ',')] media: Vec<String>,
|
||||
#[arg(long)] ingested_after: Option<String>,
|
||||
#[arg(long)] doc_id: Option<String>,
|
||||
}
|
||||
```
|
||||
|
||||
`TrustLevelFlag` is a new clap value_enum (CLI-internal, converted to kebab-core's `TrustLevel`).
|
||||
|
||||
### kebab-mcp::tools::search
|
||||
|
||||
Adds 7 optional fields to `SearchInput` (see the MCP `SearchInput` section above). The dispatch builds and validates `SearchFilters`.
|
||||
|
||||
## Test plan
|
||||
|
||||
| kind | description |
|
||||
|------|-------------|
|
||||
| unit (kebab-core) | `SearchFilters::default()`: all 7 fields empty |
|
||||
| unit (kebab-search/lexical) | `media: ["pdf"]`: markdown docs are not matched |
|
||||
| unit (kebab-search/lexical) | `media: ["markdown", "pdf"]`: IN-list behavior |
|
||||
| unit (kebab-search/lexical) | `ingested_after: <yesterday>`: docs from before yesterday are not matched |
|
||||
| unit (kebab-search/lexical) | `doc_id: <X>`: chunks of other docs are not matched |
|
||||
| unit (kebab-search/lexical) | multi-filter AND: only hits satisfying every filter |
|
||||
| unit (kebab-search/lexical) | empty filter (default): identical to existing behavior |
|
||||
| unit (kebab-search/vector) | same pattern, via the `filter_chunks` post-filter |
|
||||
| unit (kebab-search) | unknown media value (`["foo"]`): empty result, no error |
|
||||
| integration (kebab-cli) | `kebab search Q --media md --json` wire shape (search_response.v1 unchanged) |
|
||||
| integration (kebab-cli) | `kebab search Q --ingested-after 2020-01-01T00:00:00Z --json` all hits pass |
|
||||
| integration (kebab-cli) | `kebab search Q --ingested-after garbage --json` → `error.v1.code = config_invalid` exit 2 |
|
||||
| integration (kebab-cli) | `kebab search Q --doc-id <id> --json` single doc only |
|
||||
| integration (kebab-cli) | `kebab search Q --tag rust --tag async --json` IN-list behavior |
|
||||
| integration (kebab-mcp) | `mcp__kebab__search` responds normally with all 7 optional fields |
|
||||
| integration (kebab-mcp) | `mcp__kebab__search` invalid `ingested_after` → invalid_input |
|
||||
|
||||
## Implementation steps (high-level)
|
||||
|
||||
1. Add 3 fields to `kebab-core::SearchFilters` + unit tests.
|
||||
2. Extend the SQL builder in `kebab-search/lexical.rs` + unit tests.
|
||||
3. Extend the `filter_chunks` helper in `kebab-search/vector.rs` the same way + unit tests.
|
||||
4. Add 7 flags to `kebab-cli::Cmd::Search` + dispatch + RFC3339 parsing.
|
||||
5. `kebab-cli` integration tests (lexical-only, no Ollama).
|
||||
6. `kebab-mcp::tools::search::SearchInput`: 7 fields + dispatch + invalid_input validation.
|
||||
7. `kebab-mcp` integration tests.
|
||||
8. README + SMOKE: filter examples.
|
||||
9. tasks/INDEX.md / spec status flip.
|
||||
10. SKILL.md: update the `mcp__kebab__search` input shape.
|
||||
|
||||
## Risks / notes
|
||||
|
||||
- **`assets.media_type` JSON shape**: the SQLite storage format needs to be confirmed, i.e. whether the serde serialization of the `MediaType` enum is `{"kind": "markdown"}` or some other shape. A unit variant such as `Markdown` may be stored as the string `"markdown"`, and tuple variants such as `Image(...)` / `Audio(...)` as `{"image": {...}}`. Adjust the `json_extract` path accordingly (e.g. `case when typeof(...) = 'text' then ... else json_extract($.kind) end`).
|
||||
- **RFC3339 lexicographic compare**: ingest always stores UTC `Z` (fb-32 ingest path confirmed). If an external tool forces an update with a different offset, the comparison becomes inaccurate. The spec states the "UTC `Z` assumption" explicitly.
|
||||
- **Ordering of path_glob vs. the other filters**: path_glob is a post-filter (lexical) while the 3 new filters run in SQL, so after fetch_limit is reached path_glob can cut further and the final hit count may shrink. Same as existing behavior (path_glob pattern kept).
|
||||
- **Default for clap `Vec<String>`**: in clap 4, unspecified = `Vec::new()`. Automatic.
|
||||
- **trust_min enum mapping**: safe via clap value_enum. A helper converts `TrustLevelFlag` → `TrustLevel`.
|
||||
- **SearchFilters serde backwards-compat**: `#[serde(default)]` leaves old JSON unaffected. SearchFilters is never serialized into SQLite (request-time only).
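
The RFC3339 lexicographic-compare risk can be made concrete with the same instants used in the `filter_chunks` test (doc at `01:00Z`, filter at `05:00+09:00`). The sketch below is illustrative only: `utc_seconds` is a hypothetical, deliberately minimal parser for the fixed layout `YYYY-MM-DDTHH:MM:SS` followed by `Z` or `±HH:MM`, not a full RFC3339 implementation and not the `time` crate path the project uses.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (standard "days from civil" arithmetic).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400;
    let mp = (m + 9) % 12; // March-based month index
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Convert a fixed-layout timestamp to seconds since the Unix epoch (UTC).
/// Separators are not validated; this is a sketch, not a strict parser.
fn utc_seconds(ts: &str) -> Option<i64> {
    if !ts.is_ascii() || ts.len() < 20 {
        return None;
    }
    let num = |r: std::ops::Range<usize>| -> Option<i64> { ts.get(r)?.parse().ok() };
    let (y, mo, d) = (num(0..4)?, num(5..7)?, num(8..10)?);
    let (h, mi, s) = (num(11..13)?, num(14..16)?, num(17..19)?);
    let offset = match &ts[19..] {
        "Z" => 0,
        rest if rest.len() == 6 => {
            let sign = match &rest[..1] {
                "+" => 1,
                "-" => -1,
                _ => return None,
            };
            let oh = rest[1..3].parse::<i64>().ok()?;
            let om = rest[4..6].parse::<i64>().ok()?;
            sign * (oh * 3600 + om * 60)
        }
        _ => return None,
    };
    Some(days_from_civil(y, mo, d) * 86_400 + h * 3600 + mi * 60 + s - offset)
}
```

Comparing `"2026-04-01T01:00:00Z"` against `"2026-04-01T05:00:00+09:00"` as strings puts the doc before the filter, while comparing the `utc_seconds` values puts it after. This is why the filter value has to be normalized to UTC `Z` before it reaches a string comparison.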
|
||||
|
||||
## Out of scope
|
||||
|
||||
- `--exclude-doc-id` / `--exclude-tag` (exclusion filter).
|
||||
- Multiple doc_ids (`--doc-id a --doc-id b`): single only.
|
||||
- Filter UI in the TUI Search panel.
|
||||
- Lance metadata pre-filter.
|
||||
- Introducing a new tag system (one already exists).
|
||||
- `--search.default-filter` config (specifying default values): the agent states filters explicitly every time.
|
||||
|
||||
## Documentation updates (same PR as the implementation)
|
||||
|
||||
- `README.md`: add the 7 flags to the flag listing in the `kebab search` row.
|
||||
- `docs/SMOKE.md`: filter walkthrough (e.g. `--media md --ingested-after 2026-04-01T00:00:00Z`).
|
||||
- `tasks/p9/p9-fb-36-search-filters.md`: `status: open → completed`, design/plan links.
|
||||
- `tasks/INDEX.md`: fb-36 row ✅.
|
||||
- `integrations/claude-code/kebab/SKILL.md`: update the `mcp__kebab__search` input shape (list the 7 fields + AND semantics + lenient unknown media).
|
||||
@@ -48,11 +48,12 @@ Use when the user wants to **find** a doc, or when you (the model) need raw chun
|
||||
|
||||
Input:
|
||||
```json
|
||||
{ "query": "<query>", "mode": "hybrid", "k": 10, "max_tokens": null, "snippet_chars": null, "cursor": null }
|
||||
{ "query": "<query>", "mode": "hybrid", "k": 10, "max_tokens": null, "snippet_chars": null, "cursor": null, "tags": null, "lang": null, "path_glob": null, "trust_min": null, "media": null, "ingested_after": null, "doc_id": null }
|
||||
```
|
||||
|
||||
- `mode = "hybrid"` is the default-correct choice. Use `"vector"` for semantic-only ("docs about X concept"), `"lexical"` for exact strings ("the literal flag `--foo-bar`").
|
||||
- **`max_tokens` / `snippet_chars` / `cursor` (p9-fb-34)** — agent budget controls. Set `max_tokens` to cap result wire size (chars/4 estimate); set `cursor` to the previous response's `next_cursor` to fetch the next page.
|
||||
- **p9-fb-36 filter inputs:** `tags` (string array — OR-within, AND across keys), `lang` (BCP-47 language code), `path_glob` (glob pattern matched against doc path), `trust_min` (`"primary"` | `"secondary"` | `"generated"` — includes that level and above), `media` (string array — IN-list of `"markdown"` | `"pdf"` | `"image"` | `"audio"` | `"other"`; alias `"md"` → `"markdown"`), `ingested_after` (RFC3339 UTC string), `doc_id` (exact doc UUID). AND combinator across keys. Invalid `ingested_after` or unknown `trust_min` → `error.v1.code = invalid_input`. Unknown `media` value → empty hits, no error.
|
||||
- Output is `search_response.v1`: `{ hits: search_hit.v1[], next_cursor: string|null, truncated: bool }`. Iterate `response.hits[]` for individual hits. Key hit fields: `rank`, `score`, `doc_path`, `heading_path[]`, `section_label`, `snippet`, `citation` (line range / page), `chunk_id`.
|
||||
- Cite back to the user as `doc_path § heading_path[-1]` so they can open the source.
|
||||
- When `truncated: true`, the budget loop modified the page (snippet shortening or k reduction). `next_cursor` is **independent** — non-null whenever more hits may be reachable. Caller may widen `max_tokens` (re-issue same query for fuller snippets / more hits per page) or follow `next_cursor` (advance through more hits) or both. Mismatched cursor (corpus_revision changed) returns `error.v1.code = stale_cursor` — re-issue the search to obtain a fresh one.
|
||||
|
||||
@@ -124,7 +124,7 @@ P0~P5 are serial. P6~P9 can run in parallel after P5.
|
||||
- [p9-fb-33 streaming ask (ndjson delta)](p9/p9-fb-33-streaming-ask.md): ✅ merged + v0.5.0 cut candidate (2026-05-09)
|
||||
- [p9-fb-34 output budget controls](p9/p9-fb-34-output-budget-controls.md): ✅ merged + v0.5.0 cut candidate (2026-05-09)
|
||||
- [p9-fb-35 verbatim fetch](p9/p9-fb-35-verbatim-fetch.md): ✅ merged + v0.5.0 cut candidate (2026-05-09)
|
||||
- [p9-fb-36 search filter args](p9/p9-fb-36-search-filters.md): ⏳ not implemented, brainstorm needed
|
||||
- [p9-fb-36 search filter args](p9/p9-fb-36-search-filters.md): ✅ merged (2026-05-10)
|
||||
- [p9-fb-37 trace + stats](p9/p9-fb-37-trace-and-stats.md): ⏳ not implemented, brainstorm needed (depends_on 27)
|
||||
|
||||
### 🎯 0.5.0: RAG quality (with cascade: V00X + reindex)
|
||||
|
||||
@@ -3,7 +3,7 @@ phase: P9
|
||||
component: kebab-cli + kebab-search + wire-schema
|
||||
task_id: p9-fb-36
|
||||
title: "Search filter args (--media / --ingested-after / --doc-id / --tag)"
|
||||
status: open
|
||||
status: completed
|
||||
target_version: 0.4.0
|
||||
depends_on: []
|
||||
unblocks: []
|
||||
@@ -14,7 +14,10 @@ source_feedback: 사용자 도그푸딩 2026-05-06 — agent 가 검색 범위
|
||||
|
||||
# p9-fb-36 — Search filter args
|
||||
|
||||
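> ⏳ **Backlog only, not implemented.** This spec is a skeleton from dogfooding feedback. Before implementation starts, a design phase via [superpowers:brainstorming](../../docs/superpowers/) must come first. To be finalized after brainstorming the filter kinds / SQLite query integration / the layer where the Lance vector filter applies.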
|
||||
> ✅ **Implementation complete.** This spec is frozen as of implementation time. For post-merge deviations see [HOTFIXES.md](../HOTFIXES.md).
|
||||
|
||||
Detailed design: `docs/superpowers/specs/2026-05-10-p9-fb-36-search-filters-design.md`.
|
||||
Implementation plan: `docs/superpowers/plans/2026-05-10-p9-fb-36-search-filters.md`.
|
||||
|
||||
## Symptoms / motivation
|
||||
|
||||
|
||||