kebab/crates/kebab-search/src/citation_helper.rs
altair823 911fb49550 refactor(rename): kb crates → kebab — Cargo packages, folders, Rust modules
First step of renaming the project from `kb` to `kebab`.

- workspace `Cargo.toml`: members `crates/kb-*` → `crates/kebab-*`,
  repository URL `altair823/kb` → `altair823/kebab`.
- Renamed 18 crate folders via `git mv` (history preserved).
- Each crate's `Cargo.toml`: `name = "kb-*"` → `"kebab-*"`, path deps
  `../kb-*` → `../kebab-*`.
- All `.rs` files: the 18 snake-case module paths `kb_<id>` (`kb_core`,
  `kb_config`, `kb_app`, `kb_cli`, `kb_eval`, `kb_search`, `kb_chunk`,
  `kb_normalize`, `kb_source_fs`, `kb_parse_md`, `kb_parse_types`,
  `kb_store_sqlite`, `kb_store_vector`, `kb_embed`, `kb_embed_local`,
  `kb_llm`, `kb_llm_local`, `kb_rag`) → `kebab_<id>` via a bulk sed
  (using the word boundary `\b` so the abbreviation "kb" inside English
  sentences is left untouched).
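The word-boundary rule in the module-path bullet can be sketched in Rust. This is a minimal, std-only illustration, not the command actually run. It assumes the sed pattern was `\bkb_<id>\b` applied once per id and that only ASCII word characters (`[A-Za-z0-9_]`) count toward a boundary. `IDS` and `rename_module_paths` are hypothetical names and do not exist in the repository.

```rust
/// Hypothetical list of the 18 crate ids named in the commit message.
const IDS: [&str; 18] = [
    "core", "config", "app", "cli", "eval", "search", "chunk", "normalize",
    "source_fs", "parse_md", "parse_types", "store_sqlite", "store_vector",
    "embed", "embed_local", "llm", "llm_local", "rag",
];

/// ASCII word character, as in the sed/regex `\w` class (assumption:
/// non-ASCII bytes are treated as non-word).
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Rewrite `kb_<id>` to `kebab_<id>` only when the match has a word
/// boundary on both sides, mirroring `sed 's/\bkb_<id>\b/kebab_<id>/g'`.
fn rename_module_paths(src: &str) -> String {
    let bytes = src.as_bytes();
    let mut out: Vec<u8> = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        let at_start_boundary = i == 0 || !is_word_byte(bytes[i - 1]);
        let mut replaced = false;
        if at_start_boundary {
            for id in IDS {
                let needle = format!("kb_{id}");
                let end = i + needle.len();
                // Trailing boundary: end of input or a non-word byte.
                let at_end_boundary = end >= bytes.len() || !is_word_byte(bytes[end]);
                if bytes[i..].starts_with(needle.as_bytes()) && at_end_boundary {
                    out.extend_from_slice(b"kebab_");
                    out.extend_from_slice(id.as_bytes());
                    i = end;
                    replaced = true;
                    break;
                }
            }
        }
        if !replaced {
            out.push(bytes[i]);
            i += 1;
        }
    }
    // Only whole ASCII runs were swapped, so the output is still valid UTF-8.
    String::from_utf8(out).expect("ASCII-only replacement preserves UTF-8")
}
```

With the leading boundary, an identifier such as `my_kb_core` is left alone. With the trailing boundary, the `kb_embed` needle never fires on the prefix of `kb_embed_local`, so each id is rewritten only as a whole token.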

The CLI binary name (`[[bin]] name = "kb"`), the `KB_*` environment variables,
XDG paths, the tracing target, and the docs sweep follow in the next commit.

## Verification

- `cargo check --workspace` is clean; committed only after every crate built.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 03:28:08 +00:00

75 lines
2.6 KiB
Rust

//! Shared helpers for building `kebab_core::Citation` values from a
//! chunk's first `SourceSpan`.
//!
//! Both the lexical and vector retrievers join against the same
//! `chunks.source_spans_json` column and need identical mapping logic
//! so cross-mode citation strings round-trip byte-identically (a
//! requirement for the hybrid retriever's tie-break on chunk_id and
//! for the `search --explain` output documented in design §0 Q3 and
//! §1.6). Living here means a future PDF / image / audio extractor can
//! enrich the mapping in one place rather than two.

use kebab_core::{Citation, SourceSpan, WorkspacePath};

/// Build a `Citation` from the chunk's first `SourceSpan`. P1 markdown
/// only emits `Line`, so the other variants are mostly defensive — we
/// forward them as faithfully as possible so a future PDF / image
/// extractor can flow through without churn.
///
/// `chunk_id` is taken only for diagnostic logging when the span shape
/// has no Citation mapping (`Byte`-spans, empty arrays).
pub(crate) fn citation_from_first_span(
    chunk_id: &str,
    path: WorkspacePath,
    section: Option<String>,
    first_span: Option<&SourceSpan>,
) -> Citation {
    match first_span {
        Some(SourceSpan::Line { start, end }) => Citation::Line {
            path,
            start: *start,
            end: *end,
            section,
        },
        Some(SourceSpan::Page { page, .. }) => Citation::Page {
            path,
            page: *page,
            section,
        },
        Some(SourceSpan::Region { x, y, w, h }) => Citation::Region {
            path,
            x: *x,
            y: *y,
            w: *w,
            h: *h,
        },
        Some(SourceSpan::Time { start_ms, end_ms }) => Citation::Time {
            path,
            start_ms: *start_ms,
            end_ms: *end_ms,
            speaker: None,
        },
        // Byte spans have no Citation variant, so fall back to a Line
        // citation pointing at the document head; that is better than
        // fabricating a position. An empty span array takes the same
        // branch.
        other @ (Some(SourceSpan::Byte { .. }) | None) => {
            let span_shape = match other {
                Some(_) => "Byte",
                None => "empty array",
            };
            tracing::warn!(
                chunk_id,
                span_shape,
                "kb-search: SourceSpan has no Citation mapping; falling back to Line {{1, 1}}"
            );
            Citation::Line {
                path,
                start: 1,
                end: 1,
                section,
            }
        }
    }
}
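You can exercise the fallback behaviour of `citation_from_first_span` without the real crates. The sketch below uses hypothetical stand-in enums `Span` and `Cite`, which are not the `kebab_core` definitions. It keeps only the `Line` and `Byte` shapes, omits the `tracing::warn!` call, and takes a slice of spans instead of `Option<&SourceSpan>` so that "first span wins" is explicit. Because the lexical and vector retrievers would both call the same function, identical span input produces identical citations.

```rust
// Hypothetical stand-ins for `kebab_core::SourceSpan` / `Citation`,
// trimmed to two span shapes. Not the real crate definitions.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Span {
    Line { start: u32, end: u32 },
    Byte { start: u64, end: u64 },
}

#[derive(Debug, PartialEq)]
enum Cite {
    Line { path: String, start: u32, end: u32, section: Option<String> },
}

/// Map the first span to a citation; any later spans are ignored.
fn cite_first(path: &str, section: Option<String>, spans: &[Span]) -> Cite {
    match spans.first() {
        Some(Span::Line { start, end }) => Cite::Line {
            path: path.to_string(),
            start: *start,
            end: *end,
            section,
        },
        // Byte spans and empty arrays both point at the document head,
        // mirroring the `Line {1, 1}` fallback in the real helper.
        Some(Span::Byte { .. }) | None => Cite::Line {
            path: path.to_string(),
            start: 1,
            end: 1,
            section,
        },
    }
}
```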