First step of renaming the project from `kb` to `kebab`.

- Workspace `Cargo.toml`: members `crates/kb-*` → `crates/kebab-*`, repository URL `altair823/kb` → `altair823/kebab`.
- Renamed the 18 crate folders via `git mv` (history preserved).
- Each crate's `Cargo.toml`: `name = "kb-*"` → `"kebab-*"`, path deps `../kb-*` → `../kebab-*`.
- All `.rs` files: the 18 snake-case module paths `kb_<id>` (`kb_core`, `kb_config`, `kb_app`, `kb_cli`, `kb_eval`, `kb_search`, `kb_chunk`, `kb_normalize`, `kb_source_fs`, `kb_parse_md`, `kb_parse_types`, `kb_store_sqlite`, `kb_store_vector`, `kb_embed`, `kb_embed_local`, `kb_llm`, `kb_llm_local`, `kb_rag`) → `kebab_<id>`, replaced in bulk with sed using word boundaries (`\b`) so the "kb" abbreviation inside English sentences is left untouched.

The CLI binary name (`[[bin]] name = "kb"`), the `KB_*` environment variables, XDG paths, the tracing target, and the docs sweep follow in the next commit.

## Verification

- `cargo check --workspace` is clean; committed only after every crate builds.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
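The word-boundary rule in the last bullet is what keeps prose such as "kb is short for knowledge base" intact while the identifiers change. The commit does not include the actual sed command, so the following is only a hypothetical, std-only Rust approximation of that pass; `rename_module_paths` is an illustrative name, not part of the repository. Like GNU sed's `\b`, it treats letters, digits and `_` as word characters, so an identifier such as `my_kb_core` is also left alone.

```rust
/// Hypothetical helper: rewrite `kb_<id>` to `kebab_<id>` only where the
/// whole identifier stands between word boundaries (approximating sed `\b`).
fn rename_module_paths(src: &str, ids: &[&str]) -> String {
    // Word characters, as in GNU regex: letters, digits and underscore.
    fn is_word(c: char) -> bool {
        c.is_alphanumeric() || c == '_'
    }

    let mut out = String::with_capacity(src.len());
    let mut i = 0;
    'outer: while i < src.len() {
        let rest = &src[i..];
        // A match may only start where the previous char is not a word char.
        let prev_is_word = src[..i].chars().next_back().map_or(false, is_word);
        if !prev_is_word {
            for id in ids {
                let old = format!("kb_{id}");
                if rest.starts_with(&old) {
                    // ...and must end where the next char is not a word char,
                    // so `kb_embed` does not match inside `kb_embed_local`.
                    let next_is_word = rest[old.len()..].chars().next().map_or(false, is_word);
                    if !next_is_word {
                        out.push_str("kebab_");
                        out.push_str(id);
                        i += old.len();
                        continue 'outer;
                    }
                }
            }
        }
        // No match at this position: copy one char, advancing by its UTF-8 width.
        let c = rest.chars().next().unwrap();
        out.push(c);
        i += c.len_utf8();
    }
    out
}

fn main() {
    let ids = ["core", "search", "embed", "embed_local"];
    let src = "use kb_core::Citation; // kb stays as prose\nuse kb_embed_local::Embedder;";
    println!("{}", rename_module_paths(src, &ids));
}
```

Checking both boundaries, rather than relying on the order of `ids`, is what lets overlapping names like `kb_embed` and `kb_embed_local` coexist in one pass.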
75 lines
2.6 KiB
Rust
```rust
//! Shared helpers for building `kebab_core::Citation` values from a
//! chunk's first `SourceSpan`.
//!
//! Both the lexical and vector retrievers join against the same
//! `chunks.source_spans_json` column and need identical mapping logic
//! so cross-mode citation strings round-trip byte-identically (a
//! requirement for the hybrid retriever's tie-break on chunk_id and
//! for the `search --explain` output documented in design §0 Q3 and
//! §1.6). Living here means a future PDF / image / audio extractor can
//! enrich the mapping in one place rather than two.

use kebab_core::{Citation, SourceSpan, WorkspacePath};

/// Build a `Citation` from the chunk's first `SourceSpan`. P1 markdown
/// only emits `Line`, so the other variants are mostly defensive; we
/// forward them as faithfully as possible so a future PDF / image
/// extractor can flow through without churn.
///
/// `chunk_id` is taken only for diagnostic logging when the span shape
/// has no Citation mapping (`Byte`-spans, empty arrays).
pub(crate) fn citation_from_first_span(
    chunk_id: &str,
    path: WorkspacePath,
    section: Option<String>,
    first_span: Option<&SourceSpan>,
) -> Citation {
    match first_span {
        Some(SourceSpan::Line { start, end }) => Citation::Line {
            path,
            start: *start,
            end: *end,
            section,
        },
        Some(SourceSpan::Page { page, .. }) => Citation::Page {
            path,
            page: *page,
            section,
        },
        Some(SourceSpan::Region { x, y, w, h }) => Citation::Region {
            path,
            x: *x,
            y: *y,
            w: *w,
            h: *h,
        },
        Some(SourceSpan::Time { start_ms, end_ms }) => Citation::Time {
            path,
            start_ms: *start_ms,
            end_ms: *end_ms,
            speaker: None,
        },
        // Byte-spans don't have a Citation variant. Fall back to a
        // Line citation pointing at the document head, which is better
        // than fabricating a position. Spans-empty falls into the same
        // branch.
        other @ (Some(SourceSpan::Byte { .. }) | None) => {
            let span_shape = match other {
                Some(_) => "Byte",
                None => "empty array",
            };
            tracing::warn!(
                chunk_id,
                span_shape,
                "kb-search: SourceSpan has no Citation mapping; falling back to Line {{1, 1}}"
            );
            Citation::Line {
                path,
                start: 1,
                end: 1,
                section,
            }
        }
    }
}
```
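Because both retrievers go through this one helper, a chunk whose first span is a `Byte` span, or whose span array is empty, gets the same `Line { 1, 1 }` citation whichever retriever found it. The self-contained sketch below demonstrates that fallback with hypothetical, trimmed stand-ins for `SourceSpan` and `Citation`: only the `Line`, `Page` and `Byte` variants, assumed field types, `String` in place of `WorkspacePath`, and no `tracing` call. It also assumes callers pass `spans.first()` from the decoded `chunks.source_spans_json` array; this file does not show the call sites.

```rust
// Hypothetical stand-ins for a subset of the kebab_core variants;
// field types are assumptions, not the real definitions.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum SourceSpan {
    Line { start: u32, end: u32 },
    Page { page: u32 },
    Byte { start: u64, end: u64 },
}

#[derive(Debug, Clone, PartialEq)]
enum Citation {
    Line { path: String, start: u32, end: u32, section: Option<String> },
    Page { path: String, page: u32, section: Option<String> },
}

// Same shape as the real helper, minus logging and the Region/Time arms.
fn citation_from_first_span(
    path: String,
    section: Option<String>,
    first_span: Option<&SourceSpan>,
) -> Citation {
    match first_span {
        Some(SourceSpan::Line { start, end }) => Citation::Line { path, start: *start, end: *end, section },
        Some(SourceSpan::Page { page }) => Citation::Page { path, page: *page, section },
        // Byte spans and empty span arrays both point at the document head.
        Some(SourceSpan::Byte { .. }) | None => Citation::Line { path, start: 1, end: 1, section },
    }
}

fn main() {
    // Hypothetical decoded span arrays for four chunks of one file.
    let line_spans = vec![SourceSpan::Line { start: 3, end: 7 }];
    let page_spans = vec![SourceSpan::Page { page: 2 }];
    let byte_spans = vec![SourceSpan::Byte { start: 0, end: 42 }];
    let no_spans: Vec<SourceSpan> = Vec::new();

    for spans in [&line_spans, &page_spans, &byte_spans, &no_spans] {
        // `first()` yields `Option<&SourceSpan>`, matching the helper's input.
        let c = citation_from_first_span("notes/a.md".into(), None, spans.first());
        println!("{c:?}");
    }
}
```

The last two chunks print identical citations, which is the property the module doc relies on when lexical and vector hits for the same chunk are merged.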