Files
kebab/docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
altair823 f9714aa5cb docs(rename): kb → kebab — README, tasks/, docs/, design doc, report
마지막 commit. 모든 .md 안의 `kb` 단어 일괄 갱신.

- 19 개 crate 이름 (`kb-core`, `kb-app`, …) → `kebab-*` (Rust 모듈
  path 표기 `kb_*` → `kebab_*` 포함).
- 미래 component (`kb-tui`, `kb-desktop`, `kb-asr-whisper`, `kb-ocr`,
  `kb-mcp`, `kb-vlm`, `kb-rerank`, `kb-vision-ocr`, `kb-index`,
  `kb-smoke`, `kb-architecture`) → `kebab-*` (P6+ 가 시작될 때
  같은 prefix 사용).
- CLI 명령 예제: `kb ingest` / `kb search` / `kb ask` / `kb init` /
  `kb doctor` / `kb inspect` / `kb list` / `kb eval` →
  `kebab <verb>`. fenced code block + 인라인 backtick 모두.
- XDG paths + env vars + binary 경로 (`target/release/kb` →
  `target/release/kebab`) 동기화.
- design doc / 최초 보고서 / SMOKE / HOTFIXES / phase epic / task
  spec 모든 reference 통일.
- task-decomposition.md 의 `git -c user.name=kb` 는 과거 git history
  기록용 author 정보라 그대로 유지 (실제 git history 의 author 는
  변경 불가).
- `tasks/phase-5-evaluation.md` 의 `status: planned` →
  `completed` 도 같이 (P5-1 + P5-2 PR 머지 후 미반영분).

## 검증

- `grep -rEn "\bkb-[a-z]|\bkb_[a-z]|\.config/kb\b|kb\.sqlite|\bKB_[A-Z]"
   --include="*.md"` 0 hits (task-decomposition.md 의 git author
  제외).
- 모든 file path reference 살아있음 (renamed file 들 모두 새 path
  로 update).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:01:55 +00:00

1353 lines
44 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
title: "KB v1 최종 결과물 형태 — Frozen Design"
date: 2026-04-27
status: frozen
purpose: 작은 단위 분해 작업 시 spec 변경을 막기 위한 단단한 contract 동결
source_report: ../../../kebab_local_rust_report.md
related_tasks: ../../../tasks/INDEX.md
---
# KB v1 최종 결과물 형태 — Frozen Design
이 문서는 사용자가 만족할 **최종 결과물의 매우 구체적 형태**를 동결한다. 각 phase 의 task 분해는 이 contract 위에서 수행되며, 이 문서가 바뀌지 않는 한 task 들의 인터페이스는 변하지 않는다.
전제 보고서는 [`kebab_local_rust_report.md`](../../../kebab_local_rust_report.md). 그 보고서가 *방향*과 *근거*를 제공하며, 이 문서가 *형태*를 못박는다.
---
## 0. 동결된 결정 요약
| # | 결정 | 값 | 근거 |
|---|------|-----|------|
| Q1 | scope 우선순위 | UX → Data 역도출 | 사용자 만족이 spec 안정성 lever |
| Q2 | headline UX | `kebab ask` 답변 화면 | 검색/citation/RAG/refusal/모델메타 모두 노출 |
| | ask 기본 형식 | inline numeric refs `[1]…[n]` + footer | 일상 가독성 |
| | ask `--explain` | per-claim 분해 + verbose footer + retrieval trace | 디버그 단일 플래그 |
| Q3 | citation 문자열 | URI fragment (`path#k=v…`, W3C Media Fragments) | 표준 정합 + Windows path 안전 + 브라우저 자동 스크롤 |
| Q4 | refusal 정책 | 양층: score gate + LLM self-judge + citation 후처리 검증 | 환각/false-negative 양쪽 차단 |
| Q5 | streaming | always (tty 토큰, pipe buffered) | 체감 속도 + LLM trait 단일화 |
| Q6 | JSON 모드 | 별도 stable wire schema (`*.v1`), schema_version 명시 | internal 자유 진화 + 외부 contract 동결 |
| Q7 | footer | toggle (default minimal / `--explain` verbose) | 일상-디버그 분리 |
| | search 출력 | dense 4줄 (rank+score / path#frag / heading / snippet) | line-oriented 파싱 + fzf 친화 |
| Q8 | ID 인코딩 | hybrid: `blake3(canonical_json(tuple))[..32]` PK + path/heading human ref | 짧은 PK + path-based citation |
| Q9 | frontmatter | 모두 optional + auto-derive + 미지 키 `metadata.user` 보존 | 진입 장벽 0 |
| Q10 | workspace | single root + XDG layout | personal v1 적정 |
| | asset 보존 | content-addressable copy, `copy_threshold_mb=100` 초과 시 reference + checksum | reproducibility + 디스크 절감 |
| | wire 버전 | additive within `vN`, breaking → `vN+1` | 외부 깨짐 방지 |
| | ignore | gitignore 문법 + `.kebabignore` | 익숙함 |
| | 에러 | thiserror per crate, anyhow at boundary | 추적성 + UX |
| | sync | watch=false default | v1 명시 ingest |
---
## 1. Headline UX scenes
### 1.1 `kebab ask` (default)
```text
$ kebab ask "Markdown chunking 규칙은?"
heading boundary 우선 [1]. code block 중간 분할 금지 [2]. table 가능한 한
단일 chunk 유지 [2]. 긴 section 은 paragraph 단위로 분할 [1]. chunk 마다
heading_path 와 source_span 보존 [1].
─────────────────────────────────────────────────────────
[1] notes/rust/kebab-architecture.md#L661-L672
§14 Chunking 정책
[2] notes/rust/kebab-architecture.md#L665-L668
§14 Chunking 정책
grounded ✓ qwen2.5:14b-instruct rag-v1 3 chunks
```
### 1.2 `kebab ask --explain`
```text
$ kebab ask --explain "Markdown chunking 규칙은?"
▎ heading boundary 우선
└ notes/rust/kebab-architecture.md#L662
「heading boundary를 우선한다」
▎ code block 중간 분할 금지
└ notes/rust/kebab-architecture.md#L663
「code block은 중간에서 자르지 않는다」
▎ table 단일 chunk 유지
└ notes/rust/kebab-architecture.md#L664
「table은 가능한 한 하나의 chunk로」
▎ heading_path / source_span 보존
└ notes/rust/kebab-architecture.md#L668-L670
retrieval trace
query "Markdown chunking 규칙은?"
mode hybrid
k 8
threshold (gate) 0.30 → top-1 0.82 pass
fusion rrf (k=60)
chunks (used) 3 / 8 returned
#1 0.82 notes/rust/kebab-architecture.md#L661-L672 bm25=12.4 vec=0.78
#2 0.78 notes/rust/kebab-architecture.md#L692-L713 bm25=10.1 vec=0.74
#3 0.55 guides/markdown-style.md#L4-L18 bm25=8.2 vec=0.61
grounded ✓ qwen2.5:14b-instruct rag-v1 3 chunks
prompt 1184 tokens completion 312 tokens latency 1842 ms
embedding multilingual-e5-small index v1.0
```
### 1.3 `kebab ask` (refusal — score gate)
```text
$ kebab ask "당신의 회사 매출은?"
근거 부족. KB 에 해당 내용 없음.
가까운 후보 (모두 임계 0.30 미만):
· ~/notes/finance/personal-budget.md#L1-L8 (score 0.21)
grounded ✗ qwen2.5:14b-instruct rag-v1 0 chunks used
```
### 1.4 `kebab ask` (refusal — LLM self-judge)
```text
$ kebab ask "이 책의 23쪽 결론은?"
근거 부족. 제공된 chunk 중 결론 내용 없음.
검색은 됨, LLM 이 결론 부재 판단:
· papers/book.pdf#p=23 (score 0.61)
· papers/book.pdf#p=24 (score 0.58)
grounded ✗ qwen2.5:14b-instruct rag-v1 3 chunks searched, 0 grounded
```
### 1.5 `kebab search` (dense)
```text
$ kebab search "Markdown chunking 규칙"
1. 0.82 notes/rust/kebab-architecture.md#L661-L672
§14 Chunking 정책
heading boundary 우선. code block 중간 분할 금지.
table 가능한 한 단일 chunk…
2. 0.71 notes/rust/kebab-architecture.md#L692-L713
§15 검색과 RAG 정책
검색은 처음부터 hybrid 로 설계하되 구현은 단계적…
3. 0.55 guides/markdown-style.md#L4-L18
§1 Heading 규약
문서는 항상 H1 으로 시작한다. H2 부터는…
3 hits hybrid index v1.0 bm25+e5-small/RRF
```
### 1.6 `kebab search --explain`
각 hit 아래 추가:
```text
├ lexical (bm25) rank 1 score 12.4
├ vector (e5-s) rank 2 score 0.78
└ rrf fusion rank 1 score 0.82
chunker md-heading-v1 chunk_id 9b4a8c…
```
### 1.7 exit codes
| code | 의미 |
|------|------|
| 0 | hit / grounded answer / success |
| 1 | no-hit / refusal (정상 거절) |
| 2 | error (parser fail, IO, network, model 미기동) |
| 3 | doctor unhealthy |
---
## 2. Wire schema v1
`docs/wire-schema/v1/*.schema.json` 으로 동결. internal Rust struct ↔ wire 변환은 `From`/`TryFrom`. 모든 wire 객체는 `schema_version` 필드 필수.
### 2.1 Citation (5 variants — discriminated by `kind`)
```json
{
"schema_version": "citation.v1",
"kind": "line|page|region|caption|time",
"path": "notes/rust/kebab.md",
"uri": "notes/rust/kebab.md#L12-L34",
"line": { "start": 12, "end": 34, "section": "§14 Chunking 정책" },
"page": { "page": 13, "section": "Experiment Setup" },
"region": { "x": 120, "y": 40, "w": 520, "h": 180 },
"caption": { "model": "qwen2.5-vl:7b" },
"time": { "start_ms": 822000, "end_ms": 850000, "speaker": "S1" }
}
```
variant 별 해당 키만 채움. `path``uri` 는 항상 채움 (`uri` 는 path + W3C Media Fragments 합본).
### 2.2 SearchHit
```json
{
"schema_version": "search_hit.v1",
"rank": 1,
"score": 0.82,
"chunk_id": "9b4a8c1e7d3f2a05",
"doc_id": "3f9a2c10ee4d6b78",
"doc_path": "notes/rust/kebab-architecture.md",
"heading_path": ["아키텍처", "Chunking 정책"],
"section_label": "§14 Chunking 정책",
"snippet": "heading boundary 우선. code block 중간 분할 금지…",
"snippet_full_text": false,
"citation": { "...": "citation.v1" },
"retrieval": {
"method": "hybrid",
"lexical_score": 12.4,
"vector_score": 0.78,
"fusion_score": 0.82,
"lexical_rank": 1,
"vector_rank": 2
},
"index_version": "v1.0",
"embedding_model": "multilingual-e5-small",
"chunker_version": "md-heading-v1"
}
```
`retrieval.method ∈ {lexical, vector, hybrid}`. 단독 모드 시 다른 score/rank 는 null.
### 2.3 Answer
```json
{
"schema_version": "answer.v1",
"answer": "heading boundary 우선 [1]. code block 중간 분할 금지 [2]…",
"citations": [
{ "marker": "[1]", "citation": { "...": "citation.v1" } },
{ "marker": "[2]", "citation": { "...": "citation.v1" } }
],
"grounded": true,
"refusal_reason": null,
"model": { "id": "qwen2.5:14b-instruct", "provider": "ollama" },
"embedding": { "id": "multilingual-e5-small", "provider": "fastembed", "dimensions": 384 },
"prompt_template_version": "rag-v1",
"retrieval": {
"trace_id": "ret_4a8b2c1e",
"mode": "hybrid",
"k": 8,
"score_gate": 0.30,
"top_score": 0.82,
"chunks_returned": 8,
"chunks_used": 3
},
"usage": { "prompt_tokens": 1184, "completion_tokens": 312, "latency_ms": 1842 },
"created_at": "2026-04-27T15:42:11+09:00"
}
```
거절 시 `grounded=false`, `answer` 는 사람 친화 거절 문장, `refusal_reason ∈ {"score_gate","llm_self_judge","no_index","no_chunks"}`. `citations` 는 빈 배열 또는 가까운 후보 (marker null).
### 2.4 IngestReport
```json
{
"schema_version": "ingest_report.v1",
"scope": { "root": "/home/altair/KnowledgeBase", "include": ["**/*.md"], "exclude": [".git/**"] },
"scanned": 142, "new": 12, "updated": 3, "skipped": 127, "errors": 0,
"duration_ms": 4231,
"items": [
{
"kind": "new|updated|skipped|error",
"doc_id": "3f9a2c10ee4d6b78",
"doc_path": "notes/rust/kebab-architecture.md",
"asset_id": "8c1e7d3f2a05",
"byte_len": 41822,
"block_count": 184,
"chunk_count": 38,
"parser_version": "pulldown-cmark-0.x",
"chunker_version": "md-heading-v1",
"warnings": [],
"error": null
}
]
}
```
`--summary-only``items: null`.
### 2.5 DocSummary (`kebab list docs`)
```json
{
"schema_version": "doc_summary.v1",
"doc_id": "3f9a2c10ee4d6b78",
"doc_path": "notes/rust/kebab-architecture.md",
"title": "Rust 로컬 Knowledge Base 설계",
"lang": "ko",
"tags": ["knowledge-base", "rust", "rag"],
"trust_level": "primary",
"source_type": "markdown",
"byte_len": 41822,
"chunk_count": 38,
"created_at": "2026-04-27T00:00:00+09:00",
"updated_at": "2026-04-27T15:42:11+09:00",
"parser_version": "pulldown-cmark-0.x",
"chunker_version": "md-heading-v1"
}
```
### 2.6 ChunkInspection
```json
{
"schema_version": "chunk_inspection.v1",
"chunk_id": "9b4a8c1e7d3f2a05",
"doc_id": "3f9a2c10ee4d6b78",
"doc_path": "notes/rust/kebab-architecture.md",
"heading_path": ["아키텍처", "Chunking 정책"],
"text": "heading boundary 우선…",
"source_spans": [{ "kind": "line", "start": 661, "end": 672 }],
"block_ids": ["b_0a", "b_0b"],
"token_estimate": 480,
"chunker_version": "md-heading-v1",
"embeddings": [
{ "model": "multilingual-e5-small", "dimensions": 384, "embedding_id": "e_2f1a" }
]
}
```
### 2.7 DoctorReport
```json
{
"schema_version": "doctor.v1",
"ok": true,
"checks": [
{ "name": "config_loaded", "ok": true, "detail": "~/.config/kebab/config.toml" },
{ "name": "data_dir_writable", "ok": true, "detail": "~/.local/share/kebab" },
{ "name": "sqlite_open", "ok": true, "detail": "kebab.sqlite (schema v1)" },
{ "name": "lancedb_open", "ok": true, "detail": "lancedb/" },
{ "name": "embedding_model", "ok": true, "detail": "multilingual-e5-small (384d)" },
{ "name": "ollama_reachable", "ok": true, "detail": "http://127.0.0.1:11434" },
{ "name": "ollama_model_pulled", "ok": false, "detail": "qwen2.5:14b-instruct missing", "hint": "ollama pull qwen2.5:14b-instruct" }
]
}
```
`ok=false` 가 1개 이상이면 root `ok=false`, exit 3.
### 2.8 Versioning 규칙
- 한 schema 안: 새 optional 필드 추가만 OK. 기존 필드 제거/타입변경/enum 값 제거 금지.
- 그 이상의 변경 → `*.v2.schema.json` 신설. CLI `--schema-version v1|v2`. default 최신.
- enum 값 추가 시 클라이언트는 unknown 무시 (forward compat).
---
## 3. 도메인 모델 (kebab-core)
### 3.1 Newtype IDs
```rust
#[derive(Clone, Debug, Eq, Hash, PartialEq, Serialize, Deserialize)] pub struct AssetId(pub String);
#[derive(Clone, Debug, Eq, Hash, PartialEq, Serialize, Deserialize)] pub struct DocumentId(pub String);
#[derive(Clone, Debug, Eq, Hash, PartialEq, Serialize, Deserialize)] pub struct BlockId(pub String);
#[derive(Clone, Debug, Eq, Hash, PartialEq, Serialize, Deserialize)] pub struct ChunkId(pub String);
#[derive(Clone, Debug, Eq, Hash, PartialEq, Serialize, Deserialize)] pub struct EmbeddingId(pub String);
#[derive(Clone, Debug, Eq, Hash, PartialEq, Serialize, Deserialize)] pub struct IndexId(pub String);
```
`Display`, `FromStr` 구현. 32-char hex.
### 3.2 Versions / labels
```rust
pub struct ParserVersion(pub String);
pub struct ChunkerVersion(pub String);
pub struct EmbeddingModelId(pub String);
pub struct EmbeddingVersion(pub String);
pub struct IndexVersion(pub String);
pub struct PromptTemplateVersion(pub String);
pub struct SchemaVersion(pub &'static str);
```
### 3.3 RawAsset
```rust
pub struct RawAsset {
pub asset_id: AssetId,
pub source_uri: SourceUri,
pub workspace_path: WorkspacePath,
pub media_type: MediaType,
pub byte_len: u64,
pub checksum: Checksum,
pub discovered_at: OffsetDateTime,
pub stored: AssetStorage,
}
pub enum SourceUri { File(PathBuf), Kb(String) }
pub struct WorkspacePath(pub String);
pub enum MediaType {
Markdown,
Pdf,
Image(ImageType),
Audio(AudioType),
Other(String),
}
pub enum AssetStorage {
Copied { path: PathBuf },
Reference{ path: PathBuf, sha: Checksum },
}
```
### 3.4 CanonicalDocument / Block / SourceSpan
```rust
pub struct CanonicalDocument {
pub doc_id: DocumentId,
pub source_asset_id: AssetId,
pub workspace_path: WorkspacePath,
pub title: String,
pub lang: Lang,
pub blocks: Vec<Block>,
pub metadata: Metadata,
pub provenance: Provenance,
pub parser_version: ParserVersion,
pub schema_version: u32,
pub doc_version: u32,
}
pub enum Block {
Heading(HeadingBlock),
Paragraph(TextBlock),
List(ListBlock),
Code(CodeBlock),
Table(TableBlock),
Quote(TextBlock),
ImageRef(ImageRefBlock),
AudioRef(AudioRefBlock),
}
pub struct CommonBlock {
pub block_id: BlockId,
pub heading_path: Vec<String>,
pub source_span: SourceSpan,
}
pub struct HeadingBlock { pub common: CommonBlock, pub level: u8, pub text: String }
pub struct TextBlock { pub common: CommonBlock, pub text: String, pub inlines: Vec<Inline> }
pub struct ListBlock { pub common: CommonBlock, pub ordered: bool, pub items: Vec<TextBlock> }
pub struct CodeBlock { pub common: CommonBlock, pub lang: Option<String>, pub code: String }
pub struct TableBlock { pub common: CommonBlock, pub headers: Vec<String>, pub rows: Vec<Vec<String>> }
pub struct ImageRefBlock{
pub common: CommonBlock,
pub asset_id: Option<AssetId>,
pub src: String,
pub alt: String,
pub ocr: Option<OcrText>,
pub caption: Option<ModelCaption>,
}
pub struct AudioRefBlock{
pub common: CommonBlock,
pub asset_id: AssetId,
pub duration_ms: u64,
pub transcript: Option<Transcript>,
}
pub enum Inline {
Text(String),
Code(String),
Link { text: String, href: String },
Strong(Vec<Inline>),
Emph(Vec<Inline>),
}
pub enum SourceSpan {
Line { start: u32, end: u32 },
Byte { start: u64, end: u64 },
Page { page: u32, char_start: Option<u32>, char_end: Option<u32> },
Region { x: u32, y: u32, w: u32, h: u32 },
Time { start_ms: u64, end_ms: u64 },
}
```
### 3.5 Chunk / Citation
```rust
pub struct Chunk {
pub chunk_id: ChunkId,
pub doc_id: DocumentId,
pub block_ids: Vec<BlockId>,
pub text: String,
pub heading_path: Vec<String>,
pub source_spans: Vec<SourceSpan>,
pub token_estimate: usize,
pub chunker_version: ChunkerVersion,
}
pub enum Citation {
Line { path: WorkspacePath, start: u32, end: u32, section: Option<String> },
Page { path: WorkspacePath, page: u32, section: Option<String> },
Region { path: WorkspacePath, x: u32, y: u32, w: u32, h: u32 },
Caption{ path: WorkspacePath, model: String },
Time { path: WorkspacePath, start_ms: u64, end_ms: u64, speaker: Option<String> },
}
impl Citation {
pub fn path(&self) -> &WorkspacePath;
pub fn to_uri(&self) -> String;
pub fn parse(s: &str) -> Result<Self>;
}
```
### 3.6 Metadata / Provenance
```rust
pub struct Metadata {
pub aliases: Vec<String>,
pub tags: Vec<String>,
pub created_at: OffsetDateTime,
pub updated_at: OffsetDateTime,
pub source_type: SourceType,
pub trust_level: TrustLevel,
pub user_id_alias: Option<String>,
pub user: serde_json::Map<String, serde_json::Value>,
}
pub enum SourceType { Markdown, Note, Paper, Reference, Inbox }
pub enum TrustLevel { Primary, Secondary, Generated }
pub struct Provenance { pub events: Vec<ProvenanceEvent> }
pub struct ProvenanceEvent {
pub at: OffsetDateTime,
pub agent: String,
pub kind: ProvenanceKind,
pub note: Option<String>,
}
pub enum ProvenanceKind {
Discovered, Parsed, Normalized, Chunked,
OcrApplied, CaptionApplied, Transcribed,
Embedded, Indexed, Warning, Error,
}
```
### 3.7 SearchQuery / SearchHit
```rust
pub enum SearchMode { Lexical, Vector, Hybrid }
pub struct SearchQuery {
pub text: String,
pub mode: SearchMode,
pub k: usize,
pub filters: SearchFilters,
}
pub struct SearchFilters {
pub tags_any: Vec<String>,
pub lang: Option<Lang>,
pub path_glob: Option<String>,
pub trust_min: Option<TrustLevel>,
}
pub struct SearchHit {
pub rank: u32,
pub chunk_id: ChunkId,
pub doc_id: DocumentId,
pub doc_path: WorkspacePath,
pub heading_path: Vec<String>,
pub section_label: Option<String>,
pub snippet: String,
pub citation: Citation,
pub retrieval: RetrievalDetail,
pub index_version: IndexVersion,
pub embedding_model: Option<EmbeddingModelId>,
pub chunker_version: ChunkerVersion,
}
pub struct RetrievalDetail {
pub method: SearchMode,
pub fusion_score: f32,
pub lexical_score: Option<f32>,
pub vector_score: Option<f32>,
pub lexical_rank: Option<u32>,
pub vector_rank: Option<u32>,
}
```
### 3.7a Forward-declared types
`Block::ImageRef` / `AudioRef` variant 은 v1 부터 존재하나, 그 안의 `ocr` / `caption` / `transcript` 필드는 P1 에선 항상 `None`. 다음 타입은 `kebab-core` 에 stub 으로 둠 (최종 도메인 모델 슬롯):
```rust
pub struct OcrText { pub joined: String, pub regions: Vec<OcrRegion>, pub engine: String, pub engine_version: String }
pub struct OcrRegion { pub bbox: (u32, u32, u32, u32), pub text: String, pub confidence: f32 }
pub struct ModelCaption { pub text: String, pub model: String, pub model_version: String }
pub struct Transcript { pub segments: Vec<TranscriptSegment>, pub engine: String, pub engine_version: String, pub language: Lang }
pub struct TranscriptSegment { pub start_ms: u64, pub end_ms: u64, pub text: String, pub speaker: Option<String>, pub confidence: Option<f32> }
pub struct Checksum(pub String); // full blake3 hex (64 chars)
pub struct Lang(pub String);
pub enum ImageType { Png, Jpeg, Webp, Gif, Tiff, Other(String) }
pub enum AudioType { M4a, Mp3, Wav, Flac, Ogg, Other(String) }
```
`ExtractConfig`, `DocFilter`, `JobKind`, `JobStatus`, `JobFilter`, `JobRow`, `JobId`, `VectorRecord`, `VectorHit`, `RefusalSignal`, `NoHitSignal`, `DoctorUnhealthy``kebab-core` 에 정의 (자세한 필드는 사용 시 결정, 이 spec 에서 forward-ref 만 보장).
`OffsetDateTime``time::OffsetDateTime`, `Result` 는 crate-local alias.
### 3.7b Parser intermediate types — `kebab-parse-types`
Parser 의 *중간* 표현 (`ParsedBlock` 류) 은 `kebab-core` 가 아니라 별도의 thin crate **`kebab-parse-types`** 에 둔다. 이유: `kebab-normalize` 는 medium-agnostic 한 ID/Provenance lift 를 책임지고 어떤 parser 도 직접 import 하면 안 된다. 그러나 normalize 에 들어오는 입력 타입이 어딘가에 정의되어야 하는데, 그것을 `kebab-core` 에 박으면 (a) parser-별 ParsedBlock 변종 (`ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`) 이 향후 합류할 때 core 의 namespace 가 폭발하고, (b) parser 의 의미 변경이 core 변경이 되어 모든 의존자가 영향을 받는다.
`kebab-parse-types` 는 이 둘 사이의 **유일한 layer** 다. 의존 그래프:
```text
kebab-core (도메인 모델 — Block, Chunk, SourceSpan, IDs, …)
kebab-parse-types (parser 중간 표현 — ParsedBlock, ParsedImageRegion[P+], ParsedPdfPage[P+], ParsedAudioSegment[P+], Inline)
▲ ▲
│ │
kebab-parse-md, kebab-parse-pdf, kebab-normalize
kebab-parse-image, kebab-parse-audio
```
`kebab-parse-types` 는:
- `kebab-core` 에만 의존 (`Block`, `SourceSpan`, `Lang` 등 도메인 타입 사용).
- 다른 어떤 `kebab-*` 에도 의존하지 않는다.
- 어떤 parser 의 구체 라이브러리 (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`) 에도 의존하지 않는다.
- serde + thiserror 정도의 외부 의존만 가진다.
P1 에서 정의되는 타입:
```rust
// kebab-parse-types — depends on kebab-core only.
pub struct ParsedBlock {
pub kind: ParsedBlockKind,
pub heading_path: Vec<String>,
pub source_span: kebab_core::SourceSpan,
pub payload: ParsedPayload,
}
pub enum ParsedBlockKind { Heading, Paragraph, List, Code, Table, Quote, ImageRef, AudioRef }
pub enum ParsedPayload {
Heading { level: u8, text: String },
Paragraph { text: String, inlines: Vec<kebab_core::Inline> },
List { ordered: bool, items: Vec<Vec<kebab_core::Inline>> },
Code { lang: Option<String>, code: String },
Table { headers: Vec<String>, rows: Vec<Vec<String>> },
Quote { text: String, inlines: Vec<kebab_core::Inline> },
ImageRef { src: String, alt: String },
AudioRef { src: String }, // duration_ms filled by extractor before chunking
}
pub struct Warning { pub kind: WarningKind, pub note: String }
pub enum WarningKind { MalformedFrontmatter, MalformedTable, EncodingFallback, ExtractFailed }
```
`Inline``kebab-core` (§3.4) 에 있는 도메인 타입. `kebab-parse-types` 는 그것을 *참조* 만 한다 — 같은 의미를 두 crate 에 중복 정의하지 않는다 (그러면 normalize 가 identity-conversion 을 해야 해서 무의미).
P6/P7/P8 에서 추가될 타입 (forward-ref):
```rust
pub struct ParsedImageRegion { /* OCR/EXIF 추출 전 단계 */ }
pub struct ParsedPdfPage { pub page: u32, pub text: String }
pub struct ParsedAudioSegment { pub start_ms: u64, pub end_ms: u64, pub text: String }
```
→ 새 medium 추가 시 `kebab-core::Block` 변종은 변하지 않고, `kebab-parse-types` 만 확장된다.
### 3.8 Answer / RAG types
```rust
pub struct Answer {
pub answer: String,
pub citations: Vec<AnswerCitation>,
pub grounded: bool,
pub refusal_reason: Option<RefusalReason>,
pub model: ModelRef,
pub embedding: Option<ModelRef>,
pub prompt_template_version: PromptTemplateVersion,
pub retrieval: AnswerRetrievalSummary,
pub usage: TokenUsage,
pub created_at: OffsetDateTime,
}
pub struct AnswerCitation { pub marker: Option<String>, pub citation: Citation }
pub enum RefusalReason { ScoreGate, LlmSelfJudge, NoIndex, NoChunks }
pub struct ModelRef {
pub id: String,
pub provider: String,
pub dimensions: Option<usize>,
}
pub struct AnswerRetrievalSummary {
pub trace_id: TraceId,
pub mode: SearchMode,
pub k: usize,
pub score_gate: f32,
pub top_score: f32,
pub chunks_returned: u32,
pub chunks_used: u32,
}
pub struct TokenUsage {
pub prompt_tokens: u32,
pub completion_tokens: u32,
pub latency_ms: u32,
}
pub struct TraceId(pub String);
```
---
## 4. ID 생성 recipe
규칙: 모든 ID = `blake3(canonical_json(tuple))` 의 hex prefix 32 chars.
### 4.1 canonical_json
- key 정렬 (BTreeMap / serde-json-canonicalizer)
- ASCII whitespace 없음
- UTF-8 NFC 정규화
- 숫자: integer/float 표준 표현
- 배열 순서 보존
### 4.2 Recipe
```rust
fn id_from<T: Serialize>(tuple: T) -> String {
let bytes = canonical_json::to_vec(&tuple).unwrap();
let hex = blake3::hash(&bytes).to_hex().to_string();
hex[..32].to_string()
}
```
```text
asset_id = id_from({ kind: "asset", asset_blake3: <full hex of raw bytes> })
doc_id = id_from({ kind: "doc", workspace_path, asset_id, parser_version })
block_id = id_from({ kind: "block", doc_id, block_kind, heading_path, ordinal, source_span })
chunk_id = id_from({ kind: "chunk", doc_id, chunker_version, block_ids, policy_hash })
embedding_id = id_from({ kind: "embedding", chunk_id, model_id, model_version, dimensions })
index_id = id_from({ kind: "index", collection, embedding_model, dimensions, index_version, index_kind, index_params_hash })
```
`workspace_path` 정규화: workspace root 기준 POSIX 슬래시, NFC, leading `./` 제거, 중복 슬래시 제거.
### 4.3 변경 영향 행렬
| 변경 | 영향 받는 ID |
|------|------------|
| 파일 내용 변경 | asset_id → doc_id → block_id → chunk_id → embedding_id |
| 파일 이동 (workspace 안) | doc_id → … |
| `parser_version` bump | doc_id → block_id → chunk_id → embedding_id |
| `chunker_version` 또는 policy 변경 | chunk_id → embedding_id |
| embedding model/version/dim 변경 | embedding_id |
| index 형상 변경 | index_id |
### 4.4 Tests
- 동일 입력 → 동일 ID (회귀 1000회).
- 입력 순서 미세 차이 → ID 변화 없음 (key 정렬).
- POSIX path 케이스 (`./a/b.md` vs `a/b.md`) → 동일.
- NFC 차이 한국어 글자 → 동일.
---
## 5. SQLite 스키마
`PRAGMA foreign_keys = ON; journal_mode = WAL; synchronous = NORMAL;`. UTF-8. timestamps RFC3339 TEXT.
### 5.1 Migrations meta
```sql
CREATE TABLE schema_meta (
key TEXT PRIMARY KEY,
value TEXT NOT NULL
);
CREATE TABLE migrations (
id INTEGER PRIMARY KEY,
applied_at TEXT NOT NULL,
description TEXT NOT NULL
);
```
### 5.2 Assets
```sql
CREATE TABLE assets (
asset_id TEXT PRIMARY KEY,
source_uri TEXT NOT NULL,
workspace_path TEXT NOT NULL,
media_type TEXT NOT NULL,
byte_len INTEGER NOT NULL,
checksum TEXT NOT NULL,
storage_kind TEXT NOT NULL CHECK (storage_kind IN ('copied','reference')),
storage_path TEXT NOT NULL,
discovered_at TEXT NOT NULL
);
CREATE UNIQUE INDEX idx_assets_workspace_path ON assets(workspace_path);
CREATE INDEX idx_assets_media_type ON assets(media_type);
```
### 5.3 Documents
```sql
CREATE TABLE documents (
doc_id TEXT PRIMARY KEY,
asset_id TEXT NOT NULL REFERENCES assets(asset_id) ON DELETE RESTRICT,
workspace_path TEXT NOT NULL,
title TEXT,
lang TEXT,
source_type TEXT NOT NULL,
trust_level TEXT NOT NULL,
parser_version TEXT NOT NULL,
doc_version INTEGER NOT NULL,
schema_version INTEGER NOT NULL,
metadata_json TEXT NOT NULL,
provenance_json TEXT NOT NULL,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL
);
CREATE UNIQUE INDEX idx_docs_workspace_path ON documents(workspace_path);
CREATE INDEX idx_docs_lang ON documents(lang);
CREATE INDEX idx_docs_source_type ON documents(source_type);
CREATE TABLE document_tags (
doc_id TEXT NOT NULL REFERENCES documents(doc_id) ON DELETE CASCADE,
tag TEXT NOT NULL,
PRIMARY KEY (doc_id, tag)
);
CREATE INDEX idx_document_tags_tag ON document_tags(tag);
```
### 5.4 Blocks
```sql
CREATE TABLE blocks (
block_id TEXT PRIMARY KEY,
doc_id TEXT NOT NULL REFERENCES documents(doc_id) ON DELETE CASCADE,
kind TEXT NOT NULL,
heading_path_json TEXT NOT NULL,
ordinal INTEGER NOT NULL,
source_span_json TEXT NOT NULL,
payload_json TEXT NOT NULL
);
CREATE INDEX idx_blocks_doc_id ON blocks(doc_id);
```
### 5.5 Chunks + FTS5
```sql
CREATE TABLE chunks (
chunk_id TEXT PRIMARY KEY,
doc_id TEXT NOT NULL REFERENCES documents(doc_id) ON DELETE CASCADE,
text TEXT NOT NULL,
heading_path_json TEXT NOT NULL,
section_label TEXT,
source_spans_json TEXT NOT NULL,
token_estimate INTEGER NOT NULL,
chunker_version TEXT NOT NULL,
policy_hash TEXT NOT NULL,
block_ids_json TEXT NOT NULL,
created_at TEXT NOT NULL
);
CREATE INDEX idx_chunks_doc_id ON chunks(doc_id);
CREATE INDEX idx_chunks_chunker_version ON chunks(chunker_version);
CREATE VIRTUAL TABLE chunks_fts USING fts5(
chunk_id UNINDEXED,
doc_id UNINDEXED,
heading_path,
text,
tokenize = 'unicode61 remove_diacritics 2'
);
CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
INSERT INTO chunks_fts(chunk_id, doc_id, heading_path, text)
VALUES (new.chunk_id, new.doc_id, new.heading_path_json, new.text);
END;
CREATE TRIGGER chunks_ad AFTER DELETE ON chunks BEGIN
DELETE FROM chunks_fts WHERE chunk_id = old.chunk_id;
END;
CREATE TRIGGER chunks_au AFTER UPDATE ON chunks BEGIN
DELETE FROM chunks_fts WHERE chunk_id = old.chunk_id;
INSERT INTO chunks_fts(chunk_id, doc_id, heading_path, text)
VALUES (new.chunk_id, new.doc_id, new.heading_path_json, new.text);
END;
```
### 5.6 Embedding records (P3)
```sql
CREATE TABLE embedding_records (
embedding_id TEXT PRIMARY KEY,
chunk_id TEXT NOT NULL REFERENCES chunks(chunk_id) ON DELETE CASCADE,
model_id TEXT NOT NULL,
model_version TEXT NOT NULL,
dimensions INTEGER NOT NULL,
lance_table TEXT NOT NULL,
created_at TEXT NOT NULL,
UNIQUE(chunk_id, model_id, model_version, dimensions)
);
CREATE INDEX idx_embed_chunk ON embedding_records(chunk_id);
CREATE INDEX idx_embed_model ON embedding_records(model_id, model_version, dimensions);
```
### 5.7 Jobs / IngestRuns / Answers / EvalRuns
```sql
CREATE TABLE jobs (
job_id TEXT PRIMARY KEY,
kind TEXT NOT NULL,
status TEXT NOT NULL CHECK (status IN ('pending','running','succeeded','failed','canceled')),
payload_json TEXT NOT NULL,
progress_json TEXT,
error_json TEXT,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL,
finished_at TEXT
);
CREATE INDEX idx_jobs_status ON jobs(status);
CREATE INDEX idx_jobs_kind ON jobs(kind);
CREATE TABLE ingest_runs (
run_id TEXT PRIMARY KEY,
scope_json TEXT NOT NULL,
scanned INTEGER NOT NULL,
new_count INTEGER NOT NULL,
updated_count INTEGER NOT NULL,
skipped_count INTEGER NOT NULL,
error_count INTEGER NOT NULL,
duration_ms INTEGER NOT NULL,
started_at TEXT NOT NULL,
finished_at TEXT NOT NULL,
items_json TEXT
);
CREATE TABLE answers (
trace_id TEXT PRIMARY KEY,
query TEXT NOT NULL,
answer TEXT NOT NULL,
grounded INTEGER NOT NULL,
refusal_reason TEXT,
model_id TEXT NOT NULL,
model_provider TEXT NOT NULL,
embedding_model_id TEXT,
embedding_dimensions INTEGER,
prompt_template_version TEXT NOT NULL,
retrieval_mode TEXT NOT NULL,
retrieval_k INTEGER NOT NULL,
score_gate REAL NOT NULL,
top_score REAL NOT NULL,
chunks_returned INTEGER NOT NULL,
chunks_used INTEGER NOT NULL,
citations_json TEXT NOT NULL,
packed_chunks_json TEXT,
prompt_tokens INTEGER,
completion_tokens INTEGER,
latency_ms INTEGER,
created_at TEXT NOT NULL
);
CREATE INDEX idx_answers_created_at ON answers(created_at);
CREATE INDEX idx_answers_grounded ON answers(grounded);
CREATE TABLE eval_runs (
run_id TEXT PRIMARY KEY,
suite TEXT NOT NULL,
config_snapshot_json TEXT NOT NULL,
aggregate_json TEXT NOT NULL,
commit_hash TEXT,
created_at TEXT NOT NULL
);
CREATE TABLE eval_query_results (
run_id TEXT NOT NULL REFERENCES eval_runs(run_id) ON DELETE CASCADE,
query_id TEXT NOT NULL,
result_json TEXT NOT NULL,
PRIMARY KEY (run_id, query_id)
);
```
### 5.8 트랜잭션 정책
- ingest 1 doc = 1 트랜잭션.
- bulk ingest 는 doc 단위 커밋.
- chunker/embedding 재처리 = 별도 job + per-chunk 트랜잭션.
### 5.9 마이그레이션
`migrations/V001__init.sql`, `V002__*.sql` 형식. 시작 시 `schema_meta.schema_version` 확인 → 누락된 마이그레이션 적용. 다운그레이드 미지원.
---
## 6. Filesystem + config layout
### 6.1 Path resolution (XDG)
| 종류 | 기본 위치 |
|------|-----------|
| 워크스페이스 | `~/KnowledgeBase/` |
| config | `~/.config/kebab/config.toml` |
| data | `~/.local/share/kebab/` |
| cache | `~/.cache/kebab/` |
| state (logs) | `~/.local/state/kebab/` |
`~`, `$HOME`, `${KEBAB_*}` expand. 절대 path 정규화 후 사용.
### 6.2 Workspace 구조
```
~/KnowledgeBase/
├── inbox/ notes/ papers/ photos/ recordings/
└── .kebabignore
```
`.kebabignore``config.workspace.exclude` 합집합.
### 6.3 Data dir 구조
```
~/.local/share/kebab/
├── kebab.sqlite (+ -wal, -shm)
├── lancedb/
│ └── chunk_embeddings_<model>_<dim>.lance/
├── assets/<aa>/<asset_id> # shard
├── artifacts/<doc_id>/ # ocr.json / caption.json / transcript.json / pdf-text.json
├── models/ # fastembed/ ollama 캐시 위임
└── runs/<run_id>/ # eval per_query.jsonl + report.md
```
### 6.4 Config (`~/.config/kebab/config.toml`) — frozen schema
```toml
schema_version = 1
[workspace]
root = "~/KnowledgeBase"
include = ["**/*.md"]
exclude = [".git/**", "node_modules/**", ".obsidian/**"]
[storage]
data_dir = "${XDG_DATA_HOME:-~/.local/share}/kebab"
sqlite = "{data_dir}/kebab.sqlite"
vector_dir = "{data_dir}/lancedb"
asset_dir = "{data_dir}/assets"
artifact_dir = "{data_dir}/artifacts"
model_dir = "{data_dir}/models"
runs_dir = "{data_dir}/runs"
copy_threshold_mb = 100
[indexing]
max_parallel_extractors = 2
max_parallel_embeddings = 1
watch_filesystem = false
[chunking]
target_tokens = 500
overlap_tokens = 80
respect_markdown_headings = true
chunker_version = "md-heading-v1"
[models.embedding]
provider = "fastembed"
model = "multilingual-e5-small"
version = "v1"
dimensions = 384
batch_size = 64
[models.llm]
provider = "ollama"
model = "qwen2.5:14b-instruct"
context_tokens = 32768
endpoint = "http://127.0.0.1:11434"
temperature = 0.0
seed = 0
[search]
default_k = 10
hybrid_fusion = "rrf"
rrf_k = 60
snippet_chars = 220
[rag]
prompt_template_version = "rag-v1"
score_gate = 0.30
explain_default = false
max_context_tokens = 8000
```
config 우선순위: default → file → env (`KB_<SECTION>_<KEY>`) → CLI flag.
### 6.5 `kebab init` 출력
```text
$ kebab init
created ~/.config/kebab/config.toml
created ~/.local/share/kebab/
created ~/KnowledgeBase/
opened ~/.local/share/kebab/kebab.sqlite (schema v1)
hint edit ~/.config/kebab/config.toml then `kebab ingest ~/KnowledgeBase`
```
기존 파일 보존, `--force` 명시 필요.
### 6.6 Permissions / portability
- 디렉토리 0o755, 파일 0o644.
- 항상 POSIX path 정규화 후 DB 저장. `to_posix` 단일 함수.
- 심볼릭 링크: 1차 follow + 무한루프 detect (`canonicalize` 후 set 추적).
---
## 7. Trait contracts (kebab-core)
### 7.1 입출력 보조
```rust
pub struct SourceScope { pub root: PathBuf, pub include: Vec<String>, pub exclude: Vec<String> }
pub struct ExtractContext<'a> { pub asset: &'a RawAsset, pub workspace_root: &'a Path, pub config: &'a ExtractConfig }
pub struct ChunkPolicy {
pub target_tokens: usize,
pub overlap_tokens: usize,
pub respect_markdown_headings: bool,
pub chunker_version: ChunkerVersion,
}
pub enum EmbeddingKind { Document, Query }
pub struct EmbeddingInput<'a> { pub text: &'a str, pub kind: EmbeddingKind }
pub struct GenerateRequest {
pub system: String,
pub user: String,
pub stop: Vec<String>,
pub max_tokens: usize,
pub temperature: f32,
pub seed: Option<u64>,
}
pub enum TokenChunk {
Token(String),
Done { finish_reason: FinishReason, usage: TokenUsage },
}
pub enum FinishReason { Stop, Length, Aborted, Error(String) }
```
### 7.2 트레잇
```rust
pub trait SourceConnector {
fn scan(&self, scope: &SourceScope) -> Result<Vec<RawAsset>>;
}
pub trait Extractor: Send + Sync {
fn supports(&self, media_type: &MediaType) -> bool;
fn parser_version(&self) -> ParserVersion;
fn extract(&self, ctx: &ExtractContext, bytes: &[u8]) -> Result<CanonicalDocument>;
}
pub trait Chunker: Send + Sync {
fn chunker_version(&self) -> ChunkerVersion;
fn policy_hash(&self, policy: &ChunkPolicy) -> String;
fn chunk(&self, doc: &CanonicalDocument, policy: &ChunkPolicy) -> Result<Vec<Chunk>>;
}
pub trait Embedder: Send + Sync {
fn model_id(&self) -> EmbeddingModelId;
fn model_version(&self) -> EmbeddingVersion;
fn dimensions(&self) -> usize;
fn embed(&self, inputs: &[EmbeddingInput]) -> Result<Vec<Vec<f32>>>;
}
pub trait Retriever: Send + Sync {
fn search(&self, query: &SearchQuery) -> Result<Vec<SearchHit>>;
fn index_version(&self) -> IndexVersion;
}
pub trait LanguageModel: Send + Sync {
fn model_ref(&self) -> ModelRef;
fn context_tokens(&self) -> usize;
fn generate_stream(
&self,
req: GenerateRequest,
) -> Result<Box<dyn Iterator<Item = Result<TokenChunk>> + Send>>;
}
pub trait DocumentStore {
fn put_asset(&self, a: &RawAsset) -> Result<()>;
fn put_document(&self, d: &CanonicalDocument) -> Result<()>;
fn put_blocks(&self, doc: &DocumentId, blocks: &[Block]) -> Result<()>;
fn put_chunks(&self, doc: &DocumentId, chunks: &[Chunk]) -> Result<()>;
fn get_document(&self, id: &DocumentId) -> Result<Option<CanonicalDocument>>;
fn get_chunk(&self, id: &ChunkId) -> Result<Option<Chunk>>;
fn list_documents(&self, filter: &DocFilter) -> Result<Vec<DocSummary>>;
}
pub trait VectorStore {
fn ensure_table(&self, model: &EmbeddingModelId, dim: usize) -> Result<IndexId>;
fn upsert(&self, recs: &[VectorRecord]) -> Result<()>;
fn search(&self, query_vec: &[f32], k: usize, filters: &SearchFilters) -> Result<Vec<VectorHit>>;
}
pub trait JobRepo {
fn create(&self, kind: JobKind, payload: serde_json::Value) -> Result<JobId>;
fn update_progress(&self, id: &JobId, progress: serde_json::Value) -> Result<()>;
fn finish(&self, id: &JobId, status: JobStatus, error: Option<&str>) -> Result<()>;
fn list(&self, filter: &JobFilter) -> Result<Vec<JobRow>>;
}
```
---
## 8. 모듈 경계 (Allowed / Forbidden)
```text
kebab-cli, kebab-tui, kebab-desktop
└─> kebab-app
├─> kebab-source-fs
├─> kebab-parse-md / kebab-parse-pdf / kebab-parse-image / kebab-parse-audio
│ └─> kebab-parse-types (parser intermediate)
├─> kebab-normalize
│ └─> kebab-parse-types
├─> kebab-chunk
├─> kebab-store-sqlite (DocumentStore, JobRepo, Retriever[lexical])
├─> kebab-store-vector (VectorStore)
├─> kebab-embed-local
├─> kebab-search (Retriever[hybrid])
├─> kebab-llm-local
├─> kebab-rag
├─> kebab-eval
└─> kebab-config
└─> kebab-core (모두 의존)
```
`kebab-parse-types``kebab-core` 와 parsers/normalize 사이의 thin layer (§3.7b 참조). parser-별 중간 표현 (`ParsedBlock`, `ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`, `Inline`) 을 한 곳에 모아 (a) `kebab-core` 의 namespace 폭발을 막고 (b) `kebab-normalize` 가 parser 를 직접 import 하지 않게 한다.
핵심 금지:
- UI → store/llm/parse 직접 의존 ✗
- parse-* → store/llm/embed ✗
- parse-* → kebab-normalize ✗ (단방향: parsers → kebab-parse-types ← normalize)
- chunk → llm/embed ✗
- normalize → store / parse-* ✗
- kebab-parse-types → 어떤 parser/normalize/store/llm/embed/search/rag/ui ✗ (`kebab-core` 만 의존)
- 다른 store 와 cross-write ✗
`cargo deny` + workspace deny.toml + CI 체크로 강제.
---
## 9. Versioning rules
| 식별자 | 변경 시 | bump 규칙 |
|--------|---------|-----------|
| `parser_version` | 파서 의미 변화 | semver-suffix string 상수 |
| `chunker_version` | chunk boundary/policy 변화 | 라벨 (`md-heading-v2`) |
| `policy_hash` | policy 값만 변경 | 자동 (config 해시) |
| `embedding_model.id` | 모델 교체 | 새 lance 테이블 |
| `embedding_model.version` | 같은 모델 가중치/토크나이저 변경 | bump |
| `embedding.dimensions` | 차원 변경 | 새 lance 테이블 강제 |
| `index_version` | retrieval 형상 변화 | bump |
| `prompt_template_version` | template 변경 | 코드 상수 (`rag-v2`) |
| DB `schema_version` | DDL 변경 | 마이그레이션 정수 증가 |
| wire schema (`*.v1`) | 깨는 변경 시 | `*.v2` 신설, v1 additive only |
| internal Rust struct | 자유 진화 | wire 분리되어 외부 영향 0 |
CI:
- 코드 변경 PR 에서 `parser_version` / `chunker_version` 동일하게 유지됐는데 동작 테스트 결과 다르면 fail.
- DDL 변경 있는데 마이그레이션 정수 미증가 fail.
- `v1` JSON schema 파일 변경 시 additive 검증.
---
## 10. 에러 모델 + exit codes
```rust
// kebab-core
pub enum CoreError { InvalidId, InvalidCitation, InvalidSpan, Malformed }
// crate-local examples
pub enum ParseMdError { Yaml(String), Encoding, Pulldown(String), Span }
pub enum StoreError { Sqlx(rusqlite::Error), Migration(String), Conflict(String) }
pub enum LlmError { Unreachable, ModelNotPulled(String), Timeout, Stream(String) }
```
Boundary (`kebab-app`, `kebab-cli`) 에서 `anyhow::Error` 합침. exit code 매핑:
```rust
fn exit_code(err: &anyhow::Error) -> i32 {
if err.downcast_ref::<RefusalSignal>().is_some() { return 1; }
if err.downcast_ref::<NoHitSignal>().is_some() { return 1; }
if err.downcast_ref::<DoctorUnhealthy>().is_some() { return 3; }
2
}
```
| 레벨 | 메시지 |
|------|--------|
| default | `error: <한 줄>\n hint: <조치>` |
| `--verbose` | + anyhow chain |
| `--debug` 또는 `RUST_LOG=debug` | + tracing target/level/span |
Refusal 은 에러 아님. `kebab ask` 거절은 정상 stdout (Answer with grounded=false) + exit 1.
Logging: `tracing` + `tracing-subscriber` + `tracing-appender` daily roll, `~/.local/state/kebab/logs/`. structured (`trace_id`, `doc_id`, `chunk_id`).
`kebab doctor` 출력 (사람):
```text
$ kebab doctor
✓ config_loaded ~/.config/kebab/config.toml
✓ data_dir_writable ~/.local/share/kebab
✓ sqlite_open kebab.sqlite (schema v1)
✓ lancedb_open lancedb/
✓ embedding_model multilingual-e5-small (384d)
✓ ollama_reachable http://127.0.0.1:11434
✗ ollama_model_pulled qwen2.5:14b-instruct missing
hint: ollama pull qwen2.5:14b-instruct
1 check failed.
```
---
## 11. 동결 범위 / 변경 정책
이 문서가 동결 ↔ 다음 컴포넌트 분해 작업이 안전:
- 모든 wire schema (`docs/wire-schema/v1/*.schema.json`)
- 모든 trait 시그니처 (kebab-core)
- 모든 ID recipe (4.2)
- SQLite DDL (5장)
- Filesystem + config schema (6장)
- 모듈 경계 (8장)
- exit codes / refusal 정책
**변경하려면**: 이 문서에 다이어그램이나 이슈 포인트를 명기 → 영향 범위 (파급 task 목록) 적시 → 그 후에만 task 분해 수정.
**의도적으로 빠진 것 (out of scope, P+)**:
- multi-workspace
- watch mode
- desktop app `kebab://` protocol handler
- LLM-as-judge eval
- visual embedding (CLIP)
- real-time collab
- enterprise auth
---
## 12. 다음 단계
1. 이 문서 검토.
2. 검토 통과 시 `tasks/_template.md` (작업 단위 spec 템플릿) 작성.
3. P1 (Markdown ingestion) 6 component task 로 분해 — 템플릿 적합성 검증.
4. 나머지 phase 일괄 분해 (~30 component task).
각 task 는 이 문서의 trait 시그니처 + wire schema + DDL 만 인용. 새 도메인 타입 / 새 trait 도입 금지 (이 문서 수정 절차 거쳐야 함).