docs(rename): kb → kebab — README, tasks/, docs/, design doc, report

Last commit. Bulk update of the word `kb` in every .md file.

- 19 crate names (`kb-core`, `kb-app`, …) → `kebab-*` (including the
  Rust module path form `kb_*` → `kebab_*`).
- Future components (`kb-tui`, `kb-desktop`, `kb-asr-whisper`, `kb-ocr`,
  `kb-mcp`, `kb-vlm`, `kb-rerank`, `kb-vision-ocr`, `kb-index`,
  `kb-smoke`, `kb-architecture`) → `kebab-*` (the same prefix applies
  when P6+ starts).
- CLI command examples: `kb ingest` / `kb search` / `kb ask` / `kb init` /
  `kb doctor` / `kb inspect` / `kb list` / `kb eval` →
  `kebab <verb>`, in both fenced code blocks and inline backticks.
- XDG paths, env vars, and the binary path (`target/release/kb` →
  `target/release/kebab`) are synced.
- All references unified across the design doc, initial report, SMOKE,
  HOTFIXES, phase epics, and task specs.
- `git -c user.name=kb` in task-decomposition.md is kept as-is because it
  records author info from past git history (the author in the actual
  git history cannot be changed).
- `status: planned` → `completed` in `tasks/phase-5-evaluation.md` is
  included too (not yet reflected after the P5-1 + P5-2 PRs merged).

## Verification

- `grep -rEn "\bkb-[a-z]|\bkb_[a-z]|\.config/kb\b|kb\.sqlite|\bKB_[A-Z]"
   --include="*.md"` 0 hits (excluding the git author in
  task-decomposition.md).
- All file path references resolve (renamed files are all updated to
  their new paths).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:01:55 +00:00
parent f1a448d6dc
commit f9714aa5cb
56 changed files with 1324 additions and 1324 deletions

README.md

@@ -1,8 +1,8 @@
# kb — Local-first Knowledge Base
# kebab — Local-first Knowledge Base
> **Status:** P0–P4 implemented (17 of 31 component tasks done) + 3 post-merge hotfixes applied. `kb index` / `kb search --mode {lexical,vector,hybrid}` / `kb ask` all work end to end. Next step = P5 (eval suite). See [tasks/INDEX.md](tasks/INDEX.md) for detailed progress and [tasks/HOTFIXES.md](tasks/HOTFIXES.md) for bugs found after merge and their fixes.
> **Status:** P0–P4 implemented (17 of 31 component tasks done) + 3 post-merge hotfixes applied. `kebab index` / `kebab search --mode {lexical,vector,hybrid}` / `kebab ask` all work end to end. Next step = P5 (eval suite). See [tasks/INDEX.md](tasks/INDEX.md) for detailed progress and [tasks/HOTFIXES.md](tasks/HOTFIXES.md) for bugs found after merge and their fixes.
`kb` is a personal, local knowledge base + RAG tool. It indexes Markdown / PDF / images / audio in one place and serves semantic search plus LLM answers with citations from a single binary. All inference runs locally (Ollama / fastembed / whisper.cpp).
`kebab` is a personal, local knowledge base + RAG tool. It indexes Markdown / PDF / images / audio in one place and serves semantic search plus LLM answers with citations from a single binary. All inference runs locally (Ollama / fastembed / whisper.cpp).
Target hardware: one M4 48GB MacBook, one user.
@@ -12,14 +12,14 @@
| Command | Behavior | Status |
|------|------|------|
| `kb init` | Creates the data directory + config.toml under the XDG paths | ✅ P0 |
| `kb ingest [<path>]` | Indexes Markdown (idempotent). PDF/image/audio arrive in P6+. | ✅ P3-5 |
| `kb search --mode {lexical,vector,hybrid} "<query>"` | Search with citations; hybrid uses RRF fusion | ✅ P3-5 |
| `kb list docs` | Lists indexed documents | ✅ P3-5 |
| `kb inspect doc <id>` / `kb inspect chunk <id>` | Shows the raw record | ✅ P3-5 |
| `kb ask "<query>"` | RAG answer + cited evidence. Refuses when evidence is insufficient. Requires Ollama. | ✅ P4-3 |
| `kb doctor` | Health check for config/models/DB | ✅ P0 |
| `kb eval run / compare` | Measures golden-query regressions | ⏳ P5 |
| `kebab init` | Creates the data directory + config.toml under the XDG paths | ✅ P0 |
| `kebab ingest [<path>]` | Indexes Markdown (idempotent). PDF/image/audio arrive in P6+. | ✅ P3-5 |
| `kebab search --mode {lexical,vector,hybrid} "<query>"` | Search with citations; hybrid uses RRF fusion | ✅ P3-5 |
| `kebab list docs` | Lists indexed documents | ✅ P3-5 |
| `kebab inspect doc <id>` / `kebab inspect chunk <id>` | Shows the raw record | ✅ P3-5 |
| `kebab ask "<query>"` | RAG answer + cited evidence. Refuses when evidence is insufficient. Requires Ollama. | ✅ P4-3 |
| `kebab doctor` | Health check for config/models/DB | ✅ P0 |
| `kebab eval run / compare` | Measures golden-query regressions | ⏳ P5 |
Machine-friendly mode: every command accepts a `--json` flag. Output follows the frozen wire schema v1 (the `schema_version` field is always present, e.g. `ingest_report.v1`, `search_hit.v1`, `answer.v1`, `doctor.v1`).
@@ -44,35 +44,35 @@
| citation format | URI fragment (`path#L12-L34`, W3C Media Fragments) |
| ID generation | `blake3(canonical_json(tuple))[..32]` hex |
| RRF fusion_score | normalized to `[0, 1]`: divided by `2 / (k_rrf + 1)` so scores are comparable across modes (post-merge hotfix) |
| layout | XDG (`~/.local/share/kb/`, `~/.config/kb/`, …) |
| layout | XDG (`~/.local/share/kebab/`, `~/.config/kebab/`, …) |
See [docs/superpowers/specs/2026-04-27-kb-final-form-design.md](docs/superpowers/specs/2026-04-27-kb-final-form-design.md) for the full design.
See [docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](docs/superpowers/specs/2026-04-27-kebab-final-form-design.md) for the full design.
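The normalization in the `RRF fusion_score` row can be sketched as follows. This is a minimal illustration, not the code in `kebab-search`: it assumes the standard reciprocal-rank-fusion term `1 / (k_rrf + rank)` with 1-based ranks and exactly two retrievers (lexical and vector). The function names are invented for the example.

```rust
/// Contribution of one retriever under reciprocal rank fusion.
/// `rank` is 1-based; `None` means the chunk is absent from that result list.
/// (Assumption: the standard RRF term 1 / (k_rrf + rank).)
fn rrf_term(k_rrf: f64, rank: Option<usize>) -> f64 {
    match rank {
        Some(r) if r >= 1 => 1.0 / (k_rrf + r as f64),
        _ => 0.0,
    }
}

/// Hybrid score for one chunk, normalized to [0, 1].
/// The raw sum is divided by its maximum, 2 / (k_rrf + 1), which is
/// reached when both retrievers place the chunk at rank 1.
fn fusion_score(k_rrf: f64, lexical_rank: Option<usize>, vector_rank: Option<usize>) -> f64 {
    let raw = rrf_term(k_rrf, lexical_rank) + rrf_term(k_rrf, vector_rank);
    let max = 2.0 / (k_rrf + 1.0);
    raw / max
}
```

With a hypothetical `k_rrf = 60.0`, `fusion_score(60.0, Some(1), Some(1))` is 1.0, and a chunk that only one retriever returns at rank 1 scores 0.5. Dividing by the two-retriever maximum is what keeps hybrid scores on the same `[0, 1]` scale as the other modes.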
---
## Dependency graph
```text
kb-cli, kb-tui, kb-desktop
  └─> kb-app
        ├─> kb-source-fs
        ├─> kb-parse-md / kb-parse-pdf / kb-parse-image / kb-parse-audio
        │     └─> kb-parse-types
        ├─> kb-normalize
        │     └─> kb-parse-types
        ├─> kb-chunk
        ├─> kb-store-sqlite
        ├─> kb-store-vector
        ├─> kb-embed-local   (kb-embed trait crate)
        ├─> kb-search
        ├─> kb-llm-local     (kb-llm trait crate)
        ├─> kb-rag
        ├─> kb-eval
        └─> kb-config
              └─> kb-core    (everything depends on it)
kebab-cli, kebab-tui, kebab-desktop
  └─> kebab-app
        ├─> kebab-source-fs
        ├─> kebab-parse-md / kebab-parse-pdf / kebab-parse-image / kebab-parse-audio
        │     └─> kebab-parse-types
        ├─> kebab-normalize
        │     └─> kebab-parse-types
        ├─> kebab-chunk
        ├─> kebab-store-sqlite
        ├─> kebab-store-vector
        ├─> kebab-embed-local   (kebab-embed trait crate)
        ├─> kebab-search
        ├─> kebab-llm-local     (kebab-llm trait crate)
        ├─> kebab-rag
        ├─> kebab-eval
        └─> kebab-config
              └─> kebab-core    (everything depends on it)
```
UI → store/llm/parse direct dependencies are forbidden. Every user-facing entry point goes through the `kb-app` facade only (design §8). For `kb-cli` to honor the `--config <path>` flag, it threads the Config explicitly through the `kb_app::*_with_config(cfg, …)` companions; the reasoning is in the `--config` entry of [tasks/HOTFIXES.md](tasks/HOTFIXES.md).
UI → store/llm/parse direct dependencies are forbidden. Every user-facing entry point goes through the `kebab-app` facade only (design §8). For `kebab-cli` to honor the `--config <path>` flag, it threads the Config explicitly through the `kebab_app::*_with_config(cfg, …)` companions; the reasoning is in the `--config` entry of [tasks/HOTFIXES.md](tasks/HOTFIXES.md).
---
@@ -80,16 +80,16 @@ UI → store/llm/parse 직접 의존 금지. 모든 user-facing 진입은 `kb-ap
| Phase | Scope | Key output crates | Depends on | Status |
|-------|------|----------------|------|------|
| **P0** | Workspace skeleton + domain contracts + ID recipe | `kb-core`, `kb-parse-types`, `kb-config`, `kb-app`, `kb-cli` | | ✅ done |
| **P1** | Markdown ingestion (walk → parse → chunk → SQLite) | `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-sqlite` | P0 | ✅ done |
| **P2** | SQLite FTS5 lexical search + citation | `kb-search` (lexical) | P1 | ✅ done |
| **P3** | Local embedding + LanceDB + hybrid (RRF) + kb-app wiring | `kb-embed`, `kb-embed-local`, `kb-store-vector`, `kb-search` | P2 | ✅ done |
| **P4** | Local LLM + RAG + grounded answer | `kb-llm`, `kb-llm-local`, `kb-rag` | P3 | ✅ done |
| **P5** | Golden query / regression eval | `kb-eval` | P4 | ⏳ next |
| **P6** | Image ingestion (OCR + caption) | `kb-parse-image` | P5 | ⏳ |
| **P7** | PDF text + page citation | `kb-parse-pdf` | P5 | ⏳ |
| **P8** | Audio transcription + timestamp citation | `kb-parse-audio` | P5 | ⏳ |
| **P9** | TUI + desktop app | `kb-tui`, `kb-desktop` | P5 | ⏳ |
| **P0** | Workspace skeleton + domain contracts + ID recipe | `kebab-core`, `kebab-parse-types`, `kebab-config`, `kebab-app`, `kebab-cli` | | ✅ done |
| **P1** | Markdown ingestion (walk → parse → chunk → SQLite) | `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-sqlite` | P0 | ✅ done |
| **P2** | SQLite FTS5 lexical search + citation | `kebab-search` (lexical) | P1 | ✅ done |
| **P3** | Local embedding + LanceDB + hybrid (RRF) + kebab-app wiring | `kebab-embed`, `kebab-embed-local`, `kebab-store-vector`, `kebab-search` | P2 | ✅ done |
| **P4** | Local LLM + RAG + grounded answer | `kebab-llm`, `kebab-llm-local`, `kebab-rag` | P3 | ✅ done |
| **P5** | Golden query / regression eval | `kebab-eval` | P4 | ⏳ next |
| **P6** | Image ingestion (OCR + caption) | `kebab-parse-image` | P5 | ⏳ |
| **P7** | PDF text + page citation | `kebab-parse-pdf` | P5 | ⏳ |
| **P8** | Audio transcription + timestamp citation | `kebab-parse-audio` | P5 | ⏳ |
| **P9** | TUI + desktop app | `kebab-tui`, `kebab-desktop` | P5 | ⏳ |
P0–P5 run serially. P6–P9 can run in parallel after P5.
@@ -100,13 +100,13 @@ P0~P5 직렬. P6~P9 P5 이후 병렬 가능.
## Directory layout
```text
kb/
kebab/
├── README.md # this file
├── kb_local_rust_report.md # initial design report (direction + rationale)
├── kebab_local_rust_report.md # initial design report (direction + rationale)
├── docs/
│ ├── superpowers/
│ │ ├── specs/
│ │ │ └── 2026-04-27-kb-final-form-design.md # frozen design (12 sections)
│ │ │ └── 2026-04-27-kebab-final-form-design.md # frozen design (12 sections)
│ │ └── plans/
│ │ └── 2026-04-27-task-decomposition.md # task-decomposition implementation plan
│ ├── SMOKE.md # procedure for running directly against a local workspace
@@ -127,19 +127,19 @@ kb/
│ ├── p8/p8-1, p8-2 # (2)
│ └── p9/p9-1 … p9-5 # (5)
├── crates/
│ ├── kb-core/ kb-parse-types/ kb-config/ # domain + config (P0)
│ ├── kb-source-fs/ # workspace walk + checksum (P1-1)
│ ├── kb-parse-md/ # Markdown frontmatter + blocks (P1-2/3)
│ ├── kb-normalize/ # ParsedBlock → CanonicalDocument (P1-4)
│ ├── kb-chunk/ # heading-aware chunker (P1-5)
│ ├── kb-store-sqlite/ # SQLite + FTS5 (V001/V002/V003) (P1-6, P2-1, P3-3)
│ ├── kb-search/ # Lexical + Vector + Hybrid retriever (P2-2, P3-4)
│ ├── kb-embed/ kb-embed-local/ # Embedder trait + fastembed adapter (P3-1, P3-2)
│ ├── kb-store-vector/ # LanceDB VectorStore (P3-3)
│ ├── kb-llm/ kb-llm-local/ # LanguageModel trait + Ollama adapter (P4-1, P4-2)
│ ├── kb-rag/ # RAG pipeline (P4-3)
│ ├── kb-app/ # facade (P0 signatures + P3-5 body)
│ └── kb-cli/ # binary (P0 → --config flag wiring hardened by hotfix)
│ ├── kebab-core/ kebab-parse-types/ kebab-config/ # domain + config (P0)
│ ├── kebab-source-fs/ # workspace walk + checksum (P1-1)
│ ├── kebab-parse-md/ # Markdown frontmatter + blocks (P1-2/3)
│ ├── kebab-normalize/ # ParsedBlock → CanonicalDocument (P1-4)
│ ├── kebab-chunk/ # heading-aware chunker (P1-5)
│ ├── kebab-store-sqlite/ # SQLite + FTS5 (V001/V002/V003) (P1-6, P2-1, P3-3)
│ ├── kebab-search/ # Lexical + Vector + Hybrid retriever (P2-2, P3-4)
│ ├── kebab-embed/ kebab-embed-local/ # Embedder trait + fastembed adapter (P3-1, P3-2)
│ ├── kebab-store-vector/ # LanceDB VectorStore (P3-3)
│ ├── kebab-llm/ kebab-llm-local/ # LanguageModel trait + Ollama adapter (P4-1, P4-2)
│ ├── kebab-rag/ # RAG pipeline (P4-3)
│ ├── kebab-app/ # facade (P0 signatures + P3-5 body)
│ └── kebab-cli/ # binary (P0 → --config flag wiring hardened by hotfix)
├── migrations/ # SQLite refinery V001/V002/V003
└── fixtures/ # test fixture tree
```
@@ -153,19 +153,19 @@ kb/
cargo build --release
# first run: creates config.toml under the XDG path
./target/release/kb init
./target/release/kebab init
# tweak the config
${EDITOR:-vi} ~/.config/kb/config.toml
${EDITOR:-vi} ~/.config/kebab/config.toml
# index
./target/release/kb ingest
./target/release/kebab ingest
# search
./target/release/kb search "Markdown chunking rules" --mode hybrid
./target/release/kebab search "Markdown chunking rules" --mode hybrid
# ask (requires Ollama)
./target/release/kb ask "What is the storage strategy in my KB design?"
./target/release/kebab ask "What is the storage strategy in my KB design?"
```
For an isolated-workspace run, see [docs/SMOKE.md](docs/SMOKE.md): `--config <path>` lets you build an isolated KB in a temporary directory.
@@ -181,17 +181,17 @@ ${EDITOR:-vi} ~/.config/kb/config.toml
- multi-workspace (P+, lower priority)
- LLM-as-judge eval (only rule-based `must_contain`)
- visual embedding (CLIP) — P+
- desktop app `kb://` protocol handler — P+
- desktop app `kebab://` protocol handler — P+
---
## External AI integration
The `--json` mode of `kb` plus the frozen wire schema v1 form the stable contract for external automation. Possible integrations:
The `--json` mode of `kebab` plus the frozen wire schema v1 form the stable contract for external automation. Possible integrations:
1. **Claude Code / Codex skill**: a thin wrapper (calls `kb search --json` / `kb ask --json`). ~50 lines.
2. **MCP server**: a `kb-mcp` binary (stdio JSON-RPC) exposes the `kb-app` facade 1:1. Shared by Claude Desktop / Cursor / Zed and others.
3. **HTTP wrapper**: `kb serve --bind 127.0.0.1:7711` (P+; approach with care because it undermines the local-only value).
1. **Claude Code / Codex skill**: a thin wrapper (calls `kebab search --json` / `kebab ask --json`). ~50 lines.
2. **MCP server**: a `kebab-mcp` binary (stdio JSON-RPC) exposes the `kebab-app` facade 1:1. Shared by Claude Desktop / Cursor / Zed and others.
3. **HTTP wrapper**: `kebab serve --bind 127.0.0.1:7711` (P+; approach with care because it undermines the local-only value).
---
@@ -199,7 +199,7 @@ ${EDITOR:-vi} ~/.config/kb/config.toml
This repo is a single-user project, but the spec-change procedure is written down.
1. **Changing the frozen design**: `docs/superpowers/specs/2026-04-27-kb-final-form-design.md` is the single contract. A change requires updating every affected component task at the same time, bundled into one PR.
1. **Changing the frozen design**: `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` is the single contract. A change requires updating every affected component task at the same time, bundled into one PR.
2. **Adding a new component task**: copy `tasks/_template.md` to `tasks/p<phase>/p<phase>-<n>-<name>.md`. List the design doc sections in `contract_sections`. `Allowed/Forbidden dependencies` follow the module-boundary table in design §8.
3. **Implementation**: one sub-agent session per component task is recommended. `cargo test -p <crate>` and the DoD checklist must pass. Merge via PR.
4. **Version changes**: changes to `parser_version` / `chunker_version` / `embedding_version` and the like follow the cascade rule in design §9. Affected records must be reprocessed.
@@ -215,8 +215,8 @@ ${EDITOR:-vi} ~/.config/kb/config.toml
## References
- Initial design report: [kb_local_rust_report.md](kb_local_rust_report.md)
- Frozen design: [docs/superpowers/specs/2026-04-27-kb-final-form-design.md](docs/superpowers/specs/2026-04-27-kb-final-form-design.md)
- Initial design report: [kebab_local_rust_report.md](kebab_local_rust_report.md)
- Frozen design: [docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](docs/superpowers/specs/2026-04-27-kebab-final-form-design.md)
- Task decomposition plan: [docs/superpowers/plans/2026-04-27-task-decomposition.md](docs/superpowers/plans/2026-04-27-task-decomposition.md)
- Task index: [tasks/INDEX.md](tasks/INDEX.md)
- Post-merge hotfix log: [tasks/HOTFIXES.md](tasks/HOTFIXES.md)


@@ -1,21 +1,21 @@
---
title: "kb smoke run guide"
title: "kebab smoke run guide"
date: 2026-05-01
---
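# kb smoke run guide
# kebab smoke run guide
Starting from the P3-5 merge (`kb-app::ingest` / `search` / `list` / `inspect` wiring), and from the P4-3 merge (`kb ask` wiring), users can verify their own install directly. This document describes how to bring up an isolated KB in a temporary directory, without touching the user environment (`~/.config/kb/`, `~/.local/share/kb/`), and run the whole pipeline once within a single session.
Starting from the P3-5 merge (`kebab-app::ingest` / `search` / `list` / `inspect` wiring), and from the P4-3 merge (`kebab ask` wiring), users can verify their own install directly. This document describes how to bring up an isolated KB in a temporary directory, without touching the user environment (`~/.config/kebab/`, `~/.local/share/kebab/`), and run the whole pipeline once within a single session.
## Preparation
Build: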
```bash
cargo build --release -p kb-cli # a debug build works too, and builds faster.
cargo build --release -p kebab-cli # a debug build works too, and builds faster.
```
Remote Ollama (optional, needed only for `kb ask`):
Remote Ollama (optional, needed only for `kebab ask`):
```bash
# on a separate host such as a Mac
@@ -34,8 +34,8 @@ curl http://<host>:11434/api/tags
## Create an isolated workspace
```bash
mkdir -p /tmp/kb-smoke/{workspace,data}
cat > /tmp/kb-smoke/workspace/intro.md <<'EOF'
mkdir -p /tmp/kebab-smoke/{workspace,data}
cat > /tmp/kebab-smoke/workspace/intro.md <<'EOF'
---
title: 인사말
tags: [demo]
@@ -51,19 +51,19 @@ EOF
## Isolated config
`/tmp/kb-smoke/config.toml`:
`/tmp/kebab-smoke/config.toml`:
```toml
schema_version = 1
[workspace]
root = "/tmp/kb-smoke/workspace"
root = "/tmp/kebab-smoke/workspace"
include = ["**/*.md"]
exclude = [".git/**", "node_modules/**", ".obsidian/**"]
[storage]
data_dir = "/tmp/kb-smoke/data"
sqlite = "{data_dir}/kb.sqlite"
data_dir = "/tmp/kebab-smoke/data"
sqlite = "{data_dir}/kebab.sqlite"
vector_dir = "{data_dir}/lancedb"
asset_dir = "{data_dir}/assets"
artifact_dir = "{data_dir}/artifacts"
@@ -110,12 +110,12 @@ explain_default = false
max_context_tokens = 6000
```
`KB_*` environment variables can override settings (e.g. `KB_MODELS_LLM_MODEL=qwen2.5:32b kb …`). For the full key list, see the `apply_env` match arms in `crates/kb-config/src/lib.rs`.
`KEBAB_*` environment variables can override settings (e.g. `KEBAB_MODELS_LLM_MODEL=qwen2.5:32b kebab …`). For the full key list, see the `apply_env` match arms in `crates/kebab-config/src/lib.rs`.
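The override mechanism can be pictured roughly as below. This is a hedged sketch, not the real `apply_env` in `kebab-config`: the `Config` and `Models` structs, the field names, and the `KEBAB_MODELS_EMBEDDING_MODEL` key are assumptions made for illustration. Only `KEBAB_MODELS_LLM_MODEL` appears in the guide above.

```rust
// Hypothetical, trimmed-down config shape; the real Config has more sections.
#[derive(Debug, Default, PartialEq)]
struct Models {
    llm_model: String,
    embedding_model: String,
}

#[derive(Debug, Default, PartialEq)]
struct Config {
    models: Models,
}

/// Applies `KEBAB_*` overrides from an iterator of (name, value) pairs.
/// Taking an iterator instead of reading the process environment directly
/// keeps the sketch testable; `std::env::vars()` yields the same item type.
/// Returns the `KEBAB_*` names that matched no known key.
fn apply_env<I>(cfg: &mut Config, vars: I) -> Vec<String>
where
    I: IntoIterator<Item = (String, String)>,
{
    let mut unknown = Vec::new();
    for (key, value) in vars {
        // Variables without the prefix belong to someone else; skip them.
        let Some(rest) = key.strip_prefix("KEBAB_") else {
            continue;
        };
        match rest {
            "MODELS_LLM_MODEL" => cfg.models.llm_model = value,
            // Assumed key, shown only to illustrate a second match arm.
            "MODELS_EMBEDDING_MODEL" => cfg.models.embedding_model = value,
            _ => unknown.push(key.clone()),
        }
    }
    unknown
}
```

In a real binary the call would look like `apply_env(&mut cfg, std::env::vars())`. Returning the unrecognized names, rather than ignoring them, is one way to surface typos such as `KEBAB_MODEL_LLM_MODEL`.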
## Command sequence
```bash
KB() { ./target/debug/kb --config /tmp/kb-smoke/config.toml "$@"; }
KB() { ./target/debug/kebab --config /tmp/kebab-smoke/config.toml "$@"; }
KB doctor # 1. health check
KB ingest # 2. index the workspace
@@ -128,31 +128,31 @@ KB ask "이 KB 안에서 ..." --mode hybrid --k 5 # 8. RAG 답변 (Ollama
KB --json ask "..." --mode hybrid # 9. verify machine-friendly output
```
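Exit code 0 means the command succeeded. `kb ask` exits with code 1 on refusal (`RefusalSignal`), which is intended behavior.
Exit code 0 means the command succeeded. `kebab ask` exits with code 1 on refusal (`RefusalSignal`), which is intended behavior.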
## Verification checklist
- `kb doctor` honors the `--config` path and prints the `storage.data_dir` from it (not the XDG default).
- `kb ingest` is idempotent: the second run reports `new=0 updated=N`.
- `kb list docs` output shows the deterministic `doc_id` (32-hex) + `workspace_path`, not the frontmatter `title`.
- `kb search --mode hybrid` yields a `fusion_score` in the `[0, 1]` range (top-1 is often 1.0, when both retrievers rank it first).
- In the `kb ask` JSON response, `model.id` matches the configured model (`gemma4:26b` etc.), `embedding.id = multilingual-e5-small`, and `citations[].marker` has the `[1]` / `[2]` form (square-bracketed bare index).
- `kb ask` on a topic absent from the corpus returns `refusal_reason: "llm_self_judge"` (or `no_chunks` / `score_gate`) + `grounded: false`.
- `kebab doctor` honors the `--config` path and prints the `storage.data_dir` from it (not the XDG default).
- `kebab ingest` is idempotent: the second run reports `new=0 updated=N`.
- `kebab list docs` output shows the deterministic `doc_id` (32-hex) + `workspace_path`, not the frontmatter `title`.
- `kebab search --mode hybrid` yields a `fusion_score` in the `[0, 1]` range (top-1 is often 1.0, when both retrievers rank it first).
- In the `kebab ask` JSON response, `model.id` matches the configured model (`gemma4:26b` etc.), `embedding.id = multilingual-e5-small`, and `citations[].marker` has the `[1]` / `[2]` form (square-bracketed bare index).
- `kebab ask` on a topic absent from the corpus returns `refusal_reason: "llm_self_judge"` (or `no_chunks` / `score_gate`) + `grounded: false`.
## Cleanup
```bash
rm -rf /tmp/kb-smoke/data # wipe only the data; ingest can be re-run
rm -rf /tmp/kb-smoke # remove everything
rm -rf /tmp/kebab-smoke/data # wipe only the data; ingest can be re-run
rm -rf /tmp/kebab-smoke # remove everything
```
`~/.config/kb/` and `~/.local/share/kb/` are never touched (as long as the `--config` flag is honored correctly, which is guaranteed since the P3-5 hotfix).
`~/.config/kebab/` and `~/.local/share/kebab/` are never touched (as long as the `--config` flag is honored correctly, which is guaranteed since the P3-5 hotfix).
## Known behavior
- On `kb ingest`, the fastembed model (~470MB) is downloaded and cached under `data_dir/models/fastembed/`.
- `kb ask` response time depends on LLM token throughput. On an M4 Pro 48GB with gemma4:26b, a 50–100 token answer takes 20–55 s.
- If the `--config` path does not exist or is malformed, `kb doctor` hard-fails (hotfix behavior, so that defaults cannot silently mask the problem).
- On `kebab ingest`, the fastembed model (~470MB) is downloaded and cached under `data_dir/models/fastembed/`.
- `kebab ask` response time depends on LLM token throughput. On an M4 Pro 48GB with gemma4:26b, a 50–100 token answer takes 20–55 s.
- If the `--config` path does not exist or is malformed, `kebab doctor` hard-fails (hotfix behavior, so that defaults cannot silently mask the problem).
- Every CLI invocation pays the fastembed model init cost (~4 s) because there is no process-level cache. Once the P9 TUI lands, an `OnceLock` in `App` will init the model only once per session.
See [tasks/HOTFIXES.md](../tasks/HOTFIXES.md) for the detailed history and the bugs found.


@@ -5,8 +5,8 @@ When implementing tasks against this codebase:
- Treat the frozen design doc as the single source of truth. Do not invent
new fields, traits, or enum variants.
- Prefer editing existing files to creating new ones; reuse types from
`kb-core` instead of duplicating shapes.
`kebab-core` instead of duplicating shapes.
- For each task, follow the task spec under `tasks/p<N>/p<N>-<i>.md`.
Canonical source:
[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §11 + §12.
[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §11 + §12.


@@ -4,4 +4,4 @@ Medium-agnostic representation of a document with `Block`s, `SourceSpan`s,
and provenance.
Canonical source:
[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §3.4 + §3.7a.
[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §3.4 + §3.7a.


@@ -5,4 +5,4 @@
`policy_hash` so chunk IDs include the policy.
Canonical source:
[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §3.5 + §7.1 + §7.2.
[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §3.5 + §7.1 + §7.2.


@@ -4,4 +4,4 @@ Citations use W3C Media Fragments URIs to locate evidence inside a
document. Five variants: `Line`, `Page`, `Region`, `Caption`, `Time`.
Canonical source:
[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §3.5 + §0 Q3.
[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §3.5 + §0 Q3.
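As an illustration of the `Line` variant, a citation such as `path#L12-L34` can be split into a path and an inclusive line range. This is a minimal sketch under stated assumptions: the `LineCitation` struct and `parse_line_citation` function are hypothetical names, not types from `kebab-core`, and accepting a single-line form like `#L12` is a guess rather than something the design doc is known to specify.

```rust
/// Hypothetical parsed form of a line citation; not the kebab-core type.
#[derive(Debug, PartialEq)]
struct LineCitation {
    path: String,
    start: u32, // 1-based, inclusive
    end: u32,   // 1-based, inclusive
}

/// Parses `path#L<start>-L<end>` (or, by assumption, `path#L<n>`).
/// Returns None for a missing fragment, malformed numbers, line 0,
/// or a range whose start is after its end.
fn parse_line_citation(uri: &str) -> Option<LineCitation> {
    // Split on the last '#', so the path itself may contain '#'.
    let (path, fragment) = uri.rsplit_once('#')?;
    let (start, end) = match fragment.split_once('-') {
        Some((a, b)) => (a, b),
        None => (fragment, fragment),
    };
    let start: u32 = start.strip_prefix('L')?.parse().ok()?;
    let end: u32 = end.strip_prefix('L')?.parse().ok()?;
    if path.is_empty() || start == 0 || start > end {
        return None;
    }
    Some(LineCitation { path: path.to_string(), start, end })
}
```

For example, `parse_line_citation("notes/intro.md#L12-L34")` yields the path `notes/intro.md` with lines 12 through 34, while a URI without a fragment yields `None`.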


@@ -1,6 +1,6 @@
# Domain model
The domain types live in `kb-core` and mirror the frozen design exactly.
The domain types live in `kebab-core` and mirror the frozen design exactly.
Canonical source:
[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §3.
[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §3.


@@ -1,6 +1,6 @@
# ID recipe
All `kb-*` IDs are 32 hex chars: the first 32 of `blake3(canonical_json(tuple))`.
All `kebab-*` IDs are 32 hex chars: the first 32 of `blake3(canonical_json(tuple))`.
Canonical source:
[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §4.
[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §4.
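The truncation step of the recipe can be sketched on its own. The example below assumes the full digest is already available as a lowercase hex string, for instance from the `blake3` crate applied to the canonical JSON bytes; that hashing step is deliberately left out because the Rust standard library has no BLAKE3. `short_id` and `ID_HEX_LEN` are hypothetical names.

```rust
/// Number of hex characters kept from the full digest (32 hex chars = 128 bits).
const ID_HEX_LEN: usize = 32;

/// Truncates a full lowercase-hex digest to the 32-character ID form.
/// Returns None if the input is shorter than 32 characters or contains
/// anything other than lowercase hex digits.
fn short_id(full_hex: &str) -> Option<&str> {
    let is_hex = full_hex
        .bytes()
        .all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    if !is_hex || full_hex.len() < ID_HEX_LEN {
        return None;
    }
    // Safe to slice by byte index: the string is pure ASCII at this point.
    Some(&full_hex[..ID_HEX_LEN])
}
```

A 256-bit BLAKE3 digest prints as 64 hex characters, so the ID keeps exactly the first half.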


@@ -1,8 +1,8 @@
# Module boundaries
`kb-core` is leaf — every other crate depends on it. Parsers depend on
`kb-parse-types` (not on `kb-normalize`); `kb-normalize` depends on
`kb-parse-types` (not on parsers). UI crates depend only on `kb-app`.
`kebab-core` is leaf — every other crate depends on it. Parsers depend on
`kebab-parse-types` (not on `kebab-normalize`); `kebab-normalize` depends on
`kebab-parse-types` (not on parsers). UI crates depend only on `kebab-app`.
Canonical source:
[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §8.
[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §8.
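The rules above can be expressed as a small dependency-edge check. This is only a sketch of the stated rules, not a tool that exists in the repo: the function names are hypothetical, the UI crate list is taken from the README dependency graph, and treating a dependency on `kebab-core` as always allowed (even for UI crates) is an assumption drawn from "every other crate depends on it".

```rust
/// UI crates as listed in the README dependency graph.
fn is_ui(krate: &str) -> bool {
    matches!(krate, "kebab-cli" | "kebab-tui" | "kebab-desktop")
}

/// Concrete parsers: every `kebab-parse-*` crate except the shared types crate.
fn is_parser(krate: &str) -> bool {
    krate.starts_with("kebab-parse-") && krate != "kebab-parse-types"
}

/// Returns true if `from` may depend on `to` under the module-boundary rules.
fn edge_allowed(from: &str, to: &str) -> bool {
    // kebab-core is the leaf: it depends on no other workspace crate.
    if from == "kebab-core" {
        return false;
    }
    // Assumption: every other crate, UI crates included, may depend on kebab-core.
    if to == "kebab-core" {
        return true;
    }
    // UI crates go through the kebab-app facade only.
    if is_ui(from) {
        return to == "kebab-app";
    }
    // Parsers and kebab-normalize meet only via kebab-parse-types.
    if is_parser(from) && to == "kebab-normalize" {
        return false;
    }
    if from == "kebab-normalize" && is_parser(to) {
        return false;
    }
    true
}
```

A check like this could be fed the edges printed by `cargo tree` to catch a boundary violation before review; the per-task Allowed/Forbidden lists would add further restrictions on top.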


@@ -8,11 +8,11 @@
- Phase A — Author the canonical task spec template (`tasks/_template.md`).
- Phase B — Decompose P1 (Markdown ingestion) into 6 component task specs to validate the template.
- Phase C — After Phase B passes review, decompose P0 + P2..P9 into the remaining ~24 task specs in one pass.
- Each task spec cites only the frozen design doc (`docs/superpowers/specs/2026-04-27-kb-final-form-design.md`) for types, traits, schema, layout. No new domain types or traits are introduced inside task specs.
- Each task spec cites only the frozen design doc (`docs/superpowers/specs/2026-04-27-kebab-final-form-design.md`) for types, traits, schema, layout. No new domain types or traits are introduced inside task specs.
**Tech Stack:** Plain Markdown documents under `tasks/`. No code changes in this plan — produces specs that AI sub-agents will later implement against.
**Frozen contract source:** [docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../specs/2026-04-27-kb-final-form-design.md). All task specs reference this. Modifications to the contract require updating that file first, then re-checking dependent task specs.
**Frozen contract source:** [docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../specs/2026-04-27-kebab-final-form-design.md). All task specs reference this. Modifications to the contract require updating that file first, then re-checking dependent task specs.
**Phase task index (target file layout):**
@@ -83,7 +83,7 @@ Existing per-phase epic files (`tasks/phase-0-skeleton.md` … `phase-9-ui.md`)
- [ ] **Step 1: Verify the design doc path resolves**
```bash
test -f docs/superpowers/specs/2026-04-27-kb-final-form-design.md && echo OK
test -f docs/superpowers/specs/2026-04-27-kebab-final-form-design.md && echo OK
```
Expected: `OK`
@@ -101,7 +101,7 @@ title: "<Component title>"
status: planned
depends_on: [] # other task_ids
unblocks: [] # other task_ids
contract_source: ../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [] # e.g. [§3.5, §5.5, §7.2]
---
@@ -117,7 +117,7 @@ contract_sections: [] # e.g. [§3.5, §5.5, §7.2]
## Allowed dependencies
- `kb-core`
- `kebab-core`
- <other crates per design §8>
- <external crates with versions>
@@ -166,7 +166,7 @@ If any item here is needed during implementation, STOP and update the frozen des
| unit | ... | ... |
| snapshot | ... (JSON freeze) | `fixtures/...` |
| contract | trait round-trip | mock impls |
| integration | end-to-end via `kb-app` facade | tmp workspace |
| integration | end-to-end via `kebab-app` facade | tmp workspace |
All tests must run under `cargo test -p <crate>` and not require external network or Ollama unless explicitly stated.
@@ -247,7 +247,7 @@ git add tasks/p1 tasks/INDEX.md
git commit -m "tasks: prepare P1 component decomposition skeleton"
```
### Task B1: `p1-1-source-fs.md` (kb-source-fs)
### Task B1: `p1-1-source-fs.md` (kebab-source-fs)
**Files:**
- Create: `tasks/p1/p1-1-source-fs.md`
@@ -259,13 +259,13 @@ Write the following content to `tasks/p1/p1-1-source-fs.md`:
````markdown
---
phase: P1
component: kb-source-fs
component: kebab-source-fs
task_id: p1-1
title: "Local filesystem source connector"
status: planned
depends_on: [p0-1]
unblocks: [p1-2, p1-3, p1-4, p1-5, p1-6]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.3, §6.2, §6.6, §7.1, §7.2 SourceConnector, §8]
---
@@ -281,8 +281,8 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `ignore` (gitignore semantics)
- `blake3`
- `walkdir`
@@ -293,21 +293,21 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
## Forbidden dependencies
- `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `SourceScope` | `kb_core::SourceScope` | `kb-app` from config |
| `SourceScope` | `kebab_core::SourceScope` | `kebab-app` from config |
| filesystem | `&Path` | OS |
| `.kbignore` | text file | workspace root, optional |
| `.kebabignore` | text file | workspace root, optional |
## Outputs
| output | type | downstream consumer |
|--------|------|---------------------|
| `Vec<RawAsset>` | `kb_core::RawAsset` | `kb-parse-md`, asset writer in `kb-store-sqlite` (via `kb-app`) |
| `Vec<RawAsset>` | `kebab_core::RawAsset` | `kebab-parse-md`, asset writer in `kebab-store-sqlite` (via `kebab-app`) |
## Public surface (signatures only — no new types)
@@ -315,11 +315,11 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
pub struct FsSourceConnector { /* internal */ }
impl FsSourceConnector {
pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>;
pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
}
impl kb_core::SourceConnector for FsSourceConnector {
fn scan(&self, scope: &kb_core::SourceScope) -> anyhow::Result<Vec<kb_core::RawAsset>>;
impl kebab_core::SourceConnector for FsSourceConnector {
fn scan(&self, scope: &kebab_core::SourceScope) -> anyhow::Result<Vec<kebab_core::RawAsset>>;
}
```
@@ -329,7 +329,7 @@ impl kb_core::SourceConnector for FsSourceConnector {
- `asset_id` derived per design §4.2 from `blake3(raw bytes)` full hex.
- `media_type` selected from extension + libmagic-like sniff fallback (`.md` → Markdown, others fall through to `MediaType::Other`).
- `discovered_at` = current `OffsetDateTime::now_utc()` at scan time.
- Combine `config.workspace.exclude` and `.kbignore` for the filter (union; ordering does not matter).
- Combine `config.workspace.exclude` and `.kebabignore` for the filter (union; ordering does not matter).
- Symbolic links: follow once, detect cycles via `canonicalize` + visited set.
- Files larger than `storage.copy_threshold_mb` MB → emit `AssetStorage::Reference { path, sha }` (do not copy bytes here; copying is done by the asset writer task).
- Idempotent: same input → same `Vec<RawAsset>` (sort by `workspace_path`).
@@ -337,7 +337,7 @@ impl kb_core::SourceConnector for FsSourceConnector {
## Storage / wire effects
- Reads: filesystem under `config.workspace.root`.
- Writes: nothing. (Asset copy is handled by the asset writer in `kb-store-sqlite`.)
- Writes: nothing. (Asset copy is handled by the asset writer in `kebab-store-sqlite`.)
## Test plan
@@ -346,25 +346,25 @@ impl kb_core::SourceConnector for FsSourceConnector {
| unit | POSIX path normalization | inline cases incl. `./a/b.md`, `a//b.md`, `a/b.md` → identical |
| unit | blake3 of known bytes matches expected hex | inline |
| unit | gitignore filter (`*.tmp`, `node_modules/**`) excludes correctly | tmp tree built in test |
| unit | `.kbignore` + config exclude works | tmp tree |
| unit | `.kebabignore` + config exclude works | tmp tree |
| unit | symlink cycle does not loop | tmp tree with `a -> b -> a` |
| snapshot | `Vec<RawAsset>` serialized JSON for fixture tree is stable | `fixtures/source-fs/tree-1` |
| determinism | re-running scan twice produces byte-identical JSON | `fixtures/source-fs/tree-1` |
All tests run under `cargo test -p kb-source-fs` with no network and no model.
All tests run under `cargo test -p kebab-source-fs` with no network and no model.
## Definition of Done
- [ ] `cargo check -p kb-source-fs` passes
- [ ] `cargo test -p kb-source-fs` passes
- [ ] `cargo check -p kebab-source-fs` passes
- [ ] `cargo test -p kebab-source-fs` passes
- [ ] Snapshot test `fixtures/source-fs/tree-1` round-trips deterministically
- [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kb-source-fs`)
- [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kebab-source-fs`)
- [ ] PR description links to design §3.3, §6.2, §7.2
## Out of scope
- File watching (P+).
- Asset copy/reference storage on disk (`kb-store-sqlite` task p1-6).
- Asset copy/reference storage on disk (`kebab-store-sqlite` task p1-6).
- Non-fs source connectors (HTTP, S3 — P+).
## Risks / notes
@@ -400,13 +400,13 @@ Write to `tasks/p1/p1-2-parse-md-frontmatter.md`:
````markdown
---
phase: P1
component: kb-parse-md (frontmatter submodule)
component: kebab-parse-md (frontmatter submodule)
task_id: p1-2
title: "Markdown frontmatter parsing → Metadata"
status: planned
depends_on: [p0-1]
unblocks: [p1-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.6 Metadata, §0 Q9 frontmatter, §10 errors]
---
@@ -414,15 +414,15 @@ contract_sections: [§3.6 Metadata, §0 Q9 frontmatter, §10 errors]
## Goal
Parse YAML/TOML frontmatter from Markdown bytes into `kb_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`.
Parse YAML/TOML frontmatter from Markdown bytes into `kebab_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`.
## Why now / why this size
Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kb-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9.
Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kebab-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9.
## Allowed dependencies
- `kb-core`
- `kebab-core`
- `serde`
- `serde_yaml` (or `yaml-rust2`) for YAML
- `toml` for TOML
@@ -432,7 +432,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
## Forbidden dependencies
- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-tui`, `kb-desktop`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `pulldown-cmark` (block parser is a sibling task)
- `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-tui`, `kebab-desktop`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `pulldown-cmark` (block parser is a sibling task)
## Inputs
@@ -445,7 +445,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
| output | type | downstream |
|--------|------|------------|
| `(Metadata, Option<FrontmatterSpan>, Vec<Warning>)` | tuple | `kb-normalize` → CanonicalDocument |
| `(Metadata, Option<FrontmatterSpan>, Vec<Warning>)` | tuple | `kebab-normalize` → CanonicalDocument |
## Public surface (signatures only — no new types)
@@ -453,7 +453,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
pub fn parse_frontmatter(
bytes: &[u8],
hints: &BodyHints,
) -> anyhow::Result<(kb_core::Metadata, Option<FrontmatterSpan>, Vec<Warning>)>;
) -> anyhow::Result<(kebab_core::Metadata, Option<FrontmatterSpan>, Vec<Warning>)>;
```
`FrontmatterSpan` and `Warning` are crate-internal helpers; if any new public type is needed, STOP and update the frozen design doc first.
@@ -490,12 +490,12 @@ pub fn parse_frontmatter(
| snapshot | `fixtures/markdown/frontmatter-only.md` produces stable JSON | fixture |
| snapshot | mixed-language body with no `lang:` detects `ko` or `en` | `fixtures/markdown/mixed-lang.md` |
All tests under `cargo test -p kb-parse-md --lib frontmatter`.
All tests under `cargo test -p kebab-parse-md --lib frontmatter`.
## Definition of Done
- [ ] `cargo check -p kb-parse-md` passes
- [ ] `cargo test -p kb-parse-md frontmatter` passes
- [ ] `cargo check -p kebab-parse-md` passes
- [ ] `cargo test -p kebab-parse-md frontmatter` passes
- [ ] No `pulldown-cmark` import in this submodule
- [ ] Snapshot tests stable across two consecutive runs
- [ ] PR links design §0 Q9, §3.6
@@ -534,13 +534,13 @@ Write to `tasks/p1/p1-3-parse-md-blocks.md`:
````markdown
---
phase: P1
component: kb-parse-md (blocks submodule)
component: kebab-parse-md (blocks submodule)
task_id: p1-3
title: "Markdown body → Block tree with line spans"
status: planned
depends_on: [p0-1]
unblocks: [p1-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 Block, §3.4 SourceSpan, §0 Q3 citation]
---
@@ -548,7 +548,7 @@ contract_sections: [§3.4 Block, §3.4 SourceSpan, §0 Q3 citation]
## Goal
Parse Markdown body bytes into a flat `Vec<ParsedBlock>` (intermediate, crate-private) with heading paths and line ranges preserved, ready for `kb-normalize` to lift into `CanonicalDocument`.
Parse Markdown body bytes into a flat `Vec<ParsedBlock>` (intermediate, crate-private) with heading paths and line ranges preserved, ready for `kebab-normalize` to lift into `CanonicalDocument`.
## Why now / why this size
@@ -556,14 +556,14 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
## Allowed dependencies
- `kb-core`
- `kebab-core`
- `pulldown-cmark` (CommonMark with source-map; GFM tables enabled via feature)
- `serde`
- `thiserror`
## Forbidden dependencies
- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `kb-tui`, `kb-desktop`, `comrak` (alternative parser; pick one)
- `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `kebab-tui`, `kebab-desktop`, `comrak` (alternative parser; pick one)
## Inputs
@@ -576,7 +576,7 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
| output | type | downstream |
|--------|------|------------|
| `Vec<ParsedBlock>` (intermediate type, crate-private) | | `kb-normalize` |
| `Vec<ParsedBlock>` (intermediate type, crate-private) | | `kebab-normalize` |
| `Vec<Warning>` | | propagated into Provenance |
## Public surface (signatures only — no new types)
@@ -585,7 +585,7 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
pub fn parse_blocks(body: &[u8], body_offset_lines: u32) -> anyhow::Result<(Vec<ParsedBlock>, Vec<Warning>)>;
```
`ParsedBlock` is a crate-internal mirror that maps 1:1 to `kb_core::Block` variants once `kb-normalize` assigns `BlockId`s.
`ParsedBlock` is a crate-internal mirror that maps 1:1 to `kebab_core::Block` variants once `kebab-normalize` assigns `BlockId`s.
## Behavior contract
@@ -593,7 +593,7 @@ pub fn parse_blocks(body: &[u8], body_offset_lines: u32) -> anyhow::Result<(Vec<
- Heading tree: every block records its ancestor heading texts in order (e.g., `["아키텍처", "Chunking 정책"]`).
- Code blocks: language tag preserved (` ```rust ` → `Some("rust")`), fenced content not split.
- Tables: GFM tables produce `TableBlock` with header row + body rows; if a table cell is malformed, fall back to a `Paragraph` block + warning.
- Image references: `![alt](src)` produces `ImageRefBlock` with `asset_id = None`, `src = "..."`, `alt = "..."`. Resolution to `AssetId` happens later in `kb-normalize`.
- Image references: `![alt](src)` produces `ImageRefBlock` with `asset_id = None`, `src = "..."`, `alt = "..."`. Resolution to `AssetId` happens later in `kebab-normalize`.
- Lists: ordered/unordered preserved; nested list items flattened into one `ListBlock` with each top-level item's text.
- Inline elements: only `Text`, `Code`, `Link`, `Strong`, `Emph` (per design §3.4). Drop other inlines silently.
- Malformed input never panics. Worst case: empty `Vec<ParsedBlock>` + warning.
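
The heading-tree rule above can be sketched with a small stack. This is only an illustration under assumptions: `HeadingStack` and its methods are hypothetical names that the spec does not define, and the sketch takes heading levels and texts as already extracted by the parser.

```rust
/// Tracks the chain of ancestor headings while blocks are visited in document order.
/// `HeadingStack` is a hypothetical helper name, not a type from the design doc.
#[derive(Default)]
pub struct HeadingStack {
    // (level, text) pairs, outermost heading first.
    entries: Vec<(u8, String)>,
}

impl HeadingStack {
    /// Call when a heading of `level` (1..=6) is encountered.
    pub fn enter(&mut self, level: u8, text: &str) {
        // A new heading closes every open heading at the same or a deeper level.
        while matches!(self.entries.last(), Some((open, _)) if *open >= level) {
            self.entries.pop();
        }
        self.entries.push((level, text.to_string()));
    }

    /// Heading path to attach to the next block that follows the current heading.
    pub fn path(&self) -> Vec<String> {
        self.entries.iter().map(|(_, text)| text.clone()).collect()
    }
}
```

Entering `# 아키텍처` and then `## Chunking 정책` yields the path `["아키텍처", "Chunking 정책"]`; a later sibling `##` heading replaces only the last element.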
@@ -616,12 +616,12 @@ pub fn parse_blocks(body: &[u8], body_offset_lines: u32) -> anyhow::Result<(Vec<
| snapshot | `fixtures/markdown/nested-headings.md` → ParsedBlock JSON stable | fixture |
| snapshot | `fixtures/markdown/code-and-table.md` → JSON stable | fixture |
All tests under `cargo test -p kb-parse-md --lib blocks`.
All tests under `cargo test -p kebab-parse-md --lib blocks`.
## Definition of Done
- [ ] `cargo check -p kb-parse-md` passes
- [ ] `cargo test -p kb-parse-md blocks` passes
- [ ] `cargo check -p kebab-parse-md` passes
- [ ] `cargo test -p kebab-parse-md blocks` passes
- [ ] Snapshot tests stable across two runs
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.4
@@ -629,7 +629,7 @@ All tests under `cargo test -p kb-parse-md --lib blocks`.
## Out of scope
- Frontmatter (p1-2).
- Lifting `ParsedBlock` → `kb_core::Block` with `BlockId` (p1-4).
- Lifting `ParsedBlock` → `kebab_core::Block` with `BlockId` (p1-4).
- Chunking (p1-5).
## Risks / notes
@@ -658,13 +658,13 @@ Write to `tasks/p1/p1-4-normalize.md`:
````markdown
---
phase: P1
component: kb-normalize
component: kebab-normalize
task_id: p1-4
title: "Lift parser output → CanonicalDocument with deterministic IDs"
status: planned
depends_on: [p1-2, p1-3]
unblocks: [p1-5, p1-6]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4, §4 ID recipe, §3.6 Provenance]
---
@@ -676,12 +676,12 @@ Combine `Metadata` (p1-2) + `Vec<ParsedBlock>` (p1-3) + `RawAsset` (p1-1) into a
## Why now / why this size
Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate.
Single responsibility: ID generation + struct assembly. Keeps `kebab-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate.
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`
- `serde-json-canonicalizer` (canonical JSON for ID hashing)
- `blake3`
@@ -691,38 +691,38 @@ Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` pure
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md` (consumed via plain types only — must not couple back), `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md` (consumed via plain types only — must not couple back), `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
Note: this crate accepts `ParsedBlock` from `kb-parse-md` either by (a) exposing `ParsedBlock` as a `kb-core` type, or (b) `kb-parse-md` re-exporting via a public DTO. Pick (a): move `ParsedBlock` into `kb-core` so this task does not import `kb-parse-md`.
Note: this crate accepts `ParsedBlock` from `kebab-parse-md` either by (a) exposing `ParsedBlock` as a `kebab-core` type, or (b) `kebab-parse-md` re-exporting via a public DTO. Pick (a): move `ParsedBlock` into `kebab-core` so this task does not import `kebab-parse-md`.
## Inputs
| input | type | source |
|-------|------|--------|
| `RawAsset` | `kb_core::RawAsset` | p1-1 |
| `RawAsset` | `kebab_core::RawAsset` | p1-1 |
| `Metadata` + frontmatter span + warnings | from p1-2 | parser caller |
| `Vec<ParsedBlock>` + warnings | from p1-3 | parser caller |
| `parser_version` | `kb_core::ParserVersion` | constant in `kb-parse-md` |
| `parser_version` | `kebab_core::ParserVersion` | constant in `kebab-parse-md` |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk`, `kb-store-sqlite` |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk`, `kebab-store-sqlite` |
## Public surface (signatures only — no new types)
```rust
pub fn build_canonical_document(
asset: &kb_core::RawAsset,
metadata: kb_core::Metadata,
blocks: Vec<kb_core::ParsedBlock>,
parser_version: &kb_core::ParserVersion,
asset: &kebab_core::RawAsset,
metadata: kebab_core::Metadata,
blocks: Vec<kebab_core::ParsedBlock>,
parser_version: &kebab_core::ParserVersion,
warnings: Vec<Warning>,
) -> anyhow::Result<kb_core::CanonicalDocument>;
) -> anyhow::Result<kebab_core::CanonicalDocument>;
pub fn id_for_doc(workspace_path: &kb_core::WorkspacePath, asset: &kb_core::AssetId, parser_version: &kb_core::ParserVersion) -> kb_core::DocumentId;
pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kb_core::SourceSpan) -> kb_core::BlockId;
pub fn id_for_doc(workspace_path: &kebab_core::WorkspacePath, asset: &kebab_core::AssetId, parser_version: &kebab_core::ParserVersion) -> kebab_core::DocumentId;
pub fn id_for_block(doc: &kebab_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kebab_core::SourceSpan) -> kebab_core::BlockId;
```
## Behavior contract
@@ -750,14 +750,14 @@ pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[Strin
| unit | provenance contains Discovered/Parsed/Normalized in order | inline |
| snapshot | `fixtures/markdown/code-and-table.md` → CanonicalDocument JSON stable (incl. all IDs) | fixture |
All tests under `cargo test -p kb-normalize`.
All tests under `cargo test -p kebab-normalize`.
## Definition of Done
- [ ] `cargo check -p kb-normalize` passes
- [ ] `cargo test -p kb-normalize` passes
- [ ] `cargo check -p kebab-normalize` passes
- [ ] `cargo test -p kebab-normalize` passes
- [ ] Determinism test runs ≥ 1000 iterations under 1 second
- [ ] No `kb-parse-md` import (consumed via `kb-core::ParsedBlock`)
- [ ] No `kebab-parse-md` import (consumed via `kebab-core::ParsedBlock`)
- [ ] PR links design §4.2, §4.3
## Out of scope
@@ -791,13 +791,13 @@ Write to `tasks/p1/p1-5-chunk.md`:
````markdown
---
phase: P1
component: kb-chunk
component: kebab-chunk
task_id: p1-5
title: "Markdown heading-aware chunker (md-heading-v1)"
status: planned
depends_on: [p1-4]
unblocks: [p1-6, p2-2, p3-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.5 Chunk, §4.2 chunk_id recipe, §7.2 Chunker, §0 Q3 citation]
---
@@ -813,8 +813,8 @@ The first concrete `Chunker`. Establishes how subsequent chunkers (PDF page chun
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`
- `blake3` (policy_hash)
- `serde-json-canonicalizer`
@@ -822,30 +822,30 @@ The first concrete `Chunker`. Establishes how subsequent chunkers (PDF page chun
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize` (consumes `CanonicalDocument` only via `kb-core`), `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize` (consumes `CanonicalDocument` only via `kebab-core`), `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 |
| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` from config |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 |
| `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` from config |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `Vec<Chunk>` | `kb_core::Chunk` | `kb-store-sqlite` (p1-6), `kb-embed*` (P3) |
| `Vec<Chunk>` | `kebab_core::Chunk` | `kebab-store-sqlite` (p1-6), `kebab-embed*` (P3) |
## Public surface (signatures only — no new types)
```rust
pub struct MdHeadingV1Chunker;
impl kb_core::Chunker for MdHeadingV1Chunker {
fn chunker_version(&self) -> kb_core::ChunkerVersion;
fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String;
fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result<Vec<kb_core::Chunk>>;
impl kebab_core::Chunker for MdHeadingV1Chunker {
fn chunker_version(&self) -> kebab_core::ChunkerVersion;
fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String;
fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result<Vec<kebab_core::Chunk>>;
}
```
@@ -882,12 +882,12 @@ impl kb_core::Chunker for MdHeadingV1Chunker {
| determinism | identical input + identical policy → identical chunk_ids | inline |
| snapshot | `fixtures/markdown/long-section.md` → Vec<Chunk> JSON stable | fixture |
All tests under `cargo test -p kb-chunk`.
All tests under `cargo test -p kebab-chunk`.
## Definition of Done
- [ ] `cargo check -p kb-chunk` passes
- [ ] `cargo test -p kb-chunk` passes
- [ ] `cargo check -p kebab-chunk` passes
- [ ] `cargo test -p kebab-chunk` passes
- [ ] Snapshot stable across two runs
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.5, §4.2
@@ -924,13 +924,13 @@ Write to `tasks/p1/p1-6-store-sqlite.md`:
````markdown
---
phase: P1
component: kb-store-sqlite (P1 subset)
component: kebab-store-sqlite (P1 subset)
task_id: p1-6
title: "SQLite store: assets/documents/blocks/chunks + asset writer + migrations"
status: planned
depends_on: [p1-1, p1-4, p1-5]
unblocks: [p2-1, p3-3, p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5 DDL (5.1, 5.2, 5.3, 5.4, 5.5 chunks only — FTS handled in p2-1), §5.7 jobs/ingest_runs, §5.8 transactions, §6.3 data_dir layout]
---
@@ -946,8 +946,8 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `rusqlite` (with `bundled-sqlcipher` disabled; use `bundled` feature)
- `refinery` for migrations
- `serde_json`
@@ -958,7 +958,7 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
## Forbidden dependencies
- `kb-source-fs` (only types via `kb-core`), `kb-parse-md`, `kb-normalize`, `kb-chunk` (only types via `kb-core`), `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs` (only types via `kebab-core`), `kebab-parse-md`, `kebab-normalize`, `kebab-chunk` (only types via `kebab-core`), `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
@@ -966,17 +966,17 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
|-------|------|--------|
| migrations | `migrations/V001__init.sql` | repo |
| `RawAsset` + bytes | `(RawAsset, Vec<u8>)` | p1-1 + reader |
| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 |
| `Vec<Chunk>` | `kb_core::Chunk` | p1-5 |
| `IngestRun` aggregates | `(scope, counts, duration)` | `kb-app` |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 |
| `Vec<Chunk>` | `kebab_core::Chunk` | p1-5 |
| `IngestRun` aggregates | `(scope, counts, duration)` | `kebab-app` |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `data_dir/kb.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | | every later phase |
| `data_dir/kebab.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | | every later phase |
| `data_dir/assets/<aa>/<asset_id>` bytes (when copied) | | future re-extraction, integrity verification |
| `IngestReport` (wire schema v1) | `kb_core::IngestReport` | `kb-cli`, eval |
| `IngestReport` (wire schema v1) | `kebab_core::IngestReport` | `kebab-cli`, eval |
## Public surface (signatures only — no new types)
@@ -984,23 +984,23 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
pub struct SqliteStore { /* internal */ }
impl SqliteStore {
pub fn open(config: &kb_config::Config) -> anyhow::Result<Self>;
pub fn open(config: &kebab_config::Config) -> anyhow::Result<Self>;
pub fn run_migrations(&self) -> anyhow::Result<()>;
pub fn put_asset_with_bytes(&self, asset: &kb_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>;
pub fn put_asset_with_bytes(&self, asset: &kebab_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>;
}
impl kb_core::DocumentStore for SqliteStore {
fn put_asset(&self, a: &kb_core::RawAsset) -> anyhow::Result<()>;
fn put_document(&self, d: &kb_core::CanonicalDocument) -> anyhow::Result<()>;
fn put_blocks(&self, doc: &kb_core::DocumentId, blocks: &[kb_core::Block]) -> anyhow::Result<()>;
fn put_chunks(&self, doc: &kb_core::DocumentId, chunks: &[kb_core::Chunk]) -> anyhow::Result<()>;
fn get_document(&self, id: &kb_core::DocumentId) -> anyhow::Result<Option<kb_core::CanonicalDocument>>;
fn get_chunk(&self, id: &kb_core::ChunkId) -> anyhow::Result<Option<kb_core::Chunk>>;
fn list_documents(&self, filter: &kb_core::DocFilter) -> anyhow::Result<Vec<kb_core::DocSummary>>;
impl kebab_core::DocumentStore for SqliteStore {
fn put_asset(&self, a: &kebab_core::RawAsset) -> anyhow::Result<()>;
fn put_document(&self, d: &kebab_core::CanonicalDocument) -> anyhow::Result<()>;
fn put_blocks(&self, doc: &kebab_core::DocumentId, blocks: &[kebab_core::Block]) -> anyhow::Result<()>;
fn put_chunks(&self, doc: &kebab_core::DocumentId, chunks: &[kebab_core::Chunk]) -> anyhow::Result<()>;
fn get_document(&self, id: &kebab_core::DocumentId) -> anyhow::Result<Option<kebab_core::CanonicalDocument>>;
fn get_chunk(&self, id: &kebab_core::ChunkId) -> anyhow::Result<Option<kebab_core::Chunk>>;
fn list_documents(&self, filter: &kebab_core::DocFilter) -> anyhow::Result<Vec<kebab_core::DocSummary>>;
}
impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
impl kebab_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
```
## Behavior contract
@@ -1020,7 +1020,7 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
## Storage / wire effects
- Writes: `kb.sqlite` (multiple tables), `data_dir/assets/<aa>/<asset_id>` (copied case).
- Writes: `kebab.sqlite` (multiple tables), `data_dir/assets/<aa>/<asset_id>` (copied case).
- Reads on subsequent calls: same DB.
## Test plan
@@ -1036,14 +1036,14 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
| contract | DocumentStore trait round-trip for fixture document | `fixtures/markdown/code-and-table.md` |
| snapshot | IngestReport JSON for fixture run | fixture |
All tests under `cargo test -p kb-store-sqlite` with no network.
All tests under `cargo test -p kebab-store-sqlite` with no network.
## Definition of Done
- [ ] `cargo check -p kb-store-sqlite` passes
- [ ] `cargo test -p kb-store-sqlite` passes
- [ ] `cargo check -p kebab-store-sqlite` passes
- [ ] `cargo test -p kebab-store-sqlite` passes
- [ ] migration `V001__init.sql` matches design §5 verbatim (diff-checked in CI)
- [ ] Writes to `~/.local/share/kb/` are gated by `kb-config`'s `data_dir` and never escape it
- [ ] Writes to `~/.local/share/kebab/` are gated by `kebab-config`'s `data_dir` and never escape it
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §5
@@ -1056,7 +1056,7 @@ All tests under `cargo test -p kb-store-sqlite` with no network.
## Risks / notes
- WAL mode requires careful test cleanup: tests must drop the connection before removing `kb.sqlite-wal` / `-shm`.
- WAL mode requires careful test cleanup: tests must drop the connection before removing `kebab.sqlite-wal` / `-shm`.
- Asset directory shard prefix uses `asset_id[..2]`; using `asset_id[..1]` would create at most 16 dirs (insufficient).
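
A minimal sketch of the shard layout, assuming `asset_id` is a lowercase hex string (so a two-character prefix gives at most 256 shard directories). The function name `asset_path` is hypothetical and not part of the spec's public surface.

```rust
use std::path::{Path, PathBuf};

/// Builds `data_dir/assets/<aa>/<asset_id>`, where `<aa>` is the first two
/// characters of the id. Assumes `asset_id` is a lowercase hex string.
pub fn asset_path(data_dir: &Path, asset_id: &str) -> Option<PathBuf> {
    // `get` instead of direct slicing, so an id shorter than two bytes
    // yields `None` rather than a panic.
    let shard = asset_id.get(..2)?;
    Some(data_dir.join("assets").join(shard).join(asset_id))
}
```

Returning `Option` keeps the "never panics on malformed input" posture; a real implementation might map the `None` case to a typed error instead.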
````
@@ -1074,7 +1074,7 @@ git commit -m "tasks: add p1-6 store-sqlite component spec"
```bash
ls tasks/p1/ | sort
for f in tasks/p1/p1-*.md; do grep -q '2026-04-27-kb-final-form-design.md' "$f" || echo "MISSING REF in $f"; done
for f in tasks/p1/p1-*.md; do grep -q '2026-04-27-kebab-final-form-design.md' "$f" || echo "MISSING REF in $f"; done
echo done
```
@@ -1106,9 +1106,9 @@ For each component task below, the steps are: (1) write file, (2) verify, (3) co
**Files:** Create: `tasks/p0/p0-1-skeleton.md`. Also `mkdir -p tasks/p0`.
`contract_sections`: §3 (all subsections), §4, §5 (migrations meta only), §6, §7, §8, §10. `Allowed`: workspace + `kb-core` + `kb-config` + `kb-app` + `kb-cli` only.
`contract_sections`: §3 (all subsections), §4, §5 (migrations meta only), §6, §7, §8, §10. `Allowed`: workspace + `kebab-core` + `kebab-config` + `kebab-app` + `kebab-cli` only.
Body covers: workspace `Cargo.toml` resolver=3, edition 2024, member list (`kb-core`, `kb-config`, `kb-app`, `kb-cli`), workspace dependencies, `kb-core` types and traits per design §3 / §7, deterministic ID functions per §4 (with full unit tests), `kb-config` loader (TOML + env + CLI override per §6.4), `kb-app` facade signatures (`ingest`, `search`, `ask`, `inspect_doc`, `inspect_chunk`, `doctor`, `init`), `kb-cli` skeleton with clap + `--help`. DoD: `cargo check --workspace`, `cargo test --workspace` (Newtype+ID+canonical-json tests only), `kb --help` works, `docs/spec/*` stubs created (link to frozen design doc), `docs/wire-schema/v1/*.schema.json` stubs (one file per object in §2).
Body covers: workspace `Cargo.toml` resolver=3, edition 2024, member list (`kebab-core`, `kebab-config`, `kebab-app`, `kebab-cli`), workspace dependencies, `kebab-core` types and traits per design §3 / §7, deterministic ID functions per §4 (with full unit tests), `kebab-config` loader (TOML + env + CLI override per §6.4), `kebab-app` facade signatures (`ingest`, `search`, `ask`, `inspect_doc`, `inspect_chunk`, `doctor`, `init`), `kebab-cli` skeleton with clap + `--help`. DoD: `cargo check --workspace`, `cargo test --workspace` (Newtype+ID+canonical-json tests only), `kebab --help` works, `docs/spec/*` stubs created (link to frozen design doc), `docs/wire-schema/v1/*.schema.json` stubs (one file per object in §2).
- [ ] **Step 1: Create directory and file**
@@ -1134,11 +1134,11 @@ git -c user.name=kb -c user.email=kb@local commit -m "tasks: add p0-1 skeleton c
#### `p2-1-fts-schema.md`
`contract_sections`: §5.5 FTS5 + triggers, §9 versioning. `Allowed`: `kb-core`, `kb-config`, `kb-store-sqlite` (extends migrations). `depends_on: [p1-6]`. Migration `V002__fts.sql` adds `chunks_fts` virtual table and three triggers verbatim from §5.5. Tests: backfill from existing chunks via `INSERT INTO chunks_fts SELECT ... FROM chunks`, then assert FTS row count == chunks row count; insert/update/delete in `chunks` reflects in `chunks_fts`.
`contract_sections`: §5.5 FTS5 + triggers, §9 versioning. `Allowed`: `kebab-core`, `kebab-config`, `kebab-store-sqlite` (extends migrations). `depends_on: [p1-6]`. Migration `V002__fts.sql` adds `chunks_fts` virtual table and three triggers verbatim from §5.5. Tests: backfill from existing chunks via `INSERT INTO chunks_fts SELECT ... FROM chunks`, then assert FTS row count == chunks row count; insert/update/delete in `chunks` reflects in `chunks_fts`.
#### `p2-2-lexical-retriever.md`
`contract_sections`: §3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5 search output (for snippet length defaults), §2.2 wire schema. `Allowed`: `kb-core`, `kb-config`, `kb-store-sqlite`. `depends_on: [p2-1]`. Implements `Retriever` trait with `bm25(chunks_fts)` ranking, snippet via SQLite `snippet()` (≤ `snippet_chars` chars), citation built per §0 Q3 from `source_spans`. Tests: top-k correctness on fixture corpus, citation line range round-trip against original Markdown, deterministic across two runs.
`contract_sections`: §3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5 search output (for snippet length defaults), §2.2 wire schema. `Allowed`: `kebab-core`, `kebab-config`, `kebab-store-sqlite`. `depends_on: [p2-1]`. Implements `Retriever` trait with `bm25(chunks_fts)` ranking, snippet via SQLite `snippet()` (≤ `snippet_chars` chars), citation built per §0 Q3 from `source_spans`. Tests: top-k correctness on fixture corpus, citation line range round-trip against original Markdown, deterministic across two runs.
- [ ] **Step 1: Create directory and both spec files** (template per Phase B; bodies as described above).
- [ ] **Step 2: Verify with `for f in tasks/p2/p2-*.md; do test -s "$f" || echo MISSING $f; done; echo done` (expect only `done`).**
@@ -1152,10 +1152,10 @@ git add tasks/p2 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
**Files:** `mkdir -p tasks/p3` and create `p3-1-embedder-trait.md`, `p3-2-fastembed-adapter.md`, `p3-3-lancedb-store.md`, `p3-4-hybrid-fusion.md`.
- `p3-1-embedder-trait.md`: §3.7, §7.2 Embedder, §11. Allowed: `kb-core`, `kb-config`. No external embedding dep. Public surface: `Embedder` trait + `EmbeddingInput`/`EmbeddingKind` already in core (validate they exist; if not, this task is also a `kb-core` patch). `depends_on: [p0-1]`. Tests: trait dyn dispatch, mock embedder.
- `p3-2-fastembed-adapter.md`: §11.3, §6.4 `[models.embedding]`. Allowed: `kb-core`, `kb-config`, `fastembed`, `tokenizers`, `ort`. `depends_on: [p3-1]`. Provides `FastembedEmbedder` implementing `Embedder` for `multilingual-e5-small` (default), with required Document/Query prefix per §11.3. Tests: dimension check, deterministic vector for fixed input (hash compare on first 8 floats with epsilon), batch size respected.
- `p3-3-lancedb-store.md`: §3.5, §5.6 embedding_records, §6.3 lancedb table naming. Allowed: `kb-core`, `kb-config`, `lancedb`, `arrow`, `kb-store-sqlite` (write `embedding_records` row only — no other table). `depends_on: [p3-2, p1-6]`. Implements `VectorStore` trait. Table naming `chunk_embeddings_<model>_<dim>.lance`. `ensure_table` creates if missing. `upsert` inserts vectors and writes a matching `embedding_records` row in same logical operation (best-effort 2PC: lance commit, then SQLite insert; on SQLite failure, log warning + leave lance row — re-upsert is idempotent because of the `UNIQUE(chunk_id, model_id, model_version, dimensions)` constraint and lance upsert semantics). `search` filters via SearchFilters and returns top-k. Tests: smoke (insert+search), dimension mismatch error, model isolation (two models stay in two tables).
- `p3-4-hybrid-fusion.md`: §3.7 RetrievalDetail, §0 Q3, §1.6 search --explain, §6.4 `[search]` rrf settings. Allowed: `kb-core`, `kb-config`, `kb-store-sqlite` (lexical Retriever from p2-2), `kb-store-vector` (vector Retriever wrapper around `VectorStore::search`). `depends_on: [p2-2, p3-3]`. Implements `HybridRetriever` that dispatches by `SearchMode`, fuses with RRF (k from config, default 60), populates `lexical_score`, `vector_score`, `lexical_rank`, `vector_rank`, `fusion_score`. Tests: pure lexical mode == p2-2 output; pure vector mode == p3-3 output; hybrid produces strictly larger or equal coverage of expected hits than either single mode on a small fixture; deterministic.
- `p3-1-embedder-trait.md`: §3.7, §7.2 Embedder, §11. Allowed: `kebab-core`, `kebab-config`. No external embedding dep. Public surface: `Embedder` trait + `EmbeddingInput`/`EmbeddingKind` already in core (validate they exist; if not, this task is also a `kebab-core` patch). `depends_on: [p0-1]`. Tests: trait dyn dispatch, mock embedder.
- `p3-2-fastembed-adapter.md`: §11.3, §6.4 `[models.embedding]`. Allowed: `kebab-core`, `kebab-config`, `fastembed`, `tokenizers`, `ort`. `depends_on: [p3-1]`. Provides `FastembedEmbedder` implementing `Embedder` for `multilingual-e5-small` (default), with required Document/Query prefix per §11.3. Tests: dimension check, deterministic vector for fixed input (hash compare on first 8 floats with epsilon), batch size respected.
- `p3-3-lancedb-store.md`: §3.5, §5.6 embedding_records, §6.3 lancedb table naming. Allowed: `kebab-core`, `kebab-config`, `lancedb`, `arrow`, `kebab-store-sqlite` (write `embedding_records` row only — no other table). `depends_on: [p3-2, p1-6]`. Implements `VectorStore` trait. Table naming `chunk_embeddings_<model>_<dim>.lance`. `ensure_table` creates if missing. `upsert` inserts vectors and writes a matching `embedding_records` row in same logical operation (best-effort 2PC: lance commit, then SQLite insert; on SQLite failure, log warning + leave lance row — re-upsert is idempotent because of the `UNIQUE(chunk_id, model_id, model_version, dimensions)` constraint and lance upsert semantics). `search` filters via SearchFilters and returns top-k. Tests: smoke (insert+search), dimension mismatch error, model isolation (two models stay in two tables).
- `p3-4-hybrid-fusion.md`: §3.7 RetrievalDetail, §0 Q3, §1.6 search --explain, §6.4 `[search]` rrf settings. Allowed: `kebab-core`, `kebab-config`, `kebab-store-sqlite` (lexical Retriever from p2-2), `kebab-store-vector` (vector Retriever wrapper around `VectorStore::search`). `depends_on: [p2-2, p3-3]`. Implements `HybridRetriever` that dispatches by `SearchMode`, fuses with RRF (k from config, default 60), populates `lexical_score`, `vector_score`, `lexical_rank`, `vector_rank`, `fusion_score`. Tests: pure lexical mode == p2-2 output; pure vector mode == p3-3 output; hybrid produces strictly larger or equal coverage of expected hits than either single mode on a small fixture; deterministic.
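
The RRF step in `p3-4` can be sketched as follows. This is a hedged illustration, not the design doc's definition: it assumes the common reciprocal-rank-fusion form `1 / (k + rank)` with 1-based ranks, and `rrf_fuse` is a hypothetical function name. Only the standard library is used.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion over two ranked lists of chunk ids (best hit first).
/// Returns (chunk_id, fusion_score) sorted by descending score, ties broken by id.
pub fn rrf_fuse(lexical: &[&str], vector: &[&str], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in [lexical, vector] {
        for (idx, id) in list.iter().enumerate() {
            // Ranks are 1-based: the top hit contributes 1 / (k + 1).
            let rank = (idx + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Sort by score descending; the id tiebreak keeps the output deterministic
    // despite HashMap's unordered iteration.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

With `k = 60.0`, a chunk ranked second lexically and first by vector scores `1/62 + 1/61`, which puts it ahead of any chunk that appears in only one list. The explicit tiebreak matters for the "deterministic" test requirement.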
- [ ] **Step 1: Create directory and 4 files** (template per Phase B).
- [ ] **Step 2: Verify**
@@ -1176,9 +1176,9 @@ git add tasks/p3 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
**Files:** `mkdir -p tasks/p4` and create `p4-1-llm-trait.md`, `p4-2-ollama-adapter.md`, `p4-3-rag-pipeline.md`.
- `p4-1-llm-trait.md`: §7.2 LanguageModel + TokenChunk, §0 Q5 streaming, §3.8 Answer types referenced. Allowed: `kb-core`, `kb-config`. `depends_on: [p0-1]`. Defines (or validates) `LanguageModel` trait, `GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage` per design. Tests: trait dyn dispatch, mock LM streams 3 tokens.
- `p4-2-ollama-adapter.md`: §11.2 Ollama, §6.4 `[models.llm]`, §0 Q5 streaming. Allowed: `kb-core`, `kb-config`, `reqwest` (blocking + json + stream feature) or `ureq` + manual SSE; `serde_json`, `tokio`/runtime if needed. `depends_on: [p4-1]`. Implements `OllamaLanguageModel` with streaming `/api/generate`. `temperature=0.0` default, `seed` honored for determinism. Reachability/missing-model errors map to `LlmError` per design §10. Tests: against a mock HTTP server (`wiremock` or hand-rolled `tiny_http`); deterministic stream collect equals buffered concatenation; missing model returns `LlmError::ModelNotPulled` with proper hint.
- `p4-3-rag-pipeline.md`: §0 Q4 refusal (two-layer), §0 Q7 footer, §1.11.4 ask scenes, §2.3 Answer wire, §3.8 internal Answer, §6.4 `[rag]`. Allowed: `kb-core`, `kb-config`, `kb-search` (Retriever), `kb-llm` (LanguageModel). `depends_on: [p3-4, p4-2]`. Pipeline: retrieve top-k → score gate (`refusal_reason: ScoreGate` if top1 < gate) → context packer (token budget + heading_path header `[#n doc=… heading=… span=…]`) → render `rag-v1` prompt → stream → collect → citation extraction (regex `\[(\d+)\]`) → citation validation (each `[n]` must map to a packed chunk; otherwise `grounded=false`, `refusal_reason: LlmSelfJudge`) → write `answers` row. Tests: happy path produces grounded Answer with citations; query with all chunks below gate produces ScoreGate refusal; query whose LLM emits a citation pointing to non-existent `[7]` becomes LlmSelfJudge refusal; identical query under temperature=0 produces byte-identical Answer (snapshot).
- `p4-1-llm-trait.md`: §7.2 LanguageModel + TokenChunk, §0 Q5 streaming, §3.8 Answer types referenced. Allowed: `kebab-core`, `kebab-config`. `depends_on: [p0-1]`. Defines (or validates) `LanguageModel` trait, `GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage` per design. Tests: trait dyn dispatch, mock LM streams 3 tokens.
- `p4-2-ollama-adapter.md`: §11.2 Ollama, §6.4 `[models.llm]`, §0 Q5 streaming. Allowed: `kebab-core`, `kebab-config`, `reqwest` (blocking + json + stream feature) or `ureq` + manual SSE; `serde_json`, `tokio`/runtime if needed. `depends_on: [p4-1]`. Implements `OllamaLanguageModel` with streaming `/api/generate`. `temperature=0.0` default, `seed` honored for determinism. Reachability/missing-model errors map to `LlmError` per design §10. Tests: against a mock HTTP server (`wiremock` or hand-rolled `tiny_http`); deterministic stream collect equals buffered concatenation; missing model returns `LlmError::ModelNotPulled` with proper hint.
- `p4-3-rag-pipeline.md`: §0 Q4 refusal (two-layer), §0 Q7 footer, §1.11.4 ask scenes, §2.3 Answer wire, §3.8 internal Answer, §6.4 `[rag]`. Allowed: `kebab-core`, `kebab-config`, `kebab-search` (Retriever), `kebab-llm` (LanguageModel). `depends_on: [p3-4, p4-2]`. Pipeline: retrieve top-k → score gate (`refusal_reason: ScoreGate` if top1 < gate) → context packer (token budget + heading_path header `[#n doc=… heading=… span=…]`) → render `rag-v1` prompt → stream → collect → citation extraction (regex `\[(\d+)\]`) → citation validation (each `[n]` must map to a packed chunk; otherwise `grounded=false`, `refusal_reason: LlmSelfJudge`) → write `answers` row. Tests: happy path produces grounded Answer with citations; query with all chunks below gate produces ScoreGate refusal; query whose LLM emits a citation pointing to non-existent `[7]` becomes LlmSelfJudge refusal; identical query under temperature=0 produces byte-identical Answer (snapshot).
- [ ] **Step 1, 2, 3** as in C3.
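
The citation extraction and validation steps of `p4-3-rag-pipeline.md` might look roughly like this. It is a std-only sketch: the spec names the regex `\[(\d+)\]`, and the hand-written scanner below is a stand-in for it. The function names, and the reading that `[n]` is 1-based over the packed chunks, are assumptions.

```rust
/// Collects every `[n]` marker (n = one or more ASCII digits) in order of
/// appearance. Stand-in for the regex `\[(\d+)\]`; hypothetical helper.
pub fn extract_citations(answer: &str) -> Vec<u32> {
    let bytes = answer.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'[' {
            let start = i + 1;
            let mut j = start;
            while j < bytes.len() && bytes[j].is_ascii_digit() {
                j += 1;
            }
            // Accept only "[digits]"; anything else is ordinary text.
            if j > start && j < bytes.len() && bytes[j] == b']' {
                if let Ok(n) = answer[start..j].parse::<u32>() {
                    out.push(n);
                }
                i = j + 1;
                continue;
            }
        }
        i += 1;
    }
    out
}

/// True when every cited number points at a packed chunk (assumed 1-based).
/// An answer with no citations passes this check; how the pipeline treats
/// that case is not decided here.
pub fn citations_valid(citations: &[u32], packed_chunks: usize) -> bool {
    citations
        .iter()
        .all(|&n| n >= 1 && (n as usize) <= packed_chunks)
}
```

A caller would map a `false` result to `grounded=false` with `refusal_reason: LlmSelfJudge`, as the task spec above describes.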
@@ -1193,8 +1193,8 @@ git add tasks/p4 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
**Files:** `mkdir -p tasks/p5` and create `p5-1-golden-fixture-runner.md`, `p5-2-metrics-compare.md`.
- `p5-1-golden-fixture-runner.md`: phase epic + §5.7 eval_runs/eval_query_results, §6.3 runs_dir. Allowed: `kb-core`, `kb-config`, `kb-app` (calls facade for search/ask), `serde_yaml`. `depends_on: [p4-3]`. Loads `fixtures/golden_queries.yaml`, runs each query in selected mode (lexical/vector/hybrid/rag), captures per-query results to `eval_query_results` and to `runs_dir/<run_id>/per_query.jsonl`. Tests: fixture with 3 queries runs end-to-end on a tiny corpus, all rows recorded.
- `p5-2-metrics-compare.md`: phase epic, §0 Q6 wire schema. Allowed: `kb-core`, `kb-config`, `kb-store-sqlite` (read eval rows). `depends_on: [p5-1]`. Computes hit@k, MRR, recall@k_doc, citation_coverage, groundedness (rule-based via `must_contain`), empty_result_rate, refusal_correctness. `kb eval compare a b` produces wins/losses/draws + delta. Tests: fixed input rows produce expected metric values; compare produces stable sorted output.
- `p5-1-golden-fixture-runner.md`: phase epic + §5.7 eval_runs/eval_query_results, §6.3 runs_dir. Allowed: `kebab-core`, `kebab-config`, `kebab-app` (calls facade for search/ask), `serde_yaml`. `depends_on: [p4-3]`. Loads `fixtures/golden_queries.yaml`, runs each query in selected mode (lexical/vector/hybrid/rag), captures per-query results to `eval_query_results` and to `runs_dir/<run_id>/per_query.jsonl`. Tests: fixture with 3 queries runs end-to-end on a tiny corpus, all rows recorded.
- `p5-2-metrics-compare.md`: phase epic, §0 Q6 wire schema. Allowed: `kebab-core`, `kebab-config`, `kebab-store-sqlite` (read eval rows). `depends_on: [p5-1]`. Computes hit@k, MRR, recall@k_doc, citation_coverage, groundedness (rule-based via `must_contain`), empty_result_rate, refusal_correctness. `kebab eval compare a b` produces wins/losses/draws + delta. Tests: fixed input rows produce expected metric values; compare produces stable sorted output.
- [ ] **Step 1, 2, 3** as in C3.
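
Two of the metrics listed for `p5-2-metrics-compare.md`, hit@k and MRR, can be illustrated with the sketch below. It assumes each query has a ranked list of ids and a set of relevant ids; the function names are hypothetical and the real task reads its inputs from the eval rows.

```rust
/// Reciprocal rank of the first relevant id (1-based); 0.0 when none appears.
pub fn reciprocal_rank(ranked: &[&str], relevant: &[&str]) -> f64 {
    ranked
        .iter()
        .position(|id| relevant.contains(id))
        .map(|idx| 1.0 / (idx as f64 + 1.0))
        .unwrap_or(0.0)
}

/// True when at least one relevant id is within the first `k` results.
pub fn hit_at_k(ranked: &[&str], relevant: &[&str], k: usize) -> bool {
    ranked.iter().take(k).any(|id| relevant.contains(id))
}

/// Mean of the per-query reciprocal ranks. Returning 0.0 for an empty query
/// set is an assumption of this sketch.
pub fn mean_reciprocal_rank(queries: &[(Vec<&str>, Vec<&str>)]) -> f64 {
    if queries.is_empty() {
        return 0.0;
    }
    let total: f64 = queries
        .iter()
        .map(|(ranked, relevant)| reciprocal_rank(ranked, relevant))
        .sum();
    total / queries.len() as f64
}
```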
@@ -1209,9 +1209,9 @@ git add tasks/p5 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
`mkdir -p tasks/p6` and create:
- `p6-1-image-extractor-exif.md`: phase epic §9.1, §3.4 ImageRefBlock, §3.7a ImageType. Allowed: `kb-core`, `kb-config`, `image`, `kamadak-exif`. Implements `Extractor` for `MediaType::Image(_)` producing a `CanonicalDocument` whose body is exactly one `ImageRefBlock`. EXIF goes to `metadata.user`. `depends_on: [p0-1, p1-6]`. Tests: PNG/JPEG decode metadata; EXIF extraction; deterministic doc_id.
- `p6-2-ocr-adapter.md`: phase epic §9.1. Allowed: `kb-core`, `kb-config`, `image`, OS-specific OCR (feature `apple-vision` for macOS via sidecar binary; feature `tesseract` for cross-platform; default tesseract). Defines `OcrEngine` trait + adapter. Populates `ImageRefBlock.ocr` `OcrText` (`joined`, regions, engine, engine_version). `depends_on: [p6-1]`. Tests: deterministic text on a fixed fixture image with high-confidence text.
- `p6-3-caption-adapter.md`: phase epic §9.1 caption section, §3.7a ModelCaption. Allowed: `kb-core`, `kb-config`, `kb-llm` (reuse LanguageModel for VLM). Optional/feature-gated. `depends_on: [p6-1, p4-2]`. Populates `ImageRefBlock.caption`. Tests: with mock LM, caption recorded with model id; absence of feature flag leaves caption=None.
- `p6-1-image-extractor-exif.md`: phase epic §9.1, §3.4 ImageRefBlock, §3.7a ImageType. Allowed: `kebab-core`, `kebab-config`, `image`, `kamadak-exif`. Implements `Extractor` for `MediaType::Image(_)` producing a `CanonicalDocument` whose body is exactly one `ImageRefBlock`. EXIF goes to `metadata.user`. `depends_on: [p0-1, p1-6]`. Tests: PNG/JPEG decode metadata; EXIF extraction; deterministic doc_id.
- `p6-2-ocr-adapter.md`: phase epic §9.1. Allowed: `kebab-core`, `kebab-config`, `image`, OS-specific OCR (feature `apple-vision` for macOS via sidecar binary; feature `tesseract` for cross-platform; default tesseract). Defines `OcrEngine` trait + adapter. Populates `ImageRefBlock.ocr` `OcrText` (`joined`, regions, engine, engine_version). `depends_on: [p6-1]`. Tests: deterministic text on a fixed fixture image with high-confidence text.
- `p6-3-caption-adapter.md`: phase epic §9.1 caption section, §3.7a ModelCaption. Allowed: `kebab-core`, `kebab-config`, `kebab-llm` (reuse LanguageModel for VLM). Optional/feature-gated. `depends_on: [p6-1, p4-2]`. Populates `ImageRefBlock.caption`. Tests: with mock LM, caption recorded with model id; absence of feature flag leaves caption=None.
- [ ] **Step 1, 2, 3** as in C3.
@@ -1226,8 +1226,8 @@ git add tasks/p6 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
`mkdir -p tasks/p7` and create:
- `p7-1-pdf-text-extractor.md`: phase epic §9.2, §3.4 SourceSpan::Page. Allowed: `kb-core`, `kb-config`, `pdf-extract`, `lopdf` (page metadata). `depends_on: [p0-1, p1-6]`. Extractor for `MediaType::Pdf` produces a `CanonicalDocument` with one `Paragraph` per page, `SourceSpan::Page`. Failed-text pages are emitted as paragraphs with empty text and a `Provenance` warning marking them as scanned candidates. Tests: page count, span correctness, failure handling.
- `p7-2-pdf-page-chunker.md`: phase epic §9.2, §3.5, §0 Q3 citation. Allowed: `kb-core`, `kb-config`. New chunker version `pdf-page-v1` that respects page boundaries. `depends_on: [p7-1]`. Tests: chunk does not cross page boundary; very long page subdivides per `target_tokens`.
- `p7-1-pdf-text-extractor.md`: phase epic §9.2, §3.4 SourceSpan::Page. Allowed: `kebab-core`, `kebab-config`, `pdf-extract`, `lopdf` (page metadata). `depends_on: [p0-1, p1-6]`. Extractor for `MediaType::Pdf` produces a `CanonicalDocument` with one `Paragraph` per page, `SourceSpan::Page`. Failed-text pages are emitted as paragraphs with empty text and a `Provenance` warning marking them as scanned candidates. Tests: page count, span correctness, failure handling.
- `p7-2-pdf-page-chunker.md`: phase epic §9.2, §3.5, §0 Q3 citation. Allowed: `kebab-core`, `kebab-config`. New chunker version `pdf-page-v1` that respects page boundaries. `depends_on: [p7-1]`. Tests: chunk does not cross page boundary; very long page subdivides per `target_tokens`.
- [ ] **Step 1, 2, 3** as in C3.
@@ -1242,7 +1242,7 @@ git add tasks/p7 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
`mkdir -p tasks/p8` and create:
- `p8-1-whisper-adapter.md`: phase epic §9.3, §3.4 AudioRefBlock + `Transcript`. Allowed: `kb-core`, `kb-config`, whisper.cpp Rust binding (`whisper-rs`) or sidecar binary. `depends_on: [p0-1, p1-6]`. Implements `Transcriber` trait. Default model `large-v3` via config; tests use a tiny model (e.g., `base.en`) for speed. Tests: monotone segment timestamps, language detection populated, deterministic transcript on fixed audio.
- `p8-1-whisper-adapter.md`: phase epic §9.3, §3.4 AudioRefBlock + `Transcript`. Allowed: `kebab-core`, `kebab-config`, whisper.cpp Rust binding (`whisper-rs`) or sidecar binary. `depends_on: [p0-1, p1-6]`. Implements `Transcriber` trait. Default model `large-v3` via config; tests use a tiny model (e.g., `base.en`) for speed. Tests: monotone segment timestamps, language detection populated, deterministic transcript on fixed audio.
- `p8-2-segment-chunker.md`: phase epic §9.3, §3.5. New `audio-segment-v1` chunker that groups segments up to `target_tokens` with priority on speaker turn boundaries (when present). `depends_on: [p8-1]`. Tests: chunk timestamp == first/last segment timestamp; speaker change forces split.
- [ ] **Step 1, 2, 3** as in C3.
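
The grouping rule for `audio-segment-v1` in `p8-2-segment-chunker.md` could be sketched as below. The `Segment` and `SegmentChunk` structs, the per-segment token count, and the choice to treat any change of the speaker label (including present to absent) as a turn boundary are all assumptions made for illustration.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub start_ms: u64,
    pub end_ms: u64,
    pub speaker: Option<String>,
    pub tokens: usize,
}

#[derive(Debug, PartialEq)]
pub struct SegmentChunk {
    pub start_ms: u64,
    pub end_ms: u64,
    pub tokens: usize,
}

/// Groups consecutive segments until the token budget would be exceeded or
/// the speaker label changes. A single oversized segment becomes its own chunk.
pub fn chunk_segments(segments: &[Segment], target_tokens: usize) -> Vec<SegmentChunk> {
    let mut chunks: Vec<SegmentChunk> = Vec::new();
    let mut current: Option<SegmentChunk> = None;
    let mut current_speaker: Option<&Option<String>> = None;
    for seg in segments {
        let must_split = match (&current, current_speaker) {
            (Some(c), Some(sp)) => {
                sp != &seg.speaker || c.tokens + seg.tokens > target_tokens
            }
            _ => false,
        };
        if must_split {
            if let Some(done) = current.take() {
                chunks.push(done);
            }
        }
        if let Some(c) = current.as_mut() {
            // Extend the open chunk; its start stays at the first segment.
            c.end_ms = seg.end_ms;
            c.tokens += seg.tokens;
        } else {
            current = Some(SegmentChunk {
                start_ms: seg.start_ms,
                end_ms: seg.end_ms,
                tokens: seg.tokens,
            });
        }
        current_speaker = Some(&seg.speaker);
    }
    if let Some(done) = current {
        chunks.push(done);
    }
    chunks
}
```

Each chunk's timestamps come from its first and last segment, which is the property the test plan above checks.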
@@ -1258,11 +1258,11 @@ git add tasks/p8 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
`mkdir -p tasks/p9` and create:
- `p9-1-tui-library.md`: phase epic §16.2, §3.7. Allowed: `kb-core`, `kb-app` only (UI law). `ratatui`, `crossterm`. `depends_on: [p1-6]`. Library list view + tag filter. Tests: snapshot of rendered frame against fixture corpus list.
- `p9-1-tui-library.md`: phase epic §16.2, §3.7. Allowed: `kebab-core`, `kebab-app` only (UI law). `ratatui`, `crossterm`. `depends_on: [p1-6]`. Library list view + tag filter. Tests: snapshot of rendered frame against fixture corpus list.
- `p9-2-tui-search.md`: phase epic §16.2, §1.5. Allowed: same as p9-1. `depends_on: [p2-2, p3-4]`. Search input + result list + preview pane; `Enter` triggers external editor jump (`$EDITOR +<line> <path>`). Tests: search results render; `g` keybinding constructs the correct editor command.
- `p9-3-tui-ask.md`: phase epic §16.2, §1.1, §1.2. Allowed: same. `depends_on: [p4-3]`. Ask pane shows streaming tokens; `--explain` toggle. Tests: streaming render, refusal render.
- `p9-4-tui-inspect.md`: §1.6 inspect, §3.5. Allowed: same. `depends_on: [p1-6, p3-3]`. Renders Document and Chunk inspection per wire schemas 2.5/2.6.
- `p9-5-desktop-tauri.md`: phase epic §16.3, §1 all scenes. Allowed: `kb-core`, `kb-app`, Tauri backend; frontend stack TBD by user (vanilla TS by default). `depends_on: [p9-1, p9-2, p9-3, p9-4]`. Backend exposes Tauri commands that wrap `kb-app` 1:1. Source viewer per medium (Markdown render, PDF page, image with region overlay, audio with seek). Tests: backend command unit tests (no frontend e2e in this task).
- `p9-5-desktop-tauri.md`: phase epic §16.3, §1 all scenes. Allowed: `kebab-core`, `kebab-app`, Tauri backend; frontend stack TBD by user (vanilla TS by default). `depends_on: [p9-1, p9-2, p9-3, p9-4]`. Backend exposes Tauri commands that wrap `kebab-app` 1:1. Source viewer per medium (Markdown render, PDF page, image with region overlay, audio with seek). Tests: backend command unit tests (no frontend e2e in this task).
- [ ] **Step 1, 2, 3** as in C3.
@@ -1351,5 +1351,5 @@ git -c user.name=kb -c user.email=kb@local commit -m "tasks: update INDEX with f
- [ ] `tasks/p0/`, `tasks/p1/` … `tasks/p9/` exist with the component spec files listed above.
- [ ] Every component spec contains: frontmatter (with `contract_sections`), Allowed/Forbidden, Inputs, Outputs, Public surface, Behavior contract, Storage/wire effects, Test plan, Definition of Done, Out of scope, Risks/notes.
- [ ] `tasks/INDEX.md` lists every component task.
- [ ] No new domain types introduced inside any component spec — every type referenced is defined in [docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../specs/2026-04-27-kb-final-form-design.md).
- [ ] No new domain types introduced inside any component spec — every type referenced is defined in [docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../specs/2026-04-27-kebab-final-form-design.md).
- [ ] All commits authored sequentially per task; rollback is per-task.


@@ -3,7 +3,7 @@ title: "KB v1 최종 결과물 형태 — Frozen Design"
date: 2026-04-27
status: frozen
purpose: Freeze a firm contract so the spec does not change during fine-grained task decomposition
source_report: ../../../kb_local_rust_report.md
source_report: ../../../kebab_local_rust_report.md
related_tasks: ../../../tasks/INDEX.md
---
@@ -11,7 +11,7 @@ related_tasks: ../../../tasks/INDEX.md
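This document freezes the **very concrete shape of the final deliverable** that the user will be satisfied with. Task decomposition for each phase is carried out on top of this contract, and task interfaces do not change as long as this document does not change.
The underlying report is [`kb_local_rust_report.md`](../../../kb_local_rust_report.md). That report provides the *direction* and the *rationale*; this document pins down the *shape*.
The underlying report is [`kebab_local_rust_report.md`](../../../kebab_local_rust_report.md). That report provides the *direction* and the *rationale*; this document pins down the *shape*.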
---
@@ -20,7 +20,7 @@ related_tasks: ../../../tasks/INDEX.md
| # | Decision | Value | Rationale |
|---|------|-----|------|
| Q1 | scope priority | derive Data backward from UX | user satisfaction is the lever for spec stability |
| Q2 | headline UX | `kb ask` answer screen | exposes search/citation/RAG/refusal/model metadata together |
| Q2 | headline UX | `kebab ask` answer screen | exposes search/citation/RAG/refusal/model metadata together |
| | ask default format | inline numeric refs `[1]…[n]` + footer | everyday readability |
| | ask `--explain` | per-claim breakdown + verbose footer + retrieval trace | single debugging flag |
| Q3 | citation string | URI fragment (`path#k=v…`, W3C Media Fragments) | standards alignment + Windows path safety + automatic browser scrolling |
@@ -34,7 +34,7 @@ related_tasks: ../../../tasks/INDEX.md
| Q10 | workspace | single root + XDG layout | appropriate for a personal v1 |
| | asset retention | content-addressable copy; above `copy_threshold_mb=100`, reference + checksum | reproducibility + disk savings |
| | wire version | additive within `vN`, breaking → `vN+1` | prevents breaking external consumers |
| | ignore | gitignore syntax + `.kbignore` | familiarity |
| | ignore | gitignore syntax + `.kebabignore` | familiarity |
| | errors | thiserror per crate, anyhow at boundary | traceability + UX |
| | sync | watch=false default | v1 uses explicit ingest |
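
As a rough illustration of the Q3 citation string for the line kind, the sketch below builds URIs shaped like the `#L661-L672` and `#L662` samples in the UX scenes. `line_citation_uri` is a hypothetical helper, and collapsing a one-line span to `#L<n>` is an assumption read off those samples rather than a confirmed rule.

```rust
/// Hypothetical helper: formats a line-kind citation as `path#L<start>-L<end>`.
/// A span that starts and ends on the same line is written as `path#L<start>`
/// (assumption based on the sample output, not a stated rule).
pub fn line_citation_uri(path: &str, start: u32, end: u32) -> String {
    if start == end {
        format!("{path}#L{start}")
    } else {
        format!("{path}#L{start}-L{end}")
    }
}
```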
@@ -42,43 +42,43 @@ related_tasks: ../../../tasks/INDEX.md
## 1. Headline UX scenes
### 1.1 `kb ask` (default)
### 1.1 `kebab ask` (default)
```text
$ kb ask "Markdown chunking 규칙은?"
$ kebab ask "Markdown chunking 규칙은?"
heading boundary 우선 [1]. code block 중간 분할 금지 [2]. table 가능한 한
단일 chunk 유지 [2]. 긴 section 은 paragraph 단위로 분할 [1]. chunk 마다
heading_path 와 source_span 보존 [1].
─────────────────────────────────────────────────────────
[1] notes/rust/kb-architecture.md#L661-L672
[1] notes/rust/kebab-architecture.md#L661-L672
§14 Chunking 정책
[2] notes/rust/kb-architecture.md#L665-L668
[2] notes/rust/kebab-architecture.md#L665-L668
§14 Chunking 정책
grounded ✓ qwen2.5:14b-instruct rag-v1 3 chunks
```
### 1.2 `kb ask --explain`
### 1.2 `kebab ask --explain`
```text
$ kb ask --explain "Markdown chunking 규칙은?"
$ kebab ask --explain "Markdown chunking 규칙은?"
▎ heading boundary 우선
└ notes/rust/kb-architecture.md#L662
└ notes/rust/kebab-architecture.md#L662
「heading boundary를 우선한다」
▎ code block 중간 분할 금지
└ notes/rust/kb-architecture.md#L663
└ notes/rust/kebab-architecture.md#L663
「code block은 중간에서 자르지 않는다」
▎ table 단일 chunk 유지
└ notes/rust/kb-architecture.md#L664
└ notes/rust/kebab-architecture.md#L664
「table은 가능한 한 하나의 chunk로」
▎ heading_path / source_span 보존
└ notes/rust/kb-architecture.md#L668-L670
└ notes/rust/kebab-architecture.md#L668-L670
retrieval trace
query "Markdown chunking 규칙은?"
@@ -87,8 +87,8 @@ retrieval trace
threshold (gate) 0.30 → top-1 0.82 pass
fusion rrf (k=60)
chunks (used) 3 / 8 returned
#1 0.82 notes/rust/kb-architecture.md#L661-L672 bm25=12.4 vec=0.78
#2 0.78 notes/rust/kb-architecture.md#L692-L713 bm25=10.1 vec=0.74
#1 0.82 notes/rust/kebab-architecture.md#L661-L672 bm25=12.4 vec=0.78
#2 0.78 notes/rust/kebab-architecture.md#L692-L713 bm25=10.1 vec=0.74
#3 0.55 guides/markdown-style.md#L4-L18 bm25=8.2 vec=0.61
grounded ✓ qwen2.5:14b-instruct rag-v1 3 chunks
@@ -96,10 +96,10 @@ prompt 1184 tokens completion 312 tokens latency 1842 ms
embedding multilingual-e5-small index v1.0
```
### 1.3 `kb ask` (refusal — score gate)
### 1.3 `kebab ask` (refusal — score gate)
```text
$ kb ask "당신의 회사 매출은?"
$ kebab ask "당신의 회사 매출은?"
근거 부족. KB 에 해당 내용 없음.
가까운 후보 (모두 임계 0.30 미만):
@@ -108,10 +108,10 @@ $ kb ask "당신의 회사 매출은?"
grounded ✗ qwen2.5:14b-instruct rag-v1 0 chunks used
```
### 1.4 `kb ask` (refusal — LLM self-judge)
### 1.4 `kebab ask` (refusal — LLM self-judge)
```text
$ kb ask "이 책의 23쪽 결론은?"
$ kebab ask "이 책의 23쪽 결론은?"
근거 부족. 제공된 chunk 중 결론 내용 없음.
검색은 됨, LLM 이 결론 부재 판단:
@@ -121,17 +121,17 @@ $ kb ask "이 책의 23쪽 결론은?"
grounded ✗ qwen2.5:14b-instruct rag-v1 3 chunks searched, 0 grounded
```
### 1.5 `kb search` (dense)
### 1.5 `kebab search` (dense)
```text
$ kb search "Markdown chunking 규칙"
$ kebab search "Markdown chunking 규칙"
1. 0.82 notes/rust/kb-architecture.md#L661-L672
1. 0.82 notes/rust/kebab-architecture.md#L661-L672
§14 Chunking 정책
heading boundary 우선. code block 중간 분할 금지.
table 가능한 한 단일 chunk…
2. 0.71 notes/rust/kb-architecture.md#L692-L713
2. 0.71 notes/rust/kebab-architecture.md#L692-L713
§15 검색과 RAG 정책
검색은 처음부터 hybrid 로 설계하되 구현은 단계적…
@@ -142,7 +142,7 @@ $ kb search "Markdown chunking 규칙"
3 hits hybrid index v1.0 bm25+e5-small/RRF
```
### 1.6 `kb search --explain`
### 1.6 `kebab search --explain`
Added under each hit:
@@ -174,8 +174,8 @@ $ kb search "Markdown chunking 규칙"
{
"schema_version": "citation.v1",
"kind": "line|page|region|caption|time",
"path": "notes/rust/kb.md",
"uri": "notes/rust/kb.md#L12-L34",
"path": "notes/rust/kebab.md",
"uri": "notes/rust/kebab.md#L12-L34",
"line": { "start": 12, "end": 34, "section": "§14 Chunking 정책" },
"page": { "page": 13, "section": "Experiment Setup" },
@@ -196,7 +196,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
"score": 0.82,
"chunk_id": "9b4a8c1e7d3f2a05",
"doc_id": "3f9a2c10ee4d6b78",
"doc_path": "notes/rust/kb-architecture.md",
"doc_path": "notes/rust/kebab-architecture.md",
"heading_path": ["아키텍처", "Chunking 정책"],
"section_label": "§14 Chunking 정책",
"snippet": "heading boundary 우선. code block 중간 분할 금지…",
@@ -261,7 +261,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
{
"kind": "new|updated|skipped|error",
"doc_id": "3f9a2c10ee4d6b78",
"doc_path": "notes/rust/kb-architecture.md",
"doc_path": "notes/rust/kebab-architecture.md",
"asset_id": "8c1e7d3f2a05",
"byte_len": 41822,
"block_count": 184,
@@ -277,13 +277,13 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
With `--summary-only`, `items` is `null`.
### 2.5 DocSummary (`kb list docs`)
### 2.5 DocSummary (`kebab list docs`)
```json
{
"schema_version": "doc_summary.v1",
"doc_id": "3f9a2c10ee4d6b78",
"doc_path": "notes/rust/kb-architecture.md",
"doc_path": "notes/rust/kebab-architecture.md",
"title": "Rust 로컬 Knowledge Base 설계",
"lang": "ko",
"tags": ["knowledge-base", "rust", "rag"],
@@ -305,7 +305,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
"schema_version": "chunk_inspection.v1",
"chunk_id": "9b4a8c1e7d3f2a05",
"doc_id": "3f9a2c10ee4d6b78",
"doc_path": "notes/rust/kb-architecture.md",
"doc_path": "notes/rust/kebab-architecture.md",
"heading_path": ["아키텍처", "Chunking 정책"],
"text": "heading boundary 우선…",
"source_spans": [{ "kind": "line", "start": 661, "end": 672 }],
@@ -325,9 +325,9 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
"schema_version": "doctor.v1",
"ok": true,
"checks": [
{ "name": "config_loaded", "ok": true, "detail": "~/.config/kb/config.toml" },
{ "name": "data_dir_writable", "ok": true, "detail": "~/.local/share/kb" },
{ "name": "sqlite_open", "ok": true, "detail": "kb.sqlite (schema v1)" },
{ "name": "config_loaded", "ok": true, "detail": "~/.config/kebab/config.toml" },
{ "name": "data_dir_writable", "ok": true, "detail": "~/.local/share/kebab" },
{ "name": "sqlite_open", "ok": true, "detail": "kebab.sqlite (schema v1)" },
{ "name": "lancedb_open", "ok": true, "detail": "lancedb/" },
{ "name": "embedding_model", "ok": true, "detail": "multilingual-e5-small (384d)" },
{ "name": "ollama_reachable", "ok": true, "detail": "http://127.0.0.1:11434" },
@@ -346,7 +346,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
---
## 3. Domain model (kb-core)
## 3. Domain model (kebab-core)
### 3.1 Newtype IDs
@@ -581,7 +581,7 @@ pub struct RetrievalDetail {
### 3.7a Forward-declared types
The `Block::ImageRef` / `AudioRef` variants exist from v1 onward, but their `ocr` / `caption` / `transcript` fields are always `None` in P1. The following types are kept as stubs in `kb-core` (slots for the final domain model):
The `Block::ImageRef` / `AudioRef` variants exist from v1 onward, but their `ocr` / `caption` / `transcript` fields are always `None` in P1. The following types are kept as stubs in `kebab-core` (slots for the final domain model):
```rust
pub struct OcrText { pub joined: String, pub regions: Vec<OcrRegion>, pub engine: String, pub engine_version: String }
@@ -596,41 +596,41 @@ pub enum ImageType { Png, Jpeg, Webp, Gif, Tiff, Other(String) }
pub enum AudioType { M4a, Mp3, Wav, Flac, Ogg, Other(String) }
```
`ExtractConfig`, `DocFilter`, `JobKind`, `JobStatus`, `JobFilter`, `JobRow`, `JobId`, `VectorRecord`, `VectorHit`, `RefusalSignal`, `NoHitSignal`, and `DoctorUnhealthy` are defined in `kb-core` (their detailed fields are decided at first use; this spec only guarantees the forward reference).
`ExtractConfig`, `DocFilter`, `JobKind`, `JobStatus`, `JobFilter`, `JobRow`, `JobId`, `VectorRecord`, `VectorHit`, `RefusalSignal`, `NoHitSignal`, and `DoctorUnhealthy` are defined in `kebab-core` (their detailed fields are decided at first use; this spec only guarantees the forward reference).
`OffsetDateTime` is `time::OffsetDateTime`; `Result` is a crate-local alias.
### 3.7b Parser intermediate types — `kb-parse-types`
### 3.7b Parser intermediate types — `kebab-parse-types`
The parsers' *intermediate* representation (`ParsedBlock` and friends) lives not in `kb-core` but in a separate thin crate, **`kb-parse-types`**. Reason: `kb-normalize` is responsible for the medium-agnostic ID/Provenance lift and must not import any parser directly. The input type that normalize receives still has to be defined somewhere, but putting it in `kb-core` means (a) the core namespace explodes as parser-specific ParsedBlock variants (`ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`) join later, and (b) every semantic change in a parser becomes a core change that affects all dependents.
The parsers' *intermediate* representation (`ParsedBlock` and friends) lives not in `kebab-core` but in a separate thin crate, **`kebab-parse-types`**. Reason: `kebab-normalize` is responsible for the medium-agnostic ID/Provenance lift and must not import any parser directly. The input type that normalize receives still has to be defined somewhere, but putting it in `kebab-core` means (a) the core namespace explodes as parser-specific ParsedBlock variants (`ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`) join later, and (b) every semantic change in a parser becomes a core change that affects all dependents.
`kb-parse-types` is the **only layer** between the two. Dependency graph:
`kebab-parse-types` is the **only layer** between the two. Dependency graph:
```text
kb-core (도메인 모델 — Block, Chunk, SourceSpan, IDs, …)
kebab-core (도메인 모델 — Block, Chunk, SourceSpan, IDs, …)
kb-parse-types (parser 중간 표현 — ParsedBlock, ParsedImageRegion[P+], ParsedPdfPage[P+], ParsedAudioSegment[P+], Inline)
kebab-parse-types (parser 중간 표현 — ParsedBlock, ParsedImageRegion[P+], ParsedPdfPage[P+], ParsedAudioSegment[P+], Inline)
▲ ▲
│ │
kb-parse-md, kb-parse-pdf, kb-normalize
kb-parse-image, kb-parse-audio
kebab-parse-md, kebab-parse-pdf, kebab-normalize
kebab-parse-image, kebab-parse-audio
```
`kb-parse-types`:
- depends only on `kb-core` (it uses domain types such as `Block`, `SourceSpan`, `Lang`).
- does not depend on any other `kb-*` crate.
`kebab-parse-types`:
- depends only on `kebab-core` (it uses domain types such as `Block`, `SourceSpan`, `Lang`).
- does not depend on any other `kebab-*` crate.
- does not depend on any parser's concrete library (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`).
- has only light external dependencies, on the order of serde + thiserror.
Types defined in P1:
```rust
// kb-parse-types — depends on kb-core only.
// kebab-parse-types — depends on kebab-core only.
pub struct ParsedBlock {
pub kind: ParsedBlockKind,
pub heading_path: Vec<String>,
pub source_span: kb_core::SourceSpan,
pub source_span: kebab_core::SourceSpan,
pub payload: ParsedPayload,
}
@@ -638,11 +638,11 @@ pub enum ParsedBlockKind { Heading, Paragraph, List, Code, Table, Quote, ImageRe
pub enum ParsedPayload {
Heading { level: u8, text: String },
Paragraph { text: String, inlines: Vec<kb_core::Inline> },
List { ordered: bool, items: Vec<Vec<kb_core::Inline>> },
Paragraph { text: String, inlines: Vec<kebab_core::Inline> },
List { ordered: bool, items: Vec<Vec<kebab_core::Inline>> },
Code { lang: Option<String>, code: String },
Table { headers: Vec<String>, rows: Vec<Vec<String>> },
Quote { text: String, inlines: Vec<kb_core::Inline> },
Quote { text: String, inlines: Vec<kebab_core::Inline> },
ImageRef { src: String, alt: String },
AudioRef { src: String }, // duration_ms filled by extractor before chunking
}
@@ -651,7 +651,7 @@ pub struct Warning { pub kind: WarningKind, pub note: String }
pub enum WarningKind { MalformedFrontmatter, MalformedTable, EncodingFallback, ExtractFailed }
```
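`Inline` is a domain type that lives in `kb-core` (§3.4). `kb-parse-types` only *references* it; the same meaning is not defined twice across the two crates (otherwise normalize would have to perform an identity conversion, which is pointless).
`Inline` is a domain type that lives in `kebab-core` (§3.4). `kebab-parse-types` only *references* it; the same meaning is not defined twice across the two crates (otherwise normalize would have to perform an identity conversion, which is pointless).
Types to be added in P6/P7/P8 (forward-ref):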
@@ -661,7 +661,7 @@ pub struct ParsedPdfPage { pub page: u32, pub text: String }
pub struct ParsedAudioSegment { pub start_ms: u64, pub end_ms: u64, pub text: String }
```
→ When a new medium is added, the `kb-core::Block` variants do not change; only `kb-parse-types` grows.
→ When a new medium is added, the `kebab-core::Block` variants do not change; only `kebab-parse-types` grows.
### 3.8 Answer / RAG types
@@ -995,28 +995,28 @@ CREATE TABLE eval_query_results (
| Kind | Default location |
|------|-----------|
| workspace | `~/KnowledgeBase/` |
| config | `~/.config/kb/config.toml` |
| data | `~/.local/share/kb/` |
| cache | `~/.cache/kb/` |
| state (logs) | `~/.local/state/kb/` |
| config | `~/.config/kebab/config.toml` |
| data | `~/.local/share/kebab/` |
| cache | `~/.cache/kebab/` |
| state (logs) | `~/.local/state/kebab/` |
`~`, `$HOME`, and `${KB_*}` are expanded. Paths are normalized to absolute form before use.
`~`, `$HOME`, and `${KEBAB_*}` are expanded. Paths are normalized to absolute form before use.
### 6.2 Workspace layout
```
~/KnowledgeBase/
├── inbox/ notes/ papers/ photos/ recordings/
└── .kbignore
└── .kebabignore
```
The effective ignore set is the union of `.kbignore` and `config.workspace.exclude`.
The effective ignore set is the union of `.kebabignore` and `config.workspace.exclude`.
### 6.3 Data dir layout
```
~/.local/share/kb/
├── kb.sqlite (+ -wal, -shm)
~/.local/share/kebab/
├── kebab.sqlite (+ -wal, -shm)
├── lancedb/
│ └── chunk_embeddings_<model>_<dim>.lance/
├── assets/<aa>/<asset_id> # shard
@@ -1025,7 +1025,7 @@ CREATE TABLE eval_query_results (
└── runs/<run_id>/ # eval per_query.jsonl + report.md
```
### 6.4 Config (`~/.config/kb/config.toml`) — frozen schema
### 6.4 Config (`~/.config/kebab/config.toml`) — frozen schema
```toml
schema_version = 1
@@ -1036,8 +1036,8 @@ include = ["**/*.md"]
exclude = [".git/**", "node_modules/**", ".obsidian/**"]
[storage]
data_dir = "${XDG_DATA_HOME:-~/.local/share}/kb"
sqlite = "{data_dir}/kb.sqlite"
data_dir = "${XDG_DATA_HOME:-~/.local/share}/kebab"
sqlite = "{data_dir}/kebab.sqlite"
vector_dir = "{data_dir}/lancedb"
asset_dir = "{data_dir}/assets"
artifact_dir = "{data_dir}/artifacts"
@@ -1086,15 +1086,15 @@ max_context_tokens = 8000
Config precedence: default → file → env (`KB_<SECTION>_<KEY>`) → CLI flag.
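
The precedence chain can be pictured with a small resolver. This is a sketch under the assumption that each layer yields `None` when it does not set a key; `resolve` is a hypothetical name and says nothing about how the real config loader is structured.

```rust
/// Hypothetical resolver for a single config value.
/// Layers are checked from highest to lowest precedence:
/// CLI flag, then env, then file, then the built-in default.
pub fn resolve<T>(default: T, file: Option<T>, env: Option<T>, cli: Option<T>) -> T {
    cli.or(env).or(file).unwrap_or(default)
}
```

For example, a value set in the file but not in env or on the command line resolves to the file value, and a CLI flag overrides everything else.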
### 6.5 `kb init` output
### 6.5 `kebab init` output
```text
$ kb init
created ~/.config/kb/config.toml
created ~/.local/share/kb/
$ kebab init
created ~/.config/kebab/config.toml
created ~/.local/share/kebab/
created ~/KnowledgeBase/
opened ~/.local/share/kb/kb.sqlite (schema v1)
hint edit ~/.config/kb/config.toml then `kb ingest ~/KnowledgeBase`
opened ~/.local/share/kebab/kebab.sqlite (schema v1)
hint edit ~/.config/kebab/config.toml then `kebab ingest ~/KnowledgeBase`
```
Existing files are preserved; overwriting them requires an explicit `--force`.
@@ -1107,7 +1107,7 @@ hint edit ~/.config/kb/config.toml then `kb ingest ~/KnowledgeBase`
---
## 7. Trait contracts (kb-core)
## 7. Trait contracts (kebab-core)
### 7.1 입출력 보조
@@ -1210,34 +1210,34 @@ pub trait JobRepo {
## 8. Module boundaries (Allowed / Forbidden)
```text
kb-cli, kb-tui, kb-desktop
└─> kb-app
├─> kb-source-fs
├─> kb-parse-md / kb-parse-pdf / kb-parse-image / kb-parse-audio
│ └─> kb-parse-types (parser intermediate)
├─> kb-normalize
│ └─> kb-parse-types
├─> kb-chunk
├─> kb-store-sqlite (DocumentStore, JobRepo, Retriever[lexical])
├─> kb-store-vector (VectorStore)
├─> kb-embed-local
├─> kb-search (Retriever[hybrid])
├─> kb-llm-local
├─> kb-rag
├─> kb-eval
└─> kb-config
└─> kb-core (모두 의존)
kebab-cli, kebab-tui, kebab-desktop
└─> kebab-app
├─> kebab-source-fs
├─> kebab-parse-md / kebab-parse-pdf / kebab-parse-image / kebab-parse-audio
│ └─> kebab-parse-types (parser intermediate)
├─> kebab-normalize
│ └─> kebab-parse-types
├─> kebab-chunk
├─> kebab-store-sqlite (DocumentStore, JobRepo, Retriever[lexical])
├─> kebab-store-vector (VectorStore)
├─> kebab-embed-local
├─> kebab-search (Retriever[hybrid])
├─> kebab-llm-local
├─> kebab-rag
├─> kebab-eval
└─> kebab-config
└─> kebab-core (모두 의존)
```
`kb-parse-types` is a thin layer between `kb-core` and the parsers/normalize (see §3.7b). It gathers the parser-specific intermediate representations (`ParsedBlock`, `ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`, `Inline`) in one place so that (a) the `kb-core` namespace does not explode and (b) `kb-normalize` never imports a parser directly.
`kebab-parse-types` is a thin layer between `kebab-core` and the parsers/normalize (see §3.7b). It gathers the parser-specific intermediate representations (`ParsedBlock`, `ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`, `Inline`) in one place so that (a) the `kebab-core` namespace does not explode and (b) `kebab-normalize` never imports a parser directly.
Key prohibitions:
- UI → direct dependency on store/llm/parse ✗
- parse-* → store/llm/embed ✗
- parse-* → kb-normalize ✗ (one-way: parsers → kb-parse-types ← normalize)
- parse-* → kebab-normalize ✗ (one-way: parsers → kebab-parse-types ← normalize)
- chunk → llm/embed ✗
- normalize → store / parse-* ✗
- kb-parse-types → any parser/normalize/store/llm/embed/search/rag/ui ✗ (depends on `kb-core` only)
- kebab-parse-types → any parser/normalize/store/llm/embed/search/rag/ui ✗ (depends on `kebab-core` only)
- cross-write into another store ✗
Enforced by `cargo deny` + the workspace deny.toml + CI checks.
@@ -1270,7 +1270,7 @@ CI:
## 10. Error model + exit codes
```rust
// kb-core
// kebab-core
pub enum CoreError { InvalidId, InvalidCitation, InvalidSpan, Malformed }
// crate-local examples
pub enum ParseMdError { Yaml(String), Encoding, Pulldown(String), Span }
@@ -1278,7 +1278,7 @@ pub enum StoreError { Sqlx(rusqlite::Error), Migration(String), Conflict(String)
pub enum LlmError { Unreachable, ModelNotPulled(String), Timeout, Stream(String) }
```
At the boundary (`kb-app`, `kb-cli`), errors are merged into `anyhow::Error`. Exit code mapping:
At the boundary (`kebab-app`, `kebab-cli`), errors are merged into `anyhow::Error`. Exit code mapping:
```rust
fn exit_code(err: &anyhow::Error) -> i32 {
@@ -1295,17 +1295,17 @@ fn exit_code(err: &anyhow::Error) -> i32 {
| `--verbose` | + anyhow chain |
| `--debug` or `RUST_LOG=debug` | + tracing target/level/span |
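A refusal is not an error. When `kb ask` refuses, it writes a normal answer to stdout (Answer with grounded=false) and exits with code 1.
A refusal is not an error. When `kebab ask` refuses, it writes a normal answer to stdout (Answer with grounded=false) and exits with code 1.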
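The refusal rule can be sketched as follows. This is a minimal illustration, not the project's code: the `Answer` struct here is a hypothetical two-field stand-in for the real type, `ask_exit_code` is an invented helper name, and mapping a grounded answer to exit code 0 is an assumption (the text only fixes exit 1 for refusals).

```rust
/// Hypothetical, trimmed-down stand-in for the real `Answer` type.
pub struct Answer {
    pub grounded: bool,
    pub text: String,
}

/// A refusal is still printed on stdout like any other answer;
/// only the process exit code tells the caller it was a refusal.
pub fn ask_exit_code(answer: &Answer) -> i32 {
    if answer.grounded {
        0 // assumption: a grounded answer is a plain success
    } else {
        1 // stated in the text: a refusal exits with 1
    }
}
```

Keeping the refusal on stdout rather than stderr means a script can still capture the answer body and branch on the exit code alone.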
Logging: `tracing` + `tracing-subscriber` + `tracing-appender` with daily rolling, under `~/.local/state/kb/logs/`. Structured fields (`trace_id`, `doc_id`, `chunk_id`).
Logging: `tracing` + `tracing-subscriber` + `tracing-appender` with daily rolling, under `~/.local/state/kebab/logs/`. Structured fields (`trace_id`, `doc_id`, `chunk_id`).
`kb doctor` output (human-readable):
`kebab doctor` output (human-readable):
```text
$ kb doctor
✓ config_loaded ~/.config/kb/config.toml
✓ data_dir_writable ~/.local/share/kb
✓ sqlite_open kb.sqlite (schema v1)
$ kebab doctor
✓ config_loaded ~/.config/kebab/config.toml
✓ data_dir_writable ~/.local/share/kebab
✓ sqlite_open kebab.sqlite (schema v1)
✓ lancedb_open lancedb/
✓ embedding_model multilingual-e5-small (384d)
✓ ollama_reachable http://127.0.0.1:11434
@@ -1322,7 +1322,7 @@ $ kb doctor
This document is frozen ↔ the subsequent component decomposition work is safe:
- All wire schemas (`docs/wire-schema/v1/*.schema.json`)
- All trait signatures (kb-core)
- All trait signatures (kebab-core)
- All ID recipes (4.2)
- SQLite DDL (section 5)
- Filesystem + config schema (section 6)
@@ -1334,7 +1334,7 @@ $ kb doctor
**Intentionally omitted (out of scope, P+)**:
- multi-workspace
- watch mode
- desktop app `kb://` protocol handler
- desktop app `kebab://` protocol handler
- LLM-as-judge eval
- visual embedding (CLIP)
- real-time collab


@@ -17,11 +17,11 @@ urlcolor: blue
The final direction can be summarized in a single sentence.
> Markdown is the first-class knowledge source; images, PDF, and audio each flow through an extractor adapter into the same `CanonicalDocument -> Chunk -> Embed -> Index -> Search -> RAG` pipeline. The CLI, TUI, and desktop app all use the same `kb-app` facade through function calls.
> Markdown is the first-class knowledge source; images, PDF, and audio each flow through an extractor adapter into the same `CanonicalDocument -> Chunk -> Embed -> Index -> Search -> RAG` pipeline. The CLI, TUI, and desktop app all use the same `kebab-app` facade through function calls.
The first thing to build is not a chat UI. The following seven things have to come first.
1. Domain model and trait contracts in `kb-core`
1. Domain model and trait contracts in `kebab-core`
2. Deterministic ID rules
3. Markdown canonicalization
4. chunking policy
@@ -66,16 +66,16 @@ urlcolor: blue
Markdown files
|
v
kb-source-fs
kebab-source-fs
|
v
kb-parse-md
kebab-parse-md
|
v
CanonicalDocument
|
v
kb-chunk
kebab-chunk
|
v
Chunks
@@ -88,15 +88,15 @@ SQLite metadata/FTS LanceDB vectors Raw asset store
+--------------------+--------------------+
|
v
kb-search
kebab-search
|
v
kb-rag
kebab-rag
|
+---------------+---------------+
| | |
v v v
kb-cli kb-tui kb-desktop
kebab-cli kebab-tui kebab-desktop
```
After later extensions the structure looks like this.
@@ -133,23 +133,23 @@ Cargo workspace는 여러 package를 함께 관리하는 구조이며, 공통 `C
[workspace]
resolver = "3"
members = [
"crates/kb-core",
"crates/kb-config",
"crates/kb-source-fs",
"crates/kb-parse-md",
"crates/kb-normalize",
"crates/kb-chunk",
"crates/kb-store-sqlite",
"crates/kb-store-vector",
"crates/kb-embed",
"crates/kb-embed-local",
"crates/kb-search",
"crates/kb-llm",
"crates/kb-llm-local",
"crates/kb-rag",
"crates/kb-eval",
"crates/kb-app",
"crates/kb-cli"
"crates/kebab-core",
"crates/kebab-config",
"crates/kebab-source-fs",
"crates/kebab-parse-md",
"crates/kebab-normalize",
"crates/kebab-chunk",
"crates/kebab-store-sqlite",
"crates/kebab-store-vector",
"crates/kebab-embed",
"crates/kebab-embed-local",
"crates/kebab-search",
"crates/kebab-llm",
"crates/kebab-llm-local",
"crates/kebab-rag",
"crates/kebab-eval",
"crates/kebab-app",
"crates/kebab-cli"
]
[workspace.package]
@@ -173,7 +173,7 @@ tracing = "0.1"
The initial repo is laid out like this.
```text
kb/
kebab/
Cargo.toml
README.md
docs/
@@ -191,70 +191,70 @@ kb/
nested-headings.md
code-and-table.md
crates/
kb-core/
kb-config/
kb-source-fs/
kb-parse-md/
kb-normalize/
kb-chunk/
kb-store-sqlite/
kb-store-vector/
kb-embed/
kb-embed-local/
kb-search/
kb-llm/
kb-llm-local/
kb-rag/
kb-eval/
kb-app/
kb-cli/
kebab-core/
kebab-config/
kebab-source-fs/
kebab-parse-md/
kebab-normalize/
kebab-chunk/
kebab-store-sqlite/
kebab-store-vector/
kebab-embed/
kebab-embed-local/
kebab-search/
kebab-llm/
kebab-llm-local/
kebab-rag/
kebab-eval/
kebab-app/
kebab-cli/
```
The crates to be added later are as follows.
```text
crates/kb-parse-image/
crates/kb-parse-pdf/
crates/kb-parse-audio/
crates/kb-rerank/
crates/kb-tui/
crates/kb-desktop/
crates/kebab-parse-image/
crates/kebab-parse-pdf/
crates/kebab-parse-audio/
crates/kebab-rerank/
crates/kebab-tui/
crates/kebab-desktop/
```
The key dependency rule is as follows.
```text
kb-cli, kb-tui, kb-desktop
-> kb-app
-> kb-index / kb-search / kb-rag
-> kb-core traits
kebab-cli, kebab-tui, kebab-desktop
-> kebab-app
-> kebab-index / kebab-search / kebab-rag
-> kebab-core traits
-> concrete adapters
```
UI crates never call a parser, DB, or LLM adapter directly. Every user-facing command goes through the `kb-app` facade.
UI crates never call a parser, DB, or LLM adapter directly. Every user-facing command goes through the `kebab-app` facade.
# 5. Components and responsibilities
| Component | Responsibility | Initial implementation |
|---|---|---|
| `kb-core` | domain types, traits, errors, ID rules | required |
| `kb-config` | config file loading, defaults, path expansion | required |
| `kb-source-fs` | local folder scan, checksum, change detection | required |
| `kb-parse-md` | Markdown -> structured document | required |
| `kb-normalize` | parser output -> `CanonicalDocument` | required |
| `kb-chunk` | block-aware chunking | required |
| `kb-store-sqlite` | metadata, document, chunk, job, FTS | required |
| `kb-store-vector` | vector upsert/search | P1 |
| `kb-embed` | embedding trait | P1 |
| `kb-embed-local` | local embedding adapter | P1 |
| `kb-search` | lexical, vector, hybrid retrieval | P1 |
| `kb-llm` | language model trait | P1 |
| `kb-llm-local` | Ollama or llama.cpp adapter | P1 |
| `kb-rag` | context packing, answer, citation | P1 |
| `kb-eval` | golden query, regression test | P1 |
| `kb-cli` | command line interface | required |
| `kb-tui` | terminal UI | P2 |
| `kb-desktop` | desktop app | P3 |
| `kebab-core` | domain types, traits, errors, ID rules | required |
| `kebab-config` | config file loading, defaults, path expansion | required |
| `kebab-source-fs` | local folder scan, checksum, change detection | required |
| `kebab-parse-md` | Markdown -> structured document | required |
| `kebab-normalize` | parser output -> `CanonicalDocument` | required |
| `kebab-chunk` | block-aware chunking | required |
| `kebab-store-sqlite` | metadata, document, chunk, job, FTS | required |
| `kebab-store-vector` | vector upsert/search | P1 |
| `kebab-embed` | embedding trait | P1 |
| `kebab-embed-local` | local embedding adapter | P1 |
| `kebab-search` | lexical, vector, hybrid retrieval | P1 |
| `kebab-llm` | language model trait | P1 |
| `kebab-llm-local` | Ollama or llama.cpp adapter | P1 |
| `kebab-rag` | context packing, answer, citation | P1 |
| `kebab-eval` | golden query, regression test | P1 |
| `kebab-cli` | command line interface | required |
| `kebab-tui` | terminal UI | P2 |
| `kebab-desktop` | desktop app | P3 |
# 6. Core domain model
@@ -398,10 +398,10 @@ Markdown frontmatter 기본 규약은 다음 정도로 시작한다.
```yaml
---
id: rust-kb-architecture
id: rust-kebab-architecture
title: Rust 로컬 Knowledge Base 설계
aliases:
- local kb
- local kebab
- rust rag
tags:
- knowledge-base
@@ -418,7 +418,7 @@ lang: ko
Markdown citations use a line range by default.
```text
notes/rust/kb.md:L12-L34
notes/rust/kebab.md:L12-L34
```
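A line-range citation of this shape can be parsed and re-emitted with a few lines of standard-library Rust. The sketch below is illustrative only: `LineCitation`, `parse_line_citation`, and `format_line_citation` are hypothetical names, not the project's `Citation` type, and the validation rules (1-based lines, start not after end) are assumptions.

```rust
/// Hypothetical line-range citation: `<path>:L<start>-L<end>`.
#[derive(Debug, PartialEq)]
pub struct LineCitation {
    pub path: String,
    pub start: u32,
    pub end: u32,
}

/// Parse `notes/rust/kebab.md:L12-L34` style strings.
/// Returns `None` for anything that does not match the shape.
pub fn parse_line_citation(s: &str) -> Option<LineCitation> {
    // Split on the LAST ":L" so that paths containing ':' still work.
    let (path, range) = s.rsplit_once(":L")?;
    let (start, end) = range.split_once("-L")?;
    let start: u32 = start.parse().ok()?;
    let end: u32 = end.parse().ok()?;
    // Assumed rules: lines are 1-based and the range is not inverted.
    if path.is_empty() || start == 0 || start > end {
        return None;
    }
    Some(LineCitation { path: path.to_string(), start, end })
}

/// Render the citation back into its textual form.
pub fn format_line_citation(c: &LineCitation) -> String {
    format!("{}:L{}-L{}", c.path, c.start, c.end)
}
```

Round-tripping through parse and format is a cheap property to assert in snapshot tests, since citations are part of the user-visible output.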
# 9. Extension strategy for images, PDF, and audio
@@ -534,13 +534,13 @@ Ollama 문서는 macOS Sonoma 이상에서 Apple M series CPU/GPU support를 언
The initial adapters are laid out as follows.
```text
kb-llm-local
kebab-llm-local
- OllamaLanguageModel
- later: LlamaCppLanguageModel
- later: CandleLanguageModel
```
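Even though Ollama uses a local server internally, from the project's architectural point of view this is not an HTTP microservice architecture (MSA). It is just a model adapter encapsulated inside `kb-llm-local`.
Even though Ollama uses a local server internally, from the project's architectural point of view this is not an HTTP microservice architecture (MSA). It is just a model adapter encapsulated inside `kebab-llm-local`.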
## 11.3 Local embedding
@@ -583,10 +583,10 @@ M4 48GB MacBook은 개인용 local KB에 충분한 타겟이지만, indexing과
root = "~/KnowledgeBase"
[storage]
sqlite_path = "~/.local/share/kb/kb.sqlite"
vector_path = "~/.local/share/kb/lancedb"
raw_asset_path = "~/.local/share/kb/assets"
artifact_path = "~/.local/share/kb/artifacts"
sqlite_path = "~/.local/share/kebab/kebab.sqlite"
vector_path = "~/.local/share/kebab/lancedb"
raw_asset_path = "~/.local/share/kebab/assets"
artifact_path = "~/.local/share/kebab/artifacts"
[indexing]
max_parallel_extractors = 2
@@ -609,7 +609,7 @@ context_tokens = 32768
# 13. Trait contracts
Components are connected through traits. The contracts below live in `kb-core`.
Components are connected through traits. The contracts below live in `kebab-core`.
```rust
pub trait SourceConnector {
@@ -741,14 +741,14 @@ pub struct Answer {
The CLI is built first.
```text
kb init
kb ingest <path>
kb index
kb search <query>
kb ask <query>
kb inspect doc <doc_id>
kb inspect chunk <chunk_id>
kb doctor
kebab init
kebab ingest <path>
kebab index
kebab search <query>
kebab ask <query>
kebab inspect doc <doc_id>
kebab inspect chunk <chunk_id>
kebab doctor
```
The CLI is the reference point for development and testing. The TUI and desktop app are added once the CLI features are stable.
@@ -791,10 +791,10 @@ Desktop app은 가장 나중에 만든다. 이유는 UI보다 먼저 domain mode
Deliverables:
```text
kb-core
kb-config
kb-app
kb-cli
kebab-core
kebab-config
kebab-app
kebab-cli
docs/spec/*
fixtures/markdown/*
```
@@ -804,7 +804,7 @@ fixtures/markdown/*
```text
cargo check --workspace
cargo test --workspace
kb --help
kebab --help
```
## Phase 1 - Markdown ingestion
@@ -814,19 +814,19 @@ kb --help
Crates to implement:
```text
kb-source-fs
kb-parse-md
kb-normalize
kb-chunk
kb-store-sqlite
kebab-source-fs
kebab-parse-md
kebab-normalize
kebab-chunk
kebab-store-sqlite
```
Completion criteria:
```text
kb ingest ~/KnowledgeBase
kb list docs
kb inspect doc <doc_id>
kebab ingest ~/KnowledgeBase
kebab list docs
kebab inspect doc <doc_id>
```
## Phase 2 - Lexical search
@@ -836,14 +836,14 @@ kb inspect doc <doc_id>
Completion criteria:
```text
kb search "Rust workspace 설계"
kebab search "Rust workspace 설계"
```
Results must include citations.
```text
1. Rust workspace는 여러 package를 하나로 관리한다...
source: notes/rust/kb.md:L12-L34
source: notes/rust/kebab.md:L12-L34
```
## Phase 3 - Vector search and embedding
@@ -853,18 +853,18 @@ kb search "Rust workspace 설계"
Crates to implement:
```text
kb-embed
kb-embed-local
kb-store-vector
kb-search
kebab-embed
kebab-embed-local
kebab-store-vector
kebab-search
```
Completion criteria:
```text
kb index --embeddings
kb search --mode vector "비슷한 설계 원칙"
kb search --mode hybrid "Markdown chunking 규칙"
kebab index --embeddings
kebab search --mode vector "비슷한 설계 원칙"
kebab search --mode hybrid "Markdown chunking 규칙"
```
## Phase 4 - Local LLM RAG
@@ -874,15 +874,15 @@ kb search --mode hybrid "Markdown chunking 규칙"
Crates to implement:
```text
kb-llm
kb-llm-local
kb-rag
kebab-llm
kebab-llm-local
kebab-rag
```
Completion criteria:
```text
kb ask "내 KB 설계에서 저장소 전략은?"
kebab ask "내 KB 설계에서 저장소 전략은?"
```
Answers must include citations, and the system must refuse when there is no supporting evidence.
@@ -895,7 +895,7 @@ kb ask "내 KB 설계에서 저장소 전략은?"
```text
fixtures/golden_queries.yaml
kb-eval
kebab-eval
```
Metrics:
@@ -915,8 +915,8 @@ answer groundedness
Completion criteria:
```text
kb ingest ./assets/diagram.png
kb search "이미지 안의 OCR 텍스트"
kebab ingest ./assets/diagram.png
kebab search "이미지 안의 OCR 텍스트"
```
## Phase 7 - PDF support
@@ -926,8 +926,8 @@ kb search "이미지 안의 OCR 텍스트"
Completion criteria:
```text
kb ingest ./paper.pdf
kb search "PDF 안의 특정 개념"
kebab ingest ./paper.pdf
kebab search "PDF 안의 특정 개념"
```
## Phase 8 - Audio support
@@ -937,8 +937,8 @@ kb search "PDF 안의 특정 개념"
Completion criteria:
```text
kb ingest ./meeting.m4a
kb search "회의에서 언급한 결정사항"
kebab ingest ./meeting.m4a
kebab search "회의에서 언급한 결정사항"
```
## Phase 9 - TUI and desktop app
@@ -948,7 +948,7 @@ kb search "회의에서 언급한 결정사항"
Order:
```text
kb-tui -> kb-desktop
kebab-tui -> kebab-desktop
```
# 18. Test strategy
@@ -986,7 +986,7 @@ AI에게 “전체 repo를 만들어줘”라고 시키지 말고, component spe
The template is as follows.
```text
Component: kb-parse-md
Component: kebab-parse-md
Responsibility:
- Converts Markdown bytes into a CanonicalDocument.
@@ -994,17 +994,17 @@ Responsibility:
- Preserves line ranges or byte ranges as far as possible.
Allowed dependencies:
- kb-core
- kebab-core
- pulldown-cmark or comrak
- serde
- thiserror
Forbidden dependencies:
- kb-store
- kb-llm
- kb-rag
- kb-tui
- kb-desktop
- kebab-store
- kebab-llm
- kebab-rag
- kebab-tui
- kebab-desktop
Inputs:
- RawAsset
@@ -1052,7 +1052,7 @@ Non-goals:
## Days 1-2
- create the workspace
- draft the `kb-core` domain types
- draft the `kebab-core` domain types
- document the ID rules
- CLI skeleton
@@ -1073,7 +1073,7 @@ Non-goals:
- SQLite FTS5 search
- citation output
- implement `kb inspect`
- implement `kebab inspect`
## Days 12-14
@@ -1100,7 +1100,7 @@ MVP 완료 조건은 다음과 같다.
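- [ ] Changing the parser/chunker version identifies what has to be reprocessed.
- [ ] There is a trait that a local embedding backend can plug into.
- [ ] There is a trait that a local LLM can plug into.
- [ ] The CLI works through the `kb-app` facade.
- [ ] The CLI works through the `kebab-app` facade.
The P1 completion criteria are as follows.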


@@ -14,30 +14,30 @@ historical contract that was implemented; this file accumulates the
deltas so phase 5+ readers can find the live behavior without diffing
git history.
## 2026-05-01 — `--config` flag silently ignored across all kb-cli subcommands
## 2026-05-01 — `--config` flag silently ignored across all kebab-cli subcommands
**Discovered**: post-P3-5 manual smoke at `/tmp/kb-smoke/`.
**Discovered**: post-P3-5 manual smoke at `/tmp/kebab-smoke/`.
**Symptom**: `kb --config /path/to/config.toml ingest|search|list|inspect|doctor` ignored the flag and fell back to `~/.config/kb/config.toml` (XDG default). Users had to use `KB_*` env vars to point at a non-default config.
**Symptom**: `kebab --config /path/to/config.toml ingest|search|list|inspect|doctor` ignored the flag and fell back to `~/.config/kebab/config.toml` (XDG default). Users had to use `KEBAB_*` env vars to point at a non-default config.
**Root cause**: `kb-cli` read `cli.config` only inside `Cmd::Ingest` to build `SourceScope`, then called bare `kb_app::ingest(scope, summary_only)` which internally re-loaded `Config::load(None)` (XDG path). Same pattern in `Cmd::Search` / `List` / `Inspect` / `Doctor`. P3-5 introduced `*_with_config` test seams via `#[doc(hidden)] pub fn` but kb-cli never used them.
**Root cause**: `kebab-cli` read `cli.config` only inside `Cmd::Ingest` to build `SourceScope`, then called bare `kebab_app::ingest(scope, summary_only)` which internally re-loaded `Config::load(None)` (XDG path). Same pattern in `Cmd::Search` / `List` / `Inspect` / `Doctor`. P3-5 introduced `*_with_config` test seams via `#[doc(hidden)] pub fn` but kebab-cli never used them.
**Fix** (PR #20, fix/cli-config-flag-and-search-output):
- `kb-cli` now builds the Config once via `Config::load(cli.config.as_deref())` at the top of every subcommand and threads it into `kb_app::*_with_config(cfg, ...)` instead of `kb_app::*(...)`.
- `kb_app::doctor()` rewritten as `doctor_with_config_path(Option<&Path>)` that reports the actual path probed and hard-fails when `--config <path>` doesn't exist (defaults would otherwise mask user intent).
- `kb-app` module doc-comment updated: `#[doc(hidden)] pub fn *_with_config` is no longer "test-only seam" — it's the official "config-explicit" API consumed by CLI `--config`, integration tests, and TUI sessions.
- Same PR also improved `kb search` printer: `{:.4}` score formatting (RRF range collapses on `{:.2}`) and `> heading_path` suffix so chunks from the same document are visually distinct.
- `kebab-cli` now builds the Config once via `Config::load(cli.config.as_deref())` at the top of every subcommand and threads it into `kebab_app::*_with_config(cfg, ...)` instead of `kebab_app::*(...)`.
- `kebab_app::doctor()` rewritten as `doctor_with_config_path(Option<&Path>)` that reports the actual path probed and hard-fails when `--config <path>` doesn't exist (defaults would otherwise mask user intent).
- `kebab-app` module doc-comment updated: `#[doc(hidden)] pub fn *_with_config` is no longer "test-only seam" — it's the official "config-explicit" API consumed by CLI `--config`, integration tests, and TUI sessions.
- Same PR also improved `kebab search` printer: `{:.4}` score formatting (RRF range collapses on `{:.2}`) and `> heading_path` suffix so chunks from the same document are visually distinct.
**Amends**: tasks/p3/p3-5-app-wiring.md (the test seam was always meant to be the config-explicit API; only the doc-comment lied).
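The shape of the fix (load the config once from the `--config` value, then pass it explicitly instead of letting the callee re-load the XDG default) can be sketched in isolation. Everything below is a hypothetical stand-in: this `Config` holds only the path it was resolved from, `search_with_config` and `run_search` are illustrative names, and the unexpanded `~` default is a simplification; the real `kebab_config::Config` and `kebab_app::*_with_config` functions are richer.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the real config: it only remembers
/// which file it was resolved from.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub source: PathBuf,
}

impl Config {
    /// Same shape as `Config::load(Option<&Path>)` in the text:
    /// an explicit path wins, `None` falls back to the XDG default.
    pub fn load(path: Option<&Path>) -> Result<Self, String> {
        let source = match path {
            Some(p) => p.to_path_buf(),
            // Simplification: the real code would expand `~` properly.
            None => PathBuf::from("~/.config/kebab/config.toml"),
        };
        Ok(Config { source })
    }
}

/// Config-explicit entry point: it never re-loads the config itself.
pub fn search_with_config(cfg: &Config, query: &str) -> String {
    format!("search {:?} using {}", query, cfg.source.display())
}

/// What a CLI subcommand arm does after the fix: build the config once
/// from the `--config` value and thread it through.
pub fn run_search(cli_config: Option<&Path>, query: &str) -> Result<String, String> {
    let cfg = Config::load(cli_config)?;
    Ok(search_with_config(&cfg, query))
}
```

The regression class disappears when the config-free wrappers are thin shims over the `*_with_config` variants, because there is then only one place where the default path can be chosen.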
### 2026-05-01 — `--config` regression in `kb ask` (P4-3 follow-up)
### 2026-05-01 — `--config` regression in `kebab ask` (P4-3 follow-up)
**Discovered**: post-P4-3 manual smoke against 192.168.0.47 Ollama with `gemma4:26b`.
**Symptom**: `kb --config <path> ask` returned `model.id = qwen2.5:14b-instruct` (XDG default model) and `score_gate = 0.30` (XDG default), instead of `gemma4:26b` / `0.05` from the explicit config. P4-3 added the ask body but kb-cli's `Cmd::Ask` arm still called bare `kb_app::ask(query, opts)` — same regression class as the P3-5 fix above, just missed when ask was wired.
**Symptom**: `kebab --config <path> ask` returned `model.id = qwen2.5:14b-instruct` (XDG default model) and `score_gate = 0.30` (XDG default), instead of `gemma4:26b` / `0.05` from the explicit config. P4-3 added the ask body but kebab-cli's `Cmd::Ask` arm still called bare `kebab_app::ask(query, opts)` — same regression class as the P3-5 fix above, just missed when ask was wired.
**Fix** (PR #24, fix/cli-ask-honor-config-flag):
- `kb-cli` builds `Config::load(cli.config.as_deref())` once at the top of `Cmd::Ask` and calls `kb_app::ask_with_config(cfg, query, opts)`.
- `kebab-cli` builds `Config::load(cli.config.as_deref())` once at the top of `Cmd::Ask` and calls `kebab_app::ask_with_config(cfg, query, opts)`.
**Amends**: tasks/p4/p4-3-rag-pipeline.md.
@@ -48,15 +48,15 @@ git history.
**Root cause**: RRF formula `score(c) = Σ 1/(k_rrf + rank_m(c))` produces values bounded by `num_retrievers / (k_rrf + 1)`. With `num_retrievers = 2` and the default `k_rrf = 60`, the upper bound is `2/61 ≈ 0.0328`. The default `config.rag.score_gate = 0.05` was calibrated for vector / lexical scores already in `[0, 1]` and silently refused every hybrid query. `fusion_score` was also incomparable across modes — Lexical / Vector lived in `[0, 1]`, Hybrid lived in `(0, 0.033]`.
**Fix** (PR #25, fix/rrf-fusion-score-normalize-and-docs):
- `crates/kb-search/src/hybrid.rs` divides every raw RRF score by `2 / (k_rrf + 1)` so `fusion_score` always lives in `[0, 1]` regardless of mode. Both retrievers contributing rank 1 normalises to `1.0`; chunks present in only one retriever cap around `0.5`. RRF's rank-ordering invariants are preserved (same constant divides every score), so sort + tiebreak behaviour is identical.
- `crates/kebab-search/src/hybrid.rs` divides every raw RRF score by `2 / (k_rrf + 1)` so `fusion_score` always lives in `[0, 1]` regardless of mode. Both retrievers contributing rank 1 normalises to `1.0`; chunks present in only one retriever cap around `0.5`. RRF's rank-ordering invariants are preserved (same constant divides every score), so sort + tiebreak behaviour is identical.
- One unit test (`rrf_formula_matches_known_value`) updated to expect the normalised value `(1/61 + 1/62) / (2/61) ≈ 0.9919`.
- The integration snapshot `crates/kb-search/tests/fixtures/search/hybrid/run-1.json` already used presence checks (`fusion_score_positive: true`) rather than absolute values, so it didn't need regeneration.
- The integration snapshot `crates/kebab-search/tests/fixtures/search/hybrid/run-1.json` already used presence checks (`fusion_score_positive: true`) rather than absolute values, so it didn't need regeneration.
**Why not a per-mode `score_gate` config**: separate `lexical_score_gate / vector_score_gate / hybrid_score_gate` would force every downstream consumer (CLI, eval, TUI) to know which mode picks which threshold. Normalising the score itself is a one-line change at the source and makes `Answer.retrieval.score_gate` semantically meaningful without per-mode bookkeeping.
**Amends**: tasks/p3/p3-4-hybrid-fusion.md (RRF formula now divides by `2/(k_rrf+1)` after summation), tasks/phase-3-vector-hybrid.md (RRF section).
**Verification**: post-fix smoke at `/tmp/kb-smoke/` with default `score_gate = 0.05` succeeded across four scenarios — Korean→Korean, English→English, cross-language, and out-of-corpus refusal.
**Verification**: post-fix smoke at `/tmp/kebab-smoke/` with default `score_gate = 0.05` succeeded across four scenarios — Korean→Korean, English→English, cross-language, and out-of-corpus refusal.
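The normalisation described in this entry can be reproduced in a few lines. This is a minimal sketch, not the code in `hybrid.rs`: the function name `rrf_normalized`, the string chunk ids, and the alphabetical tiebreak are assumptions, and the divisor is generalised from the entry's `2 / (k_rrf + 1)` to `num_retrievers / (k_rrf + 1)`, which is the same value for two retrievers.

```rust
use std::collections::HashMap;

/// Fuse ranked lists with Reciprocal Rank Fusion and scale into [0, 1].
/// `rankings` holds one list of chunk ids per retriever, best hit first.
pub fn rrf_normalized(rankings: &[Vec<&str>], k_rrf: f64) -> Vec<(String, f64)> {
    let mut raw: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, chunk_id) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64; // ranks are 1-based
            *raw.entry((*chunk_id).to_string()).or_insert(0.0) += 1.0 / (k_rrf + rank);
        }
    }
    // Upper bound of the raw score: every retriever puts the chunk at rank 1.
    let max_score = rankings.len() as f64 / (k_rrf + 1.0);
    let mut fused: Vec<(String, f64)> = raw
        .into_iter()
        .map(|(id, score)| (id, score / max_score))
        .collect();
    // Dividing by a constant keeps the order; ties are broken by id
    // (an assumed rule, chosen here only to make the output deterministic).
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

With `k_rrf = 60`, a chunk ranked first by both retrievers scores `1.0`, a chunk ranked first and second scores `(1/61 + 1/62) / (2/61) ≈ 0.9919`, and a chunk seen by only one retriever stays at or below `0.5`, so a gate such as `0.05` becomes usable again.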
## How to add an entry


@@ -1,12 +1,12 @@
---
title: "KB 작업 단위 인덱스"
source: kb_local_rust_report.md
source: kebab_local_rust_report.md
date: 2026-04-27
---
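# KB work-unit index
This index decomposes the Phase roadmap of [`kb_local_rust_report.md`](../kb_local_rust_report.md) into architecture-level work units. Each task document is a unit that can be started and reviewed independently.
This index decomposes the Phase roadmap of [`kebab_local_rust_report.md`](../kebab_local_rust_report.md) into architecture-level work units. Each task document is a unit that can be started and reviewed independently.
## Dependency graph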
@@ -25,16 +25,16 @@ P0~P5 는 직렬. P6~P9 는 P5 이후 병렬 가능.
| # | Code | Title | Key output crates | Depends on |
|---|------|------|----------------|------|
| P0 | [phase-0-skeleton.md](phase-0-skeleton.md) | Workspace skeleton + domain contracts | kb-core, kb-parse-types, kb-config, kb-app, kb-cli | |
| P1 | [phase-1-markdown-ingestion.md](phase-1-markdown-ingestion.md) | Markdown ingestion pipeline | kb-source-fs, kb-parse-md, kb-normalize, kb-chunk, kb-store-sqlite | P0 |
| P2 | [phase-2-lexical-search.md](phase-2-lexical-search.md) | SQLite FTS5 lexical search + citation | kb-search (lexical) | P1 |
| P3 | [phase-3-vector-hybrid.md](phase-3-vector-hybrid.md) | Local embedding + LanceDB + hybrid | kb-embed, kb-embed-local, kb-store-vector, kb-search | P2 |
| P4 | [phase-4-local-llm-rag.md](phase-4-local-llm-rag.md) | Local LLM + RAG + grounded answer | kb-llm, kb-llm-local, kb-rag | P3 |
| P5 | [phase-5-evaluation.md](phase-5-evaluation.md) | Golden query / regression eval | kb-eval | P4 |
| P6 | [phase-6-image.md](phase-6-image.md) | Image ingestion (OCR + caption) | kb-parse-image | P5 |
| P7 | [phase-7-pdf.md](phase-7-pdf.md) | PDF text + page citation | kb-parse-pdf | P5 |
| P8 | [phase-8-audio.md](phase-8-audio.md) | Audio transcription + timestamp citation | kb-parse-audio | P5 |
| P9 | [phase-9-ui.md](phase-9-ui.md) | TUI + desktop app | kb-tui, kb-desktop | P5 |
| P0 | [phase-0-skeleton.md](phase-0-skeleton.md) | Workspace skeleton + domain contracts | kebab-core, kebab-parse-types, kebab-config, kebab-app, kebab-cli | |
| P1 | [phase-1-markdown-ingestion.md](phase-1-markdown-ingestion.md) | Markdown ingestion pipeline | kebab-source-fs, kebab-parse-md, kebab-normalize, kebab-chunk, kebab-store-sqlite | P0 |
| P2 | [phase-2-lexical-search.md](phase-2-lexical-search.md) | SQLite FTS5 lexical search + citation | kebab-search (lexical) | P1 |
| P3 | [phase-3-vector-hybrid.md](phase-3-vector-hybrid.md) | Local embedding + LanceDB + hybrid | kebab-embed, kebab-embed-local, kebab-store-vector, kebab-search | P2 |
| P4 | [phase-4-local-llm-rag.md](phase-4-local-llm-rag.md) | Local LLM + RAG + grounded answer | kebab-llm, kebab-llm-local, kebab-rag | P3 |
| P5 | [phase-5-evaluation.md](phase-5-evaluation.md) | Golden query / regression eval | kebab-eval | P4 |
| P6 | [phase-6-image.md](phase-6-image.md) | Image ingestion (OCR + caption) | kebab-parse-image | P5 |
| P7 | [phase-7-pdf.md](phase-7-pdf.md) | PDF text + page citation | kebab-parse-pdf | P5 |
| P8 | [phase-8-audio.md](phase-8-audio.md) | Audio transcription + timestamp citation | kebab-parse-audio | P5 |
| P9 | [phase-9-ui.md](phase-9-ui.md) | TUI + desktop app | kebab-tui, kebab-desktop | P5 |
## Component task decomposition (per phase)


@@ -6,7 +6,7 @@ title: "<Component title>"
status: planned
depends_on: [] # other task_ids
unblocks: [] # other task_ids
contract_source: ../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [] # e.g. [§3.5, §5.5, §7.2]
---
@@ -22,7 +22,7 @@ contract_sections: [] # e.g. [§3.5, §5.5, §7.2]
## Allowed dependencies
- `kb-core`
- `kebab-core`
- <other crates per design §8>
- <external crates with versions>
@@ -71,7 +71,7 @@ If any item here is needed during implementation, STOP and update the frozen des
| unit | ... | ... |
| snapshot | ... (JSON freeze) | `fixtures/...` |
| contract | trait round-trip | mock impls |
| integration | end-to-end via `kb-app` facade | tmp workspace |
| integration | end-to-end via `kebab-app` facade | tmp workspace |
All tests must run under `cargo test -p <crate>` and not require external network or Ollama unless explicitly stated.


@@ -1,12 +1,12 @@
---
phase: P0
component: workspace + kb-core + kb-config + kb-app + kb-cli
component: workspace + kebab-core + kebab-config + kebab-app + kebab-cli
task_id: p0-1
title: "Workspace skeleton + frozen domain types/traits + ID recipe + facade"
status: completed
depends_on: []
unblocks: [p1-1, p1-2, p1-3, p1-4, p1-5, p1-6, p2-1, p2-2, p3-1, p3-2, p3-3, p3-4, p4-1, p4-2, p4-3, p5-1, p5-2, p6-1, p6-2, p6-3, p7-1, p7-2, p8-1, p8-2, p9-1, p9-2, p9-3, p9-4, p9-5]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3 (all), §4, §5.1 schema_meta+migrations, §6 (config + XDG), §7 (all traits), §8 module boundaries, §9 versioning, §10 errors+exit codes, §2.8 wire schema_version]
---
@@ -14,56 +14,56 @@ contract_sections: [§3 (all), §4, §5.1 schema_meta+migrations, §6 (config +
## Goal
Stand up the Cargo workspace (Rust 2024, resolver=3) with `kb-core`, `kb-parse-types`, `kb-config`, `kb-app`, `kb-cli` crates. Freeze every domain type, trait, ID recipe, error type, and CLI entry shape per the frozen design doc so that all subsequent component tasks compile against stable contracts.
Stand up the Cargo workspace (Rust 2024, resolver=3) with `kebab-core`, `kebab-parse-types`, `kebab-config`, `kebab-app`, `kebab-cli` crates. Freeze every domain type, trait, ID recipe, error type, and CLI entry shape per the frozen design doc so that all subsequent component tasks compile against stable contracts.
## Why now / why this size
Every other task imports `kb-core`. If types or trait signatures wobble after this point, every downstream task spec drifts. This task is large but indivisible: types + traits + ID recipe + facade + CLI skeleton + wire schema stubs must land together so the rest of the workspace can compile against them.
Every other task imports `kebab-core`. If types or trait signatures wobble after this point, every downstream task spec drifts. This task is large but indivisible: types + traits + ID recipe + facade + CLI skeleton + wire schema stubs must land together so the rest of the workspace can compile against them.
## Allowed dependencies
- workspace `[workspace.dependencies]`: `anyhow = "1"`, `thiserror = "2"`, `serde = { version = "1", features = ["derive"] }`, `serde_json = "1"`, `time = { version = "0.3", features = ["serde", "macros"] }`, `uuid = { version = "1", features = ["v7", "serde"] }`, `blake3 = "1"`, `tracing = "0.1"`
- per crate:
- `kb-core`: workspace deps + `serde_json::Map`, `serde-json-canonicalizer`, `unicode-normalization`
- `kb-parse-types`: workspace deps + `kb-core` ONLY (no parsers, no stores, no normalize). Defines parser intermediate representations per design §3.7b.
- `kb-config`: workspace deps + `toml = "0.8"`, `dirs = "5"` (XDG paths)
- `kb-app`: workspace deps + `kb-core`, `kb-config`, `tracing-subscriber`, `tracing-appender`
- `kb-cli`: workspace deps + `kb-core`, `kb-config`, `kb-app`, `clap = { version = "4", features = ["derive"] }`
- `kebab-core`: workspace deps + `serde_json::Map`, `serde-json-canonicalizer`, `unicode-normalization`
- `kebab-parse-types`: workspace deps + `kebab-core` ONLY (no parsers, no stores, no normalize). Defines parser intermediate representations per design §3.7b.
- `kebab-config`: workspace deps + `toml = "0.8"`, `dirs = "5"` (XDG paths)
- `kebab-app`: workspace deps + `kebab-core`, `kebab-config`, `tracing-subscriber`, `tracing-appender`
- `kebab-cli`: workspace deps + `kebab-core`, `kebab-config`, `kebab-app`, `clap = { version = "4", features = ["derive"] }`
## Forbidden dependencies
- `kb-core` MUST NOT depend on any other `kb-*` crate.
- `kb-parse-types` MUST depend ONLY on `kb-core`. No parser libraries (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`, …), no other `kb-*` crate.
- `kb-config` MUST NOT depend on `kb-app`, `kb-cli`, parsers, stores, embedders, search, llm, rag, tui, desktop.
- `kb-app` MUST NOT yet depend on parsers/stores/embedders/search/llm/rag (those crates do not exist yet — facade methods stub out and return `unimplemented!()` or `anyhow::bail!("not yet wired (Pn-i)")`).
- `kb-cli` MUST NOT call any non-`kb-app` crate directly.
- `kebab-core` MUST NOT depend on any other `kebab-*` crate.
- `kebab-parse-types` MUST depend ONLY on `kebab-core`. No parser libraries (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`, …), no other `kebab-*` crate.
- `kebab-config` MUST NOT depend on `kebab-app`, `kebab-cli`, parsers, stores, embedders, search, llm, rag, tui, desktop.
- `kebab-app` MUST NOT yet depend on parsers/stores/embedders/search/llm/rag (those crates do not exist yet — facade methods stub out and return `unimplemented!()` or `anyhow::bail!("not yet wired (Pn-i)")`).
- `kebab-cli` MUST NOT call any non-`kebab-app` crate directly.
## Inputs
| input | type | source |
|-------|------|--------|
| frozen design doc | Markdown | `docs/superpowers/specs/2026-04-27-kb-final-form-design.md` |
| user `kb` invocation | command-line args | end user |
| frozen design doc | Markdown | `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` |
| user `kebab` invocation | command-line args | end user |
## Outputs
| output | type | downstream consumer |
|--------|------|---------------------|
| compiling workspace | Rust crates | every later task |
| `kb-core` types/traits | Rust API | every other crate |
| `kb-core` ID functions | Rust API | parsers, normalize, chunkers, embedders, search, rag |
| `kb-config::Config` | Rust struct | every other crate |
| `kb-app` facade methods (stubs) | Rust API | `kb-cli`, future TUI/desktop |
| `kb` binary | executable | end user |
| `kebab-core` types/traits | Rust API | every other crate |
| `kebab-core` ID functions | Rust API | parsers, normalize, chunkers, embedders, search, rag |
| `kebab-config::Config` | Rust struct | every other crate |
| `kebab-app` facade methods (stubs) | Rust API | `kebab-cli`, future TUI/desktop |
| `kebab` binary | executable | end user |
| `docs/wire-schema/v1/*.schema.json` stubs | JSON Schema files | future wire emitters and consumers |
| `docs/spec/*.md` stubs (link to frozen design) | Markdown | future contributors |
## Public surface (signatures only — no new types)
All types/traits below are defined in `kb-core` exactly per design §3 and §7 (no additions, no renames). Subagent must copy field-for-field.
All types/traits below are defined in `kebab-core` exactly per design §3 and §7 (no additions, no renames). Subagent must copy field-for-field.
```rust
// ── kb-core ─────────────────────────────────────────────────────────────────
// ── kebab-core ─────────────────────────────────────────────────────────────────
// Newtype IDs (design §3.1) — Display + FromStr implemented.
pub struct AssetId(pub String);
@@ -114,7 +114,7 @@ pub struct AudioRefBlock { /* per §3.4 */ }
pub enum Inline { /* per §3.4 */ }
pub enum SourceSpan { /* per §3.4 */ }
// (ParsedBlock + parser intermediates live in kb-parse-types per design §3.7b — NOT in kb-core.)
// (ParsedBlock + parser intermediates live in kebab-parse-types per design §3.7b — NOT in kebab-core.)
// Chunk + Citation (§3.5)
pub struct Chunk { /* per §3.5 */ }
@@ -232,14 +232,14 @@ pub fn nfc(input: &str) -> String; // §4.1
```
```rust
// ── kb-parse-types ──────────────────────────────────────────────────────────
// ── kebab-parse-types ──────────────────────────────────────────────────────────
// Per design §3.7b. Defines parser intermediate representations consumed by
// kb-normalize. Depends on kb-core only — never on parser libraries.
// kebab-normalize. Depends on kebab-core only — never on parser libraries.
pub struct ParsedBlock {
pub kind: ParsedBlockKind,
pub heading_path: Vec<String>,
pub source_span: kb_core::SourceSpan,
pub source_span: kebab_core::SourceSpan,
pub payload: ParsedPayload,
}
@@ -247,16 +247,16 @@ pub enum ParsedBlockKind { Heading, Paragraph, List, Code, Table, Quote, ImageRe
pub enum ParsedPayload {
Heading { level: u8, text: String },
Paragraph { text: String, inlines: Vec<kb_core::Inline> },
List { ordered: bool, items: Vec<Vec<kb_core::Inline>> },
Paragraph { text: String, inlines: Vec<kebab_core::Inline> },
List { ordered: bool, items: Vec<Vec<kebab_core::Inline>> },
Code { lang: Option<String>, code: String },
Table { headers: Vec<String>, rows: Vec<Vec<String>> },
Quote { text: String, inlines: Vec<kb_core::Inline> },
Quote { text: String, inlines: Vec<kebab_core::Inline> },
ImageRef { src: String, alt: String },
AudioRef { src: String },
}
// `Inline` itself lives in kb-core (§3.4) — parse-types references it, never duplicates it.
// `Inline` itself lives in kebab-core (§3.4) — parse-types references it, never duplicates it.
pub struct Warning { pub kind: WarningKind, pub note: String }
pub enum WarningKind { MalformedFrontmatter, MalformedTable, EncodingFallback, ExtractFailed }
@@ -268,31 +268,31 @@ pub struct ParsedAudioSegment;
```
```rust
// ── kb-config ───────────────────────────────────────────────────────────────
// ── kebab-config ───────────────────────────────────────────────────────────────
pub struct Config { /* full schema per §6.4 */ }
impl Config {
pub fn load(path: Option<&std::path::Path>) -> anyhow::Result<Self>;
pub fn from_file(path: &std::path::Path) -> anyhow::Result<Self>;
pub fn defaults() -> Self;
pub fn apply_env(self, env: &std::collections::HashMap<String, String>) -> Self;
pub fn xdg_config_path() -> std::path::PathBuf; // ~/.config/kb/config.toml
pub fn xdg_data_dir() -> std::path::PathBuf; // ~/.local/share/kb
pub fn xdg_config_path() -> std::path::PathBuf; // ~/.config/kebab/config.toml
pub fn xdg_data_dir() -> std::path::PathBuf; // ~/.local/share/kebab
pub fn xdg_cache_dir() -> std::path::PathBuf;
pub fn xdg_state_dir() -> std::path::PathBuf;
}
```
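As a rough illustration of the lookup behind `xdg_config_path`, the sketch below resolves the config file location from an explicit environment map. The helper name `resolve_config_path` and its `home` parameter are hypothetical, and the real crate is specified to delegate to the `dirs` crate rather than hand-rolled logic like this.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical helper (not part of the spec): pick the config file path.
/// Assumption: a non-empty `XDG_CONFIG_HOME` wins, otherwise `<home>/.config`.
fn resolve_config_path(env: &HashMap<String, String>, home: &str) -> PathBuf {
    let base = match env.get("XDG_CONFIG_HOME") {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => PathBuf::from(home).join(".config"),
    };
    base.join("kebab").join("config.toml")
}

fn main() {
    let mut env = HashMap::new();
    // No override set: falls back to the home-relative default.
    println!("{}", resolve_config_path(&env, "/home/u").display());
    // Override set: the XDG directory replaces `<home>/.config`.
    env.insert("XDG_CONFIG_HOME".to_string(), "/tmp/xdg".to_string());
    println!("{}", resolve_config_path(&env, "/home/u").display());
}
```

Passing the environment in as a map, as `apply_env` does, keeps a helper like this testable without touching process-wide state.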
```rust
// ── kb-app ──────────────────────────────────────────────────────────────────
// ── kebab-app ──────────────────────────────────────────────────────────────────
pub fn init_workspace(force: bool) -> anyhow::Result<()>;
pub fn ingest(scope: kb_core::SourceScope, summary_only: bool) -> anyhow::Result<kb_core::IngestReport>;
pub fn list_docs(filter: kb_core::DocFilter) -> anyhow::Result<Vec<kb_core::DocSummary>>;
pub fn inspect_doc(id: &kb_core::DocumentId) -> anyhow::Result<kb_core::CanonicalDocument>;
pub fn inspect_chunk(id: &kb_core::ChunkId) -> anyhow::Result<kb_core::Chunk>;
pub fn search(query: kb_core::SearchQuery) -> anyhow::Result<Vec<kb_core::SearchHit>>;
pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<kb_core::Answer>;
pub fn ingest(scope: kebab_core::SourceScope, summary_only: bool) -> anyhow::Result<kebab_core::IngestReport>;
pub fn list_docs(filter: kebab_core::DocFilter) -> anyhow::Result<Vec<kebab_core::DocSummary>>;
pub fn inspect_doc(id: &kebab_core::DocumentId) -> anyhow::Result<kebab_core::CanonicalDocument>;
pub fn inspect_chunk(id: &kebab_core::ChunkId) -> anyhow::Result<kebab_core::Chunk>;
pub fn search(query: kebab_core::SearchQuery) -> anyhow::Result<Vec<kebab_core::SearchHit>>;
pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<kebab_core::Answer>;
pub fn doctor() -> anyhow::Result<DoctorReport>;
pub struct AskOpts { pub k: usize, pub explain: bool, pub mode: kb_core::SearchMode, pub temperature: Option<f32>, pub seed: Option<u64> }
pub struct AskOpts { pub k: usize, pub explain: bool, pub mode: kebab_core::SearchMode, pub temperature: Option<f32>, pub seed: Option<u64> }
pub struct DoctorReport { pub ok: bool, pub checks: Vec<DoctorCheck> }
pub struct DoctorCheck { pub name: String, pub ok: bool, pub detail: String, pub hint: Option<String> }
```
@@ -300,9 +300,9 @@ pub struct DoctorCheck { pub name: String, pub ok: bool, pub detail: String, pub
P0 facade implementations call `anyhow::bail!("not yet wired (P<n>-<i>)")`; later phases replace bodies but never change signatures.
```rust
// ── kb-cli ──────────────────────────────────────────────────────────────────
// ── kebab-cli ──────────────────────────────────────────────────────────────────
// clap subcommands: init | ingest | list (docs) | inspect (doc|chunk) | search | ask | doctor | eval (subcommand placeholder)
// Each maps 1:1 to a kb_app function. Exit code mapping per §10.
// Each maps 1:1 to a kebab_app function. Exit code mapping per §10.
```
## Behavior contract
@@ -312,18 +312,18 @@ P0 facade implementations call `anyhow::bail!("not yet wired (P<n>-<i>)")`; late
- `id_from` uses `serde-json-canonicalizer` exactly as design §4.2 specifies and truncates blake3 to 32 hex chars.
- `Citation::to_uri` emits W3C Media Fragments URIs per §0 Q3 (`#L<a>-L<b>`, `#p=<n>`, `#xywh=…`, `#caption`, `#t=hh:mm:ss,hh:mm:ss[&speaker=…]`).
- `Citation::parse` is the strict inverse (round-trip property).
- `kb-config` resolves XDG paths via `dirs` crate; respects `XDG_CONFIG_HOME`, `XDG_DATA_HOME`, `XDG_CACHE_HOME`, `XDG_STATE_HOME` if set.
- Config layer order: defaults → file → env (`KB_<SECTION>_<KEY>`) → CLI flag (CLI override is applied by `kb-cli` after `Config::load`).
- `kb-cli` global flags: `--config <path>`, `--verbose`, `--debug`, `--json`, `--explain` (where applicable). On `--json`, output conforms to wire schema v1.
- `kb-cli` exit codes: 0 success, 1 no-hit/refusal, 2 error, 3 doctor unhealthy (per §10).
- `kebab-config` resolves XDG paths via `dirs` crate; respects `XDG_CONFIG_HOME`, `XDG_DATA_HOME`, `XDG_CACHE_HOME`, `XDG_STATE_HOME` if set.
- Config layer order: defaults → file → env (`KB_<SECTION>_<KEY>`) → CLI flag (CLI override is applied by `kebab-cli` after `Config::load`).
- `kebab-cli` global flags: `--config <path>`, `--verbose`, `--debug`, `--json`, `--explain` (where applicable). On `--json`, output conforms to wire schema v1.
- `kebab-cli` exit codes: 0 success, 1 no-hit/refusal, 2 error, 3 doctor unhealthy (per §10).
- All facade-returned wire objects emit `schema_version` per §2 (e.g., `"answer.v1"`, `"search_hit.v1"`).
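The round-trip property for citation fragments can be sketched on a reduced model. The `Locator` enum and the free functions below are hypothetical stand-ins for `Citation::to_uri` / `Citation::parse`, covering only the `#L<a>-L<b>` and `#p=<n>` forms; the real types come from design §3.5.

```rust
/// Hypothetical, reduced stand-in for a citation locator (two forms only).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Locator {
    Lines { start: u32, end: u32 },
    Page(u32),
}

/// Emit the fragment part of the URI.
fn to_fragment(loc: &Locator) -> String {
    match loc {
        Locator::Lines { start, end } => format!("#L{start}-L{end}"),
        Locator::Page(n) => format!("#p={n}"),
    }
}

/// Accept only canonical decimal numbers so that parsing stays a strict
/// inverse of `to_fragment` (no sign, no leading zeros).
fn parse_num(s: &str) -> Option<u32> {
    let canonical = !s.is_empty()
        && s.bytes().all(|b| b.is_ascii_digit())
        && (s.len() == 1 || !s.starts_with('0'));
    if canonical { s.parse().ok() } else { None }
}

/// Parse a fragment back into a locator; anything else is rejected.
fn parse_fragment(s: &str) -> Option<Locator> {
    if let Some(rest) = s.strip_prefix("#p=") {
        return parse_num(rest).map(Locator::Page);
    }
    let rest = s.strip_prefix("#L")?;
    let (a, b) = rest.split_once("-L")?;
    Some(Locator::Lines { start: parse_num(a)?, end: parse_num(b)? })
}

fn main() {
    let loc = Locator::Lines { start: 3, end: 9 };
    let frag = to_fragment(&loc);
    println!("{frag} -> {:?}", parse_fragment(&frag));
}
```

Rejecting non-canonical numbers is one way to keep the inverse strict; whether the real parser is that strict is an assumption here.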
## Storage / wire effects
- Filesystem: creates `~/.config/kb/`, `~/.local/share/kb/`, `~/KnowledgeBase/` only when `kb init` runs; never on `Config::load`.
- Filesystem: creates `~/.config/kebab/`, `~/.local/share/kebab/`, `~/KnowledgeBase/` only when `kebab init` runs; never on `Config::load`.
- Wire schemas: ships `docs/wire-schema/v1/{citation,search_hit,answer,ingest_report,doc_summary,chunk_inspection,doctor}.schema.json` as **stubs** declaring the top-level `schema_version` and required fields per §2. Full property validation can land later.
- DB: workspace ships `migrations/V001__init.sql` containing **only** §5.1 `schema_meta` + `migrations` tables (the full schema lands in p1-6's migration file or p0-1 may pre-stage the empty migrations directory; choose the former to keep this task within `kb-core`/`kb-config`/`kb-app`/`kb-cli` scope).
- Logging: `tracing` initialized in `kb-cli`; daily-rolling file in `~/.local/state/kb/logs/`.
- DB: workspace ships `migrations/V001__init.sql` containing **only** §5.1 `schema_meta` + `migrations` tables (the full schema lands in p1-6's migration file or p0-1 may pre-stage the empty migrations directory; choose the former to keep this task within `kebab-core`/`kebab-config`/`kebab-app`/`kebab-cli` scope).
- Logging: `tracing` initialized in `kebab-cli`; daily-rolling file in `~/.local/state/kebab/logs/`.
## Test plan
@@ -336,20 +336,20 @@ P0 facade implementations call `anyhow::bail!("not yet wired (P<n>-<i>)")`; late
| unit | newtype `Display`/`FromStr` rejects invalid lengths/chars | inline |
| unit | `Config::defaults` + env override + CLI override produces expected merged config | inline |
| snapshot | `Config::defaults` JSON serde stable | inline (round-trip) |
| smoke | `kb --help`, `kb init`, `kb doctor` run; doctor reports config_loaded ✓ data_dir_writable ✓ even with no DB present (downstream checks may fail with hint) | tmp `XDG_*` env |
| smoke | `kebab --help`, `kebab init`, `kebab doctor` run; doctor reports config_loaded ✓ data_dir_writable ✓ even with no DB present (downstream checks may fail with hint) | tmp `XDG_*` env |
| build | `cargo check --workspace` and `cargo test --workspace` pass | repo |
All tests must run with no network, no Ollama, no models.
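The newtype `Display`/`FromStr` row above could be exercised against something like the following minimal sketch. It assumes IDs are exactly 32 lowercase hex characters; the error type `IdParseError` is hypothetical, and the real definition belongs to design §3.1.

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DocumentId(pub String);

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq, Eq)]
pub enum IdParseError {
    BadLength(usize),
    BadChar(char),
}

impl FromStr for DocumentId {
    type Err = IdParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Length is checked in bytes; any non-ASCII input fails the char check.
        if s.len() != 32 {
            return Err(IdParseError::BadLength(s.len()));
        }
        // Assumption: only lowercase hex is accepted.
        if let Some(c) = s.chars().find(|c| !matches!(c, '0'..='9' | 'a'..='f')) {
            return Err(IdParseError::BadChar(c));
        }
        Ok(DocumentId(s.to_string()))
    }
}

impl fmt::Display for DocumentId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

fn main() {
    let parsed = "0123456789abcdef0123456789abcdef".parse::<DocumentId>();
    println!("{parsed:?}");
    println!("{:?}", "too-short".parse::<DocumentId>());
}
```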
## Definition of Done
- [ ] `Cargo.toml` workspace lists `kb-core`, `kb-parse-types`, `kb-config`, `kb-app`, `kb-cli` and resolver=3, edition 2024
- [ ] `Cargo.toml` workspace lists `kebab-core`, `kebab-parse-types`, `kebab-config`, `kebab-app`, `kebab-cli` and resolver=3, edition 2024
- [ ] `cargo check --workspace` passes
- [ ] `cargo test --workspace` passes
- [ ] `kb-parse-types` `cargo tree` shows ONLY `kb-core` + `serde`/`thiserror` style deps (no parser libs, no other `kb-*`)
- [ ] `kb --help` prints subcommands
- [ ] `kb init` creates XDG dirs idempotently and writes `config.toml`
- [ ] `kb doctor` returns wire JSON conforming to `doctor.v1` (in `--json` mode)
- [ ] `kebab-parse-types` `cargo tree` shows ONLY `kebab-core` + `serde`/`thiserror` style deps (no parser libs, no other `kebab-*`)
- [ ] `kebab --help` prints subcommands
- [ ] `kebab init` creates XDG dirs idempotently and writes `config.toml`
- [ ] `kebab doctor` returns wire JSON conforming to `doctor.v1` (in `--json` mode)
- [ ] `docs/wire-schema/v1/*.schema.json` stubs exist (7 files: citation, search_hit, answer, ingest_report, doc_summary, chunk_inspection, doctor)
- [ ] `docs/spec/` stubs exist linking to the frozen design (one file per: domain-model, ids, canonical-document, chunk-policy, citation-policy, module-boundaries, ai-generation-guidelines)
- [ ] `fixtures/` root directory created with all subdirectories that downstream tasks reference: `fixtures/markdown/`, `fixtures/source-fs/`, `fixtures/search/lexical/`, `fixtures/search/hybrid/`, `fixtures/embed/`, `fixtures/vector/`, `fixtures/rag/`, `fixtures/eval/`, `fixtures/image/`, `fixtures/pdf/`, `fixtures/audio/`. Each subdir gets a `.gitkeep` so it tracks. P1 ships at minimum `fixtures/markdown/{simple-note,nested-headings,code-and-table}.md` (per epic phase-0); other dirs stay empty until their phase lands.
@@ -361,12 +361,12 @@ All tests must run with no network, no Ollama, no models.
- Any parser / store / embedder / search / llm / rag / tui / desktop logic (downstream phases).
- Full schema migrations (most DDL lands in p1-6 / p2-1 / p3-3).
- Wire schema deep validation (only required fields + `schema_version` checked here).
- Real `kb-app` business logic (functions stub with `unimplemented!()` or explicit `bail!`).
- Real `kebab-app` business logic (functions stub with `unimplemented!()` or explicit `bail!`).
## Risks / notes
- ID recipe is the contract that every later record depends on. Any change after this task lands forces a `parser_version` / `chunker_version` / `embedding_version` cascade per §9. Treat changes as schema migrations and update the design doc first.
- Newtype IDs use `String` (not `[u8; 16]`) to keep serde simple; tests must still enforce 32-char hex constraint on `FromStr`.
- `kb-app` stubs must use `bail!` not `panic!` so the CLI exits with code 2 cleanly per §10.
- `kebab-app` stubs must use `bail!` not `panic!` so the CLI exits with code 2 cleanly per §10.
- `clap` v4 derive: subcommand `inspect` has nested `doc` / `chunk` variants; ensure exit code 0/1/2 mapping wraps the facade call uniformly.
- XDG path discovery on macOS: spec uses XDG (not `Application Support`) per §6.1 — `dirs` crate honors XDG env vars; tests must set them explicitly.
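To make the `bail!`-not-`panic!` note concrete, here is a small sketch of a uniform exit-code wrapper. `Outcome`, `exit_code_for`, and `stub_ingest` are hypothetical names, and a plain `String` error stands in for `anyhow::Error` so the example needs no external crates.

```rust
/// Hypothetical summary of what a facade call produced.
#[derive(Debug)]
enum Outcome {
    Success,
    NoHit,
    DoctorUnhealthy,
}

/// Map a facade result to the exit codes listed in the spec:
/// 0 success, 1 no-hit/refusal, 2 error, 3 doctor unhealthy.
fn exit_code_for(result: &Result<Outcome, String>) -> u8 {
    match result {
        Ok(Outcome::Success) => 0,
        Ok(Outcome::NoHit) => 1,
        Err(_) => 2,
        Ok(Outcome::DoctorUnhealthy) => 3,
    }
}

/// A P0-style stub: returns an error value instead of panicking,
/// so the wrapper can turn it into exit code 2.
fn stub_ingest() -> Result<Outcome, String> {
    Err("not yet wired (P1-1)".to_string())
}

fn main() {
    let result = stub_ingest();
    if let Err(msg) = &result {
        eprintln!("error: {msg}");
    }
    // A real CLI would pass this value to the process exit; here it is printed.
    println!("exit code: {}", exit_code_for(&result));
}
```

A panic would bypass this mapping entirely, which is why the stubs are asked to return an error instead.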



@@ -1,12 +1,12 @@
---
phase: P1
component: kb-source-fs
component: kebab-source-fs
task_id: p1-1
title: "Local filesystem source connector"
status: completed
depends_on: [p0-1]
unblocks: [p1-2, p1-3, p1-4, p1-5, p1-6]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.3, §6.2, §6.6, §7.1, §7.2 SourceConnector, §8]
---
@@ -22,8 +22,8 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `ignore` (gitignore semantics)
- `blake3`
- `walkdir`
@@ -34,21 +34,21 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
## Forbidden dependencies
- `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `SourceScope` | `kb_core::SourceScope` | `kb-app` from config |
| `SourceScope` | `kebab_core::SourceScope` | `kebab-app` from config |
| filesystem | `&Path` | OS |
| `.kbignore` | text file | workspace root, optional |
| `.kebabignore` | text file | workspace root, optional |
## Outputs
| output | type | downstream consumer |
|--------|------|---------------------|
| `Vec<RawAsset>` | `kb_core::RawAsset` | `kb-parse-md`, asset writer in `kb-store-sqlite` (via `kb-app`) |
| `Vec<RawAsset>` | `kebab_core::RawAsset` | `kebab-parse-md`, asset writer in `kebab-store-sqlite` (via `kebab-app`) |
## Public surface (signatures only — no new types)
@@ -56,11 +56,11 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
pub struct FsSourceConnector { /* internal */ }
impl FsSourceConnector {
pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>;
pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
}
impl kb_core::SourceConnector for FsSourceConnector {
fn scan(&self, scope: &kb_core::SourceScope) -> anyhow::Result<Vec<kb_core::RawAsset>>;
impl kebab_core::SourceConnector for FsSourceConnector {
fn scan(&self, scope: &kebab_core::SourceScope) -> anyhow::Result<Vec<kebab_core::RawAsset>>;
}
```
@@ -70,7 +70,7 @@ impl kb_core::SourceConnector for FsSourceConnector {
- `asset_id` derived per design §4.2 from `blake3(raw bytes)` full hex.
- `media_type` selected from extension + libmagic-like sniff fallback (`.md` → Markdown, others fall through to `MediaType::Other`).
- `discovered_at` = current `OffsetDateTime::now_utc()` at scan time.
- Combine `config.workspace.exclude` and `.kbignore` for the filter (union; ordering does not matter).
- Combine `config.workspace.exclude` and `.kebabignore` for the filter (union; ordering does not matter).
- Symbolic links: follow once, detect cycles via `canonicalize` + visited set.
- Files larger than `storage.copy_threshold_mb` MB → emit `AssetStorage::Reference { path, sha }` (do not copy bytes here; copying is done by the asset writer task).
- Idempotent: same input → same `Vec<RawAsset>` (sort by `workspace_path`).
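Two of the rules above, the size threshold and the sorted output, can be sketched without touching the filesystem. The `Found` and `Storage` types and the `plan` function are hypothetical simplifications (the spec's `AssetStorage::Reference` carries `path` and `sha`), and treating 1 MB as 1024 * 1024 bytes is an assumption.

```rust
/// Hypothetical, simplified storage decision; `Copied` is an invented name
/// for the non-reference case.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Storage {
    Copied,
    Reference,
}

/// Hypothetical scan result before the storage decision.
struct Found {
    workspace_path: String,
    size_bytes: u64,
}

/// Sort by workspace path for determinism, then mark files strictly
/// larger than the threshold as references.
fn plan(mut files: Vec<Found>, copy_threshold_mb: u64) -> Vec<(String, Storage)> {
    files.sort_by(|a, b| a.workspace_path.cmp(&b.workspace_path));
    let limit = copy_threshold_mb * 1024 * 1024; // assumption: binary megabytes
    files
        .into_iter()
        .map(|f| {
            let storage = if f.size_bytes > limit {
                Storage::Reference
            } else {
                Storage::Copied
            };
            (f.workspace_path, storage)
        })
        .collect()
}

fn main() {
    let files = vec![
        Found { workspace_path: "b/big.bin".to_string(), size_bytes: 3 * 1024 * 1024 },
        Found { workspace_path: "a/note.md".to_string(), size_bytes: 10 },
    ];
    for (path, storage) in plan(files, 2) {
        println!("{path}: {storage:?}");
    }
}
```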
@@ -78,7 +78,7 @@ impl kb_core::SourceConnector for FsSourceConnector {
## Storage / wire effects
- Reads: filesystem under `config.workspace.root`.
- Writes: nothing. (Asset copy is handled by the asset writer in `kb-store-sqlite`.)
- Writes: nothing. (Asset copy is handled by the asset writer in `kebab-store-sqlite`.)
## Test plan
@@ -87,25 +87,25 @@ impl kb_core::SourceConnector for FsSourceConnector {
| unit | POSIX path normalization | inline cases incl. `./a/b.md`, `a//b.md`, `a/b.md` → identical |
| unit | blake3 of known bytes matches expected hex | inline |
| unit | gitignore filter (`*.tmp`, `node_modules/**`) excludes correctly | tmp tree built in test |
| unit | `.kbignore` config exclude works | tmp tree |
| unit | `.kebabignore` config exclude works | tmp tree |
| unit | symlink cycle does not loop | tmp tree with `a -> b -> a` |
| snapshot | `Vec<RawAsset>` serialized JSON for fixture tree is stable | `fixtures/source-fs/tree-1` |
| determinism | re-running scan twice produces byte-identical JSON | `fixtures/source-fs/tree-1` |
All tests run under `cargo test -p kb-source-fs` with no network and no model.
All tests run under `cargo test -p kebab-source-fs` with no network and no model.
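The POSIX path normalization case in the table could be checked against a helper along these lines. `normalize_posix` is a hypothetical name, and the sketch deliberately does not resolve `..` segments, since the spec lines shown here do not say how those are handled.

```rust
/// Hypothetical helper: collapse empty and `.` segments of a relative path.
/// `..` is left untouched (not covered by this sketch).
fn normalize_posix(path: &str) -> String {
    path.split('/')
        .filter(|seg| !seg.is_empty() && *seg != ".")
        .collect::<Vec<_>>()
        .join("/")
}

fn main() {
    for input in ["./a/b.md", "a//b.md", "a/b.md"] {
        println!("{input} -> {}", normalize_posix(input));
    }
}
```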
## Definition of Done
- [ ] `cargo check -p kb-source-fs` passes
- [ ] `cargo test -p kb-source-fs` passes
- [ ] `cargo check -p kebab-source-fs` passes
- [ ] `cargo test -p kebab-source-fs` passes
- [ ] Snapshot test `fixtures/source-fs/tree-1` round-trips deterministically
- [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kb-source-fs`)
- [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kebab-source-fs`)
- [ ] PR description links to design §3.3, §6.2, §7.2
## Out of scope
- File watching (P+).
- Asset copy/reference storage on disk (`kb-store-sqlite` task p1-6).
- Asset copy/reference storage on disk (`kebab-store-sqlite` task p1-6).
- Non-fs source connectors (HTTP, S3 — P+).
## Risks / notes


@@ -1,29 +1,29 @@
---
phase: P1
component: kb-parse-md (frontmatter submodule)
component: kebab-parse-md (frontmatter submodule)
task_id: p1-2
title: "Markdown frontmatter parsing → Metadata"
status: completed
depends_on: [p0-1]
unblocks: [p1-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_sections: [design §3.6 Metadata, design §3.7b kb-parse-types (Warning), design §0 Q9 frontmatter, design §10 errors]
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [design §3.6 Metadata, design §3.7b kebab-parse-types (Warning), design §0 Q9 frontmatter, design §10 errors]
---
# p1-2 — Markdown frontmatter parsing
## Goal
Parse YAML/TOML frontmatter from Markdown bytes into `kb_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`.
Parse YAML/TOML frontmatter from Markdown bytes into `kebab_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`.
## Why now / why this size
Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kb-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9.
Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kebab-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9.
## Allowed dependencies
- `kb-core`
- `kb-parse-types` (provides shared `Warning` + `WarningKind` per design §3.7b)
- `kebab-core`
- `kebab-parse-types` (provides shared `Warning` + `WarningKind` per design §3.7b)
- `serde`
- `serde_yaml` (or `yaml-rust2`) for YAML
- `toml` for TOML
@@ -33,7 +33,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
## Forbidden dependencies
- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-tui`, `kb-desktop`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `pulldown-cmark` (block parser is a sibling task)
- `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-tui`, `kebab-desktop`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `pulldown-cmark` (block parser is a sibling task)
## Inputs
@@ -46,7 +46,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
| output | type | downstream |
|--------|------|------------|
| `(Metadata, Option<FrontmatterSpan>, Vec<kb_parse_types::Warning>)` | tuple | `kb-normalize` → CanonicalDocument |
| `(Metadata, Option<FrontmatterSpan>, Vec<kebab_parse_types::Warning>)` | tuple | `kebab-normalize` → CanonicalDocument |
## Public surface (signatures only — no new types)
@@ -54,10 +54,10 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
pub fn parse_frontmatter(
bytes: &[u8],
hints: &BodyHints,
) -> anyhow::Result<(kb_core::Metadata, Option<FrontmatterSpan>, Vec<kb_parse_types::Warning>)>;
) -> anyhow::Result<(kebab_core::Metadata, Option<FrontmatterSpan>, Vec<kebab_parse_types::Warning>)>;
```
`Warning` / `WarningKind` come from `kb-parse-types` (shared with `p1-3` blocks parser and downstream `kb-normalize`). `FrontmatterSpan` is crate-internal; if any new public type is needed, STOP and update the frozen design doc first.
`Warning` / `WarningKind` come from `kebab-parse-types` (shared with `p1-3` blocks parser and downstream `kebab-normalize`). `FrontmatterSpan` is crate-internal; if any new public type is needed, STOP and update the frozen design doc first.
## Behavior contract
@@ -68,7 +68,7 @@ pub fn parse_frontmatter(
- `source_type` default `markdown`; `trust_level` default `primary`.
- `aliases`, `tags` default empty.
- Unknown keys → `metadata.user` (`serde_json::Map`), preserved verbatim, no warning.
- Unknown enum value (e.g. `trust_level: weird`) → emit `kb_parse_types::Warning { kind: WarningKind::MalformedFrontmatter, note: "unknown trust_level=weird, defaulted to primary" }` + ingest continues with default.
- Unknown enum value (e.g. `trust_level: weird`) → emit `kebab_parse_types::Warning { kind: WarningKind::MalformedFrontmatter, note: "unknown trust_level=weird, defaulted to primary" }` + ingest continues with default.
- Malformed YAML → frontmatter discarded, body still parsed, `Warning { kind: WarningKind::MalformedFrontmatter, note: "<error msg>" }` emitted.
- No frontmatter at all → defaults applied silently.
- `id:` field captured into `metadata.user_id_alias` (alias only — does NOT influence `doc_id` per design §4.2).
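The "unknown enum value" rule can be illustrated with a small sketch. The `TrustLevel` variants other than `Primary` are hypothetical, and the local `Warning` struct is a reduced stand-in for `kebab_parse_types::Warning` (it omits `kind`).

```rust
/// Only `Primary` is named in the spec excerpt; `Secondary` is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TrustLevel {
    Primary,
    Secondary,
}

/// Reduced stand-in for the shared warning type.
#[derive(Debug, PartialEq, Eq)]
struct Warning {
    note: String,
}

/// Missing value: default silently. Unknown value: default plus a warning.
fn parse_trust_level(raw: Option<&str>) -> (TrustLevel, Option<Warning>) {
    match raw {
        None | Some("primary") => (TrustLevel::Primary, None),
        Some("secondary") => (TrustLevel::Secondary, None),
        Some(other) => (
            TrustLevel::Primary,
            Some(Warning {
                note: format!("unknown trust_level={other}, defaulted to primary"),
            }),
        ),
    }
}

fn main() {
    println!("{:?}", parse_trust_level(None));
    println!("{:?}", parse_trust_level(Some("weird")));
}
```

Returning the warning alongside the value, rather than failing, matches the stated intent that ingest continues with the default.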
@@ -91,12 +91,12 @@ pub fn parse_frontmatter(
| snapshot | `fixtures/markdown/frontmatter-only.md` produces stable JSON | fixture |
| snapshot | mixed-language body with no `lang:` detects `ko` or `en` | `fixtures/markdown/mixed-lang.md` |
All tests under `cargo test -p kb-parse-md --lib frontmatter`.
All tests under `cargo test -p kebab-parse-md --lib frontmatter`.
## Definition of Done
- [ ] `cargo check -p kb-parse-md` passes
- [ ] `cargo test -p kb-parse-md frontmatter` passes
- [ ] `cargo check -p kebab-parse-md` passes
- [ ] `cargo test -p kebab-parse-md frontmatter` passes
- [ ] No `pulldown-cmark` import in this submodule
- [ ] Snapshot tests stable across two consecutive runs
- [ ] PR links design §0 Q9, §3.6


@@ -1,20 +1,20 @@
---
phase: P1
component: kb-parse-md (blocks submodule)
component: kebab-parse-md (blocks submodule)
task_id: p1-3
title: "Markdown body → Block tree with line spans"
status: completed
depends_on: [p0-1]
unblocks: [p1-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_sections: [§3.4 Block, §3.4 SourceSpan, §3.7b kb-parse-types, §0 Q3 citation]
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 Block, §3.4 SourceSpan, §3.7b kebab-parse-types, §0 Q3 citation]
---
# p1-3 — Markdown body → Block tree
## Goal
Parse Markdown body bytes into a flat `Vec<kb_parse_types::ParsedBlock>` with heading paths and line ranges preserved, ready for `kb-normalize` to lift into `CanonicalDocument`.
Parse Markdown body bytes into a flat `Vec<kebab_parse_types::ParsedBlock>` with heading paths and line ranges preserved, ready for `kebab-normalize` to lift into `CanonicalDocument`.
## Why now / why this size
@@ -22,15 +22,15 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
## Allowed dependencies
- `kb-core`
- `kb-parse-types` (defines `ParsedBlock`, `ParsedPayload`, `Warning`)
- `kebab-core`
- `kebab-parse-types` (defines `ParsedBlock`, `ParsedPayload`, `Warning`)
- `pulldown-cmark` (CommonMark with source-map; GFM tables enabled via feature)
- `serde`
- `thiserror`
## Forbidden dependencies
- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `kb-tui`, `kb-desktop`, `comrak` (alternative parser; pick one)
- `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `kebab-tui`, `kebab-desktop`, `comrak` (alternative parser; pick one)
## Inputs
@@ -43,8 +43,8 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
| output | type | downstream |
|--------|------|------------|
| `Vec<kb_parse_types::ParsedBlock>` | shared type from `kb-parse-types` | `kb-normalize` |
| `Vec<kb_parse_types::Warning>` | shared type | propagated into Provenance |
| `Vec<kebab_parse_types::ParsedBlock>` | shared type from `kebab-parse-types` | `kebab-normalize` |
| `Vec<kebab_parse_types::Warning>` | shared type | propagated into Provenance |
## Public surface (signatures only — no new types)
@@ -52,10 +52,10 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
pub fn parse_blocks(
body: &[u8],
body_offset_lines: u32,
) -> anyhow::Result<(Vec<kb_parse_types::ParsedBlock>, Vec<kb_parse_types::Warning>)>;
) -> anyhow::Result<(Vec<kebab_parse_types::ParsedBlock>, Vec<kebab_parse_types::Warning>)>;
```
`ParsedBlock` is defined in `kb-parse-types` (design §3.7b). `kb-parse-md` does NOT define its own; it consumes the shared type. Lift to `kb_core::Block` (with `BlockId` assignment) is `kb-normalize`'s job (p1-4).
`ParsedBlock` is defined in `kebab-parse-types` (design §3.7b). `kebab-parse-md` does NOT define its own; it consumes the shared type. Lift to `kebab_core::Block` (with `BlockId` assignment) is `kebab-normalize`'s job (p1-4).
## Behavior contract
@@ -63,9 +63,9 @@ pub fn parse_blocks(
- Heading tree: every block records its ancestor heading texts in order (e.g., `["아키텍처", "Chunking 정책"]`).
- Code blocks: `ParsedPayload::Code { lang: Some("rust"), code }` — fenced content not split.
- Tables: GFM tables produce `ParsedPayload::Table { headers, rows }`; if a table cell is malformed, fall back to `ParsedPayload::Paragraph` + `Warning::MalformedTable`.
- Image references: `![alt](src)` produces `ParsedPayload::ImageRef { src, alt }`. `AssetId` resolution happens later in `kb-normalize` (when image src can be matched to a workspace asset).
- Lists: ordered/unordered preserved via `ParsedPayload::List { ordered, items }`; nested list items flattened so each `items[i]` is a `Vec<kb_core::Inline>` for one top-level item.
- Inline elements: only `Text`, `Code`, `Link`, `Strong`, `Emph` (per `kb_core::Inline` per design §3.4). Drop other inlines silently.
- Image references: `![alt](src)` produces `ParsedPayload::ImageRef { src, alt }`. `AssetId` resolution happens later in `kebab-normalize` (when image src can be matched to a workspace asset).
- Lists: ordered/unordered preserved via `ParsedPayload::List { ordered, items }`; nested list items flattened so each `items[i]` is a `Vec<kebab_core::Inline>` for one top-level item.
- Inline elements: only `Text`, `Code`, `Link`, `Strong`, `Emph` (per `kebab_core::Inline` per design §3.4). Drop other inlines silently.
- Malformed input never panics. Worst case: empty `Vec<ParsedBlock>` + `Warning::ExtractFailed`.
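The heading-tree rule can be sketched as a stack that is trimmed whenever a heading of the same or a shallower level arrives. The `Event` enum and `heading_paths` function are hypothetical simplifications of the parser's event stream, and the sketch records paths only for non-heading blocks.

```rust
/// Hypothetical, simplified parser events.
#[derive(Debug, Clone)]
enum Event {
    Heading { level: u8, text: String },
    Block,
}

/// For every non-heading block, return the ancestor heading texts in order.
fn heading_paths(events: &[Event]) -> Vec<Vec<String>> {
    let mut stack: Vec<(u8, String)> = Vec::new();
    let mut paths: Vec<Vec<String>> = Vec::new();
    for event in events {
        match event {
            Event::Heading { level, text } => {
                // Drop headings at the same or a deeper level before pushing.
                while stack.last().map_or(false, |(l, _)| l >= level) {
                    stack.pop();
                }
                stack.push((*level, text.clone()));
            }
            Event::Block => {
                paths.push(stack.iter().map(|(_, t)| t.clone()).collect());
            }
        }
    }
    paths
}

fn main() {
    let events = vec![
        Event::Heading { level: 1, text: "아키텍처".to_string() },
        Event::Heading { level: 2, text: "Chunking 정책".to_string() },
        Event::Block,
    ];
    println!("{:?}", heading_paths(&events));
}
```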
## Storage / wire effects
@@ -86,12 +86,12 @@ pub fn parse_blocks(
| snapshot | `fixtures/markdown/nested-headings.md` → ParsedBlock JSON stable | fixture |
| snapshot | `fixtures/markdown/code-and-table.md` → JSON stable | fixture |
All tests under `cargo test -p kb-parse-md --lib blocks`.
All tests under `cargo test -p kebab-parse-md --lib blocks`.
## Definition of Done
- [ ] `cargo check -p kb-parse-md` passes
- [ ] `cargo test -p kb-parse-md blocks` passes
- [ ] `cargo check -p kebab-parse-md` passes
- [ ] `cargo test -p kebab-parse-md blocks` passes
- [ ] Snapshot tests stable across two runs
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.4
@@ -99,7 +99,7 @@ All tests under `cargo test -p kb-parse-md --lib blocks`.
## Out of scope
- Frontmatter (p1-2).
- Lifting `kb_parse_types::ParsedBlock` → `kb_core::Block` with `BlockId` (p1-4 normalize).
- Lifting `kebab_parse_types::ParsedBlock` → `kebab_core::Block` with `BlockId` (p1-4 normalize).
- Chunking (p1-5).
## Risks / notes


@@ -1,30 +1,30 @@
---
phase: P1
component: kb-normalize
component: kebab-normalize
task_id: p1-4
title: "Lift parser output → CanonicalDocument with deterministic IDs"
status: completed
depends_on: [p1-2, p1-3]
unblocks: [p1-5, p1-6]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_sections: [§3.4, §3.7b kb-parse-types, §4 ID recipe, §3.6 Provenance, §8 module boundaries]
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4, §3.7b kebab-parse-types, §4 ID recipe, §3.6 Provenance, §8 module boundaries]
---
# p1-4 — Lift to CanonicalDocument
## Goal
Combine `Metadata` (p1-2) + `Vec<kb_parse_types::ParsedBlock>` (p1-3) + `RawAsset` (p1-1) into a `kb_core::CanonicalDocument` with deterministic `doc_id` and `block_id`s per design §4 recipe.
Combine `Metadata` (p1-2) + `Vec<kebab_parse_types::ParsedBlock>` (p1-3) + `RawAsset` (p1-1) into a `kebab_core::CanonicalDocument` with deterministic `doc_id` and `block_id`s per design §4 recipe.
## Why now / why this size
Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate.
Single responsibility: ID generation + struct assembly. Keeps `kebab-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate.
## Allowed dependencies
- `kb-core`
- `kb-parse-types` (input shapes — `ParsedBlock`, `ParsedPayload`, `Warning`)
- `kb-config`
- `kebab-core`
- `kebab-parse-types` (input shapes — `ParsedBlock`, `ParsedPayload`, `Warning`)
- `kebab-config`
- `serde`
- `serde-json-canonicalizer` (canonical JSON for ID hashing)
- `blake3`
@@ -34,36 +34,36 @@ Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` pure
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md` (consumed via shared `kb-parse-types` only — `kb-parse-md` must NOT appear in this crate's `cargo tree`), `kb-parse-pdf`, `kb-parse-image`, `kb-parse-audio`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md` (consumed via shared `kebab-parse-types` only — `kebab-parse-md` must NOT appear in this crate's `cargo tree`), `kebab-parse-pdf`, `kebab-parse-image`, `kebab-parse-audio`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `RawAsset` | `kb_core::RawAsset` | p1-1 |
| `Metadata` + frontmatter span + warnings | `(kb_core::Metadata, Option<FrontmatterSpan>, Vec<kb_parse_types::Warning>)` | p1-2 |
| `Vec<ParsedBlock>` + warnings | `(Vec<kb_parse_types::ParsedBlock>, Vec<kb_parse_types::Warning>)` | p1-3 |
| `parser_version` | `kb_core::ParserVersion` | constant in `kb-parse-md` |
| `RawAsset` | `kebab_core::RawAsset` | p1-1 |
| `Metadata` + frontmatter span + warnings | `(kebab_core::Metadata, Option<FrontmatterSpan>, Vec<kebab_parse_types::Warning>)` | p1-2 |
| `Vec<ParsedBlock>` + warnings | `(Vec<kebab_parse_types::ParsedBlock>, Vec<kebab_parse_types::Warning>)` | p1-3 |
| `parser_version` | `kebab_core::ParserVersion` | constant in `kebab-parse-md` |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk`, `kb-store-sqlite` |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk`, `kebab-store-sqlite` |
## Public surface (signatures only — no new types)
```rust
pub fn build_canonical_document(
asset: &kb_core::RawAsset,
metadata: kb_core::Metadata,
blocks: Vec<kb_parse_types::ParsedBlock>,
parser_version: &kb_core::ParserVersion,
warnings: Vec<kb_parse_types::Warning>,
) -> anyhow::Result<kb_core::CanonicalDocument>;
asset: &kebab_core::RawAsset,
metadata: kebab_core::Metadata,
blocks: Vec<kebab_parse_types::ParsedBlock>,
parser_version: &kebab_core::ParserVersion,
warnings: Vec<kebab_parse_types::Warning>,
) -> anyhow::Result<kebab_core::CanonicalDocument>;
pub fn id_for_doc(workspace_path: &kb_core::WorkspacePath, asset: &kb_core::AssetId, parser_version: &kb_core::ParserVersion) -> kb_core::DocumentId;
pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kb_core::SourceSpan) -> kb_core::BlockId;
pub fn id_for_doc(workspace_path: &kebab_core::WorkspacePath, asset: &kebab_core::AssetId, parser_version: &kebab_core::ParserVersion) -> kebab_core::DocumentId;
pub fn id_for_block(doc: &kebab_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kebab_core::SourceSpan) -> kebab_core::BlockId;
```
## Behavior contract
@@ -91,14 +91,14 @@ pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[Strin
| unit | provenance contains Discovered/Parsed/Normalized in order | inline |
| snapshot | `fixtures/markdown/code-and-table.md` → CanonicalDocument JSON stable (incl. all IDs) | fixture |
All tests under `cargo test -p kb-normalize`.
All tests under `cargo test -p kebab-normalize`.
## Definition of Done
- [ ] `cargo check -p kb-normalize` passes
- [ ] `cargo test -p kb-normalize` passes
- [ ] `cargo check -p kebab-normalize` passes
- [ ] `cargo test -p kebab-normalize` passes
- [ ] Determinism test runs ≥ 1000 iterations under 1 second
- [ ] No `kb-parse-md` (or any other parser crate) appears in `cargo tree -p kb-normalize` — input types come from `kb-parse-types` only
- [ ] No `kebab-parse-md` (or any other parser crate) appears in `cargo tree -p kebab-normalize` — input types come from `kebab-parse-types` only
- [ ] PR links design §3.7b, §4.2, §4.3, §8
## Out of scope


@@ -1,12 +1,12 @@
---
phase: P1
component: kb-chunk
component: kebab-chunk
task_id: p1-5
title: "Markdown heading-aware chunker (md-heading-v1)"
status: completed
depends_on: [p1-4]
unblocks: [p1-6, p2-2, p3-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.5 Chunk, §4.2 chunk_id recipe, §7.2 Chunker, §0 Q3 citation]
---
@@ -22,8 +22,8 @@ The first concrete `Chunker`. Establishes how subsequent chunkers (PDF page chun
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`
- `blake3` (policy_hash)
- `serde-json-canonicalizer`
@@ -31,30 +31,30 @@ The first concrete `Chunker`. Establishes how subsequent chunkers (PDF page chun
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize` (consumes `CanonicalDocument` only via `kb-core`), `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize` (consumes `CanonicalDocument` only via `kebab-core`), `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 |
| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` from config |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 |
| `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` from config |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `Vec<Chunk>` | `kb_core::Chunk` | `kb-store-sqlite` (p1-6), `kb-embed*` (P3) |
| `Vec<Chunk>` | `kebab_core::Chunk` | `kebab-store-sqlite` (p1-6), `kebab-embed*` (P3) |
## Public surface (signatures only — no new types)
```rust
pub struct MdHeadingV1Chunker;
impl kb_core::Chunker for MdHeadingV1Chunker {
fn chunker_version(&self) -> kb_core::ChunkerVersion;
fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String;
fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result<Vec<kb_core::Chunk>>;
impl kebab_core::Chunker for MdHeadingV1Chunker {
fn chunker_version(&self) -> kebab_core::ChunkerVersion;
fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String;
fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result<Vec<kebab_core::Chunk>>;
}
```
@@ -91,12 +91,12 @@ impl kb_core::Chunker for MdHeadingV1Chunker {
| determinism | identical input + identical policy → identical chunk_ids | inline |
| snapshot | `fixtures/markdown/long-section.md` → Vec<Chunk> JSON stable | fixture |
All tests under `cargo test -p kb-chunk`.
All tests under `cargo test -p kebab-chunk`.
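
The determinism row above depends on `policy_hash` being stable for equal policies. The following is a minimal sketch of that idea, assuming a flat key/value policy. The names `policy_hash_sketch` and `fnv1a`, and the policy keys used with it, are hypothetical. FNV-1a stands in for `blake3` and a sorted `BTreeMap` stands in for `serde-json-canonicalizer` so the sketch needs only the standard library. It is not the crate's actual recipe.

```rust
use std::collections::BTreeMap;

/// FNV-1a (64-bit). Stand-in for blake3 in this sketch only.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut state: u64 = 0xcbf29ce484222325;
    for b in bytes {
        state ^= *b as u64;
        state = state.wrapping_mul(0x100000001b3);
    }
    state
}

/// Hypothetical policy hash: serialize settings in sorted key order,
/// then hash the canonical bytes. BTreeMap iterates keys in sorted
/// order, so insertion order cannot change the result.
fn policy_hash_sketch(policy: &BTreeMap<String, String>) -> String {
    let mut canonical = String::new();
    for (key, value) in policy {
        canonical.push_str(key);
        canonical.push('=');
        canonical.push_str(value);
        canonical.push('\n');
    }
    format!("{:016x}", fnv1a(canonical.as_bytes()))
}
```

Two policies with the same settings hash identically regardless of how they were built, while changing any value changes the canonical bytes that get hashed.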
## Definition of Done
- [ ] `cargo check -p kb-chunk` passes
- [ ] `cargo test -p kb-chunk` passes
- [ ] `cargo check -p kebab-chunk` passes
- [ ] `cargo test -p kebab-chunk` passes
- [ ] Snapshot stable across two runs
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.5, §4.2


@@ -1,12 +1,12 @@
---
phase: P1
component: kb-store-sqlite (P1 subset)
component: kebab-store-sqlite (P1 subset)
task_id: p1-6
title: "SQLite store: assets/documents/blocks/chunks + asset writer + migrations"
status: completed
depends_on: [p1-1, p1-4, p1-5]
unblocks: [p2-1, p3-3, p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5 DDL (5.1, 5.2, 5.3, 5.4, 5.5 chunks only — FTS handled in p2-1), §5.7 jobs/ingest_runs, §5.8 transactions, §6.3 data_dir layout]
---
@@ -22,8 +22,8 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `rusqlite` (with `bundled-sqlcipher` disabled; use `bundled` feature)
- `refinery` for migrations
- `serde_json`
@@ -34,7 +34,7 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
## Forbidden dependencies
- `kb-source-fs` (only types via `kb-core`), `kb-parse-md`, `kb-normalize`, `kb-chunk` (only types via `kb-core`), `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs` (only types via `kebab-core`), `kebab-parse-md`, `kebab-normalize`, `kebab-chunk` (only types via `kebab-core`), `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
@@ -42,17 +42,17 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
|-------|------|--------|
| migrations | `migrations/V001__init.sql` | repo |
| `RawAsset` + bytes | `(RawAsset, Vec<u8>)` | p1-1 + reader |
| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 |
| `Vec<Chunk>` | `kb_core::Chunk` | p1-5 |
| `IngestRun` aggregates | `(scope, counts, duration)` | `kb-app` |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 |
| `Vec<Chunk>` | `kebab_core::Chunk` | p1-5 |
| `IngestRun` aggregates | `(scope, counts, duration)` | `kebab-app` |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `data_dir/kb.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | | every later phase |
| `data_dir/kebab.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | | every later phase |
| `data_dir/assets/<aa>/<asset_id>` bytes (when copied) | | future re-extraction, integrity verification |
| `IngestReport` (wire schema v1) | `kb_core::IngestReport` | `kb-cli`, eval |
| `IngestReport` (wire schema v1) | `kebab_core::IngestReport` | `kebab-cli`, eval |
## Public surface (signatures only — no new types)
@@ -60,23 +60,23 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
pub struct SqliteStore { /* internal */ }
impl SqliteStore {
pub fn open(config: &kb_config::Config) -> anyhow::Result<Self>;
pub fn open(config: &kebab_config::Config) -> anyhow::Result<Self>;
pub fn run_migrations(&self) -> anyhow::Result<()>;
pub fn put_asset_with_bytes(&self, asset: &kb_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>;
pub fn put_asset_with_bytes(&self, asset: &kebab_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>;
}
impl kb_core::DocumentStore for SqliteStore {
fn put_asset(&self, a: &kb_core::RawAsset) -> anyhow::Result<()>;
fn put_document(&self, d: &kb_core::CanonicalDocument) -> anyhow::Result<()>;
fn put_blocks(&self, doc: &kb_core::DocumentId, blocks: &[kb_core::Block]) -> anyhow::Result<()>;
fn put_chunks(&self, doc: &kb_core::DocumentId, chunks: &[kb_core::Chunk]) -> anyhow::Result<()>;
fn get_document(&self, id: &kb_core::DocumentId) -> anyhow::Result<Option<kb_core::CanonicalDocument>>;
fn get_chunk(&self, id: &kb_core::ChunkId) -> anyhow::Result<Option<kb_core::Chunk>>;
fn list_documents(&self, filter: &kb_core::DocFilter) -> anyhow::Result<Vec<kb_core::DocSummary>>;
impl kebab_core::DocumentStore for SqliteStore {
fn put_asset(&self, a: &kebab_core::RawAsset) -> anyhow::Result<()>;
fn put_document(&self, d: &kebab_core::CanonicalDocument) -> anyhow::Result<()>;
fn put_blocks(&self, doc: &kebab_core::DocumentId, blocks: &[kebab_core::Block]) -> anyhow::Result<()>;
fn put_chunks(&self, doc: &kebab_core::DocumentId, chunks: &[kebab_core::Chunk]) -> anyhow::Result<()>;
fn get_document(&self, id: &kebab_core::DocumentId) -> anyhow::Result<Option<kebab_core::CanonicalDocument>>;
fn get_chunk(&self, id: &kebab_core::ChunkId) -> anyhow::Result<Option<kebab_core::Chunk>>;
fn list_documents(&self, filter: &kebab_core::DocFilter) -> anyhow::Result<Vec<kebab_core::DocSummary>>;
}
impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
impl kebab_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
```
## Behavior contract
@@ -96,7 +96,7 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
## Storage / wire effects
- Writes: `kb.sqlite` (multiple tables), `data_dir/assets/<aa>/<asset_id>` (copied case).
- Writes: `kebab.sqlite` (multiple tables), `data_dir/assets/<aa>/<asset_id>` (copied case).
- Reads on subsequent calls: same DB.
## Test plan
@@ -112,14 +112,14 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
| contract | DocumentStore trait round-trip for fixture document | `fixtures/markdown/code-and-table.md` |
| snapshot | IngestReport JSON for fixture run | fixture |
All tests under `cargo test -p kb-store-sqlite` with no network.
All tests under `cargo test -p kebab-store-sqlite` with no network.
## Definition of Done
- [ ] `cargo check -p kb-store-sqlite` passes
- [ ] `cargo test -p kb-store-sqlite` passes
- [ ] `cargo check -p kebab-store-sqlite` passes
- [ ] `cargo test -p kebab-store-sqlite` passes
- [ ] migration `V001__init.sql` matches design §5 verbatim (diff-checked in CI)
- [ ] Writes to `~/.local/share/kb/` are gated by `kb-config`'s `data_dir` and never escape it
- [ ] Writes to `~/.local/share/kebab/` are gated by `kebab-config`'s `data_dir` and never escape it
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §5
@@ -132,5 +132,5 @@ All tests under `cargo test -p kb-store-sqlite` with no network.
## Risks / notes
- WAL mode requires careful test cleanup: tests must drop the connection before removing `kb.sqlite-wal` / `-shm`.
- WAL mode requires careful test cleanup: tests must drop the connection before removing `kebab.sqlite-wal` / `-shm`.
- Asset directory shard prefix uses `asset_id[..2]`; using `asset_id[..1]` would create at most 16 dirs (insufficient).
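
The shard-prefix note can be illustrated with a short sketch, assuming `asset_id` is a lowercase hex string (which the 16-directory figure implies). `asset_path_sketch` and `shard_count` are hypothetical helper names, not part of the crate's public surface.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds `data_dir/assets/<aa>/<asset_id>`.
/// Returns None when the id is shorter than two characters or the
/// prefix is not hex.
fn asset_path_sketch(data_dir: &Path, asset_id: &str) -> Option<PathBuf> {
    // The first two characters of the id name the shard directory.
    let shard = asset_id.get(..2)?;
    if !shard.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    Some(data_dir.join("assets").join(shard).join(asset_id))
}

/// Number of distinct shard directories for a hex prefix of `len` characters.
fn shard_count(len: u32) -> u64 {
    16u64.pow(len)
}
```

A one-character prefix gives `shard_count(1) == 16` directories, while the two-character prefix gives `shard_count(2) == 256`.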


@@ -1,12 +1,12 @@
---
phase: P2
component: kb-store-sqlite (FTS5 migration)
component: kebab-store-sqlite (FTS5 migration)
task_id: p2-1
title: "FTS5 virtual table + triggers (V002 migration)"
status: completed
depends_on: [p1-6]
unblocks: [p2-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5.5 chunks_fts + triggers, §9 versioning]
---
@@ -18,19 +18,19 @@ Add `chunks_fts` virtual table and three sync triggers via migration `V002__fts.
## Why now / why this size
`chunks_fts` is the lexical index for `kb-search`. Splitting it from p1-6 keeps P1 focused on relational data; bringing it as `V002` lets users upgrade an existing P1 DB without re-ingesting.
`chunks_fts` is the lexical index for `kebab-search`. Splitting it from p1-6 keeps P1 focused on relational data; bringing it as `V002` lets users upgrade an existing P1 DB without re-ingesting.
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-store-sqlite` (extends migrations)
- `kebab-core`
- `kebab-config`
- `kebab-store-sqlite` (extends migrations)
- `rusqlite`
- `refinery`
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-search` (consumer is p2-2), `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-search` (consumer is p2-2), `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
@@ -52,7 +52,7 @@ Add `chunks_fts` virtual table and three sync triggers via migration `V002__fts.
pub fn rebuild_chunks_fts(conn: &rusqlite::Connection) -> anyhow::Result<()>;
```
(Used by `kb index --rebuild-fts`. Re-runs `INSERT INTO chunks_fts SELECT ... FROM chunks` after `DELETE FROM chunks_fts;`.)
(Used by `kebab index --rebuild-fts`. Re-runs `INSERT INTO chunks_fts SELECT ... FROM chunks` after `DELETE FROM chunks_fts;`.)
## Behavior contract
@@ -64,7 +64,7 @@ pub fn rebuild_chunks_fts(conn: &rusqlite::Connection) -> anyhow::Result<()>;
## Storage / wire effects
- Writes: `chunks_fts` virtual table inside `kb.sqlite`.
- Writes: `chunks_fts` virtual table inside `kebab.sqlite`.
- Reads: existing `chunks` rows for backfill.
## Test plan
@@ -78,12 +78,12 @@ pub fn rebuild_chunks_fts(conn: &rusqlite::Connection) -> anyhow::Result<()>;
| function | `rebuild_chunks_fts` produces deterministic content equal to fresh backfill | tmp DB |
| migration | running `V002` twice is a no-op (refinery handles idempotency) | tmp DB |
All tests under `cargo test -p kb-store-sqlite fts`.
All tests under `cargo test -p kebab-store-sqlite fts`.
## Definition of Done
- [ ] `cargo check -p kb-store-sqlite` passes
- [ ] `cargo test -p kb-store-sqlite fts` passes
- [ ] `cargo check -p kebab-store-sqlite` passes
- [ ] `cargo test -p kebab-store-sqlite fts` passes
- [ ] `migrations/V002__fts.sql` matches design §5.5 verbatim (CI diff check)
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §5.5


@@ -1,12 +1,12 @@
---
phase: P2
component: kb-search (lexical mode)
component: kebab-search (lexical mode)
task_id: p2-2
title: "Lexical Retriever via SQLite FTS5 + bm25 + citation"
status: completed
depends_on: [p2-1]
unblocks: [p3-4, p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5/1.6 search output, §2.2 wire schema, §6.4 search settings]
---
@@ -14,38 +14,38 @@ contract_sections: [§3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5
## Goal
Implement `kb_core::Retriever` for `SearchMode::Lexical` using SQLite FTS5. Returns `SearchHit` with `bm25` ranking, `snippet()`-derived preview, and proper W3C-fragment citation.
Implement `kebab_core::Retriever` for `SearchMode::Lexical` using SQLite FTS5. Returns `SearchHit` with `bm25` ranking, `snippet()`-derived preview, and proper W3C-fragment citation.
## Why now / why this size
First concrete `Retriever`. Lets `kb search --mode lexical` work without any embedding/LLM infrastructure. Establishes the SearchHit construction contract that hybrid (p3-4) reuses.
First concrete `Retriever`. Lets `kebab search --mode lexical` work without any embedding/LLM infrastructure. Establishes the SearchHit construction contract that hybrid (p3-4) reuses.
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-store-sqlite` (read access to `chunks_fts` + `chunks` + `documents`)
- `kebab-core`
- `kebab-config`
- `kebab-store-sqlite` (read access to `chunks_fts` + `chunks` + `documents`)
- `rusqlite`
- `tracing`
- `thiserror`
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `SearchQuery` (mode=Lexical) | `kb_core::SearchQuery` | `kb-app::search` |
| `kb-config::search` settings (`default_k`, `snippet_chars`) | `kb_config::Config` | runtime |
| SQLite connection (read) | `rusqlite::Connection` | `kb-store-sqlite` |
| `SearchQuery` (mode=Lexical) | `kebab_core::SearchQuery` | `kebab-app::search` |
| `kebab-config::search` settings (`default_k`, `snippet_chars`) | `kebab_config::Config` | runtime |
| SQLite connection (read) | `rusqlite::Connection` | `kebab-store-sqlite` |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `Vec<SearchHit>` | `kb_core::SearchHit` | `kb-cli` printer, `kb-rag` packer (P4), hybrid (p3-4) |
| `Vec<SearchHit>` | `kebab_core::SearchHit` | `kebab-cli` printer, `kebab-rag` packer (P4), hybrid (p3-4) |
## Public surface (signatures only — no new types)
@@ -53,12 +53,12 @@ First concrete `Retriever`. Lets `kb search --mode lexical` work without any emb
pub struct LexicalRetriever { /* internal: holds an Arc<rusqlite::Connection> + IndexVersion */ }
impl LexicalRetriever {
pub fn new(store: std::sync::Arc<kb_store_sqlite::SqliteStore>, index_version: kb_core::IndexVersion) -> Self;
pub fn new(store: std::sync::Arc<kebab_store_sqlite::SqliteStore>, index_version: kebab_core::IndexVersion) -> Self;
}
impl kb_core::Retriever for LexicalRetriever {
fn search(&self, query: &kb_core::SearchQuery) -> anyhow::Result<Vec<kb_core::SearchHit>>;
fn index_version(&self) -> kb_core::IndexVersion;
impl kebab_core::Retriever for LexicalRetriever {
fn search(&self, query: &kebab_core::SearchQuery) -> anyhow::Result<Vec<kebab_core::SearchHit>>;
fn index_version(&self) -> kebab_core::IndexVersion;
}
```
@@ -98,8 +98,8 @@ impl kb_core::Retriever for LexicalRetriever {
## Storage / wire effects
- Reads only. Never mutates `kb.sqlite`.
- Wire: `Vec<SearchHit>` serialized via wire schema `search_hit.v1` when `kb-cli --json` is used.
- Reads only. Never mutates `kebab.sqlite`.
- Wire: `Vec<SearchHit>` serialized via wire schema `search_hit.v1` when `kebab-cli --json` is used.
## Test plan
@@ -114,13 +114,13 @@ impl kb_core::Retriever for LexicalRetriever {
| determinism | identical query twice produces identical hit order and scores | tmp DB |
| snapshot | `Vec<SearchHit>` JSON for fixed corpus stable | `fixtures/search/lexical/run-1.json` |
All tests under `cargo test -p kb-search lexical`.
All tests under `cargo test -p kebab-search lexical`.
## Definition of Done
- [ ] `cargo check -p kb-search` passes
- [ ] `cargo test -p kb-search lexical` passes
- [ ] No imports outside Allowed dependencies (`cargo tree -p kb-search` audit)
- [ ] `cargo check -p kebab-search` passes
- [ ] `cargo test -p kebab-search lexical` passes
- [ ] No imports outside Allowed dependencies (`cargo tree -p kebab-search` audit)
- [ ] Output JSON conforms to `docs/wire-schema/v1/search_hit.schema.json`
- [ ] PR links design §3.7, §0 Q3, §2.2


@@ -1,12 +1,12 @@
---
phase: P3
component: kb-embed (trait crate)
component: kebab-embed (trait crate)
task_id: p3-1
title: "Embedder trait + EmbeddingInput/Kind validation"
status: completed
depends_on: [p0-1]
unblocks: [p3-2, p3-3, p3-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [design §3.7 SearchHit.embedding_model, design §7.1 EmbeddingInput/Kind, design §7.2 Embedder, report §11 LLM/embedding split]
---
@@ -14,16 +14,16 @@ contract_sections: [design §3.7 SearchHit.embedding_model, design §7.1 Embeddi
## Goal
Provide the `kb-embed` crate that re-exports `Embedder` trait, `EmbeddingInput`/`EmbeddingKind`, and offers a mock implementation for downstream tests. This task is **trait-only**; concrete adapters live in p3-2.
Provide the `kebab-embed` crate that re-exports `Embedder` trait, `EmbeddingInput`/`EmbeddingKind`, and offers a mock implementation for downstream tests. This task is **trait-only**; concrete adapters live in p3-2.
## Why now / why this size
Concrete adapters (fastembed, ollama-embed, candle) need a stable trait surface. Owning the trait + a mock implementation in a tiny crate keeps `kb-store-vector` and `kb-search` testable without touching real models.
Concrete adapters (fastembed, ollama-embed, candle) need a stable trait surface. Owning the trait + a mock implementation in a tiny crate keeps `kebab-store-vector` and `kebab-search` testable without touching real models.
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`
- `thiserror`
- `tracing`
@@ -31,35 +31,35 @@ Concrete adapters (fastembed, ollama-embed, candle) need a stable trait surface.
## Forbidden dependencies
- `fastembed`, `ort`, `tokenizers`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `fastembed`, `ort`, `tokenizers`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `EmbeddingInput` | `kb_core::EmbeddingInput<'_>` | callers (parser-side or query-side) |
| `EmbeddingInput` | `kebab_core::EmbeddingInput<'_>` | callers (parser-side or query-side) |
| model identity | `(EmbeddingModelId, EmbeddingVersion, dimensions)` | adapter at construction |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `Vec<Vec<f32>>` | row-aligned with input | `kb-store-vector`, `kb-search` (vector mode) |
| `Vec<Vec<f32>>` | row-aligned with input | `kebab-store-vector`, `kebab-search` (vector mode) |
## Public surface (signatures only — no new types)
```rust
pub use kb_core::{EmbeddingInput, EmbeddingKind, EmbeddingModelId, EmbeddingVersion, Embedder};
pub use kebab_core::{EmbeddingInput, EmbeddingKind, EmbeddingModelId, EmbeddingVersion, Embedder};
/// Test-only mock that produces deterministic vectors. Compiled only when `mock` feature is on.
#[cfg(feature = "mock")]
pub struct MockEmbedder { /* internal: model_id, dims, seed */ }
#[cfg(feature = "mock")]
impl MockEmbedder {
pub fn new(model_id: kb_core::EmbeddingModelId, version: kb_core::EmbeddingVersion, dimensions: usize) -> Self;
pub fn new(model_id: kebab_core::EmbeddingModelId, version: kebab_core::EmbeddingVersion, dimensions: usize) -> Self;
}
#[cfg(feature = "mock")]
impl kb_core::Embedder for MockEmbedder { /* per §7.2 */ }
impl kebab_core::Embedder for MockEmbedder { /* per §7.2 */ }
```
## Behavior contract
@@ -84,21 +84,21 @@ impl kb_core::Embedder for MockEmbedder { /* per §7.2 */ }
| unit | dimensions match construction-time value | inline |
| contract | property test: 100 random inputs, each vector has length == dimensions, all finite floats | inline (proptest) |
All tests under `cargo test -p kb-embed`.
All tests under `cargo test -p kebab-embed`.
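
The properties in the test plan (fixed length, finite floats, deterministic output) can be sketched with a standard-library-only generator. `SketchEmbedder` is a hypothetical stand-in and not the crate's `MockEmbedder`. It takes plain `&str` inputs instead of `EmbeddingInput`, and the hashing scheme (FNV-1a seeding an xorshift generator) is an assumption made for illustration.

```rust
/// Hypothetical stand-in for a deterministic mock embedder.
struct SketchEmbedder {
    dimensions: usize,
    seed: u64,
}

impl SketchEmbedder {
    /// Returns one vector per input, in input order (row-aligned).
    fn embed(&self, inputs: &[&str]) -> Vec<Vec<f32>> {
        inputs.iter().map(|text| self.embed_one(text)).collect()
    }

    fn embed_one(&self, text: &str) -> Vec<f32> {
        // FNV-1a over the input bytes, mixed with the seed.
        let mut state: u64 = 0xcbf29ce484222325 ^ self.seed;
        for b in text.bytes() {
            state ^= b as u64;
            state = state.wrapping_mul(0x100000001b3);
        }
        // xorshift64 never leaves the all-zero state, so avoid it.
        if state == 0 {
            state = 1;
        }
        (0..self.dimensions)
            .map(|_| {
                // One xorshift64 step per component.
                state ^= state << 13;
                state ^= state >> 7;
                state ^= state << 17;
                // Map the top 24 bits to [0, 1), then to [-1, 1).
                let unit = (state >> 40) as f32 / (1u64 << 24) as f32;
                unit * 2.0 - 1.0
            })
            .collect()
    }
}
```

Because the output depends only on the seed and the input bytes, two calls with the same inputs return identical vectors, which is what downstream snapshot tests would rely on.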
## Definition of Done
- [ ] `cargo check -p kb-embed` passes
- [ ] `cargo test -p kb-embed` passes
- [ ] `cargo check -p kebab-embed` passes
- [ ] `cargo test -p kebab-embed` passes
- [ ] No external embedding dep present
- [ ] PR links design §7.2 Embedder, §11
## Out of scope
- Real adapter (`kb-embed-local` is p3-2).
- Real adapter (`kebab-embed-local` is p3-2).
- Reranker traits (P+).
## Risks / notes
- `MockEmbedder` is gated by `mock` feature (default OFF). Downstream tests opt in via `[dev-dependencies] kb-embed = { path = "...", features = ["mock"] }`. CI build of release binary (`cargo build --release` without `--features mock`) MUST NOT include `MockEmbedder` symbol — verifiable via `cargo bloat` or `nm` symbol scan.
- Trait re-exports keep the call site stable even if `kb-core` reorganizes; downstream crates should `use kb_embed::Embedder` rather than `use kb_core::Embedder`.
- `MockEmbedder` is gated by `mock` feature (default OFF). Downstream tests opt in via `[dev-dependencies] kebab-embed = { path = "...", features = ["mock"] }`. CI build of release binary (`cargo build --release` without `--features mock`) MUST NOT include `MockEmbedder` symbol — verifiable via `cargo bloat` or `nm` symbol scan.
- Trait re-exports keep the call site stable even if `kebab-core` reorganizes; downstream crates should `use kebab_embed::Embedder` rather than `use kebab_core::Embedder`.
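
The re-export note can be shown in miniature with three modules in one file. The module names `core_like`, `embed_like`, and `downstream` are hypothetical stand-ins for `kebab-core`, `kebab-embed`, and a consuming crate. The single-method trait is a simplification of the real `Embedder`.

```rust
mod core_like {
    /// Simplified stand-in for the trait owned by the core crate.
    pub trait Embedder {
        fn dimensions(&self) -> usize;
    }
}

mod embed_like {
    // The facade re-exports the trait, so consumers never name `core_like`.
    pub use super::core_like::Embedder;
}

mod downstream {
    // The consumer imports through the facade only.
    use super::embed_like::Embedder;

    pub struct Fixed(pub usize);

    impl Embedder for Fixed {
        fn dimensions(&self) -> usize {
            self.0
        }
    }

    /// Calls the trait method through the re-exported path.
    pub fn dimensions_of(value: usize) -> usize {
        Fixed(value).dimensions()
    }
}
```

If the trait later moved inside `core_like`, only the `pub use` line in `embed_like` would need to change. The `downstream` module would compile unchanged.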


@@ -1,12 +1,12 @@
---
phase: P3
component: kb-embed-local (fastembed adapter)
component: kebab-embed-local (fastembed adapter)
task_id: p3-2
title: "fastembed-rs Embedder for multilingual-e5-small"
status: completed
depends_on: [p3-1]
unblocks: [p3-3, p3-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [design §7.2 Embedder, report §11.3 local embedding, design §6.4 [models.embedding], design §9 versioning]
---
@@ -22,9 +22,9 @@ First real `Embedder`. Drives `EmbeddingId` recipe (model_id + model_version + d
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-embed`
- `kebab-core`
- `kebab-config`
- `kebab-embed`
- `fastembed = "4"` (or current stable)
- `tokenizers`
- `ort` (transitive via fastembed)
@@ -33,21 +33,21 @@ First real `Embedder`. Drives `EmbeddingId` recipe (model_id + model_version + d
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, network HTTP libs (model download is fastembed's responsibility)
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, network HTTP libs (model download is fastembed's responsibility)
## Inputs
| input | type | source |
|-------|------|--------|
| `kb-config::Config.models.embedding` | settings | runtime |
| `EmbeddingInput[..]` | `kb_core::EmbeddingInput<'_>[]` | callers |
| `kebab-config::Config.models.embedding` | settings | runtime |
| `EmbeddingInput[..]` | `kebab_core::EmbeddingInput<'_>[]` | callers |
| model cache | `data_dir/models/fastembed/` | filesystem |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `Vec<Vec<f32>>` | row-aligned, `dimensions = 384` | `kb-store-vector`, query vectors for hybrid search |
| `Vec<Vec<f32>>` | row-aligned, `dimensions = 384` | `kebab-store-vector`, query vectors for hybrid search |
| model identity | `(EmbeddingModelId, EmbeddingVersion, usize)` | record fields, `embedding_id` recipe |
## Public surface (signatures only — no new types)
@@ -56,14 +56,14 @@ First real `Embedder`. Drives `EmbeddingId` recipe (model_id + model_version + d
pub struct FastembedEmbedder { /* internal: TextEmbedding instance + model meta */ }
impl FastembedEmbedder {
pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>;
pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
}
impl kb_core::Embedder for FastembedEmbedder {
fn model_id(&self) -> kb_core::EmbeddingModelId;
fn model_version(&self) -> kb_core::EmbeddingVersion;
impl kebab_core::Embedder for FastembedEmbedder {
fn model_id(&self) -> kebab_core::EmbeddingModelId;
fn model_version(&self) -> kebab_core::EmbeddingVersion;
fn dimensions(&self) -> usize;
fn embed(&self, inputs: &[kb_core::EmbeddingInput<'_>]) -> anyhow::Result<Vec<Vec<f32>>>;
fn embed(&self, inputs: &[kebab_core::EmbeddingInput<'_>]) -> anyhow::Result<Vec<Vec<f32>>>;
}
```
@@ -96,12 +96,12 @@ impl kb_core::Embedder for FastembedEmbedder {
| performance | batch of 64 short inputs completes in < 5s on CI host | tmp config (skip on slow CI via `#[ignore]`) |
| snapshot | aggregate hash of vectors for 5 known sentences stable across runs | `fixtures/embed/known-sentences.json` |
All tests under `cargo test -p kb-embed-local`. Mark slow tests `#[ignore]` and run via `cargo test -- --ignored` in dedicated CI lane.
All tests under `cargo test -p kebab-embed-local`. Mark slow tests `#[ignore]` and run via `cargo test -- --ignored` in dedicated CI lane.
## Definition of Done
- [ ] `cargo check -p kb-embed-local` passes
- [ ] `cargo test -p kb-embed-local` passes (excluding `#[ignore]`)
- [ ] `cargo check -p kebab-embed-local` passes
- [ ] `cargo test -p kebab-embed-local` passes (excluding `#[ignore]`)
- [ ] First-run model download works under `data_dir/models/fastembed/`
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §11.3, §6.4, §9


@@ -1,12 +1,12 @@
---
phase: P3
component: kb-store-vector (LanceDB)
component: kebab-store-vector (LanceDB)
task_id: p3-3
title: "LanceDB VectorStore + embedding_records writer"
status: completed
depends_on: [p3-2, p1-6]
unblocks: [p3-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5.6 embedding_records, §6.3 lancedb table naming, §7.2 VectorStore, §9 versioning]
---
@@ -18,13 +18,13 @@ Implement `VectorStore` over LanceDB (embedded). Stores per-model tables (`chunk
## Why now / why this size
Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-4) can compose lexical + vector retrievers without leaking storage details.
Closes the loop chunk → vector. Splits cleanly from `kebab-search` so hybrid (p3-4) can compose lexical + vector retrievers without leaking storage details.
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-store-sqlite` (only for writing/reading rows in `embedding_records`)
- `kebab-core`
- `kebab-config`
- `kebab-store-sqlite` (only for writing/reading rows in `embedding_records`)
- `lancedb`
- `arrow` (and `arrow-array`, `arrow-schema`)
- `serde`, `serde_json`
@@ -33,16 +33,16 @@ Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-embed*` (consumes `Vec<f32>` via input only — no embedding logic here), `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-embed*` (consumes `Vec<f32>` via input only — no embedding logic here), `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `VectorRecord[..]` | `kb_core::VectorRecord` | `kb-app::embed_index` (P3 facade) |
| query vector | `&[f32]` | `kb-embed-local` (`Embedder::embed` for query) |
| filters | `kb_core::SearchFilters` | `SearchQuery` |
| `kb-config::Config.storage.vector_dir` | path | runtime |
| `VectorRecord[..]` | `kebab_core::VectorRecord` | `kebab-app::embed_index` (P3 facade) |
| query vector | `&[f32]` | `kebab-embed-local` (`Embedder::embed` for query) |
| filters | `kebab_core::SearchFilters` | `SearchQuery` |
| `kebab-config::Config.storage.vector_dir` | path | runtime |
## Outputs
@@ -50,7 +50,7 @@ Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-
|--------|------|------------|
| Lance tables under `vector_dir/chunk_embeddings_<model>_<dim>.lance/` | filesystem | future searches |
| `embedding_records` rows | SQLite | reverse lookup, reindex bookkeeping |
| `Vec<VectorHit>` | `kb_core::VectorHit` | hybrid retriever (p3-4) |
| `Vec<VectorHit>` | `kebab_core::VectorHit` | hybrid retriever (p3-4) |
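
The per-model table naming in the outputs table, `chunk_embeddings_<model>_<dim>.lance`, can be sketched as a small formatting helper. `table_dir_name_sketch` is a hypothetical name, and the sanitization rule (lowercase ASCII alphanumerics kept, everything else replaced with `_`) is an assumption. The design's §6.3 is the authority on the real rule.

```rust
/// Hypothetical helper: derives the Lance table directory name for a
/// model id and vector dimension.
fn table_dir_name_sketch(model_id: &str, dim: usize) -> String {
    // Assumed sanitization: keep ASCII alphanumerics (lowercased),
    // replace any other character with '_'.
    let safe: String = model_id
        .chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() {
                c.to_ascii_lowercase()
            } else {
                '_'
            }
        })
        .collect();
    format!("chunk_embeddings_{safe}_{dim}.lance")
}
```

Under this assumed rule, `table_dir_name_sketch("multilingual-e5-small", 384)` yields `chunk_embeddings_multilingual_e5_small_384.lance`, so a model or dimension change lands in a separate table rather than mixing vectors of different shapes.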
## Public surface (signatures only — no new types)
@@ -58,13 +58,13 @@ Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-
pub struct LanceVectorStore { /* internal: connection + sqlite handle */ }
impl LanceVectorStore {
pub fn new(config: &kb_config::Config, sqlite: std::sync::Arc<kb_store_sqlite::SqliteStore>) -> anyhow::Result<Self>;
pub fn new(config: &kebab_config::Config, sqlite: std::sync::Arc<kebab_store_sqlite::SqliteStore>) -> anyhow::Result<Self>;
}
impl kb_core::VectorStore for LanceVectorStore {
fn ensure_table(&self, model: &kb_core::EmbeddingModelId, dim: usize) -> anyhow::Result<kb_core::IndexId>;
fn upsert(&self, recs: &[kb_core::VectorRecord]) -> anyhow::Result<()>;
fn search(&self, query_vec: &[f32], k: usize, filters: &kb_core::SearchFilters) -> anyhow::Result<Vec<kb_core::VectorHit>>;
impl kebab_core::VectorStore for LanceVectorStore {
fn ensure_table(&self, model: &kebab_core::EmbeddingModelId, dim: usize) -> anyhow::Result<kebab_core::IndexId>;
fn upsert(&self, recs: &[kebab_core::VectorRecord]) -> anyhow::Result<()>;
fn search(&self, query_vec: &[f32], k: usize, filters: &kebab_core::SearchFilters) -> anyhow::Result<Vec<kebab_core::VectorHit>>;
}
```
@@ -116,12 +116,12 @@ impl kb_core::VectorStore for LanceVectorStore {
| determinism | same query vector + same data → same top-k order | tmp data_dir |
| snapshot | `Vec<VectorHit>` JSON for fixed corpus stable | `fixtures/vector/run-1.json` |
All tests under `cargo test -p kb-store-vector`.
All tests under `cargo test -p kebab-store-vector`.
## Definition of Done
- [ ] `cargo check -p kb-store-vector` passes
- [ ] `cargo test -p kb-store-vector` passes
- [ ] `cargo check -p kebab-store-vector` passes
- [ ] `cargo test -p kebab-store-vector` passes
- [ ] No imports outside Allowed dependencies
- [ ] `embedding_records` rows align 1:1 with Lance rows after a successful upsert batch
- [ ] PR links design §5.6, §6.3, §7.2
@@ -130,7 +130,7 @@ All tests under `cargo test -p kb-store-vector`.
- IVF / PQ index tuning (P+).
- Image / multimodal vector tables (P6).
- `kb-app` orchestration of indexing jobs (`embed_index` facade method body).
- `kebab-app` orchestration of indexing jobs (`embed_index` facade method body).
## Risks / notes

@@ -1,12 +1,12 @@
---
phase: P3
component: kb-search (hybrid)
component: kebab-search (hybrid)
task_id: p3-4
title: "Hybrid Retriever (RRF) over lexical + vector"
status: completed
depends_on: [p2-2, p3-3]
unblocks: [p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.7 RetrievalDetail, §0 Q3, §1.6 search --explain, §6.4 [search] rrf settings]
---
@@ -22,17 +22,17 @@ Single mediator. Keeps the lexical and vector retrievers focused; only this task
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-store-sqlite` (for `LexicalRetriever`)
- `kb-store-vector` (for `LanceVectorStore`)
- `kb-embed` (trait only — for query embedding via `Embedder`)
- `kebab-core`
- `kebab-config`
- `kebab-store-sqlite` (for `LexicalRetriever`)
- `kebab-store-vector` (for `LanceVectorStore`)
- `kebab-embed` (trait only — for query embedding via `Embedder`)
- `tracing`
- `thiserror`
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`. (`kb-embed-local` is a runtime-injected `dyn Embedder`; this crate must not depend on the concrete adapter directly.)
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`. (`kebab-embed-local` is a runtime-injected `dyn Embedder`; this crate must not depend on the concrete adapter directly.)
## Inputs
@@ -41,21 +41,21 @@ Single mediator. Keeps the lexical and vector retrievers focused; only this task
| `LexicalRetriever` | trait object | constructed elsewhere |
| `LanceVectorStore` | trait object | constructed elsewhere |
| `Box<dyn Embedder>` | for query embedding | runtime-injected |
| `kb-config::Config.search` | `default_k`, `hybrid_fusion`, `rrf_k` | runtime |
| `SearchQuery` | `kb_core::SearchQuery` | `kb-app::search` |
| `kebab-config::Config.search` | `default_k`, `hybrid_fusion`, `rrf_k` | runtime |
| `SearchQuery` | `kebab_core::SearchQuery` | `kebab-app::search` |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `Vec<SearchHit>` (with full `RetrievalDetail`) | `kb_core::SearchHit` | `kb-cli` printer, `kb-rag` packer |
| `Vec<SearchHit>` (with full `RetrievalDetail`) | `kebab_core::SearchHit` | `kebab-cli` printer, `kebab-rag` packer |
## Public surface (signatures only — no new types)
```rust
pub struct HybridRetriever {
lexical: std::sync::Arc<dyn kb_core::Retriever>,
vector: std::sync::Arc<dyn kb_core::Retriever>, // wrapper over LanceVectorStore + Embedder
lexical: std::sync::Arc<dyn kebab_core::Retriever>,
vector: std::sync::Arc<dyn kebab_core::Retriever>, // wrapper over LanceVectorStore + Embedder
fusion: FusionPolicy,
k: usize,
}
@@ -64,27 +64,27 @@ pub enum FusionPolicy { Rrf { k_rrf: u32 } }
impl HybridRetriever {
pub fn new(
config: &kb_config::Config,
lexical: std::sync::Arc<dyn kb_core::Retriever>,
vector: std::sync::Arc<dyn kb_core::Retriever>,
config: &kebab_config::Config,
lexical: std::sync::Arc<dyn kebab_core::Retriever>,
vector: std::sync::Arc<dyn kebab_core::Retriever>,
) -> Self;
}
impl kb_core::Retriever for HybridRetriever {
fn search(&self, query: &kb_core::SearchQuery) -> anyhow::Result<Vec<kb_core::SearchHit>>;
fn index_version(&self) -> kb_core::IndexVersion;
impl kebab_core::Retriever for HybridRetriever {
fn search(&self, query: &kebab_core::SearchQuery) -> anyhow::Result<Vec<kebab_core::SearchHit>>;
fn index_version(&self) -> kebab_core::IndexVersion;
}
/// Wrapper that turns a VectorStore + Embedder into a Retriever.
pub struct VectorRetriever {
store: std::sync::Arc<dyn kb_core::VectorStore>,
embed: std::sync::Arc<dyn kb_core::Embedder>,
/* heading_path/snippet enrichment hits SQLite via kb-store-sqlite read accessor */
store: std::sync::Arc<dyn kebab_core::VectorStore>,
embed: std::sync::Arc<dyn kebab_core::Embedder>,
/* heading_path/snippet enrichment hits SQLite via kebab-store-sqlite read accessor */
}
impl VectorRetriever {
pub fn new(store: std::sync::Arc<dyn kb_core::VectorStore>, embed: std::sync::Arc<dyn kb_core::Embedder>, sqlite: std::sync::Arc<kb_store_sqlite::SqliteStore>) -> Self;
pub fn new(store: std::sync::Arc<dyn kebab_core::VectorStore>, embed: std::sync::Arc<dyn kebab_core::Embedder>, sqlite: std::sync::Arc<kebab_store_sqlite::SqliteStore>) -> Self;
}
impl kb_core::Retriever for VectorRetriever { /* per §7.2 */ }
impl kebab_core::Retriever for VectorRetriever { /* per §7.2 */ }
```
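The `FusionPolicy::Rrf { k_rrf }` variant above refers to Reciprocal Rank Fusion. As a rough, self-contained sketch of the scoring rule only (not the crate's actual implementation), the function below fuses ranked lists of chunk ids. The name `rrf_fuse`, the use of plain `String` ids, and the tie-break by ascending id are assumptions made for illustration; the real retriever works on `SearchHit` values.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Reciprocal Rank Fusion over ranked lists of ids (hypothetical helper).
/// score(id) = sum over lists of 1 / (k_rrf + rank), with rank starting at 1.
pub fn rrf_fuse(lists: &[Vec<String>], k_rrf: u32, top_k: usize) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (idx, id) in list.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry(id.clone()).or_insert(0.0) += 1.0 / (f64::from(k_rrf) + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Score descending; ties broken by id ascending (an assumption) so that
    // repeated runs produce the same order regardless of HashMap iteration.
    fused.sort_by(|a, b| {
        b.1.partial_cmp(&a.1)
            .unwrap_or(Ordering::Equal)
            .then_with(|| a.0.cmp(&b.0))
    });
    fused.truncate(top_k);
    fused
}
```

An id that appears in both lists accumulates two terms, which is why a chunk ranked second lexically and first by vector can outrank the lexical top hit. A fixed tie-break matters for the determinism test listed in the test plan.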
## Behavior contract
@@ -124,12 +124,12 @@ impl kb_core::Retriever for VectorRetriever { /* per §7.2 */ }
| determinism | identical query twice → byte-identical `Vec<SearchHit>` | tmp DB |
| snapshot | hybrid output JSON stable | `fixtures/search/hybrid/run-1.json` |
All tests under `cargo test -p kb-search hybrid`.
All tests under `cargo test -p kebab-search hybrid`.
## Definition of Done
- [ ] `cargo check -p kb-search` passes
- [ ] `cargo test -p kb-search hybrid` passes
- [ ] `cargo check -p kebab-search` passes
- [ ] `cargo test -p kebab-search hybrid` passes
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.7, §6.4 search, §0 Q3

@@ -1,64 +1,64 @@
---
phase: P3
component: kb-app (facade wiring)
component: kebab-app (facade wiring)
task_id: p3-5
title: "Wire kb-app facade — ingest / search / list / inspect end-to-end"
title: "Wire kebab-app facade — ingest / search / list / inspect end-to-end"
status: completed
depends_on: [p1-6, p2-2, p3-2, p3-3, p3-4]
unblocks: [p4-3, p9-1, p9-2, p9-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§7 components, §7.2 traits, §1 UX scenes (ingest, search, list, inspect), §6.4 config, §10 errors]
---
# p3-5 — Wire kb-app facade
# p3-5 — Wire kebab-app facade
## Goal
Replace the `bail!("not yet wired")` stubs in `kb-app` (currently `ingest`, `search`, `list_docs`, `inspect_doc`, `inspect_chunk`) with real bodies that compose the libraries shipped in P1–P3. After this task, `kb index` actually walks a workspace and persists chunks, and `kb search --mode {lexical,vector,hybrid}` returns real `SearchHit`s. `kb-app::ask` stays stubbed (P4-3 owns it).
Replace the `bail!("not yet wired")` stubs in `kebab-app` (currently `ingest`, `search`, `list_docs`, `inspect_doc`, `inspect_chunk`) with real bodies that compose the libraries shipped in P1–P3. After this task, `kebab index` actually walks a workspace and persists chunks, and `kebab search --mode {lexical,vector,hybrid}` returns real `SearchHit`s. `kebab-app::ask` stays stubbed (P4-3 owns it).
## Why now / why this size
P1–P3 shipped libraries but the CLI is unusable: every facade method bails. Inserting this glue task between P3-4 (last library that completes the retrieval stack) and P4-1 (LLM trait, where `kb-app::ask` will need this same wiring anyway) lets the user run the tool end-to-end for the first time and validates that the library boundaries actually compose. Limiting it to non-LLM commands keeps it a single-session task; `ask` wiring lives in P4-3.
P1–P3 shipped libraries but the CLI is unusable: every facade method bails. Inserting this glue task between P3-4 (last library that completes the retrieval stack) and P4-1 (LLM trait, where `kebab-app::ask` will need this same wiring anyway) lets the user run the tool end-to-end for the first time and validates that the library boundaries actually compose. Limiting it to non-LLM commands keeps it a single-session task; `ask` wiring lives in P4-3.
## Allowed dependencies
`kb-app` may depend on:
`kebab-app` may depend on:
- `kb-core`, `kb-config` (already)
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-sqlite` (P1)
- `kb-search` (P2-2, P3-4)
- `kb-store-vector`, `kb-embed`, `kb-embed-local` (P3-2, P3-3, P3-4)
- `kebab-core`, `kebab-config` (already)
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-sqlite` (P1)
- `kebab-search` (P2-2, P3-4)
- `kebab-store-vector`, `kebab-embed`, `kebab-embed-local` (P3-2, P3-3, P3-4)
- `tracing`, `anyhow`, `serde`, `serde_json`, `time`, `dirs`, `toml` (existing)
## Forbidden dependencies
- `kb-llm*`, `kb-rag` (P4 ownership — `ask` stays stubbed)
- `kb-tui`, `kb-desktop` (P9 — those *consume* `kb-app`, not the other way)
- `kb-parse-pdf`, `kb-parse-image`, `kb-parse-audio` (P6/P7/P8)
- network HTTP libs (model download is `kb-embed-local`'s responsibility via fastembed)
- `kebab-llm*`, `kebab-rag` (P4 ownership — `ask` stays stubbed)
- `kebab-tui`, `kebab-desktop` (P9 — those *consume* `kebab-app`, not the other way)
- `kebab-parse-pdf`, `kebab-parse-image`, `kebab-parse-audio` (P6/P7/P8)
- network HTTP libs (model download is `kebab-embed-local`'s responsibility via fastembed)
## Inputs
| input | type | source |
|-------|------|--------|
| `kb-config::Config` | `kb_config::Config` | loaded once at process start; threaded through every facade call |
| `SourceScope` | `kb_core::SourceScope` | CLI |
| `SearchQuery` | `kb_core::SearchQuery` | CLI |
| `DocFilter` | `kb_core::DocFilter` | CLI |
| `DocumentId` / `ChunkId` | `kb_core::ids::*` | CLI |
| `kebab-config::Config` | `kebab_config::Config` | loaded once at process start; threaded through every facade call |
| `SourceScope` | `kebab_core::SourceScope` | CLI |
| `SearchQuery` | `kebab_core::SearchQuery` | CLI |
| `DocFilter` | `kebab_core::DocFilter` | CLI |
| `DocumentId` / `ChunkId` | `kebab_core::ids::*` | CLI |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `IngestReport` | `kb_core::IngestReport` | `kb-cli` (`ingest_report.v1`), `kb-eval` (P5) |
| `Vec<SearchHit>` | `kb_core::SearchHit` | `kb-cli` (`search_hit.v1`), `kb-rag` (P4-3) |
| `Vec<DocSummary>` | `kb_core::DocSummary` | `kb-cli` (`doc_summary.v1`) |
| `CanonicalDocument` / `Chunk` | `kb_core::*` | `kb-cli` (`chunk_inspection.v1`) |
| `IngestReport` | `kebab_core::IngestReport` | `kebab-cli` (`ingest_report.v1`), `kebab-eval` (P5) |
| `Vec<SearchHit>` | `kebab_core::SearchHit` | `kebab-cli` (`search_hit.v1`), `kebab-rag` (P4-3) |
| `Vec<DocSummary>` | `kebab_core::DocSummary` | `kebab-cli` (`doc_summary.v1`) |
| `CanonicalDocument` / `Chunk` | `kebab_core::*` | `kebab-cli` (`chunk_inspection.v1`) |
## Public surface (signatures only — no new types)
The signatures already live on `kb-app`. This task only swaps the bodies. Frozen surface:
The signatures already live on `kebab-app`. This task only swaps the bodies. Frozen surface:
```rust
pub fn ingest(scope: SourceScope, summary_only: bool) -> anyhow::Result<IngestReport>;
@@ -69,7 +69,7 @@ pub fn search(query: SearchQuery) -> anyhow::Result<Vec<SearchHi
// `ask` and `init_workspace` / `doctor` stay as-is.
```
If a constructor helper is needed (e.g., a single `App::open(&Config)` that opens `SqliteStore`, `LanceVectorStore`, embedder, retrievers once), it MAY be added pub-or-pub(crate) to `kb-app`. Keep all newly-added types crate-private unless explicitly required by `kb-cli`.
If a constructor helper is needed (e.g., a single `App::open(&Config)` that opens `SqliteStore`, `LanceVectorStore`, embedder, retrievers once), it MAY be added pub-or-pub(crate) to `kebab-app`. Keep all newly-added types crate-private unless explicitly required by `kebab-cli`.
## Behavior contract
@@ -77,8 +77,8 @@ If a constructor helper is needed (e.g., a single `App::open(&Config)` that open
Pipeline per design §1.2:
1. Resolve `scope` → set of files under `config.workspace.root` filtered by `config.workspace.include`/`exclude`. Use `kb-source-fs::FsSourceConnector` and pass each `RawAsset` + bytes through.
2. For each markdown asset: `kb_parse_md::frontmatter::parse_frontmatter` + `kb_parse_md::blocks::parse_blocks` → `kb_normalize::build_canonical_document` → `kb_chunk::MdHeadingV1Chunker::chunk`.
1. Resolve `scope` → set of files under `config.workspace.root` filtered by `config.workspace.include`/`exclude`. Use `kebab-source-fs::FsSourceConnector` and pass each `RawAsset` + bytes through.
2. For each markdown asset: `kebab_parse_md::frontmatter::parse_frontmatter` + `kebab_parse_md::blocks::parse_blocks` → `kebab_normalize::build_canonical_document` → `kebab_chunk::MdHeadingV1Chunker::chunk`.
3. SQLite writes per design §5.8 (one ingest = one transaction): `put_asset_with_bytes``put_document``put_blocks``put_chunks`.
4. If `config.models.embedding.provider == "fastembed"`: build `FastembedEmbedder` once at the top of the run, embed every chunk's `text` as `EmbeddingKind::Document`, batch by `config.models.embedding.batch_size`, then `LanceVectorStore::ensure_table` + `upsert`. Skip vector indexing if provider is `"none"` or `embedding.dimensions == 0` (config opt-out path).
5. Record an `ingest_runs` row via `JobRepo` with the aggregate counts. `summary_only=true` writes `items_json=NULL`.
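Step 4 batches chunk texts by `config.models.embedding.batch_size` before embedding. A minimal sketch of that batching loop, assuming the embedder can be modelled as a closure from a slice of texts to one vector per text; `embed_in_batches` is a hypothetical name and not part of `kebab-app`:

```rust
/// Embed `texts` in batches of at most `batch_size` items (hypothetical helper).
/// `embed` stands in for an `Embedder` call and must return one vector per input.
pub fn embed_in_batches<F>(texts: &[String], batch_size: usize, mut embed: F) -> Vec<Vec<f32>>
where
    F: FnMut(&[String]) -> Vec<Vec<f32>>,
{
    // `chunks(0)` panics, so a zero batch size is treated as 1 in this sketch.
    let size = batch_size.max(1);
    let mut out = Vec::with_capacity(texts.len());
    for batch in texts.chunks(size) {
        let vectors = embed(batch);
        debug_assert_eq!(vectors.len(), batch.len());
        out.extend(vectors);
    }
    out
}
```

Output order matches input order, so the resulting vectors can be zipped back onto their chunks before the `upsert` call.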
@@ -90,9 +90,9 @@ Errors: any per-file parse failure should be recorded as a `Warning` in `Provena
Per `query.mode`:
- `Lexical` → `kb_search::LexicalRetriever::search`. Takes a read-only borrow of the SQLite connection.
- `Vector` → `kb_search::VectorRetriever::search`. Requires `Embedder` (built from config) + `VectorStore` (LanceDB).
- `Hybrid` → `kb_search::HybridRetriever::search` composing the above two.
- `Lexical` → `kebab_search::LexicalRetriever::search`. Takes a read-only borrow of the SQLite connection.
- `Vector` → `kebab_search::VectorRetriever::search`. Requires `Embedder` (built from config) + `VectorStore` (LanceDB).
- `Hybrid` → `kebab_search::HybridRetriever::search` composing the above two.
Each call constructs (or reuses) the retrievers from a single `App` context that holds `Arc<SqliteStore>`, `Arc<LanceVectorStore>`, `Arc<dyn Embedder>` so cold-start cost is paid once per process. Document the lifetime model: a CLI invocation passes through `App::open(&Config)` once, runs the requested op, then drops everything on exit. The TUI (P9) holds the `App` for the session.
@@ -100,17 +100,17 @@ Each call constructs (or reuses) the retrievers from a single `App` context that
### `list_docs(filter)` / `inspect_doc(id)` / `inspect_chunk(id)`
Direct delegation to `kb_core::DocumentStore` trait methods on `SqliteStore`. No vector / embedding involvement.
Direct delegation to `kebab_core::DocumentStore` trait methods on `SqliteStore`. No vector / embedding involvement.
### Lifecycle
Define an internal `pub(crate) struct App { config: Arc<Config>, sqlite: Arc<SqliteStore>, vector: Option<Arc<LanceVectorStore>>, embedder: Option<Arc<dyn Embedder + Send + Sync>>, /* retrievers built per call */ }` with a `pub(crate) fn open(config: &Config) -> Result<Self>`. Each public `kb-app::*` function builds an `App` (or accepts one in tests) and uses it. The free functions stay the public API so `kb-cli` and `kb-tui` don't need to refactor.
Define an internal `pub(crate) struct App { config: Arc<Config>, sqlite: Arc<SqliteStore>, vector: Option<Arc<LanceVectorStore>>, embedder: Option<Arc<dyn Embedder + Send + Sync>>, /* retrievers built per call */ }` with a `pub(crate) fn open(config: &Config) -> Result<Self>`. Each public `kebab-app::*` function builds an `App` (or accepts one in tests) and uses it. The free functions stay the public API so `kebab-cli` and `kebab-tui` don't need to refactor.
`vector` and `embedder` are `Option` because the spec allows running KB without embeddings (lexical-only mode for resource-constrained boxes — `provider == "none"`). When absent, `search` with `Vector` / `Hybrid` modes returns `anyhow::Error` with a hint to enable embeddings; `ingest` skips the vector step.
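The optional-embeddings rule can be illustrated with a small dispatch sketch. It assumes a stripped-down stand-in for `App` with a single boolean in place of the two `Option` fields, and it returns `String` errors instead of `anyhow::Error` to stay dependency-free; `AppSketch`, `SearchMode`, and `plan_search` are illustrative names only.

```rust
/// Stand-in for the query mode (the real type lives in the core crate).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SearchMode {
    Lexical,
    Vector,
    Hybrid,
}

/// Stand-in for `App`: `embeddings_enabled` mirrors `vector` / `embedder` being `Some`.
pub struct AppSketch {
    pub embeddings_enabled: bool,
}

/// Decide which retriever a query would use, or fail with a hint.
pub fn plan_search(app: &AppSketch, mode: SearchMode) -> Result<&'static str, String> {
    match (mode, app.embeddings_enabled) {
        // Lexical search never needs the vector store or the embedder.
        (SearchMode::Lexical, _) => Ok("lexical"),
        (SearchMode::Vector, true) => Ok("vector"),
        (SearchMode::Hybrid, true) => Ok("hybrid"),
        // Vector or hybrid requested while embeddings are switched off.
        (m, false) => Err(format!(
            "{m:?} search requires embeddings; set models.embedding.provider to a value other than \"none\""
        )),
    }
}
```

Failing before any retriever is built keeps the lexical-only path free of LanceDB and model-loading costs.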
### Versioning
- `parser_version` from `kb-parse-md` constant.
- `parser_version` from `kebab-parse-md` constant.
- `chunker_version` from `MdHeadingV1Chunker::chunker_version()`.
- `embedding_model` / `embedding_version` from the embedder.
- `index_version` from the retriever (or `Lexical`/`Vector` IndexId).
@@ -119,15 +119,15 @@ All wired through to the persisted records and the wire output.
## Storage / wire effects
- Writes: `data_dir/kb.sqlite` (assets, documents, blocks, chunks, embedding_records, jobs/ingest_runs), `data_dir/assets/<aa>/<id>` (when copied), `data_dir/lancedb/chunk_embeddings_<model>_<dim>.lance/` (when embeddings on), `data_dir/models/fastembed/` (model cache, first run only).
- Writes: `data_dir/kebab.sqlite` (assets, documents, blocks, chunks, embedding_records, jobs/ingest_runs), `data_dir/assets/<aa>/<id>` (when copied), `data_dir/lancedb/chunk_embeddings_<model>_<dim>.lance/` (when embeddings on), `data_dir/models/fastembed/` (model cache, first run only).
- Reads on subsequent calls: same DB.
- Wire JSON conforms to `ingest_report.v1`, `search_hit.v1`, `doc_summary.v1`, `chunk_inspection.v1`; `kb-cli` already has the wrappers.
- Wire JSON conforms to `ingest_report.v1`, `search_hit.v1`, `doc_summary.v1`, `chunk_inspection.v1`; `kebab-cli` already has the wrappers.
## Test plan
| kind | description | fixture / data |
|------|-------------|----------------|
| integration | `ingest` over a 3-file fixture workspace produces non-empty `IngestReport` and writes the expected SQLite rows + Lance rows | `crates/kb-app/tests/fixtures/workspace/`, tmp `data_dir` |
| integration | `ingest` over a 3-file fixture workspace produces non-empty `IngestReport` and writes the expected SQLite rows + Lance rows | `crates/kebab-app/tests/fixtures/workspace/`, tmp `data_dir` |
| integration | `ingest` with `provider="none"` skips Lance and produces an `IngestReport` with `embeddings_indexed=0` | tmp config |
| integration | `ingest` is idempotent: re-running on the same workspace updates `documents.updated_at` and bumps `doc_version` without duplicating rows | tmp `data_dir` |
| integration | `search` with `mode=Lexical` returns hits whose `embedding_model` is `None` | tmp `data_dir` |
@@ -136,36 +136,36 @@ All wired through to the persisted records and the wire output.
| integration | `search` with `mode=Vector` and `provider="none"` returns a clear error | tmp config |
| integration | `list_docs` with `tags_any=["rust"]` filters correctly | tmp `data_dir` |
| integration | `inspect_doc` round-trips a document; `inspect_chunk` round-trips a chunk | tmp `data_dir` |
| smoke | end-to-end CLI smoke: `kb index` + `kb search --mode hybrid "..."` against the fixture workspace using `cargo run` (manual / `assert_cmd`) | fixture |
| smoke | end-to-end CLI smoke: `kebab index` + `kebab search --mode hybrid "..."` against the fixture workspace using `cargo run` (manual / `assert_cmd`) | fixture |
`#[ignore]` AVX-gated tests follow the P3-3/P3-4 pattern: `require_avx_or_panic()` at the top of each LanceDB-touching body. Default `cargo test -p kb-app` runs the lexical-only and SQLite-only paths.
`#[ignore]` AVX-gated tests follow the P3-3/P3-4 pattern: `require_avx_or_panic()` at the top of each LanceDB-touching body. Default `cargo test -p kebab-app` runs the lexical-only and SQLite-only paths.
All tests under `cargo test -p kb-app`. CLI smoke optional via `assert_cmd` if it doesn't add a heavyweight dep tree.
All tests under `cargo test -p kebab-app`. CLI smoke optional via `assert_cmd` if it doesn't add a heavyweight dep tree.
## Definition of Done
- [ ] `cargo check -p kb-app` passes.
- [ ] `cargo test -p kb-app` passes (default lane).
- [ ] `cargo test -p kb-app -- --ignored` passes on AVX hardware.
- [ ] `cargo run -p kb-cli -- index` succeeds against a non-empty fixture workspace.
- [ ] `cargo run -p kb-cli -- search --mode hybrid "<term>"` returns real hits + citations.
- [ ] `cargo run -p kb-cli -- list` returns the indexed documents.
- [ ] `cargo check -p kebab-app` passes.
- [ ] `cargo test -p kebab-app` passes (default lane).
- [ ] `cargo test -p kebab-app -- --ignored` passes on AVX hardware.
- [ ] `cargo run -p kebab-cli -- index` succeeds against a non-empty fixture workspace.
- [ ] `cargo run -p kebab-cli -- search --mode hybrid "<term>"` returns real hits + citations.
- [ ] `cargo run -p kebab-cli -- list` returns the indexed documents.
- [ ] `cargo clippy --workspace --all-targets -- -D warnings` clean.
- [ ] No imports outside Allowed dependencies.
- [ ] PR links design §1.2, §1.5, §1.6, §7.
## Out of scope
- `kb-app::ask` body — P4-3 (RAG pipeline).
- `kb index --rebuild-fts` CLI option — wire `kb_store_sqlite::rebuild_chunks_fts` only when needed by a downstream task.
- `kb index --resume` checkpointing — design §1.2 mentions it but spec leaves to a P+ refinement; document but defer.
- `kebab-app::ask` body — P4-3 (RAG pipeline).
- `kebab index --rebuild-fts` CLI option — wire `kebab_store_sqlite::rebuild_chunks_fts` only when needed by a downstream task.
- `kebab index --resume` checkpointing — design §1.2 mentions it but spec leaves to a P+ refinement; document but defer.
- `--watch` mode — P+.
- TUI / desktop integration — P9 consumes the wired facade.
## Risks / notes
- Cold-start cost: first `kb index` run downloads the fastembed model (~470MB) and warms the ONNX session. Surface via `tracing::info!` (already wired in P3-2).
- **Post-merge fix (2026-05)**: original P3-5 implementation left every `kb-cli` subcommand calling the bare `kb_app::*` free functions, which silently re-loaded `Config::load(None)` (XDG default) and ignored the user's `--config <path>` flag. CLI now goes through `kb_app::*_with_config(cfg, ...)` everywhere. The `#[doc(hidden)] pub fn *_with_config` companions originally framed as "test seam" are now the official config-explicit API for CLI / TUI / integration tests. See [HOTFIXES.md](../HOTFIXES.md) for details.
- Cold-start cost: first `kebab index` run downloads the fastembed model (~470MB) and warms the ONNX session. Surface via `tracing::info!` (already wired in P3-2).
- **Post-merge fix (2026-05)**: original P3-5 implementation left every `kebab-cli` subcommand calling the bare `kebab_app::*` free functions, which silently re-loaded `Config::load(None)` (XDG default) and ignored the user's `--config <path>` flag. CLI now goes through `kebab_app::*_with_config(cfg, ...)` everywhere. The `#[doc(hidden)] pub fn *_with_config` companions originally framed as "test seam" are now the official config-explicit API for CLI / TUI / integration tests. See [HOTFIXES.md](../HOTFIXES.md) for details.
- The `App` lifecycle struct is internal but its construction is the natural seam for adding caching / connection pooling later. Keep it `pub(crate)` so future refactors don't break the CLI.
- Mismatched `index_version` across stored records and the live retriever should fail loud at `App::open` (not at first search). Reuse the `tracing::warn!` from `HybridRetriever::new` (P3-4).
- The fastembed adapter holds a `tokio::runtime` (P3-3); `App` must be constructed from a synchronous context. Document on `App::open`.

@@ -1,12 +1,12 @@
---
phase: P4
component: kb-llm (trait crate)
component: kebab-llm (trait crate)
task_id: p4-1
title: "LanguageModel trait + GenerateRequest/TokenChunk"
status: completed
depends_on: [p0-1]
unblocks: [p4-2, p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§7.1 GenerateRequest/TokenChunk, §7.2 LanguageModel, §0 Q5 streaming, §3.8 ModelRef]
---
@@ -14,16 +14,16 @@ contract_sections: [§7.1 GenerateRequest/TokenChunk, §7.2 LanguageModel, §0 Q
## Goal
Provide the `kb-llm` crate that re-exports the `LanguageModel` trait and helper types (`GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage`, `ModelRef`), plus a `MockLanguageModel` for downstream tests.
Provide the `kebab-llm` crate that re-exports the `LanguageModel` trait and helper types (`GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage`, `ModelRef`), plus a `MockLanguageModel` for downstream tests.
## Why now / why this size
`kb-rag` (p4-3) consumes a `LanguageModel` trait object. Owning the trait + a deterministic mock here lets RAG tests run with no Ollama dependency. Real adapters (Ollama, llama.cpp, candle) live in p4-2 and beyond.
`kebab-rag` (p4-3) consumes a `LanguageModel` trait object. Owning the trait + a deterministic mock here lets RAG tests run with no Ollama dependency. Real adapters (Ollama, llama.cpp, candle) live in p4-2 and beyond.
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`
- `thiserror`
- `tracing`
@@ -31,13 +31,13 @@ Provide the `kb-llm` crate that re-exports the `LanguageModel` trait and helper
## Forbidden dependencies
- `reqwest`, `ureq`, `tokio`, `whisper-rs`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-rag`, `kb-tui`, `kb-desktop`
- `reqwest`, `ureq`, `tokio`, `whisper-rs`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `GenerateRequest` | `kb_core::GenerateRequest` | RAG pipeline |
| `GenerateRequest` | `kebab_core::GenerateRequest` | RAG pipeline |
| concrete adapter at runtime | `dyn LanguageModel` | p4-2+ |
## Outputs
@@ -45,12 +45,12 @@ Provide the `kb-llm` crate that re-exports the `LanguageModel` trait and helper
| output | type | downstream |
|--------|------|------------|
| streaming `TokenChunk` iterator | `Box<dyn Iterator<Item=anyhow::Result<TokenChunk>> + Send>` | RAG pipeline |
| `ModelRef` identity | `kb_core::ModelRef` | Answer.model |
| `ModelRef` identity | `kebab_core::ModelRef` | Answer.model |
## Public surface (signatures only — no new types)
```rust
pub use kb_core::{LanguageModel, GenerateRequest, TokenChunk, FinishReason, TokenUsage, ModelRef};
pub use kebab_core::{LanguageModel, GenerateRequest, TokenChunk, FinishReason, TokenUsage, ModelRef};
/// Test-only deterministic mock. Compiled only when `mock` feature is on.
#[cfg(feature = "mock")]
@@ -59,12 +59,12 @@ pub struct MockLanguageModel {
pub provider: String,
pub context_tokens: usize,
pub canned_response: String, // emitted token-by-token
pub canned_finish: kb_core::FinishReason,
pub canned_usage: kb_core::TokenUsage,
pub canned_finish: kebab_core::FinishReason,
pub canned_usage: kebab_core::TokenUsage,
}
#[cfg(feature = "mock")]
impl kb_core::LanguageModel for MockLanguageModel { /* per §7.2 */ }
impl kebab_core::LanguageModel for MockLanguageModel { /* per §7.2 */ }
```
## Behavior contract
@@ -89,12 +89,12 @@ impl kb_core::LanguageModel for MockLanguageModel { /* per §7.2 */ }
| unit | concatenation of streamed `TokenChunk::Token` equals canned text (truncated by stop strings) | inline |
| contract | `model_ref()` populates `provider` and leaves `dimensions = None` | inline |
All tests under `cargo test -p kb-llm`.
All tests under `cargo test -p kebab-llm`.
## Definition of Done
- [ ] `cargo check -p kb-llm` passes
- [ ] `cargo test -p kb-llm` passes
- [ ] `cargo check -p kebab-llm` passes
- [ ] `cargo test -p kebab-llm` passes
- [ ] No HTTP / async runtime deps present
- [ ] PR links design §7.2 LanguageModel, §0 Q5

@@ -1,12 +1,12 @@
---
phase: P4
component: kb-llm-local (Ollama adapter)
component: kebab-llm-local (Ollama adapter)
task_id: p4-2
title: "OllamaLanguageModel — streaming /api/generate"
status: completed
depends_on: [p4-1]
unblocks: [p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [design §7.2 LanguageModel, report §11.2 Ollama, design §6.4 [models.llm], design §0 Q5 streaming, design §10 errors]
---
@@ -18,13 +18,13 @@ Implement `OllamaLanguageModel` against Ollama's local HTTP API (`POST /api/gene
## Why now / why this size
First real LM. Required for `kb ask` to function. Isolated from RAG pipeline so swapping providers stays config-only.
First real LM. Required for `kebab ask` to function. Isolated from RAG pipeline so swapping providers stays config-only.
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-llm`
- `kebab-core`
- `kebab-config`
- `kebab-llm`
- `reqwest = { version = "0.12", default-features = false, features = ["blocking", "json", "rustls-tls"] }`
- `serde`, `serde_json`
- `tracing`
@@ -32,21 +32,21 @@ First real LM. Required for `kb ask` to function. Isolated from RAG pipeline so
## Forbidden dependencies
- `tokio`, `async-std`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-rag`, `kb-tui`, `kb-desktop`. (Streaming uses `reqwest::blocking::Response::bytes_stream` via line-delimited JSON; no async runtime needed.)
- `tokio`, `async-std`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-rag`, `kebab-tui`, `kebab-desktop`. (Streaming uses `reqwest::blocking::Response::bytes_stream` via line-delimited JSON; no async runtime needed.)
## Inputs
| input | type | source |
|-------|------|--------|
| `kb-config::Config.models.llm` | endpoint, model, context, temperature, seed | runtime |
| `GenerateRequest` | `kb_core::GenerateRequest` | RAG pipeline |
| `kebab-config::Config.models.llm` | endpoint, model, context, temperature, seed | runtime |
| `GenerateRequest` | `kebab_core::GenerateRequest` | RAG pipeline |
| Ollama HTTP server (local) | `http://127.0.0.1:11434` | external process |
## Outputs
| output | type | downstream |
|--------|------|------------|
| streaming `TokenChunk` iterator | per §7.2 | `kb-rag` |
| streaming `TokenChunk` iterator | per §7.2 | `kebab-rag` |
| `ModelRef` | `{ id, provider="ollama", dimensions=None }` | `Answer.model` |
## Public surface (signatures only — no new types)
@@ -55,14 +55,14 @@ First real LM. Required for `kb ask` to function. Isolated from RAG pipeline so
pub struct OllamaLanguageModel { /* internal: reqwest::blocking::Client + config */ }
impl OllamaLanguageModel {
pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>;
pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
}
impl kb_core::LanguageModel for OllamaLanguageModel {
fn model_ref(&self) -> kb_core::ModelRef;
impl kebab_core::LanguageModel for OllamaLanguageModel {
fn model_ref(&self) -> kebab_core::ModelRef;
fn context_tokens(&self) -> usize;
fn generate_stream(&self, req: kb_core::GenerateRequest)
-> anyhow::Result<Box<dyn Iterator<Item = anyhow::Result<kb_core::TokenChunk>> + Send>>;
fn generate_stream(&self, req: kebab_core::GenerateRequest)
-> anyhow::Result<Box<dyn Iterator<Item = anyhow::Result<kebab_core::TokenChunk>> + Send>>;
}
```
@@ -93,7 +93,7 @@ impl kb_core::LanguageModel for OllamaLanguageModel {
- UTF-8 boundary: buffer incomplete byte sequences across stream lines before emitting `TokenChunk::Token`.
- Determinism: with `temperature=0` and fixed `seed`, Ollama's output is reproducible (modulo nondeterminism in the model itself); tests that verify determinism use a fixed seed and may rely on aggregate hash with tolerance, NOT byte equality.
- `model_ref().provider = "ollama"`, `dimensions = None`.
- Reachability check: `OllamaLanguageModel::new` does NOT eagerly hit the network; first failure surfaces on `generate_stream`. Use `kb doctor` (separate task) to probe.
- Reachability check: `OllamaLanguageModel::new` does NOT eagerly hit the network; first failure surfaces on `generate_stream`. Use `kebab doctor` (separate task) to probe.
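
The UTF-8 boundary rule above can be sketched with the standard library alone. This is a minimal illustration, not the adapter's actual code: `Utf8Buffer` and `push` are hypothetical names, and the bytes are assumed to arrive in arbitrary slices from whatever `std::io::Read` source carries the stream.

```rust
/// Hypothetical helper (not part of the spec): accumulates raw bytes and
/// releases only the text that is already complete UTF-8.
#[derive(Default)]
struct Utf8Buffer {
    pending: Vec<u8>,
}

impl Utf8Buffer {
    /// Appends `bytes` and returns the text that is complete so far.
    /// An incomplete trailing sequence stays buffered for the next call.
    fn push(&mut self, bytes: &[u8]) -> String {
        self.pending.extend_from_slice(bytes);
        match std::str::from_utf8(&self.pending) {
            Ok(s) => {
                let out = s.to_owned();
                self.pending.clear();
                out
            }
            // `error_len() == None` means the input ended mid-character.
            Err(e) if e.error_len().is_none() => {
                let valid = e.valid_up_to();
                let out = String::from_utf8_lossy(&self.pending[..valid]).into_owned();
                self.pending.drain(..valid);
                out
            }
            // Genuinely invalid bytes: substitute U+FFFD instead of stalling.
            // (An assumption; the spec does not say how to treat them.)
            Err(_) => {
                let out = String::from_utf8_lossy(&self.pending).into_owned();
                self.pending.clear();
                out
            }
        }
    }
}
```

Feeding `[b'a', 0xC3]` and then `[0xA9, b'b']` yields `"a"` followed by `"éb"`: the split two-byte character is held back until its second byte arrives.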
## Storage / wire effects
@@ -112,12 +112,12 @@ impl kb_core::LanguageModel for OllamaLanguageModel {
| determinism | identical request + temperature=0 + seed=0 produces identical token stream against mock | mocked HTTP |
| `#[ignore]` integration | real Ollama on `localhost:11434` with `qwen2.5:14b-instruct` produces non-empty output | requires user opt-in |
All non-ignored tests under `cargo test -p kb-llm-local`. Real-LM integration runs via `cargo test -p kb-llm-local -- --ignored`.
All non-ignored tests under `cargo test -p kebab-llm-local`. Real-LM integration runs via `cargo test -p kebab-llm-local -- --ignored`.
## Definition of Done
- [ ] `cargo check -p kb-llm-local` passes
- [ ] `cargo test -p kb-llm-local` passes (mocked tests; real LM behind `#[ignore]`)
- [ ] `cargo check -p kebab-llm-local` passes
- [ ] `cargo test -p kebab-llm-local` passes (mocked tests; real LM behind `#[ignore]`)
- [ ] No async runtime present (uses `reqwest::blocking`)
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §11.2, §0 Q5, §10
@@ -125,7 +125,7 @@ All non-ignored tests under `cargo test -p kb-llm-local`. Real-LM integration ru
## Out of scope
- llama.cpp / candle adapters (P+).
- Embedding via Ollama's `/api/embed` endpoint (alternate adapter inside `kb-embed-local` if requested later).
- Embedding via Ollama's `/api/embed` endpoint (alternate adapter inside `kebab-embed-local` if requested later).
- Cancellation / abort tokens (P+).
- Connection pooling tuning (default `reqwest::blocking` is sufficient for single-user CLI).


@@ -1,12 +1,12 @@
---
phase: P4
component: kb-rag
component: kebab-rag
task_id: p4-3
title: "RAG pipeline: retrieve → gate → pack → generate → cite-validate"
status: completed
depends_on: [p3-4, p4-2]
unblocks: [p5-1]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§0 Q4 refusal (two-layer), §0 Q7 footer, §1.11.4 ask scenes, §2.3 Answer wire, §3.8 internal Answer, §6.4 [rag], §10 errors]
---
@@ -22,11 +22,11 @@ This is the user-facing payoff. Splitting it further would couple too many inter
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-search` (Retriever trait object)
- `kb-llm` (LanguageModel trait object)
- `kb-store-sqlite` (read chunk full text/section + write `answers` row)
- `kebab-core`
- `kebab-config`
- `kebab-search` (Retriever trait object)
- `kebab-llm` (LanguageModel trait object)
- `kebab-store-sqlite` (read chunk full text/section + write `answers` row)
- `serde`, `serde_json`
- `regex` (for citation marker extraction)
- `time`
@@ -35,51 +35,51 @@ This is the user-facing payoff. Splitting it further would couple too many inter
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector` (only via Retriever trait), `kb-embed*` (only via Retriever), `kb-llm-local` (only via LanguageModel trait), `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector` (only via Retriever trait), `kebab-embed*` (only via Retriever), `kebab-llm-local` (only via LanguageModel trait), `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `query: &str` | text | `kb-app::ask` |
| `query: &str` | text | `kebab-app::ask` |
| `AskOpts` | k, explain, mode, temperature, seed | CLI |
| `dyn Retriever` | hybrid retriever from p3-4 | runtime injection |
| `dyn LanguageModel` | from p4-2 (or mock) | runtime injection |
| `dyn DocumentStore` | for chunk full-text fetch | from p1-6 |
| `kb-config::Config.rag` | `prompt_template_version`, `score_gate`, `max_context_tokens` | runtime |
| `kebab-config::Config.rag` | `prompt_template_version`, `score_gate`, `max_context_tokens` | runtime |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `Answer` | `kb_core::Answer` | `kb-cli` printer, `answers` table |
| `Answer` | `kebab_core::Answer` | `kebab-cli` printer, `answers` table |
| `answers` table row | SQLite | history, eval |
## Public surface (signatures only — no new types)
```rust
pub struct RagPipeline {
retriever: std::sync::Arc<dyn kb_core::Retriever>,
llm: std::sync::Arc<dyn kb_core::LanguageModel>,
docs: std::sync::Arc<kb_store_sqlite::SqliteStore>,
config: kb_config::Config,
retriever: std::sync::Arc<dyn kebab_core::Retriever>,
llm: std::sync::Arc<dyn kebab_core::LanguageModel>,
docs: std::sync::Arc<kebab_store_sqlite::SqliteStore>,
config: kebab_config::Config,
}
impl RagPipeline {
pub fn new(
config: kb_config::Config,
retriever: std::sync::Arc<dyn kb_core::Retriever>,
llm: std::sync::Arc<dyn kb_core::LanguageModel>,
docs: std::sync::Arc<kb_store_sqlite::SqliteStore>,
config: kebab_config::Config,
retriever: std::sync::Arc<dyn kebab_core::Retriever>,
llm: std::sync::Arc<dyn kebab_core::LanguageModel>,
docs: std::sync::Arc<kebab_store_sqlite::SqliteStore>,
) -> Self;
pub fn ask(&self, query: &str, opts: AskOpts) -> anyhow::Result<kb_core::Answer>;
pub fn ask(&self, query: &str, opts: AskOpts) -> anyhow::Result<kebab_core::Answer>;
}
pub struct AskOpts {
pub k: usize,
pub explain: bool,
pub mode: kb_core::SearchMode,
pub mode: kebab_core::SearchMode,
pub temperature: Option<f32>,
pub seed: Option<u64>,
pub stream_sink: Option<std::sync::mpsc::Sender<String>>, // tty/UI token streaming
@@ -158,12 +158,12 @@ pub struct AskOpts {
| determinism | identical inputs + temperature=0 + seed=0 → identical Answer (snapshot) | mock |
| snapshot | `Answer` JSON for fixed query stable | `fixtures/rag/run-1.json` |
All tests under `cargo test -p kb-rag` with no real Ollama (mock LM only).
All tests under `cargo test -p kebab-rag` with no real Ollama (mock LM only).
## Definition of Done
- [ ] `cargo check -p kb-rag` passes
- [ ] `cargo test -p kb-rag` passes
- [ ] `cargo check -p kebab-rag` passes
- [ ] `cargo test -p kebab-rag` passes
- [ ] No imports outside Allowed dependencies
- [ ] All paths write an `answers` row
- [ ] Output JSON conforms to `answer.v1`
@@ -178,9 +178,9 @@ All tests under `cargo test -p kb-rag` with no real Ollama (mock LM only).
## Risks / notes
- Citation regex is STRICT `\[#(\d{1,3})\]` only. Models that emit `[1]`/`[ #1 ]`/`[foo]` are treated as no-marker → refusal. This is intentional: a noisy citation grammar lets prose `[1]` or `vec![1]` slip through as false positives, which corrupts both `grounded` and `kb eval` `citation_coverage`. The prompt template (`rag-v1`) explicitly instructs `[#번호]`.
- **Post-merge fix (2026-05)**: kb-cli's `Cmd::Ask` arm originally called bare `kb_app::ask(query, opts)`, ignoring `--config <path>` and silently using XDG defaults (manifested as wrong model id / score_gate / data_dir surfacing in `Answer.retrieval`). Fixed by routing through `kb_app::ask_with_config(cfg, query, opts)`. See [HOTFIXES.md](../HOTFIXES.md).
- **Post-merge fix (2026-05)**: `config.rag.score_gate` default `0.05` was incompatible with hybrid RRF `fusion_score` (raw range `(0, 2/k_rrf]` ≈ `≤ 0.033` at default k_rrf=60), refusing every hybrid `kb ask`. Closed by normalizing RRF in p3-4 to `[0, 1]`. See p3-4 spec + HOTFIXES.md.
- Citation regex is STRICT `\[#(\d{1,3})\]` only. Models that emit `[1]`/`[ #1 ]`/`[foo]` are treated as no-marker → refusal. This is intentional: a noisy citation grammar lets prose `[1]` or `vec![1]` slip through as false positives, which corrupts both `grounded` and `kebab eval` `citation_coverage`. The prompt template (`rag-v1`) explicitly instructs `[#번호]`.
- **Post-merge fix (2026-05)**: kebab-cli's `Cmd::Ask` arm originally called bare `kebab_app::ask(query, opts)`, ignoring `--config <path>` and silently using XDG defaults (manifested as wrong model id / score_gate / data_dir surfacing in `Answer.retrieval`). Fixed by routing through `kebab_app::ask_with_config(cfg, query, opts)`. See [HOTFIXES.md](../HOTFIXES.md).
- **Post-merge fix (2026-05)**: `config.rag.score_gate` default `0.05` was incompatible with hybrid RRF `fusion_score` (raw range `(0, 2/k_rrf]` ≈ `≤ 0.033` at default k_rrf=60), refusing every hybrid `kebab ask`. Closed by normalizing RRF in p3-4 to `[0, 1]`. See p3-4 spec + HOTFIXES.md.
- `stream_sink` channel: pipeline `send`s tokens; if the receiver is dropped (caller cancelled), `SendError` is silently swallowed and generation continues to completion (so the `Answer` row still gets persisted). Pipeline does NOT panic on a dead sink.
- `temperature=0` does not fully eliminate stochasticity in some quantized Ollama models; document this and rely on `must_contain` rule-based metrics in P5 instead of exact match.
- Prompt-injection defense lives entirely in the system prompt; do NOT mutate `[근거]` text. If chunk text contains `<|system|>` or similar tokens, do not strip them — they are inert when wrapped.
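
Two of the notes above are easy to check with a short sketch. The first function mimics the strict citation pattern `\[#(\d{1,3})\]` without the `regex` crate (a std-only stand-in that only recognises ASCII digits, so it is an approximation of the real matcher). The second pair shows why a raw RRF score cannot clear a `0.05` gate at `k_rrf = 60`. The normalization used here, dividing by the `2/k_rrf` bound quoted above, is an assumption; the actual formula lives in the p3-4 spec. All function names are hypothetical.

```rust
/// Std-only stand-in for the strict pattern `\[#(\d{1,3})\]`.
/// Returns the cited numbers in order of appearance.
fn extract_citations(text: &str) -> Vec<u16> {
    let bytes = text.as_bytes();
    let mut found = Vec::new();
    let mut i = 0;
    while i + 1 < bytes.len() {
        if bytes[i] == b'[' && bytes[i + 1] == b'#' {
            let start = i + 2;
            let mut end = start;
            // Consume at most three ASCII digits.
            while end < bytes.len() && end - start < 3 && bytes[end].is_ascii_digit() {
                end += 1;
            }
            if end > start && end < bytes.len() && bytes[end] == b']' {
                // The slice holds 1 to 3 ASCII digits, so parsing cannot fail.
                found.push(text[start..end].parse::<u16>().unwrap());
                i = end + 1;
                continue;
            }
        }
        i += 1;
    }
    found
}

/// Raw reciprocal-rank fusion score from 1-based ranks, one per ranker.
fn rrf_raw(ranks: &[usize], k_rrf: f64) -> f64 {
    ranks.iter().map(|&r| 1.0 / (k_rrf + r as f64)).sum()
}

/// Assumed normalization for two rankers: divide by the bound `2 / k_rrf`.
fn rrf_normalized(ranks: &[usize], k_rrf: f64) -> f64 {
    rrf_raw(ranks, k_rrf) / (2.0 / k_rrf)
}
```

Even a chunk ranked first by both rankers scores only `2/61 ≈ 0.0328` raw, which is below the gate; after dividing by `2/60` it becomes `60/61 ≈ 0.98`.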


@@ -1,12 +1,12 @@
---
phase: P5
component: kb-eval (runner)
component: kebab-eval (runner)
task_id: p5-1
title: "Golden query fixture loader + per-query runner"
status: completed
depends_on: [p4-3]
unblocks: [p5-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5.7 eval_runs/eval_query_results, §6.3 runs_dir, phase epic tasks/phase-5-evaluation.md]
---
@@ -14,7 +14,7 @@ contract_sections: [§5.7 eval_runs/eval_query_results, §6.3 runs_dir, phase ep
## Goal
Load `fixtures/golden_queries.yaml`, run each query through `kb-app` (lexical / vector / hybrid / rag), and persist results into `eval_query_results` + `runs_dir/<run_id>/per_query.jsonl`.
Load `fixtures/golden_queries.yaml`, run each query through `kebab-app` (lexical / vector / hybrid / rag), and persist results into `eval_query_results` + `runs_dir/<run_id>/per_query.jsonl`.
## Why now / why this size
@@ -22,10 +22,10 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-app` (calls facade for search / ask)
- `kb-store-sqlite` (writes eval rows)
- `kebab-core`
- `kebab-config`
- `kebab-app` (calls facade for search / ask)
- `kebab-store-sqlite` (writes eval rows)
- `serde`, `serde_yaml`, `serde_json`
- `time`
- `tracing`
@@ -33,7 +33,7 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (all reached via `kb-app` facade only), `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (all reached via `kebab-app` facade only), `kebab-tui`, `kebab-desktop`
## Inputs
@@ -41,7 +41,7 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
|-------|------|--------|
| `fixtures/golden_queries.yaml` | YAML | repo-shipped |
| `EvalRunOpts` | suite, mode, with_rag, k, temperature, seed | CLI |
| `kb-app` facade | search/ask | runtime |
| `kebab-app` facade | search/ask | runtime |
## Outputs
@@ -50,7 +50,7 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
| `eval_runs` row | SQLite | p5-2, history |
| `eval_query_results` rows | SQLite | p5-2 |
| `runs_dir/<run_id>/per_query.jsonl` | filesystem | external tools, audits |
| `EvalRun` struct | `kb_eval::EvalRun` | caller |
| `EvalRun` struct | `kebab_eval::EvalRun` | caller |
## Public surface (signatures only — no new types)
@@ -58,9 +58,9 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
pub struct GoldenQuery {
pub id: String,
pub query: String,
pub lang: kb_core::Lang,
pub expected_doc_ids: Vec<kb_core::DocumentId>,
pub expected_chunk_ids: Vec<kb_core::ChunkId>,
pub lang: kebab_core::Lang,
pub expected_doc_ids: Vec<kebab_core::DocumentId>,
pub expected_chunk_ids: Vec<kebab_core::ChunkId>,
pub must_contain: Vec<String>,
pub forbidden: Vec<String>,
pub difficulty: Option<String>,
@@ -68,7 +68,7 @@ pub struct GoldenQuery {
pub struct EvalRunOpts {
pub suite: String, // "golden" default
pub mode: kb_core::SearchMode,
pub mode: kebab_core::SearchMode,
pub with_rag: bool,
pub k: usize,
pub temperature: Option<f32>,
@@ -86,9 +86,9 @@ pub struct EvalRun {
pub struct QueryResult {
pub query_id: String,
pub query: String,
pub mode: kb_core::SearchMode,
pub hits_top_k: Vec<kb_core::SearchHit>,
pub answer: Option<kb_core::Answer>,
pub mode: kebab_core::SearchMode,
pub hits_top_k: Vec<kebab_core::SearchHit>,
pub answer: Option<kebab_core::Answer>,
pub elapsed_ms: u32,
pub error: Option<String>,
}
@@ -103,10 +103,10 @@ pub fn run_eval(opts: &EvalRunOpts) -> anyhow::Result<EvalRun>;
- Parses YAML; required fields: `id`, `query`. Optional: everything else (defaults to empty / `None`).
- Validates uniqueness of `id` and that `expected_doc_ids` / `expected_chunk_ids` exist in DB; missing → return error listing the offenders.
- `run_eval`:
- Loads `fixtures/golden_queries.yaml` (path overridable via env `KB_EVAL_GOLDEN`).
- Loads `fixtures/golden_queries.yaml` (path overridable via env `KEBAB_EVAL_GOLDEN`).
- Generates `run_id = "run_" + ulid_lower()`.
- Captures `config_snapshot_json`: serialized `kb_config::Config` plus `chunker_version`, `embedding_model+version+dims`, `llm.model_id`, `prompt_template_version`, `score_gate`, `rrf_k`, `index_version`.
- For each query: call `kb_app::search(SearchQuery { mode: opts.mode, k: opts.k, .. })`. If `opts.with_rag`, also call `kb_app::ask(query, AskOpts { mode: opts.mode, k: opts.k, explain: true, temperature: opts.temperature, seed: opts.seed, .. })`.
- Captures `config_snapshot_json`: serialized `kebab_config::Config` plus `chunker_version`, `embedding_model+version+dims`, `llm.model_id`, `prompt_template_version`, `score_gate`, `rrf_k`, `index_version`.
- For each query: call `kebab_app::search(SearchQuery { mode: opts.mode, k: opts.k, .. })`. If `opts.with_rag`, also call `kebab_app::ask(query, AskOpts { mode: opts.mode, k: opts.k, explain: true, temperature: opts.temperature, seed: opts.seed, .. })`.
- Each `QueryResult` measured by elapsed wall-clock (ms).
- Errors are caught per-query (do not abort the run). Failed queries record `error: Some(msg)` and `hits_top_k = vec![]`.
- Determinism: with `temperature=0` and fixed `seed`, two consecutive runs produce byte-identical `per_query.jsonl` for non-RAG queries; RAG queries may differ in negligible token budget telemetry.
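
The per-query rules above (wall-clock timing, errors captured instead of aborting the run) could look roughly like the following. It is a sketch under stated assumptions: `QueryOutcome` is a trimmed-down, hypothetical stand-in for the spec's `QueryResult`, hits are plain strings, and the `search` closure replaces the real `kebab_app::search` call.

```rust
/// Hypothetical, simplified stand-in for the spec's `QueryResult`.
#[derive(Debug)]
struct QueryOutcome {
    query_id: String,
    hits_top_k: Vec<String>,
    elapsed_ms: u32,
    error: Option<String>,
}

/// Runs every `(id, query)` pair; a failing query is recorded, not fatal.
fn run_queries<F>(queries: &[(String, String)], mut search: F) -> Vec<QueryOutcome>
where
    F: FnMut(&str) -> Result<Vec<String>, String>,
{
    queries
        .iter()
        .map(|(id, query)| {
            let started = std::time::Instant::now();
            let result = search(query.as_str());
            // Saturate instead of wrapping if a query runs absurdly long.
            let elapsed_ms = started.elapsed().as_millis().min(u32::MAX as u128) as u32;
            match result {
                Ok(hits) => QueryOutcome {
                    query_id: id.clone(),
                    hits_top_k: hits,
                    elapsed_ms,
                    error: None,
                },
                Err(msg) => QueryOutcome {
                    query_id: id.clone(),
                    hits_top_k: Vec::new(),
                    elapsed_ms,
                    error: Some(msg),
                },
            }
        })
        .collect()
}
```

Because the closure's error is folded into the outcome, one bad golden query leaves an `error` entry with empty hits while the remaining queries still run.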
@@ -130,12 +130,12 @@ pub fn run_eval(opts: &EvalRunOpts) -> anyhow::Result<EvalRun>;
| determinism | re-running same suite + fixed seed → identical `per_query.jsonl` (lexical only) | tmp DB, fixed corpus |
| snapshot | `EvalRun` (with mock LM for `with_rag`) JSON stable | `fixtures/eval/run-1.json` |
All tests under `cargo test -p kb-eval runner`.
All tests under `cargo test -p kebab-eval runner`.
## Definition of Done
- [ ] `cargo check -p kb-eval` passes
- [ ] `cargo test -p kb-eval runner` passes
- [ ] `cargo check -p kebab-eval` passes
- [ ] `cargo test -p kebab-eval runner` passes
- [ ] `fixtures/golden_queries.yaml` template shipped (≥ 5 example entries)
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §5.7


@@ -1,12 +1,12 @@
---
phase: P5
component: kb-eval (metrics + compare)
component: kebab-eval (metrics + compare)
task_id: p5-2
title: "Metrics computation + compare report"
status: completed
depends_on: [p5-1]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5.7 eval_runs.aggregate_json, phase epic tasks/phase-5-evaluation.md]
---
@@ -14,7 +14,7 @@ contract_sections: [§5.7 eval_runs.aggregate_json, phase epic tasks/phase-5-eva
## Goal
Compute hit@k, MRR, recall@k_doc, citation_coverage, groundedness, empty_result_rate, refusal_correctness from stored `eval_query_results`. Write `aggregate_json` back into `eval_runs`. Provide `kb eval compare a b` that diffs two runs.
Compute hit@k, MRR, recall@k_doc, citation_coverage, groundedness, empty_result_rate, refusal_correctness from stored `eval_query_results`. Write `aggregate_json` back into `eval_runs`. Provide `kebab eval compare a b` that diffs two runs.
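
For orientation, the two retrieval metrics named first can be sketched with their textbook definitions. The spec's own formulas are not shown in this excerpt, so treat the following as an assumption about their shape: ids are plain strings and the function names are hypothetical.

```rust
/// hit@k: true when any expected id appears in the first `k` results.
fn hit_at_k(ranked: &[&str], expected: &[&str], k: usize) -> bool {
    ranked.iter().take(k).any(|id| expected.contains(id))
}

/// Reciprocal rank of the first expected id (1-based), or 0.0 if absent.
fn reciprocal_rank(ranked: &[&str], expected: &[&str]) -> f64 {
    ranked
        .iter()
        .position(|id| expected.contains(id))
        .map_or(0.0, |i| 1.0 / (i as f64 + 1.0))
}

/// MRR: mean of the reciprocal ranks over all `(ranked, expected)` pairs.
fn mean_reciprocal_rank(runs: &[(Vec<&str>, Vec<&str>)]) -> f64 {
    if runs.is_empty() {
        return 0.0;
    }
    let total: f64 = runs.iter().map(|(r, e)| reciprocal_rank(r, e)).sum();
    total / runs.len() as f64
}
```

With one query whose expected document ranks first and another whose expected document ranks second, the MRR is `(1 + 1/2) / 2 = 0.75`.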
## Why now / why this size
@@ -22,16 +22,16 @@ Metric formulas + comparison logic are pure computation. Splitting them from p5-
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-store-sqlite` (read eval rows, write `aggregate_json`)
- `kebab-core`
- `kebab-config`
- `kebab-store-sqlite` (read eval rows, write `aggregate_json`)
- `serde`, `serde_json`
- `tracing`
- `thiserror`
## Forbidden dependencies
- `kb-app`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-app`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
@@ -46,7 +46,7 @@ Metric formulas + comparison logic are pure computation. Splitting them from p5-
| output | type | downstream |
|--------|------|------------|
| `eval_runs.aggregate_json` updated | SQLite | history, CI checks |
| `CompareReport` | `kb_eval::CompareReport` | `kb-cli` printer |
| `CompareReport` | `kebab_eval::CompareReport` | `kebab-cli` printer |
| optional `runs_dir/<run_id>/report.md` | filesystem | human-readable summary |
## Public surface (signatures only — no new types)
@@ -127,15 +127,15 @@ pub fn render_report_md(report: &CompareReport) -> String;
| determinism | running `compute_aggregate` twice produces identical `AggregateMetrics` | inline |
| snapshot | `CompareReport` JSON for a fixed pair of runs stable | `fixtures/eval/compare-1.json` |
All tests under `cargo test -p kb-eval metrics`.
All tests under `cargo test -p kebab-eval metrics`.
## Definition of Done
- [ ] `cargo check -p kb-eval` passes
- [ ] `cargo test -p kb-eval metrics` passes
- [ ] `cargo check -p kebab-eval` passes
- [ ] `cargo test -p kebab-eval metrics` passes
- [ ] No imports outside Allowed dependencies
- [ ] `eval_runs.aggregate_json` always populated after `store_aggregate`
- [ ] `kb eval compare` CLI surface integrated via `kb-app` (call `compare_runs` + `render_report_md`)
- [ ] `kebab eval compare` CLI surface integrated via `kebab-app` (call `compare_runs` + `render_report_md`)
- [ ] PR links phase epic tasks/phase-5-evaluation.md
## Out of scope
@@ -172,11 +172,11 @@ same one this spec defines, the names / wiring just differ.
/ `compare_runs(a, b)` so integration tests can drive the pipeline
against a TempDir-backed `Config`. The no-arg forms wrap them with
`Config::load(None)`.
- **CLI surface lives on `kb-cli` directly, not via `kb-app`.** DoD
asks for `kb eval compare` to be reached "via kb-app", but `kb-app`
already depends on `kb-eval` (the P5-1 runner uses the App facade),
so routing the CLI through `kb-app` would form a cycle. `kb-cli` →
`kb-eval` is wired directly; `kb-app` is unchanged.
- **CLI surface lives on `kebab-cli` directly, not via `kebab-app`.** DoD
asks for `kebab eval compare` to be reached "via kebab-app", but `kebab-app`
already depends on `kebab-eval` (the P5-1 runner uses the App facade),
so routing the CLI through `kebab-app` would form a cycle. `kebab-cli` →
`kebab-eval` is wired directly; `kebab-app` is unchanged.
- **`AggregateMetrics` is `Serialize + Deserialize`.** The spec defines
only the field shape; we add `Deserialize` so the stored
`aggregate_json` can round-trip back into the type for follow-up
@@ -184,12 +184,12 @@ same one this spec defines, the names / wiring just differ.
- **`anyhow`** is used in `Result` returns since the rest of the
workspace already speaks anyhow; not in the spec's Allowed list but
matches every other crate.
- **`kb-eval` crate-level `kb-app` dep stays.** The crate already
depends on `kb-app` from P5-1 (the runner uses the `App` facade), so
- **`kebab-eval` crate-level `kebab-app` dep stays.** The crate already
depends on `kebab-app` from P5-1 (the runner uses the `App` facade), so
the Cargo.toml entry remains. The new modules (`metrics.rs`,
`compare.rs`) do not import `kb-app` themselves — they're behind the
`compare.rs`) do not import `kebab-app` themselves — they're behind the
same crate boundary as the runner, but the metric/compare *surface*
is `kb-app`-clean. Splitting the crate to avoid a transitive Cargo
is `kebab-app`-clean. Splitting the crate to avoid a transitive Cargo
edge would be churn for no behavior gain.
- **`citation_coverage` is intentionally weaker than the spec literal.**
Spec calls for "every citation resolves to a real chunk in the DB".


@@ -1,12 +1,12 @@
---
phase: P6
component: kb-parse-image (image extractor + EXIF)
component: kebab-parse-image (image extractor + EXIF)
task_id: p6-1
title: "Image Extractor producing single-block CanonicalDocument + EXIF metadata"
status: planned
depends_on: [p0-1, p1-6]
unblocks: [p6-2, p6-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 Block::ImageRef + ImageRefBlock, §3.7a OcrText/ModelCaption stubs, §9.1 image extraction policy, §9 versioning]
---
@@ -22,8 +22,8 @@ Establishes the image-as-document contract and decouples extraction (asset → I
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `image = "0.25"` (decoding for size + format detect)
- `kamadak-exif` for EXIF
- `serde`, `serde_json`
@@ -33,31 +33,31 @@ Establishes the image-as-document contract and decouples extraction (asset → I
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, OCR libs, LLM libs
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, OCR libs, LLM libs
## Inputs
| input | type | source |
|-------|------|--------|
| `RawAsset` | `kb_core::RawAsset` | from `kb-source-fs` |
| `RawAsset` | `kebab_core::RawAsset` | from `kebab-source-fs` |
| image bytes | `&[u8]` | filesystem |
| `parser_version` | `kb_core::ParserVersion` | constant in this crate (`"image-meta-v1"`) |
| `parser_version` | `kebab_core::ParserVersion` | constant in this crate (`"image-meta-v1"`) |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk` (image-region chunker) → `kb-store-sqlite` |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk` (image-region chunker) → `kebab-store-sqlite` |
## Public surface (signatures only — no new types)
```rust
pub struct ImageExtractor;
impl kb_core::Extractor for ImageExtractor {
fn supports(&self, m: &kb_core::MediaType) -> bool { matches!(m, kb_core::MediaType::Image(_)) }
fn parser_version(&self) -> kb_core::ParserVersion { kb_core::ParserVersion("image-meta-v1".into()) }
fn extract(&self, ctx: &kb_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kb_core::CanonicalDocument>;
impl kebab_core::Extractor for ImageExtractor {
fn supports(&self, m: &kebab_core::MediaType) -> bool { matches!(m, kebab_core::MediaType::Image(_)) }
fn parser_version(&self) -> kebab_core::ParserVersion { kebab_core::ParserVersion("image-meta-v1".into()) }
fn extract(&self, ctx: &kebab_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kebab_core::CanonicalDocument>;
}
```
@@ -69,7 +69,7 @@ impl kb_core::Extractor for ImageExtractor {
- `metadata.source_type = SourceType::Reference` (per design enum); `trust_level = TrustLevel::Primary`; `tags`/`aliases` empty.
- `metadata.user["exif"]` = JSON object with whitelisted EXIF tags (DateTimeOriginal, GPS lat/lon, Make, Model, Orientation, Software). Missing tags omitted.
- `metadata.user["dimensions"] = { "w": <u32>, "h": <u32>, "format": "<png|jpeg|...>" }`.
- `provenance` includes `Discovered`, `Parsed` events (no Normalized — ID assignment happens here directly per §3.4 stub from p1-4 logic, OR pipe through `kb-normalize` if available; this task's choice: emit a fully formed CanonicalDocument with deterministic IDs by calling `kb_core::id_for_doc` and `kb_core::id_for_block` directly).
- `provenance` includes `Discovered`, `Parsed` events (no Normalized — ID assignment happens here directly per §3.4 stub from p1-4 logic, OR pipe through `kebab-normalize` if available; this task's choice: emit a fully formed CanonicalDocument with deterministic IDs by calling `kebab_core::id_for_doc` and `kebab_core::id_for_block` directly).
- Failure modes:
- Truncated/corrupt image → still emits a CanonicalDocument with `dimensions = null`, EXIF empty, `Provenance` warning event with the decoder error message.
- Unsupported format → `anyhow::Error` (caller skips).
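
The EXIF whitelist rule ("missing tags omitted") amounts to a filter over whatever tag/value pairs the decoder returns. A minimal sketch, assuming tags arrive as string pairs and that GPS lat/lon map to the standard EXIF tag names `GPSLatitude` and `GPSLongitude`. `filter_exif` is a hypothetical helper and no EXIF library is involved.

```rust
use std::collections::BTreeMap;

/// Whitelisted tag names; the two GPS names are an assumed spelling.
const EXIF_WHITELIST: [&str; 7] = [
    "DateTimeOriginal",
    "GPSLatitude",
    "GPSLongitude",
    "Make",
    "Model",
    "Orientation",
    "Software",
];

/// Keeps only whitelisted tags. A `BTreeMap` gives a stable key order,
/// which helps keep the serialized JSON deterministic.
fn filter_exif(raw: &[(&str, &str)]) -> BTreeMap<String, String> {
    raw.iter()
        .filter(|(tag, _)| EXIF_WHITELIST.contains(tag))
        .map(|(tag, value)| (tag.to_string(), value.to_string()))
        .collect()
}
```

Tags outside the list are dropped, and tags absent from the input simply never appear in the map.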
@@ -77,7 +77,7 @@ impl kb_core::Extractor for ImageExtractor {
## Storage / wire effects
- None directly (the caller persists via `kb-store-sqlite`).
- None directly (the caller persists via `kebab-store-sqlite`).
## Test plan
@@ -90,12 +90,12 @@ impl kb_core::Extractor for ImageExtractor {
| determinism | identical bytes → identical `doc_id`, `block_id` across two runs | inline |
| snapshot | `CanonicalDocument` JSON stable for fixture | `fixtures/image/red-100x50.png` |
All tests under `cargo test -p kb-parse-image`.
All tests under `cargo test -p kebab-parse-image`.
## Definition of Done
- [ ] `cargo check -p kb-parse-image` passes
- [ ] `cargo test -p kb-parse-image` passes
- [ ] `cargo check -p kebab-parse-image` passes
- [ ] `cargo test -p kebab-parse-image` passes
- [ ] No OCR/caption/embedding code present
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.4, §9.1


@@ -1,12 +1,12 @@
---
phase: P6
component: kb-parse-image (OCR adapter)
component: kebab-parse-image (OCR adapter)
task_id: p6-2
title: "OcrEngine trait + Tesseract adapter (Apple Vision feature-gated)"
status: planned
depends_on: [p6-1]
unblocks: [p6-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 ImageRefBlock.ocr, §3.7a OcrText/OcrRegion, §9.1 OCR vs caption provenance]
---
@@ -22,9 +22,9 @@ Strict separation of OCR (observed text) from caption (model-generated). Confini
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-parse-image` (consumes its types)
- `kebab-core`
- `kebab-config`
- `kebab-parse-image` (consumes its types)
- `tesseract = "0.13"` (feature `tesseract`, default ON)
- For feature `apple-vision`: `std::process::Command` only (sidecar binary, not a Rust dep)
- `serde`, `serde_json`
@@ -34,21 +34,21 @@ Strict separation of OCR (observed text) from caption (model-generated). Confini
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| image bytes | `&[u8]` | from extractor |
| optional language hint | `kb_core::Lang` | metadata |
| `kb-config` OCR settings | engine name, languages | runtime |
| optional language hint | `kebab_core::Lang` | metadata |
| `kebab-config` OCR settings | engine name, languages | runtime |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `OcrText` | `kb_core::OcrText` | merged into `ImageRefBlock.ocr` |
| `OcrText` | `kebab_core::OcrText` | merged into `ImageRefBlock.ocr` |
## Public surface (signatures only — no new types)
@@ -56,11 +56,11 @@ Strict separation of OCR (observed text) from caption (model-generated). Confini
pub trait OcrEngine: Send + Sync {
fn engine_name(&self) -> &'static str;
fn engine_version(&self) -> String;
fn recognize(&self, image_bytes: &[u8], lang_hint: Option<&kb_core::Lang>) -> anyhow::Result<kb_core::OcrText>;
fn recognize(&self, image_bytes: &[u8], lang_hint: Option<&kebab_core::Lang>) -> anyhow::Result<kebab_core::OcrText>;
}
pub struct TesseractOcr { /* internal: lazy api handle */ }
impl TesseractOcr { pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>; }
impl TesseractOcr { pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>; }
impl OcrEngine for TesseractOcr { /* per trait */ }
#[cfg(feature = "apple-vision")]
@@ -71,8 +71,8 @@ impl OcrEngine for AppleVisionOcr { /* per trait */ }
pub fn apply_ocr(
engine: &dyn OcrEngine,
image_bytes: &[u8],
block: &mut kb_core::ImageRefBlock,
lang_hint: Option<&kb_core::Lang>,
block: &mut kebab_core::ImageRefBlock,
lang_hint: Option<&kebab_core::Lang>,
) -> anyhow::Result<()>;
```
@@ -85,7 +85,7 @@ pub fn apply_ocr(
- `joined` = `regions.iter().map(|r| r.text).join(" ")` (no smart layout reconstruction in v1).
- `engine = "tesseract"`, `engine_version = <tesseract version string>`. The `tesseract` crate (0.13+) does NOT expose a stable Rust `version()` accessor. Use one of: (a) call libtesseract's `TessVersion()` via the bundled FFI surface, OR (b) at adapter construction, shell-out `tesseract --version` once and cache the parsed `"5.3.4"`-style string. Both are deterministic for a fixed install. Pin the chosen approach in the implementation PR.
- Apple Vision sidecar (feature `apple-vision`):
- Spawn a small Swift binary `kb-vision-ocr` (path from `config.ocr.apple_vision_binary`) feeding the image via stdin and reading JSON `{ regions: [{x,y,w,h,text,confidence}, ...] }` from stdout.
- Spawn a small Swift binary `kebab-vision-ocr` (path from `config.ocr.apple_vision_binary`) feeding the image via stdin and reading JSON `{ regions: [{x,y,w,h,text,confidence}, ...] }` from stdout.
- Same threshold and `joined` rules as Tesseract. `engine = "apple-vision"`, `engine_version = sidecar's --version`.
- This subagent task does NOT write the Swift sidecar; it only wires the Rust side. Document the expected sidecar interface in `docs/spec/sidecar-vision.md` (separate doc spec stub, optional).
- `apply_ocr` calls `engine.recognize`, sets `block.ocr = Some(text)`, and appends a `Provenance::OcrApplied` event in the caller's CanonicalDocument (caller responsibility — this task exposes a helper).
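
Option (b) above and the `joined` rule can be sketched as follows. This is a minimal illustration, not the adapter's actual code: `parse_tesseract_version`, `probe_tesseract_version`, and `join_regions` are hypothetical helper names, and the assumption that the first line of `tesseract --version` reads `tesseract <version>` should be checked against the installed binary.

```rust
use std::process::Command;

/// Hypothetical helper: pull the version token out of `tesseract --version`
/// output. Assumes the first line looks like `tesseract 5.3.4`.
fn parse_tesseract_version(output: &str) -> Option<String> {
    let first_line = output.lines().next()?;
    let mut tokens = first_line.split_whitespace();
    if tokens.next()? != "tesseract" {
        return None;
    }
    // Some builds print a leading `v`; strip it so the cached string is uniform.
    let version = tokens.next()?.trim_start_matches('v');
    if version.chars().next()?.is_ascii_digit() {
        Some(version.to_string())
    } else {
        None
    }
}

/// Hypothetical helper: shell out once at adapter construction and cache the
/// result. Both stdout and stderr are checked because the stream used for the
/// version banner is an assumption here.
#[allow(dead_code)]
fn probe_tesseract_version() -> Option<String> {
    let out = Command::new("tesseract").arg("--version").output().ok()?;
    parse_tesseract_version(&String::from_utf8_lossy(&out.stdout))
        .or_else(|| parse_tesseract_version(&String::from_utf8_lossy(&out.stderr)))
}

/// `joined` per the v1 rule: region texts separated by a single space,
/// with no layout reconstruction.
fn join_regions(texts: &[&str]) -> String {
    texts.join(" ")
}
```

Caching the parsed string at construction keeps `engine_version` stable for the lifetime of the adapter, which is what the determinism test relies on.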
@@ -109,12 +109,12 @@ pub fn apply_ocr(
| determinism | two runs of recognize on same input → identical OcrText | fixture |
| `#[cfg(feature = "apple-vision")]` smoke | sidecar invocation captured (mock binary echoes fixed JSON) | inline mock |
All tests under `cargo test -p kb-parse-image ocr`. Tesseract install required on CI host.
All tests under `cargo test -p kebab-parse-image ocr`. Tesseract install required on CI host.
## Definition of Done
- [ ] `cargo check -p kb-parse-image --features tesseract` passes
- [ ] `cargo test -p kb-parse-image ocr` passes
- [ ] `cargo check -p kebab-parse-image --features tesseract` passes
- [ ] `cargo test -p kebab-parse-image ocr` passes
- [ ] `apple-vision` feature compiles on macOS and gracefully no-ops on Linux
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.4, §3.7a, §9.1
@@ -129,5 +129,5 @@ All tests under `cargo test -p kb-parse-image ocr`. Tesseract install required o
## Risks / notes
- Tesseract performance varies wildly with image quality; document `min_confidence` and default page-segmentation mode.
- Apple Vision sidecar requires code signing for distribution; for v1 dev builds, accept unsigned binary from `~/.local/bin/kb-vision-ocr`.
- Apple Vision sidecar requires code signing for distribution; for v1 dev builds, accept unsigned binary from `~/.local/bin/kebab-vision-ocr`.
- Large image downscale loses small-text recognition; expose `config.ocr.max_pixels` so power users can tune.

@@ -1,12 +1,12 @@
---
phase: P6
component: kb-parse-image (caption adapter)
component: kebab-parse-image (caption adapter)
task_id: p6-3
title: "ModelCaption adapter (LanguageModel-driven, feature-gated)"
status: planned
depends_on: [p6-1, p4-2]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 ImageRefBlock.caption, §3.7a ModelCaption, §9.1 caption (model-generated, low trust)]
---
@@ -22,10 +22,10 @@ Captioning closes the multimodal loop. Strict separation from OCR keeps trust le
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-parse-image`
- `kb-llm` (LanguageModel trait)
- `kebab-core`
- `kebab-config`
- `kebab-parse-image`
- `kebab-llm` (LanguageModel trait)
- `base64`
- `serde`, `serde_json`
- `image` (resize for prompt cost control)
@@ -34,7 +34,7 @@ Captioning closes the multimodal loop. Strict separation from OCR keeps trust le
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-rag`, `kb-llm-local` (only via trait), `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-rag`, `kebab-llm-local` (only via trait), `kebab-tui`, `kebab-desktop`
## Inputs
@@ -42,28 +42,28 @@ Captioning closes the multimodal loop. Strict separation from OCR keeps trust le
|-------|------|--------|
| image bytes | `&[u8]` | extractor |
| `dyn LanguageModel` (vision-capable) | runtime | injected |
| `kb-config.image.caption` | `{ enabled, max_pixels, prompt_template_version }` | runtime |
| `kebab-config.image.caption` | `{ enabled, max_pixels, prompt_template_version }` | runtime |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `ModelCaption` | `kb_core::ModelCaption` | merged into `ImageRefBlock.caption` |
| `ModelCaption` | `kebab_core::ModelCaption` | merged into `ImageRefBlock.caption` |
## Public surface (signatures only — no new types)
```rust
pub fn caption_image(
llm: &dyn kb_core::LanguageModel,
llm: &dyn kebab_core::LanguageModel,
image_bytes: &[u8],
cfg: &kb_config::Config,
) -> anyhow::Result<kb_core::ModelCaption>;
cfg: &kebab_config::Config,
) -> anyhow::Result<kebab_core::ModelCaption>;
pub fn apply_caption(
llm: &dyn kb_core::LanguageModel,
llm: &dyn kebab_core::LanguageModel,
image_bytes: &[u8],
block: &mut kb_core::ImageRefBlock,
cfg: &kb_config::Config,
block: &mut kebab_core::ImageRefBlock,
cfg: &kebab_config::Config,
) -> anyhow::Result<()>;
```
@@ -86,7 +86,7 @@ pub fn apply_caption(
## Storage / wire effects
- None directly. Caller persists via `kb-store-sqlite`.
- None directly. Caller persists via `kebab-store-sqlite`.
## Test plan
@@ -99,12 +99,12 @@ pub fn apply_caption(
| unit | downscale honors `max_pixels` (resulting bytes < some threshold) | fixture large image |
| determinism | identical input + temperature=0 + seed=0 → identical caption (mock) | inline |
All tests under `cargo test -p kb-parse-image caption` with mock LM only.
All tests under `cargo test -p kebab-parse-image caption` with mock LM only.
## Definition of Done
- [ ] `cargo check -p kb-parse-image --features caption` passes
- [ ] `cargo test -p kb-parse-image caption` passes
- [ ] `cargo check -p kebab-parse-image --features caption` passes
- [ ] `cargo test -p kebab-parse-image caption` passes
- [ ] No imports outside Allowed dependencies
- [ ] Feature default OFF; only on when user opts in via config
- [ ] PR links design §3.4 ImageRefBlock.caption, §9.1

@@ -1,12 +1,12 @@
---
phase: P7
component: kb-parse-pdf (text extractor)
component: kebab-parse-pdf (text extractor)
task_id: p7-1
title: "Text PDF extractor → CanonicalDocument with page-level blocks"
status: planned
depends_on: [p0-1, p1-6]
unblocks: [p7-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 SourceSpan::Page, §3.4 Block::Paragraph, §9.2 PDF text extraction, §9 versioning]
---
@@ -22,8 +22,8 @@ Strict scope: page text + page numbers. Layout reconstruction (multi-column merg
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `pdf-extract = "0.7"` (or current stable)
- `lopdf = "0.32"` for page metadata (count, optional title from /Info)
- `serde`, `serde_json`
@@ -33,30 +33,30 @@ Strict scope: page text + page numbers. Layout reconstruction (multi-column merg
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, OCR libraries (OCR fallback is a separate task, not this one)
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, OCR libraries (OCR fallback is a separate task, not this one)
## Inputs
| input | type | source |
|-------|------|--------|
| `RawAsset` | `kb_core::RawAsset` | `kb-source-fs` |
| `RawAsset` | `kebab_core::RawAsset` | `kebab-source-fs` |
| PDF bytes | `&[u8]` | filesystem |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk` (`pdf-page-v1` chunker in p7-2) |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk` (`pdf-page-v1` chunker in p7-2) |
## Public surface (signatures only — no new types)
```rust
pub struct PdfTextExtractor;
impl kb_core::Extractor for PdfTextExtractor {
fn supports(&self, m: &kb_core::MediaType) -> bool { matches!(m, kb_core::MediaType::Pdf) }
fn parser_version(&self) -> kb_core::ParserVersion { kb_core::ParserVersion("pdf-text-v1".into()) }
fn extract(&self, ctx: &kb_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kb_core::CanonicalDocument>;
impl kebab_core::Extractor for PdfTextExtractor {
fn supports(&self, m: &kebab_core::MediaType) -> bool { matches!(m, kebab_core::MediaType::Pdf) }
fn parser_version(&self) -> kebab_core::ParserVersion { kebab_core::ParserVersion("pdf-text-v1".into()) }
fn extract(&self, ctx: &kebab_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kebab_core::CanonicalDocument>;
}
```
@@ -100,12 +100,12 @@ impl kb_core::Extractor for PdfTextExtractor {
| determinism | identical bytes → identical CanonicalDocument JSON across two runs | inline |
| snapshot | CanonicalDocument JSON for fixture stable | `fixtures/pdf/three-page-en.pdf` |
All tests under `cargo test -p kb-parse-pdf`.
All tests under `cargo test -p kebab-parse-pdf`.
## Definition of Done
- [ ] `cargo check -p kb-parse-pdf` passes
- [ ] `cargo test -p kb-parse-pdf` passes
- [ ] `cargo check -p kebab-parse-pdf` passes
- [ ] `cargo test -p kebab-parse-pdf` passes
- [ ] No OCR / LLM code present
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.4 SourceSpan::Page, §9.2

@@ -1,12 +1,12 @@
---
phase: P7
component: kb-chunk (pdf-page-v1)
component: kebab-chunk (pdf-page-v1)
task_id: p7-2
title: "PDF page-aware chunker (pdf-page-v1)"
status: planned
depends_on: [p7-1]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.5 Chunk, §4.2 chunk_id recipe, §0 Q3 citation, §9 versioning]
---
@@ -22,8 +22,8 @@ Per-medium chunkers must stay tiny and obvious. Page-aware logic is small but it
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`, `serde_json`
- `blake3` (policy_hash)
- `serde-json-canonicalizer`
@@ -31,30 +31,30 @@ Per-medium chunkers must stay tiny and obvious. Page-aware logic is small but it
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-parse-pdf` (consumes `CanonicalDocument` via `kb-core` only), `kb-normalize`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-parse-pdf` (consumes `CanonicalDocument` via `kebab-core` only), `kebab-normalize`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `CanonicalDocument` (produced by `pdf-text-v1`) | `kb_core::CanonicalDocument` | p7-1 |
| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` |
| `CanonicalDocument` (produced by `pdf-text-v1`) | `kebab_core::CanonicalDocument` | p7-1 |
| `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `Vec<Chunk>` | `kb_core::Chunk` | `kb-store-sqlite`, `kb-embed*` |
| `Vec<Chunk>` | `kebab_core::Chunk` | `kebab-store-sqlite`, `kebab-embed*` |
## Public surface (signatures only — no new types)
```rust
pub struct PdfPageV1Chunker;
impl kb_core::Chunker for PdfPageV1Chunker {
fn chunker_version(&self) -> kb_core::ChunkerVersion { kb_core::ChunkerVersion("pdf-page-v1".into()) }
fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String;
fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result<Vec<kb_core::Chunk>>;
impl kebab_core::Chunker for PdfPageV1Chunker {
fn chunker_version(&self) -> kebab_core::ChunkerVersion { kebab_core::ChunkerVersion("pdf-page-v1".into()) }
fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String;
fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result<Vec<kebab_core::Chunk>>;
}
```
@@ -62,7 +62,7 @@ impl kb_core::Chunker for PdfPageV1Chunker {
## Behavior contract
- Only operates on documents whose blocks all carry `SourceSpan::Page` (i.e., from `kb-parse-pdf`). Other documents → return `anyhow::Error("PdfPageV1Chunker only handles PDF docs")`.
- Only operates on documents whose blocks all carry `SourceSpan::Page` (i.e., from `kebab-parse-pdf`). Other documents → return `anyhow::Error("PdfPageV1Chunker only handles PDF docs")`.
- For each page block (1 block per page after p7-1):
- If `text.len()` (byte estimate) ≤ `policy.target_tokens * 4` (proxy for tokens) → emit one chunk for the entire page.
- Else → split by paragraphs (split text on `\n\n` or sentence-ending punctuation followed by whitespace) and group adjacent paragraphs until the running byte total approaches `policy.target_tokens * 4`. Apply `policy.overlap_tokens * 4` bytes of trailing overlap into the next chunk's prefix.
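
The size check and paragraph grouping described above might look roughly like this. It is a simplified sketch under stated assumptions: `group_paragraphs` is a hypothetical name, paragraphs are split on `\n\n` only (no sentence-boundary splitting), and the trailing overlap from `policy.overlap_tokens` is deliberately left out.

```rust
/// Hypothetical sketch of the page-splitting rule: a page that fits in
/// `target_tokens * 4` bytes becomes one chunk; otherwise adjacent paragraphs
/// are grouped greedily until the byte budget would be exceeded.
fn group_paragraphs(page_text: &str, target_tokens: usize) -> Vec<String> {
    let budget = target_tokens * 4; // bytes as a proxy for tokens
    if page_text.len() <= budget {
        return vec![page_text.to_string()];
    }

    let mut chunks = Vec::new();
    let mut current = String::new();
    let paragraphs = page_text
        .split("\n\n")
        .map(str::trim)
        .filter(|p| !p.is_empty());

    for para in paragraphs {
        // +2 accounts for the "\n\n" separator re-inserted between paragraphs.
        if !current.is_empty() && current.len() + 2 + para.len() > budget {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push_str("\n\n");
        }
        current.push_str(para);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

A single paragraph larger than the budget is emitted as its own oversized chunk in this sketch; the real chunker would need the sentence-level split to handle that case.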
@@ -91,12 +91,12 @@ impl kb_core::Chunker for PdfPageV1Chunker {
| determinism | same input → same chunk_ids twice | inline |
| snapshot | `Vec<Chunk>` JSON for fixture stable | `fixtures/pdf/three-page-en.pdf` (chunked) |
All tests under `cargo test -p kb-chunk pdf`.
All tests under `cargo test -p kebab-chunk pdf`.
## Definition of Done
- [ ] `cargo check -p kb-chunk` passes (existing `md-heading-v1` continues to pass)
- [ ] `cargo test -p kb-chunk pdf` passes
- [ ] `cargo check -p kebab-chunk` passes (existing `md-heading-v1` continues to pass)
- [ ] `cargo test -p kebab-chunk pdf` passes
- [ ] Snapshot stable across two runs
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.5, §0 Q3, §9

@@ -1,12 +1,12 @@
---
phase: P8
component: kb-parse-audio (whisper adapter)
component: kebab-parse-audio (whisper adapter)
task_id: p8-1
title: "Audio Extractor + Transcriber trait + whisper.cpp adapter"
status: planned
depends_on: [p0-1, p1-6]
unblocks: [p8-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 Block::AudioRef + AudioRefBlock, §3.7a Transcript + TranscriptSegment, §9.3 audio policy, §9 versioning]
---
@@ -22,8 +22,8 @@ Audio stays a single, replaceable engine boundary (Transcriber trait). Extractor
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `whisper-rs = "0.13"` (or current stable)
- `symphonia = { version = "0.5", features = ["all"] }` — decode `.m4a/.mp3/.wav/.flac/.ogg` to interleaved f32 PCM at the source's native sample rate / channel layout. Symphonia does NOT resample; that is rubato's job.
- `rubato = "0.15"` — sample-rate conversion to 16 kHz mono f32 (the input shape whisper.cpp expects). Use `rubato::FftFixedIn::new(input_sample_rate, 16_000, frames_per_chunk, sub_chunks, 1 /* channels after downmix */)` for fixed-input streaming; pre-mix multi-channel to mono via simple averaging before the resampler.
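
The pre-mix step mentioned for `rubato` (multi-channel to mono by simple averaging) can be illustrated in isolation. This is a sketch only: `downmix_to_mono` is a hypothetical helper, it assumes interleaved samples as decoded by symphonia, and it does not perform the resampling itself.

```rust
/// Hypothetical helper: average each interleaved frame down to one mono sample.
/// Any trailing partial frame is dropped.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}
```

The mono buffer produced here is what would then be fed to the resampler configured with one channel.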
@@ -34,21 +34,21 @@ Audio stays a single, replaceable engine boundary (Transcriber trait). Extractor
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-parse-pdf`, `kb-parse-image`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-parse-pdf`, `kebab-parse-image`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `RawAsset` | `kb_core::RawAsset` | `kb-source-fs` |
| `RawAsset` | `kebab_core::RawAsset` | `kebab-source-fs` |
| audio bytes | `&[u8]` | filesystem |
| `kb-config.audio` | `{ model_path, language, chunk_seconds, n_threads, gpu }` | runtime |
| `kebab-config.audio` | `{ model_path, language, chunk_seconds, n_threads, gpu }` | runtime |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk` (`audio-segment-v1` chunker in p8-2) |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk` (`audio-segment-v1` chunker in p8-2) |
## Public surface (signatures only — no new types)
@@ -56,19 +56,19 @@ Audio stays a single, replaceable engine boundary (Transcriber trait). Extractor
pub trait Transcriber: Send + Sync {
fn engine(&self) -> &'static str;
fn engine_version(&self) -> String;
fn transcribe(&self, pcm_f32_16khz: &[f32], language_hint: Option<&kb_core::Lang>) -> anyhow::Result<kb_core::Transcript>;
fn transcribe(&self, pcm_f32_16khz: &[f32], language_hint: Option<&kebab_core::Lang>) -> anyhow::Result<kebab_core::Transcript>;
}
pub struct WhisperCppTranscriber { /* internal: whisper_rs::WhisperContext */ }
impl WhisperCppTranscriber { pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>; }
impl WhisperCppTranscriber { pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>; }
impl Transcriber for WhisperCppTranscriber { /* per trait */ }
pub struct AudioExtractor { transcriber: std::sync::Arc<dyn Transcriber> }
impl AudioExtractor { pub fn new(transcriber: std::sync::Arc<dyn Transcriber>) -> Self; }
impl kb_core::Extractor for AudioExtractor {
fn supports(&self, m: &kb_core::MediaType) -> bool { matches!(m, kb_core::MediaType::Audio(_)) }
fn parser_version(&self) -> kb_core::ParserVersion { kb_core::ParserVersion("audio-whisper-v1".into()) }
fn extract(&self, ctx: &kb_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kb_core::CanonicalDocument>;
impl kebab_core::Extractor for AudioExtractor {
fn supports(&self, m: &kebab_core::MediaType) -> bool { matches!(m, kebab_core::MediaType::Audio(_)) }
fn parser_version(&self) -> kebab_core::ParserVersion { kebab_core::ParserVersion("audio-whisper-v1".into()) }
fn extract(&self, ctx: &kebab_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kebab_core::CanonicalDocument>;
}
```
@@ -87,7 +87,7 @@ impl kb_core::Extractor for AudioExtractor {
- `provenance` events: `Discovered`, `Parsed`, `Transcribed`.
- `block_id` per design §4.2 with `block_kind = "audio_ref"`, `heading_path = []`, `ordinal = 0`, `source_span = SourceSpan::Time { start_ms: 0, end_ms: duration_ms }`.
- `WhisperCppTranscriber`:
- Loads model from `config.audio.model_path` (e.g., `~/.local/share/kb/models/whisper/ggml-large-v3.bin`).
- Loads model from `config.audio.model_path` (e.g., `~/.local/share/kebab/models/whisper/ggml-large-v3.bin`).
- Runs with `WhisperFullParams::new(SamplingStrategy::Greedy { best_of: 1 })` — deterministic.
- Streams in chunks of `config.audio.chunk_seconds` (default 30) to bound memory; aggregates segments.
- `Transcript.segments` populated with `start_ms`, `end_ms`, `text`, `confidence: Some(p)` from whisper's per-token probabilities (averaged), `speaker: None` (diarization is P+).
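
The chunked streaming and averaged confidence described above can be sketched without touching the whisper bindings. The names `pcm_windows` and `mean_confidence` are hypothetical, and the sketch assumes the PCM is already 16 kHz mono; each window's `offset_ms` is what would be added to segment timestamps when aggregating.

```rust
const SAMPLE_RATE_HZ: usize = 16_000;

/// Hypothetical helper: split 16 kHz mono PCM into windows of `chunk_seconds`,
/// pairing each window with its start offset in milliseconds.
fn pcm_windows(pcm: &[f32], chunk_seconds: usize) -> Vec<(u64, &[f32])> {
    let window_len = chunk_seconds.max(1) * SAMPLE_RATE_HZ;
    pcm.chunks(window_len)
        .enumerate()
        .map(|(i, window)| ((i * window_len * 1000 / SAMPLE_RATE_HZ) as u64, window))
        .collect()
}

/// Hypothetical helper: average per-token probabilities into one segment
/// confidence; `None` when the segment has no tokens.
fn mean_confidence(token_probs: &[f32]) -> Option<f32> {
    if token_probs.is_empty() {
        None
    } else {
        Some(token_probs.iter().sum::<f32>() / token_probs.len() as f32)
    }
}
```

Windows are cut at fixed sample counts, so a word can straddle a boundary; whether that needs handling is outside what this sketch covers.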
@@ -115,12 +115,12 @@ impl kb_core::Extractor for AudioExtractor {
| `#[ignore]` integration | 30-second Korean audio → segments_count > 1, language = "ko" | requires `large-v3` model |
| snapshot | CanonicalDocument JSON stable for short fixture | `fixtures/audio/hello.wav` |
All tests under `cargo test -p kb-parse-audio`. Mark slow/large-model tests `#[ignore]`.
All tests under `cargo test -p kebab-parse-audio`. Mark slow/large-model tests `#[ignore]`.
## Definition of Done
- [ ] `cargo check -p kb-parse-audio` passes
- [ ] `cargo test -p kb-parse-audio` passes (excluding `#[ignore]`)
- [ ] `cargo check -p kebab-parse-audio` passes
- [ ] `cargo test -p kebab-parse-audio` passes (excluding `#[ignore]`)
- [ ] No imports outside Allowed dependencies (resampler crate may be added — record in PR)
- [ ] First-run model download path documented (NOT performed by code; user responsibility)
- [ ] PR links design §3.4, §3.7a, §9.3

@@ -1,12 +1,12 @@
---
phase: P8
component: kb-chunk (audio-segment-v1)
component: kebab-chunk (audio-segment-v1)
task_id: p8-2
title: "Audio segment chunker (audio-segment-v1)"
status: planned
depends_on: [p8-1]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.5 Chunk, §3.4 SourceSpan::Time, §4.2 chunk_id recipe, §0 Q3 citation, §9 versioning]
---
@@ -22,8 +22,8 @@ Per-medium chunker. Tiny but versioned — `chunk_id` depends on `chunker_versio
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`, `serde_json`
- `blake3` (policy_hash)
- `serde-json-canonicalizer`
@@ -31,30 +31,30 @@ Per-medium chunker. Tiny but versioned — `chunk_id` depends on `chunker_versio
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-parse-pdf`, `kb-parse-image`, `kb-parse-audio` (consumes via `kb-core` only), `kb-normalize`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-parse-pdf`, `kebab-parse-image`, `kebab-parse-audio` (consumes via `kebab-core` only), `kebab-normalize`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `CanonicalDocument` containing one `AudioRefBlock` with `Transcript` | `kb_core::CanonicalDocument` | p8-1 |
| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` |
| `CanonicalDocument` containing one `AudioRefBlock` with `Transcript` | `kebab_core::CanonicalDocument` | p8-1 |
| `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` |
## Outputs
| output | type | downstream |
|--------|------|------------|
| `Vec<Chunk>` | `kb_core::Chunk` | `kb-store-sqlite`, embedders |
| `Vec<Chunk>` | `kebab_core::Chunk` | `kebab-store-sqlite`, embedders |
## Public surface (signatures only — no new types)
```rust
pub struct AudioSegmentV1Chunker;
impl kb_core::Chunker for AudioSegmentV1Chunker {
fn chunker_version(&self) -> kb_core::ChunkerVersion { kb_core::ChunkerVersion("audio-segment-v1".into()) }
fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String;
fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result<Vec<kb_core::Chunk>>;
impl kebab_core::Chunker for AudioSegmentV1Chunker {
fn chunker_version(&self) -> kebab_core::ChunkerVersion { kebab_core::ChunkerVersion("audio-segment-v1".into()) }
fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String;
fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result<Vec<kebab_core::Chunk>>;
}
```
@@ -92,12 +92,12 @@ impl kb_core::Chunker for AudioSegmentV1Chunker {
| determinism | same input → same chunk_ids twice | inline |
| snapshot | `Vec<Chunk>` JSON for fixture transcript stable | `fixtures/audio/transcript-1.json` (constructed) |
All tests under `cargo test -p kb-chunk audio`.
All tests under `cargo test -p kebab-chunk audio`.
## Definition of Done
- [ ] `cargo check -p kb-chunk` passes (md-heading-v1 + pdf-page-v1 + audio-segment-v1 all coexist)
- [ ] `cargo test -p kb-chunk audio` passes
- [ ] `cargo check -p kebab-chunk` passes (md-heading-v1 + pdf-page-v1 + audio-segment-v1 all coexist)
- [ ] `cargo test -p kebab-chunk audio` passes
- [ ] Snapshot stable across two runs
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.5, §3.4 SourceSpan::Time, §4.2

@@ -1,12 +1,12 @@
---
phase: P9
component: kb-tui (library view)
component: kebab-tui (library view)
task_id: p9-1
title: "Ratatui library list view + tag filter"
status: planned
depends_on: [p1-6]
unblocks: [p9-2, p9-3, p9-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [report §16.2 TUI (also tasks/phase-9-ui.md epic), design §3.7 SearchHit, design §1 UX scenes for shared key bindings]
---
@@ -14,7 +14,7 @@ contract_sections: [report §16.2 TUI (also tasks/phase-9-ui.md epic), design §
## Goal
Stand up a Ratatui app skeleton with a "Library" pane: list documents, filter by tag/lang, navigate. Establishes the global app loop, key dispatch, and `kb-app` integration point that the search/ask/inspect panes (p9-2..p9-4) extend.
Stand up a Ratatui app skeleton with a "Library" pane: list documents, filter by tag/lang, navigate. Establishes the global app loop, key dispatch, and `kebab-app` integration point that the search/ask/inspect panes (p9-2..p9-4) extend.
## Why now / why this size
@@ -22,9 +22,9 @@ Library is the cheapest screen and the natural anchor for the TUI shell. Subsequ
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-app` (facade — the only crate this binary touches besides `kb-core`/`kb-config`)
- `kebab-core`
- `kebab-config`
- `kebab-app` (facade — the only crate this binary touches besides `kebab-core`/`kebab-config`)
- `ratatui = "0.28"`
- `crossterm`
- `tracing`
@@ -32,15 +32,15 @@ Library is the cheapest screen and the natural anchor for the TUI shell. Subsequ
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (UI must go through `kb-app` only — this is the design §8 boundary)
- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (UI must go through `kebab-app` only — this is the design §8 boundary)
## Inputs
| input | type | source |
|-------|------|--------|
| `kb-app::list_docs(filter)` | facade call | runtime |
| `kebab-app::list_docs(filter)` | facade call | runtime |
| keyboard events | `crossterm` | terminal |
| `kb-config::Config` | runtime | env / file |
| `kebab-config::Config` | runtime | env / file |
## Outputs
@@ -57,7 +57,7 @@ Library is the cheapest screen and the natural anchor for the TUI shell. Subsequ
// state in WITHOUT modifying the App struct definition. This avoids merge
// conflicts when p9-2/3/4 land in parallel; only p9-1 ever changes `App`.
pub struct App {
pub config: kb_config::Config,
pub config: kebab_config::Config,
pub focus: Pane,
pub library: LibraryState, // owned by p9-1
pub search: Option<SearchState>, // populated by p9-2 (None until that crate links in)
@@ -73,7 +73,7 @@ pub struct AskState; // body filled by p9-3
pub struct InspectState; // body filled by p9-4
impl App {
pub fn new(config: kb_config::Config) -> anyhow::Result<Self>;
pub fn new(config: kebab_config::Config) -> anyhow::Result<Self>;
pub fn run(&mut self) -> anyhow::Result<()>; // blocking loop until quit
}
@@ -102,12 +102,12 @@ pub enum KeyOutcome { Continue, Quit, SwitchPane(Pane), Refresh }
- `Enter` → switch to Inspect pane (p9-4) on selected doc
- `q` or `Esc` → quit
- All facade calls run on the main thread (no async). For long calls, render a "loading…" state and call from a worker thread; bridge via `mpsc::channel` (this task may keep things synchronous and accept brief UI hangs for v1).
- Logging: `tracing` initialized to a file under `~/.local/state/kb/logs/`; never to stdout/stderr (so the TUI is not corrupted).
- Logging: `tracing` initialized to a file under `~/.local/state/kebab/logs/`; never to stdout/stderr (so the TUI is not corrupted).
- Error rendering: a popup overlay shows `error: {msg}\nhint: {hint}` from `anyhow::Error` chain; press any key to dismiss.
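
The worker-thread bridge mentioned above (render "loading…" while a facade call runs elsewhere) could be sketched with the standard library alone. Everything named here (`LoadState`, `spawn_load`, `poll_load`) is hypothetical and not part of the `App` surface; the job closure stands in for a call such as `list_docs`.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical state the render loop would inspect on each frame.
enum LoadState<T> {
    Loading,
    Ready(T),
    Failed(String),
}

/// Run `job` on a worker thread and hand back the receiving end of the channel.
fn spawn_load<T, F>(job: F) -> mpsc::Receiver<Result<T, String>>
where
    T: Send + 'static,
    F: FnOnce() -> Result<T, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may be gone if the user quit; ignore the send error.
        let _ = tx.send(job());
    });
    rx
}

/// Non-blocking poll, suitable for calling once per event-loop tick.
fn poll_load<T>(rx: &mpsc::Receiver<Result<T, String>>) -> LoadState<T> {
    match rx.try_recv() {
        Ok(Ok(value)) => LoadState::Ready(value),
        Ok(Err(msg)) => LoadState::Failed(msg),
        Err(mpsc::TryRecvError::Empty) => LoadState::Loading,
        Err(mpsc::TryRecvError::Disconnected) => {
            LoadState::Failed("worker exited without a result".to_string())
        }
    }
}
```

Because `poll_load` never blocks, the key-dispatch loop stays responsive; the synchronous v1 path described above remains the simpler option.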
## Storage / wire effects
- Reads: `kb-app::list_docs` only.
- Reads: `kebab-app::list_docs` only.
- Writes: none.
## Test plan
@@ -118,16 +118,16 @@ pub enum KeyOutcome { Continue, Quit, SwitchPane(Pane), Refresh }
| unit | filter `f` opens edit overlay; `Enter` triggers refresh | inline |
| snapshot | rendered library with 3 docs + filter open produces stable frame buffer (use `ratatui::backend::TestBackend`) | inline |
| unit | error popup renders without panic on injected `anyhow::Error` | inline |
| integration | mocked `kb-app::list_docs` returning N docs renders all rows | inline |
| integration | mocked `kebab-app::list_docs` returning N docs renders all rows | inline |
All tests under `cargo test -p kb-tui library`.
All tests under `cargo test -p kebab-tui library`.
## Definition of Done
- [ ] `cargo check -p kb-tui` passes
- [ ] `cargo test -p kb-tui library` passes
- [ ] No imports outside `kb-core`, `kb-config`, `kb-app`
- [ ] `kb tui` (or `kb` if TUI is the default) launches and shows Library on a real terminal (manual smoke)
- [ ] `cargo check -p kebab-tui` passes
- [ ] `cargo test -p kebab-tui library` passes
- [ ] No imports outside `kebab-core`, `kebab-config`, `kebab-app`
- [ ] `kebab tui` (or `kebab` if TUI is the default) launches and shows Library on a real terminal (manual smoke)
- [ ] PR links design §8 module boundary, report §16.2 (TUI epic)
## Out of scope

@@ -1,12 +1,12 @@
---
phase: P9
component: kb-tui (search pane)
component: kebab-tui (search pane)
task_id: p9-2
title: "TUI Search pane: input + result list + preview + editor jump"
status: planned
depends_on: [p2-2, p3-4, p9-1]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§1.5/1.6 search output, §3.7 SearchHit, §0 Q3 citation]
---
@@ -14,7 +14,7 @@ contract_sections: [§1.5/1.6 search output, §3.7 SearchHit, §0 Q3 citation]
## Goal
Add a Search pane to the TUI that drives `kb-app::search`, renders dense results (rank+score / path#frag / heading / snippet), and supports `g` (editor jump to citation) for the selected hit.
Add a Search pane to the TUI that drives `kebab-app::search`, renders dense results (rank+score / path#frag / heading / snippet), and supports `g` (editor jump to citation) for the selected hit.
## Why now / why this size
@@ -22,25 +22,25 @@ Search is the most-used surface. Confining it to one pane leverages the App skel
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-app`
- `kb-tui` (extends p9-1)
- `kebab-core`
- `kebab-config`
- `kebab-app`
- `kebab-tui` (extends p9-1)
- `ratatui`, `crossterm`
- `tracing`
- `thiserror`
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `kb-app::search(query)` | facade | runtime |
| `kebab-app::search(query)` | facade | runtime |
| keyboard events | `crossterm` | terminal |
| selected hit's citation | `kb_core::Citation` | App state |
| selected hit's citation | `kebab_core::Citation` | App state |
## Outputs
@@ -54,16 +54,16 @@ Search is the most-used surface. Confining it to one pane leverages the App skel
```rust
pub fn render_search<B: ratatui::backend::Backend>(f: &mut ratatui::Frame, area: ratatui::layout::Rect, state: &App);
pub fn handle_key_search(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome;
pub fn jump_to_citation(citation: &kb_core::Citation, editor_env: &str /* $EDITOR */) -> anyhow::Result<()>;
pub fn jump_to_citation(citation: &kebab_core::Citation, editor_env: &str /* $EDITOR */) -> anyhow::Result<()>;
```
This task fills the body of `kb_tui::SearchState` (forward-declared in p9-1). The `App` struct itself is NOT edited — only `SearchState` gets fields:
This task fills the body of `kebab_tui::SearchState` (forward-declared in p9-1). The `App` struct itself is NOT edited — only `SearchState` gets fields:
```rust
pub struct SearchState {
pub input: String,
pub mode: kb_core::SearchMode,
pub hits: Vec<kb_core::SearchHit>,
pub mode: kebab_core::SearchMode,
pub hits: Vec<kebab_core::SearchHit>,
pub selected_hit: usize,
pub last_query_at: Option<time::OffsetDateTime>, // debounce timer
}
@@ -73,7 +73,7 @@ The Library pane's keypress handler (in p9-1) sets `app.search = Some(SearchStat
## Behavior contract
- Layout: top input bar (search query + mode badge `[hybrid|lexical|vector]`), middle result list (one hit per 4 lines per design §1.5 dense format), bottom preview pane (full chunk text fetched lazily via `kb-app::inspect_chunk`).
- Layout: top input bar (search query + mode badge `[hybrid|lexical|vector]`), middle result list (one hit per 4 lines per design §1.5 dense format), bottom preview pane (full chunk text fetched lazily via `kebab-app::inspect_chunk`).
- Key bindings (Search pane):
- typing → updates `search_input`; debounced (200 ms) re-search
- `Tab` → cycles `search_mode` Lexical → Vector → Hybrid → Lexical
@@ -104,14 +104,14 @@ The Library pane's keypress handler (in p9-1) sets `app.search = Some(SearchStat
| unit | `j` / `k` move selection within bounds | inline |
| unit | `jump_to_citation` for `Line` builds `+<line> <path>` command (assert via mocked Command runner) | inline |
| snapshot | rendered Search pane with 3 hits + preview stable | TestBackend |
| integration | mocked `kb-app::search` returning fixture hits drives render | inline |
| integration | mocked `kebab-app::search` returning fixture hits drives render | inline |
All tests under `cargo test -p kb-tui search`.
All tests under `cargo test -p kebab-tui search`.
## Definition of Done
- [ ] `cargo check -p kb-tui` passes
- [ ] `cargo test -p kb-tui search` passes
- [ ] `cargo check -p kebab-tui` passes
- [ ] `cargo test -p kebab-tui search` passes
- [ ] `g` keybinding launches `$EDITOR` with correct `+<line>` argument (manual smoke against vim)
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §1.5/1.6, §3.7
@@ -125,5 +125,5 @@ All tests under `cargo test -p kb-tui search`.
## Risks / notes
- Suspending and restoring crossterm raw mode around the editor spawn is finicky; code defensively (RAII guard).
- Different editors take different jump syntaxes. Provide an env override `KB_EDITOR_JUMP_FORMAT="vim"` for users on exotic editors.
- Different editors take different jump syntaxes. Provide an env override `KEBAB_EDITOR_JUMP_FORMAT="vim"` for users on exotic editors.
- Long snippet text wrap: clamp to viewport width and ellipsize per design §1.5 (`…` already in dense template).
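
The `+<line> <path>` jump described in the test plan can be sketched as a pure command builder that is easy to assert on without spawning an editor. This is a minimal illustration, not the task's implementation: `build_jump_command` is a hypothetical helper name, and it assumes the vim-style `+<line>` syntax that the spec uses as its default.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper (not part of the spec): builds, but does not spawn,
/// the editor invocation for a line citation using the vim-style `+<line>` form.
pub fn build_jump_command(editor: &str, path: &Path, line: u32) -> Command {
    let mut cmd = Command::new(editor);
    // First argument is `+<line>`, second is the file path.
    cmd.arg(format!("+{line}")).arg(path);
    cmd
}
```

Keeping the builder separate from the spawn would let the raw-mode RAII guard wrap only the `status()` call, while unit tests inspect the arguments through `Command::get_args`.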


@@ -1,12 +1,12 @@
---
phase: P9
component: kb-tui (ask pane)
component: kebab-tui (ask pane)
task_id: p9-3
title: "TUI Ask pane: streaming answer + citation links + --explain toggle"
status: planned
depends_on: [p4-3, p9-1]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§1.11.4 ask scenes, §2.3 Answer wire, §3.8 Answer]
---
@@ -14,7 +14,7 @@ contract_sections: [§1.11.4 ask scenes, §2.3 Answer wire, §3.8 Answer]
## Goal
Add an Ask pane that calls `kb-app::ask`, streams tokens into the answer area in real time, renders citation footnotes (default mode A), and toggles to `--explain` (mode B + retrieval trace) with a key.
Add an Ask pane that calls `kebab-app::ask`, streams tokens into the answer area in real time, renders citation footnotes (default mode A), and toggles to `--explain` (mode B + retrieval trace) with a key.
## Why now / why this size
@@ -22,23 +22,23 @@ Streaming UI is the only TUI piece that meaningfully differs from search/inspect
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-app`
- `kb-tui` (extends p9-1)
- `kebab-core`
- `kebab-config`
- `kebab-app`
- `kebab-tui` (extends p9-1)
- `ratatui`, `crossterm`
- `tracing`
- `thiserror`
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (only via `kb-app`), `kb-desktop`
- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (only via `kebab-app`), `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `kb-app::ask(query, AskOpts)` | facade | runtime |
| `kebab-app::ask(query, AskOpts)` | facade | runtime |
| keyboard events | `crossterm` | terminal |
## Outputs
@@ -46,7 +46,7 @@ Streaming UI is the only TUI piece that meaningfully differs from search/inspect
| output | type | downstream |
|--------|------|------------|
| Ratatui Ask pane render | terminal | user |
| `kb-app::ask` invocation with streaming closure | facade | RAG pipeline |
| `kebab-app::ask` invocation with streaming closure | facade | RAG pipeline |
## Public surface (signatures only — no new types)
@@ -55,7 +55,7 @@ pub fn render_ask<B: ratatui::backend::Backend>(f: &mut ratatui::Frame, area: ra
pub fn handle_key_ask(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome;
```
This task fills the body of `kb_tui::AskState` (forward-declared in p9-1). `App` is NOT edited — only `AskState` gets fields:
This task fills the body of `kebab_tui::AskState` (forward-declared in p9-1). `App` is NOT edited — only `AskState` gets fields:
```rust
pub struct AskState {
@@ -63,8 +63,8 @@ pub struct AskState {
pub explain: bool,
pub streaming: bool,
pub partial: String,
pub answer: Option<kb_core::Answer>,
pub thread: Option<std::thread::JoinHandle<anyhow::Result<kb_core::Answer>>>,
pub answer: Option<kebab_core::Answer>,
pub thread: Option<std::thread::JoinHandle<anyhow::Result<kebab_core::Answer>>>,
pub rx: Option<std::sync::mpsc::Receiver<String>>,
}
```
@@ -74,7 +74,7 @@ pub struct AskState {
## Behavior contract
- Layout: top input bar (`?` prompt, query text), middle answer area (rendered Markdown-light: paragraphs + inline `[N]` markers), bottom-right citations panel (numbered list of citations with `path#fragment` and section label), bottom-left status (`grounded ✓/✗ model prompt_v k chunks`).
- Submission: `Enter` triggers a worker thread that calls `kb-app::ask` with `AskOpts.stream_sink: Some(tx)` (`tx: mpsc::Sender<String>`). The thread holds the `tx`, the TUI holds the matching `rx` (set on `AskState.rx`). On each render frame the TUI drains `rx.try_iter()` into `state.partial`, no blocking.
- Submission: `Enter` triggers a worker thread that calls `kebab-app::ask` with `AskOpts.stream_sink: Some(tx)` (`tx: mpsc::Sender<String>`). The thread holds the `tx`, the TUI holds the matching `rx` (set on `AskState.rx`). On each render frame the TUI drains `rx.try_iter()` into `state.partial`, no blocking.
- Streaming: while `ask_streaming = true`, the Answer area shows `ask_partial` and a small "▍" cursor. When the worker finishes, `ask_answer` is populated and the citations panel switches to the final list.
- Refusal rendering:
- `grounded = false` and `refusal_reason = ScoreGate` → render the answer (which is the human-friendly "근거 부족…" message), citations show "가까운 후보".
@@ -85,12 +85,12 @@ pub struct AskState {
- `e` → toggle `ask_explain`; resubmit on next `Enter`. While explain ON, citations panel is replaced by the per-claim breakdown (mode B in design §1.2) and a footer shows the retrieval trace summary.
- `Esc` → switch back to Library pane (cancellation of an in-flight ask is best-effort: the worker thread continues but its final answer is dropped).
- `j` / `k` → scroll the answer area when oversized.
- All facade calls stay within `kb-app::ask` — never reach into `kb-rag` directly.
- All facade calls stay within `kebab-app::ask` — never reach into `kebab-rag` directly.
- Errors render as a popup overlay; do not crash the pane.
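
The non-blocking drain in the submission bullet can be sketched with the standard library alone. This is an illustration under assumptions: `AskStream` is a hypothetical subset of `AskState`, and `spawn_worker` is a stand-in for the thread that would call `kebab-app::ask` with a `stream_sink`.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical subset of `AskState`, reduced to the streaming fields.
pub struct AskStream {
    pub partial: String,
    pub rx: Option<Receiver<String>>,
}

impl AskStream {
    /// Called once per render frame. `try_iter` never blocks: it yields the
    /// tokens that are already queued and then stops.
    pub fn drain(&mut self) {
        if let Some(rx) = &self.rx {
            for token in rx.try_iter() {
                self.partial.push_str(&token);
            }
        }
    }
}

/// Stand-in for the worker thread (assumption: the real worker calls the
/// facade and forwards each streamed token through `tx`).
pub fn spawn_worker(tokens: Vec<String>) -> (Receiver<String>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        for token in tokens {
            // A send error means the pane dropped `rx` (e.g. after Esc); stop early.
            if tx.send(token).is_err() {
                break;
            }
        }
    });
    (rx, handle)
}
```

Because the worker owns `tx`, dropping `rx` on `Esc` is enough for the best-effort cancellation described above: later sends fail and the worker's output is discarded.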
## Storage / wire effects
- Reads/writes via `kb-app::ask` which itself writes the `answers` row in `kb.sqlite`. The pane has no direct DB access.
- Reads/writes via `kebab-app::ask` which itself writes the `answers` row in `kebab.sqlite`. The pane has no direct DB access.
## Test plan
@@ -102,14 +102,14 @@ pub struct AskState {
| unit | refusal answer renders without citations panel index errors | inline |
| snapshot | rendered Ask pane mid-stream is stable | TestBackend |
| snapshot | rendered Ask pane after finished grounded answer is stable | TestBackend |
| integration | mocked `kb-app::ask` returning a canned `Answer` populates final state correctly | inline |
| integration | mocked `kebab-app::ask` returning a canned `Answer` populates final state correctly | inline |
All tests under `cargo test -p kb-tui ask`.
All tests under `cargo test -p kebab-tui ask`.
## Definition of Done
- [ ] `cargo check -p kb-tui` passes
- [ ] `cargo test -p kb-tui ask` passes
- [ ] `cargo check -p kebab-tui` passes
- [ ] `cargo test -p kebab-tui ask` passes
- [ ] No imports outside Allowed dependencies
- [ ] Manual smoke: stream tokens visible character-by-character against a real Ollama (or `MockLanguageModel`)
- [ ] PR links design §1.11.4, §2.3


@@ -1,12 +1,12 @@
---
phase: P9
component: kb-tui (inspect pane)
component: kebab-tui (inspect pane)
task_id: p9-4
title: "TUI Inspect pane: document & chunk detail render"
status: planned
depends_on: [p1-6, p9-1]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§1 inspect output, §3.5 Chunk, §2.5 DocSummary, §2.6 ChunkInspection]
---
@@ -22,24 +22,24 @@ Inspect is read-only and has no external interactions; smallest possible pane. U
## Allowed dependencies
- `kb-core`
- `kb-config`
- `kb-app`
- `kb-tui` (extends p9-1)
- `kebab-core`
- `kebab-config`
- `kebab-app`
- `kebab-tui` (extends p9-1)
- `ratatui`, `crossterm`
- `tracing`
- `thiserror`
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (only via `kb-app`), `kb-desktop`
- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (only via `kebab-app`), `kebab-desktop`
## Inputs
| input | type | source |
|-------|------|--------|
| `kb-app::inspect_doc(id)` | facade | runtime |
| `kb-app::inspect_chunk(id)` | facade | runtime |
| `kebab-app::inspect_doc(id)` | facade | runtime |
| `kebab-app::inspect_chunk(id)` | facade | runtime |
| keyboard events | `crossterm` | terminal |
## Outputs
@@ -51,19 +51,19 @@ Inspect is read-only and has no external interactions; smallest possible pane. U
## Public surface (signatures only — no new types)
```rust
pub enum InspectTarget { Doc(kb_core::DocumentId), Chunk(kb_core::ChunkId) }
pub enum InspectTarget { Doc(kebab_core::DocumentId), Chunk(kebab_core::ChunkId) }
pub fn render_inspect<B: ratatui::backend::Backend>(f: &mut ratatui::Frame, area: ratatui::layout::Rect, state: &App);
pub fn handle_key_inspect(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome;
```
This task fills the body of `kb_tui::InspectState` (forward-declared in p9-1). `App` is NOT edited.
This task fills the body of `kebab_tui::InspectState` (forward-declared in p9-1). `App` is NOT edited.
```rust
pub struct InspectState {
pub target: Option<InspectTarget>,
pub doc: Option<kb_core::CanonicalDocument>,
pub chunk: Option<kb_core::Chunk>,
pub doc: Option<kebab_core::CanonicalDocument>,
pub chunk: Option<kebab_core::Chunk>,
pub collapsed: std::collections::HashSet<&'static str>,
pub scroll: u16,
}
@@ -89,7 +89,7 @@ pub struct InspectState {
- `c` → collapse / expand currently focused section (focus is implicit by current scroll position; v1 may simplify by toggling all sections)
- `Esc` → return to previous pane (Library or Search)
- `Enter` → no-op (Inspect is terminal — no editor jump here; users use Search pane for jump)
- Loading: while `kb-app::inspect_doc` or `inspect_chunk` runs, show "loading…". On error, popup with hint.
- Loading: while `kebab-app::inspect_doc` or `inspect_chunk` runs, show "loading…". On error, popup with hint.
- Renders must conform to wire schemas `doc_summary.v1` (subset for header) and `chunk_inspection.v1`.
## Storage / wire effects
@@ -100,18 +100,18 @@ pub struct InspectState {
| kind | description | fixture / data |
|------|-------------|----------------|
| unit | switching to InspectTarget::Doc triggers `kb-app::inspect_doc` once | inline mock |
| unit | switching to InspectTarget::Doc triggers `kebab-app::inspect_doc` once | inline mock |
| unit | scroll bounded by content height | inline |
| unit | collapse toggle via `c` flips state | inline |
| snapshot | doc-view rendered for fixture stable | TestBackend + fixture |
| snapshot | chunk-view rendered for fixture stable | TestBackend + fixture |
All tests under `cargo test -p kb-tui inspect`.
All tests under `cargo test -p kebab-tui inspect`.
## Definition of Done
- [ ] `cargo check -p kb-tui` passes
- [ ] `cargo test -p kb-tui inspect` passes
- [ ] `cargo check -p kebab-tui` passes
- [ ] `cargo test -p kebab-tui inspect` passes
- [ ] No imports outside Allowed dependencies
- [ ] Manual smoke: inspect a doc with multiple chunks, scroll, return to library
- [ ] PR links design §3.5, §2.5, §2.6


@@ -1,12 +1,12 @@
---
phase: P9
component: kb-desktop (Tauri)
component: kebab-desktop (Tauri)
task_id: p9-5
title: "Tauri desktop app: backend commands wrapping kb-app + multimodal source viewer"
title: "Tauri desktop app: backend commands wrapping kebab-app + multimodal source viewer"
status: planned
depends_on: [p9-1, p9-2, p9-3, p9-4]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [report §16.3 desktop (also tasks/phase-9-ui.md epic), design §1 ask/search scenes, design §2 wire schemas v1, design §8 module boundaries]
---
@@ -14,23 +14,23 @@ contract_sections: [report §16.3 desktop (also tasks/phase-9-ui.md epic), desig
## Goal
Stand up a Tauri 2.x app (`kb-desktop` crate as backend, `kb-desktop-frontend/` as web assets) whose Tauri commands wrap `kb-app` 1:1. The frontend renders multimodal source viewers (Markdown render, PDF page viewer, image viewer with region overlay, audio player with seek). Citation clicks route to the appropriate viewer.
Stand up a Tauri 2.x app (`kebab-desktop` crate as backend, `kebab-desktop-frontend/` as web assets) whose Tauri commands wrap `kebab-app` 1:1. The frontend renders multimodal source viewers (Markdown render, PDF page viewer, image viewer with region overlay, audio player with seek). Citation clicks route to the appropriate viewer.
## Why now / why this size
Last task. Combines all backend phases into a single user-facing surface. Strict policy: backend commands are thin wrappers over `kb-app`; no new business logic.
Last task. Combines all backend phases into a single user-facing surface. Strict policy: backend commands are thin wrappers over `kebab-app`; no new business logic.
## Allowed dependencies
- backend (`kb-desktop`):
- `kb-core`
- `kb-config`
- `kb-app`
- backend (`kebab-desktop`):
- `kebab-core`
- `kebab-config`
- `kebab-app`
- `tauri = "2"` + `tauri-build`
- `serde`, `serde_json`
- `tracing`
- `thiserror`
- frontend (`kb-desktop-frontend/`): vanilla TypeScript + Vite (default; user may swap to Svelte/Solid in a follow-up).
- frontend (`kebab-desktop-frontend/`): vanilla TypeScript + Vite (default; user may swap to Svelte/Solid in a follow-up).
- PDF rendering: `pdfjs-dist`
- Markdown rendering: `marked` + `dompurify`
- Audio: HTML `<audio>` with custom segment overlay
@@ -38,7 +38,7 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
## Forbidden dependencies
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (UI must go through `kb-app` only — design §8).
- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (UI must go through `kebab-app` only — design §8).
- **No native PDF render backend** (no `pdfium`, no `mupdf`, no `poppler`). PDF rendering lives entirely in the frontend (`pdfjs-dist`). Adding any of these would (a) bloat the bundle 100+ MB, (b) require frozen-design amendment, and (c) double the path-containment surface.
## Inputs
@@ -46,7 +46,7 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
| input | type | source |
|-------|------|--------|
| Tauri commands | invoked from frontend | user clicks |
| `kb-config::Config` | runtime | env / file |
| `kebab-config::Config` | runtime | env / file |
| user file system (read-only) | for source viewers | OS |
## Outputs
@@ -59,7 +59,7 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
## Public surface (signatures only — no new types)
```rust
// Tauri command surface (one per kb-app facade method, plus source viewers)
// Tauri command surface (one per kebab-app facade method, plus source viewers)
#[tauri::command] fn cmd_init(force: bool) -> Result<()>;
#[tauri::command] fn cmd_ingest(scope_json: serde_json::Value, summary_only: bool) -> Result<serde_json::Value /* IngestReportWireV1 */>;
#[tauri::command] fn cmd_list_docs(filter_json: serde_json::Value) -> Result<Vec<serde_json::Value /* DocSummaryWireV1 */>>;
@@ -75,11 +75,11 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
#[tauri::command] fn cmd_read_file_bytes(path: String) -> Result<Vec<u8>>; // raw bytes for PDF / image / audio
```
(All commands convert internal `kb-core` types to wire-schema-v1 JSON before returning.)
(All commands convert internal `kebab-core` types to wire-schema-v1 JSON before returning.)
## Behavior contract
- Backend bootstraps `tracing` to a file under `~/.local/state/kb/logs/` and a Tauri plugin loads/saves window state.
- Backend bootstraps `tracing` to a file under `~/.local/state/kebab/logs/` and a Tauri plugin loads/saves window state.
- Every Tauri command performs **path containment** for source viewers: resolves `path` against `config.workspace.root`, rejects (`anyhow::Error`) any path outside.
- Layout (frontend): left = Library + Search + Ask tabs; right = Source viewer keyed by current citation.
- Citation routing in the frontend (clicks on `[#N]` markers or hit rows). All rendering is frontend-side; backend serves raw bytes only.
@@ -88,23 +88,23 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
- `Citation::Region { path, x, y, w, h }` → `cmd_read_file_bytes(path)` → blob URL → `<img>` + absolute-positioned overlay at `(x, y, w, h)`.
- `Citation::Caption { path, model }` → same as Region but no overlay; caption banner shows `model`.
- `Citation::Time { path, start_ms, end_ms }` → `cmd_read_file_bytes(path)` → blob URL → `<audio src=...>` seeked to `start_ms / 1000`, with a timeline marker spanning `[start_ms, end_ms]`.
- Streaming `kb ask`: backend command `cmd_ask` returns the buffered Answer (per §0 Q5: pipe/JSON mode buffers). For real-time streaming in the desktop, expose a separate `cmd_ask_stream` event channel via Tauri's `Window::emit("kb://ask-token", payload)`. (Implementation can be deferred to a follow-up; v1 of the desktop accepts buffered.)
- Streaming `kebab ask`: backend command `cmd_ask` returns the buffered Answer (per §0 Q5: pipe/JSON mode buffers). For real-time streaming in the desktop, expose a separate `cmd_ask_stream` event channel via Tauri's `Window::emit("kebab://ask-token", payload)`. (Implementation can be deferred to a follow-up; v1 of the desktop accepts buffered.)
- All backend errors mapped to a `String` message with structure `{ "error": msg, "hint": Option<msg> }`.
- Frontend respects light/dark per OS theme (Tauri supplies the API).
- No telemetry. No automatic update channel for v1 (manual download).
## Storage / wire effects
- Reads via `kb-app` (which reads/writes via SQLite + LanceDB).
- Reads via `kebab-app` (which reads/writes via SQLite + LanceDB).
- Reads workspace files directly for source viewers (path-contained).
- Writes nothing outside what `kb-app` writes.
- Writes nothing outside what `kebab-app` writes.
- Wire JSON between backend and frontend uses schema v1 strictly. The frontend MUST validate `schema_version` strings on every IPC return and warn (or upgrade-gate) when `v1 != current`.
## Test plan
| kind | description | fixture / data |
|------|-------------|----------------|
| unit (backend) | each command wraps the corresponding `kb-app` function and serializes via wire schema | inline mocks |
| unit (backend) | each command wraps the corresponding `kebab-app` function and serializes via wire schema | inline mocks |
| unit (backend) | `cmd_read_markdown` rejects paths outside workspace | tmp config |
| unit (backend) | `cmd_read_file_bytes` rejects paths outside workspace incl. `..`, absolute path, symlink-out | tmp config + traversal vectors |
| unit (backend) | `cmd_read_file_bytes` returns identical bytes to `std::fs::read` for an in-workspace file | tmp config |
@@ -112,13 +112,13 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
| smoke (frontend, optional in this task) | Vitest test that mounts the Library tab, calls a mocked `cmd_list_docs`, renders 1 row | minimal |
| manual | full-stack smoke against a real ingested workspace (Markdown + 1 PDF + 1 image + 1 audio); each citation jumps correctly | manual checklist |
Backend tests under `cargo test -p kb-desktop`. Frontend tests are bonus and not gated by this task's DoD.
Backend tests under `cargo test -p kebab-desktop`. Frontend tests are bonus and not gated by this task's DoD.
## Definition of Done
- [ ] `cargo check -p kb-desktop` passes
- [ ] `cargo test -p kb-desktop` passes
- [ ] `pnpm --filter kb-desktop-frontend build` produces a static asset bundle Tauri can package
- [ ] `cargo check -p kebab-desktop` passes
- [ ] `cargo test -p kebab-desktop` passes
- [ ] `pnpm --filter kebab-desktop-frontend build` produces a static asset bundle Tauri can package
- [ ] `tauri build` produces an unsigned dmg on macOS in CI (signed/notarized are out of scope)
- [ ] Each Tauri command returns wire-schema-v1 JSON; frontend asserts `schema_version`
- [ ] No imports outside Allowed dependencies (backend)
@@ -139,4 +139,4 @@ Backend tests under `cargo test -p kb-desktop`. Frontend tests are bonus and not
- Path containment is the desktop's most security-sensitive surface; tests must include path traversal vectors (`..`, symlinks, absolute paths).
- PDF rendering via `pdfjs-dist` is heavy (~2 MB worker); lazy-load on first PDF citation. The trade-off vs a native render backend (e.g., `pdfium` ~150 MB binary, code-signing pain) is heavily one-sided; v1 stays on `pdfjs-dist`.
- Audio formats vary; rely on the browser engine's HTML audio decoder (WebKit on macOS supports `.m4a`, `.mp3`; mileage varies on `.flac`/`.ogg`).
- Wide Tauri command surface tempts business-logic creep; CI must enforce that no `kb-rag` / `kb-search` / store crate appears in `kb-desktop`'s `cargo tree`.
- Wide Tauri command surface tempts business-logic creep; CI must enforce that no `kebab-rag` / `kebab-search` / store crate appears in `kebab-desktop`'s `cargo tree`.
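
The path-containment rule for source viewers can be sketched as follows. This is a minimal illustration, not the task's implementation: `contain_path` is a hypothetical name, and it returns `std::io::Error` to stay dependency-free, whereas the spec calls for `anyhow::Error`. It relies on `canonicalize` resolving `..` segments and symlinks before the prefix check.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolves `requested` against the workspace root and
/// rejects anything that ends up outside it.
pub fn contain_path(workspace_root: &Path, requested: &str) -> io::Result<PathBuf> {
    let root = workspace_root.canonicalize()?;
    // `join` with an absolute `requested` replaces the root entirely, so the
    // prefix check below also covers absolute-path inputs.
    let resolved = root.join(requested).canonicalize()?;
    if resolved.starts_with(&root) {
        Ok(resolved)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "path escapes workspace root",
        ))
    }
}
```

One consequence of this design is that a nonexistent path is also rejected, because `canonicalize` fails on it; for read-only viewers that is the desired outcome.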


@@ -3,7 +3,7 @@ phase: P0
title: "Workspace skeleton + domain contracts"
status: completed
depends_on: []
source: kb_local_rust_report.md §3, §4, §6, §7, §13
source: kebab_local_rust_report.md §3, §4, §6, §7, §13
---
# P0: Workspace skeleton + domain contracts
@@ -16,11 +16,11 @@ Establish a compiling Rust 2024 workspace and the domain spec. All subsequent phase
| crate | role |
|-------|------|
| `kb-core` | domain types, traits, errors, ID rules |
| `kb-parse-types` | parser intermediate representation (`ParsedBlock` etc.); a thin layer between `kb-core` and parsers/normalize (design §3.7b) |
| `kb-config` | config loading, defaults, path expansion |
| `kb-app` | facade. Shared entry point for CLI/TUI/desktop |
| `kb-cli` | `kb` binary skeleton (only `--help` works) |
| `kebab-core` | domain types, traits, errors, ID rules |
| `kebab-parse-types` | parser intermediate representation (`ParsedBlock` etc.); a thin layer between `kebab-core` and parsers/normalize (design §3.7b) |
| `kebab-config` | config loading, defaults, path expansion |
| `kebab-app` | facade. Shared entry point for CLI/TUI/desktop |
| `kebab-cli` | `kebab` binary skeleton (only `--help` works) |
## Workspace configuration
@@ -29,7 +29,7 @@ Establish a compiling Rust 2024 workspace and the domain spec. All subsequent phase
```toml
[workspace]
resolver = "3"
members = ["crates/kb-core", "crates/kb-parse-types", "crates/kb-config", "crates/kb-app", "crates/kb-cli"]
members = ["crates/kebab-core", "crates/kebab-parse-types", "crates/kebab-config", "crates/kebab-app", "crates/kebab-cli"]
[workspace.package]
edition = "2024"
@@ -48,11 +48,11 @@ tracing = "0.1"
Additional member crates join in later phases.
## kb-core domain types
## kebab-core domain types
`RawAsset`, `CanonicalDocument`, `Block`, `Chunk`, `SearchHit`, `Citation`, `SourceSpan`, `Provenance`, `Metadata`. Exactly as defined in report §6.
## kb-core trait
## kebab-core trait
```rust
pub trait SourceConnector { fn scan(&self, scope: &SourceScope) -> anyhow::Result<Vec<RawAsset>>; }
@@ -78,7 +78,7 @@ index_id = collection name + index version + embedding model id
Each record preserves the following version fields: `doc_version`, `schema_version`, `parser_version`, `chunker_version`, `embedding_model`, `embedding_version`, `index_version`, `prompt_template_version`.
## kb-app facade
## kebab-app facade
CLI, TUI, and desktop all call facade functions. Calling parser/DB/LLM adapters directly is forbidden. Initial facade method stubs:
@@ -91,9 +91,9 @@ pub fn inspect_chunk(id: &ChunkId) -> anyhow::Result<Chunk>;
pub fn doctor() -> anyhow::Result<DoctorReport>;
```
## kb-cli skeleton
## kebab-cli skeleton
`clap` derive. Subcommands: `init`, `ingest`, `index`, `search`, `ask`, `inspect doc|chunk`, `doctor`. Each body only calls `kb-app`. In P0 only `--help` works.
`clap` derive. Subcommands: `init`, `ingest`, `index`, `search`, `ask`, `inspect doc|chunk`, `doctor`. Each body only calls `kebab-app`. In P0 only `--help` works.
## Spec documents
@@ -112,15 +112,15 @@ pub fn doctor() -> anyhow::Result<DoctorReport>;
## Dependency boundaries
- `kb-core`: minimal external dependencies (serde, time, uuid, blake3, thiserror, tracing).
- `kb-cli` depends only on `kb-app`. No direct dependency on parser/DB/LLM.
- `kb-app` works against traits only. Implementations are supplied via dyn injection.
- `kebab-core`: minimal external dependencies (serde, time, uuid, blake3, thiserror, tracing).
- `kebab-cli` depends only on `kebab-app`. No direct dependency on parser/DB/LLM.
- `kebab-app` works against traits only. Implementations are supplied via dyn injection.
## Completion criteria
- [ ] `cargo check --workspace` passes
- [ ] `cargo test --workspace` passes (unit tests cover ID generation and domain serialization round-trips)
- [ ] `kb --help` prints output
- [ ] `kebab --help` prints output
- [ ] The 7 `docs/spec/*` documents exist
- [ ] The 3 `fixtures/markdown/*` files exist
- [ ] At least one serde JSON snapshot test for domain types


@@ -3,47 +3,47 @@ phase: P1
title: "Markdown ingestion pipeline"
status: completed
depends_on: [P0]
source: kb_local_rust_report.md §8, §14, §17 Phase 1
source: kebab_local_rust_report.md §8, §14, §17 Phase 1
---
# P1: Markdown ingestion pipeline
## Goal
Complete the `Markdown file -> RawAsset -> CanonicalDocument -> Chunk -> SQLite` flow. `kb ingest` / `kb list docs` / `kb inspect doc <id>` work even without an LLM or embeddings.
Complete the `Markdown file -> RawAsset -> CanonicalDocument -> Chunk -> SQLite` flow. `kebab ingest` / `kebab list docs` / `kebab inspect doc <id>` work even without an LLM or embeddings.
## Crates produced
| crate | role |
|-------|------|
| `kb-source-fs` | local folder scan, checksum, change detection. Implements `SourceConnector` |
| `kb-parse-md` | Markdown bytes → structured document. Implements `Extractor` |
| `kb-normalize` | parser output → `CanonicalDocument` |
| `kb-chunk` | block-aware chunking. Implements `Chunker` (`md-heading-v1`) |
| `kb-store-sqlite` | metadata, document, chunk, job tables. The FTS table is enabled in P2 |
| `kebab-source-fs` | local folder scan, checksum, change detection. Implements `SourceConnector` |
| `kebab-parse-md` | Markdown bytes → structured document. Implements `Extractor` |
| `kebab-normalize` | parser output → `CanonicalDocument` |
| `kebab-chunk` | block-aware chunking. Implements `Chunker` (`md-heading-v1`) |
| `kebab-store-sqlite` | metadata, document, chunk, job tables. The FTS table is enabled in P2 |
## kb-source-fs
## kebab-source-fs
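- Input: `SourceScope { root: PathBuf, include: Vec<Glob>, exclude: Vec<Glob> }`
- Behavior: recursive walk → `blake3` of each file → list of `RawAsset`.
- Change detection: compare new vs. old by `(source_uri, checksum)`. Identical checksums are skipped.
- Watch mode is out of scope for P1 (only the config is defined; implementation is deferred).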
## kb-parse-md
## kebab-parse-md
- Parser candidates: `pulldown-cmark` first. Consider `comrak` if GFM tables/task lists become necessary (§8).
- Preserved: YAML/TOML frontmatter, heading tree, paragraph, list, code block + lang tag, table, blockquote, link, image ref, **line range**.
- Output: an intermediate representation (parser-specific). `kb-normalize` converts it to canonical form.
- Output: an intermediate representation (parser-specific). `kebab-normalize` converts it to canonical form.
- Malformed Markdown: never panic. Preserve what can be preserved and record a warning in `Provenance`.
## kb-normalize
## kebab-normalize
- Responsibility: parser intermediate representation → `CanonicalDocument`.
- frontmatter → `Metadata` (id, title, aliases, tags, created_at, updated_at, source_type, trust_level, lang).
- Flatten the block tree and assign `BlockId`s (deterministic, based on heading path + ordinal).
- `SourceSpan` accepts either `LineRange { start, end }` or `ByteRange`. For Markdown, line ranges are primary.
## kb-chunk (`md-heading-v1`)
## kebab-chunk (`md-heading-v1`)
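Priority order (§14):
1. heading boundaries first
@@ -58,7 +58,7 @@ policy defaults: `target_tokens = 500`, `overlap_tokens = 80`, `respect_markdow
Token estimation: no tokenizer has been introduced at this stage, so a byte- or character-based approximation is acceptable. It is replaced by a real tokenizer when embeddings arrive in P3.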
## kb-store-sqlite
## kebab-store-sqlite
Schema (first pass):
@@ -117,7 +117,7 @@ CREATE TABLE jobs (
- Transaction: one ingest = one transaction. Roll back on partial failure.
- Idempotent: re-ingesting the same `doc_id` is an UPSERT with a version bump.
## kb-app facade extensions
## kebab-app facade extensions
```rust
pub fn ingest(scope: SourceScope) -> anyhow::Result<IngestReport>;
@@ -131,10 +131,10 @@ pub fn inspect_chunk(id: &ChunkId) -> anyhow::Result<Chunk>;
## CLI
```text
kb ingest <path> [--include <glob>] [--exclude <glob>]
kb list docs [--tag <t>]
kb inspect doc <doc_id>
kb inspect chunk <chunk_id>
kebab ingest <path> [--include <glob>] [--exclude <glob>]
kebab list docs [--tag <t>]
kebab inspect doc <doc_id>
kebab inspect chunk <chunk_id>
```
## Tests
@@ -146,14 +146,14 @@ kb inspect chunk <chunk_id>
## Dependency boundaries
`kb-parse-md` must not use: `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, or embedding calls. The parser is a pure function.
`kebab-parse-md` must not use: `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, or embedding calls. The parser is a pure function.
## Completion criteria
- [ ] After running `kb ingest <path>`, documents/blocks/chunks are populated in SQLite
- [ ] `kb list docs` prints correctly
- [ ] `kb inspect doc <id>` outputs JSON
- [ ] `kb inspect chunk <id>` outputs JSON (including heading path + source span)
- [ ] After running `kebab ingest <path>`, documents/blocks/chunks are populated in SQLite
- [ ] `kebab list docs` prints correctly
- [ ] `kebab inspect doc <id>` outputs JSON
- [ ] `kebab inspect chunk <id>` outputs JSON (including heading path + source span)
- [ ] Re-ingesting the same folder creates no duplicate rows
- [ ] When the parser/chunker version changes, the documents that need reprocessing can be identified
- [ ] Fixture snapshot tests pass


@@ -3,19 +3,19 @@ phase: P2
title: "SQLite FTS5 lexical search + citation"
status: completed
depends_on: [P1]
source: kb_local_rust_report.md §10, §15, §17 Phase 2
source: kebab_local_rust_report.md §10, §15, §17 Phase 2
---
# P2: SQLite FTS5 lexical search + citation
## Goal
Search and citation output that run on FTS5 alone, without embeddings or an LLM. `kb search "..."` returns chunks and source spans.
Search and citation output that run on FTS5 alone, without embeddings or an LLM. `kebab search "..."` returns chunks and source spans.
## Crates produced
- `kb-search` (lexical mode): the first `Retriever` trait implementation.
- `kb-store-sqlite` extension: FTS5 virtual table + triggers.
- `kebab-search` (lexical mode): the first `Retriever` trait implementation.
- `kebab-store-sqlite` extension: FTS5 virtual table + triggers.
## FTS5 schema
@@ -69,15 +69,15 @@ pub struct SearchHit {
}
```
`Citation` format: `notes/rust/kb.md:L12-L34`.
`Citation` format: `notes/rust/kebab.md:L12-L34`.
## Index lifecycle
- Synchronized automatically by triggers at ingest time.
- The `kb index --rebuild-fts` command rebuilds the FTS table (use after a chunker version bump).
- The `kebab index --rebuild-fts` command rebuilds the FTS table (use after a chunker version bump).
- `index_version` is the combination `(schema_version, fts_config_hash)`.
## kb-app facade extensions
## kebab-app facade extensions
```rust
pub fn search(query: SearchQuery) -> anyhow::Result<Vec<SearchHit>>;
@@ -86,16 +86,16 @@ pub fn search(query: SearchQuery) -> anyhow::Result<Vec<SearchHit>>;
## CLI
```text
kb search "Rust workspace 설계" [--k 10] [--tag rust] [--mode lexical]
kb index --rebuild-fts
kebab search "Rust workspace 설계" [--k 10] [--tag rust] [--mode lexical]
kebab index --rebuild-fts
```
Example output:
```text
1. [0.82] Rust workspace는 여러 package를 하나로 관리한다…
doc: notes/rust/kb.md
citation: notes/rust/kb.md:L12-L34
doc: notes/rust/kebab.md
citation: notes/rust/kebab.md:L12-L34
heading: 아키텍처 > Rust workspace
```
@@ -108,13 +108,13 @@ kb index --rebuild-fts
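## Dependency boundaries
- `kb-search` depends only on `kb-store-sqlite` and `kb-core`.
- `kebab-search` depends only on `kebab-store-sqlite` and `kebab-core`.
- No LLM/embedding calls (at the P2 stage).
- The CLI calls only through `kb-app`.
- The CLI calls only through `kebab-app`.
## Completion criteria
- [ ] `kb search "..."` returns the top-k chunks
- [ ] `kebab search "..."` returns the top-k chunks
- [ ] Every result includes a citation
- [ ] Citation line ranges match the original
- [ ] Mixed Korean-English queries work (document the Korean tokenization limits in a note)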


@@ -3,23 +3,23 @@ phase: P3
title: "Local embedding + LanceDB + hybrid search"
status: completed
depends_on: [P2]
source: kb_local_rust_report.md §10, §11, §15, §17 Phase 3
source: kebab_local_rust_report.md §10, §11, §15, §17 Phase 3
---
# P3 — Local embedding + LanceDB + hybrid search
## Goal
Vectorize chunks with local embeddings → store them in LanceDB → vector search fused with lexical search (hybrid). `kb search --mode {lexical,vector,hybrid}` works.
Vectorize chunks with local embeddings → store them in LanceDB → vector search fused with lexical search (hybrid). `kebab search --mode {lexical,vector,hybrid}` works.
## Output crates
| crate | Role |
|-------|------|
| `kb-embed` | `Embedder` trait + `EmbeddingInput`/output types |
| `kb-embed-local` | `fastembed-rs` adapter (first). Later: Ollama embed endpoint, candle |
| `kb-store-vector` | LanceDB integration: table management, upsert, vector search |
| `kb-search` | Runs lexical + vector side by side + score fusion |
| `kebab-embed` | `Embedder` trait + `EmbeddingInput`/output types |
| `kebab-embed-local` | `fastembed-rs` adapter (first). Later: Ollama embed endpoint, candle |
| `kebab-store-vector` | LanceDB integration: table management, upsert, vector search |
| `kebab-search` | Runs lexical + vector side by side + score fusion |
## Embedder
@@ -64,7 +64,7 @@ created_at : timestamp
## Indexing job
```text
kb index --embeddings [--model <id>] [--batch-size N] [--resume]
kebab index --embeddings [--model <id>] [--batch-size N] [--resume]
```
- Only chunks whose `embedding_id = chunk_id + model_id + dim` is missing from the vector store are processed.
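
A minimal sketch of that "only what is missing" filter, assuming the vector store can hand back the set of embedding ids it already holds. The `|` separator and both function names are hypothetical; the spec only says the id combines `chunk_id`, `model_id`, and `dim`.

```rust
use std::collections::HashSet;

/// Builds the embedding key from its three parts.
/// Assumption: the '|' separator is illustrative, not the project's encoding.
pub fn embedding_id(chunk_id: &str, model_id: &str, dim: usize) -> String {
    format!("{chunk_id}|{model_id}|{dim}")
}

/// Returns the chunk ids that still need an embedding for this model and dimension.
pub fn pending_chunks<'a>(
    chunk_ids: &'a [String],
    existing: &HashSet<String>,
    model_id: &str,
    dim: usize,
) -> Vec<&'a str> {
    chunk_ids
        .iter()
        .filter(|c| !existing.contains(&embedding_id(c.as_str(), model_id, dim)))
        .map(|c| c.as_str())
        .collect()
}
```

Because the model id and dimension are part of the key, switching either one makes every chunk pending again. That is consistent with the criterion of storing a changed model/dimension separately.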
@@ -91,7 +91,7 @@ k_rrf defaults to 60.
The P3 scope does not introduce a reranker (noted for a P+ stage).
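
The `k_rrf` default of 60 in the hunk context points at Reciprocal Rank Fusion. The sketch below assumes the textbook form, score(d) = Σ 1 / (k_rrf + rank), with 1-based ranks over each sub-retriever's ranking; `rrf_fuse` is a hypothetical name, and the actual fusion code in the search crate is not shown in this diff.

```rust
use std::collections::HashMap;

/// Fuses several rankings (best hit first) with Reciprocal Rank Fusion.
/// Returns (id, fused_score) pairs sorted by score, highest first.
pub fn rrf_fuse(rankings: &[Vec<String>], k_rrf: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            // `i` is 0-based, so the 1-based rank is i + 1.
            *scores.entry(id.clone()).or_insert(0.0) += 1.0 / (k_rrf + i as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Sort by score descending; break ties by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

A chunk that appears in both the lexical and the vector ranking accumulates two terms, so it tends to outrank a chunk that only one retriever found.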
## kb-search structure
## kebab-search structure
```rust
pub struct HybridRetriever {
@@ -102,11 +102,11 @@ pub struct HybridRetriever {
```
- Each sub-retriever implements the `Retriever` trait.
- `kb-app::search` dispatches by mode.
- `kebab-app::search` dispatches by mode.
## kb-app facade extension
## kebab-app facade extension
During P3, the stub bodies of `ingest` / `search` / `list_docs` / `inspect_doc` / `inspect_chunk` in the kb-app facade are replaced with real library calls. The signatures have been frozen since P0, so only the bodies are swapped, with no signature changes.
During P3, the stub bodies of `ingest` / `search` / `list_docs` / `inspect_doc` / `inspect_chunk` in the kebab-app facade are replaced with real library calls. The signatures have been frozen since P0, so only the bodies are swapped, with no signature changes.
```rust
pub fn ingest(scope: SourceScope, summary_only: bool) -> anyhow::Result<IngestReport>; // p3-5
@@ -117,14 +117,14 @@ pub fn inspect_chunk(id: &ChunkId) -> anyhow::Result<Chunk>;
pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<Answer>; // p4-3 (stub remains)
```
p3-5 wires every non-LLM facade function (all except `ask`) in one pass. After that, `cargo run -p kb-cli -- index` and `cargo run -p kb-cli -- search --mode {lexical,vector,hybrid}` work for real.
p3-5 wires every non-LLM facade function (all except `ask`) in one pass. After that, `cargo run -p kebab-cli -- index` and `cargo run -p kebab-cli -- search --mode {lexical,vector,hybrid}` work for real.
## CLI
```text
kb index --embeddings
kb search --mode vector "비슷한 설계 원칙"
kb search --mode hybrid "Markdown chunking 규칙"
kebab index --embeddings
kebab search --mode vector "비슷한 설계 원칙"
kebab search --mode hybrid "Markdown chunking 규칙"
```
## Tests
@@ -137,19 +137,19 @@ kb search --mode hybrid "Markdown chunking 규칙"
## Dependency boundaries
- Only `kb-embed-local` depends on ONNX/model bindings. Other crates use only the trait.
- Only `kb-store-vector` depends on `lancedb`. No cross-writes with SQLite (each store keeps its own responsibility).
- Only `kebab-embed-local` depends on ONNX/model bindings. Other crates use only the trait.
- Only `kebab-store-vector` depends on `lancedb`. No cross-writes with SQLite (each store keeps its own responsibility).
- Kept separate from the LLM crates (§11.1).
## Completion criteria
- [ ] `kb index` (= `kb-app::ingest`) stores every chunk in both SQLite and LanceDB (p3-5)
- [ ] `kb search --mode vector` returns valid hits
- [ ] `kb search --mode hybrid` returns valid hits, with citations
- [ ] `kebab index` (= `kebab-app::ingest`) stores every chunk in both SQLite and LanceDB (p3-5)
- [ ] `kebab search --mode vector` returns valid hits
- [ ] `kebab search --mode hybrid` returns valid hits, with citations
- [ ] A model/dimension change is stored in a separate table
- [ ] On resume, only unfinished chunks are processed (deferred to P+)
- [ ] Results are structured so that hit@k can be measured (P5 preparation)
- [ ] `kb-app::{ingest,search,list_docs,inspect_doc,inspect_chunk}` work for real (`ask` stays a stub until P4-3); p3-5
- [ ] `kebab-app::{ingest,search,list_docs,inspect_doc,inspect_chunk}` work for real (`ask` stays a stub until P4-3); p3-5
## Risks / cautions


@@ -3,22 +3,22 @@ phase: P4
title: "Local LLM + RAG + grounded answer"
status: completed
depends_on: [P3]
source: kb_local_rust_report.md §11, §15.2, §17 Phase 4
source: kebab_local_rust_report.md §11, §15.2, §17 Phase 4
---
# P4 — Local LLM + RAG + grounded answer
## Goal
Generate citation-backed answers with a local LLM. Refuse when the evidence is insufficient. `kb ask "..."` works.
Generate citation-backed answers with a local LLM. Refuse when the evidence is insufficient. `kebab ask "..."` works.
## Output crates
| crate | Role |
|-------|------|
| `kb-llm` | `LanguageModel` trait + request/response types |
| `kb-llm-local` | Ollama adapter first. Later: llama.cpp, candle |
| `kb-rag` | retrieval → context packing → prompt → generate → citation validation |
| `kebab-llm` | `LanguageModel` trait + request/response types |
| `kebab-llm-local` | Ollama adapter first. Later: llama.cpp, candle |
| `kebab-rag` | retrieval → context packing → prompt → generate → citation validation |
## LanguageModel
@@ -50,9 +50,9 @@ pub struct GenerateResponse {
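- Calls HTTP on localhost (`http://127.0.0.1:11434/api/generate`).
- An async runtime may be used internally. The external API keeps a synchronous wrapper.
- The default model comes from config (e.g. `qwen2.5:14b-instruct`). The actual choice is made after the P5 eval.
- When the server is not running: a clear error message + `kb doctor` diagnostics.
- When the server is not running: a clear error message + `kebab doctor` diagnostics.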
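
One way the "server not running" diagnosis could work is a plain TCP reachability probe against the Ollama address before any HTTP request is made. This is a sketch under assumptions: `check_ollama` is a hypothetical helper, and the spec does not say how `doctor` actually performs the check.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Checks that something is listening at `addr` (e.g. "127.0.0.1:11434").
/// Returns a human-readable message on failure, suitable for a doctor-style report.
pub fn check_ollama(addr: &str, timeout: Duration) -> Result<(), String> {
    let sock: SocketAddr = addr
        .parse()
        .map_err(|e| format!("invalid address {addr}: {e}"))?;
    // `connect_timeout` fails fast instead of hanging when nothing is listening.
    TcpStream::connect_timeout(&sock, timeout)
        .map(|_| ())
        .map_err(|e| format!("Ollama not reachable at {addr}: {e}. Is `ollama serve` running?"))
}
```

A successful connect only proves that the port is open. Verifying that the configured model has been downloaded would still need a real API call, which this sketch leaves out.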
## kb-rag pipeline
## kebab-rag pipeline
```text
query
@@ -118,7 +118,7 @@ pub struct Answer {
Stored in the `answers` table (for reproducibility/auditing), together with the list of chunk_ids used and the retrieval params.
## kb-app facade extension
## kebab-app facade extension
```rust
pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<Answer>;
@@ -127,9 +127,9 @@ pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<Answer>;
## CLI
```text
kb ask "내 KB 설계에서 저장소 전략은?"
kb ask --k 8 --temperature 0 "..."
kb ask --explain "..." # prints retrieval trace + packed prompt
kebab ask "내 KB 설계에서 저장소 전략은?"
kebab ask --k 8 --temperature 0 "..."
kebab ask --explain "..." # prints retrieval trace + packed prompt
```
## Tests
@@ -142,13 +142,13 @@ kb ask --explain "..." # prints retrieval trace + packed prompt
## Dependency boundaries
- Only `kb-llm-local` depends on Ollama HTTP.
- `kb-rag` uses only `kb-search` (Retriever trait) + `kb-llm` (LanguageModel trait). No direct SQLite/LanceDB calls.
- The CLI calls only `kb-app::ask`.
- Only `kebab-llm-local` depends on Ollama HTTP.
- `kebab-rag` uses only `kebab-search` (Retriever trait) + `kebab-llm` (LanguageModel trait). No direct SQLite/LanceDB calls.
- The CLI calls only `kebab-app::ask`.
## Completion criteria
- [ ] `kb ask "..."` works
- [ ] `kebab ask "..."` works
- [ ] Answers include citations
- [ ] Questions without supporting evidence are refused
- [ ] The retrieval trace can be inspected with `--explain`
@@ -158,6 +158,6 @@ kb ask --explain "..." # prints retrieval trace + packed prompt
## Risks / cautions
- The model choice is finalized after evaluation on the P5 golden set. P4 only sets a default.
- Ollama not running / model not downloaded → `kb doctor` gives clear guidance.
- Ollama not running / model not downloaded → `kebab doctor` gives clear guidance.
- LLM answers often contain hallucinated citations. Post-hoc validation is key.
- Any prompt template change must bump `prompt_template_version`.
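
The post-hoc validation mentioned in the risk list can be sketched as a set-membership check: every chunk id the answer cites must be one of the chunks that were actually packed into the prompt. `CitationCheck`, `check_citations`, and `is_grounded` are hypothetical names, and the grounding policy shown is an assumption rather than the project's rule.

```rust
use std::collections::HashSet;

/// Result of checking the citations an LLM answer claims to rely on.
#[derive(Debug, PartialEq, Eq)]
pub struct CitationCheck {
    /// Cited ids that were really part of the packed context.
    pub valid: Vec<String>,
    /// Cited ids that were never shown to the model.
    pub hallucinated: Vec<String>,
}

/// Splits cited chunk ids into valid and hallucinated ones.
pub fn check_citations(cited: &[String], context_chunk_ids: &HashSet<String>) -> CitationCheck {
    let (valid, hallucinated): (Vec<String>, Vec<String>) = cited
        .iter()
        .cloned()
        .partition(|id| context_chunk_ids.contains(id));
    CitationCheck { valid, hallucinated }
}

/// Assumed policy: grounded means at least one valid citation and no invented ones.
pub fn is_grounded(check: &CitationCheck) -> bool {
    !check.valid.is_empty() && check.hallucinated.is_empty()
}
```

An answer that fails `is_grounded` could be turned into a refusal, or have the invented citations stripped. Which of the two applies is a product decision this sketch does not make.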


@@ -3,7 +3,7 @@ phase: P5
title: "Golden query / regression eval"
status: completed
depends_on: [P4]
source: kb_local_rust_report.md §17 Phase 5, §18
source: kebab_local_rust_report.md §17 Phase 5, §18
---
# P5 — Golden query / regression eval
@@ -14,7 +14,7 @@ source: kb_local_rust_report.md §17 Phase 5, §18
## Output crates
- `kb-eval`: golden query runner, metric computation, report generation.
- `kebab-eval`: golden query runner, metric computation, report generation.
## Golden set fixture
@@ -25,9 +25,9 @@ source: kb_local_rust_report.md §17 Phase 5, §18
query: "Markdown chunking 규칙"
lang: ko
expected_doc_ids:
- doc:notes/rust/kb-architecture.md
- doc:notes/rust/kebab-architecture.md
expected_chunk_ids:
- chunk:notes/rust/kb-architecture.md#chunking-policy
- chunk:notes/rust/kebab-architecture.md#chunking-policy
must_contain:
- "heading"
- "code block"
@@ -57,9 +57,9 @@ source: kb_local_rust_report.md §17 Phase 5, §18
## Run modes
```text
kb eval run --suite golden [--mode {lexical,vector,hybrid}] [--with-rag]
kb eval compare <run_id_a> <run_id_b>
kb eval report <run_id> --format {json,md,html}
kebab eval run --suite golden [--mode {lexical,vector,hybrid}] [--with-rag]
kebab eval compare <run_id_a> <run_id_b>
kebab eval report <run_id> --format {json,md,html}
```
run record:
@@ -90,7 +90,7 @@ DB 저장 (`eval_runs`, `eval_query_results` table) 또는 JSON 파일. 재현
- Automatic hyperparameter search: not done.
- LLM judge ("LLM as a judge"): out of scope for P5. Groundedness is rule-based (`must_contain`) only.
## kb-app facade extension
## kebab-app facade extension
```rust
pub fn eval_run(opts: EvalRunOpts) -> anyhow::Result<EvalRun>;
@@ -105,13 +105,13 @@ pub fn eval_compare(a: &str, b: &str) -> anyhow::Result<CompareReport>;
## Dependency boundaries
- `kb-eval` calls only `kb-app` (search/ask go through the facade). No direct calls into the internal stores/LLM.
- `kebab-eval` calls only `kebab-app` (search/ask go through the facade). No direct calls into the internal stores/LLM.
## Completion criteria
- [ ] `fixtures/golden_queries.yaml` has 30+ entries
- [ ] `kb eval run` produces hit@k, MRR, citation_coverage
- [ ] `kb eval compare` can compare two runs
- [ ] `kebab eval run` produces hit@k, MRR, citation_coverage
- [ ] `kebab eval compare` can compare two runs
- [ ] A config snapshot is saved with each run (chunker, embedding, llm, prompt versions)
- [ ] Regressions are detectable in CI (e.g. fail when hit@5 drops 3% or more against the baseline)
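
The retrieval metrics and the CI gate in the criteria above can be sketched as follows. All function names are hypothetical, and reading the "3%" threshold as an absolute drop of 0.03 in the metric (rather than a relative one) is an assumption.

```rust
/// hit@k: true if any expected id appears among the top `k` ranked ids.
pub fn hit_at_k(ranked: &[&str], expected: &[&str], k: usize) -> bool {
    ranked.iter().take(k).any(|id| expected.contains(id))
}

/// Reciprocal rank of the first expected id (1-based); 0.0 if none is found.
pub fn reciprocal_rank(ranked: &[&str], expected: &[&str]) -> f64 {
    ranked
        .iter()
        .position(|id| expected.contains(id))
        .map_or(0.0, |i| 1.0 / (i as f64 + 1.0))
}

/// MRR: mean of the per-query reciprocal ranks; 0.0 for an empty suite.
pub fn mean_reciprocal_rank(per_query: &[f64]) -> f64 {
    if per_query.is_empty() {
        0.0
    } else {
        per_query.iter().sum::<f64>() / per_query.len() as f64
    }
}

/// CI gate. Assumption: `max_drop` is an absolute difference, e.g. 0.03.
pub fn is_regression(baseline: f64, current: f64, max_drop: f64) -> bool {
    baseline - current >= max_drop
}
```

Averaging `hit_at_k` over the golden queries gives the suite-level hit@k that `is_regression` would compare between two runs.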


@@ -3,7 +3,7 @@ phase: P6
title: "Image ingestion (OCR + caption)"
status: planned
depends_on: [P5]
source: kb_local_rust_report.md §9.1, §17 Phase 6
source: kebab_local_rust_report.md §9.1, §17 Phase 6
---
# P6: Image ingestion
@@ -14,8 +14,8 @@ source: kb_local_rust_report.md §9.1, §17 Phase 6
## Output crates
- `kb-parse-image`: `Extractor` implementation. Image → CanonicalDocument.
- (Optional) `kb-ocr` / `kb-vlm` adapters (if external models are split out).
- `kebab-parse-image`: `Extractor` implementation. Image → CanonicalDocument.
- (Optional) `kebab-ocr` / `kebab-vlm` adapters (if external models are split out).
## Three kinds of extracted information (§9.1)
@@ -83,10 +83,10 @@ photos/diagram-2026.png#caption # caption chunk
## CLI
```text
kb ingest ./assets/diagram.png
kb ingest ./assets/ # images inside the folder are detected automatically
kb search "이미지 안의 OCR 텍스트"
kb inspect doc <image_doc_id> # shows OCR/caption/EXIF
kebab ingest ./assets/diagram.png
kebab ingest ./assets/ # images inside the folder are detected automatically
kebab search "이미지 안의 OCR 텍스트"
kebab inspect doc <image_doc_id> # shows OCR/caption/EXIF
```
## Tests
@@ -99,13 +99,13 @@ kb inspect doc <image_doc_id> # shows OCR/caption/EXIF
## Dependency boundaries
- `kb-parse-image` uses only `kb-core` + image decoding (`image` crate) + the OCR adapter.
- `kebab-parse-image` uses only `kebab-core` + image decoding (`image` crate) + the OCR adapter.
- No LLM/embedding calls (captions go through a separate adapter).
- VLM captioning is a background job. It must not block ingest.
## Completion criteria
- [ ] `kb ingest <image>` works
- [ ] `kebab ingest <image>` works
- [ ] OCR text is searchable
- [ ] OCR region citations are output
- [ ] Provenance of captions and of observed text is kept separate


@@ -3,7 +3,7 @@ phase: P7
title: "PDF text extraction + page citation"
status: planned
depends_on: [P5]
source: kb_local_rust_report.md §9.2, §17 Phase 7
source: kebab_local_rust_report.md §9.2, §17 Phase 7
---
# P7 — PDF ingestion
@@ -14,7 +14,7 @@ text PDF 추출 → page-aware chunking → citation `paper.pdf:p13`. scanned PD
## Output crates
- `kb-parse-pdf`: `Extractor` implementation.
- `kebab-parse-pdf`: `Extractor` implementation.
## Stage separation (§9.2)
@@ -63,10 +63,10 @@ paper.pdf:p13:span=0-1240 # char range within page
## CLI
```text
kb ingest ./paper.pdf
kb ingest ./papers/
kb search "PDF 안의 특정 개념"
kb inspect doc <pdf_doc_id>
kebab ingest ./paper.pdf
kebab ingest ./papers/
kebab search "PDF 안의 특정 개념"
kebab inspect doc <pdf_doc_id>
```
## Tests
@@ -79,12 +79,12 @@ kb inspect doc <pdf_doc_id>
## Dependency boundaries
- `kb-parse-pdf` uses only `kb-core` + `pdf-extract` / `lopdf`.
- `kebab-parse-pdf` uses only `kebab-core` + `pdf-extract` / `lopdf`.
- OCR calls go through a separate adapter (reusing the P6 OCR infrastructure).
## Completion criteria
- [ ] `kb ingest <pdf>` works
- [ ] `kebab ingest <pdf>` works
- [ ] Page-level chunks + citations
- [ ] Search results include `paper.pdf:p<n>`
- [ ] Provenance warning for pages where extraction failed


@@ -3,7 +3,7 @@ phase: P8
title: "Audio transcription + timestamp citation"
status: planned
depends_on: [P5]
source: kb_local_rust_report.md §9.3, §17 Phase 8
source: kebab_local_rust_report.md §9.3, §17 Phase 8
---
# P8: Audio ingestion
@@ -14,8 +14,8 @@ audio 파일 → transcript (timestamped segment) → CanonicalDocument → 동
## Output crates
- `kb-parse-audio`: `Extractor` implementation.
- `kb-asr-whisper` (or a module inside `kb-parse-audio`): whisper.cpp adapter.
- `kebab-parse-audio`: `Extractor` implementation.
- `kebab-asr-whisper` (or a module inside `kebab-parse-audio`): whisper.cpp adapter.
## Pipeline (§9.3)
@@ -94,11 +94,11 @@ meeting-2026-04-27.m4a:00:13:42-00:14:10:speaker=S1
## CLI
```text
kb ingest ./meeting.m4a
kb ingest ./recordings/
kb search "회의에서 언급한 결정사항"
kb inspect doc <audio_doc_id> # shows transcript + segment timestamps
kb play <chunk_id> # (optional) play that segment; lower priority
kebab ingest ./meeting.m4a
kebab ingest ./recordings/
kebab search "회의에서 언급한 결정사항"
kebab inspect doc <audio_doc_id> # shows transcript + segment timestamps
kebab play <chunk_id> # (optional) play that segment; lower priority
```
## Tests
@@ -111,12 +111,12 @@ kb play <chunk_id> # (optional) play that segment; lower priority
## Dependency boundaries
- `kb-parse-audio` uses only `kb-core` + the `Transcriber` adapter.
- `kebab-parse-audio` uses only `kebab-core` + the `Transcriber` adapter.
- No LLM calls. The RAG stage runs the same pipeline over the transcript text.
## Completion criteria
- [ ] `kb ingest <audio>` works
- [ ] `kebab ingest <audio>` works
- [ ] The transcript is stored together with segment timestamps
- [ ] Search results include a `00:hh:mm:ss-` citation
- [ ] Re-ingesting the same audio is idempotent
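
The timestamp citation shown in the hunk context (`meeting-2026-04-27.m4a:00:13:42-00:14:10:speaker=S1`) can be produced from segment offsets in seconds. A minimal sketch with hypothetical helper names; the real segment type may carry sub-second precision that this ignores.

```rust
/// Formats a whole-second offset as zero-padded `HH:MM:SS`.
pub fn hms(total_secs: u64) -> String {
    format!(
        "{:02}:{:02}:{:02}",
        total_secs / 3600,
        (total_secs % 3600) / 60,
        total_secs % 60
    )
}

/// Builds `path:HH:MM:SS-HH:MM:SS[:speaker=<label>]` for one transcript segment.
pub fn audio_citation(path: &str, start_secs: u64, end_secs: u64, speaker: Option<&str>) -> String {
    let mut out = format!("{}:{}-{}", path, hms(start_secs), hms(end_secs));
    // The speaker suffix is only appended when a label is available.
    if let Some(s) = speaker {
        out.push_str(":speaker=");
        out.push_str(s);
    }
    out
}
```

For a segment from 822 s to 850 s spoken by `S1`, `audio_citation` returns exactly the example string from the spec.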


@@ -3,7 +3,7 @@ phase: P9
title: "TUI + desktop app"
status: planned
depends_on: [P5]
source: kb_local_rust_report.md §16, §17 Phase 9
source: kebab_local_rust_report.md §16, §17 Phase 9
---
# P9 — TUI + desktop app
@@ -15,18 +15,18 @@ CLI 위에 사용성 레이어 추가. domain/검색/RAG 가 안정된 뒤 마
## Order
```text
kb-tui (first) → kb-desktop (later)
kebab-tui (first) → kebab-desktop (later)
```
Reason: the TUI can adapt quickly to domain changes. The desktop app carries high packaging/distribution costs.
---
## P9.A — kb-tui (Ratatui)
## P9.A — kebab-tui (Ratatui)
### Output crates
- `kb-tui`: Ratatui + crossterm based terminal UI.
- `kebab-tui`: Ratatui + crossterm based terminal UI.
### Screen layout
@@ -51,8 +51,8 @@ q : quit
### Dependency boundaries
- `kb-tui` uses only `kb-app`. No direct parser/store/LLM calls.
- Async I/O (search/ask) goes through a `kb-app` async wrapper or thread + channel.
- `kebab-tui` uses only `kebab-app`. No direct parser/store/LLM calls.
- Async I/O (search/ask) goes through a `kebab-app` async wrapper or thread + channel.
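
The "thread + channel" option can be sketched with the standard library alone. `spawn_search` is a hypothetical helper, and the closure stands in for whatever blocking facade call the TUI would make; none of this is taken from the actual TUI crate.

```rust
use std::sync::mpsc;
use std::thread;

/// Runs a blocking search on a worker thread and returns a receiver for its result,
/// so the UI loop can keep drawing while the search is in flight.
pub fn spawn_search<F>(query: String, search: F) -> mpsc::Receiver<Result<Vec<String>, String>>
where
    F: FnOnce(&str) -> Result<Vec<String>, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // A send error only means the UI dropped the receiver; nothing to do then.
        let _ = tx.send(search(&query));
    });
    rx
}
```

Inside the draw loop the UI would poll with `rx.try_recv()` on each tick instead of blocking on `rx.recv()`, which keeps key handling responsive during a slow search.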
### Completion criteria
@@ -63,7 +63,7 @@ q : quit
---
## P9.B — kb-desktop
## P9.B — kebab-desktop
### Candidate comparison
@@ -72,14 +72,14 @@ q : quit
| Tauri | Rust backend + web frontend. Native webview, small binary | Web frontend needs a separate stack (TS/JS) |
| egui/eframe | Pure Rust, immediate-mode | Limited design freedom/accessibility |
Recommendation: Tauri first. Expose the existing `kb-app` facade as the backend as-is. Keep the frontend light (svelte/solid/vanilla).
Recommendation: Tauri first. Expose the existing `kebab-app` facade as the backend as-is. Keep the frontend light (svelte/solid/vanilla).
### Output crates / structure
- `kb-desktop` (Tauri app crate)
- `kb-desktop-frontend/` (web assets)
- `kebab-desktop` (Tauri app crate)
- `kebab-desktop-frontend/` (web assets)
Tauri commands wrap `kb-app` functions 1:1. No new business logic may be added.
Tauri commands wrap `kebab-app` functions 1:1. No new business logic may be added.
### Screen layout (first pass)
@@ -100,7 +100,7 @@ Tauri command 는 `kb-app` 함수 1:1 wrap. 신규 비즈니스 로직 추가
### Dependency boundaries
- The frontend calls only Tauri commands (= `kb-app` wrappers). No direct SQLite/LanceDB access.
- The frontend calls only Tauri commands (= `kebab-app` wrappers). No direct SQLite/LanceDB access.
- Model download/execution is the backend's responsibility.
### Completion criteria