docs(rename): kb → kebab — README, tasks/, docs/, design doc, report

Final commit. Bulk-updates the word `kb` in every .md file.

- 19 crate names (`kb-core`, `kb-app`, …) → `kebab-*` (including the
  Rust module path notation `kb_*` → `kebab_*`).
- Future components (`kb-tui`, `kb-desktop`, `kb-asr-whisper`, `kb-ocr`,
  `kb-mcp`, `kb-vlm`, `kb-rerank`, `kb-vision-ocr`, `kb-index`,
  `kb-smoke`, `kb-architecture`) → `kebab-*` (the same prefix will be
  used when P6+ starts).
- CLI command examples: `kb ingest` / `kb search` / `kb ask` / `kb init` /
  `kb doctor` / `kb inspect` / `kb list` / `kb eval` →
  `kebab <verb>`. Both fenced code blocks and inline backticks.
- XDG paths + env vars + binary path (`target/release/kb` →
  `target/release/kebab`) synced.
- All references unified across the design doc / initial report / SMOKE /
  HOTFIXES / phase epics / task specs.
- The `git -c user.name=kb` in task-decomposition.md is author
  information recorded for past git history, so it is kept as-is (the
  author in the actual git history cannot be changed).
- `status: planned` → `completed` in `tasks/phase-5-evaluation.md` as
  well (left unreflected after the P5-1 + P5-2 PRs were merged).

## Verification

- `grep -rEn "\bkb-[a-z]|\bkb_[a-z]|\.config/kb\b|kb\.sqlite|\bKB_[A-Z]"
   --include="*.md"` 0 hits (excluding the git author in
  task-decomposition.md).
- All file path references resolve (every renamed file is updated to its
  new path).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:01:55 +00:00
parent f1a448d6dc
commit f9714aa5cb
56 changed files with 1324 additions and 1324 deletions

README.md

@@ -1,8 +1,8 @@
-# kb — Local-first Knowledge Base
+# kebab — Local-first Knowledge Base
-> **Status:** P0~P4 implemented (17 of 31 component tasks done) + 3 post-merge hotfixes applied. `kb index` / `kb search --mode {lexical,vector,hybrid}` / `kb ask` all actually work. Next step: P5 (eval suite). See [tasks/INDEX.md](tasks/INDEX.md) for detailed progress and [tasks/HOTFIXES.md](tasks/HOTFIXES.md) for bugs found after merge and their fixes.
+> **Status:** P0~P4 implemented (17 of 31 component tasks done) + 3 post-merge hotfixes applied. `kebab index` / `kebab search --mode {lexical,vector,hybrid}` / `kebab ask` all actually work. Next step: P5 (eval suite). See [tasks/INDEX.md](tasks/INDEX.md) for detailed progress and [tasks/HOTFIXES.md](tasks/HOTFIXES.md) for bugs found after merge and their fixes.
-`kb` is a personal, local knowledge base + RAG tool. It indexes Markdown / PDF / images / audio in one place and provides semantic search + LLM answers with citations as a single binary. All inference runs locally (Ollama / fastembed / whisper.cpp).
+`kebab` is a personal, local knowledge base + RAG tool. It indexes Markdown / PDF / images / audio in one place and provides semantic search + LLM answers with citations as a single binary. All inference runs locally (Ollama / fastembed / whisper.cpp).
 Target hardware: one M4 48GB MacBook, one user.
@@ -12,14 +12,14 @@
 | Command | Behavior | Status |
 |------|------|------|
-| `kb init` | Creates the data directory + config.toml under the XDG paths | ✅ P0 |
-| `kb ingest [<path>]` | Indexes Markdown (idempotent). PDF/images/audio are P6+. | ✅ P3-5 |
-| `kb search --mode {lexical,vector,hybrid} "<query>"` | Search with citations; hybrid uses RRF fusion | ✅ P3-5 |
-| `kb list docs` | Lists indexed documents | ✅ P3-5 |
-| `kb inspect doc <id>` / `kb inspect chunk <id>` | Shows the raw record | ✅ P3-5 |
-| `kb ask "<query>"` | RAG answer + cited evidence. Refuses when evidence is insufficient. Requires Ollama. | ✅ P4-3 |
-| `kb doctor` | Health check for config/models/DB | ✅ P0 |
-| `kb eval run / compare` | Golden query regression measurement | ⏳ P5 |
+| `kebab init` | Creates the data directory + config.toml under the XDG paths | ✅ P0 |
+| `kebab ingest [<path>]` | Indexes Markdown (idempotent). PDF/images/audio are P6+. | ✅ P3-5 |
+| `kebab search --mode {lexical,vector,hybrid} "<query>"` | Search with citations; hybrid uses RRF fusion | ✅ P3-5 |
+| `kebab list docs` | Lists indexed documents | ✅ P3-5 |
+| `kebab inspect doc <id>` / `kebab inspect chunk <id>` | Shows the raw record | ✅ P3-5 |
+| `kebab ask "<query>"` | RAG answer + cited evidence. Refuses when evidence is insufficient. Requires Ollama. | ✅ P4-3 |
+| `kebab doctor` | Health check for config/models/DB | ✅ P0 |
+| `kebab eval run / compare` | Golden query regression measurement | ⏳ P5 |
 Machine-friendly mode: every command takes a `--json` flag. Output follows the frozen wire schema v1 (the `schema_version` field is always included, e.g. `ingest_report.v1`, `search_hit.v1`, `answer.v1`, `doctor.v1`).
@@ -44,35 +44,35 @@
 | citation format | URI fragment (`path#L12-L34`, W3C Media Fragments) |
 | ID generation | `blake3(canonical_json(tuple))[..32]` hex |
 | RRF fusion_score | normalized to `[0, 1]`: divided by `2 / (k_rrf + 1)` so scores are comparable across modes (post-merge hotfix) |
-| layout | XDG (`~/.local/share/kb/`, `~/.config/kb/`, …) |
+| layout | XDG (`~/.local/share/kebab/`, `~/.config/kebab/`, …) |
-See [docs/superpowers/specs/2026-04-27-kb-final-form-design.md](docs/superpowers/specs/2026-04-27-kb-final-form-design.md) for the full design.
+See [docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](docs/superpowers/specs/2026-04-27-kebab-final-form-design.md) for the full design.
 ---
 ## Dependency graph
 ```text
-kb-cli, kb-tui, kb-desktop
-  └─> kb-app
-        ├─> kb-source-fs
-        ├─> kb-parse-md / kb-parse-pdf / kb-parse-image / kb-parse-audio
-        │     └─> kb-parse-types
-        ├─> kb-normalize
-        │     └─> kb-parse-types
-        ├─> kb-chunk
-        ├─> kb-store-sqlite
-        ├─> kb-store-vector
-        ├─> kb-embed-local (kb-embed trait crate)
-        ├─> kb-search
-        ├─> kb-llm-local (kb-llm trait crate)
-        ├─> kb-rag
-        ├─> kb-eval
-        └─> kb-config
-              └─> kb-core (everything depends on it)
+kebab-cli, kebab-tui, kebab-desktop
+  └─> kebab-app
+        ├─> kebab-source-fs
+        ├─> kebab-parse-md / kebab-parse-pdf / kebab-parse-image / kebab-parse-audio
+        │     └─> kebab-parse-types
+        ├─> kebab-normalize
+        │     └─> kebab-parse-types
+        ├─> kebab-chunk
+        ├─> kebab-store-sqlite
+        ├─> kebab-store-vector
+        ├─> kebab-embed-local (kebab-embed trait crate)
+        ├─> kebab-search
+        ├─> kebab-llm-local (kebab-llm trait crate)
+        ├─> kebab-rag
+        ├─> kebab-eval
+        └─> kebab-config
+              └─> kebab-core (everything depends on it)
 ```
-UI must not depend directly on store/llm/parse. Every user-facing entry point goes only through the `kb-app` facade (design §8). For `kb-cli` to honor the `--config <path>` flag, the pattern is to thread the Config explicitly through the `kb_app::*_with_config(cfg, …)` companions; for the detailed rationale see the `--config` entry in [tasks/HOTFIXES.md](tasks/HOTFIXES.md).
+UI must not depend directly on store/llm/parse. Every user-facing entry point goes only through the `kebab-app` facade (design §8). For `kebab-cli` to honor the `--config <path>` flag, the pattern is to thread the Config explicitly through the `kebab_app::*_with_config(cfg, …)` companions; for the detailed rationale see the `--config` entry in [tasks/HOTFIXES.md](tasks/HOTFIXES.md).
 ---
--- ---
@@ -80,16 +80,16 @@ UI → store/llm/parse 직접 의존 금지. 모든 user-facing 진입은 `kb-ap
 | Phase | Content | Key output crates | Depends on | Status |
 |-------|------|----------------|------|------|
-| **P0** | Workspace skeleton + domain contracts + ID recipe | `kb-core`, `kb-parse-types`, `kb-config`, `kb-app`, `kb-cli` | | ✅ Done |
-| **P1** | Markdown ingestion (walk → parse → chunk → SQLite) | `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-sqlite` | P0 | ✅ Done |
-| **P2** | SQLite FTS5 lexical search + citation | `kb-search` (lexical) | P1 | ✅ Done |
-| **P3** | Local embedding + LanceDB + hybrid (RRF) + kb-app wiring | `kb-embed`, `kb-embed-local`, `kb-store-vector`, `kb-search` | P2 | ✅ Done |
-| **P4** | Local LLM + RAG + grounded answer | `kb-llm`, `kb-llm-local`, `kb-rag` | P3 | ✅ Done |
-| **P5** | Golden query / regression eval | `kb-eval` | P4 | ⏳ Next |
-| **P6** | Image ingestion (OCR + caption) | `kb-parse-image` | P5 | ⏳ |
-| **P7** | PDF text + page citation | `kb-parse-pdf` | P5 | ⏳ |
-| **P8** | Audio transcription + timestamp citation | `kb-parse-audio` | P5 | ⏳ |
-| **P9** | TUI + desktop app | `kb-tui`, `kb-desktop` | P5 | ⏳ |
+| **P0** | Workspace skeleton + domain contracts + ID recipe | `kebab-core`, `kebab-parse-types`, `kebab-config`, `kebab-app`, `kebab-cli` | | ✅ Done |
+| **P1** | Markdown ingestion (walk → parse → chunk → SQLite) | `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-sqlite` | P0 | ✅ Done |
+| **P2** | SQLite FTS5 lexical search + citation | `kebab-search` (lexical) | P1 | ✅ Done |
+| **P3** | Local embedding + LanceDB + hybrid (RRF) + kebab-app wiring | `kebab-embed`, `kebab-embed-local`, `kebab-store-vector`, `kebab-search` | P2 | ✅ Done |
+| **P4** | Local LLM + RAG + grounded answer | `kebab-llm`, `kebab-llm-local`, `kebab-rag` | P3 | ✅ Done |
+| **P5** | Golden query / regression eval | `kebab-eval` | P4 | ⏳ Next |
+| **P6** | Image ingestion (OCR + caption) | `kebab-parse-image` | P5 | ⏳ |
+| **P7** | PDF text + page citation | `kebab-parse-pdf` | P5 | ⏳ |
+| **P8** | Audio transcription + timestamp citation | `kebab-parse-audio` | P5 | ⏳ |
+| **P9** | TUI + desktop app | `kebab-tui`, `kebab-desktop` | P5 | ⏳ |
 P0~P5 are sequential. P6~P9 can run in parallel after P5.
@@ -100,13 +100,13 @@ P0~P5 직렬. P6~P9 P5 이후 병렬 가능.
 ## Directory layout
 ```text
-kb/
+kebab/
 ├── README.md # this file
-├── kb_local_rust_report.md # initial design report (direction + rationale)
+├── kebab_local_rust_report.md # initial design report (direction + rationale)
 ├── docs/
 │   ├── superpowers/
 │   │   ├── specs/
-│   │   │   └── 2026-04-27-kb-final-form-design.md # frozen design (12 sections)
+│   │   │   └── 2026-04-27-kebab-final-form-design.md # frozen design (12 sections)
 │   │   └── plans/
 │   │       └── 2026-04-27-task-decomposition.md # task decomposition implementation plan
 │   ├── SMOKE.md # procedure for running it directly against a local workspace
@@ -127,19 +127,19 @@ kb/
 │   ├── p8/p8-1, p8-2 # (2)
 │   └── p9/p9-1 … p9-5 # (5)
 ├── crates/
-│   ├── kb-core/ kb-parse-types/ kb-config/ # domain + config (P0)
-│   ├── kb-source-fs/ # workspace walk + checksum (P1-1)
-│   ├── kb-parse-md/ # Markdown frontmatter + blocks (P1-2/3)
-│   ├── kb-normalize/ # ParsedBlock → CanonicalDocument (P1-4)
-│   ├── kb-chunk/ # heading-aware chunker (P1-5)
-│   ├── kb-store-sqlite/ # SQLite + FTS5 (V001/V002/V003) (P1-6, P2-1, P3-3)
-│   ├── kb-search/ # Lexical + Vector + Hybrid retriever (P2-2, P3-4)
-│   ├── kb-embed/ kb-embed-local/ # Embedder trait + fastembed adapter (P3-1, P3-2)
-│   ├── kb-store-vector/ # LanceDB VectorStore (P3-3)
-│   ├── kb-llm/ kb-llm-local/ # LanguageModel trait + Ollama adapter (P4-1, P4-2)
-│   ├── kb-rag/ # RAG pipeline (P4-3)
-│   ├── kb-app/ # facade (P0 signatures + P3-5 body)
-│   └── kb-cli/ # binary (P0 → --config flag wiring hardened via hotfix)
+│   ├── kebab-core/ kebab-parse-types/ kebab-config/ # domain + config (P0)
+│   ├── kebab-source-fs/ # workspace walk + checksum (P1-1)
+│   ├── kebab-parse-md/ # Markdown frontmatter + blocks (P1-2/3)
+│   ├── kebab-normalize/ # ParsedBlock → CanonicalDocument (P1-4)
+│   ├── kebab-chunk/ # heading-aware chunker (P1-5)
+│   ├── kebab-store-sqlite/ # SQLite + FTS5 (V001/V002/V003) (P1-6, P2-1, P3-3)
+│   ├── kebab-search/ # Lexical + Vector + Hybrid retriever (P2-2, P3-4)
+│   ├── kebab-embed/ kebab-embed-local/ # Embedder trait + fastembed adapter (P3-1, P3-2)
+│   ├── kebab-store-vector/ # LanceDB VectorStore (P3-3)
+│   ├── kebab-llm/ kebab-llm-local/ # LanguageModel trait + Ollama adapter (P4-1, P4-2)
+│   ├── kebab-rag/ # RAG pipeline (P4-3)
+│   ├── kebab-app/ # facade (P0 signatures + P3-5 body)
+│   └── kebab-cli/ # binary (P0 → --config flag wiring hardened via hotfix)
 ├── migrations/ # SQLite refinery V001/V002/V003
 └── fixtures/ # test fixture tree
 ```
@@ -153,19 +153,19 @@ kb/
 cargo build --release
 # first run: creates config.toml under the XDG path
-./target/release/kb init
+./target/release/kebab init
 # tweak the config
-${EDITOR:-vi} ~/.config/kb/config.toml
+${EDITOR:-vi} ~/.config/kebab/config.toml
 # index
-./target/release/kb ingest
+./target/release/kebab ingest
 # search
-./target/release/kb search "Markdown chunking rules" --mode hybrid
+./target/release/kebab search "Markdown chunking rules" --mode hybrid
 # ask (requires Ollama)
-./target/release/kb ask "What is the storage strategy in my KB design?"
+./target/release/kebab ask "What is the storage strategy in my KB design?"
 ```
 For the pattern of running it directly in an isolated workspace, see [docs/SMOKE.md](docs/SMOKE.md): `--config <path>` lets you create an isolated KB in a temporary directory.
@@ -181,17 +181,17 @@ ${EDITOR:-vi} ~/.config/kb/config.toml
 - multi-workspace (P+, lower priority)
 - LLM-as-judge eval (rule-based `must_contain` only)
 - visual embedding (CLIP): P+
-- desktop app `kb://` protocol handler: P+
+- desktop app `kebab://` protocol handler: P+
 ---
 ## External AI integration
-The `--json` mode of `kb` + the frozen wire schema v1 form the stable contract for external automation. Possible integrations:
+The `--json` mode of `kebab` + the frozen wire schema v1 form the stable contract for external automation. Possible integrations:
-1. **Claude Code / Codex skill**: a thin wrapper (calls `kb search --json` / `kb ask --json`). ~50 lines.
-2. **MCP server**: a `kb-mcp` binary (stdio JSON-RPC) exposes the `kb-app` facade 1:1. Shared by Claude Desktop / Cursor / Zed, etc.
-3. **HTTP wrapper**: `kb serve --bind 127.0.0.1:7711` (P+; handle with care because it breaks the local-only value).
+1. **Claude Code / Codex skill**: a thin wrapper (calls `kebab search --json` / `kebab ask --json`). ~50 lines.
+2. **MCP server**: a `kebab-mcp` binary (stdio JSON-RPC) exposes the `kebab-app` facade 1:1. Shared by Claude Desktop / Cursor / Zed, etc.
+3. **HTTP wrapper**: `kebab serve --bind 127.0.0.1:7711` (P+; handle with care because it breaks the local-only value).
 ---
@@ -199,7 +199,7 @@ ${EDITOR:-vi} ~/.config/kb/config.toml
 Although this repo is a single-user project, the procedure for spec changes is written down.
-1. **Changing the frozen design**: `docs/superpowers/specs/2026-04-27-kb-final-form-design.md` is the single contract. A change requires updating every affected component task at the same time. Bundle it into one PR.
+1. **Changing the frozen design**: `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` is the single contract. A change requires updating every affected component task at the same time. Bundle it into one PR.
 2. **Adding a new component task**: copy `tasks/_template.md`, then create `tasks/p<phase>/p<phase>-<n>-<name>.md`. List the design doc sections in `contract_sections`. `Allowed/Forbidden dependencies` follow the module-boundary table in design §8.
 3. **Implementation**: one sub-agent session per component task is recommended. Pass `cargo test -p <crate>` + the DoD checklist. Merge via PR.
 4. **Version changes**: changes to `parser_version` / `chunker_version` / `embedding_version` and the like follow the cascade rule in design §9. Affected records must be reprocessed.
@@ -215,8 +215,8 @@ ${EDITOR:-vi} ~/.config/kb/config.toml
 ## References
-- Initial design report: [kb_local_rust_report.md](kb_local_rust_report.md)
-- Frozen design: [docs/superpowers/specs/2026-04-27-kb-final-form-design.md](docs/superpowers/specs/2026-04-27-kb-final-form-design.md)
+- Initial design report: [kebab_local_rust_report.md](kebab_local_rust_report.md)
+- Frozen design: [docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](docs/superpowers/specs/2026-04-27-kebab-final-form-design.md)
 - Task decomposition plan: [docs/superpowers/plans/2026-04-27-task-decomposition.md](docs/superpowers/plans/2026-04-27-task-decomposition.md)
 - Task index: [tasks/INDEX.md](tasks/INDEX.md)
 - Post-merge hotfix log: [tasks/HOTFIXES.md](tasks/HOTFIXES.md)


@@ -1,21 +1,21 @@
 ---
-title: "kb smoke run guide"
+title: "kebab smoke run guide"
 date: 2026-05-01
 ---
-# kb smoke run guide
+# kebab smoke run guide
-Starting with the P3-5 merge (`kb-app::ingest` / `search` / `list` / `inspect` wiring), and then the P4-3 merge (`kb ask` wiring), users can verify their own install directly. This document is a procedure that brings up an isolated KB in a temporary directory, without touching the user environment (`~/.config/kb/`, `~/.local/share/kb/`), and runs the whole pipeline once within a single session.
+Starting with the P3-5 merge (`kebab-app::ingest` / `search` / `list` / `inspect` wiring), and then the P4-3 merge (`kebab ask` wiring), users can verify their own install directly. This document is a procedure that brings up an isolated KB in a temporary directory, without touching the user environment (`~/.config/kebab/`, `~/.local/share/kebab/`), and runs the whole pipeline once within a single session.
 ## Preparation
 Build:
 ```bash
-cargo build --release -p kb-cli # debug is fine too; the debug profile builds faster.
+cargo build --release -p kebab-cli # debug is fine too; the debug profile builds faster.
 ```
-Remote Ollama (optional, only needed for `kb ask`):
+Remote Ollama (optional, only needed for `kebab ask`):
 ```bash
 # on a separate host such as a Mac
@@ -34,8 +34,8 @@ curl http://<host>:11434/api/tags
 ## Creating an isolated workspace
 ```bash
-mkdir -p /tmp/kb-smoke/{workspace,data}
-cat > /tmp/kb-smoke/workspace/intro.md <<'EOF'
+mkdir -p /tmp/kebab-smoke/{workspace,data}
+cat > /tmp/kebab-smoke/workspace/intro.md <<'EOF'
 ---
 title: Greeting
 tags: [demo]
@@ -51,19 +51,19 @@ EOF
 ## Isolated config
-`/tmp/kb-smoke/config.toml`:
+`/tmp/kebab-smoke/config.toml`:
 ```toml
 schema_version = 1
 [workspace]
-root = "/tmp/kb-smoke/workspace"
+root = "/tmp/kebab-smoke/workspace"
 include = ["**/*.md"]
 exclude = [".git/**", "node_modules/**", ".obsidian/**"]
 [storage]
-data_dir = "/tmp/kb-smoke/data"
-sqlite = "{data_dir}/kb.sqlite"
+data_dir = "/tmp/kebab-smoke/data"
+sqlite = "{data_dir}/kebab.sqlite"
 vector_dir = "{data_dir}/lancedb"
 asset_dir = "{data_dir}/assets"
 artifact_dir = "{data_dir}/artifacts"
@@ -110,12 +110,12 @@ explain_default = false
 max_context_tokens = 6000
 ```
-Values can be overridden with `KB_*` environment variables (e.g. `KB_MODELS_LLM_MODEL=qwen2.5:32b kb …`). For the full key list, see the `apply_env` match arms in `crates/kb-config/src/lib.rs`.
+Values can be overridden with `KEBAB_*` environment variables (e.g. `KEBAB_MODELS_LLM_MODEL=qwen2.5:32b kebab …`). For the full key list, see the `apply_env` match arms in `crates/kebab-config/src/lib.rs`.
 ## Command sequence
 ```bash
-KB() { ./target/debug/kb --config /tmp/kb-smoke/config.toml "$@"; }
+KEBAB() { ./target/debug/kebab --config /tmp/kebab-smoke/config.toml "$@"; }
 KB doctor # 1. health check
 KB ingest # 2. index the workspace
@@ -128,31 +128,31 @@ KB ask "이 KB 안에서 ..." --mode hybrid --k 5 # 8. RAG 답변 (Ollama
 KB --json ask "..." --mode hybrid # 9. verify the machine-friendly output
 ```
-A command succeeded if it exits with code 0. `kb ask` exits with code 1 on refusal (`RefusalSignal`); this is intended behavior.
+A command succeeded if it exits with code 0. `kebab ask` exits with code 1 on refusal (`RefusalSignal`); this is intended behavior.
 ## Verification checklist
-- `kb doctor` honors the `--config` path and prints the `storage.data_dir` from it (not the XDG default).
-- `kb ingest` is idempotent: the second run reports `new=0 updated=N`.
-- `kb list docs` output shows the deterministic `doc_id` (32-hex) + `workspace_path`, not the frontmatter `title`.
-- The `fusion_score` of `kb search --mode hybrid` is in the `[0, 1]` range (top-1 is often 1.0, when both retrievers rank it 1).
-- In the `kb ask` JSON response, `model.id` matches the configured model (`gemma4:26b` etc.), `embedding.id = multilingual-e5-small`, and `citations[].marker` has the form `[1]` / `[2]` (square-bracketed bare index).
-- `kb ask` on a topic absent from the corpus yields `refusal_reason: "llm_self_judge"` (or `no_chunks` / `score_gate`) + `grounded: false`.
+- `kebab doctor` honors the `--config` path and prints the `storage.data_dir` from it (not the XDG default).
+- `kebab ingest` is idempotent: the second run reports `new=0 updated=N`.
+- `kebab list docs` output shows the deterministic `doc_id` (32-hex) + `workspace_path`, not the frontmatter `title`.
+- The `fusion_score` of `kebab search --mode hybrid` is in the `[0, 1]` range (top-1 is often 1.0, when both retrievers rank it 1).
+- In the `kebab ask` JSON response, `model.id` matches the configured model (`gemma4:26b` etc.), `embedding.id = multilingual-e5-small`, and `citations[].marker` has the form `[1]` / `[2]` (square-bracketed bare index).
+- `kebab ask` on a topic absent from the corpus yields `refusal_reason: "llm_self_judge"` (or `no_chunks` / `score_gate`) + `grounded: false`.
 ## Cleanup
 ```bash
-rm -rf /tmp/kb-smoke/data # wipes only the data; you can ingest again
-rm -rf /tmp/kb-smoke # removes everything
+rm -rf /tmp/kebab-smoke/data # wipes only the data; you can ingest again
+rm -rf /tmp/kebab-smoke # removes everything
 ```
-`~/.config/kb/` and `~/.local/share/kb/` are never touched (provided the `--config` flag is honored correctly, which is guaranteed since the P3-5 hotfix).
+`~/.config/kebab/` and `~/.local/share/kebab/` are never touched (provided the `--config` flag is honored correctly, which is guaranteed since the P3-5 hotfix).
 ## Known behavior
-- On `kb ingest`, the fastembed model is downloaded (~470MB) and cached in `data_dir/models/fastembed/`.
-- `kb ask` response time depends on LLM token throughput. On an M4 Pro 48GB + gemma4:26b, a 50~100 token answer takes 20~55 seconds.
-- If the `--config` path does not exist or is malformed, `kb doctor` hard-fails (hotfix behavior that keeps defaults from silently masking the problem).
+- On `kebab ingest`, the fastembed model is downloaded (~470MB) and cached in `data_dir/models/fastembed/`.
+- `kebab ask` response time depends on LLM token throughput. On an M4 Pro 48GB + gemma4:26b, a 50~100 token answer takes 20~55 seconds.
+- If the `--config` path does not exist or is malformed, `kebab doctor` hard-fails (hotfix behavior that keeps defaults from silently masking the problem).
 - Every CLI invocation pays the fastembed model init cost (~4 s) because there is no process-level cache. Once the P9 TUI lands, `App` will init it only once per session via `OnceLock`.
 For the detailed history and the bugs that were found, see [tasks/HOTFIXES.md](../tasks/HOTFIXES.md).
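
The `fusion_score` normalization referenced in the checklist can be sketched as follows. This is a minimal illustration, not the `kebab-search` implementation: the function name `normalized_rrf`, the 1-based ranks, and `k_rrf = 60.0` are assumptions. It sums the usual RRF contribution `1 / (k_rrf + rank)` over two retrievers and divides by the maximum `2 / (k_rrf + 1)`, so a chunk ranked first by both retrievers scores 1.0.

```rust
/// Normalized Reciprocal Rank Fusion score for one chunk.
/// Ranks are 1-based; `None` means the retriever did not return the chunk.
fn normalized_rrf(lexical_rank: Option<u32>, vector_rank: Option<u32>, k_rrf: f64) -> f64 {
    // Contribution of a single retriever: 1 / (k_rrf + rank), or 0 if absent.
    let contribution = |rank: Option<u32>| rank.map_or(0.0, |r| 1.0 / (k_rrf + f64::from(r)));
    let raw = contribution(lexical_rank) + contribution(vector_rank);
    // Largest possible raw score: both retrievers place the chunk at rank 1.
    let max_raw = 2.0 / (k_rrf + 1.0);
    raw / max_raw
}

fn main() {
    let k_rrf = 60.0; // assumed value, for illustration only
    let both_first = normalized_rrf(Some(1), Some(1), k_rrf);
    let lexical_only = normalized_rrf(Some(1), None, k_rrf);
    println!("both rank 1:  {both_first:.3}");
    println!("lexical only: {lexical_only:.3}");
}
```

Under these assumptions a chunk found by only one retriever at rank 1 lands at 0.5, which is what makes hybrid scores comparable with single-mode scores on the same `[0, 1]` scale.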


@@ -5,8 +5,8 @@ When implementing tasks against this codebase:
 - Treat the frozen design doc as the single source of truth. Do not invent
   new fields, traits, or enum variants.
 - Prefer editing existing files to creating new ones; reuse types from
-  `kb-core` instead of duplicating shapes.
+  `kebab-core` instead of duplicating shapes.
 - For each task, follow the task spec under `tasks/p<N>/p<N>-<i>.md`.
 Canonical source:
-[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §11 + §12.
+[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §11 + §12.


@@ -4,4 +4,4 @@ Medium-agnostic representation of a document with `Block`s, `SourceSpan`s,
 and provenance.
 Canonical source:
-[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §3.4 + §3.7a.
+[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §3.4 + §3.7a.


@@ -5,4 +5,4 @@
 `policy_hash` so chunk IDs include the policy.
 Canonical source:
-[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §3.5 + §7.1 + §7.2.
+[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §3.5 + §7.1 + §7.2.


@@ -4,4 +4,4 @@ Citations use W3C Media Fragments URIs to locate evidence inside a
 document. Five variants: `Line`, `Page`, `Region`, `Caption`, `Time`.
 Canonical source:
-[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §3.5 + §0 Q3.
+[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §3.5 + §0 Q3.
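
For the `Line` variant, the README in this commit shows the shape `path#L12-L34`. A minimal sketch of rendering and parsing that shape follows; the struct `LineCitation` and its field names are hypothetical stand-ins rather than the `kebab-core` types, and the other four variants are left out because their fragment syntax is not shown here.

```rust
/// Hypothetical stand-in for the `Line` citation variant; field names are assumptions.
struct LineCitation {
    path: String,
    start_line: u32,
    end_line: u32,
}

impl LineCitation {
    /// Renders the citation as `path#L<start>-L<end>`.
    fn to_uri(&self) -> String {
        format!("{}#L{}-L{}", self.path, self.start_line, self.end_line)
    }

    /// Parses the same shape back; returns `None` on any mismatch.
    fn parse(uri: &str) -> Option<Self> {
        // Split at the last '#' so the path itself may contain '#'.
        let (path, fragment) = uri.rsplit_once('#')?;
        let (start, end) = fragment.split_once('-')?;
        let start_line: u32 = start.strip_prefix('L')?.parse().ok()?;
        let end_line: u32 = end.strip_prefix('L')?.parse().ok()?;
        Some(Self { path: path.to_string(), start_line, end_line })
    }
}

fn main() {
    let citation = LineCitation { path: "notes/intro.md".into(), start_line: 12, end_line: 34 };
    let uri = citation.to_uri();
    println!("{uri}");
    let parsed = LineCitation::parse(&uri).expect("round-trip should succeed");
    println!("{} lines {}-{}", parsed.path, parsed.start_line, parsed.end_line);
}
```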


@@ -1,6 +1,6 @@
 # Domain model
-The domain types live in `kb-core` and mirror the frozen design exactly.
+The domain types live in `kebab-core` and mirror the frozen design exactly.
 Canonical source:
-[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §3.
+[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §3.


@@ -1,6 +1,6 @@
 # ID recipe
-All `kb-*` IDs are 32 hex chars: the first 32 of `blake3(canonical_json(tuple))`.
+All `kebab-*` IDs are 32 hex chars: the first 32 of `blake3(canonical_json(tuple))`.
 Canonical source:
-[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §4.
+[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §4.
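
The truncation step of this recipe can be illustrated in isolation. A minimal sketch, assuming the full digest is already available as a 64-character lowercase hex string: `short_id` is a hypothetical helper, and the digest below is a made-up placeholder, not a real hash. Computing `blake3(canonical_json(tuple))` itself would need the `blake3` crate and a canonical JSON serializer, both outside this sketch.

```rust
/// Returns the first 32 hex characters of a full hex digest, or `None` if the
/// input is shorter than 32 characters or is not lowercase hex.
fn short_id(full_hex: &str) -> Option<&str> {
    let is_hex = full_hex.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    if full_hex.len() < 32 || !is_hex {
        return None;
    }
    // Safe to slice by byte index: the string is pure ASCII at this point.
    Some(&full_hex[..32])
}

fn main() {
    // Placeholder digest (64 hex chars), not the output of a real hash.
    let digest = "0123456789abcdef".repeat(4);
    let id = short_id(&digest).expect("placeholder digest is valid hex");
    println!("{id} ({} chars)", id.len());
}
```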


@@ -1,8 +1,8 @@
 # Module boundaries
-`kb-core` is leaf — every other crate depends on it. Parsers depend on
-`kb-parse-types` (not on `kb-normalize`); `kb-normalize` depends on
-`kb-parse-types` (not on parsers). UI crates depend only on `kb-app`.
+`kebab-core` is leaf — every other crate depends on it. Parsers depend on
+`kebab-parse-types` (not on `kebab-normalize`); `kebab-normalize` depends on
+`kebab-parse-types` (not on parsers). UI crates depend only on `kebab-app`.
 Canonical source:
-[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §8.
+[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §8.
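
One way to make such boundary rules checkable is a small validator over dependency edges. The sketch below is illustrative only: the helpers `is_parser` and `violation` and the edge list are hypothetical, and only the two parser/normalize rules are encoded (parsers must not depend on `kebab-normalize`, and `kebab-normalize` must not depend on parsers).

```rust
/// True for parser crates (`kebab-parse-*`), excluding the shared types crate.
fn is_parser(name: &str) -> bool {
    name.starts_with("kebab-parse-") && name != "kebab-parse-types"
}

/// Returns the violated rule for a `from -> to` dependency edge, if any.
fn violation(from: &str, to: &str) -> Option<&'static str> {
    if is_parser(from) && to == "kebab-normalize" {
        return Some("parsers must not depend on kebab-normalize");
    }
    if from == "kebab-normalize" && is_parser(to) {
        return Some("kebab-normalize must not depend on parsers");
    }
    None
}

fn main() {
    // Hypothetical edge list; a real check could read it from `cargo metadata`.
    let edges = [
        ("kebab-parse-md", "kebab-parse-types"),
        ("kebab-normalize", "kebab-parse-types"),
        ("kebab-normalize", "kebab-parse-md"),
    ];
    for (from, to) in edges {
        match violation(from, to) {
            Some(rule) => println!("{from} -> {to}: VIOLATION ({rule})"),
            None => println!("{from} -> {to}: ok"),
        }
    }
}
```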


@@ -8,11 +8,11 @@
 - Phase A — Author the canonical task spec template (`tasks/_template.md`).
 - Phase B — Decompose P1 (Markdown ingestion) into 6 component task specs to validate the template.
 - Phase C — After Phase B passes review, decompose P0 + P2..P9 into the remaining ~24 task specs in one pass.
-- Each task spec cites only the frozen design doc (`docs/superpowers/specs/2026-04-27-kb-final-form-design.md`) for types, traits, schema, layout. No new domain types or traits are introduced inside task specs.
+- Each task spec cites only the frozen design doc (`docs/superpowers/specs/2026-04-27-kebab-final-form-design.md`) for types, traits, schema, layout. No new domain types or traits are introduced inside task specs.
 **Tech Stack:** Plain Markdown documents under `tasks/`. No code changes in this plan — produces specs that AI sub-agents will later implement against.
-**Frozen contract source:** [docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../specs/2026-04-27-kb-final-form-design.md). All task specs reference this. Modifications to the contract require updating that file first, then re-checking dependent task specs.
+**Frozen contract source:** [docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../specs/2026-04-27-kebab-final-form-design.md). All task specs reference this. Modifications to the contract require updating that file first, then re-checking dependent task specs.
 **Phase task index (target file layout):**
@@ -83,7 +83,7 @@ Existing per-phase epic files (`tasks/phase-0-skeleton.md` … `phase-9-ui.md`)
 - [ ] **Step 1: Verify the design doc path resolves**
 ```bash
-test -f docs/superpowers/specs/2026-04-27-kb-final-form-design.md && echo OK
+test -f docs/superpowers/specs/2026-04-27-kebab-final-form-design.md && echo OK
 ```
 Expected: `OK`
@@ -101,7 +101,7 @@ title: "<Component title>"
 status: planned
 depends_on: [] # other task_ids
 unblocks: [] # other task_ids
-contract_source: ../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
 contract_sections: [] # e.g. [§3.5, §5.5, §7.2]
 ---
@@ -117,7 +117,7 @@ contract_sections: [] # e.g. [§3.5, §5.5, §7.2]
 ## Allowed dependencies
-- `kb-core`
+- `kebab-core`
 - <other crates per design §8>
 - <external crates with versions>
@@ -166,7 +166,7 @@ If any item here is needed during implementation, STOP and update the frozen des
 | unit | ... | ... |
 | snapshot | ... (JSON freeze) | `fixtures/...` |
 | contract | trait round-trip | mock impls |
-| integration | end-to-end via `kb-app` facade | tmp workspace |
+| integration | end-to-end via `kebab-app` facade | tmp workspace |
 All tests must run under `cargo test -p <crate>` and not require external network or Ollama unless explicitly stated.
@@ -247,7 +247,7 @@ git add tasks/p1 tasks/INDEX.md
 git commit -m "tasks: prepare P1 component decomposition skeleton"
 ```
-### Task B1: `p1-1-source-fs.md` (kb-source-fs)
+### Task B1: `p1-1-source-fs.md` (kebab-source-fs)
 **Files:**
 - Create: `tasks/p1/p1-1-source-fs.md`
@@ -259,13 +259,13 @@ Write the following content to `tasks/p1/p1-1-source-fs.md`:
 ````markdown
 ---
 phase: P1
-component: kb-source-fs
+component: kebab-source-fs
 task_id: p1-1
 title: "Local filesystem source connector"
 status: planned
 depends_on: [p0-1]
 unblocks: [p1-2, p1-3, p1-4, p1-5, p1-6]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
 contract_sections: [§3.3, §6.2, §6.6, §7.1, §7.2 SourceConnector, §8]
 ---
@@ -281,8 +281,8 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
 ## Allowed dependencies
-- `kb-core`
-- `kb-config`
+- `kebab-core`
+- `kebab-config`
 - `ignore` (gitignore semantics)
 - `blake3`
 - `walkdir`
@@ -293,21 +293,21 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
 ## Forbidden dependencies
-- `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
+- `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
 ## Inputs
 | input | type | source |
 |-------|------|--------|
-| `SourceScope` | `kb_core::SourceScope` | `kb-app` from config |
+| `SourceScope` | `kebab_core::SourceScope` | `kebab-app` from config |
 | filesystem | `&Path` | OS |
-| `.kbignore` | text file | workspace root, optional |
+| `.kebabignore` | text file | workspace root, optional |
 ## Outputs
 | output | type | downstream consumer |
 |--------|------|---------------------|
-| `Vec<RawAsset>` | `kb_core::RawAsset` | `kb-parse-md`, asset writer in `kb-store-sqlite` (via `kb-app`) |
+| `Vec<RawAsset>` | `kebab_core::RawAsset` | `kebab-parse-md`, asset writer in `kebab-store-sqlite` (via `kebab-app`) |
 ## Public surface (signatures only — no new types)
@@ -315,11 +315,11 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
pub struct FsSourceConnector { /* internal */ } pub struct FsSourceConnector { /* internal */ }
impl FsSourceConnector { impl FsSourceConnector {
pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>; pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
} }
impl kb_core::SourceConnector for FsSourceConnector { impl kebab_core::SourceConnector for FsSourceConnector {
fn scan(&self, scope: &kb_core::SourceScope) -> anyhow::Result<Vec<kb_core::RawAsset>>; fn scan(&self, scope: &kebab_core::SourceScope) -> anyhow::Result<Vec<kebab_core::RawAsset>>;
} }
``` ```
@@ -329,7 +329,7 @@ impl kb_core::SourceConnector for FsSourceConnector {
- `asset_id` derived per design §4.2 from `blake3(raw bytes)` full hex. - `asset_id` derived per design §4.2 from `blake3(raw bytes)` full hex.
- `media_type` selected from extension + libmagic-like sniff fallback (`.md` → Markdown, others fall through to `MediaType::Other`). - `media_type` selected from extension + libmagic-like sniff fallback (`.md` → Markdown, others fall through to `MediaType::Other`).
- `discovered_at` = current `OffsetDateTime::now_utc()` at scan time. - `discovered_at` = current `OffsetDateTime::now_utc()` at scan time.
- Combine `config.workspace.exclude` ∪ `.kbignore` for filter (union; ordering does not matter). - Combine `config.workspace.exclude` ∪ `.kebabignore` for filter (union; ordering does not matter).
- Symbolic links: follow once, detect cycles via `canonicalize` + visited set. - Symbolic links: follow once, detect cycles via `canonicalize` + visited set.
- Files larger than `storage.copy_threshold_mb` MB → emit `AssetStorage::Reference { path, sha }` (do not copy bytes here; copying is done by the asset writer task). - Files larger than `storage.copy_threshold_mb` MB → emit `AssetStorage::Reference { path, sha }` (do not copy bytes here; copying is done by the asset writer task).
- Idempotent: same input → same `Vec<RawAsset>` (sort by `workspace_path`). - Idempotent: same input → same `Vec<RawAsset>` (sort by `workspace_path`).
@@ -337,7 +337,7 @@ impl kb_core::SourceConnector for FsSourceConnector {
## Storage / wire effects ## Storage / wire effects
- Reads: filesystem under `config.workspace.root`. - Reads: filesystem under `config.workspace.root`.
- Writes: nothing. (Asset copy is handled by the asset writer in `kb-store-sqlite`.) - Writes: nothing. (Asset copy is handled by the asset writer in `kebab-store-sqlite`.)
## Test plan ## Test plan
@@ -346,25 +346,25 @@ impl kb_core::SourceConnector for FsSourceConnector {
| unit | POSIX path normalization | inline cases incl. `./a/b.md`, `a//b.md`, `a/b.md` → identical | | unit | POSIX path normalization | inline cases incl. `./a/b.md`, `a//b.md`, `a/b.md` → identical |
| unit | blake3 of known bytes matches expected hex | inline | | unit | blake3 of known bytes matches expected hex | inline |
| unit | gitignore filter (`*.tmp`, `node_modules/**`) excludes correctly | tmp tree built in test | | unit | gitignore filter (`*.tmp`, `node_modules/**`) excludes correctly | tmp tree built in test |
| unit | `.kbignore` config exclude works | tmp tree | | unit | `.kebabignore` config exclude works | tmp tree |
| unit | symlink cycle does not loop | tmp tree with `a -> b -> a` | | unit | symlink cycle does not loop | tmp tree with `a -> b -> a` |
| snapshot | `Vec<RawAsset>` serialized JSON for fixture tree is stable | `fixtures/source-fs/tree-1` | | snapshot | `Vec<RawAsset>` serialized JSON for fixture tree is stable | `fixtures/source-fs/tree-1` |
| determinism | re-running scan twice produces byte-identical JSON | `fixtures/source-fs/tree-1` | | determinism | re-running scan twice produces byte-identical JSON | `fixtures/source-fs/tree-1` |
All tests run under `cargo test -p kb-source-fs` with no network and no model. All tests run under `cargo test -p kebab-source-fs` with no network and no model.
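
The POSIX path normalization cases in the test plan above (`./a/b.md`, `a//b.md`, `a/b.md` → identical) could be covered by a helper along these lines. This is a minimal sketch, not the crate's actual code: `normalize_posix` is a hypothetical name, it assumes workspace-relative paths with `/` separators, and it deliberately leaves `..` segments alone.

```rust
/// Hypothetical helper (not part of the spec): collapse `.` segments and
/// repeated `/` separators so equivalent relative paths compare equal.
/// Assumes a workspace-relative path; `..` segments are left untouched.
fn normalize_posix(path: &str) -> String {
    path.split('/')
        // Empty segments come from `//` or a trailing `/`; `.` is a no-op segment.
        .filter(|seg| !seg.is_empty() && *seg != ".")
        .collect::<Vec<_>>()
        .join("/")
}

/// Normalize every path, then sort, so repeated scans list paths in one order.
fn normalized_sorted(paths: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = paths.iter().map(|p| normalize_posix(p)).collect();
    out.sort();
    out
}
```

Sorting the normalized strings is one way to get the "same input → same `Vec<RawAsset>`" ordering the behavior contract asks for; the real connector sorts by `workspace_path`.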
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-source-fs` passes - [ ] `cargo check -p kebab-source-fs` passes
- [ ] `cargo test -p kb-source-fs` passes - [ ] `cargo test -p kebab-source-fs` passes
- [ ] Snapshot test `fixtures/source-fs/tree-1` round-trips deterministically - [ ] Snapshot test `fixtures/source-fs/tree-1` round-trips deterministically
- [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kb-source-fs`) - [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kebab-source-fs`)
- [ ] PR description links to design §3.3, §6.2, §7.2 - [ ] PR description links to design §3.3, §6.2, §7.2
## Out of scope ## Out of scope
- File watching (P+). - File watching (P+).
- Asset copy/reference storage on disk (`kb-store-sqlite` task p1-6). - Asset copy/reference storage on disk (`kebab-store-sqlite` task p1-6).
- Non-fs source connectors (HTTP, S3 — P+). - Non-fs source connectors (HTTP, S3 — P+).
## Risks / notes ## Risks / notes
@@ -400,13 +400,13 @@ Write to `tasks/p1/p1-2-parse-md-frontmatter.md`:
````markdown ````markdown
--- ---
phase: P1 phase: P1
component: kb-parse-md (frontmatter submodule) component: kebab-parse-md (frontmatter submodule)
task_id: p1-2 task_id: p1-2
title: "Markdown frontmatter parsing → Metadata" title: "Markdown frontmatter parsing → Metadata"
status: planned status: planned
depends_on: [p0-1] depends_on: [p0-1]
unblocks: [p1-4] unblocks: [p1-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.6 Metadata, §0 Q9 frontmatter, §10 errors] contract_sections: [§3.6 Metadata, §0 Q9 frontmatter, §10 errors]
--- ---
@@ -414,15 +414,15 @@ contract_sections: [§3.6 Metadata, §0 Q9 frontmatter, §10 errors]
## Goal ## Goal
Parse YAML/TOML frontmatter from Markdown bytes into `kb_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`. Parse YAML/TOML frontmatter from Markdown bytes into `kebab_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`.
## Why now / why this size ## Why now / why this size
Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kb-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9. Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kebab-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9.
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `serde` - `serde`
- `serde_yaml` (or `yaml-rust2`) for YAML - `serde_yaml` (or `yaml-rust2`) for YAML
- `toml` for TOML - `toml` for TOML
@@ -432,7 +432,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
## Forbidden dependencies ## Forbidden dependencies
- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-tui`, `kb-desktop`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `pulldown-cmark` (block parser is a sibling task) - `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-tui`, `kebab-desktop`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `pulldown-cmark` (block parser is a sibling task)
## Inputs ## Inputs
@@ -445,7 +445,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `(Metadata, Option<FrontmatterSpan>, Vec<Warning>)` | tuple | `kb-normalize` → CanonicalDocument | | `(Metadata, Option<FrontmatterSpan>, Vec<Warning>)` | tuple | `kebab-normalize` → CanonicalDocument |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
@@ -453,7 +453,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
pub fn parse_frontmatter( pub fn parse_frontmatter(
bytes: &[u8], bytes: &[u8],
hints: &BodyHints, hints: &BodyHints,
) -> anyhow::Result<(kb_core::Metadata, Option<FrontmatterSpan>, Vec<Warning>)>; ) -> anyhow::Result<(kebab_core::Metadata, Option<FrontmatterSpan>, Vec<Warning>)>;
``` ```
`FrontmatterSpan` and `Warning` are crate-internal helpers; if any new public type is needed, STOP and update the frozen design doc first. `FrontmatterSpan` and `Warning` are crate-internal helpers; if any new public type is needed, STOP and update the frozen design doc first.
@@ -490,12 +490,12 @@ pub fn parse_frontmatter(
| snapshot | `fixtures/markdown/frontmatter-only.md` produces stable JSON | fixture | | snapshot | `fixtures/markdown/frontmatter-only.md` produces stable JSON | fixture |
| snapshot | mixed-language body with no `lang:` detects `ko` or `en` | `fixtures/markdown/mixed-lang.md` | | snapshot | mixed-language body with no `lang:` detects `ko` or `en` | `fixtures/markdown/mixed-lang.md` |
All tests under `cargo test -p kb-parse-md --lib frontmatter`. All tests under `cargo test -p kebab-parse-md --lib frontmatter`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-parse-md` passes - [ ] `cargo check -p kebab-parse-md` passes
- [ ] `cargo test -p kb-parse-md frontmatter` passes - [ ] `cargo test -p kebab-parse-md frontmatter` passes
- [ ] No `pulldown-cmark` import in this submodule - [ ] No `pulldown-cmark` import in this submodule
- [ ] Snapshot tests stable across two consecutive runs - [ ] Snapshot tests stable across two consecutive runs
- [ ] PR links design §0 Q9, §3.6 - [ ] PR links design §0 Q9, §3.6
@@ -534,13 +534,13 @@ Write to `tasks/p1/p1-3-parse-md-blocks.md`:
````markdown ````markdown
--- ---
phase: P1 phase: P1
component: kb-parse-md (blocks submodule) component: kebab-parse-md (blocks submodule)
task_id: p1-3 task_id: p1-3
title: "Markdown body → Block tree with line spans" title: "Markdown body → Block tree with line spans"
status: planned status: planned
depends_on: [p0-1] depends_on: [p0-1]
unblocks: [p1-4] unblocks: [p1-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 Block, §3.4 SourceSpan, §0 Q3 citation] contract_sections: [§3.4 Block, §3.4 SourceSpan, §0 Q3 citation]
--- ---
@@ -548,7 +548,7 @@ contract_sections: [§3.4 Block, §3.4 SourceSpan, §0 Q3 citation]
## Goal ## Goal
Parse Markdown body bytes into a flat `Vec<ParsedBlock>` (intermediate, crate-private) with heading paths and line ranges preserved, ready for `kb-normalize` to lift into `CanonicalDocument`. Parse Markdown body bytes into a flat `Vec<ParsedBlock>` (intermediate, crate-private) with heading paths and line ranges preserved, ready for `kebab-normalize` to lift into `CanonicalDocument`.
## Why now / why this size ## Why now / why this size
@@ -556,14 +556,14 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `pulldown-cmark` (CommonMark with source-map; GFM tables enabled via feature) - `pulldown-cmark` (CommonMark with source-map; GFM tables enabled via feature)
- `serde` - `serde`
- `thiserror` - `thiserror`
## Forbidden dependencies ## Forbidden dependencies
- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `kb-tui`, `kb-desktop`, `comrak` (alternative parser; pick one) - `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `kebab-tui`, `kebab-desktop`, `comrak` (alternative parser; pick one)
## Inputs ## Inputs
@@ -576,7 +576,7 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `Vec<ParsedBlock>` (intermediate type, crate-private) | | `kb-normalize` | | `Vec<ParsedBlock>` (intermediate type, crate-private) | | `kebab-normalize` |
| `Vec<Warning>` | | propagated into Provenance | | `Vec<Warning>` | | propagated into Provenance |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
@@ -585,7 +585,7 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
pub fn parse_blocks(body: &[u8], body_offset_lines: u32) -> anyhow::Result<(Vec<ParsedBlock>, Vec<Warning>)>; pub fn parse_blocks(body: &[u8], body_offset_lines: u32) -> anyhow::Result<(Vec<ParsedBlock>, Vec<Warning>)>;
``` ```
`ParsedBlock` is a crate-internal mirror that maps 1:1 to `kb_core::Block` variants once `kb-normalize` assigns `BlockId`s. `ParsedBlock` is a crate-internal mirror that maps 1:1 to `kebab_core::Block` variants once `kebab-normalize` assigns `BlockId`s.
## Behavior contract ## Behavior contract
@@ -593,7 +593,7 @@ pub fn parse_blocks(body: &[u8], body_offset_lines: u32) -> anyhow::Result<(Vec<
- Heading tree: every block records its ancestor heading texts in order (e.g., `["아키텍처", "Chunking 정책"]`). - Heading tree: every block records its ancestor heading texts in order (e.g., `["아키텍처", "Chunking 정책"]`).
- Code blocks: language tag preserved (` ```rust ` → `Some("rust")`), fenced content not split.
- Tables: GFM tables produce `TableBlock` with header row + body rows; if a table cell is malformed, fall back to a `Paragraph` block + warning. - Tables: GFM tables produce `TableBlock` with header row + body rows; if a table cell is malformed, fall back to a `Paragraph` block + warning.
- Image references: `![alt](src)` produces `ImageRefBlock` with `asset_id = None`, `src = "..."`, `alt = "..."`. Resolution to `AssetId` happens later in `kb-normalize`. - Image references: `![alt](src)` produces `ImageRefBlock` with `asset_id = None`, `src = "..."`, `alt = "..."`. Resolution to `AssetId` happens later in `kebab-normalize`.
- Lists: ordered/unordered preserved; nested list items flattened into one `ListBlock` with each top-level item's text. - Lists: ordered/unordered preserved; nested list items flattened into one `ListBlock` with each top-level item's text.
- Inline elements: only `Text`, `Code`, `Link`, `Strong`, `Emph` (per design §3.4). Drop other inlines silently. - Inline elements: only `Text`, `Code`, `Link`, `Strong`, `Emph` (per design §3.4). Drop other inlines silently.
- Malformed input never panics. Worst case: empty `Vec<ParsedBlock>` + warning. - Malformed input never panics. Worst case: empty `Vec<ParsedBlock>` + warning.
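
The heading-tree rule in the list above (every block records its ancestor heading texts in order) can be sketched with a simple stack. This is an illustration under stated assumptions: `Item` and `heading_paths` are hypothetical names, the input is a pre-flattened sequence rather than real `pulldown-cmark` events, and only non-heading blocks are given a path.

```rust
/// Hypothetical input type for this sketch; the real parser would derive
/// this information from `pulldown-cmark` events.
enum Item<'a> {
    Heading { level: u8, text: &'a str },
    Block(&'a str),
}

/// Pair each non-heading block with the texts of its ancestor headings,
/// outermost first.
fn heading_paths<'a>(items: &[Item<'a>]) -> Vec<(&'a str, Vec<String>)> {
    // Stack of currently open headings as (level, text).
    let mut stack: Vec<(u8, &'a str)> = Vec::new();
    let mut out = Vec::new();
    for item in items {
        match item {
            Item::Heading { level, text } => {
                // A new heading closes every open heading at the same or a deeper level.
                while stack.last().map_or(false, |(l, _)| l >= level) {
                    stack.pop();
                }
                stack.push((*level, *text));
            }
            Item::Block(body) => {
                let path: Vec<String> = stack.iter().map(|(_, t)| t.to_string()).collect();
                out.push((*body, path));
            }
        }
    }
    out
}
```

A block under `# Architecture` / `## Chunking policy` would get the path `["Architecture", "Chunking policy"]`, and a later level-1 heading resets the path to that heading alone.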
@@ -616,12 +616,12 @@ pub fn parse_blocks(body: &[u8], body_offset_lines: u32) -> anyhow::Result<(Vec<
| snapshot | `fixtures/markdown/nested-headings.md` → ParsedBlock JSON stable | fixture | | snapshot | `fixtures/markdown/nested-headings.md` → ParsedBlock JSON stable | fixture |
| snapshot | `fixtures/markdown/code-and-table.md` → JSON stable | fixture | | snapshot | `fixtures/markdown/code-and-table.md` → JSON stable | fixture |
All tests under `cargo test -p kb-parse-md --lib blocks`. All tests under `cargo test -p kebab-parse-md --lib blocks`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-parse-md` passes - [ ] `cargo check -p kebab-parse-md` passes
- [ ] `cargo test -p kb-parse-md blocks` passes - [ ] `cargo test -p kebab-parse-md blocks` passes
- [ ] Snapshot tests stable across two runs - [ ] Snapshot tests stable across two runs
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.4 - [ ] PR links design §3.4
@@ -629,7 +629,7 @@ All tests under `cargo test -p kb-parse-md --lib blocks`.
## Out of scope ## Out of scope
- Frontmatter (p1-2). - Frontmatter (p1-2).
- Lifting `ParsedBlock` → `kb_core::Block` with `BlockId` (p1-4). - Lifting `ParsedBlock` → `kebab_core::Block` with `BlockId` (p1-4).
- Chunking (p1-5). - Chunking (p1-5).
## Risks / notes ## Risks / notes
@@ -658,13 +658,13 @@ Write to `tasks/p1/p1-4-normalize.md`:
````markdown ````markdown
--- ---
phase: P1 phase: P1
component: kb-normalize component: kebab-normalize
task_id: p1-4 task_id: p1-4
title: "Lift parser output → CanonicalDocument with deterministic IDs" title: "Lift parser output → CanonicalDocument with deterministic IDs"
status: planned status: planned
depends_on: [p1-2, p1-3] depends_on: [p1-2, p1-3]
unblocks: [p1-5, p1-6] unblocks: [p1-5, p1-6]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4, §4 ID recipe, §3.6 Provenance] contract_sections: [§3.4, §4 ID recipe, §3.6 Provenance]
--- ---
@@ -676,12 +676,12 @@ Combine `Metadata` (p1-2) + `Vec<ParsedBlock>` (p1-3) + `RawAsset` (p1-1) into a
## Why now / why this size ## Why now / why this size
Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate. Single responsibility: ID generation + struct assembly. Keeps `kebab-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate.
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `serde` - `serde`
- `serde-json-canonicalizer` (canonical JSON for ID hashing) - `serde-json-canonicalizer` (canonical JSON for ID hashing)
- `blake3` - `blake3`
@@ -691,38 +691,38 @@ Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` pure
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md` (consumed via plain types only — must not couple back), `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` - `kebab-source-fs`, `kebab-parse-md` (consumed via plain types only — must not couple back), `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
Note: this crate accepts `ParsedBlock` from `kb-parse-md` either by (a) exposing `ParsedBlock` as a `kb-core` type, or (b) `kb-parse-md` re-exporting via a public DTO. Pick (a): move `ParsedBlock` into `kb-core` so this task does not import `kb-parse-md`. Note: this crate accepts `ParsedBlock` from `kebab-parse-md` either by (a) exposing `ParsedBlock` as a `kebab-core` type, or (b) `kebab-parse-md` re-exporting via a public DTO. Pick (a): move `ParsedBlock` into `kebab-core` so this task does not import `kebab-parse-md`.
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `RawAsset` | `kb_core::RawAsset` | p1-1 | | `RawAsset` | `kebab_core::RawAsset` | p1-1 |
| `Metadata` + frontmatter span + warnings | from p1-2 | parser caller | | `Metadata` + frontmatter span + warnings | from p1-2 | parser caller |
| `Vec<ParsedBlock>` + warnings | from p1-3 | parser caller | | `Vec<ParsedBlock>` + warnings | from p1-3 | parser caller |
| `parser_version` | `kb_core::ParserVersion` | constant in `kb-parse-md` | | `parser_version` | `kebab_core::ParserVersion` | constant in `kebab-parse-md` |
## Outputs ## Outputs
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk`, `kb-store-sqlite` | | `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk`, `kebab-store-sqlite` |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
```rust ```rust
pub fn build_canonical_document( pub fn build_canonical_document(
asset: &kb_core::RawAsset, asset: &kebab_core::RawAsset,
metadata: kb_core::Metadata, metadata: kebab_core::Metadata,
blocks: Vec<kb_core::ParsedBlock>, blocks: Vec<kebab_core::ParsedBlock>,
parser_version: &kb_core::ParserVersion, parser_version: &kebab_core::ParserVersion,
warnings: Vec<Warning>, warnings: Vec<Warning>,
) -> anyhow::Result<kb_core::CanonicalDocument>; ) -> anyhow::Result<kebab_core::CanonicalDocument>;
pub fn id_for_doc(workspace_path: &kb_core::WorkspacePath, asset: &kb_core::AssetId, parser_version: &kb_core::ParserVersion) -> kb_core::DocumentId; pub fn id_for_doc(workspace_path: &kebab_core::WorkspacePath, asset: &kebab_core::AssetId, parser_version: &kebab_core::ParserVersion) -> kebab_core::DocumentId;
pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kb_core::SourceSpan) -> kb_core::BlockId; pub fn id_for_block(doc: &kebab_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kebab_core::SourceSpan) -> kebab_core::BlockId;
``` ```
## Behavior contract ## Behavior contract
@@ -750,14 +750,14 @@ pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[Strin
| unit | provenance contains Discovered/Parsed/Normalized in order | inline | | unit | provenance contains Discovered/Parsed/Normalized in order | inline |
| snapshot | `fixtures/markdown/code-and-table.md` → CanonicalDocument JSON stable (incl. all IDs) | fixture | | snapshot | `fixtures/markdown/code-and-table.md` → CanonicalDocument JSON stable (incl. all IDs) | fixture |
All tests under `cargo test -p kb-normalize`. All tests under `cargo test -p kebab-normalize`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-normalize` passes - [ ] `cargo check -p kebab-normalize` passes
- [ ] `cargo test -p kb-normalize` passes - [ ] `cargo test -p kebab-normalize` passes
- [ ] Determinism test runs ≥ 1000 iterations under 1 second - [ ] Determinism test runs ≥ 1000 iterations under 1 second
- [ ] No `kb-parse-md` import (consumed via `kb-core::ParsedBlock`) - [ ] No `kebab-parse-md` import (consumed via `kebab-core::ParsedBlock`)
- [ ] PR links design §4.2, §4.3 - [ ] PR links design §4.2, §4.3
## Out of scope ## Out of scope
@@ -791,13 +791,13 @@ Write to `tasks/p1/p1-5-chunk.md`:
````markdown ````markdown
--- ---
phase: P1 phase: P1
component: kb-chunk component: kebab-chunk
task_id: p1-5 task_id: p1-5
title: "Markdown heading-aware chunker (md-heading-v1)" title: "Markdown heading-aware chunker (md-heading-v1)"
status: planned status: planned
depends_on: [p1-4] depends_on: [p1-4]
unblocks: [p1-6, p2-2, p3-2] unblocks: [p1-6, p2-2, p3-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.5 Chunk, §4.2 chunk_id recipe, §7.2 Chunker, §0 Q3 citation] contract_sections: [§3.5 Chunk, §4.2 chunk_id recipe, §7.2 Chunker, §0 Q3 citation]
--- ---
@@ -813,8 +813,8 @@ The first concrete `Chunker`. Establishes how subsequent chunkers (PDF page chun
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `serde` - `serde`
- `blake3` (policy_hash) - `blake3` (policy_hash)
- `serde-json-canonicalizer` - `serde-json-canonicalizer`
@@ -822,30 +822,30 @@ The first concrete `Chunker`. Establishes how subsequent chunkers (PDF page chun
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize` (consumes `CanonicalDocument` only via `kb-core`), `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` - `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize` (consumes `CanonicalDocument` only via `kebab-core`), `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 | | `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 |
| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` from config | | `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` from config |
## Outputs ## Outputs
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `Vec<Chunk>` | `kb_core::Chunk` | `kb-store-sqlite` (p1-6), `kb-embed*` (P3) | | `Vec<Chunk>` | `kebab_core::Chunk` | `kebab-store-sqlite` (p1-6), `kebab-embed*` (P3) |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
```rust ```rust
pub struct MdHeadingV1Chunker; pub struct MdHeadingV1Chunker;
impl kb_core::Chunker for MdHeadingV1Chunker { impl kebab_core::Chunker for MdHeadingV1Chunker {
fn chunker_version(&self) -> kb_core::ChunkerVersion; fn chunker_version(&self) -> kebab_core::ChunkerVersion;
fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String; fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String;
fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result<Vec<kb_core::Chunk>>; fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result<Vec<kebab_core::Chunk>>;
} }
``` ```
@@ -882,12 +882,12 @@ impl kb_core::Chunker for MdHeadingV1Chunker {
| determinism | identical input + identical policy → identical chunk_ids | inline | | determinism | identical input + identical policy → identical chunk_ids | inline |
| snapshot | `fixtures/markdown/long-section.md` → Vec<Chunk> JSON stable | fixture | | snapshot | `fixtures/markdown/long-section.md` → Vec<Chunk> JSON stable | fixture |
All tests under `cargo test -p kb-chunk`. All tests under `cargo test -p kebab-chunk`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-chunk` passes - [ ] `cargo check -p kebab-chunk` passes
- [ ] `cargo test -p kb-chunk` passes - [ ] `cargo test -p kebab-chunk` passes
- [ ] Snapshot stable across two runs - [ ] Snapshot stable across two runs
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.5, §4.2 - [ ] PR links design §3.5, §4.2
@@ -924,13 +924,13 @@ Write to `tasks/p1/p1-6-store-sqlite.md`:
````markdown ````markdown
--- ---
phase: P1 phase: P1
component: kb-store-sqlite (P1 subset) component: kebab-store-sqlite (P1 subset)
task_id: p1-6 task_id: p1-6
title: "SQLite store: assets/documents/blocks/chunks + asset writer + migrations" title: "SQLite store: assets/documents/blocks/chunks + asset writer + migrations"
status: planned status: planned
depends_on: [p1-1, p1-4, p1-5] depends_on: [p1-1, p1-4, p1-5]
unblocks: [p2-1, p3-3, p4-3] unblocks: [p2-1, p3-3, p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5 DDL (5.1, 5.2, 5.3, 5.4, 5.5 chunks only — FTS handled in p2-1), §5.7 jobs/ingest_runs, §5.8 transactions, §6.3 data_dir layout] contract_sections: [§5 DDL (5.1, 5.2, 5.3, 5.4, 5.5 chunks only — FTS handled in p2-1), §5.7 jobs/ingest_runs, §5.8 transactions, §6.3 data_dir layout]
--- ---
@@ -946,8 +946,8 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `rusqlite` (with `bundled-sqlcipher` disabled; use `bundled` feature) - `rusqlite` (with `bundled-sqlcipher` disabled; use `bundled` feature)
- `refinery` for migrations - `refinery` for migrations
- `serde_json` - `serde_json`
@@ -958,7 +958,7 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs` (only types via `kb-core`), `kb-parse-md`, `kb-normalize`, `kb-chunk` (only types via `kb-core`), `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` - `kebab-source-fs` (only types via `kebab-core`), `kebab-parse-md`, `kebab-normalize`, `kebab-chunk` (only types via `kebab-core`), `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs ## Inputs
@@ -966,17 +966,17 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
|-------|------|--------| |-------|------|--------|
| migrations | `migrations/V001__init.sql` | repo | | migrations | `migrations/V001__init.sql` | repo |
| `RawAsset` + bytes | `(RawAsset, Vec<u8>)` | p1-1 + reader | | `RawAsset` + bytes | `(RawAsset, Vec<u8>)` | p1-1 + reader |
| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 | | `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 |
| `Vec<Chunk>` | `kb_core::Chunk` | p1-5 | | `Vec<Chunk>` | `kebab_core::Chunk` | p1-5 |
| `IngestRun` aggregates | `(scope, counts, duration)` | `kb-app` | | `IngestRun` aggregates | `(scope, counts, duration)` | `kebab-app` |
## Outputs ## Outputs
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `data_dir/kb.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | | every later phase | | `data_dir/kebab.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | | every later phase |
| `data_dir/assets/<aa>/<asset_id>` bytes (when copied) | | future re-extraction, integrity verification | | `data_dir/assets/<aa>/<asset_id>` bytes (when copied) | | future re-extraction, integrity verification |
| `IngestReport` (wire schema v1) | `kb_core::IngestReport` | `kb-cli`, eval | | `IngestReport` (wire schema v1) | `kebab_core::IngestReport` | `kebab-cli`, eval |
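
The `data_dir/assets/<aa>/<asset_id>` layout in the outputs table could be derived as in the sketch below. It rests on an assumption the spec does not state here: that `<aa>` is the first two hex characters of the asset id. `asset_blob_path` is a hypothetical name, not part of the public surface.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: build `data_dir/assets/<aa>/<asset_id>`.
/// Assumption: `<aa>` is the first two hex characters of the asset id.
fn asset_blob_path(data_dir: &Path, asset_id_hex: &str) -> Option<PathBuf> {
    // Accept only lowercase hex so the id cannot contain `/` or `..`
    // and the resulting path stays under `data_dir/assets`.
    let is_hex = asset_id_hex
        .bytes()
        .all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    if asset_id_hex.len() < 2 || !is_hex {
        return None;
    }
    Some(
        data_dir
            .join("assets")
            .join(&asset_id_hex[..2])
            .join(asset_id_hex),
    )
}
```

Rejecting non-hex ids up front is one simple way to support the Definition of Done item that writes never escape `data_dir`.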
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
@@ -984,23 +984,23 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
pub struct SqliteStore { /* internal */ } pub struct SqliteStore { /* internal */ }
impl SqliteStore { impl SqliteStore {
pub fn open(config: &kb_config::Config) -> anyhow::Result<Self>; pub fn open(config: &kebab_config::Config) -> anyhow::Result<Self>;
pub fn run_migrations(&self) -> anyhow::Result<()>; pub fn run_migrations(&self) -> anyhow::Result<()>;
pub fn put_asset_with_bytes(&self, asset: &kb_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>; pub fn put_asset_with_bytes(&self, asset: &kebab_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>;
} }
impl kb_core::DocumentStore for SqliteStore { impl kebab_core::DocumentStore for SqliteStore {
fn put_asset(&self, a: &kb_core::RawAsset) -> anyhow::Result<()>; fn put_asset(&self, a: &kebab_core::RawAsset) -> anyhow::Result<()>;
fn put_document(&self, d: &kb_core::CanonicalDocument) -> anyhow::Result<()>; fn put_document(&self, d: &kebab_core::CanonicalDocument) -> anyhow::Result<()>;
fn put_blocks(&self, doc: &kb_core::DocumentId, blocks: &[kb_core::Block]) -> anyhow::Result<()>; fn put_blocks(&self, doc: &kebab_core::DocumentId, blocks: &[kebab_core::Block]) -> anyhow::Result<()>;
fn put_chunks(&self, doc: &kb_core::DocumentId, chunks: &[kb_core::Chunk]) -> anyhow::Result<()>; fn put_chunks(&self, doc: &kebab_core::DocumentId, chunks: &[kebab_core::Chunk]) -> anyhow::Result<()>;
fn get_document(&self, id: &kb_core::DocumentId) -> anyhow::Result<Option<kb_core::CanonicalDocument>>; fn get_document(&self, id: &kebab_core::DocumentId) -> anyhow::Result<Option<kebab_core::CanonicalDocument>>;
fn get_chunk(&self, id: &kb_core::ChunkId) -> anyhow::Result<Option<kb_core::Chunk>>; fn get_chunk(&self, id: &kebab_core::ChunkId) -> anyhow::Result<Option<kebab_core::Chunk>>;
fn list_documents(&self, filter: &kb_core::DocFilter) -> anyhow::Result<Vec<kb_core::DocSummary>>; fn list_documents(&self, filter: &kebab_core::DocFilter) -> anyhow::Result<Vec<kebab_core::DocSummary>>;
} }
impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ } impl kebab_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
``` ```
## Behavior contract ## Behavior contract
@@ -1020,7 +1020,7 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
## Storage / wire effects ## Storage / wire effects
- Writes: `kb.sqlite` (multiple tables), `data_dir/assets/<aa>/<asset_id>` (copied case). - Writes: `kebab.sqlite` (multiple tables), `data_dir/assets/<aa>/<asset_id>` (copied case).
- Reads on subsequent calls: same DB. - Reads on subsequent calls: same DB.
## Test plan ## Test plan
@@ -1036,14 +1036,14 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
| contract | DocumentStore trait round-trip for fixture document | `fixtures/markdown/code-and-table.md` | | contract | DocumentStore trait round-trip for fixture document | `fixtures/markdown/code-and-table.md` |
| snapshot | IngestReport JSON for fixture run | fixture | | snapshot | IngestReport JSON for fixture run | fixture |
All tests under `cargo test -p kb-store-sqlite` with no network. All tests under `cargo test -p kebab-store-sqlite` with no network.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-store-sqlite` passes - [ ] `cargo check -p kebab-store-sqlite` passes
- [ ] `cargo test -p kb-store-sqlite` passes - [ ] `cargo test -p kebab-store-sqlite` passes
- [ ] migration `V001__init.sql` matches design §5 verbatim (diff-checked in CI) - [ ] migration `V001__init.sql` matches design §5 verbatim (diff-checked in CI)
- [ ] Writes to `~/.local/share/kb/` are gated by `kb-config`'s `data_dir` and never escape it - [ ] Writes to `~/.local/share/kebab/` are gated by `kebab-config`'s `data_dir` and never escape it
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] PR links design §5 - [ ] PR links design §5
@@ -1056,7 +1056,7 @@ All tests under `cargo test -p kb-store-sqlite` with no network.
## Risks / notes ## Risks / notes
- WAL mode requires careful test cleanup: tests must drop the connection before removing `kb.sqlite-wal` / `-shm`. - WAL mode requires careful test cleanup: tests must drop the connection before removing `kebab.sqlite-wal` / `-shm`.
- Asset directory shard prefix uses `asset_id[..2]`; using `asset_id[..1]` would create at most 16 dirs (insufficient). - Asset directory shard prefix uses `asset_id[..2]`; using `asset_id[..1]` would create at most 16 dirs (insufficient).
```` ````
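The shard-prefix note in the spec above can be sketched as follows. This is a minimal illustration, assuming `asset_id` is an ASCII hex string; `asset_path` is a hypothetical helper name and is not part of the `SqliteStore` public surface shown in the diff.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds `data_dir/assets/<aa>/<asset_id>`,
/// where `<aa>` is the first two characters of the asset id.
/// Returns None when the id is too short or not ASCII (assumption:
/// real asset ids are hex digests, so this should never trigger).
fn asset_path(data_dir: &Path, asset_id: &str) -> Option<PathBuf> {
    if asset_id.len() < 2 || !asset_id.is_ascii() {
        return None;
    }
    // Two hex characters give 16 * 16 = 256 possible shard directories.
    let shard = &asset_id[..2];
    Some(data_dir.join("assets").join(shard).join(asset_id))
}
```

With a two-character hex prefix the store fans out over up to 256 directories, compared with 16 for a one-character prefix.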
@@ -1074,7 +1074,7 @@ git commit -m "tasks: add p1-6 store-sqlite component spec"
```bash ```bash
ls tasks/p1/ | sort ls tasks/p1/ | sort
for f in tasks/p1/p1-*.md; do grep -q '2026-04-27-kb-final-form-design.md' "$f" || echo "MISSING REF in $f"; done for f in tasks/p1/p1-*.md; do grep -q '2026-04-27-kebab-final-form-design.md' "$f" || echo "MISSING REF in $f"; done
echo done echo done
``` ```
@@ -1106,9 +1106,9 @@ For each component task below, the steps are: (1) write file, (2) verify, (3) co
**Files:** Create: `tasks/p0/p0-1-skeleton.md`. Also `mkdir -p tasks/p0`. **Files:** Create: `tasks/p0/p0-1-skeleton.md`. Also `mkdir -p tasks/p0`.
`contract_sections`: §3 (all subsections), §4, §5 (migrations meta only), §6, §7, §8, §10. `Allowed`: workspace + `kb-core` + `kb-config` + `kb-app` + `kb-cli` only. `contract_sections`: §3 (all subsections), §4, §5 (migrations meta only), §6, §7, §8, §10. `Allowed`: workspace + `kebab-core` + `kebab-config` + `kebab-app` + `kebab-cli` only.
Body covers: workspace `Cargo.toml` resolver=3, edition 2024, member list (`kb-core`, `kb-config`, `kb-app`, `kb-cli`), workspace dependencies, `kb-core` types and traits per design §3 / §7, deterministic ID functions per §4 (with full unit tests), `kb-config` loader (TOML + env + CLI override per §6.4), `kb-app` facade signatures (`ingest`, `search`, `ask`, `inspect_doc`, `inspect_chunk`, `doctor`, `init`), `kb-cli` skeleton with clap + `--help`. DoD: `cargo check --workspace`, `cargo test --workspace` (Newtype+ID+canonical-json tests only), `kb --help` works, `docs/spec/*` stubs created (link to frozen design doc), `docs/wire-schema/v1/*.schema.json` stubs (one file per object in §2). Body covers: workspace `Cargo.toml` resolver=3, edition 2024, member list (`kebab-core`, `kebab-config`, `kebab-app`, `kebab-cli`), workspace dependencies, `kebab-core` types and traits per design §3 / §7, deterministic ID functions per §4 (with full unit tests), `kebab-config` loader (TOML + env + CLI override per §6.4), `kebab-app` facade signatures (`ingest`, `search`, `ask`, `inspect_doc`, `inspect_chunk`, `doctor`, `init`), `kebab-cli` skeleton with clap + `--help`. DoD: `cargo check --workspace`, `cargo test --workspace` (Newtype+ID+canonical-json tests only), `kebab --help` works, `docs/spec/*` stubs created (link to frozen design doc), `docs/wire-schema/v1/*.schema.json` stubs (one file per object in §2).
- [ ] **Step 1: Create directory and file** - [ ] **Step 1: Create directory and file**
@@ -1134,11 +1134,11 @@ git -c user.name=kb -c user.email=kb@local commit -m "tasks: add p0-1 skeleton c
#### `p2-1-fts-schema.md` #### `p2-1-fts-schema.md`
`contract_sections`: §5.5 FTS5 + triggers, §9 versioning. `Allowed`: `kb-core`, `kb-config`, `kb-store-sqlite` (extends migrations). `depends_on: [p1-6]`. Migration `V002__fts.sql` adds `chunks_fts` virtual table and three triggers verbatim from §5.5. Tests: backfill from existing chunks via `INSERT INTO chunks_fts SELECT ... FROM chunks`, then assert FTS row count == chunks row count; insert/update/delete in `chunks` reflects in `chunks_fts`. `contract_sections`: §5.5 FTS5 + triggers, §9 versioning. `Allowed`: `kebab-core`, `kebab-config`, `kebab-store-sqlite` (extends migrations). `depends_on: [p1-6]`. Migration `V002__fts.sql` adds `chunks_fts` virtual table and three triggers verbatim from §5.5. Tests: backfill from existing chunks via `INSERT INTO chunks_fts SELECT ... FROM chunks`, then assert FTS row count == chunks row count; insert/update/delete in `chunks` reflects in `chunks_fts`.
#### `p2-2-lexical-retriever.md` #### `p2-2-lexical-retriever.md`
`contract_sections`: §3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5 search output (for snippet length defaults), §2.2 wire schema. `Allowed`: `kb-core`, `kb-config`, `kb-store-sqlite`. `depends_on: [p2-1]`. Implements `Retriever` trait with `bm25(chunks_fts)` ranking, snippet via SQLite `snippet()` (≤ `snippet_chars` chars), citation built per §0 Q3 from `source_spans`. Tests: top-k correctness on fixture corpus, citation line range round-trip against original Markdown, deterministic across two runs. `contract_sections`: §3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5 search output (for snippet length defaults), §2.2 wire schema. `Allowed`: `kebab-core`, `kebab-config`, `kebab-store-sqlite`. `depends_on: [p2-1]`. Implements `Retriever` trait with `bm25(chunks_fts)` ranking, snippet via SQLite `snippet()` (≤ `snippet_chars` chars), citation built per §0 Q3 from `source_spans`. Tests: top-k correctness on fixture corpus, citation line range round-trip against original Markdown, deterministic across two runs.
- [ ] **Step 1: Create directory and both spec files** (template per Phase B; bodies as described above). - [ ] **Step 1: Create directory and both spec files** (template per Phase B; bodies as described above).
- [ ] **Step 2: Verify with `for f in tasks/p2/p2-*.md; do test -s "$f" || echo MISSING $f; done; echo done` (expect only `done`).** - [ ] **Step 2: Verify with `for f in tasks/p2/p2-*.md; do test -s "$f" || echo MISSING $f; done; echo done` (expect only `done`).**
@@ -1152,10 +1152,10 @@ git add tasks/p2 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
**Files:** `mkdir -p tasks/p3` and create `p3-1-embedder-trait.md`, `p3-2-fastembed-adapter.md`, `p3-3-lancedb-store.md`, `p3-4-hybrid-fusion.md`. **Files:** `mkdir -p tasks/p3` and create `p3-1-embedder-trait.md`, `p3-2-fastembed-adapter.md`, `p3-3-lancedb-store.md`, `p3-4-hybrid-fusion.md`.
- `p3-1-embedder-trait.md`: §3.7, §7.2 Embedder, §11. Allowed: `kb-core`, `kb-config`. No external embedding dep. Public surface: `Embedder` trait + `EmbeddingInput`/`EmbeddingKind` already in core (validate they exist; if not, this task is also a `kb-core` patch). `depends_on: [p0-1]`. Tests: trait dyn dispatch, mock embedder. - `p3-1-embedder-trait.md`: §3.7, §7.2 Embedder, §11. Allowed: `kebab-core`, `kebab-config`. No external embedding dep. Public surface: `Embedder` trait + `EmbeddingInput`/`EmbeddingKind` already in core (validate they exist; if not, this task is also a `kebab-core` patch). `depends_on: [p0-1]`. Tests: trait dyn dispatch, mock embedder.
- `p3-2-fastembed-adapter.md`: §11.3, §6.4 `[models.embedding]`. Allowed: `kb-core`, `kb-config`, `fastembed`, `tokenizers`, `ort`. `depends_on: [p3-1]`. Provides `FastembedEmbedder` implementing `Embedder` for `multilingual-e5-small` (default), with required Document/Query prefix per §11.3. Tests: dimension check, deterministic vector for fixed input (hash compare on first 8 floats with epsilon), batch size respected. - `p3-2-fastembed-adapter.md`: §11.3, §6.4 `[models.embedding]`. Allowed: `kebab-core`, `kebab-config`, `fastembed`, `tokenizers`, `ort`. `depends_on: [p3-1]`. Provides `FastembedEmbedder` implementing `Embedder` for `multilingual-e5-small` (default), with required Document/Query prefix per §11.3. Tests: dimension check, deterministic vector for fixed input (hash compare on first 8 floats with epsilon), batch size respected.
- `p3-3-lancedb-store.md`: §3.5, §5.6 embedding_records, §6.3 lancedb table naming. Allowed: `kb-core`, `kb-config`, `lancedb`, `arrow`, `kb-store-sqlite` (write `embedding_records` row only — no other table). `depends_on: [p3-2, p1-6]`. Implements `VectorStore` trait. Table naming `chunk_embeddings_<model>_<dim>.lance`. `ensure_table` creates if missing. `upsert` inserts vectors and writes a matching `embedding_records` row in same logical operation (best-effort 2PC: lance commit, then SQLite insert; on SQLite failure, log warning + leave lance row — re-upsert is idempotent because of the `UNIQUE(chunk_id, model_id, model_version, dimensions)` constraint and lance upsert semantics). `search` filters via SearchFilters and returns top-k. Tests: smoke (insert+search), dimension mismatch error, model isolation (two models stay in two tables). - `p3-3-lancedb-store.md`: §3.5, §5.6 embedding_records, §6.3 lancedb table naming. Allowed: `kebab-core`, `kebab-config`, `lancedb`, `arrow`, `kebab-store-sqlite` (write `embedding_records` row only — no other table). `depends_on: [p3-2, p1-6]`. Implements `VectorStore` trait. Table naming `chunk_embeddings_<model>_<dim>.lance`. `ensure_table` creates if missing. `upsert` inserts vectors and writes a matching `embedding_records` row in same logical operation (best-effort 2PC: lance commit, then SQLite insert; on SQLite failure, log warning + leave lance row — re-upsert is idempotent because of the `UNIQUE(chunk_id, model_id, model_version, dimensions)` constraint and lance upsert semantics). `search` filters via SearchFilters and returns top-k. Tests: smoke (insert+search), dimension mismatch error, model isolation (two models stay in two tables).
- `p3-4-hybrid-fusion.md`: §3.7 RetrievalDetail, §0 Q3, §1.6 search --explain, §6.4 `[search]` rrf settings. Allowed: `kb-core`, `kb-config`, `kb-store-sqlite` (lexical Retriever from p2-2), `kb-store-vector` (vector Retriever wrapper around `VectorStore::search`). `depends_on: [p2-2, p3-3]`. Implements `HybridRetriever` that dispatches by `SearchMode`, fuses with RRF (k from config, default 60), populates `lexical_score`, `vector_score`, `lexical_rank`, `vector_rank`, `fusion_score`. Tests: pure lexical mode == p2-2 output; pure vector mode == p3-3 output; hybrid covers at least as many expected hits as either single mode on a small fixture; deterministic. - `p3-4-hybrid-fusion.md`: §3.7 RetrievalDetail, §0 Q3, §1.6 search --explain, §6.4 `[search]` rrf settings. Allowed: `kebab-core`, `kebab-config`, `kebab-store-sqlite` (lexical Retriever from p2-2), `kebab-store-vector` (vector Retriever wrapper around `VectorStore::search`). `depends_on: [p2-2, p3-3]`. Implements `HybridRetriever` that dispatches by `SearchMode`, fuses with RRF (k from config, default 60), populates `lexical_score`, `vector_score`, `lexical_rank`, `vector_rank`, `fusion_score`. Tests: pure lexical mode == p2-2 output; pure vector mode == p3-3 output; hybrid covers at least as many expected hits as either single mode on a small fixture; deterministic.
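The RRF step named in the p3-4 entry can be sketched as below. This is an illustration only: it assumes the common Reciprocal Rank Fusion form, where each list contributes `1 / (k + rank)` with 1-based ranks, and the `Fused` struct and `rrf_fuse` function are hypothetical stand-ins rather than the `kebab-core` types. The exact formula in design §6.4 is not shown in this diff and may differ.

```rust
use std::collections::HashMap;

/// Hypothetical result row; field names echo the ones listed in the spec.
#[derive(Debug, Clone, PartialEq)]
struct Fused {
    chunk_id: String,
    lexical_rank: Option<usize>,
    vector_rank: Option<usize>,
    fusion_score: f64,
}

/// Reciprocal Rank Fusion over two ranked lists of chunk ids (best first).
/// Each list adds 1 / (k + rank) to a chunk's score, with rank starting at 1.
fn rrf_fuse(lexical: &[&str], vector: &[&str], k: f64) -> Vec<Fused> {
    let mut by_id: HashMap<String, Fused> = HashMap::new();
    for (is_lexical, list) in [(true, lexical), (false, vector)] {
        for (i, id) in list.iter().enumerate() {
            let rank = i + 1;
            let entry = by_id.entry((*id).to_string()).or_insert_with(|| Fused {
                chunk_id: (*id).to_string(),
                lexical_rank: None,
                vector_rank: None,
                fusion_score: 0.0,
            });
            let slot = if is_lexical {
                &mut entry.lexical_rank
            } else {
                &mut entry.vector_rank
            };
            // Only the first occurrence of an id within a list counts.
            if slot.is_none() {
                *slot = Some(rank);
                entry.fusion_score += 1.0 / (k + rank as f64);
            }
        }
    }
    let mut out: Vec<Fused> = by_id.into_values().collect();
    // Score descending, then chunk_id ascending, so the output is deterministic.
    out.sort_by(|a, b| {
        b.fusion_score
            .total_cmp(&a.fusion_score)
            .then_with(|| a.chunk_id.cmp(&b.chunk_id))
    });
    out
}
```

The tie-break on `chunk_id` is a design choice of this sketch to satisfy the "deterministic" test requirement; a chunk found by only one retriever keeps `None` for the other rank.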
- [ ] **Step 1: Create directory and 4 files** (template per Phase B). - [ ] **Step 1: Create directory and 4 files** (template per Phase B).
- [ ] **Step 2: Verify** - [ ] **Step 2: Verify**
@@ -1176,9 +1176,9 @@ git add tasks/p3 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
**Files:** `mkdir -p tasks/p4` and create `p4-1-llm-trait.md`, `p4-2-ollama-adapter.md`, `p4-3-rag-pipeline.md`. **Files:** `mkdir -p tasks/p4` and create `p4-1-llm-trait.md`, `p4-2-ollama-adapter.md`, `p4-3-rag-pipeline.md`.
- `p4-1-llm-trait.md`: §7.2 LanguageModel + TokenChunk, §0 Q5 streaming, §3.8 Answer types referenced. Allowed: `kb-core`, `kb-config`. `depends_on: [p0-1]`. Defines (or validates) `LanguageModel` trait, `GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage` per design. Tests: trait dyn dispatch, mock LM streams 3 tokens. - `p4-1-llm-trait.md`: §7.2 LanguageModel + TokenChunk, §0 Q5 streaming, §3.8 Answer types referenced. Allowed: `kebab-core`, `kebab-config`. `depends_on: [p0-1]`. Defines (or validates) `LanguageModel` trait, `GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage` per design. Tests: trait dyn dispatch, mock LM streams 3 tokens.
- `p4-2-ollama-adapter.md`: §11.2 Ollama, §6.4 `[models.llm]`, §0 Q5 streaming. Allowed: `kb-core`, `kb-config`, `reqwest` (blocking + json + stream feature) or `ureq` + manual SSE; `serde_json`, `tokio`/runtime if needed. `depends_on: [p4-1]`. Implements `OllamaLanguageModel` with streaming `/api/generate`. `temperature=0.0` default, `seed` honored for determinism. Reachability/missing-model errors map to `LlmError` per design §10. Tests: against a mock HTTP server (`wiremock` or hand-rolled `tiny_http`); deterministic stream collect equals buffered concatenation; missing model returns `LlmError::ModelNotPulled` with proper hint. - `p4-2-ollama-adapter.md`: §11.2 Ollama, §6.4 `[models.llm]`, §0 Q5 streaming. Allowed: `kebab-core`, `kebab-config`, `reqwest` (blocking + json + stream feature) or `ureq` + manual SSE; `serde_json`, `tokio`/runtime if needed. `depends_on: [p4-1]`. Implements `OllamaLanguageModel` with streaming `/api/generate`. `temperature=0.0` default, `seed` honored for determinism. Reachability/missing-model errors map to `LlmError` per design §10. Tests: against a mock HTTP server (`wiremock` or hand-rolled `tiny_http`); deterministic stream collect equals buffered concatenation; missing model returns `LlmError::ModelNotPulled` with proper hint.
- `p4-3-rag-pipeline.md`: §0 Q4 refusal (two-layer), §0 Q7 footer, §1.1–1.4 ask scenes, §2.3 Answer wire, §3.8 internal Answer, §6.4 `[rag]`. Allowed: `kb-core`, `kb-config`, `kb-search` (Retriever), `kb-llm` (LanguageModel). `depends_on: [p3-4, p4-2]`. Pipeline: retrieve top-k → score gate (`refusal_reason: ScoreGate` if top1 < gate) → context packer (token budget + heading_path header `[#n doc=… heading=… span=…]`) → render `rag-v1` prompt → stream → collect → citation extraction (regex `\[(\d+)\]`) → citation validation (each `[n]` must map to a packed chunk; otherwise `grounded=false`, `refusal_reason: LlmSelfJudge`) → write `answers` row. Tests: happy path produces grounded Answer with citations; query with all chunks below gate produces ScoreGate refusal; query whose LLM emits a citation pointing to non-existent `[7]` becomes LlmSelfJudge refusal; identical query under temperature=0 produces byte-identical Answer (snapshot). - `p4-3-rag-pipeline.md`: §0 Q4 refusal (two-layer), §0 Q7 footer, §1.1–1.4 ask scenes, §2.3 Answer wire, §3.8 internal Answer, §6.4 `[rag]`. Allowed: `kebab-core`, `kebab-config`, `kebab-search` (Retriever), `kebab-llm` (LanguageModel). `depends_on: [p3-4, p4-2]`. Pipeline: retrieve top-k → score gate (`refusal_reason: ScoreGate` if top1 < gate) → context packer (token budget + heading_path header `[#n doc=… heading=… span=…]`) → render `rag-v1` prompt → stream → collect → citation extraction (regex `\[(\d+)\]`) → citation validation (each `[n]` must map to a packed chunk; otherwise `grounded=false`, `refusal_reason: LlmSelfJudge`) → write `answers` row. Tests: happy path produces grounded Answer with citations; query with all chunks below gate produces ScoreGate refusal; query whose LLM emits a citation pointing to non-existent `[7]` becomes LlmSelfJudge refusal; identical query under temperature=0 produces byte-identical Answer (snapshot).
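The score gate and citation checks in the p4-3 pipeline can be sketched with the standard library alone. The sketch below swaps the regex `\[(\d+)\]` for a hand-rolled scanner that accepts ASCII digits only, so it needs no external crate. All names (`RefusalReason`, `score_gate`, `extract_citations`, `validate_citations`) are hypothetical, and it assumes packed chunks are numbered from 1. The excerpt does not say how an answer with no citations is treated; here it passes validation vacuously.

```rust
/// Hypothetical mirror of the two refusal reasons named in the spec.
#[derive(Debug, PartialEq)]
enum RefusalReason {
    ScoreGate,
    LlmSelfJudge,
}

/// Layer 1: refuse before calling the LLM when the best hit is below the gate.
fn score_gate(top1_score: f64, gate: f64) -> Option<RefusalReason> {
    if top1_score < gate {
        Some(RefusalReason::ScoreGate)
    } else {
        None
    }
}

/// Returns every `[n]` marker in order of appearance (ASCII digits only).
fn extract_citations(answer: &str) -> Vec<usize> {
    let bytes = answer.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'[' {
            let start = i + 1;
            let mut j = start;
            while j < bytes.len() && bytes[j].is_ascii_digit() {
                j += 1;
            }
            if j > start && j < bytes.len() && bytes[j] == b']' {
                // A number too large for usize becomes usize::MAX,
                // which can never map to a packed chunk.
                out.push(answer[start..j].parse().unwrap_or(usize::MAX));
                i = j + 1;
                continue;
            }
        }
        i += 1;
    }
    out
}

/// Layer 2: every `[n]` must map to a packed chunk numbered 1..=packed_chunks.
/// Returns (grounded, refusal_reason).
fn validate_citations(answer: &str, packed_chunks: usize) -> (bool, Option<RefusalReason>) {
    let all_valid = extract_citations(answer)
        .iter()
        .all(|&n| n >= 1 && n <= packed_chunks);
    if all_valid {
        (true, None)
    } else {
        (false, Some(RefusalReason::LlmSelfJudge))
    }
}
```

For example, with three packed chunks an answer citing `[7]` yields `(false, Some(RefusalReason::LlmSelfJudge))`, which matches the refusal test case described in the entry.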
- [ ] **Step 1, 2, 3** as in C3. - [ ] **Step 1, 2, 3** as in C3.
@@ -1193,8 +1193,8 @@ git add tasks/p4 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
**Files:** `mkdir -p tasks/p5` and create `p5-1-golden-fixture-runner.md`, `p5-2-metrics-compare.md`. **Files:** `mkdir -p tasks/p5` and create `p5-1-golden-fixture-runner.md`, `p5-2-metrics-compare.md`.
- `p5-1-golden-fixture-runner.md`: phase epic + §5.7 eval_runs/eval_query_results, §6.3 runs_dir. Allowed: `kb-core`, `kb-config`, `kb-app` (calls facade for search/ask), `serde_yaml`. `depends_on: [p4-3]`. Loads `fixtures/golden_queries.yaml`, runs each query in selected mode (lexical/vector/hybrid/rag), captures per-query results to `eval_query_results` and to `runs_dir/<run_id>/per_query.jsonl`. Tests: fixture with 3 queries runs end-to-end on a tiny corpus, all rows recorded. - `p5-1-golden-fixture-runner.md`: phase epic + §5.7 eval_runs/eval_query_results, §6.3 runs_dir. Allowed: `kebab-core`, `kebab-config`, `kebab-app` (calls facade for search/ask), `serde_yaml`. `depends_on: [p4-3]`. Loads `fixtures/golden_queries.yaml`, runs each query in selected mode (lexical/vector/hybrid/rag), captures per-query results to `eval_query_results` and to `runs_dir/<run_id>/per_query.jsonl`. Tests: fixture with 3 queries runs end-to-end on a tiny corpus, all rows recorded.
- `p5-2-metrics-compare.md`: phase epic, §0 Q6 wire schema. Allowed: `kb-core`, `kb-config`, `kb-store-sqlite` (read eval rows). `depends_on: [p5-1]`. Computes hit@k, MRR, recall@k_doc, citation_coverage, groundedness (rule-based via `must_contain`), empty_result_rate, refusal_correctness. `kb eval compare a b` produces wins/losses/draws + delta. Tests: fixed input rows produce expected metric values; compare produces stable sorted output. - `p5-2-metrics-compare.md`: phase epic, §0 Q6 wire schema. Allowed: `kebab-core`, `kebab-config`, `kebab-store-sqlite` (read eval rows). `depends_on: [p5-1]`. Computes hit@k, MRR, recall@k_doc, citation_coverage, groundedness (rule-based via `must_contain`), empty_result_rate, refusal_correctness. `kebab eval compare a b` produces wins/losses/draws + delta. Tests: fixed input rows produce expected metric values; compare produces stable sorted output.
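Two of the metrics listed in the p5-2 entry, hit@k and MRR, can be sketched as below. The definitions used are the common ones (hit@k: share of queries with at least one expected id in the top k; MRR: mean of `1 / rank` of the first expected id, counting 0 when none is found). The phase epic's exact definitions are not part of this diff, and `QueryResult` is a hypothetical shape, not the `eval_query_results` row.

```rust
/// Hypothetical per-query record: the expected ids and the ranked ids returned.
struct QueryResult<'a> {
    expected: &'a [&'a str],
    ranked: &'a [&'a str],
}

/// 1-based rank of the first returned id that is in the expected set.
fn first_relevant_rank(q: &QueryResult) -> Option<usize> {
    q.ranked
        .iter()
        .position(|id| q.expected.contains(id))
        .map(|i| i + 1)
}

/// Fraction of queries whose first relevant hit is within the top `k`.
fn hit_at_k(results: &[QueryResult], k: usize) -> f64 {
    if results.is_empty() {
        return 0.0;
    }
    let hits = results
        .iter()
        .filter(|q| first_relevant_rank(q).map_or(false, |r| r <= k))
        .count();
    hits as f64 / results.len() as f64
}

/// Mean reciprocal rank; a query with no relevant hit contributes 0.
fn mrr(results: &[QueryResult]) -> f64 {
    if results.is_empty() {
        return 0.0;
    }
    let sum: f64 = results
        .iter()
        .map(|q| first_relevant_rank(q).map_or(0.0, |r| 1.0 / r as f64))
        .sum();
    sum / results.len() as f64
}
```

Returning 0.0 for an empty run is a choice made in this sketch; a real runner might prefer to report the metric as undefined.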
- [ ] **Step 1, 2, 3** as in C3. - [ ] **Step 1, 2, 3** as in C3.
@@ -1209,9 +1209,9 @@ git add tasks/p5 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
`mkdir -p tasks/p6` and create: `mkdir -p tasks/p6` and create:
- `p6-1-image-extractor-exif.md`: phase epic §9.1, §3.4 ImageRefBlock, §3.7a ImageType. Allowed: `kb-core`, `kb-config`, `image`, `kamadak-exif`. Implements `Extractor` for `MediaType::Image(_)` producing a `CanonicalDocument` whose body is exactly one `ImageRefBlock`. EXIF goes to `metadata.user`. `depends_on: [p0-1, p1-6]`. Tests: PNG/JPEG decode metadata; EXIF extraction; deterministic doc_id. - `p6-1-image-extractor-exif.md`: phase epic §9.1, §3.4 ImageRefBlock, §3.7a ImageType. Allowed: `kebab-core`, `kebab-config`, `image`, `kamadak-exif`. Implements `Extractor` for `MediaType::Image(_)` producing a `CanonicalDocument` whose body is exactly one `ImageRefBlock`. EXIF goes to `metadata.user`. `depends_on: [p0-1, p1-6]`. Tests: PNG/JPEG decode metadata; EXIF extraction; deterministic doc_id.
- `p6-2-ocr-adapter.md`: phase epic §9.1. Allowed: `kb-core`, `kb-config`, `image`, OS-specific OCR (feature `apple-vision` for macOS via sidecar binary; feature `tesseract` for cross-platform; default tesseract). Defines `OcrEngine` trait + adapter. Populates `ImageRefBlock.ocr` `OcrText` (`joined`, regions, engine, engine_version). `depends_on: [p6-1]`. Tests: deterministic text on a fixed fixture image with high-confidence text. - `p6-2-ocr-adapter.md`: phase epic §9.1. Allowed: `kebab-core`, `kebab-config`, `image`, OS-specific OCR (feature `apple-vision` for macOS via sidecar binary; feature `tesseract` for cross-platform; default tesseract). Defines `OcrEngine` trait + adapter. Populates `ImageRefBlock.ocr` `OcrText` (`joined`, regions, engine, engine_version). `depends_on: [p6-1]`. Tests: deterministic text on a fixed fixture image with high-confidence text.
- `p6-3-caption-adapter.md`: phase epic §9.1 caption section, §3.7a ModelCaption. Allowed: `kb-core`, `kb-config`, `kb-llm` (reuse LanguageModel for VLM). Optional/feature-gated. `depends_on: [p6-1, p4-2]`. Populates `ImageRefBlock.caption`. Tests: with mock LM, caption recorded with model id; absence of feature flag leaves caption=None. - `p6-3-caption-adapter.md`: phase epic §9.1 caption section, §3.7a ModelCaption. Allowed: `kebab-core`, `kebab-config`, `kebab-llm` (reuse LanguageModel for VLM). Optional/feature-gated. `depends_on: [p6-1, p4-2]`. Populates `ImageRefBlock.caption`. Tests: with mock LM, caption recorded with model id; absence of feature flag leaves caption=None.
- [ ] **Step 1, 2, 3** as in C3. - [ ] **Step 1, 2, 3** as in C3.
@@ -1226,8 +1226,8 @@ git add tasks/p6 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
`mkdir -p tasks/p7` and create: `mkdir -p tasks/p7` and create:
- `p7-1-pdf-text-extractor.md`: phase epic §9.2, §3.4 SourceSpan::Page. Allowed: `kb-core`, `kb-config`, `pdf-extract`, `lopdf` (page metadata). `depends_on: [p0-1, p1-6]`. Extractor for `MediaType::Pdf` produces a `CanonicalDocument` with one `Paragraph` per page, `SourceSpan::Page`. Failed-text pages are emitted as paragraphs with empty text and a `Provenance` warning marking them as scanned candidates. Tests: page count, span correctness, failure handling. - `p7-1-pdf-text-extractor.md`: phase epic §9.2, §3.4 SourceSpan::Page. Allowed: `kebab-core`, `kebab-config`, `pdf-extract`, `lopdf` (page metadata). `depends_on: [p0-1, p1-6]`. Extractor for `MediaType::Pdf` produces a `CanonicalDocument` with one `Paragraph` per page, `SourceSpan::Page`. Failed-text pages are emitted as paragraphs with empty text and a `Provenance` warning marking them as scanned candidates. Tests: page count, span correctness, failure handling.
- `p7-2-pdf-page-chunker.md`: phase epic §9.2, §3.5, §0 Q3 citation. Allowed: `kb-core`, `kb-config`. New chunker version `pdf-page-v1` that respects page boundaries. `depends_on: [p7-1]`. Tests: chunk does not cross page boundary; very long page subdivides per `target_tokens`. - `p7-2-pdf-page-chunker.md`: phase epic §9.2, §3.5, §0 Q3 citation. Allowed: `kebab-core`, `kebab-config`. New chunker version `pdf-page-v1` that respects page boundaries. `depends_on: [p7-1]`. Tests: chunk does not cross page boundary; very long page subdivides per `target_tokens`.
- [ ] **Step 1, 2, 3** as in C3. - [ ] **Step 1, 2, 3** as in C3.
@@ -1242,7 +1242,7 @@ git add tasks/p7 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
`mkdir -p tasks/p8` and create: `mkdir -p tasks/p8` and create:
- `p8-1-whisper-adapter.md`: phase epic §9.3, §3.4 AudioRefBlock + `Transcript`. Allowed: `kb-core`, `kb-config`, whisper.cpp Rust binding (`whisper-rs`) or sidecar binary. `depends_on: [p0-1, p1-6]`. Implements `Transcriber` trait. Default model `large-v3` via config; tests use a tiny model (e.g., `base.en`) for speed. Tests: monotone segment timestamps, language detection populated, deterministic transcript on fixed audio. - `p8-1-whisper-adapter.md`: phase epic §9.3, §3.4 AudioRefBlock + `Transcript`. Allowed: `kebab-core`, `kebab-config`, whisper.cpp Rust binding (`whisper-rs`) or sidecar binary. `depends_on: [p0-1, p1-6]`. Implements `Transcriber` trait. Default model `large-v3` via config; tests use a tiny model (e.g., `base.en`) for speed. Tests: monotone segment timestamps, language detection populated, deterministic transcript on fixed audio.
- `p8-2-segment-chunker.md`: phase epic §9.3, §3.5. New `audio-segment-v1` chunker that groups segments up to `target_tokens` with priority on speaker turn boundaries (when present). `depends_on: [p8-1]`. Tests: chunk timestamp == first/last segment timestamp; speaker change forces split. - `p8-2-segment-chunker.md`: phase epic §9.3, §3.5. New `audio-segment-v1` chunker that groups segments up to `target_tokens` with priority on speaker turn boundaries (when present). `depends_on: [p8-1]`. Tests: chunk timestamp == first/last segment timestamp; speaker change forces split.
- [ ] **Step 1, 2, 3** as in C3. - [ ] **Step 1, 2, 3** as in C3.
@@ -1258,11 +1258,11 @@ git add tasks/p8 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:
`mkdir -p tasks/p9` and create: `mkdir -p tasks/p9` and create:
- `p9-1-tui-library.md`: phase epic §16.2, §3.7. Allowed: `kb-core`, `kb-app` only (UI law). `ratatui`, `crossterm`. `depends_on: [p1-6]`. Library list view + tag filter. Tests: snapshot of rendered frame against fixture corpus list. - `p9-1-tui-library.md`: phase epic §16.2, §3.7. Allowed: `kebab-core`, `kebab-app` only (UI law). `ratatui`, `crossterm`. `depends_on: [p1-6]`. Library list view + tag filter. Tests: snapshot of rendered frame against fixture corpus list.
- `p9-2-tui-search.md`: phase epic §16.2, §1.5. Allowed: same as p9-1. `depends_on: [p2-2, p3-4]`. Search input + result list + preview pane; `Enter` triggers external editor jump (`$EDITOR +<line> <path>`). Tests: search results render; `g` keybinding constructs the correct editor command. - `p9-2-tui-search.md`: phase epic §16.2, §1.5. Allowed: same as p9-1. `depends_on: [p2-2, p3-4]`. Search input + result list + preview pane; `Enter` triggers external editor jump (`$EDITOR +<line> <path>`). Tests: search results render; `g` keybinding constructs the correct editor command.
- `p9-3-tui-ask.md`: phase epic §16.2, §1.1, §1.2. Allowed: same. `depends_on: [p4-3]`. Ask pane shows streaming tokens; `--explain` toggle. Tests: streaming render, refusal render. - `p9-3-tui-ask.md`: phase epic §16.2, §1.1, §1.2. Allowed: same. `depends_on: [p4-3]`. Ask pane shows streaming tokens; `--explain` toggle. Tests: streaming render, refusal render.
- `p9-4-tui-inspect.md`: §1.6 inspect, §3.5. Allowed: same. `depends_on: [p1-6, p3-3]`. Renders Document and Chunk inspection per wire schemas 2.5/2.6. - `p9-4-tui-inspect.md`: §1.6 inspect, §3.5. Allowed: same. `depends_on: [p1-6, p3-3]`. Renders Document and Chunk inspection per wire schemas 2.5/2.6.
- `p9-5-desktop-tauri.md`: phase epic §16.3, §1 all scenes. Allowed: `kb-core`, `kb-app`, Tauri backend; frontend stack TBD by user (vanilla TS by default). `depends_on: [p9-1, p9-2, p9-3, p9-4]`. Backend exposes Tauri commands that wrap `kb-app` 1:1. Source viewer per medium (Markdown render, PDF page, image with region overlay, audio with seek). Tests: backend command unit tests (no frontend e2e in this task). - `p9-5-desktop-tauri.md`: phase epic §16.3, §1 all scenes. Allowed: `kebab-core`, `kebab-app`, Tauri backend; frontend stack TBD by user (vanilla TS by default). `depends_on: [p9-1, p9-2, p9-3, p9-4]`. Backend exposes Tauri commands that wrap `kebab-app` 1:1. Source viewer per medium (Markdown render, PDF page, image with region overlay, audio with seek). Tests: backend command unit tests (no frontend e2e in this task).
- [ ] **Step 1, 2, 3** as in C3. - [ ] **Step 1, 2, 3** as in C3.
@@ -1351,5 +1351,5 @@ git -c user.name=kb -c user.email=kb@local commit -m "tasks: update INDEX with f
- [ ] `tasks/p0/`, `tasks/p1/` … `tasks/p9/` exist with the component spec files listed above. - [ ] `tasks/p0/`, `tasks/p1/` … `tasks/p9/` exist with the component spec files listed above.
- [ ] Every component spec contains: frontmatter (with `contract_sections`), Allowed/Forbidden, Inputs, Outputs, Public surface, Behavior contract, Storage/wire effects, Test plan, Definition of Done, Out of scope, Risks/notes. - [ ] Every component spec contains: frontmatter (with `contract_sections`), Allowed/Forbidden, Inputs, Outputs, Public surface, Behavior contract, Storage/wire effects, Test plan, Definition of Done, Out of scope, Risks/notes.
- [ ] `tasks/INDEX.md` lists every component task. - [ ] `tasks/INDEX.md` lists every component task.
- [ ] No new domain types introduced inside any component spec — every type referenced is defined in [docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../specs/2026-04-27-kb-final-form-design.md). - [ ] No new domain types introduced inside any component spec — every type referenced is defined in [docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../specs/2026-04-27-kebab-final-form-design.md).
- [ ] All commits authored sequentially per task; rollback is per-task. - [ ] All commits authored sequentially per task; rollback is per-task.

@@ -3,7 +3,7 @@ title: "KB v1 최종 결과물 형태 — Frozen Design"
date: 2026-04-27 date: 2026-04-27
status: frozen status: frozen
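purpose: freeze a firm contract so that the spec does not change while work is decomposed into small units purpose: freeze a firm contract so that the spec does not change while work is decomposed into small units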
source_report: ../../../kb_local_rust_report.md source_report: ../../../kebab_local_rust_report.md
related_tasks: ../../../tasks/INDEX.md related_tasks: ../../../tasks/INDEX.md
--- ---
@@ -11,7 +11,7 @@ related_tasks: ../../../tasks/INDEX.md
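This document freezes the **very concrete form of the final deliverable** that will satisfy the user. Each phase's task decomposition is carried out on top of this contract, and as long as this document does not change, the task interfaces do not change. This document freezes the **very concrete form of the final deliverable** that will satisfy the user. Each phase's task decomposition is carried out on top of this contract, and as long as this document does not change, the task interfaces do not change.
The underlying report is [`kb_local_rust_report.md`](../../../kb_local_rust_report.md). That report provides the *direction* and the *rationale*; this document pins down the *form*. The underlying report is [`kebab_local_rust_report.md`](../../../kebab_local_rust_report.md). That report provides the *direction* and the *rationale*; this document pins down the *form*.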
--- ---
@@ -20,7 +20,7 @@ related_tasks: ../../../tasks/INDEX.md
| # | Decision | Value | Rationale | | # | Decision | Value | Rationale |
|---|------|-----|------| |---|------|-----|------|
| Q1 | scope priority | UX → Data (derived backward) | user satisfaction is the lever for spec stability | | Q1 | scope priority | UX → Data (derived backward) | user satisfaction is the lever for spec stability |
| Q2 | headline UX | `kb ask` answer screen | exposes search, citation, RAG, refusal, and model metadata together | | Q2 | headline UX | `kebab ask` answer screen | exposes search, citation, RAG, refusal, and model metadata together |
| | ask default format | inline numeric refs `[1]…[n]` + footer | everyday readability | | | ask default format | inline numeric refs `[1]…[n]` + footer | everyday readability |
| | ask `--explain` | per-claim breakdown + verbose footer + retrieval trace | single debug flag | | | ask `--explain` | per-claim breakdown + verbose footer + retrieval trace | single debug flag |
| Q3 | citation string | URI fragment (`path#k=v…`, W3C Media Fragments) | standards alignment + Windows path safety + browser auto-scroll | | Q3 | citation string | URI fragment (`path#k=v…`, W3C Media Fragments) | standards alignment + Windows path safety + browser auto-scroll |
@@ -34,7 +34,7 @@ related_tasks: ../../../tasks/INDEX.md
| Q10 | workspace | single root + XDG layout | adequate for a personal v1 | | Q10 | workspace | single root + XDG layout | adequate for a personal v1 |
| | asset retention | content-addressable copy; above `copy_threshold_mb=100`, reference + checksum | reproducibility + disk savings | | | asset retention | content-addressable copy; above `copy_threshold_mb=100`, reference + checksum | reproducibility + disk savings |
| | wire version | additive within `vN`, breaking → `vN+1` | avoids breaking external consumers | | | wire version | additive within `vN`, breaking → `vN+1` | avoids breaking external consumers |
| | ignore | gitignore syntax + `.kbignore` | familiarity | | | ignore | gitignore syntax + `.kebabignore` | familiarity |
| | errors | thiserror per crate, anyhow at boundary | traceability + UX | | | errors | thiserror per crate, anyhow at boundary | traceability + UX |
| | sync | watch=false default | explicit ingest in v1 | | | sync | watch=false default | explicit ingest in v1 |
@@ -42,43 +42,43 @@ related_tasks: ../../../tasks/INDEX.md
## 1. Headline UX scenes ## 1. Headline UX scenes
### 1.1 `kb ask` (default) ### 1.1 `kebab ask` (default)
```text ```text
$ kb ask "Markdown chunking 규칙은?" $ kebab ask "Markdown chunking 규칙은?"
heading boundary 우선 [1]. code block 중간 분할 금지 [2]. table 가능한 한 heading boundary 우선 [1]. code block 중간 분할 금지 [2]. table 가능한 한
단일 chunk 유지 [2]. 긴 section 은 paragraph 단위로 분할 [1]. chunk 마다 단일 chunk 유지 [2]. 긴 section 은 paragraph 단위로 분할 [1]. chunk 마다
heading_path 와 source_span 보존 [1]. heading_path 와 source_span 보존 [1].
───────────────────────────────────────────────────────── ─────────────────────────────────────────────────────────
[1] notes/rust/kb-architecture.md#L661-L672 [1] notes/rust/kebab-architecture.md#L661-L672
§14 Chunking 정책 §14 Chunking 정책
[2] notes/rust/kb-architecture.md#L665-L668 [2] notes/rust/kebab-architecture.md#L665-L668
§14 Chunking 정책 §14 Chunking 정책
grounded ✓ qwen2.5:14b-instruct rag-v1 3 chunks grounded ✓ qwen2.5:14b-instruct rag-v1 3 chunks
``` ```
### 1.2 `kb ask --explain` ### 1.2 `kebab ask --explain`
```text ```text
$ kb ask --explain "Markdown chunking 규칙은?" $ kebab ask --explain "Markdown chunking 규칙은?"
▎ heading boundary 우선 ▎ heading boundary 우선
└ notes/rust/kb-architecture.md#L662 └ notes/rust/kebab-architecture.md#L662
「heading boundary를 우선한다」 「heading boundary를 우선한다」
▎ code block 중간 분할 금지 ▎ code block 중간 분할 금지
└ notes/rust/kb-architecture.md#L663 └ notes/rust/kebab-architecture.md#L663
「code block은 중간에서 자르지 않는다」 「code block은 중간에서 자르지 않는다」
▎ table 단일 chunk 유지 ▎ table 단일 chunk 유지
└ notes/rust/kb-architecture.md#L664 └ notes/rust/kebab-architecture.md#L664
「table은 가능한 한 하나의 chunk로」 「table은 가능한 한 하나의 chunk로」
▎ heading_path / source_span 보존 ▎ heading_path / source_span 보존
└ notes/rust/kb-architecture.md#L668-L670 └ notes/rust/kebab-architecture.md#L668-L670
retrieval trace retrieval trace
query "Markdown chunking 규칙은?" query "Markdown chunking 규칙은?"
@@ -87,8 +87,8 @@ retrieval trace
threshold (gate) 0.30 → top-1 0.82 pass threshold (gate) 0.30 → top-1 0.82 pass
fusion rrf (k=60) fusion rrf (k=60)
chunks (used) 3 / 8 returned chunks (used) 3 / 8 returned
#1 0.82 notes/rust/kb-architecture.md#L661-L672 bm25=12.4 vec=0.78 #1 0.82 notes/rust/kebab-architecture.md#L661-L672 bm25=12.4 vec=0.78
#2 0.78 notes/rust/kb-architecture.md#L692-L713 bm25=10.1 vec=0.74 #2 0.78 notes/rust/kebab-architecture.md#L692-L713 bm25=10.1 vec=0.74
#3 0.55 guides/markdown-style.md#L4-L18 bm25=8.2 vec=0.61 #3 0.55 guides/markdown-style.md#L4-L18 bm25=8.2 vec=0.61
grounded ✓ qwen2.5:14b-instruct rag-v1 3 chunks grounded ✓ qwen2.5:14b-instruct rag-v1 3 chunks
@@ -96,10 +96,10 @@ prompt 1184 tokens completion 312 tokens latency 1842 ms
embedding multilingual-e5-small index v1.0 embedding multilingual-e5-small index v1.0
``` ```
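
The retrieval trace above reports `fusion rrf (k=60)`. As a rough illustration only, Reciprocal Rank Fusion over a lexical and a vector ranking could be sketched as below. The function name `rrf_fuse` and the use of plain string ids are hypothetical, and the raw RRF sums it returns are not the 0.82-style scores shown in the trace, which presumably pass through a normalization step not described here.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: each ranking contributes 1 / (k + rank) per id,
/// where rank is 1-based. `rrf_fuse` is a hypothetical name for illustration.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            let rank = (i as f64) + 1.0;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties are broken by id so the order is deterministic.
    fused.sort_by(|a, b| {
        b.1.partial_cmp(&a.1)
            .unwrap()
            .then_with(|| a.0.cmp(&b.0))
    });
    fused
}
```

An id that appears in both rankings accumulates two terms, which is why it can outrank an id that is first in only one list.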
### 1.3 `kb ask` (refusal — score gate) ### 1.3 `kebab ask` (refusal — score gate)
```text ```text
$ kb ask "당신의 회사 매출은?" $ kebab ask "당신의 회사 매출은?"
근거 부족. KB 에 해당 내용 없음. 근거 부족. KB 에 해당 내용 없음.
가까운 후보 (모두 임계 0.30 미만): 가까운 후보 (모두 임계 0.30 미만):
@@ -108,10 +108,10 @@ $ kb ask "당신의 회사 매출은?"
grounded ✗ qwen2.5:14b-instruct rag-v1 0 chunks used grounded ✗ qwen2.5:14b-instruct rag-v1 0 chunks used
``` ```
### 1.4 `kb ask` (refusal — LLM self-judge) ### 1.4 `kebab ask` (refusal — LLM self-judge)
```text ```text
$ kb ask "이 책의 23쪽 결론은?" $ kebab ask "이 책의 23쪽 결론은?"
근거 부족. 제공된 chunk 중 결론 내용 없음. 근거 부족. 제공된 chunk 중 결론 내용 없음.
검색은 됨, LLM 이 결론 부재 판단: 검색은 됨, LLM 이 결론 부재 판단:
@@ -121,17 +121,17 @@ $ kb ask "이 책의 23쪽 결론은?"
grounded ✗ qwen2.5:14b-instruct rag-v1 3 chunks searched, 0 grounded grounded ✗ qwen2.5:14b-instruct rag-v1 3 chunks searched, 0 grounded
``` ```
### 1.5 `kb search` (dense) ### 1.5 `kebab search` (dense)
```text ```text
$ kb search "Markdown chunking 규칙" $ kebab search "Markdown chunking 규칙"
1. 0.82 notes/rust/kb-architecture.md#L661-L672 1. 0.82 notes/rust/kebab-architecture.md#L661-L672
§14 Chunking 정책 §14 Chunking 정책
heading boundary 우선. code block 중간 분할 금지. heading boundary 우선. code block 중간 분할 금지.
table 가능한 한 단일 chunk… table 가능한 한 단일 chunk…
2. 0.71 notes/rust/kb-architecture.md#L692-L713 2. 0.71 notes/rust/kebab-architecture.md#L692-L713
§15 검색과 RAG 정책 §15 검색과 RAG 정책
검색은 처음부터 hybrid 로 설계하되 구현은 단계적… 검색은 처음부터 hybrid 로 설계하되 구현은 단계적…
@@ -142,7 +142,7 @@ $ kb search "Markdown chunking 규칙"
3 hits hybrid index v1.0 bm25+e5-small/RRF 3 hits hybrid index v1.0 bm25+e5-small/RRF
``` ```
### 1.6 `kb search --explain` ### 1.6 `kebab search --explain`
각 hit 아래 추가: 각 hit 아래 추가:
@@ -174,8 +174,8 @@ $ kb search "Markdown chunking 규칙"
{ {
"schema_version": "citation.v1", "schema_version": "citation.v1",
"kind": "line|page|region|caption|time", "kind": "line|page|region|caption|time",
"path": "notes/rust/kb.md", "path": "notes/rust/kebab.md",
"uri": "notes/rust/kb.md#L12-L34", "uri": "notes/rust/kebab.md#L12-L34",
"line": { "start": 12, "end": 34, "section": "§14 Chunking 정책" }, "line": { "start": 12, "end": 34, "section": "§14 Chunking 정책" },
"page": { "page": 13, "section": "Experiment Setup" }, "page": { "page": 13, "section": "Experiment Setup" },
@@ -196,7 +196,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
"score": 0.82, "score": 0.82,
"chunk_id": "9b4a8c1e7d3f2a05", "chunk_id": "9b4a8c1e7d3f2a05",
"doc_id": "3f9a2c10ee4d6b78", "doc_id": "3f9a2c10ee4d6b78",
"doc_path": "notes/rust/kb-architecture.md", "doc_path": "notes/rust/kebab-architecture.md",
"heading_path": ["아키텍처", "Chunking 정책"], "heading_path": ["아키텍처", "Chunking 정책"],
"section_label": "§14 Chunking 정책", "section_label": "§14 Chunking 정책",
"snippet": "heading boundary 우선. code block 중간 분할 금지…", "snippet": "heading boundary 우선. code block 중간 분할 금지…",
@@ -261,7 +261,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
{ {
"kind": "new|updated|skipped|error", "kind": "new|updated|skipped|error",
"doc_id": "3f9a2c10ee4d6b78", "doc_id": "3f9a2c10ee4d6b78",
"doc_path": "notes/rust/kb-architecture.md", "doc_path": "notes/rust/kebab-architecture.md",
"asset_id": "8c1e7d3f2a05", "asset_id": "8c1e7d3f2a05",
"byte_len": 41822, "byte_len": 41822,
"block_count": 184, "block_count": 184,
@@ -277,13 +277,13 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
`--summary-only``items: null`. `--summary-only``items: null`.
### 2.5 DocSummary (`kb list docs`) ### 2.5 DocSummary (`kebab list docs`)
```json ```json
{ {
"schema_version": "doc_summary.v1", "schema_version": "doc_summary.v1",
"doc_id": "3f9a2c10ee4d6b78", "doc_id": "3f9a2c10ee4d6b78",
"doc_path": "notes/rust/kb-architecture.md", "doc_path": "notes/rust/kebab-architecture.md",
"title": "Rust 로컬 Knowledge Base 설계", "title": "Rust 로컬 Knowledge Base 설계",
"lang": "ko", "lang": "ko",
"tags": ["knowledge-base", "rust", "rag"], "tags": ["knowledge-base", "rust", "rag"],
@@ -305,7 +305,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
"schema_version": "chunk_inspection.v1", "schema_version": "chunk_inspection.v1",
"chunk_id": "9b4a8c1e7d3f2a05", "chunk_id": "9b4a8c1e7d3f2a05",
"doc_id": "3f9a2c10ee4d6b78", "doc_id": "3f9a2c10ee4d6b78",
"doc_path": "notes/rust/kb-architecture.md", "doc_path": "notes/rust/kebab-architecture.md",
"heading_path": ["아키텍처", "Chunking 정책"], "heading_path": ["아키텍처", "Chunking 정책"],
"text": "heading boundary 우선…", "text": "heading boundary 우선…",
"source_spans": [{ "kind": "line", "start": 661, "end": 672 }], "source_spans": [{ "kind": "line", "start": 661, "end": 672 }],
@@ -325,9 +325,9 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
"schema_version": "doctor.v1", "schema_version": "doctor.v1",
"ok": true, "ok": true,
"checks": [ "checks": [
{ "name": "config_loaded", "ok": true, "detail": "~/.config/kb/config.toml" }, { "name": "config_loaded", "ok": true, "detail": "~/.config/kebab/config.toml" },
{ "name": "data_dir_writable", "ok": true, "detail": "~/.local/share/kb" }, { "name": "data_dir_writable", "ok": true, "detail": "~/.local/share/kebab" },
{ "name": "sqlite_open", "ok": true, "detail": "kb.sqlite (schema v1)" }, { "name": "sqlite_open", "ok": true, "detail": "kebab.sqlite (schema v1)" },
{ "name": "lancedb_open", "ok": true, "detail": "lancedb/" }, { "name": "lancedb_open", "ok": true, "detail": "lancedb/" },
{ "name": "embedding_model", "ok": true, "detail": "multilingual-e5-small (384d)" }, { "name": "embedding_model", "ok": true, "detail": "multilingual-e5-small (384d)" },
{ "name": "ollama_reachable", "ok": true, "detail": "http://127.0.0.1:11434" }, { "name": "ollama_reachable", "ok": true, "detail": "http://127.0.0.1:11434" },
@@ -346,7 +346,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
--- ---
## 3. 도메인 모델 (kb-core) ## 3. 도메인 모델 (kebab-core)
### 3.1 Newtype IDs ### 3.1 Newtype IDs
@@ -581,7 +581,7 @@ pub struct RetrievalDetail {
### 3.7a Forward-declared types ### 3.7a Forward-declared types
`Block::ImageRef` / `AudioRef` variant 은 v1 부터 존재하나, 그 안의 `ocr` / `caption` / `transcript` 필드는 P1 에선 항상 `None`. 다음 타입은 `kb-core` 에 stub 으로 둠 (최종 도메인 모델 슬롯): `Block::ImageRef` / `AudioRef` variant 은 v1 부터 존재하나, 그 안의 `ocr` / `caption` / `transcript` 필드는 P1 에선 항상 `None`. 다음 타입은 `kebab-core` 에 stub 으로 둠 (최종 도메인 모델 슬롯):
```rust ```rust
pub struct OcrText { pub joined: String, pub regions: Vec<OcrRegion>, pub engine: String, pub engine_version: String } pub struct OcrText { pub joined: String, pub regions: Vec<OcrRegion>, pub engine: String, pub engine_version: String }
@@ -596,41 +596,41 @@ pub enum ImageType { Png, Jpeg, Webp, Gif, Tiff, Other(String) }
pub enum AudioType { M4a, Mp3, Wav, Flac, Ogg, Other(String) } pub enum AudioType { M4a, Mp3, Wav, Flac, Ogg, Other(String) }
``` ```
`ExtractConfig`, `DocFilter`, `JobKind`, `JobStatus`, `JobFilter`, `JobRow`, `JobId`, `VectorRecord`, `VectorHit`, `RefusalSignal`, `NoHitSignal`, `DoctorUnhealthy``kb-core` 에 정의 (자세한 필드는 사용 시 결정, 이 spec 에서 forward-ref 만 보장). `ExtractConfig`, `DocFilter`, `JobKind`, `JobStatus`, `JobFilter`, `JobRow`, `JobId`, `VectorRecord`, `VectorHit`, `RefusalSignal`, `NoHitSignal`, `DoctorUnhealthy``kebab-core` 에 정의 (자세한 필드는 사용 시 결정, 이 spec 에서 forward-ref 만 보장).
`OffsetDateTime``time::OffsetDateTime`, `Result` 는 crate-local alias. `OffsetDateTime``time::OffsetDateTime`, `Result` 는 crate-local alias.
### 3.7b Parser intermediate types — `kb-parse-types` ### 3.7b Parser intermediate types — `kebab-parse-types`
Parser 의 *중간* 표현 (`ParsedBlock` 류) 은 `kb-core` 가 아니라 별도의 thin crate **`kb-parse-types`** 에 둔다. 이유: `kb-normalize` 는 medium-agnostic 한 ID/Provenance lift 를 책임지고 어떤 parser 도 직접 import 하면 안 된다. 그러나 normalize 에 들어오는 입력 타입이 어딘가에 정의되어야 하는데, 그것을 `kb-core` 에 박으면 (a) parser-별 ParsedBlock 변종 (`ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`) 이 향후 합류할 때 core 의 namespace 가 폭발하고, (b) parser 의 의미 변경이 core 변경이 되어 모든 의존자가 영향을 받는다. Parser 의 *중간* 표현 (`ParsedBlock` 류) 은 `kebab-core` 가 아니라 별도의 thin crate **`kebab-parse-types`** 에 둔다. 이유: `kebab-normalize` 는 medium-agnostic 한 ID/Provenance lift 를 책임지고 어떤 parser 도 직접 import 하면 안 된다. 그러나 normalize 에 들어오는 입력 타입이 어딘가에 정의되어야 하는데, 그것을 `kebab-core` 에 박으면 (a) parser-별 ParsedBlock 변종 (`ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`) 이 향후 합류할 때 core 의 namespace 가 폭발하고, (b) parser 의 의미 변경이 core 변경이 되어 모든 의존자가 영향을 받는다.
`kb-parse-types` 는 이 둘 사이의 **유일한 layer** 다. 의존 그래프: `kebab-parse-types` 는 이 둘 사이의 **유일한 layer** 다. 의존 그래프:
```text ```text
kb-core (도메인 모델 — Block, Chunk, SourceSpan, IDs, …) kebab-core (도메인 모델 — Block, Chunk, SourceSpan, IDs, …)
kb-parse-types (parser 중간 표현 — ParsedBlock, ParsedImageRegion[P+], ParsedPdfPage[P+], ParsedAudioSegment[P+], Inline) kebab-parse-types (parser 중간 표현 — ParsedBlock, ParsedImageRegion[P+], ParsedPdfPage[P+], ParsedAudioSegment[P+], Inline)
▲ ▲ ▲ ▲
│ │ │ │
kb-parse-md, kb-parse-pdf, kb-normalize kebab-parse-md, kebab-parse-pdf, kebab-normalize
kb-parse-image, kb-parse-audio kebab-parse-image, kebab-parse-audio
``` ```
`kb-parse-types` 는: `kebab-parse-types` 는:
- `kb-core` 에만 의존 (`Block`, `SourceSpan`, `Lang` 등 도메인 타입 사용). - `kebab-core` 에만 의존 (`Block`, `SourceSpan`, `Lang` 등 도메인 타입 사용).
- 다른 어떤 `kb-*` 에도 의존하지 않는다. - 다른 어떤 `kebab-*` 에도 의존하지 않는다.
- 어떤 parser 의 구체 라이브러리 (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`) 에도 의존하지 않는다. - 어떤 parser 의 구체 라이브러리 (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`) 에도 의존하지 않는다.
- serde + thiserror 정도의 외부 의존만 가진다. - serde + thiserror 정도의 외부 의존만 가진다.
P1 에서 정의되는 타입: P1 에서 정의되는 타입:
```rust ```rust
// kb-parse-types — depends on kb-core only. // kebab-parse-types — depends on kebab-core only.
pub struct ParsedBlock { pub struct ParsedBlock {
pub kind: ParsedBlockKind, pub kind: ParsedBlockKind,
pub heading_path: Vec<String>, pub heading_path: Vec<String>,
pub source_span: kb_core::SourceSpan, pub source_span: kebab_core::SourceSpan,
pub payload: ParsedPayload, pub payload: ParsedPayload,
} }
@@ -638,11 +638,11 @@ pub enum ParsedBlockKind { Heading, Paragraph, List, Code, Table, Quote, ImageRe
pub enum ParsedPayload { pub enum ParsedPayload {
Heading { level: u8, text: String }, Heading { level: u8, text: String },
Paragraph { text: String, inlines: Vec<kb_core::Inline> }, Paragraph { text: String, inlines: Vec<kebab_core::Inline> },
List { ordered: bool, items: Vec<Vec<kb_core::Inline>> }, List { ordered: bool, items: Vec<Vec<kebab_core::Inline>> },
Code { lang: Option<String>, code: String }, Code { lang: Option<String>, code: String },
Table { headers: Vec<String>, rows: Vec<Vec<String>> }, Table { headers: Vec<String>, rows: Vec<Vec<String>> },
Quote { text: String, inlines: Vec<kb_core::Inline> }, Quote { text: String, inlines: Vec<kebab_core::Inline> },
ImageRef { src: String, alt: String }, ImageRef { src: String, alt: String },
AudioRef { src: String }, // duration_ms filled by extractor before chunking AudioRef { src: String }, // duration_ms filled by extractor before chunking
} }
@@ -651,7 +651,7 @@ pub struct Warning { pub kind: WarningKind, pub note: String }
pub enum WarningKind { MalformedFrontmatter, MalformedTable, EncodingFallback, ExtractFailed } pub enum WarningKind { MalformedFrontmatter, MalformedTable, EncodingFallback, ExtractFailed }
``` ```
`Inline``kb-core` (§3.4) 에 있는 도메인 타입. `kb-parse-types` 는 그것을 *참조* 만 한다 — 같은 의미를 두 crate 에 중복 정의하지 않는다 (그러면 normalize 가 identity-conversion 을 해야 해서 무의미). `Inline``kebab-core` (§3.4) 에 있는 도메인 타입. `kebab-parse-types` 는 그것을 *참조* 만 한다 — 같은 의미를 두 crate 에 중복 정의하지 않는다 (그러면 normalize 가 identity-conversion 을 해야 해서 무의미).
P6/P7/P8 에서 추가될 타입 (forward-ref): P6/P7/P8 에서 추가될 타입 (forward-ref):
@@ -661,7 +661,7 @@ pub struct ParsedPdfPage { pub page: u32, pub text: String }
pub struct ParsedAudioSegment { pub start_ms: u64, pub end_ms: u64, pub text: String } pub struct ParsedAudioSegment { pub start_ms: u64, pub end_ms: u64, pub text: String }
``` ```
→ 새 medium 추가 시 `kb-core::Block` 변종은 변하지 않고, `kb-parse-types` 만 확장된다. → 새 medium 추가 시 `kebab-core::Block` 변종은 변하지 않고, `kebab-parse-types` 만 확장된다.
### 3.8 Answer / RAG types ### 3.8 Answer / RAG types
@@ -995,28 +995,28 @@ CREATE TABLE eval_query_results (
| 종류 | 기본 위치 | | 종류 | 기본 위치 |
|------|-----------| |------|-----------|
| 워크스페이스 | `~/KnowledgeBase/` | | 워크스페이스 | `~/KnowledgeBase/` |
| config | `~/.config/kb/config.toml` | | config | `~/.config/kebab/config.toml` |
| data | `~/.local/share/kb/` | | data | `~/.local/share/kebab/` |
| cache | `~/.cache/kb/` | | cache | `~/.cache/kebab/` |
| state (logs) | `~/.local/state/kb/` | | state (logs) | `~/.local/state/kebab/` |
`~`, `$HOME`, and `${KB_*}` are expanded. Paths are normalized to absolute form before use. `~`, `$HOME`, and `${KEBAB_*}` are expanded. Paths are normalized to absolute form before use.
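
A minimal sketch of the `~` / `$HOME` part of this expansion, assuming the home directory is passed in explicitly so the function stays pure. The name `expand_home` is hypothetical, and expansion of `${KEBAB_*}` variables and final absolute-path normalization are deliberately left out.

```rust
use std::path::{Path, PathBuf};

/// Expand a leading `~` or `$HOME` against a caller-supplied home directory.
/// Hypothetical helper: `${KEBAB_*}` variables are not handled here.
pub fn expand_home(raw: &str, home: &Path) -> PathBuf {
    for prefix in ["~", "$HOME"] {
        if raw == prefix {
            return home.to_path_buf();
        }
        // Only expand when the prefix is followed by a path separator,
        // so that inputs like `~other/x` are left untouched.
        if let Some(rest) = raw.strip_prefix(prefix).and_then(|r| r.strip_prefix('/')) {
            return home.join(rest);
        }
    }
    PathBuf::from(raw)
}
```

Taking `home` as a parameter rather than reading the environment keeps the helper easy to test.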
### 6.2 Workspace 구조 ### 6.2 Workspace 구조
``` ```
~/KnowledgeBase/ ~/KnowledgeBase/
├── inbox/ notes/ papers/ photos/ recordings/ ├── inbox/ notes/ papers/ photos/ recordings/
└── .kbignore └── .kebabignore
``` ```
`.kbignore``config.workspace.exclude` 합집합. `.kebabignore``config.workspace.exclude` 합집합.
### 6.3 Data dir 구조 ### 6.3 Data dir 구조
``` ```
~/.local/share/kb/ ~/.local/share/kebab/
├── kb.sqlite (+ -wal, -shm) ├── kebab.sqlite (+ -wal, -shm)
├── lancedb/ ├── lancedb/
│ └── chunk_embeddings_<model>_<dim>.lance/ │ └── chunk_embeddings_<model>_<dim>.lance/
├── assets/<aa>/<asset_id> # shard ├── assets/<aa>/<asset_id> # shard
@@ -1025,7 +1025,7 @@ CREATE TABLE eval_query_results (
└── runs/<run_id>/ # eval per_query.jsonl + report.md └── runs/<run_id>/ # eval per_query.jsonl + report.md
``` ```
### 6.4 Config (`~/.config/kb/config.toml`) — frozen schema ### 6.4 Config (`~/.config/kebab/config.toml`) — frozen schema
```toml ```toml
schema_version = 1 schema_version = 1
@@ -1036,8 +1036,8 @@ include = ["**/*.md"]
exclude = [".git/**", "node_modules/**", ".obsidian/**"] exclude = [".git/**", "node_modules/**", ".obsidian/**"]
[storage] [storage]
data_dir = "${XDG_DATA_HOME:-~/.local/share}/kb" data_dir = "${XDG_DATA_HOME:-~/.local/share}/kebab"
sqlite = "{data_dir}/kb.sqlite" sqlite = "{data_dir}/kebab.sqlite"
vector_dir = "{data_dir}/lancedb" vector_dir = "{data_dir}/lancedb"
asset_dir = "{data_dir}/assets" asset_dir = "{data_dir}/assets"
artifact_dir = "{data_dir}/artifacts" artifact_dir = "{data_dir}/artifacts"
@@ -1086,15 +1086,15 @@ max_context_tokens = 8000
Config precedence: default → file → env (`KB_<SECTION>_<KEY>`) → CLI flag. Config precedence: default → file → env (`KB_<SECTION>_<KEY>`) → CLI flag.
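
The precedence chain above can be sketched as a per-key resolution in which each later layer overrides the earlier ones. Both function names are hypothetical, and the upper-casing of section and key in the env var name is an assumption. The prefix is a parameter so the sketch does not commit to a particular spelling.

```rust
/// Resolve one config value: default → file → env → CLI flag (later wins).
/// Hypothetical helper; each optional layer is `None` when it does not set the key.
pub fn resolve<T>(default: T, file: Option<T>, env: Option<T>, cli: Option<T>) -> T {
    cli.or(env).or(file).unwrap_or(default)
}

/// Build an env var name of the form `<PREFIX>_<SECTION>_<KEY>`.
/// Assumption: section and key are upper-cased; the prefix is supplied by the caller.
pub fn env_key(prefix: &str, section: &str, key: &str) -> String {
    format!("{}_{}_{}", prefix, section.to_uppercase(), key.to_uppercase())
}
```

For example, `resolve(8000, Some(4000), None, None)` yields the file value, and adding a CLI value would override it.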
### 6.5 `kb init` 출력 ### 6.5 `kebab init` 출력
```text ```text
$ kb init $ kebab init
created ~/.config/kb/config.toml created ~/.config/kebab/config.toml
created ~/.local/share/kb/ created ~/.local/share/kebab/
created ~/KnowledgeBase/ created ~/KnowledgeBase/
opened ~/.local/share/kb/kb.sqlite (schema v1) opened ~/.local/share/kebab/kebab.sqlite (schema v1)
hint edit ~/.config/kb/config.toml then `kb ingest ~/KnowledgeBase` hint edit ~/.config/kebab/config.toml then `kebab ingest ~/KnowledgeBase`
``` ```
기존 파일 보존, `--force` 명시 필요. 기존 파일 보존, `--force` 명시 필요.
@@ -1107,7 +1107,7 @@ hint edit ~/.config/kb/config.toml then `kb ingest ~/KnowledgeBase`
--- ---
## 7. Trait contracts (kb-core) ## 7. Trait contracts (kebab-core)
### 7.1 입출력 보조 ### 7.1 입출력 보조
@@ -1210,34 +1210,34 @@ pub trait JobRepo {
## 8. 모듈 경계 (Allowed / Forbidden) ## 8. 모듈 경계 (Allowed / Forbidden)
```text ```text
kb-cli, kb-tui, kb-desktop kebab-cli, kebab-tui, kebab-desktop
└─> kb-app └─> kebab-app
├─> kb-source-fs ├─> kebab-source-fs
├─> kb-parse-md / kb-parse-pdf / kb-parse-image / kb-parse-audio ├─> kebab-parse-md / kebab-parse-pdf / kebab-parse-image / kebab-parse-audio
│ └─> kb-parse-types (parser intermediate) │ └─> kebab-parse-types (parser intermediate)
├─> kb-normalize ├─> kebab-normalize
│ └─> kb-parse-types │ └─> kebab-parse-types
├─> kb-chunk ├─> kebab-chunk
├─> kb-store-sqlite (DocumentStore, JobRepo, Retriever[lexical]) ├─> kebab-store-sqlite (DocumentStore, JobRepo, Retriever[lexical])
├─> kb-store-vector (VectorStore) ├─> kebab-store-vector (VectorStore)
├─> kb-embed-local ├─> kebab-embed-local
├─> kb-search (Retriever[hybrid]) ├─> kebab-search (Retriever[hybrid])
├─> kb-llm-local ├─> kebab-llm-local
├─> kb-rag ├─> kebab-rag
├─> kb-eval ├─> kebab-eval
└─> kb-config └─> kebab-config
└─> kb-core (모두 의존) └─> kebab-core (모두 의존)
``` ```
`kb-parse-types``kb-core` 와 parsers/normalize 사이의 thin layer (§3.7b 참조). parser-별 중간 표현 (`ParsedBlock`, `ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`, `Inline`) 을 한 곳에 모아 (a) `kb-core` 의 namespace 폭발을 막고 (b) `kb-normalize` 가 parser 를 직접 import 하지 않게 한다. `kebab-parse-types``kebab-core` 와 parsers/normalize 사이의 thin layer (§3.7b 참조). parser-별 중간 표현 (`ParsedBlock`, `ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`, `Inline`) 을 한 곳에 모아 (a) `kebab-core` 의 namespace 폭발을 막고 (b) `kebab-normalize` 가 parser 를 직접 import 하지 않게 한다.
핵심 금지: 핵심 금지:
- UI → store/llm/parse 직접 의존 ✗ - UI → store/llm/parse 직접 의존 ✗
- parse-* → store/llm/embed ✗ - parse-* → store/llm/embed ✗
- parse-* → kb-normalize ✗ (단방향: parsers → kb-parse-types ← normalize) - parse-* → kebab-normalize ✗ (단방향: parsers → kebab-parse-types ← normalize)
- chunk → llm/embed ✗ - chunk → llm/embed ✗
- normalize → store / parse-* ✗ - normalize → store / parse-* ✗
- kb-parse-types → 어떤 parser/normalize/store/llm/embed/search/rag/ui ✗ (`kb-core` 만 의존) - kebab-parse-types → 어떤 parser/normalize/store/llm/embed/search/rag/ui ✗ (`kebab-core` 만 의존)
- 다른 store 와 cross-write ✗ - 다른 store 와 cross-write ✗
Enforced by `cargo deny`, the workspace deny.toml, and CI checks. Enforced by `cargo deny`, the workspace deny.toml, and CI checks.
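
To make the forbidden edges concrete, here is a hedged sketch of the kind of check a CI step might run over `(from, to)` dependency pairs. It is not the project's actual enforcement mechanism: the function names are hypothetical, and it covers only the parser, chunk, normalize, and `kebab-parse-types` rules from the list above.

```rust
/// True for concrete parser crates (`kebab-parse-md`, …) but not the shared types crate.
fn is_parser(name: &str) -> bool {
    name.starts_with("kebab-parse-") && name != "kebab-parse-types"
}

/// Returns true when the edge `from -> to` breaks one of the boundary rules.
/// Hypothetical helper covering a subset of the rules (UI and cross-store rules omitted).
pub fn violates(from: &str, to: &str) -> bool {
    let to_store = to.starts_with("kebab-store-");
    let to_llm = to.starts_with("kebab-llm");
    let to_embed = to.starts_with("kebab-embed");
    if is_parser(from) {
        // parse-* must not reach store/llm/embed, nor kebab-normalize.
        return to_store || to_llm || to_embed || to == "kebab-normalize";
    }
    match from {
        "kebab-chunk" => to_llm || to_embed,
        "kebab-normalize" => to_store || is_parser(to),
        // kebab-parse-types may depend on kebab-core only among workspace crates.
        "kebab-parse-types" => to.starts_with("kebab-") && to != "kebab-core",
        _ => false,
    }
}
```

External crates such as `serde` are not flagged, since the rules only restrict edges between workspace crates.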
@@ -1270,7 +1270,7 @@ CI:
## 10. 에러 모델 + exit codes ## 10. 에러 모델 + exit codes
```rust ```rust
// kb-core // kebab-core
pub enum CoreError { InvalidId, InvalidCitation, InvalidSpan, Malformed } pub enum CoreError { InvalidId, InvalidCitation, InvalidSpan, Malformed }
// crate-local examples // crate-local examples
pub enum ParseMdError { Yaml(String), Encoding, Pulldown(String), Span } pub enum ParseMdError { Yaml(String), Encoding, Pulldown(String), Span }
@@ -1278,7 +1278,7 @@ pub enum StoreError { Sqlx(rusqlite::Error), Migration(String), Conflict(String)
pub enum LlmError { Unreachable, ModelNotPulled(String), Timeout, Stream(String) } pub enum LlmError { Unreachable, ModelNotPulled(String), Timeout, Stream(String) }
``` ```
Boundary (`kb-app`, `kb-cli`) 에서 `anyhow::Error` 합침. exit code 매핑: Boundary (`kebab-app`, `kebab-cli`) 에서 `anyhow::Error` 합침. exit code 매핑:
```rust ```rust
fn exit_code(err: &anyhow::Error) -> i32 { fn exit_code(err: &anyhow::Error) -> i32 {
@@ -1295,17 +1295,17 @@ fn exit_code(err: &anyhow::Error) -> i32 {
| `--verbose` | + anyhow chain | | `--verbose` | + anyhow chain |
| `--debug` 또는 `RUST_LOG=debug` | + tracing target/level/span | | `--debug` 또는 `RUST_LOG=debug` | + tracing target/level/span |
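A refusal is not an error. When `kb ask` refuses, it writes a normal answer to stdout (Answer with grounded=false) and exits with code 1. A refusal is not an error. When `kebab ask` refuses, it writes a normal answer to stdout (Answer with grounded=false) and exits with code 1.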
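
A small sketch of how the refusal case could map to an exit code without going through the error path. The `Answer` struct here is a reduced stand-in with only two fields, not the spec's full type, `ask_exit_code` is a hypothetical name, and exit 0 for a grounded answer is an assumption.

```rust
/// Reduced stand-in for the spec's `Answer`; only the fields this sketch needs.
pub struct Answer {
    pub text: String,
    pub grounded: bool,
}

/// Exit code for `ask`: a refusal is a normal answer, not an error,
/// but it still exits with 1. Exit 0 for grounded answers is assumed.
pub fn ask_exit_code(answer: &Answer) -> i32 {
    if answer.grounded {
        0
    } else {
        1
    }
}
```

Keeping refusal out of the error type means the answer body still goes to stdout in its normal shape, and only the process status signals that nothing was grounded.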
Logging: `tracing` + `tracing-subscriber` + `tracing-appender` daily roll, `~/.local/state/kb/logs/`. structured (`trace_id`, `doc_id`, `chunk_id`). Logging: `tracing` + `tracing-subscriber` + `tracing-appender` daily roll, `~/.local/state/kebab/logs/`. structured (`trace_id`, `doc_id`, `chunk_id`).
`kb doctor` 출력 (사람): `kebab doctor` 출력 (사람):
```text ```text
$ kb doctor $ kebab doctor
✓ config_loaded ~/.config/kb/config.toml ✓ config_loaded ~/.config/kebab/config.toml
✓ data_dir_writable ~/.local/share/kb ✓ data_dir_writable ~/.local/share/kebab
✓ sqlite_open kb.sqlite (schema v1) ✓ sqlite_open kebab.sqlite (schema v1)
✓ lancedb_open lancedb/ ✓ lancedb_open lancedb/
✓ embedding_model multilingual-e5-small (384d) ✓ embedding_model multilingual-e5-small (384d)
✓ ollama_reachable http://127.0.0.1:11434 ✓ ollama_reachable http://127.0.0.1:11434
@@ -1322,7 +1322,7 @@ $ kb doctor
이 문서가 동결 ↔ 다음 컴포넌트 분해 작업이 안전: 이 문서가 동결 ↔ 다음 컴포넌트 분해 작업이 안전:
- 모든 wire schema (`docs/wire-schema/v1/*.schema.json`) - 모든 wire schema (`docs/wire-schema/v1/*.schema.json`)
- 모든 trait 시그니처 (kb-core) - 모든 trait 시그니처 (kebab-core)
- 모든 ID recipe (4.2) - 모든 ID recipe (4.2)
- SQLite DDL (5장) - SQLite DDL (5장)
- Filesystem + config schema (6장) - Filesystem + config schema (6장)
@@ -1334,7 +1334,7 @@ $ kb doctor
**의도적으로 빠진 것 (out of scope, P+)**: **의도적으로 빠진 것 (out of scope, P+)**:
- multi-workspace - multi-workspace
- watch mode - watch mode
- desktop app `kb://` protocol handler - desktop app `kebab://` protocol handler
- LLM-as-judge eval - LLM-as-judge eval
- visual embedding (CLIP) - visual embedding (CLIP)
- real-time collab - real-time collab


@@ -17,11 +17,11 @@ urlcolor: blue
최종 방향은 다음 한 문장으로 요약할 수 있다. 최종 방향은 다음 한 문장으로 요약할 수 있다.
> Markdown을 1등급 지식 소스로 삼고, 이미지, PDF, 음성은 각각 extractor adapter를 통해 동일한 `CanonicalDocument -> Chunk -> Embed -> Index -> Search -> RAG` 파이프라인으로 흘려보낸다. CLI, TUI, desktop app은 모두 같은 `kb-app` facade를 함수 호출로 사용한다. > Markdown을 1등급 지식 소스로 삼고, 이미지, PDF, 음성은 각각 extractor adapter를 통해 동일한 `CanonicalDocument -> Chunk -> Embed -> Index -> Search -> RAG` 파이프라인으로 흘려보낸다. CLI, TUI, desktop app은 모두 같은 `kebab-app` facade를 함수 호출로 사용한다.
가장 먼저 만들 것은 채팅 UI가 아니다. 먼저 만들어야 할 것은 다음 7가지다. 가장 먼저 만들 것은 채팅 UI가 아니다. 먼저 만들어야 할 것은 다음 7가지다.
1. `kb-core`의 도메인 모델과 trait 계약 1. `kebab-core`의 도메인 모델과 trait 계약
2. deterministic ID 규칙 2. deterministic ID 규칙
3. Markdown canonicalization 3. Markdown canonicalization
4. chunking policy 4. chunking policy
@@ -66,16 +66,16 @@ urlcolor: blue
Markdown files Markdown files
| |
v v
kb-source-fs kebab-source-fs
| |
v v
kb-parse-md kebab-parse-md
| |
v v
CanonicalDocument CanonicalDocument
| |
v v
kb-chunk kebab-chunk
| |
v v
Chunks Chunks
@@ -88,15 +88,15 @@ SQLite metadata/FTS LanceDB vectors Raw asset store
+--------------------+--------------------+ +--------------------+--------------------+
| |
v v
kb-search kebab-search
| |
v v
kb-rag kebab-rag
| |
+---------------+---------------+ +---------------+---------------+
| | | | | |
v v v v v v
kb-cli kb-tui kb-desktop kebab-cli kebab-tui kebab-desktop
``` ```
추후 확장 후 구조는 다음과 같다. 추후 확장 후 구조는 다음과 같다.
@@ -133,23 +133,23 @@ Cargo workspace는 여러 package를 함께 관리하는 구조이며, 공통 `C
[workspace] [workspace]
resolver = "3" resolver = "3"
members = [ members = [
"crates/kb-core", "crates/kebab-core",
"crates/kb-config", "crates/kebab-config",
"crates/kb-source-fs", "crates/kebab-source-fs",
"crates/kb-parse-md", "crates/kebab-parse-md",
"crates/kb-normalize", "crates/kebab-normalize",
"crates/kb-chunk", "crates/kebab-chunk",
"crates/kb-store-sqlite", "crates/kebab-store-sqlite",
"crates/kb-store-vector", "crates/kebab-store-vector",
"crates/kb-embed", "crates/kebab-embed",
"crates/kb-embed-local", "crates/kebab-embed-local",
"crates/kb-search", "crates/kebab-search",
"crates/kb-llm", "crates/kebab-llm",
"crates/kb-llm-local", "crates/kebab-llm-local",
"crates/kb-rag", "crates/kebab-rag",
"crates/kb-eval", "crates/kebab-eval",
"crates/kb-app", "crates/kebab-app",
"crates/kb-cli" "crates/kebab-cli"
] ]
[workspace.package] [workspace.package]
@@ -173,7 +173,7 @@ tracing = "0.1"
초기 repo는 이렇게 잡는다. 초기 repo는 이렇게 잡는다.
```text ```text
kb/ kebab/
Cargo.toml Cargo.toml
README.md README.md
docs/ docs/
@@ -191,70 +191,70 @@ kb/
nested-headings.md nested-headings.md
code-and-table.md code-and-table.md
crates/ crates/
kb-core/ kebab-core/
kb-config/ kebab-config/
kb-source-fs/ kebab-source-fs/
kb-parse-md/ kebab-parse-md/
kb-normalize/ kebab-normalize/
kb-chunk/ kebab-chunk/
kb-store-sqlite/ kebab-store-sqlite/
kb-store-vector/ kebab-store-vector/
kb-embed/ kebab-embed/
kb-embed-local/ kebab-embed-local/
kb-search/ kebab-search/
kb-llm/ kebab-llm/
kb-llm-local/ kebab-llm-local/
kb-rag/ kebab-rag/
kb-eval/ kebab-eval/
kb-app/ kebab-app/
kb-cli/ kebab-cli/
``` ```
나중에 추가할 crate는 다음과 같다. 나중에 추가할 crate는 다음과 같다.
```text ```text
crates/kb-parse-image/ crates/kebab-parse-image/
crates/kb-parse-pdf/ crates/kebab-parse-pdf/
crates/kb-parse-audio/ crates/kebab-parse-audio/
crates/kb-rerank/ crates/kebab-rerank/
crates/kb-tui/ crates/kebab-tui/
crates/kb-desktop/ crates/kebab-desktop/
``` ```
중요한 의존성 규칙은 다음과 같다. 중요한 의존성 규칙은 다음과 같다.
```text ```text
kb-cli, kb-tui, kb-desktop kebab-cli, kebab-tui, kebab-desktop
-> kb-app -> kebab-app
-> kb-index / kb-search / kb-rag -> kebab-index / kebab-search / kebab-rag
-> kb-core traits -> kebab-core traits
-> concrete adapters -> concrete adapters
``` ```
UI crate는 절대로 parser, DB, LLM adapter를 직접 호출하지 않는다. 모든 user-facing command는 `kb-app` facade를 통해 호출한다. UI crate는 절대로 parser, DB, LLM adapter를 직접 호출하지 않는다. 모든 user-facing command는 `kebab-app` facade를 통해 호출한다.
# 5. 컴포넌트 목록과 책임 # 5. 컴포넌트 목록과 책임
| 컴포넌트 | 책임 | 초기 구현 | | 컴포넌트 | 책임 | 초기 구현 |
|---|---|---| |---|---|---|
| `kb-core` | domain type, trait, error, ID 규칙 | 필수 | | `kebab-core` | domain type, trait, error, ID 규칙 | 필수 |
| `kb-config` | config 파일 로딩, 기본값, 경로 확장 | 필수 | | `kebab-config` | config 파일 로딩, 기본값, 경로 확장 | 필수 |
| `kb-source-fs` | 로컬 폴더 scan, checksum, 변경 감지 | 필수 | | `kebab-source-fs` | 로컬 폴더 scan, checksum, 변경 감지 | 필수 |
| `kb-parse-md` | Markdown -> structured document | 필수 | | `kebab-parse-md` | Markdown -> structured document | 필수 |
| `kb-normalize` | parser output -> `CanonicalDocument` | 필수 | | `kebab-normalize` | parser output -> `CanonicalDocument` | 필수 |
| `kb-chunk` | block-aware chunking | 필수 | | `kebab-chunk` | block-aware chunking | 필수 |
| `kb-store-sqlite` | metadata, document, chunk, job, FTS | 필수 | | `kebab-store-sqlite` | metadata, document, chunk, job, FTS | 필수 |
| `kb-store-vector` | vector upsert/search | P1 | | `kebab-store-vector` | vector upsert/search | P1 |
| `kb-embed` | embedding trait | P1 | | `kebab-embed` | embedding trait | P1 |
| `kb-embed-local` | local embedding adapter | P1 | | `kebab-embed-local` | local embedding adapter | P1 |
| `kb-search` | lexical, vector, hybrid retrieval | P1 | | `kebab-search` | lexical, vector, hybrid retrieval | P1 |
| `kb-llm` | language model trait | P1 | | `kebab-llm` | language model trait | P1 |
| `kb-llm-local` | Ollama 또는 llama.cpp adapter | P1 | | `kebab-llm-local` | Ollama 또는 llama.cpp adapter | P1 |
| `kb-rag` | context packing, answer, citation | P1 | | `kebab-rag` | context packing, answer, citation | P1 |
| `kb-eval` | golden query, regression test | P1 | | `kebab-eval` | golden query, regression test | P1 |
| `kb-cli` | command line interface | 필수 | | `kebab-cli` | command line interface | 필수 |
| `kb-tui` | terminal UI | P2 | | `kebab-tui` | terminal UI | P2 |
| `kb-desktop` | desktop app | P3 | | `kebab-desktop` | desktop app | P3 |
# 6. 핵심 도메인 모델 # 6. 핵심 도메인 모델
@@ -398,10 +398,10 @@ Markdown frontmatter 기본 규약은 다음 정도로 시작한다.
```yaml ```yaml
--- ---
id: rust-kb-architecture id: rust-kebab-architecture
title: Rust 로컬 Knowledge Base 설계 title: Rust 로컬 Knowledge Base 설계
aliases: aliases:
- local kb - local kebab
- rust rag - rust rag
tags: tags:
- knowledge-base - knowledge-base
@@ -418,7 +418,7 @@ lang: ko
Markdown citations use a line range by default. Markdown citations use a line range by default.
```text ```text
notes/rust/kb.md:L12-L34 notes/rust/kebab.md:L12-L34
``` ```
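
As an illustration of the `path:Lstart-Lend` form shown above, a formatter and a matching parser might look like the following. Both function names are hypothetical, and the single-line form `path:L12` is an assumption rather than something this report specifies.

```rust
/// Format a line-range citation as `path:Lstart-Lend`.
/// Assumption: a one-line range collapses to `path:Lstart`.
pub fn line_citation(path: &str, start: u32, end: u32) -> String {
    if start == end {
        format!("{path}:L{start}")
    } else {
        format!("{path}:L{start}-L{end}")
    }
}

/// Parse the form produced by `line_citation`; returns None on malformed input.
pub fn parse_line_citation(s: &str) -> Option<(String, u32, u32)> {
    // Split at the last `:L` so earlier colons in the path are preserved.
    let (path, range) = s.rsplit_once(":L")?;
    let (start, end): (u32, u32) = match range.split_once("-L") {
        Some((a, b)) => (a.parse::<u32>().ok()?, b.parse::<u32>().ok()?),
        None => {
            let n = range.parse::<u32>().ok()?;
            (n, n)
        }
    };
    // Line numbers are treated as 1-based and the range must not be inverted.
    if path.is_empty() || start == 0 || start > end {
        return None;
    }
    Some((path.to_string(), start, end))
}
```

Round-tripping through both functions is a cheap regression check for any later change to the citation string.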
# 9. 이미지, PDF, 음성 확장 전략 # 9. 이미지, PDF, 음성 확장 전략
@@ -534,13 +534,13 @@ Ollama 문서는 macOS Sonoma 이상에서 Apple M series CPU/GPU support를 언
초기 adapter는 다음처럼 둔다. 초기 adapter는 다음처럼 둔다.
```text ```text
kb-llm-local kebab-llm-local
- OllamaLanguageModel - OllamaLanguageModel
- later: LlamaCppLanguageModel - later: LlamaCppLanguageModel
- later: CandleLanguageModel - later: CandleLanguageModel
``` ```
Ollama가 내부적으로 local server를 쓰더라도, 프로젝트 아키텍처 관점에서는 HTTP MSA가 아니다. `kb-llm-local` 안에 캡슐화된 model adapter일 뿐이다. Ollama가 내부적으로 local server를 쓰더라도, 프로젝트 아키텍처 관점에서는 HTTP MSA가 아니다. `kebab-llm-local` 안에 캡슐화된 model adapter일 뿐이다.
## 11.3 Local embedding ## 11.3 Local embedding
@@ -583,10 +583,10 @@ M4 48GB MacBook은 개인용 local KB에 충분한 타겟이지만, indexing과
root = "~/KnowledgeBase" root = "~/KnowledgeBase"
[storage] [storage]
sqlite_path = "~/.local/share/kb/kb.sqlite" sqlite_path = "~/.local/share/kebab/kebab.sqlite"
vector_path = "~/.local/share/kb/lancedb" vector_path = "~/.local/share/kebab/lancedb"
raw_asset_path = "~/.local/share/kb/assets" raw_asset_path = "~/.local/share/kebab/assets"
artifact_path = "~/.local/share/kb/artifacts" artifact_path = "~/.local/share/kebab/artifacts"
[indexing] [indexing]
max_parallel_extractors = 2 max_parallel_extractors = 2
@@ -609,7 +609,7 @@ context_tokens = 32768
# 13. Trait 계약 # 13. Trait 계약
컴포넌트는 trait으로 연결한다. 아래 계약을 `kb-core`에 둔다. 컴포넌트는 trait으로 연결한다. 아래 계약을 `kebab-core`에 둔다.
```rust ```rust
pub trait SourceConnector { pub trait SourceConnector {
@@ -741,14 +741,14 @@ pub struct Answer {
CLI는 가장 먼저 만든다. CLI는 가장 먼저 만든다.
```text ```text
kb init kebab init
kb ingest <path> kebab ingest <path>
kb index kebab index
kb search <query> kebab search <query>
kb ask <query> kebab ask <query>
kb inspect doc <doc_id> kebab inspect doc <doc_id>
kb inspect chunk <chunk_id> kebab inspect chunk <chunk_id>
kb doctor kebab doctor
``` ```
CLI는 개발과 테스트의 기준점이다. TUI와 desktop app은 CLI 기능이 안정된 뒤 붙인다. CLI는 개발과 테스트의 기준점이다. TUI와 desktop app은 CLI 기능이 안정된 뒤 붙인다.
@@ -791,10 +791,10 @@ Desktop app은 가장 나중에 만든다. 이유는 UI보다 먼저 domain mode
Deliverables: Deliverables:
```text ```text
kb-core kebab-core
kb-config kebab-config
kb-app kebab-app
kb-cli kebab-cli
docs/spec/* docs/spec/*
fixtures/markdown/* fixtures/markdown/*
``` ```
@@ -804,7 +804,7 @@ fixtures/markdown/*
```text ```text
cargo check --workspace cargo check --workspace
cargo test --workspace cargo test --workspace
kb --help kebab --help
``` ```
## Phase 1 - Markdown ingestion ## Phase 1 - Markdown ingestion
@@ -814,19 +814,19 @@ kb --help
Crates to implement: Crates to implement:
```text ```text
kb-source-fs kebab-source-fs
kb-parse-md kebab-parse-md
kb-normalize kebab-normalize
kb-chunk kebab-chunk
kb-store-sqlite kebab-store-sqlite
``` ```
Completion criteria: Completion criteria:
```text ```text
kb ingest ~/KnowledgeBase kebab ingest ~/KnowledgeBase
kb list docs kebab list docs
kb inspect doc <doc_id> kebab inspect doc <doc_id>
``` ```
## Phase 2 - Lexical search ## Phase 2 - Lexical search
@@ -836,14 +836,14 @@ kb inspect doc <doc_id>
Completion criteria: Completion criteria:
```text ```text
kb search "Rust workspace 설계" kebab search "Rust workspace 설계"
``` ```
Results must include citations. Results must include citations.
```text ```text
1. Rust workspace는 여러 package를 하나로 관리한다... 1. Rust workspace는 여러 package를 하나로 관리한다...
source: notes/rust/kb.md:L12-L34 source: notes/rust/kebab.md:L12-L34
``` ```
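The citation line in the sample output above has a `path:Lstart-Lend` shape. A minimal sketch of rendering that shape follows, assuming a hypothetical `LineCitation` struct; the struct and its field names are illustrative only and are not taken from the design document, whose own `Citation` type is defined elsewhere.

```rust
use std::fmt;

/// Hypothetical line-range citation (illustrative, not the project's `Citation` type).
struct LineCitation {
    /// Path of the source document, relative to the knowledge-base root.
    path: String,
    /// First cited line (1-based, inclusive).
    start_line: u32,
    /// Last cited line (1-based, inclusive).
    end_line: u32,
}

impl fmt::Display for LineCitation {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Renders e.g. `notes/rust/kebab.md:L12-L34`.
        write!(f, "{}:L{}-L{}", self.path, self.start_line, self.end_line)
    }
}
```

Implementing `Display` keeps the rendering in one place, so a CLI printer could emit `source: {citation}` without knowing the internal layout.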
## Phase 3 - Vector search and embedding ## Phase 3 - Vector search and embedding
@@ -853,18 +853,18 @@ kb search "Rust workspace 설계"
Crates to implement: Crates to implement:
```text ```text
kb-embed kebab-embed
kb-embed-local kebab-embed-local
kb-store-vector kebab-store-vector
kb-search kebab-search
``` ```
Completion criteria: Completion criteria:
```text ```text
kb index --embeddings kebab index --embeddings
kb search --mode vector "비슷한 설계 원칙" kebab search --mode vector "비슷한 설계 원칙"
kb search --mode hybrid "Markdown chunking 규칙" kebab search --mode hybrid "Markdown chunking 규칙"
``` ```
## Phase 4 - Local LLM RAG ## Phase 4 - Local LLM RAG
@@ -874,15 +874,15 @@ kb search --mode hybrid "Markdown chunking 규칙"
Crates to implement: Crates to implement:
```text ```text
kb-llm kebab-llm
kb-llm-local kebab-llm-local
kb-rag kebab-rag
``` ```
Completion criteria: Completion criteria:
```text ```text
kb ask "내 KB 설계에서 저장소 전략은?" kebab ask "내 KB 설계에서 저장소 전략은?"
``` ```
Answers must include citations and must refuse when there is no supporting evidence. Answers must include citations and must refuse when there is no supporting evidence.
@@ -895,7 +895,7 @@ kb ask "내 KB 설계에서 저장소 전략은?"
```text ```text
fixtures/golden_queries.yaml fixtures/golden_queries.yaml
kb-eval kebab-eval
``` ```
Metrics: Metrics:
@@ -915,8 +915,8 @@ answer groundedness
Completion criteria: Completion criteria:
```text ```text
kb ingest ./assets/diagram.png kebab ingest ./assets/diagram.png
kb search "이미지 안의 OCR 텍스트" kebab search "이미지 안의 OCR 텍스트"
``` ```
## Phase 7 - PDF support ## Phase 7 - PDF support
@@ -926,8 +926,8 @@ kb search "이미지 안의 OCR 텍스트"
Completion criteria: Completion criteria:
```text ```text
kb ingest ./paper.pdf kebab ingest ./paper.pdf
kb search "PDF 안의 특정 개념" kebab search "PDF 안의 특정 개념"
``` ```
## Phase 8 - Audio support ## Phase 8 - Audio support
@@ -937,8 +937,8 @@ kb search "PDF 안의 특정 개념"
Completion criteria: Completion criteria:
```text ```text
kb ingest ./meeting.m4a kebab ingest ./meeting.m4a
kb search "회의에서 언급한 결정사항" kebab search "회의에서 언급한 결정사항"
``` ```
## Phase 9 - TUI and desktop app ## Phase 9 - TUI and desktop app
@@ -948,7 +948,7 @@ kb search "회의에서 언급한 결정사항"
Order: Order:
```text ```text
kb-tui -> kb-desktop kebab-tui -> kebab-desktop
``` ```
# 18. Test strategy # 18. Test strategy
@@ -986,7 +986,7 @@ AI에게 “전체 repo를 만들어줘”라고 시키지 말고, component spe
The template is as follows. The template is as follows.
```text ```text
Component: kb-parse-md Component: kebab-parse-md
Responsibility: Responsibility:
- Converts Markdown bytes into a CanonicalDocument. - Converts Markdown bytes into a CanonicalDocument.
@@ -994,17 +994,17 @@ Responsibility:
- Preserves line ranges or byte ranges as far as possible. - Preserves line ranges or byte ranges as far as possible.
Allowed dependencies: Allowed dependencies:
- kb-core - kebab-core
- pulldown-cmark or comrak - pulldown-cmark or comrak
- serde - serde
- thiserror - thiserror
Forbidden dependencies: Forbidden dependencies:
- kb-store - kebab-store
- kb-llm - kebab-llm
- kb-rag - kebab-rag
- kb-tui - kebab-tui
- kb-desktop - kebab-desktop
Inputs: Inputs:
- RawAsset - RawAsset
@@ -1052,7 +1052,7 @@ Non-goals:
## Days 1-2 ## Days 1-2
- Create the workspace - Create the workspace
- Draft `kb-core` domain types - Draft `kebab-core` domain types
- Document the ID rules - Document the ID rules
- CLI skeleton - CLI skeleton
@@ -1073,7 +1073,7 @@ Non-goals:
- SQLite FTS5 search - SQLite FTS5 search
- Citation output - Citation output
- Implement `kb inspect` - Implement `kebab inspect`
## Days 12-14 ## Days 12-14
@@ -1100,7 +1100,7 @@ MVP 완료 조건은 다음과 같다.
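- [ ] When the parser/chunker version changes, the items to reprocess are identified. - [ ] When the parser/chunker version changes, the items to reprocess are identified.
- [ ] A trait exists for plugging in local embedding. - [ ] A trait exists for plugging in local embedding.
- [ ] A trait exists for plugging in a local LLM. - [ ] A trait exists for plugging in a local LLM.
- [ ] The CLI works through the `kb-app` facade. - [ ] The CLI works through the `kebab-app` facade.
The P1 completion criteria are as follows. The P1 completion criteria are as follows.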


@@ -14,30 +14,30 @@ historical contract that was implemented; this file accumulates the
deltas so phase 5+ readers can find the live behavior without diffing deltas so phase 5+ readers can find the live behavior without diffing
git history. git history.
## 2026-05-01 — `--config` flag silently ignored across all kb-cli subcommands ## 2026-05-01 — `--config` flag silently ignored across all kebab-cli subcommands
**Discovered**: post-P3-5 manual smoke at `/tmp/kb-smoke/`. **Discovered**: post-P3-5 manual smoke at `/tmp/kebab-smoke/`.
**Symptom**: `kb --config /path/to/config.toml ingest|search|list|inspect|doctor` ignored the flag and fell back to `~/.config/kb/config.toml` (XDG default). Users had to use `KB_*` env vars to point at a non-default config. **Symptom**: `kebab --config /path/to/config.toml ingest|search|list|inspect|doctor` ignored the flag and fell back to `~/.config/kebab/config.toml` (XDG default). Users had to use `KEBAB_*` env vars to point at a non-default config.
**Root cause**: `kb-cli` read `cli.config` only inside `Cmd::Ingest` to build `SourceScope`, then called bare `kb_app::ingest(scope, summary_only)` which internally re-loaded `Config::load(None)` (XDG path). Same pattern in `Cmd::Search` / `List` / `Inspect` / `Doctor`. P3-5 introduced `*_with_config` test seams via `#[doc(hidden)] pub fn` but kb-cli never used them. **Root cause**: `kebab-cli` read `cli.config` only inside `Cmd::Ingest` to build `SourceScope`, then called bare `kebab_app::ingest(scope, summary_only)` which internally re-loaded `Config::load(None)` (XDG path). Same pattern in `Cmd::Search` / `List` / `Inspect` / `Doctor`. P3-5 introduced `*_with_config` test seams via `#[doc(hidden)] pub fn` but kebab-cli never used them.
**Fix** (PR #20, fix/cli-config-flag-and-search-output): **Fix** (PR #20, fix/cli-config-flag-and-search-output):
- `kb-cli` now builds the Config once via `Config::load(cli.config.as_deref())` at the top of every subcommand and threads it into `kb_app::*_with_config(cfg, ...)` instead of `kb_app::*(...)`. - `kebab-cli` now builds the Config once via `Config::load(cli.config.as_deref())` at the top of every subcommand and threads it into `kebab_app::*_with_config(cfg, ...)` instead of `kebab_app::*(...)`.
- `kb_app::doctor()` rewritten as `doctor_with_config_path(Option<&Path>)` that reports the actual path probed and hard-fails when `--config <path>` doesn't exist (defaults would otherwise mask user intent). - `kebab_app::doctor()` rewritten as `doctor_with_config_path(Option<&Path>)` that reports the actual path probed and hard-fails when `--config <path>` doesn't exist (defaults would otherwise mask user intent).
- `kb-app` module doc-comment updated: `#[doc(hidden)] pub fn *_with_config` is no longer "test-only seam" — it's the official "config-explicit" API consumed by CLI `--config`, integration tests, and TUI sessions. - `kebab-app` module doc-comment updated: `#[doc(hidden)] pub fn *_with_config` is no longer "test-only seam" — it's the official "config-explicit" API consumed by CLI `--config`, integration tests, and TUI sessions.
- Same PR also improved `kb search` printer: `{:.4}` score formatting (RRF range collapses on `{:.2}`) and `> heading_path` suffix so chunks from the same document are visually distinct. - Same PR also improved `kebab search` printer: `{:.4}` score formatting (RRF range collapses on `{:.2}`) and `> heading_path` suffix so chunks from the same document are visually distinct.
**Amends**: tasks/p3/p3-5-app-wiring.md (the test seam was always meant to be the config-explicit API; only the doc-comment lied). **Amends**: tasks/p3/p3-5-app-wiring.md (the test seam was always meant to be the config-explicit API; only the doc-comment lied).
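The score-formatting point in the fix list can be checked with a few lines of arithmetic. The sketch below assumes the default `k_rrf = 60` quoted elsewhere in this file and looks only at a single retriever's raw RRF term; `rrf_term` and `format_score` are hypothetical helper names, not functions from the codebase.

```rust
/// Raw RRF contribution of one retriever for a 1-based rank.
fn rrf_term(k_rrf: f64, rank: u32) -> f64 {
    1.0 / (k_rrf + f64::from(rank))
}

/// Hypothetical helper: render a score with the given number of decimals.
fn format_score(score: f64, decimals: usize) -> String {
    // `.*` reads the precision from the argument list, then the value.
    format!("{:.*}", decimals, score)
}
```

With these helpers, ranks 1 and 2 (`1/61` and `1/62`) both render as `0.02` at two decimals, while four decimals separates them as `0.0164` and `0.0161`.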
### 2026-05-01 — `--config` regression in `kb ask` (P4-3 follow-up) ### 2026-05-01 — `--config` regression in `kebab ask` (P4-3 follow-up)
**Discovered**: post-P4-3 manual smoke against 192.168.0.47 Ollama with `gemma4:26b`. **Discovered**: post-P4-3 manual smoke against 192.168.0.47 Ollama with `gemma4:26b`.
**Symptom**: `kb --config <path> ask` returned `model.id = qwen2.5:14b-instruct` (XDG default model) and `score_gate = 0.30` (XDG default), instead of `gemma4:26b` / `0.05` from the explicit config. P4-3 added the ask body but kb-cli's `Cmd::Ask` arm still called bare `kb_app::ask(query, opts)` — same regression class as the P3-5 fix above, just missed when ask was wired. **Symptom**: `kebab --config <path> ask` returned `model.id = qwen2.5:14b-instruct` (XDG default model) and `score_gate = 0.30` (XDG default), instead of `gemma4:26b` / `0.05` from the explicit config. P4-3 added the ask body but kebab-cli's `Cmd::Ask` arm still called bare `kebab_app::ask(query, opts)` — same regression class as the P3-5 fix above, just missed when ask was wired.
**Fix** (PR #24, fix/cli-ask-honor-config-flag): **Fix** (PR #24, fix/cli-ask-honor-config-flag):
- `kb-cli` builds `Config::load(cli.config.as_deref())` once at the top of `Cmd::Ask` and calls `kb_app::ask_with_config(cfg, query, opts)`. - `kebab-cli` builds `Config::load(cli.config.as_deref())` once at the top of `Cmd::Ask` and calls `kebab_app::ask_with_config(cfg, query, opts)`.
**Amends**: tasks/p4/p4-3-rag-pipeline.md. **Amends**: tasks/p4/p4-3-rag-pipeline.md.
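Both `--config` entries describe the same shape of bug: a bare facade call reloads the default config and discards the flag. The following is a simplified sketch of that pattern, not the project's code. `Config` is reduced to a single assumed field recording which path was resolved, nothing is read from disk, the tilde is left unexpanded, and `run_cli` is a hypothetical stand-in for a subcommand arm.

```rust
use std::path::{Path, PathBuf};

/// Simplified stand-in for the real `Config` (assumption: one field only).
#[derive(Debug, Clone, PartialEq)]
struct Config {
    /// The config path that was actually resolved.
    source: PathBuf,
}

impl Config {
    /// Explicit path wins; otherwise fall back to the XDG default location.
    fn load(path: Option<&Path>) -> Config {
        let source = match path {
            Some(p) => p.to_path_buf(),
            None => PathBuf::from("~/.config/kebab/config.toml"),
        };
        Config { source }
    }
}

/// Config-explicit entry point: uses exactly the config it is given.
fn ask_with_config(cfg: &Config, query: &str) -> String {
    format!("{} via {}", query, cfg.source.display())
}

/// Bare entry point: always reloads the default, so a CLI flag is lost here.
fn ask(query: &str) -> String {
    let cfg = Config::load(None);
    ask_with_config(&cfg, query)
}

/// Hypothetical CLI arm after the fix: load once, then thread the config through.
fn run_cli(config_flag: Option<&Path>, query: &str) -> String {
    let cfg = Config::load(config_flag);
    ask_with_config(&cfg, query)
}
```

Calling `ask` from the CLI arm reproduces the regression, because the flag never reaches `Config::load`; routing through `run_cli` keeps the explicit path.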
@@ -48,15 +48,15 @@ git history.
**Root cause**: RRF formula `score(c) = Σ 1/(k_rrf + rank_m(c))` produces values bounded by `num_retrievers / (k_rrf + 1)`. With `num_retrievers = 2` and the default `k_rrf = 60`, the upper bound is `2/61 ≈ 0.0328`. The default `config.rag.score_gate = 0.05` was calibrated for vector / lexical scores already in `[0, 1]` and silently refused every hybrid query. `fusion_score` was also incomparable across modes — Lexical / Vector lived in `[0, 1]`, Hybrid lived in `(0, 0.033]`. **Root cause**: RRF formula `score(c) = Σ 1/(k_rrf + rank_m(c))` produces values bounded by `num_retrievers / (k_rrf + 1)`. With `num_retrievers = 2` and the default `k_rrf = 60`, the upper bound is `2/61 ≈ 0.0328`. The default `config.rag.score_gate = 0.05` was calibrated for vector / lexical scores already in `[0, 1]` and silently refused every hybrid query. `fusion_score` was also incomparable across modes — Lexical / Vector lived in `[0, 1]`, Hybrid lived in `(0, 0.033]`.
**Fix** (PR #25, fix/rrf-fusion-score-normalize-and-docs): **Fix** (PR #25, fix/rrf-fusion-score-normalize-and-docs):
- `crates/kb-search/src/hybrid.rs` divides every raw RRF score by `2 / (k_rrf + 1)` so `fusion_score` always lives in `[0, 1]` regardless of mode. Both retrievers contributing rank 1 normalises to `1.0`; chunks present in only one retriever cap around `0.5`. RRF's rank-ordering invariants are preserved (same constant divides every score), so sort + tiebreak behaviour is identical. - `crates/kebab-search/src/hybrid.rs` divides every raw RRF score by `2 / (k_rrf + 1)` so `fusion_score` always lives in `[0, 1]` regardless of mode. Both retrievers contributing rank 1 normalises to `1.0`; chunks present in only one retriever cap around `0.5`. RRF's rank-ordering invariants are preserved (same constant divides every score), so sort + tiebreak behaviour is identical.
- One unit test (`rrf_formula_matches_known_value`) updated to expect the normalised value `(1/61 + 1/62) / (2/61) ≈ 0.9919`. - One unit test (`rrf_formula_matches_known_value`) updated to expect the normalised value `(1/61 + 1/62) / (2/61) ≈ 0.9919`.
- The integration snapshot `crates/kb-search/tests/fixtures/search/hybrid/run-1.json` already used presence checks (`fusion_score_positive: true`) rather than absolute values, so it didn't need regeneration. - The integration snapshot `crates/kebab-search/tests/fixtures/search/hybrid/run-1.json` already used presence checks (`fusion_score_positive: true`) rather than absolute values, so it didn't need regeneration.
**Why not a per-mode `score_gate` config**: separate `lexical_score_gate / vector_score_gate / hybrid_score_gate` would force every downstream consumer (CLI, eval, TUI) to know which mode picks which threshold. Normalising the score itself is a one-line change at the source and makes `Answer.retrieval.score_gate` semantically meaningful without per-mode bookkeeping. **Why not a per-mode `score_gate` config**: separate `lexical_score_gate / vector_score_gate / hybrid_score_gate` would force every downstream consumer (CLI, eval, TUI) to know which mode picks which threshold. Normalising the score itself is a one-line change at the source and makes `Answer.retrieval.score_gate` semantically meaningful without per-mode bookkeeping.
**Amends**: tasks/p3/p3-4-hybrid-fusion.md (RRF formula now divides by `2/(k_rrf+1)` after summation), tasks/phase-3-vector-hybrid.md (RRF section). **Amends**: tasks/p3/p3-4-hybrid-fusion.md (RRF formula now divides by `2/(k_rrf+1)` after summation), tasks/phase-3-vector-hybrid.md (RRF section).
**Verification**: post-fix smoke at `/tmp/kb-smoke/` with default `score_gate = 0.05` succeeded across four scenarios — Korean→Korean, English→English, cross-language, and out-of-corpus refusal. **Verification**: post-fix smoke at `/tmp/kebab-smoke/` with default `score_gate = 0.05` succeeded across four scenarios — Korean→Korean, English→English, cross-language, and out-of-corpus refusal.
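The normalisation described in this entry can be sketched in a few lines. This is an illustrative reimplementation under stated assumptions, not the code in `hybrid.rs`: it divides by `rankings.len() / (k_rrf + 1)`, which equals the entry's `2 / (k_rrf + 1)` when there are two retrievers, and it breaks ties by chunk id, which is an assumption because the entry does not state the tiebreak rule. The function name `normalized_rrf` is hypothetical.

```rust
use std::collections::HashMap;

/// Fuse per-retriever rankings with RRF, then scale scores into [0, 1].
/// Each inner Vec lists chunk ids from best (rank 1) to worst.
fn normalized_rrf(rankings: &[Vec<&str>], k_rrf: f64) -> Vec<(String, f64)> {
    // Sum 1 / (k_rrf + rank) over every retriever that returned the chunk.
    let mut raw: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *raw.entry(*id).or_insert(0.0) += 1.0 / (k_rrf + rank);
        }
    }

    // Upper bound of the raw score: every retriever places the chunk at rank 1.
    let max_raw = rankings.len() as f64 / (k_rrf + 1.0);

    let mut fused: Vec<(String, f64)> = raw
        .into_iter()
        .map(|(id, score)| (id.to_string(), score / max_raw))
        .collect();

    // Highest score first; ties broken by id (assumed rule) for determinism.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Because every raw score is divided by the same constant, the ordering matches unnormalised RRF. A chunk ranked first by one retriever and second by the other lands at `123/124 ≈ 0.9919`, the value the updated unit test expects.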
## How to add an entry ## How to add an entry


@@ -1,12 +1,12 @@
--- ---
title: "KB work unit index" title: "KB work unit index"
source: kb_local_rust_report.md source: kebab_local_rust_report.md
date: 2026-04-27 date: 2026-04-27
--- ---
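# KB work unit index # KB work unit index
Breaks the Phase roadmap in [`kb_local_rust_report.md`](../kb_local_rust_report.md) down into architecture-level work units. Each task document is a unit that can be started and reviewed independently. Breaks the Phase roadmap in [`kebab_local_rust_report.md`](../kebab_local_rust_report.md) down into architecture-level work units. Each task document is a unit that can be started and reviewed independently.
## Dependency graph ## Dependency graph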
@@ -25,16 +25,16 @@ P0~P5 는 직렬. P6~P9 는 P5 이후 병렬 가능.
| # | Code | Title | Key output crates | Prerequisite | | # | Code | Title | Key output crates | Prerequisite |
|---|------|------|----------------|------| |---|------|------|----------------|------|
| P0 | [phase-0-skeleton.md](phase-0-skeleton.md) | Workspace skeleton + domain contracts | kb-core, kb-parse-types, kb-config, kb-app, kb-cli | | | P0 | [phase-0-skeleton.md](phase-0-skeleton.md) | Workspace skeleton + domain contracts | kebab-core, kebab-parse-types, kebab-config, kebab-app, kebab-cli | |
| P1 | [phase-1-markdown-ingestion.md](phase-1-markdown-ingestion.md) | Markdown ingestion pipeline | kb-source-fs, kb-parse-md, kb-normalize, kb-chunk, kb-store-sqlite | P0 | | P1 | [phase-1-markdown-ingestion.md](phase-1-markdown-ingestion.md) | Markdown ingestion pipeline | kebab-source-fs, kebab-parse-md, kebab-normalize, kebab-chunk, kebab-store-sqlite | P0 |
| P2 | [phase-2-lexical-search.md](phase-2-lexical-search.md) | SQLite FTS5 lexical search + citation | kb-search (lexical) | P1 | | P2 | [phase-2-lexical-search.md](phase-2-lexical-search.md) | SQLite FTS5 lexical search + citation | kebab-search (lexical) | P1 |
| P3 | [phase-3-vector-hybrid.md](phase-3-vector-hybrid.md) | Local embedding + LanceDB + hybrid | kb-embed, kb-embed-local, kb-store-vector, kb-search | P2 | | P3 | [phase-3-vector-hybrid.md](phase-3-vector-hybrid.md) | Local embedding + LanceDB + hybrid | kebab-embed, kebab-embed-local, kebab-store-vector, kebab-search | P2 |
| P4 | [phase-4-local-llm-rag.md](phase-4-local-llm-rag.md) | Local LLM + RAG + grounded answer | kb-llm, kb-llm-local, kb-rag | P3 | | P4 | [phase-4-local-llm-rag.md](phase-4-local-llm-rag.md) | Local LLM + RAG + grounded answer | kebab-llm, kebab-llm-local, kebab-rag | P3 |
| P5 | [phase-5-evaluation.md](phase-5-evaluation.md) | Golden query / regression eval | kb-eval | P4 | | P5 | [phase-5-evaluation.md](phase-5-evaluation.md) | Golden query / regression eval | kebab-eval | P4 |
| P6 | [phase-6-image.md](phase-6-image.md) | Image ingestion (OCR + caption) | kb-parse-image | P5 | | P6 | [phase-6-image.md](phase-6-image.md) | Image ingestion (OCR + caption) | kebab-parse-image | P5 |
| P7 | [phase-7-pdf.md](phase-7-pdf.md) | PDF text + page citation | kb-parse-pdf | P5 | | P7 | [phase-7-pdf.md](phase-7-pdf.md) | PDF text + page citation | kebab-parse-pdf | P5 |
| P8 | [phase-8-audio.md](phase-8-audio.md) | Audio transcription + timestamp citation | kb-parse-audio | P5 | | P8 | [phase-8-audio.md](phase-8-audio.md) | Audio transcription + timestamp citation | kebab-parse-audio | P5 |
| P9 | [phase-9-ui.md](phase-9-ui.md) | TUI + desktop app | kb-tui, kb-desktop | P5 | | P9 | [phase-9-ui.md](phase-9-ui.md) | TUI + desktop app | kebab-tui, kebab-desktop | P5 |
## Component task decomposition (per phase) ## Component task decomposition (per phase)


@@ -6,7 +6,7 @@ title: "<Component title>"
status: planned status: planned
depends_on: [] # other task_ids depends_on: [] # other task_ids
unblocks: [] # other task_ids unblocks: [] # other task_ids
contract_source: ../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [] # e.g. [§3.5, §5.5, §7.2] contract_sections: [] # e.g. [§3.5, §5.5, §7.2]
--- ---
@@ -22,7 +22,7 @@ contract_sections: [] # e.g. [§3.5, §5.5, §7.2]
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- <other crates per design §8> - <other crates per design §8>
- <external crates with versions> - <external crates with versions>
@@ -71,7 +71,7 @@ If any item here is needed during implementation, STOP and update the frozen des
| unit | ... | ... | | unit | ... | ... |
| snapshot | ... (JSON freeze) | `fixtures/...` | | snapshot | ... (JSON freeze) | `fixtures/...` |
| contract | trait round-trip | mock impls | | contract | trait round-trip | mock impls |
| integration | end-to-end via `kb-app` facade | tmp workspace | | integration | end-to-end via `kebab-app` facade | tmp workspace |
All tests must run under `cargo test -p <crate>` and not require external network or Ollama unless explicitly stated. All tests must run under `cargo test -p <crate>` and not require external network or Ollama unless explicitly stated.


@@ -1,12 +1,12 @@
--- ---
phase: P0 phase: P0
component: workspace + kb-core + kb-config + kb-app + kb-cli component: workspace + kebab-core + kebab-config + kebab-app + kebab-cli
task_id: p0-1 task_id: p0-1
title: "Workspace skeleton + frozen domain types/traits + ID recipe + facade" title: "Workspace skeleton + frozen domain types/traits + ID recipe + facade"
status: completed status: completed
depends_on: [] depends_on: []
unblocks: [p1-1, p1-2, p1-3, p1-4, p1-5, p1-6, p2-1, p2-2, p3-1, p3-2, p3-3, p3-4, p4-1, p4-2, p4-3, p5-1, p5-2, p6-1, p6-2, p6-3, p7-1, p7-2, p8-1, p8-2, p9-1, p9-2, p9-3, p9-4, p9-5] unblocks: [p1-1, p1-2, p1-3, p1-4, p1-5, p1-6, p2-1, p2-2, p3-1, p3-2, p3-3, p3-4, p4-1, p4-2, p4-3, p5-1, p5-2, p6-1, p6-2, p6-3, p7-1, p7-2, p8-1, p8-2, p9-1, p9-2, p9-3, p9-4, p9-5]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3 (all), §4, §5.1 schema_meta+migrations, §6 (config + XDG), §7 (all traits), §8 module boundaries, §9 versioning, §10 errors+exit codes, §2.8 wire schema_version] contract_sections: [§3 (all), §4, §5.1 schema_meta+migrations, §6 (config + XDG), §7 (all traits), §8 module boundaries, §9 versioning, §10 errors+exit codes, §2.8 wire schema_version]
--- ---
@@ -14,56 +14,56 @@ contract_sections: [§3 (all), §4, §5.1 schema_meta+migrations, §6 (config +
## Goal ## Goal
Stand up the Cargo workspace (Rust 2024, resolver=3) with `kb-core`, `kb-parse-types`, `kb-config`, `kb-app`, `kb-cli` crates. Freeze every domain type, trait, ID recipe, error type, and CLI entry shape per the frozen design doc so that all subsequent component tasks compile against stable contracts. Stand up the Cargo workspace (Rust 2024, resolver=3) with `kebab-core`, `kebab-parse-types`, `kebab-config`, `kebab-app`, `kebab-cli` crates. Freeze every domain type, trait, ID recipe, error type, and CLI entry shape per the frozen design doc so that all subsequent component tasks compile against stable contracts.
## Why now / why this size ## Why now / why this size
Every other task imports `kb-core`. If types or trait signatures wobble after this point, every downstream task spec drifts. This task is large but indivisible: types + traits + ID recipe + facade + CLI skeleton + wire schema stubs must land together so the rest of the workspace can compile against them. Every other task imports `kebab-core`. If types or trait signatures wobble after this point, every downstream task spec drifts. This task is large but indivisible: types + traits + ID recipe + facade + CLI skeleton + wire schema stubs must land together so the rest of the workspace can compile against them.
## Allowed dependencies ## Allowed dependencies
- workspace `[workspace.dependencies]`: `anyhow = "1"`, `thiserror = "2"`, `serde = { version = "1", features = ["derive"] }`, `serde_json = "1"`, `time = { version = "0.3", features = ["serde", "macros"] }`, `uuid = { version = "1", features = ["v7", "serde"] }`, `blake3 = "1"`, `tracing = "0.1"` - workspace `[workspace.dependencies]`: `anyhow = "1"`, `thiserror = "2"`, `serde = { version = "1", features = ["derive"] }`, `serde_json = "1"`, `time = { version = "0.3", features = ["serde", "macros"] }`, `uuid = { version = "1", features = ["v7", "serde"] }`, `blake3 = "1"`, `tracing = "0.1"`
- per crate: - per crate:
- `kb-core`: workspace deps + `serde_json::Map`, `serde-json-canonicalizer`, `unicode-normalization` - `kebab-core`: workspace deps + `serde_json::Map`, `serde-json-canonicalizer`, `unicode-normalization`
- `kb-parse-types`: workspace deps + `kb-core` ONLY (no parsers, no stores, no normalize). Defines parser intermediate representations per design §3.7b. - `kebab-parse-types`: workspace deps + `kebab-core` ONLY (no parsers, no stores, no normalize). Defines parser intermediate representations per design §3.7b.
- `kb-config`: workspace deps + `toml = "0.8"`, `dirs = "5"` (XDG paths) - `kebab-config`: workspace deps + `toml = "0.8"`, `dirs = "5"` (XDG paths)
- `kb-app`: workspace deps + `kb-core`, `kb-config`, `tracing-subscriber`, `tracing-appender` - `kebab-app`: workspace deps + `kebab-core`, `kebab-config`, `tracing-subscriber`, `tracing-appender`
- `kb-cli`: workspace deps + `kb-core`, `kb-config`, `kb-app`, `clap = { version = "4", features = ["derive"] }` - `kebab-cli`: workspace deps + `kebab-core`, `kebab-config`, `kebab-app`, `clap = { version = "4", features = ["derive"] }`
## Forbidden dependencies ## Forbidden dependencies
- `kb-core` MUST NOT depend on any other `kb-*` crate. - `kebab-core` MUST NOT depend on any other `kebab-*` crate.
- `kb-parse-types` MUST depend ONLY on `kb-core`. No parser libraries (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`, …), no other `kb-*` crate. - `kebab-parse-types` MUST depend ONLY on `kebab-core`. No parser libraries (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`, …), no other `kebab-*` crate.
- `kb-config` MUST NOT depend on `kb-app`, `kb-cli`, parsers, stores, embedders, search, llm, rag, tui, desktop. - `kebab-config` MUST NOT depend on `kebab-app`, `kebab-cli`, parsers, stores, embedders, search, llm, rag, tui, desktop.
- `kb-app` MUST NOT yet depend on parsers/stores/embedders/search/llm/rag (those crates do not exist yet — facade methods stub out and return `unimplemented!()` or `anyhow::bail!("not yet wired (Pn-i)")`). - `kebab-app` MUST NOT yet depend on parsers/stores/embedders/search/llm/rag (those crates do not exist yet — facade methods stub out and return `unimplemented!()` or `anyhow::bail!("not yet wired (Pn-i)")`).
- `kb-cli` MUST NOT call any non-`kb-app` crate directly. - `kebab-cli` MUST NOT call any non-`kebab-app` crate directly.
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| frozen design doc | Markdown | `docs/superpowers/specs/2026-04-27-kb-final-form-design.md` | | frozen design doc | Markdown | `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` |
| user `kb` invocation | command-line args | end user | | user `kebab` invocation | command-line args | end user |
## Outputs ## Outputs
| output | type | downstream consumer | | output | type | downstream consumer |
|--------|------|---------------------| |--------|------|---------------------|
| compiling workspace | Rust crates | every later task | | compiling workspace | Rust crates | every later task |
| `kb-core` types/traits | Rust API | every other crate | | `kebab-core` types/traits | Rust API | every other crate |
| `kb-core` ID functions | Rust API | parsers, normalize, chunkers, embedders, search, rag | | `kebab-core` ID functions | Rust API | parsers, normalize, chunkers, embedders, search, rag |
| `kb-config::Config` | Rust struct | every other crate | | `kebab-config::Config` | Rust struct | every other crate |
| `kb-app` facade methods (stubs) | Rust API | `kb-cli`, future TUI/desktop | | `kebab-app` facade methods (stubs) | Rust API | `kebab-cli`, future TUI/desktop |
| `kb` binary | executable | end user | | `kebab` binary | executable | end user |
| `docs/wire-schema/v1/*.schema.json` stubs | JSON Schema files | future wire emitters and consumers | | `docs/wire-schema/v1/*.schema.json` stubs | JSON Schema files | future wire emitters and consumers |
| `docs/spec/*.md` stubs (link to frozen design) | Markdown | future contributors | | `docs/spec/*.md` stubs (link to frozen design) | Markdown | future contributors |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
All types/traits below are defined in `kb-core` exactly per design §3 and §7 (no additions, no renames). Subagent must copy field-for-field. All types/traits below are defined in `kebab-core` exactly per design §3 and §7 (no additions, no renames). Subagent must copy field-for-field.
```rust ```rust
// ── kb-core ───────────────────────────────────────────────────────────────── // ── kebab-core ─────────────────────────────────────────────────────────────────
// Newtype IDs (design §3.1) — Display + FromStr implemented. // Newtype IDs (design §3.1) — Display + FromStr implemented.
pub struct AssetId(pub String); pub struct AssetId(pub String);
@@ -114,7 +114,7 @@ pub struct AudioRefBlock { /* per §3.4 */ }
pub enum Inline { /* per §3.4 */ } pub enum Inline { /* per §3.4 */ }
pub enum SourceSpan { /* per §3.4 */ } pub enum SourceSpan { /* per §3.4 */ }
// (ParsedBlock + parser intermediates live in kb-parse-types per design §3.7b — NOT in kb-core.) // (ParsedBlock + parser intermediates live in kebab-parse-types per design §3.7b — NOT in kebab-core.)
// Chunk + Citation (§3.5) // Chunk + Citation (§3.5)
pub struct Chunk { /* per §3.5 */ } pub struct Chunk { /* per §3.5 */ }
@@ -232,14 +232,14 @@ pub fn nfc(input: &str) -> String; // §4.1
``` ```
```rust ```rust
// ── kb-parse-types ────────────────────────────────────────────────────────── // ── kebab-parse-types ──────────────────────────────────────────────────────────
// Per design §3.7b. Defines parser intermediate representations consumed by // Per design §3.7b. Defines parser intermediate representations consumed by
// kb-normalize. Depends on kb-core only — never on parser libraries. // kebab-normalize. Depends on kebab-core only — never on parser libraries.
pub struct ParsedBlock { pub struct ParsedBlock {
pub kind: ParsedBlockKind, pub kind: ParsedBlockKind,
pub heading_path: Vec<String>, pub heading_path: Vec<String>,
pub source_span: kb_core::SourceSpan, pub source_span: kebab_core::SourceSpan,
pub payload: ParsedPayload, pub payload: ParsedPayload,
} }
@@ -247,16 +247,16 @@ pub enum ParsedBlockKind { Heading, Paragraph, List, Code, Table, Quote, ImageRe
pub enum ParsedPayload { pub enum ParsedPayload {
Heading { level: u8, text: String }, Heading { level: u8, text: String },
Paragraph { text: String, inlines: Vec<kb_core::Inline> }, Paragraph { text: String, inlines: Vec<kebab_core::Inline> },
List { ordered: bool, items: Vec<Vec<kb_core::Inline>> }, List { ordered: bool, items: Vec<Vec<kebab_core::Inline>> },
Code { lang: Option<String>, code: String }, Code { lang: Option<String>, code: String },
Table { headers: Vec<String>, rows: Vec<Vec<String>> }, Table { headers: Vec<String>, rows: Vec<Vec<String>> },
Quote { text: String, inlines: Vec<kb_core::Inline> }, Quote { text: String, inlines: Vec<kebab_core::Inline> },
ImageRef { src: String, alt: String }, ImageRef { src: String, alt: String },
AudioRef { src: String }, AudioRef { src: String },
} }
// `Inline` itself lives in kb-core (§3.4) — parse-types references it, never duplicates it. // `Inline` itself lives in kebab-core (§3.4) — parse-types references it, never duplicates it.
pub struct Warning { pub kind: WarningKind, pub note: String } pub struct Warning { pub kind: WarningKind, pub note: String }
pub enum WarningKind { MalformedFrontmatter, MalformedTable, EncodingFallback, ExtractFailed } pub enum WarningKind { MalformedFrontmatter, MalformedTable, EncodingFallback, ExtractFailed }
@@ -268,31 +268,31 @@ pub struct ParsedAudioSegment;
``` ```
```rust ```rust
// ── kb-config ─────────────────────────────────────────────────────────────── // ── kebab-config ───────────────────────────────────────────────────────────────
pub struct Config { /* full schema per §6.4 */ } pub struct Config { /* full schema per §6.4 */ }
impl Config { impl Config {
pub fn load(path: Option<&std::path::Path>) -> anyhow::Result<Self>; pub fn load(path: Option<&std::path::Path>) -> anyhow::Result<Self>;
pub fn from_file(path: &std::path::Path) -> anyhow::Result<Self>; pub fn from_file(path: &std::path::Path) -> anyhow::Result<Self>;
pub fn defaults() -> Self; pub fn defaults() -> Self;
pub fn apply_env(self, env: &std::collections::HashMap<String, String>) -> Self; pub fn apply_env(self, env: &std::collections::HashMap<String, String>) -> Self;
pub fn xdg_config_path() -> std::path::PathBuf; // ~/.config/kb/config.toml pub fn xdg_config_path() -> std::path::PathBuf; // ~/.config/kebab/config.toml
pub fn xdg_data_dir() -> std::path::PathBuf; // ~/.local/share/kb pub fn xdg_data_dir() -> std::path::PathBuf; // ~/.local/share/kebab
pub fn xdg_cache_dir() -> std::path::PathBuf; pub fn xdg_cache_dir() -> std::path::PathBuf;
pub fn xdg_state_dir() -> std::path::PathBuf; pub fn xdg_state_dir() -> std::path::PathBuf;
} }
``` ```
```rust ```rust
// ── kb-app ────────────────────────────────────────────────────────────────── // ── kebab-app ──────────────────────────────────────────────────────────────────
pub fn init_workspace(force: bool) -> anyhow::Result<()>; pub fn init_workspace(force: bool) -> anyhow::Result<()>;
pub fn ingest(scope: kb_core::SourceScope, summary_only: bool) -> anyhow::Result<kb_core::IngestReport>; pub fn ingest(scope: kebab_core::SourceScope, summary_only: bool) -> anyhow::Result<kebab_core::IngestReport>;
pub fn list_docs(filter: kb_core::DocFilter) -> anyhow::Result<Vec<kb_core::DocSummary>>; pub fn list_docs(filter: kebab_core::DocFilter) -> anyhow::Result<Vec<kebab_core::DocSummary>>;
pub fn inspect_doc(id: &kb_core::DocumentId) -> anyhow::Result<kb_core::CanonicalDocument>; pub fn inspect_doc(id: &kebab_core::DocumentId) -> anyhow::Result<kebab_core::CanonicalDocument>;
pub fn inspect_chunk(id: &kb_core::ChunkId) -> anyhow::Result<kb_core::Chunk>; pub fn inspect_chunk(id: &kebab_core::ChunkId) -> anyhow::Result<kebab_core::Chunk>;
pub fn search(query: kb_core::SearchQuery) -> anyhow::Result<Vec<kb_core::SearchHit>>; pub fn search(query: kebab_core::SearchQuery) -> anyhow::Result<Vec<kebab_core::SearchHit>>;
pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<kb_core::Answer>; pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<kebab_core::Answer>;
pub fn doctor() -> anyhow::Result<DoctorReport>; pub fn doctor() -> anyhow::Result<DoctorReport>;
pub struct AskOpts { pub k: usize, pub explain: bool, pub mode: kb_core::SearchMode, pub temperature: Option<f32>, pub seed: Option<u64> } pub struct AskOpts { pub k: usize, pub explain: bool, pub mode: kebab_core::SearchMode, pub temperature: Option<f32>, pub seed: Option<u64> }
pub struct DoctorReport { pub ok: bool, pub checks: Vec<DoctorCheck> } pub struct DoctorReport { pub ok: bool, pub checks: Vec<DoctorCheck> }
pub struct DoctorCheck { pub name: String, pub ok: bool, pub detail: String, pub hint: Option<String> } pub struct DoctorCheck { pub name: String, pub ok: bool, pub detail: String, pub hint: Option<String> }
``` ```
@@ -300,9 +300,9 @@ pub struct DoctorCheck { pub name: String, pub ok: bool, pub detail: String, pub
P0 facade implementations call `anyhow::bail!("not yet wired (P<n>-<i>)")`; later phases replace bodies but never change signatures. P0 facade implementations call `anyhow::bail!("not yet wired (P<n>-<i>)")`; later phases replace bodies but never change signatures.
```rust ```rust
// ── kb-cli ────────────────────────────────────────────────────────────────── // ── kebab-cli ──────────────────────────────────────────────────────────────────
// clap subcommands: init | ingest | list (docs) | inspect (doc|chunk) | search | ask | doctor | eval (subcommand placeholder) // clap subcommands: init | ingest | list (docs) | inspect (doc|chunk) | search | ask | doctor | eval (subcommand placeholder)
// Each maps 1:1 to a kb_app function. Exit code mapping per §10. // Each maps 1:1 to a kebab_app function. Exit code mapping per §10.
``` ```
## Behavior contract ## Behavior contract
@@ -312,18 +312,18 @@ P0 facade implementations call `anyhow::bail!("not yet wired (P<n>-<i>)")`; late
- `id_from` uses `serde-json-canonicalizer` exactly as design §4.2 specifies and truncates blake3 to 32 hex chars. - `id_from` uses `serde-json-canonicalizer` exactly as design §4.2 specifies and truncates blake3 to 32 hex chars.
- `Citation::to_uri` emits W3C Media Fragments URIs per §0 Q3 (`#L<a>-L<b>`, `#p=<n>`, `#xywh=…`, `#caption`, `#t=hh:mm:ss,hh:mm:ss[&speaker=…]`). - `Citation::to_uri` emits W3C Media Fragments URIs per §0 Q3 (`#L<a>-L<b>`, `#p=<n>`, `#xywh=…`, `#caption`, `#t=hh:mm:ss,hh:mm:ss[&speaker=…]`).
- `Citation::parse` is the strict inverse (round-trip property). - `Citation::parse` is the strict inverse (round-trip property).
- `kb-config` resolves XDG paths via `dirs` crate; respects `XDG_CONFIG_HOME`, `XDG_DATA_HOME`, `XDG_CACHE_HOME`, `XDG_STATE_HOME` if set. - `kebab-config` resolves XDG paths via `dirs` crate; respects `XDG_CONFIG_HOME`, `XDG_DATA_HOME`, `XDG_CACHE_HOME`, `XDG_STATE_HOME` if set.
- Config layer order: defaults → file → env (`KB_<SECTION>_<KEY>`) → CLI flag (CLI override is applied by `kb-cli` after `Config::load`). - Config layer order: defaults → file → env (`KB_<SECTION>_<KEY>`) → CLI flag (CLI override is applied by `kebab-cli` after `Config::load`).
- `kb-cli` global flags: `--config <path>`, `--verbose`, `--debug`, `--json`, `--explain` (where applicable). On `--json`, output conforms to wire schema v1. - `kebab-cli` global flags: `--config <path>`, `--verbose`, `--debug`, `--json`, `--explain` (where applicable). On `--json`, output conforms to wire schema v1.
- `kb-cli` exit codes: 0 success, 1 no-hit/refusal, 2 error, 3 doctor unhealthy (per §10). - `kebab-cli` exit codes: 0 success, 1 no-hit/refusal, 2 error, 3 doctor unhealthy (per §10).
- All facade-returned wire objects emit `schema_version` per §2 (e.g., `"answer.v1"`, `"search_hit.v1"`). - All facade-returned wire objects emit `schema_version` per §2 (e.g., `"answer.v1"`, `"search_hit.v1"`).
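
The layer order in the contract above (defaults → file → env → CLI flag) can be sketched as a fold over optional overrides. This is only an illustration under stated assumptions: `Layer`, `resolve`, and the `top_k` key are hypothetical names that do not appear in the spec, and the real `Config` schema is defined by §6.4.

```rust
// Hypothetical sketch of the layer order: defaults -> file -> env -> CLI flag.
// `Layer`, `resolve`, and the `top_k` key are illustrative names, not part of the spec.

/// One configuration layer; `None` means "this layer does not set the key".
#[derive(Debug, Clone, Copy, Default)]
pub struct Layer {
    pub top_k: Option<usize>,
}

/// Resolve a key by walking the layers from lowest to highest precedence.
/// A later layer overrides the running value only when it actually sets the key.
pub fn resolve(default_top_k: usize, file: Layer, env: Layer, cli: Layer) -> usize {
    [file, env, cli]
        .iter()
        .fold(default_top_k, |acc, layer| layer.top_k.unwrap_or(acc))
}
```

Keeping the CLI layer as the last element mirrors the rule that the CLI override is applied after `Config::load` has already merged the other three layers.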
## Storage / wire effects ## Storage / wire effects
- Filesystem: creates `~/.config/kb/`, `~/.local/share/kb/`, `~/KnowledgeBase/` only when `kb init` runs; never on `Config::load`. - Filesystem: creates `~/.config/kebab/`, `~/.local/share/kebab/`, `~/KnowledgeBase/` only when `kebab init` runs; never on `Config::load`.
- Wire schemas: ships `docs/wire-schema/v1/{citation,search_hit,answer,ingest_report,doc_summary,chunk_inspection,doctor}.schema.json` as **stubs** declaring the top-level `schema_version` and required fields per §2. Full property validation can land later. - Wire schemas: ships `docs/wire-schema/v1/{citation,search_hit,answer,ingest_report,doc_summary,chunk_inspection,doctor}.schema.json` as **stubs** declaring the top-level `schema_version` and required fields per §2. Full property validation can land later.
- DB: workspace ships `migrations/V001__init.sql` containing **only** §5.1 `schema_meta` + `migrations` tables (the full schema lands in p1-6's migration file or p0-1 may pre-stage the empty migrations directory; choose the former to keep this task within `kb-core`/`kb-config`/`kb-app`/`kb-cli` scope). - DB: workspace ships `migrations/V001__init.sql` containing **only** §5.1 `schema_meta` + `migrations` tables (the full schema lands in p1-6's migration file or p0-1 may pre-stage the empty migrations directory; choose the former to keep this task within `kebab-core`/`kebab-config`/`kebab-app`/`kebab-cli` scope).
- Logging: `tracing` initialized in `kb-cli`; daily-rolling file in `~/.local/state/kb/logs/`. - Logging: `tracing` initialized in `kebab-cli`; daily-rolling file in `~/.local/state/kebab/logs/`.
## Test plan ## Test plan
@@ -336,20 +336,20 @@ P0 facade implementations call `anyhow::bail!("not yet wired (P<n>-<i>)")`; late
| unit | newtype `Display`/`FromStr` rejects invalid lengths/chars | inline | | unit | newtype `Display`/`FromStr` rejects invalid lengths/chars | inline |
| unit | `Config::defaults` + env override + CLI override produces expected merged config | inline | | unit | `Config::defaults` + env override + CLI override produces expected merged config | inline |
| snapshot | `Config::defaults` JSON serde stable | inline (round-trip) | | snapshot | `Config::defaults` JSON serde stable | inline (round-trip) |
| smoke | `kb --help`, `kb init`, `kb doctor` run; doctor reports config_loaded ✓ data_dir_writable ✓ even with no DB present (downstream checks may fail with hint) | tmp `XDG_*` env | | smoke | `kebab --help`, `kebab init`, `kebab doctor` run; doctor reports config_loaded ✓ data_dir_writable ✓ even with no DB present (downstream checks may fail with hint) | tmp `XDG_*` env |
| build | `cargo check --workspace` and `cargo test --workspace` pass | repo | | build | `cargo check --workspace` and `cargo test --workspace` pass | repo |
All tests must run with no network, no Ollama, no models. All tests must run with no network, no Ollama, no models.
## Definition of Done ## Definition of Done
- [ ] `Cargo.toml` workspace lists `kb-core`, `kb-parse-types`, `kb-config`, `kb-app`, `kb-cli` and resolver=3, edition 2024 - [ ] `Cargo.toml` workspace lists `kebab-core`, `kebab-parse-types`, `kebab-config`, `kebab-app`, `kebab-cli` and resolver=3, edition 2024
- [ ] `cargo check --workspace` passes - [ ] `cargo check --workspace` passes
- [ ] `cargo test --workspace` passes - [ ] `cargo test --workspace` passes
- [ ] `kb-parse-types` `cargo tree` shows ONLY `kb-core` + `serde`/`thiserror` style deps (no parser libs, no other `kb-*`) - [ ] `kebab-parse-types` `cargo tree` shows ONLY `kebab-core` + `serde`/`thiserror` style deps (no parser libs, no other `kebab-*`)
- [ ] `kb --help` prints subcommands - [ ] `kebab --help` prints subcommands
- [ ] `kb init` creates XDG dirs idempotently and writes `config.toml` - [ ] `kebab init` creates XDG dirs idempotently and writes `config.toml`
- [ ] `kb doctor` returns wire JSON conforming to `doctor.v1` (in `--json` mode) - [ ] `kebab doctor` returns wire JSON conforming to `doctor.v1` (in `--json` mode)
- [ ] `docs/wire-schema/v1/*.schema.json` stubs exist (7 files: citation, search_hit, answer, ingest_report, doc_summary, chunk_inspection, doctor) - [ ] `docs/wire-schema/v1/*.schema.json` stubs exist (7 files: citation, search_hit, answer, ingest_report, doc_summary, chunk_inspection, doctor)
- [ ] `docs/spec/` stubs exist linking to the frozen design (one file per: domain-model, ids, canonical-document, chunk-policy, citation-policy, module-boundaries, ai-generation-guidelines) - [ ] `docs/spec/` stubs exist linking to the frozen design (one file per: domain-model, ids, canonical-document, chunk-policy, citation-policy, module-boundaries, ai-generation-guidelines)
- [ ] `fixtures/` root directory created with all subdirectories that downstream tasks reference: `fixtures/markdown/`, `fixtures/source-fs/`, `fixtures/search/lexical/`, `fixtures/search/hybrid/`, `fixtures/embed/`, `fixtures/vector/`, `fixtures/rag/`, `fixtures/eval/`, `fixtures/image/`, `fixtures/pdf/`, `fixtures/audio/`. Each subdir gets a `.gitkeep` so it tracks. P1 ships at minimum `fixtures/markdown/{simple-note,nested-headings,code-and-table}.md` (per epic phase-0); other dirs stay empty until their phase lands. - [ ] `fixtures/` root directory created with all subdirectories that downstream tasks reference: `fixtures/markdown/`, `fixtures/source-fs/`, `fixtures/search/lexical/`, `fixtures/search/hybrid/`, `fixtures/embed/`, `fixtures/vector/`, `fixtures/rag/`, `fixtures/eval/`, `fixtures/image/`, `fixtures/pdf/`, `fixtures/audio/`. Each subdir gets a `.gitkeep` so it tracks. P1 ships at minimum `fixtures/markdown/{simple-note,nested-headings,code-and-table}.md` (per epic phase-0); other dirs stay empty until their phase lands.
@@ -361,12 +361,12 @@ All tests must run with no network, no Ollama, no models.
- Any parser / store / embedder / search / llm / rag / tui / desktop logic (downstream phases). - Any parser / store / embedder / search / llm / rag / tui / desktop logic (downstream phases).
- Full schema migrations (most DDL lands in p1-6 / p2-1 / p3-3). - Full schema migrations (most DDL lands in p1-6 / p2-1 / p3-3).
- Wire schema deep validation (only required fields + `schema_version` checked here). - Wire schema deep validation (only required fields + `schema_version` checked here).
- Real `kb-app` business logic (functions stub with an explicit `bail!`, never a panicking macro). - Real `kebab-app` business logic (functions stub with an explicit `bail!`, never a panicking macro).
## Risks / notes ## Risks / notes
- ID recipe is the contract that every later record depends on. Any change after this task lands forces a `parser_version` / `chunker_version` / `embedding_version` cascade per §9. Treat changes as schema migrations and update the design doc first. - ID recipe is the contract that every later record depends on. Any change after this task lands forces a `parser_version` / `chunker_version` / `embedding_version` cascade per §9. Treat changes as schema migrations and update the design doc first.
- Newtype IDs use `String` (not `[u8; 16]`) to keep serde simple; tests must still enforce 32-char hex constraint on `FromStr`. - Newtype IDs use `String` (not `[u8; 16]`) to keep serde simple; tests must still enforce 32-char hex constraint on `FromStr`.
- `kb-app` stubs must use `bail!` not `panic!` so the CLI exits with code 2 cleanly per §10. - `kebab-app` stubs must use `bail!` not `panic!` so the CLI exits with code 2 cleanly per §10.
- `clap` v4 derive: subcommand `inspect` has nested `doc` / `chunk` variants; ensure exit code 0/1/2 mapping wraps the facade call uniformly. - `clap` v4 derive: subcommand `inspect` has nested `doc` / `chunk` variants; ensure exit code 0/1/2 mapping wraps the facade call uniformly.
- XDG path discovery on macOS: spec uses XDG (not `Application Support`) per §6.1 — `dirs` crate honors XDG env vars; tests must set them explicitly. - XDG path discovery on macOS: spec uses XDG (not `Application Support`) per §6.1 — `dirs` crate honors XDG env vars; tests must set them explicitly.
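
The notes on `bail!` and uniform exit-code mapping can be sketched with a single mapping function that every subcommand goes through. Only the numeric codes (0, 1, 2, 3) come from the spec; `Outcome` and `exit_code` are hypothetical names chosen for this illustration.

```rust
// Hypothetical sketch: `Outcome` and `exit_code` are illustrative names.
// Only the numeric codes (0, 1, 2, 3) come from the task spec.

/// What a CLI command ended up doing, independent of which subcommand ran.
#[derive(Debug)]
pub enum Outcome {
    Success,
    NoHitOrRefusal,
    DoctorUnhealthy,
}

/// Map a facade result to a process exit code in one place, so every
/// subcommand (including nested `inspect doc` / `inspect chunk`) shares it.
pub fn exit_code<E>(result: &Result<Outcome, E>) -> i32 {
    match result {
        Ok(Outcome::Success) => 0,
        Ok(Outcome::NoHitOrRefusal) => 1,
        // A stub that returns an error (rather than panicking) lands here.
        Err(_) => 2,
        Ok(Outcome::DoctorUnhealthy) => 3,
    }
}
```

A panic would bypass this mapping entirely (Rust's default exit status for a panicking main thread is 101), which is why returning an error from the stubs matters.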


@@ -1,12 +1,12 @@
--- ---
phase: P1 phase: P1
component: kb-source-fs component: kebab-source-fs
task_id: p1-1 task_id: p1-1
title: "Local filesystem source connector" title: "Local filesystem source connector"
status: completed status: completed
depends_on: [p0-1] depends_on: [p0-1]
unblocks: [p1-2, p1-3, p1-4, p1-5, p1-6] unblocks: [p1-2, p1-3, p1-4, p1-5, p1-6]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.3, §6.2, §6.6, §7.1, §7.2 SourceConnector, §8] contract_sections: [§3.3, §6.2, §6.6, §7.1, §7.2 SourceConnector, §8]
--- ---
@@ -22,8 +22,8 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `ignore` (gitignore semantics) - `ignore` (gitignore semantics)
- `blake3` - `blake3`
- `walkdir` - `walkdir`
@@ -34,21 +34,21 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
## Forbidden dependencies ## Forbidden dependencies
- `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` - `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `SourceScope` | `kb_core::SourceScope` | `kb-app` from config | | `SourceScope` | `kebab_core::SourceScope` | `kebab-app` from config |
| filesystem | `&Path` | OS | | filesystem | `&Path` | OS |
| `.kbignore` | text file | workspace root, optional | | `.kebabignore` | text file | workspace root, optional |
## Outputs ## Outputs
| output | type | downstream consumer | | output | type | downstream consumer |
|--------|------|---------------------| |--------|------|---------------------|
| `Vec<RawAsset>` | `kb_core::RawAsset` | `kb-parse-md`, asset writer in `kb-store-sqlite` (via `kb-app`) | | `Vec<RawAsset>` | `kebab_core::RawAsset` | `kebab-parse-md`, asset writer in `kebab-store-sqlite` (via `kebab-app`) |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
@@ -56,11 +56,11 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
pub struct FsSourceConnector { /* internal */ } pub struct FsSourceConnector { /* internal */ }
impl FsSourceConnector { impl FsSourceConnector {
pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>; pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
} }
impl kb_core::SourceConnector for FsSourceConnector { impl kebab_core::SourceConnector for FsSourceConnector {
fn scan(&self, scope: &kb_core::SourceScope) -> anyhow::Result<Vec<kb_core::RawAsset>>; fn scan(&self, scope: &kebab_core::SourceScope) -> anyhow::Result<Vec<kebab_core::RawAsset>>;
} }
``` ```
@@ -70,7 +70,7 @@ impl kb_core::SourceConnector for FsSourceConnector {
- `asset_id` derived per design §4.2 from `blake3(raw bytes)` full hex. - `asset_id` derived per design §4.2 from `blake3(raw bytes)` full hex.
- `media_type` selected from extension + libmagic-like sniff fallback (`.md` → Markdown, others fall through to `MediaType::Other`). - `media_type` selected from extension + libmagic-like sniff fallback (`.md` → Markdown, others fall through to `MediaType::Other`).
- `discovered_at` = current `OffsetDateTime::now_utc()` at scan time. - `discovered_at` = current `OffsetDateTime::now_utc()` at scan time.
- Combine `config.workspace.exclude` and `.kbignore` into one filter (union; ordering does not matter). - Combine `config.workspace.exclude` and `.kebabignore` into one filter (union; ordering does not matter).
- Symbolic links: follow once, detect cycles via `canonicalize` + visited set. - Symbolic links: follow once, detect cycles via `canonicalize` + visited set.
- Files larger than `storage.copy_threshold_mb` MB → emit `AssetStorage::Reference { path, sha }` (do not copy bytes here; copying is done by the asset writer task). - Files larger than `storage.copy_threshold_mb` MB → emit `AssetStorage::Reference { path, sha }` (do not copy bytes here; copying is done by the asset writer task).
- Idempotent: same input → same `Vec<RawAsset>` (sort by `workspace_path`). - Idempotent: same input → same `Vec<RawAsset>` (sort by `workspace_path`).
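
The size threshold and the deterministic ordering from the bullets above could look roughly like this. It is a sketch under assumptions: `StorageSketch`, `AssetSketch`, `classify`, and `finalize` are hypothetical stand-ins for the real `kebab_core` types, the `Copied` variant name is a placeholder, and 1 MB is taken to mean 1024 × 1024 bytes, which the spec does not state.

```rust
// Hypothetical stand-ins for the real core types; only
// `Reference { path, sha }` mirrors a shape named in the spec.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StorageSketch {
    /// Placeholder name for "small enough to be copied later by the asset writer".
    Copied,
    Reference { path: String, sha: String },
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AssetSketch {
    pub workspace_path: String,
    pub storage: StorageSketch,
}

/// Files strictly larger than the threshold become references.
/// Assumption: 1 MB = 1024 * 1024 bytes.
pub fn classify(path: &str, sha: &str, size_bytes: u64, copy_threshold_mb: u64) -> StorageSketch {
    let threshold_bytes = copy_threshold_mb.saturating_mul(1024 * 1024);
    if size_bytes > threshold_bytes {
        StorageSketch::Reference { path: path.to_string(), sha: sha.to_string() }
    } else {
        StorageSketch::Copied
    }
}

/// Sort by `workspace_path` so that two scans of the same tree
/// yield the same sequence regardless of directory walk order.
pub fn finalize(mut assets: Vec<AssetSketch>) -> Vec<AssetSketch> {
    assets.sort_by(|a, b| a.workspace_path.cmp(&b.workspace_path));
    assets
}
```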
@@ -78,7 +78,7 @@ impl kb_core::SourceConnector for FsSourceConnector {
## Storage / wire effects ## Storage / wire effects
- Reads: filesystem under `config.workspace.root`. - Reads: filesystem under `config.workspace.root`.
- Writes: nothing. (Asset copy is handled by the asset writer in `kb-store-sqlite`.) - Writes: nothing. (Asset copy is handled by the asset writer in `kebab-store-sqlite`.)
## Test plan ## Test plan
@@ -87,25 +87,25 @@ impl kb_core::SourceConnector for FsSourceConnector {
| unit | POSIX path normalization | inline cases incl. `./a/b.md`, `a//b.md`, `a/b.md` → identical | | unit | POSIX path normalization | inline cases incl. `./a/b.md`, `a//b.md`, `a/b.md` → identical |
| unit | blake3 of known bytes matches expected hex | inline | | unit | blake3 of known bytes matches expected hex | inline |
| unit | gitignore filter (`*.tmp`, `node_modules/**`) excludes correctly | tmp tree built in test | | unit | gitignore filter (`*.tmp`, `node_modules/**`) excludes correctly | tmp tree built in test |
| unit | `.kbignore` config exclude works | tmp tree | | unit | `.kebabignore` config exclude works | tmp tree |
| unit | symlink cycle does not loop | tmp tree with `a -> b -> a` | | unit | symlink cycle does not loop | tmp tree with `a -> b -> a` |
| snapshot | `Vec<RawAsset>` serialized JSON for fixture tree is stable | `fixtures/source-fs/tree-1` | | snapshot | `Vec<RawAsset>` serialized JSON for fixture tree is stable | `fixtures/source-fs/tree-1` |
| determinism | re-running scan twice produces byte-identical JSON | `fixtures/source-fs/tree-1` | | determinism | re-running scan twice produces byte-identical JSON | `fixtures/source-fs/tree-1` |
All tests run under `cargo test -p kb-source-fs` with no network and no model. All tests run under `cargo test -p kebab-source-fs` with no network and no model.
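
The POSIX path normalization case in the test plan (`./a/b.md`, `a//b.md`, `a/b.md` → identical) can be sketched as follows. `normalize_posix` is a hypothetical helper name; the sketch assumes workspace-relative input and deliberately leaves `..` handling out.

```rust
/// Normalize a workspace-relative path to one canonical POSIX form by
/// dropping empty segments and `.` segments.
/// Assumptions: input is workspace-relative; `..` is not resolved here.
pub fn normalize_posix(path: &str) -> String {
    path.split('/')
        .filter(|seg| !seg.is_empty() && *seg != ".")
        .collect::<Vec<_>>()
        .join("/")
}
```

Because empty segments are dropped, a leading `/` would also disappear, so an absolute path should be made workspace-relative before it reaches a helper like this.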
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-source-fs` passes - [ ] `cargo check -p kebab-source-fs` passes
- [ ] `cargo test -p kb-source-fs` passes - [ ] `cargo test -p kebab-source-fs` passes
- [ ] Snapshot test `fixtures/source-fs/tree-1` round-trips deterministically - [ ] Snapshot test `fixtures/source-fs/tree-1` round-trips deterministically
- [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kb-source-fs`) - [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kebab-source-fs`)
- [ ] PR description links to design §3.3, §6.2, §7.2 - [ ] PR description links to design §3.3, §6.2, §7.2
## Out of scope ## Out of scope
- File watching (P+). - File watching (P+).
- Asset copy/reference storage on disk (`kb-store-sqlite` task p1-6). - Asset copy/reference storage on disk (`kebab-store-sqlite` task p1-6).
- Non-fs source connectors (HTTP, S3 — P+). - Non-fs source connectors (HTTP, S3 — P+).
## Risks / notes ## Risks / notes


@@ -1,29 +1,29 @@
--- ---
phase: P1 phase: P1
component: kb-parse-md (frontmatter submodule) component: kebab-parse-md (frontmatter submodule)
task_id: p1-2 task_id: p1-2
title: "Markdown frontmatter parsing → Metadata" title: "Markdown frontmatter parsing → Metadata"
status: completed status: completed
depends_on: [p0-1] depends_on: [p0-1]
unblocks: [p1-4] unblocks: [p1-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [design §3.6 Metadata, design §3.7b kb-parse-types (Warning), design §0 Q9 frontmatter, design §10 errors] contract_sections: [design §3.6 Metadata, design §3.7b kebab-parse-types (Warning), design §0 Q9 frontmatter, design §10 errors]
--- ---
# p1-2 — Markdown frontmatter parsing # p1-2 — Markdown frontmatter parsing
## Goal ## Goal
Parse YAML/TOML frontmatter from Markdown bytes into `kb_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`. Parse YAML/TOML frontmatter from Markdown bytes into `kebab_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`.
## Why now / why this size ## Why now / why this size
Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kb-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9. Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kebab-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9.
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-parse-types` (provides shared `Warning` + `WarningKind` per design §3.7b) - `kebab-parse-types` (provides shared `Warning` + `WarningKind` per design §3.7b)
- `serde` - `serde`
- `serde_yaml` (or `yaml-rust2`) for YAML - `serde_yaml` (or `yaml-rust2`) for YAML
- `toml` for TOML - `toml` for TOML
@@ -33,7 +33,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
## Forbidden dependencies ## Forbidden dependencies
- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-tui`, `kb-desktop`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `pulldown-cmark` (block parser is a sibling task) - `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-tui`, `kebab-desktop`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `pulldown-cmark` (block parser is a sibling task)
## Inputs ## Inputs
@@ -46,7 +46,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `(Metadata, Option<FrontmatterSpan>, Vec<kb_parse_types::Warning>)` | tuple | `kb-normalize` → CanonicalDocument | | `(Metadata, Option<FrontmatterSpan>, Vec<kebab_parse_types::Warning>)` | tuple | `kebab-normalize` → CanonicalDocument |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
@@ -54,10 +54,10 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
pub fn parse_frontmatter( pub fn parse_frontmatter(
bytes: &[u8], bytes: &[u8],
hints: &BodyHints, hints: &BodyHints,
) -> anyhow::Result<(kb_core::Metadata, Option<FrontmatterSpan>, Vec<kb_parse_types::Warning>)>; ) -> anyhow::Result<(kebab_core::Metadata, Option<FrontmatterSpan>, Vec<kebab_parse_types::Warning>)>;
``` ```
`Warning` / `WarningKind` come from `kb-parse-types` (shared with `p1-3` blocks parser and downstream `kb-normalize`). `FrontmatterSpan` is crate-internal; if any new public type is needed, STOP and update the frozen design doc first. `Warning` / `WarningKind` come from `kebab-parse-types` (shared with `p1-3` blocks parser and downstream `kebab-normalize`). `FrontmatterSpan` is crate-internal; if any new public type is needed, STOP and update the frozen design doc first.
## Behavior contract ## Behavior contract
@@ -68,7 +68,7 @@ pub fn parse_frontmatter(
- `source_type` default `markdown`; `trust_level` default `primary`. - `source_type` default `markdown`; `trust_level` default `primary`.
- `aliases`, `tags` default empty. - `aliases`, `tags` default empty.
- Unknown keys → `metadata.user` (`serde_json::Map`), preserved verbatim, no warning. - Unknown keys → `metadata.user` (`serde_json::Map`), preserved verbatim, no warning.
- Unknown enum value (e.g. `trust_level: weird`) → emit `kb_parse_types::Warning { kind: WarningKind::MalformedFrontmatter, note: "unknown trust_level=weird, defaulted to primary" }` + ingest continues with default. - Unknown enum value (e.g. `trust_level: weird`) → emit `kebab_parse_types::Warning { kind: WarningKind::MalformedFrontmatter, note: "unknown trust_level=weird, defaulted to primary" }` + ingest continues with default.
- Malformed YAML → frontmatter discarded, body still parsed, `Warning { kind: WarningKind::MalformedFrontmatter, note: "<error msg>" }` emitted. - Malformed YAML → frontmatter discarded, body still parsed, `Warning { kind: WarningKind::MalformedFrontmatter, note: "<error msg>" }` emitted.
- No frontmatter at all → defaults applied silently. - No frontmatter at all → defaults applied silently.
- `id:` field captured into `metadata.user_id_alias` (alias only — does NOT influence `doc_id` per design §4.2). - `id:` field captured into `metadata.user_id_alias` (alias only — does NOT influence `doc_id` per design §4.2).
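
The "unknown enum value → warning + default" rule above can be sketched in isolation. The types here are simplified, hypothetical stand-ins: the real `Warning` lives in the shared parse-types crate and carries a `kind`, and the `Secondary` variant is a placeholder, since the spec text shown here only names `primary`.

```rust
// Simplified, hypothetical stand-ins for the shared types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TrustLevel {
    Primary,
    /// Placeholder variant; the real set of values is defined in the design doc.
    Secondary,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Warning {
    pub note: String,
}

/// Parse an optional `trust_level` value from frontmatter.
/// A missing key silently defaults to `Primary`; an unknown value also
/// defaults to `Primary` but additionally yields a warning, so ingest continues.
pub fn parse_trust_level(raw: Option<&str>) -> (TrustLevel, Option<Warning>) {
    match raw {
        None | Some("primary") => (TrustLevel::Primary, None),
        Some("secondary") => (TrustLevel::Secondary, None),
        Some(other) => (
            TrustLevel::Primary,
            Some(Warning {
                note: format!("unknown trust_level={other}, defaulted to primary"),
            }),
        ),
    }
}
```

Returning the warning next to the value, instead of an `Err`, is what lets a bad key degrade to a default without aborting the whole document.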
@@ -91,12 +91,12 @@ pub fn parse_frontmatter(
| snapshot | `fixtures/markdown/frontmatter-only.md` produces stable JSON | fixture | | snapshot | `fixtures/markdown/frontmatter-only.md` produces stable JSON | fixture |
| snapshot | mixed-language body with no `lang:` detects `ko` or `en` | `fixtures/markdown/mixed-lang.md` | | snapshot | mixed-language body with no `lang:` detects `ko` or `en` | `fixtures/markdown/mixed-lang.md` |
All tests under `cargo test -p kb-parse-md --lib frontmatter`. All tests under `cargo test -p kebab-parse-md --lib frontmatter`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-parse-md` passes - [ ] `cargo check -p kebab-parse-md` passes
- [ ] `cargo test -p kb-parse-md frontmatter` passes - [ ] `cargo test -p kebab-parse-md frontmatter` passes
- [ ] No `pulldown-cmark` import in this submodule - [ ] No `pulldown-cmark` import in this submodule
- [ ] Snapshot tests stable across two consecutive runs - [ ] Snapshot tests stable across two consecutive runs
- [ ] PR links design §0 Q9, §3.6 - [ ] PR links design §0 Q9, §3.6


@@ -1,20 +1,20 @@
--- ---
phase: P1 phase: P1
component: kb-parse-md (blocks submodule) component: kebab-parse-md (blocks submodule)
task_id: p1-3 task_id: p1-3
title: "Markdown body → Block tree with line spans" title: "Markdown body → Block tree with line spans"
status: completed status: completed
depends_on: [p0-1] depends_on: [p0-1]
unblocks: [p1-4] unblocks: [p1-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 Block, §3.4 SourceSpan, §3.7b kb-parse-types, §0 Q3 citation] contract_sections: [§3.4 Block, §3.4 SourceSpan, §3.7b kebab-parse-types, §0 Q3 citation]
--- ---
# p1-3 — Markdown body → Block tree # p1-3 — Markdown body → Block tree
## Goal ## Goal
Parse Markdown body bytes into a flat `Vec<kb_parse_types::ParsedBlock>` with heading paths and line ranges preserved, ready for `kb-normalize` to lift into `CanonicalDocument`. Parse Markdown body bytes into a flat `Vec<kebab_parse_types::ParsedBlock>` with heading paths and line ranges preserved, ready for `kebab-normalize` to lift into `CanonicalDocument`.
## Why now / why this size ## Why now / why this size
@@ -22,15 +22,15 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-parse-types` (defines `ParsedBlock`, `ParsedPayload`, `Warning`) - `kebab-parse-types` (defines `ParsedBlock`, `ParsedPayload`, `Warning`)
- `pulldown-cmark` (CommonMark with source-map; GFM tables enabled via feature) - `pulldown-cmark` (CommonMark with source-map; GFM tables enabled via feature)
- `serde` - `serde`
- `thiserror` - `thiserror`
## Forbidden dependencies ## Forbidden dependencies
- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `kb-tui`, `kb-desktop`, `comrak` (alternative parser; pick one) - `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `kebab-tui`, `kebab-desktop`, `comrak` (alternative parser; pick one)
## Inputs ## Inputs
@@ -43,8 +43,8 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `Vec<kb_parse_types::ParsedBlock>` | shared type from `kb-parse-types` | `kb-normalize` | | `Vec<kebab_parse_types::ParsedBlock>` | shared type from `kebab-parse-types` | `kebab-normalize` |
| `Vec<kb_parse_types::Warning>` | shared type | propagated into Provenance | | `Vec<kebab_parse_types::Warning>` | shared type | propagated into Provenance |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
@@ -52,10 +52,10 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
pub fn parse_blocks( pub fn parse_blocks(
body: &[u8], body: &[u8],
body_offset_lines: u32, body_offset_lines: u32,
) -> anyhow::Result<(Vec<kb_parse_types::ParsedBlock>, Vec<kb_parse_types::Warning>)>; ) -> anyhow::Result<(Vec<kebab_parse_types::ParsedBlock>, Vec<kebab_parse_types::Warning>)>;
``` ```
`ParsedBlock` is defined in `kb-parse-types` (design §3.7b). `kb-parse-md` does NOT define its own; it consumes the shared type. Lift to `kb_core::Block` (with `BlockId` assignment) is `kb-normalize`'s job (p1-4). `ParsedBlock` is defined in `kebab-parse-types` (design §3.7b). `kebab-parse-md` does NOT define its own; it consumes the shared type. Lift to `kebab_core::Block` (with `BlockId` assignment) is `kebab-normalize`'s job (p1-4).
## Behavior contract ## Behavior contract
@@ -63,9 +63,9 @@ pub fn parse_blocks(
- Heading tree: every block records its ancestor heading texts in order (e.g., `["아키텍처", "Chunking 정책"]`). - Heading tree: every block records its ancestor heading texts in order (e.g., `["아키텍처", "Chunking 정책"]`).
- Code blocks: `ParsedPayload::Code { lang: Some("rust"), code }` — fenced content not split. - Code blocks: `ParsedPayload::Code { lang: Some("rust"), code }` — fenced content not split.
- Tables: GFM tables produce `ParsedPayload::Table { headers, rows }`; if a table cell is malformed, fall back to `ParsedPayload::Paragraph` + `Warning::MalformedTable`. - Tables: GFM tables produce `ParsedPayload::Table { headers, rows }`; if a table cell is malformed, fall back to `ParsedPayload::Paragraph` + `Warning::MalformedTable`.
- Image references: `![alt](src)` produces `ParsedPayload::ImageRef { src, alt }`. `AssetId` resolution happens later in `kb-normalize` (when image src can be matched to a workspace asset). - Image references: `![alt](src)` produces `ParsedPayload::ImageRef { src, alt }`. `AssetId` resolution happens later in `kebab-normalize` (when image src can be matched to a workspace asset).
- Lists: ordered/unordered preserved via `ParsedPayload::List { ordered, items }`; nested list items flattened so each `items[i]` is a `Vec<kb_core::Inline>` for one top-level item. - Lists: ordered/unordered preserved via `ParsedPayload::List { ordered, items }`; nested list items flattened so each `items[i]` is a `Vec<kebab_core::Inline>` for one top-level item.
- Inline elements: only `Text`, `Code`, `Link`, `Strong`, `Emph` (per `kb_core::Inline` per design §3.4). Drop other inlines silently. - Inline elements: only `Text`, `Code`, `Link`, `Strong`, `Emph` (per `kebab_core::Inline` per design §3.4). Drop other inlines silently.
- Malformed input never panics. Worst case: empty `Vec<ParsedBlock>` + `Warning::ExtractFailed`. - Malformed input never panics. Worst case: empty `Vec<ParsedBlock>` + `Warning::ExtractFailed`.
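
The heading-tree rule above can be sketched as a small stack. This is a minimal illustration under stated assumptions, not the crate's implementation: `HeadingStack`, `enter`, and `path` are hypothetical names, and whether a heading block records its own text in its path is not specified here.

```rust
/// Hypothetical helper (not part of the spec): tracks the chain of open headings.
#[derive(Default)]
pub struct HeadingStack {
    // (level, text) pairs, outermost heading first.
    open: Vec<(u8, String)>,
}

impl HeadingStack {
    /// Record a heading; closes any open heading at the same or a deeper level.
    pub fn enter(&mut self, level: u8, text: &str) {
        while self.open.last().map_or(false, |(l, _)| *l >= level) {
            self.open.pop();
        }
        self.open.push((level, text.to_string()));
    }

    /// Ancestor heading texts, outermost first, for the next non-heading block.
    pub fn path(&self) -> Vec<String> {
        self.open.iter().map(|(_, t)| t.clone()).collect()
    }
}
```

Entering a level-1 heading and then a level-2 heading yields a two-element path such as `["아키텍처", "Chunking 정책"]`; entering another level-2 heading replaces only the last element.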
## Storage / wire effects ## Storage / wire effects
@@ -86,12 +86,12 @@ pub fn parse_blocks(
| snapshot | `fixtures/markdown/nested-headings.md` → ParsedBlock JSON stable | fixture | | snapshot | `fixtures/markdown/nested-headings.md` → ParsedBlock JSON stable | fixture |
| snapshot | `fixtures/markdown/code-and-table.md` → JSON stable | fixture | | snapshot | `fixtures/markdown/code-and-table.md` → JSON stable | fixture |
All tests under `cargo test -p kb-parse-md --lib blocks`. All tests under `cargo test -p kebab-parse-md --lib blocks`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-parse-md` passes - [ ] `cargo check -p kebab-parse-md` passes
- [ ] `cargo test -p kb-parse-md blocks` passes - [ ] `cargo test -p kebab-parse-md blocks` passes
- [ ] Snapshot tests stable across two runs - [ ] Snapshot tests stable across two runs
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.4 - [ ] PR links design §3.4
@@ -99,7 +99,7 @@ All tests under `cargo test -p kb-parse-md --lib blocks`.
## Out of scope ## Out of scope
- Frontmatter (p1-2). - Frontmatter (p1-2).
- Lifting `kb_parse_types::ParsedBlock` → `kb_core::Block` with `BlockId` (p1-4 normalize). - Lifting `kebab_parse_types::ParsedBlock` → `kebab_core::Block` with `BlockId` (p1-4 normalize).
- Chunking (p1-5). - Chunking (p1-5).
## Risks / notes ## Risks / notes


@@ -1,30 +1,30 @@
--- ---
phase: P1 phase: P1
component: kb-normalize component: kebab-normalize
task_id: p1-4 task_id: p1-4
title: "Lift parser output → CanonicalDocument with deterministic IDs" title: "Lift parser output → CanonicalDocument with deterministic IDs"
status: completed status: completed
depends_on: [p1-2, p1-3] depends_on: [p1-2, p1-3]
unblocks: [p1-5, p1-6] unblocks: [p1-5, p1-6]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4, §3.7b kb-parse-types, §4 ID recipe, §3.6 Provenance, §8 module boundaries] contract_sections: [§3.4, §3.7b kebab-parse-types, §4 ID recipe, §3.6 Provenance, §8 module boundaries]
--- ---
# p1-4 — Lift to CanonicalDocument # p1-4 — Lift to CanonicalDocument
## Goal ## Goal
Combine `Metadata` (p1-2) + `Vec<kb_parse_types::ParsedBlock>` (p1-3) + `RawAsset` (p1-1) into a `kb_core::CanonicalDocument` with deterministic `doc_id` and `block_id`s per design §4 recipe. Combine `Metadata` (p1-2) + `Vec<kebab_parse_types::ParsedBlock>` (p1-3) + `RawAsset` (p1-1) into a `kebab_core::CanonicalDocument` with deterministic `doc_id` and `block_id`s per design §4 recipe.
## Why now / why this size ## Why now / why this size
Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate. Single responsibility: ID generation + struct assembly. Keeps `kebab-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate.
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-parse-types` (input shapes — `ParsedBlock`, `ParsedPayload`, `Warning`) - `kebab-parse-types` (input shapes — `ParsedBlock`, `ParsedPayload`, `Warning`)
- `kb-config` - `kebab-config`
- `serde` - `serde`
- `serde-json-canonicalizer` (canonical JSON for ID hashing) - `serde-json-canonicalizer` (canonical JSON for ID hashing)
- `blake3` - `blake3`
@@ -34,36 +34,36 @@ Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` pure
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md` (consumed via shared `kb-parse-types` only — `kb-parse-md` must NOT appear in this crate's `cargo tree`), `kb-parse-pdf`, `kb-parse-image`, `kb-parse-audio`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` - `kebab-source-fs`, `kebab-parse-md` (consumed via shared `kebab-parse-types` only — `kebab-parse-md` must NOT appear in this crate's `cargo tree`), `kebab-parse-pdf`, `kebab-parse-image`, `kebab-parse-audio`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `RawAsset` | `kb_core::RawAsset` | p1-1 | | `RawAsset` | `kebab_core::RawAsset` | p1-1 |
| `Metadata` + frontmatter span + warnings | `(kb_core::Metadata, Option<FrontmatterSpan>, Vec<kb_parse_types::Warning>)` | p1-2 | | `Metadata` + frontmatter span + warnings | `(kebab_core::Metadata, Option<FrontmatterSpan>, Vec<kebab_parse_types::Warning>)` | p1-2 |
| `Vec<ParsedBlock>` + warnings | `(Vec<kb_parse_types::ParsedBlock>, Vec<kb_parse_types::Warning>)` | p1-3 | | `Vec<ParsedBlock>` + warnings | `(Vec<kebab_parse_types::ParsedBlock>, Vec<kebab_parse_types::Warning>)` | p1-3 |
| `parser_version` | `kb_core::ParserVersion` | constant in `kb-parse-md` | | `parser_version` | `kebab_core::ParserVersion` | constant in `kebab-parse-md` |
## Outputs ## Outputs
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk`, `kb-store-sqlite` | | `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk`, `kebab-store-sqlite` |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
```rust ```rust
pub fn build_canonical_document( pub fn build_canonical_document(
asset: &kb_core::RawAsset, asset: &kebab_core::RawAsset,
metadata: kb_core::Metadata, metadata: kebab_core::Metadata,
blocks: Vec<kb_parse_types::ParsedBlock>, blocks: Vec<kebab_parse_types::ParsedBlock>,
parser_version: &kb_core::ParserVersion, parser_version: &kebab_core::ParserVersion,
warnings: Vec<kb_parse_types::Warning>, warnings: Vec<kebab_parse_types::Warning>,
) -> anyhow::Result<kb_core::CanonicalDocument>; ) -> anyhow::Result<kebab_core::CanonicalDocument>;
pub fn id_for_doc(workspace_path: &kb_core::WorkspacePath, asset: &kb_core::AssetId, parser_version: &kb_core::ParserVersion) -> kb_core::DocumentId; pub fn id_for_doc(workspace_path: &kebab_core::WorkspacePath, asset: &kebab_core::AssetId, parser_version: &kebab_core::ParserVersion) -> kebab_core::DocumentId;
pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kb_core::SourceSpan) -> kb_core::BlockId; pub fn id_for_block(doc: &kebab_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kebab_core::SourceSpan) -> kebab_core::BlockId;
``` ```
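
To illustrate the determinism that `id_for_block` is expected to provide, here is a rough sketch. It is not the design §4 recipe, which is not reproduced in this diff. Assumptions: FNV-1a stands in for `blake3` so the example needs no dependencies, a length-prefixed byte encoding stands in for canonical JSON, the `span` argument is omitted because `SourceSpan` is not shown, and `sketch_block_id` is a hypothetical name.

```rust
/// FNV-1a 64-bit, used only as a dependency-free stand-in for blake3.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325; // FNV offset basis
    for b in bytes {
        h ^= u64::from(*b);
        h = h.wrapping_mul(0x100000001b3); // FNV prime
    }
    h
}

/// Append a field with a length prefix so field boundaries stay unambiguous.
fn push_field(buf: &mut Vec<u8>, field: &[u8]) {
    buf.extend_from_slice(&(field.len() as u64).to_le_bytes());
    buf.extend_from_slice(field);
}

/// Hypothetical block-ID sketch: the same inputs always give the same hex ID.
pub fn sketch_block_id(doc_id: &str, kind: &str, heading_path: &[String], ordinal: u32) -> String {
    let mut buf = Vec::new();
    push_field(&mut buf, doc_id.as_bytes());
    push_field(&mut buf, kind.as_bytes());
    push_field(&mut buf, &(heading_path.len() as u64).to_le_bytes());
    for heading in heading_path {
        push_field(&mut buf, heading.as_bytes());
    }
    push_field(&mut buf, &ordinal.to_le_bytes());
    format!("{:016x}", fnv1a64(&buf))
}
```

The ID depends only on its arguments (no clock, randomness, or map iteration order), which is the property the determinism tests in this task check.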
## Behavior contract ## Behavior contract
@@ -91,14 +91,14 @@ pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[Strin
| unit | provenance contains Discovered/Parsed/Normalized in order | inline | | unit | provenance contains Discovered/Parsed/Normalized in order | inline |
| snapshot | `fixtures/markdown/code-and-table.md` → CanonicalDocument JSON stable (incl. all IDs) | fixture | | snapshot | `fixtures/markdown/code-and-table.md` → CanonicalDocument JSON stable (incl. all IDs) | fixture |
All tests under `cargo test -p kb-normalize`. All tests under `cargo test -p kebab-normalize`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-normalize` passes - [ ] `cargo check -p kebab-normalize` passes
- [ ] `cargo test -p kb-normalize` passes - [ ] `cargo test -p kebab-normalize` passes
- [ ] Determinism test runs ≥ 1000 iterations under 1 second - [ ] Determinism test runs ≥ 1000 iterations under 1 second
- [ ] No `kb-parse-md` (or any other parser crate) appears in `cargo tree -p kb-normalize` — input types come from `kb-parse-types` only - [ ] No `kebab-parse-md` (or any other parser crate) appears in `cargo tree -p kebab-normalize` — input types come from `kebab-parse-types` only
- [ ] PR links design §3.7b, §4.2, §4.3, §8 - [ ] PR links design §3.7b, §4.2, §4.3, §8
## Out of scope ## Out of scope


@@ -1,12 +1,12 @@
--- ---
phase: P1 phase: P1
component: kb-chunk component: kebab-chunk
task_id: p1-5 task_id: p1-5
title: "Markdown heading-aware chunker (md-heading-v1)" title: "Markdown heading-aware chunker (md-heading-v1)"
status: completed status: completed
depends_on: [p1-4] depends_on: [p1-4]
unblocks: [p1-6, p2-2, p3-2] unblocks: [p1-6, p2-2, p3-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.5 Chunk, §4.2 chunk_id recipe, §7.2 Chunker, §0 Q3 citation] contract_sections: [§3.5 Chunk, §4.2 chunk_id recipe, §7.2 Chunker, §0 Q3 citation]
--- ---
@@ -22,8 +22,8 @@ The first concrete `Chunker`. Establishes how subsequent chunkers (PDF page chun
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `serde` - `serde`
- `blake3` (policy_hash) - `blake3` (policy_hash)
- `serde-json-canonicalizer` - `serde-json-canonicalizer`
@@ -31,30 +31,30 @@ The first concrete `Chunker`. Establishes how subsequent chunkers (PDF page chun
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize` (consumes `CanonicalDocument` only via `kb-core`), `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` - `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize` (consumes `CanonicalDocument` only via `kebab-core`), `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 | | `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 |
| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` from config | | `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` from config |
## Outputs ## Outputs
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `Vec<Chunk>` | `kb_core::Chunk` | `kb-store-sqlite` (p1-6), `kb-embed*` (P3) | | `Vec<Chunk>` | `kebab_core::Chunk` | `kebab-store-sqlite` (p1-6), `kebab-embed*` (P3) |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
```rust ```rust
pub struct MdHeadingV1Chunker; pub struct MdHeadingV1Chunker;
impl kb_core::Chunker for MdHeadingV1Chunker { impl kebab_core::Chunker for MdHeadingV1Chunker {
fn chunker_version(&self) -> kb_core::ChunkerVersion; fn chunker_version(&self) -> kebab_core::ChunkerVersion;
fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String; fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String;
fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result<Vec<kb_core::Chunk>>; fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result<Vec<kebab_core::Chunk>>;
} }
``` ```
@@ -91,12 +91,12 @@ impl kb_core::Chunker for MdHeadingV1Chunker {
| determinism | identical input + identical policy → identical chunk_ids | inline | | determinism | identical input + identical policy → identical chunk_ids | inline |
| snapshot | `fixtures/markdown/long-section.md` → Vec<Chunk> JSON stable | fixture | | snapshot | `fixtures/markdown/long-section.md` → Vec<Chunk> JSON stable | fixture |
All tests under `cargo test -p kb-chunk`. All tests under `cargo test -p kebab-chunk`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-chunk` passes - [ ] `cargo check -p kebab-chunk` passes
- [ ] `cargo test -p kb-chunk` passes - [ ] `cargo test -p kebab-chunk` passes
- [ ] Snapshot stable across two runs - [ ] Snapshot stable across two runs
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.5, §4.2 - [ ] PR links design §3.5, §4.2


@@ -1,12 +1,12 @@
--- ---
phase: P1 phase: P1
component: kb-store-sqlite (P1 subset) component: kebab-store-sqlite (P1 subset)
task_id: p1-6 task_id: p1-6
title: "SQLite store: assets/documents/blocks/chunks + asset writer + migrations" title: "SQLite store: assets/documents/blocks/chunks + asset writer + migrations"
status: completed status: completed
depends_on: [p1-1, p1-4, p1-5] depends_on: [p1-1, p1-4, p1-5]
unblocks: [p2-1, p3-3, p4-3] unblocks: [p2-1, p3-3, p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5 DDL (5.1, 5.2, 5.3, 5.4, 5.5 chunks only — FTS handled in p2-1), §5.7 jobs/ingest_runs, §5.8 transactions, §6.3 data_dir layout] contract_sections: [§5 DDL (5.1, 5.2, 5.3, 5.4, 5.5 chunks only — FTS handled in p2-1), §5.7 jobs/ingest_runs, §5.8 transactions, §6.3 data_dir layout]
--- ---
@@ -22,8 +22,8 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `rusqlite` (with `bundled-sqlcipher` disabled; use `bundled` feature) - `rusqlite` (with `bundled-sqlcipher` disabled; use `bundled` feature)
- `refinery` for migrations - `refinery` for migrations
- `serde_json` - `serde_json`
@@ -34,7 +34,7 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs` (only types via `kb-core`), `kb-parse-md`, `kb-normalize`, `kb-chunk` (only types via `kb-core`), `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` - `kebab-source-fs` (only types via `kebab-core`), `kebab-parse-md`, `kebab-normalize`, `kebab-chunk` (only types via `kebab-core`), `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs ## Inputs
@@ -42,17 +42,17 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
|-------|------|--------| |-------|------|--------|
| migrations | `migrations/V001__init.sql` | repo | | migrations | `migrations/V001__init.sql` | repo |
| `RawAsset` + bytes | `(RawAsset, Vec<u8>)` | p1-1 + reader | | `RawAsset` + bytes | `(RawAsset, Vec<u8>)` | p1-1 + reader |
| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 | | `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 |
| `Vec<Chunk>` | `kb_core::Chunk` | p1-5 | | `Vec<Chunk>` | `kebab_core::Chunk` | p1-5 |
| `IngestRun` aggregates | `(scope, counts, duration)` | `kb-app` | | `IngestRun` aggregates | `(scope, counts, duration)` | `kebab-app` |
## Outputs ## Outputs
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `data_dir/kb.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | | every later phase | | `data_dir/kebab.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | | every later phase |
| `data_dir/assets/<aa>/<asset_id>` bytes (when copied) | | future re-extraction, integrity verification | | `data_dir/assets/<aa>/<asset_id>` bytes (when copied) | | future re-extraction, integrity verification |
| `IngestReport` (wire schema v1) | `kb_core::IngestReport` | `kb-cli`, eval | | `IngestReport` (wire schema v1) | `kebab_core::IngestReport` | `kebab-cli`, eval |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
@@ -60,23 +60,23 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
pub struct SqliteStore { /* internal */ } pub struct SqliteStore { /* internal */ }
impl SqliteStore { impl SqliteStore {
pub fn open(config: &kb_config::Config) -> anyhow::Result<Self>; pub fn open(config: &kebab_config::Config) -> anyhow::Result<Self>;
pub fn run_migrations(&self) -> anyhow::Result<()>; pub fn run_migrations(&self) -> anyhow::Result<()>;
pub fn put_asset_with_bytes(&self, asset: &kb_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>; pub fn put_asset_with_bytes(&self, asset: &kebab_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>;
} }
impl kb_core::DocumentStore for SqliteStore { impl kebab_core::DocumentStore for SqliteStore {
fn put_asset(&self, a: &kb_core::RawAsset) -> anyhow::Result<()>; fn put_asset(&self, a: &kebab_core::RawAsset) -> anyhow::Result<()>;
fn put_document(&self, d: &kb_core::CanonicalDocument) -> anyhow::Result<()>; fn put_document(&self, d: &kebab_core::CanonicalDocument) -> anyhow::Result<()>;
fn put_blocks(&self, doc: &kb_core::DocumentId, blocks: &[kb_core::Block]) -> anyhow::Result<()>; fn put_blocks(&self, doc: &kebab_core::DocumentId, blocks: &[kebab_core::Block]) -> anyhow::Result<()>;
fn put_chunks(&self, doc: &kb_core::DocumentId, chunks: &[kb_core::Chunk]) -> anyhow::Result<()>; fn put_chunks(&self, doc: &kebab_core::DocumentId, chunks: &[kebab_core::Chunk]) -> anyhow::Result<()>;
fn get_document(&self, id: &kb_core::DocumentId) -> anyhow::Result<Option<kb_core::CanonicalDocument>>; fn get_document(&self, id: &kebab_core::DocumentId) -> anyhow::Result<Option<kebab_core::CanonicalDocument>>;
fn get_chunk(&self, id: &kb_core::ChunkId) -> anyhow::Result<Option<kb_core::Chunk>>; fn get_chunk(&self, id: &kebab_core::ChunkId) -> anyhow::Result<Option<kebab_core::Chunk>>;
fn list_documents(&self, filter: &kb_core::DocFilter) -> anyhow::Result<Vec<kb_core::DocSummary>>; fn list_documents(&self, filter: &kebab_core::DocFilter) -> anyhow::Result<Vec<kebab_core::DocSummary>>;
} }
impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ } impl kebab_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
``` ```
## Behavior contract ## Behavior contract
@@ -96,7 +96,7 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
## Storage / wire effects ## Storage / wire effects
- Writes: `kb.sqlite` (multiple tables), `data_dir/assets/<aa>/<asset_id>` (copied case). - Writes: `kebab.sqlite` (multiple tables), `data_dir/assets/<aa>/<asset_id>` (copied case).
- Reads on subsequent calls: same DB. - Reads on subsequent calls: same DB.
## Test plan ## Test plan
@@ -112,14 +112,14 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
| contract | DocumentStore trait round-trip for fixture document | `fixtures/markdown/code-and-table.md` | | contract | DocumentStore trait round-trip for fixture document | `fixtures/markdown/code-and-table.md` |
| snapshot | IngestReport JSON for fixture run | fixture | | snapshot | IngestReport JSON for fixture run | fixture |
All tests under `cargo test -p kb-store-sqlite` with no network. All tests under `cargo test -p kebab-store-sqlite` with no network.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-store-sqlite` passes - [ ] `cargo check -p kebab-store-sqlite` passes
- [ ] `cargo test -p kb-store-sqlite` passes - [ ] `cargo test -p kebab-store-sqlite` passes
- [ ] migration `V001__init.sql` matches design §5 verbatim (diff-checked in CI) - [ ] migration `V001__init.sql` matches design §5 verbatim (diff-checked in CI)
- [ ] Writes to `~/.local/share/kb/` are gated by `kb-config`'s `data_dir` and never escape it - [ ] Writes to `~/.local/share/kebab/` are gated by `kebab-config`'s `data_dir` and never escape it
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] PR links design §5 - [ ] PR links design §5
@@ -132,5 +132,5 @@ All tests under `cargo test -p kb-store-sqlite` with no network.
## Risks / notes ## Risks / notes
- WAL mode requires careful test cleanup: tests must drop the connection before removing `kb.sqlite-wal` / `-shm`. - WAL mode requires careful test cleanup: tests must drop the connection before removing `kebab.sqlite-wal` / `-shm`.
- Asset directory shard prefix uses `asset_id[..2]`; using `asset_id[..1]` would create at most 16 dirs (insufficient). - Asset directory shard prefix uses `asset_id[..2]`; using `asset_id[..1]` would create at most 16 dirs (insufficient).
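
The shard layout `data_dir/assets/<aa>/<asset_id>` can be sketched as below. This is an illustration only: `asset_path` is a hypothetical helper, and it assumes `asset_id` is a lowercase hex string, which the "at most 16 dirs" remark implies (one hex character gives 16 shards, two give 256).

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds `data_dir/assets/<aa>/<asset_id>`.
/// Returns `None` when the ID is shorter than two bytes or when byte
/// offset 2 is not a character boundary.
pub fn asset_path(data_dir: &Path, asset_id: &str) -> Option<PathBuf> {
    // First two characters of the ID select the shard directory.
    let shard = asset_id.get(..2)?;
    Some(data_dir.join("assets").join(shard).join(asset_id))
}
```

Using `str::get` instead of slicing with `[..2]` keeps a malformed ID from panicking, in line with the "never panics" stance elsewhere in these specs.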


@@ -1,12 +1,12 @@
--- ---
phase: P2 phase: P2
component: kb-store-sqlite (FTS5 migration) component: kebab-store-sqlite (FTS5 migration)
task_id: p2-1 task_id: p2-1
title: "FTS5 virtual table + triggers (V002 migration)" title: "FTS5 virtual table + triggers (V002 migration)"
status: completed status: completed
depends_on: [p1-6] depends_on: [p1-6]
unblocks: [p2-2] unblocks: [p2-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5.5 chunks_fts + triggers, §9 versioning] contract_sections: [§5.5 chunks_fts + triggers, §9 versioning]
--- ---
@@ -18,19 +18,19 @@ Add `chunks_fts` virtual table and three sync triggers via migration `V002__fts.
## Why now / why this size ## Why now / why this size
`chunks_fts` is the lexical index for `kb-search`. Splitting it from p1-6 keeps P1 focused on relational data; bringing it as `V002` lets users upgrade an existing P1 DB without re-ingesting. `chunks_fts` is the lexical index for `kebab-search`. Splitting it from p1-6 keeps P1 focused on relational data; bringing it as `V002` lets users upgrade an existing P1 DB without re-ingesting.
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `kb-store-sqlite` (extends migrations) - `kebab-store-sqlite` (extends migrations)
- `rusqlite` - `rusqlite`
- `refinery` - `refinery`
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-search` (consumer is p2-2), `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` - `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-search` (consumer is p2-2), `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs ## Inputs
@@ -52,7 +52,7 @@ Add `chunks_fts` virtual table and three sync triggers via migration `V002__fts.
pub fn rebuild_chunks_fts(conn: &rusqlite::Connection) -> anyhow::Result<()>; pub fn rebuild_chunks_fts(conn: &rusqlite::Connection) -> anyhow::Result<()>;
``` ```
(Used by `kb index --rebuild-fts`. Re-runs `INSERT INTO chunks_fts SELECT ... FROM chunks` after `DELETE FROM chunks_fts;`.) (Used by `kebab index --rebuild-fts`. Re-runs `INSERT INTO chunks_fts SELECT ... FROM chunks` after `DELETE FROM chunks_fts;`.)
## Behavior contract ## Behavior contract
@@ -64,7 +64,7 @@ pub fn rebuild_chunks_fts(conn: &rusqlite::Connection) -> anyhow::Result<()>;
## Storage / wire effects ## Storage / wire effects
- Writes: `chunks_fts` virtual table inside `kb.sqlite`. - Writes: `chunks_fts` virtual table inside `kebab.sqlite`.
- Reads: existing `chunks` rows for backfill. - Reads: existing `chunks` rows for backfill.
## Test plan ## Test plan
@@ -78,12 +78,12 @@ pub fn rebuild_chunks_fts(conn: &rusqlite::Connection) -> anyhow::Result<()>;
| function | `rebuild_chunks_fts` produces deterministic content equal to fresh backfill | tmp DB | | function | `rebuild_chunks_fts` produces deterministic content equal to fresh backfill | tmp DB |
| migration | running `V002` twice is a no-op (refinery handles idempotency) | tmp DB | | migration | running `V002` twice is a no-op (refinery handles idempotency) | tmp DB |
All tests under `cargo test -p kb-store-sqlite fts`. All tests under `cargo test -p kebab-store-sqlite fts`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-store-sqlite` passes - [ ] `cargo check -p kebab-store-sqlite` passes
- [ ] `cargo test -p kb-store-sqlite fts` passes - [ ] `cargo test -p kebab-store-sqlite fts` passes
- [ ] `migrations/V002__fts.sql` matches design §5.5 verbatim (CI diff check) - [ ] `migrations/V002__fts.sql` matches design §5.5 verbatim (CI diff check)
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] PR links design §5.5 - [ ] PR links design §5.5


@@ -1,12 +1,12 @@
--- ---
phase: P2 phase: P2
component: kb-search (lexical mode) component: kebab-search (lexical mode)
task_id: p2-2 task_id: p2-2
title: "Lexical Retriever via SQLite FTS5 + bm25 + citation" title: "Lexical Retriever via SQLite FTS5 + bm25 + citation"
status: completed status: completed
depends_on: [p2-1] depends_on: [p2-1]
unblocks: [p3-4, p4-3] unblocks: [p3-4, p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5/1.6 search output, §2.2 wire schema, §6.4 search settings] contract_sections: [§3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5/1.6 search output, §2.2 wire schema, §6.4 search settings]
--- ---
@@ -14,38 +14,38 @@ contract_sections: [§3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5
## Goal ## Goal
Implement `kb_core::Retriever` for `SearchMode::Lexical` using SQLite FTS5. Returns `SearchHit` with `bm25` ranking, `snippet()`-derived preview, and proper W3C-fragment citation. Implement `kebab_core::Retriever` for `SearchMode::Lexical` using SQLite FTS5. Returns `SearchHit` with `bm25` ranking, `snippet()`-derived preview, and proper W3C-fragment citation.
## Why now / why this size ## Why now / why this size
First concrete `Retriever`. Lets `kb search --mode lexical` work without any embedding/LLM infrastructure. Establishes the SearchHit construction contract that hybrid (p3-4) reuses. First concrete `Retriever`. Lets `kebab search --mode lexical` work without any embedding/LLM infrastructure. Establishes the SearchHit construction contract that hybrid (p3-4) reuses.
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `kb-store-sqlite` (read access to `chunks_fts` + `chunks` + `documents`) - `kebab-store-sqlite` (read access to `chunks_fts` + `chunks` + `documents`)
- `rusqlite` - `rusqlite`
- `tracing` - `tracing`
- `thiserror` - `thiserror`
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` - `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `SearchQuery` (mode=Lexical) | `kb_core::SearchQuery` | `kb-app::search` | | `SearchQuery` (mode=Lexical) | `kebab_core::SearchQuery` | `kebab-app::search` |
| `kb-config::search` settings (`default_k`, `snippet_chars`) | `kb_config::Config` | runtime | | `kebab-config::search` settings (`default_k`, `snippet_chars`) | `kebab_config::Config` | runtime |
| SQLite connection (read) | `rusqlite::Connection` | `kb-store-sqlite` | | SQLite connection (read) | `rusqlite::Connection` | `kebab-store-sqlite` |
## Outputs ## Outputs
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `Vec<SearchHit>` | `kb_core::SearchHit` | `kb-cli` printer, `kb-rag` packer (P4), hybrid (p3-4) | | `Vec<SearchHit>` | `kebab_core::SearchHit` | `kebab-cli` printer, `kebab-rag` packer (P4), hybrid (p3-4) |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
@@ -53,12 +53,12 @@ First concrete `Retriever`. Lets `kb search --mode lexical` work without any emb
pub struct LexicalRetriever { /* internal: holds an Arc<rusqlite::Connection> + IndexVersion */ } pub struct LexicalRetriever { /* internal: holds an Arc<rusqlite::Connection> + IndexVersion */ }
impl LexicalRetriever { impl LexicalRetriever {
pub fn new(store: std::sync::Arc<kb_store_sqlite::SqliteStore>, index_version: kb_core::IndexVersion) -> Self; pub fn new(store: std::sync::Arc<kebab_store_sqlite::SqliteStore>, index_version: kebab_core::IndexVersion) -> Self;
} }
impl kb_core::Retriever for LexicalRetriever { impl kebab_core::Retriever for LexicalRetriever {
fn search(&self, query: &kb_core::SearchQuery) -> anyhow::Result<Vec<kb_core::SearchHit>>; fn search(&self, query: &kebab_core::SearchQuery) -> anyhow::Result<Vec<kebab_core::SearchHit>>;
fn index_version(&self) -> kb_core::IndexVersion; fn index_version(&self) -> kebab_core::IndexVersion;
} }
``` ```
@@ -98,8 +98,8 @@ impl kb_core::Retriever for LexicalRetriever {
## Storage / wire effects ## Storage / wire effects
- Reads only. Never mutates `kb.sqlite`. - Reads only. Never mutates `kebab.sqlite`.
- Wire: `Vec<SearchHit>` serialized via wire schema `search_hit.v1` when `kb-cli --json` is used. - Wire: `Vec<SearchHit>` serialized via wire schema `search_hit.v1` when `kebab-cli --json` is used.
## Test plan ## Test plan
@@ -114,13 +114,13 @@ impl kb_core::Retriever for LexicalRetriever {
| determinism | identical query twice produces identical hit order and scores | tmp DB | | determinism | identical query twice produces identical hit order and scores | tmp DB |
| snapshot | `Vec<SearchHit>` JSON for fixed corpus stable | `fixtures/search/lexical/run-1.json` | | snapshot | `Vec<SearchHit>` JSON for fixed corpus stable | `fixtures/search/lexical/run-1.json` |
All tests under `cargo test -p kb-search lexical`. All tests under `cargo test -p kebab-search lexical`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-search` passes - [ ] `cargo check -p kebab-search` passes
- [ ] `cargo test -p kb-search lexical` passes - [ ] `cargo test -p kebab-search lexical` passes
- [ ] No imports outside Allowed dependencies (`cargo tree -p kb-search` audit) - [ ] No imports outside Allowed dependencies (`cargo tree -p kebab-search` audit)
- [ ] Output JSON conforms to `docs/wire-schema/v1/search_hit.schema.json` - [ ] Output JSON conforms to `docs/wire-schema/v1/search_hit.schema.json`
- [ ] PR links design §3.7, §0 Q3, §2.2 - [ ] PR links design §3.7, §0 Q3, §2.2


@@ -1,12 +1,12 @@
--- ---
phase: P3 phase: P3
component: kb-embed (trait crate) component: kebab-embed (trait crate)
task_id: p3-1 task_id: p3-1
title: "Embedder trait + EmbeddingInput/Kind validation" title: "Embedder trait + EmbeddingInput/Kind validation"
status: completed status: completed
depends_on: [p0-1] depends_on: [p0-1]
unblocks: [p3-2, p3-3, p3-4] unblocks: [p3-2, p3-3, p3-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [design §3.7 SearchHit.embedding_model, design §7.1 EmbeddingInput/Kind, design §7.2 Embedder, report §11 LLM/embedding split] contract_sections: [design §3.7 SearchHit.embedding_model, design §7.1 EmbeddingInput/Kind, design §7.2 Embedder, report §11 LLM/embedding split]
--- ---
@@ -14,16 +14,16 @@ contract_sections: [design §3.7 SearchHit.embedding_model, design §7.1 Embeddi
## Goal ## Goal
Provide the `kb-embed` crate that re-exports `Embedder` trait, `EmbeddingInput`/`EmbeddingKind`, and offers a mock implementation for downstream tests. This task is **trait-only**; concrete adapters live in p3-2. Provide the `kebab-embed` crate that re-exports `Embedder` trait, `EmbeddingInput`/`EmbeddingKind`, and offers a mock implementation for downstream tests. This task is **trait-only**; concrete adapters live in p3-2.
## Why now / why this size ## Why now / why this size
Concrete adapters (fastembed, ollama-embed, candle) need a stable trait surface. Owning the trait + a mock implementation in a tiny crate keeps `kb-store-vector` and `kb-search` testable without touching real models. Concrete adapters (fastembed, ollama-embed, candle) need a stable trait surface. Owning the trait + a mock implementation in a tiny crate keeps `kebab-store-vector` and `kebab-search` testable without touching real models.
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `serde` - `serde`
- `thiserror` - `thiserror`
- `tracing` - `tracing`
@@ -31,35 +31,35 @@ Concrete adapters (fastembed, ollama-embed, candle) need a stable trait surface.
## Forbidden dependencies ## Forbidden dependencies
- `fastembed`, `ort`, `tokenizers`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` - `fastembed`, `ort`, `tokenizers`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `EmbeddingInput` | `kb_core::EmbeddingInput<'_>` | callers (parser-side or query-side) | | `EmbeddingInput` | `kebab_core::EmbeddingInput<'_>` | callers (parser-side or query-side) |
| model identity | `(EmbeddingModelId, EmbeddingVersion, dimensions)` | adapter at construction | | model identity | `(EmbeddingModelId, EmbeddingVersion, dimensions)` | adapter at construction |
## Outputs ## Outputs
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `Vec<Vec<f32>>` | row-aligned with input | `kb-store-vector`, `kb-search` (vector mode) | | `Vec<Vec<f32>>` | row-aligned with input | `kebab-store-vector`, `kebab-search` (vector mode) |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
```rust ```rust
pub use kb_core::{EmbeddingInput, EmbeddingKind, EmbeddingModelId, EmbeddingVersion, Embedder}; pub use kebab_core::{EmbeddingInput, EmbeddingKind, EmbeddingModelId, EmbeddingVersion, Embedder};
/// Test-only mock that produces deterministic vectors. Compiled only when `mock` feature is on. /// Test-only mock that produces deterministic vectors. Compiled only when `mock` feature is on.
#[cfg(feature = "mock")] #[cfg(feature = "mock")]
pub struct MockEmbedder { /* internal: model_id, dims, seed */ } pub struct MockEmbedder { /* internal: model_id, dims, seed */ }
#[cfg(feature = "mock")] #[cfg(feature = "mock")]
impl MockEmbedder { impl MockEmbedder {
pub fn new(model_id: kb_core::EmbeddingModelId, version: kb_core::EmbeddingVersion, dimensions: usize) -> Self; pub fn new(model_id: kebab_core::EmbeddingModelId, version: kebab_core::EmbeddingVersion, dimensions: usize) -> Self;
} }
#[cfg(feature = "mock")] #[cfg(feature = "mock")]
impl kb_core::Embedder for MockEmbedder { /* per §7.2 */ } impl kebab_core::Embedder for MockEmbedder { /* per §7.2 */ }
``` ```
## Behavior contract ## Behavior contract
@@ -84,21 +84,21 @@ impl kb_core::Embedder for MockEmbedder { /* per §7.2 */ }
| unit | dimensions match construction-time value | inline | | unit | dimensions match construction-time value | inline |
| contract | property test: 100 random inputs, each vector has length == dimensions, all finite floats | inline (proptest) | | contract | property test: 100 random inputs, each vector has length == dimensions, all finite floats | inline (proptest) |
All tests under `cargo test -p kb-embed`. All tests under `cargo test -p kebab-embed`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-embed` passes - [ ] `cargo check -p kebab-embed` passes
- [ ] `cargo test -p kb-embed` passes - [ ] `cargo test -p kebab-embed` passes
- [ ] No external embedding dep present - [ ] No external embedding dep present
- [ ] PR links design §7.2 Embedder, §11 - [ ] PR links design §7.2 Embedder, §11
## Out of scope ## Out of scope
- Real adapter (`kb-embed-local` is p3-2). - Real adapter (`kebab-embed-local` is p3-2).
- Reranker traits (P+). - Reranker traits (P+).
## Risks / notes ## Risks / notes
- `MockEmbedder` is gated by `mock` feature (default OFF). Downstream tests opt in via `[dev-dependencies] kb-embed = { path = "...", features = ["mock"] }`. CI build of release binary (`cargo build --release` without `--features mock`) MUST NOT include `MockEmbedder` symbol — verifiable via `cargo bloat` or `nm` symbol scan. - `MockEmbedder` is gated by `mock` feature (default OFF). Downstream tests opt in via `[dev-dependencies] kebab-embed = { path = "...", features = ["mock"] }`. CI build of release binary (`cargo build --release` without `--features mock`) MUST NOT include `MockEmbedder` symbol — verifiable via `cargo bloat` or `nm` symbol scan.
- Trait re-exports keep the call site stable even if `kb-core` reorganizes; downstream crates should `use kb_embed::Embedder` rather than `use kb_core::Embedder`. - Trait re-exports keep the call site stable even if `kebab-core` reorganizes; downstream crates should `use kebab_embed::Embedder` rather than `use kebab_core::Embedder`.
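A minimal sketch of what "deterministic vectors" could look like for a mock of this kind. `SketchMockEmbedder`, its `seed` handling, and the hash-based value mapping are illustrative assumptions, not the actual `MockEmbedder` internals, and the sketch takes plain `&str` inputs instead of `EmbeddingInput` to stay self-contained.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the mock described above; not the real `MockEmbedder`.
pub struct SketchMockEmbedder {
    dimensions: usize,
    seed: u64,
}

impl SketchMockEmbedder {
    pub fn new(dimensions: usize, seed: u64) -> Self {
        Self { dimensions, seed }
    }

    /// Returns one vector per input text, row-aligned with `inputs`.
    pub fn embed(&self, inputs: &[&str]) -> Vec<Vec<f32>> {
        inputs.iter().map(|text| self.embed_one(text)).collect()
    }

    fn embed_one(&self, text: &str) -> Vec<f32> {
        (0..self.dimensions)
            .map(|i| {
                // `DefaultHasher::new()` uses fixed keys, so the same
                // (seed, text, index) triple hashes identically on every run
                // of the same toolchain.
                let mut hasher = DefaultHasher::new();
                self.seed.hash(&mut hasher);
                text.hash(&mut hasher);
                i.hash(&mut hasher);
                // Map the hash into [-1.0, 1.0); the result is always finite.
                let unit = (hasher.finish() % 10_000) as f32 / 10_000.0;
                unit * 2.0 - 1.0
            })
            .collect()
    }
}
```

Calling `embed` twice with the same inputs yields identical vectors of length `dimensions`, which matches the properties the test table asks for (length equals `dimensions`, all floats finite).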


@@ -1,12 +1,12 @@
--- ---
phase: P3 phase: P3
component: kb-embed-local (fastembed adapter) component: kebab-embed-local (fastembed adapter)
task_id: p3-2 task_id: p3-2
title: "fastembed-rs Embedder for multilingual-e5-small" title: "fastembed-rs Embedder for multilingual-e5-small"
status: completed status: completed
depends_on: [p3-1] depends_on: [p3-1]
unblocks: [p3-3, p3-4] unblocks: [p3-3, p3-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [design §7.2 Embedder, report §11.3 local embedding, design §6.4 [models.embedding], design §9 versioning] contract_sections: [design §7.2 Embedder, report §11.3 local embedding, design §6.4 [models.embedding], design §9 versioning]
--- ---
@@ -22,9 +22,9 @@ First real `Embedder`. Drives `EmbeddingId` recipe (model_id + model_version + d
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `kb-embed` - `kebab-embed`
- `fastembed = "4"` (or current stable) - `fastembed = "4"` (or current stable)
- `tokenizers` - `tokenizers`
- `ort` (transitive via fastembed) - `ort` (transitive via fastembed)
@@ -33,21 +33,21 @@ First real `Embedder`. Drives `EmbeddingId` recipe (model_id + model_version + d
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, network HTTP libs (model download is fastembed's responsibility) - `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, network HTTP libs (model download is fastembed's responsibility)
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `kb-config::Config.models.embedding` | settings | runtime | | `kebab-config::Config.models.embedding` | settings | runtime |
| `EmbeddingInput[..]` | `kb_core::EmbeddingInput<'_>[]` | callers | | `EmbeddingInput[..]` | `kebab_core::EmbeddingInput<'_>[]` | callers |
| model cache | `data_dir/models/fastembed/` | filesystem | | model cache | `data_dir/models/fastembed/` | filesystem |
## Outputs ## Outputs
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `Vec<Vec<f32>>` | row-aligned, `dimensions = 384` | `kb-store-vector`, query vectors for hybrid search | | `Vec<Vec<f32>>` | row-aligned, `dimensions = 384` | `kebab-store-vector`, query vectors for hybrid search |
| model identity | `(EmbeddingModelId, EmbeddingVersion, usize)` | record fields, `embedding_id` recipe | | model identity | `(EmbeddingModelId, EmbeddingVersion, usize)` | record fields, `embedding_id` recipe |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
@@ -56,14 +56,14 @@ First real `Embedder`. Drives `EmbeddingId` recipe (model_id + model_version + d
pub struct FastembedEmbedder { /* internal: TextEmbedding instance + model meta */ } pub struct FastembedEmbedder { /* internal: TextEmbedding instance + model meta */ }
impl FastembedEmbedder { impl FastembedEmbedder {
pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>; pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
} }
impl kb_core::Embedder for FastembedEmbedder { impl kebab_core::Embedder for FastembedEmbedder {
fn model_id(&self) -> kb_core::EmbeddingModelId; fn model_id(&self) -> kebab_core::EmbeddingModelId;
fn model_version(&self) -> kb_core::EmbeddingVersion; fn model_version(&self) -> kebab_core::EmbeddingVersion;
fn dimensions(&self) -> usize; fn dimensions(&self) -> usize;
fn embed(&self, inputs: &[kb_core::EmbeddingInput<'_>]) -> anyhow::Result<Vec<Vec<f32>>>; fn embed(&self, inputs: &[kebab_core::EmbeddingInput<'_>]) -> anyhow::Result<Vec<Vec<f32>>>;
} }
``` ```
@@ -96,12 +96,12 @@ impl kb_core::Embedder for FastembedEmbedder {
| performance | batch of 64 short inputs completes in < 5s on CI host | tmp config (skip on slow CI via `#[ignore]`) | | performance | batch of 64 short inputs completes in < 5s on CI host | tmp config (skip on slow CI via `#[ignore]`) |
| snapshot | aggregate hash of vectors for 5 known sentences stable across runs | `fixtures/embed/known-sentences.json` | | snapshot | aggregate hash of vectors for 5 known sentences stable across runs | `fixtures/embed/known-sentences.json` |
All tests under `cargo test -p kb-embed-local`. Mark slow tests `#[ignore]` and run via `cargo test -- --ignored` in dedicated CI lane. All tests under `cargo test -p kebab-embed-local`. Mark slow tests `#[ignore]` and run via `cargo test -- --ignored` in dedicated CI lane.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-embed-local` passes - [ ] `cargo check -p kebab-embed-local` passes
- [ ] `cargo test -p kb-embed-local` passes (excluding `#[ignore]`) - [ ] `cargo test -p kebab-embed-local` passes (excluding `#[ignore]`)
- [ ] First-run model download works under `data_dir/models/fastembed/` - [ ] First-run model download works under `data_dir/models/fastembed/`
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] PR links design §11.3, §6.4, §9 - [ ] PR links design §11.3, §6.4, §9


@@ -1,12 +1,12 @@
--- ---
phase: P3 phase: P3
component: kb-store-vector (LanceDB) component: kebab-store-vector (LanceDB)
task_id: p3-3 task_id: p3-3
title: "LanceDB VectorStore + embedding_records writer" title: "LanceDB VectorStore + embedding_records writer"
status: completed status: completed
depends_on: [p3-2, p1-6] depends_on: [p3-2, p1-6]
unblocks: [p3-4] unblocks: [p3-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5.6 embedding_records, §6.3 lancedb table naming, §7.2 VectorStore, §9 versioning] contract_sections: [§5.6 embedding_records, §6.3 lancedb table naming, §7.2 VectorStore, §9 versioning]
--- ---
@@ -18,13 +18,13 @@ Implement `VectorStore` over LanceDB (embedded). Stores per-model tables (`chunk
## Why now / why this size ## Why now / why this size
Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-4) can compose lexical + vector retrievers without leaking storage details. Closes the loop chunk → vector. Splits cleanly from `kebab-search` so hybrid (p3-4) can compose lexical + vector retrievers without leaking storage details.
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `kb-store-sqlite` (only for writing/reading rows in `embedding_records`) - `kebab-store-sqlite` (only for writing/reading rows in `embedding_records`)
- `lancedb` - `lancedb`
- `arrow` (and `arrow-array`, `arrow-schema`) - `arrow` (and `arrow-array`, `arrow-schema`)
- `serde`, `serde_json` - `serde`, `serde_json`
@@ -33,16 +33,16 @@ Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-embed*` (consumes `Vec<f32>` via input only — no embedding logic here), `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` - `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-embed*` (consumes `Vec<f32>` via input only — no embedding logic here), `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `VectorRecord[..]` | `kb_core::VectorRecord` | `kb-app::embed_index` (P3 facade) | | `VectorRecord[..]` | `kebab_core::VectorRecord` | `kebab-app::embed_index` (P3 facade) |
| query vector | `&[f32]` | `kb-embed-local` (`Embedder::embed` for query) | | query vector | `&[f32]` | `kebab-embed-local` (`Embedder::embed` for query) |
| filters | `kb_core::SearchFilters` | `SearchQuery` | | filters | `kebab_core::SearchFilters` | `SearchQuery` |
| `kb-config::Config.storage.vector_dir` | path | runtime | | `kebab-config::Config.storage.vector_dir` | path | runtime |
## Outputs ## Outputs
@@ -50,7 +50,7 @@ Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-
|--------|------|------------| |--------|------|------------|
| Lance tables under `vector_dir/chunk_embeddings_<model>_<dim>.lance/` | filesystem | future searches | | Lance tables under `vector_dir/chunk_embeddings_<model>_<dim>.lance/` | filesystem | future searches |
| `embedding_records` rows | SQLite | reverse lookup, reindex bookkeeping | | `embedding_records` rows | SQLite | reverse lookup, reindex bookkeeping |
| `Vec<VectorHit>` | `kb_core::VectorHit` | hybrid retriever (p3-4) | | `Vec<VectorHit>` | `kebab_core::VectorHit` | hybrid retriever (p3-4) |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
@@ -58,13 +58,13 @@ Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-
pub struct LanceVectorStore { /* internal: connection + sqlite handle */ } pub struct LanceVectorStore { /* internal: connection + sqlite handle */ }
impl LanceVectorStore { impl LanceVectorStore {
pub fn new(config: &kb_config::Config, sqlite: std::sync::Arc<kb_store_sqlite::SqliteStore>) -> anyhow::Result<Self>; pub fn new(config: &kebab_config::Config, sqlite: std::sync::Arc<kebab_store_sqlite::SqliteStore>) -> anyhow::Result<Self>;
} }
impl kb_core::VectorStore for LanceVectorStore { impl kebab_core::VectorStore for LanceVectorStore {
fn ensure_table(&self, model: &kb_core::EmbeddingModelId, dim: usize) -> anyhow::Result<kb_core::IndexId>; fn ensure_table(&self, model: &kebab_core::EmbeddingModelId, dim: usize) -> anyhow::Result<kebab_core::IndexId>;
fn upsert(&self, recs: &[kb_core::VectorRecord]) -> anyhow::Result<()>; fn upsert(&self, recs: &[kebab_core::VectorRecord]) -> anyhow::Result<()>;
fn search(&self, query_vec: &[f32], k: usize, filters: &kb_core::SearchFilters) -> anyhow::Result<Vec<kb_core::VectorHit>>; fn search(&self, query_vec: &[f32], k: usize, filters: &kebab_core::SearchFilters) -> anyhow::Result<Vec<kebab_core::VectorHit>>;
} }
``` ```
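The per-model table layout from the Outputs table (`vector_dir/chunk_embeddings_<model>_<dim>.lance/`) could be derived roughly as follows. This is only a sketch: `lance_table_path` is a hypothetical helper, and replacing non-alphanumeric characters in the model id with `_` is an assumption, since the spec excerpt does not show how model ids are sanitized.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds the on-disk location of a per-model table.
pub fn lance_table_path(vector_dir: &Path, model: &str, dim: usize) -> PathBuf {
    // Assumption: characters outside [A-Za-z0-9] become '_' so the model id
    // is safe to embed in a directory name.
    let safe: String = model
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() { c } else { '_' })
        .collect();
    vector_dir.join(format!("chunk_embeddings_{safe}_{dim}.lance"))
}
```

Keying the path on both model and dimension means a model swap lands in a fresh table instead of mixing vectors of different shapes.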
@@ -116,12 +116,12 @@ impl kb_core::VectorStore for LanceVectorStore {
| determinism | same query vector + same data → same top-k order | tmp data_dir | | determinism | same query vector + same data → same top-k order | tmp data_dir |
| snapshot | `Vec<VectorHit>` JSON for fixed corpus stable | `fixtures/vector/run-1.json` | | snapshot | `Vec<VectorHit>` JSON for fixed corpus stable | `fixtures/vector/run-1.json` |
All tests under `cargo test -p kb-store-vector`. All tests under `cargo test -p kebab-store-vector`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-store-vector` passes - [ ] `cargo check -p kebab-store-vector` passes
- [ ] `cargo test -p kb-store-vector` passes - [ ] `cargo test -p kebab-store-vector` passes
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] `embedding_records` rows align 1:1 with Lance rows after a successful upsert batch - [ ] `embedding_records` rows align 1:1 with Lance rows after a successful upsert batch
- [ ] PR links design §5.6, §6.3, §7.2 - [ ] PR links design §5.6, §6.3, §7.2
@@ -130,7 +130,7 @@ All tests under `cargo test -p kb-store-vector`.
- IVF / PQ index tuning (P+). - IVF / PQ index tuning (P+).
- Image / multimodal vector tables (P6). - Image / multimodal vector tables (P6).
- `kb-app` orchestration of indexing jobs (`embed_index` facade method body). - `kebab-app` orchestration of indexing jobs (`embed_index` facade method body).
## Risks / notes ## Risks / notes


@@ -1,12 +1,12 @@
--- ---
phase: P3 phase: P3
component: kb-search (hybrid) component: kebab-search (hybrid)
task_id: p3-4 task_id: p3-4
title: "Hybrid Retriever (RRF) over lexical + vector" title: "Hybrid Retriever (RRF) over lexical + vector"
status: completed status: completed
depends_on: [p2-2, p3-3] depends_on: [p2-2, p3-3]
unblocks: [p4-3] unblocks: [p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.7 RetrievalDetail, §0 Q3, §1.6 search --explain, §6.4 [search] rrf settings] contract_sections: [§3.7 RetrievalDetail, §0 Q3, §1.6 search --explain, §6.4 [search] rrf settings]
--- ---
@@ -22,17 +22,17 @@ Single mediator. Keeps the lexical and vector retrievers focused; only this task
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `kb-store-sqlite` (for `LexicalRetriever`) - `kebab-store-sqlite` (for `LexicalRetriever`)
- `kb-store-vector` (for `LanceVectorStore`) - `kebab-store-vector` (for `LanceVectorStore`)
- `kb-embed` (trait only — for query embedding via `Embedder`) - `kebab-embed` (trait only — for query embedding via `Embedder`)
- `tracing` - `tracing`
- `thiserror` - `thiserror`
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`. (`kb-embed-local` is a runtime-injected `dyn Embedder`; this crate must not depend on the concrete adapter directly.) - `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`. (`kebab-embed-local` is a runtime-injected `dyn Embedder`; this crate must not depend on the concrete adapter directly.)
## Inputs ## Inputs
@@ -41,21 +41,21 @@ Single mediator. Keeps the lexical and vector retrievers focused; only this task
| `LexicalRetriever` | trait object | constructed elsewhere | | `LexicalRetriever` | trait object | constructed elsewhere |
| `LanceVectorStore` | trait object | constructed elsewhere | | `LanceVectorStore` | trait object | constructed elsewhere |
| `Box<dyn Embedder>` | for query embedding | runtime-injected | | `Box<dyn Embedder>` | for query embedding | runtime-injected |
| `kb-config::Config.search` | `default_k`, `hybrid_fusion`, `rrf_k` | runtime | | `kebab-config::Config.search` | `default_k`, `hybrid_fusion`, `rrf_k` | runtime |
| `SearchQuery` | `kb_core::SearchQuery` | `kb-app::search` | | `SearchQuery` | `kebab_core::SearchQuery` | `kebab-app::search` |
## Outputs ## Outputs
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `Vec<SearchHit>` (with full `RetrievalDetail`) | `kb_core::SearchHit` | `kb-cli` printer, `kb-rag` packer | | `Vec<SearchHit>` (with full `RetrievalDetail`) | `kebab_core::SearchHit` | `kebab-cli` printer, `kebab-rag` packer |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
```rust ```rust
pub struct HybridRetriever { pub struct HybridRetriever {
lexical: std::sync::Arc<dyn kb_core::Retriever>, lexical: std::sync::Arc<dyn kebab_core::Retriever>,
vector: std::sync::Arc<dyn kb_core::Retriever>, // wrapper over LanceVectorStore + Embedder vector: std::sync::Arc<dyn kebab_core::Retriever>, // wrapper over LanceVectorStore + Embedder
fusion: FusionPolicy, fusion: FusionPolicy,
k: usize, k: usize,
} }
@@ -64,27 +64,27 @@ pub enum FusionPolicy { Rrf { k_rrf: u32 } }
impl HybridRetriever { impl HybridRetriever {
pub fn new( pub fn new(
config: &kb_config::Config, config: &kebab_config::Config,
lexical: std::sync::Arc<dyn kb_core::Retriever>, lexical: std::sync::Arc<dyn kebab_core::Retriever>,
vector: std::sync::Arc<dyn kb_core::Retriever>, vector: std::sync::Arc<dyn kebab_core::Retriever>,
) -> Self; ) -> Self;
} }
impl kb_core::Retriever for HybridRetriever { impl kebab_core::Retriever for HybridRetriever {
fn search(&self, query: &kb_core::SearchQuery) -> anyhow::Result<Vec<kb_core::SearchHit>>; fn search(&self, query: &kebab_core::SearchQuery) -> anyhow::Result<Vec<kebab_core::SearchHit>>;
fn index_version(&self) -> kb_core::IndexVersion; fn index_version(&self) -> kebab_core::IndexVersion;
} }
/// Wrapper that turns a VectorStore + Embedder into a Retriever. /// Wrapper that turns a VectorStore + Embedder into a Retriever.
pub struct VectorRetriever { pub struct VectorRetriever {
store: std::sync::Arc<dyn kb_core::VectorStore>, store: std::sync::Arc<dyn kebab_core::VectorStore>,
embed: std::sync::Arc<dyn kb_core::Embedder>, embed: std::sync::Arc<dyn kebab_core::Embedder>,
/* heading_path/snippet enrichment hits SQLite via kb-store-sqlite read accessor */ /* heading_path/snippet enrichment hits SQLite via kebab-store-sqlite read accessor */
} }
impl VectorRetriever { impl VectorRetriever {
pub fn new(store: std::sync::Arc<dyn kb_core::VectorStore>, embed: std::sync::Arc<dyn kb_core::Embedder>, sqlite: std::sync::Arc<kb_store_sqlite::SqliteStore>) -> Self; pub fn new(store: std::sync::Arc<dyn kebab_core::VectorStore>, embed: std::sync::Arc<dyn kebab_core::Embedder>, sqlite: std::sync::Arc<kebab_store_sqlite::SqliteStore>) -> Self;
} }
impl kb_core::Retriever for VectorRetriever { /* per §7.2 */ } impl kebab_core::Retriever for VectorRetriever { /* per §7.2 */ }
``` ```
## Behavior contract ## Behavior contract
@@ -124,12 +124,12 @@ impl kb_core::Retriever for VectorRetriever { /* per §7.2 */ }
| determinism | identical query twice → byte-identical `Vec<SearchHit>` | tmp DB | | determinism | identical query twice → byte-identical `Vec<SearchHit>` | tmp DB |
| snapshot | hybrid output JSON stable | `fixtures/search/hybrid/run-1.json` | | snapshot | hybrid output JSON stable | `fixtures/search/hybrid/run-1.json` |
All tests under `cargo test -p kb-search hybrid`. All tests under `cargo test -p kebab-search hybrid`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-search` passes - [ ] `cargo check -p kebab-search` passes
- [ ] `cargo test -p kb-search hybrid` passes - [ ] `cargo test -p kebab-search hybrid` passes
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.7, §6.4 search, §0 Q3 - [ ] PR links design §3.7, §6.4 search, §0 Q3
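For orientation, Reciprocal Rank Fusion as named in `FusionPolicy::Rrf { k_rrf }` can be sketched as below. The function name `rrf_fuse`, the use of plain string ids instead of `SearchHit`, and the tie-break by id are assumptions made for illustration; the real tie-break and `RetrievalDetail` population are defined in the behavior contract, which this excerpt does not show.

```rust
use std::collections::HashMap;

/// Fuses ranked id lists with Reciprocal Rank Fusion.
/// Each list contributes 1 / (k_rrf + rank) per id, with rank starting at 1.
pub fn rrf_fuse(rankings: &[Vec<&str>], k_rrf: u32, top_k: usize) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry(*id).or_insert(0.0) += 1.0 / (f64::from(k_rrf) + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Sort by score descending, then by id ascending (assumed tie-break)
    // so that repeated runs produce the same order.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused.truncate(top_k);
    fused
}
```

An id that appears in both the lexical and the vector list accumulates two terms, so it tends to outrank ids that appear in only one list, without requiring the two score scales to be comparable.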


@@ -1,64 +1,64 @@
--- ---
phase: P3 phase: P3
component: kb-app (facade wiring) component: kebab-app (facade wiring)
task_id: p3-5 task_id: p3-5
title: "Wire kb-app facade — ingest / search / list / inspect end-to-end" title: "Wire kebab-app facade — ingest / search / list / inspect end-to-end"
status: completed status: completed
depends_on: [p1-6, p2-2, p3-2, p3-3, p3-4] depends_on: [p1-6, p2-2, p3-2, p3-3, p3-4]
unblocks: [p4-3, p9-1, p9-2, p9-4] unblocks: [p4-3, p9-1, p9-2, p9-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§7 components, §7.2 traits, §1 UX scenes (ingest, search, list, inspect), §6.4 config, §10 errors] contract_sections: [§7 components, §7.2 traits, §1 UX scenes (ingest, search, list, inspect), §6.4 config, §10 errors]
--- ---
# p3-5 — Wire kb-app facade # p3-5 — Wire kebab-app facade
## Goal ## Goal
Replace the `bail!("not yet wired")` stubs in `kb-app` (currently `ingest`, `search`, `list_docs`, `inspect_doc`, `inspect_chunk`) with real bodies that compose the libraries shipped in P1–P3. After this task, `kb index` actually walks a workspace and persists chunks, and `kb search --mode {lexical,vector,hybrid}` returns real `SearchHit`s. `kb-app::ask` stays stubbed (P4-3 owns it). Replace the `bail!("not yet wired")` stubs in `kebab-app` (currently `ingest`, `search`, `list_docs`, `inspect_doc`, `inspect_chunk`) with real bodies that compose the libraries shipped in P1–P3. After this task, `kebab index` actually walks a workspace and persists chunks, and `kebab search --mode {lexical,vector,hybrid}` returns real `SearchHit`s. `kebab-app::ask` stays stubbed (P4-3 owns it).
## Why now / why this size ## Why now / why this size
P1–P3 shipped libraries but the CLI is unusable: every facade method bails. Inserting this glue task between P3-4 (last library that completes the retrieval stack) and P4-1 (LLM trait, where `kb-app::ask` will need this same wiring anyway) lets the user run the tool end-to-end for the first time and validates that the library boundaries actually compose. Limiting it to non-LLM commands keeps it a single-session task; `ask` wiring lives in P4-3. P1–P3 shipped libraries but the CLI is unusable: every facade method bails. Inserting this glue task between P3-4 (last library that completes the retrieval stack) and P4-1 (LLM trait, where `kebab-app::ask` will need this same wiring anyway) lets the user run the tool end-to-end for the first time and validates that the library boundaries actually compose. Limiting it to non-LLM commands keeps it a single-session task; `ask` wiring lives in P4-3.
## Allowed dependencies ## Allowed dependencies
`kb-app` may depend on: `kebab-app` may depend on:
- `kb-core`, `kb-config` (already) - `kebab-core`, `kebab-config` (already)
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-sqlite` (P1) - `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-sqlite` (P1)
- `kb-search` (P2-2, P3-4) - `kebab-search` (P2-2, P3-4)
- `kb-store-vector`, `kb-embed`, `kb-embed-local` (P3-2, P3-3, P3-4) - `kebab-store-vector`, `kebab-embed`, `kebab-embed-local` (P3-2, P3-3, P3-4)
- `tracing`, `anyhow`, `serde`, `serde_json`, `time`, `dirs`, `toml` (existing) - `tracing`, `anyhow`, `serde`, `serde_json`, `time`, `dirs`, `toml` (existing)
## Forbidden dependencies ## Forbidden dependencies
- `kb-llm*`, `kb-rag` (P4 ownership — `ask` stays stubbed) - `kebab-llm*`, `kebab-rag` (P4 ownership — `ask` stays stubbed)
- `kb-tui`, `kb-desktop` (P9 — those *consume* `kb-app`, not the other way) - `kebab-tui`, `kebab-desktop` (P9 — those *consume* `kebab-app`, not the other way)
- `kb-parse-pdf`, `kb-parse-image`, `kb-parse-audio` (P6/P7/P8) - `kebab-parse-pdf`, `kebab-parse-image`, `kebab-parse-audio` (P6/P7/P8)
- network HTTP libs (model download is `kb-embed-local`'s responsibility via fastembed) - network HTTP libs (model download is `kebab-embed-local`'s responsibility via fastembed)
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `kb-config::Config` | `kb_config::Config` | loaded once at process start; threaded through every facade call | | `kebab-config::Config` | `kebab_config::Config` | loaded once at process start; threaded through every facade call |
| `SourceScope` | `kb_core::SourceScope` | CLI | | `SourceScope` | `kebab_core::SourceScope` | CLI |
| `SearchQuery` | `kb_core::SearchQuery` | CLI | | `SearchQuery` | `kebab_core::SearchQuery` | CLI |
| `DocFilter` | `kb_core::DocFilter` | CLI | | `DocFilter` | `kebab_core::DocFilter` | CLI |
| `DocumentId` / `ChunkId` | `kb_core::ids::*` | CLI | | `DocumentId` / `ChunkId` | `kebab_core::ids::*` | CLI |
## Outputs ## Outputs
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `IngestReport` | `kb_core::IngestReport` | `kb-cli` (`ingest_report.v1`), `kb-eval` (P5) | | `IngestReport` | `kebab_core::IngestReport` | `kebab-cli` (`ingest_report.v1`), `kebab-eval` (P5) |
| `Vec<SearchHit>` | `kb_core::SearchHit` | `kb-cli` (`search_hit.v1`), `kb-rag` (P4-3) | | `Vec<SearchHit>` | `kebab_core::SearchHit` | `kebab-cli` (`search_hit.v1`), `kebab-rag` (P4-3) |
| `Vec<DocSummary>` | `kb_core::DocSummary` | `kb-cli` (`doc_summary.v1`) | | `Vec<DocSummary>` | `kebab_core::DocSummary` | `kebab-cli` (`doc_summary.v1`) |
| `CanonicalDocument` / `Chunk` | `kb_core::*` | `kb-cli` (`chunk_inspection.v1`) | | `CanonicalDocument` / `Chunk` | `kebab_core::*` | `kebab-cli` (`chunk_inspection.v1`) |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
The signatures already live on `kb-app`. This task only swaps the bodies. Frozen surface: The signatures already live on `kebab-app`. This task only swaps the bodies. Frozen surface:
```rust ```rust
pub fn ingest(scope: SourceScope, summary_only: bool) -> anyhow::Result<IngestReport>; pub fn ingest(scope: SourceScope, summary_only: bool) -> anyhow::Result<IngestReport>;
@@ -69,7 +69,7 @@ pub fn search(query: SearchQuery) -> anyhow::Result<Vec<SearchHi
// `ask` and `init_workspace` / `doctor` stay as-is. // `ask` and `init_workspace` / `doctor` stay as-is.
``` ```
If a constructor helper is needed (e.g., a single `App::open(&Config)` that opens `SqliteStore`, `LanceVectorStore`, embedder, retrievers once), it MAY be added pub-or-pub(crate) to `kb-app`. Keep all newly-added types crate-private unless explicitly required by `kb-cli`. If a constructor helper is needed (e.g., a single `App::open(&Config)` that opens `SqliteStore`, `LanceVectorStore`, embedder, retrievers once), it MAY be added pub-or-pub(crate) to `kebab-app`. Keep all newly-added types crate-private unless explicitly required by `kebab-cli`.
## Behavior contract ## Behavior contract
@@ -77,8 +77,8 @@ If a constructor helper is needed (e.g., a single `App::open(&Config)` that open
Pipeline per design §1.2: Pipeline per design §1.2:
1. Resolve `scope` → set of files under `config.workspace.root` filtered by `config.workspace.include`/`exclude`. Use `kb-source-fs::FsSourceConnector` and pass each `RawAsset` + bytes through. 1. Resolve `scope` → set of files under `config.workspace.root` filtered by `config.workspace.include`/`exclude`. Use `kebab-source-fs::FsSourceConnector` and pass each `RawAsset` + bytes through.
2. For each markdown asset: `kb_parse_md::frontmatter::parse_frontmatter` + `kb_parse_md::blocks::parse_blocks` → `kb_normalize::build_canonical_document` → `kb_chunk::MdHeadingV1Chunker::chunk`. 2. For each markdown asset: `kebab_parse_md::frontmatter::parse_frontmatter` + `kebab_parse_md::blocks::parse_blocks` → `kebab_normalize::build_canonical_document` → `kebab_chunk::MdHeadingV1Chunker::chunk`.
3. SQLite writes per design §5.8 (one ingest = one transaction): `put_asset_with_bytes` → `put_document` → `put_blocks` → `put_chunks`. 3. SQLite writes per design §5.8 (one ingest = one transaction): `put_asset_with_bytes` → `put_document` → `put_blocks` → `put_chunks`.
4. If `config.models.embedding.provider == "fastembed"`: build `FastembedEmbedder` once at the top of the run, embed every chunk's `text` as `EmbeddingKind::Document`, batch by `config.models.embedding.batch_size`, then `LanceVectorStore::ensure_table` + `upsert`. Skip vector indexing if provider is `"none"` or `embedding.dimensions == 0` (config opt-out path). 4. If `config.models.embedding.provider == "fastembed"`: build `FastembedEmbedder` once at the top of the run, embed every chunk's `text` as `EmbeddingKind::Document`, batch by `config.models.embedding.batch_size`, then `LanceVectorStore::ensure_table` + `upsert`. Skip vector indexing if provider is `"none"` or `embedding.dimensions == 0` (config opt-out path).
5. Record an `ingest_runs` row via `JobRepo` with the aggregate counts. `summary_only=true` writes `items_json=NULL`. 5. Record an `ingest_runs` row via `JobRepo` with the aggregate counts. `summary_only=true` writes `items_json=NULL`.
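The opt-out and batching rule in step 4 can be sketched as follows. This is a minimal illustration, not the `kebab-app` implementation: `EmbeddingConfig`, `vector_indexing_enabled`, and `embedding_batches` are hypothetical names standing in for the `config.models.embedding` section and whatever helper the crate actually uses.

```rust
/// Hypothetical stand-in for the `config.models.embedding` section.
pub struct EmbeddingConfig {
    pub provider: String,
    pub dimensions: usize,
    pub batch_size: usize,
}

/// Returns true when step 4 should run: the provider is "fastembed"
/// and the configured dimension is non-zero. Any other combination
/// (including provider "none") takes the config opt-out path.
pub fn vector_indexing_enabled(cfg: &EmbeddingConfig) -> bool {
    cfg.provider == "fastembed" && cfg.dimensions != 0
}

/// Splits chunk texts into batches of at most `batch_size` items.
/// A `batch_size` of 0 is treated as 1, because `slice::chunks(0)` panics.
pub fn embedding_batches<'a>(texts: &'a [String], batch_size: usize) -> Vec<&'a [String]> {
    texts.chunks(batch_size.max(1)).collect()
}
```

Each returned batch would then be handed to the embedder as one call, so the embedder is built once and reused across batches.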
@@ -90,9 +90,9 @@ Errors: any per-file parse failure should be recorded as a `Warning` in `Provena
Per `query.mode`: Per `query.mode`:
- `Lexical` → `kb_search::LexicalRetriever::search`. Takes a read-only borrow of the SQLite connection. - `Lexical` → `kebab_search::LexicalRetriever::search`. Takes a read-only borrow of the SQLite connection.
- `Vector` → `kb_search::VectorRetriever::search`. Requires `Embedder` (built from config) + `VectorStore` (LanceDB). - `Vector` → `kebab_search::VectorRetriever::search`. Requires `Embedder` (built from config) + `VectorStore` (LanceDB).
- `Hybrid` → `kb_search::HybridRetriever::search` composing the above two. - `Hybrid` → `kebab_search::HybridRetriever::search` composing the above two.
Each call constructs (or reuses) the retrievers from a single `App` context that holds `Arc<SqliteStore>`, `Arc<LanceVectorStore>`, `Arc<dyn Embedder>` so cold-start cost is paid once per process. Document the lifetime model: a CLI invocation passes through `App::open(&Config)` once, runs the requested op, then drops everything on exit. The TUI (P9) holds the `App` for the session. Each call constructs (or reuses) the retrievers from a single `App` context that holds `Arc<SqliteStore>`, `Arc<LanceVectorStore>`, `Arc<dyn Embedder>` so cold-start cost is paid once per process. Document the lifetime model: a CLI invocation passes through `App::open(&Config)` once, runs the requested op, then drops everything on exit. The TUI (P9) holds the `App` for the session.
@@ -100,17 +100,17 @@ Each call constructs (or reuses) the retrievers from a single `App` context that
### `list_docs(filter)` / `inspect_doc(id)` / `inspect_chunk(id)` ### `list_docs(filter)` / `inspect_doc(id)` / `inspect_chunk(id)`
Direct delegation to `kb_core::DocumentStore` trait methods on `SqliteStore`. No vector / embedding involvement. Direct delegation to `kebab_core::DocumentStore` trait methods on `SqliteStore`. No vector / embedding involvement.
### Lifecycle ### Lifecycle
Define an internal `pub(crate) struct App { config: Arc<Config>, sqlite: Arc<SqliteStore>, vector: Option<Arc<LanceVectorStore>>, embedder: Option<Arc<dyn Embedder + Send + Sync>>, /* retrievers built per call */ }` with a `pub(crate) fn open(config: &Config) -> Result<Self>`. Each public `kb-app::*` function builds an `App` (or accepts one in tests) and uses it. The free functions stay the public API so `kb-cli` and `kb-tui` don't need to refactor. Define an internal `pub(crate) struct App { config: Arc<Config>, sqlite: Arc<SqliteStore>, vector: Option<Arc<LanceVectorStore>>, embedder: Option<Arc<dyn Embedder + Send + Sync>>, /* retrievers built per call */ }` with a `pub(crate) fn open(config: &Config) -> Result<Self>`. Each public `kebab-app::*` function builds an `App` (or accepts one in tests) and uses it. The free functions stay the public API so `kebab-cli` and `kebab-tui` don't need to refactor.
`vector` and `embedder` are `Option` because the spec allows running KB without embeddings (lexical-only mode for resource-constrained boxes — `provider == "none"`). When absent, `search` with `Vector` / `Hybrid` modes returns `anyhow::Error` with a hint to enable embeddings; `ingest` skips the vector step. `vector` and `embedder` are `Option` because the spec allows running KB without embeddings (lexical-only mode for resource-constrained boxes — `provider == "none"`). When absent, `search` with `Vector` / `Hybrid` modes returns `anyhow::Error` with a hint to enable embeddings; `ingest` skips the vector step.
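A rough sketch of how the optional `vector` / `embedder` fields could gate search modes. The unit structs and the `ensure_mode_available` helper are hypothetical placeholders, and the error is a plain `String` here (the spec calls for `anyhow::Error`) so the example stays dependency-free.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for the real store and embedder types.
pub struct SqliteStore;
pub struct LanceVectorStore;
pub trait Embedder: Send + Sync {}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SearchMode {
    Lexical,
    Vector,
    Hybrid,
}

pub struct App {
    pub sqlite: Arc<SqliteStore>,
    pub vector: Option<Arc<LanceVectorStore>>,
    pub embedder: Option<Arc<dyn Embedder>>,
}

impl App {
    /// Checks whether `mode` can run with the components this `App` holds.
    /// Lexical search only needs SQLite; the other two need both optional parts.
    pub fn ensure_mode_available(&self, mode: SearchMode) -> Result<(), String> {
        match mode {
            SearchMode::Lexical => Ok(()),
            SearchMode::Vector | SearchMode::Hybrid => {
                if self.vector.is_some() && self.embedder.is_some() {
                    Ok(())
                } else {
                    Err(format!(
                        "{mode:?} search needs embeddings; set models.embedding.provider to \"fastembed\""
                    ))
                }
            }
        }
    }
}
```

Performing this check before building any retriever keeps the lexical-only configuration from ever touching LanceDB or the embedder.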
### Versioning ### Versioning
- `parser_version` from `kb-parse-md` constant. - `parser_version` from `kebab-parse-md` constant.
- `chunker_version` from `MdHeadingV1Chunker::chunker_version()`. - `chunker_version` from `MdHeadingV1Chunker::chunker_version()`.
- `embedding_model` / `embedding_version` from the embedder. - `embedding_model` / `embedding_version` from the embedder.
- `index_version` from the retriever (or `Lexical`/`Vector` IndexId). - `index_version` from the retriever (or `Lexical`/`Vector` IndexId).
@@ -119,15 +119,15 @@ All wired through to the persisted records and the wire output.
## Storage / wire effects ## Storage / wire effects
- Writes: `data_dir/kb.sqlite` (assets, documents, blocks, chunks, embedding_records, jobs/ingest_runs), `data_dir/assets/<aa>/<id>` (when copied), `data_dir/lancedb/chunk_embeddings_<model>_<dim>.lance/` (when embeddings on), `data_dir/models/fastembed/` (model cache, first run only). - Writes: `data_dir/kebab.sqlite` (assets, documents, blocks, chunks, embedding_records, jobs/ingest_runs), `data_dir/assets/<aa>/<id>` (when copied), `data_dir/lancedb/chunk_embeddings_<model>_<dim>.lance/` (when embeddings on), `data_dir/models/fastembed/` (model cache, first run only).
- Reads on subsequent calls: same DB. - Reads on subsequent calls: same DB.
- Wire JSON conforms to `ingest_report.v1`, `search_hit.v1`, `doc_summary.v1`, `chunk_inspection.v1`; `kb-cli` already has the wrappers. - Wire JSON conforms to `ingest_report.v1`, `search_hit.v1`, `doc_summary.v1`, `chunk_inspection.v1`; `kebab-cli` already has the wrappers.
## Test plan ## Test plan
| kind | description | fixture / data | | kind | description | fixture / data |
|------|-------------|----------------| |------|-------------|----------------|
| integration | `ingest` over a 3-file fixture workspace produces non-empty `IngestReport` and writes the expected SQLite rows + Lance rows | `crates/kb-app/tests/fixtures/workspace/`, tmp `data_dir` | | integration | `ingest` over a 3-file fixture workspace produces non-empty `IngestReport` and writes the expected SQLite rows + Lance rows | `crates/kebab-app/tests/fixtures/workspace/`, tmp `data_dir` |
| integration | `ingest` with `provider="none"` skips Lance and produces an `IngestReport` with `embeddings_indexed=0` | tmp config | | integration | `ingest` with `provider="none"` skips Lance and produces an `IngestReport` with `embeddings_indexed=0` | tmp config |
| integration | `ingest` is idempotent: re-running on the same workspace updates `documents.updated_at` and bumps `doc_version` without duplicating rows | tmp `data_dir` | | integration | `ingest` is idempotent: re-running on the same workspace updates `documents.updated_at` and bumps `doc_version` without duplicating rows | tmp `data_dir` |
| integration | `search` with `mode=Lexical` returns hits whose `embedding_model` is `None` | tmp `data_dir` | | integration | `search` with `mode=Lexical` returns hits whose `embedding_model` is `None` | tmp `data_dir` |
@@ -136,36 +136,36 @@ All wired through to the persisted records and the wire output.
| integration | `search` with `mode=Vector` and `provider="none"` returns a clear error | tmp config | | integration | `search` with `mode=Vector` and `provider="none"` returns a clear error | tmp config |
| integration | `list_docs` with `tags_any=["rust"]` filters correctly | tmp `data_dir` | | integration | `list_docs` with `tags_any=["rust"]` filters correctly | tmp `data_dir` |
| integration | `inspect_doc` round-trips a document; `inspect_chunk` round-trips a chunk | tmp `data_dir` | | integration | `inspect_doc` round-trips a document; `inspect_chunk` round-trips a chunk | tmp `data_dir` |
| smoke | end-to-end CLI smoke: `kb index` + `kb search --mode hybrid "..."` against the fixture workspace using `cargo run` (manual / `assert_cmd`) | fixture | | smoke | end-to-end CLI smoke: `kebab index` + `kebab search --mode hybrid "..."` against the fixture workspace using `cargo run` (manual / `assert_cmd`) | fixture |
`#[ignore]` AVX-gated tests follow the P3-3/P3-4 pattern: `require_avx_or_panic()` at the top of each LanceDB-touching body. Default `cargo test -p kb-app` runs the lexical-only and SQLite-only paths. `#[ignore]` AVX-gated tests follow the P3-3/P3-4 pattern: `require_avx_or_panic()` at the top of each LanceDB-touching body. Default `cargo test -p kebab-app` runs the lexical-only and SQLite-only paths.
All tests under `cargo test -p kb-app`. CLI smoke optional via `assert_cmd` if it doesn't add a heavyweight dep tree. All tests under `cargo test -p kebab-app`. CLI smoke optional via `assert_cmd` if it doesn't add a heavyweight dep tree.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-app` passes. - [ ] `cargo check -p kebab-app` passes.
- [ ] `cargo test -p kb-app` passes (default lane). - [ ] `cargo test -p kebab-app` passes (default lane).
- [ ] `cargo test -p kb-app -- --ignored` passes on AVX hardware. - [ ] `cargo test -p kebab-app -- --ignored` passes on AVX hardware.
- [ ] `cargo run -p kb-cli -- index` succeeds against a non-empty fixture workspace. - [ ] `cargo run -p kebab-cli -- index` succeeds against a non-empty fixture workspace.
- [ ] `cargo run -p kb-cli -- search --mode hybrid "<term>"` returns real hits + citations. - [ ] `cargo run -p kebab-cli -- search --mode hybrid "<term>"` returns real hits + citations.
- [ ] `cargo run -p kb-cli -- list` returns the indexed documents. - [ ] `cargo run -p kebab-cli -- list` returns the indexed documents.
- [ ] `cargo clippy --workspace --all-targets -- -D warnings` clean. - [ ] `cargo clippy --workspace --all-targets -- -D warnings` clean.
- [ ] No imports outside Allowed dependencies. - [ ] No imports outside Allowed dependencies.
- [ ] PR links design §1.2, §1.5, §1.6, §7. - [ ] PR links design §1.2, §1.5, §1.6, §7.
## Out of scope ## Out of scope
- `kb-app::ask` body — P4-3 (RAG pipeline). - `kebab-app::ask` body — P4-3 (RAG pipeline).
- `kb index --rebuild-fts` CLI option — wire `kb_store_sqlite::rebuild_chunks_fts` only when needed by a downstream task. - `kebab index --rebuild-fts` CLI option — wire `kebab_store_sqlite::rebuild_chunks_fts` only when needed by a downstream task.
- `kb index --resume` checkpointing — design §1.2 mentions it but spec leaves to a P+ refinement; document but defer. - `kebab index --resume` checkpointing — design §1.2 mentions it but spec leaves to a P+ refinement; document but defer.
- `--watch` mode — P+. - `--watch` mode — P+.
- TUI / desktop integration — P9 consumes the wired facade. - TUI / desktop integration — P9 consumes the wired facade.
## Risks / notes ## Risks / notes
- Cold-start cost: first `kb index` run downloads the fastembed model (~470MB) and warms the ONNX session. Surface via `tracing::info!` (already wired in P3-2). - Cold-start cost: first `kebab index` run downloads the fastembed model (~470MB) and warms the ONNX session. Surface via `tracing::info!` (already wired in P3-2).
- **Post-merge fix (2026-05)**: original P3-5 implementation left every `kb-cli` subcommand calling the bare `kb_app::*` free functions, which silently re-loaded `Config::load(None)` (XDG default) and ignored the user's `--config <path>` flag. CLI now goes through `kb_app::*_with_config(cfg, ...)` everywhere. The `#[doc(hidden)] pub fn *_with_config` companions originally framed as "test seam" are now the official config-explicit API for CLI / TUI / integration tests. See [HOTFIXES.md](../HOTFIXES.md) for details. - **Post-merge fix (2026-05)**: original P3-5 implementation left every `kebab-cli` subcommand calling the bare `kebab_app::*` free functions, which silently re-loaded `Config::load(None)` (XDG default) and ignored the user's `--config <path>` flag. CLI now goes through `kebab_app::*_with_config(cfg, ...)` everywhere. The `#[doc(hidden)] pub fn *_with_config` companions originally framed as "test seam" are now the official config-explicit API for CLI / TUI / integration tests. See [HOTFIXES.md](../HOTFIXES.md) for details.
- The `App` lifecycle struct is internal but its construction is the natural seam for adding caching / connection pooling later. Keep it `pub(crate)` so future refactors don't break the CLI. - The `App` lifecycle struct is internal but its construction is the natural seam for adding caching / connection pooling later. Keep it `pub(crate)` so future refactors don't break the CLI.
- Mismatched `index_version` across stored records and the live retriever should fail loud at `App::open` (not at first search). Reuse the `tracing::warn!` from `HybridRetriever::new` (P3-4). - Mismatched `index_version` across stored records and the live retriever should fail loud at `App::open` (not at first search). Reuse the `tracing::warn!` from `HybridRetriever::new` (P3-4).
- The fastembed adapter holds a `tokio::runtime` (P3-3); `App` must be constructed from a synchronous context. Document on `App::open`. - The fastembed adapter holds a `tokio::runtime` (P3-3); `App` must be constructed from a synchronous context. Document on `App::open`.

@@ -1,12 +1,12 @@
--- ---
phase: P4 phase: P4
component: kb-llm (trait crate) component: kebab-llm (trait crate)
task_id: p4-1 task_id: p4-1
title: "LanguageModel trait + GenerateRequest/TokenChunk" title: "LanguageModel trait + GenerateRequest/TokenChunk"
status: completed status: completed
depends_on: [p0-1] depends_on: [p0-1]
unblocks: [p4-2, p4-3] unblocks: [p4-2, p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§7.1 GenerateRequest/TokenChunk, §7.2 LanguageModel, §0 Q5 streaming, §3.8 ModelRef] contract_sections: [§7.1 GenerateRequest/TokenChunk, §7.2 LanguageModel, §0 Q5 streaming, §3.8 ModelRef]
--- ---
@@ -14,16 +14,16 @@ contract_sections: [§7.1 GenerateRequest/TokenChunk, §7.2 LanguageModel, §0 Q
## Goal ## Goal
Provide the `kb-llm` crate that re-exports the `LanguageModel` trait and helper types (`GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage`, `ModelRef`), plus a `MockLanguageModel` for downstream tests. Provide the `kebab-llm` crate that re-exports the `LanguageModel` trait and helper types (`GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage`, `ModelRef`), plus a `MockLanguageModel` for downstream tests.
## Why now / why this size ## Why now / why this size
`kb-rag` (p4-3) consumes a `LanguageModel` trait object. Owning the trait + a deterministic mock here lets RAG tests run with no Ollama dependency. Real adapters (Ollama, llama.cpp, candle) live in p4-2 and beyond. `kebab-rag` (p4-3) consumes a `LanguageModel` trait object. Owning the trait + a deterministic mock here lets RAG tests run with no Ollama dependency. Real adapters (Ollama, llama.cpp, candle) live in p4-2 and beyond.
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `serde` - `serde`
- `thiserror` - `thiserror`
- `tracing` - `tracing`
@@ -31,13 +31,13 @@ Provide the `kb-llm` crate that re-exports the `LanguageModel` trait and helper
## Forbidden dependencies ## Forbidden dependencies
- `reqwest`, `ureq`, `tokio`, `whisper-rs`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-rag`, `kb-tui`, `kb-desktop` - `reqwest`, `ureq`, `tokio`, `whisper-rs`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `GenerateRequest` | `kb_core::GenerateRequest` | RAG pipeline | | `GenerateRequest` | `kebab_core::GenerateRequest` | RAG pipeline |
| concrete adapter at runtime | `dyn LanguageModel` | p4-2+ | | concrete adapter at runtime | `dyn LanguageModel` | p4-2+ |
## Outputs ## Outputs
@@ -45,12 +45,12 @@ Provide the `kb-llm` crate that re-exports the `LanguageModel` trait and helper
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| streaming `TokenChunk` iterator | `Box<dyn Iterator<Item=anyhow::Result<TokenChunk>> + Send>` | RAG pipeline | | streaming `TokenChunk` iterator | `Box<dyn Iterator<Item=anyhow::Result<TokenChunk>> + Send>` | RAG pipeline |
| `ModelRef` identity | `kb_core::ModelRef` | Answer.model | | `ModelRef` identity | `kebab_core::ModelRef` | Answer.model |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
```rust ```rust
pub use kb_core::{LanguageModel, GenerateRequest, TokenChunk, FinishReason, TokenUsage, ModelRef}; pub use kebab_core::{LanguageModel, GenerateRequest, TokenChunk, FinishReason, TokenUsage, ModelRef};
/// Test-only deterministic mock. Compiled only when `mock` feature is on. /// Test-only deterministic mock. Compiled only when `mock` feature is on.
#[cfg(feature = "mock")] #[cfg(feature = "mock")]
@@ -59,12 +59,12 @@ pub struct MockLanguageModel {
pub provider: String, pub provider: String,
pub context_tokens: usize, pub context_tokens: usize,
pub canned_response: String, // emitted token-by-token pub canned_response: String, // emitted token-by-token
pub canned_finish: kb_core::FinishReason, pub canned_finish: kebab_core::FinishReason,
pub canned_usage: kb_core::TokenUsage, pub canned_usage: kebab_core::TokenUsage,
} }
#[cfg(feature = "mock")] #[cfg(feature = "mock")]
impl kb_core::LanguageModel for MockLanguageModel { /* per §7.2 */ } impl kebab_core::LanguageModel for MockLanguageModel { /* per §7.2 */ }
``` ```
## Behavior contract ## Behavior contract
@@ -89,12 +89,12 @@ impl kb_core::LanguageModel for MockLanguageModel { /* per §7.2 */ }
| unit | concatenation of streamed `TokenChunk::Token` equals canned text (truncated by stop strings) | inline | | unit | concatenation of streamed `TokenChunk::Token` equals canned text (truncated by stop strings) | inline |
| contract | `model_ref()` populates `provider` and leaves `dimensions = None` | inline | | contract | `model_ref()` populates `provider` and leaves `dimensions = None` | inline |
All tests under `cargo test -p kb-llm`. All tests under `cargo test -p kebab-llm`.
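The unit test in the table above (streamed tokens concatenate to the canned text, truncated by stop strings) could be exercised against something like the sketch below. The `TokenChunk` enum is a simplified, hypothetical version of the `kebab_core` type, and splitting on spaces is an assumed tokenization chosen only to keep the mock deterministic.

```rust
/// Simplified, hypothetical version of the core `TokenChunk` type.
#[derive(Debug, Clone, PartialEq)]
pub enum TokenChunk {
    Token(String),
    Done,
}

/// Truncates `text` at the earliest occurrence of any non-empty stop string.
pub fn truncate_at_stop<'a>(text: &'a str, stops: &[&str]) -> &'a str {
    let cut = stops
        .iter()
        .filter(|s| !s.is_empty())
        .filter_map(|s| text.find(*s))
        .min()
        .unwrap_or(text.len());
    &text[..cut]
}

/// Emits the canned response one space-delimited piece at a time,
/// keeping the trailing space on each piece, followed by `Done`.
pub fn mock_stream(canned: &str, stops: &[&str]) -> Vec<TokenChunk> {
    let mut out: Vec<TokenChunk> = truncate_at_stop(canned, stops)
        .split_inclusive(' ')
        .map(|piece| TokenChunk::Token(piece.to_string()))
        .collect();
    out.push(TokenChunk::Done);
    out
}
```

Because `split_inclusive` keeps the delimiter, joining the `Token` payloads reproduces the truncated text byte for byte.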
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-llm` passes - [ ] `cargo check -p kebab-llm` passes
- [ ] `cargo test -p kb-llm` passes - [ ] `cargo test -p kebab-llm` passes
- [ ] No HTTP / async runtime deps present - [ ] No HTTP / async runtime deps present
- [ ] PR links design §7.2 LanguageModel, §0 Q5 - [ ] PR links design §7.2 LanguageModel, §0 Q5

@@ -1,12 +1,12 @@
--- ---
phase: P4 phase: P4
component: kb-llm-local (Ollama adapter) component: kebab-llm-local (Ollama adapter)
task_id: p4-2 task_id: p4-2
title: "OllamaLanguageModel — streaming /api/generate" title: "OllamaLanguageModel — streaming /api/generate"
status: completed status: completed
depends_on: [p4-1] depends_on: [p4-1]
unblocks: [p4-3] unblocks: [p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [design §7.2 LanguageModel, report §11.2 Ollama, design §6.4 [models.llm], design §0 Q5 streaming, design §10 errors] contract_sections: [design §7.2 LanguageModel, report §11.2 Ollama, design §6.4 [models.llm], design §0 Q5 streaming, design §10 errors]
--- ---
@@ -18,13 +18,13 @@ Implement `OllamaLanguageModel` against Ollama's local HTTP API (`POST /api/gene
## Why now / why this size ## Why now / why this size
First real LM. Required for `kb ask` to function. Isolated from RAG pipeline so swapping providers stays config-only. First real LM. Required for `kebab ask` to function. Isolated from RAG pipeline so swapping providers stays config-only.
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `kb-llm` - `kebab-llm`
- `reqwest = { version = "0.12", default-features = false, features = ["blocking", "json", "rustls-tls"] }` - `reqwest = { version = "0.12", default-features = false, features = ["blocking", "json", "rustls-tls"] }`
- `serde`, `serde_json` - `serde`, `serde_json`
- `tracing` - `tracing`
@@ -32,21 +32,21 @@ First real LM. Required for `kb ask` to function. Isolated from RAG pipeline so
## Forbidden dependencies ## Forbidden dependencies
- `tokio`, `async-std`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-rag`, `kb-tui`, `kb-desktop`. (Streaming reads the `reqwest::blocking::Response` body as line-delimited JSON through its `std::io::Read` implementation; no async runtime needed.) - `tokio`, `async-std`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-rag`, `kebab-tui`, `kebab-desktop`. (Streaming reads the `reqwest::blocking::Response` body as line-delimited JSON through its `std::io::Read` implementation; no async runtime needed.)
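Line-delimited JSON can be consumed without an async runtime because `reqwest::blocking::Response` implements `std::io::Read`. The sketch below assumes only that: `ndjson_lines` is a hypothetical helper that works on any reader, and JSON decoding of each line (for example with `serde_json`) is left out.

```rust
use std::io::{BufRead, BufReader, Read};

/// Lazily yields each non-blank line of a line-delimited body.
/// I/O errors are passed through so the caller can surface them.
pub fn ndjson_lines<R: Read>(body: R) -> impl Iterator<Item = std::io::Result<String>> {
    BufReader::new(body)
        .lines()
        .filter(|line| !matches!(line, Ok(l) if l.trim().is_empty()))
}
```

Since the iterator is lazy, each line can be decoded and forwarded as a token as soon as the server flushes it, rather than after the whole response has arrived.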
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `kb-config::Config.models.llm` | endpoint, model, context, temperature, seed | runtime | | `kebab-config::Config.models.llm` | endpoint, model, context, temperature, seed | runtime |
| `GenerateRequest` | `kb_core::GenerateRequest` | RAG pipeline | | `GenerateRequest` | `kebab_core::GenerateRequest` | RAG pipeline |
| Ollama HTTP server (local) | `http://127.0.0.1:11434` | external process | | Ollama HTTP server (local) | `http://127.0.0.1:11434` | external process |
## Outputs ## Outputs
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| streaming `TokenChunk` iterator | per §7.2 | `kb-rag` | | streaming `TokenChunk` iterator | per §7.2 | `kebab-rag` |
| `ModelRef` | `{ id, provider="ollama", dimensions=None }` | `Answer.model` | | `ModelRef` | `{ id, provider="ollama", dimensions=None }` | `Answer.model` |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
@@ -55,14 +55,14 @@ First real LM. Required for `kb ask` to function. Isolated from RAG pipeline so
pub struct OllamaLanguageModel { /* internal: reqwest::blocking::Client + config */ } pub struct OllamaLanguageModel { /* internal: reqwest::blocking::Client + config */ }
impl OllamaLanguageModel { impl OllamaLanguageModel {
pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>; pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
} }
impl kb_core::LanguageModel for OllamaLanguageModel { impl kebab_core::LanguageModel for OllamaLanguageModel {
fn model_ref(&self) -> kb_core::ModelRef; fn model_ref(&self) -> kebab_core::ModelRef;
fn context_tokens(&self) -> usize; fn context_tokens(&self) -> usize;
fn generate_stream(&self, req: kb_core::GenerateRequest) fn generate_stream(&self, req: kebab_core::GenerateRequest)
-> anyhow::Result<Box<dyn Iterator<Item = anyhow::Result<kb_core::TokenChunk>> + Send>>; -> anyhow::Result<Box<dyn Iterator<Item = anyhow::Result<kebab_core::TokenChunk>> + Send>>;
} }
``` ```
@@ -93,7 +93,7 @@ impl kb_core::LanguageModel for OllamaLanguageModel {
- UTF-8 boundary: buffer incomplete byte sequences across stream lines before emitting `TokenChunk::Token`. - UTF-8 boundary: buffer incomplete byte sequences across stream lines before emitting `TokenChunk::Token`.
- Determinism: with `temperature=0` and fixed `seed`, Ollama's output is reproducible (modulo nondeterminism in the model itself); tests that verify determinism use a fixed seed and may rely on aggregate hash with tolerance, NOT byte equality. - Determinism: with `temperature=0` and fixed `seed`, Ollama's output is reproducible (modulo nondeterminism in the model itself); tests that verify determinism use a fixed seed and may rely on aggregate hash with tolerance, NOT byte equality.
- `model_ref().provider = "ollama"`, `dimensions = None`. - `model_ref().provider = "ollama"`, `dimensions = None`.
- Reachability check: `OllamaLanguageModel::new` does NOT eagerly hit the network; first failure surfaces on `generate_stream`. Use `kb doctor` (separate task) to probe. - Reachability check: `OllamaLanguageModel::new` does NOT eagerly hit the network; first failure surfaces on `generate_stream`. Use `kebab doctor` (separate task) to probe.
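The UTF-8 boundary rule in the list above can be illustrated with a small buffer that releases only complete characters. This is a sketch under assumptions: `Utf8Buffer` is a hypothetical name, and replacing irrecoverably invalid bytes with U+FFFD is one possible policy, not something the spec prescribes.

```rust
/// Accumulates raw bytes and releases only complete UTF-8 text.
#[derive(Default)]
pub struct Utf8Buffer {
    pending: Vec<u8>,
}

impl Utf8Buffer {
    /// Appends `bytes` and returns all text that is now complete.
    /// An incomplete trailing sequence stays buffered for the next call.
    /// Bytes that can never be valid are replaced with U+FFFD (assumed policy).
    pub fn push(&mut self, bytes: &[u8]) -> String {
        self.pending.extend_from_slice(bytes);
        let mut out = String::new();
        loop {
            match std::str::from_utf8(&self.pending) {
                Ok(s) => {
                    out.push_str(s);
                    self.pending.clear();
                    return out;
                }
                Err(e) => {
                    let valid = e.valid_up_to();
                    // The prefix up to `valid` is guaranteed to be valid UTF-8.
                    out.push_str(std::str::from_utf8(&self.pending[..valid]).unwrap());
                    match e.error_len() {
                        // Incomplete sequence at the end: keep it for later.
                        None => {
                            self.pending.drain(..valid);
                            return out;
                        }
                        // Invalid bytes: substitute and keep scanning.
                        Some(n) => {
                            out.push('\u{FFFD}');
                            self.pending.drain(..valid + n);
                        }
                    }
                }
            }
        }
    }
}
```

Feeding a two-byte character in two separate calls yields an empty string first and the full character second, so a `TokenChunk::Token` never carries half a code point.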
## Storage / wire effects ## Storage / wire effects
@@ -112,12 +112,12 @@ impl kb_core::LanguageModel for OllamaLanguageModel {
| determinism | identical request + temperature=0 + seed=0 produces identical token stream against mock | mocked HTTP | | determinism | identical request + temperature=0 + seed=0 produces identical token stream against mock | mocked HTTP |
| `#[ignore]` integration | real Ollama on `localhost:11434` with `qwen2.5:14b-instruct` produces non-empty output | requires user opt-in | | `#[ignore]` integration | real Ollama on `localhost:11434` with `qwen2.5:14b-instruct` produces non-empty output | requires user opt-in |
All non-ignored tests under `cargo test -p kb-llm-local`. Real-LM integration runs via `cargo test -p kb-llm-local -- --ignored`. All non-ignored tests under `cargo test -p kebab-llm-local`. Real-LM integration runs via `cargo test -p kebab-llm-local -- --ignored`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-llm-local` passes - [ ] `cargo check -p kebab-llm-local` passes
- [ ] `cargo test -p kb-llm-local` passes (mocked tests; real LM behind `#[ignore]`) - [ ] `cargo test -p kebab-llm-local` passes (mocked tests; real LM behind `#[ignore]`)
- [ ] No async runtime present (uses `reqwest::blocking`) - [ ] No async runtime present (uses `reqwest::blocking`)
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] PR links design §11.2, §0 Q5, §10 - [ ] PR links design §11.2, §0 Q5, §10
@@ -125,7 +125,7 @@ All non-ignored tests under `cargo test -p kb-llm-local`. Real-LM integration ru
## Out of scope ## Out of scope
- llama.cpp / candle adapters (P+). - llama.cpp / candle adapters (P+).
- Embedding via Ollama's `/api/embed` endpoint (alternate adapter inside `kb-embed-local` if requested later). - Embedding via Ollama's `/api/embed` endpoint (alternate adapter inside `kebab-embed-local` if requested later).
- Cancellation / abort tokens (P+). - Cancellation / abort tokens (P+).
- Connection pooling tuning (default `reqwest::blocking` is sufficient for single-user CLI). - Connection pooling tuning (default `reqwest::blocking` is sufficient for single-user CLI).

@@ -1,12 +1,12 @@
--- ---
phase: P4 phase: P4
component: kb-rag component: kebab-rag
task_id: p4-3 task_id: p4-3
title: "RAG pipeline: retrieve → gate → pack → generate → cite-validate" title: "RAG pipeline: retrieve → gate → pack → generate → cite-validate"
status: completed status: completed
depends_on: [p3-4, p4-2] depends_on: [p3-4, p4-2]
unblocks: [p5-1] unblocks: [p5-1]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§0 Q4 refusal (two-layer), §0 Q7 footer, §1.11.4 ask scenes, §2.3 Answer wire, §3.8 internal Answer, §6.4 [rag], §10 errors] contract_sections: [§0 Q4 refusal (two-layer), §0 Q7 footer, §1.11.4 ask scenes, §2.3 Answer wire, §3.8 internal Answer, §6.4 [rag], §10 errors]
--- ---
@@ -22,11 +22,11 @@ This is the user-facing payoff. Splitting it further would couple too many inter
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `kb-search` (Retriever trait object) - `kebab-search` (Retriever trait object)
- `kb-llm` (LanguageModel trait object) - `kebab-llm` (LanguageModel trait object)
- `kb-store-sqlite` (read chunk full text/section + write `answers` row) - `kebab-store-sqlite` (read chunk full text/section + write `answers` row)
- `serde`, `serde_json` - `serde`, `serde_json`
- `regex` (for citation marker extraction) - `regex` (for citation marker extraction)
- `time` - `time`
@@ -35,51 +35,51 @@ This is the user-facing payoff. Splitting it further would couple too many inter
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector` (only via Retriever trait), `kb-embed*` (only via Retriever), `kb-llm-local` (only via LanguageModel trait), `kb-tui`, `kb-desktop` - `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector` (only via Retriever trait), `kebab-embed*` (only via Retriever), `kebab-llm-local` (only via LanguageModel trait), `kebab-tui`, `kebab-desktop`
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `query: &str` | text | `kb-app::ask` | | `query: &str` | text | `kebab-app::ask` |
| `AskOpts` | k, explain, mode, temperature, seed | CLI | | `AskOpts` | k, explain, mode, temperature, seed | CLI |
| `dyn Retriever` | hybrid retriever from p3-4 | runtime injection | | `dyn Retriever` | hybrid retriever from p3-4 | runtime injection |
| `dyn LanguageModel` | from p4-2 (or mock) | runtime injection | | `dyn LanguageModel` | from p4-2 (or mock) | runtime injection |
| `dyn DocumentStore` | for chunk full-text fetch | from p1-6 | | `dyn DocumentStore` | for chunk full-text fetch | from p1-6 |
| `kb-config::Config.rag` | `prompt_template_version`, `score_gate`, `max_context_tokens` | runtime | | `kebab-config::Config.rag` | `prompt_template_version`, `score_gate`, `max_context_tokens` | runtime |
## Outputs ## Outputs
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `Answer` | `kb_core::Answer` | `kb-cli` printer, `answers` table | | `Answer` | `kebab_core::Answer` | `kebab-cli` printer, `answers` table |
| `answers` table row | SQLite | history, eval | | `answers` table row | SQLite | history, eval |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
```rust ```rust
pub struct RagPipeline { pub struct RagPipeline {
retriever: std::sync::Arc<dyn kb_core::Retriever>, retriever: std::sync::Arc<dyn kebab_core::Retriever>,
llm: std::sync::Arc<dyn kb_core::LanguageModel>, llm: std::sync::Arc<dyn kebab_core::LanguageModel>,
docs: std::sync::Arc<kb_store_sqlite::SqliteStore>, docs: std::sync::Arc<kebab_store_sqlite::SqliteStore>,
config: kb_config::Config, config: kebab_config::Config,
} }
impl RagPipeline { impl RagPipeline {
pub fn new( pub fn new(
config: kb_config::Config, config: kebab_config::Config,
retriever: std::sync::Arc<dyn kb_core::Retriever>, retriever: std::sync::Arc<dyn kebab_core::Retriever>,
llm: std::sync::Arc<dyn kb_core::LanguageModel>, llm: std::sync::Arc<dyn kebab_core::LanguageModel>,
docs: std::sync::Arc<kb_store_sqlite::SqliteStore>, docs: std::sync::Arc<kebab_store_sqlite::SqliteStore>,
) -> Self; ) -> Self;
pub fn ask(&self, query: &str, opts: AskOpts) -> anyhow::Result<kb_core::Answer>; pub fn ask(&self, query: &str, opts: AskOpts) -> anyhow::Result<kebab_core::Answer>;
} }
pub struct AskOpts { pub struct AskOpts {
pub k: usize, pub k: usize,
pub explain: bool, pub explain: bool,
pub mode: kb_core::SearchMode, pub mode: kebab_core::SearchMode,
pub temperature: Option<f32>, pub temperature: Option<f32>,
pub seed: Option<u64>, pub seed: Option<u64>,
pub stream_sink: Option<std::sync::mpsc::Sender<String>>, // tty/UI token streaming pub stream_sink: Option<std::sync::mpsc::Sender<String>>, // tty/UI token streaming
@@ -158,12 +158,12 @@ pub struct AskOpts {
| determinism | identical inputs + temperature=0 + seed=0 → identical Answer (snapshot) | mock | | determinism | identical inputs + temperature=0 + seed=0 → identical Answer (snapshot) | mock |
| snapshot | `Answer` JSON for fixed query stable | `fixtures/rag/run-1.json` | | snapshot | `Answer` JSON for fixed query stable | `fixtures/rag/run-1.json` |
All tests under `cargo test -p kb-rag` with no real Ollama (mock LM only). All tests under `cargo test -p kebab-rag` with no real Ollama (mock LM only).
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-rag` passes - [ ] `cargo check -p kebab-rag` passes
- [ ] `cargo test -p kb-rag` passes - [ ] `cargo test -p kebab-rag` passes
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] All paths write an `answers` row - [ ] All paths write an `answers` row
- [ ] Output JSON conforms to `answer.v1` - [ ] Output JSON conforms to `answer.v1`
@@ -178,9 +178,9 @@ All tests under `cargo test -p kb-rag` with no real Ollama (mock LM only).
## Risks / notes ## Risks / notes
- Citation regex is STRICT `\[#(\d{1,3})\]` only. Models that emit `[1]`/`[ #1 ]`/`[foo]` are treated as no-marker → refusal. This is intentional: a noisy citation grammar lets prose `[1]` or `vec![1]` slip through as false positives, which corrupts both `grounded` and `kb eval` `citation_coverage`. The prompt template (`rag-v1`) explicitly instructs `[#번호]`. - Citation regex is STRICT `\[#(\d{1,3})\]` only. Models that emit `[1]`/`[ #1 ]`/`[foo]` are treated as no-marker → refusal. This is intentional: a noisy citation grammar lets prose `[1]` or `vec![1]` slip through as false positives, which corrupts both `grounded` and `kebab eval` `citation_coverage`. The prompt template (`rag-v1`) explicitly instructs `[#번호]`.
- **Post-merge fix (2026-05)**: kb-cli's `Cmd::Ask` arm originally called bare `kb_app::ask(query, opts)`, ignoring `--config <path>` and silently using XDG defaults (manifested as wrong model id / score_gate / data_dir surfacing in `Answer.retrieval`). Fixed by routing through `kb_app::ask_with_config(cfg, query, opts)`. See [HOTFIXES.md](../HOTFIXES.md). - **Post-merge fix (2026-05)**: kebab-cli's `Cmd::Ask` arm originally called bare `kebab_app::ask(query, opts)`, ignoring `--config <path>` and silently using XDG defaults (manifested as wrong model id / score_gate / data_dir surfacing in `Answer.retrieval`). Fixed by routing through `kebab_app::ask_with_config(cfg, query, opts)`. See [HOTFIXES.md](../HOTFIXES.md).
- **Post-merge fix (2026-05)**: `config.rag.score_gate` default `0.05` was incompatible with hybrid RRF `fusion_score` (raw range `(0, 2/k_rrf]` ≈ `≤ 0.033` at default k_rrf=60), refusing every hybrid `kb ask`. Closed by normalizing RRF in p3-4 to `[0, 1]`. See p3-4 spec + HOTFIXES.md. - **Post-merge fix (2026-05)**: `config.rag.score_gate` default `0.05` was incompatible with hybrid RRF `fusion_score` (raw range `(0, 2/k_rrf]` ≈ `≤ 0.033` at default k_rrf=60), refusing every hybrid `kebab ask`. Closed by normalizing RRF in p3-4 to `[0, 1]`. See p3-4 spec + HOTFIXES.md.
- `stream_sink` channel: pipeline `send`s tokens; if the receiver is dropped (caller cancelled), `SendError` is silently swallowed and generation continues to completion (so the `Answer` row still gets persisted). Pipeline does NOT panic on a dead sink. - `stream_sink` channel: pipeline `send`s tokens; if the receiver is dropped (caller cancelled), `SendError` is silently swallowed and generation continues to completion (so the `Answer` row still gets persisted). Pipeline does NOT panic on a dead sink.
- `temperature=0` does not fully eliminate stochasticity in some quantized Ollama models; document this and rely on `must_contain` rule-based metrics in P5 instead of exact match. - `temperature=0` does not fully eliminate stochasticity in some quantized Ollama models; document this and rely on `must_contain` rule-based metrics in P5 instead of exact match.
- Prompt-injection defense lives entirely in the system prompt; do NOT mutate `[근거]` text. If chunk text contains `<|system|>` or similar tokens, do not strip them — they are inert when wrapped. - Prompt-injection defense lives entirely in the system prompt; do NOT mutate `[근거]` text. If chunk text contains `<|system|>` or similar tokens, do not strip them — they are inert when wrapped.
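
Two of the notes above can be illustrated with a small, std-only sketch. It is not the `kebab-rag` implementation: `extract_citations` and `normalize_rrf` are hypothetical names, the marker scanner is a hand-written stand-in for the strict `\[#(\d{1,3})\]` pattern (ASCII digits only, which is an assumption), and the normalization simply divides by the `2/k_rrf` upper bound quoted in the note, assuming two fused lists. The actual p3-4 normalization is not shown in this document.

```rust
/// Collects citation indices from strict `[#N]` markers, where N is 1 to 3
/// ASCII digits. Anything else (`[1]`, `[ #1 ]`, `[foo]`, `[#1234]`) is ignored.
/// Hypothetical helper: a hand-rolled equivalent of `\[#(\d{1,3})\]`.
fn extract_citations(text: &str) -> Vec<u16> {
    let bytes = text.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i + 1 < bytes.len() {
        if bytes[i] == b'[' && bytes[i + 1] == b'#' {
            let start = i + 2;
            let mut end = start;
            // Read up to four digits so that a 4-digit run can be rejected.
            while end < bytes.len() && end - start < 4 && bytes[end].is_ascii_digit() {
                end += 1;
            }
            let digits = end - start;
            if (1..=3).contains(&digits) && end < bytes.len() && bytes[end] == b']' {
                // The slice holds 1 to 3 ASCII digits, so parsing into u16 cannot fail.
                out.push(text[start..end].parse().unwrap());
                i = end + 1;
                continue;
            }
        }
        i += 1;
    }
    out
}

/// Scales a raw two-list RRF score from (0, 2/k_rrf] into [0, 1].
/// Assumption: the maximum raw score is 2/k_rrf, as stated in the note above.
fn normalize_rrf(raw: f64, k_rrf: f64) -> f64 {
    (raw / (2.0 / k_rrf)).clamp(0.0, 1.0)
}
```

Under these assumptions, `extract_citations("vec![1] and [1]")` yields no markers, which is the refusal path the note describes. A raw score of `2.0 / 60.0` normalizes to `1.0`, so a `score_gate` of `0.05` can be cleared, whereas the unnormalized maximum of roughly `0.033` never reaches it.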


@@ -1,12 +1,12 @@
--- ---
phase: P5 phase: P5
component: kb-eval (runner) component: kebab-eval (runner)
task_id: p5-1 task_id: p5-1
title: "Golden query fixture loader + per-query runner" title: "Golden query fixture loader + per-query runner"
status: completed status: completed
depends_on: [p4-3] depends_on: [p4-3]
unblocks: [p5-2] unblocks: [p5-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5.7 eval_runs/eval_query_results, §6.3 runs_dir, phase epic tasks/phase-5-evaluation.md] contract_sections: [§5.7 eval_runs/eval_query_results, §6.3 runs_dir, phase epic tasks/phase-5-evaluation.md]
--- ---
@@ -14,7 +14,7 @@ contract_sections: [§5.7 eval_runs/eval_query_results, §6.3 runs_dir, phase ep
## Goal ## Goal
Load `fixtures/golden_queries.yaml`, run each query through `kb-app` (lexical / vector / hybrid / rag), and persist results into `eval_query_results` + `runs_dir/<run_id>/per_query.jsonl`. Load `fixtures/golden_queries.yaml`, run each query through `kebab-app` (lexical / vector / hybrid / rag), and persist results into `eval_query_results` + `runs_dir/<run_id>/per_query.jsonl`.
## Why now / why this size ## Why now / why this size
@@ -22,10 +22,10 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `kb-app` (calls facade for search / ask) - `kebab-app` (calls facade for search / ask)
- `kb-store-sqlite` (writes eval rows) - `kebab-store-sqlite` (writes eval rows)
- `serde`, `serde_yaml`, `serde_json` - `serde`, `serde_yaml`, `serde_json`
- `time` - `time`
- `tracing` - `tracing`
@@ -33,7 +33,7 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (all reached via `kb-app` facade only), `kb-tui`, `kb-desktop` - `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (all reached via `kebab-app` facade only), `kebab-tui`, `kebab-desktop`
## Inputs ## Inputs
@@ -41,7 +41,7 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
|-------|------|--------| |-------|------|--------|
| `fixtures/golden_queries.yaml` | YAML | repo-shipped | | `fixtures/golden_queries.yaml` | YAML | repo-shipped |
| `EvalRunOpts` | suite, mode, with_rag, k, temperature, seed | CLI | | `EvalRunOpts` | suite, mode, with_rag, k, temperature, seed | CLI |
| `kb-app` facade | search/ask | runtime | | `kebab-app` facade | search/ask | runtime |
## Outputs ## Outputs
@@ -50,7 +50,7 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
| `eval_runs` row | SQLite | p5-2, history | | `eval_runs` row | SQLite | p5-2, history |
| `eval_query_results` rows | SQLite | p5-2 | | `eval_query_results` rows | SQLite | p5-2 |
| `runs_dir/<run_id>/per_query.jsonl` | filesystem | external tools, audits | | `runs_dir/<run_id>/per_query.jsonl` | filesystem | external tools, audits |
| `EvalRun` struct | `kb_eval::EvalRun` | caller | | `EvalRun` struct | `kebab_eval::EvalRun` | caller |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
@@ -58,9 +58,9 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
pub struct GoldenQuery { pub struct GoldenQuery {
pub id: String, pub id: String,
pub query: String, pub query: String,
pub lang: kb_core::Lang, pub lang: kebab_core::Lang,
pub expected_doc_ids: Vec<kb_core::DocumentId>, pub expected_doc_ids: Vec<kebab_core::DocumentId>,
pub expected_chunk_ids: Vec<kb_core::ChunkId>, pub expected_chunk_ids: Vec<kebab_core::ChunkId>,
pub must_contain: Vec<String>, pub must_contain: Vec<String>,
pub forbidden: Vec<String>, pub forbidden: Vec<String>,
pub difficulty: Option<String>, pub difficulty: Option<String>,
@@ -68,7 +68,7 @@ pub struct GoldenQuery {
pub struct EvalRunOpts { pub struct EvalRunOpts {
pub suite: String, // "golden" default pub suite: String, // "golden" default
pub mode: kb_core::SearchMode, pub mode: kebab_core::SearchMode,
pub with_rag: bool, pub with_rag: bool,
pub k: usize, pub k: usize,
pub temperature: Option<f32>, pub temperature: Option<f32>,
@@ -86,9 +86,9 @@ pub struct EvalRun {
pub struct QueryResult { pub struct QueryResult {
pub query_id: String, pub query_id: String,
pub query: String, pub query: String,
pub mode: kb_core::SearchMode, pub mode: kebab_core::SearchMode,
pub hits_top_k: Vec<kb_core::SearchHit>, pub hits_top_k: Vec<kebab_core::SearchHit>,
pub answer: Option<kb_core::Answer>, pub answer: Option<kebab_core::Answer>,
pub elapsed_ms: u32, pub elapsed_ms: u32,
pub error: Option<String>, pub error: Option<String>,
} }
@@ -103,10 +103,10 @@ pub fn run_eval(opts: &EvalRunOpts) -> anyhow::Result<EvalRun>;
- Parses YAML; required fields: `id`, `query`. Optional: everything else (defaults to empty / `None`). - Parses YAML; required fields: `id`, `query`. Optional: everything else (defaults to empty / `None`).
- Validates uniqueness of `id` and that `expected_doc_ids` / `expected_chunk_ids` exist in DB; missing → return error listing the offenders. - Validates uniqueness of `id` and that `expected_doc_ids` / `expected_chunk_ids` exist in DB; missing → return error listing the offenders.
- `run_eval`: - `run_eval`:
- Loads `fixtures/golden_queries.yaml` (path overridable via env `KB_EVAL_GOLDEN`). - Loads `fixtures/golden_queries.yaml` (path overridable via env `KEBAB_EVAL_GOLDEN`).
- Generates `run_id = "run_" + ulid_lower()`. - Generates `run_id = "run_" + ulid_lower()`.
- Captures `config_snapshot_json`: serialized `kb_config::Config` plus `chunker_version`, `embedding_model+version+dims`, `llm.model_id`, `prompt_template_version`, `score_gate`, `rrf_k`, `index_version`. - Captures `config_snapshot_json`: serialized `kebab_config::Config` plus `chunker_version`, `embedding_model+version+dims`, `llm.model_id`, `prompt_template_version`, `score_gate`, `rrf_k`, `index_version`.
- For each query: call `kb_app::search(SearchQuery { mode: opts.mode, k: opts.k, .. })`. If `opts.with_rag`, also call `kb_app::ask(query, AskOpts { mode: opts.mode, k: opts.k, explain: true, temperature: opts.temperature, seed: opts.seed, .. })`. - For each query: call `kebab_app::search(SearchQuery { mode: opts.mode, k: opts.k, .. })`. If `opts.with_rag`, also call `kebab_app::ask(query, AskOpts { mode: opts.mode, k: opts.k, explain: true, temperature: opts.temperature, seed: opts.seed, .. })`.
- Each `QueryResult` measured by elapsed wall-clock (ms). - Each `QueryResult` measured by elapsed wall-clock (ms).
- Errors are caught per-query (do not abort the run). Failed queries record `error: Some(msg)` and `hits_top_k = vec![]`. - Errors are caught per-query (do not abort the run). Failed queries record `error: Some(msg)` and `hits_top_k = vec![]`.
- Determinism: with `temperature=0` and fixed `seed`, two consecutive runs produce byte-identical `per_query.jsonl` for non-RAG queries; RAG queries may differ in negligible token budget telemetry. - Determinism: with `temperature=0` and fixed `seed`, two consecutive runs produce byte-identical `per_query.jsonl` for non-RAG queries; RAG queries may differ in negligible token budget telemetry.
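
The per-query error handling and timing rules above might look roughly like the following sketch. It is illustrative only: `QueryOutcome` and `run_all` are hypothetical stand-ins for `QueryResult` and the loop inside `run_eval`, and the `search` closure replaces the real `kebab_app::search` call so the example stays self-contained.

```rust
use std::time::Instant;

/// Hypothetical, trimmed-down stand-in for `QueryResult`.
struct QueryOutcome {
    query_id: String,
    hits: Vec<String>,
    elapsed_ms: u32,
    error: Option<String>,
}

/// Runs every query, recording failures per query instead of aborting the run.
/// `search` is a placeholder for the facade call; it returns hit ids or an
/// error message.
fn run_all<F>(queries: &[(&str, &str)], mut search: F) -> Vec<QueryOutcome>
where
    F: FnMut(&str) -> Result<Vec<String>, String>,
{
    queries
        .iter()
        .map(|(id, query)| {
            let started = Instant::now();
            let result = search(*query);
            // Wall-clock milliseconds, saturating at u32::MAX.
            let elapsed_ms = started.elapsed().as_millis().min(u32::MAX as u128) as u32;
            // A failed query keeps its slot: empty hits plus the error message.
            let (hits, error) = match result {
                Ok(hits) => (hits, None),
                Err(msg) => (Vec::new(), Some(msg)),
            };
            QueryOutcome { query_id: id.to_string(), hits, elapsed_ms, error }
        })
        .collect()
}
```

Keeping the failed query in the output, rather than dropping it, means downstream metrics can still count it, for example toward an empty-result rate.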
@@ -130,12 +130,12 @@ pub fn run_eval(opts: &EvalRunOpts) -> anyhow::Result<EvalRun>;
| determinism | re-running same suite + fixed seed → identical `per_query.jsonl` (lexical only) | tmp DB, fixed corpus | | determinism | re-running same suite + fixed seed → identical `per_query.jsonl` (lexical only) | tmp DB, fixed corpus |
| snapshot | `EvalRun` (with mock LM for `with_rag`) JSON stable | `fixtures/eval/run-1.json` | | snapshot | `EvalRun` (with mock LM for `with_rag`) JSON stable | `fixtures/eval/run-1.json` |
All tests under `cargo test -p kb-eval runner`. All tests under `cargo test -p kebab-eval runner`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-eval` passes - [ ] `cargo check -p kebab-eval` passes
- [ ] `cargo test -p kb-eval runner` passes - [ ] `cargo test -p kebab-eval runner` passes
- [ ] `fixtures/golden_queries.yaml` template shipped (≥ 5 example entries) - [ ] `fixtures/golden_queries.yaml` template shipped (≥ 5 example entries)
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] PR links design §5.7 - [ ] PR links design §5.7


@@ -1,12 +1,12 @@
--- ---
phase: P5 phase: P5
component: kb-eval (metrics + compare) component: kebab-eval (metrics + compare)
task_id: p5-2 task_id: p5-2
title: "Metrics computation + compare report" title: "Metrics computation + compare report"
status: completed status: completed
depends_on: [p5-1] depends_on: [p5-1]
unblocks: [] unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5.7 eval_runs.aggregate_json, phase epic tasks/phase-5-evaluation.md] contract_sections: [§5.7 eval_runs.aggregate_json, phase epic tasks/phase-5-evaluation.md]
--- ---
@@ -14,7 +14,7 @@ contract_sections: [§5.7 eval_runs.aggregate_json, phase epic tasks/phase-5-eva
## Goal ## Goal
Compute hit@k, MRR, recall@k_doc, citation_coverage, groundedness, empty_result_rate, refusal_correctness from stored `eval_query_results`. Write `aggregate_json` back into `eval_runs`. Provide `kb eval compare a b` that diffs two runs. Compute hit@k, MRR, recall@k_doc, citation_coverage, groundedness, empty_result_rate, refusal_correctness from stored `eval_query_results`. Write `aggregate_json` back into `eval_runs`. Provide `kebab eval compare a b` that diffs two runs.
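
As a rough illustration of two of these metrics, the sketch below uses the textbook definitions of hit@k and reciprocal rank. The spec's exact formulas are not visible in this excerpt, so treat the definitions, the function names (`hit_at_k`, `reciprocal_rank`, `mean`) and the use of plain string ids as assumptions.

```rust
/// hit@k for a single query: 1.0 if any expected id appears among the first
/// `k` ranked results, otherwise 0.0.
fn hit_at_k(ranked: &[&str], expected: &[&str], k: usize) -> f64 {
    let found = ranked.iter().take(k).any(|id| expected.contains(id));
    if found { 1.0 } else { 0.0 }
}

/// Reciprocal rank for a single query: 1 / (1-based rank of the first
/// relevant result), or 0.0 when no expected id was retrieved.
fn reciprocal_rank(ranked: &[&str], expected: &[&str]) -> f64 {
    ranked
        .iter()
        .position(|id| expected.contains(id))
        .map(|pos| 1.0 / (pos as f64 + 1.0))
        .unwrap_or(0.0)
}

/// Mean over per-query values; averaging reciprocal ranks gives MRR.
/// An empty input yields 0.0 rather than NaN.
fn mean(values: &[f64]) -> f64 {
    if values.is_empty() {
        0.0
    } else {
        values.iter().sum::<f64>() / values.len() as f64
    }
}
```

Because these functions depend only on their inputs, calling them twice on the same stored rows gives identical results, which is the property the determinism test in this spec checks for `compute_aggregate`.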
## Why now / why this size ## Why now / why this size
@@ -22,16 +22,16 @@ Metric formulas + comparison logic are pure computation. Splitting them from p5-
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `kb-store-sqlite` (read eval rows, write `aggregate_json`) - `kebab-store-sqlite` (read eval rows, write `aggregate_json`)
- `serde`, `serde_json` - `serde`, `serde_json`
- `tracing` - `tracing`
- `thiserror` - `thiserror`
## Forbidden dependencies ## Forbidden dependencies
- `kb-app`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` - `kebab-app`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs ## Inputs
@@ -46,7 +46,7 @@ Metric formulas + comparison logic are pure computation. Splitting them from p5-
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `eval_runs.aggregate_json` updated | SQLite | history, CI checks | | `eval_runs.aggregate_json` updated | SQLite | history, CI checks |
| `CompareReport` | `kb_eval::CompareReport` | `kb-cli` printer | | `CompareReport` | `kebab_eval::CompareReport` | `kebab-cli` printer |
| optional `runs_dir/<run_id>/report.md` | filesystem | human-readable summary | | optional `runs_dir/<run_id>/report.md` | filesystem | human-readable summary |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
@@ -127,15 +127,15 @@ pub fn render_report_md(report: &CompareReport) -> String;
| determinism | running `compute_aggregate` twice produces identical `AggregateMetrics` | inline | | determinism | running `compute_aggregate` twice produces identical `AggregateMetrics` | inline |
| snapshot | `CompareReport` JSON for a fixed pair of runs stable | `fixtures/eval/compare-1.json` | | snapshot | `CompareReport` JSON for a fixed pair of runs stable | `fixtures/eval/compare-1.json` |
All tests under `cargo test -p kb-eval metrics`. All tests under `cargo test -p kebab-eval metrics`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-eval` passes - [ ] `cargo check -p kebab-eval` passes
- [ ] `cargo test -p kb-eval metrics` passes - [ ] `cargo test -p kebab-eval metrics` passes
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] `eval_runs.aggregate_json` always populated after `store_aggregate` - [ ] `eval_runs.aggregate_json` always populated after `store_aggregate`
- [ ] `kb eval compare` CLI surface integrated via `kb-app` (call `compare_runs` + `render_report_md`) - [ ] `kebab eval compare` CLI surface integrated via `kebab-app` (call `compare_runs` + `render_report_md`)
- [ ] PR links phase epic tasks/phase-5-evaluation.md - [ ] PR links phase epic tasks/phase-5-evaluation.md
## Out of scope ## Out of scope
@@ -172,11 +172,11 @@ same one this spec defines, the names / wiring just differ.
/ `compare_runs(a, b)` so integration tests can drive the pipeline / `compare_runs(a, b)` so integration tests can drive the pipeline
against a TempDir-backed `Config`. The no-arg forms wrap them with against a TempDir-backed `Config`. The no-arg forms wrap them with
`Config::load(None)`. `Config::load(None)`.
- **CLI surface lives on `kb-cli` directly, not via `kb-app`.** DoD - **CLI surface lives on `kebab-cli` directly, not via `kebab-app`.** DoD
asks for `kb eval compare` to be reached "via kb-app", but `kb-app` asks for `kebab eval compare` to be reached "via kebab-app", but `kebab-app`
already depends on `kb-eval` (the P5-1 runner uses the App facade), already depends on `kebab-eval` (the P5-1 runner uses the App facade),
so routing the CLI through `kb-app` would form a cycle. `kb-cli` so routing the CLI through `kebab-app` would form a cycle. `kebab-cli`
`kb-eval` is wired directly; `kb-app` is unchanged. `kebab-eval` is wired directly; `kebab-app` is unchanged.
- **`AggregateMetrics` is `Serialize + Deserialize`.** The spec defines - **`AggregateMetrics` is `Serialize + Deserialize`.** The spec defines
only the field shape; we add `Deserialize` so the stored only the field shape; we add `Deserialize` so the stored
`aggregate_json` can round-trip back into the type for follow-up `aggregate_json` can round-trip back into the type for follow-up
@@ -184,12 +184,12 @@ same one this spec defines, the names / wiring just differ.
- **`anyhow`** is used in `Result` returns since the rest of the - **`anyhow`** is used in `Result` returns since the rest of the
workspace already speaks anyhow; not in the spec's Allowed list but workspace already speaks anyhow; not in the spec's Allowed list but
matches every other crate. matches every other crate.
- **`kb-eval` crate-level `kb-app` dep stays.** The crate already - **`kebab-eval` crate-level `kebab-app` dep stays.** The crate already
depends on `kb-app` from P5-1 (the runner uses the `App` facade), so depends on `kebab-app` from P5-1 (the runner uses the `App` facade), so
the Cargo.toml entry remains. The new modules (`metrics.rs`, the Cargo.toml entry remains. The new modules (`metrics.rs`,
`compare.rs`) do not import `kb-app` themselves — they're behind the `compare.rs`) do not import `kebab-app` themselves — they're behind the
same crate boundary as the runner, but the metric/compare *surface* same crate boundary as the runner, but the metric/compare *surface*
is `kb-app`-clean. Splitting the crate to avoid a transitive Cargo is `kebab-app`-clean. Splitting the crate to avoid a transitive Cargo
edge would be churn for no behavior gain. edge would be churn for no behavior gain.
- **`citation_coverage` is intentionally weaker than the spec literal.** - **`citation_coverage` is intentionally weaker than the spec literal.**
Spec calls for "every citation resolves to a real chunk in the DB". Spec calls for "every citation resolves to a real chunk in the DB".


@@ -1,12 +1,12 @@
--- ---
phase: P6 phase: P6
component: kb-parse-image (image extractor + EXIF) component: kebab-parse-image (image extractor + EXIF)
task_id: p6-1 task_id: p6-1
title: "Image Extractor producing single-block CanonicalDocument + EXIF metadata" title: "Image Extractor producing single-block CanonicalDocument + EXIF metadata"
status: planned status: planned
depends_on: [p0-1, p1-6] depends_on: [p0-1, p1-6]
unblocks: [p6-2, p6-3] unblocks: [p6-2, p6-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 Block::ImageRef + ImageRefBlock, §3.7a OcrText/ModelCaption stubs, §9.1 image extraction policy, §9 versioning] contract_sections: [§3.4 Block::ImageRef + ImageRefBlock, §3.7a OcrText/ModelCaption stubs, §9.1 image extraction policy, §9 versioning]
--- ---
@@ -22,8 +22,8 @@ Establishes the image-as-document contract and decouples extraction (asset → I
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `image = "0.25"` (decoding for size + format detect) - `image = "0.25"` (decoding for size + format detect)
- `kamadak-exif` for EXIF - `kamadak-exif` for EXIF
- `serde`, `serde_json` - `serde`, `serde_json`
@@ -33,31 +33,31 @@ Establishes the image-as-document contract and decouples extraction (asset → I
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, OCR libs, LLM libs - `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, OCR libs, LLM libs
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `RawAsset` | `kb_core::RawAsset` | from `kb-source-fs` | | `RawAsset` | `kebab_core::RawAsset` | from `kebab-source-fs` |
| image bytes | `&[u8]` | filesystem | | image bytes | `&[u8]` | filesystem |
| `parser_version` | `kb_core::ParserVersion` | constant in this crate (`"image-meta-v1"`) | | `parser_version` | `kebab_core::ParserVersion` | constant in this crate (`"image-meta-v1"`) |
## Outputs ## Outputs
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk` (image-region chunker) → `kb-store-sqlite` | | `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk` (image-region chunker) → `kebab-store-sqlite` |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
```rust ```rust
pub struct ImageExtractor; pub struct ImageExtractor;
impl kb_core::Extractor for ImageExtractor { impl kebab_core::Extractor for ImageExtractor {
fn supports(&self, m: &kb_core::MediaType) -> bool { matches!(m, kb_core::MediaType::Image(_)) } fn supports(&self, m: &kebab_core::MediaType) -> bool { matches!(m, kebab_core::MediaType::Image(_)) }
fn parser_version(&self) -> kb_core::ParserVersion { kb_core::ParserVersion("image-meta-v1".into()) } fn parser_version(&self) -> kebab_core::ParserVersion { kebab_core::ParserVersion("image-meta-v1".into()) }
fn extract(&self, ctx: &kb_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kb_core::CanonicalDocument>; fn extract(&self, ctx: &kebab_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kebab_core::CanonicalDocument>;
} }
``` ```
@@ -69,7 +69,7 @@ impl kb_core::Extractor for ImageExtractor {
- `metadata.source_type = SourceType::Reference` (per design enum); `trust_level = TrustLevel::Primary`; `tags`/`aliases` empty. - `metadata.source_type = SourceType::Reference` (per design enum); `trust_level = TrustLevel::Primary`; `tags`/`aliases` empty.
- `metadata.user["exif"]` = JSON object with whitelisted EXIF tags (DateTimeOriginal, GPS lat/lon, Make, Model, Orientation, Software). Missing tags omitted. - `metadata.user["exif"]` = JSON object with whitelisted EXIF tags (DateTimeOriginal, GPS lat/lon, Make, Model, Orientation, Software). Missing tags omitted.
- `metadata.user["dimensions"] = { "w": <u32>, "h": <u32>, "format": "<png|jpeg|...>" }`. - `metadata.user["dimensions"] = { "w": <u32>, "h": <u32>, "format": "<png|jpeg|...>" }`.
- `provenance` includes `Discovered`, `Parsed` events (no Normalized — ID assignment happens here directly per §3.4 stub from p1-4 logic, OR pipe through `kb-normalize` if available; this task's choice: emit a fully formed CanonicalDocument with deterministic IDs by calling `kb_core::id_for_doc` and `kb_core::id_for_block` directly). - `provenance` includes `Discovered`, `Parsed` events (no Normalized — ID assignment happens here directly per §3.4 stub from p1-4 logic, OR pipe through `kebab-normalize` if available; this task's choice: emit a fully formed CanonicalDocument with deterministic IDs by calling `kebab_core::id_for_doc` and `kebab_core::id_for_block` directly).
- Failure modes: - Failure modes:
- Truncated/corrupt image → still emits a CanonicalDocument with `dimensions = null`, EXIF empty, `Provenance` warning event with the decoder error message. - Truncated/corrupt image → still emits a CanonicalDocument with `dimensions = null`, EXIF empty, `Provenance` warning event with the decoder error message.
- Unsupported format → `anyhow::Error` (caller skips). - Unsupported format → `anyhow::Error` (caller skips).
@@ -77,7 +77,7 @@ impl kb_core::Extractor for ImageExtractor {
## Storage / wire effects ## Storage / wire effects
- None directly (the caller persists via `kb-store-sqlite`). - None directly (the caller persists via `kebab-store-sqlite`).
## Test plan ## Test plan
@@ -90,12 +90,12 @@ impl kb_core::Extractor for ImageExtractor {
| determinism | identical bytes → identical `doc_id`, `block_id` across two runs | inline | | determinism | identical bytes → identical `doc_id`, `block_id` across two runs | inline |
| snapshot | `CanonicalDocument` JSON stable for fixture | `fixtures/image/red-100x50.png` | | snapshot | `CanonicalDocument` JSON stable for fixture | `fixtures/image/red-100x50.png` |
All tests under `cargo test -p kb-parse-image`. All tests under `cargo test -p kebab-parse-image`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-parse-image` passes - [ ] `cargo check -p kebab-parse-image` passes
- [ ] `cargo test -p kb-parse-image` passes - [ ] `cargo test -p kebab-parse-image` passes
- [ ] No OCR/caption/embedding code present - [ ] No OCR/caption/embedding code present
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.4, §9.1 - [ ] PR links design §3.4, §9.1


@@ -1,12 +1,12 @@
--- ---
phase: P6 phase: P6
component: kb-parse-image (OCR adapter) component: kebab-parse-image (OCR adapter)
task_id: p6-2 task_id: p6-2
title: "OcrEngine trait + Tesseract adapter (Apple Vision feature-gated)" title: "OcrEngine trait + Tesseract adapter (Apple Vision feature-gated)"
status: planned status: planned
depends_on: [p6-1] depends_on: [p6-1]
unblocks: [p6-3] unblocks: [p6-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 ImageRefBlock.ocr, §3.7a OcrText/OcrRegion, §9.1 OCR vs caption provenance] contract_sections: [§3.4 ImageRefBlock.ocr, §3.7a OcrText/OcrRegion, §9.1 OCR vs caption provenance]
--- ---
@@ -22,9 +22,9 @@ Strict separation of OCR (observed text) from caption (model-generated). Confini
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `kb-parse-image` (consumes its types) - `kebab-parse-image` (consumes its types)
- `tesseract = "0.13"` (feature `tesseract`, default ON) - `tesseract = "0.13"` (feature `tesseract`, default ON)
- For feature `apple-vision`: `std::process::Command` only (sidecar binary, not a Rust dep) - For feature `apple-vision`: `std::process::Command` only (sidecar binary, not a Rust dep)
- `serde`, `serde_json` - `serde`, `serde_json`
@@ -34,21 +34,21 @@ Strict separation of OCR (observed text) from caption (model-generated). Confini
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` - `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| image bytes | `&[u8]` | from extractor | | image bytes | `&[u8]` | from extractor |
| optional language hint | `kb_core::Lang` | metadata | | optional language hint | `kebab_core::Lang` | metadata |
| `kb-config` OCR settings | engine name, languages | runtime | | `kebab-config` OCR settings | engine name, languages | runtime |
## Outputs ## Outputs
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `OcrText` | `kb_core::OcrText` | merged into `ImageRefBlock.ocr` | | `OcrText` | `kebab_core::OcrText` | merged into `ImageRefBlock.ocr` |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
@@ -56,11 +56,11 @@ Strict separation of OCR (observed text) from caption (model-generated). Confini
pub trait OcrEngine: Send + Sync { pub trait OcrEngine: Send + Sync {
fn engine_name(&self) -> &'static str; fn engine_name(&self) -> &'static str;
fn engine_version(&self) -> String; fn engine_version(&self) -> String;
fn recognize(&self, image_bytes: &[u8], lang_hint: Option<&kb_core::Lang>) -> anyhow::Result<kb_core::OcrText>; fn recognize(&self, image_bytes: &[u8], lang_hint: Option<&kebab_core::Lang>) -> anyhow::Result<kebab_core::OcrText>;
} }
pub struct TesseractOcr { /* internal: lazy api handle */ } pub struct TesseractOcr { /* internal: lazy api handle */ }
impl TesseractOcr { pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>; } impl TesseractOcr { pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>; }
impl OcrEngine for TesseractOcr { /* per trait */ } impl OcrEngine for TesseractOcr { /* per trait */ }
#[cfg(feature = "apple-vision")] #[cfg(feature = "apple-vision")]
@@ -71,8 +71,8 @@ impl OcrEngine for AppleVisionOcr { /* per trait */ }
pub fn apply_ocr( pub fn apply_ocr(
engine: &dyn OcrEngine, engine: &dyn OcrEngine,
image_bytes: &[u8], image_bytes: &[u8],
block: &mut kb_core::ImageRefBlock, block: &mut kebab_core::ImageRefBlock,
lang_hint: Option<&kb_core::Lang>, lang_hint: Option<&kebab_core::Lang>,
) -> anyhow::Result<()>; ) -> anyhow::Result<()>;
``` ```
@@ -85,7 +85,7 @@ pub fn apply_ocr(
- `joined` = `regions.iter().map(|r| r.text).join(" ")` (no smart layout reconstruction in v1). - `joined` = `regions.iter().map(|r| r.text).join(" ")` (no smart layout reconstruction in v1).
- `engine = "tesseract"`, `engine_version = <tesseract version string>`. The `tesseract` crate (0.13+) does NOT expose a stable Rust `version()` accessor. Use one of: (a) call libtesseract's `TessVersion()` via the bundled FFI surface, OR (b) at adapter construction, shell-out `tesseract --version` once and cache the parsed `"5.3.4"`-style string. Both are deterministic for a fixed install. Pin the chosen approach in the implementation PR. - `engine = "tesseract"`, `engine_version = <tesseract version string>`. The `tesseract` crate (0.13+) does NOT expose a stable Rust `version()` accessor. Use one of: (a) call libtesseract's `TessVersion()` via the bundled FFI surface, OR (b) at adapter construction, shell-out `tesseract --version` once and cache the parsed `"5.3.4"`-style string. Both are deterministic for a fixed install. Pin the chosen approach in the implementation PR.
- Apple Vision sidecar (feature `apple-vision`): - Apple Vision sidecar (feature `apple-vision`):
- Spawn a small Swift binary `kb-vision-ocr` (path from `config.ocr.apple_vision_binary`) feeding the image via stdin and reading JSON `{ regions: [{x,y,w,h,text,confidence}, ...] }` from stdout. - Spawn a small Swift binary `kebab-vision-ocr` (path from `config.ocr.apple_vision_binary`) feeding the image via stdin and reading JSON `{ regions: [{x,y,w,h,text,confidence}, ...] }` from stdout.
- Same threshold and `joined` rules as Tesseract. `engine = "apple-vision"`, `engine_version = sidecar's --version`. - Same threshold and `joined` rules as Tesseract. `engine = "apple-vision"`, `engine_version = sidecar's --version`.
- This subagent task does NOT write the Swift sidecar; it only wires the Rust side. Document the expected sidecar interface in `docs/spec/sidecar-vision.md` (separate doc spec stub, optional). - This subagent task does NOT write the Swift sidecar; it only wires the Rust side. Document the expected sidecar interface in `docs/spec/sidecar-vision.md` (separate doc spec stub, optional).
- `apply_ocr` calls `engine.recognize`, sets `block.ocr = Some(text)`, and appends a `Provenance::OcrApplied` event in the caller's CanonicalDocument (caller responsibility — this task exposes a helper). - `apply_ocr` calls `engine.recognize`, sets `block.ocr = Some(text)`, and appends a `Provenance::OcrApplied` event in the caller's CanonicalDocument (caller responsibility — this task exposes a helper).
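
Option (b) above can be sketched as follows. This is a minimal sketch, not the adapter's actual code: the function names `parse_tesseract_version` and `detect_tesseract_version` are hypothetical, and it assumes the first line of `tesseract --version` looks like `tesseract 5.3.4` (some builds prefix the number with `v`, and older builds print to stderr).

```rust
use std::process::Command;

/// Hypothetical helper: extracts a version such as "5.3.4" from the first
/// line of `tesseract --version` output. Returns None if the line does not
/// start with "tesseract" followed by a dotted number.
pub fn parse_tesseract_version(output: &str) -> Option<String> {
    let first = output.lines().next()?.trim();
    let rest = first.strip_prefix("tesseract")?.trim_start();
    // Some builds print "tesseract v5.0.0..."; tolerate the leading 'v'.
    let rest = rest.strip_prefix('v').unwrap_or(rest);
    let version: String = rest
        .chars()
        .take_while(|c| c.is_ascii_digit() || *c == '.')
        .collect();
    let version = version.trim_end_matches('.').to_string();
    if version.is_empty() {
        None
    } else {
        Some(version)
    }
}

/// Hypothetical helper: shells out once; the caller is expected to cache the
/// result at adapter construction. Checks stdout first, then stderr.
pub fn detect_tesseract_version() -> Option<String> {
    let out = Command::new("tesseract").arg("--version").output().ok()?;
    let stdout = String::from_utf8_lossy(&out.stdout);
    let stderr = String::from_utf8_lossy(&out.stderr);
    parse_tesseract_version(&stdout).or_else(|| parse_tesseract_version(&stderr))
}
```

Keeping the parser separate from the process call lets the unit tests cover the string handling without a Tesseract install.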
@@ -109,12 +109,12 @@ pub fn apply_ocr(
| determinism | two runs of recognize on same input → identical OcrText | fixture | | determinism | two runs of recognize on same input → identical OcrText | fixture |
| `#[cfg(feature = "apple-vision")]` smoke | sidecar invocation captured (mock binary echoes fixed JSON) | inline mock | | `#[cfg(feature = "apple-vision")]` smoke | sidecar invocation captured (mock binary echoes fixed JSON) | inline mock |
All tests under `cargo test -p kb-parse-image ocr`. Tesseract install required on CI host. All tests under `cargo test -p kebab-parse-image ocr`. Tesseract install required on CI host.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-parse-image --features tesseract` passes - [ ] `cargo check -p kebab-parse-image --features tesseract` passes
- [ ] `cargo test -p kb-parse-image ocr` passes - [ ] `cargo test -p kebab-parse-image ocr` passes
- [ ] `apple-vision` feature compiles on macOS and gracefully no-ops on Linux - [ ] `apple-vision` feature compiles on macOS and gracefully no-ops on Linux
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.4, §3.7a, §9.1 - [ ] PR links design §3.4, §3.7a, §9.1
@@ -129,5 +129,5 @@ All tests under `cargo test -p kb-parse-image ocr`. Tesseract install required o
## Risks / notes ## Risks / notes
- Tesseract performance varies wildly with image quality; document `min_confidence` and default page-segmentation mode. - Tesseract performance varies wildly with image quality; document `min_confidence` and default page-segmentation mode.
- Apple Vision sidecar requires code signing for distribution; for v1 dev builds, accept unsigned binary from `~/.local/bin/kb-vision-ocr`. - Apple Vision sidecar requires code signing for distribution; for v1 dev builds, accept unsigned binary from `~/.local/bin/kebab-vision-ocr`.
- Large image downscale loses small-text recognition; expose `config.ocr.max_pixels` so power users can tune. - Large image downscale loses small-text recognition; expose `config.ocr.max_pixels` so power users can tune.


@@ -1,12 +1,12 @@
--- ---
phase: P6 phase: P6
component: kb-parse-image (caption adapter) component: kebab-parse-image (caption adapter)
task_id: p6-3 task_id: p6-3
title: "ModelCaption adapter (LanguageModel-driven, feature-gated)" title: "ModelCaption adapter (LanguageModel-driven, feature-gated)"
status: planned status: planned
depends_on: [p6-1, p4-2] depends_on: [p6-1, p4-2]
unblocks: [] unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 ImageRefBlock.caption, §3.7a ModelCaption, §9.1 caption (model-generated, low trust)] contract_sections: [§3.4 ImageRefBlock.caption, §3.7a ModelCaption, §9.1 caption (model-generated, low trust)]
--- ---
@@ -22,10 +22,10 @@ Captioning closes the multimodal loop. Strict separation from OCR keeps trust le
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `kb-parse-image` - `kebab-parse-image`
- `kb-llm` (LanguageModel trait) - `kebab-llm` (LanguageModel trait)
- `base64` - `base64`
- `serde`, `serde_json` - `serde`, `serde_json`
- `image` (resize for prompt cost control) - `image` (resize for prompt cost control)
@@ -34,7 +34,7 @@ Captioning closes the multimodal loop. Strict separation from OCR keeps trust le
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-rag`, `kb-llm-local` (only via trait), `kb-tui`, `kb-desktop` - `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-rag`, `kebab-llm-local` (only via trait), `kebab-tui`, `kebab-desktop`
## Inputs ## Inputs
@@ -42,28 +42,28 @@ Captioning closes the multimodal loop. Strict separation from OCR keeps trust le
|-------|------|--------| |-------|------|--------|
| image bytes | `&[u8]` | extractor | | image bytes | `&[u8]` | extractor |
| `dyn LanguageModel` (vision-capable) | runtime | injected | | `dyn LanguageModel` (vision-capable) | runtime | injected |
| `kb-config.image.caption` | `{ enabled, max_pixels, prompt_template_version }` | runtime | | `kebab-config.image.caption` | `{ enabled, max_pixels, prompt_template_version }` | runtime |
## Outputs ## Outputs
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `ModelCaption` | `kb_core::ModelCaption` | merged into `ImageRefBlock.caption` | | `ModelCaption` | `kebab_core::ModelCaption` | merged into `ImageRefBlock.caption` |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
```rust ```rust
pub fn caption_image( pub fn caption_image(
llm: &dyn kb_core::LanguageModel, llm: &dyn kebab_core::LanguageModel,
image_bytes: &[u8], image_bytes: &[u8],
cfg: &kb_config::Config, cfg: &kebab_config::Config,
) -> anyhow::Result<kb_core::ModelCaption>; ) -> anyhow::Result<kebab_core::ModelCaption>;
pub fn apply_caption( pub fn apply_caption(
llm: &dyn kb_core::LanguageModel, llm: &dyn kebab_core::LanguageModel,
image_bytes: &[u8], image_bytes: &[u8],
block: &mut kb_core::ImageRefBlock, block: &mut kebab_core::ImageRefBlock,
cfg: &kb_config::Config, cfg: &kebab_config::Config,
) -> anyhow::Result<()>; ) -> anyhow::Result<()>;
``` ```
@@ -86,7 +86,7 @@ pub fn apply_caption(
## Storage / wire effects ## Storage / wire effects
- None directly. Caller persists via `kb-store-sqlite`. - None directly. Caller persists via `kebab-store-sqlite`.
## Test plan ## Test plan
@@ -99,12 +99,12 @@ pub fn apply_caption(
| unit | downscale honors `max_pixels` (resulting bytes < some threshold) | fixture large image | | unit | downscale honors `max_pixels` (resulting bytes < some threshold) | fixture large image |
| determinism | identical input + temperature=0 + seed=0 → identical caption (mock) | inline | | determinism | identical input + temperature=0 + seed=0 → identical caption (mock) | inline |
All tests under `cargo test -p kb-parse-image caption` with mock LM only. All tests under `cargo test -p kebab-parse-image caption` with mock LM only.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-parse-image --features caption` passes - [ ] `cargo check -p kebab-parse-image --features caption` passes
- [ ] `cargo test -p kb-parse-image caption` passes - [ ] `cargo test -p kebab-parse-image caption` passes
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] Feature default OFF; only on when user opts in via config - [ ] Feature default OFF; only on when user opts in via config
- [ ] PR links design §3.4 ImageRefBlock.caption, §9.1 - [ ] PR links design §3.4 ImageRefBlock.caption, §9.1


@@ -1,12 +1,12 @@
--- ---
phase: P7 phase: P7
component: kb-parse-pdf (text extractor) component: kebab-parse-pdf (text extractor)
task_id: p7-1 task_id: p7-1
title: "Text PDF extractor → CanonicalDocument with page-level blocks" title: "Text PDF extractor → CanonicalDocument with page-level blocks"
status: planned status: planned
depends_on: [p0-1, p1-6] depends_on: [p0-1, p1-6]
unblocks: [p7-2] unblocks: [p7-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 SourceSpan::Page, §3.4 Block::Paragraph, §9.2 PDF text extraction, §9 versioning] contract_sections: [§3.4 SourceSpan::Page, §3.4 Block::Paragraph, §9.2 PDF text extraction, §9 versioning]
--- ---
@@ -22,8 +22,8 @@ Strict scope: page text + page numbers. Layout reconstruction (multi-column merg
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `pdf-extract = "0.7"` (or current stable) - `pdf-extract = "0.7"` (or current stable)
- `lopdf = "0.32"` for page metadata (count, optional title from /Info) - `lopdf = "0.32"` for page metadata (count, optional title from /Info)
- `serde`, `serde_json` - `serde`, `serde_json`
@@ -33,30 +33,30 @@ Strict scope: page text + page numbers. Layout reconstruction (multi-column merg
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, OCR libraries (OCR fallback is a separate task, not this one) - `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, OCR libraries (OCR fallback is a separate task, not this one)
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `RawAsset` | `kb_core::RawAsset` | `kb-source-fs` | | `RawAsset` | `kebab_core::RawAsset` | `kebab-source-fs` |
| PDF bytes | `&[u8]` | filesystem | | PDF bytes | `&[u8]` | filesystem |
## Outputs ## Outputs
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk` (`pdf-page-v1` chunker in p7-2) | | `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk` (`pdf-page-v1` chunker in p7-2) |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
```rust ```rust
pub struct PdfTextExtractor; pub struct PdfTextExtractor;
impl kb_core::Extractor for PdfTextExtractor { impl kebab_core::Extractor for PdfTextExtractor {
fn supports(&self, m: &kb_core::MediaType) -> bool { matches!(m, kb_core::MediaType::Pdf) } fn supports(&self, m: &kebab_core::MediaType) -> bool { matches!(m, kebab_core::MediaType::Pdf) }
fn parser_version(&self) -> kb_core::ParserVersion { kb_core::ParserVersion("pdf-text-v1".into()) } fn parser_version(&self) -> kebab_core::ParserVersion { kebab_core::ParserVersion("pdf-text-v1".into()) }
fn extract(&self, ctx: &kb_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kb_core::CanonicalDocument>; fn extract(&self, ctx: &kebab_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kebab_core::CanonicalDocument>;
} }
``` ```
@@ -100,12 +100,12 @@ impl kb_core::Extractor for PdfTextExtractor {
| determinism | identical bytes → identical CanonicalDocument JSON across two runs | inline | | determinism | identical bytes → identical CanonicalDocument JSON across two runs | inline |
| snapshot | CanonicalDocument JSON for fixture stable | `fixtures/pdf/three-page-en.pdf` | | snapshot | CanonicalDocument JSON for fixture stable | `fixtures/pdf/three-page-en.pdf` |
All tests under `cargo test -p kb-parse-pdf`. All tests under `cargo test -p kebab-parse-pdf`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-parse-pdf` passes - [ ] `cargo check -p kebab-parse-pdf` passes
- [ ] `cargo test -p kb-parse-pdf` passes - [ ] `cargo test -p kebab-parse-pdf` passes
- [ ] No OCR / LLM code present - [ ] No OCR / LLM code present
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.4 SourceSpan::Page, §9.2 - [ ] PR links design §3.4 SourceSpan::Page, §9.2


@@ -1,12 +1,12 @@
--- ---
phase: P7 phase: P7
component: kb-chunk (pdf-page-v1) component: kebab-chunk (pdf-page-v1)
task_id: p7-2 task_id: p7-2
title: "PDF page-aware chunker (pdf-page-v1)" title: "PDF page-aware chunker (pdf-page-v1)"
status: planned status: planned
depends_on: [p7-1] depends_on: [p7-1]
unblocks: [] unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.5 Chunk, §4.2 chunk_id recipe, §0 Q3 citation, §9 versioning] contract_sections: [§3.5 Chunk, §4.2 chunk_id recipe, §0 Q3 citation, §9 versioning]
--- ---
@@ -22,8 +22,8 @@ Per-medium chunkers must stay tiny and obvious. Page-aware logic is small but it
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `serde`, `serde_json` - `serde`, `serde_json`
- `blake3` (policy_hash) - `blake3` (policy_hash)
- `serde-json-canonicalizer` - `serde-json-canonicalizer`
@@ -31,30 +31,30 @@ Per-medium chunkers must stay tiny and obvious. Page-aware logic is small but it
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-parse-pdf` (consumes `CanonicalDocument` via `kb-core` only), `kb-normalize`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` - `kebab-source-fs`, `kebab-parse-md`, `kebab-parse-pdf` (consumes `CanonicalDocument` via `kebab-core` only), `kebab-normalize`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `CanonicalDocument` (produced by `pdf-text-v1`) | `kb_core::CanonicalDocument` | p7-1 | | `CanonicalDocument` (produced by `pdf-text-v1`) | `kebab_core::CanonicalDocument` | p7-1 |
| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` | | `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` |
## Outputs ## Outputs
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `Vec<Chunk>` | `kb_core::Chunk` | `kb-store-sqlite`, `kb-embed*` | | `Vec<Chunk>` | `kebab_core::Chunk` | `kebab-store-sqlite`, `kebab-embed*` |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
```rust ```rust
pub struct PdfPageV1Chunker; pub struct PdfPageV1Chunker;
impl kb_core::Chunker for PdfPageV1Chunker { impl kebab_core::Chunker for PdfPageV1Chunker {
fn chunker_version(&self) -> kb_core::ChunkerVersion { kb_core::ChunkerVersion("pdf-page-v1".into()) } fn chunker_version(&self) -> kebab_core::ChunkerVersion { kebab_core::ChunkerVersion("pdf-page-v1".into()) }
fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String; fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String;
fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result<Vec<kb_core::Chunk>>; fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result<Vec<kebab_core::Chunk>>;
} }
``` ```
@@ -62,7 +62,7 @@ impl kb_core::Chunker for PdfPageV1Chunker {
## Behavior contract ## Behavior contract
- Only operates on documents whose blocks all carry `SourceSpan::Page` (i.e., from `kb-parse-pdf`). Other documents → return `anyhow::Error("PdfPageV1Chunker only handles PDF docs")`. - Only operates on documents whose blocks all carry `SourceSpan::Page` (i.e., from `kebab-parse-pdf`). Other documents → return `anyhow::Error("PdfPageV1Chunker only handles PDF docs")`.
- For each page block (1 block per page after p7-1): - For each page block (1 block per page after p7-1):
- If `text.len()` (byte estimate) ≤ `policy.target_tokens * 4` (proxy for tokens) → emit one chunk for the entire page. - If `text.len()` (byte estimate) ≤ `policy.target_tokens * 4` (proxy for tokens) → emit one chunk for the entire page.
- Else → split by paragraphs (split text on `\n\n` or sentence-ending punctuation followed by whitespace) and group adjacent paragraphs until the running byte total approaches `policy.target_tokens * 4`. Apply `policy.overlap_tokens * 4` bytes of trailing overlap into the next chunk's prefix. - Else → split by paragraphs (split text on `\n\n` or sentence-ending punctuation followed by whitespace) and group adjacent paragraphs until the running byte total approaches `policy.target_tokens * 4`. Apply `policy.overlap_tokens * 4` bytes of trailing overlap into the next chunk's prefix.
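
The two branches above can be sketched as a small grouping function. This is a rough sketch under stated assumptions: `group_page_text` and `trailing_bytes` are hypothetical names, only the `\n\n` paragraph split is shown (the sentence-punctuation split is omitted), and "approaches the budget" is read here as "close the chunk before it would exceed the budget".

```rust
/// Hypothetical helper: groups one page's text into chunk texts.
/// `target_tokens * 4` and `overlap_tokens * 4` are the byte proxies
/// named in the behavior contract.
pub fn group_page_text(text: &str, target_tokens: usize, overlap_tokens: usize) -> Vec<String> {
    let budget = target_tokens * 4;
    let overlap = overlap_tokens * 4;
    // Small page: one chunk for the entire page.
    if text.len() <= budget {
        return vec![text.to_string()];
    }
    let mut chunks: Vec<String> = Vec::new();
    let mut current = String::new();
    for para in text.split("\n\n").map(str::trim).filter(|p| !p.is_empty()) {
        // Close the running chunk when this paragraph would push it past the budget.
        if !current.is_empty() && current.len() + 2 + para.len() > budget {
            // Carry the trailing overlap bytes into the next chunk's prefix.
            let tail = trailing_bytes(&current, overlap).to_string();
            chunks.push(std::mem::replace(&mut current, tail));
        }
        if !current.is_empty() {
            current.push_str("\n\n");
        }
        current.push_str(para);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

/// Returns at most the last `n` bytes of `s`, moved forward to a UTF-8
/// character boundary so the slice never splits a code point.
fn trailing_bytes(s: &str, n: usize) -> &str {
    let mut start = s.len().saturating_sub(n);
    while !s.is_char_boundary(start) {
        start += 1;
    }
    &s[start..]
}
```

A single paragraph larger than the budget is emitted as its own oversized chunk in this sketch; the real chunker may need a further split for that case.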
@@ -91,12 +91,12 @@ impl kb_core::Chunker for PdfPageV1Chunker {
| determinism | same input → same chunk_ids twice | inline | | determinism | same input → same chunk_ids twice | inline |
| snapshot | `Vec<Chunk>` JSON for fixture stable | `fixtures/pdf/three-page-en.pdf` (chunked) | | snapshot | `Vec<Chunk>` JSON for fixture stable | `fixtures/pdf/three-page-en.pdf` (chunked) |
All tests under `cargo test -p kb-chunk pdf`. All tests under `cargo test -p kebab-chunk pdf`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-chunk` passes (existing `md-heading-v1` continues to pass) - [ ] `cargo check -p kebab-chunk` passes (existing `md-heading-v1` continues to pass)
- [ ] `cargo test -p kb-chunk pdf` passes - [ ] `cargo test -p kebab-chunk pdf` passes
- [ ] Snapshot stable across two runs - [ ] Snapshot stable across two runs
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.5, §0 Q3, §9 - [ ] PR links design §3.5, §0 Q3, §9


@@ -1,12 +1,12 @@
--- ---
phase: P8 phase: P8
component: kb-parse-audio (whisper adapter) component: kebab-parse-audio (whisper adapter)
task_id: p8-1 task_id: p8-1
title: "Audio Extractor + Transcriber trait + whisper.cpp adapter" title: "Audio Extractor + Transcriber trait + whisper.cpp adapter"
status: planned status: planned
depends_on: [p0-1, p1-6] depends_on: [p0-1, p1-6]
unblocks: [p8-2] unblocks: [p8-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 Block::AudioRef + AudioRefBlock, §3.7a Transcript + TranscriptSegment, §9.3 audio policy, §9 versioning] contract_sections: [§3.4 Block::AudioRef + AudioRefBlock, §3.7a Transcript + TranscriptSegment, §9.3 audio policy, §9 versioning]
--- ---
@@ -22,8 +22,8 @@ Audio stays a single, replaceable engine boundary (Transcriber trait). Extractor
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `whisper-rs = "0.13"` (or current stable) - `whisper-rs = "0.13"` (or current stable)
- `symphonia = { version = "0.5", features = ["all"] }` — decode `.m4a/.mp3/.wav/.flac/.ogg` to interleaved f32 PCM at the source's native sample rate / channel layout. Symphonia does NOT resample; that is rubato's job. - `symphonia = { version = "0.5", features = ["all"] }` — decode `.m4a/.mp3/.wav/.flac/.ogg` to interleaved f32 PCM at the source's native sample rate / channel layout. Symphonia does NOT resample; that is rubato's job.
- `rubato = "0.15"` — sample-rate conversion to 16 kHz mono f32 (the input shape whisper.cpp expects). Use `rubato::FftFixedIn::new(input_sample_rate, 16_000, frames_per_chunk, sub_chunks, 1 /* channels after downmix */)` for fixed-input streaming; pre-mix multi-channel to mono via simple averaging before the resampler. - `rubato = "0.15"` — sample-rate conversion to 16 kHz mono f32 (the input shape whisper.cpp expects). Use `rubato::FftFixedIn::new(input_sample_rate, 16_000, frames_per_chunk, sub_chunks, 1 /* channels after downmix */)` for fixed-input streaming; pre-mix multi-channel to mono via simple averaging before the resampler.
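
The "pre-mix to mono via simple averaging" step can be sketched with the standard library alone. This is a minimal sketch: `downmix_to_mono` is a hypothetical name, and it assumes the decoder hands back interleaved frames (`L R L R ...` for stereo).

```rust
/// Hypothetical helper: averages each interleaved frame into one mono sample.
/// A trailing partial frame (fewer than `channels` samples) is dropped.
pub fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}
```

The mono buffer produced here would then be fed to the resampler, which only has to handle a single channel.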
@@ -34,21 +34,21 @@ Audio stays a single, replaceable engine boundary (Transcriber trait). Extractor
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-parse-pdf`, `kb-parse-image`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` - `kebab-source-fs`, `kebab-parse-md`, `kebab-parse-pdf`, `kebab-parse-image`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `RawAsset` | `kb_core::RawAsset` | `kb-source-fs` | | `RawAsset` | `kebab_core::RawAsset` | `kebab-source-fs` |
| audio bytes | `&[u8]` | filesystem | | audio bytes | `&[u8]` | filesystem |
| `kb-config.audio` | `{ model_path, language, chunk_seconds, n_threads, gpu }` | runtime | | `kebab-config.audio` | `{ model_path, language, chunk_seconds, n_threads, gpu }` | runtime |
## Outputs ## Outputs
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk` (`audio-segment-v1` chunker in p8-2) | | `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk` (`audio-segment-v1` chunker in p8-2) |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
@@ -56,19 +56,19 @@ Audio stays a single, replaceable engine boundary (Transcriber trait). Extractor
pub trait Transcriber: Send + Sync { pub trait Transcriber: Send + Sync {
fn engine(&self) -> &'static str; fn engine(&self) -> &'static str;
fn engine_version(&self) -> String; fn engine_version(&self) -> String;
fn transcribe(&self, pcm_f32_16khz: &[f32], language_hint: Option<&kb_core::Lang>) -> anyhow::Result<kb_core::Transcript>; fn transcribe(&self, pcm_f32_16khz: &[f32], language_hint: Option<&kebab_core::Lang>) -> anyhow::Result<kebab_core::Transcript>;
} }
pub struct WhisperCppTranscriber { /* internal: whisper_rs::WhisperContext */ } pub struct WhisperCppTranscriber { /* internal: whisper_rs::WhisperContext */ }
impl WhisperCppTranscriber { pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>; } impl WhisperCppTranscriber { pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>; }
impl Transcriber for WhisperCppTranscriber { /* per trait */ } impl Transcriber for WhisperCppTranscriber { /* per trait */ }
pub struct AudioExtractor { transcriber: std::sync::Arc<dyn Transcriber> } pub struct AudioExtractor { transcriber: std::sync::Arc<dyn Transcriber> }
impl AudioExtractor { pub fn new(transcriber: std::sync::Arc<dyn Transcriber>) -> Self; } impl AudioExtractor { pub fn new(transcriber: std::sync::Arc<dyn Transcriber>) -> Self; }
impl kb_core::Extractor for AudioExtractor { impl kebab_core::Extractor for AudioExtractor {
fn supports(&self, m: &kb_core::MediaType) -> bool { matches!(m, kb_core::MediaType::Audio(_)) } fn supports(&self, m: &kebab_core::MediaType) -> bool { matches!(m, kebab_core::MediaType::Audio(_)) }
fn parser_version(&self) -> kb_core::ParserVersion { kb_core::ParserVersion("audio-whisper-v1".into()) } fn parser_version(&self) -> kebab_core::ParserVersion { kebab_core::ParserVersion("audio-whisper-v1".into()) }
fn extract(&self, ctx: &kb_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kb_core::CanonicalDocument>; fn extract(&self, ctx: &kebab_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kebab_core::CanonicalDocument>;
} }
``` ```
@@ -87,7 +87,7 @@ impl kb_core::Extractor for AudioExtractor {
- `provenance` events: `Discovered`, `Parsed`, `Transcribed`. - `provenance` events: `Discovered`, `Parsed`, `Transcribed`.
- `block_id` per design §4.2 with `block_kind = "audio_ref"`, `heading_path = []`, `ordinal = 0`, `source_span = SourceSpan::Time { start_ms: 0, end_ms: duration_ms }`. - `block_id` per design §4.2 with `block_kind = "audio_ref"`, `heading_path = []`, `ordinal = 0`, `source_span = SourceSpan::Time { start_ms: 0, end_ms: duration_ms }`.
- `WhisperCppTranscriber`: - `WhisperCppTranscriber`:
- Loads model from `config.audio.model_path` (e.g., `~/.local/share/kb/models/whisper/ggml-large-v3.bin`). - Loads model from `config.audio.model_path` (e.g., `~/.local/share/kebab/models/whisper/ggml-large-v3.bin`).
- Runs with `WhisperFullParams::new(SamplingStrategy::Greedy { best_of: 1 })` — deterministic. - Runs with `WhisperFullParams::new(SamplingStrategy::Greedy { best_of: 1 })` — deterministic.
- Streams in chunks of `config.audio.chunk_seconds` (default 30) to bound memory; aggregates segments. - Streams in chunks of `config.audio.chunk_seconds` (default 30) to bound memory; aggregates segments.
- `Transcript.segments` populated with `start_ms`, `end_ms`, `text`, `confidence: Some(p)` from whisper's per-token probabilities (averaged), `speaker: None` (diarization is P+). - `Transcript.segments` populated with `start_ms`, `end_ms`, `text`, `confidence: Some(p)` from whisper's per-token probabilities (averaged), `speaker: None` (diarization is P+).
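
The chunked streaming and the averaged confidence described above can be sketched without touching `whisper-rs`. This is an illustrative sketch only: `pcm_windows`, `mean_confidence`, and `SAMPLE_RATE_HZ` are hypothetical names, and the start offset is what a caller would add to each window's segment timestamps before aggregating.

```rust
/// Input sample rate expected by the transcriber (16 kHz mono).
pub const SAMPLE_RATE_HZ: usize = 16_000;

/// Hypothetical helper: splits 16 kHz mono PCM into windows of
/// `chunk_seconds`, pairing each window with its start offset in ms.
pub fn pcm_windows(pcm: &[f32], chunk_seconds: usize) -> Vec<(u64, &[f32])> {
    // Guard against a zero-length window, which `chunks` would reject.
    let window = chunk_seconds.max(1) * SAMPLE_RATE_HZ;
    pcm.chunks(window)
        .enumerate()
        .map(|(i, w)| ((i * window * 1000 / SAMPLE_RATE_HZ) as u64, w))
        .collect()
}

/// Hypothetical helper: averages per-token probabilities into one segment
/// confidence. Returns None for a segment with no tokens.
pub fn mean_confidence(token_probs: &[f32]) -> Option<f32> {
    if token_probs.is_empty() {
        return None;
    }
    Some(token_probs.iter().sum::<f32>() / token_probs.len() as f32)
}
```

With the default of 30 seconds, each window holds at most 480,000 samples, which bounds the buffer passed to the engine per call.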
@@ -115,12 +115,12 @@ impl kb_core::Extractor for AudioExtractor {
| `#[ignore]` integration | 30-second Korean audio → segments_count > 1, language = "ko" | requires `large-v3` model | | `#[ignore]` integration | 30-second Korean audio → segments_count > 1, language = "ko" | requires `large-v3` model |
| snapshot | CanonicalDocument JSON stable for short fixture | `fixtures/audio/hello.wav` | | snapshot | CanonicalDocument JSON stable for short fixture | `fixtures/audio/hello.wav` |
All tests under `cargo test -p kb-parse-audio`. Mark slow/large-model tests `#[ignore]`. All tests under `cargo test -p kebab-parse-audio`. Mark slow/large-model tests `#[ignore]`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-parse-audio` passes - [ ] `cargo check -p kebab-parse-audio` passes
- [ ] `cargo test -p kb-parse-audio` passes (excluding `#[ignore]`) - [ ] `cargo test -p kebab-parse-audio` passes (excluding `#[ignore]`)
- [ ] No imports outside Allowed dependencies (resampler crate may be added — record in PR) - [ ] No imports outside Allowed dependencies (resampler crate may be added — record in PR)
- [ ] First-run model download path documented (NOT performed by code; user responsibility) - [ ] First-run model download path documented (NOT performed by code; user responsibility)
- [ ] PR links design §3.4, §3.7a, §9.3 - [ ] PR links design §3.4, §3.7a, §9.3


@@ -1,12 +1,12 @@
--- ---
phase: P8 phase: P8
component: kb-chunk (audio-segment-v1) component: kebab-chunk (audio-segment-v1)
task_id: p8-2 task_id: p8-2
title: "Audio segment chunker (audio-segment-v1)" title: "Audio segment chunker (audio-segment-v1)"
status: planned status: planned
depends_on: [p8-1] depends_on: [p8-1]
unblocks: [] unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.5 Chunk, §3.4 SourceSpan::Time, §4.2 chunk_id recipe, §0 Q3 citation, §9 versioning] contract_sections: [§3.5 Chunk, §3.4 SourceSpan::Time, §4.2 chunk_id recipe, §0 Q3 citation, §9 versioning]
--- ---
@@ -22,8 +22,8 @@ Per-medium chunker. Tiny but versioned — `chunk_id` depends on `chunker_versio
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `serde`, `serde_json` - `serde`, `serde_json`
- `blake3` (policy_hash) - `blake3` (policy_hash)
- `serde-json-canonicalizer` - `serde-json-canonicalizer`
@@ -31,30 +31,30 @@ Per-medium chunker. Tiny but versioned — `chunk_id` depends on `chunker_versio
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-md`, `kb-parse-pdf`, `kb-parse-image`, `kb-parse-audio` (consumes via `kb-core` only), `kb-normalize`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` - `kebab-source-fs`, `kebab-parse-md`, `kebab-parse-pdf`, `kebab-parse-image`, `kebab-parse-audio` (consumes via `kebab-core` only), `kebab-normalize`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `CanonicalDocument` containing one `AudioRefBlock` with `Transcript` | `kb_core::CanonicalDocument` | p8-1 | | `CanonicalDocument` containing one `AudioRefBlock` with `Transcript` | `kebab_core::CanonicalDocument` | p8-1 |
| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` | | `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` |
## Outputs ## Outputs
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| `Vec<Chunk>` | `kb_core::Chunk` | `kb-store-sqlite`, embedders | | `Vec<Chunk>` | `kebab_core::Chunk` | `kebab-store-sqlite`, embedders |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
```rust ```rust
pub struct AudioSegmentV1Chunker; pub struct AudioSegmentV1Chunker;
impl kb_core::Chunker for AudioSegmentV1Chunker { impl kebab_core::Chunker for AudioSegmentV1Chunker {
fn chunker_version(&self) -> kb_core::ChunkerVersion { kb_core::ChunkerVersion("audio-segment-v1".into()) } fn chunker_version(&self) -> kebab_core::ChunkerVersion { kebab_core::ChunkerVersion("audio-segment-v1".into()) }
fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String; fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String;
fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result<Vec<kb_core::Chunk>>; fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result<Vec<kebab_core::Chunk>>;
} }
``` ```
@@ -92,12 +92,12 @@ impl kb_core::Chunker for AudioSegmentV1Chunker {
| determinism | same input → same chunk_ids twice | inline | | determinism | same input → same chunk_ids twice | inline |
| snapshot | `Vec<Chunk>` JSON for fixture transcript stable | `fixtures/audio/transcript-1.json` (constructed) | | snapshot | `Vec<Chunk>` JSON for fixture transcript stable | `fixtures/audio/transcript-1.json` (constructed) |
All tests under `cargo test -p kb-chunk audio`. All tests under `cargo test -p kebab-chunk audio`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-chunk` passes (md-heading-v1 + pdf-page-v1 + audio-segment-v1 all coexist) - [ ] `cargo check -p kebab-chunk` passes (md-heading-v1 + pdf-page-v1 + audio-segment-v1 all coexist)
- [ ] `cargo test -p kb-chunk audio` passes - [ ] `cargo test -p kebab-chunk audio` passes
- [ ] Snapshot stable across two runs - [ ] Snapshot stable across two runs
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.5, §3.4 SourceSpan::Time, §4.2 - [ ] PR links design §3.5, §3.4 SourceSpan::Time, §4.2


@@ -1,12 +1,12 @@
--- ---
phase: P9 phase: P9
component: kb-tui (library view) component: kebab-tui (library view)
task_id: p9-1 task_id: p9-1
title: "Ratatui library list view + tag filter" title: "Ratatui library list view + tag filter"
status: planned status: planned
depends_on: [p1-6] depends_on: [p1-6]
unblocks: [p9-2, p9-3, p9-4] unblocks: [p9-2, p9-3, p9-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [report §16.2 TUI (also tasks/phase-9-ui.md epic), design §3.7 SearchHit, design §1 UX scenes for shared key bindings] contract_sections: [report §16.2 TUI (also tasks/phase-9-ui.md epic), design §3.7 SearchHit, design §1 UX scenes for shared key bindings]
--- ---
@@ -14,7 +14,7 @@ contract_sections: [report §16.2 TUI (also tasks/phase-9-ui.md epic), design §
## Goal ## Goal
Stand up a Ratatui app skeleton with a "Library" pane: list documents, filter by tag/lang, navigate. Establishes the global app loop, key dispatch, and `kb-app` integration point that the search/ask/inspect panes (p9-2..p9-4) extend. Stand up a Ratatui app skeleton with a "Library" pane: list documents, filter by tag/lang, navigate. Establishes the global app loop, key dispatch, and `kebab-app` integration point that the search/ask/inspect panes (p9-2..p9-4) extend.
## Why now / why this size ## Why now / why this size
@@ -22,9 +22,9 @@ Library is the cheapest screen and the natural anchor for the TUI shell. Subsequ
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `kb-app` (facade — the only crate this binary touches besides `kb-core`/`kb-config`) - `kebab-app` (facade — the only crate this binary touches besides `kebab-core`/`kebab-config`)
- `ratatui = "0.28"` - `ratatui = "0.28"`
- `crossterm` - `crossterm`
- `tracing` - `tracing`
@@ -32,15 +32,15 @@ Library is the cheapest screen and the natural anchor for the TUI shell. Subsequ
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (UI must go through `kb-app` only — this is the design §8 boundary) - `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (UI must go through `kebab-app` only — this is the design §8 boundary)
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `kb-app::list_docs(filter)` | facade call | runtime | | `kebab-app::list_docs(filter)` | facade call | runtime |
| keyboard events | `crossterm` | terminal | | keyboard events | `crossterm` | terminal |
| `kb-config::Config` | runtime | env / file | | `kebab-config::Config` | runtime | env / file |
## Outputs ## Outputs
@@ -57,7 +57,7 @@ Library is the cheapest screen and the natural anchor for the TUI shell. Subsequ
// state in WITHOUT modifying the App struct definition. This avoids merge // state in WITHOUT modifying the App struct definition. This avoids merge
// conflicts when p9-2/3/4 land in parallel; only p9-1 ever changes `App`. // conflicts when p9-2/3/4 land in parallel; only p9-1 ever changes `App`.
pub struct App { pub struct App {
pub config: kb_config::Config, pub config: kebab_config::Config,
pub focus: Pane, pub focus: Pane,
pub library: LibraryState, // owned by p9-1 pub library: LibraryState, // owned by p9-1
pub search: Option<SearchState>, // populated by p9-2 (None until that crate links in) pub search: Option<SearchState>, // populated by p9-2 (None until that crate links in)
@@ -73,7 +73,7 @@ pub struct AskState; // body filled by p9-3
pub struct InspectState; // body filled by p9-4 pub struct InspectState; // body filled by p9-4
impl App { impl App {
pub fn new(config: kb_config::Config) -> anyhow::Result<Self>; pub fn new(config: kebab_config::Config) -> anyhow::Result<Self>;
pub fn run(&mut self) -> anyhow::Result<()>; // blocking loop until quit pub fn run(&mut self) -> anyhow::Result<()>; // blocking loop until quit
} }
@@ -102,12 +102,12 @@ pub enum KeyOutcome { Continue, Quit, SwitchPane(Pane), Refresh }
- `Enter` → switch to Inspect pane (p9-4) on selected doc - `Enter` → switch to Inspect pane (p9-4) on selected doc
- `q` or `Esc` → quit - `q` or `Esc` → quit
- All facade calls run on the main thread (no async). For long calls, render a "loading…" state and call from a worker thread; bridge via `mpsc::channel` (this task may keep things synchronous and accept brief UI hangs for v1). - All facade calls run on the main thread (no async). For long calls, render a "loading…" state and call from a worker thread; bridge via `mpsc::channel` (this task may keep things synchronous and accept brief UI hangs for v1).
- Logging: `tracing` initialized to a file under `~/.local/state/kb/logs/`; never to stdout/stderr (so the TUI is not corrupted). - Logging: `tracing` initialized to a file under `~/.local/state/kebab/logs/`; never to stdout/stderr (so the TUI is not corrupted).
- Error rendering: a popup overlay shows `error: {msg}\nhint: {hint}` from `anyhow::Error` chain; press any key to dismiss. - Error rendering: a popup overlay shows `error: {msg}\nhint: {hint}` from `anyhow::Error` chain; press any key to dismiss.
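
The key bindings and the error popup format listed above can be sketched as plain functions. This is a minimal illustration under assumptions: `Key` is a hypothetical stand-in for `crossterm::event::KeyEvent`, the `Pane` variants are guessed from the pane names in this epic, and `handle_key_library` / `format_error_popup` are illustrative names that do not appear in the spec. Only the two bindings visible in this hunk are modelled.

```rust
/// Hypothetical stand-in for a crossterm key event.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Key {
    Char(char),
    Enter,
    Esc,
}

/// Pane variants are an assumption based on the four panes named in this epic.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Pane {
    Library,
    Search,
    Ask,
    Inspect,
}

/// Mirrors the `KeyOutcome` enum shown in the hunk header above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyOutcome {
    Continue,
    Quit,
    SwitchPane(Pane),
    Refresh,
}

/// Library-pane dispatch for the bindings listed above:
/// `Enter` opens Inspect, `q` or `Esc` quits, anything else is ignored.
pub fn handle_key_library(key: Key) -> KeyOutcome {
    match key {
        Key::Enter => KeyOutcome::SwitchPane(Pane::Inspect),
        Key::Char('q') | Key::Esc => KeyOutcome::Quit,
        _ => KeyOutcome::Continue,
    }
}

/// Builds the popup text in the `error: {msg}\nhint: {hint}` shape.
pub fn format_error_popup(msg: &str, hint: &str) -> String {
    format!("error: {}\nhint: {}", msg, hint)
}
```

Keeping the dispatch a pure function of the key makes the unit tests in the test plan straightforward, since no terminal is needed to exercise it.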
## Storage / wire effects ## Storage / wire effects
- Reads: `kb-app::list_docs` only. - Reads: `kebab-app::list_docs` only.
- Writes: none. - Writes: none.
## Test plan ## Test plan
@@ -118,16 +118,16 @@ pub enum KeyOutcome { Continue, Quit, SwitchPane(Pane), Refresh }
| unit | filter `f` opens edit overlay; `Enter` triggers refresh | inline | | unit | filter `f` opens edit overlay; `Enter` triggers refresh | inline |
| snapshot | rendered library with 3 docs + filter open produces stable frame buffer (use `ratatui::backend::TestBackend`) | inline | | snapshot | rendered library with 3 docs + filter open produces stable frame buffer (use `ratatui::backend::TestBackend`) | inline |
| unit | error popup renders without panic on injected `anyhow::Error` | inline | | unit | error popup renders without panic on injected `anyhow::Error` | inline |
| integration | mocked `kb-app::list_docs` returning N docs renders all rows | inline | | integration | mocked `kebab-app::list_docs` returning N docs renders all rows | inline |
All tests under `cargo test -p kb-tui library`. All tests under `cargo test -p kebab-tui library`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-tui` passes - [ ] `cargo check -p kebab-tui` passes
- [ ] `cargo test -p kb-tui library` passes - [ ] `cargo test -p kebab-tui library` passes
- [ ] No imports outside `kb-core`, `kb-config`, `kb-app` - [ ] No imports outside `kebab-core`, `kebab-config`, `kebab-app`
- [ ] `kb tui` (or `kb` if TUI is the default) launches and shows Library on a real terminal (manual smoke) - [ ] `kebab tui` (or `kebab` if TUI is the default) launches and shows Library on a real terminal (manual smoke)
- [ ] PR links design §8 module boundary, report §16.2 (TUI epic) - [ ] PR links design §8 module boundary, report §16.2 (TUI epic)
## Out of scope ## Out of scope


@@ -1,12 +1,12 @@
--- ---
phase: P9 phase: P9
component: kb-tui (search pane) component: kebab-tui (search pane)
task_id: p9-2 task_id: p9-2
title: "TUI Search pane: input + result list + preview + editor jump" title: "TUI Search pane: input + result list + preview + editor jump"
status: planned status: planned
depends_on: [p2-2, p3-4, p9-1] depends_on: [p2-2, p3-4, p9-1]
unblocks: [] unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§1.5/1.6 search output, §3.7 SearchHit, §0 Q3 citation] contract_sections: [§1.5/1.6 search output, §3.7 SearchHit, §0 Q3 citation]
--- ---
@@ -14,7 +14,7 @@ contract_sections: [§1.5/1.6 search output, §3.7 SearchHit, §0 Q3 citation]
## Goal ## Goal
Add a Search pane to the TUI that drives `kb-app::search`, renders dense results (rank+score / path#frag / heading / snippet), and supports `g` (editor jump to citation) for the selected hit. Add a Search pane to the TUI that drives `kebab-app::search`, renders dense results (rank+score / path#frag / heading / snippet), and supports `g` (editor jump to citation) for the selected hit.
## Why now / why this size ## Why now / why this size
@@ -22,25 +22,25 @@ Search is the most-used surface. Confining it to one pane leverages the App skel
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `kb-app` - `kebab-app`
- `kb-tui` (extends p9-1) - `kebab-tui` (extends p9-1)
- `ratatui`, `crossterm` - `ratatui`, `crossterm`
- `tracing` - `tracing`
- `thiserror` - `thiserror`
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-desktop` - `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-desktop`
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `kb-app::search(query)` | facade | runtime | | `kebab-app::search(query)` | facade | runtime |
| keyboard events | `crossterm` | terminal | | keyboard events | `crossterm` | terminal |
| selected hit's citation | `kb_core::Citation` | App state | | selected hit's citation | `kebab_core::Citation` | App state |
## Outputs ## Outputs
@@ -54,16 +54,16 @@ Search is the most-used surface. Confining it to one pane leverages the App skel
```rust ```rust
pub fn render_search<B: ratatui::backend::Backend>(f: &mut ratatui::Frame, area: ratatui::layout::Rect, state: &App); pub fn render_search<B: ratatui::backend::Backend>(f: &mut ratatui::Frame, area: ratatui::layout::Rect, state: &App);
pub fn handle_key_search(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome; pub fn handle_key_search(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome;
pub fn jump_to_citation(citation: &kb_core::Citation, editor_env: &str /* $EDITOR */) -> anyhow::Result<()>; pub fn jump_to_citation(citation: &kebab_core::Citation, editor_env: &str /* $EDITOR */) -> anyhow::Result<()>;
``` ```
This task fills the body of `kb_tui::SearchState` (forward-declared in p9-1). The `App` struct itself is NOT edited — only `SearchState` gets fields: This task fills the body of `kebab_tui::SearchState` (forward-declared in p9-1). The `App` struct itself is NOT edited — only `SearchState` gets fields:
```rust ```rust
pub struct SearchState { pub struct SearchState {
pub input: String, pub input: String,
pub mode: kb_core::SearchMode, pub mode: kebab_core::SearchMode,
pub hits: Vec<kb_core::SearchHit>, pub hits: Vec<kebab_core::SearchHit>,
pub selected_hit: usize, pub selected_hit: usize,
pub last_query_at: Option<time::OffsetDateTime>, // debounce timer pub last_query_at: Option<time::OffsetDateTime>, // debounce timer
} }
@@ -73,7 +73,7 @@ The Library pane's keypress handler (in p9-1) sets `app.search = Some(SearchStat
## Behavior contract ## Behavior contract
- Layout: top input bar (search query + mode badge `[hybrid|lexical|vector]`), middle result list (one hit per 4 lines per design §1.5 dense format), bottom preview pane (full chunk text fetched lazily via `kb-app::inspect_chunk`). - Layout: top input bar (search query + mode badge `[hybrid|lexical|vector]`), middle result list (one hit per 4 lines per design §1.5 dense format), bottom preview pane (full chunk text fetched lazily via `kebab-app::inspect_chunk`).
- Key bindings (Search pane): - Key bindings (Search pane):
- typing → updates `search_input`; debounced (200 ms) re-search - typing → updates `search_input`; debounced (200 ms) re-search
- `Tab` → cycles `search_mode` Lexical → Vector → Hybrid → Lexical - `Tab` → cycles `search_mode` Lexical → Vector → Hybrid → Lexical
@@ -104,14 +104,14 @@ The Library pane's keypress handler (in p9-1) sets `app.search = Some(SearchStat
| unit | `j` / `k` move selection within bounds | inline | | unit | `j` / `k` move selection within bounds | inline |
| unit | `jump_to_citation` for `Line` builds `+<line> <path>` command (assert via mocked Command runner) | inline | | unit | `jump_to_citation` for `Line` builds `+<line> <path>` command (assert via mocked Command runner) | inline |
| snapshot | rendered Search pane with 3 hits + preview stable | TestBackend | | snapshot | rendered Search pane with 3 hits + preview stable | TestBackend |
| integration | mocked `kb-app::search` returning fixture hits drives render | inline | | integration | mocked `kebab-app::search` returning fixture hits drives render | inline |
All tests under `cargo test -p kb-tui search`. All tests under `cargo test -p kebab-tui search`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-tui` passes - [ ] `cargo check -p kebab-tui` passes
- [ ] `cargo test -p kb-tui search` passes - [ ] `cargo test -p kebab-tui search` passes
- [ ] `g` keybinding launches `$EDITOR` with correct `+<line>` argument (manual smoke against vim) - [ ] `g` keybinding launches `$EDITOR` with correct `+<line>` argument (manual smoke against vim)
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] PR links design §1.5/1.6, §3.7 - [ ] PR links design §1.5/1.6, §3.7
@@ -125,5 +125,5 @@ All tests under `cargo test -p kb-tui search`.
## Risks / notes ## Risks / notes
- Suspending and restoring crossterm raw mode around the editor spawn is finicky; code defensively (RAII guard). - Suspending and restoring crossterm raw mode around the editor spawn is finicky; code defensively (RAII guard).
- Different editors take different jump syntaxes. Provide an env override `KB_EDITOR_JUMP_FORMAT="vim"` for users on exotic editors. - Different editors take different jump syntaxes. Provide an env override `KEBAB_EDITOR_JUMP_FORMAT="vim"` for users on exotic editors.
- Long snippet text wrap: clamp to viewport width and ellipsize per design §1.5 (`…` already in dense template). - Long snippet text wrap: clamp to viewport width and ellipsize per design §1.5 (`…` already in dense template).
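
The first two notes above (defensive raw-mode restore, editor jump syntax) can be sketched with the standard library alone. This is an illustration under assumptions, not the spec's implementation: `build_jump_command` and `RestoreOnDrop` are hypothetical names, only the vim-style `+<line> <path>` form is covered, and the closure passed to the guard stands in for whatever call re-enables raw mode in the real TUI.

```rust
use std::path::Path;
use std::process::Command;

/// Builds (but does not spawn) a vim-style jump command: `<editor> +<line> <path>`.
pub fn build_jump_command(editor: &str, path: &Path, line: u32) -> Command {
    let mut cmd = Command::new(editor);
    cmd.arg(format!("+{}", line)).arg(path);
    cmd
}

/// RAII guard: runs `restore` when dropped, including on early return or panic unwind.
/// In the TUI the closure would re-enter raw mode after the editor exits.
pub struct RestoreOnDrop<F: FnMut()> {
    restore: F,
}

impl<F: FnMut()> RestoreOnDrop<F> {
    pub fn new(restore: F) -> Self {
        Self { restore }
    }
}

impl<F: FnMut()> Drop for RestoreOnDrop<F> {
    fn drop(&mut self) {
        (self.restore)();
    }
}
```

Separating command construction from spawning is what lets the test plan assert on the `+<line> <path>` arguments through a mocked runner without launching an editor.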


@@ -1,12 +1,12 @@
--- ---
phase: P9 phase: P9
component: kb-tui (ask pane) component: kebab-tui (ask pane)
task_id: p9-3 task_id: p9-3
title: "TUI Ask pane: streaming answer + citation links + --explain toggle" title: "TUI Ask pane: streaming answer + citation links + --explain toggle"
status: planned status: planned
depends_on: [p4-3, p9-1] depends_on: [p4-3, p9-1]
unblocks: [] unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§1.1–1.4 ask scenes, §2.3 Answer wire, §3.8 Answer] contract_sections: [§1.1–1.4 ask scenes, §2.3 Answer wire, §3.8 Answer]
--- ---
@@ -14,7 +14,7 @@ contract_sections: [§1.1–1.4 ask scenes, §2.3 Answer wire, §3.8 Answer]
## Goal ## Goal
Add an Ask pane that calls `kb-app::ask`, streams tokens into the answer area in real time, renders citation footnotes (default mode A), and toggles to `--explain` (mode B + retrieval trace) with a key. Add an Ask pane that calls `kebab-app::ask`, streams tokens into the answer area in real time, renders citation footnotes (default mode A), and toggles to `--explain` (mode B + retrieval trace) with a key.
## Why now / why this size ## Why now / why this size
@@ -22,23 +22,23 @@ Streaming UI is the only TUI piece that meaningfully differs from search/inspect
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `kb-app` - `kebab-app`
- `kb-tui` (extends p9-1) - `kebab-tui` (extends p9-1)
- `ratatui`, `crossterm` - `ratatui`, `crossterm`
- `tracing` - `tracing`
- `thiserror` - `thiserror`
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (only via `kb-app`), `kb-desktop` - `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (only via `kebab-app`), `kebab-desktop`
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `kb-app::ask(query, AskOpts)` | facade | runtime | | `kebab-app::ask(query, AskOpts)` | facade | runtime |
| keyboard events | `crossterm` | terminal | | keyboard events | `crossterm` | terminal |
## Outputs ## Outputs
@@ -46,7 +46,7 @@ Streaming UI is the only TUI piece that meaningfully differs from search/inspect
| output | type | downstream | | output | type | downstream |
|--------|------|------------| |--------|------|------------|
| Ratatui Ask pane render | terminal | user | | Ratatui Ask pane render | terminal | user |
| `kb-app::ask` invocation with streaming closure | facade | RAG pipeline | | `kebab-app::ask` invocation with streaming closure | facade | RAG pipeline |
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
@@ -55,7 +55,7 @@ pub fn render_ask<B: ratatui::backend::Backend>(f: &mut ratatui::Frame, area: ra
pub fn handle_key_ask(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome; pub fn handle_key_ask(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome;
``` ```
This task fills the body of `kb_tui::AskState` (forward-declared in p9-1). `App` is NOT edited — only `AskState` gets fields: This task fills the body of `kebab_tui::AskState` (forward-declared in p9-1). `App` is NOT edited — only `AskState` gets fields:
```rust ```rust
pub struct AskState { pub struct AskState {
@@ -63,8 +63,8 @@ pub struct AskState {
pub explain: bool, pub explain: bool,
pub streaming: bool, pub streaming: bool,
pub partial: String, pub partial: String,
pub answer: Option<kb_core::Answer>, pub answer: Option<kebab_core::Answer>,
pub thread: Option<std::thread::JoinHandle<anyhow::Result<kb_core::Answer>>>, pub thread: Option<std::thread::JoinHandle<anyhow::Result<kebab_core::Answer>>>,
pub rx: Option<std::sync::mpsc::Receiver<String>>, pub rx: Option<std::sync::mpsc::Receiver<String>>,
} }
``` ```
@@ -74,7 +74,7 @@ pub struct AskState {
## Behavior contract ## Behavior contract
- Layout: top input bar (`?` prompt, query text), middle answer area (rendered Markdown-light: paragraphs + inline `[N]` markers), bottom-right citations panel (numbered list of citations with `path#fragment` and section label), bottom-left status (`grounded ✓/✗ model prompt_v k chunks`). - Layout: top input bar (`?` prompt, query text), middle answer area (rendered Markdown-light: paragraphs + inline `[N]` markers), bottom-right citations panel (numbered list of citations with `path#fragment` and section label), bottom-left status (`grounded ✓/✗ model prompt_v k chunks`).
- Submission: `Enter` triggers a worker thread that calls `kb-app::ask` with `AskOpts.stream_sink: Some(tx)` (`tx: mpsc::Sender<String>`). The thread holds the `tx`, the TUI holds the matching `rx` (set on `AskState.rx`). On each render frame the TUI drains `rx.try_iter()` into `state.partial`, no blocking. - Submission: `Enter` triggers a worker thread that calls `kebab-app::ask` with `AskOpts.stream_sink: Some(tx)` (`tx: mpsc::Sender<String>`). The thread holds the `tx`, the TUI holds the matching `rx` (set on `AskState.rx`). On each render frame the TUI drains `rx.try_iter()` into `state.partial`, no blocking.
- Streaming: while `ask_streaming = true`, the Answer area shows `ask_partial` and a small "▍" cursor. When the worker finishes, `ask_answer` is populated and the citations panel switches to the final list. - Streaming: while `ask_streaming = true`, the Answer area shows `ask_partial` and a small "▍" cursor. When the worker finishes, `ask_answer` is populated and the citations panel switches to the final list.
- Refusal rendering: - Refusal rendering:
- `grounded = false` and `refusal_reason = ScoreGate` → render the answer (which is the human-friendly "근거 부족…" message), citations show "가까운 후보". - `grounded = false` and `refusal_reason = ScoreGate` → render the answer (which is the human-friendly "근거 부족…" message), citations show "가까운 후보".
@@ -85,12 +85,12 @@ pub struct AskState {
- `e` → toggle `ask_explain`; resubmit on next `Enter`. While explain ON, citations panel is replaced by the per-claim breakdown (mode B in design §1.2) and a footer shows the retrieval trace summary. - `e` → toggle `ask_explain`; resubmit on next `Enter`. While explain ON, citations panel is replaced by the per-claim breakdown (mode B in design §1.2) and a footer shows the retrieval trace summary.
- `Esc` → switch back to Library pane (cancellation of an in-flight ask is best-effort: the worker thread continues but its final answer is dropped). - `Esc` → switch back to Library pane (cancellation of an in-flight ask is best-effort: the worker thread continues but its final answer is dropped).
- `j` / `k` → scroll the answer area when oversized. - `j` / `k` → scroll the answer area when oversized.
- All facade calls stay within `kb-app::ask` — never reach into `kb-rag` directly. - All facade calls stay within `kebab-app::ask` — never reach into `kebab-rag` directly.
- Errors render as a popup overlay; do not crash the pane. - Errors render as a popup overlay; do not crash the pane.
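
The submission and streaming rules above (worker thread holds the sender, the render loop drains the receiver without blocking) can be sketched as follows. This is a minimal model under assumptions: `AskStream`, `submit`, and `drain` are hypothetical names, the `produce` closure stands in for the facade call that receives the stream sink, and the sketch uses a `try_recv` loop instead of `try_iter` so it can also notice when the worker has dropped its sender.

```rust
use std::sync::mpsc;
use std::thread;

/// Reduced stand-in for the streaming-related fields of the Ask pane state.
#[derive(Default)]
pub struct AskStream {
    pub partial: String,
    pub streaming: bool,
    pub rx: Option<mpsc::Receiver<String>>,
}

impl AskStream {
    /// Starts a worker thread. `produce` stands in for the facade call and
    /// receives the sender that it should push tokens into.
    pub fn submit<F>(&mut self, produce: F) -> thread::JoinHandle<()>
    where
        F: FnOnce(mpsc::Sender<String>) + Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        self.partial.clear();
        self.streaming = true;
        self.rx = Some(rx);
        thread::spawn(move || produce(tx))
    }

    /// Called once per render frame. Never blocks: it takes whatever tokens are
    /// already queued and returns. A disconnected channel means the worker is done.
    pub fn drain(&mut self) {
        let mut finished = false;
        if let Some(rx) = self.rx.as_ref() {
            loop {
                match rx.try_recv() {
                    Ok(token) => self.partial.push_str(&token),
                    Err(mpsc::TryRecvError::Empty) => break,
                    Err(mpsc::TryRecvError::Disconnected) => {
                        finished = true;
                        break;
                    }
                }
            }
        }
        if finished {
            self.streaming = false;
            self.rx = None;
        }
    }
}
```

Since `drain` only ever polls, a slow model response costs the render loop nothing; the trade-off is that tokens appear at frame granularity rather than the instant they are produced.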
## Storage / wire effects ## Storage / wire effects
- Reads/writes via `kb-app::ask` which itself writes the `answers` row in `kb.sqlite`. The pane has no direct DB access. - Reads/writes via `kebab-app::ask` which itself writes the `answers` row in `kebab.sqlite`. The pane has no direct DB access.
## Test plan ## Test plan
@@ -102,14 +102,14 @@ pub struct AskState {
| unit | refusal answer renders without citations panel index errors | inline | | unit | refusal answer renders without citations panel index errors | inline |
| snapshot | rendered Ask pane mid-stream is stable | TestBackend | | snapshot | rendered Ask pane mid-stream is stable | TestBackend |
| snapshot | rendered Ask pane after finished grounded answer is stable | TestBackend | | snapshot | rendered Ask pane after finished grounded answer is stable | TestBackend |
| integration | mocked `kb-app::ask` returning a canned `Answer` populates final state correctly | inline | | integration | mocked `kebab-app::ask` returning a canned `Answer` populates final state correctly | inline |
All tests under `cargo test -p kb-tui ask`. All tests under `cargo test -p kebab-tui ask`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-tui` passes - [ ] `cargo check -p kebab-tui` passes
- [ ] `cargo test -p kb-tui ask` passes - [ ] `cargo test -p kebab-tui ask` passes
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] Manual smoke: stream tokens visible character-by-character against a real Ollama (or `MockLanguageModel`) - [ ] Manual smoke: stream tokens visible character-by-character against a real Ollama (or `MockLanguageModel`)
- [ ] PR links design §1.1–1.4, §2.3 - [ ] PR links design §1.1–1.4, §2.3


@@ -1,12 +1,12 @@
--- ---
phase: P9 phase: P9
component: kb-tui (inspect pane) component: kebab-tui (inspect pane)
task_id: p9-4 task_id: p9-4
title: "TUI Inspect pane: document & chunk detail render" title: "TUI Inspect pane: document & chunk detail render"
status: planned status: planned
depends_on: [p1-6, p9-1] depends_on: [p1-6, p9-1]
unblocks: [] unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§1 inspect output, §3.5 Chunk, §2.5 DocSummary, §2.6 ChunkInspection] contract_sections: [§1 inspect output, §3.5 Chunk, §2.5 DocSummary, §2.6 ChunkInspection]
--- ---
@@ -22,24 +22,24 @@ Inspect is read-only and has no external interactions; smallest possible pane. U
## Allowed dependencies ## Allowed dependencies
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `kb-app` - `kebab-app`
- `kb-tui` (extends p9-1) - `kebab-tui` (extends p9-1)
- `ratatui`, `crossterm` - `ratatui`, `crossterm`
- `tracing` - `tracing`
- `thiserror` - `thiserror`
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (only via `kb-app`), `kb-desktop` - `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (only via `kebab-app`), `kebab-desktop`
## Inputs ## Inputs
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| `kb-app::inspect_doc(id)` | facade | runtime | | `kebab-app::inspect_doc(id)` | facade | runtime |
| `kb-app::inspect_chunk(id)` | facade | runtime | | `kebab-app::inspect_chunk(id)` | facade | runtime |
| keyboard events | `crossterm` | terminal | | keyboard events | `crossterm` | terminal |
## Outputs ## Outputs
@@ -51,19 +51,19 @@ Inspect is read-only and has no external interactions; smallest possible pane. U
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
```rust ```rust
pub enum InspectTarget { Doc(kb_core::DocumentId), Chunk(kb_core::ChunkId) } pub enum InspectTarget { Doc(kebab_core::DocumentId), Chunk(kebab_core::ChunkId) }
pub fn render_inspect<B: ratatui::backend::Backend>(f: &mut ratatui::Frame, area: ratatui::layout::Rect, state: &App); pub fn render_inspect<B: ratatui::backend::Backend>(f: &mut ratatui::Frame, area: ratatui::layout::Rect, state: &App);
pub fn handle_key_inspect(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome; pub fn handle_key_inspect(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome;
``` ```
This task fills the body of `kb_tui::InspectState` (forward-declared in p9-1). `App` is NOT edited. This task fills the body of `kebab_tui::InspectState` (forward-declared in p9-1). `App` is NOT edited.
```rust ```rust
pub struct InspectState { pub struct InspectState {
pub target: Option<InspectTarget>, pub target: Option<InspectTarget>,
pub doc: Option<kb_core::CanonicalDocument>, pub doc: Option<kebab_core::CanonicalDocument>,
pub chunk: Option<kb_core::Chunk>, pub chunk: Option<kebab_core::Chunk>,
pub collapsed: std::collections::HashSet<&'static str>, pub collapsed: std::collections::HashSet<&'static str>,
pub scroll: u16, pub scroll: u16,
} }
@@ -89,7 +89,7 @@ pub struct InspectState {
- `c` → collapse / expand currently focused section (focus is implicit by current scroll position; v1 may simplify by toggling all sections) - `c` → collapse / expand currently focused section (focus is implicit by current scroll position; v1 may simplify by toggling all sections)
- `Esc` → return to previous pane (Library or Search) - `Esc` → return to previous pane (Library or Search)
- `Enter` → no-op (Inspect is terminal — no editor jump here; users use Search pane for jump) - `Enter` → no-op (Inspect is terminal — no editor jump here; users use Search pane for jump)
- Loading: while `kb-app::inspect_doc` or `inspect_chunk` runs, show "loading…". On error, popup with hint. - Loading: while `kebab-app::inspect_doc` or `inspect_chunk` runs, show "loading…". On error, popup with hint.
- Renders must conform to wire schemas `doc_summary.v1` (subset for header) and `chunk_inspection.v1`. - Renders must conform to wire schemas `doc_summary.v1` (subset for header) and `chunk_inspection.v1`.
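
Two of the behaviors tested for this pane, bounded scrolling and the `c` collapse toggle, reduce to small pure state updates. The sketch below is an assumption-laden illustration: `InspectView`, `scroll_by`, and `toggle` are hypothetical names, and only the `collapsed` and `scroll` fields of `InspectState` are modelled.

```rust
use std::collections::HashSet;

/// Reduced stand-in for the scroll/collapse fields of the Inspect pane state.
#[derive(Default)]
pub struct InspectView {
    pub collapsed: HashSet<&'static str>,
    pub scroll: u16,
}

impl InspectView {
    /// Moves the scroll offset by `delta` rows, clamped to `[0, content - viewport]`.
    /// If the content fits inside the viewport, the offset stays at 0.
    pub fn scroll_by(&mut self, delta: i32, content_height: u16, viewport_height: u16) {
        let max_scroll = content_height.saturating_sub(viewport_height);
        let next = i32::from(self.scroll).saturating_add(delta);
        self.scroll = next.clamp(0, i32::from(max_scroll)) as u16;
    }

    /// Flips the collapsed state of one named section.
    pub fn toggle(&mut self, section: &'static str) {
        // `remove` returns false when the section was not collapsed yet.
        if !self.collapsed.remove(section) {
            self.collapsed.insert(section);
        }
    }
}
```

Doing the arithmetic in `i32` before clamping avoids `u16` underflow when the user scrolls up at the top of the document.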
## Storage / wire effects ## Storage / wire effects
@@ -100,18 +100,18 @@ pub struct InspectState {
| kind | description | fixture / data | | kind | description | fixture / data |
|------|-------------|----------------| |------|-------------|----------------|
| unit | switching to InspectTarget::Doc triggers `kb-app::inspect_doc` once | inline mock | | unit | switching to InspectTarget::Doc triggers `kebab-app::inspect_doc` once | inline mock |
| unit | scroll bounded by content height | inline | | unit | scroll bounded by content height | inline |
| unit | collapse toggle via `c` flips state | inline | | unit | collapse toggle via `c` flips state | inline |
| snapshot | doc-view rendered for fixture stable | TestBackend + fixture | | snapshot | doc-view rendered for fixture stable | TestBackend + fixture |
| snapshot | chunk-view rendered for fixture stable | TestBackend + fixture | | snapshot | chunk-view rendered for fixture stable | TestBackend + fixture |
All tests under `cargo test -p kb-tui inspect`. All tests under `cargo test -p kebab-tui inspect`.
## Definition of Done ## Definition of Done
- [ ] `cargo check -p kb-tui` passes - [ ] `cargo check -p kebab-tui` passes
- [ ] `cargo test -p kb-tui inspect` passes - [ ] `cargo test -p kebab-tui inspect` passes
- [ ] No imports outside Allowed dependencies - [ ] No imports outside Allowed dependencies
- [ ] Manual smoke: inspect a doc with multiple chunks, scroll, return to library - [ ] Manual smoke: inspect a doc with multiple chunks, scroll, return to library
- [ ] PR links design §3.5, §2.5, §2.6 - [ ] PR links design §3.5, §2.5, §2.6


@@ -1,12 +1,12 @@
--- ---
phase: P9 phase: P9
component: kb-desktop (Tauri) component: kebab-desktop (Tauri)
task_id: p9-5 task_id: p9-5
title: "Tauri desktop app: backend commands wrapping kb-app + multimodal source viewer" title: "Tauri desktop app: backend commands wrapping kebab-app + multimodal source viewer"
status: planned status: planned
depends_on: [p9-1, p9-2, p9-3, p9-4] depends_on: [p9-1, p9-2, p9-3, p9-4]
unblocks: [] unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [report §16.3 desktop (also tasks/phase-9-ui.md epic), design §1 ask/search scenes, design §2 wire schemas v1, design §8 module boundaries] contract_sections: [report §16.3 desktop (also tasks/phase-9-ui.md epic), design §1 ask/search scenes, design §2 wire schemas v1, design §8 module boundaries]
--- ---
@@ -14,23 +14,23 @@ contract_sections: [report §16.3 desktop (also tasks/phase-9-ui.md epic), desig
## Goal ## Goal
Stand up a Tauri 2.x app (`kb-desktop` crate as backend, `kb-desktop-frontend/` as web assets) whose Tauri commands wrap `kb-app` 1:1. The frontend renders multimodal source viewers (Markdown render, PDF page viewer, image viewer with region overlay, audio player with seek). Citation clicks route to the appropriate viewer. Stand up a Tauri 2.x app (`kebab-desktop` crate as backend, `kebab-desktop-frontend/` as web assets) whose Tauri commands wrap `kebab-app` 1:1. The frontend renders multimodal source viewers (Markdown render, PDF page viewer, image viewer with region overlay, audio player with seek). Citation clicks route to the appropriate viewer.
## Why now / why this size ## Why now / why this size
Last task. Combines all backend phases into a single user-facing surface. Strict policy: backend commands are thin wrappers over `kb-app`; no new business logic. Last task. Combines all backend phases into a single user-facing surface. Strict policy: backend commands are thin wrappers over `kebab-app`; no new business logic.
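
The "thin wrapper" policy can be pictured with a small sketch that leaves Tauri itself out. Everything here is hypothetical: `AppFacade` stands in for the `kebab-app` facade, `search_command` for a backend command, and `FixedApp` is a test double. The only logic the wrapper adds is converting the error into a string, on the assumption that the frontend receives errors in a serializable form.

```rust
use std::error::Error;

/// Hypothetical facade trait standing in for the kebab-app API surface.
pub trait AppFacade {
    fn search(&self, query: &str) -> Result<Vec<String>, Box<dyn Error>>;
}

/// A thin command: delegate to the facade, convert the error, nothing else.
pub fn search_command(app: &dyn AppFacade, query: String) -> Result<Vec<String>, String> {
    app.search(&query).map_err(|e| e.to_string())
}

/// Test double used only to exercise the wrapper.
pub struct FixedApp {
    pub fail: bool,
}

impl AppFacade for FixedApp {
    fn search(&self, query: &str) -> Result<Vec<String>, Box<dyn Error>> {
        if self.fail {
            Err("index not built".into())
        } else {
            Ok(vec![format!("hit for {}", query)])
        }
    }
}
```

A wrapper of this shape has nothing to unit-test beyond the error conversion, which is one way to check that no business logic has leaked into the desktop backend.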
## Allowed dependencies ## Allowed dependencies
- backend (`kb-desktop`): - backend (`kebab-desktop`):
- `kb-core` - `kebab-core`
- `kb-config` - `kebab-config`
- `kb-app` - `kebab-app`
- `tauri = "2"` + `tauri-build` - `tauri = "2"` + `tauri-build`
- `serde`, `serde_json` - `serde`, `serde_json`
- `tracing` - `tracing`
- `thiserror` - `thiserror`
- frontend (`kb-desktop-frontend/`): vanilla TypeScript + Vite (default; user may swap to Svelte/Solid in a follow-up). - frontend (`kebab-desktop-frontend/`): vanilla TypeScript + Vite (default; user may swap to Svelte/Solid in a follow-up).
- PDF rendering: `pdfjs-dist` - PDF rendering: `pdfjs-dist`
- Markdown rendering: `marked` + `dompurify` - Markdown rendering: `marked` + `dompurify`
- Audio: HTML `<audio>` with custom segment overlay - Audio: HTML `<audio>` with custom segment overlay
@@ -38,7 +38,7 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
## Forbidden dependencies ## Forbidden dependencies
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (UI must go through `kb-app` only — design §8). - `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (UI must go through `kebab-app` only — design §8).
- **No native PDF render backend** (no `pdfium`, no `mupdf`, no `poppler`). PDF rendering lives entirely in the frontend (`pdfjs-dist`). Adding any of these would (a) bloat the bundle 100+ MB, (b) require frozen-design amendment, and (c) double the path-containment surface. - **No native PDF render backend** (no `pdfium`, no `mupdf`, no `poppler`). PDF rendering lives entirely in the frontend (`pdfjs-dist`). Adding any of these would (a) bloat the bundle 100+ MB, (b) require frozen-design amendment, and (c) double the path-containment surface.
## Inputs ## Inputs
@@ -46,7 +46,7 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
| input | type | source | | input | type | source |
|-------|------|--------| |-------|------|--------|
| Tauri commands | invoked from frontend | user clicks | | Tauri commands | invoked from frontend | user clicks |
| `kb-config::Config` | runtime | env / file | | `kebab-config::Config` | runtime | env / file |
| user file system (read-only) | for source viewers | OS | | user file system (read-only) | for source viewers | OS |
## Outputs ## Outputs
@@ -59,7 +59,7 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
## Public surface (signatures only — no new types) ## Public surface (signatures only — no new types)
```rust ```rust
// Tauri command surface (one per kb-app facade method, plus source viewers) // Tauri command surface (one per kebab-app facade method, plus source viewers)
#[tauri::command] fn cmd_init(force: bool) -> Result<()>; #[tauri::command] fn cmd_init(force: bool) -> Result<()>;
#[tauri::command] fn cmd_ingest(scope_json: serde_json::Value, summary_only: bool) -> Result<serde_json::Value /* IngestReportWireV1 */>; #[tauri::command] fn cmd_ingest(scope_json: serde_json::Value, summary_only: bool) -> Result<serde_json::Value /* IngestReportWireV1 */>;
#[tauri::command] fn cmd_list_docs(filter_json: serde_json::Value) -> Result<Vec<serde_json::Value /* DocSummaryWireV1 */>>; #[tauri::command] fn cmd_list_docs(filter_json: serde_json::Value) -> Result<Vec<serde_json::Value /* DocSummaryWireV1 */>>;
@@ -75,11 +75,11 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
#[tauri::command] fn cmd_read_file_bytes(path: String) -> Result<Vec<u8>>; // raw bytes for PDF / image / audio #[tauri::command] fn cmd_read_file_bytes(path: String) -> Result<Vec<u8>>; // raw bytes for PDF / image / audio
``` ```
-(All commands convert internal `kb-core` types to wire-schema-v1 JSON before returning.)
+(All commands convert internal `kebab-core` types to wire-schema-v1 JSON before returning.)
 ## Behavior contract
-- Backend bootstraps `tracing` to a file under `~/.local/state/kb/logs/` and a Tauri plugin loads/saves window state.
+- Backend bootstraps `tracing` to a file under `~/.local/state/kebab/logs/` and a Tauri plugin loads/saves window state.
 - Every Tauri command performs **path containment** for source viewers: resolves `path` against `config.workspace.root`, rejects (`anyhow::Error`) any path outside.
 - Layout (frontend): left = Library + Search + Ask tabs; right = Source viewer keyed by current citation.
 - Citation routing in the frontend (clicks on `[#N]` markers or hit rows). All rendering is frontend-side; backend serves raw bytes only.
@@ -88,23 +88,23 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
 - `Citation::Region { path, x, y, w, h }` → `cmd_read_file_bytes(path)` → blob URL → `<img>` + absolute-positioned overlay at `(x, y, w, h)`.
 - `Citation::Caption { path, model }` → same as Region but no overlay; caption banner shows `model`.
 - `Citation::Time { path, start_ms, end_ms }` → `cmd_read_file_bytes(path)` → blob URL → `<audio src=...>` seeked to `start_ms / 1000`, with a timeline marker spanning `[start_ms, end_ms]`.
-- Streaming `kb ask`: backend command `cmd_ask` returns the buffered Answer (per §0 Q5: pipe/JSON mode buffers). For real-time streaming in the desktop, expose a separate `cmd_ask_stream` event channel via Tauri's `Window::emit("kb://ask-token", payload)`. (Implementation can be deferred to a follow-up; v1 of the desktop accepts buffered.)
+- Streaming `kebab ask`: backend command `cmd_ask` returns the buffered Answer (per §0 Q5: pipe/JSON mode buffers). For real-time streaming in the desktop, expose a separate `cmd_ask_stream` event channel via Tauri's `Window::emit("kebab://ask-token", payload)`. (Implementation can be deferred to a follow-up; v1 of the desktop accepts buffered.)
 - All backend errors mapped to a `String` message with structure `{ "error": msg, "hint": Option<msg> }`.
 - Frontend respects light/dark per OS theme (Tauri supplies the API).
 - No telemetry. No automatic update channel for v1 (manual download).
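The path-containment rule in the behavior contract can be sketched with the standard library alone. This is a minimal illustration, not the task's actual implementation: the function name `contain_path` and the use of `std::io::Error` (instead of the `anyhow::Error` named in the contract) are assumptions made to keep the snippet dependency-free.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Resolve `requested` against `workspace_root` and return the canonical
/// path only when it stays inside the workspace. (`contain_path` is a
/// hypothetical helper name, not part of the task spec.)
fn contain_path(workspace_root: &Path, requested: &str) -> io::Result<PathBuf> {
    // Canonicalize the root so both sides are compared in resolved form.
    let root = fs::canonicalize(workspace_root)?;
    // `join` replaces the base when `requested` is absolute; the prefix
    // check below still rejects it unless it lies under `root`.
    // `canonicalize` resolves `..` and symlinks and fails for missing files.
    let candidate = fs::canonicalize(root.join(requested))?;
    if candidate.starts_with(&root) {
        Ok(candidate)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "path escapes workspace root",
        ))
    }
}
```

Because the check runs on canonicalized paths, the three traversal vectors listed in the test plan (`..`, absolute paths, symlinks pointing out of the workspace) all reduce to the same prefix comparison.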
 ## Storage / wire effects
-- Reads via `kb-app` (which reads/writes via SQLite + LanceDB).
+- Reads via `kebab-app` (which reads/writes via SQLite + LanceDB).
 - Reads workspace files directly for source viewers (path-contained).
-- Writes nothing outside what `kb-app` writes.
+- Writes nothing outside what `kebab-app` writes.
 - Wire JSON between backend and frontend uses schema v1 strictly. The frontend MUST validate `schema_version` strings on every IPC return and warn (or upgrade-gate) when `v1 != current`.
 ## Test plan
 | kind | description | fixture / data |
 |------|-------------|----------------|
-| unit (backend) | each command wraps the corresponding `kb-app` function and serializes via wire schema | inline mocks |
+| unit (backend) | each command wraps the corresponding `kebab-app` function and serializes via wire schema | inline mocks |
 | unit (backend) | `cmd_read_markdown` rejects paths outside workspace | tmp config |
 | unit (backend) | `cmd_read_file_bytes` rejects paths outside workspace incl. `..`, absolute path, symlink-out | tmp config + traversal vectors |
 | unit (backend) | `cmd_read_file_bytes` returns identical bytes to `std::fs::read` for an in-workspace file | tmp config |
@@ -112,13 +112,13 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
 | smoke (frontend, optional in this task) | Vitest test that mounts the Library tab, calls a mocked `cmd_list_docs`, renders 1 row | minimal |
 | manual | full-stack smoke against a real ingested workspace (Markdown + 1 PDF + 1 image + 1 audio); each citation jumps correctly | manual checklist |
-Backend tests under `cargo test -p kb-desktop`. Frontend tests are bonus and not gated by this task's DoD.
+Backend tests under `cargo test -p kebab-desktop`. Frontend tests are bonus and not gated by this task's DoD.
 ## Definition of Done
-- [ ] `cargo check -p kb-desktop` passes
-- [ ] `cargo test -p kb-desktop` passes
-- [ ] `pnpm --filter kb-desktop-frontend build` produces a static asset bundle Tauri can package
+- [ ] `cargo check -p kebab-desktop` passes
+- [ ] `cargo test -p kebab-desktop` passes
+- [ ] `pnpm --filter kebab-desktop-frontend build` produces a static asset bundle Tauri can package
 - [ ] `tauri build` produces an unsigned dmg on macOS in CI (signed/notarized are out of scope)
 - [ ] Each Tauri command returns wire-schema-v1 JSON; frontend asserts `schema_version`
 - [ ] No imports outside Allowed dependencies (backend)
@@ -139,4 +139,4 @@ Backend tests under `cargo test -p kb-desktop`. Frontend tests are bonus and not
 - Path containment is the desktop's most security-sensitive surface; tests must include path traversal vectors (`..`, symlinks, absolute paths).
 - PDF rendering via `pdfjs-dist` is heavy (~2 MB worker); lazy-load on first PDF citation. The trade-off vs a native render backend (e.g., `pdfium` ~150 MB binary, code-signing pain) is heavily one-sided; v1 stays on `pdfjs-dist`.
 - Audio formats vary; rely on the browser engine's HTML audio decoder (WebKit on macOS supports `.m4a`, `.mp3`; mileage varies on `.flac`/`.ogg`).
-- Wide Tauri command surface tempts business-logic creep; CI must enforce that no `kb-rag` / `kb-search` / store crate appears in `kb-desktop`'s `cargo tree`.
+- Wide Tauri command surface tempts business-logic creep; CI must enforce that no `kebab-rag` / `kebab-search` / store crate appears in `kebab-desktop`'s `cargo tree`.


@@ -3,7 +3,7 @@ phase: P0
 title: "Workspace skeleton + domain contracts"
 status: completed
 depends_on: []
-source: kb_local_rust_report.md §3, §4, §6, §7, §13
+source: kebab_local_rust_report.md §3, §4, §6, §7, §13
 ---
 # P0 — Workspace skeleton + domain contracts
@@ -16,11 +16,11 @@ Finalize a compiling Rust 2024 workspace and the domain spec. Every later phase
 | crate | role |
 |-------|------|
-| `kb-core` | domain types, traits, errors, ID rules |
-| `kb-parse-types` | parser intermediate representation (`ParsedBlock` etc.): thin layer between `kb-core` and parsers/normalize (design §3.7b) |
-| `kb-config` | config loading, defaults, path expansion |
-| `kb-app` | facade. Common entry point for CLI/TUI/desktop |
-| `kb-cli` | `kb` binary skeleton (only `--help` works) |
+| `kebab-core` | domain types, traits, errors, ID rules |
+| `kebab-parse-types` | parser intermediate representation (`ParsedBlock` etc.): thin layer between `kebab-core` and parsers/normalize (design §3.7b) |
+| `kebab-config` | config loading, defaults, path expansion |
+| `kebab-app` | facade. Common entry point for CLI/TUI/desktop |
+| `kebab-cli` | `kebab` binary skeleton (only `--help` works) |
 ## Workspace settings
@@ -29,7 +29,7 @@ Finalize a compiling Rust 2024 workspace and the domain spec. Every later phase
 ```toml
 [workspace]
 resolver = "3"
-members = ["crates/kb-core", "crates/kb-parse-types", "crates/kb-config", "crates/kb-app", "crates/kb-cli"]
+members = ["crates/kebab-core", "crates/kebab-parse-types", "crates/kebab-config", "crates/kebab-app", "crates/kebab-cli"]
 [workspace.package]
 edition = "2024"
@@ -48,11 +48,11 @@ tracing = "0.1"
 Additional member crates join in later phases.
-## kb-core domain types
+## kebab-core domain types
 `RawAsset`, `CanonicalDocument`, `Block`, `Chunk`, `SearchHit`, `Citation`, `SourceSpan`, `Provenance`, `Metadata`. As defined in report §6.
-## kb-core trait
+## kebab-core trait
 ```rust
 pub trait SourceConnector { fn scan(&self, scope: &SourceScope) -> anyhow::Result<Vec<RawAsset>>; }
@@ -78,7 +78,7 @@ index_id = collection name + index version + embedding model id
 Each record preserves the following version fields: `doc_version`, `schema_version`, `parser_version`, `chunker_version`, `embedding_model`, `embedding_version`, `index_version`, `prompt_template_version`.
-## kb-app facade
+## kebab-app facade
 CLI/TUI/desktop all call facade functions. Calling parser/DB/LLM adapters directly is forbidden. Initial facade method stubs:
@@ -91,9 +91,9 @@ pub fn inspect_chunk(id: &ChunkId) -> anyhow::Result<Chunk>;
 pub fn doctor() -> anyhow::Result<DoctorReport>;
 ```
-## kb-cli skeleton
+## kebab-cli skeleton
-`clap` derive. Subcommands: `init`, `ingest`, `index`, `search`, `ask`, `inspect doc|chunk`, `doctor`. The bodies only call `kb-app`. In P0 only `--help` works.
+`clap` derive. Subcommands: `init`, `ingest`, `index`, `search`, `ask`, `inspect doc|chunk`, `doctor`. The bodies only call `kebab-app`. In P0 only `--help` works.
 ## Spec documents
@@ -112,15 +112,15 @@ pub fn doctor() -> anyhow::Result<DoctorReport>;
 ## Dependency boundaries
-- `kb-core`: minimal external dependencies (serde, time, uuid, blake3, thiserror, tracing).
-- `kb-cli` depends only on `kb-app`. Direct dependencies on parser/DB/LLM are forbidden.
-- `kb-app` works against traits only. Implementations are supplied via dyn injection.
+- `kebab-core`: minimal external dependencies (serde, time, uuid, blake3, thiserror, tracing).
+- `kebab-cli` depends only on `kebab-app`. Direct dependencies on parser/DB/LLM are forbidden.
+- `kebab-app` works against traits only. Implementations are supplied via dyn injection.
 ## Completion criteria
 - [ ] `cargo check --workspace` passes
 - [ ] `cargo test --workspace` passes (unit tests cover ID generation / domain serialization round-trip)
-- [ ] `kb --help` prints
+- [ ] `kebab --help` prints
 - [ ] 7 documents exist under `docs/spec/*`
 - [ ] 3 fixtures exist under `fixtures/markdown/*`
 - [ ] at least 1 serde JSON snapshot test for the domain types


@@ -3,47 +3,47 @@ phase: P1
 title: "Markdown ingestion pipeline"
 status: completed
 depends_on: [P0]
-source: kb_local_rust_report.md §8, §14, §17 Phase 1
+source: kebab_local_rust_report.md §8, §14, §17 Phase 1
 ---
 # P1 — Markdown ingestion pipeline
 ## Goal
-Complete the `Markdown file -> RawAsset -> CanonicalDocument -> Chunk -> SQLite` flow. `kb ingest` / `kb list docs` / `kb inspect doc <id>` work even without an LLM or embeddings.
+Complete the `Markdown file -> RawAsset -> CanonicalDocument -> Chunk -> SQLite` flow. `kebab ingest` / `kebab list docs` / `kebab inspect doc <id>` work even without an LLM or embeddings.
 ## Output crates
 | crate | role |
 |-------|------|
-| `kb-source-fs` | local folder scan, checksum, change detection. Implements `SourceConnector` |
-| `kb-parse-md` | Markdown bytes → structured document. Implements `Extractor` |
-| `kb-normalize` | parser output → `CanonicalDocument` |
-| `kb-chunk` | block-aware chunking. Implements `Chunker` (`md-heading-v1`) |
-| `kb-store-sqlite` | metadata, document, chunk, job tables. The FTS table is activated in P2 |
+| `kebab-source-fs` | local folder scan, checksum, change detection. Implements `SourceConnector` |
+| `kebab-parse-md` | Markdown bytes → structured document. Implements `Extractor` |
+| `kebab-normalize` | parser output → `CanonicalDocument` |
+| `kebab-chunk` | block-aware chunking. Implements `Chunker` (`md-heading-v1`) |
+| `kebab-store-sqlite` | metadata, document, chunk, job tables. The FTS table is activated in P2 |
-## kb-source-fs
+## kebab-source-fs
 - Input: `SourceScope { root: PathBuf, include: Vec<Glob>, exclude: Vec<Glob> }`
 - Behavior: recursive walk → `blake3` of each file → list of `RawAsset`.
 - Change detection: compare new vs. old by `(source_uri, checksum)`. Identical checksums are skipped.
 - Watch mode is out of scope for P1 (only the config is defined; implementation comes later).
-## kb-parse-md
+## kebab-parse-md
 - Parser candidates: `pulldown-cmark` first. Consider `comrak` if GFM tables/task lists become necessary (§8).
 - Preserved: YAML/TOML frontmatter, heading tree, paragraph, list, code block + lang tag, table, blockquote, link, image ref, **line range**.
-- Output: intermediate representation (parser-specific). `kb-normalize` converts it to canonical form.
+- Output: intermediate representation (parser-specific). `kebab-normalize` converts it to canonical form.
 - Malformed markdown: no panics. Preserve whatever can be preserved and record a warning in `Provenance`.
-## kb-normalize
+## kebab-normalize
 - Responsibility: parser intermediate representation → `CanonicalDocument`.
 - frontmatter → `Metadata` (id, title, aliases, tags, created_at, updated_at, source_type, trust_level, lang).
 - Flatten the block tree + assign `BlockId` (deterministic, based on heading path + sequence number).
 - `SourceSpan` allows both `LineRange { start, end }` and `ByteRange`. For Markdown, line range comes first.
-## kb-chunk (`md-heading-v1`)
+## kebab-chunk (`md-heading-v1`)
 Priority (§14):
 1. heading boundary first
@@ -58,7 +58,7 @@ policy defaults: `target_tokens = 500`, `overlap_tokens = 80`, `respect_markdow
 Token estimation: no tokenizer has been introduced at this stage, so a byte/character-based approximation is OK. A real tokenizer replaces it when P3 introduces embeddings.
-## kb-store-sqlite
+## kebab-store-sqlite
 Schema (first pass):
@@ -117,7 +117,7 @@ CREATE TABLE jobs (
 - transaction: one ingest = one transaction. Roll back on partial failure.
 - idempotent: re-ingesting the same `doc_id` is an UPSERT with a version bump.
-## kb-app facade extension
+## kebab-app facade extension
 ```rust
 pub fn ingest(scope: SourceScope) -> anyhow::Result<IngestReport>;
@@ -131,10 +131,10 @@ pub fn inspect_chunk(id: &ChunkId) -> anyhow::Result<Chunk>;
 ## CLI
 ```text
-kb ingest <path> [--include <glob>] [--exclude <glob>]
-kb list docs [--tag <t>]
-kb inspect doc <doc_id>
-kb inspect chunk <chunk_id>
+kebab ingest <path> [--include <glob>] [--exclude <glob>]
+kebab list docs [--tag <t>]
+kebab inspect doc <doc_id>
+kebab inspect chunk <chunk_id>
 ```
 ## Tests
@@ -146,14 +146,14 @@ kb inspect chunk <chunk_id>
 ## Dependency boundaries
-Forbidden for `kb-parse-md`: `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, embedding calls. The parser is a pure function.
+Forbidden for `kebab-parse-md`: `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, embedding calls. The parser is a pure function.
 ## Completion criteria
-- [ ] after running `kb ingest <path>`, documents/blocks/chunks are populated in SQLite
-- [ ] `kb list docs` prints correctly
-- [ ] `kb inspect doc <id>` prints JSON
-- [ ] `kb inspect chunk <id>` prints JSON (including heading path + source span)
+- [ ] after running `kebab ingest <path>`, documents/blocks/chunks are populated in SQLite
+- [ ] `kebab list docs` prints correctly
+- [ ] `kebab inspect doc <id>` prints JSON
+- [ ] `kebab inspect chunk <id>` prints JSON (including heading path + source span)
 - [ ] re-ingesting the same folder produces no duplicate rows
 - [ ] the set to reprocess is identifiable when the parser/chunker version changes
 - [ ] fixture snapshot tests pass
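The `(source_uri, checksum)` change detection described for the source-fs crate, which is what makes the "no duplicate rows" criterion achievable, might look roughly like the sketch below. The `Change` enum, the `classify` function, and the plain `String` checksums are illustrative assumptions; the spec only fixes the comparison key and the rule that identical checksums are skipped.

```rust
use std::collections::HashMap;

/// Outcome of comparing one scanned file against the previous ingest.
/// (Hypothetical type; the spec does not name it.)
#[derive(Debug, PartialEq)]
enum Change {
    New,
    Modified,
    Unchanged,
}

/// `known` maps `source_uri` to the checksum recorded at the previous ingest.
fn classify(known: &HashMap<String, String>, source_uri: &str, checksum: &str) -> Change {
    match known.get(source_uri) {
        // Never seen this URI before: ingest it.
        None => Change::New,
        // Same URI and same checksum: skip, per the spec.
        Some(prev) if prev.as_str() == checksum => Change::Unchanged,
        // Same URI, different content: re-ingest (UPSERT + version bump).
        Some(_) => Change::Modified,
    }
}
```

Only `New` and `Modified` assets would then flow into the parse, normalize, and chunk stages.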


@@ -3,19 +3,19 @@ phase: P2
 title: "SQLite FTS5 lexical search + citation"
 status: completed
 depends_on: [P1]
-source: kb_local_rust_report.md §10, §15, §17 Phase 2
+source: kebab_local_rust_report.md §10, §15, §17 Phase 2
 ---
 # P2 — SQLite FTS5 lexical search + citation
 ## Goal
-Search + citation output that works with FTS5 alone, without embeddings or an LLM. `kb search "..."` returns chunks and source spans.
+Search + citation output that works with FTS5 alone, without embeddings or an LLM. `kebab search "..."` returns chunks and source spans.
 ## Output crates
-- `kb-search` (lexical mode): first implementation of the `Retriever` trait.
-- `kb-store-sqlite` extension: FTS5 virtual table + triggers.
+- `kebab-search` (lexical mode): first implementation of the `Retriever` trait.
+- `kebab-store-sqlite` extension: FTS5 virtual table + triggers.
 ## FTS5 schema
@@ -69,15 +69,15 @@ pub struct SearchHit {
 }
 ```
-`Citation` format: `notes/rust/kb.md:L12-L34`.
+`Citation` format: `notes/rust/kebab.md:L12-L34`.
 ## Index lifecycle
 - Kept in sync automatically by triggers at ingest time.
-- The `kb index --rebuild-fts` command rebuilds the FTS table (use after a chunker version bump).
+- The `kebab index --rebuild-fts` command rebuilds the FTS table (use after a chunker version bump).
 - `index_version` is the combination `(schema_version, fts_config_hash)`.
-## kb-app facade extension
+## kebab-app facade extension
 ```rust
 pub fn search(query: SearchQuery) -> anyhow::Result<Vec<SearchHit>>;
@@ -86,16 +86,16 @@ pub fn search(query: SearchQuery) -> anyhow::Result<Vec<SearchHit>>;
 ## CLI
 ```text
-kb search "Rust workspace design" [--k 10] [--tag rust] [--mode lexical]
-kb index --rebuild-fts
+kebab search "Rust workspace design" [--k 10] [--tag rust] [--mode lexical]
+kebab index --rebuild-fts
 ```
 Example output:
 ```text
 1. [0.82] A Rust workspace manages multiple packages as one…
-doc: notes/rust/kb.md
-citation: notes/rust/kb.md:L12-L34
+doc: notes/rust/kebab.md
+citation: notes/rust/kebab.md:L12-L34
 heading: Architecture > Rust workspace
 ```
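The line-range citation string shown in the example output (`path:L<start>-L<end>`) can be produced and parsed with a few lines of standard-library code. This is a hedged sketch: `format_line_citation` and `parse_line_citation` are hypothetical helper names, and the real `Citation` type in the core crate may carry more variants than this single line-range form.

```rust
/// Render a line-range citation such as `notes/rust/kebab.md:L12-L34`.
/// (Hypothetical helper; only the output format comes from the spec.)
fn format_line_citation(path: &str, start: u32, end: u32) -> String {
    format!("{}:L{}-L{}", path, start, end)
}

/// Parse the same format back into `(path, start, end)`.
/// Returns `None` when the string does not match the expected shape.
fn parse_line_citation(s: &str) -> Option<(&str, u32, u32)> {
    // Split from the right so a path that itself contains ":L" still works.
    let (path, range) = s.rsplit_once(":L")?;
    let (start, end) = range.split_once("-L")?;
    Some((path, start.parse().ok()?, end.parse().ok()?))
}
```

A round-trip test over these two functions is a cheap way to back the "citation line ranges match the original" completion criterion.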
@@ -108,13 +108,13 @@ kb index --rebuild-fts
 ## Dependency boundaries
-- `kb-search` depends only on `kb-store-sqlite` and `kb-core`.
+- `kebab-search` depends only on `kebab-store-sqlite` and `kebab-core`.
 - LLM/embedding calls are forbidden (at the P2 stage).
-- The CLI calls only through `kb-app`.
+- The CLI calls only through `kebab-app`.
 ## Completion criteria
-- [ ] `kb search "..."` returns top-k chunks
+- [ ] `kebab search "..."` returns top-k chunks
 - [ ] every result includes a citation
 - [ ] citation line ranges match the original
 - [ ] mixed Korean/English queries work (Korean tokenization limits go in a note)


@@ -3,23 +3,23 @@ phase: P3
 title: "Local embedding + LanceDB + hybrid search"
 status: completed
 depends_on: [P2]
-source: kb_local_rust_report.md §10, §11, §15, §17 Phase 3
+source: kebab_local_rust_report.md §10, §11, §15, §17 Phase 3
 ---
 # P3 — Local embedding + LanceDB + hybrid search
 ## Goal
-Vectorize chunks with a local embedding model → store in LanceDB → vector search + lexical fusion (hybrid). `kb search --mode {lexical,vector,hybrid}` works.
+Vectorize chunks with a local embedding model → store in LanceDB → vector search + lexical fusion (hybrid). `kebab search --mode {lexical,vector,hybrid}` works.
 ## Output crates
 | crate | role |
 |-------|------|
-| `kb-embed` | `Embedder` trait + `EmbeddingInput`/output types |
-| `kb-embed-local` | `fastembed-rs` adapter (first). later: Ollama embed endpoint, candle |
-| `kb-store-vector` | LanceDB integration. table management, upsert, vector search |
-| `kb-search` | lexical + vector side by side + score fusion |
+| `kebab-embed` | `Embedder` trait + `EmbeddingInput`/output types |
+| `kebab-embed-local` | `fastembed-rs` adapter (first). later: Ollama embed endpoint, candle |
+| `kebab-store-vector` | LanceDB integration. table management, upsert, vector search |
+| `kebab-search` | lexical + vector side by side + score fusion |
 ## Embedder
@@ -64,7 +64,7 @@ created_at : timestamp
 ## Indexing job
 ```text
-kb index --embeddings [--model <id>] [--batch-size N] [--resume]
+kebab index --embeddings [--model <id>] [--batch-size N] [--resume]
 ```
 - Only chunks whose `embedding_id = chunk_id + model_id + dim` is not yet in the vector store are processed.
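The indexing filter in the bullet above (skip chunks whose embedding already exists) could be sketched as follows. The spec only states that `embedding_id` combines `chunk_id`, `model_id`, and `dim`; the `:`-joined string layout, the function names, and the in-memory `HashSet` standing in for a vector-store lookup are all assumptions for illustration.

```rust
use std::collections::HashSet;

/// Build an embedding id from its three components.
/// The ":" separator is an assumed layout, not taken from the spec.
fn embedding_id(chunk_id: &str, model_id: &str, dim: usize) -> String {
    format!("{}:{}:{}", chunk_id, model_id, dim)
}

/// Return the chunk ids that still need an embedding for this model/dim.
/// `existing` stands in for whatever lookup the vector store provides.
fn pending_chunks<'a>(
    chunk_ids: &[&'a str],
    model_id: &str,
    dim: usize,
    existing: &HashSet<String>,
) -> Vec<&'a str> {
    chunk_ids
        .iter()
        .copied()
        .filter(|c| !existing.contains(&embedding_id(c, model_id, dim)))
        .collect()
}
```

Because the model id and dimension are part of the key, switching models makes every chunk pending again, which lines up with the criterion that a model/dimension change is stored in a separate table.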
@@ -91,7 +91,7 @@ k_rrf defaults to 60.
 No reranker is introduced within P3 scope (noted for the P+ stage).
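The `k_rrf` default of 60 suggests the hybrid mode fuses the lexical and vector rankings with reciprocal rank fusion (RRF), where each document scores the sum of `1 / (k_rrf + rank)` over the lists it appears in. The sketch below assumes that reading; `rrf_fuse` is a hypothetical name, ranks are taken as 1-based, and the tie-break by id is an added assumption to keep output deterministic.

```rust
use std::collections::HashMap;

/// Fuse several best-first rankings of chunk ids with reciprocal rank fusion.
fn rrf_fuse(rankings: &[Vec<&str>], k_rrf: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            // Ranks are 1-based: the top hit contributes 1 / (k_rrf + 1).
            let rank = (idx + 1) as f64;
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k_rrf + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Highest score first; ties broken by id for deterministic output.
    fused.sort_by(|a, b| {
        b.1.partial_cmp(&a.1)
            .unwrap()
            .then_with(|| a.0.cmp(&b.0))
    });
    fused
}
```

RRF uses only rank positions, so the FTS5 scores and vector distances never need to be normalized onto a common scale.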
-## kb-search structure
+## kebab-search structure
 ```rust
 pub struct HybridRetriever {
@@ -102,11 +102,11 @@ pub struct HybridRetriever {
 ```
 - Each sub-retriever implements the `Retriever` trait.
-- `kb-app::search` dispatches by mode.
+- `kebab-app::search` dispatches by mode.
-## kb-app facade extension
+## kebab-app facade extension
-During P3, the stub bodies of `ingest` / `search` / `list_docs` / `inspect_doc` / `inspect_chunk` in the kb-app facade are replaced with real library calls. Signatures have been frozen since P0, so only the bodies are swapped, with no signature changes.
+During P3, the stub bodies of `ingest` / `search` / `list_docs` / `inspect_doc` / `inspect_chunk` in the kebab-app facade are replaced with real library calls. Signatures have been frozen since P0, so only the bodies are swapped, with no signature changes.
 ```rust
 pub fn ingest(scope: SourceScope, summary_only: bool) -> anyhow::Result<IngestReport>; // p3-5
@@ -117,14 +117,14 @@ pub fn inspect_chunk(id: &ChunkId) -> anyhow::Result<Chunk>;
 pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<Answer>; // p4-3 (stub remains)
 ```
-p3-5 wires every facade that does not involve the LLM (everything except `ask`) in one go. After that, `cargo run -p kb-cli -- index` and `cargo run -p kb-cli -- search --mode {lexical,vector,hybrid}` actually work.
+p3-5 wires every facade that does not involve the LLM (everything except `ask`) in one go. After that, `cargo run -p kebab-cli -- index` and `cargo run -p kebab-cli -- search --mode {lexical,vector,hybrid}` actually work.
 ## CLI
 ```text
-kb index --embeddings
-kb search --mode vector "similar design principles"
-kb search --mode hybrid "Markdown chunking rules"
+kebab index --embeddings
+kebab search --mode vector "similar design principles"
+kebab search --mode hybrid "Markdown chunking rules"
 ```
 ## Tests
@@ -137,19 +137,19 @@ kb search --mode hybrid "Markdown chunking rules"
 ## Dependency boundaries
-- Only `kb-embed-local` depends on ONNX/model bindings. Other crates use only the trait.
-- `kb-store-vector` depends on `lancedb`. Cross-writes with SQLite are forbidden (each store keeps its own responsibility).
+- Only `kebab-embed-local` depends on ONNX/model bindings. Other crates use only the trait.
+- `kebab-store-vector` depends on `lancedb`. Cross-writes with SQLite are forbidden (each store keeps its own responsibility).
 - Kept separate from the LLM crates (§11.1).
 ## Completion criteria
-- [ ] `kb index` (= `kb-app::ingest`) stores every chunk in SQLite + LanceDB (p3-5)
-- [ ] `kb search --mode vector` returns hits
-- [ ] `kb search --mode hybrid` returns hits, with citations
+- [ ] `kebab index` (= `kebab-app::ingest`) stores every chunk in SQLite + LanceDB (p3-5)
+- [ ] `kebab search --mode vector` returns hits
+- [ ] `kebab search --mode hybrid` returns hits, with citations
 - [ ] a model/dimension change is stored in a separate table
 - [ ] on resume, only unfinished chunks are processed (deferred to P+)
 - [ ] results are structured so that hit@k can be measured (P5 preparation)
-- [ ] `kb-app::{ingest,search,list_docs,inspect_doc,inspect_chunk}` actually work (`ask` stays a stub until P4-3) (p3-5)
+- [ ] `kebab-app::{ingest,search,list_docs,inspect_doc,inspect_chunk}` actually work (`ask` stays a stub until P4-3) (p3-5)
 ## Risks / cautions


@@ -3,22 +3,22 @@ phase: P4
title: "Local LLM + RAG + grounded answer" title: "Local LLM + RAG + grounded answer"
status: completed status: completed
depends_on: [P3] depends_on: [P3]
source: kb_local_rust_report.md §11, §15.2, §17 Phase 4 source: kebab_local_rust_report.md §11, §15.2, §17 Phase 4
--- ---
# P4 — Local LLM + RAG + grounded answer # P4 — Local LLM + RAG + grounded answer
## 목표 ## 목표
local LLM 으로 citation 포함 답변 생성. 근거 부족 시 거절. `kb ask "..."` 동작. local LLM 으로 citation 포함 답변 생성. 근거 부족 시 거절. `kebab ask "..."` 동작.
## 산출 crate ## 산출 crate
| crate | 역할 | | crate | 역할 |
|-------|------| |-------|------|
| `kb-llm` | `LanguageModel` trait + request/response 타입 | | `kebab-llm` | `LanguageModel` trait + request/response 타입 |
| `kb-llm-local` | Ollama adapter 1차. later: llama.cpp, candle | | `kebab-llm-local` | Ollama adapter 1차. later: llama.cpp, candle |
| `kb-rag` | retrieval → context packing → prompt → generate → citation 검증 | | `kebab-rag` | retrieval → context packing → prompt → generate → citation 검증 |
## LanguageModel ## LanguageModel
@@ -50,9 +50,9 @@ pub struct GenerateResponse {
- HTTP localhost 호출 (`http://127.0.0.1:11434/api/generate`). - HTTP localhost 호출 (`http://127.0.0.1:11434/api/generate`).
- 내부에서 async runtime 사용 가능. 외부 API 는 동기 wrapper 유지. - 내부에서 async runtime 사용 가능. 외부 API 는 동기 wrapper 유지.
- model 기본값 config (`qwen2.5:14b-instruct` 등). 실제 선택은 P5 eval 후 결정. - model 기본값 config (`qwen2.5:14b-instruct` 등). 실제 선택은 P5 eval 후 결정.
- 서버 미기동 시 명확한 에러 메시지 + `kb doctor` 진단. - 서버 미기동 시 명확한 에러 메시지 + `kebab doctor` 진단.
## kb-rag 파이프라인 ## kebab-rag 파이프라인
```text ```text
query query
@@ -118,7 +118,7 @@ pub struct Answer {
`answers` table 에 저장 (재현/감사용). 사용한 chunk_id 목록 + retrieval params 도 함께. `answers` table 에 저장 (재현/감사용). 사용한 chunk_id 목록 + retrieval params 도 함께.
## kb-app facade 확장 ## kebab-app facade 확장
```rust ```rust
pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<Answer>; pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<Answer>;
@@ -127,9 +127,9 @@ pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<Answer>;
## CLI ## CLI
```text ```text
kb ask "What is the storage strategy in my KB design?" kebab ask "What is the storage strategy in my KB design?"
kb ask --k 8 --temperature 0 "..." kebab ask --k 8 --temperature 0 "..."
kb ask --explain "..." # prints retrieval trace + packed prompt kebab ask --explain "..." # prints retrieval trace + packed prompt
``` ```
## Tests ## Tests
@@ -142,13 +142,13 @@ kb ask --explain "..." # retrieval trace + packed prompt 출력
## Dependency boundaries ## Dependency boundaries
- Only `kb-llm-local` depends on Ollama HTTP. - Only `kebab-llm-local` depends on Ollama HTTP.
- `kb-rag` uses only `kb-search` (Retriever trait) + `kb-llm` (LanguageModel trait). Direct SQLite/LanceDB calls are forbidden. - `kebab-rag` uses only `kebab-search` (Retriever trait) + `kebab-llm` (LanguageModel trait). Direct SQLite/LanceDB calls are forbidden.
- The CLI calls only `kb-app::ask`. - The CLI calls only `kebab-app::ask`.
## Completion criteria ## Completion criteria
- [ ] `kb ask "..."` works - [ ] `kebab ask "..."` works
- [ ] answers include citations - [ ] answers include citations
- [ ] questions without supporting evidence are refused - [ ] questions without supporting evidence are refused
- [ ] retrieval trace can be inspected with `--explain` - [ ] retrieval trace can be inspected with `--explain`
@@ -158,6 +158,6 @@ kb ask --explain "..." # retrieval trace + packed prompt 출력
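## Risks / caveats ## Risks / caveats
- Model selection is finalized after evaluation on the P5 golden set. P4 ships only a default. - Model selection is finalized after evaluation on the P5 golden set. P4 ships only a default.
- Ollama not running / model not downloaded → `kb doctor` gives clear guidance. - Ollama not running / model not downloaded → `kebab doctor` gives clear guidance.
- LLM answers frequently contain hallucinated citations. Post-processing verification is essential. - LLM answers frequently contain hallucinated citations. Post-processing verification is essential.
- Any prompt template change must bump `prompt_template_version`. - Any prompt template change must bump `prompt_template_version`.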
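
The post-processing check for hallucinated citations could look roughly like the sketch below. It is only an illustration, not the project's implementation: the bracketed `[chunk:...]` marker syntax and the function names `extract_citations` and `verify_citations` are assumptions introduced here.

```rust
use std::collections::HashSet;

/// Collects every `[chunk:...]` marker found in the answer text.
/// The bracketed marker syntax is an assumption made for this sketch.
fn extract_citations(answer: &str) -> Vec<String> {
    let mut found = Vec::new();
    let mut rest = answer;
    while let Some(start) = rest.find("[chunk:") {
        let after = &rest[start + 1..]; // skip the opening '['
        match after.find(']') {
            Some(end) => {
                found.push(after[..end].to_string());
                rest = &after[end + 1..];
            }
            None => break, // unterminated marker: stop scanning
        }
    }
    found
}

/// Splits the cited ids into (valid, hallucinated), where "valid" means
/// the id was among the chunks actually handed to the model.
fn verify_citations(answer: &str, retrieved: &HashSet<String>) -> (Vec<String>, Vec<String>) {
    extract_citations(answer)
        .into_iter()
        .partition(|id| retrieved.contains(id))
}
```

A caller would pass the set of chunk ids that were packed into the prompt. A non-empty second vector means the answer cites something that was never retrieved, which is a reasonable trigger for stripping the citation or refusing.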


@@ -3,7 +3,7 @@ phase: P5
title: "Golden query / regression eval" title: "Golden query / regression eval"
status: completed status: completed
depends_on: [P4] depends_on: [P4]
source: kb_local_rust_report.md §17 Phase 5, §18 source: kebab_local_rust_report.md §17 Phase 5, §18
--- ---
# P5 — Golden query / regression eval # P5 — Golden query / regression eval
@@ -14,7 +14,7 @@ source: kb_local_rust_report.md §17 Phase 5, §18
## Output crates ## Output crates
- `kb-eval`: golden query runner, metric computation, report generation. - `kebab-eval`: golden query runner, metric computation, report generation.
## Golden set fixture ## Golden set fixture
@@ -25,9 +25,9 @@ source: kb_local_rust_report.md §17 Phase 5, §18
query: "Markdown chunking 규칙" query: "Markdown chunking 규칙"
lang: ko lang: ko
expected_doc_ids: expected_doc_ids:
- doc:notes/rust/kb-architecture.md - doc:notes/rust/kebab-architecture.md
expected_chunk_ids: expected_chunk_ids:
- chunk:notes/rust/kb-architecture.md#chunking-policy - chunk:notes/rust/kebab-architecture.md#chunking-policy
must_contain: must_contain:
- "heading" - "heading"
- "code block" - "code block"
@@ -57,9 +57,9 @@ source: kb_local_rust_report.md §17 Phase 5, §18
## Run modes ## Run modes
```text ```text
kb eval run --suite golden [--mode {lexical,vector,hybrid}] [--with-rag] kebab eval run --suite golden [--mode {lexical,vector,hybrid}] [--with-rag]
kb eval compare <run_id_a> <run_id_b> kebab eval compare <run_id_a> <run_id_b>
kb eval report <run_id> --format {json,md,html} kebab eval report <run_id> --format {json,md,html}
``` ```
run record: run record:
@@ -90,7 +90,7 @@ DB 저장 (`eval_runs`, `eval_query_results` table) 또는 JSON 파일. 재현
- Automatic hyperparameter search: not done. - Automatic hyperparameter search: not done.
- LLM judge ("LLM as a judge"): out of scope for P5. Groundedness is rule-based (`must_contain`) only. - LLM judge ("LLM as a judge"): out of scope for P5. Groundedness is rule-based (`must_contain`) only.
## kb-app facade extension ## kebab-app facade extension
```rust ```rust
pub fn eval_run(opts: EvalRunOpts) -> anyhow::Result<EvalRun>; pub fn eval_run(opts: EvalRunOpts) -> anyhow::Result<EvalRun>;
@@ -105,13 +105,13 @@ pub fn eval_compare(a: &str, b: &str) -> anyhow::Result<CompareReport>;
## Dependency boundaries ## Dependency boundaries
- `kb-eval` calls only `kb-app` (search/ask go through the facade). Direct calls to the internal store/LLM are forbidden. - `kebab-eval` calls only `kebab-app` (search/ask go through the facade). Direct calls to the internal store/LLM are forbidden.
## Completion criteria ## Completion criteria
- [ ] `fixtures/golden_queries.yaml` has 30+ entries - [ ] `fixtures/golden_queries.yaml` has 30+ entries
- [ ] `kb eval run` produces hit@k, MRR, citation_coverage - [ ] `kebab eval run` produces hit@k, MRR, citation_coverage
- [ ] `kb eval compare` can compare two runs - [ ] `kebab eval compare` can compare two runs
- [ ] config snapshot is stored with the run (chunker, embedding, llm, prompt versions) - [ ] config snapshot is stored with the run (chunker, embedding, llm, prompt versions)
- [ ] CI can detect regressions (e.g. fail when hit@5 drops by 3% or more relative to the baseline) - [ ] CI can detect regressions (e.g. fail when hit@5 drops by 3% or more relative to the baseline)
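
As a rough illustration of the metrics and the CI gate listed above, the following sketch computes hit@k and MRR from per-query ranks and flags a regression. The function names are hypothetical, and treating "3%" as an absolute drop of 0.03 is an assumption; the spec does not say whether the threshold is absolute or relative.

```rust
/// 1-based rank of the first retrieved id that appears in `expected`, if any.
fn first_hit_rank(retrieved: &[&str], expected: &[&str]) -> Option<usize> {
    retrieved
        .iter()
        .position(|id| expected.contains(id))
        .map(|idx| idx + 1)
}

/// Fraction of queries whose first relevant result is within the top `k`.
fn hit_at_k(ranks: &[Option<usize>], k: usize) -> f64 {
    if ranks.is_empty() {
        return 0.0;
    }
    let hits = ranks
        .iter()
        .filter(|r| r.map_or(false, |rank| rank <= k))
        .count();
    hits as f64 / ranks.len() as f64
}

/// Mean reciprocal rank; a query with no hit contributes 0.
fn mrr(ranks: &[Option<usize>]) -> f64 {
    if ranks.is_empty() {
        return 0.0;
    }
    let sum: f64 = ranks
        .iter()
        .map(|r| r.map_or(0.0, |rank| 1.0 / rank as f64))
        .sum();
    sum / ranks.len() as f64
}

/// True when `current` is at least `max_drop` below `baseline`.
/// Assumption: the drop is measured in absolute terms (0.03 = 3 points).
fn is_regression(baseline: f64, current: f64, max_drop: f64) -> bool {
    baseline - current >= max_drop
}
```

A CI job could load the baseline run's hit@5, compute the current value with `hit_at_k(&ranks, 5)`, and exit non-zero when `is_regression(baseline, current, 0.03)` returns true.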


@@ -3,7 +3,7 @@ phase: P6
title: "Image ingestion (OCR + caption)" title: "Image ingestion (OCR + caption)"
status: planned status: planned
depends_on: [P5] depends_on: [P5]
source: kb_local_rust_report.md §9.1, §17 Phase 6 source: kebab_local_rust_report.md §9.1, §17 Phase 6
--- ---
# P6 — Image ingestion # P6 — Image ingestion
@@ -14,8 +14,8 @@ source: kb_local_rust_report.md §9.1, §17 Phase 6
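## Output crates ## Output crates
- `kb-parse-image`: `Extractor` implementation. Image → CanonicalDocument. - `kebab-parse-image`: `Extractor` implementation. Image → CanonicalDocument.
- (optional) `kb-ocr` / `kb-vlm` adapters (if external models are split out). - (optional) `kebab-ocr` / `kebab-vlm` adapters (if external models are split out).
## Three kinds of extracted information (§9.1) ## Three kinds of extracted information (§9.1)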
@@ -83,10 +83,10 @@ photos/diagram-2026.png#caption # caption chunk
## CLI ## CLI
```text ```text
kb ingest ./assets/diagram.png kebab ingest ./assets/diagram.png
kb ingest ./assets/ # images inside the folder are detected automatically kebab ingest ./assets/ # images inside the folder are detected automatically
kb search "OCR text inside an image" kebab search "OCR text inside an image"
kb inspect doc <image_doc_id> # shows OCR/caption/EXIF kebab inspect doc <image_doc_id> # shows OCR/caption/EXIF
``` ```
## Tests ## Tests
@@ -99,13 +99,13 @@ kb inspect doc <image_doc_id> # OCR/caption/EXIF 모두 표시
## Dependency boundaries ## Dependency boundaries
- `kb-parse-image` depends only on `kb-core` + image decoding (`image` crate) + an OCR adapter. - `kebab-parse-image` depends only on `kebab-core` + image decoding (`image` crate) + an OCR adapter.
- No LLM/embedding calls (captions go through a separate adapter). - No LLM/embedding calls (captions go through a separate adapter).
- VLM captioning is a background job. It must not block ingest. - VLM captioning is a background job. It must not block ingest.
## Completion criteria ## Completion criteria
- [ ] `kb ingest <image>` works - [ ] `kebab ingest <image>` works
- [ ] OCR text is searchable - [ ] OCR text is searchable
- [ ] OCR region citations are output - [ ] OCR region citations are output
- [ ] caption and observed text have separate provenance - [ ] caption and observed text have separate provenance


@@ -3,7 +3,7 @@ phase: P7
title: "PDF text extraction + page citation" title: "PDF text extraction + page citation"
status: planned status: planned
depends_on: [P5] depends_on: [P5]
source: kb_local_rust_report.md §9.2, §17 Phase 7 source: kebab_local_rust_report.md §9.2, §17 Phase 7
--- ---
# P7 — PDF ingestion # P7 — PDF ingestion
@@ -14,7 +14,7 @@ text PDF 추출 → page-aware chunking → citation `paper.pdf:p13`. scanned PD
## Output crates ## Output crates
- `kb-parse-pdf`: `Extractor` implementation. - `kebab-parse-pdf`: `Extractor` implementation.
## Stage separation (§9.2) ## Stage separation (§9.2)
@@ -63,10 +63,10 @@ paper.pdf:p13:span=0-1240 # char range within page
## CLI ## CLI
```text ```text
kb ingest ./paper.pdf kebab ingest ./paper.pdf
kb ingest ./papers/ kebab ingest ./papers/
kb search "a specific concept inside a PDF" kebab search "a specific concept inside a PDF"
kb inspect doc <pdf_doc_id> kebab inspect doc <pdf_doc_id>
``` ```
## Tests ## Tests
@@ -79,12 +79,12 @@ kb inspect doc <pdf_doc_id>
## Dependency boundaries ## Dependency boundaries
- `kb-parse-pdf` depends only on `kb-core` + `pdf-extract` / `lopdf`. - `kebab-parse-pdf` depends only on `kebab-core` + `pdf-extract` / `lopdf`.
- OCR calls go through a separate adapter (reusing the P6 OCR infrastructure). - OCR calls go through a separate adapter (reusing the P6 OCR infrastructure).
## Completion criteria ## Completion criteria
- [ ] `kb ingest <pdf>` works - [ ] `kebab ingest <pdf>` works
- [ ] page-level chunks + citations - [ ] page-level chunks + citations
- [ ] search results include `paper.pdf:p<n>` - [ ] search results include `paper.pdf:p<n>`
- [ ] provenance warning for pages where extraction failed - [ ] provenance warning for pages where extraction failed
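
The page citation strings shown in this spec (`paper.pdf:p13`, `paper.pdf:p13:span=0-1240`) can be rendered and parsed with a small value type. The sketch below is illustrative only: the `PageCitation` struct and its field names are assumptions, not the crate's actual types.

```rust
/// Hypothetical value type for a PDF page citation.
#[derive(Debug, PartialEq)]
struct PageCitation {
    path: String,
    page: u32,
    /// Optional character range within the page.
    span: Option<(usize, usize)>,
}

impl PageCitation {
    /// Renders `path:p<n>` or `path:p<n>:span=<a>-<b>`.
    fn render(&self) -> String {
        match self.span {
            Some((start, end)) => {
                format!("{}:p{}:span={}-{}", self.path, self.page, start, end)
            }
            None => format!("{}:p{}", self.path, self.page),
        }
    }

    /// Parses the rendered form; returns `None` on malformed input.
    fn parse(s: &str) -> Option<PageCitation> {
        // Peel off an optional trailing `:span=a-b`.
        let (head, span) = match s.rsplit_once(":span=") {
            Some((head, range)) => {
                let (a, b) = range.split_once('-')?;
                let start = a.parse::<usize>().ok()?;
                let end = b.parse::<usize>().ok()?;
                (head, Some((start, end)))
            }
            None => (s, None),
        };
        // The page marker is the last `:p<n>` segment, so the path itself
        // may contain ':' characters.
        let (path, page) = head.rsplit_once(":p")?;
        Some(PageCitation {
            path: path.to_string(),
            page: page.parse::<u32>().ok()?,
            span,
        })
    }
}
```

Keeping rendering and parsing next to each other makes a round-trip test cheap, which helps when the citation format is later extended.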


@@ -3,7 +3,7 @@ phase: P8
title: "Audio transcription + timestamp citation" title: "Audio transcription + timestamp citation"
status: planned status: planned
depends_on: [P5] depends_on: [P5]
source: kb_local_rust_report.md §9.3, §17 Phase 8 source: kebab_local_rust_report.md §9.3, §17 Phase 8
--- ---
# P8 — Audio ingestion # P8 — Audio ingestion
@@ -14,8 +14,8 @@ audio 파일 → transcript (timestamped segment) → CanonicalDocument → 동
## Output crates ## Output crates
- `kb-parse-audio`: `Extractor` implementation. - `kebab-parse-audio`: `Extractor` implementation.
- `kb-asr-whisper` (or a module inside `kb-parse-audio`): whisper.cpp adapter. - `kebab-asr-whisper` (or a module inside `kebab-parse-audio`): whisper.cpp adapter.
## Pipeline (§9.3) ## Pipeline (§9.3)
@@ -94,11 +94,11 @@ meeting-2026-04-27.m4a:00:13:42-00:14:10:speaker=S1
## CLI ## CLI
```text ```text
kb ingest ./meeting.m4a kebab ingest ./meeting.m4a
kb ingest ./recordings/ kebab ingest ./recordings/
kb search "decisions mentioned in the meeting" kebab search "decisions mentioned in the meeting"
kb inspect doc <audio_doc_id> # shows transcript + segment timestamps kebab inspect doc <audio_doc_id> # shows transcript + segment timestamps
kb play <chunk_id> # (optional) play that segment; lower priority kebab play <chunk_id> # (optional) play that segment; lower priority
``` ```
## Tests ## Tests
@@ -111,12 +111,12 @@ kb play <chunk_id> # (선택) 해당 구간 재생 — 후순위
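## Dependency boundaries ## Dependency boundaries
- `kb-parse-audio` depends only on `kb-core` + a `Transcriber` adapter. - `kebab-parse-audio` depends only on `kebab-core` + a `Transcriber` adapter.
- No LLM calls. The RAG stage runs the same pipeline on the transcript text. - No LLM calls. The RAG stage runs the same pipeline on the transcript text.
## Completion criteria ## Completion criteria
- [ ] `kb ingest <audio>` works - [ ] `kebab ingest <audio>` works
- [ ] transcript is stored together with segment timestamps - [ ] transcript is stored together with segment timestamps
- [ ] search results include a `hh:mm:ss-hh:mm:ss` timestamp citation - [ ] search results include a `hh:mm:ss-hh:mm:ss` timestamp citation
- [ ] re-ingesting the same audio is idempotent - [ ] re-ingesting the same audio is idempotent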
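
The timestamp citation shown in this spec (`meeting-2026-04-27.m4a:00:13:42-00:14:10:speaker=S1`) can be produced from segment offsets in seconds. This is a minimal sketch; the helper names `hms` and `segment_citation`, and the choice of whole-second offsets, are assumptions for illustration.

```rust
/// Formats whole seconds as zero-padded `HH:MM:SS`.
fn hms(total_secs: u64) -> String {
    format!(
        "{:02}:{:02}:{:02}",
        total_secs / 3600,
        (total_secs % 3600) / 60,
        total_secs % 60
    )
}

/// Builds `path:HH:MM:SS-HH:MM:SS`, with an optional `:speaker=<id>` suffix.
fn segment_citation(path: &str, start_secs: u64, end_secs: u64, speaker: Option<&str>) -> String {
    let mut out = format!("{}:{}-{}", path, hms(start_secs), hms(end_secs));
    if let Some(id) = speaker {
        out.push_str(":speaker=");
        out.push_str(id);
    }
    out
}
```

Sub-second precision is dropped here on purpose, since the citation format in the spec only shows whole seconds.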


@@ -3,7 +3,7 @@ phase: P9
title: "TUI + desktop app" title: "TUI + desktop app"
status: planned status: planned
depends_on: [P5] depends_on: [P5]
source: kb_local_rust_report.md §16, §17 Phase 9 source: kebab_local_rust_report.md §16, §17 Phase 9
--- ---
# P9 — TUI + desktop app # P9 — TUI + desktop app
@@ -15,18 +15,18 @@ CLI 위에 사용성 레이어 추가. domain/검색/RAG 가 안정된 뒤 마
## Order ## Order
```text ```text
kb-tui (first) → kb-desktop (later) kebab-tui (first) → kebab-desktop (later)
``` ```
Reason: the TUI can adapt quickly to domain changes. The desktop app has high packaging/distribution costs. Reason: the TUI can adapt quickly to domain changes. The desktop app has high packaging/distribution costs.
--- ---
## P9.A — kb-tui (Ratatui) ## P9.A — kebab-tui (Ratatui)
### Output crate ### Output crate
- `kb-tui`: terminal UI built on Ratatui + crossterm. - `kebab-tui`: terminal UI built on Ratatui + crossterm.
### Screen layout ### Screen layout
@@ -51,8 +51,8 @@ q : 종료
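### Dependency boundaries ### Dependency boundaries
- `kb-tui` depends only on `kb-app`. Direct calls to parser/store/LLM are forbidden. - `kebab-tui` depends only on `kebab-app`. Direct calls to parser/store/LLM are forbidden.
- Asynchronous I/O (search/ask) uses a `kb-app` async wrapper or thread + channel. - Asynchronous I/O (search/ask) uses a `kebab-app` async wrapper or thread + channel.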
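
The thread + channel option can be sketched with the standard library alone. This is an illustration under stated assumptions: `blocking_search` is a hypothetical stand-in for a blocking facade call, and `spawn_search` is a name invented here.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical stand-in for a blocking facade call such as a search.
fn blocking_search(query: &str) -> Vec<String> {
    vec![format!("result for {}", query)]
}

/// Runs the blocking call on a worker thread and returns a receiver
/// that the UI loop can poll without stalling the render tick.
fn spawn_search(query: String) -> Receiver<Vec<String>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let hits = blocking_search(&query);
        // The UI may have dropped the receiver already; ignore that error.
        let _ = tx.send(hits);
    });
    rx
}
```

In an event loop, the receiver would typically be checked with `try_recv()` on each tick so that key handling and drawing continue while the search runs.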
### Completion criteria ### Completion criteria
@@ -63,7 +63,7 @@ q : 종료
--- ---
## P9.B — kb-desktop ## P9.B — kebab-desktop
### Candidate comparison ### Candidate comparison
@@ -72,14 +72,14 @@ q : 종료
| Tauri | Rust backend + web frontend. native webview, small binary | separate stack for the web frontend (TS/JS) | | Tauri | Rust backend + web frontend. native webview, small binary | separate stack for the web frontend (TS/JS) |
| egui/eframe | pure Rust, immediate-mode | limited design freedom/accessibility | | egui/eframe | pure Rust, immediate-mode | limited design freedom/accessibility |
Recommendation: Tauri first. Expose the existing `kb-app` facade as the backend as-is. Keep the frontend light (svelte/solid/vanilla). Recommendation: Tauri first. Expose the existing `kebab-app` facade as the backend as-is. Keep the frontend light (svelte/solid/vanilla).
### Output crates / structure ### Output crates / structure
- `kb-desktop` (Tauri app crate) - `kebab-desktop` (Tauri app crate)
- `kb-desktop-frontend/` (web assets) - `kebab-desktop-frontend/` (web assets)
Tauri commands wrap `kb-app` functions 1:1. Adding new business logic is forbidden. Tauri commands wrap `kebab-app` functions 1:1. Adding new business logic is forbidden.
### Screen layout (first pass) ### Screen layout (first pass)
@@ -100,7 +100,7 @@ Tauri command 는 `kb-app` 함수 1:1 wrap. 신규 비즈니스 로직 추가
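### Dependency boundaries ### Dependency boundaries
- The frontend calls only Tauri commands (= `kb-app` wrappers). Direct access to SQLite/LanceDB is forbidden. - The frontend calls only Tauri commands (= `kebab-app` wrappers). Direct access to SQLite/LanceDB is forbidden.
- Model download/execution is the backend's responsibility. - Model download/execution is the backend's responsibility.
### Completion criteria ### Completion criteria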