docs(rename): kb → kebab — README, tasks/, docs/, design doc, report
Final commit of the rename. Bulk-update the word `kb` in every .md file.

- 19 crate names (`kb-core`, `kb-app`, …) → `kebab-*` (including the Rust module-path form `kb_*` → `kebab_*`).
- Future components (`kb-tui`, `kb-desktop`, `kb-asr-whisper`, `kb-ocr`, `kb-mcp`, `kb-vlm`, `kb-rerank`, `kb-vision-ocr`, `kb-index`, `kb-smoke`, `kb-architecture`) → `kebab-*` (the same prefix will be used when P6+ starts).
- CLI command examples: `kb ingest` / `kb search` / `kb ask` / `kb init` / `kb doctor` / `kb inspect` / `kb list` / `kb eval` → `kebab <verb>`, in both fenced code blocks and inline backticks.
- XDG paths, env vars, and binary paths (`target/release/kb` → `target/release/kebab`) brought in sync.
- Every reference in the design doc, original report, SMOKE, HOTFIXES, phase epics, and task specs unified.
- `git -c user.name=kb` in task-decomposition.md is kept as-is, because it records the author of past git history (the author in actual git history cannot be changed).
- Also flips `status: planned` → `completed` in `tasks/phase-5-evaluation.md` (missed after the P5-1 + P5-2 PRs were merged).

## Verification

- `grep -rEn "\bkb-[a-z]|\bkb_[a-z]|\.config/kb\b|kb\.sqlite|\bKB_[A-Z]" --include="*.md"` returns 0 hits (excluding the git author in task-decomposition.md).
- Every file path reference still resolves (references to every renamed file updated to the new path).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
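The grep in the verification list can also be expressed as a small Rust check, for example inside a docs test. This is a minimal sketch. It assumes ASCII-only word characters for `\b`, whereas GNU grep in a UTF-8 locale may also treat Hangul as word characters, so results can differ right next to Korean text. `has_legacy_kb_ref` and `legacy_hits` are hypothetical names, not part of the repo.

```rust
/// ASCII approximation of what grep -E treats as a word character for `\b`.
fn is_word(c: u8) -> bool {
    c.is_ascii_alphanumeric() || c == b'_'
}

/// True when position `i` sits on a word boundary (start of text or a non-word byte before it).
fn boundary_before(bytes: &[u8], i: usize) -> bool {
    i == 0 || !is_word(bytes[i - 1])
}

/// Hypothetical check mirroring
/// `\bkb-[a-z]|\bkb_[a-z]|\.config/kb\b|kb\.sqlite|\bKB_[A-Z]`.
fn has_legacy_kb_ref(line: &str) -> bool {
    let b = line.as_bytes();
    for i in 0..b.len() {
        let rest = &b[i..];
        // \bkb-[a-z] and \bkb_[a-z]
        if rest.len() >= 4
            && rest.starts_with(b"kb")
            && (rest[2] == b'-' || rest[2] == b'_')
            && rest[3].is_ascii_lowercase()
            && boundary_before(b, i)
        {
            return true;
        }
        // \bKB_[A-Z]
        if rest.len() >= 4
            && rest.starts_with(b"KB_")
            && rest[3].is_ascii_uppercase()
            && boundary_before(b, i)
        {
            return true;
        }
        // \.config/kb\b : ".config/kb" is 10 bytes; the byte after it must not be a word char
        if rest.starts_with(b".config/kb") && rest.get(10).map_or(true, |&c| !is_word(c)) {
            return true;
        }
        // kb\.sqlite : no boundary requirement
        if rest.starts_with(b"kb.sqlite") {
            return true;
        }
    }
    false
}

/// 1-based line numbers that still contain a legacy reference.
fn legacy_hits(text: &str) -> Vec<usize> {
    text.lines()
        .enumerate()
        .filter(|(_, line)| has_legacy_kb_ref(line))
        .map(|(n, _)| n + 1)
        .collect()
}
```

To use it, run `legacy_hits` over each `.md` file's contents. An empty result corresponds to the "0 hits" criterion.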
README.md
@@ -1,8 +1,8 @@
-# kb: Local-first Knowledge Base
+# kebab: Local-first Knowledge Base

-> **Status:** P0–P4 implemented (17 of 31 component tasks done), with 3 post-merge hotfixes applied. `kb index` / `kb search --mode {lexical,vector,hybrid}` / `kb ask` all work end to end. Next step: P5 (eval suite). For detailed progress see [tasks/INDEX.md](tasks/INDEX.md); for bugs found after merge and their fixes see [tasks/HOTFIXES.md](tasks/HOTFIXES.md).
+> **Status:** P0–P4 implemented (17 of 31 component tasks done), with 3 post-merge hotfixes applied. `kebab index` / `kebab search --mode {lexical,vector,hybrid}` / `kebab ask` all work end to end. Next step: P5 (eval suite). For detailed progress see [tasks/INDEX.md](tasks/INDEX.md); for bugs found after merge and their fixes see [tasks/HOTFIXES.md](tasks/HOTFIXES.md).

-`kb` is a personal, local knowledge base + RAG tool. It indexes Markdown / PDF / images / audio in one place and provides semantic search plus LLM answers with citations from a single binary. All inference runs locally (Ollama / fastembed / whisper.cpp).
+`kebab` is a personal, local knowledge base + RAG tool. It indexes Markdown / PDF / images / audio in one place and provides semantic search plus LLM answers with citations from a single binary. All inference runs locally (Ollama / fastembed / whisper.cpp).

 Target hardware: one M4 48GB MacBook, one user.

@@ -12,14 +12,14 @@

 | Command | Behavior | Status |
 |------|------|------|
-| `kb init` | Creates the data directory + config.toml under the XDG paths | ✅ P0 |
-| `kb ingest [<path>]` | Indexes Markdown (idempotent). PDF/images/audio come in P6+. | ✅ P3-5 |
-| `kb search --mode {lexical,vector,hybrid} "<query>"` | Search with citations; hybrid uses RRF fusion | ✅ P3-5 |
-| `kb list docs` | Lists indexed documents | ✅ P3-5 |
-| `kb inspect doc <id>` / `kb inspect chunk <id>` | Shows the raw record | ✅ P3-5 |
-| `kb ask "<query>"` | RAG answer with supporting citations. Refuses when evidence is insufficient. Requires Ollama. | ✅ P4-3 |
-| `kb doctor` | Config/model/DB health check | ✅ P0 |
-| `kb eval run / compare` | Golden-query regression measurement | ⏳ P5 |
+| `kebab init` | Creates the data directory + config.toml under the XDG paths | ✅ P0 |
+| `kebab ingest [<path>]` | Indexes Markdown (idempotent). PDF/images/audio come in P6+. | ✅ P3-5 |
+| `kebab search --mode {lexical,vector,hybrid} "<query>"` | Search with citations; hybrid uses RRF fusion | ✅ P3-5 |
+| `kebab list docs` | Lists indexed documents | ✅ P3-5 |
+| `kebab inspect doc <id>` / `kebab inspect chunk <id>` | Shows the raw record | ✅ P3-5 |
+| `kebab ask "<query>"` | RAG answer with supporting citations. Refuses when evidence is insufficient. Requires Ollama. | ✅ P4-3 |
+| `kebab doctor` | Config/model/DB health check | ✅ P0 |
+| `kebab eval run / compare` | Golden-query regression measurement | ⏳ P5 |

 Machine-friendly mode: every command accepts a `--json` flag. Output follows the frozen wire schema v1 and always includes a `schema_version` field (e.g. `ingest_report.v1`, `search_hit.v1`, `answer.v1`, `doctor.v1`).

@@ -44,35 +44,35 @@
 | citation format | URI fragment (`path#L12-L34`, W3C Media Fragments) |
 | ID generation | `blake3(canonical_json(tuple))[..32]` hex |
 | RRF fusion_score | Normalized to `[0, 1]` by dividing by `2 / (k_rrf + 1)`, so scores are comparable across modes (post-merge hotfix) |
-| layout | XDG (`~/.local/share/kb/`, `~/.config/kb/`, …) |
+| layout | XDG (`~/.local/share/kebab/`, `~/.config/kebab/`, …) |

-For the full design, see [docs/superpowers/specs/2026-04-27-kb-final-form-design.md](docs/superpowers/specs/2026-04-27-kb-final-form-design.md).
+For the full design, see [docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](docs/superpowers/specs/2026-04-27-kebab-final-form-design.md).

 ---

 ## Dependency graph

 ```text
-kb-cli, kb-tui, kb-desktop
-  └─> kb-app
-      ├─> kb-source-fs
-      ├─> kb-parse-md / kb-parse-pdf / kb-parse-image / kb-parse-audio
-      │   └─> kb-parse-types
-      ├─> kb-normalize
-      │   └─> kb-parse-types
-      ├─> kb-chunk
-      ├─> kb-store-sqlite
-      ├─> kb-store-vector
-      ├─> kb-embed-local (kb-embed trait crate)
-      ├─> kb-search
-      ├─> kb-llm-local (kb-llm trait crate)
-      ├─> kb-rag
-      ├─> kb-eval
-      └─> kb-config
-          └─> kb-core (everything depends on it)
+kebab-cli, kebab-tui, kebab-desktop
+  └─> kebab-app
+      ├─> kebab-source-fs
+      ├─> kebab-parse-md / kebab-parse-pdf / kebab-parse-image / kebab-parse-audio
+      │   └─> kebab-parse-types
+      ├─> kebab-normalize
+      │   └─> kebab-parse-types
+      ├─> kebab-chunk
+      ├─> kebab-store-sqlite
+      ├─> kebab-store-vector
+      ├─> kebab-embed-local (kebab-embed trait crate)
+      ├─> kebab-search
+      ├─> kebab-llm-local (kebab-llm trait crate)
+      ├─> kebab-rag
+      ├─> kebab-eval
+      └─> kebab-config
+          └─> kebab-core (everything depends on it)
 ```

-No direct UI → store/llm/parse dependencies. Every user-facing entry point goes through the `kb-app` facade only (design §8). For `kb-cli` to honor the `--config <path>` flag, the Config is threaded explicitly through the `kb_app::*_with_config(cfg, …)` companions; for the reasoning, see the `--config` entry in [tasks/HOTFIXES.md](tasks/HOTFIXES.md).
+No direct UI → store/llm/parse dependencies. Every user-facing entry point goes through the `kebab-app` facade only (design §8). For `kebab-cli` to honor the `--config <path>` flag, the Config is threaded explicitly through the `kebab_app::*_with_config(cfg, …)` companions; for the reasoning, see the `--config` entry in [tasks/HOTFIXES.md](tasks/HOTFIXES.md).

 ---

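The `fusion_score` row says hybrid scores are divided by `2 / (k_rrf + 1)`. That is the largest RRF score two retrievers can produce (rank 1 in both), so the fused score lands in `[0, 1]`, and the top hit is 1.0 when both retrievers rank it first. This is a minimal sketch. It assumes 1-based ranks and the textbook RRF term `1 / (k_rrf + rank)`. `fuse_normalized` is a hypothetical name, not `kebab-search`'s API, and `k_rrf = 60` is only a commonly used value assumed here.

```rust
use std::collections::HashMap;

/// Standard RRF term for a 1-based rank: 1 / (k_rrf + rank).
fn rrf_term(k_rrf: f64, rank: usize) -> f64 {
    1.0 / (k_rrf + rank as f64)
}

/// Fuse a lexical and a vector ranking (chunk IDs, best first) with RRF, then divide
/// by the best possible two-retriever score, 2 / (k_rrf + 1), so scores land in [0, 1].
fn fuse_normalized(k_rrf: f64, lexical: &[&str], vector: &[&str]) -> Vec<(String, f64)> {
    let mut raw: HashMap<String, f64> = HashMap::new();
    for list in [lexical, vector] {
        for (i, id) in list.iter().enumerate() {
            *raw.entry(id.to_string()).or_insert(0.0) += rrf_term(k_rrf, i + 1);
        }
    }
    let max_possible = 2.0 / (k_rrf + 1.0);
    let mut fused: Vec<(String, f64)> = raw
        .into_iter()
        .map(|(id, score)| (id, score / max_possible))
        .collect();
    // Highest score first; ties broken by ID so the output order is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}
```

With `k_rrf = 60.0`, a chunk ranked first by only one retriever scores 0.5, and a chunk ranked first by both scores 1.0. The raw RRF value alone would be about 0.033, which is why the normalization makes hybrid scores comparable to the other modes.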
@@ -80,16 +80,16 @@ No direct UI → store/llm/parse dependencies. Every user-facing entry point goes through the `kb-ap

 | Phase | Scope | Key output crates | Prereq | Status |
 |-------|------|----------------|------|------|
-| **P0** | Workspace skeleton + domain contracts + ID recipe | `kb-core`, `kb-parse-types`, `kb-config`, `kb-app`, `kb-cli` | – | ✅ Done |
-| **P1** | Markdown ingestion (walk → parse → chunk → SQLite) | `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-sqlite` | P0 | ✅ Done |
-| **P2** | SQLite FTS5 lexical search + citation | `kb-search` (lexical) | P1 | ✅ Done |
-| **P3** | Local embedding + LanceDB + hybrid (RRF) + kb-app wiring | `kb-embed`, `kb-embed-local`, `kb-store-vector`, `kb-search` | P2 | ✅ Done |
-| **P4** | Local LLM + RAG + grounded answer | `kb-llm`, `kb-llm-local`, `kb-rag` | P3 | ✅ Done |
-| **P5** | Golden query / regression eval | `kb-eval` | P4 | ⏳ Next |
-| **P6** | Image ingestion (OCR + caption) | `kb-parse-image` | P5 | ⏳ |
-| **P7** | PDF text + page citation | `kb-parse-pdf` | P5 | ⏳ |
-| **P8** | Audio transcription + timestamp citation | `kb-parse-audio` | P5 | ⏳ |
-| **P9** | TUI + desktop app | `kb-tui`, `kb-desktop` | P5 | ⏳ |
+| **P0** | Workspace skeleton + domain contracts + ID recipe | `kebab-core`, `kebab-parse-types`, `kebab-config`, `kebab-app`, `kebab-cli` | – | ✅ Done |
+| **P1** | Markdown ingestion (walk → parse → chunk → SQLite) | `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-sqlite` | P0 | ✅ Done |
+| **P2** | SQLite FTS5 lexical search + citation | `kebab-search` (lexical) | P1 | ✅ Done |
+| **P3** | Local embedding + LanceDB + hybrid (RRF) + kebab-app wiring | `kebab-embed`, `kebab-embed-local`, `kebab-store-vector`, `kebab-search` | P2 | ✅ Done |
+| **P4** | Local LLM + RAG + grounded answer | `kebab-llm`, `kebab-llm-local`, `kebab-rag` | P3 | ✅ Done |
+| **P5** | Golden query / regression eval | `kebab-eval` | P4 | ⏳ Next |
+| **P6** | Image ingestion (OCR + caption) | `kebab-parse-image` | P5 | ⏳ |
+| **P7** | PDF text + page citation | `kebab-parse-pdf` | P5 | ⏳ |
+| **P8** | Audio transcription + timestamp citation | `kebab-parse-audio` | P5 | ⏳ |
+| **P9** | TUI + desktop app | `kebab-tui`, `kebab-desktop` | P5 | ⏳ |

 P0–P5 run sequentially. P6–P9 can run in parallel after P5.

@@ -100,13 +100,13 @@ P0–P5 run sequentially. P6–P9 can run in parallel after P5.
 ## Directory structure

 ```text
-kb/
+kebab/
 ├── README.md # this file
-├── kb_local_rust_report.md # original design report (direction + rationale)
+├── kebab_local_rust_report.md # original design report (direction + rationale)
 ├── docs/
 │ ├── superpowers/
 │ │ ├── specs/
-│ │ │ └── 2026-04-27-kb-final-form-design.md # frozen design (12 sections)
+│ │ │ └── 2026-04-27-kebab-final-form-design.md # frozen design (12 sections)
 │ │ └── plans/
 │ │ └── 2026-04-27-task-decomposition.md # task-decomposition implementation plan
 │ ├── SMOKE.md # how to run it by hand against a local workspace
@@ -127,19 +127,19 @@ kb/
 │ ├── p8/p8-1, p8-2 # (2)
 │ └── p9/p9-1 … p9-5 # (5)
 ├── crates/
-│ ├── kb-core/ kb-parse-types/ kb-config/ # domain + config (P0)
-│ ├── kb-source-fs/ # workspace walk + checksum (P1-1)
-│ ├── kb-parse-md/ # Markdown frontmatter + blocks (P1-2/3)
-│ ├── kb-normalize/ # ParsedBlock → CanonicalDocument (P1-4)
-│ ├── kb-chunk/ # heading-aware chunker (P1-5)
-│ ├── kb-store-sqlite/ # SQLite + FTS5 (V001/V002/V003) (P1-6, P2-1, P3-3)
-│ ├── kb-search/ # Lexical + Vector + Hybrid retriever (P2-2, P3-4)
-│ ├── kb-embed/ kb-embed-local/ # Embedder trait + fastembed adapter (P3-1, P3-2)
-│ ├── kb-store-vector/ # LanceDB VectorStore (P3-3)
-│ ├── kb-llm/ kb-llm-local/ # LanguageModel trait + Ollama adapter (P4-1, P4-2)
-│ ├── kb-rag/ # RAG pipeline (P4-3)
-│ ├── kb-app/ # facade (P0 signatures + P3-5 body)
-│ └── kb-cli/ # binary (P0 → --config flag wiring strengthened by a hotfix)
+│ ├── kebab-core/ kebab-parse-types/ kebab-config/ # domain + config (P0)
+│ ├── kebab-source-fs/ # workspace walk + checksum (P1-1)
+│ ├── kebab-parse-md/ # Markdown frontmatter + blocks (P1-2/3)
+│ ├── kebab-normalize/ # ParsedBlock → CanonicalDocument (P1-4)
+│ ├── kebab-chunk/ # heading-aware chunker (P1-5)
+│ ├── kebab-store-sqlite/ # SQLite + FTS5 (V001/V002/V003) (P1-6, P2-1, P3-3)
+│ ├── kebab-search/ # Lexical + Vector + Hybrid retriever (P2-2, P3-4)
+│ ├── kebab-embed/ kebab-embed-local/ # Embedder trait + fastembed adapter (P3-1, P3-2)
+│ ├── kebab-store-vector/ # LanceDB VectorStore (P3-3)
+│ ├── kebab-llm/ kebab-llm-local/ # LanguageModel trait + Ollama adapter (P4-1, P4-2)
+│ ├── kebab-rag/ # RAG pipeline (P4-3)
+│ ├── kebab-app/ # facade (P0 signatures + P3-5 body)
+│ └── kebab-cli/ # binary (P0 → --config flag wiring strengthened by a hotfix)
 ├── migrations/ # SQLite refinery V001/V002/V003
 └── fixtures/ # test fixture tree
 ```
@@ -153,19 +153,19 @@ kb/
 cargo build --release

 # First run: creates config.toml under the XDG path
-./target/release/kb init
+./target/release/kebab init

 # Edit the config
-${EDITOR:-vi} ~/.config/kb/config.toml
+${EDITOR:-vi} ~/.config/kebab/config.toml

 # Index
-./target/release/kb ingest
+./target/release/kebab ingest

 # Search
-./target/release/kb search "Markdown chunking rules" --mode hybrid
+./target/release/kebab search "Markdown chunking rules" --mode hybrid

 # Ask (requires Ollama)
-./target/release/kb ask "What is the storage strategy in my KB design?"
+./target/release/kebab ask "What is the storage strategy in my KB design?"
 ```

 To run it yourself against an isolated workspace, see [docs/SMOKE.md](docs/SMOKE.md): `--config <path>` lets you create a KB isolated in a temporary directory.
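The `layout` row and the quick start put config under `~/.config/kebab/` and data under `~/.local/share/kebab/`. The sketch below resolves those directories under the XDG Base Directory rules: it uses `$XDG_CONFIG_HOME` / `$XDG_DATA_HOME` when they are set to an absolute path, and otherwise falls back to `$HOME/.config` / `$HOME/.local/share`. Whether `kebab-config` honors the `XDG_*` variables this way is an assumption, and the helper names are hypothetical. The environment lookup is passed in as a closure so it can be exercised without mutating the process environment.

```rust
use std::path::PathBuf;

/// Resolve `<base>/<app>`: `$var` if it holds an absolute path, else `$HOME/<fallback>`.
/// Returns None only when the XDG variable is unusable and HOME is unset.
fn xdg_app_dir(
    env: &dyn Fn(&str) -> Option<String>,
    var: &str,
    fallback: &str,
    app: &str,
) -> Option<PathBuf> {
    let base = match env(var) {
        // The XDG spec says relative values should be ignored, so require an absolute path.
        Some(v) if PathBuf::from(&v).is_absolute() => PathBuf::from(v),
        _ => PathBuf::from(env("HOME")?).join(fallback),
    };
    Some(base.join(app))
}

/// e.g. ~/.config/kebab
fn config_dir(env: &dyn Fn(&str) -> Option<String>) -> Option<PathBuf> {
    xdg_app_dir(env, "XDG_CONFIG_HOME", ".config", "kebab")
}

/// e.g. ~/.local/share/kebab
fn data_dir(env: &dyn Fn(&str) -> Option<String>) -> Option<PathBuf> {
    xdg_app_dir(env, "XDG_DATA_HOME", ".local/share", "kebab")
}
```

In a real binary, the call would be `config_dir(&|k: &str| std::env::var(k).ok())`.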
@@ -181,17 +181,17 @@ ${EDITOR:-vi} ~/.config/kb/config.toml
 - multi-workspace (P+, low priority)
 - LLM-as-judge eval (rule-based `must_contain` only)
 - visual embedding (CLIP): P+
-- desktop app `kb://` protocol handler: P+
+- desktop app `kebab://` protocol handler: P+

 ---

 ## External AI integration

-`kb`'s `--json` mode + frozen wire schema v1 form the stable contract for external automation. Possible integrations:
+`kebab`'s `--json` mode + frozen wire schema v1 form the stable contract for external automation. Possible integrations:

-1. **Claude Code / Codex skill**: a thin wrapper that calls `kb search --json` / `kb ask --json`. ~50 lines.
-2. **MCP server**: a `kb-mcp` binary (stdio JSON-RPC) exposes the `kb-app` facade 1:1, shared by Claude Desktop / Cursor / Zed, etc.
-3. **HTTP wrapper**: `kb serve --bind 127.0.0.1:7711` (P+; proceed with caution, since it undermines the local-only value).
+1. **Claude Code / Codex skill**: a thin wrapper that calls `kebab search --json` / `kebab ask --json`. ~50 lines.
+2. **MCP server**: a `kebab-mcp` binary (stdio JSON-RPC) exposes the `kebab-app` facade 1:1, shared by Claude Desktop / Cursor / Zed, etc.
+3. **HTTP wrapper**: `kebab serve --bind 127.0.0.1:7711` (P+; proceed with caution, since it undermines the local-only value).

 ---

@@ -199,7 +199,7 @@ ${EDITOR:-vi} ~/.config/kb/config.toml

 This repo is a single-user project, but the spec-change procedure is written down.

-1. **Changing the frozen design**: `docs/superpowers/specs/2026-04-27-kb-final-form-design.md` is the single contract. A change requires updating every affected component task at the same time. Bundle them into one PR.
+1. **Changing the frozen design**: `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` is the single contract. A change requires updating every affected component task at the same time. Bundle them into one PR.
 2. **Adding a component task**: copy `tasks/_template.md` and create `tasks/p<phase>/p<phase>-<n>-<name>.md`. List the design-doc sections in `contract_sections`. `Allowed/Forbidden dependencies` follow the module-boundary table in design §8.
 3. **Implementation**: one sub-agent session per component task is recommended. `cargo test -p <crate>` and the DoD checklist must pass. Merge via PR.
 4. **Version changes**: changes to `parser_version` / `chunker_version` / `embedding_version`, etc. follow the cascade rule in design §9. Affected records must be reprocessed.
@@ -215,8 +215,8 @@ ${EDITOR:-vi} ~/.config/kb/config.toml

 ## References

-- Original design report: [kb_local_rust_report.md](kb_local_rust_report.md)
-- Frozen design: [docs/superpowers/specs/2026-04-27-kb-final-form-design.md](docs/superpowers/specs/2026-04-27-kb-final-form-design.md)
+- Original design report: [kebab_local_rust_report.md](kebab_local_rust_report.md)
+- Frozen design: [docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](docs/superpowers/specs/2026-04-27-kebab-final-form-design.md)
 - Task decomposition plan: [docs/superpowers/plans/2026-04-27-task-decomposition.md](docs/superpowers/plans/2026-04-27-task-decomposition.md)
 - Task index: [tasks/INDEX.md](tasks/INDEX.md)
 - Post-merge hotfix log: [tasks/HOTFIXES.md](tasks/HOTFIXES.md)
@@ -1,21 +1,21 @@
 ---
-title: "kb smoke-run guide"
+title: "kebab smoke-run guide"
 date: 2026-05-01
 ---

-# kb smoke-run guide
+# kebab smoke-run guide

-Since the P3-5 merge (wiring of `kb-app::ingest` / `search` / `list` / `inspect`), and since the P4-3 merge (wiring of `kb ask`), users can verify their own installation directly. This document describes how to bring up a KB isolated in a temporary directory, without touching the user environment (`~/.config/kb/`, `~/.local/share/kb/`), and run the whole pipeline once in a single session.
+Since the P3-5 merge (wiring of `kebab-app::ingest` / `search` / `list` / `inspect`), and since the P4-3 merge (wiring of `kebab ask`), users can verify their own installation directly. This document describes how to bring up a KB isolated in a temporary directory, without touching the user environment (`~/.config/kebab/`, `~/.local/share/kebab/`), and run the whole pipeline once in a single session.

 ## Setup

 Build:

 ```bash
-cargo build --release -p kb-cli # a debug build also works, and it builds faster.
+cargo build --release -p kebab-cli # a debug build also works, and it builds faster.
 ```

-Remote Ollama (optional; needed only for `kb ask`):
+Remote Ollama (optional; needed only for `kebab ask`):

 ```bash
 # On a separate host, e.g. a Mac
@@ -34,8 +34,8 @@ curl http://<host>:11434/api/tags
 ## Create an isolated workspace

 ```bash
-mkdir -p /tmp/kb-smoke/{workspace,data}
-cat > /tmp/kb-smoke/workspace/intro.md <<'EOF'
+mkdir -p /tmp/kebab-smoke/{workspace,data}
+cat > /tmp/kebab-smoke/workspace/intro.md <<'EOF'
 ---
 title: Greeting
 tags: [demo]
@@ -51,19 +51,19 @@ EOF

 ## Isolated config

-`/tmp/kb-smoke/config.toml`:
+`/tmp/kebab-smoke/config.toml`:

 ```toml
 schema_version = 1

 [workspace]
-root = "/tmp/kb-smoke/workspace"
+root = "/tmp/kebab-smoke/workspace"
 include = ["**/*.md"]
 exclude = [".git/**", "node_modules/**", ".obsidian/**"]

 [storage]
-data_dir = "/tmp/kb-smoke/data"
-sqlite = "{data_dir}/kb.sqlite"
+data_dir = "/tmp/kebab-smoke/data"
+sqlite = "{data_dir}/kebab.sqlite"
 vector_dir = "{data_dir}/lancedb"
 asset_dir = "{data_dir}/assets"
 artifact_dir = "{data_dir}/artifacts"
@@ -110,12 +110,12 @@ explain_default = false
 max_context_tokens = 6000
 ```

-Settings can be overridden with `KB_*` environment variables (`KB_MODELS_LLM_MODEL=qwen2.5:32b kb …`, etc.). For the full list of keys, see the match arms of `apply_env` in `crates/kb-config/src/lib.rs`.
+Settings can be overridden with `KEBAB_*` environment variables (`KEBAB_MODELS_LLM_MODEL=qwen2.5:32b kebab …`, etc.). For the full list of keys, see the match arms of `apply_env` in `crates/kebab-config/src/lib.rs`.

 ## Command sequence

 ```bash
-KB() { ./target/debug/kb --config /tmp/kb-smoke/config.toml "$@"; }
+KEBAB() { ./target/debug/kebab --config /tmp/kebab-smoke/config.toml "$@"; }

 KB doctor # 1. health check
 KB ingest # 2. index the workspace
@@ -128,31 +128,31 @@ KB ask "Within this KB ..." --mode hybrid --k 5 # 8. RAG answer (Ollama
 KB --json ask "..." --mode hybrid # 9. verify machine-friendly output
 ```

-Each command has succeeded if it exits with code 0. `kb ask` exits with code 1 on refusal (`RefusalSignal`); this is intended behavior.
+Each command has succeeded if it exits with code 0. `kebab ask` exits with code 1 on refusal (`RefusalSignal`); this is intended behavior.

 ## Verification checklist

-- `kb doctor` honors the `--config` path and prints the `storage.data_dir` from it (not the XDG default).
-- `kb ingest` is idempotent: a second run reports `new=0 updated=N`.
-- `kb list docs` output shows the deterministic `doc_id` (32 hex chars) + `workspace_path`, not the frontmatter `title`.
-- The `fusion_score` from `kb search --mode hybrid` is in `[0, 1]` (top-1 is often 1.0, when both retrievers rank it first).
-- In the `kb ask` JSON response, `model.id` matches the model in the config (`gemma4:26b`, etc.), `embedding.id = multilingual-e5-small`, and `citations[].marker` uses the `[1]` / `[2]` form (square-bracketed bare index).
-- `kb ask` on a topic that is not in the corpus → `refusal_reason: "llm_self_judge"` (or `no_chunks` / `score_gate`) + `grounded: false`.
+- `kebab doctor` honors the `--config` path and prints the `storage.data_dir` from it (not the XDG default).
+- `kebab ingest` is idempotent: a second run reports `new=0 updated=N`.
+- `kebab list docs` output shows the deterministic `doc_id` (32 hex chars) + `workspace_path`, not the frontmatter `title`.
+- The `fusion_score` from `kebab search --mode hybrid` is in `[0, 1]` (top-1 is often 1.0, when both retrievers rank it first).
+- In the `kebab ask` JSON response, `model.id` matches the model in the config (`gemma4:26b`, etc.), `embedding.id = multilingual-e5-small`, and `citations[].marker` uses the `[1]` / `[2]` form (square-bracketed bare index).
+- `kebab ask` on a topic that is not in the corpus → `refusal_reason: "llm_self_judge"` (or `no_chunks` / `score_gate`) + `grounded: false`.

 ## Cleanup

 ```bash
-rm -rf /tmp/kb-smoke/data # wipe only the data; you can ingest again
-rm -rf /tmp/kb-smoke # remove everything
+rm -rf /tmp/kebab-smoke/data # wipe only the data; you can ingest again
+rm -rf /tmp/kebab-smoke # remove everything
 ```

-`~/.config/kb/` and `~/.local/share/kb/` are never touched (provided the `--config` flag is honored correctly, which is guaranteed since the P3-5 hotfix).
+`~/.config/kebab/` and `~/.local/share/kebab/` are never touched (provided the `--config` flag is honored correctly, which is guaranteed since the P3-5 hotfix).

 ## Known behavior

-- The first `kb ingest` downloads the fastembed model (~470MB), cached under `data_dir/models/fastembed/`.
-- `kb ask` response time depends on LLM token throughput. On an M4 Pro 48GB with gemma4:26b, a 50–100-token answer takes 20–55 seconds.
-- If the `--config` path does not exist or is malformed, `kb doctor` fails hard (hotfix behavior, so that defaults cannot silently mask the problem).
+- The first `kebab ingest` downloads the fastembed model (~470MB), cached under `data_dir/models/fastembed/`.
+- `kebab ask` response time depends on LLM token throughput. On an M4 Pro 48GB with gemma4:26b, a 50–100-token answer takes 20–55 seconds.
+- If the `--config` path does not exist or is malformed, `kebab doctor` fails hard (hotfix behavior, so that defaults cannot silently mask the problem).
 - Every CLI invocation pays the fastembed model init cost (~4 s) because there is no process-level cache. Once P9 (TUI) lands, a `OnceLock` in `App` will initialize it only once per session.

 For the detailed history and the bugs found, see [tasks/HOTFIXES.md](../tasks/HOTFIXES.md).
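The smoke guide says `KEBAB_*` environment variables override config values, and that `apply_env` in `kebab-config` handles them with one match arm per key. The sketch below shows that shape. `KEBAB_MODELS_LLM_MODEL` comes from the guide. `KEBAB_STORAGE_DATA_DIR`, the two-field `Config`, and the field-to-TOML mapping are hypothetical stand-ins, not the crate's real types. Returning unrecognized `KEBAB_*` keys is a design choice of this sketch, so that a typo does not pass silently.

```rust
/// Hypothetical subset of the config; comments name the TOML key each field is assumed to map to.
#[derive(Debug, Default, PartialEq)]
struct Config {
    llm_model: String, // assumed: [models.llm] model
    data_dir: String,  // [storage] data_dir
}

/// Apply `KEBAB_*` overrides with one explicit match arm per supported key.
/// Variables without the `KEBAB_` prefix are skipped; unrecognized `KEBAB_*` keys are returned.
fn apply_env<I>(cfg: &mut Config, vars: I) -> Vec<String>
where
    I: IntoIterator<Item = (String, String)>,
{
    let mut unknown = Vec::new();
    for (key, value) in vars {
        if !key.starts_with("KEBAB_") {
            continue; // not ours (PATH, HOME, ...)
        }
        match key.as_str() {
            "KEBAB_MODELS_LLM_MODEL" => cfg.llm_model = value,
            "KEBAB_STORAGE_DATA_DIR" => cfg.data_dir = value, // hypothetical key name
            _ => unknown.push(key.clone()),
        }
    }
    unknown
}
```

Calling `apply_env(&mut cfg, std::env::vars())` would apply the real process environment on top of values already loaded from `config.toml`.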
@@ -5,8 +5,8 @@ When implementing tasks against this codebase:
 - Treat the frozen design doc as the single source of truth. Do not invent
 new fields, traits, or enum variants.
 - Prefer editing existing files to creating new ones; reuse types from
-`kb-core` instead of duplicating shapes.
+`kebab-core` instead of duplicating shapes.
 - For each task, follow the task spec under `tasks/p<N>/p<N>-<i>.md`.

 Canonical source:
-[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §11 + §12.
+[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §11 + §12.
@@ -4,4 +4,4 @@ Medium-agnostic representation of a document with `Block`s, `SourceSpan`s,
 and provenance.

 Canonical source:
-[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §3.4 + §3.7a.
+[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §3.4 + §3.7a.
@@ -5,4 +5,4 @@
 `policy_hash` so chunk IDs include the policy.

 Canonical source:
-[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §3.5 + §7.1 + §7.2.
+[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §3.5 + §7.1 + §7.2.
@@ -4,4 +4,4 @@ Citations use W3C Media Fragments URIs to locate evidence inside a
 document. Five variants: `Line`, `Page`, `Region`, `Caption`, `Time`.

 Canonical source:
-[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §3.5 + §0 Q3.
+[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §3.5 + §0 Q3.
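The citations doc lists a `Line` variant, which the README addresses as `path#L12-L34`. The sketch below formats and parses just that line-range form, assuming 1-based inclusive line numbers. `LineCitation` is a hypothetical type, not `kebab-core`'s. The other four variants (`Page`, `Region`, `Caption`, `Time`) are left out because their fragment syntax is not shown here.

```rust
/// Hypothetical line-range citation, e.g. `notes/intro.md#L12-L34`.
#[derive(Debug, PartialEq)]
struct LineCitation {
    path: String,
    start: u32, // 1-based, inclusive (assumption)
    end: u32,   // inclusive
}

impl LineCitation {
    /// Render as `path#L<start>-L<end>`.
    fn to_uri(&self) -> String {
        format!("{}#L{}-L{}", self.path, self.start, self.end)
    }

    /// Parse `path#L<start>-L<end>`; rejects a missing fragment, non-numeric
    /// bounds, line 0, an empty path, and start > end.
    fn parse(uri: &str) -> Option<Self> {
        let (path, frag) = uri.rsplit_once('#')?;
        let (a, b) = frag.split_once('-')?;
        let start: u32 = a.strip_prefix('L')?.parse().ok()?;
        let end: u32 = b.strip_prefix('L')?.parse().ok()?;
        if path.is_empty() || start == 0 || start > end {
            return None;
        }
        Some(LineCitation { path: path.to_string(), start, end })
    }
}
```

Because parsing and rendering round-trip, a stored citation string can be validated by parsing it and comparing `to_uri()` with the original.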
@@ -1,6 +1,6 @@
 # Domain model

-The domain types live in `kb-core` and mirror the frozen design exactly.
+The domain types live in `kebab-core` and mirror the frozen design exactly.

 Canonical source:
-[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §3.
+[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §3.
@@ -1,6 +1,6 @@
 # ID recipe

-All `kb-*` IDs are 32 hex chars: the first 32 of `blake3(canonical_json(tuple))`.
+All `kebab-*` IDs are 32 hex chars: the first 32 of `blake3(canonical_json(tuple))`.

 Canonical source:
-[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §4.
+[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §4.
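The ID recipe keeps the first 32 hex characters of the digest. So `[..32]` slices the hex string (16 bytes, 128 bits), not 32 raw bytes. The sketch below is std-only and covers two things: the truncation, and a shape check for stored IDs such as `doc_id`. It takes the full lowercase hex digest as input; with the `blake3` crate, that input would come from `blake3::hash(bytes).to_hex()`, which is 64 characters. Canonical JSON is out of scope, and the function names are hypothetical.

```rust
/// Lowercase hex check (BLAKE3's hex output is lowercase; assumed to be the stored form).
fn is_lower_hex(s: &str) -> bool {
    s.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b))
}

/// Truncate a full lowercase-hex digest to the 32-char ID form.
fn id_from_digest_hex(full_hex: &str) -> Option<String> {
    if full_hex.len() < 32 || !is_lower_hex(full_hex) {
        return None;
    }
    Some(full_hex[..32].to_string())
}

/// Shape check for a stored ID: exactly 32 lowercase hex chars.
fn is_valid_id(s: &str) -> bool {
    s.len() == 32 && is_lower_hex(s)
}
```

A check like `is_valid_id` is one way to automate the SMOKE checklist item that `list docs` shows a 32-hex `doc_id`.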
@@ -1,8 +1,8 @@
 # Module boundaries

-`kb-core` is a leaf; every other crate depends on it. Parsers depend on
-`kb-parse-types` (not on `kb-normalize`); `kb-normalize` depends on
-`kb-parse-types` (not on parsers). UI crates depend only on `kb-app`.
+`kebab-core` is a leaf; every other crate depends on it. Parsers depend on
+`kebab-parse-types` (not on `kebab-normalize`); `kebab-normalize` depends on
+`kebab-parse-types` (not on parsers). UI crates depend only on `kebab-app`.

 Canonical source:
-[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §8.
+[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §8.
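The module-boundary rules are mechanical enough to check in a test over the workspace's dependency edges. The sketch below encodes only the four quoted rules, using the post-rename names. It reads "UI crates depend only on `kebab-app`" literally and treats `kebab-cli`, `kebab-tui` and `kebab-desktop` as the UI crates, as in the README's dependency graph. `boundary_violation` is a hypothetical name. Collecting the real edges (for example from `cargo metadata`) is left out.

```rust
/// Return the violated rule when the internal dependency edge `from -> to`
/// breaks one of the module-boundary rules; None when the edge is allowed.
fn boundary_violation(from: &str, to: &str) -> Option<&'static str> {
    // Parser crates are kebab-parse-* except the shared kebab-parse-types crate.
    let is_parser = |c: &str| c.starts_with("kebab-parse-") && c != "kebab-parse-types";
    let is_ui = |c: &str| matches!(c, "kebab-cli" | "kebab-tui" | "kebab-desktop");

    if from == "kebab-core" && to.starts_with("kebab-") {
        return Some("kebab-core is a leaf");
    }
    if is_parser(from) && to == "kebab-normalize" {
        return Some("parsers depend on kebab-parse-types, not kebab-normalize");
    }
    if from == "kebab-normalize" && is_parser(to) {
        return Some("kebab-normalize depends on kebab-parse-types, not on parsers");
    }
    if is_ui(from) && to.starts_with("kebab-") && to != "kebab-app" {
        return Some("UI crates depend only on kebab-app");
    }
    None
}
```

External crates (names without the `kebab-` prefix) always pass. The per-crate `Allowed/Forbidden dependencies` lists in each task spec cover those.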
@@ -8,11 +8,11 @@
 - Phase A: Author the canonical task spec template (`tasks/_template.md`).
 - Phase B: Decompose P1 (Markdown ingestion) into 6 component task specs to validate the template.
 - Phase C: After Phase B passes review, decompose P0 + P2..P9 into the remaining ~24 task specs in one pass.
-- Each task spec cites only the frozen design doc (`docs/superpowers/specs/2026-04-27-kb-final-form-design.md`) for types, traits, schema, layout. No new domain types or traits are introduced inside task specs.
+- Each task spec cites only the frozen design doc (`docs/superpowers/specs/2026-04-27-kebab-final-form-design.md`) for types, traits, schema, layout. No new domain types or traits are introduced inside task specs.

 **Tech Stack:** Plain Markdown documents under `tasks/`. No code changes in this plan; it produces specs that AI sub-agents will later implement against.

-**Frozen contract source:** [docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../specs/2026-04-27-kb-final-form-design.md). All task specs reference this. Modifications to the contract require updating that file first, then re-checking dependent task specs.
+**Frozen contract source:** [docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../specs/2026-04-27-kebab-final-form-design.md). All task specs reference this. Modifications to the contract require updating that file first, then re-checking dependent task specs.

 **Phase task index (target file layout):**

@@ -83,7 +83,7 @@ Existing per-phase epic files (`tasks/phase-0-skeleton.md` … `phase-9-ui.md`)
 - [ ] **Step 1: Verify the design doc path resolves**

 ```bash
-test -f docs/superpowers/specs/2026-04-27-kb-final-form-design.md && echo OK
+test -f docs/superpowers/specs/2026-04-27-kebab-final-form-design.md && echo OK
 ```

 Expected: `OK`
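Step 1 checks a single path with `test -f`. The commit's second verification bullet (every file path reference still resolves) generalizes that check to all relative Markdown links. The sketch below is a naive, std-only scan. It assumes inline links of the form `[text](target)`, skips URLs and pure `#anchor` links, and strips `#fragment` suffixes such as `#L12-L34`. Reference-style links and `](` inside inline code are not handled, and the helper names are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Collect relative link targets from inline Markdown links `[text](target)`,
/// skipping URLs (`://`) and pure `#anchor` links, and dropping `#fragment` suffixes.
fn relative_link_targets(markdown: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut rest = markdown;
    while let Some(i) = rest.find("](") {
        let after = &rest[i + 2..];
        let end = match after.find(')') {
            Some(e) => e,
            None => break, // unterminated link; stop scanning
        };
        let target = &after[..end];
        let path = target.split('#').next().unwrap_or("");
        if !path.is_empty() && !path.contains("://") {
            out.push(path.to_string());
        }
        rest = &after[end + 1..];
    }
    out
}

/// Targets that do not exist, resolved relative to the Markdown file's directory.
fn broken_links(md_file: &Path, markdown: &str) -> Vec<PathBuf> {
    let base = md_file.parent().unwrap_or_else(|| Path::new("."));
    relative_link_targets(markdown)
        .into_iter()
        .map(|t| base.join(t))
        .filter(|p| !p.exists())
        .collect()
}
```

Resolving against the file's own directory matters here: `docs/SMOKE.md` links to `../tasks/HOTFIXES.md`, while `README.md` links to `tasks/HOTFIXES.md`.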
@@ -101,7 +101,7 @@ title: "<Component title>"
 status: planned
 depends_on: [] # other task_ids
 unblocks: [] # other task_ids
-contract_source: ../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
 contract_sections: [] # e.g. [§3.5, §5.5, §7.2]
 ---

@@ -117,7 +117,7 @@ contract_sections: [] # e.g. [§3.5, §5.5, §7.2]

 ## Allowed dependencies

-- `kb-core`
+- `kebab-core`
 - <other crates per design §8>
 - <external crates with versions>

@@ -166,7 +166,7 @@ If any item here is needed during implementation, STOP and update the frozen des
 | unit | ... | ... |
 | snapshot | ... (JSON freeze) | `fixtures/...` |
 | contract | trait round-trip | mock impls |
-| integration | end-to-end via `kb-app` facade | tmp workspace |
+| integration | end-to-end via `kebab-app` facade | tmp workspace |

 All tests must run under `cargo test -p <crate>` and not require external network or Ollama unless explicitly stated.

@@ -247,7 +247,7 @@ git add tasks/p1 tasks/INDEX.md
 git commit -m "tasks: prepare P1 component decomposition skeleton"
 ```

-### Task B1: `p1-1-source-fs.md` (kb-source-fs)
+### Task B1: `p1-1-source-fs.md` (kebab-source-fs)

 **Files:**
 - Create: `tasks/p1/p1-1-source-fs.md`
@@ -259,13 +259,13 @@ Write the following content to `tasks/p1/p1-1-source-fs.md`:
 ````markdown
 ---
 phase: P1
-component: kb-source-fs
+component: kebab-source-fs
 task_id: p1-1
 title: "Local filesystem source connector"
 status: planned
 depends_on: [p0-1]
 unblocks: [p1-2, p1-3, p1-4, p1-5, p1-6]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
 contract_sections: [§3.3, §6.2, §6.6, §7.1, §7.2 SourceConnector, §8]
 ---

@@ -281,8 +281,8 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums

 ## Allowed dependencies

-- `kb-core`
-- `kb-config`
+- `kebab-core`
+- `kebab-config`
 - `ignore` (gitignore semantics)
 - `blake3`
 - `walkdir`
@@ -293,21 +293,21 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums

 ## Forbidden dependencies

-- `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
+- `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

 ## Inputs

 | input | type | source |
 |-------|------|--------|
-| `SourceScope` | `kb_core::SourceScope` | `kb-app` from config |
+| `SourceScope` | `kebab_core::SourceScope` | `kebab-app` from config |
 | filesystem | `&Path` | OS |
-| `.kbignore` | text file | workspace root, optional |
+| `.kebabignore` | text file | workspace root, optional |

 ## Outputs

 | output | type | downstream consumer |
 |--------|------|---------------------|
-| `Vec<RawAsset>` | `kb_core::RawAsset` | `kb-parse-md`, asset writer in `kb-store-sqlite` (via `kb-app`) |
+| `Vec<RawAsset>` | `kebab_core::RawAsset` | `kebab-parse-md`, asset writer in `kebab-store-sqlite` (via `kebab-app`) |

 ## Public surface (signatures only, no new types)

@@ -315,11 +315,11 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
 pub struct FsSourceConnector { /* internal */ }

 impl FsSourceConnector {
-    pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>;
+    pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
 }

-impl kb_core::SourceConnector for FsSourceConnector {
-    fn scan(&self, scope: &kb_core::SourceScope) -> anyhow::Result<Vec<kb_core::RawAsset>>;
+impl kebab_core::SourceConnector for FsSourceConnector {
+    fn scan(&self, scope: &kebab_core::SourceScope) -> anyhow::Result<Vec<kebab_core::RawAsset>>;
 }
 ```

@@ -329,7 +329,7 @@ impl kb_core::SourceConnector for FsSourceConnector {
|
||||
- `asset_id` derived per design §4.2 from `blake3(raw bytes)` full hex.
|
||||
- `media_type` selected from extension + libmagic-like sniff fallback (`.md` → Markdown, others fall through to `MediaType::Other`).
|
||||
- `discovered_at` = current `OffsetDateTime::now_utc()` at scan time.
|
||||
- Combine `config.workspace.exclude` ∪ `.kbignore` for filter (union; ordering does not matter).
|
||||
- Combine `config.workspace.exclude` ∪ `.kebabignore` for filter (union; ordering does not matter).
|
||||
- Symbolic links: follow once, detect cycles via `canonicalize` + visited set.
|
||||
- Files larger than `storage.copy_threshold_mb` MB → emit `AssetStorage::Reference { path, sha }` (do not copy bytes here; copying is done by the asset writer task).
|
||||
- Idempotent: same input → same `Vec<RawAsset>` (sort by `workspace_path`).
|
||||
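
The symlink rule above (follow once, detect cycles via `canonicalize` + a visited set) can be sketched with `std` alone. This is a minimal illustration, not the `kebab-source-fs` implementation: `collect_files` is a hypothetical helper, it leaves out gitignore filtering, hashing and size checks, and it treats links that cannot be resolved (including the `a -> b -> a` case from the test plan) as skippable rather than as errors.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: collect regular files under `dir`, following symlinked
/// directories but entering each *real* directory at most once. `visited` holds
/// canonicalized directory paths, so a link back to an ancestor is skipped.
fn collect_files(dir: &Path, visited: &mut HashSet<PathBuf>, out: &mut Vec<PathBuf>) -> io::Result<()> {
    let real = fs::canonicalize(dir)?;
    if !visited.insert(real) {
        return Ok(()); // already walked this directory: cycle or alias
    }
    let mut entries = fs::read_dir(dir)?.collect::<io::Result<Vec<_>>>()?;
    entries.sort_by_key(|e| e.file_name()); // stable traversal order across runs
    for entry in entries {
        let path = entry.path();
        // `fs::metadata` follows symlinks; dangling or mutually recursive links
        // (e.g. `a -> b -> a`) return an error here and are skipped.
        let Ok(meta) = fs::metadata(&path) else { continue };
        if meta.is_dir() {
            collect_files(&path, visited, out)?;
        } else if meta.is_file() {
            out.push(path);
        }
    }
    Ok(())
}
```

Sorting each directory's entries keeps traversal deterministic; the resulting `Vec<RawAsset>` would still be sorted by `workspace_path` as the contract requires.
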
@@ -337,7 +337,7 @@ impl kb_core::SourceConnector for FsSourceConnector {
## Storage / wire effects

- Reads: filesystem under `config.workspace.root`.
- Writes: nothing. (Asset copy is handled by the asset writer in `kb-store-sqlite`.)
- Writes: nothing. (Asset copy is handled by the asset writer in `kebab-store-sqlite`.)

## Test plan

@@ -346,25 +346,25 @@ impl kb_core::SourceConnector for FsSourceConnector {
| unit | POSIX path normalization | inline cases incl. `./a/b.md`, `a//b.md`, `a/b.md` → identical |
| unit | blake3 of known bytes matches expected hex | inline |
| unit | gitignore filter (`*.tmp`, `node_modules/**`) excludes correctly | tmp tree built in test |
| unit | `.kbignore` ∪ config exclude works | tmp tree |
| unit | `.kebabignore` ∪ config exclude works | tmp tree |
| unit | symlink cycle does not loop | tmp tree with `a -> b -> a` |
| snapshot | `Vec<RawAsset>` serialized JSON for fixture tree is stable | `fixtures/source-fs/tree-1` |
| determinism | re-running scan twice produces byte-identical JSON | `fixtures/source-fs/tree-1` |

All tests run under `cargo test -p kb-source-fs` with no network and no model.
All tests run under `cargo test -p kebab-source-fs` with no network and no model.
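
The first unit case (POSIX path normalization) can be pinned down with a small pure function. This is a sketch that assumes `workspace_path` is already relative to the workspace root. `normalize_workspace_path` is a hypothetical name, and rejecting `..` instead of resolving it is a choice made for this sketch only.

```rust
/// Hypothetical helper: turn a workspace-relative path into one canonical
/// POSIX form so `./a/b.md`, `a//b.md` and `a/b.md` compare equal.
/// Returns `None` for empty paths or paths that try to climb out with `..`.
fn normalize_workspace_path(raw: &str) -> Option<String> {
    let mut parts: Vec<&str> = Vec::new();
    for seg in raw.split('/') {
        match seg {
            // empty segments come from `//` or a leading `/`; `.` is a no-op
            "" | "." => continue,
            ".." => return None,
            s => parts.push(s),
        }
    }
    if parts.is_empty() {
        None
    } else {
        Some(parts.join("/"))
    }
}
```
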

## Definition of Done

- [ ] `cargo check -p kb-source-fs` passes
- [ ] `cargo test -p kb-source-fs` passes
- [ ] `cargo check -p kebab-source-fs` passes
- [ ] `cargo test -p kebab-source-fs` passes
- [ ] Snapshot test `fixtures/source-fs/tree-1` round-trips deterministically
- [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kb-source-fs`)
- [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kebab-source-fs`)
- [ ] PR description links to design §3.3, §6.2, §7.2

## Out of scope

- File watching (P+).
- Asset copy/reference storage on disk (`kb-store-sqlite` task p1-6).
- Asset copy/reference storage on disk (`kebab-store-sqlite` task p1-6).
- Non-fs source connectors (HTTP, S3 — P+).

## Risks / notes
@@ -400,13 +400,13 @@ Write to `tasks/p1/p1-2-parse-md-frontmatter.md`:
````markdown
---
phase: P1
component: kb-parse-md (frontmatter submodule)
component: kebab-parse-md (frontmatter submodule)
task_id: p1-2
title: "Markdown frontmatter parsing → Metadata"
status: planned
depends_on: [p0-1]
unblocks: [p1-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.6 Metadata, §0 Q9 frontmatter, §10 errors]
---

@@ -414,15 +414,15 @@ contract_sections: [§3.6 Metadata, §0 Q9 frontmatter, §10 errors]

## Goal

Parse YAML/TOML frontmatter from Markdown bytes into `kb_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`.
Parse YAML/TOML frontmatter from Markdown bytes into `kebab_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`.
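
For reference, here is a std-only sketch that locates the frontmatter block before it is handed to a YAML or TOML parser. It assumes the common convention of `---` fences for YAML and `+++` fences for TOML, since the delimiters are not stated in this excerpt. `split_frontmatter` and `FrontmatterKind` are hypothetical names and stand in for the spec's crate-internal `FrontmatterSpan`. The returned line count is the kind of value a caller could pass on as the body's starting line offset.

```rust
/// Which frontmatter syntax was found (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum FrontmatterKind {
    Yaml,
    Toml,
}

/// Split leading frontmatter from a Markdown document.
/// Returns (kind, frontmatter text without fences, lines consumed incl. both fences),
/// or `None` when there is no frontmatter or the closing fence is missing.
fn split_frontmatter(text: &str) -> Option<(FrontmatterKind, String, u32)> {
    let mut lines = text.lines();
    let kind = match lines.next()?.trim_end() {
        "---" => FrontmatterKind::Yaml,
        "+++" => FrontmatterKind::Toml,
        _ => return None, // body starts at line 1
    };
    let fence = if kind == FrontmatterKind::Yaml { "---" } else { "+++" };
    let mut inner = Vec::new();
    for (i, line) in lines.enumerate() {
        if line.trim_end() == fence {
            // +2 accounts for the opening fence and this closing fence
            return Some((kind, inner.join("\n"), i as u32 + 2));
        }
        inner.push(line);
    }
    None // unterminated: caller treats the whole file as body (+ a warning)
}
```
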

## Why now / why this size

Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kb-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9.
Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kebab-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9.

## Allowed dependencies

- `kb-core`
- `kebab-core`
- `serde`
- `serde_yaml` (or `yaml-rust2`) for YAML
- `toml` for TOML
@@ -432,7 +432,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from

## Forbidden dependencies

- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-tui`, `kb-desktop`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `pulldown-cmark` (block parser is a sibling task)
- `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-tui`, `kebab-desktop`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `pulldown-cmark` (block parser is a sibling task)

## Inputs

@@ -445,7 +445,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from

| output | type | downstream |
|--------|------|------------|
| `(Metadata, Option<FrontmatterSpan>, Vec<Warning>)` | tuple | `kb-normalize` → CanonicalDocument |
| `(Metadata, Option<FrontmatterSpan>, Vec<Warning>)` | tuple | `kebab-normalize` → CanonicalDocument |

## Public surface (signatures only — no new types)

@@ -453,7 +453,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
pub fn parse_frontmatter(
    bytes: &[u8],
    hints: &BodyHints,
) -> anyhow::Result<(kb_core::Metadata, Option<FrontmatterSpan>, Vec<Warning>)>;
) -> anyhow::Result<(kebab_core::Metadata, Option<FrontmatterSpan>, Vec<Warning>)>;
```

`FrontmatterSpan` and `Warning` are crate-internal helpers; if any new public type is needed, STOP and update the frozen design doc first.
@@ -490,12 +490,12 @@ pub fn parse_frontmatter(
| snapshot | `fixtures/markdown/frontmatter-only.md` produces stable JSON | fixture |
| snapshot | mixed-language body with no `lang:` detects `ko` or `en` | `fixtures/markdown/mixed-lang.md` |

All tests under `cargo test -p kb-parse-md --lib frontmatter`.
All tests under `cargo test -p kebab-parse-md --lib frontmatter`.

## Definition of Done

- [ ] `cargo check -p kb-parse-md` passes
- [ ] `cargo test -p kb-parse-md frontmatter` passes
- [ ] `cargo check -p kebab-parse-md` passes
- [ ] `cargo test -p kebab-parse-md frontmatter` passes
- [ ] No `pulldown-cmark` import in this submodule
- [ ] Snapshot tests stable across two consecutive runs
- [ ] PR links design §0 Q9, §3.6
@@ -534,13 +534,13 @@ Write to `tasks/p1/p1-3-parse-md-blocks.md`:
````markdown
---
phase: P1
component: kb-parse-md (blocks submodule)
component: kebab-parse-md (blocks submodule)
task_id: p1-3
title: "Markdown body → Block tree with line spans"
status: planned
depends_on: [p0-1]
unblocks: [p1-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 Block, §3.4 SourceSpan, §0 Q3 citation]
---

@@ -548,7 +548,7 @@ contract_sections: [§3.4 Block, §3.4 SourceSpan, §0 Q3 citation]

## Goal

Parse Markdown body bytes into a flat `Vec<ParsedBlock>` (intermediate, crate-private) with heading paths and line ranges preserved, ready for `kb-normalize` to lift into `CanonicalDocument`.
Parse Markdown body bytes into a flat `Vec<ParsedBlock>` (intermediate, crate-private) with heading paths and line ranges preserved, ready for `kebab-normalize` to lift into `CanonicalDocument`.

## Why now / why this size

@@ -556,14 +556,14 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from

## Allowed dependencies

- `kb-core`
- `kebab-core`
- `pulldown-cmark` (CommonMark with source-map; GFM tables enabled via feature)
- `serde`
- `thiserror`

## Forbidden dependencies

- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `kb-tui`, `kb-desktop`, `comrak` (alternative parser; pick one)
- `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `kebab-tui`, `kebab-desktop`, `comrak` (alternative parser; pick one)

## Inputs

@@ -576,7 +576,7 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from

| output | type | downstream |
|--------|------|------------|
| `Vec<ParsedBlock>` (intermediate type, crate-private) | – | `kb-normalize` |
| `Vec<ParsedBlock>` (intermediate type, crate-private) | – | `kebab-normalize` |
| `Vec<Warning>` | – | propagated into Provenance |

## Public surface (signatures only — no new types)
@@ -585,7 +585,7 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
pub fn parse_blocks(body: &[u8], body_offset_lines: u32) -> anyhow::Result<(Vec<ParsedBlock>, Vec<Warning>)>;
```

`ParsedBlock` is a crate-internal mirror that maps 1:1 to `kb_core::Block` variants once `kb-normalize` assigns `BlockId`s.
`ParsedBlock` is a crate-internal mirror that maps 1:1 to `kebab_core::Block` variants once `kebab-normalize` assigns `BlockId`s.

## Behavior contract

@@ -593,7 +593,7 @@ pub fn parse_blocks(body: &[u8], body_offset_lines: u32) -> anyhow::Result<(Vec<
- Heading tree: every block records its ancestor heading texts in order (e.g., `["아키텍처", "Chunking 정책"]`).
- Code blocks: language tag preserved (` ```rust ` → `Some("rust")`), fenced content not split.
- Tables: GFM tables produce `TableBlock` with header row + body rows; if a table cell is malformed, fall back to a `Paragraph` block + warning.
- Image references: `![alt](src)` produces `ImageRefBlock` with `asset_id = None`, `src = "..."`, `alt = "..."`. Resolution to `AssetId` happens later in `kb-normalize`.
- Image references: `![alt](src)` produces `ImageRefBlock` with `asset_id = None`, `src = "..."`, `alt = "..."`. Resolution to `AssetId` happens later in `kebab-normalize`.
- Lists: ordered/unordered preserved; nested list items flattened into one `ListBlock` with each top-level item's text.
- Inline elements: only `Text`, `Code`, `Link`, `Strong`, `Emph` (per design §3.4). Drop other inlines silently.
- Malformed input never panics. Worst case: empty `Vec<ParsedBlock>` + warning.
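
The heading-tree rule can be implemented with a stack of `(level, text)` pairs. A new heading pops every entry at the same or a deeper level and is then pushed, and the stack contents form the path for the blocks that follow. Below is a minimal sketch; `HeadingTracker` is a hypothetical crate-internal helper, not part of `kebab_core`.

```rust
/// Hypothetical crate-internal helper: tracks the ancestor headings of the
/// block currently being parsed.
struct HeadingTracker {
    // (heading level 1..=6, heading text), outermost first
    stack: Vec<(u8, String)>,
}

impl HeadingTracker {
    fn new() -> Self {
        Self { stack: Vec::new() }
    }

    /// Record a heading: drop headings at the same or a deeper level, then push.
    fn enter_heading(&mut self, level: u8, text: &str) {
        while self.stack.last().is_some_and(|(l, _)| *l >= level) {
            self.stack.pop();
        }
        self.stack.push((level, text.to_string()));
    }

    /// Heading path for the blocks that follow, outermost heading first.
    fn path(&self) -> Vec<String> {
        self.stack.iter().map(|(_, t)| t.clone()).collect()
    }
}
```

Because a sibling heading at the same level replaces its predecessor, `## Storage` after `## Chunking 정책` yields `["아키텍처", "Storage"]` instead of accumulating both.
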
@@ -616,12 +616,12 @@ pub fn parse_blocks(body: &[u8], body_offset_lines: u32) -> anyhow::Result<(Vec<
| snapshot | `fixtures/markdown/nested-headings.md` → ParsedBlock JSON stable | fixture |
| snapshot | `fixtures/markdown/code-and-table.md` → JSON stable | fixture |

All tests under `cargo test -p kb-parse-md --lib blocks`.
All tests under `cargo test -p kebab-parse-md --lib blocks`.

## Definition of Done

- [ ] `cargo check -p kb-parse-md` passes
- [ ] `cargo test -p kb-parse-md blocks` passes
- [ ] `cargo check -p kebab-parse-md` passes
- [ ] `cargo test -p kebab-parse-md blocks` passes
- [ ] Snapshot tests stable across two runs
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.4
@@ -629,7 +629,7 @@ All tests under `cargo test -p kb-parse-md --lib blocks`.
## Out of scope

- Frontmatter (p1-2).
- Lifting `ParsedBlock` → `kb_core::Block` with `BlockId` (p1-4).
- Lifting `ParsedBlock` → `kebab_core::Block` with `BlockId` (p1-4).
- Chunking (p1-5).

## Risks / notes
@@ -658,13 +658,13 @@ Write to `tasks/p1/p1-4-normalize.md`:
````markdown
---
phase: P1
component: kb-normalize
component: kebab-normalize
task_id: p1-4
title: "Lift parser output → CanonicalDocument with deterministic IDs"
status: planned
depends_on: [p1-2, p1-3]
unblocks: [p1-5, p1-6]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4, §4 ID recipe, §3.6 Provenance]
---

@@ -676,12 +676,12 @@ Combine `Metadata` (p1-2) + `Vec<ParsedBlock>` (p1-3) + `RawAsset` (p1-1) into a

## Why now / why this size

Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate.
Single responsibility: ID generation + struct assembly. Keeps `kebab-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate.

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`
- `serde-json-canonicalizer` (canonical JSON for ID hashing)
- `blake3`
@@ -691,38 +691,38 @@ Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` pure

## Forbidden dependencies

- `kb-source-fs`, `kb-parse-md` (consumed via plain types only — must not couple back), `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md` (consumed via plain types only — must not couple back), `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

Note: this crate accepts `ParsedBlock` from `kb-parse-md` either by (a) exposing `ParsedBlock` as a `kb-core` type, or (b) `kb-parse-md` re-exporting via a public DTO. Pick (a): move `ParsedBlock` into `kb-core` so this task does not import `kb-parse-md`.
Note: this crate accepts `ParsedBlock` from `kebab-parse-md` either by (a) exposing `ParsedBlock` as a `kebab-core` type, or (b) `kebab-parse-md` re-exporting via a public DTO. Pick (a): move `ParsedBlock` into `kebab-core` so this task does not import `kebab-parse-md`.

## Inputs

| input | type | source |
|-------|------|--------|
| `RawAsset` | `kb_core::RawAsset` | p1-1 |
| `RawAsset` | `kebab_core::RawAsset` | p1-1 |
| `Metadata` + frontmatter span + warnings | from p1-2 | parser caller |
| `Vec<ParsedBlock>` + warnings | from p1-3 | parser caller |
| `parser_version` | `kb_core::ParserVersion` | constant in `kb-parse-md` |
| `parser_version` | `kebab_core::ParserVersion` | constant in `kebab-parse-md` |

## Outputs

| output | type | downstream |
|--------|------|------------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk`, `kb-store-sqlite` |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk`, `kebab-store-sqlite` |

## Public surface (signatures only — no new types)

```rust
pub fn build_canonical_document(
    asset: &kb_core::RawAsset,
    metadata: kb_core::Metadata,
    blocks: Vec<kb_core::ParsedBlock>,
    parser_version: &kb_core::ParserVersion,
    asset: &kebab_core::RawAsset,
    metadata: kebab_core::Metadata,
    blocks: Vec<kebab_core::ParsedBlock>,
    parser_version: &kebab_core::ParserVersion,
    warnings: Vec<Warning>,
) -> anyhow::Result<kb_core::CanonicalDocument>;
) -> anyhow::Result<kebab_core::CanonicalDocument>;

pub fn id_for_doc(workspace_path: &kb_core::WorkspacePath, asset: &kb_core::AssetId, parser_version: &kb_core::ParserVersion) -> kb_core::DocumentId;
pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kb_core::SourceSpan) -> kb_core::BlockId;
pub fn id_for_doc(workspace_path: &kebab_core::WorkspacePath, asset: &kebab_core::AssetId, parser_version: &kebab_core::ParserVersion) -> kebab_core::DocumentId;
pub fn id_for_block(doc: &kebab_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kebab_core::SourceSpan) -> kebab_core::BlockId;
```

## Behavior contract
@@ -750,14 +750,14 @@ pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[Strin
| unit | provenance contains Discovered/Parsed/Normalized in order | inline |
| snapshot | `fixtures/markdown/code-and-table.md` → CanonicalDocument JSON stable (incl. all IDs) | fixture |

All tests under `cargo test -p kb-normalize`.
All tests under `cargo test -p kebab-normalize`.

## Definition of Done

- [ ] `cargo check -p kb-normalize` passes
- [ ] `cargo test -p kb-normalize` passes
- [ ] `cargo check -p kebab-normalize` passes
- [ ] `cargo test -p kebab-normalize` passes
- [ ] Determinism test runs ≥ 1000 iterations under 1 second
- [ ] No `kb-parse-md` import (consumed via `kb-core::ParsedBlock`)
- [ ] No `kebab-parse-md` import (consumed via `kebab-core::ParsedBlock`)
- [ ] PR links design §4.2, §4.3

## Out of scope
@@ -791,13 +791,13 @@ Write to `tasks/p1/p1-5-chunk.md`:
````markdown
---
phase: P1
component: kb-chunk
component: kebab-chunk
task_id: p1-5
title: "Markdown heading-aware chunker (md-heading-v1)"
status: planned
depends_on: [p1-4]
unblocks: [p1-6, p2-2, p3-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.5 Chunk, §4.2 chunk_id recipe, §7.2 Chunker, §0 Q3 citation]
---

@@ -813,8 +813,8 @@ The first concrete `Chunker`. Establishes how subsequent chunkers (PDF page chun

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`
- `blake3` (policy_hash)
- `serde-json-canonicalizer`
@@ -822,30 +822,30 @@ The first concrete `Chunker`. Establishes how subsequent chunkers (PDF page chun

## Forbidden dependencies

- `kb-source-fs`, `kb-parse-md`, `kb-normalize` (consumes `CanonicalDocument` only via `kb-core`), `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize` (consumes `CanonicalDocument` only via `kebab-core`), `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

## Inputs

| input | type | source |
|-------|------|--------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 |
| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` from config |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 |
| `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` from config |

## Outputs

| output | type | downstream |
|--------|------|------------|
| `Vec<Chunk>` | `kb_core::Chunk` | `kb-store-sqlite` (p1-6), `kb-embed*` (P3) |
| `Vec<Chunk>` | `kebab_core::Chunk` | `kebab-store-sqlite` (p1-6), `kebab-embed*` (P3) |

## Public surface (signatures only — no new types)

```rust
pub struct MdHeadingV1Chunker;

impl kb_core::Chunker for MdHeadingV1Chunker {
    fn chunker_version(&self) -> kb_core::ChunkerVersion;
    fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String;
    fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result<Vec<kb_core::Chunk>>;
impl kebab_core::Chunker for MdHeadingV1Chunker {
    fn chunker_version(&self) -> kebab_core::ChunkerVersion;
    fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String;
    fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result<Vec<kebab_core::Chunk>>;
}
```

@@ -882,12 +882,12 @@ impl kb_core::Chunker for MdHeadingV1Chunker {
| determinism | identical input + identical policy → identical chunk_ids | inline |
| snapshot | `fixtures/markdown/long-section.md` → Vec<Chunk> JSON stable | fixture |

All tests under `cargo test -p kb-chunk`.
All tests under `cargo test -p kebab-chunk`.

## Definition of Done

- [ ] `cargo check -p kb-chunk` passes
- [ ] `cargo test -p kb-chunk` passes
- [ ] `cargo check -p kebab-chunk` passes
- [ ] `cargo test -p kebab-chunk` passes
- [ ] Snapshot stable across two runs
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.5, §4.2
@@ -924,13 +924,13 @@ Write to `tasks/p1/p1-6-store-sqlite.md`:
````markdown
---
phase: P1
component: kb-store-sqlite (P1 subset)
component: kebab-store-sqlite (P1 subset)
task_id: p1-6
title: "SQLite store: assets/documents/blocks/chunks + asset writer + migrations"
status: planned
depends_on: [p1-1, p1-4, p1-5]
unblocks: [p2-1, p3-3, p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5 DDL (5.1, 5.2, 5.3, 5.4, 5.5 chunks only — FTS handled in p2-1), §5.7 jobs/ingest_runs, §5.8 transactions, §6.3 data_dir layout]
---

@@ -946,8 +946,8 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `rusqlite` (with `bundled-sqlcipher` disabled; use `bundled` feature)
- `refinery` for migrations
- `serde_json`
@@ -958,7 +958,7 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT

## Forbidden dependencies

- `kb-source-fs` (only types via `kb-core`), `kb-parse-md`, `kb-normalize`, `kb-chunk` (only types via `kb-core`), `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs` (only types via `kebab-core`), `kebab-parse-md`, `kebab-normalize`, `kebab-chunk` (only types via `kebab-core`), `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

## Inputs

@@ -966,17 +966,17 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
|-------|------|--------|
| migrations | `migrations/V001__init.sql` | repo |
| `RawAsset` + bytes | `(RawAsset, Vec<u8>)` | p1-1 + reader |
| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 |
| `Vec<Chunk>` | `kb_core::Chunk` | p1-5 |
| `IngestRun` aggregates | `(scope, counts, duration)` | `kb-app` |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 |
| `Vec<Chunk>` | `kebab_core::Chunk` | p1-5 |
| `IngestRun` aggregates | `(scope, counts, duration)` | `kebab-app` |

## Outputs

| output | type | downstream |
|--------|------|------------|
| `data_dir/kb.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | – | every later phase |
| `data_dir/kebab.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | – | every later phase |
| `data_dir/assets/<aa>/<asset_id>` bytes (when copied) | – | future re-extraction, integrity verification |
| `IngestReport` (wire schema v1) | `kb_core::IngestReport` | `kb-cli`, eval |
| `IngestReport` (wire schema v1) | `kebab_core::IngestReport` | `kebab-cli`, eval |

## Public surface (signatures only — no new types)

@@ -984,23 +984,23 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
pub struct SqliteStore { /* internal */ }

impl SqliteStore {
    pub fn open(config: &kb_config::Config) -> anyhow::Result<Self>;
    pub fn open(config: &kebab_config::Config) -> anyhow::Result<Self>;
    pub fn run_migrations(&self) -> anyhow::Result<()>;

    pub fn put_asset_with_bytes(&self, asset: &kb_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>;
    pub fn put_asset_with_bytes(&self, asset: &kebab_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>;
}

impl kb_core::DocumentStore for SqliteStore {
    fn put_asset(&self, a: &kb_core::RawAsset) -> anyhow::Result<()>;
    fn put_document(&self, d: &kb_core::CanonicalDocument) -> anyhow::Result<()>;
    fn put_blocks(&self, doc: &kb_core::DocumentId, blocks: &[kb_core::Block]) -> anyhow::Result<()>;
    fn put_chunks(&self, doc: &kb_core::DocumentId, chunks: &[kb_core::Chunk]) -> anyhow::Result<()>;
    fn get_document(&self, id: &kb_core::DocumentId) -> anyhow::Result<Option<kb_core::CanonicalDocument>>;
    fn get_chunk(&self, id: &kb_core::ChunkId) -> anyhow::Result<Option<kb_core::Chunk>>;
    fn list_documents(&self, filter: &kb_core::DocFilter) -> anyhow::Result<Vec<kb_core::DocSummary>>;
impl kebab_core::DocumentStore for SqliteStore {
    fn put_asset(&self, a: &kebab_core::RawAsset) -> anyhow::Result<()>;
    fn put_document(&self, d: &kebab_core::CanonicalDocument) -> anyhow::Result<()>;
    fn put_blocks(&self, doc: &kebab_core::DocumentId, blocks: &[kebab_core::Block]) -> anyhow::Result<()>;
    fn put_chunks(&self, doc: &kebab_core::DocumentId, chunks: &[kebab_core::Chunk]) -> anyhow::Result<()>;
    fn get_document(&self, id: &kebab_core::DocumentId) -> anyhow::Result<Option<kebab_core::CanonicalDocument>>;
    fn get_chunk(&self, id: &kebab_core::ChunkId) -> anyhow::Result<Option<kebab_core::Chunk>>;
    fn list_documents(&self, filter: &kebab_core::DocFilter) -> anyhow::Result<Vec<kebab_core::DocSummary>>;
}

impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
impl kebab_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
```

## Behavior contract
@@ -1020,7 +1020,7 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }

## Storage / wire effects

- Writes: `kb.sqlite` (multiple tables), `data_dir/assets/<aa>/<asset_id>` (copied case).
- Writes: `kebab.sqlite` (multiple tables), `data_dir/assets/<aa>/<asset_id>` (copied case).
- Reads on subsequent calls: same DB.

## Test plan
@@ -1036,14 +1036,14 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
| contract | DocumentStore trait round-trip for fixture document | `fixtures/markdown/code-and-table.md` |
| snapshot | IngestReport JSON for fixture run | fixture |

All tests under `cargo test -p kb-store-sqlite` with no network.
All tests under `cargo test -p kebab-store-sqlite` with no network.

## Definition of Done

- [ ] `cargo check -p kb-store-sqlite` passes
- [ ] `cargo test -p kb-store-sqlite` passes
- [ ] `cargo check -p kebab-store-sqlite` passes
- [ ] `cargo test -p kebab-store-sqlite` passes
- [ ] migration `V001__init.sql` matches design §5 verbatim (diff-checked in CI)
- [ ] Writes to `~/.local/share/kb/` are gated by `kb-config`'s `data_dir` and never escape it
- [ ] Writes to `~/.local/share/kebab/` are gated by `kebab-config`'s `data_dir` and never escape it
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §5

@@ -1056,7 +1056,7 @@ All tests under `cargo test -p kb-store-sqlite` with no network.

## Risks / notes

- WAL mode requires careful test cleanup: tests must drop the connection before removing `kb.sqlite-wal` / `-shm`.
- WAL mode requires careful test cleanup: tests must drop the connection before removing `kebab.sqlite-wal` / `-shm`.
- Asset directory shard prefix uses `asset_id[..2]`; using `asset_id[..1]` would create at most 16 dirs (insufficient).
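
The shard note can be made concrete with a small path helper. This sketch assumes `asset_id` is the bare lowercase hex digest (the BLAKE3 full hex from design §4.2), and the name `asset_blob_path` is hypothetical. Rejecting anything that is not hex also keeps the resulting path inside `data_dir`. A two-character prefix gives 16 × 16 = 256 shard directories, versus 16 for a single character.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: location of a copied asset under
/// `data_dir/assets/<aa>/<asset_id>`, where `<aa>` is the first two hex chars.
fn asset_blob_path(data_dir: &Path, asset_id: &str) -> Option<PathBuf> {
    // Only accept lowercase hex: this also rules out `/` and `..`,
    // so the joined path cannot escape `data_dir`.
    let is_hex = asset_id
        .bytes()
        .all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b));
    if asset_id.len() < 2 || !is_hex {
        return None;
    }
    Some(data_dir.join("assets").join(&asset_id[..2]).join(asset_id))
}
```
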
````
|
||||
|
||||
@@ -1074,7 +1074,7 @@ git commit -m "tasks: add p1-6 store-sqlite component spec"
|
||||
|
||||
```bash
|
||||
ls tasks/p1/ | sort
|
||||
for f in tasks/p1/p1-*.md; do grep -q '2026-04-27-kb-final-form-design.md' "$f" || echo "MISSING REF in $f"; done
|
||||
for f in tasks/p1/p1-*.md; do grep -q '2026-04-27-kebab-final-form-design.md' "$f" || echo "MISSING REF in $f"; done
|
||||
echo done
|
||||
```
|
||||
|
||||
@@ -1106,9 +1106,9 @@ For each component task below, the steps are: (1) write file, (2) verify, (3) co
|
||||
|
||||
**Files:** Create: `tasks/p0/p0-1-skeleton.md`. Also `mkdir -p tasks/p0`.
|
||||
|
||||
`contract_sections`: §3 (all subsections), §4, §5 (migrations meta only), §6, §7, §8, §10. `Allowed`: workspace + `kb-core` + `kb-config` + `kb-app` + `kb-cli` only.
|
||||
`contract_sections`: §3 (all subsections), §4, §5 (migrations meta only), §6, §7, §8, §10. `Allowed`: workspace + `kebab-core` + `kebab-config` + `kebab-app` + `kebab-cli` only.
|
||||
|
||||
Body covers: workspace `Cargo.toml` resolver=3, edition 2024, member list (`kb-core`, `kb-config`, `kb-app`, `kb-cli`), workspace dependencies, `kb-core` types and traits per design §3 / §7, deterministic ID functions per §4 (with full unit tests), `kb-config` loader (TOML + env + CLI override per §6.4), `kb-app` facade signatures (`ingest`, `search`, `ask`, `inspect_doc`, `inspect_chunk`, `doctor`, `init`), `kb-cli` skeleton with clap + `--help`. DoD: `cargo check --workspace`, `cargo test --workspace` (Newtype+ID+canonical-json tests only), `kb --help` works, `docs/spec/*` stubs created (link to frozen design doc), `docs/wire-schema/v1/*.schema.json` stubs (one file per object in §2).
|
||||
Body covers: workspace `Cargo.toml` resolver=3, edition 2024, member list (`kebab-core`, `kebab-config`, `kebab-app`, `kebab-cli`), workspace dependencies, `kebab-core` types and traits per design §3 / §7, deterministic ID functions per §4 (with full unit tests), `kebab-config` loader (TOML + env + CLI override per §6.4), `kebab-app` facade signatures (`ingest`, `search`, `ask`, `inspect_doc`, `inspect_chunk`, `doctor`, `init`), `kebab-cli` skeleton with clap + `--help`. DoD: `cargo check --workspace`, `cargo test --workspace` (Newtype+ID+canonical-json tests only), `kebab --help` works, `docs/spec/*` stubs created (link to frozen design doc), `docs/wire-schema/v1/*.schema.json` stubs (one file per object in §2).

- [ ] **Step 1: Create directory and file**

@@ -1134,11 +1134,11 @@ git -c user.name=kb -c user.email=kb@local commit -m "tasks: add p0-1 skeleton c

#### `p2-1-fts-schema.md`

`contract_sections`: §5.5 FTS5 + triggers, §9 versioning. `Allowed`: `kb-core`, `kb-config`, `kb-store-sqlite` (extends migrations). `depends_on: [p1-6]`. Migration `V002__fts.sql` adds `chunks_fts` virtual table and three triggers verbatim from §5.5. Tests: backfill from existing chunks via `INSERT INTO chunks_fts SELECT ... FROM chunks`, then assert FTS row count == chunks row count; insert/update/delete in `chunks` reflects in `chunks_fts`.
`contract_sections`: §5.5 FTS5 + triggers, §9 versioning. `Allowed`: `kebab-core`, `kebab-config`, `kebab-store-sqlite` (extends migrations). `depends_on: [p1-6]`. Migration `V002__fts.sql` adds `chunks_fts` virtual table and three triggers verbatim from §5.5. Tests: backfill from existing chunks via `INSERT INTO chunks_fts SELECT ... FROM chunks`, then assert FTS row count == chunks row count; insert/update/delete in `chunks` reflects in `chunks_fts`.

#### `p2-2-lexical-retriever.md`

`contract_sections`: §3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5 search output (for snippet length defaults), §2.2 wire schema. `Allowed`: `kb-core`, `kb-config`, `kb-store-sqlite`. `depends_on: [p2-1]`. Implements `Retriever` trait with `bm25(chunks_fts)` ranking, snippet via SQLite `snippet()` (≤ `snippet_chars` chars), citation built per §0 Q3 from `source_spans`. Tests: top-k correctness on fixture corpus, citation line range round-trip against original Markdown, deterministic across two runs.
`contract_sections`: §3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5 search output (for snippet length defaults), §2.2 wire schema. `Allowed`: `kebab-core`, `kebab-config`, `kebab-store-sqlite`. `depends_on: [p2-1]`. Implements `Retriever` trait with `bm25(chunks_fts)` ranking, snippet via SQLite `snippet()` (≤ `snippet_chars` chars), citation built per §0 Q3 from `source_spans`. Tests: top-k correctness on fixture corpus, citation line range round-trip against original Markdown, deterministic across two runs.
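
p2-2 builds each hit's citation from `source_spans` per §0 Q3. The examples in §1 and §2 use `path#L661-L672` for a line range and `path#L662` for a single line. The sketch below reproduces just that mapping; `Span` and `citation_uri` are hypothetical names. The `page=` form is an assumption that mirrors the common PDF open parameter. The `t=start,end` form is the temporal syntax from W3C Media Fragments URI 1.0. Neither form is confirmed by the design text shown here.

```rust
/// Hypothetical subset of source-span variants.
pub enum Span {
    Line { start: u32, end: u32 },
    Page { page: u32 },
    /// Seconds from the start of an audio or video asset.
    Time { start: f64, end: f64 },
}

/// Build a citation URI of the form `path#fragment`.
pub fn citation_uri(path: &str, span: &Span) -> String {
    let fragment = match span {
        // A single line collapses to `#L662`; a range becomes `#L661-L672`.
        Span::Line { start, end } if start == end => format!("L{start}"),
        Span::Line { start, end } => format!("L{start}-L{end}"),
        // Assumed form, borrowed from the common PDF open parameter.
        Span::Page { page } => format!("page={page}"),
        // W3C Media Fragments temporal dimension (seconds, normal play time).
        Span::Time { start, end } => format!("t={start},{end}"),
    };
    format!("{path}#{fragment}")
}
```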

- [ ] **Step 1: Create directory and both spec files** (template per Phase B; bodies as described above).
- [ ] **Step 2: Verify with `for f in tasks/p2/p2-*.md; do test -s "$f" || echo MISSING $f; done; echo done` (expect only `done`).**
@@ -1152,10 +1152,10 @@ git add tasks/p2 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:

**Files:** `mkdir -p tasks/p3` and create `p3-1-embedder-trait.md`, `p3-2-fastembed-adapter.md`, `p3-3-lancedb-store.md`, `p3-4-hybrid-fusion.md`.

- `p3-1-embedder-trait.md`: §3.7, §7.2 Embedder, §11. Allowed: `kb-core`, `kb-config`. No external embedding dep. Public surface: `Embedder` trait + `EmbeddingInput`/`EmbeddingKind` already in core (validate they exist; if not, this task is also a `kb-core` patch). `depends_on: [p0-1]`. Tests: trait dyn dispatch, mock embedder.
- `p3-2-fastembed-adapter.md`: §11.3, §6.4 `[models.embedding]`. Allowed: `kb-core`, `kb-config`, `fastembed`, `tokenizers`, `ort`. `depends_on: [p3-1]`. Provides `FastembedEmbedder` implementing `Embedder` for `multilingual-e5-small` (default), with required Document/Query prefix per §11.3. Tests: dimension check, deterministic vector for fixed input (hash compare on first 8 floats with epsilon), batch size respected.
- `p3-3-lancedb-store.md`: §3.5, §5.6 embedding_records, §6.3 lancedb table naming. Allowed: `kb-core`, `kb-config`, `lancedb`, `arrow`, `kb-store-sqlite` (write `embedding_records` row only — no other table). `depends_on: [p3-2, p1-6]`. Implements `VectorStore` trait. Table naming `chunk_embeddings_<model>_<dim>.lance`. `ensure_table` creates if missing. `upsert` inserts vectors and writes a matching `embedding_records` row in same logical operation (best-effort 2PC: lance commit, then SQLite insert; on SQLite failure, log warning + leave lance row — re-upsert is idempotent because of the `UNIQUE(chunk_id, model_id, model_version, dimensions)` constraint and lance upsert semantics). `search` filters via SearchFilters and returns top-k. Tests: smoke (insert+search), dimension mismatch error, model isolation (two models stay in two tables).
- `p3-4-hybrid-fusion.md`: §3.7 RetrievalDetail, §0 Q3, §1.6 search --explain, §6.4 `[search]` rrf settings. Allowed: `kb-core`, `kb-config`, `kb-store-sqlite` (lexical Retriever from p2-2), `kb-store-vector` (vector Retriever wrapper around `VectorStore::search`). `depends_on: [p2-2, p3-3]`. Implements `HybridRetriever` that dispatches by `SearchMode`, fuses with RRF (k from config, default 60), populates `lexical_score`, `vector_score`, `lexical_rank`, `vector_rank`, `fusion_score`. Tests: pure lexical mode == p2-2 output; pure vector mode == p3-3 output; hybrid produces strictly larger or equal coverage of expected hits than either single mode on a small fixture; deterministic.
- `p3-1-embedder-trait.md`: §3.7, §7.2 Embedder, §11. Allowed: `kebab-core`, `kebab-config`. No external embedding dep. Public surface: `Embedder` trait + `EmbeddingInput`/`EmbeddingKind` already in core (validate they exist; if not, this task is also a `kebab-core` patch). `depends_on: [p0-1]`. Tests: trait dyn dispatch, mock embedder.
- `p3-2-fastembed-adapter.md`: §11.3, §6.4 `[models.embedding]`. Allowed: `kebab-core`, `kebab-config`, `fastembed`, `tokenizers`, `ort`. `depends_on: [p3-1]`. Provides `FastembedEmbedder` implementing `Embedder` for `multilingual-e5-small` (default), with required Document/Query prefix per §11.3. Tests: dimension check, deterministic vector for fixed input (hash compare on first 8 floats with epsilon), batch size respected.
- `p3-3-lancedb-store.md`: §3.5, §5.6 embedding_records, §6.3 lancedb table naming. Allowed: `kebab-core`, `kebab-config`, `lancedb`, `arrow`, `kebab-store-sqlite` (write `embedding_records` row only — no other table). `depends_on: [p3-2, p1-6]`. Implements `VectorStore` trait. Table naming `chunk_embeddings_<model>_<dim>.lance`. `ensure_table` creates if missing. `upsert` inserts vectors and writes a matching `embedding_records` row in same logical operation (best-effort 2PC: lance commit, then SQLite insert; on SQLite failure, log warning + leave lance row — re-upsert is idempotent because of the `UNIQUE(chunk_id, model_id, model_version, dimensions)` constraint and lance upsert semantics). `search` filters via SearchFilters and returns top-k. Tests: smoke (insert+search), dimension mismatch error, model isolation (two models stay in two tables).
- `p3-4-hybrid-fusion.md`: §3.7 RetrievalDetail, §0 Q3, §1.6 search --explain, §6.4 `[search]` rrf settings. Allowed: `kebab-core`, `kebab-config`, `kebab-store-sqlite` (lexical Retriever from p2-2), `kebab-store-vector` (vector Retriever wrapper around `VectorStore::search`). `depends_on: [p2-2, p3-3]`. Implements `HybridRetriever` that dispatches by `SearchMode`, fuses with RRF (k from config, default 60), populates `lexical_score`, `vector_score`, `lexical_rank`, `vector_rank`, `fusion_score`. Tests: pure lexical mode == p2-2 output; pure vector mode == p3-3 output; hybrid produces strictly larger or equal coverage of expected hits than either single mode on a small fixture; deterministic.
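
The fusion step in p3-4 is Reciprocal Rank Fusion. Each ranked list adds 1/(k + rank) to a chunk's score, with 1-based ranks and k = 60 by default. Below is a minimal sketch over plain string IDs. Breaking ties by ID is an added assumption: it keeps two runs in the same order, which p3-4's determinism test needs.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over any number of ranked lists (best first).
/// score(d) = sum over lists of 1 / (k + rank_d), with rank 1-based.
pub fn rrf_fuse(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for list in lists {
        for (idx, id) in list.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Highest score first. Ties are broken by ID so the output order is
    // deterministic even though HashMap iteration order is not.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

RRF consumes only ranks, so BM25 scores and cosine similarities never have to share a scale. The raw values can still be reported as `lexical_score` / `vector_score` in the `--explain` trace.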

- [ ] **Step 1: Create directory and 4 files** (template per Phase B).
- [ ] **Step 2: Verify**
@@ -1176,9 +1176,9 @@ git add tasks/p3 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:

**Files:** `mkdir -p tasks/p4` and create `p4-1-llm-trait.md`, `p4-2-ollama-adapter.md`, `p4-3-rag-pipeline.md`.

- `p4-1-llm-trait.md`: §7.2 LanguageModel + TokenChunk, §0 Q5 streaming, §3.8 Answer types referenced. Allowed: `kb-core`, `kb-config`. `depends_on: [p0-1]`. Defines (or validates) `LanguageModel` trait, `GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage` per design. Tests: trait dyn dispatch, mock LM streams 3 tokens.
- `p4-2-ollama-adapter.md`: §11.2 Ollama, §6.4 `[models.llm]`, §0 Q5 streaming. Allowed: `kb-core`, `kb-config`, `reqwest` (blocking + json + stream feature) or `ureq` + manual SSE; `serde_json`, `tokio`/runtime if needed. `depends_on: [p4-1]`. Implements `OllamaLanguageModel` with streaming `/api/generate`. `temperature=0.0` default, `seed` honored for determinism. Reachability/missing-model errors map to `LlmError` per design §10. Tests: against a mock HTTP server (`wiremock` or hand-rolled `tiny_http`); deterministic stream collect equals buffered concatenation; missing model returns `LlmError::ModelNotPulled` with proper hint.
- `p4-3-rag-pipeline.md`: §0 Q4 refusal (two-layer), §0 Q7 footer, §1.1–1.4 ask scenes, §2.3 Answer wire, §3.8 internal Answer, §6.4 `[rag]`. Allowed: `kb-core`, `kb-config`, `kb-search` (Retriever), `kb-llm` (LanguageModel). `depends_on: [p3-4, p4-2]`. Pipeline: retrieve top-k → score gate (`refusal_reason: ScoreGate` if top1 < gate) → context packer (token budget + heading_path header `[#n doc=… heading=… span=…]`) → render `rag-v1` prompt → stream → collect → citation extraction (regex `\[(\d+)\]`) → citation validation (each `[n]` must map to a packed chunk; otherwise `grounded=false`, `refusal_reason: LlmSelfJudge`) → write `answers` row. Tests: happy path produces grounded Answer with citations; query with all chunks below gate produces ScoreGate refusal; query whose LLM emits a citation pointing to non-existent `[7]` becomes LlmSelfJudge refusal; identical query under temperature=0 produces byte-identical Answer (snapshot).
- `p4-1-llm-trait.md`: §7.2 LanguageModel + TokenChunk, §0 Q5 streaming, §3.8 Answer types referenced. Allowed: `kebab-core`, `kebab-config`. `depends_on: [p0-1]`. Defines (or validates) `LanguageModel` trait, `GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage` per design. Tests: trait dyn dispatch, mock LM streams 3 tokens.
- `p4-2-ollama-adapter.md`: §11.2 Ollama, §6.4 `[models.llm]`, §0 Q5 streaming. Allowed: `kebab-core`, `kebab-config`, `reqwest` (blocking + json + stream feature) or `ureq` + manual SSE; `serde_json`, `tokio`/runtime if needed. `depends_on: [p4-1]`. Implements `OllamaLanguageModel` with streaming `/api/generate`. `temperature=0.0` default, `seed` honored for determinism. Reachability/missing-model errors map to `LlmError` per design §10. Tests: against a mock HTTP server (`wiremock` or hand-rolled `tiny_http`); deterministic stream collect equals buffered concatenation; missing model returns `LlmError::ModelNotPulled` with proper hint.
- `p4-3-rag-pipeline.md`: §0 Q4 refusal (two-layer), §0 Q7 footer, §1.1–1.4 ask scenes, §2.3 Answer wire, §3.8 internal Answer, §6.4 `[rag]`. Allowed: `kebab-core`, `kebab-config`, `kebab-search` (Retriever), `kebab-llm` (LanguageModel). `depends_on: [p3-4, p4-2]`. Pipeline: retrieve top-k → score gate (`refusal_reason: ScoreGate` if top1 < gate) → context packer (token budget + heading_path header `[#n doc=… heading=… span=…]`) → render `rag-v1` prompt → stream → collect → citation extraction (regex `\[(\d+)\]`) → citation validation (each `[n]` must map to a packed chunk; otherwise `grounded=false`, `refusal_reason: LlmSelfJudge`) → write `answers` row. Tests: happy path produces grounded Answer with citations; query with all chunks below gate produces ScoreGate refusal; query whose LLM emits a citation pointing to non-existent `[7]` becomes LlmSelfJudge refusal; identical query under temperature=0 produces byte-identical Answer (snapshot).
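
p4-3 refuses at two points: a score gate before the LLM is called, and citation validation after it answers. Below is a sketch of both checks, using hypothetical names (`Refusal`, `gate`, `extract_citations`, `validate_citations`). It scans for `[n]` by hand instead of compiling the `\[(\d+)\]` regex, so it needs no crates. Unlike the `regex` crate's Unicode-aware `\d`, it accepts ASCII digits only.

```rust
/// Which refusal layer fired (variant names taken from the p4-3 spec).
#[derive(Debug, PartialEq)]
pub enum Refusal {
    ScoreGate,
    LlmSelfJudge,
}

/// Layer 1, before generation: refuse when the top-1 retrieval score is
/// below the gate. Treating "no hits at all" as a gate failure is an assumption.
pub fn gate(top1_score: Option<f64>, threshold: f64) -> Result<(), Refusal> {
    match top1_score {
        Some(score) if score >= threshold => Ok(()),
        _ => Err(Refusal::ScoreGate),
    }
}

/// Collect every `[n]` marker (n = one or more ASCII digits), in order.
pub fn extract_citations(answer: &str) -> Vec<usize> {
    let bytes = answer.as_bytes();
    let mut found = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'[' {
            let mut j = i + 1;
            while j < bytes.len() && bytes[j].is_ascii_digit() {
                j += 1;
            }
            // Need at least one digit followed by a closing bracket.
            if j > i + 1 && j < bytes.len() && bytes[j] == b']' {
                // Safe slice: '[' and ']' are ASCII, so both ends are char boundaries.
                if let Ok(n) = answer[i + 1..j].parse::<usize>() {
                    found.push(n);
                }
                i = j + 1;
                continue;
            }
        }
        i += 1;
    }
    found
}

/// Layer 2, after generation: every `[n]` must name one of the `packed`
/// context chunks (numbered from 1). Treating an answer with no citations
/// as ungrounded is an assumption, not part of the spec text.
pub fn validate_citations(answer: &str, packed: usize) -> Result<Vec<usize>, Refusal> {
    let cites = extract_citations(answer);
    if cites.is_empty() || cites.iter().any(|&n| n == 0 || n > packed) {
        return Err(Refusal::LlmSelfJudge);
    }
    Ok(cites)
}
```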

- [ ] **Step 1, 2, 3** as in C3.

@@ -1193,8 +1193,8 @@ git add tasks/p4 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:

**Files:** `mkdir -p tasks/p5` and create `p5-1-golden-fixture-runner.md`, `p5-2-metrics-compare.md`.

- `p5-1-golden-fixture-runner.md`: phase epic + §5.7 eval_runs/eval_query_results, §6.3 runs_dir. Allowed: `kb-core`, `kb-config`, `kb-app` (calls facade for search/ask), `serde_yaml`. `depends_on: [p4-3]`. Loads `fixtures/golden_queries.yaml`, runs each query in selected mode (lexical/vector/hybrid/rag), captures per-query results to `eval_query_results` and to `runs_dir/<run_id>/per_query.jsonl`. Tests: fixture with 3 queries runs end-to-end on a tiny corpus, all rows recorded.
- `p5-2-metrics-compare.md`: phase epic, §0 Q6 wire schema. Allowed: `kb-core`, `kb-config`, `kb-store-sqlite` (read eval rows). `depends_on: [p5-1]`. Computes hit@k, MRR, recall@k_doc, citation_coverage, groundedness (rule-based via `must_contain`), empty_result_rate, refusal_correctness. `kb eval compare a b` produces wins/losses/draws + delta. Tests: fixed input rows produce expected metric values; compare produces stable sorted output.
- `p5-1-golden-fixture-runner.md`: phase epic + §5.7 eval_runs/eval_query_results, §6.3 runs_dir. Allowed: `kebab-core`, `kebab-config`, `kebab-app` (calls facade for search/ask), `serde_yaml`. `depends_on: [p4-3]`. Loads `fixtures/golden_queries.yaml`, runs each query in selected mode (lexical/vector/hybrid/rag), captures per-query results to `eval_query_results` and to `runs_dir/<run_id>/per_query.jsonl`. Tests: fixture with 3 queries runs end-to-end on a tiny corpus, all rows recorded.
- `p5-2-metrics-compare.md`: phase epic, §0 Q6 wire schema. Allowed: `kebab-core`, `kebab-config`, `kebab-store-sqlite` (read eval rows). `depends_on: [p5-1]`. Computes hit@k, MRR, recall@k_doc, citation_coverage, groundedness (rule-based via `must_contain`), empty_result_rate, refusal_correctness. `kebab eval compare a b` produces wins/losses/draws + delta. Tests: fixed input rows produce expected metric values; compare produces stable sorted output.
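
p5-2's ranking metrics follow the standard definitions. hit@k is the share of queries with at least one relevant result in the top k. MRR averages 1/rank of the first relevant result, counting 0 when there is none. The sketch assumes each recorded query has been reduced to its ranked result IDs plus the set of IDs the golden fixture marks as relevant. The fixture schema itself is not shown here, so `QueryResult` is hypothetical.

```rust
use std::collections::HashSet;

/// One evaluated query: ranked result IDs (best first) and the IDs judged relevant.
pub struct QueryResult<'a> {
    pub ranked: Vec<&'a str>,
    pub expected: HashSet<&'a str>,
}

/// 1-based rank of the first relevant result, if any.
fn first_relevant_rank(q: &QueryResult) -> Option<usize> {
    q.ranked.iter().position(|id| q.expected.contains(id)).map(|i| i + 1)
}

/// Fraction of queries with at least one relevant result in the top `k`.
pub fn hit_at_k(results: &[QueryResult], k: usize) -> f64 {
    if results.is_empty() {
        return 0.0;
    }
    let hits = results
        .iter()
        .filter(|q| first_relevant_rank(q).map_or(false, |r| r <= k))
        .count();
    hits as f64 / results.len() as f64
}

/// Mean reciprocal rank; a query with no relevant result contributes 0.
pub fn mrr(results: &[QueryResult]) -> f64 {
    if results.is_empty() {
        return 0.0;
    }
    let sum: f64 = results
        .iter()
        .map(|q| first_relevant_rank(q).map_or(0.0, |r| 1.0 / r as f64))
        .sum();
    sum / results.len() as f64
}
```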

- [ ] **Step 1, 2, 3** as in C3.

@@ -1209,9 +1209,9 @@ git add tasks/p5 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:

`mkdir -p tasks/p6` and create:

- `p6-1-image-extractor-exif.md`: phase epic §9.1, §3.4 ImageRefBlock, §3.7a ImageType. Allowed: `kb-core`, `kb-config`, `image`, `kamadak-exif`. Implements `Extractor` for `MediaType::Image(_)` producing a `CanonicalDocument` whose body is exactly one `ImageRefBlock`. EXIF goes to `metadata.user`. `depends_on: [p0-1, p1-6]`. Tests: PNG/JPEG decode metadata; EXIF extraction; deterministic doc_id.
- `p6-2-ocr-adapter.md`: phase epic §9.1. Allowed: `kb-core`, `kb-config`, `image`, OS-specific OCR (feature `apple-vision` for macOS via sidecar binary; feature `tesseract` for cross-platform; default tesseract). Defines `OcrEngine` trait + adapter. Populates `ImageRefBlock.ocr` `OcrText` (`joined`, regions, engine, engine_version). `depends_on: [p6-1]`. Tests: deterministic text on a fixed fixture image with high-confidence text.
- `p6-3-caption-adapter.md`: phase epic §9.1 caption section, §3.7a ModelCaption. Allowed: `kb-core`, `kb-config`, `kb-llm` (reuse LanguageModel for VLM). Optional/feature-gated. `depends_on: [p6-1, p4-2]`. Populates `ImageRefBlock.caption`. Tests: with mock LM, caption recorded with model id; absence of feature flag leaves caption=None.
- `p6-1-image-extractor-exif.md`: phase epic §9.1, §3.4 ImageRefBlock, §3.7a ImageType. Allowed: `kebab-core`, `kebab-config`, `image`, `kamadak-exif`. Implements `Extractor` for `MediaType::Image(_)` producing a `CanonicalDocument` whose body is exactly one `ImageRefBlock`. EXIF goes to `metadata.user`. `depends_on: [p0-1, p1-6]`. Tests: PNG/JPEG decode metadata; EXIF extraction; deterministic doc_id.
- `p6-2-ocr-adapter.md`: phase epic §9.1. Allowed: `kebab-core`, `kebab-config`, `image`, OS-specific OCR (feature `apple-vision` for macOS via sidecar binary; feature `tesseract` for cross-platform; default tesseract). Defines `OcrEngine` trait + adapter. Populates `ImageRefBlock.ocr` `OcrText` (`joined`, regions, engine, engine_version). `depends_on: [p6-1]`. Tests: deterministic text on a fixed fixture image with high-confidence text.
- `p6-3-caption-adapter.md`: phase epic §9.1 caption section, §3.7a ModelCaption. Allowed: `kebab-core`, `kebab-config`, `kebab-llm` (reuse LanguageModel for VLM). Optional/feature-gated. `depends_on: [p6-1, p4-2]`. Populates `ImageRefBlock.caption`. Tests: with mock LM, caption recorded with model id; absence of feature flag leaves caption=None.

- [ ] **Step 1, 2, 3** as in C3.

@@ -1226,8 +1226,8 @@ git add tasks/p6 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:

`mkdir -p tasks/p7` and create:

- `p7-1-pdf-text-extractor.md`: phase epic §9.2, §3.4 SourceSpan::Page. Allowed: `kb-core`, `kb-config`, `pdf-extract`, `lopdf` (page metadata). `depends_on: [p0-1, p1-6]`. Extractor for `MediaType::Pdf` produces a `CanonicalDocument` with one `Paragraph` per page, `SourceSpan::Page`. Failed-text pages are emitted as paragraphs with empty text and a `Provenance` warning marking them as scanned candidates. Tests: page count, span correctness, failure handling.
- `p7-2-pdf-page-chunker.md`: phase epic §9.2, §3.5, §0 Q3 citation. Allowed: `kb-core`, `kb-config`. New chunker version `pdf-page-v1` that respects page boundaries. `depends_on: [p7-1]`. Tests: chunk does not cross page boundary; very long page subdivides per `target_tokens`.
- `p7-1-pdf-text-extractor.md`: phase epic §9.2, §3.4 SourceSpan::Page. Allowed: `kebab-core`, `kebab-config`, `pdf-extract`, `lopdf` (page metadata). `depends_on: [p0-1, p1-6]`. Extractor for `MediaType::Pdf` produces a `CanonicalDocument` with one `Paragraph` per page, `SourceSpan::Page`. Failed-text pages are emitted as paragraphs with empty text and a `Provenance` warning marking them as scanned candidates. Tests: page count, span correctness, failure handling.
- `p7-2-pdf-page-chunker.md`: phase epic §9.2, §3.5, §0 Q3 citation. Allowed: `kebab-core`, `kebab-config`. New chunker version `pdf-page-v1` that respects page boundaries. `depends_on: [p7-1]`. Tests: chunk does not cross page boundary; very long page subdivides per `target_tokens`.
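
p7-2's two rules can be sketched as below: a chunk never crosses a page boundary, and a long page is subdivided by `target_tokens`. Two simplifications apply, for illustration only and not as the `pdf-page-v1` behavior. Tokens are approximated by whitespace-separated words. Pages with no text produce no chunk. `PageChunk` and `chunk_pages` are hypothetical names.

```rust
/// A chunk of PDF text tied to exactly one page (1-based).
#[derive(Debug, PartialEq)]
pub struct PageChunk {
    pub page: u32,
    pub text: String,
}

/// Chunk each page on its own so no chunk ever spans two pages; a page with
/// more than `target_tokens` words is split into consecutive windows.
/// Words stand in for tokens here.
pub fn chunk_pages(pages: &[&str], target_tokens: usize) -> Vec<PageChunk> {
    assert!(target_tokens > 0, "target_tokens must be positive");
    let mut chunks = Vec::new();
    for (idx, page_text) in pages.iter().enumerate() {
        let page = idx as u32 + 1;
        let words: Vec<&str> = page_text.split_whitespace().collect();
        // A page with no text layer yields no windows, hence no chunk, in this sketch.
        for window in words.chunks(target_tokens) {
            chunks.push(PageChunk { page, text: window.join(" ") });
        }
    }
    chunks
}
```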

- [ ] **Step 1, 2, 3** as in C3.

@@ -1242,7 +1242,7 @@ git add tasks/p7 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:

`mkdir -p tasks/p8` and create:

- `p8-1-whisper-adapter.md`: phase epic §9.3, §3.4 AudioRefBlock + `Transcript`. Allowed: `kb-core`, `kb-config`, whisper.cpp Rust binding (`whisper-rs`) or sidecar binary. `depends_on: [p0-1, p1-6]`. Implements `Transcriber` trait. Default model `large-v3` via config; tests use a tiny model (e.g., `base.en`) for speed. Tests: monotone segment timestamps, language detection populated, deterministic transcript on fixed audio.
- `p8-1-whisper-adapter.md`: phase epic §9.3, §3.4 AudioRefBlock + `Transcript`. Allowed: `kebab-core`, `kebab-config`, whisper.cpp Rust binding (`whisper-rs`) or sidecar binary. `depends_on: [p0-1, p1-6]`. Implements `Transcriber` trait. Default model `large-v3` via config; tests use a tiny model (e.g., `base.en`) for speed. Tests: monotone segment timestamps, language detection populated, deterministic transcript on fixed audio.
- `p8-2-segment-chunker.md`: phase epic §9.3, §3.5. New `audio-segment-v1` chunker that groups segments up to `target_tokens` with priority on speaker turn boundaries (when present). `depends_on: [p8-1]`. Tests: chunk timestamp == first/last segment timestamp; speaker change forces split.
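
p8-2 groups transcript segments up to `target_tokens`, splits early when the speaker changes, and takes each chunk's timestamps from its first and last segment. Below is a sketch under the same word-count approximation. `Segment`, `AudioChunk`, and `chunk_segments` are hypothetical. A speaker change only counts when both segments carry a speaker label. A single oversized segment is kept whole rather than cut.

```rust
/// One transcript segment (hypothetical shape).
#[derive(Debug)]
pub struct Segment {
    pub start: f64, // seconds
    pub end: f64,   // seconds
    pub speaker: Option<String>,
    pub text: String,
}

/// One chunk spanning consecutive segments.
#[derive(Debug, PartialEq)]
pub struct AudioChunk {
    pub start: f64,
    pub end: f64,
    pub text: String,
}

/// Word count as a stand-in for real token counting.
fn word_count(s: &str) -> usize {
    s.split_whitespace().count()
}

/// Chunk timestamps come from the first and last segment in the group.
fn close_chunk(group: &[&Segment]) -> AudioChunk {
    let text = group.iter().map(|s| s.text.as_str()).collect::<Vec<_>>().join(" ");
    AudioChunk { start: group[0].start, end: group[group.len() - 1].end, text }
}

pub fn chunk_segments(segs: &[Segment], target_tokens: usize) -> Vec<AudioChunk> {
    let mut chunks = Vec::new();
    let mut current: Vec<&Segment> = Vec::new();
    let mut current_words = 0;
    for seg in segs {
        let words = word_count(&seg.text);
        // A speaker change forces a split, but only when both sides are labelled.
        let speaker_changed = current.last().map_or(false, |prev| {
            prev.speaker.is_some() && seg.speaker.is_some() && prev.speaker != seg.speaker
        });
        // Never close an empty group, so an oversized segment stays whole.
        let over_budget = !current.is_empty() && current_words + words > target_tokens;
        if speaker_changed || over_budget {
            chunks.push(close_chunk(&current));
            current.clear();
            current_words = 0;
        }
        current_words += words;
        current.push(seg);
    }
    if !current.is_empty() {
        chunks.push(close_chunk(&current));
    }
    chunks
}
```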

- [ ] **Step 1, 2, 3** as in C3.
@@ -1258,11 +1258,11 @@ git add tasks/p8 && git -c user.name=kb -c user.email=kb@local commit -m "tasks:

`mkdir -p tasks/p9` and create:

- `p9-1-tui-library.md`: phase epic §16.2, §3.7. Allowed: `kb-core`, `kb-app` only (UI law). `ratatui`, `crossterm`. `depends_on: [p1-6]`. Library list view + tag filter. Tests: snapshot of rendered frame against fixture corpus list.
- `p9-1-tui-library.md`: phase epic §16.2, §3.7. Allowed: `kebab-core`, `kebab-app` only (UI law). `ratatui`, `crossterm`. `depends_on: [p1-6]`. Library list view + tag filter. Tests: snapshot of rendered frame against fixture corpus list.
- `p9-2-tui-search.md`: phase epic §16.2, §1.5. Allowed: same as p9-1. `depends_on: [p2-2, p3-4]`. Search input + result list + preview pane; `Enter` triggers external editor jump (`$EDITOR +<line> <path>`). Tests: search results render; `g` keybinding constructs the correct editor command.
- `p9-3-tui-ask.md`: phase epic §16.2, §1.1, §1.2. Allowed: same. `depends_on: [p4-3]`. Ask pane shows streaming tokens; `--explain` toggle. Tests: streaming render, refusal render.
- `p9-4-tui-inspect.md`: §1.6 inspect, §3.5. Allowed: same. `depends_on: [p1-6, p3-3]`. Renders Document and Chunk inspection per wire schemas 2.5/2.6.
- `p9-5-desktop-tauri.md`: phase epic §16.3, §1 all scenes. Allowed: `kb-core`, `kb-app`, Tauri backend; frontend stack TBD by user (vanilla TS by default). `depends_on: [p9-1, p9-2, p9-3, p9-4]`. Backend exposes Tauri commands that wrap `kb-app` 1:1. Source viewer per medium (Markdown render, PDF page, image with region overlay, audio with seek). Tests: backend command unit tests (no frontend e2e in this task).
- `p9-5-desktop-tauri.md`: phase epic §16.3, §1 all scenes. Allowed: `kebab-core`, `kebab-app`, Tauri backend; frontend stack TBD by user (vanilla TS by default). `depends_on: [p9-1, p9-2, p9-3, p9-4]`. Backend exposes Tauri commands that wrap `kebab-app` 1:1. Source viewer per medium (Markdown render, PDF page, image with region overlay, audio with seek). Tests: backend command unit tests (no frontend e2e in this task).
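
p9-2's editor jump runs `$EDITOR +<line> <path>`. The sketch below builds that command with `std::process::Command` rather than a shell string, so paths containing spaces need no quoting. Two behaviors are assumptions. When `EDITOR` is unset, it falls back to `vi`. When `EDITOR` carries flags (for example `emacsclient -nw`), the value is split on whitespace, which does not handle quoted arguments. `editor_command` is a hypothetical name.

```rust
use std::process::Command;

/// Build (but do not run) the editor command that opens `path` at `line`.
pub fn editor_command(editor: Option<&str>, path: &str, line: u32) -> Command {
    // Assumed fallback when $EDITOR is unset or blank.
    let editor = editor.filter(|e| !e.trim().is_empty()).unwrap_or("vi");
    let mut parts = editor.split_whitespace();
    let program = parts.next().unwrap_or("vi");
    let mut cmd = Command::new(program);
    cmd.args(parts); // any flags that were part of $EDITOR
    cmd.arg(format!("+{line}")).arg(path);
    cmd
}
```

The TUI would then call something like `editor_command(std::env::var("EDITOR").ok().as_deref(), path, line).status()`. Returning the unrun `Command` lets the p9-2 test inspect it via `get_program` / `get_args` without spawning an editor.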

- [ ] **Step 1, 2, 3** as in C3.

@@ -1351,5 +1351,5 @@ git -c user.name=kb -c user.email=kb@local commit -m "tasks: update INDEX with f
- [ ] `tasks/p0/`, `tasks/p1/` … `tasks/p9/` exist with the component spec files listed above.
- [ ] Every component spec contains: frontmatter (with `contract_sections`), Allowed/Forbidden, Inputs, Outputs, Public surface, Behavior contract, Storage/wire effects, Test plan, Definition of Done, Out of scope, Risks/notes.
- [ ] `tasks/INDEX.md` lists every component task.
- [ ] No new domain types introduced inside any component spec — every type referenced is defined in [docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../specs/2026-04-27-kb-final-form-design.md).
- [ ] No new domain types introduced inside any component spec — every type referenced is defined in [docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../specs/2026-04-27-kebab-final-form-design.md).
- [ ] All commits authored sequentially per task; rollback is per-task.

@@ -3,7 +3,7 @@ title: "KB v1 최종 결과물 형태 — Frozen Design"
date: 2026-04-27
status: frozen
purpose: freeze a solid contract so the spec does not change during fine-grained task decomposition
source_report: ../../../kb_local_rust_report.md
source_report: ../../../kebab_local_rust_report.md
related_tasks: ../../../tasks/INDEX.md
---

@@ -11,7 +11,7 @@ related_tasks: ../../../tasks/INDEX.md

This document freezes a **very concrete shape of the final deliverable** that will satisfy the user. Each phase's task decomposition is built on top of this contract, and as long as this document does not change, the tasks' interfaces do not change.

The prerequisite report is [`kb_local_rust_report.md`](../../../kb_local_rust_report.md). That report supplies the *direction* and the *rationale*; this document pins down the *shape*.
The prerequisite report is [`kebab_local_rust_report.md`](../../../kebab_local_rust_report.md). That report supplies the *direction* and the *rationale*; this document pins down the *shape*.

---

@@ -20,7 +20,7 @@ related_tasks: ../../../tasks/INDEX.md
| # | Decision | Value | Rationale |
|---|------|-----|------|
| Q1 | scope priority | derive Data backward from UX | user satisfaction is the lever for spec stability |
| Q2 | headline UX | `kb ask` answer screen | exposes search, citation, RAG, refusal, and model metadata all at once |
| Q2 | headline UX | `kebab ask` answer screen | exposes search, citation, RAG, refusal, and model metadata all at once |
| – | ask default format | inline numeric refs `[1]…[n]` + footer | everyday readability |
| – | ask `--explain` | per-claim breakdown + verbose footer + retrieval trace | one flag for debugging |
| Q3 | citation string | URI fragment (`path#k=v…`, W3C Media Fragments) | standards-aligned + safe for Windows paths + browser auto-scroll |
@@ -34,7 +34,7 @@ related_tasks: ../../../tasks/INDEX.md
| Q10 | workspace | single root + XDG layout | right-sized for a personal v1 |
| – | asset retention | content-addressable copy; above `copy_threshold_mb=100`, reference + checksum | reproducibility + disk savings |
| – | wire versioning | additive within `vN`, breaking → `vN+1` | avoids breaking external consumers |
| – | ignore | gitignore syntax + `.kbignore` | familiarity |
| – | ignore | gitignore syntax + `.kebabignore` | familiarity |
| – | errors | thiserror per crate, anyhow at boundary | traceability + UX |
| – | sync | watch=false default | explicit ingest in v1 |
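
The asset-retention row reduces to a size rule. This sketch makes two assumptions. First, the threshold is measured in mebibytes (1 MiB = 1,048,576 bytes). Second, a file exactly at the threshold is still copied, since the rule only applies to files *over* `copy_threshold_mb`. `AssetStorage` and `storage_for` are hypothetical names.

```rust
/// How an ingested asset is retained (hypothetical names).
#[derive(Debug, PartialEq)]
pub enum AssetStorage {
    /// Copied into the content-addressable store.
    CopyIntoStore,
    /// Left where it is; only a reference and a checksum are recorded.
    ReferenceWithChecksum,
}

pub fn storage_for(size_bytes: u64, copy_threshold_mb: u64) -> AssetStorage {
    const MIB: u64 = 1024 * 1024; // assumption: "mb" means MiB
    // Only files strictly above the threshold switch to reference + checksum.
    if size_bytes > copy_threshold_mb.saturating_mul(MIB) {
        AssetStorage::ReferenceWithChecksum
    } else {
        AssetStorage::CopyIntoStore
    }
}
```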

@@ -42,43 +42,43 @@ related_tasks: ../../../tasks/INDEX.md

## 1. Headline UX scenes

### 1.1 `kb ask` (default)
### 1.1 `kebab ask` (default)

```text
$ kb ask "Markdown chunking 규칙은?"
$ kebab ask "Markdown chunking 규칙은?"

heading boundary 우선 [1]. code block 중간 분할 금지 [2]. table 가능한 한
단일 chunk 유지 [2]. 긴 section 은 paragraph 단위로 분할 [1]. chunk 마다
heading_path 와 source_span 보존 [1].

─────────────────────────────────────────────────────────
[1] notes/rust/kb-architecture.md#L661-L672
[1] notes/rust/kebab-architecture.md#L661-L672
§14 Chunking 정책
[2] notes/rust/kb-architecture.md#L665-L668
[2] notes/rust/kebab-architecture.md#L665-L668
§14 Chunking 정책

grounded ✓ qwen2.5:14b-instruct rag-v1 3 chunks
```

### 1.2 `kb ask --explain`
### 1.2 `kebab ask --explain`

```text
$ kb ask --explain "Markdown chunking 규칙은?"
$ kebab ask --explain "Markdown chunking 규칙은?"

▎ heading boundary 우선
└ notes/rust/kb-architecture.md#L662
└ notes/rust/kebab-architecture.md#L662
「heading boundary를 우선한다」

▎ code block 중간 분할 금지
└ notes/rust/kb-architecture.md#L663
└ notes/rust/kebab-architecture.md#L663
「code block은 중간에서 자르지 않는다」

▎ table 단일 chunk 유지
└ notes/rust/kb-architecture.md#L664
└ notes/rust/kebab-architecture.md#L664
「table은 가능한 한 하나의 chunk로」

▎ heading_path / source_span 보존
└ notes/rust/kb-architecture.md#L668-L670
└ notes/rust/kebab-architecture.md#L668-L670

retrieval trace
query "Markdown chunking 규칙은?"
@@ -87,8 +87,8 @@ retrieval trace
threshold (gate) 0.30 → top-1 0.82 pass
fusion rrf (k=60)
chunks (used) 3 / 8 returned
#1 0.82 notes/rust/kb-architecture.md#L661-L672 bm25=12.4 vec=0.78
#2 0.78 notes/rust/kb-architecture.md#L692-L713 bm25=10.1 vec=0.74
#1 0.82 notes/rust/kebab-architecture.md#L661-L672 bm25=12.4 vec=0.78
#2 0.78 notes/rust/kebab-architecture.md#L692-L713 bm25=10.1 vec=0.74
#3 0.55 guides/markdown-style.md#L4-L18 bm25=8.2 vec=0.61

grounded ✓ qwen2.5:14b-instruct rag-v1 3 chunks
@@ -96,10 +96,10 @@ prompt 1184 tokens completion 312 tokens latency 1842 ms
embedding multilingual-e5-small index v1.0
```

### 1.3 `kb ask` (refusal — score gate)
### 1.3 `kebab ask` (refusal — score gate)

```text
$ kb ask "당신의 회사 매출은?"
$ kebab ask "당신의 회사 매출은?"

근거 부족. KB 에 해당 내용 없음.
가까운 후보 (모두 임계 0.30 미만):
@@ -108,10 +108,10 @@ $ kb ask "당신의 회사 매출은?"
grounded ✗ qwen2.5:14b-instruct rag-v1 0 chunks used
```

### 1.4 `kb ask` (refusal — LLM self-judge)
### 1.4 `kebab ask` (refusal — LLM self-judge)

```text
$ kb ask "이 책의 23쪽 결론은?"
$ kebab ask "이 책의 23쪽 결론은?"

근거 부족. 제공된 chunk 중 결론 내용 없음.
검색은 됨, LLM 이 결론 부재 판단:
@@ -121,17 +121,17 @@ $ kb ask "이 책의 23쪽 결론은?"
grounded ✗ qwen2.5:14b-instruct rag-v1 3 chunks searched, 0 grounded
```

### 1.5 `kb search` (dense)
### 1.5 `kebab search` (dense)

```text
$ kb search "Markdown chunking 규칙"
$ kebab search "Markdown chunking 규칙"

1. 0.82 notes/rust/kb-architecture.md#L661-L672
1. 0.82 notes/rust/kebab-architecture.md#L661-L672
§14 Chunking 정책
heading boundary 우선. code block 중간 분할 금지.
table 가능한 한 단일 chunk…

2. 0.71 notes/rust/kb-architecture.md#L692-L713
2. 0.71 notes/rust/kebab-architecture.md#L692-L713
§15 검색과 RAG 정책
검색은 처음부터 hybrid 로 설계하되 구현은 단계적…

@@ -142,7 +142,7 @@ $ kb search "Markdown chunking 규칙"
3 hits hybrid index v1.0 bm25+e5-small/RRF
```

### 1.6 `kb search --explain`
### 1.6 `kebab search --explain`

Appended under each hit:

@@ -174,8 +174,8 @@ $ kb search "Markdown chunking 규칙"
{
"schema_version": "citation.v1",
"kind": "line|page|region|caption|time",
"path": "notes/rust/kb.md",
"uri": "notes/rust/kb.md#L12-L34",
"path": "notes/rust/kebab.md",
"uri": "notes/rust/kebab.md#L12-L34",

"line": { "start": 12, "end": 34, "section": "§14 Chunking 정책" },
"page": { "page": 13, "section": "Experiment Setup" },
@@ -196,7 +196,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
"score": 0.82,
"chunk_id": "9b4a8c1e7d3f2a05",
"doc_id": "3f9a2c10ee4d6b78",
"doc_path": "notes/rust/kb-architecture.md",
"doc_path": "notes/rust/kebab-architecture.md",
"heading_path": ["아키텍처", "Chunking 정책"],
"section_label": "§14 Chunking 정책",
"snippet": "heading boundary 우선. code block 중간 분할 금지…",
@@ -261,7 +261,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
{
"kind": "new|updated|skipped|error",
"doc_id": "3f9a2c10ee4d6b78",
"doc_path": "notes/rust/kb-architecture.md",
"doc_path": "notes/rust/kebab-architecture.md",
"asset_id": "8c1e7d3f2a05",
"byte_len": 41822,
"block_count": 184,
@@ -277,13 +277,13 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는

With `--summary-only`, the report has `items: null`.

### 2.5 DocSummary (`kb list docs`)
### 2.5 DocSummary (`kebab list docs`)

```json
{
"schema_version": "doc_summary.v1",
"doc_id": "3f9a2c10ee4d6b78",
"doc_path": "notes/rust/kb-architecture.md",
"doc_path": "notes/rust/kebab-architecture.md",
"title": "Rust 로컬 Knowledge Base 설계",
"lang": "ko",
"tags": ["knowledge-base", "rust", "rag"],
@@ -305,7 +305,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
"schema_version": "chunk_inspection.v1",
"chunk_id": "9b4a8c1e7d3f2a05",
"doc_id": "3f9a2c10ee4d6b78",
"doc_path": "notes/rust/kb-architecture.md",
"doc_path": "notes/rust/kebab-architecture.md",
"heading_path": ["아키텍처", "Chunking 정책"],
"text": "heading boundary 우선…",
"source_spans": [{ "kind": "line", "start": 661, "end": 672 }],
@@ -325,9 +325,9 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는
"schema_version": "doctor.v1",
"ok": true,
"checks": [
{ "name": "config_loaded", "ok": true, "detail": "~/.config/kb/config.toml" },
{ "name": "data_dir_writable", "ok": true, "detail": "~/.local/share/kb" },
{ "name": "sqlite_open", "ok": true, "detail": "kb.sqlite (schema v1)" },
{ "name": "config_loaded", "ok": true, "detail": "~/.config/kebab/config.toml" },
{ "name": "data_dir_writable", "ok": true, "detail": "~/.local/share/kebab" },
{ "name": "sqlite_open", "ok": true, "detail": "kebab.sqlite (schema v1)" },
{ "name": "lancedb_open", "ok": true, "detail": "lancedb/" },
{ "name": "embedding_model", "ok": true, "detail": "multilingual-e5-small (384d)" },
{ "name": "ollama_reachable", "ok": true, "detail": "http://127.0.0.1:11434" },
@@ -346,7 +346,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는

---

## 3. Domain model (kb-core)
## 3. Domain model (kebab-core)
|
||||
|
||||
### 3.1 Newtype IDs
|
||||
|
||||
@@ -581,7 +581,7 @@ pub struct RetrievalDetail {

 ### 3.7a Forward-declared types

-The `Block::ImageRef` / `AudioRef` variants exist from v1, but their `ocr` / `caption` / `transcript` fields are always `None` in P1. The following types are kept as stubs in `kb-core` (slots in the final domain model):
+The `Block::ImageRef` / `AudioRef` variants exist from v1, but their `ocr` / `caption` / `transcript` fields are always `None` in P1. The following types are kept as stubs in `kebab-core` (slots in the final domain model):

 ```rust
 pub struct OcrText { pub joined: String, pub regions: Vec<OcrRegion>, pub engine: String, pub engine_version: String }
@@ -596,41 +596,41 @@ pub enum ImageType { Png, Jpeg, Webp, Gif, Tiff, Other(String) }
 pub enum AudioType { M4a, Mp3, Wav, Flac, Ogg, Other(String) }
 ```

-`ExtractConfig`, `DocFilter`, `JobKind`, `JobStatus`, `JobFilter`, `JobRow`, `JobId`, `VectorRecord`, `VectorHit`, `RefusalSignal`, `NoHitSignal`, and `DoctorUnhealthy` are also defined in `kb-core` (exact fields are decided at first use; this spec only guarantees the forward reference).
+`ExtractConfig`, `DocFilter`, `JobKind`, `JobStatus`, `JobFilter`, `JobRow`, `JobId`, `VectorRecord`, `VectorHit`, `RefusalSignal`, `NoHitSignal`, and `DoctorUnhealthy` are also defined in `kebab-core` (exact fields are decided at first use; this spec only guarantees the forward reference).

 `OffsetDateTime` is `time::OffsetDateTime`; `Result` is a crate-local alias.

-### 3.7b Parser intermediate types — `kb-parse-types`
+### 3.7b Parser intermediate types — `kebab-parse-types`

-The parser's *intermediate* representation (the `ParsedBlock` family) lives not in `kb-core` but in a separate thin crate, **`kb-parse-types`**. The reason: `kb-normalize` owns the medium-agnostic ID/Provenance lift and must not import any parser directly. Yet the input type that normalize consumes has to be defined somewhere, and putting it in `kb-core` would mean (a) the core namespace explodes as per-parser ParsedBlock variants (`ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`) join later, and (b) every change in parser semantics becomes a core change that affects all dependents.
+The parser's *intermediate* representation (the `ParsedBlock` family) lives not in `kebab-core` but in a separate thin crate, **`kebab-parse-types`**. The reason: `kebab-normalize` owns the medium-agnostic ID/Provenance lift and must not import any parser directly. Yet the input type that normalize consumes has to be defined somewhere, and putting it in `kebab-core` would mean (a) the core namespace explodes as per-parser ParsedBlock variants (`ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`) join later, and (b) every change in parser semantics becomes a core change that affects all dependents.

-`kb-parse-types` is the **only layer** between the two. Dependency graph:
+`kebab-parse-types` is the **only layer** between the two. Dependency graph:

 ```text
-kb-core (domain model: Block, Chunk, SourceSpan, IDs, …)
+kebab-core (domain model: Block, Chunk, SourceSpan, IDs, …)
 ▲
 │
-kb-parse-types (parser intermediate representation: ParsedBlock, ParsedImageRegion[P+], ParsedPdfPage[P+], ParsedAudioSegment[P+], Inline)
+kebab-parse-types (parser intermediate representation: ParsedBlock, ParsedImageRegion[P+], ParsedPdfPage[P+], ParsedAudioSegment[P+], Inline)
 ▲ ▲
 │ │
-kb-parse-md, kb-parse-pdf, kb-normalize
-kb-parse-image, kb-parse-audio
+kebab-parse-md, kebab-parse-pdf, kebab-normalize
+kebab-parse-image, kebab-parse-audio
 ```

-`kb-parse-types`:
-- depends only on `kb-core` (using domain types such as `Block`, `SourceSpan`, `Lang`).
-- depends on no other `kb-*` crate.
+`kebab-parse-types`:
+- depends only on `kebab-core` (using domain types such as `Block`, `SourceSpan`, `Lang`).
+- depends on no other `kebab-*` crate.
 - depends on no concrete parser library (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`).
 - has only minimal external dependencies, such as serde + thiserror.

 Types defined in P1:

 ```rust
-// kb-parse-types — depends on kb-core only.
+// kebab-parse-types — depends on kebab-core only.
 pub struct ParsedBlock {
     pub kind: ParsedBlockKind,
     pub heading_path: Vec<String>,
-    pub source_span: kb_core::SourceSpan,
+    pub source_span: kebab_core::SourceSpan,
     pub payload: ParsedPayload,
 }

@@ -638,11 +638,11 @@ pub enum ParsedBlockKind { Heading, Paragraph, List, Code, Table, Quote, ImageRe

 pub enum ParsedPayload {
     Heading { level: u8, text: String },
-    Paragraph { text: String, inlines: Vec<kb_core::Inline> },
-    List { ordered: bool, items: Vec<Vec<kb_core::Inline>> },
+    Paragraph { text: String, inlines: Vec<kebab_core::Inline> },
+    List { ordered: bool, items: Vec<Vec<kebab_core::Inline>> },
     Code { lang: Option<String>, code: String },
     Table { headers: Vec<String>, rows: Vec<Vec<String>> },
-    Quote { text: String, inlines: Vec<kb_core::Inline> },
+    Quote { text: String, inlines: Vec<kebab_core::Inline> },
     ImageRef { src: String, alt: String },
     AudioRef { src: String }, // duration_ms filled by extractor before chunking
 }
@@ -651,7 +651,7 @@ pub struct Warning { pub kind: WarningKind, pub note: String }
 pub enum WarningKind { MalformedFrontmatter, MalformedTable, EncodingFallback, ExtractFailed }
 ```

-`Inline` is a domain type in `kb-core` (§3.4). `kb-parse-types` only *references* it: the same concept is not defined twice in two crates (that would force normalize into a pointless identity conversion).
+`Inline` is a domain type in `kebab-core` (§3.4). `kebab-parse-types` only *references* it: the same concept is not defined twice in two crates (that would force normalize into a pointless identity conversion).

 Types to be added in P6/P7/P8 (forward-ref):

@@ -661,7 +661,7 @@ pub struct ParsedPdfPage { pub page: u32, pub text: String }
 pub struct ParsedAudioSegment { pub start_ms: u64, pub end_ms: u64, pub text: String }
 ```

-→ Adding a new medium leaves the `kb-core::Block` variants unchanged; only `kb-parse-types` is extended.
+→ Adding a new medium leaves the `kebab-core::Block` variants unchanged; only `kebab-parse-types` is extended.

 ### 3.8 Answer / RAG types

@@ -995,28 +995,28 @@ CREATE TABLE eval_query_results (
 | Kind | Default location |
 |------|-----------|
 | Workspace | `~/KnowledgeBase/` |
-| config | `~/.config/kb/config.toml` |
-| data | `~/.local/share/kb/` |
-| cache | `~/.cache/kb/` |
-| state (logs) | `~/.local/state/kb/` |
+| config | `~/.config/kebab/config.toml` |
+| data | `~/.local/share/kebab/` |
+| cache | `~/.cache/kebab/` |
+| state (logs) | `~/.local/state/kebab/` |

-`~`, `$HOME`, and `${KB_*}` are expanded; paths are normalized to absolute form before use.
+`~`, `$HOME`, and `${KEBAB_*}` are expanded; paths are normalized to absolute form before use.

 ### 6.2 Workspace layout

 ```
 ~/KnowledgeBase/
 ├── inbox/ notes/ papers/ photos/ recordings/
-└── .kbignore
+└── .kebabignore
 ```

-Excluded paths are the union of `.kbignore` and `config.workspace.exclude`.
+Excluded paths are the union of `.kebabignore` and `config.workspace.exclude`.

 ### 6.3 Data dir layout

 ```
-~/.local/share/kb/
-├── kb.sqlite (+ -wal, -shm)
+~/.local/share/kebab/
+├── kebab.sqlite (+ -wal, -shm)
 ├── lancedb/
 │ └── chunk_embeddings_<model>_<dim>.lance/
 ├── assets/<aa>/<asset_id> # shard
@@ -1025,7 +1025,7 @@ CREATE TABLE eval_query_results (
 └── runs/<run_id>/ # eval per_query.jsonl + report.md
 ```

-### 6.4 Config (`~/.config/kb/config.toml`) — frozen schema
+### 6.4 Config (`~/.config/kebab/config.toml`) — frozen schema

 ```toml
 schema_version = 1
@@ -1036,8 +1036,8 @@ include = ["**/*.md"]
 exclude = [".git/**", "node_modules/**", ".obsidian/**"]

 [storage]
-data_dir = "${XDG_DATA_HOME:-~/.local/share}/kb"
-sqlite = "{data_dir}/kb.sqlite"
+data_dir = "${XDG_DATA_HOME:-~/.local/share}/kebab"
+sqlite = "{data_dir}/kebab.sqlite"
 vector_dir = "{data_dir}/lancedb"
 asset_dir = "{data_dir}/assets"
 artifact_dir = "{data_dir}/artifacts"
@@ -1086,15 +1086,15 @@ max_context_tokens = 8000

 Config precedence: default → file → env (`KB_<SECTION>_<KEY>`) → CLI flag.

-### 6.5 `kb init` output
+### 6.5 `kebab init` output

 ```text
-$ kb init
-created ~/.config/kb/config.toml
-created ~/.local/share/kb/
+$ kebab init
+created ~/.config/kebab/config.toml
+created ~/.local/share/kebab/
 created ~/KnowledgeBase/
-opened ~/.local/share/kb/kb.sqlite (schema v1)
-hint edit ~/.config/kb/config.toml then `kb ingest ~/KnowledgeBase`
+opened ~/.local/share/kebab/kebab.sqlite (schema v1)
+hint edit ~/.config/kebab/config.toml then `kebab ingest ~/KnowledgeBase`
 ```

 Existing files are preserved; overwriting them requires an explicit `--force`.
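The precedence line in §6.4 (default → file → env → CLI flag) means that, for any single key, the highest-priority layer that sets the key wins. The std-only sketch below shows that resolution for one key. `env_key`, `resolve`, and the `KEBAB` prefix are assumptions made for illustration; the prefix mirrors the `${KEBAB_*}` expansion in §6.1. None of this is the `kebab-config` API.

```rust
use std::collections::HashMap;

/// Assumed env var prefix (hypothetical; mirrors the `${KEBAB_*}` expansion in §6.1).
const ENV_PREFIX: &str = "KEBAB";

/// Env var name for a config key, e.g. ("rag", "score_gate") -> "KEBAB_RAG_SCORE_GATE".
fn env_key(section: &str, key: &str) -> String {
    format!("{}_{}_{}", ENV_PREFIX, section.to_uppercase(), key.to_uppercase())
}

/// Resolves one setting. Layers are consulted from highest to lowest priority,
/// which yields the effective order default -> file -> env -> CLI flag.
fn resolve<'a>(
    default: &'a str,
    file: Option<&'a str>,
    env: &'a HashMap<String, String>,
    section: &str,
    key: &str,
    cli: Option<&'a str>,
) -> &'a str {
    cli.or_else(|| env.get(&env_key(section, key)).map(String::as_str))
        .or(file)
        .unwrap_or(default)
}

fn main() {
    let mut env = HashMap::new();
    env.insert(env_key("llm", "model"), "env-model".to_string());

    // File and env both set the key; with no CLI flag, the env layer wins.
    println!("{}", resolve("default-model", Some("file-model"), &env, "llm", "model", None));
    // An explicit CLI flag overrides every layer below it.
    println!("{}", resolve("default-model", Some("file-model"), &env, "llm", "model", Some("cli-model")));
}
```

Resolving key by key means a CLI flag only overrides the key it names. Every other setting keeps whatever the lower layers provided.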
@@ -1107,7 +1107,7 @@ hint edit ~/.config/kb/config.toml then `kb ingest ~/KnowledgeBase`

 ---

-## 7. Trait contracts (kb-core)
+## 7. Trait contracts (kebab-core)

 ### 7.1 I/O helpers

@@ -1210,34 +1210,34 @@ pub trait JobRepo {
 ## 8. Module boundaries (Allowed / Forbidden)

 ```text
-kb-cli, kb-tui, kb-desktop
-└─> kb-app
-├─> kb-source-fs
-├─> kb-parse-md / kb-parse-pdf / kb-parse-image / kb-parse-audio
-│ └─> kb-parse-types (parser intermediate)
-├─> kb-normalize
-│ └─> kb-parse-types
-├─> kb-chunk
-├─> kb-store-sqlite (DocumentStore, JobRepo, Retriever[lexical])
-├─> kb-store-vector (VectorStore)
-├─> kb-embed-local
-├─> kb-search (Retriever[hybrid])
-├─> kb-llm-local
-├─> kb-rag
-├─> kb-eval
-└─> kb-config
-└─> kb-core (everything depends on it)
+kebab-cli, kebab-tui, kebab-desktop
+└─> kebab-app
+├─> kebab-source-fs
+├─> kebab-parse-md / kebab-parse-pdf / kebab-parse-image / kebab-parse-audio
+│ └─> kebab-parse-types (parser intermediate)
+├─> kebab-normalize
+│ └─> kebab-parse-types
+├─> kebab-chunk
+├─> kebab-store-sqlite (DocumentStore, JobRepo, Retriever[lexical])
+├─> kebab-store-vector (VectorStore)
+├─> kebab-embed-local
+├─> kebab-search (Retriever[hybrid])
+├─> kebab-llm-local
+├─> kebab-rag
+├─> kebab-eval
+└─> kebab-config
+└─> kebab-core (everything depends on it)
 ```

-`kb-parse-types` is the thin layer between `kb-core` and the parsers/normalize (see §3.7b). It gathers the per-parser intermediate representations (`ParsedBlock`, `ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`, `Inline`) in one place, which (a) keeps the `kb-core` namespace from exploding and (b) keeps `kb-normalize` from importing parsers directly.
+`kebab-parse-types` is the thin layer between `kebab-core` and the parsers/normalize (see §3.7b). It gathers the per-parser intermediate representations (`ParsedBlock`, `ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`, `Inline`) in one place, which (a) keeps the `kebab-core` namespace from exploding and (b) keeps `kebab-normalize` from importing parsers directly.

 Key prohibitions:
 - UI → direct dependency on store/llm/parse ✗
 - parse-* → store/llm/embed ✗
-- parse-* → kb-normalize ✗ (one-way: parsers → kb-parse-types ← normalize)
+- parse-* → kebab-normalize ✗ (one-way: parsers → kebab-parse-types ← normalize)
 - chunk → llm/embed ✗
 - normalize → store / parse-* ✗
-- kb-parse-types → any parser/normalize/store/llm/embed/search/rag/ui ✗ (depends on `kb-core` only)
+- kebab-parse-types → any parser/normalize/store/llm/embed/search/rag/ui ✗ (depends on `kebab-core` only)
 - cross-writes into another store ✗

 Enforced by `cargo deny` + workspace deny.toml + CI checks.
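The forbidden edges above are enforced by `cargo deny` and CI. The std-only sketch below shows what such a check has to compute. It is not the project's deny.toml or CI script. It scans a hypothetical crate-to-dependencies map for edges that break the listed rules. `Group`, `RULES`, and `violations` are invented for this example. `Group::Parser` deliberately excludes `kebab-parse-types`, because normalize is allowed to depend on it.

```rust
use std::collections::BTreeMap;

/// A set of crates that a rule refers to.
#[derive(Clone, Copy)]
enum Group {
    /// Front-end crates.
    Ui,
    /// Concrete parsers: `kebab-parse-*` except the shared `kebab-parse-types`.
    Parser,
    /// Every crate whose name starts with the prefix.
    Prefix(&'static str),
    /// Every crate except the named one.
    AnyExcept(&'static str),
}

impl Group {
    fn matches(self, name: &str) -> bool {
        match self {
            Group::Ui => matches!(name, "kebab-cli" | "kebab-tui" | "kebab-desktop"),
            Group::Parser => name.starts_with("kebab-parse-") && name != "kebab-parse-types",
            Group::Prefix(p) => name.starts_with(p),
            Group::AnyExcept(n) => name != n,
        }
    }
}

/// (from, to): a crate in `from` must not depend on a crate in `to`.
const RULES: &[(Group, Group)] = &[
    (Group::Ui, Group::Prefix("kebab-store-")),
    (Group::Ui, Group::Prefix("kebab-llm")),
    (Group::Ui, Group::Prefix("kebab-parse-")),
    (Group::Parser, Group::Prefix("kebab-store-")),
    (Group::Parser, Group::Prefix("kebab-llm")),
    (Group::Parser, Group::Prefix("kebab-embed")),
    (Group::Parser, Group::Prefix("kebab-normalize")),
    (Group::Prefix("kebab-chunk"), Group::Prefix("kebab-llm")),
    (Group::Prefix("kebab-chunk"), Group::Prefix("kebab-embed")),
    (Group::Prefix("kebab-normalize"), Group::Prefix("kebab-store-")),
    (Group::Prefix("kebab-normalize"), Group::Parser),
    (Group::Prefix("kebab-parse-types"), Group::AnyExcept("kebab-core")),
];

/// Returns every (crate, dependency) edge that breaks at least one rule.
fn violations<'a>(graph: &BTreeMap<&'a str, Vec<&'a str>>) -> Vec<(&'a str, &'a str)> {
    let mut out = Vec::new();
    for (&krate, deps) in graph {
        for &dep in deps {
            if RULES.iter().any(|&(from, to)| from.matches(krate) && to.matches(dep)) {
                out.push((krate, dep));
            }
        }
    }
    out
}

fn main() {
    let mut graph: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    graph.insert("kebab-parse-types", vec!["kebab-core"]);
    graph.insert("kebab-parse-md", vec!["kebab-core", "kebab-parse-types"]);
    graph.insert("kebab-normalize", vec!["kebab-core", "kebab-parse-types"]);
    // Deliberate violation: a parser reaching into normalize.
    graph.insert("kebab-parse-pdf", vec!["kebab-parse-types", "kebab-normalize"]);

    for (krate, dep) in violations(&graph) {
        println!("forbidden edge: {krate} -> {dep}");
    }
}
```

A CI step built this way would fail whenever the returned list is non-empty.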
@@ -1270,7 +1270,7 @@ CI:
 ## 10. Error model + exit codes

 ```rust
-// kb-core
+// kebab-core
 pub enum CoreError { InvalidId, InvalidCitation, InvalidSpan, Malformed }
 // crate-local examples
 pub enum ParseMdError { Yaml(String), Encoding, Pulldown(String), Span }
@@ -1278,7 +1278,7 @@ pub enum StoreError { Sqlx(rusqlite::Error), Migration(String), Conflict(String)
 pub enum LlmError { Unreachable, ModelNotPulled(String), Timeout, Stream(String) }
 ```

-Errors are merged into `anyhow::Error` at the boundary (`kb-app`, `kb-cli`). Exit code mapping:
+Errors are merged into `anyhow::Error` at the boundary (`kebab-app`, `kebab-cli`). Exit code mapping:

 ```rust
 fn exit_code(err: &anyhow::Error) -> i32 {
@@ -1295,17 +1295,17 @@ fn exit_code(err: &anyhow::Error) -> i32 {
 | `--verbose` | + anyhow chain |
 | `--debug` or `RUST_LOG=debug` | + tracing target/level/span |

-A refusal is not an error. A `kb ask` refusal is written to stdout as a normal Answer (grounded=false) and exits with code 1.
+A refusal is not an error. A `kebab ask` refusal is written to stdout as a normal Answer (grounded=false) and exits with code 1.

-Logging: `tracing` + `tracing-subscriber` + `tracing-appender` daily roll, `~/.local/state/kb/logs/`. structured (`trace_id`, `doc_id`, `chunk_id`).
+Logging: `tracing` + `tracing-subscriber` + `tracing-appender` daily roll, `~/.local/state/kebab/logs/`. structured (`trace_id`, `doc_id`, `chunk_id`).

-`kb doctor` output (human-readable):
+`kebab doctor` output (human-readable):

 ```text
-$ kb doctor
-✓ config_loaded ~/.config/kb/config.toml
-✓ data_dir_writable ~/.local/share/kb
-✓ sqlite_open kb.sqlite (schema v1)
+$ kebab doctor
+✓ config_loaded ~/.config/kebab/config.toml
+✓ data_dir_writable ~/.local/share/kebab
+✓ sqlite_open kebab.sqlite (schema v1)
 ✓ lancedb_open lancedb/
 ✓ embedding_model multilingual-e5-small (384d)
 ✓ ollama_reachable http://127.0.0.1:11434
@@ -1322,7 +1322,7 @@ $ kb doctor
 Once this document is frozen, the following component-decomposition work becomes safe:

 - all wire schemas (`docs/wire-schema/v1/*.schema.json`)
-- all trait signatures (kb-core)
+- all trait signatures (kebab-core)
 - all ID recipes (4.2)
 - SQLite DDL (chapter 5)
 - Filesystem + config schema (chapter 6)
@@ -1334,7 +1334,7 @@ $ kb doctor
 **Intentionally excluded (out of scope, P+)**:
 - multi-workspace
 - watch mode
-- desktop app `kb://` protocol handler
+- desktop app `kebab://` protocol handler
 - LLM-as-judge eval
 - visual embedding (CLIP)
 - real-time collab
@@ -17,11 +17,11 @@ urlcolor: blue

 The final direction can be summarized in one sentence:

-> Treat Markdown as a first-class knowledge source, and feed images, PDFs, and audio through their own extractor adapters into the same `CanonicalDocument -> Chunk -> Embed -> Index -> Search -> RAG` pipeline. The CLI, TUI, and desktop app all use the same `kb-app` facade through plain function calls.
+> Treat Markdown as a first-class knowledge source, and feed images, PDFs, and audio through their own extractor adapters into the same `CanonicalDocument -> Chunk -> Embed -> Index -> Search -> RAG` pipeline. The CLI, TUI, and desktop app all use the same `kebab-app` facade through plain function calls.

 The first thing to build is not a chat UI. These seven things come first:

-1. The domain model and trait contracts of `kb-core`
+1. The domain model and trait contracts of `kebab-core`
 2. deterministic ID rules
 3. Markdown canonicalization
 4. chunking policy
@@ -66,16 +66,16 @@ urlcolor: blue
 Markdown files
 |
 v
-kb-source-fs
+kebab-source-fs
 |
 v
-kb-parse-md
+kebab-parse-md
 |
 v
 CanonicalDocument
 |
 v
-kb-chunk
+kebab-chunk
 |
 v
 Chunks
@@ -88,15 +88,15 @@ SQLite metadata/FTS LanceDB vectors Raw asset store
 +--------------------+--------------------+
 |
 v
-kb-search
+kebab-search
 |
 v
-kb-rag
+kebab-rag
 |
 +---------------+---------------+
 | | |
 v v v
-kb-cli kb-tui kb-desktop
+kebab-cli kebab-tui kebab-desktop
 ```

 After later extensions, the structure is as follows.
@@ -133,23 +133,23 @@ A Cargo workspace manages several packages together, with a shared `C
 [workspace]
 resolver = "3"
 members = [
-    "crates/kb-core",
-    "crates/kb-config",
-    "crates/kb-source-fs",
-    "crates/kb-parse-md",
-    "crates/kb-normalize",
-    "crates/kb-chunk",
-    "crates/kb-store-sqlite",
-    "crates/kb-store-vector",
-    "crates/kb-embed",
-    "crates/kb-embed-local",
-    "crates/kb-search",
-    "crates/kb-llm",
-    "crates/kb-llm-local",
-    "crates/kb-rag",
-    "crates/kb-eval",
-    "crates/kb-app",
-    "crates/kb-cli"
+    "crates/kebab-core",
+    "crates/kebab-config",
+    "crates/kebab-source-fs",
+    "crates/kebab-parse-md",
+    "crates/kebab-normalize",
+    "crates/kebab-chunk",
+    "crates/kebab-store-sqlite",
+    "crates/kebab-store-vector",
+    "crates/kebab-embed",
+    "crates/kebab-embed-local",
+    "crates/kebab-search",
+    "crates/kebab-llm",
+    "crates/kebab-llm-local",
+    "crates/kebab-rag",
+    "crates/kebab-eval",
+    "crates/kebab-app",
+    "crates/kebab-cli"
 ]

 [workspace.package]
@@ -173,7 +173,7 @@ tracing = "0.1"
 The initial repo is laid out like this.

 ```text
-kb/
+kebab/
 Cargo.toml
 README.md
 docs/
@@ -191,70 +191,70 @@ kb/
 nested-headings.md
 code-and-table.md
 crates/
-kb-core/
-kb-config/
-kb-source-fs/
-kb-parse-md/
-kb-normalize/
-kb-chunk/
-kb-store-sqlite/
-kb-store-vector/
-kb-embed/
-kb-embed-local/
-kb-search/
-kb-llm/
-kb-llm-local/
-kb-rag/
-kb-eval/
-kb-app/
-kb-cli/
+kebab-core/
+kebab-config/
+kebab-source-fs/
+kebab-parse-md/
+kebab-normalize/
+kebab-chunk/
+kebab-store-sqlite/
+kebab-store-vector/
+kebab-embed/
+kebab-embed-local/
+kebab-search/
+kebab-llm/
+kebab-llm-local/
+kebab-rag/
+kebab-eval/
+kebab-app/
+kebab-cli/
 ```

 Crates to be added later:

 ```text
-crates/kb-parse-image/
-crates/kb-parse-pdf/
-crates/kb-parse-audio/
-crates/kb-rerank/
-crates/kb-tui/
-crates/kb-desktop/
+crates/kebab-parse-image/
+crates/kebab-parse-pdf/
+crates/kebab-parse-audio/
+crates/kebab-rerank/
+crates/kebab-tui/
+crates/kebab-desktop/
 ```

 The important dependency rules are:

 ```text
-kb-cli, kb-tui, kb-desktop
--> kb-app
--> kb-index / kb-search / kb-rag
--> kb-core traits
+kebab-cli, kebab-tui, kebab-desktop
+-> kebab-app
+-> kebab-index / kebab-search / kebab-rag
+-> kebab-core traits
 -> concrete adapters
 ```

-UI crates never call a parser, DB, or LLM adapter directly. Every user-facing command goes through the `kb-app` facade.
+UI crates never call a parser, DB, or LLM adapter directly. Every user-facing command goes through the `kebab-app` facade.

 # 5. Component list and responsibilities

 | Component | Responsibility | Initial implementation |
 |---|---|---|
-| `kb-core` | domain types, traits, errors, ID rules | required |
-| `kb-config` | config file loading, defaults, path expansion | required |
-| `kb-source-fs` | local folder scan, checksum, change detection | required |
-| `kb-parse-md` | Markdown -> structured document | required |
-| `kb-normalize` | parser output -> `CanonicalDocument` | required |
-| `kb-chunk` | block-aware chunking | required |
-| `kb-store-sqlite` | metadata, document, chunk, job, FTS | required |
-| `kb-store-vector` | vector upsert/search | P1 |
-| `kb-embed` | embedding trait | P1 |
-| `kb-embed-local` | local embedding adapter | P1 |
-| `kb-search` | lexical, vector, hybrid retrieval | P1 |
-| `kb-llm` | language model trait | P1 |
-| `kb-llm-local` | Ollama or llama.cpp adapter | P1 |
-| `kb-rag` | context packing, answer, citation | P1 |
-| `kb-eval` | golden query, regression test | P1 |
-| `kb-cli` | command line interface | required |
-| `kb-tui` | terminal UI | P2 |
-| `kb-desktop` | desktop app | P3 |
+| `kebab-core` | domain types, traits, errors, ID rules | required |
+| `kebab-config` | config file loading, defaults, path expansion | required |
+| `kebab-source-fs` | local folder scan, checksum, change detection | required |
+| `kebab-parse-md` | Markdown -> structured document | required |
+| `kebab-normalize` | parser output -> `CanonicalDocument` | required |
+| `kebab-chunk` | block-aware chunking | required |
+| `kebab-store-sqlite` | metadata, document, chunk, job, FTS | required |
+| `kebab-store-vector` | vector upsert/search | P1 |
+| `kebab-embed` | embedding trait | P1 |
+| `kebab-embed-local` | local embedding adapter | P1 |
+| `kebab-search` | lexical, vector, hybrid retrieval | P1 |
+| `kebab-llm` | language model trait | P1 |
+| `kebab-llm-local` | Ollama or llama.cpp adapter | P1 |
+| `kebab-rag` | context packing, answer, citation | P1 |
+| `kebab-eval` | golden query, regression test | P1 |
+| `kebab-cli` | command line interface | required |
+| `kebab-tui` | terminal UI | P2 |
+| `kebab-desktop` | desktop app | P3 |

 # 6. Core domain model

@@ -398,10 +398,10 @@ The basic Markdown frontmatter convention starts out roughly like this.

 ```yaml
 ---
-id: rust-kb-architecture
+id: rust-kebab-architecture
 title: Rust 로컬 Knowledge Base 설계
 aliases:
-- local kb
+- local kebab
 - rust rag
 tags:
 - knowledge-base
@@ -418,7 +418,7 @@ lang: ko
 Markdown citations use a line range by default.

 ```text
-notes/rust/kb.md:L12-L34
+notes/rust/kebab.md:L12-L34
 ```

 # 9. Image, PDF, and audio extension strategy
@@ -534,13 +534,13 @@ The Ollama docs mention Apple M series CPU/GPU support on macOS Sonoma and later
 The initial adapters are set up as follows.

 ```text
-kb-llm-local
+kebab-llm-local
 - OllamaLanguageModel
 - later: LlamaCppLanguageModel
 - later: CandleLanguageModel
 ```

-Even though Ollama uses a local server internally, from the project's architectural point of view this is not an HTTP MSA. It is merely a model adapter encapsulated inside `kb-llm-local`.
+Even though Ollama uses a local server internally, from the project's architectural point of view this is not an HTTP MSA. It is merely a model adapter encapsulated inside `kebab-llm-local`.

 ## 11.3 Local embedding

@@ -583,10 +583,10 @@ An M4 48GB MacBook is a sufficient target for a personal local KB, but indexing and
 root = "~/KnowledgeBase"

 [storage]
-sqlite_path = "~/.local/share/kb/kb.sqlite"
-vector_path = "~/.local/share/kb/lancedb"
-raw_asset_path = "~/.local/share/kb/assets"
-artifact_path = "~/.local/share/kb/artifacts"
+sqlite_path = "~/.local/share/kebab/kebab.sqlite"
+vector_path = "~/.local/share/kebab/lancedb"
+raw_asset_path = "~/.local/share/kebab/assets"
+artifact_path = "~/.local/share/kebab/artifacts"

 [indexing]
 max_parallel_extractors = 2
@@ -609,7 +609,7 @@ context_tokens = 32768

 # 13. Trait contracts

-Components are connected through traits. The contracts below live in `kb-core`.
+Components are connected through traits. The contracts below live in `kebab-core`.

 ```rust
 pub trait SourceConnector {
@@ -741,14 +741,14 @@ pub struct Answer {
 The CLI is built first.

 ```text
-kb init
-kb ingest <path>
-kb index
-kb search <query>
-kb ask <query>
-kb inspect doc <doc_id>
-kb inspect chunk <chunk_id>
-kb doctor
+kebab init
+kebab ingest <path>
+kebab index
+kebab search <query>
+kebab ask <query>
+kebab inspect doc <doc_id>
+kebab inspect chunk <chunk_id>
+kebab doctor
 ```

 The CLI is the reference point for development and testing. The TUI and desktop app are added once the CLI's functionality is stable.
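The CLI can serve as the baseline because of the rule stated earlier in this report: UI crates never call parsers, databases, or LLM adapters directly, and every user-facing command goes through the `kebab-app` facade. Later front-ends therefore reuse exactly the calls the CLI already exercises. The single-file sketch below uses modules in place of crates, and module privacy stands in for the workspace dependency rules. `app`, `store`, `cli`, `tui`, `search`, and `Hit` are hypothetical stand-ins, not the real crate APIs. The toy corpus is made up.

```rust
/// Stand-in for `kebab-app`: the only surface a front-end may call.
mod app {
    /// Hypothetical search hit, reduced to what a front-end needs to print.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Hit {
        pub citation: String,
        pub score: f64,
    }

    /// Stand-in for the storage adapters. The module is private to `app`,
    /// so `cli` and `tui` below cannot reach it.
    mod store {
        pub(super) fn lexical(query: &str) -> Vec<(String, f64)> {
            // Toy corpus: one chunk that matches any query mentioning "rust".
            if query.to_lowercase().contains("rust") {
                vec![("notes/rust/kebab.md:L12-L34".to_string(), 0.82)]
            } else {
                Vec::new()
            }
        }
    }

    /// Facade entry point shared by every front-end.
    pub fn search(query: &str) -> Vec<Hit> {
        store::lexical(query)
            .into_iter()
            .map(|(citation, score)| Hit { citation, score })
            .collect()
    }
}

/// Stand-in for `kebab-cli`: formats facade results as text lines.
mod cli {
    pub fn search_lines(query: &str) -> Vec<String> {
        crate::app::search(query)
            .iter()
            .map(|hit| format!("{:.4}  {}", hit.score, hit.citation))
            .collect()
    }
}

/// Stand-in for a later `kebab-tui`: same facade, different presentation.
mod tui {
    pub fn status_line(query: &str) -> String {
        format!("{} result(s)", crate::app::search(query).len())
    }
}

fn main() {
    for line in cli::search_lines("Rust workspace") {
        println!("{line}");
    }
    println!("{}", tui::status_line("Rust workspace"));
}
```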
@@ -791,10 +791,10 @@ The desktop app is built last. The reason is that, before the UI, the domain mode
 Deliverables:

 ```text
-kb-core
-kb-config
-kb-app
-kb-cli
+kebab-core
+kebab-config
+kebab-app
+kebab-cli
 docs/spec/*
 fixtures/markdown/*
 ```
@@ -804,7 +804,7 @@ fixtures/markdown/*
 ```text
 cargo check --workspace
 cargo test --workspace
-kb --help
+kebab --help
 ```

 ## Phase 1 - Markdown ingestion
@@ -814,19 +814,19 @@ kb --help
 Crates to implement:

 ```text
-kb-source-fs
-kb-parse-md
-kb-normalize
-kb-chunk
-kb-store-sqlite
+kebab-source-fs
+kebab-parse-md
+kebab-normalize
+kebab-chunk
+kebab-store-sqlite
 ```

 Completion criteria:

 ```text
-kb ingest ~/KnowledgeBase
-kb list docs
-kb inspect doc <doc_id>
+kebab ingest ~/KnowledgeBase
+kebab list docs
+kebab inspect doc <doc_id>
 ```

 ## Phase 2 - Lexical search
@@ -836,14 +836,14 @@ kb inspect doc <doc_id>
 Completion criteria:

 ```text
-kb search "Rust workspace design"
+kebab search "Rust workspace design"
 ```

 Results must include citations.

 ```text
 1. A Rust workspace manages multiple packages as one...
-source: notes/rust/kb.md:L12-L34
+source: notes/rust/kebab.md:L12-L34
 ```

 ## Phase 3 - Vector search and embedding
@@ -853,18 +853,18 @@ kb search "Rust workspace design"
 Crates to implement:

 ```text
-kb-embed
-kb-embed-local
-kb-store-vector
-kb-search
+kebab-embed
+kebab-embed-local
+kebab-store-vector
+kebab-search
 ```

 Completion criteria:

 ```text
-kb index --embeddings
-kb search --mode vector "similar design principles"
-kb search --mode hybrid "Markdown chunking rules"
+kebab index --embeddings
+kebab search --mode vector "similar design principles"
+kebab search --mode hybrid "Markdown chunking rules"
 ```

 ## Phase 4 - Local LLM RAG
@@ -874,15 +874,15 @@ kb search --mode hybrid "Markdown chunking rules"
 Crates to implement:

 ```text
-kb-llm
-kb-llm-local
-kb-rag
+kebab-llm
+kebab-llm-local
+kebab-rag
 ```

 Completion criteria:

 ```text
-kb ask "What is the storage strategy in my KB design?"
+kebab ask "What is the storage strategy in my KB design?"
 ```

 Answers must include citations and must refuse when there is no supporting evidence.
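The refusal requirement above can be sketched as an evidence gate placed in front of generation. The sketch combines it with the `config.rag.score_gate` setting described in HOTFIXES and with the design doc's rule that a refusal is a normal answer with `grounded=false` that exits with 1. `Hit`, `Answer`, `gate`, and the field names are hypothetical simplifications. Returning exit code 0 for a grounded answer is an assumption. The LLM generation step is omitted.

```rust
/// Hypothetical fused retrieval hit.
#[derive(Debug, Clone)]
struct Hit {
    citation: String,
    fusion_score: f64,
}

/// Minimal stand-in for the answer envelope: only the fields this check needs.
#[derive(Debug, PartialEq)]
struct Answer {
    grounded: bool,
    citations: Vec<String>,
}

/// Keeps only hits at or above `score_gate`. With none left, the answer is a refusal.
fn gate(hits: &[Hit], score_gate: f64) -> Answer {
    let citations: Vec<String> = hits
        .iter()
        .filter(|h| h.fusion_score >= score_gate)
        .map(|h| h.citation.clone())
        .collect();
    Answer { grounded: !citations.is_empty(), citations }
}

/// A refusal is not an error: it prints normally and exits with 1 (0 on success is assumed).
fn exit_code(answer: &Answer) -> i32 {
    if answer.grounded { 0 } else { 1 }
}

fn main() {
    // Illustrative hits; the second one falls below the gate.
    let hits = vec![
        Hit { citation: "notes/rust/kebab.md:L12-L34".into(), fusion_score: 0.61 },
        Hit { citation: "notes/misc.md:L3-L9".into(), fusion_score: 0.02 },
    ];
    let answer = gate(&hits, 0.05);
    println!("{answer:?} -> exit {}", exit_code(&answer));
}
```

The gate only makes sense if `score_gate` is on the same scale as the retrieval mode's scores. The RRF entry in HOTFIXES records what happens when it is not.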
@@ -895,7 +895,7 @@ kb ask "What is the storage strategy in my KB design?"

 ```text
 fixtures/golden_queries.yaml
-kb-eval
+kebab-eval
 ```

 Metrics:
@@ -915,8 +915,8 @@ answer groundedness
 Completion criteria:

 ```text
-kb ingest ./assets/diagram.png
-kb search "OCR text inside the image"
+kebab ingest ./assets/diagram.png
+kebab search "OCR text inside the image"
 ```

 ## Phase 7 - PDF support
@@ -926,8 +926,8 @@ kb search "OCR text inside the image"
 Completion criteria:

 ```text
-kb ingest ./paper.pdf
-kb search "a specific concept inside the PDF"
+kebab ingest ./paper.pdf
+kebab search "a specific concept inside the PDF"
 ```

 ## Phase 8 - Audio support
@@ -937,8 +937,8 @@ kb search "a specific concept inside the PDF"
 Completion criteria:

 ```text
-kb ingest ./meeting.m4a
-kb search "decisions mentioned in the meeting"
+kebab ingest ./meeting.m4a
+kebab search "decisions mentioned in the meeting"
 ```

 ## Phase 9 - TUI and desktop app
@@ -948,7 +948,7 @@ kb search "decisions mentioned in the meeting"
 Order:

 ```text
-kb-tui -> kb-desktop
+kebab-tui -> kebab-desktop
 ```

 # 18. Testing strategy
@@ -986,7 +986,7 @@ Don't ask the AI to “build the whole repo”; instead, component spe
 The template is as follows.

 ```text
-Component: kb-parse-md
+Component: kebab-parse-md

 Responsibility:
 - Convert Markdown bytes into a CanonicalDocument.
@@ -994,17 +994,17 @@ Responsibility:
 - Preserve line ranges or byte ranges as faithfully as possible.

 Allowed dependencies:
-- kb-core
+- kebab-core
 - pulldown-cmark or comrak
 - serde
 - thiserror

 Forbidden dependencies:
-- kb-store
-- kb-llm
-- kb-rag
-- kb-tui
-- kb-desktop
+- kebab-store
+- kebab-llm
+- kebab-rag
+- kebab-tui
+- kebab-desktop

 Inputs:
 - RawAsset
@@ -1052,7 +1052,7 @@ Non-goals:
 ## Days 1-2

 - Create the workspace
-- Draft the `kb-core` domain types
+- Draft the `kebab-core` domain types
 - Document the ID rules
 - CLI skeleton

@@ -1073,7 +1073,7 @@ Non-goals:

 - SQLite FTS5 search
 - Citation output
-- Implement `kb inspect`
+- Implement `kebab inspect`

 ## Days 12-14

@@ -1100,7 +1100,7 @@ The MVP completion criteria are as follows.
 - [ ] Changing the parser/chunker version identifies which items need reprocessing.
 - [ ] A trait exists for plugging in local embedding.
 - [ ] A trait exists for plugging in a local LLM.
-- [ ] The CLI works through the `kb-app` facade.
+- [ ] The CLI works through the `kebab-app` facade.

 The P1 completion criteria are as follows.

@@ -14,30 +14,30 @@ historical contract that was implemented; this file accumulates the
|
||||
deltas so phase 5+ readers can find the live behavior without diffing
|
||||
git history.
|
||||
|
||||
## 2026-05-01 — `--config` flag silently ignored across all kb-cli subcommands
|
||||
## 2026-05-01 — `--config` flag silently ignored across all kebab-cli subcommands
|
||||
|
||||
**Discovered**: post-P3-5 manual smoke at `/tmp/kb-smoke/`.
|
||||
**Discovered**: post-P3-5 manual smoke at `/tmp/kebab-smoke/`.
|
||||
|
||||
**Symptom**: `kb --config /path/to/config.toml ingest|search|list|inspect|doctor` ignored the flag and fell back to `~/.config/kb/config.toml` (XDG default). Users had to use `KB_*` env vars to point at a non-default config.
|
||||
**Symptom**: `kebab --config /path/to/config.toml ingest|search|list|inspect|doctor` ignored the flag and fell back to `~/.config/kebab/config.toml` (XDG default). Users had to use `KEBAB_*` env vars to point at a non-default config.
|
||||
|
||||
**Root cause**: `kb-cli` read `cli.config` only inside `Cmd::Ingest` to build `SourceScope`, then called bare `kb_app::ingest(scope, summary_only)` which internally re-loaded `Config::load(None)` (XDG path). Same pattern in `Cmd::Search` / `List` / `Inspect` / `Doctor`. P3-5 introduced `*_with_config` test seams via `#[doc(hidden)] pub fn` but kb-cli never used them.
|
||||
**Root cause**: `kebab-cli` read `cli.config` only inside `Cmd::Ingest` to build `SourceScope`, then called bare `kebab_app::ingest(scope, summary_only)` which internally re-loaded `Config::load(None)` (XDG path). Same pattern in `Cmd::Search` / `List` / `Inspect` / `Doctor`. P3-5 introduced `*_with_config` test seams via `#[doc(hidden)] pub fn` but kebab-cli never used them.
|
||||
|
||||
**Fix** (PR #20, fix/cli-config-flag-and-search-output):
|
||||
- `kb-cli` now builds the Config once via `Config::load(cli.config.as_deref())` at the top of every subcommand and threads it into `kb_app::*_with_config(cfg, ...)` instead of `kb_app::*(...)`.
|
||||
- `kb_app::doctor()` rewritten as `doctor_with_config_path(Option<&Path>)` that reports the actual path probed and hard-fails when `--config <path>` doesn't exist (defaults would otherwise mask user intent).
|
||||
- `kb-app` module doc-comment updated: `#[doc(hidden)] pub fn *_with_config` is no longer "test-only seam" — it's the official "config-explicit" API consumed by CLI `--config`, integration tests, and TUI sessions.
|
||||
- Same PR also improved `kb search` printer: `{:.4}` score formatting (RRF range collapses on `{:.2}`) and `> heading_path` suffix so chunks from the same document are visually distinct.
|
||||
- `kebab-cli` now builds the Config once via `Config::load(cli.config.as_deref())` at the top of every subcommand and threads it into `kebab_app::*_with_config(cfg, ...)` instead of `kebab_app::*(...)`.
|
||||
- `kebab_app::doctor()` rewritten as `doctor_with_config_path(Option<&Path>)` that reports the actual path probed and hard-fails when `--config <path>` doesn't exist (defaults would otherwise mask user intent).
|
||||
- `kebab-app` module doc-comment updated: `#[doc(hidden)] pub fn *_with_config` is no longer "test-only seam" — it's the official "config-explicit" API consumed by CLI `--config`, integration tests, and TUI sessions.
|
||||
- Same PR also improved `kebab search` printer: `{:.4}` score formatting (RRF range collapses on `{:.2}`) and `> heading_path` suffix so chunks from the same document are visually distinct.
|
||||
|
||||
**Amends**: tasks/p3/p3-5-app-wiring.md (the test seam was always meant to be the config-explicit API; only the doc-comment lied).
### 2026-05-01 — `--config` regression in `kb ask` (P4-3 follow-up)
### 2026-05-01 — `--config` regression in `kebab ask` (P4-3 follow-up)

**Discovered**: post-P4-3 manual smoke against 192.168.0.47 Ollama with `gemma4:26b`.

**Symptom**: `kb --config <path> ask` returned `model.id = qwen2.5:14b-instruct` (XDG default model) and `score_gate = 0.30` (XDG default), instead of `gemma4:26b` / `0.05` from the explicit config. P4-3 added the ask body but kb-cli's `Cmd::Ask` arm still called bare `kb_app::ask(query, opts)` — same regression class as the P3-5 fix above, just missed when ask was wired.
**Symptom**: `kebab --config <path> ask` returned `model.id = qwen2.5:14b-instruct` (XDG default model) and `score_gate = 0.30` (XDG default), instead of `gemma4:26b` / `0.05` from the explicit config. P4-3 added the ask body but kebab-cli's `Cmd::Ask` arm still called bare `kebab_app::ask(query, opts)` — same regression class as the P3-5 fix above, just missed when ask was wired.

**Fix** (PR #24, fix/cli-ask-honor-config-flag):
- `kb-cli` builds `Config::load(cli.config.as_deref())` once at the top of `Cmd::Ask` and calls `kb_app::ask_with_config(cfg, query, opts)`.
- `kebab-cli` builds `Config::load(cli.config.as_deref())` once at the top of `Cmd::Ask` and calls `kebab_app::ask_with_config(cfg, query, opts)`.

**Amends**: tasks/p4/p4-3-rag-pipeline.md.

@@ -48,15 +48,15 @@ git history.
**Root cause**: RRF formula `score(c) = Σ 1/(k_rrf + rank_m(c))` produces values bounded by `num_retrievers / (k_rrf + 1)`. With `num_retrievers = 2` and the default `k_rrf = 60`, the upper bound is `2/61 ≈ 0.0328`. The default `config.rag.score_gate = 0.05` was calibrated for vector / lexical scores already in `[0, 1]` and silently refused every hybrid query. `fusion_score` was also incomparable across modes — Lexical / Vector lived in `[0, 1]`, Hybrid lived in `(0, 0.033]`.

**Fix** (PR #25, fix/rrf-fusion-score-normalize-and-docs):
- `crates/kb-search/src/hybrid.rs` divides every raw RRF score by `2 / (k_rrf + 1)` so `fusion_score` always lives in `[0, 1]` regardless of mode. Both retrievers contributing rank 1 normalises to `1.0`; chunks present in only one retriever cap around `0.5`. RRF's rank-ordering invariants are preserved (same constant divides every score), so sort + tiebreak behaviour is identical.
- `crates/kebab-search/src/hybrid.rs` divides every raw RRF score by `2 / (k_rrf + 1)` so `fusion_score` always lives in `[0, 1]` regardless of mode. Both retrievers contributing rank 1 normalises to `1.0`; chunks present in only one retriever cap around `0.5`. RRF's rank-ordering invariants are preserved (same constant divides every score), so sort + tiebreak behaviour is identical.
- One unit test (`rrf_formula_matches_known_value`) updated to expect the normalised value `(1/61 + 1/62) / (2/61) ≈ 0.9919`.
- The integration snapshot `crates/kb-search/tests/fixtures/search/hybrid/run-1.json` already used presence checks (`fusion_score_positive: true`) rather than absolute values, so it didn't need regeneration.
- The integration snapshot `crates/kebab-search/tests/fixtures/search/hybrid/run-1.json` already used presence checks (`fusion_score_positive: true`) rather than absolute values, so it didn't need regeneration.

**Why not a per-mode `score_gate` config**: separate `lexical_score_gate / vector_score_gate / hybrid_score_gate` would force every downstream consumer (CLI, eval, TUI) to know which mode picks which threshold. Normalising the score itself is a one-line change at the source and makes `Answer.retrieval.score_gate` semantically meaningful without per-mode bookkeeping.

**Amends**: tasks/p3/p3-4-hybrid-fusion.md (RRF formula now divides by `2/(k_rrf+1)` after summation), tasks/phase-3-vector-hybrid.md (RRF section).

**Verification**: post-fix smoke at `/tmp/kb-smoke/` with default `score_gate = 0.05` succeeded across four scenarios — Korean→Korean, English→English, cross-language, and out-of-corpus refusal.
**Verification**: post-fix smoke at `/tmp/kebab-smoke/` with default `score_gate = 0.05` succeeded across four scenarios — Korean→Korean, English→English, cross-language, and out-of-corpus refusal.
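A minimal sketch of the normalisation described in this entry, assuming exactly two retrievers and 1-based ranks. `raw_rrf` and `normalized_rrf` are illustrative names, not the actual functions in `hybrid.rs`.

```rust
/// Raw Reciprocal Rank Fusion: sum of 1 / (k_rrf + rank) over the retrievers
/// that returned the chunk (`ranks` are 1-based).
pub fn raw_rrf(ranks: &[usize], k_rrf: f64) -> f64 {
    ranks.iter().map(|&r| 1.0 / (k_rrf + r as f64)).sum()
}

/// Divide by the two-retriever maximum, 2 / (k_rrf + 1), so the result is in [0, 1].
pub fn normalized_rrf(ranks: &[usize], k_rrf: f64) -> f64 {
    raw_rrf(ranks, k_rrf) / (2.0 / (k_rrf + 1.0))
}
```

With `k_rrf = 60`, `raw_rrf(&[1, 1], 60.0)` is `2/61`, below the `0.05` gate, while the normalised value is `1.0`; a chunk found by only one retriever at rank 1 normalises to exactly `0.5`. Because every raw score is divided by the same constant, sorting by the normalised value gives the same order as sorting by the raw sum.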
## How to add an entry

@@ -1,12 +1,12 @@
---
title: "KB work-unit index"
source: kb_local_rust_report.md
source: kebab_local_rust_report.md
date: 2026-04-27
---

# KB work-unit index

Breaks the phase roadmap in [`kb_local_rust_report.md`](../kb_local_rust_report.md) down into architecture-level work units. Each task document is a unit that can be started and reviewed independently.
Breaks the phase roadmap in [`kebab_local_rust_report.md`](../kebab_local_rust_report.md) down into architecture-level work units. Each task document is a unit that can be started and reviewed independently.

## Dependency graph
@@ -25,16 +25,16 @@ P0–P5 run serially. P6–P9 can run in parallel after P5.

| # | Code | Title | Core output crates | Prerequisite |
|---|------|------|----------------|------|
| P0 | [phase-0-skeleton.md](phase-0-skeleton.md) | Workspace skeleton + domain contracts | kb-core, kb-parse-types, kb-config, kb-app, kb-cli | – |
| P1 | [phase-1-markdown-ingestion.md](phase-1-markdown-ingestion.md) | Markdown ingestion pipeline | kb-source-fs, kb-parse-md, kb-normalize, kb-chunk, kb-store-sqlite | P0 |
| P2 | [phase-2-lexical-search.md](phase-2-lexical-search.md) | SQLite FTS5 lexical search + citation | kb-search (lexical) | P1 |
| P3 | [phase-3-vector-hybrid.md](phase-3-vector-hybrid.md) | Local embedding + LanceDB + hybrid | kb-embed, kb-embed-local, kb-store-vector, kb-search | P2 |
| P4 | [phase-4-local-llm-rag.md](phase-4-local-llm-rag.md) | Local LLM + RAG + grounded answer | kb-llm, kb-llm-local, kb-rag | P3 |
| P5 | [phase-5-evaluation.md](phase-5-evaluation.md) | Golden query / regression eval | kb-eval | P4 |
| P6 | [phase-6-image.md](phase-6-image.md) | Image ingestion (OCR + caption) | kb-parse-image | P5 |
| P7 | [phase-7-pdf.md](phase-7-pdf.md) | PDF text + page citation | kb-parse-pdf | P5 |
| P8 | [phase-8-audio.md](phase-8-audio.md) | Speech transcription + timestamp citation | kb-parse-audio | P5 |
| P9 | [phase-9-ui.md](phase-9-ui.md) | TUI + desktop app | kb-tui, kb-desktop | P5 |
| P0 | [phase-0-skeleton.md](phase-0-skeleton.md) | Workspace skeleton + domain contracts | kebab-core, kebab-parse-types, kebab-config, kebab-app, kebab-cli | – |
| P1 | [phase-1-markdown-ingestion.md](phase-1-markdown-ingestion.md) | Markdown ingestion pipeline | kebab-source-fs, kebab-parse-md, kebab-normalize, kebab-chunk, kebab-store-sqlite | P0 |
| P2 | [phase-2-lexical-search.md](phase-2-lexical-search.md) | SQLite FTS5 lexical search + citation | kebab-search (lexical) | P1 |
| P3 | [phase-3-vector-hybrid.md](phase-3-vector-hybrid.md) | Local embedding + LanceDB + hybrid | kebab-embed, kebab-embed-local, kebab-store-vector, kebab-search | P2 |
| P4 | [phase-4-local-llm-rag.md](phase-4-local-llm-rag.md) | Local LLM + RAG + grounded answer | kebab-llm, kebab-llm-local, kebab-rag | P3 |
| P5 | [phase-5-evaluation.md](phase-5-evaluation.md) | Golden query / regression eval | kebab-eval | P4 |
| P6 | [phase-6-image.md](phase-6-image.md) | Image ingestion (OCR + caption) | kebab-parse-image | P5 |
| P7 | [phase-7-pdf.md](phase-7-pdf.md) | PDF text + page citation | kebab-parse-pdf | P5 |
| P8 | [phase-8-audio.md](phase-8-audio.md) | Speech transcription + timestamp citation | kebab-parse-audio | P5 |
| P9 | [phase-9-ui.md](phase-9-ui.md) | TUI + desktop app | kebab-tui, kebab-desktop | P5 |

## Component task decomposition (per phase)
@@ -6,7 +6,7 @@ title: "<Component title>"
status: planned
depends_on: [] # other task_ids
unblocks: [] # other task_ids
contract_source: ../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [] # e.g. [§3.5, §5.5, §7.2]
---

@@ -22,7 +22,7 @@ contract_sections: [] # e.g. [§3.5, §5.5, §7.2]

## Allowed dependencies

- `kb-core`
- `kebab-core`
- <other crates per design §8>
- <external crates with versions>

@@ -71,7 +71,7 @@ If any item here is needed during implementation, STOP and update the frozen des
| unit | ... | ... |
| snapshot | ... (JSON freeze) | `fixtures/...` |
| contract | trait round-trip | mock impls |
| integration | end-to-end via `kb-app` facade | tmp workspace |
| integration | end-to-end via `kebab-app` facade | tmp workspace |

All tests must run under `cargo test -p <crate>` and not require external network or Ollama unless explicitly stated.

@@ -1,12 +1,12 @@
---
phase: P0
component: workspace + kb-core + kb-config + kb-app + kb-cli
component: workspace + kebab-core + kebab-config + kebab-app + kebab-cli
task_id: p0-1
title: "Workspace skeleton + frozen domain types/traits + ID recipe + facade"
status: completed
depends_on: []
unblocks: [p1-1, p1-2, p1-3, p1-4, p1-5, p1-6, p2-1, p2-2, p3-1, p3-2, p3-3, p3-4, p4-1, p4-2, p4-3, p5-1, p5-2, p6-1, p6-2, p6-3, p7-1, p7-2, p8-1, p8-2, p9-1, p9-2, p9-3, p9-4, p9-5]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3 (all), §4, §5.1 schema_meta+migrations, §6 (config + XDG), §7 (all traits), §8 module boundaries, §9 versioning, §10 errors+exit codes, §2.8 wire schema_version]
---

@@ -14,56 +14,56 @@ contract_sections: [§3 (all), §4, §5.1 schema_meta+migrations, §6 (config +

## Goal

Stand up the Cargo workspace (Rust 2024, resolver=3) with `kb-core`, `kb-parse-types`, `kb-config`, `kb-app`, `kb-cli` crates. Freeze every domain type, trait, ID recipe, error type, and CLI entry shape per the frozen design doc so that all subsequent component tasks compile against stable contracts.
Stand up the Cargo workspace (Rust 2024, resolver=3) with `kebab-core`, `kebab-parse-types`, `kebab-config`, `kebab-app`, `kebab-cli` crates. Freeze every domain type, trait, ID recipe, error type, and CLI entry shape per the frozen design doc so that all subsequent component tasks compile against stable contracts.

## Why now / why this size

Every other task imports `kb-core`. If types or trait signatures wobble after this point, every downstream task spec drifts. This task is large but indivisible: types + traits + ID recipe + facade + CLI skeleton + wire schema stubs must land together so the rest of the workspace can compile against them.
Every other task imports `kebab-core`. If types or trait signatures wobble after this point, every downstream task spec drifts. This task is large but indivisible: types + traits + ID recipe + facade + CLI skeleton + wire schema stubs must land together so the rest of the workspace can compile against them.

## Allowed dependencies

- workspace `[workspace.dependencies]`: `anyhow = "1"`, `thiserror = "2"`, `serde = { version = "1", features = ["derive"] }`, `serde_json = "1"`, `time = { version = "0.3", features = ["serde", "macros"] }`, `uuid = { version = "1", features = ["v7", "serde"] }`, `blake3 = "1"`, `tracing = "0.1"`
- per crate:
  - `kb-core`: workspace deps + `serde_json::Map`, `serde-json-canonicalizer`, `unicode-normalization`
  - `kb-parse-types`: workspace deps + `kb-core` ONLY (no parsers, no stores, no normalize). Defines parser intermediate representations per design §3.7b.
  - `kb-config`: workspace deps + `toml = "0.8"`, `dirs = "5"` (XDG paths)
  - `kb-app`: workspace deps + `kb-core`, `kb-config`, `tracing-subscriber`, `tracing-appender`
  - `kb-cli`: workspace deps + `kb-core`, `kb-config`, `kb-app`, `clap = { version = "4", features = ["derive"] }`
  - `kebab-core`: workspace deps + `serde_json::Map`, `serde-json-canonicalizer`, `unicode-normalization`
  - `kebab-parse-types`: workspace deps + `kebab-core` ONLY (no parsers, no stores, no normalize). Defines parser intermediate representations per design §3.7b.
  - `kebab-config`: workspace deps + `toml = "0.8"`, `dirs = "5"` (XDG paths)
  - `kebab-app`: workspace deps + `kebab-core`, `kebab-config`, `tracing-subscriber`, `tracing-appender`
  - `kebab-cli`: workspace deps + `kebab-core`, `kebab-config`, `kebab-app`, `clap = { version = "4", features = ["derive"] }`

## Forbidden dependencies

- `kb-core` MUST NOT depend on any other `kb-*` crate.
- `kb-parse-types` MUST depend ONLY on `kb-core`. No parser libraries (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`, …), no other `kb-*` crate.
- `kb-config` MUST NOT depend on `kb-app`, `kb-cli`, parsers, stores, embedders, search, llm, rag, tui, desktop.
- `kb-app` MUST NOT yet depend on parsers/stores/embedders/search/llm/rag (those crates do not exist yet — facade methods stub out and return `unimplemented!()` or `anyhow::bail!("not yet wired (Pn-i)")`).
- `kb-cli` MUST NOT call any non-`kb-app` crate directly.
- `kebab-core` MUST NOT depend on any other `kebab-*` crate.
- `kebab-parse-types` MUST depend ONLY on `kebab-core`. No parser libraries (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`, …), no other `kebab-*` crate.
- `kebab-config` MUST NOT depend on `kebab-app`, `kebab-cli`, parsers, stores, embedders, search, llm, rag, tui, desktop.
- `kebab-app` MUST NOT yet depend on parsers/stores/embedders/search/llm/rag (those crates do not exist yet — facade methods stub out and return `unimplemented!()` or `anyhow::bail!("not yet wired (Pn-i)")`).
- `kebab-cli` MUST NOT call any non-`kebab-app` crate directly.
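The two strictest rules above (`kebab-core` depends on no other `kebab-*` crate; `kebab-parse-types` depends only on `kebab-core` and on no parser library) are mechanical enough to check in code. A std-only sketch, assuming the direct dependency lists have already been collected (for example from `cargo metadata`) into a map from crate name to dependency names; `boundary_violations` and the short `PARSER_LIBS` list are hypothetical and cover only these two rules.

```rust
use std::collections::BTreeMap;

/// Parser libraries named in the `kebab-parse-types` rule above (not exhaustive).
const PARSER_LIBS: [&str; 4] = ["pulldown-cmark", "pdf-extract", "image", "whisper-rs"];

/// Return every (crate, dependency) edge that breaks the `kebab-core` or
/// `kebab-parse-types` rules. `deps` maps a crate to its direct dependencies.
pub fn boundary_violations(deps: &BTreeMap<&str, Vec<&str>>) -> Vec<(String, String)> {
    let mut bad = Vec::new();
    for (&krate, list) in deps {
        for &dep in list {
            let internal = dep.starts_with("kebab-");
            let violates = match krate {
                "kebab-core" => internal,
                "kebab-parse-types" => {
                    (internal && dep != "kebab-core")
                        || PARSER_LIBS.iter().any(|&lib| lib == dep)
                }
                _ => false, // other crates' rules are not encoded in this sketch
            };
            if violates {
                bad.push((krate.to_string(), dep.to_string()));
            }
        }
    }
    bad
}
```

Results come out in crate-name order because `BTreeMap` iterates sorted keys, which keeps a CI failure message stable between runs.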
## Inputs

| input | type | source |
|-------|------|--------|
| frozen design doc | Markdown | `docs/superpowers/specs/2026-04-27-kb-final-form-design.md` |
| user `kb` invocation | command-line args | end user |
| frozen design doc | Markdown | `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` |
| user `kebab` invocation | command-line args | end user |

## Outputs

| output | type | downstream consumer |
|--------|------|---------------------|
| compiling workspace | Rust crates | every later task |
| `kb-core` types/traits | Rust API | every other crate |
| `kb-core` ID functions | Rust API | parsers, normalize, chunkers, embedders, search, rag |
| `kb-config::Config` | Rust struct | every other crate |
| `kb-app` facade methods (stubs) | Rust API | `kb-cli`, future TUI/desktop |
| `kb` binary | executable | end user |
| `kebab-core` types/traits | Rust API | every other crate |
| `kebab-core` ID functions | Rust API | parsers, normalize, chunkers, embedders, search, rag |
| `kebab-config::Config` | Rust struct | every other crate |
| `kebab-app` facade methods (stubs) | Rust API | `kebab-cli`, future TUI/desktop |
| `kebab` binary | executable | end user |
| `docs/wire-schema/v1/*.schema.json` stubs | JSON Schema files | future wire emitters and consumers |
| `docs/spec/*.md` stubs (link to frozen design) | Markdown | future contributors |

## Public surface (signatures only — no new types)

All types/traits below are defined in `kb-core` exactly per design §3 and §7 (no additions, no renames). Subagent must copy field-for-field.
All types/traits below are defined in `kebab-core` exactly per design §3 and §7 (no additions, no renames). Subagent must copy field-for-field.
```rust
// ── kb-core ─────────────────────────────────────────────────────────────────
// ── kebab-core ─────────────────────────────────────────────────────────────────

// Newtype IDs (design §3.1) — Display + FromStr implemented.
pub struct AssetId(pub String);
@@ -114,7 +114,7 @@ pub struct AudioRefBlock { /* per §3.4 */ }
pub enum Inline { /* per §3.4 */ }
pub enum SourceSpan { /* per §3.4 */ }

// (ParsedBlock + parser intermediates live in kb-parse-types per design §3.7b — NOT in kb-core.)
// (ParsedBlock + parser intermediates live in kebab-parse-types per design §3.7b — NOT in kebab-core.)

// Chunk + Citation (§3.5)
pub struct Chunk { /* per §3.5 */ }
@@ -232,14 +232,14 @@ pub fn nfc(input: &str) -> String; // §4.1
```

```rust
// ── kb-parse-types ──────────────────────────────────────────────────────────
// ── kebab-parse-types ──────────────────────────────────────────────────────────
// Per design §3.7b. Defines parser intermediate representations consumed by
// kb-normalize. Depends on kb-core only — never on parser libraries.
// kebab-normalize. Depends on kebab-core only — never on parser libraries.

pub struct ParsedBlock {
    pub kind: ParsedBlockKind,
    pub heading_path: Vec<String>,
    pub source_span: kb_core::SourceSpan,
    pub source_span: kebab_core::SourceSpan,
    pub payload: ParsedPayload,
}

@@ -247,16 +247,16 @@ pub enum ParsedBlockKind { Heading, Paragraph, List, Code, Table, Quote, ImageRe

pub enum ParsedPayload {
    Heading { level: u8, text: String },
    Paragraph { text: String, inlines: Vec<kb_core::Inline> },
    List { ordered: bool, items: Vec<Vec<kb_core::Inline>> },
    Paragraph { text: String, inlines: Vec<kebab_core::Inline> },
    List { ordered: bool, items: Vec<Vec<kebab_core::Inline>> },
    Code { lang: Option<String>, code: String },
    Table { headers: Vec<String>, rows: Vec<Vec<String>> },
    Quote { text: String, inlines: Vec<kb_core::Inline> },
    Quote { text: String, inlines: Vec<kebab_core::Inline> },
    ImageRef { src: String, alt: String },
    AudioRef { src: String },
}

// `Inline` itself lives in kb-core (§3.4) — parse-types references it, never duplicates it.
// `Inline` itself lives in kebab-core (§3.4) — parse-types references it, never duplicates it.

pub struct Warning { pub kind: WarningKind, pub note: String }
pub enum WarningKind { MalformedFrontmatter, MalformedTable, EncodingFallback, ExtractFailed }
@@ -268,31 +268,31 @@ pub struct ParsedAudioSegment;
```

```rust
// ── kb-config ───────────────────────────────────────────────────────────────
// ── kebab-config ───────────────────────────────────────────────────────────────
pub struct Config { /* full schema per §6.4 */ }
impl Config {
    pub fn load(path: Option<&std::path::Path>) -> anyhow::Result<Self>;
    pub fn from_file(path: &std::path::Path) -> anyhow::Result<Self>;
    pub fn defaults() -> Self;
    pub fn apply_env(self, env: &std::collections::HashMap<String, String>) -> Self;
    pub fn xdg_config_path() -> std::path::PathBuf; // ~/.config/kb/config.toml
    pub fn xdg_data_dir() -> std::path::PathBuf; // ~/.local/share/kb
    pub fn xdg_config_path() -> std::path::PathBuf; // ~/.config/kebab/config.toml
    pub fn xdg_data_dir() -> std::path::PathBuf; // ~/.local/share/kebab
    pub fn xdg_cache_dir() -> std::path::PathBuf;
    pub fn xdg_state_dir() -> std::path::PathBuf;
}
```
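To illustrate what `xdg_config_path()` is expected to return, here is a std-only sketch of XDG Base Directory resolution for the config file. `xdg_config_path_from` is a hypothetical helper that takes the environment as a lookup function so it can be tested without mutating the process environment; the contract itself delegates resolution to the `dirs` crate.

```rust
use std::path::PathBuf;

/// Resolve `<config home>/kebab/config.toml` per the XDG Base Directory spec:
/// a set, non-empty, absolute `XDG_CONFIG_HOME` wins; otherwise `$HOME/.config`.
/// `env` is any lookup function, so tests need not touch the real environment.
pub fn xdg_config_path_from(env: impl Fn(&str) -> Option<String>) -> Option<PathBuf> {
    let base = match env("XDG_CONFIG_HOME") {
        // The spec says relative values must be treated as invalid and ignored.
        Some(v) if !v.is_empty() && PathBuf::from(&v).is_absolute() => PathBuf::from(v),
        _ => PathBuf::from(env("HOME")?).join(".config"),
    };
    Some(base.join("kebab").join("config.toml"))
}
```

Passing the lookup in as a closure matches the Risks note further down that tests must set the XDG variables explicitly.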
```rust
// ── kb-app ──────────────────────────────────────────────────────────────────
// ── kebab-app ──────────────────────────────────────────────────────────────────
pub fn init_workspace(force: bool) -> anyhow::Result<()>;
pub fn ingest(scope: kb_core::SourceScope, summary_only: bool) -> anyhow::Result<kb_core::IngestReport>;
pub fn list_docs(filter: kb_core::DocFilter) -> anyhow::Result<Vec<kb_core::DocSummary>>;
pub fn inspect_doc(id: &kb_core::DocumentId) -> anyhow::Result<kb_core::CanonicalDocument>;
pub fn inspect_chunk(id: &kb_core::ChunkId) -> anyhow::Result<kb_core::Chunk>;
pub fn search(query: kb_core::SearchQuery) -> anyhow::Result<Vec<kb_core::SearchHit>>;
pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<kb_core::Answer>;
pub fn ingest(scope: kebab_core::SourceScope, summary_only: bool) -> anyhow::Result<kebab_core::IngestReport>;
pub fn list_docs(filter: kebab_core::DocFilter) -> anyhow::Result<Vec<kebab_core::DocSummary>>;
pub fn inspect_doc(id: &kebab_core::DocumentId) -> anyhow::Result<kebab_core::CanonicalDocument>;
pub fn inspect_chunk(id: &kebab_core::ChunkId) -> anyhow::Result<kebab_core::Chunk>;
pub fn search(query: kebab_core::SearchQuery) -> anyhow::Result<Vec<kebab_core::SearchHit>>;
pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<kebab_core::Answer>;
pub fn doctor() -> anyhow::Result<DoctorReport>;
pub struct AskOpts { pub k: usize, pub explain: bool, pub mode: kb_core::SearchMode, pub temperature: Option<f32>, pub seed: Option<u64> }
pub struct AskOpts { pub k: usize, pub explain: bool, pub mode: kebab_core::SearchMode, pub temperature: Option<f32>, pub seed: Option<u64> }
pub struct DoctorReport { pub ok: bool, pub checks: Vec<DoctorCheck> }
pub struct DoctorCheck { pub name: String, pub ok: bool, pub detail: String, pub hint: Option<String> }
```
@@ -300,9 +300,9 @@ pub struct DoctorCheck { pub name: String, pub ok: bool, pub detail: String, pub
P0 facade implementations call `anyhow::bail!("not yet wired (P<n>-<i>)")`; later phases replace bodies but never change signatures.

```rust
// ── kb-cli ──────────────────────────────────────────────────────────────────
// ── kebab-cli ──────────────────────────────────────────────────────────────────
// clap subcommands: init | ingest | list (docs) | inspect (doc|chunk) | search | ask | doctor | eval (subcommand placeholder)
// Each maps 1:1 to a kb_app function. Exit code mapping per §10.
// Each maps 1:1 to a kebab_app function. Exit code mapping per §10.
```
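The §10 exit codes listed in this spec's behavior contract (0 success, 1 no-hit/refusal, 2 error, 3 doctor unhealthy), together with the rule that stubs `bail!` rather than `panic!`, suggest funnelling every subcommand through one helper. A std-only sketch; `Outcome`, `exit_code`, `finish` and `to_exit` are hypothetical names, and a real CLI would handle `anyhow::Error` rather than the generic `E` used here.

```rust
use std::process::ExitCode;

/// Hypothetical classification of what a facade call produced.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Success,
    NoHitOrRefusal,
    Error,
    DoctorUnhealthy,
}

/// Exit codes as listed in the behavior contract (§10).
pub fn exit_code(outcome: Outcome) -> u8 {
    match outcome {
        Outcome::Success => 0,
        Outcome::NoHitOrRefusal => 1,
        Outcome::Error => 2,
        Outcome::DoctorUnhealthy => 3,
    }
}

/// Wrap every facade call the same way: an `Err` (for example from a
/// `bail!("not yet wired ...")` stub) becomes exit code 2 instead of a panic.
pub fn finish<T, E: std::fmt::Display>(
    result: Result<T, E>,
    classify: impl FnOnce(&T) -> Outcome,
) -> u8 {
    let outcome = match &result {
        Ok(value) => classify(value),
        Err(err) => {
            eprintln!("error: {err}");
            Outcome::Error
        }
    };
    exit_code(outcome)
}

/// `main` can return this directly, e.g. `fn main() -> ExitCode { to_exit(finish(...)) }`.
pub fn to_exit(code: u8) -> ExitCode {
    ExitCode::from(code)
}
```

Each subcommand only supplies its own `classify` closure (for `search`, "empty hit list means 1"); the error path and the numeric mapping stay in one place.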
## Behavior contract
@@ -312,18 +312,18 @@ P0 facade implementations call `anyhow::bail!("not yet wired (P<n>-<i>)")`; late
- `id_from` uses `serde-json-canonicalizer` exactly as design §4.2 specifies and truncates blake3 to 32 hex chars.
- `Citation::to_uri` emits W3C Media Fragments URIs per §0 Q3 (`#L<a>-L<b>`, `#p=<n>`, `#xywh=…`, `#caption`, `#t=hh:mm:ss,hh:mm:ss[&speaker=…]`).
- `Citation::parse` is the strict inverse (round-trip property).
- `kb-config` resolves XDG paths via `dirs` crate; respects `XDG_CONFIG_HOME`, `XDG_DATA_HOME`, `XDG_CACHE_HOME`, `XDG_STATE_HOME` if set.
- Config layer order: defaults → file → env (`KB_<SECTION>_<KEY>`) → CLI flag (CLI override is applied by `kb-cli` after `Config::load`).
- `kb-cli` global flags: `--config <path>`, `--verbose`, `--debug`, `--json`, `--explain` (where applicable). On `--json`, output conforms to wire schema v1.
- `kb-cli` exit codes: 0 success, 1 no-hit/refusal, 2 error, 3 doctor unhealthy (per §10).
- `kebab-config` resolves XDG paths via `dirs` crate; respects `XDG_CONFIG_HOME`, `XDG_DATA_HOME`, `XDG_CACHE_HOME`, `XDG_STATE_HOME` if set.
- Config layer order: defaults → file → env (`KB_<SECTION>_<KEY>`) → CLI flag (CLI override is applied by `kebab-cli` after `Config::load`).
- `kebab-cli` global flags: `--config <path>`, `--verbose`, `--debug`, `--json`, `--explain` (where applicable). On `--json`, output conforms to wire schema v1.
- `kebab-cli` exit codes: 0 success, 1 no-hit/refusal, 2 error, 3 doctor unhealthy (per §10).
- All facade-returned wire objects emit `schema_version` per §2 (e.g., `"answer.v1"`, `"search_hit.v1"`).
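To make the `to_uri` / `parse` round-trip property concrete, here is a std-only sketch for three of the fragment forms listed above (`#L<a>-L<b>`, `#p=<n>`, `#t=hh:mm:ss,hh:mm:ss`). `Locator`, `to_fragment` and `parse_fragment` are hypothetical names, not the frozen `Citation` API, and only the fragment part of the URI is handled.

```rust
/// Hypothetical subset of citation locators. The `#xywh=`, `#caption` and
/// `&speaker=` forms from the contract are left out to keep the sketch short.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Locator {
    Lines { start: u32, end: u32 },
    Page(u32),
    Time { start_s: u32, end_s: u32 },
}

/// Non-empty ASCII digits only (rejects "+3", " 3", "").
fn digits(s: &str) -> Option<u32> {
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    s.parse().ok()
}

fn hms(secs: u32) -> String {
    format!("{:02}:{:02}:{:02}", secs / 3600, (secs / 60) % 60, secs % 60)
}

fn parse_hms(s: &str) -> Option<u32> {
    let mut parts = s.split(':');
    let (h, m, sec) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() || h.len() < 2 || m.len() != 2 || sec.len() != 2 {
        return None;
    }
    let (h, m, sec) = (digits(h)?, digits(m)?, digits(sec)?);
    if m >= 60 || sec >= 60 {
        return None;
    }
    h.checked_mul(3600)?.checked_add(m * 60 + sec)
}

impl Locator {
    pub fn to_fragment(&self) -> String {
        match *self {
            Locator::Lines { start, end } => format!("#L{start}-L{end}"),
            Locator::Page(n) => format!("#p={n}"),
            Locator::Time { start_s, end_s } => format!("#t={},{}", hms(start_s), hms(end_s)),
        }
    }

    /// Inverse of `to_fragment`: `parse_fragment(&x.to_fragment()) == Some(x)`.
    pub fn parse_fragment(s: &str) -> Option<Self> {
        let body = s.strip_prefix('#')?;
        if let Some(rest) = body.strip_prefix("p=") {
            return Some(Locator::Page(digits(rest)?));
        }
        if let Some(rest) = body.strip_prefix("t=") {
            let (a, b) = rest.split_once(',')?;
            return Some(Locator::Time { start_s: parse_hms(a)?, end_s: parse_hms(b)? });
        }
        let (a, b) = body.strip_prefix('L')?.split_once("-L")?;
        Some(Locator::Lines { start: digits(a)?, end: digits(b)? })
    }
}
```

A property test over many generated `Locator` values is the natural way to pin down the round-trip requirement; the fixed cases below are only a start.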
## Storage / wire effects

- Filesystem: creates `~/.config/kb/`, `~/.local/share/kb/`, `~/KnowledgeBase/` only when `kb init` runs; never on `Config::load`.
- Filesystem: creates `~/.config/kebab/`, `~/.local/share/kebab/`, `~/KnowledgeBase/` only when `kebab init` runs; never on `Config::load`.
- Wire schemas: ships `docs/wire-schema/v1/{citation,search_hit,answer,ingest_report,doc_summary,chunk_inspection,doctor}.schema.json` as **stubs** declaring the top-level `schema_version` and required fields per §2. Full property validation can land later.
- DB: workspace ships `migrations/V001__init.sql` containing **only** §5.1 `schema_meta` + `migrations` tables (the full schema lands in p1-6's migration file or p0-1 may pre-stage the empty migrations directory; choose the former to keep this task within `kb-core`/`kb-config`/`kb-app`/`kb-cli` scope).
- Logging: `tracing` initialized in `kb-cli`; daily-rolling file in `~/.local/state/kb/logs/`.
- DB: workspace ships `migrations/V001__init.sql` containing **only** §5.1 `schema_meta` + `migrations` tables (the full schema lands in p1-6's migration file or p0-1 may pre-stage the empty migrations directory; choose the former to keep this task within `kebab-core`/`kebab-config`/`kebab-app`/`kebab-cli` scope).
- Logging: `tracing` initialized in `kebab-cli`; daily-rolling file in `~/.local/state/kebab/logs/`.
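The filesystem rule above (directories appear only when `kebab init` runs, never on `Config::load`) implies an idempotent init step. A std-only sketch; `init_dirs` and its `force` handling are assumptions modelled on the `init_workspace(force: bool)` signature, and the real directory list (including `~/KnowledgeBase/`) would come from the resolved config.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical core of `kebab init`: create the config and data directories
/// idempotently and write `config.toml` only when absent (or when `force` is set).
/// Returns whether a config file was written.
pub fn init_dirs(config_dir: &Path, data_dir: &Path, default_toml: &str, force: bool) -> io::Result<bool> {
    fs::create_dir_all(config_dir)?; // succeeds if the directory already exists
    fs::create_dir_all(data_dir)?;
    let cfg = config_dir.join("config.toml");
    if cfg.exists() && !force {
        return Ok(false); // never clobber an existing user config
    }
    fs::write(&cfg, default_toml)?;
    Ok(true)
}
```

Running it twice without `force` leaves the first `config.toml` untouched, which is the idempotence the Definition of Done asks of `kebab init`.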
## Test plan

@@ -336,20 +336,20 @@ P0 facade implementations call `anyhow::bail!("not yet wired (P<n>-<i>)")`; late
| unit | newtype `Display`/`FromStr` rejects invalid lengths/chars | inline |
| unit | `Config::defaults` + env override + CLI override produces expected merged config | inline |
| snapshot | `Config::defaults` JSON serde stable | inline (round-trip) |
| smoke | `kb --help`, `kb init`, `kb doctor` run; doctor reports config_loaded ✓ data_dir_writable ✓ even with no DB present (downstream checks may fail with hint) | tmp `XDG_*` env |
| smoke | `kebab --help`, `kebab init`, `kebab doctor` run; doctor reports config_loaded ✓ data_dir_writable ✓ even with no DB present (downstream checks may fail with hint) | tmp `XDG_*` env |
| build | `cargo check --workspace` and `cargo test --workspace` pass | repo |

All tests must run with no network, no Ollama, no models.
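The unit-test row on `Config::defaults` + env override + CLI override checks the layer order from the behavior contract (defaults → file → env → CLI flag). A minimal sketch of field-by-field precedence; `RagConfig`, `Overrides`, `merge`, the `top_k` field and both default values are hypothetical, and parsing of the file and environment layers is left out.

```rust
/// Hypothetical two-field slice of the config, used only to show precedence.
#[derive(Debug, Clone, PartialEq)]
pub struct RagConfig {
    pub score_gate: f32,
    pub top_k: usize,
}

/// Per-field overrides produced by one layer (file, env, or CLI flags).
#[derive(Debug, Clone, Default)]
pub struct Overrides {
    pub score_gate: Option<f32>,
    pub top_k: Option<usize>,
}

impl RagConfig {
    /// Illustrative defaults, not the real §6.4 values.
    pub fn defaults() -> Self {
        RagConfig { score_gate: 0.05, top_k: 8 }
    }

    fn apply(mut self, layer: &Overrides) -> Self {
        if let Some(v) = layer.score_gate {
            self.score_gate = v;
        }
        if let Some(v) = layer.top_k {
            self.top_k = v;
        }
        self
    }
}

/// defaults → file → env → CLI: each later layer wins, field by field.
pub fn merge(file: &Overrides, env: &Overrides, cli: &Overrides) -> RagConfig {
    RagConfig::defaults().apply(file).apply(env).apply(cli)
}
```

Modelling each layer as a set of optional fields means a layer that is silent about a key cannot reset it, so only explicitly set values override earlier layers.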
## Definition of Done

- [ ] `Cargo.toml` workspace lists `kb-core`, `kb-parse-types`, `kb-config`, `kb-app`, `kb-cli` and resolver=3, edition 2024
- [ ] `Cargo.toml` workspace lists `kebab-core`, `kebab-parse-types`, `kebab-config`, `kebab-app`, `kebab-cli` and resolver=3, edition 2024
- [ ] `cargo check --workspace` passes
- [ ] `cargo test --workspace` passes
- [ ] `kb-parse-types` `cargo tree` shows ONLY `kb-core` + `serde`/`thiserror` style deps (no parser libs, no other `kb-*`)
- [ ] `kb --help` prints subcommands
- [ ] `kb init` creates XDG dirs idempotently and writes `config.toml`
- [ ] `kb doctor` returns wire JSON conforming to `doctor.v1` (in `--json` mode)
- [ ] `kebab-parse-types` `cargo tree` shows ONLY `kebab-core` + `serde`/`thiserror` style deps (no parser libs, no other `kebab-*`)
- [ ] `kebab --help` prints subcommands
- [ ] `kebab init` creates XDG dirs idempotently and writes `config.toml`
- [ ] `kebab doctor` returns wire JSON conforming to `doctor.v1` (in `--json` mode)
- [ ] `docs/wire-schema/v1/*.schema.json` stubs exist (7 files: citation, search_hit, answer, ingest_report, doc_summary, chunk_inspection, doctor)
- [ ] `docs/spec/` stubs exist linking to the frozen design (one file per: domain-model, ids, canonical-document, chunk-policy, citation-policy, module-boundaries, ai-generation-guidelines)
- [ ] `fixtures/` root directory created with all subdirectories that downstream tasks reference: `fixtures/markdown/`, `fixtures/source-fs/`, `fixtures/search/lexical/`, `fixtures/search/hybrid/`, `fixtures/embed/`, `fixtures/vector/`, `fixtures/rag/`, `fixtures/eval/`, `fixtures/image/`, `fixtures/pdf/`, `fixtures/audio/`. Each subdir gets a `.gitkeep` so it tracks. P1 ships at minimum `fixtures/markdown/{simple-note,nested-headings,code-and-table}.md` (per epic phase-0); other dirs stay empty until their phase lands.
@@ -361,12 +361,12 @@ All tests must run with no network, no Ollama, no models.
- Any parser / store / embedder / search / llm / rag / tui / desktop logic (downstream phases).
- Full schema migrations (most DDL lands in p1-6 / p2-1 / p3-3).
- Wire schema deep validation (only required fields + `schema_version` checked here).
- Real `kb-app` business logic (functions stub with `unimplemented!()` or explicit `bail!`).
- Real `kebab-app` business logic (functions stub with `unimplemented!()` or explicit `bail!`).

## Risks / notes

- ID recipe is the contract that every later record depends on. Any change after this task lands forces a `parser_version` / `chunker_version` / `embedding_version` cascade per §9. Treat changes as schema migrations and update the design doc first.
- Newtype IDs use `String` (not `[u8; 16]`) to keep serde simple; tests must still enforce 32-char hex constraint on `FromStr`.
- `kb-app` stubs must use `bail!` not `panic!` so the CLI exits with code 2 cleanly per §10.
- `kebab-app` stubs must use `bail!` not `panic!` so the CLI exits with code 2 cleanly per §10.
- `clap` v4 derive: subcommand `inspect` has nested `doc` / `chunk` variants; ensure exit code 0/1/2 mapping wraps the facade call uniformly.
- XDG path discovery on macOS: spec uses XDG (not `Application Support`) per §6.1 — `dirs` crate honors XDG env vars; tests must set them explicitly.
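For the newtype note above, a std-only sketch of the `FromStr` validation those tests would exercise. `HexId32` and `IdError` are hypothetical names; the 32-character length follows the constraint stated in the note, and accepting lowercase hex only is an assumption.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical `String`-backed ID newtype: exactly 32 lowercase hex characters.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct HexId32(String);

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum IdError {
    BadLength(usize),
    BadChar(char),
}

impl FromStr for HexId32 {
    type Err = IdError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if s.len() != 32 {
            return Err(IdError::BadLength(s.len()));
        }
        // Reject anything outside 0-9 / a-f (uppercase hex included, by assumption).
        if let Some(c) = s.chars().find(|&c| !matches!(c, '0'..='9' | 'a'..='f')) {
            return Err(IdError::BadChar(c));
        }
        Ok(HexId32(s.to_owned()))
    }
}

impl fmt::Display for HexId32 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}
```

Keeping the inner `String` private forces every construction through `FromStr`, so an invalid ID cannot reach serde or the store layer without being rejected first.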
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
---
|
||||
phase: P1
|
||||
component: kb-source-fs
|
||||
component: kebab-source-fs
|
||||
task_id: p1-1
|
||||
title: "Local filesystem source connector"
|
||||
status: completed
|
||||
depends_on: [p0-1]
|
||||
unblocks: [p1-2, p1-3, p1-4, p1-5, p1-6]
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [§3.3, §6.2, §6.6, §7.1, §7.2 SourceConnector, §8]
|
||||
---
|
||||
|
||||
@@ -22,8 +22,8 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-config`
|
||||
- `kebab-core`
|
||||
- `kebab-config`
|
||||
- `ignore` (gitignore semantics)
|
||||
- `blake3`
|
||||
- `walkdir`
|
||||
@@ -34,21 +34,21 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums

## Forbidden dependencies

- `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

## Inputs

| input | type | source |
|-------|------|--------|
| `SourceScope` | `kb_core::SourceScope` | `kb-app` from config |
| `SourceScope` | `kebab_core::SourceScope` | `kebab-app` from config |
| filesystem | `&Path` | OS |
| `.kbignore` | text file | workspace root, optional |
| `.kebabignore` | text file | workspace root, optional |

## Outputs

| output | type | downstream consumer |
|--------|------|---------------------|
| `Vec<RawAsset>` | `kb_core::RawAsset` | `kb-parse-md`, asset writer in `kb-store-sqlite` (via `kb-app`) |
| `Vec<RawAsset>` | `kebab_core::RawAsset` | `kebab-parse-md`, asset writer in `kebab-store-sqlite` (via `kebab-app`) |

## Public surface (signatures only — no new types)

@@ -56,11 +56,11 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums
pub struct FsSourceConnector { /* internal */ }

impl FsSourceConnector {
    pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>;
    pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
}

impl kb_core::SourceConnector for FsSourceConnector {
    fn scan(&self, scope: &kb_core::SourceScope) -> anyhow::Result<Vec<kb_core::RawAsset>>;
impl kebab_core::SourceConnector for FsSourceConnector {
    fn scan(&self, scope: &kebab_core::SourceScope) -> anyhow::Result<Vec<kebab_core::RawAsset>>;
}
```

@@ -70,7 +70,7 @@ impl kb_core::SourceConnector for FsSourceConnector {
- `asset_id` derived per design §4.2 from `blake3(raw bytes)` full hex.
- `media_type` selected from extension + libmagic-like sniff fallback (`.md` → Markdown, others fall through to `MediaType::Other`).
- `discovered_at` = current `OffsetDateTime::now_utc()` at scan time.
- Combine `config.workspace.exclude` ∪ `.kbignore` for filter (union; ordering does not matter).
- Combine `config.workspace.exclude` ∪ `.kebabignore` for filter (union; ordering does not matter).
- Symbolic links: follow once, detect cycles via `canonicalize` + visited set.
- Files larger than `storage.copy_threshold_mb` MB → emit `AssetStorage::Reference { path, sha }` (do not copy bytes here; copying is done by the asset writer task).
- Idempotent: same input → same `Vec<RawAsset>` (sort by `workspace_path`).
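
Three of the rules above can be sketched with plain `std`: the extension mapping, the copy-vs-reference threshold, and the deterministic ordering. `MediaType`, `StorageDecision`, and `ScannedFile` are simplified, hypothetical stand-ins for the `kebab_core` types; the real `AssetStorage::Reference` also carries `path` and `sha`. The libmagic-style sniff fallback is omitted. Reading "MB" as MiB (1024 * 1024 bytes) is an assumption.

```rust
use std::path::Path;

/// Simplified stand-in for `MediaType` (only the variants named in this spec).
#[derive(Debug, PartialEq)]
enum MediaType {
    Markdown,
    Other,
}

/// Simplified stand-in for the storage decision.
#[derive(Debug, PartialEq)]
enum StorageDecision {
    Copy,
    Reference,
}

/// Hypothetical scan record; only the field used for ordering is shown.
#[derive(Debug)]
struct ScannedFile {
    workspace_path: String,
}

/// Extension-based mapping: `.md` is Markdown, everything else falls through
/// to `Other` (the content-sniff fallback is not modelled here).
fn media_type_for(path: &Path) -> MediaType {
    match path.extension().and_then(|e| e.to_str()) {
        Some("md") => MediaType::Markdown,
        _ => MediaType::Other,
    }
}

/// "Larger than N MB" read as strictly greater than N MiB (assumption).
fn storage_for(size_bytes: u64, copy_threshold_mb: u64) -> StorageDecision {
    if size_bytes > copy_threshold_mb.saturating_mul(1024 * 1024) {
        StorageDecision::Reference
    } else {
        StorageDecision::Copy
    }
}

/// Idempotence: sort by workspace path so two scans of the same tree yield
/// the same sequence regardless of directory iteration order.
fn sort_scan(files: &mut [ScannedFile]) {
    files.sort_by(|a, b| a.workspace_path.cmp(&b.workspace_path));
}
```

Plain byte-wise `String` ordering is only one choice. Any total order works, as long as it never depends on the order in which the filesystem returns entries.
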
@@ -78,7 +78,7 @@ impl kb_core::SourceConnector for FsSourceConnector {
## Storage / wire effects

- Reads: filesystem under `config.workspace.root`.
- Writes: nothing. (Asset copy is handled by the asset writer in `kb-store-sqlite`.)
- Writes: nothing. (Asset copy is handled by the asset writer in `kebab-store-sqlite`.)

## Test plan

@@ -87,25 +87,25 @@ impl kb_core::SourceConnector for FsSourceConnector {
| unit | POSIX path normalization | inline cases incl. `./a/b.md`, `a//b.md`, `a/b.md` → identical |
| unit | blake3 of known bytes matches expected hex | inline |
| unit | gitignore filter (`*.tmp`, `node_modules/**`) excludes correctly | tmp tree built in test |
| unit | `.kbignore` ∪ config exclude works | tmp tree |
| unit | `.kebabignore` ∪ config exclude works | tmp tree |
| unit | symlink cycle does not loop | tmp tree with `a -> b -> a` |
| snapshot | `Vec<RawAsset>` serialized JSON for fixture tree is stable | `fixtures/source-fs/tree-1` |
| determinism | re-running scan twice produces byte-identical JSON | `fixtures/source-fs/tree-1` |

All tests run under `cargo test -p kb-source-fs` with no network and no model.
All tests run under `cargo test -p kebab-source-fs` with no network and no model.
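
The first test-plan row requires `./a/b.md`, `a//b.md`, and `a/b.md` to normalize identically. A small lexical normalizer can meet that. This is a sketch: the name `normalize_workspace_path` is hypothetical. It rejects `..` segments and absolute paths instead of resolving them, an assumption that keeps a workspace path from ever pointing outside the root.

```rust
/// Lexically normalize a workspace-relative path to POSIX form.
/// Empty segments (from `//`) and `.` segments are dropped; `..` and
/// absolute paths are rejected rather than resolved.
fn normalize_workspace_path(raw: &str) -> Option<String> {
    if raw.starts_with('/') {
        return None;
    }
    let mut parts: Vec<&str> = Vec::new();
    for seg in raw.split('/') {
        match seg {
            "" | "." => continue,
            ".." => return None,
            s => parts.push(s),
        }
    }
    if parts.is_empty() {
        None
    } else {
        Some(parts.join("/"))
    }
}
```

The normalizer is purely lexical and never touches the filesystem, so this unit test needs no temp directory.
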

## Definition of Done

- [ ] `cargo check -p kb-source-fs` passes
- [ ] `cargo test -p kb-source-fs` passes
- [ ] `cargo check -p kebab-source-fs` passes
- [ ] `cargo test -p kebab-source-fs` passes
- [ ] Snapshot test `fixtures/source-fs/tree-1` round-trips deterministically
- [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kb-source-fs`)
- [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kebab-source-fs`)
- [ ] PR description links to design §3.3, §6.2, §7.2

## Out of scope

- File watching (P+).
- Asset copy/reference storage on disk (`kb-store-sqlite` task p1-6).
- Asset copy/reference storage on disk (`kebab-store-sqlite` task p1-6).
- Non-fs source connectors (HTTP, S3 — P+).

## Risks / notes

@@ -1,29 +1,29 @@
---
phase: P1
component: kb-parse-md (frontmatter submodule)
component: kebab-parse-md (frontmatter submodule)
task_id: p1-2
title: "Markdown frontmatter parsing → Metadata"
status: completed
depends_on: [p0-1]
unblocks: [p1-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_sections: [design §3.6 Metadata, design §3.7b kb-parse-types (Warning), design §0 Q9 frontmatter, design §10 errors]
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [design §3.6 Metadata, design §3.7b kebab-parse-types (Warning), design §0 Q9 frontmatter, design §10 errors]
---

# p1-2 — Markdown frontmatter parsing

## Goal

Parse YAML/TOML frontmatter from Markdown bytes into `kb_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`.
Parse YAML/TOML frontmatter from Markdown bytes into `kebab_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`.
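
Before any YAML or TOML parsing, the parser has to find the frontmatter block and count how many lines it occupies. That count is what keeps body line spans file-relative; compare `body_offset_lines` in p1-3. Below is a std-only sketch. It assumes the common convention of `---` fences for YAML and `+++` fences for TOML. `split_frontmatter`, `FrontmatterKind`, and the `(kind, text, lines_consumed)` return shape are hypothetical and are not the crate-internal `FrontmatterSpan`.

```rust
/// Frontmatter syntax, inferred from the opening fence (assumed convention:
/// `---` for YAML, `+++` for TOML).
#[derive(Debug, PartialEq)]
enum FrontmatterKind {
    Yaml,
    Toml,
}

/// Returns (kind, raw frontmatter text, lines consumed including both fences),
/// or `None` when the input has no complete frontmatter block.
fn split_frontmatter(text: &str) -> Option<(FrontmatterKind, &str, u32)> {
    let first = text.lines().next()?;
    let (kind, fence) = match first.trim_end() {
        "---" => (FrontmatterKind::Yaml, "---"),
        "+++" => (FrontmatterKind::Toml, "+++"),
        _ => return None,
    };
    // Byte offset just past the opening fence line.
    let start = text.find('\n')? + 1;
    let mut end = start;
    let mut lines: u32 = 1; // the opening fence
    for line in text[start..].split_inclusive('\n') {
        lines += 1;
        if line.trim_end() == fence {
            return Some((kind, &text[start..end], lines));
        }
        end += line.len();
    }
    None // unterminated fence: treat the whole input as body
}
```

For `"---\ntitle: x\n---\nbody\n"` the sketch returns 3 lines consumed, so the body starts on file line 4. The malformed-YAML rule (discard frontmatter, keep body, emit a warning) would apply after this split, once the YAML or TOML parser rejects the extracted text.
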

## Why now / why this size

Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kb-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9.
Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kebab-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9.

## Allowed dependencies

- `kb-core`
- `kb-parse-types` (provides shared `Warning` + `WarningKind` per design §3.7b)
- `kebab-core`
- `kebab-parse-types` (provides shared `Warning` + `WarningKind` per design §3.7b)
- `serde`
- `serde_yaml` (or `yaml-rust2`) for YAML
- `toml` for TOML
@@ -33,7 +33,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from

## Forbidden dependencies

- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-tui`, `kb-desktop`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `pulldown-cmark` (block parser is a sibling task)
- `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-tui`, `kebab-desktop`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `pulldown-cmark` (block parser is a sibling task)

## Inputs

@@ -46,7 +46,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from

| output | type | downstream |
|--------|------|------------|
| `(Metadata, Option<FrontmatterSpan>, Vec<kb_parse_types::Warning>)` | tuple | `kb-normalize` → CanonicalDocument |
| `(Metadata, Option<FrontmatterSpan>, Vec<kebab_parse_types::Warning>)` | tuple | `kebab-normalize` → CanonicalDocument |

## Public surface (signatures only — no new types)

@@ -54,10 +54,10 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from
pub fn parse_frontmatter(
    bytes: &[u8],
    hints: &BodyHints,
) -> anyhow::Result<(kb_core::Metadata, Option<FrontmatterSpan>, Vec<kb_parse_types::Warning>)>;
) -> anyhow::Result<(kebab_core::Metadata, Option<FrontmatterSpan>, Vec<kebab_parse_types::Warning>)>;
```

`Warning` / `WarningKind` come from `kb-parse-types` (shared with `p1-3` blocks parser and downstream `kb-normalize`). `FrontmatterSpan` is crate-internal; if any new public type is needed, STOP and update the frozen design doc first.
`Warning` / `WarningKind` come from `kebab-parse-types` (shared with `p1-3` blocks parser and downstream `kebab-normalize`). `FrontmatterSpan` is crate-internal; if any new public type is needed, STOP and update the frozen design doc first.

## Behavior contract

@@ -68,7 +68,7 @@ pub fn parse_frontmatter(
- `source_type` default `markdown`; `trust_level` default `primary`.
- `aliases`, `tags` default empty.
- Unknown keys → `metadata.user` (`serde_json::Map`), preserved verbatim, no warning.
- Unknown enum value (e.g. `trust_level: weird`) → emit `kb_parse_types::Warning { kind: WarningKind::MalformedFrontmatter, note: "unknown trust_level=weird, defaulted to primary" }` + ingest continues with default.
- Unknown enum value (e.g. `trust_level: weird`) → emit `kebab_parse_types::Warning { kind: WarningKind::MalformedFrontmatter, note: "unknown trust_level=weird, defaulted to primary" }` + ingest continues with default.
- Malformed YAML → frontmatter discarded, body still parsed, `Warning { kind: WarningKind::MalformedFrontmatter, note: "<error msg>" }` emitted.
- No frontmatter at all → defaults applied silently.
- `id:` field captured into `metadata.user_id_alias` (alias only — does NOT influence `doc_id` per design §4.2).
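
The unknown-enum rule above (warn, then continue with the default) fits in a single match. In this sketch, `TrustLevel::Secondary` is an invented second variant. `Warning` and `WarningKind` are local mirrors of the `kebab_parse_types` types, reduced to the fields quoted in the contract. The note text matches the example in the bullet.

```rust
/// Hypothetical trust levels; only `primary` is named in the contract.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TrustLevel {
    Primary,
    Secondary,
}

/// Local mirrors of the shared warning types (fields as quoted in the spec).
#[derive(Debug, PartialEq)]
enum WarningKind {
    MalformedFrontmatter,
}

#[derive(Debug, PartialEq)]
struct Warning {
    kind: WarningKind,
    note: String,
}

/// Absent key: default silently. Unknown value: default plus a warning.
fn resolve_trust_level(raw: Option<&str>, warnings: &mut Vec<Warning>) -> TrustLevel {
    match raw {
        None | Some("primary") => TrustLevel::Primary,
        Some("secondary") => TrustLevel::Secondary,
        Some(other) => {
            warnings.push(Warning {
                kind: WarningKind::MalformedFrontmatter,
                note: format!("unknown trust_level={other}, defaulted to primary"),
            });
            TrustLevel::Primary
        }
    }
}
```
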
@@ -91,12 +91,12 @@ pub fn parse_frontmatter(
| snapshot | `fixtures/markdown/frontmatter-only.md` produces stable JSON | fixture |
| snapshot | mixed-language body with no `lang:` detects `ko` or `en` | `fixtures/markdown/mixed-lang.md` |

All tests under `cargo test -p kb-parse-md --lib frontmatter`.
All tests under `cargo test -p kebab-parse-md --lib frontmatter`.

## Definition of Done

- [ ] `cargo check -p kb-parse-md` passes
- [ ] `cargo test -p kb-parse-md frontmatter` passes
- [ ] `cargo check -p kebab-parse-md` passes
- [ ] `cargo test -p kebab-parse-md frontmatter` passes
- [ ] No `pulldown-cmark` import in this submodule
- [ ] Snapshot tests stable across two consecutive runs
- [ ] PR links design §0 Q9, §3.6

@@ -1,20 +1,20 @@
---
phase: P1
component: kb-parse-md (blocks submodule)
component: kebab-parse-md (blocks submodule)
task_id: p1-3
title: "Markdown body → Block tree with line spans"
status: completed
depends_on: [p0-1]
unblocks: [p1-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_sections: [§3.4 Block, §3.4 SourceSpan, §3.7b kb-parse-types, §0 Q3 citation]
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 Block, §3.4 SourceSpan, §3.7b kebab-parse-types, §0 Q3 citation]
---

# p1-3 — Markdown body → Block tree

## Goal

Parse Markdown body bytes into a flat `Vec<kb_parse_types::ParsedBlock>` with heading paths and line ranges preserved, ready for `kb-normalize` to lift into `CanonicalDocument`.
Parse Markdown body bytes into a flat `Vec<kebab_parse_types::ParsedBlock>` with heading paths and line ranges preserved, ready for `kebab-normalize` to lift into `CanonicalDocument`.

## Why now / why this size

@@ -22,15 +22,15 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from

## Allowed dependencies

- `kb-core`
- `kb-parse-types` (defines `ParsedBlock`, `ParsedPayload`, `Warning`)
- `kebab-core`
- `kebab-parse-types` (defines `ParsedBlock`, `ParsedPayload`, `Warning`)
- `pulldown-cmark` (CommonMark with source-map; GFM tables enabled via feature)
- `serde`
- `thiserror`

## Forbidden dependencies

- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `kb-tui`, `kb-desktop`, `comrak` (alternative parser; pick one)
- `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `kebab-tui`, `kebab-desktop`, `comrak` (alternative parser; pick one)

## Inputs

@@ -43,8 +43,8 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from

| output | type | downstream |
|--------|------|------------|
| `Vec<kb_parse_types::ParsedBlock>` | shared type from `kb-parse-types` | `kb-normalize` |
| `Vec<kb_parse_types::Warning>` | shared type | propagated into Provenance |
| `Vec<kebab_parse_types::ParsedBlock>` | shared type from `kebab-parse-types` | `kebab-normalize` |
| `Vec<kebab_parse_types::Warning>` | shared type | propagated into Provenance |

## Public surface (signatures only — no new types)

@@ -52,10 +52,10 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
pub fn parse_blocks(
    body: &[u8],
    body_offset_lines: u32,
) -> anyhow::Result<(Vec<kb_parse_types::ParsedBlock>, Vec<kb_parse_types::Warning>)>;
) -> anyhow::Result<(Vec<kebab_parse_types::ParsedBlock>, Vec<kebab_parse_types::Warning>)>;
```

`ParsedBlock` is defined in `kb-parse-types` (design §3.7b). `kb-parse-md` does NOT define its own; it consumes the shared type. Lift to `kb_core::Block` (with `BlockId` assignment) is `kb-normalize`'s job (p1-4).
`ParsedBlock` is defined in `kebab-parse-types` (design §3.7b). `kebab-parse-md` does NOT define its own; it consumes the shared type. Lift to `kebab_core::Block` (with `BlockId` assignment) is `kebab-normalize`'s job (p1-4).

## Behavior contract

@@ -63,9 +63,9 @@ pub fn parse_blocks(
- Heading tree: every block records its ancestor heading texts in order (e.g., `["아키텍처", "Chunking 정책"]`).
- Code blocks: `ParsedPayload::Code { lang: Some("rust"), code }` — fenced content not split.
- Tables: GFM tables produce `ParsedPayload::Table { headers, rows }`; if a table cell is malformed, fall back to `ParsedPayload::Paragraph` + `Warning::MalformedTable`.
- Image references: `![alt](src)` produces `ParsedPayload::ImageRef { src, alt }`. `AssetId` resolution happens later in `kb-normalize` (when image src can be matched to a workspace asset).
- Lists: ordered/unordered preserved via `ParsedPayload::List { ordered, items }`; nested list items flattened so each `items[i]` is a `Vec<kb_core::Inline>` for one top-level item.
- Inline elements: only `Text`, `Code`, `Link`, `Strong`, `Emph` (per `kb_core::Inline` per design §3.4). Drop other inlines silently.
- Image references: `![alt](src)` produces `ParsedPayload::ImageRef { src, alt }`. `AssetId` resolution happens later in `kebab-normalize` (when image src can be matched to a workspace asset).
- Lists: ordered/unordered preserved via `ParsedPayload::List { ordered, items }`; nested list items flattened so each `items[i]` is a `Vec<kebab_core::Inline>` for one top-level item.
- Inline elements: only `Text`, `Code`, `Link`, `Strong`, `Emph` (per `kebab_core::Inline` per design §3.4). Drop other inlines silently.
- Malformed input never panics. Worst case: empty `Vec<ParsedBlock>` + `Warning::ExtractFailed`.
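
Recording ancestor heading texts reduces to a stack keyed by heading level. A new heading pops every open heading at the same or deeper level, then pushes itself. The sketch below runs over a hypothetical flattened `Event` list; a real implementation would drive the same logic from `pulldown-cmark` events. Whether heading blocks carry a path of their own is left out.

```rust
/// Hypothetical flattened parser event: a heading (level 1..=6 plus text)
/// or any non-heading block.
enum Event<'a> {
    Heading { level: u8, text: &'a str },
    Block { kind: &'a str },
}

/// For each non-heading block, return (kind, ancestor heading texts in order).
fn heading_paths<'a>(events: &[Event<'a>]) -> Vec<(&'a str, Vec<String>)> {
    let mut stack: Vec<(u8, String)> = Vec::new();
    let mut out = Vec::new();
    for ev in events {
        match ev {
            Event::Heading { level, text } => {
                // A heading closes every open section at the same or deeper level.
                while stack.last().map_or(false, |(l, _)| *l >= *level) {
                    stack.pop();
                }
                stack.push((*level, text.to_string()));
            }
            Event::Block { kind } => {
                let path: Vec<String> = stack.iter().map(|(_, t)| t.clone()).collect();
                out.push((*kind, path));
            }
        }
    }
    out
}
```
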

## Storage / wire effects
@@ -86,12 +86,12 @@ pub fn parse_blocks(
| snapshot | `fixtures/markdown/nested-headings.md` → ParsedBlock JSON stable | fixture |
| snapshot | `fixtures/markdown/code-and-table.md` → JSON stable | fixture |

All tests under `cargo test -p kb-parse-md --lib blocks`.
All tests under `cargo test -p kebab-parse-md --lib blocks`.

## Definition of Done

- [ ] `cargo check -p kb-parse-md` passes
- [ ] `cargo test -p kb-parse-md blocks` passes
- [ ] `cargo check -p kebab-parse-md` passes
- [ ] `cargo test -p kebab-parse-md blocks` passes
- [ ] Snapshot tests stable across two runs
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.4
@@ -99,7 +99,7 @@ All tests under `cargo test -p kb-parse-md --lib blocks`.
## Out of scope

- Frontmatter (p1-2).
- Lifting `kb_parse_types::ParsedBlock` → `kb_core::Block` with `BlockId` (p1-4 normalize).
- Lifting `kebab_parse_types::ParsedBlock` → `kebab_core::Block` with `BlockId` (p1-4 normalize).
- Chunking (p1-5).

## Risks / notes

@@ -1,30 +1,30 @@
---
phase: P1
component: kb-normalize
component: kebab-normalize
task_id: p1-4
title: "Lift parser output → CanonicalDocument with deterministic IDs"
status: completed
depends_on: [p1-2, p1-3]
unblocks: [p1-5, p1-6]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_sections: [§3.4, §3.7b kb-parse-types, §4 ID recipe, §3.6 Provenance, §8 module boundaries]
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4, §3.7b kebab-parse-types, §4 ID recipe, §3.6 Provenance, §8 module boundaries]
---

# p1-4 — Lift to CanonicalDocument

## Goal

Combine `Metadata` (p1-2) + `Vec<kb_parse_types::ParsedBlock>` (p1-3) + `RawAsset` (p1-1) into a `kb_core::CanonicalDocument` with deterministic `doc_id` and `block_id`s per design §4 recipe.
Combine `Metadata` (p1-2) + `Vec<kebab_parse_types::ParsedBlock>` (p1-3) + `RawAsset` (p1-1) into a `kebab_core::CanonicalDocument` with deterministic `doc_id` and `block_id`s per design §4 recipe.

## Why now / why this size

Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate.
Single responsibility: ID generation + struct assembly. Keeps `kebab-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate.

## Allowed dependencies

- `kb-core`
- `kb-parse-types` (input shapes — `ParsedBlock`, `ParsedPayload`, `Warning`)
- `kb-config`
- `kebab-core`
- `kebab-parse-types` (input shapes — `ParsedBlock`, `ParsedPayload`, `Warning`)
- `kebab-config`
- `serde`
- `serde-json-canonicalizer` (canonical JSON for ID hashing)
- `blake3`
@@ -34,36 +34,36 @@ Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` pure

## Forbidden dependencies

- `kb-source-fs`, `kb-parse-md` (consumed via shared `kb-parse-types` only — `kb-parse-md` must NOT appear in this crate's `cargo tree`), `kb-parse-pdf`, `kb-parse-image`, `kb-parse-audio`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md` (consumed via shared `kebab-parse-types` only — `kebab-parse-md` must NOT appear in this crate's `cargo tree`), `kebab-parse-pdf`, `kebab-parse-image`, `kebab-parse-audio`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

## Inputs

| input | type | source |
|-------|------|--------|
| `RawAsset` | `kb_core::RawAsset` | p1-1 |
| `Metadata` + frontmatter span + warnings | `(kb_core::Metadata, Option<FrontmatterSpan>, Vec<kb_parse_types::Warning>)` | p1-2 |
| `Vec<ParsedBlock>` + warnings | `(Vec<kb_parse_types::ParsedBlock>, Vec<kb_parse_types::Warning>)` | p1-3 |
| `parser_version` | `kb_core::ParserVersion` | constant in `kb-parse-md` |
| `RawAsset` | `kebab_core::RawAsset` | p1-1 |
| `Metadata` + frontmatter span + warnings | `(kebab_core::Metadata, Option<FrontmatterSpan>, Vec<kebab_parse_types::Warning>)` | p1-2 |
| `Vec<ParsedBlock>` + warnings | `(Vec<kebab_parse_types::ParsedBlock>, Vec<kebab_parse_types::Warning>)` | p1-3 |
| `parser_version` | `kebab_core::ParserVersion` | constant in `kebab-parse-md` |

## Outputs

| output | type | downstream |
|--------|------|------------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk`, `kb-store-sqlite` |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk`, `kebab-store-sqlite` |

## Public surface (signatures only — no new types)

```rust
pub fn build_canonical_document(
    asset: &kb_core::RawAsset,
    metadata: kb_core::Metadata,
    blocks: Vec<kb_parse_types::ParsedBlock>,
    parser_version: &kb_core::ParserVersion,
    warnings: Vec<kb_parse_types::Warning>,
) -> anyhow::Result<kb_core::CanonicalDocument>;
    asset: &kebab_core::RawAsset,
    metadata: kebab_core::Metadata,
    blocks: Vec<kebab_parse_types::ParsedBlock>,
    parser_version: &kebab_core::ParserVersion,
    warnings: Vec<kebab_parse_types::Warning>,
) -> anyhow::Result<kebab_core::CanonicalDocument>;

pub fn id_for_doc(workspace_path: &kb_core::WorkspacePath, asset: &kb_core::AssetId, parser_version: &kb_core::ParserVersion) -> kb_core::DocumentId;
pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kb_core::SourceSpan) -> kb_core::BlockId;
pub fn id_for_doc(workspace_path: &kebab_core::WorkspacePath, asset: &kebab_core::AssetId, parser_version: &kebab_core::ParserVersion) -> kebab_core::DocumentId;
pub fn id_for_block(doc: &kebab_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kebab_core::SourceSpan) -> kebab_core::BlockId;
```
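
The exact ID recipe lives in design §4 and is not reproduced in this task. The sketch below only illustrates the property the tests check: a fixed field order plus an unambiguous encoding gives byte-identical IDs across runs. It length-prefixes each field and hashes with 64-bit FNV-1a, which stands in here for BLAKE3 only so the example needs nothing beyond `std`; the crate itself uses `blake3` and `serde-json-canonicalizer`. Modelling `SourceSpan` as a `(start_line, end_line)` tuple is an assumption.

```rust
/// 64-bit FNV-1a: a tiny, fully specified hash whose output is stable across
/// runs and platforms (unlike std's `DefaultHasher`). Stand-in for BLAKE3.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for b in bytes {
        h ^= u64::from(*b);
        h = h.wrapping_mul(0x0100_0000_01b3); // FNV prime
    }
    h
}

/// Append a length-prefixed field so field boundaries are unambiguous.
fn push_field(buf: &mut Vec<u8>, field: &[u8]) {
    buf.extend_from_slice(&(field.len() as u64).to_le_bytes());
    buf.extend_from_slice(field);
}

/// Hypothetical block-ID sketch: fixed field order, length-prefixed, hashed.
fn id_for_block(doc_id: &str, kind: &str, heading_path: &[String], ordinal: u32, span: (u32, u32)) -> String {
    let mut buf = Vec::new();
    push_field(&mut buf, doc_id.as_bytes());
    push_field(&mut buf, kind.as_bytes());
    push_field(&mut buf, &(heading_path.len() as u64).to_le_bytes());
    for h in heading_path {
        push_field(&mut buf, h.as_bytes());
    }
    push_field(&mut buf, &ordinal.to_le_bytes());
    push_field(&mut buf, &span.0.to_le_bytes());
    push_field(&mut buf, &span.1.to_le_bytes());
    format!("{:016x}", fnv1a64(&buf))
}
```

Length-prefixing is what makes the field lists `("ab", "c")` and `("a", "bc")` encode differently. Hashing a plain concatenation would give both the same ID.
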

## Behavior contract
@@ -91,14 +91,14 @@ pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[Strin
| unit | provenance contains Discovered/Parsed/Normalized in order | inline |
| snapshot | `fixtures/markdown/code-and-table.md` → CanonicalDocument JSON stable (incl. all IDs) | fixture |

All tests under `cargo test -p kb-normalize`.
All tests under `cargo test -p kebab-normalize`.

## Definition of Done

- [ ] `cargo check -p kb-normalize` passes
- [ ] `cargo test -p kb-normalize` passes
- [ ] `cargo check -p kebab-normalize` passes
- [ ] `cargo test -p kebab-normalize` passes
- [ ] Determinism test runs ≥ 1000 iterations under 1 second
- [ ] No `kb-parse-md` (or any other parser crate) appears in `cargo tree -p kb-normalize` — input types come from `kb-parse-types` only
- [ ] No `kebab-parse-md` (or any other parser crate) appears in `cargo tree -p kebab-normalize` — input types come from `kebab-parse-types` only
- [ ] PR links design §3.7b, §4.2, §4.3, §8

## Out of scope

@@ -1,12 +1,12 @@
---
phase: P1
component: kb-chunk
component: kebab-chunk
task_id: p1-5
title: "Markdown heading-aware chunker (md-heading-v1)"
status: completed
depends_on: [p1-4]
unblocks: [p1-6, p2-2, p3-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.5 Chunk, §4.2 chunk_id recipe, §7.2 Chunker, §0 Q3 citation]
---

@@ -22,8 +22,8 @@ The first concrete `Chunker`. Establishes how subsequent chunkers (PDF page chun

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`
- `blake3` (policy_hash)
- `serde-json-canonicalizer`
@@ -31,30 +31,30 @@ The first concrete `Chunker`. Establishes how subsequent chunkers (PDF page chun

## Forbidden dependencies

- `kb-source-fs`, `kb-parse-md`, `kb-normalize` (consumes `CanonicalDocument` only via `kb-core`), `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize` (consumes `CanonicalDocument` only via `kebab-core`), `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

## Inputs

| input | type | source |
|-------|------|--------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 |
| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` from config |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 |
| `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` from config |

## Outputs

| output | type | downstream |
|--------|------|------------|
| `Vec<Chunk>` | `kb_core::Chunk` | `kb-store-sqlite` (p1-6), `kb-embed*` (P3) |
| `Vec<Chunk>` | `kebab_core::Chunk` | `kebab-store-sqlite` (p1-6), `kebab-embed*` (P3) |

## Public surface (signatures only — no new types)

```rust
pub struct MdHeadingV1Chunker;

impl kb_core::Chunker for MdHeadingV1Chunker {
    fn chunker_version(&self) -> kb_core::ChunkerVersion;
    fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String;
    fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result<Vec<kb_core::Chunk>>;
impl kebab_core::Chunker for MdHeadingV1Chunker {
    fn chunker_version(&self) -> kebab_core::ChunkerVersion;
    fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String;
    fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result<Vec<kebab_core::Chunk>>;
}
```
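
Below is one greedy way a heading-aware chunker could pack blocks. A chunk never crosses a heading-path boundary. Consecutive blocks are packed while the running size stays within a budget. A single oversized block becomes its own chunk rather than being split, consistent with p1-3 keeping fenced code in one block. This is not the `md-heading-v1` recipe. `max_chars`, measuring size in `char`s, and the `Block` / `Chunk` shapes are assumptions standing in for `ChunkPolicy` and the `kebab_core` types.

```rust
/// Hypothetical minimal block view: heading path plus rendered text.
struct Block {
    heading_path: Vec<String>,
    text: String,
}

/// Hypothetical chunk: its heading path and the indices of its blocks.
#[derive(Debug, PartialEq)]
struct Chunk {
    heading_path: Vec<String>,
    block_indices: Vec<usize>,
}

/// Greedy packing: same heading path and within budget -> extend the current
/// chunk; otherwise start a new one. Blocks themselves are never split.
fn chunk_blocks(blocks: &[Block], max_chars: usize) -> Vec<Chunk> {
    let mut out: Vec<Chunk> = Vec::new();
    let mut size = 0usize; // chars in the current (last) chunk
    for (i, b) in blocks.iter().enumerate() {
        let len = b.text.chars().count();
        let fits = match out.last() {
            Some(c) => c.heading_path == b.heading_path && size + len <= max_chars,
            None => false,
        };
        if fits {
            if let Some(c) = out.last_mut() {
                c.block_indices.push(i);
            }
            size += len;
        } else {
            out.push(Chunk { heading_path: b.heading_path.clone(), block_indices: vec![i] });
            size = len;
        }
    }
    out
}
```

Because the packing depends only on block order, heading paths, and the budget, identical input and policy always yield identical chunk boundaries. That is a precondition for the determinism row in the test plan.
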

@@ -91,12 +91,12 @@ impl kb_core::Chunker for MdHeadingV1Chunker {
| determinism | identical input + identical policy → identical chunk_ids | inline |
| snapshot | `fixtures/markdown/long-section.md` → `Vec<Chunk>` JSON stable | fixture |

All tests under `cargo test -p kb-chunk`.
All tests under `cargo test -p kebab-chunk`.

## Definition of Done

- [ ] `cargo check -p kb-chunk` passes
- [ ] `cargo test -p kb-chunk` passes
- [ ] `cargo check -p kebab-chunk` passes
- [ ] `cargo test -p kebab-chunk` passes
- [ ] Snapshot stable across two runs
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.5, §4.2

@@ -1,12 +1,12 @@
---
phase: P1
component: kb-store-sqlite (P1 subset)
component: kebab-store-sqlite (P1 subset)
task_id: p1-6
title: "SQLite store: assets/documents/blocks/chunks + asset writer + migrations"
status: completed
depends_on: [p1-1, p1-4, p1-5]
unblocks: [p2-1, p3-3, p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§5 DDL (5.1, 5.2, 5.3, 5.4, 5.5 chunks only — FTS handled in p2-1), §5.7 jobs/ingest_runs, §5.8 transactions, §6.3 data_dir layout]
---

@@ -22,8 +22,8 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `rusqlite` (with `bundled-sqlcipher` disabled; use `bundled` feature)
- `refinery` for migrations
- `serde_json`
@@ -34,7 +34,7 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT

## Forbidden dependencies

- `kb-source-fs` (only types via `kb-core`), `kb-parse-md`, `kb-normalize`, `kb-chunk` (only types via `kb-core`), `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs` (only types via `kebab-core`), `kebab-parse-md`, `kebab-normalize`, `kebab-chunk` (only types via `kebab-core`), `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

## Inputs

@@ -42,17 +42,17 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
|-------|------|--------|
| migrations | `migrations/V001__init.sql` | repo |
| `RawAsset` + bytes | `(RawAsset, Vec<u8>)` | p1-1 + reader |
| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 |
| `Vec<Chunk>` | `kb_core::Chunk` | p1-5 |
| `IngestRun` aggregates | `(scope, counts, duration)` | `kb-app` |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 |
| `Vec<Chunk>` | `kebab_core::Chunk` | p1-5 |
| `IngestRun` aggregates | `(scope, counts, duration)` | `kebab-app` |

## Outputs

| output | type | downstream |
|--------|------|------------|
| `data_dir/kb.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | – | every later phase |
| `data_dir/kebab.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | – | every later phase |
| `data_dir/assets/<aa>/<asset_id>` bytes (when copied) | – | future re-extraction, integrity verification |
| `IngestReport` (wire schema v1) | `kb_core::IngestReport` | `kb-cli`, eval |
| `IngestReport` (wire schema v1) | `kebab_core::IngestReport` | `kebab-cli`, eval |

## Public surface (signatures only — no new types)

@@ -60,23 +60,23 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT
pub struct SqliteStore { /* internal */ }

impl SqliteStore {
    pub fn open(config: &kb_config::Config) -> anyhow::Result<Self>;
    pub fn open(config: &kebab_config::Config) -> anyhow::Result<Self>;
    pub fn run_migrations(&self) -> anyhow::Result<()>;

    pub fn put_asset_with_bytes(&self, asset: &kb_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>;
    pub fn put_asset_with_bytes(&self, asset: &kebab_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>;
}

impl kb_core::DocumentStore for SqliteStore {
    fn put_asset(&self, a: &kb_core::RawAsset) -> anyhow::Result<()>;
    fn put_document(&self, d: &kb_core::CanonicalDocument) -> anyhow::Result<()>;
    fn put_blocks(&self, doc: &kb_core::DocumentId, blocks: &[kb_core::Block]) -> anyhow::Result<()>;
    fn put_chunks(&self, doc: &kb_core::DocumentId, chunks: &[kb_core::Chunk]) -> anyhow::Result<()>;
    fn get_document(&self, id: &kb_core::DocumentId) -> anyhow::Result<Option<kb_core::CanonicalDocument>>;
    fn get_chunk(&self, id: &kb_core::ChunkId) -> anyhow::Result<Option<kb_core::Chunk>>;
    fn list_documents(&self, filter: &kb_core::DocFilter) -> anyhow::Result<Vec<kb_core::DocSummary>>;
impl kebab_core::DocumentStore for SqliteStore {
    fn put_asset(&self, a: &kebab_core::RawAsset) -> anyhow::Result<()>;
    fn put_document(&self, d: &kebab_core::CanonicalDocument) -> anyhow::Result<()>;
    fn put_blocks(&self, doc: &kebab_core::DocumentId, blocks: &[kebab_core::Block]) -> anyhow::Result<()>;
    fn put_chunks(&self, doc: &kebab_core::DocumentId, chunks: &[kebab_core::Chunk]) -> anyhow::Result<()>;
    fn get_document(&self, id: &kebab_core::DocumentId) -> anyhow::Result<Option<kebab_core::CanonicalDocument>>;
    fn get_chunk(&self, id: &kebab_core::ChunkId) -> anyhow::Result<Option<kebab_core::Chunk>>;
    fn list_documents(&self, filter: &kebab_core::DocFilter) -> anyhow::Result<Vec<kebab_core::DocSummary>>;
}

impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
impl kebab_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
```

## Behavior contract
@@ -96,7 +96,7 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }

 ## Storage / wire effects

-- Writes: `kb.sqlite` (multiple tables), `data_dir/assets/<aa>/<asset_id>` (copied case).
+- Writes: `kebab.sqlite` (multiple tables), `data_dir/assets/<aa>/<asset_id>` (copied case).
 - Reads on subsequent calls: same DB.

 ## Test plan
@@ -112,14 +112,14 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ }
 | contract | DocumentStore trait round-trip for fixture document | `fixtures/markdown/code-and-table.md` |
 | snapshot | IngestReport JSON for fixture run | fixture |

-All tests under `cargo test -p kb-store-sqlite` with no network.
+All tests under `cargo test -p kebab-store-sqlite` with no network.

 ## Definition of Done

-- [ ] `cargo check -p kb-store-sqlite` passes
-- [ ] `cargo test -p kb-store-sqlite` passes
+- [ ] `cargo check -p kebab-store-sqlite` passes
+- [ ] `cargo test -p kebab-store-sqlite` passes
 - [ ] migration `V001__init.sql` matches design §5 verbatim (diff-checked in CI)
-- [ ] Writes to `~/.local/share/kb/` are gated by `kb-config`'s `data_dir` and never escape it
+- [ ] Writes to `~/.local/share/kebab/` are gated by `kebab-config`'s `data_dir` and never escape it
 - [ ] No imports outside Allowed dependencies
 - [ ] PR links design §5

@@ -132,5 +132,5 @@ All tests under `cargo test -p kb-store-sqlite` with no network.

 ## Risks / notes

-- WAL mode requires careful test cleanup: tests must drop the connection before removing `kb.sqlite-wal` / `-shm`.
+- WAL mode requires careful test cleanup: tests must drop the connection before removing `kebab.sqlite-wal` / `-shm`.
 - Asset directory shard prefix uses `asset_id[..2]`; using `asset_id[..1]` would create at most 16 dirs (insufficient).

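The p1-6 Risks note above shards assets as `data_dir/assets/<aa>/<asset_id>` using `asset_id[..2]`. A minimal sketch of that path mapping, assuming asset ids are hex digests (so two characters give 16 × 16 = 256 shard directories, versus 16 for one). `asset_path` is a hypothetical helper name, not the crate's actual API:

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: maps an asset id to `data_dir/assets/<aa>/<asset_id>`,
/// where `<aa>` is the id's first two characters.
/// Assumes ids are hex digests; anything else is rejected.
fn asset_path(data_dir: &Path, asset_id: &str) -> Option<PathBuf> {
    // Require at least two chars for the shard and only ASCII hex digits,
    // so the id can never contain `/`, `..`, or a multi-byte char that `[..2]` would split.
    if asset_id.len() < 2 || !asset_id.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let shard = &asset_id[..2];
    Some(data_dir.join("assets").join(shard).join(asset_id))
}
```

Restricting ids to hex digits also means the joined path cannot contain a separator or `..`, which is one way to keep writes inside `data_dir` as the DoD requires.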
@@ -1,12 +1,12 @@
 ---
 phase: P2
-component: kb-store-sqlite (FTS5 migration)
+component: kebab-store-sqlite (FTS5 migration)
 task_id: p2-1
 title: "FTS5 virtual table + triggers (V002 migration)"
 status: completed
 depends_on: [p1-6]
 unblocks: [p2-2]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
 contract_sections: [§5.5 chunks_fts + triggers, §9 versioning]
 ---

@@ -18,19 +18,19 @@ Add `chunks_fts` virtual table and three sync triggers via migration `V002__fts.

 ## Why now / why this size

-`chunks_fts` is the lexical index for `kb-search`. Splitting it from p1-6 keeps P1 focused on relational data; bringing it as `V002` lets users upgrade an existing P1 DB without re-ingesting.
+`chunks_fts` is the lexical index for `kebab-search`. Splitting it from p1-6 keeps P1 focused on relational data; bringing it as `V002` lets users upgrade an existing P1 DB without re-ingesting.

 ## Allowed dependencies

-- `kb-core`
-- `kb-config`
-- `kb-store-sqlite` (extends migrations)
+- `kebab-core`
+- `kebab-config`
+- `kebab-store-sqlite` (extends migrations)
 - `rusqlite`
 - `refinery`

 ## Forbidden dependencies

-- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-search` (consumer is p2-2), `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
+- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-search` (consumer is p2-2), `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

 ## Inputs

@@ -52,7 +52,7 @@ Add `chunks_fts` virtual table and three sync triggers via migration `V002__fts.
 pub fn rebuild_chunks_fts(conn: &rusqlite::Connection) -> anyhow::Result<()>;
 ```

-(Used by `kb index --rebuild-fts`. Re-runs `INSERT INTO chunks_fts SELECT ... FROM chunks` after `DELETE FROM chunks_fts;`.)
+(Used by `kebab index --rebuild-fts`. Re-runs `INSERT INTO chunks_fts SELECT ... FROM chunks` after `DELETE FROM chunks_fts;`.)

 ## Behavior contract

@@ -64,7 +64,7 @@ pub fn rebuild_chunks_fts(conn: &rusqlite::Connection) -> anyhow::Result<()>;

 ## Storage / wire effects

-- Writes: `chunks_fts` virtual table inside `kb.sqlite`.
+- Writes: `chunks_fts` virtual table inside `kebab.sqlite`.
 - Reads: existing `chunks` rows for backfill.

 ## Test plan
@@ -78,12 +78,12 @@ pub fn rebuild_chunks_fts(conn: &rusqlite::Connection) -> anyhow::Result<()>;
 | function | `rebuild_chunks_fts` produces deterministic content equal to fresh backfill | tmp DB |
 | migration | running `V002` twice is a no-op (refinery handles idempotency) | tmp DB |

-All tests under `cargo test -p kb-store-sqlite fts`.
+All tests under `cargo test -p kebab-store-sqlite fts`.

 ## Definition of Done

-- [ ] `cargo check -p kb-store-sqlite` passes
-- [ ] `cargo test -p kb-store-sqlite fts` passes
+- [ ] `cargo check -p kebab-store-sqlite` passes
+- [ ] `cargo test -p kebab-store-sqlite fts` passes
 - [ ] `migrations/V002__fts.sql` matches design §5.5 verbatim (CI diff check)
 - [ ] No imports outside Allowed dependencies
 - [ ] PR links design §5.5

@@ -1,12 +1,12 @@
 ---
 phase: P2
-component: kb-search (lexical mode)
+component: kebab-search (lexical mode)
 task_id: p2-2
 title: "Lexical Retriever via SQLite FTS5 + bm25 + citation"
 status: completed
 depends_on: [p2-1]
 unblocks: [p3-4, p4-3]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
 contract_sections: [§3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5/1.6 search output, §2.2 wire schema, §6.4 search settings]
 ---

@@ -14,38 +14,38 @@ contract_sections: [§3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5

 ## Goal

-Implement `kb_core::Retriever` for `SearchMode::Lexical` using SQLite FTS5. Returns `SearchHit` with `bm25` ranking, `snippet()`-derived preview, and proper W3C-fragment citation.
+Implement `kebab_core::Retriever` for `SearchMode::Lexical` using SQLite FTS5. Returns `SearchHit` with `bm25` ranking, `snippet()`-derived preview, and proper W3C-fragment citation.

 ## Why now / why this size

-First concrete `Retriever`. Lets `kb search --mode lexical` work without any embedding/LLM infrastructure. Establishes the SearchHit construction contract that hybrid (p3-4) reuses.
+First concrete `Retriever`. Lets `kebab search --mode lexical` work without any embedding/LLM infrastructure. Establishes the SearchHit construction contract that hybrid (p3-4) reuses.

 ## Allowed dependencies

-- `kb-core`
-- `kb-config`
-- `kb-store-sqlite` (read access to `chunks_fts` + `chunks` + `documents`)
+- `kebab-core`
+- `kebab-config`
+- `kebab-store-sqlite` (read access to `chunks_fts` + `chunks` + `documents`)
 - `rusqlite`
 - `tracing`
 - `thiserror`

 ## Forbidden dependencies

-- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
+- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

 ## Inputs

 | input | type | source |
 |-------|------|--------|
-| `SearchQuery` (mode=Lexical) | `kb_core::SearchQuery` | `kb-app::search` |
-| `kb-config::search` settings (`default_k`, `snippet_chars`) | `kb_config::Config` | runtime |
-| SQLite connection (read) | `rusqlite::Connection` | `kb-store-sqlite` |
+| `SearchQuery` (mode=Lexical) | `kebab_core::SearchQuery` | `kebab-app::search` |
+| `kebab-config::search` settings (`default_k`, `snippet_chars`) | `kebab_config::Config` | runtime |
+| SQLite connection (read) | `rusqlite::Connection` | `kebab-store-sqlite` |

 ## Outputs

 | output | type | downstream |
 |--------|------|------------|
-| `Vec<SearchHit>` | `kb_core::SearchHit` | `kb-cli` printer, `kb-rag` packer (P4), hybrid (p3-4) |
+| `Vec<SearchHit>` | `kebab_core::SearchHit` | `kebab-cli` printer, `kebab-rag` packer (P4), hybrid (p3-4) |

 ## Public surface (signatures only — no new types)

@@ -53,12 +53,12 @@ First concrete `Retriever`. Lets `kb search --mode lexical` work without any emb
 pub struct LexicalRetriever { /* internal: holds an Arc<rusqlite::Connection> + IndexVersion */ }

 impl LexicalRetriever {
-    pub fn new(store: std::sync::Arc<kb_store_sqlite::SqliteStore>, index_version: kb_core::IndexVersion) -> Self;
+    pub fn new(store: std::sync::Arc<kebab_store_sqlite::SqliteStore>, index_version: kebab_core::IndexVersion) -> Self;
 }

-impl kb_core::Retriever for LexicalRetriever {
-    fn search(&self, query: &kb_core::SearchQuery) -> anyhow::Result<Vec<kb_core::SearchHit>>;
-    fn index_version(&self) -> kb_core::IndexVersion;
+impl kebab_core::Retriever for LexicalRetriever {
+    fn search(&self, query: &kebab_core::SearchQuery) -> anyhow::Result<Vec<kebab_core::SearchHit>>;
+    fn index_version(&self) -> kebab_core::IndexVersion;
 }
 ```

@@ -98,8 +98,8 @@ impl kb_core::Retriever for LexicalRetriever {

 ## Storage / wire effects

-- Reads only. Never mutates `kb.sqlite`.
-- Wire: `Vec<SearchHit>` serialized via wire schema `search_hit.v1` when `kb-cli --json` is used.
+- Reads only. Never mutates `kebab.sqlite`.
+- Wire: `Vec<SearchHit>` serialized via wire schema `search_hit.v1` when `kebab-cli --json` is used.

 ## Test plan

@@ -114,13 +114,13 @@ impl kb_core::Retriever for LexicalRetriever {
 | determinism | identical query twice produces identical hit order and scores | tmp DB |
 | snapshot | `Vec<SearchHit>` JSON for fixed corpus stable | `fixtures/search/lexical/run-1.json` |

-All tests under `cargo test -p kb-search lexical`.
+All tests under `cargo test -p kebab-search lexical`.

 ## Definition of Done

-- [ ] `cargo check -p kb-search` passes
-- [ ] `cargo test -p kb-search lexical` passes
-- [ ] No imports outside Allowed dependencies (`cargo tree -p kb-search` audit)
+- [ ] `cargo check -p kebab-search` passes
+- [ ] `cargo test -p kebab-search lexical` passes
+- [ ] No imports outside Allowed dependencies (`cargo tree -p kebab-search` audit)
 - [ ] Output JSON conforms to `docs/wire-schema/v1/search_hit.schema.json`
 - [ ] PR links design §3.7, §0 Q3, §2.2


@@ -1,12 +1,12 @@
 ---
 phase: P3
-component: kb-embed (trait crate)
+component: kebab-embed (trait crate)
 task_id: p3-1
 title: "Embedder trait + EmbeddingInput/Kind validation"
 status: completed
 depends_on: [p0-1]
 unblocks: [p3-2, p3-3, p3-4]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
 contract_sections: [design §3.7 SearchHit.embedding_model, design §7.1 EmbeddingInput/Kind, design §7.2 Embedder, report §11 LLM/embedding split]
 ---

@@ -14,16 +14,16 @@ contract_sections: [design §3.7 SearchHit.embedding_model, design §7.1 Embeddi

 ## Goal

-Provide the `kb-embed` crate that re-exports `Embedder` trait, `EmbeddingInput`/`EmbeddingKind`, and offers a mock implementation for downstream tests. This task is **trait-only**; concrete adapters live in p3-2.
+Provide the `kebab-embed` crate that re-exports `Embedder` trait, `EmbeddingInput`/`EmbeddingKind`, and offers a mock implementation for downstream tests. This task is **trait-only**; concrete adapters live in p3-2.

 ## Why now / why this size

-Concrete adapters (fastembed, ollama-embed, candle) need a stable trait surface. Owning the trait + a mock implementation in a tiny crate keeps `kb-store-vector` and `kb-search` testable without touching real models.
+Concrete adapters (fastembed, ollama-embed, candle) need a stable trait surface. Owning the trait + a mock implementation in a tiny crate keeps `kebab-store-vector` and `kebab-search` testable without touching real models.

 ## Allowed dependencies

-- `kb-core`
-- `kb-config`
+- `kebab-core`
+- `kebab-config`
 - `serde`
 - `thiserror`
 - `tracing`
@@ -31,35 +31,35 @@ Concrete adapters (fastembed, ollama-embed, candle) need a stable trait surface.

 ## Forbidden dependencies

-- `fastembed`, `ort`, `tokenizers`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
+- `fastembed`, `ort`, `tokenizers`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

 ## Inputs

 | input | type | source |
 |-------|------|--------|
-| `EmbeddingInput` | `kb_core::EmbeddingInput<'_>` | callers (parser-side or query-side) |
+| `EmbeddingInput` | `kebab_core::EmbeddingInput<'_>` | callers (parser-side or query-side) |
 | model identity | `(EmbeddingModelId, EmbeddingVersion, dimensions)` | adapter at construction |

 ## Outputs

 | output | type | downstream |
 |--------|------|------------|
-| `Vec<Vec<f32>>` | row-aligned with input | `kb-store-vector`, `kb-search` (vector mode) |
+| `Vec<Vec<f32>>` | row-aligned with input | `kebab-store-vector`, `kebab-search` (vector mode) |

 ## Public surface (signatures only — no new types)

 ```rust
-pub use kb_core::{EmbeddingInput, EmbeddingKind, EmbeddingModelId, EmbeddingVersion, Embedder};
+pub use kebab_core::{EmbeddingInput, EmbeddingKind, EmbeddingModelId, EmbeddingVersion, Embedder};

 /// Test-only mock that produces deterministic vectors. Compiled only when `mock` feature is on.
 #[cfg(feature = "mock")]
 pub struct MockEmbedder { /* internal: model_id, dims, seed */ }
 #[cfg(feature = "mock")]
 impl MockEmbedder {
-    pub fn new(model_id: kb_core::EmbeddingModelId, version: kb_core::EmbeddingVersion, dimensions: usize) -> Self;
+    pub fn new(model_id: kebab_core::EmbeddingModelId, version: kebab_core::EmbeddingVersion, dimensions: usize) -> Self;
 }
 #[cfg(feature = "mock")]
-impl kb_core::Embedder for MockEmbedder { /* per §7.2 */ }
+impl kebab_core::Embedder for MockEmbedder { /* per §7.2 */ }
 ```

 ## Behavior contract
@@ -84,21 +84,21 @@ impl kb_core::Embedder for MockEmbedder { /* per §7.2 */ }
 | unit | dimensions match construction-time value | inline |
 | contract | property test: 100 random inputs, each vector has length == dimensions, all finite floats | inline (proptest) |

-All tests under `cargo test -p kb-embed`.
+All tests under `cargo test -p kebab-embed`.

 ## Definition of Done

-- [ ] `cargo check -p kb-embed` passes
-- [ ] `cargo test -p kb-embed` passes
+- [ ] `cargo check -p kebab-embed` passes
+- [ ] `cargo test -p kebab-embed` passes
 - [ ] No external embedding dep present
 - [ ] PR links design §7.2 Embedder, §11

 ## Out of scope

-- Real adapter (`kb-embed-local` is p3-2).
+- Real adapter (`kebab-embed-local` is p3-2).
 - Reranker traits (P+).

 ## Risks / notes

-- `MockEmbedder` is gated by `mock` feature (default OFF). Downstream tests opt in via `[dev-dependencies] kb-embed = { path = "...", features = ["mock"] }`. CI build of release binary (`cargo build --release` without `--features mock`) MUST NOT include `MockEmbedder` symbol — verifiable via `cargo bloat` or `nm` symbol scan.
-- Trait re-exports keep the call site stable even if `kb-core` reorganizes; downstream crates should `use kb_embed::Embedder` rather than `use kb_core::Embedder`.
+- `MockEmbedder` is gated by `mock` feature (default OFF). Downstream tests opt in via `[dev-dependencies] kebab-embed = { path = "...", features = ["mock"] }`. CI build of release binary (`cargo build --release` without `--features mock`) MUST NOT include `MockEmbedder` symbol — verifiable via `cargo bloat` or `nm` symbol scan.
+- Trait re-exports keep the call site stable even if `kebab-core` reorganizes; downstream crates should `use kebab_embed::Embedder` rather than `use kebab_core::Embedder`.

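The p3-1 spec above asks for a `MockEmbedder` that returns deterministic, finite vectors whose length equals the construction-time dimension. A standalone sketch of one way to get that with only `std`, assuming FNV-1a hashing of the input text followed by a splitmix64 stream. It does not implement the real `kebab_core::Embedder` trait, omits the `#[cfg(feature = "mock")]` gate, and takes plain `&str` inputs instead of `EmbeddingInput`; the constructor arguments are illustrative:

```rust
/// Hypothetical, trait-free stand-in for the spec's `MockEmbedder`.
pub struct MockEmbedder {
    dims: usize,
    seed: u64,
}

impl MockEmbedder {
    pub fn new(dims: usize, seed: u64) -> Self {
        Self { dims, seed }
    }

    /// Embeds each input; output rows are aligned with `inputs`.
    pub fn embed(&self, inputs: &[&str]) -> Vec<Vec<f32>> {
        inputs.iter().map(|text| self.embed_one(text)).collect()
    }

    fn embed_one(&self, text: &str) -> Vec<f32> {
        // FNV-1a over the text bytes, mixed with the seed, gives a stable per-input state.
        let mut state = self.seed ^ 0xcbf2_9ce4_8422_2325;
        for b in text.bytes() {
            state ^= u64::from(b);
            state = state.wrapping_mul(0x0000_0100_0000_01b3);
        }
        (0..self.dims)
            .map(|_| {
                // splitmix64 step: advance the state, then scramble it.
                state = state.wrapping_add(0x9e37_79b9_7f4a_7c15);
                let mut z = state;
                z = (z ^ (z >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9);
                z = (z ^ (z >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb);
                z ^= z >> 31;
                // Top 24 bits mapped into [-1.0, 1.0): always finite.
                ((z >> 40) as f32 / (1u32 << 24) as f32) * 2.0 - 1.0
            })
            .collect()
    }
}
```

Because no randomness source is involved, the same text and seed always yield the same vector, which is what the spec's snapshot and property tests rely on.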
@@ -1,12 +1,12 @@
 ---
 phase: P3
-component: kb-embed-local (fastembed adapter)
+component: kebab-embed-local (fastembed adapter)
 task_id: p3-2
 title: "fastembed-rs Embedder for multilingual-e5-small"
 status: completed
 depends_on: [p3-1]
 unblocks: [p3-3, p3-4]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
 contract_sections: [design §7.2 Embedder, report §11.3 local embedding, design §6.4 [models.embedding], design §9 versioning]
 ---

@@ -22,9 +22,9 @@ First real `Embedder`. Drives `EmbeddingId` recipe (model_id + model_version + d

 ## Allowed dependencies

-- `kb-core`
-- `kb-config`
-- `kb-embed`
+- `kebab-core`
+- `kebab-config`
+- `kebab-embed`
 - `fastembed = "4"` (or current stable)
 - `tokenizers`
 - `ort` (transitive via fastembed)
@@ -33,21 +33,21 @@ First real `Embedder`. Drives `EmbeddingId` recipe (model_id + model_version + d

 ## Forbidden dependencies

-- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, network HTTP libs (model download is fastembed's responsibility)
+- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, network HTTP libs (model download is fastembed's responsibility)

 ## Inputs

 | input | type | source |
 |-------|------|--------|
-| `kb-config::Config.models.embedding` | settings | runtime |
-| `EmbeddingInput[..]` | `kb_core::EmbeddingInput<'_>[]` | callers |
+| `kebab-config::Config.models.embedding` | settings | runtime |
+| `EmbeddingInput[..]` | `kebab_core::EmbeddingInput<'_>[]` | callers |
 | model cache | `data_dir/models/fastembed/` | filesystem |

 ## Outputs

 | output | type | downstream |
 |--------|------|------------|
-| `Vec<Vec<f32>>` | row-aligned, `dimensions = 384` | `kb-store-vector`, query vectors for hybrid search |
+| `Vec<Vec<f32>>` | row-aligned, `dimensions = 384` | `kebab-store-vector`, query vectors for hybrid search |
 | model identity | `(EmbeddingModelId, EmbeddingVersion, usize)` | record fields, `embedding_id` recipe |

 ## Public surface (signatures only — no new types)
@@ -56,14 +56,14 @@ First real `Embedder`. Drives `EmbeddingId` recipe (model_id + model_version + d
 pub struct FastembedEmbedder { /* internal: TextEmbedding instance + model meta */ }

 impl FastembedEmbedder {
-    pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>;
+    pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
 }

-impl kb_core::Embedder for FastembedEmbedder {
-    fn model_id(&self) -> kb_core::EmbeddingModelId;
-    fn model_version(&self) -> kb_core::EmbeddingVersion;
+impl kebab_core::Embedder for FastembedEmbedder {
+    fn model_id(&self) -> kebab_core::EmbeddingModelId;
+    fn model_version(&self) -> kebab_core::EmbeddingVersion;
     fn dimensions(&self) -> usize;
-    fn embed(&self, inputs: &[kb_core::EmbeddingInput<'_>]) -> anyhow::Result<Vec<Vec<f32>>>;
+    fn embed(&self, inputs: &[kebab_core::EmbeddingInput<'_>]) -> anyhow::Result<Vec<Vec<f32>>>;
 }
 ```

@@ -96,12 +96,12 @@ impl kb_core::Embedder for FastembedEmbedder {
 | performance | batch of 64 short inputs completes in < 5s on CI host | tmp config (skip on slow CI via `#[ignore]`) |
 | snapshot | aggregate hash of vectors for 5 known sentences stable across runs | `fixtures/embed/known-sentences.json` |

-All tests under `cargo test -p kb-embed-local`. Mark slow tests `#[ignore]` and run via `cargo test -- --ignored` in dedicated CI lane.
+All tests under `cargo test -p kebab-embed-local`. Mark slow tests `#[ignore]` and run via `cargo test -- --ignored` in dedicated CI lane.

 ## Definition of Done

-- [ ] `cargo check -p kb-embed-local` passes
-- [ ] `cargo test -p kb-embed-local` passes (excluding `#[ignore]`)
+- [ ] `cargo check -p kebab-embed-local` passes
+- [ ] `cargo test -p kebab-embed-local` passes (excluding `#[ignore]`)
 - [ ] First-run model download works under `data_dir/models/fastembed/`
 - [ ] No imports outside Allowed dependencies
 - [ ] PR links design §11.3, §6.4, §9

@@ -1,12 +1,12 @@
 ---
 phase: P3
-component: kb-store-vector (LanceDB)
+component: kebab-store-vector (LanceDB)
 task_id: p3-3
 title: "LanceDB VectorStore + embedding_records writer"
 status: completed
 depends_on: [p3-2, p1-6]
 unblocks: [p3-4]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
 contract_sections: [§5.6 embedding_records, §6.3 lancedb table naming, §7.2 VectorStore, §9 versioning]
 ---

@@ -18,13 +18,13 @@ Implement `VectorStore` over LanceDB (embedded). Stores per-model tables (`chunk

 ## Why now / why this size

-Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-4) can compose lexical + vector retrievers without leaking storage details.
+Closes the loop chunk → vector. Splits cleanly from `kebab-search` so hybrid (p3-4) can compose lexical + vector retrievers without leaking storage details.

 ## Allowed dependencies

-- `kb-core`
-- `kb-config`
-- `kb-store-sqlite` (only for writing/reading rows in `embedding_records`)
+- `kebab-core`
+- `kebab-config`
+- `kebab-store-sqlite` (only for writing/reading rows in `embedding_records`)
 - `lancedb`
 - `arrow` (and `arrow-array`, `arrow-schema`)
 - `serde`, `serde_json`
@@ -33,16 +33,16 @@ Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-

 ## Forbidden dependencies

-- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-embed*` (consumes `Vec<f32>` via input only — no embedding logic here), `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
+- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-embed*` (consumes `Vec<f32>` via input only — no embedding logic here), `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

 ## Inputs

 | input | type | source |
 |-------|------|--------|
-| `VectorRecord[..]` | `kb_core::VectorRecord` | `kb-app::embed_index` (P3 facade) |
-| query vector | `&[f32]` | `kb-embed-local` (`Embedder::embed` for query) |
-| filters | `kb_core::SearchFilters` | `SearchQuery` |
-| `kb-config::Config.storage.vector_dir` | path | runtime |
+| `VectorRecord[..]` | `kebab_core::VectorRecord` | `kebab-app::embed_index` (P3 facade) |
+| query vector | `&[f32]` | `kebab-embed-local` (`Embedder::embed` for query) |
+| filters | `kebab_core::SearchFilters` | `SearchQuery` |
+| `kebab-config::Config.storage.vector_dir` | path | runtime |

 ## Outputs

@@ -50,7 +50,7 @@ Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-
 |--------|------|------------|
 | Lance tables under `vector_dir/chunk_embeddings_<model>_<dim>.lance/` | filesystem | future searches |
 | `embedding_records` rows | SQLite | reverse lookup, reindex bookkeeping |
-| `Vec<VectorHit>` | `kb_core::VectorHit` | hybrid retriever (p3-4) |
+| `Vec<VectorHit>` | `kebab_core::VectorHit` | hybrid retriever (p3-4) |

 ## Public surface (signatures only — no new types)

@@ -58,13 +58,13 @@ Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-
 pub struct LanceVectorStore { /* internal: connection + sqlite handle */ }

 impl LanceVectorStore {
-    pub fn new(config: &kb_config::Config, sqlite: std::sync::Arc<kb_store_sqlite::SqliteStore>) -> anyhow::Result<Self>;
+    pub fn new(config: &kebab_config::Config, sqlite: std::sync::Arc<kebab_store_sqlite::SqliteStore>) -> anyhow::Result<Self>;
 }

-impl kb_core::VectorStore for LanceVectorStore {
-    fn ensure_table(&self, model: &kb_core::EmbeddingModelId, dim: usize) -> anyhow::Result<kb_core::IndexId>;
-    fn upsert(&self, recs: &[kb_core::VectorRecord]) -> anyhow::Result<()>;
-    fn search(&self, query_vec: &[f32], k: usize, filters: &kb_core::SearchFilters) -> anyhow::Result<Vec<kb_core::VectorHit>>;
+impl kebab_core::VectorStore for LanceVectorStore {
+    fn ensure_table(&self, model: &kebab_core::EmbeddingModelId, dim: usize) -> anyhow::Result<kebab_core::IndexId>;
+    fn upsert(&self, recs: &[kebab_core::VectorRecord]) -> anyhow::Result<()>;
+    fn search(&self, query_vec: &[f32], k: usize, filters: &kebab_core::SearchFilters) -> anyhow::Result<Vec<kebab_core::VectorHit>>;
 }
 ```

@@ -116,12 +116,12 @@ impl kb_core::VectorStore for LanceVectorStore {
 | determinism | same query vector + same data → same top-k order | tmp data_dir |
 | snapshot | `Vec<VectorHit>` JSON for fixed corpus stable | `fixtures/vector/run-1.json` |

-All tests under `cargo test -p kb-store-vector`.
+All tests under `cargo test -p kebab-store-vector`.

 ## Definition of Done

-- [ ] `cargo check -p kb-store-vector` passes
-- [ ] `cargo test -p kb-store-vector` passes
+- [ ] `cargo check -p kebab-store-vector` passes
+- [ ] `cargo test -p kebab-store-vector` passes
 - [ ] No imports outside Allowed dependencies
 - [ ] `embedding_records` rows align 1:1 with Lance rows after a successful upsert batch
 - [ ] PR links design §5.6, §6.3, §7.2
@@ -130,7 +130,7 @@ All tests under `cargo test -p kb-store-vector`.

 - IVF / PQ index tuning (P+).
 - Image / multimodal vector tables (P6).
-- `kb-app` orchestration of indexing jobs (`embed_index` facade method body).
+- `kebab-app` orchestration of indexing jobs (`embed_index` facade method body).

 ## Risks / notes


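The p3-4 spec that follows fuses lexical and vector results with `FusionPolicy::Rrf { k_rrf }`. Reciprocal Rank Fusion scores each id as the sum of 1 / (k_rrf + rank) over every ranked list it appears in, with rank starting at 1. A std-only sketch over plain id lists; `rrf_fuse` is a hypothetical name, the tie-break by id is an assumed choice for determinism, and the real retriever works on `SearchHit`s and fills `RetrievalDetail`, which this omits:

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over ranked id lists (best first).
/// Assumes each list holds unique ids; a duplicate would be counted twice.
/// Returns up to `top_k` (id, score) pairs, highest score first, ties broken by id.
fn rrf_fuse(lists: &[Vec<String>], k_rrf: u32, top_k: usize) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for list in lists {
        for (idx, id) in list.iter().enumerate() {
            let rank = idx as f64 + 1.0; // 1-based rank within this list
            *scores.entry(id.as_str()).or_insert(0.0) += 1.0 / (f64::from(k_rrf) + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Sort by descending score; equal scores fall back to ascending id for stable output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused.truncate(top_k);
    fused
}
```

With k_rrf = 60 (an illustrative value, not the project's configured `rrf_k`), lexical `[a, b, c]` and vector `[c, a, d]` fuse to `a, c, b`: `a` and `c` each appear in both lists, and `a`'s worse rank (2) beats `c`'s (3).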
@@ -1,12 +1,12 @@
 ---
 phase: P3
-component: kb-search (hybrid)
+component: kebab-search (hybrid)
 task_id: p3-4
 title: "Hybrid Retriever (RRF) over lexical + vector"
 status: completed
 depends_on: [p2-2, p3-3]
 unblocks: [p4-3]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
 contract_sections: [§3.7 RetrievalDetail, §0 Q3, §1.6 search --explain, §6.4 [search] rrf settings]
 ---

@@ -22,17 +22,17 @@ Single mediator. Keeps the lexical and vector retrievers focused; only this task

 ## Allowed dependencies

-- `kb-core`
-- `kb-config`
-- `kb-store-sqlite` (for `LexicalRetriever`)
-- `kb-store-vector` (for `LanceVectorStore`)
-- `kb-embed` (trait only — for query embedding via `Embedder`)
+- `kebab-core`
+- `kebab-config`
+- `kebab-store-sqlite` (for `LexicalRetriever`)
+- `kebab-store-vector` (for `LanceVectorStore`)
+- `kebab-embed` (trait only — for query embedding via `Embedder`)
 - `tracing`
 - `thiserror`

 ## Forbidden dependencies

-- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`. (`kb-embed-local` is a runtime-injected `dyn Embedder`; this crate must not depend on the concrete adapter directly.)
+- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`. (`kebab-embed-local` is a runtime-injected `dyn Embedder`; this crate must not depend on the concrete adapter directly.)

 ## Inputs

@@ -41,21 +41,21 @@ Single mediator. Keeps the lexical and vector retrievers focused; only this task
 | `LexicalRetriever` | trait object | constructed elsewhere |
 | `LanceVectorStore` | trait object | constructed elsewhere |
 | `Box<dyn Embedder>` | for query embedding | runtime-injected |
-| `kb-config::Config.search` | `default_k`, `hybrid_fusion`, `rrf_k` | runtime |
-| `SearchQuery` | `kb_core::SearchQuery` | `kb-app::search` |
+| `kebab-config::Config.search` | `default_k`, `hybrid_fusion`, `rrf_k` | runtime |
+| `SearchQuery` | `kebab_core::SearchQuery` | `kebab-app::search` |

 ## Outputs

 | output | type | downstream |
 |--------|------|------------|
-| `Vec<SearchHit>` (with full `RetrievalDetail`) | `kb_core::SearchHit` | `kb-cli` printer, `kb-rag` packer |
+| `Vec<SearchHit>` (with full `RetrievalDetail`) | `kebab_core::SearchHit` | `kebab-cli` printer, `kebab-rag` packer |

 ## Public surface (signatures only — no new types)

 ```rust
 pub struct HybridRetriever {
-    lexical: std::sync::Arc<dyn kb_core::Retriever>,
-    vector: std::sync::Arc<dyn kb_core::Retriever>, // wrapper over LanceVectorStore + Embedder
+    lexical: std::sync::Arc<dyn kebab_core::Retriever>,
+    vector: std::sync::Arc<dyn kebab_core::Retriever>, // wrapper over LanceVectorStore + Embedder
     fusion: FusionPolicy,
     k: usize,
 }
@@ -64,27 +64,27 @@ pub enum FusionPolicy { Rrf { k_rrf: u32 } }
|
||||
|
||||
impl HybridRetriever {
|
||||
pub fn new(
|
||||
config: &kb_config::Config,
|
||||
lexical: std::sync::Arc<dyn kb_core::Retriever>,
|
||||
vector: std::sync::Arc<dyn kb_core::Retriever>,
|
||||
config: &kebab_config::Config,
|
||||
lexical: std::sync::Arc<dyn kebab_core::Retriever>,
|
||||
vector: std::sync::Arc<dyn kebab_core::Retriever>,
|
||||
) -> Self;
|
||||
}
|
||||
|
||||
impl kb_core::Retriever for HybridRetriever {
|
||||
fn search(&self, query: &kb_core::SearchQuery) -> anyhow::Result<Vec<kb_core::SearchHit>>;
|
||||
fn index_version(&self) -> kb_core::IndexVersion;
|
||||
impl kebab_core::Retriever for HybridRetriever {
|
||||
fn search(&self, query: &kebab_core::SearchQuery) -> anyhow::Result<Vec<kebab_core::SearchHit>>;
|
||||
fn index_version(&self) -> kebab_core::IndexVersion;
|
||||
}
|
||||
|
||||
/// Wrapper that turns a VectorStore + Embedder into a Retriever.
|
||||
pub struct VectorRetriever {
|
||||
store: std::sync::Arc<dyn kb_core::VectorStore>,
|
||||
embed: std::sync::Arc<dyn kb_core::Embedder>,
|
||||
/* heading_path/snippet enrichment hits SQLite via kb-store-sqlite read accessor */
|
||||
store: std::sync::Arc<dyn kebab_core::VectorStore>,
|
||||
embed: std::sync::Arc<dyn kebab_core::Embedder>,
|
||||
/* heading_path/snippet enrichment hits SQLite via kebab-store-sqlite read accessor */
|
||||
}
|
||||
impl VectorRetriever {
|
||||
pub fn new(store: std::sync::Arc<dyn kb_core::VectorStore>, embed: std::sync::Arc<dyn kb_core::Embedder>, sqlite: std::sync::Arc<kb_store_sqlite::SqliteStore>) -> Self;
|
||||
pub fn new(store: std::sync::Arc<dyn kebab_core::VectorStore>, embed: std::sync::Arc<dyn kebab_core::Embedder>, sqlite: std::sync::Arc<kebab_store_sqlite::SqliteStore>) -> Self;
|
||||
}
|
||||
impl kb_core::Retriever for VectorRetriever { /* per §7.2 */ }
|
||||
impl kebab_core::Retriever for VectorRetriever { /* per §7.2 */ }
|
||||
```
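As a reading aid for `FusionPolicy::Rrf { k_rrf }`: Reciprocal Rank Fusion scores each chunk as the sum of `1 / (k_rrf + rank)` over every ranked list it appears in. The sketch below is a hypothetical helper, not the `kebab-search` implementation. It assumes 1-based ranks, uses plain `String` ids in place of `ChunkId` / `SearchHit`, and breaks score ties on the id so that repeated calls produce identical output (the determinism requirement in the test plan).

```rust
use std::collections::HashMap;

/// Hypothetical helper: fuse best-first ranked lists of chunk ids with RRF.
/// Ranks are 1-based, so the top hit of a list contributes 1 / (k_rrf + 1).
pub fn rrf_fuse(lists: &[Vec<String>], k_rrf: u32, k: usize) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (idx, id) in list.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry(id.clone()).or_insert(0.0) += 1.0 / (k_rrf as f64 + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Descending score, then ascending id: HashMap iteration order is random,
    // so the explicit tie-break is what makes the output stable across runs.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused.truncate(k);
    fused
}
```

Because only ranks are fused, BM25 scores and cosine similarities never have to be normalised onto a common scale.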

## Behavior contract
@@ -124,12 +124,12 @@ impl kb_core::Retriever for VectorRetriever { /* per §7.2 */ }
| determinism | identical query twice → byte-identical `Vec<SearchHit>` | tmp DB |
| snapshot | hybrid output JSON stable | `fixtures/search/hybrid/run-1.json` |

All tests under `cargo test -p kb-search hybrid`.
All tests under `cargo test -p kebab-search hybrid`.

## Definition of Done

- [ ] `cargo check -p kb-search` passes
- [ ] `cargo test -p kb-search hybrid` passes
- [ ] `cargo check -p kebab-search` passes
- [ ] `cargo test -p kebab-search hybrid` passes
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.7, §6.4 search, §0 Q3

@@ -1,64 +1,64 @@
---
phase: P3
component: kb-app (facade wiring)
component: kebab-app (facade wiring)
task_id: p3-5
title: "Wire kb-app facade — ingest / search / list / inspect end-to-end"
title: "Wire kebab-app facade — ingest / search / list / inspect end-to-end"
status: completed
depends_on: [p1-6, p2-2, p3-2, p3-3, p3-4]
unblocks: [p4-3, p9-1, p9-2, p9-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§7 components, §7.2 traits, §1 UX scenes (ingest, search, list, inspect), §6.4 config, §10 errors]
---

# p3-5 — Wire kb-app facade
# p3-5 — Wire kebab-app facade

## Goal

Replace the `bail!("not yet wired")` stubs in `kb-app` (currently `ingest`, `search`, `list_docs`, `inspect_doc`, `inspect_chunk`) with real bodies that compose the libraries shipped in P1–P3. After this task, `kb index` actually walks a workspace and persists chunks, and `kb search --mode {lexical,vector,hybrid}` returns real `SearchHit`s. `kb-app::ask` stays stubbed (P4-3 owns it).
Replace the `bail!("not yet wired")` stubs in `kebab-app` (currently `ingest`, `search`, `list_docs`, `inspect_doc`, `inspect_chunk`) with real bodies that compose the libraries shipped in P1–P3. After this task, `kebab index` actually walks a workspace and persists chunks, and `kebab search --mode {lexical,vector,hybrid}` returns real `SearchHit`s. `kebab-app::ask` stays stubbed (P4-3 owns it).

## Why now / why this size

P1–P3 shipped libraries but the CLI is unusable: every facade method bails. Inserting this glue task between P3-4 (last library that completes the retrieval stack) and P4-1 (LLM trait, where `kb-app::ask` will need this same wiring anyway) lets the user run the tool end-to-end for the first time and validates that the library boundaries actually compose. Limiting it to non-LLM commands keeps it a single-session task; `ask` wiring lives in P4-3.
P1–P3 shipped libraries but the CLI is unusable: every facade method bails. Inserting this glue task between P3-4 (last library that completes the retrieval stack) and P4-1 (LLM trait, where `kebab-app::ask` will need this same wiring anyway) lets the user run the tool end-to-end for the first time and validates that the library boundaries actually compose. Limiting it to non-LLM commands keeps it a single-session task; `ask` wiring lives in P4-3.

## Allowed dependencies

`kb-app` may depend on:
`kebab-app` may depend on:

- `kb-core`, `kb-config` (already)
- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-sqlite` (P1)
- `kb-search` (P2-2, P3-4)
- `kb-store-vector`, `kb-embed`, `kb-embed-local` (P3-2, P3-3, P3-4)
- `kebab-core`, `kebab-config` (already)
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-sqlite` (P1)
- `kebab-search` (P2-2, P3-4)
- `kebab-store-vector`, `kebab-embed`, `kebab-embed-local` (P3-2, P3-3, P3-4)
- `tracing`, `anyhow`, `serde`, `serde_json`, `time`, `dirs`, `toml` (existing)

## Forbidden dependencies

- `kb-llm*`, `kb-rag` (P4 ownership — `ask` stays stubbed)
- `kb-tui`, `kb-desktop` (P9 — those *consume* `kb-app`, not the other way)
- `kb-parse-pdf`, `kb-parse-image`, `kb-parse-audio` (P6/P7/P8)
- network HTTP libs (model download is `kb-embed-local`'s responsibility via fastembed)
- `kebab-llm*`, `kebab-rag` (P4 ownership — `ask` stays stubbed)
- `kebab-tui`, `kebab-desktop` (P9 — those *consume* `kebab-app`, not the other way)
- `kebab-parse-pdf`, `kebab-parse-image`, `kebab-parse-audio` (P6/P7/P8)
- network HTTP libs (model download is `kebab-embed-local`'s responsibility via fastembed)

## Inputs

| input | type | source |
|-------|------|--------|
| `kb-config::Config` | `kb_config::Config` | loaded once at process start; threaded through every facade call |
| `SourceScope` | `kb_core::SourceScope` | CLI |
| `SearchQuery` | `kb_core::SearchQuery` | CLI |
| `DocFilter` | `kb_core::DocFilter` | CLI |
| `DocumentId` / `ChunkId` | `kb_core::ids::*` | CLI |
| `kebab-config::Config` | `kebab_config::Config` | loaded once at process start; threaded through every facade call |
| `SourceScope` | `kebab_core::SourceScope` | CLI |
| `SearchQuery` | `kebab_core::SearchQuery` | CLI |
| `DocFilter` | `kebab_core::DocFilter` | CLI |
| `DocumentId` / `ChunkId` | `kebab_core::ids::*` | CLI |

## Outputs

| output | type | downstream |
|--------|------|------------|
| `IngestReport` | `kb_core::IngestReport` | `kb-cli` (`ingest_report.v1`), `kb-eval` (P5) |
| `Vec<SearchHit>` | `kb_core::SearchHit` | `kb-cli` (`search_hit.v1`), `kb-rag` (P4-3) |
| `Vec<DocSummary>` | `kb_core::DocSummary` | `kb-cli` (`doc_summary.v1`) |
| `CanonicalDocument` / `Chunk` | `kb_core::*` | `kb-cli` (`chunk_inspection.v1`) |
| `IngestReport` | `kebab_core::IngestReport` | `kebab-cli` (`ingest_report.v1`), `kebab-eval` (P5) |
| `Vec<SearchHit>` | `kebab_core::SearchHit` | `kebab-cli` (`search_hit.v1`), `kebab-rag` (P4-3) |
| `Vec<DocSummary>` | `kebab_core::DocSummary` | `kebab-cli` (`doc_summary.v1`) |
| `CanonicalDocument` / `Chunk` | `kebab_core::*` | `kebab-cli` (`chunk_inspection.v1`) |

## Public surface (signatures only — no new types)

The signatures already live on `kb-app`. This task only swaps the bodies. Frozen surface:
The signatures already live on `kebab-app`. This task only swaps the bodies. Frozen surface:

```rust
pub fn ingest(scope: SourceScope, summary_only: bool) -> anyhow::Result<IngestReport>;
@@ -69,7 +69,7 @@ pub fn search(query: SearchQuery) -> anyhow::Result<Vec<SearchHi
// `ask` and `init_workspace` / `doctor` stay as-is.
```

If a constructor helper is needed (e.g., a single `App::open(&Config)` that opens `SqliteStore`, `LanceVectorStore`, embedder, retrievers once), it MAY be added pub-or-pub(crate) to `kb-app`. Keep all newly-added types crate-private unless explicitly required by `kb-cli`.
If a constructor helper is needed (e.g., a single `App::open(&Config)` that opens `SqliteStore`, `LanceVectorStore`, the embedder, and the retrievers once), it MAY be added to `kebab-app` as `pub` or `pub(crate)`. Keep all newly-added types crate-private unless explicitly required by `kebab-cli`.

## Behavior contract

@@ -77,8 +77,8 @@ If a constructor helper is needed (e.g., a single `App::open(&Config)` that open

Pipeline per design §1.2:

1. Resolve `scope` → set of files under `config.workspace.root` filtered by `config.workspace.include`/`exclude`. Use `kb-source-fs::FsSourceConnector` and pass each `RawAsset` + bytes through.
2. For each markdown asset: `kb_parse_md::frontmatter::parse_frontmatter` + `kb_parse_md::blocks::parse_blocks` → `kb_normalize::build_canonical_document` → `kb_chunk::MdHeadingV1Chunker::chunk`.
1. Resolve `scope` → set of files under `config.workspace.root` filtered by `config.workspace.include`/`exclude`. Use `kebab-source-fs::FsSourceConnector` and pass each `RawAsset` + bytes through.
2. For each markdown asset: `kebab_parse_md::frontmatter::parse_frontmatter` + `kebab_parse_md::blocks::parse_blocks` → `kebab_normalize::build_canonical_document` → `kebab_chunk::MdHeadingV1Chunker::chunk`.
3. SQLite writes per design §5.8 (one ingest = one transaction): `put_asset_with_bytes` → `put_document` → `put_blocks` → `put_chunks`.
4. If `config.models.embedding.provider == "fastembed"`: build `FastembedEmbedder` once at the top of the run, embed every chunk's `text` as `EmbeddingKind::Document`, batch by `config.models.embedding.batch_size`, then `LanceVectorStore::ensure_table` + `upsert`. Skip vector indexing if provider is `"none"` or `embedding.dimensions == 0` (config opt-out path).
5. Record an `ingest_runs` row via `JobRepo` with the aggregate counts. `summary_only=true` writes `items_json=NULL`.
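Step 4's opt-out check and batching can be sketched as below. `EmbeddingSettings`, `vector_indexing_enabled`, and `embedding_batches` are hypothetical stand-ins that mirror the `config.models.embedding.{provider, dimensions, batch_size}` fields named above; the real config types and the embedder call itself are not shown.

```rust
/// Hypothetical mirror of `config.models.embedding` (field names assumed from the spec text).
pub struct EmbeddingSettings {
    pub provider: String,
    pub dimensions: usize,
    pub batch_size: usize,
}

/// Vector indexing runs only for the fastembed provider with a non-zero dimension;
/// `provider = "none"` or `dimensions = 0` is the config opt-out path.
pub fn vector_indexing_enabled(cfg: &EmbeddingSettings) -> bool {
    cfg.provider == "fastembed" && cfg.dimensions > 0
}

/// Split chunk texts into embedder batches of at most `batch_size` items.
/// Returns no batches at all when vector indexing is disabled.
pub fn embedding_batches<'a>(cfg: &EmbeddingSettings, texts: &'a [String]) -> Vec<&'a [String]> {
    if !vector_indexing_enabled(cfg) {
        return Vec::new();
    }
    // `slice::chunks` panics on 0, so a zero batch size degrades to one item per batch.
    texts.chunks(cfg.batch_size.max(1)).collect()
}
```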
@@ -90,9 +90,9 @@ Errors: any per-file parse failure should be recorded as a `Warning` in `Provena

Per `query.mode`:

- `Lexical` → `kb_search::LexicalRetriever::search`. Takes a read-only borrow of the SQLite connection.
- `Vector` → `kb_search::VectorRetriever::search`. Requires `Embedder` (built from config) + `VectorStore` (LanceDB).
- `Hybrid` → `kb_search::HybridRetriever::search` composing the above two.
- `Lexical` → `kebab_search::LexicalRetriever::search`. Takes a read-only borrow of the SQLite connection.
- `Vector` → `kebab_search::VectorRetriever::search`. Requires `Embedder` (built from config) + `VectorStore` (LanceDB).
- `Hybrid` → `kebab_search::HybridRetriever::search` composing the above two.

Each call constructs (or reuses) the retrievers from a single `App` context that holds `Arc<SqliteStore>`, `Arc<LanceVectorStore>`, and `Arc<dyn Embedder>`, so the cold-start cost is paid once per process. Document the lifetime model: a CLI invocation goes through `App::open(&Config)` once, runs the requested op, then drops everything on exit. The TUI (P9) holds the `App` for the whole session.

@@ -100,17 +100,17 @@ Each call constructs (or reuses) the retrievers from a single `App` context that

### `list_docs(filter)` / `inspect_doc(id)` / `inspect_chunk(id)`

Direct delegation to `kb_core::DocumentStore` trait methods on `SqliteStore`. No vector / embedding involvement.
Direct delegation to `kebab_core::DocumentStore` trait methods on `SqliteStore`. No vector / embedding involvement.

### Lifecycle

Define an internal `pub(crate) struct App { config: Arc<Config>, sqlite: Arc<SqliteStore>, vector: Option<Arc<LanceVectorStore>>, embedder: Option<Arc<dyn Embedder + Send + Sync>>, /* retrievers built per call */ }` with a `pub(crate) fn open(config: &Config) -> Result<Self>`. Each public `kb-app::*` function builds an `App` (or accepts one in tests) and uses it. The free functions stay the public API so `kb-cli` and `kb-tui` don't need to refactor.
Define an internal `pub(crate) struct App { config: Arc<Config>, sqlite: Arc<SqliteStore>, vector: Option<Arc<LanceVectorStore>>, embedder: Option<Arc<dyn Embedder + Send + Sync>>, /* retrievers built per call */ }` with a `pub(crate) fn open(config: &Config) -> Result<Self>`. Each public `kebab-app::*` function builds an `App` (or accepts one in tests) and uses it. The free functions stay the public API so `kebab-cli` and `kebab-tui` don't need to refactor.

`vector` and `embedder` are `Option` because the spec allows running KB without embeddings (lexical-only mode for resource-constrained boxes — `provider == "none"`). When absent, `search` with `Vector` / `Hybrid` modes returns `anyhow::Error` with a hint to enable embeddings; `ingest` skips the vector step.
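The mode dispatch and the `Option` opt-out described above can be sketched with simplified stand-ins. `SearchMode`, `Retriever`, and this trimmed `App` are hypothetical: errors are `String`s rather than `anyhow::Error`, hits are bare chunk ids, and for brevity the sketch holds prebuilt retrievers instead of building them per call.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the `--mode {lexical,vector,hybrid}` enum.
#[derive(Clone, Copy, Debug)]
pub enum SearchMode {
    Lexical,
    Vector,
    Hybrid,
}

/// Hypothetical stand-in for `kebab_core::Retriever`; hits are bare chunk ids.
pub trait Retriever {
    fn search(&self, query: &str) -> Result<Vec<String>, String>;
}

/// Trimmed-down `App`: lexical search always works, while the vector and hybrid
/// retrievers exist only when embeddings are enabled (`provider != "none"`).
pub struct App {
    pub lexical: Arc<dyn Retriever>,
    pub vector: Option<Arc<dyn Retriever>>,
    pub hybrid: Option<Arc<dyn Retriever>>,
}

impl App {
    pub fn search(&self, mode: SearchMode, query: &str) -> Result<Vec<String>, String> {
        match mode {
            SearchMode::Lexical => self.lexical.search(query),
            SearchMode::Vector => Self::require(&self.vector, mode)?.search(query),
            SearchMode::Hybrid => Self::require(&self.hybrid, mode)?.search(query),
        }
    }

    /// Turns a missing retriever into an actionable error rather than a panic.
    fn require(slot: &Option<Arc<dyn Retriever>>, mode: SearchMode) -> Result<&Arc<dyn Retriever>, String> {
        slot.as_ref().ok_or_else(|| {
            format!("{mode:?} search requires embeddings; set models.embedding.provider to something other than \"none\"")
        })
    }
}
```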

### Versioning

- `parser_version` from `kb-parse-md` constant.
- `parser_version` from `kebab-parse-md` constant.
- `chunker_version` from `MdHeadingV1Chunker::chunker_version()`.
- `embedding_model` / `embedding_version` from the embedder.
- `index_version` from the retriever (or `Lexical`/`Vector` IndexId).
@@ -119,15 +119,15 @@ All wired through to the persisted records and the wire output.

## Storage / wire effects

- Writes: `data_dir/kb.sqlite` (assets, documents, blocks, chunks, embedding_records, jobs/ingest_runs), `data_dir/assets/<aa>/<id>` (when copied), `data_dir/lancedb/chunk_embeddings_<model>_<dim>.lance/` (when embeddings on), `data_dir/models/fastembed/` (model cache, first run only).
- Writes: `data_dir/kebab.sqlite` (assets, documents, blocks, chunks, embedding_records, jobs/ingest_runs), `data_dir/assets/<aa>/<id>` (when copied), `data_dir/lancedb/chunk_embeddings_<model>_<dim>.lance/` (when embeddings on), `data_dir/models/fastembed/` (model cache, first run only).
- Reads on subsequent calls: same DB.
- Wire JSON conforms to `ingest_report.v1`, `search_hit.v1`, `doc_summary.v1`, `chunk_inspection.v1` — `kb-cli` already has the wrappers.
- Wire JSON conforms to `ingest_report.v1`, `search_hit.v1`, `doc_summary.v1`, `chunk_inspection.v1` — `kebab-cli` already has the wrappers.

## Test plan

| kind | description | fixture / data |
|------|-------------|----------------|
| integration | `ingest` over a 3-file fixture workspace produces non-empty `IngestReport` and writes the expected SQLite rows + Lance rows | `crates/kb-app/tests/fixtures/workspace/`, tmp `data_dir` |
| integration | `ingest` over a 3-file fixture workspace produces non-empty `IngestReport` and writes the expected SQLite rows + Lance rows | `crates/kebab-app/tests/fixtures/workspace/`, tmp `data_dir` |
| integration | `ingest` with `provider="none"` skips Lance and produces an `IngestReport` with `embeddings_indexed=0` | tmp config |
| integration | `ingest` is idempotent: re-running on the same workspace updates `documents.updated_at` and bumps `doc_version` without duplicating rows | tmp `data_dir` |
| integration | `search` with `mode=Lexical` returns hits whose `embedding_model` is `None` | tmp `data_dir` |
@@ -136,36 +136,36 @@ All wired through to the persisted records and the wire output.
| integration | `search` with `mode=Vector` and `provider="none"` returns a clear error | tmp config |
| integration | `list_docs` with `tags_any=["rust"]` filters correctly | tmp `data_dir` |
| integration | `inspect_doc` round-trips a document; `inspect_chunk` round-trips a chunk | tmp `data_dir` |
| smoke | end-to-end CLI smoke: `kb index` + `kb search --mode hybrid "..."` against the fixture workspace using `cargo run` (manual / `assert_cmd`) | fixture |
| smoke | end-to-end CLI smoke: `kebab index` + `kebab search --mode hybrid "..."` against the fixture workspace using `cargo run` (manual / `assert_cmd`) | fixture |

`#[ignore]` AVX-gated tests follow the P3-3/P3-4 pattern: `require_avx_or_panic()` at the top of each LanceDB-touching body. Default `cargo test -p kb-app` runs the lexical-only and SQLite-only paths.
`#[ignore]` AVX-gated tests follow the P3-3/P3-4 pattern: `require_avx_or_panic()` at the top of each LanceDB-touching body. Default `cargo test -p kebab-app` runs the lexical-only and SQLite-only paths.

All tests under `cargo test -p kb-app`. CLI smoke optional via `assert_cmd` if it doesn't add a heavyweight dep tree.
All tests under `cargo test -p kebab-app`. CLI smoke optional via `assert_cmd` if it doesn't add a heavyweight dep tree.

## Definition of Done

- [ ] `cargo check -p kb-app` passes.
- [ ] `cargo test -p kb-app` passes (default lane).
- [ ] `cargo test -p kb-app -- --ignored` passes on AVX hardware.
- [ ] `cargo run -p kb-cli -- index` succeeds against a non-empty fixture workspace.
- [ ] `cargo run -p kb-cli -- search --mode hybrid "<term>"` returns real hits + citations.
- [ ] `cargo run -p kb-cli -- list` returns the indexed documents.
- [ ] `cargo check -p kebab-app` passes.
- [ ] `cargo test -p kebab-app` passes (default lane).
- [ ] `cargo test -p kebab-app -- --ignored` passes on AVX hardware.
- [ ] `cargo run -p kebab-cli -- index` succeeds against a non-empty fixture workspace.
- [ ] `cargo run -p kebab-cli -- search --mode hybrid "<term>"` returns real hits + citations.
- [ ] `cargo run -p kebab-cli -- list` returns the indexed documents.
- [ ] `cargo clippy --workspace --all-targets -- -D warnings` clean.
- [ ] No imports outside Allowed dependencies.
- [ ] PR links design §1.2, §1.5, §1.6, §7.

## Out of scope

- `kb-app::ask` body — P4-3 (RAG pipeline).
- `kb index --rebuild-fts` CLI option — wire `kb_store_sqlite::rebuild_chunks_fts` only when needed by a downstream task.
- `kb index --resume` checkpointing — design §1.2 mentions it but spec leaves to a P+ refinement; document but defer.
- `kebab-app::ask` body — P4-3 (RAG pipeline).
- `kebab index --rebuild-fts` CLI option — wire `kebab_store_sqlite::rebuild_chunks_fts` only when needed by a downstream task.
- `kebab index --resume` checkpointing — design §1.2 mentions it, but the spec leaves it to a P+ refinement; document it but defer.
- `--watch` mode — P+.
- TUI / desktop integration — P9 consumes the wired facade.

## Risks / notes

- Cold-start cost: first `kb index` run downloads the fastembed model (~470MB) and warms the ONNX session. Surface via `tracing::info!` (already wired in P3-2).
- **Post-merge fix (2026-05)**: original P3-5 implementation left every `kb-cli` subcommand calling the bare `kb_app::*` free functions, which silently re-loaded `Config::load(None)` (XDG default) and ignored the user's `--config <path>` flag. CLI now goes through `kb_app::*_with_config(cfg, ...)` everywhere. The `#[doc(hidden)] pub fn *_with_config` companions originally framed as "test seam" are now the official config-explicit API for CLI / TUI / integration tests. See [HOTFIXES.md](../HOTFIXES.md) for details.
- Cold-start cost: first `kebab index` run downloads the fastembed model (~470MB) and warms the ONNX session. Surface via `tracing::info!` (already wired in P3-2).
- **Post-merge fix (2026-05)**: original P3-5 implementation left every `kebab-cli` subcommand calling the bare `kebab_app::*` free functions, which silently re-loaded `Config::load(None)` (XDG default) and ignored the user's `--config <path>` flag. CLI now goes through `kebab_app::*_with_config(cfg, ...)` everywhere. The `#[doc(hidden)] pub fn *_with_config` companions originally framed as "test seam" are now the official config-explicit API for CLI / TUI / integration tests. See [HOTFIXES.md](../HOTFIXES.md) for details.
- The `App` lifecycle struct is internal but its construction is the natural seam for adding caching / connection pooling later. Keep it `pub(crate)` so future refactors don't break the CLI.
- Mismatched `index_version` across stored records and the live retriever should fail loud at `App::open` (not at first search). Reuse the `tracing::warn!` from `HybridRetriever::new` (P3-4).
- The fastembed adapter holds a `tokio::runtime` (P3-3); `App` must be constructed from a synchronous context. Document this constraint on `App::open`.
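The post-merge `--config` fix comes down to the pattern below: the bare free function keeps loading the XDG default, while its `_with_config` companion takes the config the CLI has already loaded. `Config`, `load_default_config`, `list_docs`, and `list_docs_with_config` here are hypothetical simplifications for illustration, not the real `kebab_app` signatures.

```rust
/// Hypothetical, heavily simplified config; the real type is `kebab_config::Config`.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub data_dir: String,
}

/// Stand-in for `Config::load(None)`, i.e. the XDG default location.
pub fn load_default_config() -> Config {
    Config { data_dir: "<xdg-default>".to_string() }
}

/// Config-explicit entry point: what the CLI, TUI, and tests call once `--config` is parsed.
pub fn list_docs_with_config(cfg: &Config) -> Vec<String> {
    // Real code would open the SQLite store under `cfg.data_dir`; echoing it keeps the sketch runnable.
    vec![format!("listing from {}", cfg.data_dir)]
}

/// Convenience wrapper that silently loads the default config. Routing CLI calls
/// through this is the bug the hotfix removed: the user's `--config` path never reaches it.
pub fn list_docs() -> Vec<String> {
    list_docs_with_config(&load_default_config())
}
```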

@@ -1,12 +1,12 @@
---
phase: P4
component: kb-llm (trait crate)
component: kebab-llm (trait crate)
task_id: p4-1
title: "LanguageModel trait + GenerateRequest/TokenChunk"
status: completed
depends_on: [p0-1]
unblocks: [p4-2, p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§7.1 GenerateRequest/TokenChunk, §7.2 LanguageModel, §0 Q5 streaming, §3.8 ModelRef]
---

@@ -14,16 +14,16 @@ contract_sections: [§7.1 GenerateRequest/TokenChunk, §7.2 LanguageModel, §0 Q

## Goal

Provide the `kb-llm` crate that re-exports the `LanguageModel` trait and helper types (`GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage`, `ModelRef`), plus a `MockLanguageModel` for downstream tests.
Provide the `kebab-llm` crate that re-exports the `LanguageModel` trait and helper types (`GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage`, `ModelRef`), plus a `MockLanguageModel` for downstream tests.

## Why now / why this size

`kb-rag` (p4-3) consumes a `LanguageModel` trait object. Owning the trait + a deterministic mock here lets RAG tests run with no Ollama dependency. Real adapters (Ollama, llama.cpp, candle) live in p4-2 and beyond.
`kebab-rag` (p4-3) consumes a `LanguageModel` trait object. Owning the trait + a deterministic mock here lets RAG tests run with no Ollama dependency. Real adapters (Ollama, llama.cpp, candle) live in p4-2 and beyond.

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`
- `thiserror`
- `tracing`
@@ -31,13 +31,13 @@ Provide the `kb-llm` crate that re-exports the `LanguageModel` trait and helper

## Forbidden dependencies

- `reqwest`, `ureq`, `tokio`, `whisper-rs`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-rag`, `kb-tui`, `kb-desktop`
- `reqwest`, `ureq`, `tokio`, `whisper-rs`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

## Inputs

| input | type | source |
|-------|------|--------|
| `GenerateRequest` | `kb_core::GenerateRequest` | RAG pipeline |
| `GenerateRequest` | `kebab_core::GenerateRequest` | RAG pipeline |
| concrete adapter at runtime | `dyn LanguageModel` | p4-2+ |

## Outputs
@@ -45,12 +45,12 @@ Provide the `kb-llm` crate that re-exports the `LanguageModel` trait and helper
| output | type | downstream |
|--------|------|------------|
| streaming `TokenChunk` iterator | `Box<dyn Iterator<Item=anyhow::Result<TokenChunk>> + Send>` | RAG pipeline |
| `ModelRef` identity | `kb_core::ModelRef` | Answer.model |
| `ModelRef` identity | `kebab_core::ModelRef` | Answer.model |

## Public surface (signatures only — no new types)

```rust
pub use kb_core::{LanguageModel, GenerateRequest, TokenChunk, FinishReason, TokenUsage, ModelRef};
pub use kebab_core::{LanguageModel, GenerateRequest, TokenChunk, FinishReason, TokenUsage, ModelRef};

/// Test-only deterministic mock. Compiled only when `mock` feature is on.
#[cfg(feature = "mock")]
@@ -59,12 +59,12 @@ pub struct MockLanguageModel {
    pub provider: String,
    pub context_tokens: usize,
    pub canned_response: String, // emitted token-by-token
    pub canned_finish: kb_core::FinishReason,
    pub canned_usage: kb_core::TokenUsage,
    pub canned_finish: kebab_core::FinishReason,
    pub canned_usage: kebab_core::TokenUsage,
}

#[cfg(feature = "mock")]
impl kb_core::LanguageModel for MockLanguageModel { /* per §7.2 */ }
impl kebab_core::LanguageModel for MockLanguageModel { /* per §7.2 */ }
```
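The mock's observable contract (stream `canned_response` piece by piece, cut it at the earliest stop string, and have the concatenated pieces equal the truncated text) can be sketched as two pure functions. `truncate_at_stop` and `mock_tokens` are hypothetical names, splitting on spaces is an assumed tokenisation, and the real mock yields `TokenChunk` values rather than `String`s.

```rust
/// Truncate `canned` at the earliest occurrence of any non-empty stop string.
pub fn truncate_at_stop<'a>(canned: &'a str, stops: &[&str]) -> &'a str {
    let cut = stops
        .iter()
        .filter(|s| !s.is_empty())
        .filter_map(|s| canned.find(*s))
        .min()
        .unwrap_or(canned.len());
    &canned[..cut]
}

/// Split the truncated response into word-sized pieces, keeping each trailing
/// space so that concatenating the stream reproduces the text exactly.
pub fn mock_tokens(canned: &str, stops: &[&str]) -> Vec<String> {
    truncate_at_stop(canned, stops)
        .split_inclusive(' ')
        .map(str::to_string)
        .collect()
}
```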

## Behavior contract
@@ -89,12 +89,12 @@ impl kb_core::LanguageModel for MockLanguageModel { /* per §7.2 */ }
| unit | concatenation of streamed `TokenChunk::Token` equals canned text (truncated by stop strings) | inline |
| contract | `model_ref()` populates `provider` and leaves `dimensions = None` | inline |

All tests under `cargo test -p kb-llm`.
All tests under `cargo test -p kebab-llm`.

## Definition of Done

- [ ] `cargo check -p kb-llm` passes
- [ ] `cargo test -p kb-llm` passes
- [ ] `cargo check -p kebab-llm` passes
- [ ] `cargo test -p kebab-llm` passes
- [ ] No HTTP / async runtime deps present
- [ ] PR links design §7.2 LanguageModel, §0 Q5

@@ -1,12 +1,12 @@
---
phase: P4
component: kb-llm-local (Ollama adapter)
component: kebab-llm-local (Ollama adapter)
task_id: p4-2
title: "OllamaLanguageModel — streaming /api/generate"
status: completed
depends_on: [p4-1]
unblocks: [p4-3]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [design §7.2 LanguageModel, report §11.2 Ollama, design §6.4 [models.llm], design §0 Q5 streaming, design §10 errors]
---

@@ -18,13 +18,13 @@ Implement `OllamaLanguageModel` against Ollama's local HTTP API (`POST /api/gene

## Why now / why this size

First real LM. Required for `kb ask` to function. Isolated from RAG pipeline so swapping providers stays config-only.
First real LM. Required for `kebab ask` to function. Isolated from RAG pipeline so swapping providers stays config-only.

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kb-llm`
- `kebab-core`
- `kebab-config`
- `kebab-llm`
- `reqwest = { version = "0.12", default-features = false, features = ["blocking", "json", "rustls-tls"] }`
- `serde`, `serde_json`
- `tracing`
@@ -32,21 +32,21 @@ First real LM. Required for `kb ask` to function. Isolated from RAG pipeline so

## Forbidden dependencies

- `tokio`, `async-std`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-rag`, `kb-tui`, `kb-desktop`. (Streaming uses `reqwest::blocking::Response::bytes_stream` via line-delimited JSON; no async runtime needed.)
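- `tokio`, `async-std`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-rag`, `kebab-tui`, `kebab-desktop`. (Streaming reads the line-delimited JSON body through `reqwest::blocking::Response`'s `std::io::Read` impl, e.g. wrapped in a `std::io::BufReader` and consumed line by line; `bytes_stream` exists only on the async `reqwest::Response`. No async runtime needed.)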

## Inputs

| input | type | source |
|-------|------|--------|
| `kb-config::Config.models.llm` | endpoint, model, context, temperature, seed | runtime |
| `GenerateRequest` | `kb_core::GenerateRequest` | RAG pipeline |
| `kebab-config::Config.models.llm` | endpoint, model, context, temperature, seed | runtime |
| `GenerateRequest` | `kebab_core::GenerateRequest` | RAG pipeline |
| Ollama HTTP server (local) | `http://127.0.0.1:11434` | external process |

## Outputs

| output | type | downstream |
|--------|------|------------|
| streaming `TokenChunk` iterator | per §7.2 | `kb-rag` |
| streaming `TokenChunk` iterator | per §7.2 | `kebab-rag` |
| `ModelRef` | `{ id, provider="ollama", dimensions=None }` | `Answer.model` |

## Public surface (signatures only — no new types)
@@ -55,14 +55,14 @@ First real LM. Required for `kb ask` to function. Isolated from RAG pipeline so
pub struct OllamaLanguageModel { /* internal: reqwest::blocking::Client + config */ }

impl OllamaLanguageModel {
    pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>;
    pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
}

impl kb_core::LanguageModel for OllamaLanguageModel {
    fn model_ref(&self) -> kb_core::ModelRef;
impl kebab_core::LanguageModel for OllamaLanguageModel {
    fn model_ref(&self) -> kebab_core::ModelRef;
    fn context_tokens(&self) -> usize;
    fn generate_stream(&self, req: kb_core::GenerateRequest)
        -> anyhow::Result<Box<dyn Iterator<Item = anyhow::Result<kb_core::TokenChunk>> + Send>>;
    fn generate_stream(&self, req: kebab_core::GenerateRequest)
        -> anyhow::Result<Box<dyn Iterator<Item = anyhow::Result<kebab_core::TokenChunk>> + Send>>;
}
```

@@ -93,7 +93,7 @@ impl kb_core::LanguageModel for OllamaLanguageModel {
- UTF-8 boundary: buffer incomplete byte sequences across stream lines before emitting `TokenChunk::Token`.
- Determinism: with `temperature=0` and a fixed `seed`, Ollama's output is reproducible (modulo nondeterminism in the model itself); tests that verify determinism use a fixed seed and may rely on an aggregate hash with tolerance, NOT byte equality.
- `model_ref().provider = "ollama"`, `dimensions = None`.
- Reachability check: `OllamaLanguageModel::new` does NOT eagerly hit the network; first failure surfaces on `generate_stream`. Use `kb doctor` (separate task) to probe.
- Reachability check: `OllamaLanguageModel::new` does NOT eagerly hit the network; first failure surfaces on `generate_stream`. Use `kebab doctor` (separate task) to probe.
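The UTF-8 boundary rule can be met with a small carry-over buffer: decode the longest valid prefix, hold back an incomplete trailing sequence until the next piece arrives, and fail only on bytes that are actually invalid. `Utf8Carry` is a hypothetical helper built on `std::str::from_utf8` and `Utf8Error::{valid_up_to, error_len}`; where exactly the real adapter applies this buffering is an assumption.

```rust
/// Hypothetical helper: turns arbitrary byte pieces into complete UTF-8 text,
/// carrying an incomplete trailing multi-byte sequence over to the next piece.
#[derive(Default)]
pub struct Utf8Carry {
    pending: Vec<u8>,
}

impl Utf8Carry {
    /// Feed the next piece of bytes; returns every character completed so far.
    pub fn push(&mut self, bytes: &[u8]) -> Result<String, std::str::Utf8Error> {
        self.pending.extend_from_slice(bytes);
        let valid = match std::str::from_utf8(&self.pending) {
            Ok(_) => self.pending.len(),
            // `error_len() == None` means the buffer merely ends mid-character:
            // emit the valid prefix and keep the tail for the next call.
            Err(e) if e.error_len().is_none() => e.valid_up_to(),
            // Genuinely invalid bytes: surface the error (the buffer is left as-is).
            Err(e) => return Err(e),
        };
        let complete: Vec<u8> = self.pending.drain(..valid).collect();
        Ok(String::from_utf8(complete).expect("prefix was validated above"))
    }

    /// True if bytes are still waiting for the rest of a character,
    /// e.g. when the stream ends in the middle of a sequence.
    pub fn has_pending(&self) -> bool {
        !self.pending.is_empty()
    }
}
```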
|
||||
|
||||
## Storage / wire effects
|
||||
|
||||
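The UTF-8 boundary rule in the p4-2 hunk above can be met with a small carry-over buffer: decode the longest valid prefix, hold back an incomplete trailing sequence until the next chunk arrives, and only then emit text. This is a std-only sketch that assumes the adapter sees raw body bytes before decoding. `Utf8Carry` is a hypothetical helper, not part of `kebab-llm-local`, and replacing never-valid bytes with U+FFFD is an assumed policy.

```rust
/// Accumulates raw bytes and releases only complete UTF-8 text.
/// Hypothetical helper; the real adapter may structure this differently.
#[derive(Default)]
pub struct Utf8Carry {
    pending: Vec<u8>,
}

impl Utf8Carry {
    /// Appends `bytes` and returns the longest valid UTF-8 prefix.
    /// An incomplete trailing sequence stays buffered for the next call;
    /// bytes that can never become valid are replaced with U+FFFD (assumption).
    pub fn push(&mut self, bytes: &[u8]) -> String {
        self.pending.extend_from_slice(bytes);
        let mut out = String::new();
        loop {
            match std::str::from_utf8(&self.pending) {
                Ok(s) => {
                    out.push_str(s);
                    self.pending.clear();
                    return out;
                }
                Err(e) => {
                    let valid = e.valid_up_to();
                    // The first `valid` bytes were just validated, so this cannot fail.
                    out.push_str(std::str::from_utf8(&self.pending[..valid]).unwrap());
                    match e.error_len() {
                        // Sequence cut off at the end: keep it for the next chunk.
                        None => {
                            self.pending.drain(..valid);
                            return out;
                        }
                        // Invalid bytes in the middle: replace them and keep decoding.
                        Some(bad) => {
                            out.push('\u{FFFD}');
                            self.pending.drain(..valid + bad);
                        }
                    }
                }
            }
        }
    }

    /// Flushes whatever is left when the stream ends.
    pub fn finish(&mut self) -> String {
        let rest = String::from_utf8_lossy(&self.pending).into_owned();
        self.pending.clear();
        rest
    }
}
```

Each non-empty string returned by `push` would become one `TokenChunk::Token`; `finish` runs once after the last stream line.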
@@ -112,12 +112,12 @@ impl kb_core::LanguageModel for OllamaLanguageModel {
 | determinism | identical request + temperature=0 + seed=0 produces identical token stream against mock | mocked HTTP |
 | `#[ignore]` integration | real Ollama on `localhost:11434` with `qwen2.5:14b-instruct` produces non-empty output | requires user opt-in |

-All non-ignored tests under `cargo test -p kb-llm-local`. Real-LM integration runs via `cargo test -p kb-llm-local -- --ignored`.
+All non-ignored tests under `cargo test -p kebab-llm-local`. Real-LM integration runs via `cargo test -p kebab-llm-local -- --ignored`.

 ## Definition of Done

-- [ ] `cargo check -p kb-llm-local` passes
-- [ ] `cargo test -p kb-llm-local` passes (mocked tests; real LM behind `#[ignore]`)
+- [ ] `cargo check -p kebab-llm-local` passes
+- [ ] `cargo test -p kebab-llm-local` passes (mocked tests; real LM behind `#[ignore]`)
 - [ ] No async runtime present (uses `reqwest::blocking`)
 - [ ] No imports outside Allowed dependencies
 - [ ] PR links design §11.2, §0 Q5, §10
@@ -125,7 +125,7 @@ All non-ignored tests under `cargo test -p kb-llm-local`. Real-LM integration ru
 ## Out of scope

 - llama.cpp / candle adapters (P+).
-- Embedding via Ollama's `/api/embed` endpoint (alternate adapter inside `kb-embed-local` if requested later).
+- Embedding via Ollama's `/api/embed` endpoint (alternate adapter inside `kebab-embed-local` if requested later).
 - Cancellation / abort tokens (P+).
 - Connection pooling tuning (default `reqwest::blocking` is sufficient for single-user CLI).

@@ -1,12 +1,12 @@
 ---
 phase: P4
-component: kb-rag
+component: kebab-rag
 task_id: p4-3
 title: "RAG pipeline: retrieve → gate → pack → generate → cite-validate"
 status: completed
 depends_on: [p3-4, p4-2]
 unblocks: [p5-1]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
 contract_sections: [§0 Q4 refusal (two-layer), §0 Q7 footer, §1.1–1.4 ask scenes, §2.3 Answer wire, §3.8 internal Answer, §6.4 [rag], §10 errors]
 ---

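The p4-3 title above lists five stages. For the "pack" stage, here is a minimal sketch. It assumes three things: a greedy packer that works in rank order, a whitespace word count standing in for real token counting, and a `[#n]` header on each chunk so the citation markers checked later can be resolved. `Retrieved`, `PackedChunk`, and `pack_context` are hypothetical names. The actual `rag-v1` layout and `max_context_tokens` accounting may differ.

```rust
/// A retrieved chunk as the packer sees it (hypothetical shape).
pub struct Retrieved {
    pub chunk_id: String,
    pub text: String,
}

/// A chunk that made it into the prompt, with its 1-based citation number.
#[derive(Debug, PartialEq)]
pub struct PackedChunk {
    pub n: usize,
    pub chunk_id: String,
}

/// Crude stand-in for a tokenizer: counts whitespace-separated words.
fn estimate_tokens(s: &str) -> usize {
    s.split_whitespace().count()
}

/// Greedily packs chunks in rank order until `max_tokens` would be exceeded.
/// Returns the prompt context block and the citation table.
pub fn pack_context(hits: &[Retrieved], max_tokens: usize) -> (String, Vec<PackedChunk>) {
    let mut used = 0;
    let mut context = String::new();
    let mut table = Vec::new();
    for hit in hits {
        let cost = estimate_tokens(&hit.text) + 1; // +1 for the "[#n]" header
        if used + cost > max_tokens {
            break; // keep strict rank order; do not skip ahead to smaller chunks
        }
        let n = table.len() + 1;
        context.push_str(&format!("[#{}]\n{}\n\n", n, hit.text));
        table.push(PackedChunk { n, chunk_id: hit.chunk_id.clone() });
        used += cost;
    }
    (context, table)
}
```

Stopping at the first chunk that does not fit keeps the packed evidence in strict rank order. Skipping that chunk and continuing with smaller ones is an equally plausible policy.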
@@ -22,11 +22,11 @@ This is the user-facing payoff. Splitting it further would couple too many inter

 ## Allowed dependencies

-- `kb-core`
-- `kb-config`
-- `kb-search` (Retriever trait object)
-- `kb-llm` (LanguageModel trait object)
-- `kb-store-sqlite` (read chunk full text/section + write `answers` row)
+- `kebab-core`
+- `kebab-config`
+- `kebab-search` (Retriever trait object)
+- `kebab-llm` (LanguageModel trait object)
+- `kebab-store-sqlite` (read chunk full text/section + write `answers` row)
 - `serde`, `serde_json`
 - `regex` (for citation marker extraction)
 - `time`
@@ -35,51 +35,51 @@ This is the user-facing payoff. Splitting it further would couple too many inter

 ## Forbidden dependencies

-- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector` (only via Retriever trait), `kb-embed*` (only via Retriever), `kb-llm-local` (only via LanguageModel trait), `kb-tui`, `kb-desktop`
+- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector` (only via Retriever trait), `kebab-embed*` (only via Retriever), `kebab-llm-local` (only via LanguageModel trait), `kebab-tui`, `kebab-desktop`

 ## Inputs

 | input | type | source |
 |-------|------|--------|
-| `query: &str` | text | `kb-app::ask` |
+| `query: &str` | text | `kebab-app::ask` |
 | `AskOpts` | k, explain, mode, temperature, seed | CLI |
 | `dyn Retriever` | hybrid retriever from p3-4 | runtime injection |
 | `dyn LanguageModel` | from p4-2 (or mock) | runtime injection |
 | `dyn DocumentStore` | for chunk full-text fetch | from p1-6 |
-| `kb-config::Config.rag` | `prompt_template_version`, `score_gate`, `max_context_tokens` | runtime |
+| `kebab-config::Config.rag` | `prompt_template_version`, `score_gate`, `max_context_tokens` | runtime |

 ## Outputs

 | output | type | downstream |
 |--------|------|------------|
-| `Answer` | `kb_core::Answer` | `kb-cli` printer, `answers` table |
+| `Answer` | `kebab_core::Answer` | `kebab-cli` printer, `answers` table |
 | `answers` table row | SQLite | history, eval |

 ## Public surface (signatures only — no new types)

 ```rust
 pub struct RagPipeline {
-    retriever: std::sync::Arc<dyn kb_core::Retriever>,
-    llm: std::sync::Arc<dyn kb_core::LanguageModel>,
-    docs: std::sync::Arc<kb_store_sqlite::SqliteStore>,
-    config: kb_config::Config,
+    retriever: std::sync::Arc<dyn kebab_core::Retriever>,
+    llm: std::sync::Arc<dyn kebab_core::LanguageModel>,
+    docs: std::sync::Arc<kebab_store_sqlite::SqliteStore>,
+    config: kebab_config::Config,
 }

 impl RagPipeline {
     pub fn new(
-        config: kb_config::Config,
-        retriever: std::sync::Arc<dyn kb_core::Retriever>,
-        llm: std::sync::Arc<dyn kb_core::LanguageModel>,
-        docs: std::sync::Arc<kb_store_sqlite::SqliteStore>,
+        config: kebab_config::Config,
+        retriever: std::sync::Arc<dyn kebab_core::Retriever>,
+        llm: std::sync::Arc<dyn kebab_core::LanguageModel>,
+        docs: std::sync::Arc<kebab_store_sqlite::SqliteStore>,
     ) -> Self;

-    pub fn ask(&self, query: &str, opts: AskOpts) -> anyhow::Result<kb_core::Answer>;
+    pub fn ask(&self, query: &str, opts: AskOpts) -> anyhow::Result<kebab_core::Answer>;
 }

 pub struct AskOpts {
     pub k: usize,
     pub explain: bool,
-    pub mode: kb_core::SearchMode,
+    pub mode: kebab_core::SearchMode,
     pub temperature: Option<f32>,
     pub seed: Option<u64>,
     pub stream_sink: Option<std::sync::mpsc::Sender<String>>, // tty/UI token streaming
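The `stream_sink` field above is a plain `std::sync::mpsc::Sender<String>`. The p4-3 Risks/notes require that a dropped receiver never stops generation, so the `answers` row still gets written. Below is a minimal std-only sketch of that loop. `drain_tokens` is a hypothetical stand-in for iterating `generate_stream`, and tokens are simplified to `String`.

```rust
use std::sync::mpsc::Sender;

/// Consumes a token stream, forwarding each token to an optional sink,
/// and returns the full text whether or not the sink is still alive.
pub fn drain_tokens<I>(tokens: I, sink: Option<&Sender<String>>) -> String
where
    I: IntoIterator<Item = String>,
{
    let mut full = String::new();
    for tok in tokens {
        if let Some(tx) = sink {
            // A dropped receiver yields SendError; ignore it and keep going
            // so the caller can still persist the complete Answer.
            let _ = tx.send(tok.clone());
        }
        full.push_str(&tok);
    }
    full
}
```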
@@ -158,12 +158,12 @@ pub struct AskOpts {
 | determinism | identical inputs + temperature=0 + seed=0 → identical Answer (snapshot) | mock |
 | snapshot | `Answer` JSON for fixed query stable | `fixtures/rag/run-1.json` |

-All tests under `cargo test -p kb-rag` with no real Ollama (mock LM only).
+All tests under `cargo test -p kebab-rag` with no real Ollama (mock LM only).

 ## Definition of Done

-- [ ] `cargo check -p kb-rag` passes
-- [ ] `cargo test -p kb-rag` passes
+- [ ] `cargo check -p kebab-rag` passes
+- [ ] `cargo test -p kebab-rag` passes
 - [ ] No imports outside Allowed dependencies
 - [ ] All paths write an `answers` row
 - [ ] Output JSON conforms to `answer.v1`
@@ -178,9 +178,9 @@ All tests under `cargo test -p kb-rag` with no real Ollama (mock LM only).

 ## Risks / notes

-- Citation regex is STRICT `\[#(\d{1,3})\]` only. Models that emit `[1]`/`[ #1 ]`/`[foo]` are treated as no-marker → refusal. This is intentional: a noisy citation grammar lets prose `[1]` or `vec![1]` slip through as false positives, which corrupts both `grounded` and `kb eval` `citation_coverage`. The prompt template (`rag-v1`) explicitly instructs `[#번호]`.
-- **Post-merge fix (2026-05)**: kb-cli's `Cmd::Ask` arm originally called bare `kb_app::ask(query, opts)`, ignoring `--config <path>` and silently using XDG defaults (manifested as wrong model id / score_gate / data_dir surfacing in `Answer.retrieval`). Fixed by routing through `kb_app::ask_with_config(cfg, query, opts)`. See [HOTFIXES.md](../HOTFIXES.md).
-- **Post-merge fix (2026-05)**: `config.rag.score_gate` default `0.05` was incompatible with hybrid RRF `fusion_score` (raw range `(0, 2/k_rrf]` ≈ `≤ 0.033` at default k_rrf=60), refusing every hybrid `kb ask`. Closed by normalizing RRF in p3-4 to `[0, 1]`. See p3-4 spec + HOTFIXES.md.
+- Citation regex is STRICT `\[#(\d{1,3})\]` only. Models that emit `[1]`/`[ #1 ]`/`[foo]` are treated as no-marker → refusal. This is intentional: a noisy citation grammar lets prose `[1]` or `vec![1]` slip through as false positives, which corrupts both `grounded` and `kebab eval` `citation_coverage`. The prompt template (`rag-v1`) explicitly instructs `[#번호]`.
+- **Post-merge fix (2026-05)**: kebab-cli's `Cmd::Ask` arm originally called bare `kebab_app::ask(query, opts)`, ignoring `--config <path>` and silently using XDG defaults (manifested as wrong model id / score_gate / data_dir surfacing in `Answer.retrieval`). Fixed by routing through `kebab_app::ask_with_config(cfg, query, opts)`. See [HOTFIXES.md](../HOTFIXES.md).
+- **Post-merge fix (2026-05)**: `config.rag.score_gate` default `0.05` was incompatible with hybrid RRF `fusion_score` (raw range `(0, 2/k_rrf]` ≈ `≤ 0.033` at default k_rrf=60), refusing every hybrid `kebab ask`. Closed by normalizing RRF in p3-4 to `[0, 1]`. See p3-4 spec + HOTFIXES.md.
 - `stream_sink` channel: pipeline `send`s tokens; if the receiver is dropped (caller cancelled), `SendError` is silently swallowed and generation continues to completion (so the `Answer` row still gets persisted). Pipeline does NOT panic on a dead sink.
 - `temperature=0` does not fully eliminate stochasticity in some quantized Ollama models; document this and rely on `must_contain` rule-based metrics in P5 instead of exact match.
 - Prompt-injection defense lives entirely in the system prompt; do NOT mutate `[근거]` text. If chunk text contains `<|system|>` or similar tokens, do not strip them — they are inert when wrapped.
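The notes above describe two refusal layers. The first is a retrieval gate on the normalized fusion score. The second is strict `[#n]` citation validation. The sketch below hand-rolls a std-only scanner equivalent to `\[#(\d{1,3})\]`; the crate itself lists `regex` for this. Three points are assumptions, not taken from the specs:

- ranks are 0-based, so the raw score lies in `(0, 2/k_rrf]` as the note states;
- normalization divides by that maximum;
- markers whose number has no packed chunk are discarded.

`rrf_normalized`, `extract_citations`, `Verdict`, and `judge` are hypothetical names.

```rust
/// RRF score for one chunk from its 0-based ranks in the lexical and vector
/// lists (None = absent), rescaled to [0, 1] by the two-list maximum 2 / k_rrf.
/// Assumption: the real p3-4 normalization may differ in detail.
pub fn rrf_normalized(lex_rank: Option<usize>, vec_rank: Option<usize>, k_rrf: f64) -> f64 {
    let term = |r: Option<usize>| r.map_or(0.0, |r| 1.0 / (k_rrf + r as f64));
    let raw = term(lex_rank) + term(vec_rank);
    raw / (2.0 / k_rrf)
}

/// Extracts citation numbers with the strict grammar `[#` + 1-3 digits + `]`.
/// `[1]`, `[ #1 ]`, `[#1234]` and `vec![1]` do not match.
pub fn extract_citations(text: &str) -> Vec<u32> {
    let b = text.as_bytes();
    let mut out: Vec<u32> = Vec::new();
    let mut i = 0;
    while i + 1 < b.len() {
        if b[i] == b'[' && b[i + 1] == b'#' {
            let start = i + 2;
            let mut j = start;
            while j < b.len() && b[j].is_ascii_digit() && j - start < 3 {
                j += 1;
            }
            if j > start && j < b.len() && b[j] == b']' {
                // Digits are ASCII, so start..j is a valid char boundary range.
                out.push(text[start..j].parse::<u32>().unwrap());
                i = j + 1;
                continue;
            }
        }
        i += 1;
    }
    out
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Refused(&'static str),
    Grounded(Vec<u32>),
}

/// Layer 1 (score gate) runs before generation in the pipeline; it is folded
/// into one function here only to keep the sketch short. Layer 2 keeps only
/// markers that point at one of the `packed` chunks.
pub fn judge(best_score: f64, score_gate: f64, answer: &str, packed: u32) -> Verdict {
    if best_score < score_gate {
        return Verdict::Refused("below score_gate");
    }
    let cites: Vec<u32> = extract_citations(answer)
        .into_iter()
        .filter(|n| (1..=packed).contains(n))
        .collect();
    if cites.is_empty() {
        Verdict::Refused("no valid citation marker")
    } else {
        Verdict::Grounded(cites)
    }
}
```

With `k_rrf = 60` the best raw score is `2/60 ≈ 0.033`, under the old `0.05` gate. The normalized score for the same chunk is `1.0`.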
@@ -1,12 +1,12 @@
 ---
 phase: P5
-component: kb-eval (runner)
+component: kebab-eval (runner)
 task_id: p5-1
 title: "Golden query fixture loader + per-query runner"
 status: completed
 depends_on: [p4-3]
 unblocks: [p5-2]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
 contract_sections: [§5.7 eval_runs/eval_query_results, §6.3 runs_dir, phase epic tasks/phase-5-evaluation.md]
 ---

@@ -14,7 +14,7 @@ contract_sections: [§5.7 eval_runs/eval_query_results, §6.3 runs_dir, phase ep

 ## Goal

-Load `fixtures/golden_queries.yaml`, run each query through `kb-app` (lexical / vector / hybrid / rag), and persist results into `eval_query_results` + `runs_dir/<run_id>/per_query.jsonl`.
+Load `fixtures/golden_queries.yaml`, run each query through `kebab-app` (lexical / vector / hybrid / rag), and persist results into `eval_query_results` + `runs_dir/<run_id>/per_query.jsonl`.

 ## Why now / why this size

@@ -22,10 +22,10 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t

 ## Allowed dependencies

-- `kb-core`
-- `kb-config`
-- `kb-app` (calls facade for search / ask)
-- `kb-store-sqlite` (writes eval rows)
+- `kebab-core`
+- `kebab-config`
+- `kebab-app` (calls facade for search / ask)
+- `kebab-store-sqlite` (writes eval rows)
 - `serde`, `serde_yaml`, `serde_json`
 - `time`
 - `tracing`
@@ -33,7 +33,7 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t

 ## Forbidden dependencies

-- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (all reached via `kb-app` facade only), `kb-tui`, `kb-desktop`
+- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (all reached via `kebab-app` facade only), `kebab-tui`, `kebab-desktop`

 ## Inputs

@@ -41,7 +41,7 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
 |-------|------|--------|
 | `fixtures/golden_queries.yaml` | YAML | repo-shipped |
 | `EvalRunOpts` | suite, mode, with_rag, k, temperature, seed | CLI |
-| `kb-app` facade | search/ask | runtime |
+| `kebab-app` facade | search/ask | runtime |

 ## Outputs

@@ -50,7 +50,7 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
 | `eval_runs` row | SQLite | p5-2, history |
 | `eval_query_results` rows | SQLite | p5-2 |
 | `runs_dir/<run_id>/per_query.jsonl` | filesystem | external tools, audits |
-| `EvalRun` struct | `kb_eval::EvalRun` | caller |
+| `EvalRun` struct | `kebab_eval::EvalRun` | caller |

 ## Public surface (signatures only — no new types)

@@ -58,9 +58,9 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t
 pub struct GoldenQuery {
     pub id: String,
     pub query: String,
-    pub lang: kb_core::Lang,
-    pub expected_doc_ids: Vec<kb_core::DocumentId>,
-    pub expected_chunk_ids: Vec<kb_core::ChunkId>,
+    pub lang: kebab_core::Lang,
+    pub expected_doc_ids: Vec<kebab_core::DocumentId>,
+    pub expected_chunk_ids: Vec<kebab_core::ChunkId>,
     pub must_contain: Vec<String>,
     pub forbidden: Vec<String>,
     pub difficulty: Option<String>,
@@ -68,7 +68,7 @@ pub struct GoldenQuery {

 pub struct EvalRunOpts {
     pub suite: String, // "golden" default
-    pub mode: kb_core::SearchMode,
+    pub mode: kebab_core::SearchMode,
     pub with_rag: bool,
     pub k: usize,
     pub temperature: Option<f32>,
@@ -86,9 +86,9 @@ pub struct EvalRun {
 pub struct QueryResult {
     pub query_id: String,
     pub query: String,
-    pub mode: kb_core::SearchMode,
-    pub hits_top_k: Vec<kb_core::SearchHit>,
-    pub answer: Option<kb_core::Answer>,
+    pub mode: kebab_core::SearchMode,
+    pub hits_top_k: Vec<kebab_core::SearchHit>,
+    pub answer: Option<kebab_core::Answer>,
     pub elapsed_ms: u32,
     pub error: Option<String>,
 }
@@ -103,10 +103,10 @@ pub fn run_eval(opts: &EvalRunOpts) -> anyhow::Result<EvalRun>;
 - Parses YAML; required fields: `id`, `query`. Optional: everything else (defaults to empty / `None`).
 - Validates uniqueness of `id` and that `expected_doc_ids` / `expected_chunk_ids` exist in DB; missing → return error listing the offenders.
 - `run_eval`:
-  - Loads `fixtures/golden_queries.yaml` (path overridable via env `KB_EVAL_GOLDEN`).
+  - Loads `fixtures/golden_queries.yaml` (path overridable via env `KEBAB_EVAL_GOLDEN`).
   - Generates `run_id = "run_" + ulid_lower()`.
-  - Captures `config_snapshot_json`: serialized `kb_config::Config` plus `chunker_version`, `embedding_model+version+dims`, `llm.model_id`, `prompt_template_version`, `score_gate`, `rrf_k`, `index_version`.
-  - For each query: call `kb_app::search(SearchQuery { mode: opts.mode, k: opts.k, .. })`. If `opts.with_rag`, also call `kb_app::ask(query, AskOpts { mode: opts.mode, k: opts.k, explain: true, temperature: opts.temperature, seed: opts.seed, .. })`.
+  - Captures `config_snapshot_json`: serialized `kebab_config::Config` plus `chunker_version`, `embedding_model+version+dims`, `llm.model_id`, `prompt_template_version`, `score_gate`, `rrf_k`, `index_version`.
+  - For each query: call `kebab_app::search(SearchQuery { mode: opts.mode, k: opts.k, .. })`. If `opts.with_rag`, also call `kebab_app::ask(query, AskOpts { mode: opts.mode, k: opts.k, explain: true, temperature: opts.temperature, seed: opts.seed, .. })`.
   - Each `QueryResult` measured by elapsed wall-clock (ms).
   - Errors are caught per-query (do not abort the run). Failed queries record `error: Some(msg)` and `hits_top_k = vec![]`.
   - Determinism: with `temperature=0` and fixed `seed`, two consecutive runs produce byte-identical `per_query.jsonl` for non-RAG queries; RAG queries may differ in negligible token budget telemetry.
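The per-query rules above fit in a small loop:

- unique `id`s are checked before anything runs;
- errors are caught per query and leave `hits_top_k` empty;
- `elapsed_ms` is measured as wall-clock time.

This std-only sketch uses hypothetical `Golden` and `Outcome` types. A `search` closure stands in for `kebab_app::search`, and hits are plain `String`s rather than `SearchHit`. The DB existence check for expected ids is left out.

```rust
use std::collections::HashSet;
use std::time::Instant;

pub struct Golden {
    pub id: String,
    pub query: String,
}

#[derive(Debug)]
pub struct Outcome {
    pub query_id: String,
    pub hits_top_k: Vec<String>,
    pub elapsed_ms: u32,
    pub error: Option<String>,
}

/// Returns each id that appears more than once, in first-repeat order.
pub fn duplicate_ids(queries: &[Golden]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut dups = Vec::new();
    for q in queries {
        if !seen.insert(q.id.as_str()) && !dups.contains(&q.id) {
            dups.push(q.id.clone());
        }
    }
    dups
}

/// Validates ids, then runs every query; a failing query is recorded,
/// never fatal to the run.
pub fn run_all<F>(queries: &[Golden], mut search: F) -> Result<Vec<Outcome>, String>
where
    F: FnMut(&str) -> Result<Vec<String>, String>,
{
    let dups = duplicate_ids(queries);
    if !dups.is_empty() {
        return Err(format!("duplicate golden query ids: {}", dups.join(", ")));
    }
    let mut out = Vec::with_capacity(queries.len());
    for q in queries {
        let started = Instant::now();
        let result = search(&q.query);
        // Saturate rather than wrap if a query somehow exceeds u32::MAX ms.
        let elapsed_ms = u32::try_from(started.elapsed().as_millis()).unwrap_or(u32::MAX);
        let (hits_top_k, error) = match result {
            Ok(hits) => (hits, None),
            Err(msg) => (Vec::new(), Some(msg)),
        };
        out.push(Outcome { query_id: q.id.clone(), hits_top_k, elapsed_ms, error });
    }
    Ok(out)
}
```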
@@ -130,12 +130,12 @@ pub fn run_eval(opts: &EvalRunOpts) -> anyhow::Result<EvalRun>;
 | determinism | re-running same suite + fixed seed → identical `per_query.jsonl` (lexical only) | tmp DB, fixed corpus |
 | snapshot | `EvalRun` (with mock LM for `with_rag`) JSON stable | `fixtures/eval/run-1.json` |

-All tests under `cargo test -p kb-eval runner`.
+All tests under `cargo test -p kebab-eval runner`.

 ## Definition of Done

-- [ ] `cargo check -p kb-eval` passes
-- [ ] `cargo test -p kb-eval runner` passes
+- [ ] `cargo check -p kebab-eval` passes
+- [ ] `cargo test -p kebab-eval runner` passes
 - [ ] `fixtures/golden_queries.yaml` template shipped (≥ 5 example entries)
 - [ ] No imports outside Allowed dependencies
 - [ ] PR links design §5.7
@@ -1,12 +1,12 @@
 ---
 phase: P5
-component: kb-eval (metrics + compare)
+component: kebab-eval (metrics + compare)
 task_id: p5-2
 title: "Metrics computation + compare report"
 status: completed
 depends_on: [p5-1]
 unblocks: []
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
 contract_sections: [§5.7 eval_runs.aggregate_json, phase epic tasks/phase-5-evaluation.md]
 ---

@@ -14,7 +14,7 @@ contract_sections: [§5.7 eval_runs.aggregate_json, phase epic tasks/phase-5-eva

 ## Goal

-Compute hit@k, MRR, recall@k_doc, citation_coverage, groundedness, empty_result_rate, refusal_correctness from stored `eval_query_results`. Write `aggregate_json` back into `eval_runs`. Provide `kb eval compare a b` that diffs two runs.
+Compute hit@k, MRR, recall@k_doc, citation_coverage, groundedness, empty_result_rate, refusal_correctness from stored `eval_query_results`. Write `aggregate_json` back into `eval_runs`. Provide `kebab eval compare a b` that diffs two runs.

 ## Why now / why this size

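The p5-2 goal names hit@k, MRR, and recall@k_doc, but this excerpt gives no formulas for them. The sketch below assumes the usual definitions:

- hit@k: any expected id appears in the top k;
- MRR: mean of the reciprocal rank of the first expected id;
- recall@k: share of distinct expected ids found in the top k.

The function names are hypothetical.

```rust
use std::collections::HashSet;

/// 1.0 if any expected id appears in the first `k` results, else 0.0.
pub fn hit_at_k(ranked: &[&str], expected: &[&str], k: usize) -> f64 {
    let exp: HashSet<&str> = expected.iter().copied().collect();
    if ranked.iter().take(k).any(|id| exp.contains(id)) { 1.0 } else { 0.0 }
}

/// Reciprocal of the 1-based rank of the first expected id (0.0 if none).
pub fn reciprocal_rank(ranked: &[&str], expected: &[&str]) -> f64 {
    let exp: HashSet<&str> = expected.iter().copied().collect();
    ranked
        .iter()
        .position(|id| exp.contains(id))
        .map_or(0.0, |i| 1.0 / (i as f64 + 1.0))
}

/// Fraction of distinct expected ids found in the first `k` results.
/// Returns 0.0 when nothing is expected.
pub fn recall_at_k(ranked: &[&str], expected: &[&str], k: usize) -> f64 {
    let exp: HashSet<&str> = expected.iter().copied().collect();
    if exp.is_empty() {
        return 0.0;
    }
    let found: HashSet<&str> = ranked.iter().take(k).copied().filter(|id| exp.contains(id)).collect();
    found.len() as f64 / exp.len() as f64
}

/// Mean of per-query values; MRR is `mean` over `reciprocal_rank` results.
pub fn mean(xs: &[f64]) -> f64 {
    if xs.is_empty() { 0.0 } else { xs.iter().sum::<f64>() / xs.len() as f64 }
}
```

Queries with no expected ids would normally be excluded from these averages rather than scored as 0.0. The sketch leaves that filtering to the caller.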
@@ -22,16 +22,16 @@ Metric formulas + comparison logic are pure computation. Splitting them from p5-

 ## Allowed dependencies

-- `kb-core`
-- `kb-config`
-- `kb-store-sqlite` (read eval rows, write `aggregate_json`)
+- `kebab-core`
+- `kebab-config`
+- `kebab-store-sqlite` (read eval rows, write `aggregate_json`)
 - `serde`, `serde_json`
 - `tracing`
 - `thiserror`

 ## Forbidden dependencies

-- `kb-app`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
+- `kebab-app`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

 ## Inputs

@@ -46,7 +46,7 @@ Metric formulas + comparison logic are pure computation. Splitting them from p5-
 | output | type | downstream |
 |--------|------|------------|
 | `eval_runs.aggregate_json` updated | SQLite | history, CI checks |
-| `CompareReport` | `kb_eval::CompareReport` | `kb-cli` printer |
+| `CompareReport` | `kebab_eval::CompareReport` | `kebab-cli` printer |
 | optional `runs_dir/<run_id>/report.md` | filesystem | human-readable summary |

 ## Public surface (signatures only — no new types)
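`CompareReport` feeds the `kebab-cli` printer and an optional `report.md`. Below is a sketch of the diffing step. It assumes each run's aggregate can be flattened to a map from metric name to value. `MetricDelta`, `compare`, and `render_md` are hypothetical. They are not the `compare_runs` / `render_report_md` signatures from the spec.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

#[derive(Debug, PartialEq)]
pub struct MetricDelta {
    pub name: String,
    pub a: f64,
    pub b: f64,
    pub delta: f64, // b - a: positive means run `b` is higher
}

/// Pairs up metrics present in both runs, in stable (alphabetical) order.
/// Metrics missing from either run are skipped.
pub fn compare(a: &BTreeMap<String, f64>, b: &BTreeMap<String, f64>) -> Vec<MetricDelta> {
    a.iter()
        .filter_map(|(name, &va)| {
            b.get(name).map(|&vb| MetricDelta { name: name.clone(), a: va, b: vb, delta: vb - va })
        })
        .collect()
}

/// Renders the deltas as a Markdown table.
pub fn render_md(deltas: &[MetricDelta]) -> String {
    let mut s = String::from("| metric | a | b | Δ |\n|---|---|---|---|\n");
    for d in deltas {
        // Writing into a String cannot fail, so the fmt::Result is ignored.
        let _ = writeln!(s, "| {} | {:.3} | {:.3} | {:+.3} |", d.name, d.a, d.b, d.delta);
    }
    s
}
```

A positive delta only means run `b` is higher. For metrics such as `empty_result_rate`, lower is better, so a real printer would need a per-metric direction.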
@@ -127,15 +127,15 @@ pub fn render_report_md(report: &CompareReport) -> String;
 | determinism | running `compute_aggregate` twice produces identical `AggregateMetrics` | inline |
 | snapshot | `CompareReport` JSON for a fixed pair of runs stable | `fixtures/eval/compare-1.json` |

-All tests under `cargo test -p kb-eval metrics`.
+All tests under `cargo test -p kebab-eval metrics`.

 ## Definition of Done

-- [ ] `cargo check -p kb-eval` passes
-- [ ] `cargo test -p kb-eval metrics` passes
+- [ ] `cargo check -p kebab-eval` passes
+- [ ] `cargo test -p kebab-eval metrics` passes
 - [ ] No imports outside Allowed dependencies
 - [ ] `eval_runs.aggregate_json` always populated after `store_aggregate`
-- [ ] `kb eval compare` CLI surface integrated via `kb-app` (call `compare_runs` + `render_report_md`)
+- [ ] `kebab eval compare` CLI surface integrated via `kebab-app` (call `compare_runs` + `render_report_md`)
 - [ ] PR links phase epic tasks/phase-5-evaluation.md

 ## Out of scope
@@ -172,11 +172,11 @@ same one this spec defines, the names / wiring just differ.
   / `compare_runs(a, b)` so integration tests can drive the pipeline
   against a TempDir-backed `Config`. The no-arg forms wrap them with
   `Config::load(None)`.
-- **CLI surface lives on `kb-cli` directly, not via `kb-app`.** DoD
-  asks for `kb eval compare` to be reached "via kb-app", but `kb-app`
-  already depends on `kb-eval` (the P5-1 runner uses the App facade),
-  so routing the CLI through `kb-app` would form a cycle. `kb-cli` →
-  `kb-eval` is wired directly; `kb-app` is unchanged.
+- **CLI surface lives on `kebab-cli` directly, not via `kebab-app`.** DoD
+  asks for `kebab eval compare` to be reached "via kebab-app", but `kebab-app`
+  already depends on `kebab-eval` (the P5-1 runner uses the App facade),
+  so routing the CLI through `kebab-app` would form a cycle. `kebab-cli` →
+  `kebab-eval` is wired directly; `kebab-app` is unchanged.
 - **`AggregateMetrics` is `Serialize + Deserialize`.** The spec defines
   only the field shape; we add `Deserialize` so the stored
   `aggregate_json` can round-trip back into the type for follow-up
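The note above explains why the CLI calls `kebab-eval` directly. The edge direction here comes from the follow-up note, which says `kebab-eval` has depended on `kebab-app` since P5-1. A small depth-first search over crate edges then shows the effect: `kebab-cli → kebab-eval` stays acyclic, while adding `kebab-app → kebab-eval` closes a loop. The edge lists are illustrative and are not read from the workspace `Cargo.toml` files.

```rust
use std::collections::{HashMap, HashSet};

/// Returns true if the directed dependency graph contains a cycle.
pub fn has_cycle(edges: &[(&str, &str)]) -> bool {
    let mut graph: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(from, to) in edges {
        graph.entry(from).or_default().push(to);
    }

    // DFS: `on_stack` holds the current path; `done` holds every node entered.
    fn visit<'a>(
        n: &'a str,
        g: &HashMap<&'a str, Vec<&'a str>>,
        done: &mut HashSet<&'a str>,
        on_stack: &mut HashSet<&'a str>,
    ) -> bool {
        if on_stack.contains(n) {
            return true; // back edge: n is already on the current path
        }
        if !done.insert(n) {
            return false; // fully explored earlier without finding a cycle
        }
        on_stack.insert(n);
        let cyclic = g
            .get(n)
            .map_or(false, |next| next.iter().any(|&m| visit(m, g, done, on_stack)));
        on_stack.remove(n);
        cyclic
    }

    let mut done = HashSet::new();
    let mut on_stack = HashSet::new();
    graph.keys().any(|&n| visit(n, &graph, &mut done, &mut on_stack))
}
```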
@@ -184,12 +184,12 @@ same one this spec defines, the names / wiring just differ.
 - **`anyhow`** is used in `Result` returns since the rest of the
   workspace already speaks anyhow; not in the spec's Allowed list but
   matches every other crate.
-- **`kb-eval` crate-level `kb-app` dep stays.** The crate already
-  depends on `kb-app` from P5-1 (the runner uses the `App` facade), so
+- **`kebab-eval` crate-level `kebab-app` dep stays.** The crate already
+  depends on `kebab-app` from P5-1 (the runner uses the `App` facade), so
   the Cargo.toml entry remains. The new modules (`metrics.rs`,
-  `compare.rs`) do not import `kb-app` themselves — they're behind the
+  `compare.rs`) do not import `kebab-app` themselves — they're behind the
   same crate boundary as the runner, but the metric/compare *surface*
-  is `kb-app`-clean. Splitting the crate to avoid a transitive Cargo
+  is `kebab-app`-clean. Splitting the crate to avoid a transitive Cargo
   edge would be churn for no behavior gain.
 - **`citation_coverage` is intentionally weaker than the spec literal.**
   Spec calls for "every citation resolves to a real chunk in the DB".
@@ -1,12 +1,12 @@
 ---
 phase: P6
-component: kb-parse-image (image extractor + EXIF)
+component: kebab-parse-image (image extractor + EXIF)
 task_id: p6-1
 title: "Image Extractor producing single-block CanonicalDocument + EXIF metadata"
 status: planned
 depends_on: [p0-1, p1-6]
 unblocks: [p6-2, p6-3]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
 contract_sections: [§3.4 Block::ImageRef + ImageRefBlock, §3.7a OcrText/ModelCaption stubs, §9.1 image extraction policy, §9 versioning]
 ---

@@ -22,8 +22,8 @@ Establishes the image-as-document contract and decouples extraction (asset → I

 ## Allowed dependencies

-- `kb-core`
-- `kb-config`
+- `kebab-core`
+- `kebab-config`
 - `image = "0.25"` (decoding for size + format detect)
 - `kamadak-exif` for EXIF
 - `serde`, `serde_json`
@@ -33,31 +33,31 @@ Establishes the image-as-document contract and decouples extraction (asset → I

 ## Forbidden dependencies

-- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, OCR libs, LLM libs
+- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, OCR libs, LLM libs

 ## Inputs

 | input | type | source |
 |-------|------|--------|
-| `RawAsset` | `kb_core::RawAsset` | from `kb-source-fs` |
+| `RawAsset` | `kebab_core::RawAsset` | from `kebab-source-fs` |
 | image bytes | `&[u8]` | filesystem |
-| `parser_version` | `kb_core::ParserVersion` | constant in this crate (`"image-meta-v1"`) |
+| `parser_version` | `kebab_core::ParserVersion` | constant in this crate (`"image-meta-v1"`) |

 ## Outputs

 | output | type | downstream |
 |--------|------|------------|
-| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk` (image-region chunker) → `kb-store-sqlite` |
+| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk` (image-region chunker) → `kebab-store-sqlite` |

 ## Public surface (signatures only — no new types)

 ```rust
 pub struct ImageExtractor;

-impl kb_core::Extractor for ImageExtractor {
-    fn supports(&self, m: &kb_core::MediaType) -> bool { matches!(m, kb_core::MediaType::Image(_)) }
-    fn parser_version(&self) -> kb_core::ParserVersion { kb_core::ParserVersion("image-meta-v1".into()) }
-    fn extract(&self, ctx: &kb_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kb_core::CanonicalDocument>;
+impl kebab_core::Extractor for ImageExtractor {
+    fn supports(&self, m: &kebab_core::MediaType) -> bool { matches!(m, kebab_core::MediaType::Image(_)) }
+    fn parser_version(&self) -> kebab_core::ParserVersion { kebab_core::ParserVersion("image-meta-v1".into()) }
+    fn extract(&self, ctx: &kebab_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kebab_core::CanonicalDocument>;
 }
 ```

@@ -69,7 +69,7 @@ impl kb_core::Extractor for ImageExtractor {
 - `metadata.source_type = SourceType::Reference` (per design enum); `trust_level = TrustLevel::Primary`; `tags`/`aliases` empty.
 - `metadata.user["exif"]` = JSON object with whitelisted EXIF tags (DateTimeOriginal, GPS lat/lon, Make, Model, Orientation, Software). Missing tags omitted.
 - `metadata.user["dimensions"] = { "w": <u32>, "h": <u32>, "format": "<png|jpeg|...>" }`.
-- `provenance` includes `Discovered`, `Parsed` events (no Normalized — ID assignment happens here directly per §3.4 stub from p1-4 logic, OR pipe through `kb-normalize` if available; this task's choice: emit a fully formed CanonicalDocument with deterministic IDs by calling `kb_core::id_for_doc` and `kb_core::id_for_block` directly).
+- `provenance` includes `Discovered`, `Parsed` events (no Normalized — ID assignment happens here directly per §3.4 stub from p1-4 logic, OR pipe through `kebab-normalize` if available; this task's choice: emit a fully formed CanonicalDocument with deterministic IDs by calling `kebab_core::id_for_doc` and `kebab_core::id_for_block` directly).
 - Failure modes:
   - Truncated/corrupt image → still emits a CanonicalDocument with `dimensions = null`, EXIF empty, `Provenance` warning event with the decoder error message.
   - Unsupported format → `anyhow::Error` (caller skips).
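The `metadata.user["exif"]` rule above keeps a fixed whitelist and omits missing tags. Here is a std-only sketch. It assumes the tags have already been decoded into name/value string pairs, for example by `kamadak-exif`. The exact tag spellings are assumptions, including splitting GPS into `GPSLatitude` / `GPSLongitude`. The resulting map would still need serializing into the JSON object.

```rust
use std::collections::BTreeMap;

/// EXIF tag names kept in `metadata.user["exif"]`; everything else is dropped.
/// The spellings are assumptions about how the decoder names the tags.
const EXIF_WHITELIST: [&str; 7] = [
    "DateTimeOriginal",
    "GPSLatitude",
    "GPSLongitude",
    "Make",
    "Model",
    "Orientation",
    "Software",
];

/// Keeps whitelisted tags only. Tags absent from the input stay absent from
/// the output (no nulls), matching the "missing tags omitted" rule.
pub fn whitelist_exif<'a, I>(tags: I) -> BTreeMap<String, String>
where
    I: IntoIterator<Item = (&'a str, String)>,
{
    tags.into_iter()
        .filter(|(name, _)| EXIF_WHITELIST.contains(name))
        .map(|(name, value)| (name.to_string(), value))
        .collect()
}
```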
@@ -77,7 +77,7 @@ impl kb_core::Extractor for ImageExtractor {

 ## Storage / wire effects

-- None directly (the caller persists via `kb-store-sqlite`).
+- None directly (the caller persists via `kebab-store-sqlite`).

 ## Test plan

@@ -90,12 +90,12 @@ impl kb_core::Extractor for ImageExtractor {
 | determinism | identical bytes → identical `doc_id`, `block_id` across two runs | inline |
 | snapshot | `CanonicalDocument` JSON stable for fixture | `fixtures/image/red-100x50.png` |

-All tests under `cargo test -p kb-parse-image`.
+All tests under `cargo test -p kebab-parse-image`.

 ## Definition of Done

-- [ ] `cargo check -p kb-parse-image` passes
-- [ ] `cargo test -p kb-parse-image` passes
+- [ ] `cargo check -p kebab-parse-image` passes
+- [ ] `cargo test -p kebab-parse-image` passes
 - [ ] No OCR/caption/embedding code present
 - [ ] No imports outside Allowed dependencies
 - [ ] PR links design §3.4, §9.1
@@ -1,12 +1,12 @@
 ---
 phase: P6
-component: kb-parse-image (OCR adapter)
+component: kebab-parse-image (OCR adapter)
 task_id: p6-2
 title: "OcrEngine trait + Tesseract adapter (Apple Vision feature-gated)"
 status: planned
 depends_on: [p6-1]
 unblocks: [p6-3]
-contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
+contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
 contract_sections: [§3.4 ImageRefBlock.ocr, §3.7a OcrText/OcrRegion, §9.1 OCR vs caption provenance]
 ---

@@ -22,9 +22,9 @@ Strict separation of OCR (observed text) from caption (model-generated). Confini

 ## Allowed dependencies

-- `kb-core`
-- `kb-config`
-- `kb-parse-image` (consumes its types)
+- `kebab-core`
+- `kebab-config`
+- `kebab-parse-image` (consumes its types)
 - `tesseract = "0.13"` (feature `tesseract`, default ON)
 - For feature `apple-vision`: `std::process::Command` only (sidecar binary, not a Rust dep)
 - `serde`, `serde_json`
@@ -34,21 +34,21 @@ Strict separation of OCR (observed text) from caption (model-generated). Confini

 ## Forbidden dependencies

-- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
+- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

 ## Inputs

 | input | type | source |
 |-------|------|--------|
 | image bytes | `&[u8]` | from extractor |
-| optional language hint | `kb_core::Lang` | metadata |
-| `kb-config` OCR settings | engine name, languages | runtime |
+| optional language hint | `kebab_core::Lang` | metadata |
+| `kebab-config` OCR settings | engine name, languages | runtime |

 ## Outputs

 | output | type | downstream |
 |--------|------|------------|
-| `OcrText` | `kb_core::OcrText` | merged into `ImageRefBlock.ocr` |
+| `OcrText` | `kebab_core::OcrText` | merged into `ImageRefBlock.ocr` |

 ## Public surface (signatures only — no new types)

@@ -56,11 +56,11 @@ Strict separation of OCR (observed text) from caption (model-generated). Confini
 pub trait OcrEngine: Send + Sync {
     fn engine_name(&self) -> &'static str;
     fn engine_version(&self) -> String;
-    fn recognize(&self, image_bytes: &[u8], lang_hint: Option<&kb_core::Lang>) -> anyhow::Result<kb_core::OcrText>;
+    fn recognize(&self, image_bytes: &[u8], lang_hint: Option<&kebab_core::Lang>) -> anyhow::Result<kebab_core::OcrText>;
 }

 pub struct TesseractOcr { /* internal: lazy api handle */ }
-impl TesseractOcr { pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>; }
+impl TesseractOcr { pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>; }
 impl OcrEngine for TesseractOcr { /* per trait */ }

 #[cfg(feature = "apple-vision")]
@@ -71,8 +71,8 @@ impl OcrEngine for AppleVisionOcr { /* per trait */ }
|
||||
pub fn apply_ocr(
|
||||
engine: &dyn OcrEngine,
|
||||
image_bytes: &[u8],
|
||||
block: &mut kb_core::ImageRefBlock,
|
||||
lang_hint: Option<&kb_core::Lang>,
|
||||
block: &mut kebab_core::ImageRefBlock,
|
||||
lang_hint: Option<&kebab_core::Lang>,
|
||||
) -> anyhow::Result<()>;
|
||||
```
|
||||
|
||||
@@ -85,7 +85,7 @@ pub fn apply_ocr(
|
||||
- `joined` = `regions.iter().map(|r| r.text).join(" ")` (no smart layout reconstruction in v1).
|
||||
- `engine = "tesseract"`, `engine_version = <tesseract version string>`. The `tesseract` crate (0.13+) does NOT expose a stable Rust `version()` accessor. Use one of: (a) call libtesseract's `TessVersion()` via the bundled FFI surface, OR (b) at adapter construction, shell-out `tesseract --version` once and cache the parsed `"5.3.4"`-style string. Both are deterministic for a fixed install. Pin the chosen approach in the implementation PR.
|
||||
- Apple Vision sidecar (feature `apple-vision`):
|
||||
- Spawn a small Swift binary `kb-vision-ocr` (path from `config.ocr.apple_vision_binary`) feeding the image via stdin and reading JSON `{ regions: [{x,y,w,h,text,confidence}, ...] }` from stdout.
|
||||
- Spawn a small Swift binary `kebab-vision-ocr` (path from `config.ocr.apple_vision_binary`) feeding the image via stdin and reading JSON `{ regions: [{x,y,w,h,text,confidence}, ...] }` from stdout.
|
||||
- Same threshold and `joined` rules as Tesseract. `engine = "apple-vision"`, `engine_version = sidecar's --version`.
|
||||
- This subagent task does NOT write the Swift sidecar; it only wires the Rust side. Document the expected sidecar interface in `docs/spec/sidecar-vision.md` (separate doc spec stub, optional).
|
||||
- `apply_ocr` calls `engine.recognize`, sets `block.ocr = Some(text)`, and appends a `Provenance::OcrApplied` event in the caller's CanonicalDocument (caller responsibility — this task exposes a helper).
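The `joined` rule above is written as pseudo-Rust: `r.text` would move out of a borrowed region, and `.join(" ")` on an iterator comes from the `itertools` crate rather than std. A std-only sketch of the same rule, plus a parser for option (b) of the version lookup. Assumptions: `OcrRegion` here is a local stand-in with only `text` and `confidence` (the real `kebab_core::OcrRegion` fields are not shown in this diff), low-confidence regions are dropped before joining using a `min_confidence` value from config, and the first line of `tesseract --version` looks like `tesseract 5.3.4`, possibly with a leading `v` on the version. Which output stream carries the banner has varied across Tesseract releases, so capturing both stdout and stderr is the safer assumption.

```rust
/// Local stand-in for `kebab_core::OcrRegion`; field names are assumptions.
struct OcrRegion {
    text: String,
    confidence: f32,
}

/// Drop regions below `min_confidence`, then join the survivors with a single
/// space (v1 does no layout reconstruction).
fn joined_text(regions: &[OcrRegion], min_confidence: f32) -> String {
    regions
        .iter()
        .filter(|r| r.confidence >= min_confidence)
        .map(|r| r.text.as_str())
        .collect::<Vec<_>>()
        .join(" ")
}

/// Parse the first line of `tesseract --version` output ("tesseract 5.3.4")
/// into "5.3.4". A leading "v" ("tesseract v5.3.0.20221214") is stripped.
fn parse_tesseract_version(output: &str) -> Option<String> {
    let first_line = output.lines().next()?.trim();
    let rest = first_line.strip_prefix("tesseract")?;
    let token = rest.split_whitespace().next()?;
    let version = token.strip_prefix('v').unwrap_or(token);
    (!version.is_empty()).then(|| version.to_string())
}
```

Caching the parsed string once at adapter construction, as option (b) says, keeps `engine_version()` cheap and stable for a fixed install.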
@@ -109,12 +109,12 @@ pub fn apply_ocr(
| determinism | two runs of recognize on same input → identical OcrText | fixture |
| `#[cfg(feature = "apple-vision")]` smoke | sidecar invocation captured (mock binary echoes fixed JSON) | inline mock |

All tests under `cargo test -p kb-parse-image ocr`. Tesseract install required on CI host.
All tests under `cargo test -p kebab-parse-image ocr`. Tesseract install required on CI host.

## Definition of Done

- [ ] `cargo check -p kb-parse-image --features tesseract` passes
- [ ] `cargo test -p kb-parse-image ocr` passes
- [ ] `cargo check -p kebab-parse-image --features tesseract` passes
- [ ] `cargo test -p kebab-parse-image ocr` passes
- [ ] `apple-vision` feature compiles on macOS and gracefully no-ops on Linux
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.4, §3.7a, §9.1
@@ -129,5 +129,5 @@ All tests under `cargo test -p kb-parse-image ocr`. Tesseract install required o
## Risks / notes

- Tesseract performance varies wildly with image quality; document `min_confidence` and default page-segmentation mode.
- Apple Vision sidecar requires code signing for distribution; for v1 dev builds, accept unsigned binary from `~/.local/bin/kb-vision-ocr`.
- Apple Vision sidecar requires code signing for distribution; for v1 dev builds, accept unsigned binary from `~/.local/bin/kebab-vision-ocr`.
- Large image downscale loses small-text recognition; expose `config.ocr.max_pixels` so power users can tune.
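For the `max_pixels` knob, one way to compute the downscaled size, assuming `max_pixels` caps the total pixel count (width times height) rather than the longest side. A single uniform factor `sqrt(max_pixels / (w * h))` preserves the aspect ratio; flooring plus a small guard loop keeps the result inside the budget even with floating-point rounding. `fit_within_pixels` is a hypothetical helper; the actual resize could then be done with the `image` crate (for example `DynamicImage::resize_exact`), which the caption task already lists as a dependency.

```rust
/// Target (width, height) such that width * height <= max_pixels, keeping the
/// aspect ratio. Images already within budget are returned unchanged.
fn fit_within_pixels(width: u32, height: u32, max_pixels: u64) -> (u32, u32) {
    let area = width as u64 * height as u64;
    if area == 0 || area <= max_pixels {
        return (width, height);
    }
    // Uniform scale s with (w * s) * (h * s) = max_pixels  =>  s = sqrt(max_pixels / area).
    let scale = (max_pixels as f64 / area as f64).sqrt();
    let mut w = ((width as f64 * scale).floor() as u32).max(1);
    let mut h = ((height as f64 * scale).floor() as u32).max(1);
    // Guard: rounding (or the max(1) clamp on very thin images) may overshoot.
    while (w as u64) * (h as u64) > max_pixels && (w > 1 || h > 1) {
        if w >= h {
            w -= 1;
        } else {
            h -= 1;
        }
    }
    (w, h)
}
```

For a 4000 x 3000 photo and a one-megapixel budget this yields 1154 x 866, small enough for OCR or captioning while keeping the 4:3 shape; whether that still preserves small text is exactly the trade-off the note above asks to expose.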
@@ -1,12 +1,12 @@
---
phase: P6
component: kb-parse-image (caption adapter)
component: kebab-parse-image (caption adapter)
task_id: p6-3
title: "ModelCaption adapter (LanguageModel-driven, feature-gated)"
status: planned
depends_on: [p6-1, p4-2]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 ImageRefBlock.caption, §3.7a ModelCaption, §9.1 caption (model-generated, low trust)]
---

@@ -22,10 +22,10 @@ Captioning closes the multimodal loop. Strict separation from OCR keeps trust le

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kb-parse-image`
- `kb-llm` (LanguageModel trait)
- `kebab-core`
- `kebab-config`
- `kebab-parse-image`
- `kebab-llm` (LanguageModel trait)
- `base64`
- `serde`, `serde_json`
- `image` (resize for prompt cost control)
@@ -34,7 +34,7 @@ Captioning closes the multimodal loop. Strict separation from OCR keeps trust le

## Forbidden dependencies

- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-rag`, `kb-llm-local` (only via trait), `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-rag`, `kebab-llm-local` (only via trait), `kebab-tui`, `kebab-desktop`

## Inputs
@@ -42,28 +42,28 @@ Captioning closes the multimodal loop. Strict separation from OCR keeps trust le
|-------|------|--------|
| image bytes | `&[u8]` | extractor |
| `dyn LanguageModel` (vision-capable) | runtime | injected |
| `kb-config.image.caption` | `{ enabled, max_pixels, prompt_template_version }` | runtime |
| `kebab-config.image.caption` | `{ enabled, max_pixels, prompt_template_version }` | runtime |

## Outputs

| output | type | downstream |
|--------|------|------------|
| `ModelCaption` | `kb_core::ModelCaption` | merged into `ImageRefBlock.caption` |
| `ModelCaption` | `kebab_core::ModelCaption` | merged into `ImageRefBlock.caption` |

## Public surface (signatures only — no new types)

```rust
pub fn caption_image(
    llm: &dyn kb_core::LanguageModel,
    llm: &dyn kebab_core::LanguageModel,
    image_bytes: &[u8],
    cfg: &kb_config::Config,
) -> anyhow::Result<kb_core::ModelCaption>;
    cfg: &kebab_config::Config,
) -> anyhow::Result<kebab_core::ModelCaption>;

pub fn apply_caption(
    llm: &dyn kb_core::LanguageModel,
    llm: &dyn kebab_core::LanguageModel,
    image_bytes: &[u8],
    block: &mut kb_core::ImageRefBlock,
    cfg: &kb_config::Config,
    block: &mut kebab_core::ImageRefBlock,
    cfg: &kebab_config::Config,
) -> anyhow::Result<()>;
```

@@ -86,7 +86,7 @@ pub fn apply_caption(

## Storage / wire effects

- None directly. Caller persists via `kb-store-sqlite`.
- None directly. Caller persists via `kebab-store-sqlite`.

## Test plan

@@ -99,12 +99,12 @@ pub fn apply_caption(
| unit | downscale honors `max_pixels` (resulting bytes < some threshold) | fixture large image |
| determinism | identical input + temperature=0 + seed=0 → identical caption (mock) | inline |

All tests under `cargo test -p kb-parse-image caption` with mock LM only.
All tests under `cargo test -p kebab-parse-image caption` with mock LM only.
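A minimal sketch of the kind of mock these caption tests could use to satisfy the determinism row (identical input + temperature=0 + seed=0 → identical caption). Everything here is hypothetical: the trimmed `LanguageModel` trait, its `caption` method, the two-field `ModelCaption`, and `caption_pinned` are local stand-ins, since the real `kebab_core` signatures are not part of this diff. The mock derives its output from a 64-bit FNV-1a hash of the image bytes and the seed, so it is stable across runs and machines without loading any model.

```rust
/// Hypothetical, trimmed-down stand-in for `kebab_core::LanguageModel`.
trait LanguageModel {
    fn model_id(&self) -> String;
    fn caption(&self, image_bytes: &[u8], temperature: f32, seed: u64) -> Result<String, String>;
}

/// Hypothetical stand-in for `kebab_core::ModelCaption` (provenance fields trimmed).
struct ModelCaption {
    text: String,
    model: String,
}

/// Deterministic mock: output depends only on (bytes, seed), never on clocks or RNG state.
struct MockLm;

impl LanguageModel for MockLm {
    fn model_id(&self) -> String {
        "mock-vlm".to_string()
    }

    fn caption(&self, image_bytes: &[u8], temperature: f32, seed: u64) -> Result<String, String> {
        if temperature != 0.0 {
            return Err("mock only supports temperature=0".to_string());
        }
        // 64-bit FNV-1a over the image bytes followed by the seed's little-endian bytes.
        let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
        for byte in image_bytes.iter().copied().chain(seed.to_le_bytes()) {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
        }
        Ok(format!("mock caption {hash:016x}"))
    }
}

/// The call shape the determinism test exercises: temperature and seed pinned to 0.
fn caption_pinned(llm: &dyn LanguageModel, image_bytes: &[u8]) -> Result<ModelCaption, String> {
    let text = llm.caption(image_bytes, 0.0, 0)?;
    Ok(ModelCaption { text, model: llm.model_id() })
}
```

Rejecting non-zero temperature in the mock makes a test fail loudly if the adapter ever stops pinning sampling parameters.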
## Definition of Done

- [ ] `cargo check -p kb-parse-image --features caption` passes
- [ ] `cargo test -p kb-parse-image caption` passes
- [ ] `cargo check -p kebab-parse-image --features caption` passes
- [ ] `cargo test -p kebab-parse-image caption` passes
- [ ] No imports outside Allowed dependencies
- [ ] Feature default OFF; only on when user opts in via config
- [ ] PR links design §3.4 ImageRefBlock.caption, §9.1
@@ -1,12 +1,12 @@
---
phase: P7
component: kb-parse-pdf (text extractor)
component: kebab-parse-pdf (text extractor)
task_id: p7-1
title: "Text PDF extractor → CanonicalDocument with page-level blocks"
status: planned
depends_on: [p0-1, p1-6]
unblocks: [p7-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 SourceSpan::Page, §3.4 Block::Paragraph, §9.2 PDF text extraction, §9 versioning]
---

@@ -22,8 +22,8 @@ Strict scope: page text + page numbers. Layout reconstruction (multi-column merg

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `pdf-extract = "0.7"` (or current stable)
- `lopdf = "0.32"` for page metadata (count, optional title from /Info)
- `serde`, `serde_json`
@@ -33,30 +33,30 @@ Strict scope: page text + page numbers. Layout reconstruction (multi-column merg

## Forbidden dependencies

- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, OCR libraries (OCR fallback is a separate task, not this one)
- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, OCR libraries (OCR fallback is a separate task, not this one)

## Inputs

| input | type | source |
|-------|------|--------|
| `RawAsset` | `kb_core::RawAsset` | `kb-source-fs` |
| `RawAsset` | `kebab_core::RawAsset` | `kebab-source-fs` |
| PDF bytes | `&[u8]` | filesystem |

## Outputs

| output | type | downstream |
|--------|------|------------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk` (`pdf-page-v1` chunker in p7-2) |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk` (`pdf-page-v1` chunker in p7-2) |

## Public surface (signatures only — no new types)

```rust
pub struct PdfTextExtractor;

impl kb_core::Extractor for PdfTextExtractor {
    fn supports(&self, m: &kb_core::MediaType) -> bool { matches!(m, kb_core::MediaType::Pdf) }
    fn parser_version(&self) -> kb_core::ParserVersion { kb_core::ParserVersion("pdf-text-v1".into()) }
    fn extract(&self, ctx: &kb_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kb_core::CanonicalDocument>;
impl kebab_core::Extractor for PdfTextExtractor {
    fn supports(&self, m: &kebab_core::MediaType) -> bool { matches!(m, kebab_core::MediaType::Pdf) }
    fn parser_version(&self) -> kebab_core::ParserVersion { kebab_core::ParserVersion("pdf-text-v1".into()) }
    fn extract(&self, ctx: &kebab_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kebab_core::CanonicalDocument>;
}
```

@@ -100,12 +100,12 @@ impl kb_core::Extractor for PdfTextExtractor {
| determinism | identical bytes → identical CanonicalDocument JSON across two runs | inline |
| snapshot | CanonicalDocument JSON for fixture stable | `fixtures/pdf/three-page-en.pdf` |

All tests under `cargo test -p kb-parse-pdf`.
All tests under `cargo test -p kebab-parse-pdf`.

## Definition of Done

- [ ] `cargo check -p kb-parse-pdf` passes
- [ ] `cargo test -p kb-parse-pdf` passes
- [ ] `cargo check -p kebab-parse-pdf` passes
- [ ] `cargo test -p kebab-parse-pdf` passes
- [ ] No OCR / LLM code present
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.4 SourceSpan::Page, §9.2
@@ -1,12 +1,12 @@
---
phase: P7
component: kb-chunk (pdf-page-v1)
component: kebab-chunk (pdf-page-v1)
task_id: p7-2
title: "PDF page-aware chunker (pdf-page-v1)"
status: planned
depends_on: [p7-1]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.5 Chunk, §4.2 chunk_id recipe, §0 Q3 citation, §9 versioning]
---

@@ -22,8 +22,8 @@ Per-medium chunkers must stay tiny and obvious. Page-aware logic is small but it

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`, `serde_json`
- `blake3` (policy_hash)
- `serde-json-canonicalizer`
@@ -31,30 +31,30 @@ Per-medium chunkers must stay tiny and obvious. Page-aware logic is small but it

## Forbidden dependencies

- `kb-source-fs`, `kb-parse-md`, `kb-parse-pdf` (consumes `CanonicalDocument` via `kb-core` only), `kb-normalize`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-parse-pdf` (consumes `CanonicalDocument` via `kebab-core` only), `kebab-normalize`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

## Inputs

| input | type | source |
|-------|------|--------|
| `CanonicalDocument` (produced by `pdf-text-v1`) | `kb_core::CanonicalDocument` | p7-1 |
| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` |
| `CanonicalDocument` (produced by `pdf-text-v1`) | `kebab_core::CanonicalDocument` | p7-1 |
| `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` |

## Outputs

| output | type | downstream |
|--------|------|------------|
| `Vec<Chunk>` | `kb_core::Chunk` | `kb-store-sqlite`, `kb-embed*` |
| `Vec<Chunk>` | `kebab_core::Chunk` | `kebab-store-sqlite`, `kebab-embed*` |

## Public surface (signatures only — no new types)

```rust
pub struct PdfPageV1Chunker;

impl kb_core::Chunker for PdfPageV1Chunker {
    fn chunker_version(&self) -> kb_core::ChunkerVersion { kb_core::ChunkerVersion("pdf-page-v1".into()) }
    fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String;
    fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result<Vec<kb_core::Chunk>>;
impl kebab_core::Chunker for PdfPageV1Chunker {
    fn chunker_version(&self) -> kebab_core::ChunkerVersion { kebab_core::ChunkerVersion("pdf-page-v1".into()) }
    fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String;
    fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result<Vec<kebab_core::Chunk>>;
}
```

@@ -62,7 +62,7 @@ impl kb_core::Chunker for PdfPageV1Chunker {

## Behavior contract

- Only operates on documents whose blocks all carry `SourceSpan::Page` (i.e., from `kb-parse-pdf`). Other documents → return `anyhow::Error("PdfPageV1Chunker only handles PDF docs")`.
- Only operates on documents whose blocks all carry `SourceSpan::Page` (i.e., from `kebab-parse-pdf`). Other documents → return `anyhow::Error("PdfPageV1Chunker only handles PDF docs")`.
- For each page block (1 block per page after p7-1):
- If `text.len()` (byte estimate) ≤ `policy.target_tokens * 4` (proxy for tokens) → emit one chunk for the entire page.
- Else → split by paragraphs (split text on `\n\n` or sentence-ending punctuation followed by whitespace) and group adjacent paragraphs until the running byte total approaches `policy.target_tokens * 4`. Apply `policy.overlap_tokens * 4` bytes of trailing overlap into the next chunk's prefix.
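The per-page splitting and grouping rules above can be sketched as plain string functions. Assumptions worth flagging: sentence ends are only ASCII `.`, `!`, `?` followed by whitespace (so CJK punctuation such as `。` is not recognised), grouped units are re-joined with a single space, the overlap is the last `overlap_tokens * 4` bytes of the previous chunk nudged forward to a UTF-8 character boundary, and a single unit larger than the budget becomes its own oversized chunk. Building `Chunk` values, computing `chunk_id`s, and the `SourceSpan::Page` guard are left out; `split_units`, `overlap_tail` and `chunk_page` are hypothetical names.

```rust
/// Split page text into units: a blank line ("\n\n") or ASCII `.`/`!`/`?`
/// followed by whitespace ends a unit. Units are trimmed; empty ones are skipped.
fn split_units(text: &str) -> Vec<&str> {
    let bytes = text.as_bytes();
    let mut units = Vec::new();
    let mut start = 0;
    for i in 0..bytes.len() {
        let next = bytes.get(i + 1).copied();
        let paragraph_break = bytes[i] == b'\n' && next == Some(b'\n');
        let sentence_end = matches!(bytes[i], b'.' | b'!' | b'?')
            && next.map_or(false, |b| b.is_ascii_whitespace());
        if paragraph_break || sentence_end {
            // bytes[i] is ASCII, so i + 1 is always a char boundary.
            let unit = text[start..i + 1].trim();
            if !unit.is_empty() {
                units.push(unit);
            }
            start = i + 1;
        }
    }
    let rest = text[start..].trim();
    if !rest.is_empty() {
        units.push(rest);
    }
    units
}

/// Last `n` bytes of `s`, moved forward to a char boundary so multi-byte text
/// (e.g. Korean) is never cut mid-character.
fn overlap_tail(s: &str, n: usize) -> &str {
    let mut start = s.len().saturating_sub(n);
    while !s.is_char_boundary(start) {
        start += 1;
    }
    &s[start..]
}

/// Byte-budget grouping: about `target_tokens * 4` bytes per chunk, with the last
/// `overlap_tokens * 4` bytes of each chunk repeated as the next chunk's prefix.
fn chunk_page(text: &str, target_tokens: usize, overlap_tokens: usize) -> Vec<String> {
    let budget = target_tokens * 4;
    if text.len() <= budget {
        return vec![text.to_string()];
    }
    let overlap = overlap_tokens * 4;
    let mut chunks = Vec::new();
    let mut current = String::new();
    for unit in split_units(text) {
        if !current.is_empty() && current.len() + 1 + unit.len() > budget {
            let tail = overlap_tail(&current, overlap).to_string();
            chunks.push(std::mem::replace(&mut current, tail));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(unit);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Because overlap is counted in bytes rather than words, the carried-over prefix can start mid-word; that matches the byte-proxy wording of the contract but is worth noting in snapshot fixtures.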
@@ -91,12 +91,12 @@ impl kb_core::Chunker for PdfPageV1Chunker {
| determinism | same input → same chunk_ids twice | inline |
| snapshot | `Vec<Chunk>` JSON for fixture stable | `fixtures/pdf/three-page-en.pdf` (chunked) |

All tests under `cargo test -p kb-chunk pdf`.
All tests under `cargo test -p kebab-chunk pdf`.

## Definition of Done

- [ ] `cargo check -p kb-chunk` passes (existing `md-heading-v1` continues to pass)
- [ ] `cargo test -p kb-chunk pdf` passes
- [ ] `cargo check -p kebab-chunk` passes (existing `md-heading-v1` continues to pass)
- [ ] `cargo test -p kebab-chunk pdf` passes
- [ ] Snapshot stable across two runs
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.5, §0 Q3, §9
@@ -1,12 +1,12 @@
---
phase: P8
component: kb-parse-audio (whisper adapter)
component: kebab-parse-audio (whisper adapter)
task_id: p8-1
title: "Audio Extractor + Transcriber trait + whisper.cpp adapter"
status: planned
depends_on: [p0-1, p1-6]
unblocks: [p8-2]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.4 Block::AudioRef + AudioRefBlock, §3.7a Transcript + TranscriptSegment, §9.3 audio policy, §9 versioning]
---

@@ -22,8 +22,8 @@ Audio stays a single, replaceable engine boundary (Transcriber trait). Extractor

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `whisper-rs = "0.13"` (or current stable)
- `symphonia = { version = "0.5", features = ["all"] }` — decode `.m4a/.mp3/.wav/.flac/.ogg` to interleaved f32 PCM at the source's native sample rate / channel layout. Symphonia does NOT resample; that is rubato's job.
- `rubato = "0.15"` — sample-rate conversion to 16 kHz mono f32 (the input shape whisper.cpp expects). Use `rubato::FftFixedIn::new(input_sample_rate, 16_000, frames_per_chunk, sub_chunks, 1 /* channels after downmix */)` for fixed-input streaming; pre-mix multi-channel to mono via simple averaging before the resampler.
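The "pre-mix multi-channel to mono via simple averaging" step before the resampler amounts to averaging each interleaved frame. A small sketch, assuming the decoded buffer is interleaved f32 (L, R, L, R, ...) with a known channel count; an incomplete trailing frame is dropped. `downmix_to_mono` is a hypothetical name; its output would be the single channel handed to rubato for the 16 kHz conversion.

```rust
/// Average interleaved multi-channel f32 PCM down to mono: one output sample per
/// frame of `channels` input samples. A trailing partial frame is discarded.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    if channels == 1 {
        return interleaved.to_vec();
    }
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}
```

Plain averaging can attenuate material that is out of phase between channels, which is usually acceptable for speech transcription.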
@@ -34,21 +34,21 @@ Audio stays a single, replaceable engine boundary (Transcriber trait). Extractor

## Forbidden dependencies

- `kb-source-fs`, `kb-parse-md`, `kb-parse-pdf`, `kb-parse-image`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-parse-pdf`, `kebab-parse-image`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

## Inputs

| input | type | source |
|-------|------|--------|
| `RawAsset` | `kb_core::RawAsset` | `kb-source-fs` |
| `RawAsset` | `kebab_core::RawAsset` | `kebab-source-fs` |
| audio bytes | `&[u8]` | filesystem |
| `kb-config.audio` | `{ model_path, language, chunk_seconds, n_threads, gpu }` | runtime |
| `kebab-config.audio` | `{ model_path, language, chunk_seconds, n_threads, gpu }` | runtime |

## Outputs

| output | type | downstream |
|--------|------|------------|
| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk` (`audio-segment-v1` chunker in p8-2) |
| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk` (`audio-segment-v1` chunker in p8-2) |

## Public surface (signatures only — no new types)
@@ -56,19 +56,19 @@ Audio stays a single, replaceable engine boundary (Transcriber trait). Extractor
pub trait Transcriber: Send + Sync {
    fn engine(&self) -> &'static str;
    fn engine_version(&self) -> String;
    fn transcribe(&self, pcm_f32_16khz: &[f32], language_hint: Option<&kb_core::Lang>) -> anyhow::Result<kb_core::Transcript>;
    fn transcribe(&self, pcm_f32_16khz: &[f32], language_hint: Option<&kebab_core::Lang>) -> anyhow::Result<kebab_core::Transcript>;
}

pub struct WhisperCppTranscriber { /* internal: whisper_rs::WhisperContext */ }
impl WhisperCppTranscriber { pub fn new(config: &kb_config::Config) -> anyhow::Result<Self>; }
impl WhisperCppTranscriber { pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>; }
impl Transcriber for WhisperCppTranscriber { /* per trait */ }

pub struct AudioExtractor { transcriber: std::sync::Arc<dyn Transcriber> }
impl AudioExtractor { pub fn new(transcriber: std::sync::Arc<dyn Transcriber>) -> Self; }
impl kb_core::Extractor for AudioExtractor {
    fn supports(&self, m: &kb_core::MediaType) -> bool { matches!(m, kb_core::MediaType::Audio(_)) }
    fn parser_version(&self) -> kb_core::ParserVersion { kb_core::ParserVersion("audio-whisper-v1".into()) }
    fn extract(&self, ctx: &kb_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kb_core::CanonicalDocument>;
impl kebab_core::Extractor for AudioExtractor {
    fn supports(&self, m: &kebab_core::MediaType) -> bool { matches!(m, kebab_core::MediaType::Audio(_)) }
    fn parser_version(&self) -> kebab_core::ParserVersion { kebab_core::ParserVersion("audio-whisper-v1".into()) }
    fn extract(&self, ctx: &kebab_core::ExtractContext, bytes: &[u8]) -> anyhow::Result<kebab_core::CanonicalDocument>;
}
```

@@ -87,7 +87,7 @@ impl kb_core::Extractor for AudioExtractor {
- `provenance` events: `Discovered`, `Parsed`, `Transcribed`.
- `block_id` per design §4.2 with `block_kind = "audio_ref"`, `heading_path = []`, `ordinal = 0`, `source_span = SourceSpan::Time { start_ms: 0, end_ms: duration_ms }`.
- `WhisperCppTranscriber`:
- Loads model from `config.audio.model_path` (e.g., `~/.local/share/kb/models/whisper/ggml-large-v3.bin`).
- Loads model from `config.audio.model_path` (e.g., `~/.local/share/kebab/models/whisper/ggml-large-v3.bin`).
- Runs with `WhisperFullParams::new(SamplingStrategy::Greedy { best_of: 1 })` — deterministic.
- Streams in chunks of `config.audio.chunk_seconds` (default 30) to bound memory; aggregates segments.
- `Transcript.segments` populated with `start_ms`, `end_ms`, `text`, `confidence: Some(p)` from whisper's per-token probabilities (averaged), `speaker: None` (diarization is P+).
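The streaming and aggregation bullets imply two bookkeeping steps: cutting the 16 kHz mono PCM into `chunk_seconds` windows, and shifting each window's segment times onto the file timeline while averaging token probabilities into `confidence`. A sketch of only that bookkeeping, with the whisper call itself omitted. `RawSegment`, the local `TranscriptSegment`, `pcm_windows` and `aggregate` are hypothetical stand-ins (field names follow the bullet above, not a confirmed `kebab_core` definition). whisper.cpp reports segment start/end in 10 ms units, so an adapter would likely scale by 10 before this step; that is worth confirming against the pinned whisper-rs version. Fixed cuts can also split a word at a window boundary, which this sketch does nothing about.

```rust
const SAMPLE_RATE_HZ: usize = 16_000; // whisper.cpp expects 16 kHz mono f32

/// Cut 16 kHz mono PCM into `chunk_seconds` windows, each paired with its start
/// offset in milliseconds. The final window may be shorter.
fn pcm_windows(pcm: &[f32], chunk_seconds: usize) -> Vec<(&[f32], u64)> {
    assert!(chunk_seconds > 0, "chunk_seconds must be positive");
    pcm.chunks(chunk_seconds * SAMPLE_RATE_HZ)
        .enumerate()
        .map(|(i, window)| (window, (i * chunk_seconds * 1_000) as u64))
        .collect()
}

/// Hypothetical per-window segment; times are relative to the window start (ms).
struct RawSegment {
    start_ms: u64,
    end_ms: u64,
    text: String,
    token_probs: Vec<f32>,
}

/// Local stand-in for `kebab_core::TranscriptSegment`.
#[derive(Debug, PartialEq)]
struct TranscriptSegment {
    start_ms: u64,
    end_ms: u64,
    text: String,
    confidence: Option<f32>,
    speaker: Option<String>,
}

/// Shift a window's segments onto the file timeline and average token probabilities.
fn aggregate(window_offset_ms: u64, raw: Vec<RawSegment>) -> Vec<TranscriptSegment> {
    raw.into_iter()
        .map(|s| {
            let confidence = if s.token_probs.is_empty() {
                None
            } else {
                Some(s.token_probs.iter().sum::<f32>() / s.token_probs.len() as f32)
            };
            TranscriptSegment {
                start_ms: window_offset_ms + s.start_ms,
                end_ms: window_offset_ms + s.end_ms,
                text: s.text,
                confidence,
                speaker: None, // diarization is out of scope for v1
            }
        })
        .collect()
}
```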
@@ -115,12 +115,12 @@ impl kb_core::Extractor for AudioExtractor {
| `#[ignore]` integration | 30-second Korean audio → segments_count > 1, language = "ko" | requires `large-v3` model |
| snapshot | CanonicalDocument JSON stable for short fixture | `fixtures/audio/hello.wav` |

All tests under `cargo test -p kb-parse-audio`. Mark slow/large-model tests `#[ignore]`.
All tests under `cargo test -p kebab-parse-audio`. Mark slow/large-model tests `#[ignore]`.

## Definition of Done

- [ ] `cargo check -p kb-parse-audio` passes
- [ ] `cargo test -p kb-parse-audio` passes (excluding `#[ignore]`)
- [ ] `cargo check -p kebab-parse-audio` passes
- [ ] `cargo test -p kebab-parse-audio` passes (excluding `#[ignore]`)
- [ ] No imports outside Allowed dependencies (resampler crate may be added — record in PR)
- [ ] First-run model download path documented (NOT performed by code; user responsibility)
- [ ] PR links design §3.4, §3.7a, §9.3
@@ -1,12 +1,12 @@
---
phase: P8
component: kb-chunk (audio-segment-v1)
component: kebab-chunk (audio-segment-v1)
task_id: p8-2
title: "Audio segment chunker (audio-segment-v1)"
status: planned
depends_on: [p8-1]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.5 Chunk, §3.4 SourceSpan::Time, §4.2 chunk_id recipe, §0 Q3 citation, §9 versioning]
---

@@ -22,8 +22,8 @@ Per-medium chunker. Tiny but versioned — `chunk_id` depends on `chunker_versio

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kebab-core`
- `kebab-config`
- `serde`, `serde_json`
- `blake3` (policy_hash)
- `serde-json-canonicalizer`
@@ -31,30 +31,30 @@ Per-medium chunker. Tiny but versioned — `chunk_id` depends on `chunker_versio

## Forbidden dependencies

- `kb-source-fs`, `kb-parse-md`, `kb-parse-pdf`, `kb-parse-image`, `kb-parse-audio` (consumes via `kb-core` only), `kb-normalize`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-md`, `kebab-parse-pdf`, `kebab-parse-image`, `kebab-parse-audio` (consumes via `kebab-core` only), `kebab-normalize`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

## Inputs

| input | type | source |
|-------|------|--------|
| `CanonicalDocument` containing one `AudioRefBlock` with `Transcript` | `kb_core::CanonicalDocument` | p8-1 |
| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` |
| `CanonicalDocument` containing one `AudioRefBlock` with `Transcript` | `kebab_core::CanonicalDocument` | p8-1 |
| `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` |

## Outputs

| output | type | downstream |
|--------|------|------------|
| `Vec<Chunk>` | `kb_core::Chunk` | `kb-store-sqlite`, embedders |
| `Vec<Chunk>` | `kebab_core::Chunk` | `kebab-store-sqlite`, embedders |

## Public surface (signatures only — no new types)

```rust
pub struct AudioSegmentV1Chunker;

impl kb_core::Chunker for AudioSegmentV1Chunker {
    fn chunker_version(&self) -> kb_core::ChunkerVersion { kb_core::ChunkerVersion("audio-segment-v1".into()) }
    fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String;
    fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result<Vec<kb_core::Chunk>>;
impl kebab_core::Chunker for AudioSegmentV1Chunker {
    fn chunker_version(&self) -> kebab_core::ChunkerVersion { kebab_core::ChunkerVersion("audio-segment-v1".into()) }
    fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String;
    fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result<Vec<kebab_core::Chunk>>;
}
```

@@ -92,12 +92,12 @@ impl kb_core::Chunker for AudioSegmentV1Chunker {
| determinism | same input → same chunk_ids twice | inline |
| snapshot | `Vec<Chunk>` JSON for fixture transcript stable | `fixtures/audio/transcript-1.json` (constructed) |

All tests under `cargo test -p kb-chunk audio`.
All tests under `cargo test -p kebab-chunk audio`.

## Definition of Done

- [ ] `cargo check -p kb-chunk` passes (md-heading-v1 + pdf-page-v1 + audio-segment-v1 all coexist)
- [ ] `cargo test -p kb-chunk audio` passes
- [ ] `cargo check -p kebab-chunk` passes (md-heading-v1 + pdf-page-v1 + audio-segment-v1 all coexist)
- [ ] `cargo test -p kebab-chunk audio` passes
- [ ] Snapshot stable across two runs
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §3.5, §3.4 SourceSpan::Time, §4.2
@@ -1,12 +1,12 @@
|
||||
---
|
||||
phase: P9
|
||||
component: kb-tui (library view)
|
||||
component: kebab-tui (library view)
|
||||
task_id: p9-1
|
||||
title: "Ratatui library list view + tag filter"
|
||||
status: planned
|
||||
depends_on: [p1-6]
|
||||
unblocks: [p9-2, p9-3, p9-4]
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
|
||||
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
|
||||
contract_sections: [report §16.2 TUI (also tasks/phase-9-ui.md epic), design §3.7 SearchHit, design §1 UX scenes for shared key bindings]
|
||||
---
|
||||
|
||||
@@ -14,7 +14,7 @@ contract_sections: [report §16.2 TUI (also tasks/phase-9-ui.md epic), design §
|
||||
|
||||
## Goal
|
||||
|
||||
Stand up a Ratatui app skeleton with a "Library" pane: list documents, filter by tag/lang, navigate. Establishes the global app loop, key dispatch, and `kb-app` integration point that the search/ask/inspect panes (p9-2..p9-4) extend.
|
||||
Stand up a Ratatui app skeleton with a "Library" pane: list documents, filter by tag/lang, navigate. Establishes the global app loop, key dispatch, and `kebab-app` integration point that the search/ask/inspect panes (p9-2..p9-4) extend.
|
||||
|
||||
## Why now / why this size
|
||||
|
||||
@@ -22,9 +22,9 @@ Library is the cheapest screen and the natural anchor for the TUI shell. Subsequ
|
||||
|
||||
## Allowed dependencies
|
||||
|
||||
- `kb-core`
|
||||
- `kb-config`
|
||||
- `kb-app` (facade — the only crate this binary touches besides `kb-core`/`kb-config`)
|
||||
- `kebab-core`
|
||||
- `kebab-config`
|
||||
- `kebab-app` (facade — the only crate this binary touches besides `kebab-core`/`kebab-config`)
|
||||
- `ratatui = "0.28"`
|
||||
- `crossterm`
|
||||
- `tracing`
|
||||
@@ -32,15 +32,15 @@ Library is the cheapest screen and the natural anchor for the TUI shell. Subsequ
|
||||
|
||||
## Forbidden dependencies
|
||||
|
||||
- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (UI must go through `kb-app` only — this is the design §8 boundary)
|
||||
- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (UI must go through `kebab-app` only — this is the design §8 boundary)
|
||||
|
||||
## Inputs
|
||||
|
||||
| input | type | source |
|
||||
|-------|------|--------|
|
||||
| `kb-app::list_docs(filter)` | facade call | runtime |
|
||||
| `kebab-app::list_docs(filter)` | facade call | runtime |
|
||||
| keyboard events | `crossterm` | terminal |
|
||||
| `kb-config::Config` | runtime | env / file |
|
||||
| `kebab-config::Config` | runtime | env / file |
|
||||
|
||||
## Outputs
|
||||
|
||||
@@ -57,7 +57,7 @@ Library is the cheapest screen and the natural anchor for the TUI shell. Subsequ
// state in WITHOUT modifying the App struct definition. This avoids merge
// conflicts when p9-2/3/4 land in parallel; only p9-1 ever changes `App`.
pub struct App {
    pub config: kb_config::Config,
    pub config: kebab_config::Config,
    pub focus: Pane,
    pub library: LibraryState, // owned by p9-1
    pub search: Option<SearchState>, // populated by p9-2 (None until that crate links in)
@@ -73,7 +73,7 @@ pub struct AskState; // body filled by p9-3
pub struct InspectState; // body filled by p9-4

impl App {
    pub fn new(config: kb_config::Config) -> anyhow::Result<Self>;
    pub fn new(config: kebab_config::Config) -> anyhow::Result<Self>;
    pub fn run(&mut self) -> anyhow::Result<()>; // blocking loop until quit
}
@@ -102,12 +102,12 @@ pub enum KeyOutcome { Continue, Quit, SwitchPane(Pane), Refresh }
- `Enter` → switch to Inspect pane (p9-4) on selected doc
- `q` or `Esc` → quit
- All facade calls run on the main thread (no async). For long calls, render a "loading…" state and call from a worker thread; bridge via `mpsc::channel` (this task may keep things synchronous and accept brief UI hangs for v1).
- Logging: `tracing` initialized to a file under `~/.local/state/kb/logs/`; never to stdout/stderr (so the TUI is not corrupted).
- Logging: `tracing` initialized to a file under `~/.local/state/kebab/logs/`; never to stdout/stderr (so the TUI is not corrupted).
- Error rendering: a popup overlay shows `error: {msg}\nhint: {hint}` from the `anyhow::Error` chain; press any key to dismiss.
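The worker-thread escape hatch described above could look roughly like the std-only sketch below. It shows the pane rendering "loading…", running the facade call off the main thread, and bridging the result back with `mpsc::channel`. This is an illustration under stated assumptions, not the crate's code. `DocRow`, `LoadState` and `LibraryView` are hypothetical names, and the `fetch` closure stands in for a call into `kebab-app::list_docs`. Errors are flattened to `String` rather than `anyhow::Error` so the sketch has no dependencies.

```rust
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::thread;

/// Hypothetical stand-in for one row returned by `kebab-app::list_docs`.
#[derive(Debug, Clone, PartialEq)]
pub struct DocRow {
    pub title: String,
}

/// What the Library pane draws on a given frame.
#[derive(Debug, Clone, PartialEq)]
pub enum LoadState {
    Loading,
    Ready(Vec<DocRow>),
    /// Message shown in the `error: ...` popup.
    Failed(String),
}

pub type FetchResult = Result<Vec<DocRow>, String>;

pub struct LibraryView {
    pub state: LoadState,
    pending: Option<Receiver<FetchResult>>,
}

impl LibraryView {
    pub fn new() -> Self {
        LibraryView { state: LoadState::Ready(Vec::new()), pending: None }
    }

    /// Start a refresh: show "loading..." and run `fetch` on a worker thread.
    pub fn refresh<F>(&mut self, fetch: F)
    where
        F: FnOnce() -> FetchResult + Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        thread::spawn(move || {
            // If another refresh replaced this receiver, nobody is listening; ignore.
            let _ = tx.send(fetch());
        });
        self.state = LoadState::Loading;
        self.pending = Some(rx); // drops any older receiver
    }

    /// Called once per render frame; never blocks.
    pub fn tick(&mut self) {
        let outcome = match &self.pending {
            None => return,
            Some(rx) => match rx.try_recv() {
                Ok(result) => result,
                Err(TryRecvError::Empty) => return, // still loading
                Err(TryRecvError::Disconnected) => {
                    Err("worker exited without a result".to_string())
                }
            },
        };
        self.pending = None;
        self.state = match outcome {
            Ok(rows) => LoadState::Ready(rows),
            Err(msg) => LoadState::Failed(msg),
        };
    }
}
```

Polling with `try_recv` keeps every frame non-blocking. A v1 that stays synchronous can skip this entirely and accept brief hangs, as the contract allows.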

## Storage / wire effects

- Reads: `kb-app::list_docs` only.
- Reads: `kebab-app::list_docs` only.
- Writes: none.

## Test plan
@@ -118,16 +118,16 @@ pub enum KeyOutcome { Continue, Quit, SwitchPane(Pane), Refresh }
| unit | filter `f` opens edit overlay; `Enter` triggers refresh | inline |
| snapshot | rendered library with 3 docs + filter open produces stable frame buffer (use `ratatui::backend::TestBackend`) | inline |
| unit | error popup renders without panic on injected `anyhow::Error` | inline |
| integration | mocked `kb-app::list_docs` returning N docs renders all rows | inline |
| integration | mocked `kebab-app::list_docs` returning N docs renders all rows | inline |

All tests under `cargo test -p kb-tui library`.
All tests under `cargo test -p kebab-tui library`.

## Definition of Done

- [ ] `cargo check -p kb-tui` passes
- [ ] `cargo test -p kb-tui library` passes
- [ ] No imports outside `kb-core`, `kb-config`, `kb-app`
- [ ] `kb tui` (or `kb` if TUI is the default) launches and shows Library on a real terminal (manual smoke)
- [ ] `cargo check -p kebab-tui` passes
- [ ] `cargo test -p kebab-tui library` passes
- [ ] No imports outside `kebab-core`, `kebab-config`, `kebab-app`
- [ ] `kebab tui` (or `kebab` if TUI is the default) launches and shows Library on a real terminal (manual smoke)
- [ ] PR links design §8 module boundary, report §16.2 (TUI epic)

## Out of scope
@@ -1,12 +1,12 @@
---
phase: P9
component: kb-tui (search pane)
component: kebab-tui (search pane)
task_id: p9-2
title: "TUI Search pane: input + result list + preview + editor jump"
status: planned
depends_on: [p2-2, p3-4, p9-1]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§1.5/1.6 search output, §3.7 SearchHit, §0 Q3 citation]
---
@@ -14,7 +14,7 @@ contract_sections: [§1.5/1.6 search output, §3.7 SearchHit, §0 Q3 citation]

## Goal

Add a Search pane to the TUI that drives `kb-app::search`, renders dense results (rank+score / path#frag / heading / snippet), and supports `g` (editor jump to citation) for the selected hit.
Add a Search pane to the TUI that drives `kebab-app::search`, renders dense results (rank+score / path#frag / heading / snippet), and supports `g` (editor jump to citation) for the selected hit.

## Why now / why this size
@@ -22,25 +22,25 @@ Search is the most-used surface. Confining it to one pane leverages the App skel

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kb-app`
- `kb-tui` (extends p9-1)
- `kebab-core`
- `kebab-config`
- `kebab-app`
- `kebab-tui` (extends p9-1)
- `ratatui`, `crossterm`
- `tracing`
- `thiserror`

## Forbidden dependencies

- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-desktop`
- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-desktop`

## Inputs

| input | type | source |
|-------|------|--------|
| `kb-app::search(query)` | facade | runtime |
| `kebab-app::search(query)` | facade | runtime |
| keyboard events | `crossterm` | terminal |
| selected hit's citation | `kb_core::Citation` | App state |
| selected hit's citation | `kebab_core::Citation` | App state |

## Outputs
@@ -54,16 +54,16 @@ Search is the most-used surface. Confining it to one pane leverages the App skel
```rust
pub fn render_search<B: ratatui::backend::Backend>(f: &mut ratatui::Frame, area: ratatui::layout::Rect, state: &App);
pub fn handle_key_search(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome;
pub fn jump_to_citation(citation: &kb_core::Citation, editor_env: &str /* $EDITOR */) -> anyhow::Result<()>;
pub fn jump_to_citation(citation: &kebab_core::Citation, editor_env: &str /* $EDITOR */) -> anyhow::Result<()>;
```

This task fills the body of `kb_tui::SearchState` (forward-declared in p9-1). The `App` struct itself is NOT edited — only `SearchState` gets fields:
This task fills the body of `kebab_tui::SearchState` (forward-declared in p9-1). The `App` struct itself is NOT edited — only `SearchState` gets fields:

```rust
pub struct SearchState {
    pub input: String,
    pub mode: kb_core::SearchMode,
    pub hits: Vec<kb_core::SearchHit>,
    pub mode: kebab_core::SearchMode,
    pub hits: Vec<kebab_core::SearchHit>,
    pub selected_hit: usize,
    pub last_query_at: Option<time::OffsetDateTime>, // debounce timer
}
@@ -73,7 +73,7 @@ The Library pane's keypress handler (in p9-1) sets `app.search = Some(SearchStat

## Behavior contract

- Layout: top input bar (search query + mode badge `[hybrid|lexical|vector]`), middle result list (one hit per 4 lines per design §1.5 dense format), bottom preview pane (full chunk text fetched lazily via `kb-app::inspect_chunk`).
- Layout: top input bar (search query + mode badge `[hybrid|lexical|vector]`), middle result list (one hit per 4 lines per design §1.5 dense format), bottom preview pane (full chunk text fetched lazily via `kebab-app::inspect_chunk`).
- Key bindings (Search pane):
  - typing → updates `search_input`; debounced (200 ms) re-search
  - `Tab` → cycles `search_mode` Lexical → Vector → Hybrid → Lexical
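A minimal sketch of the `Tab` mode cycle and the 200 ms debounce, plus the `j` / `k` bounds check listed in the test plan. It assumes a local `SearchMode` enum that mirrors `kebab_core::SearchMode`, whose real definition is not shown here. `SearchInput`, `type_char`, `should_search` and `move_selection` are hypothetical names. The sketch uses the monotonic `std::time::Instant` instead of the `time::OffsetDateTime` declared on `SearchState::last_query_at`, because wall-clock time can jump backwards. Treat that swap as a suggestion, not the spec.

```rust
use std::time::{Duration, Instant};

/// Hypothetical local mirror of `kebab_core::SearchMode`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SearchMode {
    Lexical,
    Vector,
    Hybrid,
}

impl SearchMode {
    /// `Tab`: Lexical -> Vector -> Hybrid -> Lexical.
    pub fn next(self) -> Self {
        match self {
            SearchMode::Lexical => SearchMode::Vector,
            SearchMode::Vector => SearchMode::Hybrid,
            SearchMode::Hybrid => SearchMode::Lexical,
        }
    }
}

/// Quiet period after the last keystroke before re-running the search.
pub const DEBOUNCE: Duration = Duration::from_millis(200);

pub struct SearchInput {
    pub input: String,
    pub mode: SearchMode,
    pub selected_hit: usize,
    last_edit_at: Option<Instant>,
}

impl SearchInput {
    pub fn new(mode: SearchMode) -> Self {
        SearchInput { input: String::new(), mode, selected_hit: 0, last_edit_at: None }
    }

    /// A typed character: edit the query and restart the debounce window.
    pub fn type_char(&mut self, c: char, now: Instant) {
        self.input.push(c);
        self.last_edit_at = Some(now);
    }

    /// Polled every frame; true once the input has been quiet for `DEBOUNCE`.
    pub fn should_search(&mut self, now: Instant) -> bool {
        match self.last_edit_at {
            Some(t) if now.saturating_duration_since(t) >= DEBOUNCE => {
                self.last_edit_at = None; // fire once per burst of typing
                true
            }
            _ => false,
        }
    }

    /// `j` (down = true) / `k` (down = false), clamped to the current hit list.
    pub fn move_selection(&mut self, down: bool, hit_count: usize) {
        self.selected_hit = match (hit_count, down) {
            (0, _) => 0,
            (n, true) => (self.selected_hit + 1).min(n - 1),
            (_, false) => self.selected_hit.saturating_sub(1),
        };
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside, keeps the debounce testable without sleeping.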
@@ -104,14 +104,14 @@ The Library pane's keypress handler (in p9-1) sets `app.search = Some(SearchStat
| unit | `j` / `k` move selection within bounds | inline |
| unit | `jump_to_citation` for `Line` builds `+<line> <path>` command (assert via mocked Command runner) | inline |
| snapshot | rendered Search pane with 3 hits + preview stable | TestBackend |
| integration | mocked `kb-app::search` returning fixture hits drives render | inline |
| integration | mocked `kebab-app::search` returning fixture hits drives render | inline |

All tests under `cargo test -p kb-tui search`.
All tests under `cargo test -p kebab-tui search`.

## Definition of Done

- [ ] `cargo check -p kb-tui` passes
- [ ] `cargo test -p kb-tui search` passes
- [ ] `cargo check -p kebab-tui` passes
- [ ] `cargo test -p kebab-tui search` passes
- [ ] `g` keybinding launches `$EDITOR` with correct `+<line>` argument (manual smoke against vim)
- [ ] No imports outside Allowed dependencies
- [ ] PR links design §1.5/1.6, §3.7
@@ -125,5 +125,5 @@ All tests under `cargo test -p kb-tui search`.
## Risks / notes

- Suspending and restoring crossterm raw mode around the editor spawn is finicky; code defensively (RAII guard).
- Different editors take different jump syntaxes. Provide an env override `KB_EDITOR_JUMP_FORMAT="vim"` for users on exotic editors.
- Different editors take different jump syntaxes. Provide an env override `KEBAB_EDITOR_JUMP_FORMAT="vim"` for users on exotic editors.
- Long snippet text wrap: clamp to viewport width and ellipsize per design §1.5 (`…` already in dense template).
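The `+<line> <path>` construction and the jump-format override from the risks above can be kept in a pure function. The "mocked Command runner" test then only has to inspect arguments. In this sketch, the format names `colon` and `vscode`, the `vi` fallback, and all helper names are hypothetical; only `vim` appears in the spec. `+<line>` is accepted by vim, Neovim, Emacs and nano, and VS Code's `code` CLI takes `--goto <path>:<line>`. The raw-mode suspend/restore around the spawn is not shown.

```rust
use std::path::Path;
use std::process::Command;

/// Jump syntaxes selectable via `KEBAB_EDITOR_JUMP_FORMAT`.
/// Only "vim" comes from the spec; "colon" and "vscode" are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JumpFormat {
    /// `+<line> <path>`: vim, Neovim, Emacs, nano.
    Vim,
    /// `<path>:<line>` as a single argument.
    Colon,
    /// `--goto <path>:<line>`: VS Code's `code` CLI.
    VsCode,
}

impl JumpFormat {
    /// Missing or unknown values fall back to the vim-style default.
    pub fn from_env_value(value: Option<&str>) -> Self {
        match value {
            Some("colon") => JumpFormat::Colon,
            Some("vscode") => JumpFormat::VsCode,
            _ => JumpFormat::Vim,
        }
    }
}

/// Editor arguments for a line citation; pure, so tests need no process.
pub fn editor_jump_args(format: JumpFormat, path: &Path, line: u32) -> Vec<String> {
    let p = path.display();
    match format {
        JumpFormat::Vim => vec![format!("+{line}"), p.to_string()],
        JumpFormat::Colon => vec![format!("{p}:{line}")],
        JumpFormat::VsCode => vec!["--goto".to_string(), format!("{p}:{line}")],
    }
}

/// Build (but do not spawn) the command. `$EDITOR` may carry flags such as
/// `code -w`, so it is split on whitespace; quoted arguments are not supported.
pub fn editor_command(editor: &str, format: JumpFormat, path: &Path, line: u32) -> Command {
    let mut parts = editor.split_whitespace();
    let program = parts.next().unwrap_or("vi"); // hypothetical fallback for an empty $EDITOR
    let mut cmd = Command::new(program);
    cmd.args(parts).args(editor_jump_args(format, path, line));
    cmd
}
```

At runtime, the format would come from `JumpFormat::from_env_value(std::env::var("KEBAB_EDITOR_JUMP_FORMAT").ok().as_deref())`. `Command::get_program` and `Command::get_args` let a test assert on the built command without spawning anything.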
@@ -1,12 +1,12 @@
---
phase: P9
component: kb-tui (ask pane)
component: kebab-tui (ask pane)
task_id: p9-3
title: "TUI Ask pane: streaming answer + citation links + --explain toggle"
status: planned
depends_on: [p4-3, p9-1]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§1.1–1.4 ask scenes, §2.3 Answer wire, §3.8 Answer]
---
@@ -14,7 +14,7 @@ contract_sections: [§1.1–1.4 ask scenes, §2.3 Answer wire, §3.8 Answer]

## Goal

Add an Ask pane that calls `kb-app::ask`, streams tokens into the answer area in real time, renders citation footnotes (default mode A), and toggles to `--explain` (mode B + retrieval trace) with a key.
Add an Ask pane that calls `kebab-app::ask`, streams tokens into the answer area in real time, renders citation footnotes (default mode A), and toggles to `--explain` (mode B + retrieval trace) with a key.

## Why now / why this size
@@ -22,23 +22,23 @@ Streaming UI is the only TUI piece that meaningfully differs from search/inspect

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kb-app`
- `kb-tui` (extends p9-1)
- `kebab-core`
- `kebab-config`
- `kebab-app`
- `kebab-tui` (extends p9-1)
- `ratatui`, `crossterm`
- `tracing`
- `thiserror`

## Forbidden dependencies

- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (only via `kb-app`), `kb-desktop`
- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (only via `kebab-app`), `kebab-desktop`

## Inputs

| input | type | source |
|-------|------|--------|
| `kb-app::ask(query, AskOpts)` | facade | runtime |
| `kebab-app::ask(query, AskOpts)` | facade | runtime |
| keyboard events | `crossterm` | terminal |

## Outputs
@@ -46,7 +46,7 @@ Streaming UI is the only TUI piece that meaningfully differs from search/inspect
| output | type | downstream |
|--------|------|------------|
| Ratatui Ask pane render | terminal | user |
| `kb-app::ask` invocation with streaming closure | facade | RAG pipeline |
| `kebab-app::ask` invocation with streaming closure | facade | RAG pipeline |

## Public surface (signatures only — no new types)
@@ -55,7 +55,7 @@ pub fn render_ask<B: ratatui::backend::Backend>(f: &mut ratatui::Frame, area: ra
pub fn handle_key_ask(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome;
```

This task fills the body of `kb_tui::AskState` (forward-declared in p9-1). `App` is NOT edited — only `AskState` gets fields:
This task fills the body of `kebab_tui::AskState` (forward-declared in p9-1). `App` is NOT edited — only `AskState` gets fields:

```rust
pub struct AskState {
@@ -63,8 +63,8 @@ pub struct AskState {
    pub explain: bool,
    pub streaming: bool,
    pub partial: String,
    pub answer: Option<kb_core::Answer>,
    pub thread: Option<std::thread::JoinHandle<anyhow::Result<kb_core::Answer>>>,
    pub answer: Option<kebab_core::Answer>,
    pub thread: Option<std::thread::JoinHandle<anyhow::Result<kebab_core::Answer>>>,
    pub rx: Option<std::sync::mpsc::Receiver<String>>,
}
```
@@ -74,7 +74,7 @@ pub struct AskState {

## Behavior contract

- Layout: top input bar (`?` prompt, query text), middle answer area (rendered Markdown-light: paragraphs + inline `[N]` markers), bottom-right citations panel (numbered list of citations with `path#fragment` and section label), bottom-left status (`grounded ✓/✗ model prompt_v k chunks`).
- Submission: `Enter` triggers a worker thread that calls `kb-app::ask` with `AskOpts.stream_sink: Some(tx)` (`tx: mpsc::Sender<String>`). The thread holds the `tx` and the TUI holds the matching `rx` (set on `AskState.rx`). On each render frame the TUI drains `rx.try_iter()` into `state.partial` without blocking.
- Submission: `Enter` triggers a worker thread that calls `kebab-app::ask` with `AskOpts.stream_sink: Some(tx)` (`tx: mpsc::Sender<String>`). The thread holds the `tx` and the TUI holds the matching `rx` (set on `AskState.rx`). On each render frame the TUI drains `rx.try_iter()` into `state.partial` without blocking.
- Streaming: while `ask_streaming = true`, the Answer area shows `ask_partial` and a small "▍" cursor. When the worker finishes, `ask_answer` is populated and the citations panel switches to the final list.
- Refusal rendering:
  - `grounded = false` and `refusal_reason = ScoreGate` → render the answer (which is the human-friendly "근거 부족…" ("insufficient evidence…") message), citations show "가까운 후보" ("nearby candidates").
@@ -85,12 +85,12 @@ pub struct AskState {
- `e` → toggle `ask_explain`; resubmit on next `Enter`. While explain ON, citations panel is replaced by the per-claim breakdown (mode B in design §1.2) and a footer shows the retrieval trace summary.
- `Esc` → switch back to Library pane (cancellation of an in-flight ask is best-effort: the worker thread continues but its final answer is dropped).
- `j` / `k` → scroll the answer area when oversized.
- All facade calls stay within `kb-app::ask` — never reach into `kb-rag` directly.
- All facade calls stay within `kebab-app::ask` — never reach into `kebab-rag` directly.
- Errors render as a popup overlay; do not crash the pane.
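The submit / drain / `Esc` flow described above can be sketched with std only. `Answer` is a hypothetical stand-in for `kebab_core::Answer`. The `ask` closure stands in for `kebab-app::ask` with `AskOpts.stream_sink`, and `AskStream` mirrors a subset of the `AskState` fields with `String` errors instead of `anyhow`. None of these names are the crate's API. `JoinHandle::is_finished` (Rust 1.61+) lets the frame loop collect the final answer without ever blocking on `join`.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for `kebab_core::Answer`.
#[derive(Debug, Clone, PartialEq)]
pub struct Answer {
    pub text: String,
    pub grounded: bool,
}

pub type AskResult = Result<Answer, String>;

#[derive(Default)]
pub struct AskStream {
    pub streaming: bool,
    pub partial: String,
    pub answer: Option<Answer>,
    thread: Option<JoinHandle<AskResult>>,
    rx: Option<Receiver<String>>,
}

impl AskStream {
    /// `Enter`: start a worker; `ask` receives the token sender (the stream sink).
    pub fn submit<F>(&mut self, ask: F)
    where
        F: FnOnce(Sender<String>) -> AskResult + Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        self.partial.clear();
        self.answer = None;
        self.streaming = true;
        self.rx = Some(rx);
        self.thread = Some(thread::spawn(move || ask(tx)));
    }

    /// Once per frame: drain tokens without blocking, then collect the answer if done.
    pub fn on_frame(&mut self) -> Result<(), String> {
        self.drain();
        let done = self.thread.as_ref().is_some_and(|h| h.is_finished());
        if !done {
            return Ok(());
        }
        self.drain(); // tokens sent between the first drain and the check
        self.rx = None;
        self.streaming = false;
        let handle = self.thread.take().expect("checked above");
        let result = handle.join().map_err(|_| "ask worker panicked".to_string())?;
        self.answer = Some(result?);
        Ok(())
    }

    /// `Esc`: best-effort cancel. The worker keeps running, but dropping the
    /// receiver and the handle means its tokens and final answer are discarded.
    pub fn abandon(&mut self) {
        self.streaming = false;
        self.rx = None;
        self.thread = None; // dropping a JoinHandle detaches the thread
    }

    fn drain(&mut self) {
        if let Some(rx) = &self.rx {
            for token in rx.try_iter() {
                self.partial.push_str(&token);
            }
        }
    }
}
```

Dropping the `Receiver` on `Esc` makes the worker's later `send` calls return `Err`. A stream sink written this way should therefore treat a failed `send` as "nobody is listening" rather than as an error.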

## Storage / wire effects

- Reads/writes via `kb-app::ask`, which itself writes the `answers` row in `kb.sqlite`. The pane has no direct DB access.
- Reads/writes via `kebab-app::ask`, which itself writes the `answers` row in `kebab.sqlite`. The pane has no direct DB access.

## Test plan
@@ -102,14 +102,14 @@ pub struct AskState {
| unit | refusal answer renders without citations panel index errors | inline |
| snapshot | rendered Ask pane mid-stream is stable | TestBackend |
| snapshot | rendered Ask pane after finished grounded answer is stable | TestBackend |
| integration | mocked `kb-app::ask` returning a canned `Answer` populates final state correctly | inline |
| integration | mocked `kebab-app::ask` returning a canned `Answer` populates final state correctly | inline |

All tests under `cargo test -p kb-tui ask`.
All tests under `cargo test -p kebab-tui ask`.

## Definition of Done

- [ ] `cargo check -p kb-tui` passes
- [ ] `cargo test -p kb-tui ask` passes
- [ ] `cargo check -p kebab-tui` passes
- [ ] `cargo test -p kebab-tui ask` passes
- [ ] No imports outside Allowed dependencies
- [ ] Manual smoke: stream tokens visible character-by-character against a real Ollama (or `MockLanguageModel`)
- [ ] PR links design §1.1–1.4, §2.3
@@ -1,12 +1,12 @@
---
phase: P9
component: kb-tui (inspect pane)
component: kebab-tui (inspect pane)
task_id: p9-4
title: "TUI Inspect pane: document & chunk detail render"
status: planned
depends_on: [p1-6, p9-1]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§1 inspect output, §3.5 Chunk, §2.5 DocSummary, §2.6 ChunkInspection]
---
@@ -22,24 +22,24 @@ Inspect is read-only and has no external interactions; smallest possible pane. U

## Allowed dependencies

- `kb-core`
- `kb-config`
- `kb-app`
- `kb-tui` (extends p9-1)
- `kebab-core`
- `kebab-config`
- `kebab-app`
- `kebab-tui` (extends p9-1)
- `ratatui`, `crossterm`
- `tracing`
- `thiserror`

## Forbidden dependencies

- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (only via `kb-app`), `kb-desktop`
- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (only via `kebab-app`), `kebab-desktop`

## Inputs

| input | type | source |
|-------|------|--------|
| `kb-app::inspect_doc(id)` | facade | runtime |
| `kb-app::inspect_chunk(id)` | facade | runtime |
| `kebab-app::inspect_doc(id)` | facade | runtime |
| `kebab-app::inspect_chunk(id)` | facade | runtime |
| keyboard events | `crossterm` | terminal |

## Outputs
@@ -51,19 +51,19 @@ Inspect is read-only and has no external interactions; smallest possible pane. U
## Public surface (signatures only — no new types)

```rust
pub enum InspectTarget { Doc(kb_core::DocumentId), Chunk(kb_core::ChunkId) }
pub enum InspectTarget { Doc(kebab_core::DocumentId), Chunk(kebab_core::ChunkId) }

pub fn render_inspect<B: ratatui::backend::Backend>(f: &mut ratatui::Frame, area: ratatui::layout::Rect, state: &App);
pub fn handle_key_inspect(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome;
```

This task fills the body of `kb_tui::InspectState` (forward-declared in p9-1). `App` is NOT edited.
This task fills the body of `kebab_tui::InspectState` (forward-declared in p9-1). `App` is NOT edited.

```rust
pub struct InspectState {
    pub target: Option<InspectTarget>,
    pub doc: Option<kb_core::CanonicalDocument>,
    pub chunk: Option<kb_core::Chunk>,
    pub doc: Option<kebab_core::CanonicalDocument>,
    pub chunk: Option<kebab_core::Chunk>,
    pub collapsed: std::collections::HashSet<&'static str>,
    pub scroll: u16,
}
@@ -89,7 +89,7 @@ pub struct InspectState {
- `c` → collapse / expand the currently focused section (focus is implied by the current scroll position; v1 may simplify by toggling all sections)
- `Esc` → return to previous pane (Library or Search)
- `Enter` → no-op (Inspect is terminal — no editor jump here; users jump from the Search pane)
- Loading: while `kb-app::inspect_doc` or `inspect_chunk` runs, show "loading…". On error, show a popup with the hint.
- Loading: while `kebab-app::inspect_doc` or `inspect_chunk` runs, show "loading…". On error, show a popup with the hint.
- Renders must conform to wire schemas `doc_summary.v1` (subset for header) and `chunk_inspection.v1`.
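The scroll bound and the simplified v1 `c` behaviour (toggle all sections) reduce to a few small functions. The section names in `SECTIONS` are hypothetical placeholders. `content_height` and `viewport_height` are assumed to come from the renderer, and `InspectView` carries only the `collapsed` and `scroll` fields of `InspectState`.

```rust
use std::collections::HashSet;

/// Hypothetical section names; the real doc view defines its own.
pub const SECTIONS: [&str; 3] = ["metadata", "blocks", "chunks"];

#[derive(Debug, Default)]
pub struct InspectView {
    pub collapsed: HashSet<&'static str>,
    pub scroll: u16,
}

impl InspectView {
    /// `j` / `k`: scroll by `delta` lines, never past the point where the
    /// last content line sits at the bottom of the viewport.
    pub fn scroll_by(&mut self, delta: i32, content_height: u16, viewport_height: u16) {
        let max = i32::from(content_height.saturating_sub(viewport_height));
        self.scroll = (i32::from(self.scroll) + delta).clamp(0, max) as u16;
    }

    /// `c`, simplified v1: collapse everything unless everything is already collapsed.
    pub fn toggle_all(&mut self) {
        if self.collapsed.len() == SECTIONS.len() {
            self.collapsed.clear();
        } else {
            self.collapsed.extend(SECTIONS);
        }
    }

    pub fn is_collapsed(&self, section: &str) -> bool {
        self.collapsed.contains(section)
    }
}
```

Recomputing the bound from the current heights on every key press keeps the scroll valid even if the content shrinks, for example after collapsing sections.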

## Storage / wire effects
@@ -100,18 +100,18 @@ pub struct InspectState {

| kind | description | fixture / data |
|------|-------------|----------------|
| unit | switching to InspectTarget::Doc triggers `kb-app::inspect_doc` once | inline mock |
| unit | switching to InspectTarget::Doc triggers `kebab-app::inspect_doc` once | inline mock |
| unit | scroll bounded by content height | inline |
| unit | collapse toggle via `c` flips state | inline |
| snapshot | doc-view rendered for fixture stable | TestBackend + fixture |
| snapshot | chunk-view rendered for fixture stable | TestBackend + fixture |

All tests under `cargo test -p kb-tui inspect`.
All tests under `cargo test -p kebab-tui inspect`.

## Definition of Done

- [ ] `cargo check -p kb-tui` passes
- [ ] `cargo test -p kb-tui inspect` passes
- [ ] `cargo check -p kebab-tui` passes
- [ ] `cargo test -p kebab-tui inspect` passes
- [ ] No imports outside Allowed dependencies
- [ ] Manual smoke: inspect a doc with multiple chunks, scroll, return to library
- [ ] PR links design §3.5, §2.5, §2.6
@@ -1,12 +1,12 @@
---
phase: P9
component: kb-desktop (Tauri)
component: kebab-desktop (Tauri)
task_id: p9-5
title: "Tauri desktop app: backend commands wrapping kb-app + multimodal source viewer"
title: "Tauri desktop app: backend commands wrapping kebab-app + multimodal source viewer"
status: planned
depends_on: [p9-1, p9-2, p9-3, p9-4]
unblocks: []
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [report §16.3 desktop (also tasks/phase-9-ui.md epic), design §1 ask/search scenes, design §2 wire schemas v1, design §8 module boundaries]
---
@@ -14,23 +14,23 @@ contract_sections: [report §16.3 desktop (also tasks/phase-9-ui.md epic), desig

## Goal

Stand up a Tauri 2.x app (`kb-desktop` crate as backend, `kb-desktop-frontend/` as web assets) whose Tauri commands wrap `kb-app` 1:1. The frontend renders multimodal source viewers (Markdown render, PDF page viewer, image viewer with region overlay, audio player with seek). Citation clicks route to the appropriate viewer.
Stand up a Tauri 2.x app (`kebab-desktop` crate as backend, `kebab-desktop-frontend/` as web assets) whose Tauri commands wrap `kebab-app` 1:1. The frontend renders multimodal source viewers (Markdown render, PDF page viewer, image viewer with region overlay, audio player with seek). Citation clicks route to the appropriate viewer.

## Why now / why this size

Last task. Combines all backend phases into a single user-facing surface. Strict policy: backend commands are thin wrappers over `kb-app`; no new business logic.
Last task. Combines all backend phases into a single user-facing surface. Strict policy: backend commands are thin wrappers over `kebab-app`; no new business logic.

## Allowed dependencies

- backend (`kb-desktop`):
  - `kb-core`
  - `kb-config`
  - `kb-app`
- backend (`kebab-desktop`):
  - `kebab-core`
  - `kebab-config`
  - `kebab-app`
  - `tauri = "2"` + `tauri-build`
  - `serde`, `serde_json`
  - `tracing`
  - `thiserror`
- frontend (`kb-desktop-frontend/`): vanilla TypeScript + Vite (default; user may swap to Svelte/Solid in a follow-up).
- frontend (`kebab-desktop-frontend/`): vanilla TypeScript + Vite (default; user may swap to Svelte/Solid in a follow-up).
  - PDF rendering: `pdfjs-dist`
  - Markdown rendering: `marked` + `dompurify`
  - Audio: HTML `<audio>` with custom segment overlay
@@ -38,7 +38,7 @@ Last task. Combines all backend phases into a single user-facing surface. Strict

## Forbidden dependencies

- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (UI must go through `kb-app` only — design §8).
- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (UI must go through `kebab-app` only — design §8).
- **No native PDF render backend** (no `pdfium`, no `mupdf`, no `poppler`). PDF rendering lives entirely in the frontend (`pdfjs-dist`). Adding any of these would (a) bloat the bundle by 100+ MB, (b) require a frozen-design amendment, and (c) double the path-containment surface.

## Inputs
@@ -46,7 +46,7 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
| input | type | source |
|-------|------|--------|
| Tauri commands | invoked from frontend | user clicks |
| `kb-config::Config` | runtime | env / file |
| `kebab-config::Config` | runtime | env / file |
| user file system (read-only) | for source viewers | OS |

## Outputs
@@ -59,7 +59,7 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
## Public surface (signatures only — no new types)

```rust
// Tauri command surface (one per kb-app facade method, plus source viewers)
// Tauri command surface (one per kebab-app facade method, plus source viewers)
#[tauri::command] fn cmd_init(force: bool) -> Result<()>;
#[tauri::command] fn cmd_ingest(scope_json: serde_json::Value, summary_only: bool) -> Result<serde_json::Value /* IngestReportWireV1 */>;
#[tauri::command] fn cmd_list_docs(filter_json: serde_json::Value) -> Result<Vec<serde_json::Value /* DocSummaryWireV1 */>>;
@@ -75,11 +75,11 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
#[tauri::command] fn cmd_read_file_bytes(path: String) -> Result<Vec<u8>>; // raw bytes for PDF / image / audio
```

(All commands convert internal `kb-core` types to wire-schema-v1 JSON before returning.)
(All commands convert internal `kebab-core` types to wire-schema-v1 JSON before returning.)

## Behavior contract

- Backend bootstraps `tracing` to a file under `~/.local/state/kb/logs/` and a Tauri plugin loads/saves window state.
- Backend bootstraps `tracing` to a file under `~/.local/state/kebab/logs/` and a Tauri plugin loads/saves window state.
- Every Tauri command performs **path containment** for source viewers: resolves `path` against `config.workspace.root`, rejects (`anyhow::Error`) any path outside.
- Layout (frontend): left = Library + Search + Ask tabs; right = Source viewer keyed by current citation.
- Citation routing in the frontend (clicks on `[#N]` markers or hit rows). All rendering is frontend-side; backend serves raw bytes only.
@@ -88,23 +88,23 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
- `Citation::Region { path, x, y, w, h }` → `cmd_read_file_bytes(path)` → blob URL → `<img>` + absolute-positioned overlay at `(x, y, w, h)`.
- `Citation::Caption { path, model }` → same as Region but no overlay; caption banner shows `model`.
- `Citation::Time { path, start_ms, end_ms }` → `cmd_read_file_bytes(path)` → blob URL → `<audio src=...>` seeked to `start_ms / 1000`, with a timeline marker spanning `[start_ms, end_ms]`.
- Streaming `kb ask`: backend command `cmd_ask` returns the buffered Answer (per §0 Q5: pipe/JSON mode buffers). For real-time streaming in the desktop, expose a separate `cmd_ask_stream` event channel via Tauri's `Window::emit("kb://ask-token", payload)`. (Implementation can be deferred to a follow-up; v1 of the desktop accepts buffered.)
- Streaming `kebab ask`: backend command `cmd_ask` returns the buffered Answer (per §0 Q5: pipe/JSON mode buffers). For real-time streaming in the desktop, expose a separate `cmd_ask_stream` event channel via Tauri's `Window::emit("kebab://ask-token", payload)`. (Implementation can be deferred to a follow-up; v1 of the desktop accepts buffered.)
- All backend errors mapped to a `String` message with structure `{ "error": msg, "hint": Option<msg> }`.
- Frontend respects light/dark per OS theme (Tauri supplies the API).
- No telemetry. No automatic update channel for v1 (manual download).
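One way to implement the path-containment rule for `cmd_read_file_bytes` / `cmd_read_markdown` is to canonicalize both the workspace root and the requested path, then require a prefix match. `canonicalize` resolves `..` and symlinks, so both traversal and symlink-out vectors end up outside the root and are refused. `Path::join` with an absolute path replaces the base, so absolute inputs go through the same check. This is a std-only sketch that returns `std::io::Error` instead of `anyhow::Error`. `contain_path` and `read_file_bytes` are hypothetical names, not the Tauri commands themselves.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Resolve `requested` inside `workspace_root`, rejecting anything that escapes it.
/// `canonicalize` resolves `..` and symlinks (and fails if the file does not exist),
/// so traversal and symlink-out cases both land outside `root` and are refused.
pub fn contain_path(workspace_root: &Path, requested: &str) -> io::Result<PathBuf> {
    let root = workspace_root.canonicalize()?;
    // Joining an absolute path replaces `root`, so absolute inputs are checked too.
    let candidate = root.join(requested).canonicalize()?;
    if candidate.starts_with(&root) {
        Ok(candidate)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            format!("path escapes workspace: {requested}"),
        ))
    }
}

/// Shape of a contained read such as `cmd_read_file_bytes`, minus the Tauri wrapper.
pub fn read_file_bytes(workspace_root: &Path, requested: &str) -> io::Result<Vec<u8>> {
    std::fs::read(contain_path(workspace_root, requested)?)
}
```

The check and the read are not atomic, so a path component could in principle be swapped between them. For a single-user local app this is often acceptable, but it belongs in the threat model next to the traversal vectors listed in the test plan.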

## Storage / wire effects

- Reads via `kb-app` (which reads/writes via SQLite + LanceDB).
- Reads via `kebab-app` (which reads/writes via SQLite + LanceDB).
- Reads workspace files directly for source viewers (path-contained).
- Writes nothing outside what `kb-app` writes.
- Writes nothing outside what `kebab-app` writes.
- Wire JSON between backend and frontend uses schema v1 strictly. The frontend MUST validate `schema_version` strings on every IPC return and warn (or upgrade-gate) when `v1 != current`.

## Test plan

| kind | description | fixture / data |
|------|-------------|----------------|
| unit (backend) | each command wraps the corresponding `kb-app` function and serializes via wire schema | inline mocks |
| unit (backend) | each command wraps the corresponding `kebab-app` function and serializes via wire schema | inline mocks |
| unit (backend) | `cmd_read_markdown` rejects paths outside workspace | tmp config |
| unit (backend) | `cmd_read_file_bytes` rejects paths outside workspace incl. `..`, absolute path, symlink-out | tmp config + traversal vectors |
| unit (backend) | `cmd_read_file_bytes` returns identical bytes to `std::fs::read` for an in-workspace file | tmp config |
@@ -112,13 +112,13 @@ Last task. Combines all backend phases into a single user-facing surface. Strict
| smoke (frontend, optional in this task) | Vitest test that mounts the Library tab, calls a mocked `cmd_list_docs`, renders 1 row | minimal |
| manual | full-stack smoke against a real ingested workspace (Markdown + 1 PDF + 1 image + 1 audio); each citation jumps correctly | manual checklist |

Backend tests under `cargo test -p kb-desktop`. Frontend tests are bonus and not gated by this task's DoD.
Backend tests under `cargo test -p kebab-desktop`. Frontend tests are bonus and not gated by this task's DoD.

## Definition of Done

- [ ] `cargo check -p kb-desktop` passes
- [ ] `cargo test -p kb-desktop` passes
- [ ] `pnpm --filter kb-desktop-frontend build` produces a static asset bundle Tauri can package
- [ ] `cargo check -p kebab-desktop` passes
- [ ] `cargo test -p kebab-desktop` passes
- [ ] `pnpm --filter kebab-desktop-frontend build` produces a static asset bundle Tauri can package
- [ ] `tauri build` produces an unsigned dmg on macOS in CI (signed/notarized are out of scope)
- [ ] Each Tauri command returns wire-schema-v1 JSON; frontend asserts `schema_version`
- [ ] No imports outside Allowed dependencies (backend)
@@ -139,4 +139,4 @@ Backend tests under `cargo test -p kb-desktop`. Frontend tests are bonus and not
- Path containment is the desktop's most security-sensitive surface; tests must include path traversal vectors (`..`, symlinks, absolute paths).
- PDF rendering via `pdfjs-dist` is heavy (~2 MB worker); lazy-load on first PDF citation. The trade-off vs a native render backend (e.g., `pdfium` ~150 MB binary, code-signing pain) is heavily one-sided; v1 stays on `pdfjs-dist`.
- Audio formats vary; rely on the browser engine's HTML audio decoder (WebKit on macOS supports `.m4a`, `.mp3`; mileage varies on `.flac`/`.ogg`).
- Wide Tauri command surface tempts business-logic creep; CI must enforce that no `kb-rag` / `kb-search` / store crate appears in `kb-desktop`'s `cargo tree`.
- Wide Tauri command surface tempts business-logic creep; CI must enforce that no `kebab-rag` / `kebab-search` / store crate appears in `kebab-desktop`'s `cargo tree`.

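The path-containment rule above (reject `..`, absolute paths, and symlinks that escape the workspace) can be sketched with `std` alone. This is a minimal sketch, not the `kebab-desktop` implementation. The names `resolve_in_workspace` and `read_file_bytes` are hypothetical, and the sketch assumes the frontend sends workspace-relative paths. It relies on `std::fs::canonicalize`, which resolves `..` and follows symlinks. A link that points outside the workspace therefore resolves to its real location and fails the `starts_with` check.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve `requested` against `workspace` and reject
/// anything that ends up outside the workspace root.
fn resolve_in_workspace(workspace: &Path, requested: &Path) -> io::Result<PathBuf> {
    // Absolute paths are rejected outright; only workspace-relative input is accepted.
    if requested.is_absolute() {
        return Err(io::Error::new(io::ErrorKind::PermissionDenied, "absolute path rejected"));
    }
    // canonicalize() resolves `..` and follows symlinks (it errors if the path does not exist).
    let root = fs::canonicalize(workspace)?;
    let resolved = fs::canonicalize(root.join(requested))?;
    if resolved.starts_with(&root) {
        Ok(resolved)
    } else {
        Err(io::Error::new(io::ErrorKind::PermissionDenied, "path escapes workspace"))
    }
}

/// Hypothetical backend body for a read command: contained read, raw bytes out.
fn read_file_bytes(workspace: &Path, requested: &Path) -> io::Result<Vec<u8>> {
    let path = resolve_in_workspace(workspace, requested)?;
    fs::read(path)
}
```

The check and the read are separate calls, which leaves a small time-of-check/time-of-use window. Treat this as an illustration of the containment rule, not as a hardened implementation.
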
@@ -3,7 +3,7 @@ phase: P0
title: "Workspace skeleton + domain contracts"
status: completed
depends_on: []
source: kb_local_rust_report.md §3, §4, §6, §7, §13
source: kebab_local_rust_report.md §3, §4, §6, §7, §13
---

# P0 — Workspace skeleton + domain contracts
@@ -16,11 +16,11 @@ Establish a compiling Rust 2024 workspace and finalize the domain spec. All later phases

| crate | role |
|-------|------|
| `kb-core` | domain types, traits, errors, ID rules |
| `kb-parse-types` | parser intermediate representation (`ParsedBlock` etc.); thin layer between `kb-core` and parsers/normalize (design §3.7b) |
| `kb-config` | config loading, defaults, path expansion |
| `kb-app` | facade; common entry point for CLI/TUI/desktop |
| `kb-cli` | `kb` binary skeleton (only `--help` works) |
| `kebab-core` | domain types, traits, errors, ID rules |
| `kebab-parse-types` | parser intermediate representation (`ParsedBlock` etc.); thin layer between `kebab-core` and parsers/normalize (design §3.7b) |
| `kebab-config` | config loading, defaults, path expansion |
| `kebab-app` | facade; common entry point for CLI/TUI/desktop |
| `kebab-cli` | `kebab` binary skeleton (only `--help` works) |

## Workspace setup

@@ -29,7 +29,7 @@ Establish a compiling Rust 2024 workspace and finalize the domain spec. All later phases
```toml
[workspace]
resolver = "3"
members = ["crates/kb-core", "crates/kb-parse-types", "crates/kb-config", "crates/kb-app", "crates/kb-cli"]
members = ["crates/kebab-core", "crates/kebab-parse-types", "crates/kebab-config", "crates/kebab-app", "crates/kebab-cli"]

[workspace.package]
edition = "2024"
@@ -48,11 +48,11 @@ tracing = "0.1"

Additional member crates join in later phases.

## kb-core domain types
## kebab-core domain types

`RawAsset`, `CanonicalDocument`, `Block`, `Chunk`, `SearchHit`, `Citation`, `SourceSpan`, `Provenance`, `Metadata`, exactly as defined in report §6.

## kb-core traits
## kebab-core traits

```rust
pub trait SourceConnector { fn scan(&self, scope: &SourceScope) -> anyhow::Result<Vec<RawAsset>>; }
@@ -78,7 +78,7 @@ index_id = collection name + index version + embedding model id

Each record preserves the following version fields: `doc_version`, `schema_version`, `parser_version`, `chunker_version`, `embedding_model`, `embedding_version`, `index_version`, `prompt_template_version`.

## kb-app facade
## kebab-app facade

CLI/TUI/desktop all call facade functions; calling parser/DB/LLM adapters directly is forbidden. Initial facade method stubs:

@@ -91,9 +91,9 @@ pub fn inspect_chunk(id: &ChunkId) -> anyhow::Result<Chunk>;
pub fn doctor() -> anyhow::Result<DoctorReport>;
```

## kb-cli skeleton
## kebab-cli skeleton

`clap` derive. Subcommands: `init`, `ingest`, `index`, `search`, `ask`, `inspect doc|chunk`, `doctor`. Their bodies only call `kb-app`. In P0 only `--help` works.
`clap` derive. Subcommands: `init`, `ingest`, `index`, `search`, `ask`, `inspect doc|chunk`, `doctor`. Their bodies only call `kebab-app`. In P0 only `--help` works.

## Spec documents

@@ -112,15 +112,15 @@ pub fn doctor() -> anyhow::Result<DoctorReport>;

## Dependency boundaries

- `kb-core`: minimal external dependencies (serde, time, uuid, blake3, thiserror, tracing).
- `kb-cli` → `kb-app` only. No direct parser/DB/LLM dependencies.
- `kb-app` works against traits only; implementations are injected as `dyn`.
- `kebab-core`: minimal external dependencies (serde, time, uuid, blake3, thiserror, tracing).
- `kebab-cli` → `kebab-app` only. No direct parser/DB/LLM dependencies.
- `kebab-app` works against traits only; implementations are injected as `dyn`.
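
The boundary above says `kebab-app` sees only traits and receives implementations through `dyn` injection. A minimal sketch of that shape follows. The `Retriever` signature here is hypothetical and simplified to `String` ids with no error type. The real trait lives in `kebab-core` and is not shown in this diff. `FakeRetriever` stands in for a test double.

```rust
/// Hypothetical, simplified stand-in for the kebab-core `Retriever` trait.
trait Retriever {
    fn retrieve(&self, query: &str, k: usize) -> Vec<String>;
}

/// Facade that only knows the trait; the concrete retriever is injected.
struct App {
    retriever: Box<dyn Retriever>,
}

impl App {
    fn new(retriever: Box<dyn Retriever>) -> Self {
        Self { retriever }
    }

    /// Front-ends (CLI/TUI/desktop) call this instead of touching stores directly.
    fn search(&self, query: &str, k: usize) -> Vec<String> {
        self.retriever.retrieve(query, k)
    }
}

/// Test double returning a fixed ranking, truncated to `k`.
struct FakeRetriever(Vec<String>);

impl Retriever for FakeRetriever {
    fn retrieve(&self, _query: &str, k: usize) -> Vec<String> {
        self.0.iter().take(k).cloned().collect()
    }
}
```

Swapping `FakeRetriever` for a SQLite- or LanceDB-backed implementation changes nothing in `App` or its callers. That isolation is what keeps the front-end crates free of store dependencies.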

## Completion criteria

- [ ] `cargo check --workspace` passes
- [ ] `cargo test --workspace` passes (unit tests cover ID generation and domain serialization round-trips)
- [ ] `kb --help` prints usage
- [ ] `kebab --help` prints usage
- [ ] 7 documents exist under `docs/spec/*`
- [ ] 3 fixtures exist under `fixtures/markdown/*`
- [ ] At least one serde JSON snapshot test for domain types

@@ -3,47 +3,47 @@ phase: P1
title: "Markdown ingestion pipeline"
status: completed
depends_on: [P0]
source: kb_local_rust_report.md §8, §14, §17 Phase 1
source: kebab_local_rust_report.md §8, §14, §17 Phase 1
---

# P1 — Markdown ingestion pipeline

## Goal

Complete the `Markdown file -> RawAsset -> CanonicalDocument -> Chunk -> SQLite` flow. `kb ingest` / `kb list docs` / `kb inspect doc <id>` work even without an LLM or embeddings.
Complete the `Markdown file -> RawAsset -> CanonicalDocument -> Chunk -> SQLite` flow. `kebab ingest` / `kebab list docs` / `kebab inspect doc <id>` work even without an LLM or embeddings.

## Deliverable crates

| crate | role |
|-------|------|
| `kb-source-fs` | local folder scan, checksum, change detection. Implements `SourceConnector` |
| `kb-parse-md` | Markdown bytes → structured document. Implements `Extractor` |
| `kb-normalize` | parser output → `CanonicalDocument` |
| `kb-chunk` | block-aware chunking. Implements `Chunker` (`md-heading-v1`) |
| `kb-store-sqlite` | metadata, document, chunk, and job tables. The FTS table is enabled in P2 |
| `kebab-source-fs` | local folder scan, checksum, change detection. Implements `SourceConnector` |
| `kebab-parse-md` | Markdown bytes → structured document. Implements `Extractor` |
| `kebab-normalize` | parser output → `CanonicalDocument` |
| `kebab-chunk` | block-aware chunking. Implements `Chunker` (`md-heading-v1`) |
| `kebab-store-sqlite` | metadata, document, chunk, and job tables. The FTS table is enabled in P2 |

## kb-source-fs
## kebab-source-fs

- Input: `SourceScope { root: PathBuf, include: Vec<Glob>, exclude: Vec<Glob> }`
- Behavior: recursive walk → `blake3` of each file → list of `RawAsset`s.
- Change detection: compare new vs. old by `(source_uri, checksum)`. Files with an unchanged checksum are skipped.
- Watch mode is out of P1 scope (only the config is defined; implementation is deferred).
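
The change-detection rule above compares by `(source_uri, checksum)` and skips identical checksums. That reduces to a lookup against the checksums stored on the previous scan. A minimal sketch, assuming the checksums are already computed. The document uses `blake3`; this sketch takes checksums as plain strings so it needs only `std`. `Change` and `classify` are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Change {
    New,       // source_uri never seen before
    Modified,  // same source_uri, different checksum: re-ingest
    Unchanged, // same checksum: skip
}

/// `prev` maps source_uri -> checksum recorded on the previous scan.
fn classify(prev: &HashMap<String, String>, source_uri: &str, checksum: &str) -> Change {
    match prev.get(source_uri) {
        None => Change::New,
        Some(old) if old.as_str() == checksum => Change::Unchanged,
        Some(_) => Change::Modified,
    }
}
```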

## kb-parse-md
## kebab-parse-md

- Parser candidates: `pulldown-cmark` first; consider `comrak` if GFM tables/task lists become necessary (§8).
- Preserve: YAML/TOML frontmatter, heading tree, paragraph, list, code block + lang tag, table, blockquote, link, image ref, **line range**.
- Output: intermediate representation (parser-specific). `kb-normalize` converts it to canonical form.
- Output: intermediate representation (parser-specific). `kebab-normalize` converts it to canonical form.
- Malformed Markdown: never panic. Preserve whatever can be parsed and record a warning in `Provenance`.

## kb-normalize
## kebab-normalize

- Responsibility: parser intermediate representation → `CanonicalDocument`.
- frontmatter → `Metadata` (id, title, aliases, tags, created_at, updated_at, source_type, trust_level, lang).
- Flatten the block tree and assign `BlockId`s (deterministic, based on heading path + sequence number).
- `SourceSpan` allows either `LineRange { start, end }` or `ByteRange`. Markdown primarily uses line ranges.

## kb-chunk (`md-heading-v1`)
## kebab-chunk (`md-heading-v1`)

Priority (§14):
1. Heading boundaries first
@@ -58,7 +58,7 @@ policy defaults: `target_tokens = 500`, `overlap_tokens = 80`, `respect_markdow

Token estimation: no tokenizer has been introduced at this stage, so a byte/character-based approximation is OK. The real tokenizer replaces it when embeddings are introduced in P3.
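
The policy defaults in the hunk header (`target_tokens = 500`, `overlap_tokens = 80`) suggest that consecutive chunks inside an over-long section share 80 tokens. The sketch below shows that windowing over an already-estimated token count, assuming tokens are indexed `0..len`. It ignores heading boundaries, which `md-heading-v1` applies first. `overlapping_windows` is a hypothetical name.

```rust
/// Split token positions 0..len into windows of at most `target` tokens,
/// each window starting `target - overlap` tokens after the previous one.
fn overlapping_windows(len: usize, target: usize, overlap: usize) -> Vec<(usize, usize)> {
    assert!(target > overlap, "overlap must be smaller than the window");
    let step = target - overlap;
    let mut out = Vec::new();
    let mut start = 0;
    while start < len {
        let end = (start + target).min(len);
        out.push((start, end)); // half-open range [start, end)
        if end == len {
            break; // last window reached the end; avoid a redundant tail window
        }
        start += step;
    }
    out
}
```

Until the P3 tokenizer lands, `len` would come from whatever byte/character approximation is in use.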

## kb-store-sqlite
## kebab-store-sqlite

Schema (initial):

@@ -117,7 +117,7 @@ CREATE TABLE jobs (
- Transactions: one ingest = one transaction. Roll back on partial failure.
- Idempotent: re-ingesting the same `doc_id` is an UPSERT with a version bump.

## kb-app facade extensions
## kebab-app facade extensions

```rust
pub fn ingest(scope: SourceScope) -> anyhow::Result<IngestReport>;
@@ -131,10 +131,10 @@ pub fn inspect_chunk(id: &ChunkId) -> anyhow::Result<Chunk>;
## CLI

```text
kb ingest <path> [--include <glob>] [--exclude <glob>]
kb list docs [--tag <t>]
kb inspect doc <doc_id>
kb inspect chunk <chunk_id>
kebab ingest <path> [--include <glob>] [--exclude <glob>]
kebab list docs [--tag <t>]
kebab inspect doc <doc_id>
kebab inspect chunk <chunk_id>
```

## Tests
@@ -146,14 +146,14 @@ kb inspect chunk <chunk_id>

## Dependency boundaries

`kb-parse-md` must not use `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, or embedding calls. The parser is a pure function.
`kebab-parse-md` must not use `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, or embedding calls. The parser is a pure function.

## Completion criteria

- [ ] After `kb ingest <path>` runs, SQLite is populated with documents/blocks/chunks
- [ ] `kb list docs` prints correctly
- [ ] `kb inspect doc <id>` outputs JSON
- [ ] `kb inspect chunk <id>` outputs JSON (including heading path + source span)
- [ ] After `kebab ingest <path>` runs, SQLite is populated with documents/blocks/chunks
- [ ] `kebab list docs` prints correctly
- [ ] `kebab inspect doc <id>` outputs JSON
- [ ] `kebab inspect chunk <id>` outputs JSON (including heading path + source span)
- [ ] Re-ingesting the same folder creates no duplicate rows
- [ ] Documents needing reprocessing can be identified when the parser/chunker version changes
- [ ] Fixture snapshot tests pass

@@ -3,19 +3,19 @@ phase: P2
title: "SQLite FTS5 lexical search + citation"
status: completed
depends_on: [P1]
source: kb_local_rust_report.md §10, §15, §17 Phase 2
source: kebab_local_rust_report.md §10, §15, §17 Phase 2
---

# P2 — SQLite FTS5 lexical search + citation

## Goal

Search that runs on FTS5 alone, with no embeddings or LLM, plus citation output. `kb search "..."` returns chunks and source spans.
Search that runs on FTS5 alone, with no embeddings or LLM, plus citation output. `kebab search "..."` returns chunks and source spans.

## Deliverable crates

- `kb-search` (lexical mode): the first `Retriever` trait implementation.
- `kb-store-sqlite` extension: FTS5 virtual table + triggers.
- `kebab-search` (lexical mode): the first `Retriever` trait implementation.
- `kebab-store-sqlite` extension: FTS5 virtual table + triggers.

## FTS5 schema

@@ -69,15 +69,15 @@ pub struct SearchHit {
}
```

`Citation` format: `notes/rust/kb.md:L12-L34`.
`Citation` format: `notes/rust/kebab.md:L12-L34`.

## Index lifecycle

- Synced automatically by triggers on ingest.
- Rebuild the FTS table with `kb index --rebuild-fts` (use after a chunker version bump).
- Rebuild the FTS table with `kebab index --rebuild-fts` (use after a chunker version bump).
- `index_version` is the combination of `(schema_version, fts_config_hash)`.

## kb-app facade extensions
## kebab-app facade extensions

```rust
pub fn search(query: SearchQuery) -> anyhow::Result<Vec<SearchHit>>;
@@ -86,16 +86,16 @@ pub fn search(query: SearchQuery) -> anyhow::Result<Vec<SearchHit>>;
## CLI

```text
kb search "Rust workspace 설계" [--k 10] [--tag rust] [--mode lexical]
kb index --rebuild-fts
kebab search "Rust workspace 설계" [--k 10] [--tag rust] [--mode lexical]
kebab index --rebuild-fts
```

Example output:

```text
1. [0.82] Rust workspace는 여러 package를 하나로 관리한다…
doc: notes/rust/kb.md
citation: notes/rust/kb.md:L12-L34
doc: notes/rust/kebab.md
citation: notes/rust/kebab.md:L12-L34
heading: 아키텍처 > Rust workspace
```

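The line-range citation format shown above (`notes/rust/kebab.md:L12-L34`) is simple enough to format and parse with plain string splitting. A sketch with hypothetical function names follows. The parser splits on the last `:L`, so a `:` elsewhere in the path does not confuse it. It rejects ranges whose start exceeds their end, which is one cheap precondition for the "citation line ranges match the source" check.

```rust
/// Format a Markdown line-range citation, e.g. `notes/rust/kebab.md:L12-L34`.
fn format_line_citation(path: &str, start: u32, end: u32) -> String {
    format!("{path}:L{start}-L{end}")
}

/// Parse `<path>:L<start>-L<end>`; returns None for malformed or inverted ranges.
fn parse_line_citation(s: &str) -> Option<(&str, u32, u32)> {
    let (path, range) = s.rsplit_once(":L")?;
    let (start, end) = range.split_once("-L")?;
    let start: u32 = start.parse().ok()?;
    let end: u32 = end.parse().ok()?;
    (start <= end).then_some((path, start, end))
}
```
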
@@ -108,13 +108,13 @@ kb index --rebuild-fts

## Dependency boundaries

- `kb-search` depends only on `kb-store-sqlite` and `kb-core`.
- `kebab-search` depends only on `kebab-store-sqlite` and `kebab-core`.
- No LLM/embedding calls (P2 stage).
- The CLI calls in only through `kb-app`.
- The CLI calls in only through `kebab-app`.

## Completion criteria

- [ ] `kb search "..."` returns the top-k chunks
- [ ] `kebab search "..."` returns the top-k chunks
- [ ] Every result includes a citation
- [ ] Citation line ranges match the source
- [ ] Mixed Korean/English queries work (Korean tokenization limits are recorded in a note)

@@ -3,23 +3,23 @@ phase: P3
title: "Local embedding + LanceDB + hybrid search"
status: completed
depends_on: [P2]
source: kb_local_rust_report.md §10, §11, §15, §17 Phase 3
source: kebab_local_rust_report.md §10, §11, §15, §17 Phase 3
---

# P3 — Local embedding + LanceDB + hybrid search

## Goal

Turn chunks into vectors with a local embedding model → store them in LanceDB → vector search fused with lexical search (hybrid). `kb search --mode {lexical,vector,hybrid}` works.
Turn chunks into vectors with a local embedding model → store them in LanceDB → vector search fused with lexical search (hybrid). `kebab search --mode {lexical,vector,hybrid}` works.

## Deliverable crates

| crate | role |
|-------|------|
| `kb-embed` | `Embedder` trait + `EmbeddingInput`/output types |
| `kb-embed-local` | `fastembed-rs` adapter (first). Later: Ollama embed endpoint, candle |
| `kb-store-vector` | LanceDB integration: table management, upsert, vector search |
| `kb-search` | lexical + vector in parallel + score fusion |
| `kebab-embed` | `Embedder` trait + `EmbeddingInput`/output types |
| `kebab-embed-local` | `fastembed-rs` adapter (first). Later: Ollama embed endpoint, candle |
| `kebab-store-vector` | LanceDB integration: table management, upsert, vector search |
| `kebab-search` | lexical + vector in parallel + score fusion |

## Embedder

@@ -64,7 +64,7 @@ created_at : timestamp
## Indexing job

```text
kb index --embeddings [--model <id>] [--batch-size N] [--resume]
kebab index --embeddings [--model <id>] [--batch-size N] [--resume]
```

- Only chunks whose `embedding_id = chunk_id + model_id + dim` is not yet in the vector store are processed.
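
The incremental rule above embeds only chunks whose `embedding_id = chunk_id + model_id + dim` is missing from the vector store. That can be sketched as a set difference. The `|` separator and both function names are assumptions for illustration; the real id encoding is not shown in this diff. Because `model_id` and `dim` are part of the key, switching models makes every chunk pending again.

```rust
use std::collections::HashSet;

/// Hypothetical key encoding for embedding_id = chunk_id + model_id + dim.
fn embedding_id(chunk_id: &str, model_id: &str, dim: usize) -> String {
    format!("{chunk_id}|{model_id}|{dim}")
}

/// Chunk ids that still need an embedding for this (model_id, dim) pair.
fn pending_chunks<'a>(
    chunk_ids: &[&'a str],
    stored: &HashSet<String>,
    model_id: &str,
    dim: usize,
) -> Vec<&'a str> {
    chunk_ids
        .iter()
        .copied()
        .filter(|c| !stored.contains(&embedding_id(c, model_id, dim)))
        .collect()
}
```
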
@@ -91,7 +91,7 @@ k_rrf defaults to 60.

No reranker in P3 scope (noted for a P+ stage).
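
The hunk header above records a default `k_rrf` of 60. That points to reciprocal rank fusion (RRF), where each chunk scores Σ 1/(k_rrf + rank) across the lexical and vector result lists. A minimal sketch of that fusion follows. It assumes 1-based ranks and breaks ties by id so the output is deterministic. `rrf_fuse` is a hypothetical name, and the actual `HybridRetriever` may differ.

```rust
use std::collections::HashMap;

/// Fuse several ranked id lists with reciprocal rank fusion.
/// Returns (id, score) sorted by descending score, ties broken by id.
fn rrf_fuse(lists: &[Vec<&str>], k_rrf: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            let rank = (i + 1) as f64; // 1-based rank within this list
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k_rrf + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}
```

RRF uses only ranks, not raw scores, so BM25 scores and cosine similarities never need to be put on a common scale.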

## kb-search structure
## kebab-search structure

```rust
pub struct HybridRetriever {
@@ -102,11 +102,11 @@ pub struct HybridRetriever {
```

- Each sub-retriever implements the `Retriever` trait.
- `kb-app::search` dispatches by mode.
- `kebab-app::search` dispatches by mode.

## kb-app facade extensions
## kebab-app facade extensions

During P3, the stub bodies of the kb-app facade's `ingest` / `search` / `list_docs` / `inspect_doc` / `inspect_chunk` are replaced with real library calls. Signatures have been frozen since P0, so only the bodies are swapped; no signature changes.
During P3, the stub bodies of the kebab-app facade's `ingest` / `search` / `list_docs` / `inspect_doc` / `inspect_chunk` are replaced with real library calls. Signatures have been frozen since P0, so only the bodies are swapped; no signature changes.

```rust
pub fn ingest(scope: SourceScope, summary_only: bool) -> anyhow::Result<IngestReport>; // p3-5
@@ -117,14 +117,14 @@ pub fn inspect_chunk(id: &ChunkId) -> anyhow::Result<Chunk>;
pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<Answer>; // p4-3 (stub remains)
```

p3-5 wires up every facade that does not involve the LLM (everything except `ask`) in one go. After that, `cargo run -p kb-cli -- index` and `cargo run -p kb-cli -- search --mode {lexical,vector,hybrid}` actually work.
p3-5 wires up every facade that does not involve the LLM (everything except `ask`) in one go. After that, `cargo run -p kebab-cli -- index` and `cargo run -p kebab-cli -- search --mode {lexical,vector,hybrid}` actually work.

## CLI

```text
kb index --embeddings
kb search --mode vector "비슷한 설계 원칙"
kb search --mode hybrid "Markdown chunking 규칙"
kebab index --embeddings
kebab search --mode vector "비슷한 설계 원칙"
kebab search --mode hybrid "Markdown chunking 규칙"
```

## Tests
@@ -137,19 +137,19 @@ kb search --mode hybrid "Markdown chunking 규칙"

## Dependency boundaries

- Only `kb-embed-local` depends on ONNX/model bindings; other crates use only the trait.
- `kb-store-vector` depends on `lancedb`. No cross-writes with SQLite (each store owns its own data).
- Only `kebab-embed-local` depends on ONNX/model bindings; other crates use only the trait.
- `kebab-store-vector` depends on `lancedb`. No cross-writes with SQLite (each store owns its own data).
- Kept separate from the LLM crates (§11.1).

## Completion criteria

- [ ] `kb index` (= `kb-app::ingest`) stores every chunk in SQLite + LanceDB (p3-5)
- [ ] `kb search --mode vector` returns correct hits
- [ ] `kb search --mode hybrid` returns correct hits, with citations
- [ ] `kebab index` (= `kebab-app::ingest`) stores every chunk in SQLite + LanceDB (p3-5)
- [ ] `kebab search --mode vector` returns correct hits
- [ ] `kebab search --mode hybrid` returns correct hits, with citations
- [ ] A model/dimension change stores vectors in a separate table
- [ ] On resume, only unfinished chunks are processed (deferred to P+)
- [ ] Results structured so that hit@k can be measured (P5 prep)
- [ ] `kb-app::{ingest,search,list_docs,inspect_doc,inspect_chunk}` work for real, with `ask` a stub until P4-3 (p3-5)
- [ ] `kebab-app::{ingest,search,list_docs,inspect_doc,inspect_chunk}` work for real, with `ask` a stub until P4-3 (p3-5)

## Risks / caveats

@@ -3,22 +3,22 @@ phase: P4
title: "Local LLM + RAG + grounded answer"
status: completed
depends_on: [P3]
source: kb_local_rust_report.md §11, §15.2, §17 Phase 4
source: kebab_local_rust_report.md §11, §15.2, §17 Phase 4
---

# P4 — Local LLM + RAG + grounded answer

## Goal

Generate answers with citations using a local LLM, and refuse when the evidence is insufficient. `kb ask "..."` works.
Generate answers with citations using a local LLM, and refuse when the evidence is insufficient. `kebab ask "..."` works.

## Deliverable crates

| crate | role |
|-------|------|
| `kb-llm` | `LanguageModel` trait + request/response types |
| `kb-llm-local` | Ollama adapter (first). Later: llama.cpp, candle |
| `kb-rag` | retrieval → context packing → prompt → generate → citation verification |
| `kebab-llm` | `LanguageModel` trait + request/response types |
| `kebab-llm-local` | Ollama adapter (first). Later: llama.cpp, candle |
| `kebab-rag` | retrieval → context packing → prompt → generate → citation verification |

## LanguageModel

@@ -50,9 +50,9 @@ pub struct GenerateResponse {
- HTTP call to localhost (`http://127.0.0.1:11434/api/generate`).
- An async runtime may be used internally; the external API keeps a synchronous wrapper.
- The default model comes from config (e.g. `qwen2.5:14b-instruct`). The actual choice is made after the P5 eval.
- If the server is not running: a clear error message plus `kb doctor` diagnostics.
- If the server is not running: a clear error message plus `kebab doctor` diagnostics.

## kb-rag pipeline
## kebab-rag pipeline

```text
query
@@ -118,7 +118,7 @@ pub struct Answer {

Stored in the `answers` table (for reproducibility and auditing), together with the list of chunk_ids used and the retrieval params.

## kb-app facade extensions
## kebab-app facade extensions

```rust
pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<Answer>;
@@ -127,9 +127,9 @@ pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result<Answer>;
## CLI

```text
kb ask "내 KB 설계에서 저장소 전략은?"
kb ask --k 8 --temperature 0 "..."
kb ask --explain "..." # prints retrieval trace + packed prompt
kebab ask "내 KB 설계에서 저장소 전략은?"
kebab ask --k 8 --temperature 0 "..."
kebab ask --explain "..." # prints retrieval trace + packed prompt
```

## Tests
@@ -142,13 +142,13 @@ kb ask --explain "..." # prints retrieval trace + packed prompt

## Dependency boundaries

- Only `kb-llm-local` depends on Ollama HTTP.
- `kb-rag` uses only `kb-search` (Retriever trait) + `kb-llm` (LanguageModel trait). No direct SQLite/LanceDB calls.
- The CLI calls only `kb-app::ask`.
- Only `kebab-llm-local` depends on Ollama HTTP.
- `kebab-rag` uses only `kebab-search` (Retriever trait) + `kebab-llm` (LanguageModel trait). No direct SQLite/LanceDB calls.
- The CLI calls only `kebab-app::ask`.

## Completion criteria

- [ ] `kb ask "..."` works
- [ ] `kebab ask "..."` works
- [ ] Answers include citations
- [ ] Questions without supporting evidence are refused
- [ ] Retrieval trace visible via `--explain`
@@ -158,6 +158,6 @@ kb ask --explain "..." # prints retrieval trace + packed prompt
## Risks / caveats

- Model choice is finalized after evaluation against the P5 golden set; P4 uses only the default.
- Ollama not running / model not downloaded → `kb doctor` explains clearly.
- Ollama not running / model not downloaded → `kebab doctor` explains clearly.
- LLM answers frequently contain hallucinated citations; post-processing verification is essential.
- Any prompt template change must bump `prompt_template_version`.

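Post-processing verification, flagged above as essential, can start with the cheapest rule: every chunk id the model cites must be one of the chunks actually packed into the prompt. A minimal sketch of that rule only, with a hypothetical function name. It does not check whether the cited text supports the claim.

```rust
use std::collections::HashSet;

/// Split the model's cited chunk ids into (grounded, hallucinated), where
/// "grounded" means the id was among the retrieved chunks packed into the prompt.
fn check_citations<'a>(cited: &[&'a str], retrieved: &[&str]) -> (Vec<&'a str>, Vec<&'a str>) {
    let allowed: HashSet<String> = retrieved.iter().map(|id| id.to_string()).collect();
    cited.iter().copied().partition(|id| allowed.contains(*id))
}
```

An answer whose citations all land in the hallucinated bucket could be routed to the refusal path described in the P4 goal.
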
@@ -3,7 +3,7 @@ phase: P5
title: "Golden query / regression eval"
status: completed
depends_on: [P4]
source: kb_local_rust_report.md §17 Phase 5, §18
source: kebab_local_rust_report.md §17 Phase 5, §18
---

# P5 — Golden query / regression eval
@@ -14,7 +14,7 @@ source: kb_local_rust_report.md §17 Phase 5, §18

## Deliverable crates

- `kb-eval`: golden query runner, metric computation, report generation.
- `kebab-eval`: golden query runner, metric computation, report generation.

## Golden set fixture

@@ -25,9 +25,9 @@ source: kb_local_rust_report.md §17 Phase 5, §18
query: "Markdown chunking 규칙"
lang: ko
expected_doc_ids:
- doc:notes/rust/kb-architecture.md
- doc:notes/rust/kebab-architecture.md
expected_chunk_ids:
- chunk:notes/rust/kb-architecture.md#chunking-policy
- chunk:notes/rust/kebab-architecture.md#chunking-policy
must_contain:
- "heading"
- "code block"
@@ -57,9 +57,9 @@ source: kb_local_rust_report.md §17 Phase 5, §18
## Run modes

```text
kb eval run --suite golden [--mode {lexical,vector,hybrid}] [--with-rag]
kb eval compare <run_id_a> <run_id_b>
kb eval report <run_id> --format {json,md,html}
kebab eval run --suite golden [--mode {lexical,vector,hybrid}] [--with-rag]
kebab eval compare <run_id_a> <run_id_b>
kebab eval report <run_id> --format {json,md,html}
```

run record:
@@ -90,7 +90,7 @@ Stored in the DB (`eval_runs`, `eval_query_results` tables) or as JSON files. Reproducibility
- Automatic hyperparameter search: not done.
- LLM judge ("LLM as a judge"): out of P5 scope. Groundedness is rule-based (`must_contain`) only.

## kb-app facade extensions
## kebab-app facade extensions

```rust
pub fn eval_run(opts: EvalRunOpts) -> anyhow::Result<EvalRun>;
@@ -105,13 +105,13 @@ pub fn eval_compare(a: &str, b: &str) -> anyhow::Result<CompareReport>;

## Dependency boundaries

- `kb-eval` calls only `kb-app` (search/ask go through the facade). No direct calls into internal stores or LLMs.
- `kebab-eval` calls only `kebab-app` (search/ask go through the facade). No direct calls into internal stores or LLMs.

## Completion criteria

- [ ] `fixtures/golden_queries.yaml` contains 30+ queries
- [ ] `kb eval run` computes hit@k, MRR, citation_coverage
- [ ] `kb eval compare` can compare two runs
- [ ] `kebab eval run` computes hit@k, MRR, citation_coverage
- [ ] `kebab eval compare` can compare two runs
- [ ] A config snapshot is stored with each run (chunker, embedding, LLM, prompt versions)
- [ ] Regressions detectable in CI (e.g. fail if hit@5 drops 3% or more below the baseline)

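The metrics and CI gate above can be sketched for a single query set. The sketch rests on four assumptions. First, each query contributes a ranked list of retrieved ids plus its expected ids. Second, hit@k counts a query as a hit when any expected id appears in the top k. Third, MRR averages 1/rank of the first expected id, scoring 0 when none is found. Fourth, the "-3%" gate is read as a relative drop from the baseline; if percentage points are intended, compare differences instead. All function names are hypothetical.

```rust
/// 1-based rank of the first retrieved id that is in `expected`, if any.
fn first_relevant_rank(retrieved: &[&str], expected: &[&str]) -> Option<usize> {
    retrieved
        .iter()
        .position(|id| expected.iter().any(|e| e == id))
        .map(|i| i + 1)
}

/// Fraction of queries with at least one expected id in the top k.
fn hit_at_k(runs: &[(Vec<&str>, Vec<&str>)], k: usize) -> f64 {
    if runs.is_empty() {
        return 0.0;
    }
    let hits = runs
        .iter()
        .filter(|(r, e)| matches!(first_relevant_rank(r, e), Some(rank) if rank <= k))
        .count();
    hits as f64 / runs.len() as f64
}

/// Mean reciprocal rank; queries with no relevant hit contribute 0.
fn mrr(runs: &[(Vec<&str>, Vec<&str>)]) -> f64 {
    if runs.is_empty() {
        return 0.0;
    }
    let sum: f64 = runs
        .iter()
        .map(|(r, e)| first_relevant_rank(r, e).map_or(0.0, |rank| 1.0 / rank as f64))
        .sum();
    sum / runs.len() as f64
}

/// CI gate: true when `current` fell more than `max_relative_drop` below `baseline`.
fn regressed(baseline: f64, current: f64, max_relative_drop: f64) -> bool {
    current < baseline * (1.0 - max_relative_drop)
}
```
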
@@ -3,7 +3,7 @@ phase: P6
title: "Image ingestion (OCR + caption)"
status: planned
depends_on: [P5]
source: kb_local_rust_report.md §9.1, §17 Phase 6
source: kebab_local_rust_report.md §9.1, §17 Phase 6
---

# P6 — Image ingestion
@@ -14,8 +14,8 @@ source: kb_local_rust_report.md §9.1, §17 Phase 6

## Deliverable crates

- `kb-parse-image`: implements `Extractor`. Image → CanonicalDocument.
- (Optional) `kb-ocr` / `kb-vlm` adapters (if external models are split out).
- `kebab-parse-image`: implements `Extractor`. Image → CanonicalDocument.
- (Optional) `kebab-ocr` / `kebab-vlm` adapters (if external models are split out).

## Three kinds of extracted information (§9.1)

@@ -83,10 +83,10 @@ photos/diagram-2026.png#caption # caption chunk
## CLI

```text
kb ingest ./assets/diagram.png
kb ingest ./assets/ # images in the folder are detected automatically
kb search "이미지 안의 OCR 텍스트"
kb inspect doc <image_doc_id> # shows OCR, caption, and EXIF
kebab ingest ./assets/diagram.png
kebab ingest ./assets/ # images in the folder are detected automatically
kebab search "이미지 안의 OCR 텍스트"
kebab inspect doc <image_doc_id> # shows OCR, caption, and EXIF
```

## Tests
@@ -99,13 +99,13 @@ kb inspect doc <image_doc_id> # shows OCR, caption, and EXIF

## Dependency boundaries

- `kb-parse-image` uses only `kb-core` + image decoding (`image` crate) + an OCR adapter.
- `kebab-parse-image` uses only `kebab-core` + image decoding (`image` crate) + an OCR adapter.
- No LLM/embedding calls (captions go through a separate adapter).
- VLM captioning runs as a background job and must not block ingest.

## Completion criteria

- [ ] `kb ingest <image>` works
- [ ] `kebab ingest <image>` works
- [ ] OCR text is searchable
- [ ] OCR region citations are output
- [ ] Captions and observed text keep separate provenance

@@ -3,7 +3,7 @@ phase: P7
title: "PDF text extraction + page citation"
status: planned
depends_on: [P5]
source: kb_local_rust_report.md §9.2, §17 Phase 7
source: kebab_local_rust_report.md §9.2, §17 Phase 7
---

# P7 — PDF ingestion
@@ -14,7 +14,7 @@ Text PDF extraction → page-aware chunking → citation `paper.pdf:p13`. scanned PD

## Deliverable crates

- `kb-parse-pdf`: implements `Extractor`.
- `kebab-parse-pdf`: implements `Extractor`.

## Stage separation (§9.2)

@@ -63,10 +63,10 @@ paper.pdf:p13:span=0-1240 # char range within page
## CLI

```text
kb ingest ./paper.pdf
kb ingest ./papers/
kb search "PDF 안의 특정 개념"
kb inspect doc <pdf_doc_id>
kebab ingest ./paper.pdf
kebab ingest ./papers/
kebab search "PDF 안의 특정 개념"
kebab inspect doc <pdf_doc_id>
```

## Tests
@@ -79,12 +79,12 @@ kb inspect doc <pdf_doc_id>

## Dependency boundaries

- `kb-parse-pdf` uses only `kb-core` + `pdf-extract` / `lopdf`.
- `kebab-parse-pdf` uses only `kebab-core` + `pdf-extract` / `lopdf`.
- OCR calls go through a separate adapter (reusing the P6 OCR infrastructure).

## Completion criteria

- [ ] `kb ingest <pdf>` works
- [ ] `kebab ingest <pdf>` works
- [ ] page-level chunk + citation
- [ ] Search results include `paper.pdf:p<n>`
- [ ] Provenance warning for pages whose extraction fails

@@ -3,7 +3,7 @@ phase: P8
title: "Audio transcription + timestamp citation"
status: planned
depends_on: [P5]
source: kb_local_rust_report.md §9.3, §17 Phase 8
source: kebab_local_rust_report.md §9.3, §17 Phase 8
---

# P8 — Audio ingestion
@@ -14,8 +14,8 @@ audio file → transcript (timestamped segments) → CanonicalDocument → same

## Deliverable crates

- `kb-parse-audio`: implements `Extractor`.
- `kb-asr-whisper` (or a module inside `kb-parse-audio`): whisper.cpp adapter.
- `kebab-parse-audio`: implements `Extractor`.
- `kebab-asr-whisper` (or a module inside `kebab-parse-audio`): whisper.cpp adapter.

## Pipeline (§9.3)

@@ -94,11 +94,11 @@ meeting-2026-04-27.m4a:00:13:42-00:14:10:speaker=S1
## CLI

```text
kb ingest ./meeting.m4a
kb ingest ./recordings/
kb search "회의에서 언급한 결정사항"
kb inspect doc <audio_doc_id> # shows transcript + segment timestamps
kb play <chunk_id> # (optional) play that segment; low priority
kebab ingest ./meeting.m4a
kebab ingest ./recordings/
kebab search "회의에서 언급한 결정사항"
kebab inspect doc <audio_doc_id> # shows transcript + segment timestamps
kebab play <chunk_id> # (optional) play that segment; low priority
```
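
The audio citation in the hunk header (`meeting-2026-04-27.m4a:00:13:42-00:14:10:speaker=S1`) can be produced from segment start and end offsets. The sketch assumes whole-second offsets and an optional speaker label, and drops sub-second precision. The function names are hypothetical.

```rust
/// Format whole seconds as zero-padded hh:mm:ss.
fn hms(total_secs: u32) -> String {
    let (h, m, s) = (total_secs / 3600, (total_secs % 3600) / 60, total_secs % 60);
    format!("{h:02}:{m:02}:{s:02}")
}

/// Build `<file>:<start>-<end>[:speaker=<id>]` for one transcript segment.
fn audio_citation(file: &str, start_secs: u32, end_secs: u32, speaker: Option<&str>) -> String {
    let mut out = format!("{file}:{}-{}", hms(start_secs), hms(end_secs));
    if let Some(id) = speaker {
        out.push_str(":speaker=");
        out.push_str(id);
    }
    out
}
```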

## Tests
@@ -111,12 +111,12 @@ kb play <chunk_id> # (optional) play that segment; low priority

## Dependency boundaries

- `kb-parse-audio` uses only `kb-core` + a `Transcriber` adapter.
- `kebab-parse-audio` uses only `kebab-core` + a `Transcriber` adapter.
- No LLM calls. The RAG stage runs the same pipeline over the transcript text.

## Completion criteria

- [ ] `kb ingest <audio>` works
- [ ] `kebab ingest <audio>` works
- [ ] Transcript is stored with segment timestamps
- [ ] Search results include `00:hh:mm:ss-` citations
- [ ] Re-ingesting the same audio is idempotent

@@ -3,7 +3,7 @@ phase: P9
title: "TUI + desktop app"
status: planned
depends_on: [P5]
source: kb_local_rust_report.md §16, §17 Phase 9
source: kebab_local_rust_report.md §16, §17 Phase 9
---

# P9 — TUI + desktop app
@@ -15,18 +15,18 @@ Add a usability layer on top of the CLI, once domain/search/RAG are stable
## Order

```text
kb-tui (first) → kb-desktop (later)
kebab-tui (first) → kebab-desktop (later)
```

Reason: the TUI can adapt quickly to domain changes, while the desktop app carries high packaging/distribution costs.

---

## P9.A — kb-tui (Ratatui)
## P9.A — kebab-tui (Ratatui)

### Deliverable crates

- `kb-tui`: terminal UI built on Ratatui + crossterm.
- `kebab-tui`: terminal UI built on Ratatui + crossterm.

### Screen layout

@@ -51,8 +51,8 @@ q : quit

### Dependency boundaries

- `kb-tui` → `kb-app` only. No direct parser/store/LLM calls.
- Async I/O (search/ask) goes through an async `kb-app` wrapper or a thread + channel.
- `kebab-tui` → `kebab-app` only. No direct parser/store/LLM calls.
- Async I/O (search/ask) goes through an async `kebab-app` wrapper or a thread + channel.

### Completion criteria

@@ -63,7 +63,7 @@ q : quit

---

## P9.B — kb-desktop
## P9.B — kebab-desktop

### Candidate comparison

@@ -72,14 +72,14 @@ q : quit
| Tauri | Rust backend + web frontend. Native webview, small binary | Web frontend is a separate stack (TS/JS) |
| egui/eframe | Pure Rust, immediate-mode | Limited design freedom/accessibility |

Recommendation: Tauri first. Expose the existing `kb-app` facade as the backend unchanged. Keep the frontend lightweight (svelte/solid/vanilla).
Recommendation: Tauri first. Expose the existing `kebab-app` facade as the backend unchanged. Keep the frontend lightweight (svelte/solid/vanilla).

### Deliverable crates / structure

- `kb-desktop` (Tauri app crate)
- `kb-desktop-frontend/` (web assets)
- `kebab-desktop` (Tauri app crate)
- `kebab-desktop-frontend/` (web assets)

Tauri commands wrap `kb-app` functions 1:1. No new business logic.
Tauri commands wrap `kebab-app` functions 1:1. No new business logic.

### Screen layout (initial)

@@ -100,7 +100,7 @@ Tauri commands wrap `kb-app` functions 1:1. No new business logic

### Dependency boundaries

- The frontend calls only Tauri commands (= `kb-app` wrappers). No direct SQLite/LanceDB access.
- The frontend calls only Tauri commands (= `kebab-app` wrappers). No direct SQLite/LanceDB access.
- Model download/execution is the backend's responsibility.

### Completion criteria
