diff --git a/README.md b/README.md index c75c85d..9af3f77 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,8 @@ -# kb — Local-first Knowledge Base +# kebab — Local-first Knowledge Base -> **상태:** P0–P4 구현 완료 (31 component task 중 17 완료) + 3건 post-merge hotfix 적용. `kb index` / `kb search --mode {lexical,vector,hybrid}` / `kb ask` 모두 실 동작. 다음 단계 = P5 (eval suite). 자세한 진행 상황은 [tasks/INDEX.md](tasks/INDEX.md), 머지 후 발견된 버그와 fix는 [tasks/HOTFIXES.md](tasks/HOTFIXES.md). +> **상태:** P0–P4 구현 완료 (31 component task 중 17 완료) + 3건 post-merge hotfix 적용. `kebab index` / `kebab search --mode {lexical,vector,hybrid}` / `kebab ask` 모두 실 동작. 다음 단계 = P5 (eval suite). 자세한 진행 상황은 [tasks/INDEX.md](tasks/INDEX.md), 머지 후 발견된 버그와 fix는 [tasks/HOTFIXES.md](tasks/HOTFIXES.md). -`kb` 는 개인용 로컬 knowledge base + RAG 도구다. Markdown / PDF / 이미지 / 음성을 한 곳에 색인하고, 의미 검색 + citation 포함 LLM 답변을 단일 binary 로 제공한다. 모든 추론은 로컬 (Ollama / fastembed / whisper.cpp) 에서 돌아간다. +`kebab` 는 개인용 로컬 knowledge base + RAG 도구다. Markdown / PDF / 이미지 / 음성을 한 곳에 색인하고, 의미 검색 + citation 포함 LLM 답변을 단일 binary 로 제공한다. 모든 추론은 로컬 (Ollama / fastembed / whisper.cpp) 에서 돌아간다. 대상 하드웨어: M4 48GB MacBook 1대, 사용자 1명. @@ -12,14 +12,14 @@ | 명령 | 동작 | 상태 | |------|------|------| -| `kb init` | XDG 경로에 데이터 디렉토리 + config.toml 생성 | ✅ P0 | -| `kb ingest []` | Markdown 색인 (idempotent). PDF/이미지/음성은 P6+. | ✅ P3-5 | -| `kb search --mode {lexical,vector,hybrid} ""` | 검색 — citation 포함, hybrid는 RRF fusion | ✅ P3-5 | -| `kb list docs` | 색인된 문서 목록 | ✅ P3-5 | -| `kb inspect doc ` / `kb inspect chunk ` | raw record 보기 | ✅ P3-5 | -| `kb ask ""` | RAG 답변 + 근거 인용. 근거 부족 시 거절. Ollama 필요. | ✅ P4-3 | -| `kb doctor` | 설정/모델/DB 헬스 체크 | ✅ P0 | -| `kb eval run / compare` | golden query 회귀 측정 | ⏳ P5 | +| `kebab init` | XDG 경로에 데이터 디렉토리 + config.toml 생성 | ✅ P0 | +| `kebab ingest []` | Markdown 색인 (idempotent). PDF/이미지/음성은 P6+. 
| ✅ P3-5 | +| `kebab search --mode {lexical,vector,hybrid} ""` | 검색 — citation 포함, hybrid는 RRF fusion | ✅ P3-5 | +| `kebab list docs` | 색인된 문서 목록 | ✅ P3-5 | +| `kebab inspect doc ` / `kebab inspect chunk ` | raw record 보기 | ✅ P3-5 | +| `kebab ask ""` | RAG 답변 + 근거 인용. 근거 부족 시 거절. Ollama 필요. | ✅ P4-3 | +| `kebab doctor` | 설정/모델/DB 헬스 체크 | ✅ P0 | +| `kebab eval run / compare` | golden query 회귀 측정 | ⏳ P5 | 기계 친화 모드: 모든 명령에 `--json` 플래그. 출력은 frozen wire schema v1 (`schema_version` 필드 항상 포함, 예: `ingest_report.v1`, `search_hit.v1`, `answer.v1`, `doctor.v1`). @@ -44,35 +44,35 @@ | citation 형식 | URI fragment (`path#L12-L34`, W3C Media Fragments) | | ID 생성 | `blake3(canonical_json(tuple))[..32]` hex | | RRF fusion_score | `[0, 1]` 정규화 — `2 / (k_rrf + 1)` 로 나눠 mode 간 비교 가능 (post-merge hotfix) | -| layout | XDG (`~/.local/share/kb/`, `~/.config/kb/`, …) | +| layout | XDG (`~/.local/share/kebab/`, `~/.config/kebab/`, …) | -전체는 [docs/superpowers/specs/2026-04-27-kb-final-form-design.md](docs/superpowers/specs/2026-04-27-kb-final-form-design.md) 참조. +전체는 [docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](docs/superpowers/specs/2026-04-27-kebab-final-form-design.md) 참조. 
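The `[0, 1]` normalization of the RRF `fusion_score` listed in the table above can be sketched as follows. This is a minimal illustration, not the `kebab-search` implementation: the function names, the 1-based rank convention, the use of `Option` for "this retriever did not return the chunk", and the value `k_rrf = 60` in the usage note are all assumptions. Only the divisor `2 / (k_rrf + 1)` comes from the README.

```rust
/// Reciprocal-rank contribution of one retriever.
/// `rank` is assumed to be 1-based; `None` (or an invalid rank of 0)
/// means the retriever did not return the chunk and contributes nothing.
fn rrf_term(rank: Option<u32>, k_rrf: f64) -> f64 {
    match rank {
        Some(r) if r >= 1 => 1.0 / (k_rrf + f64::from(r)),
        _ => 0.0,
    }
}

/// Hypothetical helper: fuse a lexical rank and a vector rank with RRF,
/// then divide by the largest score two retrievers can produce,
/// `2 / (k_rrf + 1)`, so the result lies in `[0, 1]`.
fn normalized_fusion_score(
    lexical_rank: Option<u32>,
    vector_rank: Option<u32>,
    k_rrf: f64,
) -> f64 {
    let raw = rrf_term(lexical_rank, k_rrf) + rrf_term(vector_rank, k_rrf);
    let max_raw = 2.0 / (k_rrf + 1.0);
    raw / max_raw
}
```

Under these assumptions, a chunk ranked first by both retrievers reaches the maximum raw score, so `normalized_fusion_score(Some(1), Some(1), 60.0)` comes out at 1.0, while a chunk found by only one retriever at rank 1 lands at about 0.5. That matches the smoke checklist's note that the top hybrid hit is often 1.0.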
--- ## 의존성 그래프 ```text -kb-cli, kb-tui, kb-desktop - └─> kb-app - ├─> kb-source-fs - ├─> kb-parse-md / kb-parse-pdf / kb-parse-image / kb-parse-audio - │ └─> kb-parse-types - ├─> kb-normalize - │ └─> kb-parse-types - ├─> kb-chunk - ├─> kb-store-sqlite - ├─> kb-store-vector - ├─> kb-embed-local (kb-embed trait crate) - ├─> kb-search - ├─> kb-llm-local (kb-llm trait crate) - ├─> kb-rag - ├─> kb-eval - └─> kb-config - └─> kb-core (모두 의존) +kebab-cli, kebab-tui, kebab-desktop + └─> kebab-app + ├─> kebab-source-fs + ├─> kebab-parse-md / kebab-parse-pdf / kebab-parse-image / kebab-parse-audio + │ └─> kebab-parse-types + ├─> kebab-normalize + │ └─> kebab-parse-types + ├─> kebab-chunk + ├─> kebab-store-sqlite + ├─> kebab-store-vector + ├─> kebab-embed-local (kebab-embed trait crate) + ├─> kebab-search + ├─> kebab-llm-local (kebab-llm trait crate) + ├─> kebab-rag + ├─> kebab-eval + └─> kebab-config + └─> kebab-core (모두 의존) ``` -UI → store/llm/parse 직접 의존 금지. 모든 user-facing 진입은 `kb-app` facade 만 통한다 (design §8). `kb-cli` 가 `--config ` flag 를 honor 하려면 `kb_app::*_with_config(cfg, …)` companion 을 통해 Config 을 명시적으로 thread 하는 패턴 — 자세한 이유는 [tasks/HOTFIXES.md](tasks/HOTFIXES.md) 의 `--config` 항목. +UI → store/llm/parse 직접 의존 금지. 모든 user-facing 진입은 `kebab-app` facade 만 통한다 (design §8). `kebab-cli` 가 `--config ` flag 를 honor 하려면 `kebab_app::*_with_config(cfg, …)` companion 을 통해 Config 을 명시적으로 thread 하는 패턴 — 자세한 이유는 [tasks/HOTFIXES.md](tasks/HOTFIXES.md) 의 `--config` 항목. --- @@ -80,16 +80,16 @@ UI → store/llm/parse 직접 의존 금지. 
모든 user-facing 진입은 `kb-ap | Phase | 내용 | 핵심 산출 crate | 선행 | 상태 | |-------|------|----------------|------|------| -| **P0** | Workspace 뼈대 + 도메인 계약 + ID recipe | `kb-core`, `kb-parse-types`, `kb-config`, `kb-app`, `kb-cli` | – | ✅ 완료 | -| **P1** | Markdown ingestion (walk → parse → chunk → SQLite) | `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-sqlite` | P0 | ✅ 완료 | -| **P2** | SQLite FTS5 lexical 검색 + citation | `kb-search` (lexical) | P1 | ✅ 완료 | -| **P3** | Local embedding + LanceDB + hybrid (RRF) + kb-app wiring | `kb-embed`, `kb-embed-local`, `kb-store-vector`, `kb-search` | P2 | ✅ 완료 | -| **P4** | Local LLM + RAG + grounded answer | `kb-llm`, `kb-llm-local`, `kb-rag` | P3 | ✅ 완료 | -| **P5** | Golden query / regression eval | `kb-eval` | P4 | ⏳ 다음 | -| **P6** | 이미지 ingestion (OCR + caption) | `kb-parse-image` | P5 | ⏳ | -| **P7** | PDF text + page citation | `kb-parse-pdf` | P5 | ⏳ | -| **P8** | 음성 transcription + timestamp citation | `kb-parse-audio` | P5 | ⏳ | -| **P9** | TUI + desktop app | `kb-tui`, `kb-desktop` | P5 | ⏳ | +| **P0** | Workspace 뼈대 + 도메인 계약 + ID recipe | `kebab-core`, `kebab-parse-types`, `kebab-config`, `kebab-app`, `kebab-cli` | – | ✅ 완료 | +| **P1** | Markdown ingestion (walk → parse → chunk → SQLite) | `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-sqlite` | P0 | ✅ 완료 | +| **P2** | SQLite FTS5 lexical 검색 + citation | `kebab-search` (lexical) | P1 | ✅ 완료 | +| **P3** | Local embedding + LanceDB + hybrid (RRF) + kebab-app wiring | `kebab-embed`, `kebab-embed-local`, `kebab-store-vector`, `kebab-search` | P2 | ✅ 완료 | +| **P4** | Local LLM + RAG + grounded answer | `kebab-llm`, `kebab-llm-local`, `kebab-rag` | P3 | ✅ 완료 | +| **P5** | Golden query / regression eval | `kebab-eval` | P4 | ⏳ 다음 | +| **P6** | 이미지 ingestion (OCR + caption) | `kebab-parse-image` | P5 | ⏳ | +| **P7** | PDF text + page citation | `kebab-parse-pdf` | P5 | ⏳ | +| **P8** | 음성 transcription + timestamp citation | 
`kebab-parse-audio` | P5 | ⏳ | +| **P9** | TUI + desktop app | `kebab-tui`, `kebab-desktop` | P5 | ⏳ | P0~P5 직렬. P6~P9 P5 이후 병렬 가능. @@ -100,13 +100,13 @@ P0~P5 직렬. P6~P9 P5 이후 병렬 가능. ## 디렉토리 구조 ```text -kb/ +kebab/ ├── README.md # 이 파일 -├── kb_local_rust_report.md # 최초 설계 보고서 (방향성 + 근거) +├── kebab_local_rust_report.md # 최초 설계 보고서 (방향성 + 근거) ├── docs/ │ ├── superpowers/ │ │ ├── specs/ -│ │ │ └── 2026-04-27-kb-final-form-design.md # frozen design (12 sections) +│ │ │ └── 2026-04-27-kebab-final-form-design.md # frozen design (12 sections) │ │ └── plans/ │ │ └── 2026-04-27-task-decomposition.md # task 분해 implementation plan │ ├── SMOKE.md # 로컬 워크스페이스에 직접 돌려보는 절차 @@ -127,19 +127,19 @@ kb/ │ ├── p8/p8-1, p8-2 # (2) │ └── p9/p9-1 … p9-5 # (5) ├── crates/ -│ ├── kb-core/ kb-parse-types/ kb-config/ # 도메인 + 설정 (P0) -│ ├── kb-source-fs/ # 워크스페이스 walk + checksum (P1-1) -│ ├── kb-parse-md/ # Markdown frontmatter + blocks (P1-2/3) -│ ├── kb-normalize/ # ParsedBlock → CanonicalDocument (P1-4) -│ ├── kb-chunk/ # heading-aware chunker (P1-5) -│ ├── kb-store-sqlite/ # SQLite + FTS5 (V001/V002/V003) (P1-6, P2-1, P3-3) -│ ├── kb-search/ # Lexical + Vector + Hybrid retriever (P2-2, P3-4) -│ ├── kb-embed/ kb-embed-local/ # Embedder trait + fastembed adapter (P3-1, P3-2) -│ ├── kb-store-vector/ # LanceDB VectorStore (P3-3) -│ ├── kb-llm/ kb-llm-local/ # LanguageModel trait + Ollama adapter (P4-1, P4-2) -│ ├── kb-rag/ # RAG pipeline (P4-3) -│ ├── kb-app/ # facade (P0 시그니처 + P3-5 본체) -│ └── kb-cli/ # binary (P0 → 핫픽스로 --config flag wiring 강화) +│ ├── kebab-core/ kebab-parse-types/ kebab-config/ # 도메인 + 설정 (P0) +│ ├── kebab-source-fs/ # 워크스페이스 walk + checksum (P1-1) +│ ├── kebab-parse-md/ # Markdown frontmatter + blocks (P1-2/3) +│ ├── kebab-normalize/ # ParsedBlock → CanonicalDocument (P1-4) +│ ├── kebab-chunk/ # heading-aware chunker (P1-5) +│ ├── kebab-store-sqlite/ # SQLite + FTS5 (V001/V002/V003) (P1-6, P2-1, P3-3) +│ ├── kebab-search/ # Lexical + Vector + Hybrid retriever (P2-2, P3-4) 
+│ ├── kebab-embed/ kebab-embed-local/ # Embedder trait + fastembed adapter (P3-1, P3-2) +│ ├── kebab-store-vector/ # LanceDB VectorStore (P3-3) +│ ├── kebab-llm/ kebab-llm-local/ # LanguageModel trait + Ollama adapter (P4-1, P4-2) +│ ├── kebab-rag/ # RAG pipeline (P4-3) +│ ├── kebab-app/ # facade (P0 시그니처 + P3-5 본체) +│ └── kebab-cli/ # binary (P0 → 핫픽스로 --config flag wiring 강화) ├── migrations/ # SQLite refinery V001/V002/V003 └── fixtures/ # 테스트 fixture 트리 ``` @@ -153,19 +153,19 @@ kb/ cargo build --release # 첫 실행 — XDG 경로에 config.toml 생성 -./target/release/kb init +./target/release/kebab init # config 손보고 -${EDITOR:-vi} ~/.config/kb/config.toml +${EDITOR:-vi} ~/.config/kebab/config.toml # 색인 -./target/release/kb ingest +./target/release/kebab ingest # 검색 -./target/release/kb search "Markdown chunking 규칙" --mode hybrid +./target/release/kebab search "Markdown chunking 규칙" --mode hybrid # 질문 (Ollama 필요) -./target/release/kb ask "내 KB 설계에서 저장소 전략은?" +./target/release/kebab ask "내 KB 설계에서 저장소 전략은?" ``` 워크스페이스를 격리해서 직접 돌려보는 패턴은 [docs/SMOKE.md](docs/SMOKE.md) 참조 — `--config ` 로 임시 디렉토리에 격리된 KB 를 만들 수 있다. @@ -181,17 +181,17 @@ ${EDITOR:-vi} ~/.config/kb/config.toml - multi-workspace (P+ 후순위) - LLM-as-judge eval (rule-based `must_contain` 만) - visual embedding (CLIP) — P+ -- desktop app `kb://` protocol handler — P+ +- desktop app `kebab://` protocol handler — P+ --- ## 외부 AI 통합 -`kb` 의 `--json` 모드 + frozen wire schema v1 은 외부 자동화의 stable contract. 가능한 통합: +`kebab` 의 `--json` 모드 + frozen wire schema v1 은 외부 자동화의 stable contract. 가능한 통합: -1. **Claude Code / Codex skill** — 얇은 wrapper (`kb search --json` / `kb ask --json` 호출). ~50 lines. -2. **MCP server** — `kb-mcp` binary (stdio JSON-RPC) 가 `kb-app` facade 를 1:1 노출. Claude Desktop / Cursor / Zed 등 공유. -3. **HTTP wrapper** — `kb serve --bind 127.0.0.1:7711` (P+, local-only 가치 깨므로 신중). +1. **Claude Code / Codex skill** — 얇은 wrapper (`kebab search --json` / `kebab ask --json` 호출). ~50 lines. +2. 
**MCP server** — `kebab-mcp` binary (stdio JSON-RPC) 가 `kebab-app` facade 를 1:1 노출. Claude Desktop / Cursor / Zed 등 공유. +3. **HTTP wrapper** — `kebab serve --bind 127.0.0.1:7711` (P+, local-only 가치 깨므로 신중). --- @@ -199,7 +199,7 @@ ${EDITOR:-vi} ~/.config/kb/config.toml 이 repo 는 단일 사용자 프로젝트지만 spec 변경 절차는 명문화되어 있다. -1. **frozen design 변경** — `docs/superpowers/specs/2026-04-27-kb-final-form-design.md` 가 단일 contract. 변경 시 영향 받는 component task 모두 동시 갱신 필요. PR 1개로 묶기. +1. **frozen design 변경** — `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` 가 단일 contract. 변경 시 영향 받는 component task 모두 동시 갱신 필요. PR 1개로 묶기. 2. **새 component task 추가** — `tasks/_template.md` 복사 후 `tasks/p/p--.md` 생성. `contract_sections` 에 design doc 섹션 명시. `Allowed/Forbidden dependencies` 는 design §8 module-boundary 표 따름. 3. **구현** — component task 1개당 sub-agent 1세션 권장. `cargo test -p ` + DoD 체크리스트 통과. PR 으로 머지. 4. **버전 변경** — `parser_version` / `chunker_version` / `embedding_version` 등 변경은 design §9 의 cascade rule 따름. 영향 받는 record 는 재처리 필요. 
@@ -215,8 +215,8 @@ ${EDITOR:-vi} ~/.config/kb/config.toml ## 참고 -- 최초 설계 보고서: [kb_local_rust_report.md](kb_local_rust_report.md) -- Frozen design: [docs/superpowers/specs/2026-04-27-kb-final-form-design.md](docs/superpowers/specs/2026-04-27-kb-final-form-design.md) +- 최초 설계 보고서: [kebab_local_rust_report.md](kebab_local_rust_report.md) +- Frozen design: [docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](docs/superpowers/specs/2026-04-27-kebab-final-form-design.md) - Task 분해 plan: [docs/superpowers/plans/2026-04-27-task-decomposition.md](docs/superpowers/plans/2026-04-27-task-decomposition.md) - Task 인덱스: [tasks/INDEX.md](tasks/INDEX.md) - Post-merge 핫픽스 로그: [tasks/HOTFIXES.md](tasks/HOTFIXES.md) diff --git a/docs/SMOKE.md b/docs/SMOKE.md index ddc9aca..e3cdf79 100644 --- a/docs/SMOKE.md +++ b/docs/SMOKE.md @@ -1,21 +1,21 @@ --- -title: "kb 스모크 실행 가이드" +title: "kebab 스모크 실행 가이드" date: 2026-05-01 --- -# kb 스모크 실행 가이드 +# kebab 스모크 실행 가이드 -P3-5 머지 후 (`kb-app::ingest` / `search` / `list` / `inspect` 와이어링) 부터, 그리고 P4-3 머지 후 (`kb ask` 와이어링) 부터 사용자가 자기 설치본을 직접 검증할 수 있다. 이 문서는 사용자 환경 (`~/.config/kb/`, `~/.local/share/kb/`) 을 건드리지 않고 임시 디렉토리에 격리된 KB 를 띄워 전체 파이프라인을 1세션 안에 한 번 돌리는 절차다. +P3-5 머지 후 (`kebab-app::ingest` / `search` / `list` / `inspect` 와이어링) 부터, 그리고 P4-3 머지 후 (`kebab ask` 와이어링) 부터 사용자가 자기 설치본을 직접 검증할 수 있다. 이 문서는 사용자 환경 (`~/.config/kebab/`, `~/.local/share/kebab/`) 을 건드리지 않고 임시 디렉토리에 격리된 KB 를 띄워 전체 파이프라인을 1세션 안에 한 번 돌리는 절차다. ## 준비 빌드: ```bash -cargo build --release -p kb-cli # debug 도 무방. 디버그가 더 빠르게 빌드됨. +cargo build --release -p kebab-cli # debug 도 무방. 디버그가 더 빠르게 빌드됨. 
``` -원격 Ollama (선택, `kb ask` 만 필요): +원격 Ollama (선택, `kebab ask` 만 필요): ```bash # Mac 등 별도 호스트에서 @@ -34,8 +34,8 @@ curl http://:11434/api/tags ## 격리된 워크스페이스 생성 ```bash -mkdir -p /tmp/kb-smoke/{workspace,data} -cat > /tmp/kb-smoke/workspace/intro.md <<'EOF' +mkdir -p /tmp/kebab-smoke/{workspace,data} +cat > /tmp/kebab-smoke/workspace/intro.md <<'EOF' --- title: 인사말 tags: [demo] @@ -51,19 +51,19 @@ EOF ## 격리된 config -`/tmp/kb-smoke/config.toml`: +`/tmp/kebab-smoke/config.toml`: ```toml schema_version = 1 [workspace] -root = "/tmp/kb-smoke/workspace" +root = "/tmp/kebab-smoke/workspace" include = ["**/*.md"] exclude = [".git/**", "node_modules/**", ".obsidian/**"] [storage] -data_dir = "/tmp/kb-smoke/data" -sqlite = "{data_dir}/kb.sqlite" +data_dir = "/tmp/kebab-smoke/data" +sqlite = "{data_dir}/kebab.sqlite" vector_dir = "{data_dir}/lancedb" asset_dir = "{data_dir}/assets" artifact_dir = "{data_dir}/artifacts" @@ -110,12 +110,12 @@ explain_default = false max_context_tokens = 6000 ``` -`KB_*` 환경변수로 override 가능 (`KB_MODELS_LLM_MODEL=qwen2.5:32b kb …` 등). 자세한 키 목록은 `crates/kb-config/src/lib.rs` 의 `apply_env` 매치 암. +`KEBAB_*` 환경변수로 override 가능 (`KEBAB_MODELS_LLM_MODEL=qwen2.5:32b kebab …` 등). 자세한 키 목록은 `crates/kebab-config/src/lib.rs` 의 `apply_env` 매치 암. ## 명령 시퀀스 ```bash -KB() { ./target/debug/kb --config /tmp/kb-smoke/config.toml "$@"; } +KEBAB() { ./target/debug/kebab --config /tmp/kebab-smoke/config.toml "$@"; } KB doctor # 1. health check KB ingest # 2. 워크스페이스 색인 @@ -128,31 +128,31 @@ KB ask "이 KB 안에서 ..." --mode hybrid --k 5 # 8. RAG 답변 (Ollama KB --json ask "..." --mode hybrid # 9. 기계 친화 출력 검증 ``` -각 명령은 0 종료 코드면 정상. `kb ask` 는 거절 시 종료 코드 1 (`RefusalSignal`) — 의도된 동작. +각 명령은 0 종료 코드면 정상. `kebab ask` 는 거절 시 종료 코드 1 (`RefusalSignal`) — 의도된 동작. ## 검증 체크리스트 -- `kb doctor` 가 `--config` path 를 honor 하고 그 안의 `storage.data_dir` 를 출력 (XDG default 가 아님). -- `kb ingest` idempotent — 두 번째 실행이 `new=0 updated=N`. 
-- `kb list docs` 출력에 frontmatter 의 `title` 이 아닌 deterministic `doc_id` (32-hex) + `workspace_path` 가 보임. -- `kb search --mode hybrid` 의 `fusion_score` 가 `[0, 1]` 범위 (top-1 종종 1.0 — 두 retriever 모두 rank 1 일 때). -- `kb ask` JSON 응답에 `model.id` 가 config 의 모델 (`gemma4:26b` 등) 과 일치, `embedding.id = multilingual-e5-small`, `citations[].marker` 가 `[1]` / `[2]` 형식 (square-bracketed bare index). -- 코퍼스에 없는 주제로 `kb ask` → `refusal_reason: "llm_self_judge"` (또는 `no_chunks` / `score_gate`) + `grounded: false`. +- `kebab doctor` 가 `--config` path 를 honor 하고 그 안의 `storage.data_dir` 를 출력 (XDG default 가 아님). +- `kebab ingest` idempotent — 두 번째 실행이 `new=0 updated=N`. +- `kebab list docs` 출력에 frontmatter 의 `title` 이 아닌 deterministic `doc_id` (32-hex) + `workspace_path` 가 보임. +- `kebab search --mode hybrid` 의 `fusion_score` 가 `[0, 1]` 범위 (top-1 종종 1.0 — 두 retriever 모두 rank 1 일 때). +- `kebab ask` JSON 응답에 `model.id` 가 config 의 모델 (`gemma4:26b` 등) 과 일치, `embedding.id = multilingual-e5-small`, `citations[].marker` 가 `[1]` / `[2]` 형식 (square-bracketed bare index). +- 코퍼스에 없는 주제로 `kebab ask` → `refusal_reason: "llm_self_judge"` (또는 `no_chunks` / `score_gate`) + `grounded: false`. ## 정리 ```bash -rm -rf /tmp/kb-smoke/data # 데이터만 날리고 다시 ingest 가능 -rm -rf /tmp/kb-smoke # 통째로 정리 +rm -rf /tmp/kebab-smoke/data # 데이터만 날리고 다시 ingest 가능 +rm -rf /tmp/kebab-smoke # 통째로 정리 ``` -`~/.config/kb/` 와 `~/.local/share/kb/` 는 한 번도 터치되지 않는다 (`--config` flag 가 정확히 honor 되는 경우 — P3-5 hotfix 이후 보장). +`~/.config/kebab/` 와 `~/.local/share/kebab/` 는 한 번도 터치되지 않는다 (`--config` flag 가 정확히 honor 되는 경우 — P3-5 hotfix 이후 보장). ## 알려진 동작 -- 첫 `kb ingest` 시 fastembed 모델 다운로드 (~470MB) — `data_dir/models/fastembed/` 에 캐시. -- `kb ask` 응답 시간 = LLM 토큰 throughput 에 종속. M4 Pro 48GB + gemma4:26b 기준 답변 50–100 토큰에 20–55초. -- `--config` path 가 존재하지 않거나 malformed 면 `kb doctor` 가 hard fail (defaults 가 silently mask 하지 않게 하는 hotfix 동작). +- 첫 `kebab ingest` 시 fastembed 모델 다운로드 (~470MB) — `data_dir/models/fastembed/` 에 캐시. 
+- `kebab ask` 응답 시간 = LLM 토큰 throughput 에 종속. M4 Pro 48GB + gemma4:26b 기준 답변 50–100 토큰에 20–55초. +- `--config` path 가 존재하지 않거나 malformed 면 `kebab doctor` 가 hard fail (defaults 가 silently mask 하지 않게 하는 hotfix 동작). - 매 CLI invocation 마다 fastembed 모델 init 비용 (~4초) — process-level 캐시 부재 때문. P9 TUI 진입 시 `App` 의 `OnceLock` 으로 세션 동안 한 번만 init. 자세한 history 와 발견된 버그는 [tasks/HOTFIXES.md](../tasks/HOTFIXES.md) 참조. diff --git a/docs/spec/ai-generation-guidelines.md b/docs/spec/ai-generation-guidelines.md index b0040fd..58827ad 100644 --- a/docs/spec/ai-generation-guidelines.md +++ b/docs/spec/ai-generation-guidelines.md @@ -5,8 +5,8 @@ When implementing tasks against this codebase: - Treat the frozen design doc as the single source of truth. Do not invent new fields, traits, or enum variants. - Prefer editing existing files to creating new ones; reuse types from - `kb-core` instead of duplicating shapes. + `kebab-core` instead of duplicating shapes. - For each task, follow the task spec under `tasks/p/p-.md`. Canonical source: -[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §11 + §12. +[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §11 + §12. diff --git a/docs/spec/canonical-document.md b/docs/spec/canonical-document.md index 0926fc2..afa67ee 100644 --- a/docs/spec/canonical-document.md +++ b/docs/spec/canonical-document.md @@ -4,4 +4,4 @@ Medium-agnostic representation of a document with `Block`s, `SourceSpan`s, and provenance. Canonical source: -[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §3.4 + §3.7a. +[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §3.4 + §3.7a. 
diff --git a/docs/spec/chunk-policy.md b/docs/spec/chunk-policy.md index fe204fc..9a2fae0 100644 --- a/docs/spec/chunk-policy.md +++ b/docs/spec/chunk-policy.md @@ -5,4 +5,4 @@ `policy_hash` so chunk IDs include the policy. Canonical source: -[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §3.5 + §7.1 + §7.2. +[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §3.5 + §7.1 + §7.2. diff --git a/docs/spec/citation-policy.md b/docs/spec/citation-policy.md index 3ef0f82..6273aab 100644 --- a/docs/spec/citation-policy.md +++ b/docs/spec/citation-policy.md @@ -4,4 +4,4 @@ Citations use W3C Media Fragments URIs to locate evidence inside a document. Five variants: `Line`, `Page`, `Region`, `Caption`, `Time`. Canonical source: -[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §3.5 + §0 Q3. +[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §3.5 + §0 Q3. diff --git a/docs/spec/domain-model.md b/docs/spec/domain-model.md index 98ef0ec..ae342d5 100644 --- a/docs/spec/domain-model.md +++ b/docs/spec/domain-model.md @@ -1,6 +1,6 @@ # Domain model -The domain types live in `kb-core` and mirror the frozen design exactly. +The domain types live in `kebab-core` and mirror the frozen design exactly. Canonical source: -[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §3. +[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §3. diff --git a/docs/spec/ids.md b/docs/spec/ids.md index 19bbc15..607f934 100644 --- a/docs/spec/ids.md +++ b/docs/spec/ids.md @@ -1,6 +1,6 @@ # ID recipe -All `kb-*` IDs are 32 hex chars: the first 32 of `blake3(canonical_json(tuple))`. 
+All `kebab-*` IDs are 32 hex chars: the first 32 of `blake3(canonical_json(tuple))`. Canonical source: -[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §4. +[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §4. diff --git a/docs/spec/module-boundaries.md b/docs/spec/module-boundaries.md index 7225608..dbcc7f4 100644 --- a/docs/spec/module-boundaries.md +++ b/docs/spec/module-boundaries.md @@ -1,8 +1,8 @@ # Module boundaries -`kb-core` is leaf — every other crate depends on it. Parsers depend on -`kb-parse-types` (not on `kb-normalize`); `kb-normalize` depends on -`kb-parse-types` (not on parsers). UI crates depend only on `kb-app`. +`kebab-core` is leaf — every other crate depends on it. Parsers depend on +`kebab-parse-types` (not on `kebab-normalize`); `kebab-normalize` depends on +`kebab-parse-types` (not on parsers). UI crates depend only on `kebab-app`. Canonical source: -[docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../superpowers/specs/2026-04-27-kb-final-form-design.md), §8. +[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../superpowers/specs/2026-04-27-kebab-final-form-design.md), §8. diff --git a/docs/superpowers/plans/2026-04-27-task-decomposition.md b/docs/superpowers/plans/2026-04-27-task-decomposition.md index 58e6499..3d417ba 100644 --- a/docs/superpowers/plans/2026-04-27-task-decomposition.md +++ b/docs/superpowers/plans/2026-04-27-task-decomposition.md @@ -8,11 +8,11 @@ - Phase A — Author the canonical task spec template (`tasks/_template.md`). - Phase B — Decompose P1 (Markdown ingestion) into 6 component task specs to validate the template. - Phase C — After Phase B passes review, decompose P0 + P2..P9 into the remaining ~24 task specs in one pass. 
-- Each task spec cites only the frozen design doc (`docs/superpowers/specs/2026-04-27-kb-final-form-design.md`) for types, traits, schema, layout. No new domain types or traits are introduced inside task specs. +- Each task spec cites only the frozen design doc (`docs/superpowers/specs/2026-04-27-kebab-final-form-design.md`) for types, traits, schema, layout. No new domain types or traits are introduced inside task specs. **Tech Stack:** Plain Markdown documents under `tasks/`. No code changes in this plan — produces specs that AI sub-agents will later implement against. -**Frozen contract source:** [docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../specs/2026-04-27-kb-final-form-design.md). All task specs reference this. Modifications to the contract require updating that file first, then re-checking dependent task specs. +**Frozen contract source:** [docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../specs/2026-04-27-kebab-final-form-design.md). All task specs reference this. Modifications to the contract require updating that file first, then re-checking dependent task specs. **Phase task index (target file layout):** @@ -83,7 +83,7 @@ Existing per-phase epic files (`tasks/phase-0-skeleton.md` … `phase-9-ui.md`) - [ ] **Step 1: Verify the design doc path resolves** ```bash -test -f docs/superpowers/specs/2026-04-27-kb-final-form-design.md && echo OK +test -f docs/superpowers/specs/2026-04-27-kebab-final-form-design.md && echo OK ``` Expected: `OK` @@ -101,7 +101,7 @@ title: "" status: planned depends_on: [] # other task_ids unblocks: [] # other task_ids -contract_source: ../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [] # e.g. [§3.5, §5.5, §7.2] --- @@ -117,7 +117,7 @@ contract_sections: [] # e.g. 
[§3.5, §5.5, §7.2] ## Allowed dependencies -- `kb-core` +- `kebab-core` - - @@ -166,7 +166,7 @@ If any item here is needed during implementation, STOP and update the frozen des | unit | ... | ... | | snapshot | ... (JSON freeze) | `fixtures/...` | | contract | trait round-trip | mock impls | -| integration | end-to-end via `kb-app` facade | tmp workspace | +| integration | end-to-end via `kebab-app` facade | tmp workspace | All tests must run under `cargo test -p ` and not require external network or Ollama unless explicitly stated. @@ -247,7 +247,7 @@ git add tasks/p1 tasks/INDEX.md git commit -m "tasks: prepare P1 component decomposition skeleton" ``` -### Task B1: `p1-1-source-fs.md` (kb-source-fs) +### Task B1: `p1-1-source-fs.md` (kebab-source-fs) **Files:** - Create: `tasks/p1/p1-1-source-fs.md` @@ -259,13 +259,13 @@ Write the following content to `tasks/p1/p1-1-source-fs.md`: ````markdown --- phase: P1 -component: kb-source-fs +component: kebab-source-fs task_id: p1-1 title: "Local filesystem source connector" status: planned depends_on: [p0-1] unblocks: [p1-2, p1-3, p1-4, p1-5, p1-6] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§3.3, §6.2, §6.6, §7.1, §7.2 SourceConnector, §8] --- @@ -281,8 +281,8 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums ## Allowed dependencies -- `kb-core` -- `kb-config` +- `kebab-core` +- `kebab-config` - `ignore` (gitignore semantics) - `blake3` - `walkdir` @@ -293,21 +293,21 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums ## Forbidden dependencies -- `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` +- `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, 
`kebab-desktop` ## Inputs | input | type | source | |-------|------|--------| -| `SourceScope` | `kb_core::SourceScope` | `kb-app` from config | +| `SourceScope` | `kebab_core::SourceScope` | `kebab-app` from config | | filesystem | `&Path` | OS | -| `.kbignore` | text file | workspace root, optional | +| `.kebabignore` | text file | workspace root, optional | ## Outputs | output | type | downstream consumer | |--------|------|---------------------| -| `Vec` | `kb_core::RawAsset` | `kb-parse-md`, asset writer in `kb-store-sqlite` (via `kb-app`) | +| `Vec` | `kebab_core::RawAsset` | `kebab-parse-md`, asset writer in `kebab-store-sqlite` (via `kebab-app`) | ## Public surface (signatures only — no new types) @@ -315,11 +315,11 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums pub struct FsSourceConnector { /* internal */ } impl FsSourceConnector { - pub fn new(config: &kb_config::Config) -> anyhow::Result; + pub fn new(config: &kebab_config::Config) -> anyhow::Result; } -impl kb_core::SourceConnector for FsSourceConnector { - fn scan(&self, scope: &kb_core::SourceScope) -> anyhow::Result>; +impl kebab_core::SourceConnector for FsSourceConnector { + fn scan(&self, scope: &kebab_core::SourceScope) -> anyhow::Result>; } ``` @@ -329,7 +329,7 @@ impl kb_core::SourceConnector for FsSourceConnector { - `asset_id` derived per design §4.2 from `blake3(raw bytes)` full hex. - `media_type` selected from extension + libmagic-like sniff fallback (`.md` → Markdown, others fall through to `MediaType::Other`). - `discovered_at` = current `OffsetDateTime::now_utc()` at scan time. -- Combine `config.workspace.exclude` ∪ `.kbignore` for filter (union; ordering does not matter). +- Combine `config.workspace.exclude` ∪ `.kebabignore` for filter (union; ordering does not matter). - Symbolic links: follow once, detect cycles via `canonicalize` + visited set. 
- Files larger than `storage.copy_threshold_mb` MB → emit `AssetStorage::Reference { path, sha }` (do not copy bytes here; copying is done by the asset writer task). - Idempotent: same input → same `Vec` (sort by `workspace_path`). @@ -337,7 +337,7 @@ impl kb_core::SourceConnector for FsSourceConnector { ## Storage / wire effects - Reads: filesystem under `config.workspace.root`. -- Writes: nothing. (Asset copy is handled by the asset writer in `kb-store-sqlite`.) +- Writes: nothing. (Asset copy is handled by the asset writer in `kebab-store-sqlite`.) ## Test plan @@ -346,25 +346,25 @@ impl kb_core::SourceConnector for FsSourceConnector { | unit | POSIX path normalization | inline cases incl. `./a/b.md`, `a//b.md`, `a/b.md` → identical | | unit | blake3 of known bytes matches expected hex | inline | | unit | gitignore filter (`*.tmp`, `node_modules/**`) excludes correctly | tmp tree built in test | -| unit | `.kbignore` ∪ config exclude works | tmp tree | +| unit | `.kebabignore` ∪ config exclude works | tmp tree | | unit | symlink cycle does not loop | tmp tree with `a -> b -> a` | | snapshot | `Vec` serialized JSON for fixture tree is stable | `fixtures/source-fs/tree-1` | | determinism | re-running scan twice produces byte-identical JSON | `fixtures/source-fs/tree-1` | -All tests run under `cargo test -p kb-source-fs` with no network and no model. +All tests run under `cargo test -p kebab-source-fs` with no network and no model. 
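As a rough illustration of the first test-plan row (POSIX path normalization), the sketch below shows one way the three inline cases could collapse to the same string. `normalize_workspace_path` is a hypothetical name, and rejecting `..`, absolute paths, and non-UTF-8 components is an assumption made for the example, not something the design doc specifies.

```rust
use std::path::{Component, Path};

/// Hypothetical helper (not part of the frozen design): turn a
/// workspace-relative path into a POSIX-style string with `/` separators.
/// Returns `None` for absolute paths, `..` components, or non-UTF-8 names.
fn normalize_workspace_path(path: &Path) -> Option<String> {
    let mut parts: Vec<&str> = Vec::new();
    for component in path.components() {
        match component {
            // A leading `./` carries no information, so it is dropped.
            Component::CurDir => {}
            // Ordinary names are kept; repeated separators never reach here
            // because `components()` already skips them.
            Component::Normal(name) => parts.push(name.to_str()?),
            // Root, prefix, and `..` are rejected in this sketch.
            _ => return None,
        }
    }
    Some(parts.join("/"))
}
```

On a Unix host, `./a/b.md`, `a//b.md`, and `a/b.md` all normalize to `a/b.md` with this helper, which is the equivalence the unit test row asks for.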
## Definition of Done -- [ ] `cargo check -p kb-source-fs` passes -- [ ] `cargo test -p kb-source-fs` passes +- [ ] `cargo check -p kebab-source-fs` passes +- [ ] `cargo test -p kebab-source-fs` passes - [ ] Snapshot test `fixtures/source-fs/tree-1` round-trips deterministically -- [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kb-source-fs`) +- [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kebab-source-fs`) - [ ] PR description links to design §3.3, §6.2, §7.2 ## Out of scope - File watching (P+). -- Asset copy/reference storage on disk (`kb-store-sqlite` task p1-6). +- Asset copy/reference storage on disk (`kebab-store-sqlite` task p1-6). - Non-fs source connectors (HTTP, S3 — P+). ## Risks / notes @@ -400,13 +400,13 @@ Write to `tasks/p1/p1-2-parse-md-frontmatter.md`: ````markdown --- phase: P1 -component: kb-parse-md (frontmatter submodule) +component: kebab-parse-md (frontmatter submodule) task_id: p1-2 title: "Markdown frontmatter parsing → Metadata" status: planned depends_on: [p0-1] unblocks: [p1-4] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§3.6 Metadata, §0 Q9 frontmatter, §10 errors] --- @@ -414,15 +414,15 @@ contract_sections: [§3.6 Metadata, §0 Q9 frontmatter, §10 errors] ## Goal -Parse YAML/TOML frontmatter from Markdown bytes into `kb_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`. +Parse YAML/TOML frontmatter from Markdown bytes into `kebab_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`. ## Why now / why this size -Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kb-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9. 
+Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kebab-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9. ## Allowed dependencies -- `kb-core` +- `kebab-core` - `serde` - `serde_yaml` (or `yaml-rust2`) for YAML - `toml` for TOML @@ -432,7 +432,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from ## Forbidden dependencies -- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-tui`, `kb-desktop`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `pulldown-cmark` (block parser is a sibling task) +- `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-tui`, `kebab-desktop`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `pulldown-cmark` (block parser is a sibling task) ## Inputs @@ -445,7 +445,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from | output | type | downstream | |--------|------|------------| -| `(Metadata, Option, Vec)` | tuple | `kb-normalize` → CanonicalDocument | +| `(Metadata, Option, Vec)` | tuple | `kebab-normalize` → CanonicalDocument | ## Public surface (signatures only — no new types) @@ -453,7 +453,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from pub fn parse_frontmatter( bytes: &[u8], hints: &BodyHints, -) -> anyhow::Result<(kb_core::Metadata, Option, Vec)>; +) -> anyhow::Result<(kebab_core::Metadata, Option, Vec)>; ``` `FrontmatterSpan` and `Warning` are crate-internal helpers; if any new public type is needed, STOP and update the frozen design doc first. @@ -490,12 +490,12 @@ pub fn parse_frontmatter( | snapshot | `fixtures/markdown/frontmatter-only.md` produces stable JSON | fixture | | snapshot | mixed-language body with no `lang:` detects `ko` or `en` | `fixtures/markdown/mixed-lang.md` | -All tests under `cargo test -p kb-parse-md --lib frontmatter`. 
+All tests under `cargo test -p kebab-parse-md --lib frontmatter`. ## Definition of Done -- [ ] `cargo check -p kb-parse-md` passes -- [ ] `cargo test -p kb-parse-md frontmatter` passes +- [ ] `cargo check -p kebab-parse-md` passes +- [ ] `cargo test -p kebab-parse-md frontmatter` passes - [ ] No `pulldown-cmark` import in this submodule - [ ] Snapshot tests stable across two consecutive runs - [ ] PR links design §0 Q9, §3.6 @@ -534,13 +534,13 @@ Write to `tasks/p1/p1-3-parse-md-blocks.md`: ````markdown --- phase: P1 -component: kb-parse-md (blocks submodule) +component: kebab-parse-md (blocks submodule) task_id: p1-3 title: "Markdown body → Block tree with line spans" status: planned depends_on: [p0-1] unblocks: [p1-4] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§3.4 Block, §3.4 SourceSpan, §0 Q3 citation] --- @@ -548,7 +548,7 @@ contract_sections: [§3.4 Block, §3.4 SourceSpan, §0 Q3 citation] ## Goal -Parse Markdown body bytes into a flat `Vec` (intermediate, crate-private) with heading paths and line ranges preserved, ready for `kb-normalize` to lift into `CanonicalDocument`. +Parse Markdown body bytes into a flat `Vec` (intermediate, crate-private) with heading paths and line ranges preserved, ready for `kebab-normalize` to lift into `CanonicalDocument`. ## Why now / why this size @@ -556,14 +556,14 @@ This is the heaviest part of P1 parser. 
Separating it from frontmatter and from ## Allowed dependencies -- `kb-core` +- `kebab-core` - `pulldown-cmark` (CommonMark with source-map; GFM tables enabled via feature) - `serde` - `thiserror` ## Forbidden dependencies -- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `kb-tui`, `kb-desktop`, `comrak` (alternative parser; pick one) +- `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `kebab-tui`, `kebab-desktop`, `comrak` (alternative parser; pick one) ## Inputs @@ -576,7 +576,7 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from | output | type | downstream | |--------|------|------------| -| `Vec` (intermediate type, crate-private) | – | `kb-normalize` | +| `Vec` (intermediate type, crate-private) | – | `kebab-normalize` | | `Vec` | – | propagated into Provenance | ## Public surface (signatures only — no new types) @@ -585,7 +585,7 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from pub fn parse_blocks(body: &[u8], body_offset_lines: u32) -> anyhow::Result<(Vec, Vec)>; ``` -`ParsedBlock` is a crate-internal mirror that maps 1:1 to `kb_core::Block` variants once `kb-normalize` assigns `BlockId`s. +`ParsedBlock` is a crate-internal mirror that maps 1:1 to `kebab_core::Block` variants once `kebab-normalize` assigns `BlockId`s. ## Behavior contract @@ -593,7 +593,7 @@ pub fn parse_blocks(body: &[u8], body_offset_lines: u32) -> anyhow::Result<(Vec< - Heading tree: every block records its ancestor heading texts in order (e.g., `["아키텍처", "Chunking 정책"]`). - Code blocks: language tag preserved (` ```rust ` → `Some("rust")`), fenced content not split. - Tables: GFM tables produce `TableBlock` with header row + body rows; if a table cell is malformed, fall back to a `Paragraph` block + warning. 
-- Image references: `![alt](src)` produces `ImageRefBlock` with `asset_id = None`, `src = "..."`, `alt = "..."`. Resolution to `AssetId` happens later in `kb-normalize`. +- Image references: `![alt](src)` produces `ImageRefBlock` with `asset_id = None`, `src = "..."`, `alt = "..."`. Resolution to `AssetId` happens later in `kebab-normalize`. - Lists: ordered/unordered preserved; nested list items flattened into one `ListBlock` with each top-level item's text. - Inline elements: only `Text`, `Code`, `Link`, `Strong`, `Emph` (per design §3.4). Drop other inlines silently. - Malformed input never panics. Worst case: empty `Vec` + warning. @@ -616,12 +616,12 @@ pub fn parse_blocks(body: &[u8], body_offset_lines: u32) -> anyhow::Result<(Vec< | snapshot | `fixtures/markdown/nested-headings.md` → ParsedBlock JSON stable | fixture | | snapshot | `fixtures/markdown/code-and-table.md` → JSON stable | fixture | -All tests under `cargo test -p kb-parse-md --lib blocks`. +All tests under `cargo test -p kebab-parse-md --lib blocks`. ## Definition of Done -- [ ] `cargo check -p kb-parse-md` passes -- [ ] `cargo test -p kb-parse-md blocks` passes +- [ ] `cargo check -p kebab-parse-md` passes +- [ ] `cargo test -p kebab-parse-md blocks` passes - [ ] Snapshot tests stable across two runs - [ ] No imports outside Allowed dependencies - [ ] PR links design §3.4 @@ -629,7 +629,7 @@ All tests under `cargo test -p kb-parse-md --lib blocks`. ## Out of scope - Frontmatter (p1-2). -- Lifting `ParsedBlock` → `kb_core::Block` with `BlockId` (p1-4). +- Lifting `ParsedBlock` → `kebab_core::Block` with `BlockId` (p1-4). - Chunking (p1-5). 
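The heading-tree rule in the behavior contract (every block records its ancestor heading texts in order) could be tracked with a small stack. This is only a sketch: `HeadingPath` is a hypothetical crate-internal helper, not a type from the frozen design, and whether a heading block includes its own text in its path is not specified here.

```rust
/// Hypothetical tracker (an assumption for illustration, not a design-doc type):
/// keeps the headings that are ancestors of the block currently being parsed.
#[derive(Default)]
pub struct HeadingPath {
    /// (heading level 1..=6, heading text), outermost first.
    stack: Vec<(u8, String)>,
}

impl HeadingPath {
    /// Enter a heading: drop headings at the same or a deeper level, then push.
    pub fn enter(&mut self, level: u8, text: &str) {
        while matches!(self.stack.last(), Some((l, _)) if *l >= level) {
            self.stack.pop();
        }
        self.stack.push((level, text.to_string()));
    }

    /// Ancestor heading texts in order, outermost first.
    pub fn path(&self) -> Vec<String> {
        self.stack.iter().map(|(_, t)| t.clone()).collect()
    }
}
```

A parser loop would call `enter` on each heading event and attach `path()` to every block emitted afterwards, so a sibling heading replaces the previous one at its level instead of nesting under it.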
## Risks / notes @@ -658,13 +658,13 @@ Write to `tasks/p1/p1-4-normalize.md`: ````markdown --- phase: P1 -component: kb-normalize +component: kebab-normalize task_id: p1-4 title: "Lift parser output → CanonicalDocument with deterministic IDs" status: planned depends_on: [p1-2, p1-3] unblocks: [p1-5, p1-6] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§3.4, §4 ID recipe, §3.6 Provenance] --- @@ -676,12 +676,12 @@ Combine `Metadata` (p1-2) + `Vec` (p1-3) + `RawAsset` (p1-1) into a ## Why now / why this size -Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate. +Single responsibility: ID generation + struct assembly. Keeps `kebab-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate. ## Allowed dependencies -- `kb-core` -- `kb-config` +- `kebab-core` +- `kebab-config` - `serde` - `serde-json-canonicalizer` (canonical JSON for ID hashing) - `blake3` @@ -691,38 +691,38 @@ Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` pure ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-md` (consumed via plain types only — must not couple back), `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` +- `kebab-source-fs`, `kebab-parse-md` (consumed via plain types only — must not couple back), `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop` -Note: this crate accepts `ParsedBlock` from `kb-parse-md` either by (a) exposing `ParsedBlock` as a `kb-core` type, or (b) `kb-parse-md` re-exporting via a public DTO. Pick (a): move `ParsedBlock` into `kb-core` so this task does not import `kb-parse-md`. 
+Note: this crate accepts `ParsedBlock` from `kebab-parse-md` either by (a) exposing `ParsedBlock` as a `kebab-core` type, or (b) `kebab-parse-md` re-exporting via a public DTO. Pick (a): move `ParsedBlock` into `kebab-core` so this task does not import `kebab-parse-md`. ## Inputs | input | type | source | |-------|------|--------| -| `RawAsset` | `kb_core::RawAsset` | p1-1 | +| `RawAsset` | `kebab_core::RawAsset` | p1-1 | | `Metadata` + frontmatter span + warnings | from p1-2 | parser caller | | `Vec` + warnings | from p1-3 | parser caller | -| `parser_version` | `kb_core::ParserVersion` | constant in `kb-parse-md` | +| `parser_version` | `kebab_core::ParserVersion` | constant in `kebab-parse-md` | ## Outputs | output | type | downstream | |--------|------|------------| -| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk`, `kb-store-sqlite` | +| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk`, `kebab-store-sqlite` | ## Public surface (signatures only — no new types) ```rust pub fn build_canonical_document( - asset: &kb_core::RawAsset, - metadata: kb_core::Metadata, - blocks: Vec, - parser_version: &kb_core::ParserVersion, + asset: &kebab_core::RawAsset, + metadata: kebab_core::Metadata, + blocks: Vec, + parser_version: &kebab_core::ParserVersion, warnings: Vec, -) -> anyhow::Result; +) -> anyhow::Result; -pub fn id_for_doc(workspace_path: &kb_core::WorkspacePath, asset: &kb_core::AssetId, parser_version: &kb_core::ParserVersion) -> kb_core::DocumentId; -pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kb_core::SourceSpan) -> kb_core::BlockId; +pub fn id_for_doc(workspace_path: &kebab_core::WorkspacePath, asset: &kebab_core::AssetId, parser_version: &kebab_core::ParserVersion) -> kebab_core::DocumentId; +pub fn id_for_block(doc: &kebab_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kebab_core::SourceSpan) -> kebab_core::BlockId; ``` ## Behavior 
contract @@ -750,14 +750,14 @@ pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[Strin | unit | provenance contains Discovered/Parsed/Normalized in order | inline | | snapshot | `fixtures/markdown/code-and-table.md` → CanonicalDocument JSON stable (incl. all IDs) | fixture | -All tests under `cargo test -p kb-normalize`. +All tests under `cargo test -p kebab-normalize`. ## Definition of Done -- [ ] `cargo check -p kb-normalize` passes -- [ ] `cargo test -p kb-normalize` passes +- [ ] `cargo check -p kebab-normalize` passes +- [ ] `cargo test -p kebab-normalize` passes - [ ] Determinism test runs ≥ 1000 iterations under 1 second -- [ ] No `kb-parse-md` import (consumed via `kb-core::ParsedBlock`) +- [ ] No `kebab-parse-md` import (consumed via `kebab-core::ParsedBlock`) - [ ] PR links design §4.2, §4.3 ## Out of scope @@ -791,13 +791,13 @@ Write to `tasks/p1/p1-5-chunk.md`: ````markdown --- phase: P1 -component: kb-chunk +component: kebab-chunk task_id: p1-5 title: "Markdown heading-aware chunker (md-heading-v1)" status: planned depends_on: [p1-4] unblocks: [p1-6, p2-2, p3-2] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§3.5 Chunk, §4.2 chunk_id recipe, §7.2 Chunker, §0 Q3 citation] --- @@ -813,8 +813,8 @@ The first concrete `Chunker`. Establishes how subsequent chunkers (PDF page chun ## Allowed dependencies -- `kb-core` -- `kb-config` +- `kebab-core` +- `kebab-config` - `serde` - `blake3` (policy_hash) - `serde-json-canonicalizer` @@ -822,30 +822,30 @@ The first concrete `Chunker`. 
Establishes how subsequent chunkers (PDF page chun ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-md`, `kb-normalize` (consumes `CanonicalDocument` only via `kb-core`), `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` +- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize` (consumes `CanonicalDocument` only via `kebab-core`), `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop` ## Inputs | input | type | source | |-------|------|--------| -| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 | -| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` from config | +| `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 | +| `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` from config | ## Outputs | output | type | downstream | |--------|------|------------| -| `Vec` | `kb_core::Chunk` | `kb-store-sqlite` (p1-6), `kb-embed*` (P3) | +| `Vec` | `kebab_core::Chunk` | `kebab-store-sqlite` (p1-6), `kebab-embed*` (P3) | ## Public surface (signatures only — no new types) ```rust pub struct MdHeadingV1Chunker; -impl kb_core::Chunker for MdHeadingV1Chunker { - fn chunker_version(&self) -> kb_core::ChunkerVersion; - fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String; - fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result>; +impl kebab_core::Chunker for MdHeadingV1Chunker { + fn chunker_version(&self) -> kebab_core::ChunkerVersion; + fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String; + fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result>; } ``` @@ -882,12 +882,12 @@ impl kb_core::Chunker for MdHeadingV1Chunker { | determinism | identical input + identical policy → identical chunk_ids | inline | | snapshot | `fixtures/markdown/long-section.md` → Vec JSON stable | fixture | -All tests under `cargo test -p kb-chunk`. 
+All tests under `cargo test -p kebab-chunk`. ## Definition of Done -- [ ] `cargo check -p kb-chunk` passes -- [ ] `cargo test -p kb-chunk` passes +- [ ] `cargo check -p kebab-chunk` passes +- [ ] `cargo test -p kebab-chunk` passes - [ ] Snapshot stable across two runs - [ ] No imports outside Allowed dependencies - [ ] PR links design §3.5, §4.2 @@ -924,13 +924,13 @@ Write to `tasks/p1/p1-6-store-sqlite.md`: ````markdown --- phase: P1 -component: kb-store-sqlite (P1 subset) +component: kebab-store-sqlite (P1 subset) task_id: p1-6 title: "SQLite store: assets/documents/blocks/chunks + asset writer + migrations" status: planned depends_on: [p1-1, p1-4, p1-5] unblocks: [p2-1, p3-3, p4-3] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§5 DDL (5.1, 5.2, 5.3, 5.4, 5.5 chunks only — FTS handled in p2-1), §5.7 jobs/ingest_runs, §5.8 transactions, §6.3 data_dir layout] --- @@ -946,8 +946,8 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT ## Allowed dependencies -- `kb-core` -- `kb-config` +- `kebab-core` +- `kebab-config` - `rusqlite` (with `bundled-sqlcipher` disabled; use `bundled` feature) - `refinery` for migrations - `serde_json` @@ -958,7 +958,7 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT ## Forbidden dependencies -- `kb-source-fs` (only types via `kb-core`), `kb-parse-md`, `kb-normalize`, `kb-chunk` (only types via `kb-core`), `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` +- `kebab-source-fs` (only types via `kebab-core`), `kebab-parse-md`, `kebab-normalize`, `kebab-chunk` (only types via `kebab-core`), `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop` ## Inputs @@ -966,17 +966,17 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. 
The FT |-------|------|--------| | migrations | `migrations/V001__init.sql` | repo | | `RawAsset` + bytes | `(RawAsset, Vec)` | p1-1 + reader | -| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 | -| `Vec` | `kb_core::Chunk` | p1-5 | -| `IngestRun` aggregates | `(scope, counts, duration)` | `kb-app` | +| `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 | +| `Vec` | `kebab_core::Chunk` | p1-5 | +| `IngestRun` aggregates | `(scope, counts, duration)` | `kebab-app` | ## Outputs | output | type | downstream | |--------|------|------------| -| `data_dir/kb.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | – | every later phase | +| `data_dir/kebab.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | – | every later phase | | `data_dir/assets//` bytes (when copied) | – | future re-extraction, integrity verification | -| `IngestReport` (wire schema v1) | `kb_core::IngestReport` | `kb-cli`, eval | +| `IngestReport` (wire schema v1) | `kebab_core::IngestReport` | `kebab-cli`, eval | ## Public surface (signatures only — no new types) @@ -984,23 +984,23 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. 
The FT pub struct SqliteStore { /* internal */ } impl SqliteStore { - pub fn open(config: &kb_config::Config) -> anyhow::Result; + pub fn open(config: &kebab_config::Config) -> anyhow::Result; pub fn run_migrations(&self) -> anyhow::Result<()>; - pub fn put_asset_with_bytes(&self, asset: &kb_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>; + pub fn put_asset_with_bytes(&self, asset: &kebab_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>; } -impl kb_core::DocumentStore for SqliteStore { - fn put_asset(&self, a: &kb_core::RawAsset) -> anyhow::Result<()>; - fn put_document(&self, d: &kb_core::CanonicalDocument) -> anyhow::Result<()>; - fn put_blocks(&self, doc: &kb_core::DocumentId, blocks: &[kb_core::Block]) -> anyhow::Result<()>; - fn put_chunks(&self, doc: &kb_core::DocumentId, chunks: &[kb_core::Chunk]) -> anyhow::Result<()>; - fn get_document(&self, id: &kb_core::DocumentId) -> anyhow::Result>; - fn get_chunk(&self, id: &kb_core::ChunkId) -> anyhow::Result>; - fn list_documents(&self, filter: &kb_core::DocFilter) -> anyhow::Result>; +impl kebab_core::DocumentStore for SqliteStore { + fn put_asset(&self, a: &kebab_core::RawAsset) -> anyhow::Result<()>; + fn put_document(&self, d: &kebab_core::CanonicalDocument) -> anyhow::Result<()>; + fn put_blocks(&self, doc: &kebab_core::DocumentId, blocks: &[kebab_core::Block]) -> anyhow::Result<()>; + fn put_chunks(&self, doc: &kebab_core::DocumentId, chunks: &[kebab_core::Chunk]) -> anyhow::Result<()>; + fn get_document(&self, id: &kebab_core::DocumentId) -> anyhow::Result>; + fn get_chunk(&self, id: &kebab_core::ChunkId) -> anyhow::Result>; + fn list_documents(&self, filter: &kebab_core::DocFilter) -> anyhow::Result>; } -impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ } +impl kebab_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ } ``` ## Behavior contract @@ -1020,7 +1020,7 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ } ## Storage / wire 
effects -- Writes: `kb.sqlite` (multiple tables), `data_dir/assets//` (copied case). +- Writes: `kebab.sqlite` (multiple tables), `data_dir/assets//` (copied case). - Reads on subsequent calls: same DB. ## Test plan @@ -1036,14 +1036,14 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ } | contract | DocumentStore trait round-trip for fixture document | `fixtures/markdown/code-and-table.md` | | snapshot | IngestReport JSON for fixture run | fixture | -All tests under `cargo test -p kb-store-sqlite` with no network. +All tests under `cargo test -p kebab-store-sqlite` with no network. ## Definition of Done -- [ ] `cargo check -p kb-store-sqlite` passes -- [ ] `cargo test -p kb-store-sqlite` passes +- [ ] `cargo check -p kebab-store-sqlite` passes +- [ ] `cargo test -p kebab-store-sqlite` passes - [ ] migration `V001__init.sql` matches design §5 verbatim (diff-checked in CI) -- [ ] Writes to `~/.local/share/kb/` are gated by `kb-config`'s `data_dir` and never escape it +- [ ] Writes to `~/.local/share/kebab/` are gated by `kebab-config`'s `data_dir` and never escape it - [ ] No imports outside Allowed dependencies - [ ] PR links design §5 @@ -1056,7 +1056,7 @@ All tests under `cargo test -p kb-store-sqlite` with no network. ## Risks / notes -- WAL mode requires careful test cleanup: tests must drop the connection before removing `kb.sqlite-wal` / `-shm`. +- WAL mode requires careful test cleanup: tests must drop the connection before removing `kebab.sqlite-wal` / `-shm`. - Asset directory shard prefix uses `asset_id[..2]`; using `asset_id[..1]` would create at most 16 dirs (insufficient). 
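The shard-prefix note can be checked with a few lines of arithmetic. The sketch below assumes `asset_id` is an ASCII hex string (as the 16-directory figure implies); `shard_prefix` and `max_shards` are hypothetical names used only for illustration.

```rust
/// Shard directory name for a hex-encoded asset ID, mirroring `asset_id[..2]`.
/// Returns `None` if the ID is shorter than two characters or the prefix is
/// not ASCII hex (the hex encoding is an assumption of this sketch).
pub fn shard_prefix(asset_id: &str) -> Option<&str> {
    let prefix = asset_id.get(..2)?;
    prefix
        .bytes()
        .all(|b| b.is_ascii_hexdigit())
        .then_some(prefix)
}

/// Upper bound on distinct shard directories for a hex prefix of `len` characters.
pub fn max_shards(len: u32) -> u64 {
    16u64.pow(len)
}
```

A one-character prefix caps the fan-out at 16 directories, while two characters allow up to 256, which is the trade-off the note refers to.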
```` @@ -1074,7 +1074,7 @@ git commit -m "tasks: add p1-6 store-sqlite component spec" ```bash ls tasks/p1/ | sort -for f in tasks/p1/p1-*.md; do grep -q '2026-04-27-kb-final-form-design.md' "$f" || echo "MISSING REF in $f"; done +for f in tasks/p1/p1-*.md; do grep -q '2026-04-27-kebab-final-form-design.md' "$f" || echo "MISSING REF in $f"; done echo done ``` @@ -1106,9 +1106,9 @@ For each component task below, the steps are: (1) write file, (2) verify, (3) co **Files:** Create: `tasks/p0/p0-1-skeleton.md`. Also `mkdir -p tasks/p0`. -`contract_sections`: §3 (all subsections), §4, §5 (migrations meta only), §6, §7, §8, §10. `Allowed`: workspace + `kb-core` + `kb-config` + `kb-app` + `kb-cli` only. +`contract_sections`: §3 (all subsections), §4, §5 (migrations meta only), §6, §7, §8, §10. `Allowed`: workspace + `kebab-core` + `kebab-config` + `kebab-app` + `kebab-cli` only. -Body covers: workspace `Cargo.toml` resolver=3, edition 2024, member list (`kb-core`, `kb-config`, `kb-app`, `kb-cli`), workspace dependencies, `kb-core` types and traits per design §3 / §7, deterministic ID functions per §4 (with full unit tests), `kb-config` loader (TOML + env + CLI override per §6.4), `kb-app` facade signatures (`ingest`, `search`, `ask`, `inspect_doc`, `inspect_chunk`, `doctor`, `init`), `kb-cli` skeleton with clap + `--help`. DoD: `cargo check --workspace`, `cargo test --workspace` (Newtype+ID+canonical-json tests only), `kb --help` works, `docs/spec/*` stubs created (link to frozen design doc), `docs/wire-schema/v1/*.schema.json` stubs (one file per object in §2). 
+Body covers: workspace `Cargo.toml` resolver=3, edition 2024, member list (`kebab-core`, `kebab-config`, `kebab-app`, `kebab-cli`), workspace dependencies, `kebab-core` types and traits per design §3 / §7, deterministic ID functions per §4 (with full unit tests), `kebab-config` loader (TOML + env + CLI override per §6.4), `kebab-app` facade signatures (`ingest`, `search`, `ask`, `inspect_doc`, `inspect_chunk`, `doctor`, `init`), `kebab-cli` skeleton with clap + `--help`. DoD: `cargo check --workspace`, `cargo test --workspace` (Newtype+ID+canonical-json tests only), `kebab --help` works, `docs/spec/*` stubs created (link to frozen design doc), `docs/wire-schema/v1/*.schema.json` stubs (one file per object in §2). - [ ] **Step 1: Create directory and file** @@ -1134,11 +1134,11 @@ git -c user.name=kb -c user.email=kb@local commit -m "tasks: add p0-1 skeleton c #### `p2-1-fts-schema.md` -`contract_sections`: §5.5 FTS5 + triggers, §9 versioning. `Allowed`: `kb-core`, `kb-config`, `kb-store-sqlite` (extends migrations). `depends_on: [p1-6]`. Migration `V002__fts.sql` adds `chunks_fts` virtual table and three triggers verbatim from §5.5. Tests: backfill from existing chunks via `INSERT INTO chunks_fts SELECT ... FROM chunks`, then assert FTS row count == chunks row count; insert/update/delete in `chunks` reflects in `chunks_fts`. +`contract_sections`: §5.5 FTS5 + triggers, §9 versioning. `Allowed`: `kebab-core`, `kebab-config`, `kebab-store-sqlite` (extends migrations). `depends_on: [p1-6]`. Migration `V002__fts.sql` adds `chunks_fts` virtual table and three triggers verbatim from §5.5. Tests: backfill from existing chunks via `INSERT INTO chunks_fts SELECT ... FROM chunks`, then assert FTS row count == chunks row count; insert/update/delete in `chunks` reflects in `chunks_fts`. #### `p2-2-lexical-retriever.md` -`contract_sections`: §3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5 search output (for snippet length defaults), §2.2 wire schema. 
`Allowed`: `kb-core`, `kb-config`, `kb-store-sqlite`. `depends_on: [p2-1]`. Implements `Retriever` trait with `bm25(chunks_fts)` ranking, snippet via SQLite `snippet()` (≤ `snippet_chars` chars), citation built per §0 Q3 from `source_spans`. Tests: top-k correctness on fixture corpus, citation line range round-trip against original Markdown, deterministic across two runs. +`contract_sections`: §3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5 search output (for snippet length defaults), §2.2 wire schema. `Allowed`: `kebab-core`, `kebab-config`, `kebab-store-sqlite`. `depends_on: [p2-1]`. Implements `Retriever` trait with `bm25(chunks_fts)` ranking, snippet via SQLite `snippet()` (≤ `snippet_chars` chars), citation built per §0 Q3 from `source_spans`. Tests: top-k correctness on fixture corpus, citation line range round-trip against original Markdown, deterministic across two runs. - [ ] **Step 1: Create directory and both spec files** (template per Phase B; bodies as described above). - [ ] **Step 2: Verify with `for f in tasks/p2/p2-*.md; do test -s "$f" || echo MISSING $f; done; echo done` (expect only `done`).** @@ -1152,10 +1152,10 @@ git add tasks/p2 && git -c user.name=kb -c user.email=kb@local commit -m "tasks: **Files:** `mkdir -p tasks/p3` and create `p3-1-embedder-trait.md`, `p3-2-fastembed-adapter.md`, `p3-3-lancedb-store.md`, `p3-4-hybrid-fusion.md`. -- `p3-1-embedder-trait.md`: §3.7, §7.2 Embedder, §11. Allowed: `kb-core`, `kb-config`. No external embedding dep. Public surface: `Embedder` trait + `EmbeddingInput`/`EmbeddingKind` already in core (validate they exist; if not, this task is also a `kb-core` patch). `depends_on: [p0-1]`. Tests: trait dyn dispatch, mock embedder. -- `p3-2-fastembed-adapter.md`: §11.3, §6.4 `[models.embedding]`. Allowed: `kb-core`, `kb-config`, `fastembed`, `tokenizers`, `ort`. `depends_on: [p3-1]`. 
Provides `FastembedEmbedder` implementing `Embedder` for `multilingual-e5-small` (default), with required Document/Query prefix per §11.3. Tests: dimension check, deterministic vector for fixed input (hash compare on first 8 floats with epsilon), batch size respected. -- `p3-3-lancedb-store.md`: §3.5, §5.6 embedding_records, §6.3 lancedb table naming. Allowed: `kb-core`, `kb-config`, `lancedb`, `arrow`, `kb-store-sqlite` (write `embedding_records` row only — no other table). `depends_on: [p3-2, p1-6]`. Implements `VectorStore` trait. Table naming `chunk_embeddings__.lance`. `ensure_table` creates if missing. `upsert` inserts vectors and writes a matching `embedding_records` row in same logical operation (best-effort 2PC: lance commit, then SQLite insert; on SQLite failure, log warning + leave lance row — re-upsert is idempotent because of the `UNIQUE(chunk_id, model_id, model_version, dimensions)` constraint and lance upsert semantics). `search` filters via SearchFilters and returns top-k. Tests: smoke (insert+search), dimension mismatch error, model isolation (two models stay in two tables). -- `p3-4-hybrid-fusion.md`: §3.7 RetrievalDetail, §0 Q3, §1.6 search --explain, §6.4 `[search]` rrf settings. Allowed: `kb-core`, `kb-config`, `kb-store-sqlite` (lexical Retriever from p2-2), `kb-store-vector` (vector Retriever wrapper around `VectorStore::search`). `depends_on: [p2-2, p3-3]`. Implements `HybridRetriever` that dispatches by `SearchMode`, fuses with RRF (k from config, default 60), populates `lexical_score`, `vector_score`, `lexical_rank`, `vector_rank`, `fusion_score`. Tests: pure lexical mode == p2-2 output; pure vector mode == p3-3 output; hybrid produces strictly larger or equal coverage of expected hits than either single mode on a small fixture; deterministic. +- `p3-1-embedder-trait.md`: §3.7, §7.2 Embedder, §11. Allowed: `kebab-core`, `kebab-config`. No external embedding dep. 
Public surface: `Embedder` trait + `EmbeddingInput`/`EmbeddingKind` already in core (validate they exist; if not, this task is also a `kebab-core` patch). `depends_on: [p0-1]`. Tests: trait dyn dispatch, mock embedder. +- `p3-2-fastembed-adapter.md`: §11.3, §6.4 `[models.embedding]`. Allowed: `kebab-core`, `kebab-config`, `fastembed`, `tokenizers`, `ort`. `depends_on: [p3-1]`. Provides `FastembedEmbedder` implementing `Embedder` for `multilingual-e5-small` (default), with required Document/Query prefix per §11.3. Tests: dimension check, deterministic vector for fixed input (hash compare on first 8 floats with epsilon), batch size respected. +- `p3-3-lancedb-store.md`: §3.5, §5.6 embedding_records, §6.3 lancedb table naming. Allowed: `kebab-core`, `kebab-config`, `lancedb`, `arrow`, `kebab-store-sqlite` (write `embedding_records` row only — no other table). `depends_on: [p3-2, p1-6]`. Implements `VectorStore` trait. Table naming `chunk_embeddings__.lance`. `ensure_table` creates if missing. `upsert` inserts vectors and writes a matching `embedding_records` row in same logical operation (best-effort 2PC: lance commit, then SQLite insert; on SQLite failure, log warning + leave lance row — re-upsert is idempotent because of the `UNIQUE(chunk_id, model_id, model_version, dimensions)` constraint and lance upsert semantics). `search` filters via SearchFilters and returns top-k. Tests: smoke (insert+search), dimension mismatch error, model isolation (two models stay in two tables). +- `p3-4-hybrid-fusion.md`: §3.7 RetrievalDetail, §0 Q3, §1.6 search --explain, §6.4 `[search]` rrf settings. Allowed: `kebab-core`, `kebab-config`, `kebab-store-sqlite` (lexical Retriever from p2-2), `kebab-store-vector` (vector Retriever wrapper around `VectorStore::search`). `depends_on: [p2-2, p3-3]`. 
Implements `HybridRetriever` that dispatches by `SearchMode`, fuses with RRF (k from config, default 60), populates `lexical_score`, `vector_score`, `lexical_rank`, `vector_rank`, `fusion_score`. Tests: pure lexical mode == p2-2 output; pure vector mode == p3-3 output; hybrid produces strictly larger or equal coverage of expected hits than either single mode on a small fixture; deterministic. - [ ] **Step 1: Create directory and 4 files** (template per Phase B). - [ ] **Step 2: Verify** @@ -1176,9 +1176,9 @@ git add tasks/p3 && git -c user.name=kb -c user.email=kb@local commit -m "tasks: **Files:** `mkdir -p tasks/p4` and create `p4-1-llm-trait.md`, `p4-2-ollama-adapter.md`, `p4-3-rag-pipeline.md`. -- `p4-1-llm-trait.md`: §7.2 LanguageModel + TokenChunk, §0 Q5 streaming, §3.8 Answer types referenced. Allowed: `kb-core`, `kb-config`. `depends_on: [p0-1]`. Defines (or validates) `LanguageModel` trait, `GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage` per design. Tests: trait dyn dispatch, mock LM streams 3 tokens. -- `p4-2-ollama-adapter.md`: §11.2 Ollama, §6.4 `[models.llm]`, §0 Q5 streaming. Allowed: `kb-core`, `kb-config`, `reqwest` (blocking + json + stream feature) or `ureq` + manual SSE; `serde_json`, `tokio`/runtime if needed. `depends_on: [p4-1]`. Implements `OllamaLanguageModel` with streaming `/api/generate`. `temperature=0.0` default, `seed` honored for determinism. Reachability/missing-model errors map to `LlmError` per design §10. Tests: against a mock HTTP server (`wiremock` or hand-rolled `tiny_http`); deterministic stream collect equals buffered concatenation; missing model returns `LlmError::ModelNotPulled` with proper hint. -- `p4-3-rag-pipeline.md`: §0 Q4 refusal (two-layer), §0 Q7 footer, §1.1–1.4 ask scenes, §2.3 Answer wire, §3.8 internal Answer, §6.4 `[rag]`. Allowed: `kb-core`, `kb-config`, `kb-search` (Retriever), `kb-llm` (LanguageModel). `depends_on: [p3-4, p4-2]`. 
Pipeline: retrieve top-k → score gate (`refusal_reason: ScoreGate` if top1 < gate) → context packer (token budget + heading_path header `[#n doc=… heading=… span=…]`) → render `rag-v1` prompt → stream → collect → citation extraction (regex `\[(\d+)\]`) → citation validation (each `[n]` must map to a packed chunk; otherwise `grounded=false`, `refusal_reason: LlmSelfJudge`) → write `answers` row. Tests: happy path produces grounded Answer with citations; query with all chunks below gate produces ScoreGate refusal; query whose LLM emits a citation pointing to non-existent `[7]` becomes LlmSelfJudge refusal; identical query under temperature=0 produces byte-identical Answer (snapshot). +- `p4-1-llm-trait.md`: §7.2 LanguageModel + TokenChunk, §0 Q5 streaming, §3.8 Answer types referenced. Allowed: `kebab-core`, `kebab-config`. `depends_on: [p0-1]`. Defines (or validates) `LanguageModel` trait, `GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage` per design. Tests: trait dyn dispatch, mock LM streams 3 tokens. +- `p4-2-ollama-adapter.md`: §11.2 Ollama, §6.4 `[models.llm]`, §0 Q5 streaming. Allowed: `kebab-core`, `kebab-config`, `reqwest` (blocking + json + stream feature) or `ureq` + manual SSE; `serde_json`, `tokio`/runtime if needed. `depends_on: [p4-1]`. Implements `OllamaLanguageModel` with streaming `/api/generate`. `temperature=0.0` default, `seed` honored for determinism. Reachability/missing-model errors map to `LlmError` per design §10. Tests: against a mock HTTP server (`wiremock` or hand-rolled `tiny_http`); deterministic stream collect equals buffered concatenation; missing model returns `LlmError::ModelNotPulled` with proper hint. +- `p4-3-rag-pipeline.md`: §0 Q4 refusal (two-layer), §0 Q7 footer, §1.1–1.4 ask scenes, §2.3 Answer wire, §3.8 internal Answer, §6.4 `[rag]`. Allowed: `kebab-core`, `kebab-config`, `kebab-search` (Retriever), `kebab-llm` (LanguageModel). `depends_on: [p3-4, p4-2]`. 
Pipeline: retrieve top-k → score gate (`refusal_reason: ScoreGate` if top1 < gate) → context packer (token budget + heading_path header `[#n doc=… heading=… span=…]`) → render `rag-v1` prompt → stream → collect → citation extraction (regex `\[(\d+)\]`) → citation validation (each `[n]` must map to a packed chunk; otherwise `grounded=false`, `refusal_reason: LlmSelfJudge`) → write `answers` row. Tests: happy path produces grounded Answer with citations; query with all chunks below gate produces ScoreGate refusal; query whose LLM emits a citation pointing to non-existent `[7]` becomes LlmSelfJudge refusal; identical query under temperature=0 produces byte-identical Answer (snapshot). - [ ] **Step 1, 2, 3** as in C3. @@ -1193,8 +1193,8 @@ git add tasks/p4 && git -c user.name=kb -c user.email=kb@local commit -m "tasks: **Files:** `mkdir -p tasks/p5` and create `p5-1-golden-fixture-runner.md`, `p5-2-metrics-compare.md`. -- `p5-1-golden-fixture-runner.md`: phase epic + §5.7 eval_runs/eval_query_results, §6.3 runs_dir. Allowed: `kb-core`, `kb-config`, `kb-app` (calls facade for search/ask), `serde_yaml`. `depends_on: [p4-3]`. Loads `fixtures/golden_queries.yaml`, runs each query in selected mode (lexical/vector/hybrid/rag), captures per-query results to `eval_query_results` and to `runs_dir//per_query.jsonl`. Tests: fixture with 3 queries runs end-to-end on a tiny corpus, all rows recorded. -- `p5-2-metrics-compare.md`: phase epic, §0 Q6 wire schema. Allowed: `kb-core`, `kb-config`, `kb-store-sqlite` (read eval rows). `depends_on: [p5-1]`. Computes hit@k, MRR, recall@k_doc, citation_coverage, groundedness (rule-based via `must_contain`), empty_result_rate, refusal_correctness. `kb eval compare a b` produces wins/losses/draws + delta. Tests: fixed input rows produce expected metric values; compare produces stable sorted output. +- `p5-1-golden-fixture-runner.md`: phase epic + §5.7 eval_runs/eval_query_results, §6.3 runs_dir. 
Allowed: `kebab-core`, `kebab-config`, `kebab-app` (calls facade for search/ask), `serde_yaml`. `depends_on: [p4-3]`. Loads `fixtures/golden_queries.yaml`, runs each query in selected mode (lexical/vector/hybrid/rag), captures per-query results to `eval_query_results` and to `runs_dir//per_query.jsonl`. Tests: fixture with 3 queries runs end-to-end on a tiny corpus, all rows recorded. +- `p5-2-metrics-compare.md`: phase epic, §0 Q6 wire schema. Allowed: `kebab-core`, `kebab-config`, `kebab-store-sqlite` (read eval rows). `depends_on: [p5-1]`. Computes hit@k, MRR, recall@k_doc, citation_coverage, groundedness (rule-based via `must_contain`), empty_result_rate, refusal_correctness. `kebab eval compare a b` produces wins/losses/draws + delta. Tests: fixed input rows produce expected metric values; compare produces stable sorted output. - [ ] **Step 1, 2, 3** as in C3. @@ -1209,9 +1209,9 @@ git add tasks/p5 && git -c user.name=kb -c user.email=kb@local commit -m "tasks: `mkdir -p tasks/p6` and create: -- `p6-1-image-extractor-exif.md`: phase epic §9.1, §3.4 ImageRefBlock, §3.7a ImageType. Allowed: `kb-core`, `kb-config`, `image`, `kamadak-exif`. Implements `Extractor` for `MediaType::Image(_)` producing a `CanonicalDocument` whose body is exactly one `ImageRefBlock`. EXIF goes to `metadata.user`. `depends_on: [p0-1, p1-6]`. Tests: PNG/JPEG decode metadata; EXIF extraction; deterministic doc_id. -- `p6-2-ocr-adapter.md`: phase epic §9.1. Allowed: `kb-core`, `kb-config`, `image`, OS-specific OCR (feature `apple-vision` for macOS via sidecar binary; feature `tesseract` for cross-platform; default tesseract). Defines `OcrEngine` trait + adapter. Populates `ImageRefBlock.ocr` `OcrText` (`joined`, regions, engine, engine_version). `depends_on: [p6-1]`. Tests: deterministic text on a fixed fixture image with high-confidence text. -- `p6-3-caption-adapter.md`: phase epic §9.1 caption section, §3.7a ModelCaption. 
Allowed: `kb-core`, `kb-config`, `kb-llm` (reuse LanguageModel for VLM). Optional/feature-gated. `depends_on: [p6-1, p4-2]`. Populates `ImageRefBlock.caption`. Tests: with mock LM, caption recorded with model id; absence of feature flag leaves caption=None. +- `p6-1-image-extractor-exif.md`: phase epic §9.1, §3.4 ImageRefBlock, §3.7a ImageType. Allowed: `kebab-core`, `kebab-config`, `image`, `kamadak-exif`. Implements `Extractor` for `MediaType::Image(_)` producing a `CanonicalDocument` whose body is exactly one `ImageRefBlock`. EXIF goes to `metadata.user`. `depends_on: [p0-1, p1-6]`. Tests: PNG/JPEG decode metadata; EXIF extraction; deterministic doc_id. +- `p6-2-ocr-adapter.md`: phase epic §9.1. Allowed: `kebab-core`, `kebab-config`, `image`, OS-specific OCR (feature `apple-vision` for macOS via sidecar binary; feature `tesseract` for cross-platform; default tesseract). Defines `OcrEngine` trait + adapter. Populates `ImageRefBlock.ocr` `OcrText` (`joined`, regions, engine, engine_version). `depends_on: [p6-1]`. Tests: deterministic text on a fixed fixture image with high-confidence text. +- `p6-3-caption-adapter.md`: phase epic §9.1 caption section, §3.7a ModelCaption. Allowed: `kebab-core`, `kebab-config`, `kebab-llm` (reuse LanguageModel for VLM). Optional/feature-gated. `depends_on: [p6-1, p4-2]`. Populates `ImageRefBlock.caption`. Tests: with mock LM, caption recorded with model id; absence of feature flag leaves caption=None. - [ ] **Step 1, 2, 3** as in C3. @@ -1226,8 +1226,8 @@ git add tasks/p6 && git -c user.name=kb -c user.email=kb@local commit -m "tasks: `mkdir -p tasks/p7` and create: -- `p7-1-pdf-text-extractor.md`: phase epic §9.2, §3.4 SourceSpan::Page. Allowed: `kb-core`, `kb-config`, `pdf-extract`, `lopdf` (page metadata). `depends_on: [p0-1, p1-6]`. Extractor for `MediaType::Pdf` produces a `CanonicalDocument` with one `Paragraph` per page, `SourceSpan::Page`. 
Failed-text pages are emitted as paragraphs with empty text and a `Provenance` warning marking them as scanned candidates. Tests: page count, span correctness, failure handling. -- `p7-2-pdf-page-chunker.md`: phase epic §9.2, §3.5, §0 Q3 citation. Allowed: `kb-core`, `kb-config`. New chunker version `pdf-page-v1` that respects page boundaries. `depends_on: [p7-1]`. Tests: chunk does not cross page boundary; very long page subdivides per `target_tokens`. +- `p7-1-pdf-text-extractor.md`: phase epic §9.2, §3.4 SourceSpan::Page. Allowed: `kebab-core`, `kebab-config`, `pdf-extract`, `lopdf` (page metadata). `depends_on: [p0-1, p1-6]`. Extractor for `MediaType::Pdf` produces a `CanonicalDocument` with one `Paragraph` per page, `SourceSpan::Page`. Failed-text pages are emitted as paragraphs with empty text and a `Provenance` warning marking them as scanned candidates. Tests: page count, span correctness, failure handling. +- `p7-2-pdf-page-chunker.md`: phase epic §9.2, §3.5, §0 Q3 citation. Allowed: `kebab-core`, `kebab-config`. New chunker version `pdf-page-v1` that respects page boundaries. `depends_on: [p7-1]`. Tests: chunk does not cross page boundary; very long page subdivides per `target_tokens`. - [ ] **Step 1, 2, 3** as in C3. @@ -1242,7 +1242,7 @@ git add tasks/p7 && git -c user.name=kb -c user.email=kb@local commit -m "tasks: `mkdir -p tasks/p8` and create: -- `p8-1-whisper-adapter.md`: phase epic §9.3, §3.4 AudioRefBlock + `Transcript`. Allowed: `kb-core`, `kb-config`, whisper.cpp Rust binding (`whisper-rs`) or sidecar binary. `depends_on: [p0-1, p1-6]`. Implements `Transcriber` trait. Default model `large-v3` via config; tests use a tiny model (e.g., `base.en`) for speed. Tests: monotone segment timestamps, language detection populated, deterministic transcript on fixed audio. +- `p8-1-whisper-adapter.md`: phase epic §9.3, §3.4 AudioRefBlock + `Transcript`. Allowed: `kebab-core`, `kebab-config`, whisper.cpp Rust binding (`whisper-rs`) or sidecar binary. 
`depends_on: [p0-1, p1-6]`. Implements `Transcriber` trait. Default model `large-v3` via config; tests use a tiny model (e.g., `base.en`) for speed. Tests: monotone segment timestamps, language detection populated, deterministic transcript on fixed audio. - `p8-2-segment-chunker.md`: phase epic §9.3, §3.5. New `audio-segment-v1` chunker that groups segments up to `target_tokens` with priority on speaker turn boundaries (when present). `depends_on: [p8-1]`. Tests: chunk timestamp == first/last segment timestamp; speaker change forces split. - [ ] **Step 1, 2, 3** as in C3. @@ -1258,11 +1258,11 @@ git add tasks/p8 && git -c user.name=kb -c user.email=kb@local commit -m "tasks: `mkdir -p tasks/p9` and create: -- `p9-1-tui-library.md`: phase epic §16.2, §3.7. Allowed: `kb-core`, `kb-app` only (UI law). `ratatui`, `crossterm`. `depends_on: [p1-6]`. Library list view + tag filter. Tests: snapshot of rendered frame against fixture corpus list. +- `p9-1-tui-library.md`: phase epic §16.2, §3.7. Allowed: `kebab-core`, `kebab-app` only (UI law). `ratatui`, `crossterm`. `depends_on: [p1-6]`. Library list view + tag filter. Tests: snapshot of rendered frame against fixture corpus list. - `p9-2-tui-search.md`: phase epic §16.2, §1.5. Allowed: same as p9-1. `depends_on: [p2-2, p3-4]`. Search input + result list + preview pane; `Enter` triggers external editor jump (`$EDITOR + `). Tests: search results render; `g` keybinding constructs the correct editor command. - `p9-3-tui-ask.md`: phase epic §16.2, §1.1, §1.2. Allowed: same. `depends_on: [p4-3]`. Ask pane shows streaming tokens; `--explain` toggle. Tests: streaming render, refusal render. - `p9-4-tui-inspect.md`: §1.6 inspect, §3.5. Allowed: same. `depends_on: [p1-6, p3-3]`. Renders Document and Chunk inspection per wire schemas 2.5/2.6. -- `p9-5-desktop-tauri.md`: phase epic §16.3, §1 all scenes. Allowed: `kb-core`, `kb-app`, Tauri backend; frontend stack TBD by user (vanilla TS by default). 
`depends_on: [p9-1, p9-2, p9-3, p9-4]`. Backend exposes Tauri commands that wrap `kb-app` 1:1. Source viewer per medium (Markdown render, PDF page, image with region overlay, audio with seek). Tests: backend command unit tests (no frontend e2e in this task). +- `p9-5-desktop-tauri.md`: phase epic §16.3, §1 all scenes. Allowed: `kebab-core`, `kebab-app`, Tauri backend; frontend stack TBD by user (vanilla TS by default). `depends_on: [p9-1, p9-2, p9-3, p9-4]`. Backend exposes Tauri commands that wrap `kebab-app` 1:1. Source viewer per medium (Markdown render, PDF page, image with region overlay, audio with seek). Tests: backend command unit tests (no frontend e2e in this task). - [ ] **Step 1, 2, 3** as in C3. @@ -1351,5 +1351,5 @@ git -c user.name=kb -c user.email=kb@local commit -m "tasks: update INDEX with f - [ ] `tasks/p0/`, `tasks/p1/` … `tasks/p9/` exist with the component spec files listed above. - [ ] Every component spec contains: frontmatter (with `contract_sections`), Allowed/Forbidden, Inputs, Outputs, Public surface, Behavior contract, Storage/wire effects, Test plan, Definition of Done, Out of scope, Risks/notes. - [ ] `tasks/INDEX.md` lists every component task. -- [ ] No new domain types introduced inside any component spec — every type referenced is defined in [docs/superpowers/specs/2026-04-27-kb-final-form-design.md](../specs/2026-04-27-kb-final-form-design.md). +- [ ] No new domain types introduced inside any component spec — every type referenced is defined in [docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](../specs/2026-04-27-kebab-final-form-design.md). - [ ] All commits authored sequentially per task; rollback is per-task. 
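The `p3-4-hybrid-fusion.md` task above asks for RRF fusion that fills `lexical_rank`, `vector_rank`, and `fusion_score` deterministically. A minimal sketch of that step, assuming the README's normalization (divide by `2 / (k_rrf + 1)`) and treating plain chunk-id strings as the input lists; `FusedHit`, `slot`, and `rrf_fuse` are illustrative names, not the names used in `kebab-search`:

```rust
use std::collections::HashMap;

/// One fused result. Field names mirror the plan's RetrievalDetail fields,
/// but this struct itself is a hypothetical stand-in.
#[derive(Debug, Clone, PartialEq)]
pub struct FusedHit {
    pub chunk_id: String,
    pub lexical_rank: Option<usize>, // 1-based rank in the lexical list
    pub vector_rank: Option<usize>,  // 1-based rank in the vector list
    pub fusion_score: f64,           // normalized to [0, 1]
}

/// Returns the entry for `id`, creating an empty one on first sight.
fn slot<'a>(by_id: &'a mut HashMap<String, FusedHit>, id: &str) -> &'a mut FusedHit {
    by_id.entry(id.to_string()).or_insert_with(|| FusedHit {
        chunk_id: id.to_string(),
        lexical_rank: None,
        vector_rank: None,
        fusion_score: 0.0,
    })
}

/// Reciprocal Rank Fusion over two ranked id lists (best hit first).
pub fn rrf_fuse(lexical: &[&str], vector: &[&str], k_rrf: f64) -> Vec<FusedHit> {
    let mut by_id: HashMap<String, FusedHit> = HashMap::new();
    for (i, id) in lexical.iter().enumerate() {
        let hit = slot(&mut by_id, id);
        if hit.lexical_rank.is_none() {
            hit.lexical_rank = Some(i + 1); // keep the best rank on duplicates
        }
    }
    for (i, id) in vector.iter().enumerate() {
        let hit = slot(&mut by_id, id);
        if hit.vector_rank.is_none() {
            hit.vector_rank = Some(i + 1);
        }
    }

    // Each list contributes 1 / (k + rank); a missing rank contributes 0.
    let term = |rank: Option<usize>| rank.map_or(0.0, |r| 1.0 / (k_rrf + r as f64));
    // Largest possible raw score: rank 1 in both lists.
    let max_raw = 2.0 / (k_rrf + 1.0);

    let mut hits: Vec<FusedHit> = by_id
        .into_values()
        .map(|mut h| {
            h.fusion_score = (term(h.lexical_rank) + term(h.vector_rank)) / max_raw;
            h
        })
        .collect();

    // Sort by score descending, then by chunk_id so ties are deterministic.
    hits.sort_by(|a, b| {
        b.fusion_score
            .total_cmp(&a.fusion_score)
            .then_with(|| a.chunk_id.cmp(&b.chunk_id))
    });
    hits
}
```

Calling `rrf_fuse(&["a", "b"], &["b", "a"], 60.0)` gives both chunks the same score, and the id tie-break keeps the order stable across runs, which is what the task's "deterministic" test needs.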
diff --git a/docs/superpowers/specs/2026-04-27-kebab-final-form-design.md b/docs/superpowers/specs/2026-04-27-kebab-final-form-design.md index 5f14b11..212a699 100644 --- a/docs/superpowers/specs/2026-04-27-kebab-final-form-design.md +++ b/docs/superpowers/specs/2026-04-27-kebab-final-form-design.md @@ -3,7 +3,7 @@ title: "KB v1 최종 결과물 형태 — Frozen Design" date: 2026-04-27 status: frozen purpose: 작은 단위 분해 작업 시 spec 변경을 막기 위한 단단한 contract 동결 -source_report: ../../../kb_local_rust_report.md +source_report: ../../../kebab_local_rust_report.md related_tasks: ../../../tasks/INDEX.md --- @@ -11,7 +11,7 @@ related_tasks: ../../../tasks/INDEX.md 이 문서는 사용자가 만족할 **최종 결과물의 매우 구체적 형태**를 동결한다. 각 phase 의 task 분해는 이 contract 위에서 수행되며, 이 문서가 바뀌지 않는 한 task 들의 인터페이스는 변하지 않는다. -전제 보고서는 [`kb_local_rust_report.md`](../../../kb_local_rust_report.md). 그 보고서가 *방향*과 *근거*를 제공하며, 이 문서가 *형태*를 못박는다. +전제 보고서는 [`kebab_local_rust_report.md`](../../../kebab_local_rust_report.md). 그 보고서가 *방향*과 *근거*를 제공하며, 이 문서가 *형태*를 못박는다. 
--- @@ -20,7 +20,7 @@ related_tasks: ../../../tasks/INDEX.md | # | 결정 | 값 | 근거 | |---|------|-----|------| | Q1 | scope 우선순위 | UX → Data 역도출 | 사용자 만족이 spec 안정성 lever | -| Q2 | headline UX | `kb ask` 답변 화면 | 검색/citation/RAG/refusal/모델메타 모두 노출 | +| Q2 | headline UX | `kebab ask` 답변 화면 | 검색/citation/RAG/refusal/모델메타 모두 노출 | | – | ask 기본 형식 | inline numeric refs `[1]…[n]` + footer | 일상 가독성 | | – | ask `--explain` | per-claim 분해 + verbose footer + retrieval trace | 디버그 단일 플래그 | | Q3 | citation 문자열 | URI fragment (`path#k=v…`, W3C Media Fragments) | 표준 정합 + Windows path 안전 + 브라우저 자동 스크롤 | @@ -34,7 +34,7 @@ related_tasks: ../../../tasks/INDEX.md | Q10 | workspace | single root + XDG layout | personal v1 적정 | | – | asset 보존 | content-addressable copy, `copy_threshold_mb=100` 초과 시 reference + checksum | reproducibility + 디스크 절감 | | – | wire 버전 | additive within `vN`, breaking → `vN+1` | 외부 깨짐 방지 | -| – | ignore | gitignore 문법 + `.kbignore` | 익숙함 | +| – | ignore | gitignore 문법 + `.kebabignore` | 익숙함 | | – | 에러 | thiserror per crate, anyhow at boundary | 추적성 + UX | | – | sync | watch=false default | v1 명시 ingest | @@ -42,43 +42,43 @@ related_tasks: ../../../tasks/INDEX.md ## 1. Headline UX scenes -### 1.1 `kb ask` (default) +### 1.1 `kebab ask` (default) ```text -$ kb ask "Markdown chunking 규칙은?" +$ kebab ask "Markdown chunking 규칙은?" heading boundary 우선 [1]. code block 중간 분할 금지 [2]. table 가능한 한 단일 chunk 유지 [2]. 긴 section 은 paragraph 단위로 분할 [1]. chunk 마다 heading_path 와 source_span 보존 [1]. ───────────────────────────────────────────────────────── -[1] notes/rust/kb-architecture.md#L661-L672 +[1] notes/rust/kebab-architecture.md#L661-L672 §14 Chunking 정책 -[2] notes/rust/kb-architecture.md#L665-L668 +[2] notes/rust/kebab-architecture.md#L665-L668 §14 Chunking 정책 grounded ✓ qwen2.5:14b-instruct rag-v1 3 chunks ``` -### 1.2 `kb ask --explain` +### 1.2 `kebab ask --explain` ```text -$ kb ask --explain "Markdown chunking 규칙은?" +$ kebab ask --explain "Markdown chunking 규칙은?" 
▎ heading boundary 우선 - └ notes/rust/kb-architecture.md#L662 + └ notes/rust/kebab-architecture.md#L662 「heading boundary를 우선한다」 ▎ code block 중간 분할 금지 - └ notes/rust/kb-architecture.md#L663 + └ notes/rust/kebab-architecture.md#L663 「code block은 중간에서 자르지 않는다」 ▎ table 단일 chunk 유지 - └ notes/rust/kb-architecture.md#L664 + └ notes/rust/kebab-architecture.md#L664 「table은 가능한 한 하나의 chunk로」 ▎ heading_path / source_span 보존 - └ notes/rust/kb-architecture.md#L668-L670 + └ notes/rust/kebab-architecture.md#L668-L670 retrieval trace query "Markdown chunking 규칙은?" @@ -87,8 +87,8 @@ retrieval trace threshold (gate) 0.30 → top-1 0.82 pass fusion rrf (k=60) chunks (used) 3 / 8 returned - #1 0.82 notes/rust/kb-architecture.md#L661-L672 bm25=12.4 vec=0.78 - #2 0.78 notes/rust/kb-architecture.md#L692-L713 bm25=10.1 vec=0.74 + #1 0.82 notes/rust/kebab-architecture.md#L661-L672 bm25=12.4 vec=0.78 + #2 0.78 notes/rust/kebab-architecture.md#L692-L713 bm25=10.1 vec=0.74 #3 0.55 guides/markdown-style.md#L4-L18 bm25=8.2 vec=0.61 grounded ✓ qwen2.5:14b-instruct rag-v1 3 chunks @@ -96,10 +96,10 @@ prompt 1184 tokens completion 312 tokens latency 1842 ms embedding multilingual-e5-small index v1.0 ``` -### 1.3 `kb ask` (refusal — score gate) +### 1.3 `kebab ask` (refusal — score gate) ```text -$ kb ask "당신의 회사 매출은?" +$ kebab ask "당신의 회사 매출은?" 근거 부족. KB 에 해당 내용 없음. 가까운 후보 (모두 임계 0.30 미만): @@ -108,10 +108,10 @@ $ kb ask "당신의 회사 매출은?" grounded ✗ qwen2.5:14b-instruct rag-v1 0 chunks used ``` -### 1.4 `kb ask` (refusal — LLM self-judge) +### 1.4 `kebab ask` (refusal — LLM self-judge) ```text -$ kb ask "이 책의 23쪽 결론은?" +$ kebab ask "이 책의 23쪽 결론은?" 근거 부족. 제공된 chunk 중 결론 내용 없음. 검색은 됨, LLM 이 결론 부재 판단: @@ -121,17 +121,17 @@ $ kb ask "이 책의 23쪽 결론은?" grounded ✗ qwen2.5:14b-instruct rag-v1 3 chunks searched, 0 grounded ``` -### 1.5 `kb search` (dense) +### 1.5 `kebab search` (dense) ```text -$ kb search "Markdown chunking 규칙" +$ kebab search "Markdown chunking 규칙" -1. 
0.82 notes/rust/kb-architecture.md#L661-L672 +1. 0.82 notes/rust/kebab-architecture.md#L661-L672 §14 Chunking 정책 heading boundary 우선. code block 중간 분할 금지. table 가능한 한 단일 chunk… -2. 0.71 notes/rust/kb-architecture.md#L692-L713 +2. 0.71 notes/rust/kebab-architecture.md#L692-L713 §15 검색과 RAG 정책 검색은 처음부터 hybrid 로 설계하되 구현은 단계적… @@ -142,7 +142,7 @@ $ kb search "Markdown chunking 규칙" 3 hits hybrid index v1.0 bm25+e5-small/RRF ``` -### 1.6 `kb search --explain` +### 1.6 `kebab search --explain` 각 hit 아래 추가: @@ -174,8 +174,8 @@ $ kb search "Markdown chunking 규칙" { "schema_version": "citation.v1", "kind": "line|page|region|caption|time", - "path": "notes/rust/kb.md", - "uri": "notes/rust/kb.md#L12-L34", + "path": "notes/rust/kebab.md", + "uri": "notes/rust/kebab.md#L12-L34", "line": { "start": 12, "end": 34, "section": "§14 Chunking 정책" }, "page": { "page": 13, "section": "Experiment Setup" }, @@ -196,7 +196,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는 "score": 0.82, "chunk_id": "9b4a8c1e7d3f2a05", "doc_id": "3f9a2c10ee4d6b78", - "doc_path": "notes/rust/kb-architecture.md", + "doc_path": "notes/rust/kebab-architecture.md", "heading_path": ["아키텍처", "Chunking 정책"], "section_label": "§14 Chunking 정책", "snippet": "heading boundary 우선. code block 중간 분할 금지…", @@ -261,7 +261,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는 { "kind": "new|updated|skipped|error", "doc_id": "3f9a2c10ee4d6b78", - "doc_path": "notes/rust/kb-architecture.md", + "doc_path": "notes/rust/kebab-architecture.md", "asset_id": "8c1e7d3f2a05", "byte_len": 41822, "block_count": 184, @@ -277,13 +277,13 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는 `--summary-only` 시 `items: null`. 
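The `citation.v1` example above carries both `path` and a `uri` such as `notes/rust/kebab.md#L12-L34`, and the `--explain` scene also shows the single-line form `#L662`. A small sketch of formatting and parsing the `line` kind only; `LineCitation` and its methods are hypothetical helpers, and percent-encoding of unusual paths is deliberately left out:

```rust
/// Hypothetical in-memory form of a `kind = "line"` citation.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LineCitation {
    pub path: String,
    pub start: u32,
    pub end: u32,
}

impl LineCitation {
    /// Renders `path#L<start>-L<end>`, or `path#L<n>` for a single line.
    pub fn to_uri(&self) -> String {
        if self.start == self.end {
            format!("{}#L{}", self.path, self.start)
        } else {
            format!("{}#L{}-L{}", self.path, self.start, self.end)
        }
    }

    /// Parses the two forms produced by `to_uri`; anything else yields `None`.
    pub fn parse(uri: &str) -> Option<Self> {
        // Split on the last '#', so the fragment is everything after it.
        let (path, frag) = uri.rsplit_once('#')?;
        let frag = frag.strip_prefix('L')?;
        let (start, end) = match frag.split_once("-L") {
            Some((s, e)) => (s.parse::<u32>().ok()?, e.parse::<u32>().ok()?),
            None => {
                let n = frag.parse::<u32>().ok()?;
                (n, n)
            }
        };
        // Line numbers are 1-based and the range must not be inverted.
        if path.is_empty() || start == 0 || end < start {
            return None;
        }
        Some(Self { path: path.to_string(), start, end })
    }
}
```

Other citation kinds (`page`, `region`, `caption`, `time`) would need their own fragment syntax, which this sketch does not attempt to guess.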
-### 2.5 DocSummary (`kb list docs`) +### 2.5 DocSummary (`kebab list docs`) ```json { "schema_version": "doc_summary.v1", "doc_id": "3f9a2c10ee4d6b78", - "doc_path": "notes/rust/kb-architecture.md", + "doc_path": "notes/rust/kebab-architecture.md", "title": "Rust 로컬 Knowledge Base 설계", "lang": "ko", "tags": ["knowledge-base", "rust", "rag"], @@ -305,7 +305,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는 "schema_version": "chunk_inspection.v1", "chunk_id": "9b4a8c1e7d3f2a05", "doc_id": "3f9a2c10ee4d6b78", - "doc_path": "notes/rust/kb-architecture.md", + "doc_path": "notes/rust/kebab-architecture.md", "heading_path": ["아키텍처", "Chunking 정책"], "text": "heading boundary 우선…", "source_spans": [{ "kind": "line", "start": 661, "end": 672 }], @@ -325,9 +325,9 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는 "schema_version": "doctor.v1", "ok": true, "checks": [ - { "name": "config_loaded", "ok": true, "detail": "~/.config/kb/config.toml" }, - { "name": "data_dir_writable", "ok": true, "detail": "~/.local/share/kb" }, - { "name": "sqlite_open", "ok": true, "detail": "kb.sqlite (schema v1)" }, + { "name": "config_loaded", "ok": true, "detail": "~/.config/kebab/config.toml" }, + { "name": "data_dir_writable", "ok": true, "detail": "~/.local/share/kebab" }, + { "name": "sqlite_open", "ok": true, "detail": "kebab.sqlite (schema v1)" }, { "name": "lancedb_open", "ok": true, "detail": "lancedb/" }, { "name": "embedding_model", "ok": true, "detail": "multilingual-e5-small (384d)" }, { "name": "ollama_reachable", "ok": true, "detail": "http://127.0.0.1:11434" }, @@ -346,7 +346,7 @@ variant 별 해당 키만 채움. `path` 와 `uri` 는 항상 채움 (`uri` 는 --- -## 3. 도메인 모델 (kb-core) +## 3. 도메인 모델 (kebab-core) ### 3.1 Newtype IDs @@ -581,7 +581,7 @@ pub struct RetrievalDetail { ### 3.7a Forward-declared types -`Block::ImageRef` / `AudioRef` variant 은 v1 부터 존재하나, 그 안의 `ocr` / `caption` / `transcript` 필드는 P1 에선 항상 `None`. 
다음 타입은 `kb-core` 에 stub 으로 둠 (최종 도메인 모델 슬롯): +`Block::ImageRef` / `AudioRef` variant 은 v1 부터 존재하나, 그 안의 `ocr` / `caption` / `transcript` 필드는 P1 에선 항상 `None`. 다음 타입은 `kebab-core` 에 stub 으로 둠 (최종 도메인 모델 슬롯): ```rust pub struct OcrText { pub joined: String, pub regions: Vec, pub engine: String, pub engine_version: String } @@ -596,41 +596,41 @@ pub enum ImageType { Png, Jpeg, Webp, Gif, Tiff, Other(String) } pub enum AudioType { M4a, Mp3, Wav, Flac, Ogg, Other(String) } ``` -`ExtractConfig`, `DocFilter`, `JobKind`, `JobStatus`, `JobFilter`, `JobRow`, `JobId`, `VectorRecord`, `VectorHit`, `RefusalSignal`, `NoHitSignal`, `DoctorUnhealthy` 도 `kb-core` 에 정의 (자세한 필드는 사용 시 결정, 이 spec 에서 forward-ref 만 보장). +`ExtractConfig`, `DocFilter`, `JobKind`, `JobStatus`, `JobFilter`, `JobRow`, `JobId`, `VectorRecord`, `VectorHit`, `RefusalSignal`, `NoHitSignal`, `DoctorUnhealthy` 도 `kebab-core` 에 정의 (자세한 필드는 사용 시 결정, 이 spec 에서 forward-ref 만 보장). `OffsetDateTime` 는 `time::OffsetDateTime`, `Result` 는 crate-local alias. -### 3.7b Parser intermediate types — `kb-parse-types` +### 3.7b Parser intermediate types — `kebab-parse-types` -Parser 의 *중간* 표현 (`ParsedBlock` 류) 은 `kb-core` 가 아니라 별도의 thin crate **`kb-parse-types`** 에 둔다. 이유: `kb-normalize` 는 medium-agnostic 한 ID/Provenance lift 를 책임지고 어떤 parser 도 직접 import 하면 안 된다. 그러나 normalize 에 들어오는 입력 타입이 어딘가에 정의되어야 하는데, 그것을 `kb-core` 에 박으면 (a) parser-별 ParsedBlock 변종 (`ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`) 이 향후 합류할 때 core 의 namespace 가 폭발하고, (b) parser 의 의미 변경이 core 변경이 되어 모든 의존자가 영향을 받는다. +Parser 의 *중간* 표현 (`ParsedBlock` 류) 은 `kebab-core` 가 아니라 별도의 thin crate **`kebab-parse-types`** 에 둔다. 이유: `kebab-normalize` 는 medium-agnostic 한 ID/Provenance lift 를 책임지고 어떤 parser 도 직접 import 하면 안 된다. 그러나 normalize 에 들어오는 입력 타입이 어딘가에 정의되어야 하는데, 그것을 `kebab-core` 에 박으면 (a) parser-별 ParsedBlock 변종 (`ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`) 이 향후 합류할 때 core 의 namespace 가 폭발하고, (b) parser 의 의미 변경이 core 변경이 되어 모든 의존자가 영향을 받는다. 
-`kb-parse-types` 는 이 둘 사이의 **유일한 layer** 다. 의존 그래프: +`kebab-parse-types` 는 이 둘 사이의 **유일한 layer** 다. 의존 그래프: ```text -kb-core (도메인 모델 — Block, Chunk, SourceSpan, IDs, …) +kebab-core (도메인 모델 — Block, Chunk, SourceSpan, IDs, …) ▲ │ -kb-parse-types (parser 중간 표현 — ParsedBlock, ParsedImageRegion[P+], ParsedPdfPage[P+], ParsedAudioSegment[P+], Inline) +kebab-parse-types (parser 중간 표현 — ParsedBlock, ParsedImageRegion[P+], ParsedPdfPage[P+], ParsedAudioSegment[P+], Inline) ▲ ▲ │ │ -kb-parse-md, kb-parse-pdf, kb-normalize -kb-parse-image, kb-parse-audio +kebab-parse-md, kebab-parse-pdf, kebab-normalize +kebab-parse-image, kebab-parse-audio ``` -`kb-parse-types` 는: -- `kb-core` 에만 의존 (`Block`, `SourceSpan`, `Lang` 등 도메인 타입 사용). -- 다른 어떤 `kb-*` 에도 의존하지 않는다. +`kebab-parse-types` 는: +- `kebab-core` 에만 의존 (`Block`, `SourceSpan`, `Lang` 등 도메인 타입 사용). +- 다른 어떤 `kebab-*` 에도 의존하지 않는다. - 어떤 parser 의 구체 라이브러리 (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`) 에도 의존하지 않는다. - serde + thiserror 정도의 외부 의존만 가진다. P1 에서 정의되는 타입: ```rust -// kb-parse-types — depends on kb-core only. +// kebab-parse-types — depends on kebab-core only. 
pub struct ParsedBlock { pub kind: ParsedBlockKind, pub heading_path: Vec, - pub source_span: kb_core::SourceSpan, + pub source_span: kebab_core::SourceSpan, pub payload: ParsedPayload, } @@ -638,11 +638,11 @@ pub enum ParsedBlockKind { Heading, Paragraph, List, Code, Table, Quote, ImageRe pub enum ParsedPayload { Heading { level: u8, text: String }, - Paragraph { text: String, inlines: Vec }, - List { ordered: bool, items: Vec> }, + Paragraph { text: String, inlines: Vec }, + List { ordered: bool, items: Vec> }, Code { lang: Option, code: String }, Table { headers: Vec, rows: Vec> }, - Quote { text: String, inlines: Vec }, + Quote { text: String, inlines: Vec }, ImageRef { src: String, alt: String }, AudioRef { src: String }, // duration_ms filled by extractor before chunking } @@ -651,7 +651,7 @@ pub struct Warning { pub kind: WarningKind, pub note: String } pub enum WarningKind { MalformedFrontmatter, MalformedTable, EncodingFallback, ExtractFailed } ``` -`Inline` 은 `kb-core` (§3.4) 에 있는 도메인 타입. `kb-parse-types` 는 그것을 *참조* 만 한다 — 같은 의미를 두 crate 에 중복 정의하지 않는다 (그러면 normalize 가 identity-conversion 을 해야 해서 무의미). +`Inline` 은 `kebab-core` (§3.4) 에 있는 도메인 타입. `kebab-parse-types` 는 그것을 *참조* 만 한다 — 같은 의미를 두 crate 에 중복 정의하지 않는다 (그러면 normalize 가 identity-conversion 을 해야 해서 무의미). P6/P7/P8 에서 추가될 타입 (forward-ref): @@ -661,7 +661,7 @@ pub struct ParsedPdfPage { pub page: u32, pub text: String } pub struct ParsedAudioSegment { pub start_ms: u64, pub end_ms: u64, pub text: String } ``` -→ 새 medium 추가 시 `kb-core::Block` 변종은 변하지 않고, `kb-parse-types` 만 확장된다. +→ 새 medium 추가 시 `kebab-core::Block` 변종은 변하지 않고, `kebab-parse-types` 만 확장된다. 
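To make the `ParsedAudioSegment` forward-ref above more concrete, here is a rough sketch of how an `audio-segment-v1` style chunker might group segments up to `target_tokens` while keeping the chunk timestamps equal to the first and last segment timestamps. `AudioChunk`, `approx_tokens`, and `group_segments` are assumptions made for illustration; a real chunker would use the project's tokenizer and would also split on speaker turns, which this struct has no field for:

```rust
/// Same shape as the forward-declared type in the spec.
#[derive(Debug, Clone, PartialEq)]
pub struct ParsedAudioSegment {
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
}

/// Hypothetical output type: one chunk spanning one or more segments.
#[derive(Debug, Clone, PartialEq)]
pub struct AudioChunk {
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
}

/// Stand-in token counter (whitespace words); an assumption, not the real tokenizer.
fn approx_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Greedily packs consecutive segments until the next one would exceed the budget.
/// A single oversized segment still becomes its own chunk (no subdivision here).
pub fn group_segments(segments: &[ParsedAudioSegment], target_tokens: usize) -> Vec<AudioChunk> {
    let mut chunks = Vec::new();
    let mut current: Option<AudioChunk> = None;
    let mut current_tokens = 0usize;

    for seg in segments {
        let seg_tokens = approx_tokens(&seg.text);

        // Flush the open chunk if this segment does not fit into it.
        if current.is_some() && current_tokens + seg_tokens > target_tokens {
            if let Some(done) = current.take() {
                chunks.push(done);
            }
            current_tokens = 0;
        }

        match current.as_mut() {
            Some(chunk) => {
                // Extend the open chunk: end time follows the last segment.
                chunk.end_ms = seg.end_ms;
                chunk.text.push(' ');
                chunk.text.push_str(&seg.text);
            }
            None => {
                // Start a new chunk: start time comes from the first segment.
                current = Some(AudioChunk {
                    start_ms: seg.start_ms,
                    end_ms: seg.end_ms,
                    text: seg.text.clone(),
                });
            }
        }
        current_tokens += seg_tokens;
    }

    if let Some(done) = current {
        chunks.push(done);
    }
    chunks
}
```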
### 3.8 Answer / RAG types @@ -995,28 +995,28 @@ CREATE TABLE eval_query_results ( | 종류 | 기본 위치 | |------|-----------| | 워크스페이스 | `~/KnowledgeBase/` | -| config | `~/.config/kb/config.toml` | -| data | `~/.local/share/kb/` | -| cache | `~/.cache/kb/` | -| state (logs) | `~/.local/state/kb/` | +| config | `~/.config/kebab/config.toml` | +| data | `~/.local/share/kebab/` | +| cache | `~/.cache/kebab/` | +| state (logs) | `~/.local/state/kebab/` | -`~`, `$HOME`, `${KB_*}` expand. 절대 path 정규화 후 사용. +`~`, `$HOME`, `${KEBAB_*}` expand. 절대 path 정규화 후 사용. ### 6.2 Workspace 구조 ``` ~/KnowledgeBase/ ├── inbox/ notes/ papers/ photos/ recordings/ -└── .kbignore +└── .kebabignore ``` -`.kbignore` 와 `config.workspace.exclude` 합집합. +`.kebabignore` 와 `config.workspace.exclude` 합집합. ### 6.3 Data dir 구조 ``` -~/.local/share/kb/ -├── kb.sqlite (+ -wal, -shm) +~/.local/share/kebab/ +├── kebab.sqlite (+ -wal, -shm) ├── lancedb/ │ └── chunk_embeddings__.lance/ ├── assets// # shard @@ -1025,7 +1025,7 @@ CREATE TABLE eval_query_results ( └── runs// # eval per_query.jsonl + report.md ``` -### 6.4 Config (`~/.config/kb/config.toml`) — frozen schema +### 6.4 Config (`~/.config/kebab/config.toml`) — frozen schema ```toml schema_version = 1 @@ -1036,8 +1036,8 @@ include = ["**/*.md"] exclude = [".git/**", "node_modules/**", ".obsidian/**"] [storage] -data_dir = "${XDG_DATA_HOME:-~/.local/share}/kb" -sqlite = "{data_dir}/kb.sqlite" +data_dir = "${XDG_DATA_HOME:-~/.local/share}/kebab" +sqlite = "{data_dir}/kebab.sqlite" vector_dir = "{data_dir}/lancedb" asset_dir = "{data_dir}/assets" artifact_dir = "{data_dir}/artifacts" @@ -1086,15 +1086,15 @@ max_context_tokens = 8000 config 우선순위: default → file → env (`KB_
_`) → CLI flag. -### 6.5 `kb init` 출력 +### 6.5 `kebab init` 출력 ```text -$ kb init -created ~/.config/kb/config.toml -created ~/.local/share/kb/ +$ kebab init +created ~/.config/kebab/config.toml +created ~/.local/share/kebab/ created ~/KnowledgeBase/ -opened ~/.local/share/kb/kb.sqlite (schema v1) -hint edit ~/.config/kb/config.toml then `kb ingest ~/KnowledgeBase` +opened ~/.local/share/kebab/kebab.sqlite (schema v1) +hint edit ~/.config/kebab/config.toml then `kebab ingest ~/KnowledgeBase` ``` 기존 파일 보존, `--force` 명시 필요. @@ -1107,7 +1107,7 @@ hint edit ~/.config/kb/config.toml then `kb ingest ~/KnowledgeBase` --- -## 7. Trait contracts (kb-core) +## 7. Trait contracts (kebab-core) ### 7.1 입출력 보조 @@ -1210,34 +1210,34 @@ pub trait JobRepo { ## 8. 모듈 경계 (Allowed / Forbidden) ```text -kb-cli, kb-tui, kb-desktop - └─> kb-app - ├─> kb-source-fs - ├─> kb-parse-md / kb-parse-pdf / kb-parse-image / kb-parse-audio - │ └─> kb-parse-types (parser intermediate) - ├─> kb-normalize - │ └─> kb-parse-types - ├─> kb-chunk - ├─> kb-store-sqlite (DocumentStore, JobRepo, Retriever[lexical]) - ├─> kb-store-vector (VectorStore) - ├─> kb-embed-local - ├─> kb-search (Retriever[hybrid]) - ├─> kb-llm-local - ├─> kb-rag - ├─> kb-eval - └─> kb-config - └─> kb-core (모두 의존) +kebab-cli, kebab-tui, kebab-desktop + └─> kebab-app + ├─> kebab-source-fs + ├─> kebab-parse-md / kebab-parse-pdf / kebab-parse-image / kebab-parse-audio + │ └─> kebab-parse-types (parser intermediate) + ├─> kebab-normalize + │ └─> kebab-parse-types + ├─> kebab-chunk + ├─> kebab-store-sqlite (DocumentStore, JobRepo, Retriever[lexical]) + ├─> kebab-store-vector (VectorStore) + ├─> kebab-embed-local + ├─> kebab-search (Retriever[hybrid]) + ├─> kebab-llm-local + ├─> kebab-rag + ├─> kebab-eval + └─> kebab-config + └─> kebab-core (모두 의존) ``` -`kb-parse-types` 는 `kb-core` 와 parsers/normalize 사이의 thin layer (§3.7b 참조). 
parser-별 중간 표현 (`ParsedBlock`, `ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`, `Inline`) 을 한 곳에 모아 (a) `kb-core` 의 namespace 폭발을 막고 (b) `kb-normalize` 가 parser 를 직접 import 하지 않게 한다. +`kebab-parse-types` 는 `kebab-core` 와 parsers/normalize 사이의 thin layer (§3.7b 참조). parser-별 중간 표현 (`ParsedBlock`, `ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`, `Inline`) 을 한 곳에 모아 (a) `kebab-core` 의 namespace 폭발을 막고 (b) `kebab-normalize` 가 parser 를 직접 import 하지 않게 한다. 핵심 금지: - UI → store/llm/parse 직접 의존 ✗ - parse-* → store/llm/embed ✗ -- parse-* → kb-normalize ✗ (단방향: parsers → kb-parse-types ← normalize) +- parse-* → kebab-normalize ✗ (단방향: parsers → kebab-parse-types ← normalize) - chunk → llm/embed ✗ - normalize → store / parse-* ✗ -- kb-parse-types → 어떤 parser/normalize/store/llm/embed/search/rag/ui ✗ (`kb-core` 만 의존) +- kebab-parse-types → 어떤 parser/normalize/store/llm/embed/search/rag/ui ✗ (`kebab-core` 만 의존) - 다른 store 와 cross-write ✗ `cargo deny` + workspace deny.toml + CI 체크로 강제. @@ -1270,7 +1270,7 @@ CI: ## 10. 에러 모델 + exit codes ```rust -// kb-core +// kebab-core pub enum CoreError { InvalidId, InvalidCitation, InvalidSpan, Malformed } // crate-local examples pub enum ParseMdError { Yaml(String), Encoding, Pulldown(String), Span } @@ -1278,7 +1278,7 @@ pub enum StoreError { Sqlx(rusqlite::Error), Migration(String), Conflict(String) pub enum LlmError { Unreachable, ModelNotPulled(String), Timeout, Stream(String) } ``` -Boundary (`kb-app`, `kb-cli`) 에서 `anyhow::Error` 합침. exit code 매핑: +Boundary (`kebab-app`, `kebab-cli`) 에서 `anyhow::Error` 합침. exit code 매핑: ```rust fn exit_code(err: &anyhow::Error) -> i32 { @@ -1295,17 +1295,17 @@ fn exit_code(err: &anyhow::Error) -> i32 { | `--verbose` | + anyhow chain | | `--debug` 또는 `RUST_LOG=debug` | + tracing target/level/span | -Refusal 은 에러 아님. `kb ask` 거절은 정상 stdout (Answer with grounded=false) + exit 1. +Refusal 은 에러 아님. `kebab ask` 거절은 정상 stdout (Answer with grounded=false) + exit 1. 
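Since a refusal is a normal result rather than an error, the two refusal layers from the plan (score gate, then citation validation) can be sketched as a pure function. This is only an illustration under assumptions: `Verdict`, `judge`, and `exit_code` are made-up names, the `[n]` scanner is hand-rolled instead of using the `regex` crate so the block stays std-only, an answer with no `[n]` markers is left as grounded because the source does not specify that case, and exit code `0` for a grounded answer is assumed (the spec only states `1` for a refusal):

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RefusalReason {
    ScoreGate,
    LlmSelfJudge,
}

/// Hypothetical summary of the grounding decision for one answer.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Verdict {
    pub grounded: bool,
    pub refusal_reason: Option<RefusalReason>,
    pub citations: Vec<usize>,
}

/// Collects distinct `[n]` markers in order of first appearance
/// (equivalent in intent to the regex `\[(\d+)\]`).
pub fn extract_citations(answer: &str) -> Vec<usize> {
    let bytes = answer.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'[' {
            let start = i + 1;
            let mut j = start;
            while j < bytes.len() && bytes[j].is_ascii_digit() {
                j += 1;
            }
            // Require at least one digit followed directly by ']'.
            if j > start && j < bytes.len() && bytes[j] == b']' {
                if let Ok(n) = answer[start..j].parse::<usize>() {
                    if !out.contains(&n) {
                        out.push(n);
                    }
                }
                i = j + 1;
                continue;
            }
        }
        i += 1;
    }
    out
}

/// Layer 1: score gate on the top-1 hit. Layer 2: every `[n]` must be in 1..=packed_chunks.
pub fn judge(top1_score: f64, gate: f64, answer: &str, packed_chunks: usize) -> Verdict {
    if top1_score < gate {
        return Verdict {
            grounded: false,
            refusal_reason: Some(RefusalReason::ScoreGate),
            citations: Vec::new(),
        };
    }
    let citations = extract_citations(answer);
    let all_valid = citations.iter().all(|&n| n >= 1 && n <= packed_chunks);
    Verdict {
        grounded: all_valid,
        refusal_reason: if all_valid { None } else { Some(RefusalReason::LlmSelfJudge) },
        citations,
    }
}

/// Refusal maps to exit 1 per the spec; 0 for a grounded answer is an assumption.
pub fn exit_code(verdict: &Verdict) -> i32 {
    if verdict.grounded { 0 } else { 1 }
}
```

With the values from the retrieval trace (top-1 `0.82`, gate `0.30`), an answer citing `[1]` and `[2]` out of 3 packed chunks passes both layers, while one citing `[7]` is turned into an `LlmSelfJudge` refusal.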
-Logging: `tracing` + `tracing-subscriber` + `tracing-appender` daily roll, `~/.local/state/kb/logs/`. structured (`trace_id`, `doc_id`, `chunk_id`). +Logging: `tracing` + `tracing-subscriber` + `tracing-appender` daily roll, `~/.local/state/kebab/logs/`. structured (`trace_id`, `doc_id`, `chunk_id`). -`kb doctor` 출력 (사람): +`kebab doctor` 출력 (사람): ```text -$ kb doctor -✓ config_loaded ~/.config/kb/config.toml -✓ data_dir_writable ~/.local/share/kb -✓ sqlite_open kb.sqlite (schema v1) +$ kebab doctor +✓ config_loaded ~/.config/kebab/config.toml +✓ data_dir_writable ~/.local/share/kebab +✓ sqlite_open kebab.sqlite (schema v1) ✓ lancedb_open lancedb/ ✓ embedding_model multilingual-e5-small (384d) ✓ ollama_reachable http://127.0.0.1:11434 @@ -1322,7 +1322,7 @@ $ kb doctor 이 문서가 동결 ↔ 다음 컴포넌트 분해 작업이 안전: - 모든 wire schema (`docs/wire-schema/v1/*.schema.json`) -- 모든 trait 시그니처 (kb-core) +- 모든 trait 시그니처 (kebab-core) - 모든 ID recipe (4.2) - SQLite DDL (5장) - Filesystem + config schema (6장) @@ -1334,7 +1334,7 @@ $ kb doctor **의도적으로 빠진 것 (out of scope, P+)**: - multi-workspace - watch mode -- desktop app `kb://` protocol handler +- desktop app `kebab://` protocol handler - LLM-as-judge eval - visual embedding (CLIP) - real-time collab diff --git a/kebab_local_rust_report.md b/kebab_local_rust_report.md index 3810dac..a53fb8c 100644 --- a/kebab_local_rust_report.md +++ b/kebab_local_rust_report.md @@ -17,11 +17,11 @@ urlcolor: blue 최종 방향은 다음 한 문장으로 요약할 수 있다. -> Markdown을 1등급 지식 소스로 삼고, 이미지, PDF, 음성은 각각 extractor adapter를 통해 동일한 `CanonicalDocument -> Chunk -> Embed -> Index -> Search -> RAG` 파이프라인으로 흘려보낸다. CLI, TUI, desktop app은 모두 같은 `kb-app` facade를 함수 호출로 사용한다. +> Markdown을 1등급 지식 소스로 삼고, 이미지, PDF, 음성은 각각 extractor adapter를 통해 동일한 `CanonicalDocument -> Chunk -> Embed -> Index -> Search -> RAG` 파이프라인으로 흘려보낸다. CLI, TUI, desktop app은 모두 같은 `kebab-app` facade를 함수 호출로 사용한다. 가장 먼저 만들 것은 채팅 UI가 아니다. 먼저 만들어야 할 것은 다음 7가지다. -1. `kb-core`의 도메인 모델과 trait 계약 +1. 
`kebab-core`의 도메인 모델과 trait 계약 2. deterministic ID 규칙 3. Markdown canonicalization 4. chunking policy @@ -66,16 +66,16 @@ urlcolor: blue Markdown files | v -kb-source-fs +kebab-source-fs | v -kb-parse-md +kebab-parse-md | v CanonicalDocument | v -kb-chunk +kebab-chunk | v Chunks @@ -88,15 +88,15 @@ SQLite metadata/FTS LanceDB vectors Raw asset store +--------------------+--------------------+ | v - kb-search + kebab-search | v - kb-rag + kebab-rag | +---------------+---------------+ | | | v v v - kb-cli kb-tui kb-desktop + kebab-cli kebab-tui kebab-desktop ``` 추후 확장 후 구조는 다음과 같다. @@ -133,23 +133,23 @@ Cargo workspace는 여러 package를 함께 관리하는 구조이며, 공통 `C [workspace] resolver = "3" members = [ - "crates/kb-core", - "crates/kb-config", - "crates/kb-source-fs", - "crates/kb-parse-md", - "crates/kb-normalize", - "crates/kb-chunk", - "crates/kb-store-sqlite", - "crates/kb-store-vector", - "crates/kb-embed", - "crates/kb-embed-local", - "crates/kb-search", - "crates/kb-llm", - "crates/kb-llm-local", - "crates/kb-rag", - "crates/kb-eval", - "crates/kb-app", - "crates/kb-cli" + "crates/kebab-core", + "crates/kebab-config", + "crates/kebab-source-fs", + "crates/kebab-parse-md", + "crates/kebab-normalize", + "crates/kebab-chunk", + "crates/kebab-store-sqlite", + "crates/kebab-store-vector", + "crates/kebab-embed", + "crates/kebab-embed-local", + "crates/kebab-search", + "crates/kebab-llm", + "crates/kebab-llm-local", + "crates/kebab-rag", + "crates/kebab-eval", + "crates/kebab-app", + "crates/kebab-cli" ] [workspace.package] @@ -173,7 +173,7 @@ tracing = "0.1" 초기 repo는 이렇게 잡는다. 
```text -kb/ +kebab/ Cargo.toml README.md docs/ @@ -191,70 +191,70 @@ kb/ nested-headings.md code-and-table.md crates/ - kb-core/ - kb-config/ - kb-source-fs/ - kb-parse-md/ - kb-normalize/ - kb-chunk/ - kb-store-sqlite/ - kb-store-vector/ - kb-embed/ - kb-embed-local/ - kb-search/ - kb-llm/ - kb-llm-local/ - kb-rag/ - kb-eval/ - kb-app/ - kb-cli/ + kebab-core/ + kebab-config/ + kebab-source-fs/ + kebab-parse-md/ + kebab-normalize/ + kebab-chunk/ + kebab-store-sqlite/ + kebab-store-vector/ + kebab-embed/ + kebab-embed-local/ + kebab-search/ + kebab-llm/ + kebab-llm-local/ + kebab-rag/ + kebab-eval/ + kebab-app/ + kebab-cli/ ``` 나중에 추가할 crate는 다음과 같다. ```text -crates/kb-parse-image/ -crates/kb-parse-pdf/ -crates/kb-parse-audio/ -crates/kb-rerank/ -crates/kb-tui/ -crates/kb-desktop/ +crates/kebab-parse-image/ +crates/kebab-parse-pdf/ +crates/kebab-parse-audio/ +crates/kebab-rerank/ +crates/kebab-tui/ +crates/kebab-desktop/ ``` 중요한 의존성 규칙은 다음과 같다. ```text -kb-cli, kb-tui, kb-desktop - -> kb-app - -> kb-index / kb-search / kb-rag - -> kb-core traits +kebab-cli, kebab-tui, kebab-desktop + -> kebab-app + -> kebab-index / kebab-search / kebab-rag + -> kebab-core traits -> concrete adapters ``` -UI crate는 절대로 parser, DB, LLM adapter를 직접 호출하지 않는다. 모든 user-facing command는 `kb-app` facade를 통해 호출한다. +UI crate는 절대로 parser, DB, LLM adapter를 직접 호출하지 않는다. 모든 user-facing command는 `kebab-app` facade를 통해 호출한다. # 5. 
컴포넌트 목록과 책임 | 컴포넌트 | 책임 | 초기 구현 | |---|---|---| -| `kb-core` | domain type, trait, error, ID 규칙 | 필수 | -| `kb-config` | config 파일 로딩, 기본값, 경로 확장 | 필수 | -| `kb-source-fs` | 로컬 폴더 scan, checksum, 변경 감지 | 필수 | -| `kb-parse-md` | Markdown -> structured document | 필수 | -| `kb-normalize` | parser output -> `CanonicalDocument` | 필수 | -| `kb-chunk` | block-aware chunking | 필수 | -| `kb-store-sqlite` | metadata, document, chunk, job, FTS | 필수 | -| `kb-store-vector` | vector upsert/search | P1 | -| `kb-embed` | embedding trait | P1 | -| `kb-embed-local` | local embedding adapter | P1 | -| `kb-search` | lexical, vector, hybrid retrieval | P1 | -| `kb-llm` | language model trait | P1 | -| `kb-llm-local` | Ollama 또는 llama.cpp adapter | P1 | -| `kb-rag` | context packing, answer, citation | P1 | -| `kb-eval` | golden query, regression test | P1 | -| `kb-cli` | command line interface | 필수 | -| `kb-tui` | terminal UI | P2 | -| `kb-desktop` | desktop app | P3 | +| `kebab-core` | domain type, trait, error, ID 규칙 | 필수 | +| `kebab-config` | config 파일 로딩, 기본값, 경로 확장 | 필수 | +| `kebab-source-fs` | 로컬 폴더 scan, checksum, 변경 감지 | 필수 | +| `kebab-parse-md` | Markdown -> structured document | 필수 | +| `kebab-normalize` | parser output -> `CanonicalDocument` | 필수 | +| `kebab-chunk` | block-aware chunking | 필수 | +| `kebab-store-sqlite` | metadata, document, chunk, job, FTS | 필수 | +| `kebab-store-vector` | vector upsert/search | P1 | +| `kebab-embed` | embedding trait | P1 | +| `kebab-embed-local` | local embedding adapter | P1 | +| `kebab-search` | lexical, vector, hybrid retrieval | P1 | +| `kebab-llm` | language model trait | P1 | +| `kebab-llm-local` | Ollama 또는 llama.cpp adapter | P1 | +| `kebab-rag` | context packing, answer, citation | P1 | +| `kebab-eval` | golden query, regression test | P1 | +| `kebab-cli` | command line interface | 필수 | +| `kebab-tui` | terminal UI | P2 | +| `kebab-desktop` | desktop app | P3 | # 6. 핵심 도메인 모델 @@ -398,10 +398,10 @@ Markdown frontmatter 기본 규약은 다음 정도로 시작한다. 
```yaml --- -id: rust-kb-architecture +id: rust-kebab-architecture title: Rust 로컬 Knowledge Base 설계 aliases: - - local kb + - local kebab - rust rag tags: - knowledge-base @@ -418,7 +418,7 @@ lang: ko Markdown citation은 line range를 기본으로 한다. ```text -notes/rust/kb.md:L12-L34 +notes/rust/kebab.md:L12-L34 ``` # 9. 이미지, PDF, 음성 확장 전략 @@ -534,13 +534,13 @@ Ollama 문서는 macOS Sonoma 이상에서 Apple M series CPU/GPU support를 언 초기 adapter는 다음처럼 둔다. ```text -kb-llm-local +kebab-llm-local - OllamaLanguageModel - later: LlamaCppLanguageModel - later: CandleLanguageModel ``` -Ollama가 내부적으로 local server를 쓰더라도, 프로젝트 아키텍처 관점에서는 HTTP MSA가 아니다. `kb-llm-local` 안에 캡슐화된 model adapter일 뿐이다. +Ollama가 내부적으로 local server를 쓰더라도, 프로젝트 아키텍처 관점에서는 HTTP MSA가 아니다. `kebab-llm-local` 안에 캡슐화된 model adapter일 뿐이다. ## 11.3 Local embedding @@ -583,10 +583,10 @@ M4 48GB MacBook은 개인용 local KB에 충분한 타겟이지만, indexing과 root = "~/KnowledgeBase" [storage] -sqlite_path = "~/.local/share/kb/kb.sqlite" -vector_path = "~/.local/share/kb/lancedb" -raw_asset_path = "~/.local/share/kb/assets" -artifact_path = "~/.local/share/kb/artifacts" +sqlite_path = "~/.local/share/kebab/kebab.sqlite" +vector_path = "~/.local/share/kebab/lancedb" +raw_asset_path = "~/.local/share/kebab/assets" +artifact_path = "~/.local/share/kebab/artifacts" [indexing] max_parallel_extractors = 2 @@ -609,7 +609,7 @@ context_tokens = 32768 # 13. Trait 계약 -컴포넌트는 trait으로 연결한다. 아래 계약을 `kb-core`에 둔다. +컴포넌트는 trait으로 연결한다. 아래 계약을 `kebab-core`에 둔다. ```rust pub trait SourceConnector { @@ -741,14 +741,14 @@ pub struct Answer { CLI는 가장 먼저 만든다. ```text -kb init -kb ingest -kb index -kb search -kb ask -kb inspect doc -kb inspect chunk -kb doctor +kebab init +kebab ingest +kebab index +kebab search +kebab ask +kebab inspect doc +kebab inspect chunk +kebab doctor ``` CLI는 개발과 테스트의 기준점이다. TUI와 desktop app은 CLI 기능이 안정된 뒤 붙인다. @@ -791,10 +791,10 @@ Desktop app은 가장 나중에 만든다. 
이유는 UI보다 먼저 domain mode 산출물: ```text -kb-core -kb-config -kb-app -kb-cli +kebab-core +kebab-config +kebab-app +kebab-cli docs/spec/* fixtures/markdown/* ``` @@ -804,7 +804,7 @@ fixtures/markdown/* ```text cargo check --workspace cargo test --workspace -kb --help +kebab --help ``` ## Phase 1 - Markdown ingestion @@ -814,19 +814,19 @@ kb --help 구현 crate: ```text -kb-source-fs -kb-parse-md -kb-normalize -kb-chunk -kb-store-sqlite +kebab-source-fs +kebab-parse-md +kebab-normalize +kebab-chunk +kebab-store-sqlite ``` 완료 조건: ```text -kb ingest ~/KnowledgeBase -kb list docs -kb inspect doc +kebab ingest ~/KnowledgeBase +kebab list docs +kebab inspect doc ``` ## Phase 2 - Lexical search @@ -836,14 +836,14 @@ kb inspect doc 완료 조건: ```text -kb search "Rust workspace 설계" +kebab search "Rust workspace 설계" ``` 결과는 citation을 포함해야 한다. ```text 1. Rust workspace는 여러 package를 하나로 관리한다... - source: notes/rust/kb.md:L12-L34 + source: notes/rust/kebab.md:L12-L34 ``` ## Phase 3 - Vector search와 embedding @@ -853,18 +853,18 @@ kb search "Rust workspace 설계" 구현 crate: ```text -kb-embed -kb-embed-local -kb-store-vector -kb-search +kebab-embed +kebab-embed-local +kebab-store-vector +kebab-search ``` 완료 조건: ```text -kb index --embeddings -kb search --mode vector "비슷한 설계 원칙" -kb search --mode hybrid "Markdown chunking 규칙" +kebab index --embeddings +kebab search --mode vector "비슷한 설계 원칙" +kebab search --mode hybrid "Markdown chunking 규칙" ``` ## Phase 4 - Local LLM RAG @@ -874,15 +874,15 @@ kb search --mode hybrid "Markdown chunking 규칙" 구현 crate: ```text -kb-llm -kb-llm-local -kb-rag +kebab-llm +kebab-llm-local +kebab-rag ``` 완료 조건: ```text -kb ask "내 KB 설계에서 저장소 전략은?" +kebab ask "내 KB 설계에서 저장소 전략은?" ``` 답변은 citation을 포함해야 하며, 근거가 없으면 거절해야 한다. @@ -895,7 +895,7 @@ kb ask "내 KB 설계에서 저장소 전략은?" 
```text fixtures/golden_queries.yaml -kb-eval +kebab-eval ``` 측정값: @@ -915,8 +915,8 @@ answer groundedness 완료 조건: ```text -kb ingest ./assets/diagram.png -kb search "이미지 안의 OCR 텍스트" +kebab ingest ./assets/diagram.png +kebab search "이미지 안의 OCR 텍스트" ``` ## Phase 7 - PDF support @@ -926,8 +926,8 @@ kb search "이미지 안의 OCR 텍스트" 완료 조건: ```text -kb ingest ./paper.pdf -kb search "PDF 안의 특정 개념" +kebab ingest ./paper.pdf +kebab search "PDF 안의 특정 개념" ``` ## Phase 8 - 음성 support @@ -937,8 +937,8 @@ kb search "PDF 안의 특정 개념" 완료 조건: ```text -kb ingest ./meeting.m4a -kb search "회의에서 언급한 결정사항" +kebab ingest ./meeting.m4a +kebab search "회의에서 언급한 결정사항" ``` ## Phase 9 - TUI와 desktop app @@ -948,7 +948,7 @@ kb search "회의에서 언급한 결정사항" 순서: ```text -kb-tui -> kb-desktop +kebab-tui -> kebab-desktop ``` # 18. 테스트 전략 @@ -986,7 +986,7 @@ AI에게 “전체 repo를 만들어줘”라고 시키지 말고, component spe 템플릿은 다음과 같다. ```text -Component: kb-parse-md +Component: kebab-parse-md Responsibility: - Markdown bytes를 CanonicalDocument로 변환한다. @@ -994,17 +994,17 @@ Responsibility: - line range 또는 byte range를 최대한 보존한다. Allowed dependencies: -- kb-core +- kebab-core - pulldown-cmark 또는 comrak - serde - thiserror Forbidden dependencies: -- kb-store -- kb-llm -- kb-rag -- kb-tui -- kb-desktop +- kebab-store +- kebab-llm +- kebab-rag +- kebab-tui +- kebab-desktop Inputs: - RawAsset @@ -1052,7 +1052,7 @@ Non-goals: ## 1-2일차 - workspace 생성 -- `kb-core` 도메인 타입 초안 +- `kebab-core` 도메인 타입 초안 - ID 규칙 문서화 - CLI skeleton @@ -1073,7 +1073,7 @@ Non-goals: - SQLite FTS5 검색 - citation 출력 -- `kb inspect` 구현 +- `kebab inspect` 구현 ## 12-14일차 @@ -1100,7 +1100,7 @@ MVP 완료 조건은 다음과 같다. - [ ] parser/chunker version을 바꾸면 재처리 대상이 식별된다. - [ ] local embedding을 붙일 수 있는 trait이 있다. - [ ] local LLM을 붙일 수 있는 trait이 있다. -- [ ] `kb-app` facade를 통해 CLI가 동작한다. +- [ ] `kebab-app` facade를 통해 CLI가 동작한다. P1 완료 조건은 다음과 같다. 
diff --git a/tasks/HOTFIXES.md b/tasks/HOTFIXES.md
index ca853e4..9e63261 100644
--- a/tasks/HOTFIXES.md
+++ b/tasks/HOTFIXES.md
@@ -14,30 +14,30 @@ historical contract that was implemented; this file accumulates the
 deltas so phase 5+ readers can find the live behavior without diffing
 git history.

-## 2026-05-01 — `--config` flag silently ignored across all kb-cli subcommands
+## 2026-05-01 — `--config` flag silently ignored across all kebab-cli subcommands

-**Discovered**: post-P3-5 manual smoke at `/tmp/kb-smoke/`.
+**Discovered**: post-P3-5 manual smoke at `/tmp/kebab-smoke/`.

-**Symptom**: `kb --config /path/to/config.toml ingest|search|list|inspect|doctor` ignored the flag and fell back to `~/.config/kb/config.toml` (XDG default). Users had to use `KB_*` env vars to point at a non-default config.
+**Symptom**: `kebab --config /path/to/config.toml ingest|search|list|inspect|doctor` ignored the flag and fell back to `~/.config/kebab/config.toml` (XDG default). Users had to use `KEBAB_*` env vars to point at a non-default config.

-**Root cause**: `kb-cli` read `cli.config` only inside `Cmd::Ingest` to build `SourceScope`, then called bare `kb_app::ingest(scope, summary_only)` which internally re-loaded `Config::load(None)` (XDG path). Same pattern in `Cmd::Search` / `List` / `Inspect` / `Doctor`. P3-5 introduced `*_with_config` test seams via `#[doc(hidden)] pub fn` but kb-cli never used them.
+**Root cause**: `kebab-cli` read `cli.config` only inside `Cmd::Ingest` to build `SourceScope`, then called bare `kebab_app::ingest(scope, summary_only)` which internally re-loaded `Config::load(None)` (XDG path). Same pattern in `Cmd::Search` / `List` / `Inspect` / `Doctor`. P3-5 introduced `*_with_config` test seams via `#[doc(hidden)] pub fn` but kebab-cli never used them.
 **Fix** (PR #20, fix/cli-config-flag-and-search-output):
-- `kb-cli` now builds the Config once via `Config::load(cli.config.as_deref())` at the top of every subcommand and threads it into `kb_app::*_with_config(cfg, ...)` instead of `kb_app::*(...)`.
-- `kb_app::doctor()` rewritten as `doctor_with_config_path(Option<&Path>)` that reports the actual path probed and hard-fails when `--config ` doesn't exist (defaults would otherwise mask user intent).
-- `kb-app` module doc-comment updated: `#[doc(hidden)] pub fn *_with_config` is no longer "test-only seam" — it's the official "config-explicit" API consumed by CLI `--config`, integration tests, and TUI sessions.
-- Same PR also improved `kb search` printer: `{:.4}` score formatting (RRF range collapses on `{:.2}`) and `> heading_path` suffix so chunks from the same document are visually distinct.
+- `kebab-cli` now builds the Config once via `Config::load(cli.config.as_deref())` at the top of every subcommand and threads it into `kebab_app::*_with_config(cfg, ...)` instead of `kebab_app::*(...)`.
+- `kebab_app::doctor()` rewritten as `doctor_with_config_path(Option<&Path>)` that reports the actual path probed and hard-fails when `--config ` doesn't exist (defaults would otherwise mask user intent).
+- `kebab-app` module doc-comment updated: `#[doc(hidden)] pub fn *_with_config` is no longer "test-only seam" — it's the official "config-explicit" API consumed by CLI `--config`, integration tests, and TUI sessions.
+- Same PR also improved `kebab search` printer: `{:.4}` score formatting (RRF range collapses on `{:.2}`) and `> heading_path` suffix so chunks from the same document are visually distinct.

 **Amends**: tasks/p3/p3-5-app-wiring.md (the test seam was always meant to be the config-explicit API; only the doc-comment lied).
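The fix described in this entry can be illustrated with a small, self-contained Rust sketch. It is not the project's code: `Config`, `Cli`, `search`, `search_with_config`, and `run_search` are hypothetical stand-ins, and `Config::load` here returns a plain value rather than the `anyhow::Result` of the real API. The only pieces taken from the entry are the `Config::load(cli.config.as_deref())` call shape and the `*_with_config` naming.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the real `Config`; it only records which file was chosen.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub source: PathBuf,
}

impl Config {
    /// Assumed default location, mirroring the XDG path named in the entry.
    pub fn default_path() -> PathBuf {
        PathBuf::from("~/.config/kebab/config.toml")
    }

    /// Use the explicit path when one is given, otherwise fall back to the default.
    pub fn load(path: Option<&Path>) -> Config {
        Config {
            source: path
                .map(Path::to_path_buf)
                .unwrap_or_else(Config::default_path),
        }
    }
}

/// Hypothetical CLI struct: `config` holds the value of `--config`, if any.
pub struct Cli {
    pub config: Option<PathBuf>,
}

/// Config-explicit entry point: the caller decides which config is used.
pub fn search_with_config(cfg: &Config, query: &str) -> String {
    format!("search {:?} using {}", query, cfg.source.display())
}

/// Pre-fix shape: reloads the default config internally, so `--config` has no effect.
pub fn search(query: &str) -> String {
    search_with_config(&Config::load(None), query)
}

/// Post-fix shape: build the config once from the flag, then thread it through.
pub fn run_search(cli: &Cli, query: &str) -> String {
    // `as_deref` turns `Option<PathBuf>` into `Option<&Path>` without cloning.
    let cfg = Config::load(cli.config.as_deref());
    search_with_config(&cfg, query)
}
```

In this sketch the bug class is visible in `search`: any function that calls `Config::load(None)` internally cannot honor a flag parsed by its caller, which is why the fix moves config construction up to the CLI layer.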
-### 2026-05-01 — `--config` regression in `kb ask` (P4-3 follow-up)
+### 2026-05-01 — `--config` regression in `kebab ask` (P4-3 follow-up)

 **Discovered**: post-P4-3 manual smoke against 192.168.0.47 Ollama with `gemma4:26b`.

-**Symptom**: `kb --config ask` returned `model.id = qwen2.5:14b-instruct` (XDG default model) and `score_gate = 0.30` (XDG default), instead of `gemma4:26b` / `0.05` from the explicit config. P4-3 added the ask body but kb-cli's `Cmd::Ask` arm still called bare `kb_app::ask(query, opts)` — same regression class as the P3-5 fix above, just missed when ask was wired.
+**Symptom**: `kebab --config ask` returned `model.id = qwen2.5:14b-instruct` (XDG default model) and `score_gate = 0.30` (XDG default), instead of `gemma4:26b` / `0.05` from the explicit config. P4-3 added the ask body but kebab-cli's `Cmd::Ask` arm still called bare `kebab_app::ask(query, opts)` — same regression class as the P3-5 fix above, just missed when ask was wired.

 **Fix** (PR #24, fix/cli-ask-honor-config-flag):
-- `kb-cli` builds `Config::load(cli.config.as_deref())` once at the top of `Cmd::Ask` and calls `kb_app::ask_with_config(cfg, query, opts)`.
+- `kebab-cli` builds `Config::load(cli.config.as_deref())` once at the top of `Cmd::Ask` and calls `kebab_app::ask_with_config(cfg, query, opts)`.

 **Amends**: tasks/p4/p4-3-rag-pipeline.md.

@@ -48,15 +48,15 @@ git history.
 **Root cause**: RRF formula `score(c) = Σ 1/(k_rrf + rank_m(c))` produces values bounded by `num_retrievers / (k_rrf + 1)`. With `num_retrievers = 2` and the default `k_rrf = 60`, the upper bound is `2/61 ≈ 0.0328`. The default `config.rag.score_gate = 0.05` was calibrated for vector / lexical scores already in `[0, 1]` and silently refused every hybrid query. `fusion_score` was also incomparable across modes — Lexical / Vector lived in `[0, 1]`, Hybrid lived in `(0, 0.033]`.

 **Fix** (PR #25, fix/rrf-fusion-score-normalize-and-docs):
-- `crates/kb-search/src/hybrid.rs` divides every raw RRF score by `2 / (k_rrf + 1)` so `fusion_score` always lives in `[0, 1]` regardless of mode. Both retrievers contributing rank 1 normalises to `1.0`; chunks present in only one retriever cap around `0.5`. RRF's rank-ordering invariants are preserved (same constant divides every score), so sort + tiebreak behaviour is identical.
+- `crates/kebab-search/src/hybrid.rs` divides every raw RRF score by `2 / (k_rrf + 1)` so `fusion_score` always lives in `[0, 1]` regardless of mode. Both retrievers contributing rank 1 normalises to `1.0`; chunks present in only one retriever cap around `0.5`. RRF's rank-ordering invariants are preserved (same constant divides every score), so sort + tiebreak behaviour is identical.
 - One unit test (`rrf_formula_matches_known_value`) updated to expect the normalised value `(1/61 + 1/62) / (2/61) ≈ 0.9919`.
-- The integration snapshot `crates/kb-search/tests/fixtures/search/hybrid/run-1.json` already used presence checks (`fusion_score_positive: true`) rather than absolute values, so it didn't need regeneration.
+- The integration snapshot `crates/kebab-search/tests/fixtures/search/hybrid/run-1.json` already used presence checks (`fusion_score_positive: true`) rather than absolute values, so it didn't need regeneration.

 **Why not a per-mode `score_gate` config**: separate `lexical_score_gate / vector_score_gate / hybrid_score_gate` would force every downstream consumer (CLI, eval, TUI) to know which mode picks which threshold. Normalising the score itself is a one-line change at the source and makes `Answer.retrieval.score_gate` semantically meaningful without per-mode bookkeeping.

 **Amends**: tasks/p3/p3-4-hybrid-fusion.md (RRF formula now divides by `2/(k_rrf+1)` after summation), tasks/phase-3-vector-hybrid.md (RRF section).
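The arithmetic in this entry can be checked with a short Rust sketch. The function names `rrf_raw` and `rrf_normalized` are illustrative assumptions, not the contents of `hybrid.rs`; only the formula `Σ 1/(k_rrf + rank)` and the divisor `2 / (k_rrf + 1)` come from the entry, and the sketch assumes exactly two retrievers with 1-based ranks.

```rust
/// Raw Reciprocal Rank Fusion score: the sum of 1 / (k_rrf + rank) over the
/// retrievers that returned the chunk. Ranks are 1-based.
pub fn rrf_raw(ranks: &[u32], k_rrf: f64) -> f64 {
    ranks.iter().map(|&r| 1.0 / (k_rrf + f64::from(r))).sum()
}

/// Normalised score for a two-retriever fusion: the raw score divided by its
/// upper bound 2 / (k_rrf + 1), so the result lies in [0, 1].
pub fn rrf_normalized(ranks: &[u32], k_rrf: f64) -> f64 {
    let upper_bound = 2.0 / (k_rrf + 1.0);
    rrf_raw(ranks, k_rrf) / upper_bound
}

/// A chunk passes the gate when its score reaches the configured threshold.
pub fn passes_gate(score: f64, score_gate: f64) -> bool {
    score >= score_gate
}
```

With `k_rrf = 60`, a chunk ranked first by both retrievers has a raw score of `2/61 ≈ 0.0328`, which fails a `0.05` gate, while its normalised score is `1.0` and passes. Ranks `[1, 2]` normalise to about `0.9919`, matching the unit-test value quoted in the entry, and a chunk seen by only one retriever at rank 1 normalises to `0.5`.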
-**Verification**: post-fix smoke at `/tmp/kb-smoke/` with default `score_gate = 0.05` succeeded across four scenarios — Korean→Korean, English→English, cross-language, and out-of-corpus refusal. +**Verification**: post-fix smoke at `/tmp/kebab-smoke/` with default `score_gate = 0.05` succeeded across four scenarios — Korean→Korean, English→English, cross-language, and out-of-corpus refusal. ## How to add an entry diff --git a/tasks/INDEX.md b/tasks/INDEX.md index 998ba36..723f065 100644 --- a/tasks/INDEX.md +++ b/tasks/INDEX.md @@ -1,12 +1,12 @@ --- title: "KB 작업 단위 인덱스" -source: kb_local_rust_report.md +source: kebab_local_rust_report.md date: 2026-04-27 --- # KB 작업 단위 인덱스 -[`kb_local_rust_report.md`](../kb_local_rust_report.md) 의 Phase 로드맵을 아키텍처 수준 작업 단위로 분해. 각 task 문서는 독립적으로 착수/검수 가능한 단위. +[`kebab_local_rust_report.md`](../kebab_local_rust_report.md) 의 Phase 로드맵을 아키텍처 수준 작업 단위로 분해. 각 task 문서는 독립적으로 착수/검수 가능한 단위. ## 의존 그래프 @@ -25,16 +25,16 @@ P0~P5 는 직렬. P6~P9 는 P5 이후 병렬 가능. | # | 코드 | 제목 | 핵심 산출 crate | 선행 | |---|------|------|----------------|------| -| P0 | [phase-0-skeleton.md](phase-0-skeleton.md) | Workspace 뼈대 + 도메인 계약 | kb-core, kb-parse-types, kb-config, kb-app, kb-cli | – | -| P1 | [phase-1-markdown-ingestion.md](phase-1-markdown-ingestion.md) | Markdown ingestion 파이프라인 | kb-source-fs, kb-parse-md, kb-normalize, kb-chunk, kb-store-sqlite | P0 | -| P2 | [phase-2-lexical-search.md](phase-2-lexical-search.md) | SQLite FTS5 lexical 검색 + citation | kb-search (lexical) | P1 | -| P3 | [phase-3-vector-hybrid.md](phase-3-vector-hybrid.md) | Local embedding + LanceDB + hybrid | kb-embed, kb-embed-local, kb-store-vector, kb-search | P2 | -| P4 | [phase-4-local-llm-rag.md](phase-4-local-llm-rag.md) | Local LLM + RAG + grounded answer | kb-llm, kb-llm-local, kb-rag | P3 | -| P5 | [phase-5-evaluation.md](phase-5-evaluation.md) | Golden query / regression eval | kb-eval | P4 | -| P6 | [phase-6-image.md](phase-6-image.md) | 이미지 ingestion (OCR + caption) | 
kb-parse-image | P5 | -| P7 | [phase-7-pdf.md](phase-7-pdf.md) | PDF text + page citation | kb-parse-pdf | P5 | -| P8 | [phase-8-audio.md](phase-8-audio.md) | 음성 transcription + timestamp citation | kb-parse-audio | P5 | -| P9 | [phase-9-ui.md](phase-9-ui.md) | TUI + desktop app | kb-tui, kb-desktop | P5 | +| P0 | [phase-0-skeleton.md](phase-0-skeleton.md) | Workspace 뼈대 + 도메인 계약 | kebab-core, kebab-parse-types, kebab-config, kebab-app, kebab-cli | – | +| P1 | [phase-1-markdown-ingestion.md](phase-1-markdown-ingestion.md) | Markdown ingestion 파이프라인 | kebab-source-fs, kebab-parse-md, kebab-normalize, kebab-chunk, kebab-store-sqlite | P0 | +| P2 | [phase-2-lexical-search.md](phase-2-lexical-search.md) | SQLite FTS5 lexical 검색 + citation | kebab-search (lexical) | P1 | +| P3 | [phase-3-vector-hybrid.md](phase-3-vector-hybrid.md) | Local embedding + LanceDB + hybrid | kebab-embed, kebab-embed-local, kebab-store-vector, kebab-search | P2 | +| P4 | [phase-4-local-llm-rag.md](phase-4-local-llm-rag.md) | Local LLM + RAG + grounded answer | kebab-llm, kebab-llm-local, kebab-rag | P3 | +| P5 | [phase-5-evaluation.md](phase-5-evaluation.md) | Golden query / regression eval | kebab-eval | P4 | +| P6 | [phase-6-image.md](phase-6-image.md) | 이미지 ingestion (OCR + caption) | kebab-parse-image | P5 | +| P7 | [phase-7-pdf.md](phase-7-pdf.md) | PDF text + page citation | kebab-parse-pdf | P5 | +| P8 | [phase-8-audio.md](phase-8-audio.md) | 음성 transcription + timestamp citation | kebab-parse-audio | P5 | +| P9 | [phase-9-ui.md](phase-9-ui.md) | TUI + desktop app | kebab-tui, kebab-desktop | P5 | ## Component task decomposition (per phase) diff --git a/tasks/_template.md b/tasks/_template.md index 6b1c863..d08ff76 100644 --- a/tasks/_template.md +++ b/tasks/_template.md @@ -6,7 +6,7 @@ title: "" status: planned depends_on: [] # other task_ids unblocks: [] # other task_ids -contract_source: ../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: 
../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [] # e.g. [§3.5, §5.5, §7.2] --- @@ -22,7 +22,7 @@ contract_sections: [] # e.g. [§3.5, §5.5, §7.2] ## Allowed dependencies -- `kb-core` +- `kebab-core` - - @@ -71,7 +71,7 @@ If any item here is needed during implementation, STOP and update the frozen des | unit | ... | ... | | snapshot | ... (JSON freeze) | `fixtures/...` | | contract | trait round-trip | mock impls | -| integration | end-to-end via `kb-app` facade | tmp workspace | +| integration | end-to-end via `kebab-app` facade | tmp workspace | All tests must run under `cargo test -p ` and not require external network or Ollama unless explicitly stated. diff --git a/tasks/p0/p0-1-skeleton.md b/tasks/p0/p0-1-skeleton.md index 15b5665..9726aa4 100644 --- a/tasks/p0/p0-1-skeleton.md +++ b/tasks/p0/p0-1-skeleton.md @@ -1,12 +1,12 @@ --- phase: P0 -component: workspace + kb-core + kb-config + kb-app + kb-cli +component: workspace + kebab-core + kebab-config + kebab-app + kebab-cli task_id: p0-1 title: "Workspace skeleton + frozen domain types/traits + ID recipe + facade" status: completed depends_on: [] unblocks: [p1-1, p1-2, p1-3, p1-4, p1-5, p1-6, p2-1, p2-2, p3-1, p3-2, p3-3, p3-4, p4-1, p4-2, p4-3, p5-1, p5-2, p6-1, p6-2, p6-3, p7-1, p7-2, p8-1, p8-2, p9-1, p9-2, p9-3, p9-4, p9-5] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§3 (all), §4, §5.1 schema_meta+migrations, §6 (config + XDG), §7 (all traits), §8 module boundaries, §9 versioning, §10 errors+exit codes, §2.8 wire schema_version] --- @@ -14,56 +14,56 @@ contract_sections: [§3 (all), §4, §5.1 schema_meta+migrations, §6 (config + ## Goal -Stand up the Cargo workspace (Rust 2024, resolver=3) with `kb-core`, `kb-parse-types`, `kb-config`, `kb-app`, `kb-cli` crates. 
Freeze every domain type, trait, ID recipe, error type, and CLI entry shape per the frozen design doc so that all subsequent component tasks compile against stable contracts. +Stand up the Cargo workspace (Rust 2024, resolver=3) with `kebab-core`, `kebab-parse-types`, `kebab-config`, `kebab-app`, `kebab-cli` crates. Freeze every domain type, trait, ID recipe, error type, and CLI entry shape per the frozen design doc so that all subsequent component tasks compile against stable contracts. ## Why now / why this size -Every other task imports `kb-core`. If types or trait signatures wobble after this point, every downstream task spec drifts. This task is large but indivisible: types + traits + ID recipe + facade + CLI skeleton + wire schema stubs must land together so the rest of the workspace can compile against them. +Every other task imports `kebab-core`. If types or trait signatures wobble after this point, every downstream task spec drifts. This task is large but indivisible: types + traits + ID recipe + facade + CLI skeleton + wire schema stubs must land together so the rest of the workspace can compile against them. ## Allowed dependencies - workspace `[workspace.dependencies]`: `anyhow = "1"`, `thiserror = "2"`, `serde = { version = "1", features = ["derive"] }`, `serde_json = "1"`, `time = { version = "0.3", features = ["serde", "macros"] }`, `uuid = { version = "1", features = ["v7", "serde"] }`, `blake3 = "1"`, `tracing = "0.1"` - per crate: - - `kb-core`: workspace deps + `serde_json::Map`, `serde-json-canonicalizer`, `unicode-normalization` - - `kb-parse-types`: workspace deps + `kb-core` ONLY (no parsers, no stores, no normalize). Defines parser intermediate representations per design §3.7b. 
- - `kb-config`: workspace deps + `toml = "0.8"`, `dirs = "5"` (XDG paths) - - `kb-app`: workspace deps + `kb-core`, `kb-config`, `tracing-subscriber`, `tracing-appender` - - `kb-cli`: workspace deps + `kb-core`, `kb-config`, `kb-app`, `clap = { version = "4", features = ["derive"] }` + - `kebab-core`: workspace deps + `serde_json::Map`, `serde-json-canonicalizer`, `unicode-normalization` + - `kebab-parse-types`: workspace deps + `kebab-core` ONLY (no parsers, no stores, no normalize). Defines parser intermediate representations per design §3.7b. + - `kebab-config`: workspace deps + `toml = "0.8"`, `dirs = "5"` (XDG paths) + - `kebab-app`: workspace deps + `kebab-core`, `kebab-config`, `tracing-subscriber`, `tracing-appender` + - `kebab-cli`: workspace deps + `kebab-core`, `kebab-config`, `kebab-app`, `clap = { version = "4", features = ["derive"] }` ## Forbidden dependencies -- `kb-core` MUST NOT depend on any other `kb-*` crate. -- `kb-parse-types` MUST depend ONLY on `kb-core`. No parser libraries (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`, …), no other `kb-*` crate. -- `kb-config` MUST NOT depend on `kb-app`, `kb-cli`, parsers, stores, embedders, search, llm, rag, tui, desktop. -- `kb-app` MUST NOT yet depend on parsers/stores/embedders/search/llm/rag (those crates do not exist yet — facade methods stub out and return `unimplemented!()` or `anyhow::bail!("not yet wired (Pn-i)")`). -- `kb-cli` MUST NOT call any non-`kb-app` crate directly. +- `kebab-core` MUST NOT depend on any other `kebab-*` crate. +- `kebab-parse-types` MUST depend ONLY on `kebab-core`. No parser libraries (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`, …), no other `kebab-*` crate. +- `kebab-config` MUST NOT depend on `kebab-app`, `kebab-cli`, parsers, stores, embedders, search, llm, rag, tui, desktop. 
+- `kebab-app` MUST NOT yet depend on parsers/stores/embedders/search/llm/rag (those crates do not exist yet — facade methods stub out and return `unimplemented!()` or `anyhow::bail!("not yet wired (Pn-i)")`). +- `kebab-cli` MUST NOT call any non-`kebab-app` crate directly. ## Inputs | input | type | source | |-------|------|--------| -| frozen design doc | Markdown | `docs/superpowers/specs/2026-04-27-kb-final-form-design.md` | -| user `kb` invocation | command-line args | end user | +| frozen design doc | Markdown | `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` | +| user `kebab` invocation | command-line args | end user | ## Outputs | output | type | downstream consumer | |--------|------|---------------------| | compiling workspace | Rust crates | every later task | -| `kb-core` types/traits | Rust API | every other crate | -| `kb-core` ID functions | Rust API | parsers, normalize, chunkers, embedders, search, rag | -| `kb-config::Config` | Rust struct | every other crate | -| `kb-app` facade methods (stubs) | Rust API | `kb-cli`, future TUI/desktop | -| `kb` binary | executable | end user | +| `kebab-core` types/traits | Rust API | every other crate | +| `kebab-core` ID functions | Rust API | parsers, normalize, chunkers, embedders, search, rag | +| `kebab-config::Config` | Rust struct | every other crate | +| `kebab-app` facade methods (stubs) | Rust API | `kebab-cli`, future TUI/desktop | +| `kebab` binary | executable | end user | | `docs/wire-schema/v1/*.schema.json` stubs | JSON Schema files | future wire emitters and consumers | | `docs/spec/*.md` stubs (link to frozen design) | Markdown | future contributors | ## Public surface (signatures only — no new types) -All types/traits below are defined in `kb-core` exactly per design §3 and §7 (no additions, no renames). Subagent must copy field-for-field. +All types/traits below are defined in `kebab-core` exactly per design §3 and §7 (no additions, no renames). 
Subagent must copy field-for-field. ```rust -// ── kb-core ───────────────────────────────────────────────────────────────── +// ── kebab-core ───────────────────────────────────────────────────────────────── // Newtype IDs (design §3.1) — Display + FromStr implemented. pub struct AssetId(pub String); @@ -114,7 +114,7 @@ pub struct AudioRefBlock { /* per §3.4 */ } pub enum Inline { /* per §3.4 */ } pub enum SourceSpan { /* per §3.4 */ } -// (ParsedBlock + parser intermediates live in kb-parse-types per design §3.7b — NOT in kb-core.) +// (ParsedBlock + parser intermediates live in kebab-parse-types per design §3.7b — NOT in kebab-core.) // Chunk + Citation (§3.5) pub struct Chunk { /* per §3.5 */ } @@ -232,14 +232,14 @@ pub fn nfc(input: &str) -> String; // §4.1 ``` ```rust -// ── kb-parse-types ────────────────────────────────────────────────────────── +// ── kebab-parse-types ────────────────────────────────────────────────────────── // Per design §3.7b. Defines parser intermediate representations consumed by -// kb-normalize. Depends on kb-core only — never on parser libraries. +// kebab-normalize. Depends on kebab-core only — never on parser libraries. 
pub struct ParsedBlock { pub kind: ParsedBlockKind, pub heading_path: Vec, - pub source_span: kb_core::SourceSpan, + pub source_span: kebab_core::SourceSpan, pub payload: ParsedPayload, } @@ -247,16 +247,16 @@ pub enum ParsedBlockKind { Heading, Paragraph, List, Code, Table, Quote, ImageRe pub enum ParsedPayload { Heading { level: u8, text: String }, - Paragraph { text: String, inlines: Vec }, - List { ordered: bool, items: Vec> }, + Paragraph { text: String, inlines: Vec }, + List { ordered: bool, items: Vec> }, Code { lang: Option, code: String }, Table { headers: Vec, rows: Vec> }, - Quote { text: String, inlines: Vec }, + Quote { text: String, inlines: Vec }, ImageRef { src: String, alt: String }, AudioRef { src: String }, } -// `Inline` itself lives in kb-core (§3.4) — parse-types references it, never duplicates it. +// `Inline` itself lives in kebab-core (§3.4) — parse-types references it, never duplicates it. pub struct Warning { pub kind: WarningKind, pub note: String } pub enum WarningKind { MalformedFrontmatter, MalformedTable, EncodingFallback, ExtractFailed } @@ -268,31 +268,31 @@ pub struct ParsedAudioSegment; ``` ```rust -// ── kb-config ─────────────────────────────────────────────────────────────── +// ── kebab-config ─────────────────────────────────────────────────────────────── pub struct Config { /* full schema per §6.4 */ } impl Config { pub fn load(path: Option<&std::path::Path>) -> anyhow::Result; pub fn from_file(path: &std::path::Path) -> anyhow::Result; pub fn defaults() -> Self; pub fn apply_env(self, env: &std::collections::HashMap) -> Self; - pub fn xdg_config_path() -> std::path::PathBuf; // ~/.config/kb/config.toml - pub fn xdg_data_dir() -> std::path::PathBuf; // ~/.local/share/kb + pub fn xdg_config_path() -> std::path::PathBuf; // ~/.config/kebab/config.toml + pub fn xdg_data_dir() -> std::path::PathBuf; // ~/.local/share/kebab pub fn xdg_cache_dir() -> std::path::PathBuf; pub fn xdg_state_dir() -> std::path::PathBuf; } ``` ```rust 
-// ── kb-app ────────────────────────────────────────────────────────────────── +// ── kebab-app ────────────────────────────────────────────────────────────────── pub fn init_workspace(force: bool) -> anyhow::Result<()>; -pub fn ingest(scope: kb_core::SourceScope, summary_only: bool) -> anyhow::Result; -pub fn list_docs(filter: kb_core::DocFilter) -> anyhow::Result>; -pub fn inspect_doc(id: &kb_core::DocumentId) -> anyhow::Result; -pub fn inspect_chunk(id: &kb_core::ChunkId) -> anyhow::Result; -pub fn search(query: kb_core::SearchQuery) -> anyhow::Result>; -pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result; +pub fn ingest(scope: kebab_core::SourceScope, summary_only: bool) -> anyhow::Result; +pub fn list_docs(filter: kebab_core::DocFilter) -> anyhow::Result>; +pub fn inspect_doc(id: &kebab_core::DocumentId) -> anyhow::Result; +pub fn inspect_chunk(id: &kebab_core::ChunkId) -> anyhow::Result; +pub fn search(query: kebab_core::SearchQuery) -> anyhow::Result>; +pub fn ask(query: &str, opts: AskOpts) -> anyhow::Result; pub fn doctor() -> anyhow::Result; -pub struct AskOpts { pub k: usize, pub explain: bool, pub mode: kb_core::SearchMode, pub temperature: Option, pub seed: Option } +pub struct AskOpts { pub k: usize, pub explain: bool, pub mode: kebab_core::SearchMode, pub temperature: Option, pub seed: Option } pub struct DoctorReport { pub ok: bool, pub checks: Vec } pub struct DoctorCheck { pub name: String, pub ok: bool, pub detail: String, pub hint: Option } ``` @@ -300,9 +300,9 @@ pub struct DoctorCheck { pub name: String, pub ok: bool, pub detail: String, pub P0 facade implementations call `anyhow::bail!("not yet wired (P-)")`; later phases replace bodies but never change signatures. 
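
The stub rule above can be sketched with the standard library alone. This is a minimal illustration, not the facade itself: the real crate returns `anyhow::Result` and uses `anyhow::bail!`, while `NotWired`, `search_stub`, and the simplified `Vec<String>` return type are hypothetical names introduced here. The exit codes follow the 0 / 1 / 2 mapping in the spec; code 3 (doctor unhealthy) is left out.

```rust
use std::fmt;

/// Hypothetical stand-in for the `anyhow::Error` the real facade returns.
#[derive(Debug, PartialEq)]
pub struct NotWired {
    /// Placeholder for the task that will supply the real body.
    pub task: &'static str,
}

impl fmt::Display for NotWired {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "not yet wired ({})", self.task)
    }
}

impl std::error::Error for NotWired {}

/// Stub with a simplified signature: it returns an error instead of
/// panicking, so the caller can still choose an exit code.
pub fn search_stub(_query: &str) -> Result<Vec<String>, NotWired> {
    Err(NotWired { task: "Pn-i" })
}

/// Maps a search-like result to an exit code:
/// 0 = hits found, 1 = no hit, 2 = error.
pub fn exit_code<T>(result: &Result<Vec<T>, NotWired>) -> i32 {
    match result {
        Ok(hits) if hits.is_empty() => 1,
        Ok(_) => 0,
        Err(_) => 2,
    }
}
```

Because the stub returns `Err` rather than unwinding, one wrapper around every facade call is enough to keep the exit-code mapping uniform across subcommands.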
```rust -// ── kb-cli ────────────────────────────────────────────────────────────────── +// ── kebab-cli ────────────────────────────────────────────────────────────────── // clap subcommands: init | ingest | list (docs) | inspect (doc|chunk) | search | ask | doctor | eval (subcommand placeholder) -// Each maps 1:1 to a kb_app function. Exit code mapping per §10. +// Each maps 1:1 to a kebab_app function. Exit code mapping per §10. ``` ## Behavior contract @@ -312,18 +312,18 @@ P0 facade implementations call `anyhow::bail!("not yet wired (P-)")`; late - `id_from` uses `serde-json-canonicalizer` exactly as design §4.2 specifies and truncates blake3 to 32 hex chars. - `Citation::to_uri` emits W3C Media Fragments URIs per §0 Q3 (`#L-L`, `#p=`, `#xywh=…`, `#caption`, `#t=hh:mm:ss,hh:mm:ss[&speaker=…]`). - `Citation::parse` is the strict inverse (round-trip property). -- `kb-config` resolves XDG paths via `dirs` crate; respects `XDG_CONFIG_HOME`, `XDG_DATA_HOME`, `XDG_CACHE_HOME`, `XDG_STATE_HOME` if set. -- Config layer order: defaults → file → env (`KB_
_`) → CLI flag (CLI override is applied by `kb-cli` after `Config::load`). -- `kb-cli` global flags: `--config `, `--verbose`, `--debug`, `--json`, `--explain` (where applicable). On `--json`, output conforms to wire schema v1. -- `kb-cli` exit codes: 0 success, 1 no-hit/refusal, 2 error, 3 doctor unhealthy (per §10). +- `kebab-config` resolves XDG paths via `dirs` crate; respects `XDG_CONFIG_HOME`, `XDG_DATA_HOME`, `XDG_CACHE_HOME`, `XDG_STATE_HOME` if set. +- Config layer order: defaults → file → env (`KB_
_`) → CLI flag (CLI override is applied by `kebab-cli` after `Config::load`). +- `kebab-cli` global flags: `--config `, `--verbose`, `--debug`, `--json`, `--explain` (where applicable). On `--json`, output conforms to wire schema v1. +- `kebab-cli` exit codes: 0 success, 1 no-hit/refusal, 2 error, 3 doctor unhealthy (per §10). - All facade-returned wire objects emit `schema_version` per §2 (e.g., `"answer.v1"`, `"search_hit.v1"`). ## Storage / wire effects -- Filesystem: creates `~/.config/kb/`, `~/.local/share/kb/`, `~/KnowledgeBase/` only when `kb init` runs; never on `Config::load`. +- Filesystem: creates `~/.config/kebab/`, `~/.local/share/kebab/`, `~/KnowledgeBase/` only when `kebab init` runs; never on `Config::load`. - Wire schemas: ships `docs/wire-schema/v1/{citation,search_hit,answer,ingest_report,doc_summary,chunk_inspection,doctor}.schema.json` as **stubs** declaring the top-level `schema_version` and required fields per §2. Full property validation can land later. -- DB: workspace ships `migrations/V001__init.sql` containing **only** §5.1 `schema_meta` + `migrations` tables (the full schema lands in p1-6's migration file or p0-1 may pre-stage the empty migrations directory; choose the former to keep this task within `kb-core`/`kb-config`/`kb-app`/`kb-cli` scope). -- Logging: `tracing` initialized in `kb-cli`; daily-rolling file in `~/.local/state/kb/logs/`. +- DB: workspace ships `migrations/V001__init.sql` containing **only** §5.1 `schema_meta` + `migrations` tables (the full schema lands in p1-6's migration file or p0-1 may pre-stage the empty migrations directory; choose the former to keep this task within `kebab-core`/`kebab-config`/`kebab-app`/`kebab-cli` scope). +- Logging: `tracing` initialized in `kebab-cli`; daily-rolling file in `~/.local/state/kebab/logs/`. 
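
The config layer order from the behavior contract (defaults, then file, then env, then CLI flag) can be illustrated as a fold over optional overrides. This is a sketch under stated assumptions: `Settings`, `Layer`, and the two fields `top_k` and `json` are hypothetical; the real schema is the one in design §6.4, and the real loader is `Config::load` followed by CLI overrides in `kebab-cli`.

```rust
/// Hypothetical two-field config used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct Settings {
    pub top_k: usize,
    pub json: bool,
}

/// One override layer. `None` means "this layer does not set the field".
#[derive(Debug, Default, Clone)]
pub struct Layer {
    pub top_k: Option<usize>,
    pub json: Option<bool>,
}

impl Settings {
    /// Built-in defaults (values are made up for the example).
    pub fn defaults() -> Self {
        Settings { top_k: 10, json: false }
    }

    /// Applies one layer on top of the current settings.
    pub fn apply(mut self, layer: &Layer) -> Self {
        if let Some(v) = layer.top_k {
            self.top_k = v;
        }
        if let Some(v) = layer.json {
            self.json = v;
        }
        self
    }
}

/// Later layers win: defaults, then file, then env, then CLI.
pub fn resolve(file: &Layer, env: &Layer, cli: &Layer) -> Settings {
    Settings::defaults().apply(file).apply(env).apply(cli)
}
```

Modeling each layer as a set of `Option` fields keeps "not set" distinct from "set to the default value", which is what lets a CLI flag override an env value without erasing fields the flag never mentioned.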
## Test plan @@ -336,20 +336,20 @@ P0 facade implementations call `anyhow::bail!("not yet wired (P-)")`; late | unit | newtype `Display`/`FromStr` rejects invalid lengths/chars | inline | | unit | `Config::defaults` + env override + CLI override produces expected merged config | inline | | snapshot | `Config::defaults` JSON serde stable | inline (round-trip) | -| smoke | `kb --help`, `kb init`, `kb doctor` run; doctor reports config_loaded ✓ data_dir_writable ✓ even with no DB present (downstream checks may fail with hint) | tmp `XDG_*` env | +| smoke | `kebab --help`, `kebab init`, `kebab doctor` run; doctor reports config_loaded ✓ data_dir_writable ✓ even with no DB present (downstream checks may fail with hint) | tmp `XDG_*` env | | build | `cargo check --workspace` and `cargo test --workspace` pass | repo | All tests must run with no network, no Ollama, no models. ## Definition of Done -- [ ] `Cargo.toml` workspace lists `kb-core`, `kb-parse-types`, `kb-config`, `kb-app`, `kb-cli` and resolver=3, edition 2024 +- [ ] `Cargo.toml` workspace lists `kebab-core`, `kebab-parse-types`, `kebab-config`, `kebab-app`, `kebab-cli` and resolver=3, edition 2024 - [ ] `cargo check --workspace` passes - [ ] `cargo test --workspace` passes -- [ ] `kb-parse-types` `cargo tree` shows ONLY `kb-core` + `serde`/`thiserror` style deps (no parser libs, no other `kb-*`) -- [ ] `kb --help` prints subcommands -- [ ] `kb init` creates XDG dirs idempotently and writes `config.toml` -- [ ] `kb doctor` returns wire JSON conforming to `doctor.v1` (in `--json` mode) +- [ ] `kebab-parse-types` `cargo tree` shows ONLY `kebab-core` + `serde`/`thiserror` style deps (no parser libs, no other `kebab-*`) +- [ ] `kebab --help` prints subcommands +- [ ] `kebab init` creates XDG dirs idempotently and writes `config.toml` +- [ ] `kebab doctor` returns wire JSON conforming to `doctor.v1` (in `--json` mode) - [ ] `docs/wire-schema/v1/*.schema.json` stubs exist (7 files: citation, search_hit, answer, 
ingest_report, doc_summary, chunk_inspection, doctor) - [ ] `docs/spec/` stubs exist linking to the frozen design (one file per: domain-model, ids, canonical-document, chunk-policy, citation-policy, module-boundaries, ai-generation-guidelines) - [ ] `fixtures/` root directory created with all subdirectories that downstream tasks reference: `fixtures/markdown/`, `fixtures/source-fs/`, `fixtures/search/lexical/`, `fixtures/search/hybrid/`, `fixtures/embed/`, `fixtures/vector/`, `fixtures/rag/`, `fixtures/eval/`, `fixtures/image/`, `fixtures/pdf/`, `fixtures/audio/`. Each subdir gets a `.gitkeep` so it tracks. P1 ships at minimum `fixtures/markdown/{simple-note,nested-headings,code-and-table}.md` (per epic phase-0); other dirs stay empty until their phase lands. @@ -361,12 +361,12 @@ All tests must run with no network, no Ollama, no models. - Any parser / store / embedder / search / llm / rag / tui / desktop logic (downstream phases). - Full schema migrations (most DDL lands in p1-6 / p2-1 / p3-3). - Wire schema deep validation (only required fields + `schema_version` checked here). -- Real `kb-app` business logic (functions stub with `unimplemented!()` or explicit `bail!`). +- Real `kebab-app` business logic (functions stub with `unimplemented!()` or explicit `bail!`). ## Risks / notes - ID recipe is the contract that every later record depends on. Any change after this task lands forces a `parser_version` / `chunker_version` / `embedding_version` cascade per §9. Treat changes as schema migrations and update the design doc first. - Newtype IDs use `String` (not `[u8; 16]`) to keep serde simple; tests must still enforce 32-char hex constraint on `FromStr`. -- `kb-app` stubs must use `bail!` not `panic!` so the CLI exits with code 2 cleanly per §10. +- `kebab-app` stubs must use `bail!` not `panic!` so the CLI exits with code 2 cleanly per §10. 
- `clap` v4 derive: subcommand `inspect` has nested `doc` / `chunk` variants; ensure exit code 0/1/2 mapping wraps the facade call uniformly. - XDG path discovery on macOS: spec uses XDG (not `Application Support`) per §6.1 — `dirs` crate honors XDG env vars; tests must set them explicitly. diff --git a/tasks/p1/p1-1-source-fs.md b/tasks/p1/p1-1-source-fs.md index f849e12..4695a1a 100644 --- a/tasks/p1/p1-1-source-fs.md +++ b/tasks/p1/p1-1-source-fs.md @@ -1,12 +1,12 @@ --- phase: P1 -component: kb-source-fs +component: kebab-source-fs task_id: p1-1 title: "Local filesystem source connector" status: completed depends_on: [p0-1] unblocks: [p1-2, p1-3, p1-4, p1-5, p1-6] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§3.3, §6.2, §6.6, §7.1, §7.2 SourceConnector, §8] --- @@ -22,8 +22,8 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums ## Allowed dependencies -- `kb-core` -- `kb-config` +- `kebab-core` +- `kebab-config` - `ignore` (gitignore semantics) - `blake3` - `walkdir` @@ -34,21 +34,21 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums ## Forbidden dependencies -- `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` +- `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop` ## Inputs | input | type | source | |-------|------|--------| -| `SourceScope` | `kb_core::SourceScope` | `kb-app` from config | +| `SourceScope` | `kebab_core::SourceScope` | `kebab-app` from config | | filesystem | `&Path` | OS | -| `.kbignore` | text file | workspace root, optional | +| `.kebabignore` | text file | workspace root, optional | ## Outputs | output | type | downstream consumer | 
|--------|------|---------------------| -| `Vec` | `kb_core::RawAsset` | `kb-parse-md`, asset writer in `kb-store-sqlite` (via `kb-app`) | +| `Vec` | `kebab_core::RawAsset` | `kebab-parse-md`, asset writer in `kebab-store-sqlite` (via `kebab-app`) | ## Public surface (signatures only — no new types) @@ -56,11 +56,11 @@ Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums pub struct FsSourceConnector { /* internal */ } impl FsSourceConnector { - pub fn new(config: &kb_config::Config) -> anyhow::Result; + pub fn new(config: &kebab_config::Config) -> anyhow::Result; } -impl kb_core::SourceConnector for FsSourceConnector { - fn scan(&self, scope: &kb_core::SourceScope) -> anyhow::Result>; +impl kebab_core::SourceConnector for FsSourceConnector { + fn scan(&self, scope: &kebab_core::SourceScope) -> anyhow::Result>; } ``` @@ -70,7 +70,7 @@ impl kb_core::SourceConnector for FsSourceConnector { - `asset_id` derived per design §4.2 from `blake3(raw bytes)` full hex. - `media_type` selected from extension + libmagic-like sniff fallback (`.md` → Markdown, others fall through to `MediaType::Other`). - `discovered_at` = current `OffsetDateTime::now_utc()` at scan time. -- Combine `config.workspace.exclude` ∪ `.kbignore` for filter (union; ordering does not matter). +- Combine `config.workspace.exclude` ∪ `.kebabignore` for filter (union; ordering does not matter). - Symbolic links: follow once, detect cycles via `canonicalize` + visited set. - Files larger than `storage.copy_threshold_mb` MB → emit `AssetStorage::Reference { path, sha }` (do not copy bytes here; copying is done by the asset writer task). - Idempotent: same input → same `Vec` (sort by `workspace_path`). @@ -78,7 +78,7 @@ impl kb_core::SourceConnector for FsSourceConnector { ## Storage / wire effects - Reads: filesystem under `config.workspace.root`. -- Writes: nothing. (Asset copy is handled by the asset writer in `kb-store-sqlite`.) +- Writes: nothing. 
(Asset copy is handled by the asset writer in `kebab-store-sqlite`.) ## Test plan @@ -87,25 +87,25 @@ impl kb_core::SourceConnector for FsSourceConnector { | unit | POSIX path normalization | inline cases incl. `./a/b.md`, `a//b.md`, `a/b.md` → identical | | unit | blake3 of known bytes matches expected hex | inline | | unit | gitignore filter (`*.tmp`, `node_modules/**`) excludes correctly | tmp tree built in test | -| unit | `.kbignore` ∪ config exclude works | tmp tree | +| unit | `.kebabignore` ∪ config exclude works | tmp tree | | unit | symlink cycle does not loop | tmp tree with `a -> b -> a` | | snapshot | `Vec` serialized JSON for fixture tree is stable | `fixtures/source-fs/tree-1` | | determinism | re-running scan twice produces byte-identical JSON | `fixtures/source-fs/tree-1` | -All tests run under `cargo test -p kb-source-fs` with no network and no model. +All tests run under `cargo test -p kebab-source-fs` with no network and no model. ## Definition of Done -- [ ] `cargo check -p kb-source-fs` passes -- [ ] `cargo test -p kb-source-fs` passes +- [ ] `cargo check -p kebab-source-fs` passes +- [ ] `cargo test -p kebab-source-fs` passes - [ ] Snapshot test `fixtures/source-fs/tree-1` round-trips deterministically -- [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kb-source-fs`) +- [ ] No imports outside Allowed dependencies (verified via `cargo tree -p kebab-source-fs`) - [ ] PR description links to design §3.3, §6.2, §7.2 ## Out of scope - File watching (P+). -- Asset copy/reference storage on disk (`kb-store-sqlite` task p1-6). +- Asset copy/reference storage on disk (`kebab-store-sqlite` task p1-6). - Non-fs source connectors (HTTP, S3 — P+). 
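
The path-normalization cases in the test plan above (`./a/b.md`, `a//b.md`, `a/b.md` → identical) and the "sort by `workspace_path`" idempotency rule can be sketched as follows. The function names are hypothetical, and `..` segments are deliberately left untouched because the spec excerpt does not say how they should be handled.

```rust
/// Collapses `.` segments and repeated slashes in a relative POSIX path.
/// `..` is not resolved here; that behavior is unspecified in the task.
pub fn normalize_posix(raw: &str) -> String {
    raw.split('/')
        .filter(|seg| !seg.is_empty() && *seg != ".")
        .collect::<Vec<_>>()
        .join("/")
}

/// Normalizes every path and returns them in sorted order, so the same
/// input set always yields the same output sequence.
pub fn sorted_paths(paths: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = paths.iter().map(|p| normalize_posix(p)).collect();
    out.sort();
    out
}
```

Sorting after normalization is what makes a snapshot of the scan result stable, since directory traversal order can differ between filesystems.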
## Risks / notes diff --git a/tasks/p1/p1-2-parse-md-frontmatter.md b/tasks/p1/p1-2-parse-md-frontmatter.md index 07f2e02..7910ba8 100644 --- a/tasks/p1/p1-2-parse-md-frontmatter.md +++ b/tasks/p1/p1-2-parse-md-frontmatter.md @@ -1,29 +1,29 @@ --- phase: P1 -component: kb-parse-md (frontmatter submodule) +component: kebab-parse-md (frontmatter submodule) task_id: p1-2 title: "Markdown frontmatter parsing → Metadata" status: completed depends_on: [p0-1] unblocks: [p1-4] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md -contract_sections: [design §3.6 Metadata, design §3.7b kb-parse-types (Warning), design §0 Q9 frontmatter, design §10 errors] +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md +contract_sections: [design §3.6 Metadata, design §3.7b kebab-parse-types (Warning), design §0 Q9 frontmatter, design §10 errors] --- # p1-2 — Markdown frontmatter parsing ## Goal -Parse YAML/TOML frontmatter from Markdown bytes into `kb_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`. +Parse YAML/TOML frontmatter from Markdown bytes into `kebab_core::Metadata`, with auto-derive defaults and unknown-key preservation in `metadata.user`. ## Why now / why this size -Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kb-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9. +Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from block parsing keeps both halves of `kebab-parse-md` simple and lets us reach 100% test coverage on the rules in design §0 Q9. 
## Allowed dependencies -- `kb-core` -- `kb-parse-types` (provides shared `Warning` + `WarningKind` per design §3.7b) +- `kebab-core` +- `kebab-parse-types` (provides shared `Warning` + `WarningKind` per design §3.7b) - `serde` - `serde_yaml` (or `yaml-rust2`) for YAML - `toml` for TOML @@ -33,7 +33,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from ## Forbidden dependencies -- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-tui`, `kb-desktop`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `pulldown-cmark` (block parser is a sibling task) +- `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-tui`, `kebab-desktop`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `pulldown-cmark` (block parser is a sibling task) ## Inputs @@ -46,7 +46,7 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from | output | type | downstream | |--------|------|------------| -| `(Metadata, Option, Vec)` | tuple | `kb-normalize` → CanonicalDocument | +| `(Metadata, Option, Vec)` | tuple | `kebab-normalize` → CanonicalDocument | ## Public surface (signatures only — no new types) @@ -54,10 +54,10 @@ Frontmatter is small but contractually load-bearing (Q9 spec). Isolating it from pub fn parse_frontmatter( bytes: &[u8], hints: &BodyHints, -) -> anyhow::Result<(kb_core::Metadata, Option, Vec)>; +) -> anyhow::Result<(kebab_core::Metadata, Option, Vec)>; ``` -`Warning` / `WarningKind` come from `kb-parse-types` (shared with `p1-3` blocks parser and downstream `kb-normalize`). `FrontmatterSpan` is crate-internal; if any new public type is needed, STOP and update the frozen design doc first. +`Warning` / `WarningKind` come from `kebab-parse-types` (shared with `p1-3` blocks parser and downstream `kebab-normalize`). `FrontmatterSpan` is crate-internal; if any new public type is needed, STOP and update the frozen design doc first. 
## Behavior contract @@ -68,7 +68,7 @@ pub fn parse_frontmatter( - `source_type` default `markdown`; `trust_level` default `primary`. - `aliases`, `tags` default empty. - Unknown keys → `metadata.user` (`serde_json::Map`), preserved verbatim, no warning. -- Unknown enum value (e.g. `trust_level: weird`) → emit `kb_parse_types::Warning { kind: WarningKind::MalformedFrontmatter, note: "unknown trust_level=weird, defaulted to primary" }` + ingest continues with default. +- Unknown enum value (e.g. `trust_level: weird`) → emit `kebab_parse_types::Warning { kind: WarningKind::MalformedFrontmatter, note: "unknown trust_level=weird, defaulted to primary" }` + ingest continues with default. - Malformed YAML → frontmatter discarded, body still parsed, `Warning { kind: WarningKind::MalformedFrontmatter, note: "" }` emitted. - No frontmatter at all → defaults applied silently. - `id:` field captured into `metadata.user_id_alias` (alias only — does NOT influence `doc_id` per design §4.2). @@ -91,12 +91,12 @@ pub fn parse_frontmatter( | snapshot | `fixtures/markdown/frontmatter-only.md` produces stable JSON | fixture | | snapshot | mixed-language body with no `lang:` detects `ko` or `en` | `fixtures/markdown/mixed-lang.md` | -All tests under `cargo test -p kb-parse-md --lib frontmatter`. +All tests under `cargo test -p kebab-parse-md --lib frontmatter`. 
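
The "unknown enum value → warning + default" rule in the behavior contract can be sketched as below. `Warning` and `WarningKind` mirror the shapes shown for `kebab-parse-types`, reduced to the one variant needed here; `TrustLevel::Secondary` and `parse_trust_level` are hypothetical, since the excerpt names only the `primary` default.

```rust
/// Reduced copy of the shared warning kind; only one variant is needed here.
#[derive(Debug, PartialEq)]
pub enum WarningKind {
    MalformedFrontmatter,
}

#[derive(Debug, PartialEq)]
pub struct Warning {
    pub kind: WarningKind,
    pub note: String,
}

/// `Secondary` is an assumed variant, added only to make the example non-trivial.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum TrustLevel {
    Primary,
    Secondary,
}

/// Resolves the `trust_level` key: a missing key defaults silently,
/// an unknown value defaults and reports a warning.
pub fn parse_trust_level(raw: Option<&str>) -> (TrustLevel, Option<Warning>) {
    match raw {
        None | Some("primary") => (TrustLevel::Primary, None),
        Some("secondary") => (TrustLevel::Secondary, None),
        Some(other) => (
            TrustLevel::Primary,
            Some(Warning {
                kind: WarningKind::MalformedFrontmatter,
                note: format!("unknown trust_level={other}, defaulted to primary"),
            }),
        ),
    }
}
```

Returning the warning alongside the value, rather than as an `Err`, is what lets ingest continue with the default while still surfacing the problem.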
## Definition of Done -- [ ] `cargo check -p kb-parse-md` passes -- [ ] `cargo test -p kb-parse-md frontmatter` passes +- [ ] `cargo check -p kebab-parse-md` passes +- [ ] `cargo test -p kebab-parse-md frontmatter` passes - [ ] No `pulldown-cmark` import in this submodule - [ ] Snapshot tests stable across two consecutive runs - [ ] PR links design §0 Q9, §3.6 diff --git a/tasks/p1/p1-3-parse-md-blocks.md b/tasks/p1/p1-3-parse-md-blocks.md index dcd4813..8a41e16 100644 --- a/tasks/p1/p1-3-parse-md-blocks.md +++ b/tasks/p1/p1-3-parse-md-blocks.md @@ -1,20 +1,20 @@ --- phase: P1 -component: kb-parse-md (blocks submodule) +component: kebab-parse-md (blocks submodule) task_id: p1-3 title: "Markdown body → Block tree with line spans" status: completed depends_on: [p0-1] unblocks: [p1-4] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md -contract_sections: [§3.4 Block, §3.4 SourceSpan, §3.7b kb-parse-types, §0 Q3 citation] +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md +contract_sections: [§3.4 Block, §3.4 SourceSpan, §3.7b kebab-parse-types, §0 Q3 citation] --- # p1-3 — Markdown body → Block tree ## Goal -Parse Markdown body bytes into a flat `Vec` with heading paths and line ranges preserved, ready for `kb-normalize` to lift into `CanonicalDocument`. +Parse Markdown body bytes into a flat `Vec` with heading paths and line ranges preserved, ready for `kebab-normalize` to lift into `CanonicalDocument`. ## Why now / why this size @@ -22,15 +22,15 @@ This is the heaviest part of P1 parser. 
Separating it from frontmatter and from ## Allowed dependencies -- `kb-core` -- `kb-parse-types` (defines `ParsedBlock`, `ParsedPayload`, `Warning`) +- `kebab-core` +- `kebab-parse-types` (defines `ParsedBlock`, `ParsedPayload`, `Warning`) - `pulldown-cmark` (CommonMark with source-map; GFM tables enabled via feature) - `serde` - `thiserror` ## Forbidden dependencies -- `kb-store-*`, `kb-llm*`, `kb-rag`, `kb-embed*`, `kb-search`, `kb-source-fs`, `kb-chunk`, `kb-normalize`, `kb-tui`, `kb-desktop`, `comrak` (alternative parser; pick one) +- `kebab-store-*`, `kebab-llm*`, `kebab-rag`, `kebab-embed*`, `kebab-search`, `kebab-source-fs`, `kebab-chunk`, `kebab-normalize`, `kebab-tui`, `kebab-desktop`, `comrak` (alternative parser; pick one) ## Inputs @@ -43,8 +43,8 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from | output | type | downstream | |--------|------|------------| -| `Vec` | shared type from `kb-parse-types` | `kb-normalize` | -| `Vec` | shared type | propagated into Provenance | +| `Vec` | shared type from `kebab-parse-types` | `kebab-normalize` | +| `Vec` | shared type | propagated into Provenance | ## Public surface (signatures only — no new types) @@ -52,10 +52,10 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from pub fn parse_blocks( body: &[u8], body_offset_lines: u32, -) -> anyhow::Result<(Vec, Vec)>; +) -> anyhow::Result<(Vec, Vec)>; ``` -`ParsedBlock` is defined in `kb-parse-types` (design §3.7b). `kb-parse-md` does NOT define its own; it consumes the shared type. Lift to `kb_core::Block` (with `BlockId` assignment) is `kb-normalize`'s job (p1-4). +`ParsedBlock` is defined in `kebab-parse-types` (design §3.7b). `kebab-parse-md` does NOT define its own; it consumes the shared type. Lift to `kebab_core::Block` (with `BlockId` assignment) is `kebab-normalize`'s job (p1-4). 
## Behavior contract @@ -63,9 +63,9 @@ pub fn parse_blocks( - Heading tree: every block records its ancestor heading texts in order (e.g., `["아키텍처", "Chunking 정책"]`). - Code blocks: `ParsedPayload::Code { lang: Some("rust"), code }` — fenced content not split. - Tables: GFM tables produce `ParsedPayload::Table { headers, rows }`; if a table cell is malformed, fall back to `ParsedPayload::Paragraph` + `Warning::MalformedTable`. -- Image references: `![alt](src)` produces `ParsedPayload::ImageRef { src, alt }`. `AssetId` resolution happens later in `kb-normalize` (when image src can be matched to a workspace asset). -- Lists: ordered/unordered preserved via `ParsedPayload::List { ordered, items }`; nested list items flattened so each `items[i]` is a `Vec` for one top-level item. -- Inline elements: only `Text`, `Code`, `Link`, `Strong`, `Emph` (per `kb_core::Inline` per design §3.4). Drop other inlines silently. +- Image references: `![alt](src)` produces `ParsedPayload::ImageRef { src, alt }`. `AssetId` resolution happens later in `kebab-normalize` (when image src can be matched to a workspace asset). +- Lists: ordered/unordered preserved via `ParsedPayload::List { ordered, items }`; nested list items flattened so each `items[i]` is a `Vec` for one top-level item. +- Inline elements: only `Text`, `Code`, `Link`, `Strong`, `Emph` (per `kebab_core::Inline` per design §3.4). Drop other inlines silently. - Malformed input never panics. Worst case: empty `Vec` + `Warning::ExtractFailed`. ## Storage / wire effects @@ -86,12 +86,12 @@ pub fn parse_blocks( | snapshot | `fixtures/markdown/nested-headings.md` → ParsedBlock JSON stable | fixture | | snapshot | `fixtures/markdown/code-and-table.md` → JSON stable | fixture | -All tests under `cargo test -p kb-parse-md --lib blocks`. +All tests under `cargo test -p kebab-parse-md --lib blocks`. 
## Definition of Done -- [ ] `cargo check -p kb-parse-md` passes -- [ ] `cargo test -p kb-parse-md blocks` passes +- [ ] `cargo check -p kebab-parse-md` passes +- [ ] `cargo test -p kebab-parse-md blocks` passes - [ ] Snapshot tests stable across two runs - [ ] No imports outside Allowed dependencies - [ ] PR links design §3.4 @@ -99,7 +99,7 @@ All tests under `cargo test -p kb-parse-md --lib blocks`. ## Out of scope - Frontmatter (p1-2). -- Lifting `kb_parse_types::ParsedBlock` → `kb_core::Block` with `BlockId` (p1-4 normalize). +- Lifting `kebab_parse_types::ParsedBlock` → `kebab_core::Block` with `BlockId` (p1-4 normalize). - Chunking (p1-5). ## Risks / notes diff --git a/tasks/p1/p1-4-normalize.md b/tasks/p1/p1-4-normalize.md index dd8bddf..ff550f7 100644 --- a/tasks/p1/p1-4-normalize.md +++ b/tasks/p1/p1-4-normalize.md @@ -1,30 +1,30 @@ --- phase: P1 -component: kb-normalize +component: kebab-normalize task_id: p1-4 title: "Lift parser output → CanonicalDocument with deterministic IDs" status: completed depends_on: [p1-2, p1-3] unblocks: [p1-5, p1-6] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md -contract_sections: [§3.4, §3.7b kb-parse-types, §4 ID recipe, §3.6 Provenance, §8 module boundaries] +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md +contract_sections: [§3.4, §3.7b kebab-parse-types, §4 ID recipe, §3.6 Provenance, §8 module boundaries] --- # p1-4 — Lift to CanonicalDocument ## Goal -Combine `Metadata` (p1-2) + `Vec` (p1-3) + `RawAsset` (p1-1) into a `kb_core::CanonicalDocument` with deterministic `doc_id` and `block_id`s per design §4 recipe. +Combine `Metadata` (p1-2) + `Vec` (p1-3) + `RawAsset` (p1-1) into a `kebab_core::CanonicalDocument` with deterministic `doc_id` and `block_id`s per design §4 recipe. ## Why now / why this size -Single responsibility: ID generation + struct assembly. 
Keeps `kb-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate. +Single responsibility: ID generation + struct assembly. Keeps `kebab-parse-md` purely a parser and isolates the (security-critical) deterministic ID logic in one crate. ## Allowed dependencies -- `kb-core` -- `kb-parse-types` (input shapes — `ParsedBlock`, `ParsedPayload`, `Warning`) -- `kb-config` +- `kebab-core` +- `kebab-parse-types` (input shapes — `ParsedBlock`, `ParsedPayload`, `Warning`) +- `kebab-config` - `serde` - `serde-json-canonicalizer` (canonical JSON for ID hashing) - `blake3` @@ -34,36 +34,36 @@ Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` pure ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-md` (consumed via shared `kb-parse-types` only — `kb-parse-md` must NOT appear in this crate's `cargo tree`), `kb-parse-pdf`, `kb-parse-image`, `kb-parse-audio`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` +- `kebab-source-fs`, `kebab-parse-md` (consumed via shared `kebab-parse-types` only — `kebab-parse-md` must NOT appear in this crate's `cargo tree`), `kebab-parse-pdf`, `kebab-parse-image`, `kebab-parse-audio`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop` ## Inputs | input | type | source | |-------|------|--------| -| `RawAsset` | `kb_core::RawAsset` | p1-1 | -| `Metadata` + frontmatter span + warnings | `(kb_core::Metadata, Option, Vec)` | p1-2 | -| `Vec` + warnings | `(Vec, Vec)` | p1-3 | -| `parser_version` | `kb_core::ParserVersion` | constant in `kb-parse-md` | +| `RawAsset` | `kebab_core::RawAsset` | p1-1 | +| `Metadata` + frontmatter span + warnings | `(kebab_core::Metadata, Option, Vec)` | p1-2 | +| `Vec` + warnings | `(Vec, Vec)` | p1-3 | +| `parser_version` | `kebab_core::ParserVersion` | constant in `kebab-parse-md` | ## Outputs | output | type | downstream | 
|--------|------|------------| -| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk`, `kb-store-sqlite` | +| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk`, `kebab-store-sqlite` | ## Public surface (signatures only — no new types) ```rust pub fn build_canonical_document( - asset: &kb_core::RawAsset, - metadata: kb_core::Metadata, - blocks: Vec, - parser_version: &kb_core::ParserVersion, - warnings: Vec, -) -> anyhow::Result; + asset: &kebab_core::RawAsset, + metadata: kebab_core::Metadata, + blocks: Vec, + parser_version: &kebab_core::ParserVersion, + warnings: Vec, +) -> anyhow::Result; -pub fn id_for_doc(workspace_path: &kb_core::WorkspacePath, asset: &kb_core::AssetId, parser_version: &kb_core::ParserVersion) -> kb_core::DocumentId; -pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kb_core::SourceSpan) -> kb_core::BlockId; +pub fn id_for_doc(workspace_path: &kebab_core::WorkspacePath, asset: &kebab_core::AssetId, parser_version: &kebab_core::ParserVersion) -> kebab_core::DocumentId; +pub fn id_for_block(doc: &kebab_core::DocumentId, kind: &str, heading_path: &[String], ordinal: u32, span: &kebab_core::SourceSpan) -> kebab_core::BlockId; ``` ## Behavior contract @@ -91,14 +91,14 @@ pub fn id_for_block(doc: &kb_core::DocumentId, kind: &str, heading_path: &[Strin | unit | provenance contains Discovered/Parsed/Normalized in order | inline | | snapshot | `fixtures/markdown/code-and-table.md` → CanonicalDocument JSON stable (incl. all IDs) | fixture | -All tests under `cargo test -p kb-normalize`. +All tests under `cargo test -p kebab-normalize`. 
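
The `DocumentId` and `BlockId` values returned by the functions above are `String` newtypes, and the p0-1 notes ask that `FromStr` still enforce the 32-character hex constraint. A minimal sketch of that check follows; `IdParseError` is a hypothetical error type, and restricting input to lowercase hex is an assumption, since the excerpt does not state the accepted case.

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DocumentId(pub String);

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum IdParseError {
    BadLength(usize),
    BadChar(char),
}

impl FromStr for DocumentId {
    type Err = IdParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Length is checked first so the error names the most basic problem.
        let len = s.chars().count();
        if len != 32 {
            return Err(IdParseError::BadLength(len));
        }
        // Assumption: only lowercase hex digits are accepted.
        let is_hex = |c: &char| c.is_ascii_digit() || ('a'..='f').contains(c);
        if let Some(bad) = s.chars().find(|c| !is_hex(c)) {
            return Err(IdParseError::BadChar(bad));
        }
        Ok(DocumentId(s.to_string()))
    }
}

impl fmt::Display for DocumentId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}
```

Keeping validation in `FromStr` means an ID that arrives from the CLI or a wire payload is rejected at the boundary, while IDs produced internally by the hash recipe can be constructed directly.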
## Definition of Done -- [ ] `cargo check -p kb-normalize` passes -- [ ] `cargo test -p kb-normalize` passes +- [ ] `cargo check -p kebab-normalize` passes +- [ ] `cargo test -p kebab-normalize` passes - [ ] Determinism test runs ≥ 1000 iterations under 1 second -- [ ] No `kb-parse-md` (or any other parser crate) appears in `cargo tree -p kb-normalize` — input types come from `kb-parse-types` only +- [ ] No `kebab-parse-md` (or any other parser crate) appears in `cargo tree -p kebab-normalize` — input types come from `kebab-parse-types` only - [ ] PR links design §3.7b, §4.2, §4.3, §8 ## Out of scope diff --git a/tasks/p1/p1-5-chunk.md b/tasks/p1/p1-5-chunk.md index 3eb7c8c..a205afb 100644 --- a/tasks/p1/p1-5-chunk.md +++ b/tasks/p1/p1-5-chunk.md @@ -1,12 +1,12 @@ --- phase: P1 -component: kb-chunk +component: kebab-chunk task_id: p1-5 title: "Markdown heading-aware chunker (md-heading-v1)" status: completed depends_on: [p1-4] unblocks: [p1-6, p2-2, p3-2] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§3.5 Chunk, §4.2 chunk_id recipe, §7.2 Chunker, §0 Q3 citation] --- @@ -22,8 +22,8 @@ The first concrete `Chunker`. Establishes how subsequent chunkers (PDF page chun ## Allowed dependencies -- `kb-core` -- `kb-config` +- `kebab-core` +- `kebab-config` - `serde` - `blake3` (policy_hash) - `serde-json-canonicalizer` @@ -31,30 +31,30 @@ The first concrete `Chunker`. 
Establishes how subsequent chunkers (PDF page chun ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-md`, `kb-normalize` (consumes `CanonicalDocument` only via `kb-core`), `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` +- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize` (consumes `CanonicalDocument` only via `kebab-core`), `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop` ## Inputs | input | type | source | |-------|------|--------| -| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 | -| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` from config | +| `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 | +| `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` from config | ## Outputs | output | type | downstream | |--------|------|------------| -| `Vec` | `kb_core::Chunk` | `kb-store-sqlite` (p1-6), `kb-embed*` (P3) | +| `Vec` | `kebab_core::Chunk` | `kebab-store-sqlite` (p1-6), `kebab-embed*` (P3) | ## Public surface (signatures only — no new types) ```rust pub struct MdHeadingV1Chunker; -impl kb_core::Chunker for MdHeadingV1Chunker { - fn chunker_version(&self) -> kb_core::ChunkerVersion; - fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String; - fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result>; +impl kebab_core::Chunker for MdHeadingV1Chunker { + fn chunker_version(&self) -> kebab_core::ChunkerVersion; + fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String; + fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result>; } ``` @@ -91,12 +91,12 @@ impl kb_core::Chunker for MdHeadingV1Chunker { | determinism | identical input + identical policy → identical chunk_ids | inline | | snapshot | `fixtures/markdown/long-section.md` → Vec JSON stable | fixture | -All tests under `cargo test -p kb-chunk`. 
+All tests under `cargo test -p kebab-chunk`. ## Definition of Done -- [ ] `cargo check -p kb-chunk` passes -- [ ] `cargo test -p kb-chunk` passes +- [ ] `cargo check -p kebab-chunk` passes +- [ ] `cargo test -p kebab-chunk` passes - [ ] Snapshot stable across two runs - [ ] No imports outside Allowed dependencies - [ ] PR links design §3.5, §4.2 diff --git a/tasks/p1/p1-6-store-sqlite.md b/tasks/p1/p1-6-store-sqlite.md index 93b7e7c..0b5ed7c 100644 --- a/tasks/p1/p1-6-store-sqlite.md +++ b/tasks/p1/p1-6-store-sqlite.md @@ -1,12 +1,12 @@ --- phase: P1 -component: kb-store-sqlite (P1 subset) +component: kebab-store-sqlite (P1 subset) task_id: p1-6 title: "SQLite store: assets/documents/blocks/chunks + asset writer + migrations" status: completed depends_on: [p1-1, p1-4, p1-5] unblocks: [p2-1, p3-3, p4-3] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§5 DDL (5.1, 5.2, 5.3, 5.4, 5.5 chunks only — FTS handled in p2-1), §5.7 jobs/ingest_runs, §5.8 transactions, §6.3 data_dir layout] --- @@ -22,8 +22,8 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT ## Allowed dependencies -- `kb-core` -- `kb-config` +- `kebab-core` +- `kebab-config` - `rusqlite` (with `bundled-sqlcipher` disabled; use `bundled` feature) - `refinery` for migrations - `serde_json` @@ -34,7 +34,7 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. 
The FT ## Forbidden dependencies -- `kb-source-fs` (only types via `kb-core`), `kb-parse-md`, `kb-normalize`, `kb-chunk` (only types via `kb-core`), `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` +- `kebab-source-fs` (only types via `kebab-core`), `kebab-parse-md`, `kebab-normalize`, `kebab-chunk` (only types via `kebab-core`), `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop` ## Inputs @@ -42,17 +42,17 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. The FT |-------|------|--------| | migrations | `migrations/V001__init.sql` | repo | | `RawAsset` + bytes | `(RawAsset, Vec)` | p1-1 + reader | -| `CanonicalDocument` | `kb_core::CanonicalDocument` | p1-4 | -| `Vec` | `kb_core::Chunk` | p1-5 | -| `IngestRun` aggregates | `(scope, counts, duration)` | `kb-app` | +| `CanonicalDocument` | `kebab_core::CanonicalDocument` | p1-4 | +| `Vec` | `kebab_core::Chunk` | p1-5 | +| `IngestRun` aggregates | `(scope, counts, duration)` | `kebab-app` | ## Outputs | output | type | downstream | |--------|------|------------| -| `data_dir/kb.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | – | every later phase | +| `data_dir/kebab.sqlite` rows in `assets`, `documents`, `blocks`, `chunks`, `document_tags`, `ingest_runs`, `jobs`, `schema_meta`, `migrations` | – | every later phase | | `data_dir/assets//` bytes (when copied) | – | future re-extraction, integrity verification | -| `IngestReport` (wire schema v1) | `kb_core::IngestReport` | `kb-cli`, eval | +| `IngestReport` (wire schema v1) | `kebab_core::IngestReport` | `kebab-cli`, eval | ## Public surface (signatures only — no new types) @@ -60,23 +60,23 @@ P1's terminal task. Closes the loop `walk → parse → chunk → store`. 
The FT pub struct SqliteStore { /* internal */ } impl SqliteStore { - pub fn open(config: &kb_config::Config) -> anyhow::Result; + pub fn open(config: &kebab_config::Config) -> anyhow::Result; pub fn run_migrations(&self) -> anyhow::Result<()>; - pub fn put_asset_with_bytes(&self, asset: &kb_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>; + pub fn put_asset_with_bytes(&self, asset: &kebab_core::RawAsset, bytes: &[u8]) -> anyhow::Result<()>; } -impl kb_core::DocumentStore for SqliteStore { - fn put_asset(&self, a: &kb_core::RawAsset) -> anyhow::Result<()>; - fn put_document(&self, d: &kb_core::CanonicalDocument) -> anyhow::Result<()>; - fn put_blocks(&self, doc: &kb_core::DocumentId, blocks: &[kb_core::Block]) -> anyhow::Result<()>; - fn put_chunks(&self, doc: &kb_core::DocumentId, chunks: &[kb_core::Chunk]) -> anyhow::Result<()>; - fn get_document(&self, id: &kb_core::DocumentId) -> anyhow::Result>; - fn get_chunk(&self, id: &kb_core::ChunkId) -> anyhow::Result>; - fn list_documents(&self, filter: &kb_core::DocFilter) -> anyhow::Result>; +impl kebab_core::DocumentStore for SqliteStore { + fn put_asset(&self, a: &kebab_core::RawAsset) -> anyhow::Result<()>; + fn put_document(&self, d: &kebab_core::CanonicalDocument) -> anyhow::Result<()>; + fn put_blocks(&self, doc: &kebab_core::DocumentId, blocks: &[kebab_core::Block]) -> anyhow::Result<()>; + fn put_chunks(&self, doc: &kebab_core::DocumentId, chunks: &[kebab_core::Chunk]) -> anyhow::Result<()>; + fn get_document(&self, id: &kebab_core::DocumentId) -> anyhow::Result>; + fn get_chunk(&self, id: &kebab_core::ChunkId) -> anyhow::Result>; + fn list_documents(&self, filter: &kebab_core::DocFilter) -> anyhow::Result>; } -impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ } +impl kebab_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ } ``` ## Behavior contract @@ -96,7 +96,7 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ } ## Storage / wire effects -- 
Writes: `kb.sqlite` (multiple tables), `data_dir/assets//` (copied case). +- Writes: `kebab.sqlite` (multiple tables), `data_dir/assets//` (copied case). - Reads on subsequent calls: same DB. ## Test plan @@ -112,14 +112,14 @@ impl kb_core::JobRepo for SqliteStore { /* per design §7.2 signatures */ } | contract | DocumentStore trait round-trip for fixture document | `fixtures/markdown/code-and-table.md` | | snapshot | IngestReport JSON for fixture run | fixture | -All tests under `cargo test -p kb-store-sqlite` with no network. +All tests under `cargo test -p kebab-store-sqlite` with no network. ## Definition of Done -- [ ] `cargo check -p kb-store-sqlite` passes -- [ ] `cargo test -p kb-store-sqlite` passes +- [ ] `cargo check -p kebab-store-sqlite` passes +- [ ] `cargo test -p kebab-store-sqlite` passes - [ ] migration `V001__init.sql` matches design §5 verbatim (diff-checked in CI) -- [ ] Writes to `~/.local/share/kb/` are gated by `kb-config`'s `data_dir` and never escape it +- [ ] Writes to `~/.local/share/kebab/` are gated by `kebab-config`'s `data_dir` and never escape it - [ ] No imports outside Allowed dependencies - [ ] PR links design §5 @@ -132,5 +132,5 @@ All tests under `cargo test -p kb-store-sqlite` with no network. ## Risks / notes -- WAL mode requires careful test cleanup: tests must drop the connection before removing `kb.sqlite-wal` / `-shm`. +- WAL mode requires careful test cleanup: tests must drop the connection before removing `kebab.sqlite-wal` / `-shm`. - Asset directory shard prefix uses `asset_id[..2]`; using `asset_id[..1]` would create at most 16 dirs (insufficient). 
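The shard-prefix note above can be sketched as follows. This is a minimal illustration, not the store's actual code: it assumes `asset_id` is an ASCII hex string and that copied bytes live under `assets/<prefix>/<asset_id>` inside `data_dir`. The helper name `asset_shard_path` and that exact layout are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: maps an asset ID to a sharded location under `data_dir`.
/// Assumes `asset_id` is ASCII hex; returns None for IDs that are too short
/// or contain any non-hex character.
fn asset_shard_path(data_dir: &Path, asset_id: &str) -> Option<PathBuf> {
    // Rejecting non-hex input keeps path separators and `..` out of the ID,
    // so the resulting path cannot escape `data_dir`.
    if asset_id.len() < 2 || !asset_id.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // Two hex chars give up to 16 * 16 = 256 shard dirs; one char gives only 16.
    let prefix = &asset_id[..2];
    Some(data_dir.join("assets").join(prefix).join(asset_id))
}

fn main() {
    if let Some(path) = asset_shard_path(Path::new("/tmp/kebab"), "ab12cd34") {
        println!("{}", path.display());
    }
}
```

Validating the ID before joining also supports the Definition of Done item that writes never escape `data_dir`, though the real store may enforce that differently.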
diff --git a/tasks/p2/p2-1-fts-schema.md b/tasks/p2/p2-1-fts-schema.md index 4e46623..4c61688 100644 --- a/tasks/p2/p2-1-fts-schema.md +++ b/tasks/p2/p2-1-fts-schema.md @@ -1,12 +1,12 @@ --- phase: P2 -component: kb-store-sqlite (FTS5 migration) +component: kebab-store-sqlite (FTS5 migration) task_id: p2-1 title: "FTS5 virtual table + triggers (V002 migration)" status: completed depends_on: [p1-6] unblocks: [p2-2] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§5.5 chunks_fts + triggers, §9 versioning] --- @@ -18,19 +18,19 @@ Add `chunks_fts` virtual table and three sync triggers via migration `V002__fts. ## Why now / why this size -`chunks_fts` is the lexical index for `kb-search`. Splitting it from p1-6 keeps P1 focused on relational data; bringing it as `V002` lets users upgrade an existing P1 DB without re-ingesting. +`chunks_fts` is the lexical index for `kebab-search`. Splitting it from p1-6 keeps P1 focused on relational data; bringing it as `V002` lets users upgrade an existing P1 DB without re-ingesting. ## Allowed dependencies -- `kb-core` -- `kb-config` -- `kb-store-sqlite` (extends migrations) +- `kebab-core` +- `kebab-config` +- `kebab-store-sqlite` (extends migrations) - `rusqlite` - `refinery` ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-search` (consumer is p2-2), `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` +- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-search` (consumer is p2-2), `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop` ## Inputs @@ -52,7 +52,7 @@ Add `chunks_fts` virtual table and three sync triggers via migration `V002__fts. pub fn rebuild_chunks_fts(conn: &rusqlite::Connection) -> anyhow::Result<()>; ``` -(Used by `kb index --rebuild-fts`. 
Re-runs `INSERT INTO chunks_fts SELECT ... FROM chunks` after `DELETE FROM chunks_fts;`.) +(Used by `kebab index --rebuild-fts`. Re-runs `INSERT INTO chunks_fts SELECT ... FROM chunks` after `DELETE FROM chunks_fts;`.) ## Behavior contract @@ -64,7 +64,7 @@ pub fn rebuild_chunks_fts(conn: &rusqlite::Connection) -> anyhow::Result<()>; ## Storage / wire effects -- Writes: `chunks_fts` virtual table inside `kb.sqlite`. +- Writes: `chunks_fts` virtual table inside `kebab.sqlite`. - Reads: existing `chunks` rows for backfill. ## Test plan @@ -78,12 +78,12 @@ pub fn rebuild_chunks_fts(conn: &rusqlite::Connection) -> anyhow::Result<()>; | function | `rebuild_chunks_fts` produces deterministic content equal to fresh backfill | tmp DB | | migration | running `V002` twice is a no-op (refinery handles idempotency) | tmp DB | -All tests under `cargo test -p kb-store-sqlite fts`. +All tests under `cargo test -p kebab-store-sqlite fts`. ## Definition of Done -- [ ] `cargo check -p kb-store-sqlite` passes -- [ ] `cargo test -p kb-store-sqlite fts` passes +- [ ] `cargo check -p kebab-store-sqlite` passes +- [ ] `cargo test -p kebab-store-sqlite fts` passes - [ ] `migrations/V002__fts.sql` matches design §5.5 verbatim (CI diff check) - [ ] No imports outside Allowed dependencies - [ ] PR links design §5.5 diff --git a/tasks/p2/p2-2-lexical-retriever.md b/tasks/p2/p2-2-lexical-retriever.md index 75524a5..1dd020c 100644 --- a/tasks/p2/p2-2-lexical-retriever.md +++ b/tasks/p2/p2-2-lexical-retriever.md @@ -1,12 +1,12 @@ --- phase: P2 -component: kb-search (lexical mode) +component: kebab-search (lexical mode) task_id: p2-2 title: "Lexical Retriever via SQLite FTS5 + bm25 + citation" status: completed depends_on: [p2-1] unblocks: [p3-4, p4-3] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5/1.6 
search output, §2.2 wire schema, §6.4 search settings] --- @@ -14,38 +14,38 @@ contract_sections: [§3.7 SearchQuery/Hit, §0 Q3 citation (URI fragment), §1.5 ## Goal -Implement `kb_core::Retriever` for `SearchMode::Lexical` using SQLite FTS5. Returns `SearchHit` with `bm25` ranking, `snippet()`-derived preview, and proper W3C-fragment citation. +Implement `kebab_core::Retriever` for `SearchMode::Lexical` using SQLite FTS5. Returns `SearchHit` with `bm25` ranking, `snippet()`-derived preview, and proper W3C-fragment citation. ## Why now / why this size -First concrete `Retriever`. Lets `kb search --mode lexical` work without any embedding/LLM infrastructure. Establishes the SearchHit construction contract that hybrid (p3-4) reuses. +First concrete `Retriever`. Lets `kebab search --mode lexical` work without any embedding/LLM infrastructure. Establishes the SearchHit construction contract that hybrid (p3-4) reuses. ## Allowed dependencies -- `kb-core` -- `kb-config` -- `kb-store-sqlite` (read access to `chunks_fts` + `chunks` + `documents`) +- `kebab-core` +- `kebab-config` +- `kebab-store-sqlite` (read access to `chunks_fts` + `chunks` + `documents`) - `rusqlite` - `tracing` - `thiserror` ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` +- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop` ## Inputs | input | type | source | |-------|------|--------| -| `SearchQuery` (mode=Lexical) | `kb_core::SearchQuery` | `kb-app::search` | -| `kb-config::search` settings (`default_k`, `snippet_chars`) | `kb_config::Config` | runtime | -| SQLite connection (read) | `rusqlite::Connection` | `kb-store-sqlite` | +| `SearchQuery` (mode=Lexical) | `kebab_core::SearchQuery` | `kebab-app::search` | +| `kebab-config::search` settings (`default_k`, `snippet_chars`) 
| `kebab_config::Config` | runtime | +| SQLite connection (read) | `rusqlite::Connection` | `kebab-store-sqlite` | ## Outputs | output | type | downstream | |--------|------|------------| -| `Vec` | `kb_core::SearchHit` | `kb-cli` printer, `kb-rag` packer (P4), hybrid (p3-4) | +| `Vec` | `kebab_core::SearchHit` | `kebab-cli` printer, `kebab-rag` packer (P4), hybrid (p3-4) | ## Public surface (signatures only — no new types) @@ -53,12 +53,12 @@ First concrete `Retriever`. Lets `kb search --mode lexical` work without any emb pub struct LexicalRetriever { /* internal: holds an Arc + IndexVersion */ } impl LexicalRetriever { - pub fn new(store: std::sync::Arc, index_version: kb_core::IndexVersion) -> Self; + pub fn new(store: std::sync::Arc, index_version: kebab_core::IndexVersion) -> Self; } -impl kb_core::Retriever for LexicalRetriever { - fn search(&self, query: &kb_core::SearchQuery) -> anyhow::Result>; - fn index_version(&self) -> kb_core::IndexVersion; +impl kebab_core::Retriever for LexicalRetriever { + fn search(&self, query: &kebab_core::SearchQuery) -> anyhow::Result>; + fn index_version(&self) -> kebab_core::IndexVersion; } ``` @@ -98,8 +98,8 @@ impl kb_core::Retriever for LexicalRetriever { ## Storage / wire effects -- Reads only. Never mutates `kb.sqlite`. -- Wire: `Vec` serialized via wire schema `search_hit.v1` when `kb-cli --json` is used. +- Reads only. Never mutates `kebab.sqlite`. +- Wire: `Vec` serialized via wire schema `search_hit.v1` when `kebab-cli --json` is used. ## Test plan @@ -114,13 +114,13 @@ impl kb_core::Retriever for LexicalRetriever { | determinism | identical query twice produces identical hit order and scores | tmp DB | | snapshot | `Vec` JSON for fixed corpus stable | `fixtures/search/lexical/run-1.json` | -All tests under `cargo test -p kb-search lexical`. +All tests under `cargo test -p kebab-search lexical`. 
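The URI-fragment citation named in the Goal (`path#L12-L34`) can be sketched as a small formatter. This is an illustration under stated assumptions, not the retriever's real code: the function name `format_citation` is hypothetical, line numbers are assumed to be 1-based, and path escaping is not handled.

```rust
/// Hypothetical helper: builds a line-range citation such as `notes/a.md#L12-L34`.
/// Returns None for an empty path, a zero line number (lines are assumed 1-based),
/// or an inverted range.
fn format_citation(path: &str, start_line: u32, end_line: u32) -> Option<String> {
    if path.is_empty() || start_line == 0 || end_line < start_line {
        return None;
    }
    Some(format!("{path}#L{start_line}-L{end_line}"))
}

fn main() {
    if let Some(citation) = format_citation("notes/rust.md", 12, 34) {
        println!("{citation}");
    }
}
```

Returning `Option` rather than panicking keeps a malformed span from aborting a whole search. How the real retriever reports such cases is not specified here.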
## Definition of Done -- [ ] `cargo check -p kb-search` passes -- [ ] `cargo test -p kb-search lexical` passes -- [ ] No imports outside Allowed dependencies (`cargo tree -p kb-search` audit) +- [ ] `cargo check -p kebab-search` passes +- [ ] `cargo test -p kebab-search lexical` passes +- [ ] No imports outside Allowed dependencies (`cargo tree -p kebab-search` audit) - [ ] Output JSON conforms to `docs/wire-schema/v1/search_hit.schema.json` - [ ] PR links design §3.7, §0 Q3, §2.2 diff --git a/tasks/p3/p3-1-embedder-trait.md b/tasks/p3/p3-1-embedder-trait.md index f84e2a1..b6a80f0 100644 --- a/tasks/p3/p3-1-embedder-trait.md +++ b/tasks/p3/p3-1-embedder-trait.md @@ -1,12 +1,12 @@ --- phase: P3 -component: kb-embed (trait crate) +component: kebab-embed (trait crate) task_id: p3-1 title: "Embedder trait + EmbeddingInput/Kind validation" status: completed depends_on: [p0-1] unblocks: [p3-2, p3-3, p3-4] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [design §3.7 SearchHit.embedding_model, design §7.1 EmbeddingInput/Kind, design §7.2 Embedder, report §11 LLM/embedding split] --- @@ -14,16 +14,16 @@ contract_sections: [design §3.7 SearchHit.embedding_model, design §7.1 Embeddi ## Goal -Provide the `kb-embed` crate that re-exports `Embedder` trait, `EmbeddingInput`/`EmbeddingKind`, and offers a mock implementation for downstream tests. This task is **trait-only**; concrete adapters live in p3-2. +Provide the `kebab-embed` crate that re-exports `Embedder` trait, `EmbeddingInput`/`EmbeddingKind`, and offers a mock implementation for downstream tests. This task is **trait-only**; concrete adapters live in p3-2. ## Why now / why this size -Concrete adapters (fastembed, ollama-embed, candle) need a stable trait surface. 
Owning the trait + a mock implementation in a tiny crate keeps `kb-store-vector` and `kb-search` testable without touching real models. +Concrete adapters (fastembed, ollama-embed, candle) need a stable trait surface. Owning the trait + a mock implementation in a tiny crate keeps `kebab-store-vector` and `kebab-search` testable without touching real models. ## Allowed dependencies -- `kb-core` -- `kb-config` +- `kebab-core` +- `kebab-config` - `serde` - `thiserror` - `tracing` @@ -31,35 +31,35 @@ Concrete adapters (fastembed, ollama-embed, candle) need a stable trait surface. ## Forbidden dependencies -- `fastembed`, `ort`, `tokenizers`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` +- `fastembed`, `ort`, `tokenizers`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop` ## Inputs | input | type | source | |-------|------|--------| -| `EmbeddingInput` | `kb_core::EmbeddingInput<'_>` | callers (parser-side or query-side) | +| `EmbeddingInput` | `kebab_core::EmbeddingInput<'_>` | callers (parser-side or query-side) | | model identity | `(EmbeddingModelId, EmbeddingVersion, dimensions)` | adapter at construction | ## Outputs | output | type | downstream | |--------|------|------------| -| `Vec>` | row-aligned with input | `kb-store-vector`, `kb-search` (vector mode) | +| `Vec>` | row-aligned with input | `kebab-store-vector`, `kebab-search` (vector mode) | ## Public surface (signatures only — no new types) ```rust -pub use kb_core::{EmbeddingInput, EmbeddingKind, EmbeddingModelId, EmbeddingVersion, Embedder}; +pub use kebab_core::{EmbeddingInput, EmbeddingKind, EmbeddingModelId, EmbeddingVersion, Embedder}; /// Test-only mock that produces deterministic vectors. Compiled only when `mock` feature is on. 
#[cfg(feature = "mock")] pub struct MockEmbedder { /* internal: model_id, dims, seed */ } #[cfg(feature = "mock")] impl MockEmbedder { - pub fn new(model_id: kb_core::EmbeddingModelId, version: kb_core::EmbeddingVersion, dimensions: usize) -> Self; + pub fn new(model_id: kebab_core::EmbeddingModelId, version: kebab_core::EmbeddingVersion, dimensions: usize) -> Self; } #[cfg(feature = "mock")] -impl kb_core::Embedder for MockEmbedder { /* per §7.2 */ } +impl kebab_core::Embedder for MockEmbedder { /* per §7.2 */ } ``` ## Behavior contract @@ -84,21 +84,21 @@ impl kb_core::Embedder for MockEmbedder { /* per §7.2 */ } | unit | dimensions match construction-time value | inline | | contract | property test: 100 random inputs, each vector has length == dimensions, all finite floats | inline (proptest) | -All tests under `cargo test -p kb-embed`. +All tests under `cargo test -p kebab-embed`. ## Definition of Done -- [ ] `cargo check -p kb-embed` passes -- [ ] `cargo test -p kb-embed` passes +- [ ] `cargo check -p kebab-embed` passes +- [ ] `cargo test -p kebab-embed` passes - [ ] No external embedding dep present - [ ] PR links design §7.2 Embedder, §11 ## Out of scope -- Real adapter (`kb-embed-local` is p3-2). +- Real adapter (`kebab-embed-local` is p3-2). - Reranker traits (P+). ## Risks / notes -- `MockEmbedder` is gated by `mock` feature (default OFF). Downstream tests opt in via `[dev-dependencies] kb-embed = { path = "...", features = ["mock"] }`. CI build of release binary (`cargo build --release` without `--features mock`) MUST NOT include `MockEmbedder` symbol — verifiable via `cargo bloat` or `nm` symbol scan. -- Trait re-exports keep the call site stable even if `kb-core` reorganizes; downstream crates should `use kb_embed::Embedder` rather than `use kb_core::Embedder`. +- `MockEmbedder` is gated by `mock` feature (default OFF). Downstream tests opt in via `[dev-dependencies] kebab-embed = { path = "...", features = ["mock"] }`. 
CI build of release binary (`cargo build --release` without `--features mock`) MUST NOT include `MockEmbedder` symbol — verifiable via `cargo bloat` or `nm` symbol scan. +- Trait re-exports keep the call site stable even if `kebab-core` reorganizes; downstream crates should `use kebab_embed::Embedder` rather than `use kebab_core::Embedder`. diff --git a/tasks/p3/p3-2-fastembed-adapter.md b/tasks/p3/p3-2-fastembed-adapter.md index 1b4510e..1ad957a 100644 --- a/tasks/p3/p3-2-fastembed-adapter.md +++ b/tasks/p3/p3-2-fastembed-adapter.md @@ -1,12 +1,12 @@ --- phase: P3 -component: kb-embed-local (fastembed adapter) +component: kebab-embed-local (fastembed adapter) task_id: p3-2 title: "fastembed-rs Embedder for multilingual-e5-small" status: completed depends_on: [p3-1] unblocks: [p3-3, p3-4] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [design §7.2 Embedder, report §11.3 local embedding, design §6.4 [models.embedding], design §9 versioning] --- @@ -22,9 +22,9 @@ First real `Embedder`. Drives `EmbeddingId` recipe (model_id + model_version + d ## Allowed dependencies -- `kb-core` -- `kb-config` -- `kb-embed` +- `kebab-core` +- `kebab-config` +- `kebab-embed` - `fastembed = "4"` (or current stable) - `tokenizers` - `ort` (transitive via fastembed) @@ -33,21 +33,21 @@ First real `Embedder`. 
Drives `EmbeddingId` recipe (model_id + model_version + d ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, network HTTP libs (model download is fastembed's responsibility) +- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, network HTTP libs (model download is fastembed's responsibility) ## Inputs | input | type | source | |-------|------|--------| -| `kb-config::Config.models.embedding` | settings | runtime | -| `EmbeddingInput[..]` | `kb_core::EmbeddingInput<'_>[]` | callers | +| `kebab-config::Config.models.embedding` | settings | runtime | +| `EmbeddingInput[..]` | `kebab_core::EmbeddingInput<'_>[]` | callers | | model cache | `data_dir/models/fastembed/` | filesystem | ## Outputs | output | type | downstream | |--------|------|------------| -| `Vec>` | row-aligned, `dimensions = 384` | `kb-store-vector`, query vectors for hybrid search | +| `Vec>` | row-aligned, `dimensions = 384` | `kebab-store-vector`, query vectors for hybrid search | | model identity | `(EmbeddingModelId, EmbeddingVersion, usize)` | record fields, `embedding_id` recipe | ## Public surface (signatures only — no new types) @@ -56,14 +56,14 @@ First real `Embedder`. 
Drives `EmbeddingId` recipe (model_id + model_version + d pub struct FastembedEmbedder { /* internal: TextEmbedding instance + model meta */ } impl FastembedEmbedder { - pub fn new(config: &kb_config::Config) -> anyhow::Result; + pub fn new(config: &kebab_config::Config) -> anyhow::Result; } -impl kb_core::Embedder for FastembedEmbedder { - fn model_id(&self) -> kb_core::EmbeddingModelId; - fn model_version(&self) -> kb_core::EmbeddingVersion; +impl kebab_core::Embedder for FastembedEmbedder { + fn model_id(&self) -> kebab_core::EmbeddingModelId; + fn model_version(&self) -> kebab_core::EmbeddingVersion; fn dimensions(&self) -> usize; - fn embed(&self, inputs: &[kb_core::EmbeddingInput<'_>]) -> anyhow::Result>>; + fn embed(&self, inputs: &[kebab_core::EmbeddingInput<'_>]) -> anyhow::Result>>; } ``` @@ -96,12 +96,12 @@ impl kb_core::Embedder for FastembedEmbedder { | performance | batch of 64 short inputs completes in < 5s on CI host | tmp config (skip on slow CI via `#[ignore]`) | | snapshot | aggregate hash of vectors for 5 known sentences stable across runs | `fixtures/embed/known-sentences.json` | -All tests under `cargo test -p kb-embed-local`. Mark slow tests `#[ignore]` and run via `cargo test -- --ignored` in dedicated CI lane. +All tests under `cargo test -p kebab-embed-local`. Mark slow tests `#[ignore]` and run via `cargo test -- --ignored` in dedicated CI lane. 
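The `#[ignore]` convention for slow tests could look roughly like the sketch below. `embed_batch` is a hypothetical stand-in for the real `Embedder::embed` call, which would load a model. It only fabricates 384-wide vectors so the example stays self-contained.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the real embedder: returns one 384-wide vector
/// per input without running any model.
fn embed_batch(inputs: &[&str]) -> Vec<Vec<f32>> {
    inputs.iter().map(|s| vec![s.len() as f32; 384]).collect()
}

/// Times one batch and returns the elapsed time plus the number of vectors produced.
fn time_batch(inputs: &[&str]) -> (Duration, usize) {
    let start = Instant::now();
    let vectors = embed_batch(inputs);
    (start.elapsed(), vectors.len())
}

#[cfg(test)]
mod tests {
    use super::*;

    // Skipped by a plain `cargo test`; runs only with `cargo test -- --ignored`.
    #[test]
    #[ignore = "slow: needs the model cache"]
    fn batch_of_64_completes_under_5s() {
        let inputs = vec!["short input"; 64];
        let (elapsed, count) = time_batch(&inputs);
        assert_eq!(count, 64);
        assert!(elapsed < Duration::from_secs(5));
    }
}

fn main() {
    let inputs = vec!["short input"; 64];
    let (elapsed, count) = time_batch(&inputs);
    println!("embedded {count} inputs in {elapsed:?}");
}
```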
## Definition of Done -- [ ] `cargo check -p kb-embed-local` passes -- [ ] `cargo test -p kb-embed-local` passes (excluding `#[ignore]`) +- [ ] `cargo check -p kebab-embed-local` passes +- [ ] `cargo test -p kebab-embed-local` passes (excluding `#[ignore]`) - [ ] First-run model download works under `data_dir/models/fastembed/` - [ ] No imports outside Allowed dependencies - [ ] PR links design §11.3, §6.4, §9 diff --git a/tasks/p3/p3-3-lancedb-store.md b/tasks/p3/p3-3-lancedb-store.md index 3a80a3f..86a71f9 100644 --- a/tasks/p3/p3-3-lancedb-store.md +++ b/tasks/p3/p3-3-lancedb-store.md @@ -1,12 +1,12 @@ --- phase: P3 -component: kb-store-vector (LanceDB) +component: kebab-store-vector (LanceDB) task_id: p3-3 title: "LanceDB VectorStore + embedding_records writer" status: completed depends_on: [p3-2, p1-6] unblocks: [p3-4] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§5.6 embedding_records, §6.3 lancedb table naming, §7.2 VectorStore, §9 versioning] --- @@ -18,13 +18,13 @@ Implement `VectorStore` over LanceDB (embedded). Stores per-model tables (`chunk ## Why now / why this size -Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3-4) can compose lexical + vector retrievers without leaking storage details. +Closes the loop chunk → vector. Splits cleanly from `kebab-search` so hybrid (p3-4) can compose lexical + vector retrievers without leaking storage details. ## Allowed dependencies -- `kb-core` -- `kb-config` -- `kb-store-sqlite` (only for writing/reading rows in `embedding_records`) +- `kebab-core` +- `kebab-config` +- `kebab-store-sqlite` (only for writing/reading rows in `embedding_records`) - `lancedb` - `arrow` (and `arrow-array`, `arrow-schema`) - `serde`, `serde_json` @@ -33,16 +33,16 @@ Closes the loop chunk → vector. 
Splits cleanly from `kb-search` so hybrid (p3- ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-embed*` (consumes `Vec` via input only — no embedding logic here), `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` +- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-embed*` (consumes `Vec` via input only — no embedding logic here), `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop` ## Inputs | input | type | source | |-------|------|--------| -| `VectorRecord[..]` | `kb_core::VectorRecord` | `kb-app::embed_index` (P3 facade) | -| query vector | `&[f32]` | `kb-embed-local` (`Embedder::embed` for query) | -| filters | `kb_core::SearchFilters` | `SearchQuery` | -| `kb-config::Config.storage.vector_dir` | path | runtime | +| `VectorRecord[..]` | `kebab_core::VectorRecord` | `kebab-app::embed_index` (P3 facade) | +| query vector | `&[f32]` | `kebab-embed-local` (`Embedder::embed` for query) | +| filters | `kebab_core::SearchFilters` | `SearchQuery` | +| `kebab-config::Config.storage.vector_dir` | path | runtime | ## Outputs @@ -50,7 +50,7 @@ Closes the loop chunk → vector. Splits cleanly from `kb-search` so hybrid (p3- |--------|------|------------| | Lance tables under `vector_dir/chunk_embeddings__.lance/` | filesystem | future searches | | `embedding_records` rows | SQLite | reverse lookup, reindex bookkeeping | -| `Vec` | `kb_core::VectorHit` | hybrid retriever (p3-4) | +| `Vec` | `kebab_core::VectorHit` | hybrid retriever (p3-4) | ## Public surface (signatures only — no new types) @@ -58,13 +58,13 @@ Closes the loop chunk → vector. 
Splits cleanly from `kb-search` so hybrid (p3- pub struct LanceVectorStore { /* internal: connection + sqlite handle */ } impl LanceVectorStore { - pub fn new(config: &kb_config::Config, sqlite: std::sync::Arc) -> anyhow::Result; + pub fn new(config: &kebab_config::Config, sqlite: std::sync::Arc) -> anyhow::Result; } -impl kb_core::VectorStore for LanceVectorStore { - fn ensure_table(&self, model: &kb_core::EmbeddingModelId, dim: usize) -> anyhow::Result; - fn upsert(&self, recs: &[kb_core::VectorRecord]) -> anyhow::Result<()>; - fn search(&self, query_vec: &[f32], k: usize, filters: &kb_core::SearchFilters) -> anyhow::Result>; +impl kebab_core::VectorStore for LanceVectorStore { + fn ensure_table(&self, model: &kebab_core::EmbeddingModelId, dim: usize) -> anyhow::Result; + fn upsert(&self, recs: &[kebab_core::VectorRecord]) -> anyhow::Result<()>; + fn search(&self, query_vec: &[f32], k: usize, filters: &kebab_core::SearchFilters) -> anyhow::Result>; } ``` @@ -116,12 +116,12 @@ impl kb_core::VectorStore for LanceVectorStore { | determinism | same query vector + same data → same top-k order | tmp data_dir | | snapshot | `Vec` JSON for fixed corpus stable | `fixtures/vector/run-1.json` | -All tests under `cargo test -p kb-store-vector`. +All tests under `cargo test -p kebab-store-vector`. ## Definition of Done -- [ ] `cargo check -p kb-store-vector` passes -- [ ] `cargo test -p kb-store-vector` passes +- [ ] `cargo check -p kebab-store-vector` passes +- [ ] `cargo test -p kebab-store-vector` passes - [ ] No imports outside Allowed dependencies - [ ] `embedding_records` rows align 1:1 with Lance rows after a successful upsert batch - [ ] PR links design §5.6, §6.3, §7.2 @@ -130,7 +130,7 @@ All tests under `cargo test -p kb-store-vector`. - IVF / PQ index tuning (P+). - Image / multimodal vector tables (P6). -- `kb-app` orchestration of indexing jobs (`embed_index` facade method body). +- `kebab-app` orchestration of indexing jobs (`embed_index` facade method body). 
## Risks / notes diff --git a/tasks/p3/p3-4-hybrid-fusion.md b/tasks/p3/p3-4-hybrid-fusion.md index 0d34a6d..53446a0 100644 --- a/tasks/p3/p3-4-hybrid-fusion.md +++ b/tasks/p3/p3-4-hybrid-fusion.md @@ -1,12 +1,12 @@ --- phase: P3 -component: kb-search (hybrid) +component: kebab-search (hybrid) task_id: p3-4 title: "Hybrid Retriever (RRF) over lexical + vector" status: completed depends_on: [p2-2, p3-3] unblocks: [p4-3] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§3.7 RetrievalDetail, §0 Q3, §1.6 search --explain, §6.4 [search] rrf settings] --- @@ -22,17 +22,17 @@ Single mediator. Keeps the lexical and vector retrievers focused; only this task ## Allowed dependencies -- `kb-core` -- `kb-config` -- `kb-store-sqlite` (for `LexicalRetriever`) -- `kb-store-vector` (for `LanceVectorStore`) -- `kb-embed` (trait only — for query embedding via `Embedder`) +- `kebab-core` +- `kebab-config` +- `kebab-store-sqlite` (for `LexicalRetriever`) +- `kebab-store-vector` (for `LanceVectorStore`) +- `kebab-embed` (trait only — for query embedding via `Embedder`) - `tracing` - `thiserror` ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`. (`kb-embed-local` is a runtime-injected `dyn Embedder`; this crate must not depend on the concrete adapter directly.) +- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`. (`kebab-embed-local` is a runtime-injected `dyn Embedder`; this crate must not depend on the concrete adapter directly.) ## Inputs @@ -41,21 +41,21 @@ Single mediator. 
Keeps the lexical and vector retrievers focused; only this task | `LexicalRetriever` | trait object | constructed elsewhere | | `LanceVectorStore` | trait object | constructed elsewhere | | `Box` | for query embedding | runtime-injected | -| `kb-config::Config.search` | `default_k`, `hybrid_fusion`, `rrf_k` | runtime | -| `SearchQuery` | `kb_core::SearchQuery` | `kb-app::search` | +| `kebab-config::Config.search` | `default_k`, `hybrid_fusion`, `rrf_k` | runtime | +| `SearchQuery` | `kebab_core::SearchQuery` | `kebab-app::search` | ## Outputs | output | type | downstream | |--------|------|------------| -| `Vec` (with full `RetrievalDetail`) | `kb_core::SearchHit` | `kb-cli` printer, `kb-rag` packer | +| `Vec` (with full `RetrievalDetail`) | `kebab_core::SearchHit` | `kebab-cli` printer, `kebab-rag` packer | ## Public surface (signatures only — no new types) ```rust pub struct HybridRetriever { - lexical: std::sync::Arc, - vector: std::sync::Arc, // wrapper over LanceVectorStore + Embedder + lexical: std::sync::Arc, + vector: std::sync::Arc, // wrapper over LanceVectorStore + Embedder fusion: FusionPolicy, k: usize, } @@ -64,27 +64,27 @@ pub enum FusionPolicy { Rrf { k_rrf: u32 } } impl HybridRetriever { pub fn new( - config: &kb_config::Config, - lexical: std::sync::Arc, - vector: std::sync::Arc, + config: &kebab_config::Config, + lexical: std::sync::Arc, + vector: std::sync::Arc, ) -> Self; } -impl kb_core::Retriever for HybridRetriever { - fn search(&self, query: &kb_core::SearchQuery) -> anyhow::Result>; - fn index_version(&self) -> kb_core::IndexVersion; +impl kebab_core::Retriever for HybridRetriever { + fn search(&self, query: &kebab_core::SearchQuery) -> anyhow::Result>; + fn index_version(&self) -> kebab_core::IndexVersion; } /// Wrapper that turns a VectorStore + Embedder into a Retriever. 
pub struct VectorRetriever { - store: std::sync::Arc, - embed: std::sync::Arc, - /* heading_path/snippet enrichment hits SQLite via kb-store-sqlite read accessor */ + store: std::sync::Arc, + embed: std::sync::Arc, + /* heading_path/snippet enrichment hits SQLite via kebab-store-sqlite read accessor */ } impl VectorRetriever { - pub fn new(store: std::sync::Arc, embed: std::sync::Arc, sqlite: std::sync::Arc) -> Self; + pub fn new(store: std::sync::Arc, embed: std::sync::Arc, sqlite: std::sync::Arc) -> Self; } -impl kb_core::Retriever for VectorRetriever { /* per §7.2 */ } +impl kebab_core::Retriever for VectorRetriever { /* per §7.2 */ } ``` ## Behavior contract @@ -124,12 +124,12 @@ impl kb_core::Retriever for VectorRetriever { /* per §7.2 */ } | determinism | identical query twice → byte-identical `Vec` | tmp DB | | snapshot | hybrid output JSON stable | `fixtures/search/hybrid/run-1.json` | -All tests under `cargo test -p kb-search hybrid`. +All tests under `cargo test -p kebab-search hybrid`. 
## Definition of Done -- [ ] `cargo check -p kb-search` passes -- [ ] `cargo test -p kb-search hybrid` passes +- [ ] `cargo check -p kebab-search` passes +- [ ] `cargo test -p kebab-search hybrid` passes - [ ] No imports outside Allowed dependencies - [ ] PR links design §3.7, §6.4 search, §0 Q3 diff --git a/tasks/p3/p3-5-app-wiring.md b/tasks/p3/p3-5-app-wiring.md index 206f855..069e563 100644 --- a/tasks/p3/p3-5-app-wiring.md +++ b/tasks/p3/p3-5-app-wiring.md @@ -1,64 +1,64 @@ --- phase: P3 -component: kb-app (facade wiring) +component: kebab-app (facade wiring) task_id: p3-5 -title: "Wire kb-app facade — ingest / search / list / inspect end-to-end" +title: "Wire kebab-app facade — ingest / search / list / inspect end-to-end" status: completed depends_on: [p1-6, p2-2, p3-2, p3-3, p3-4] unblocks: [p4-3, p9-1, p9-2, p9-4] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§7 components, §7.2 traits, §1 UX scenes (ingest, search, list, inspect), §6.4 config, §10 errors] --- -# p3-5 — Wire kb-app facade +# p3-5 — Wire kebab-app facade ## Goal -Replace the `bail!("not yet wired")` stubs in `kb-app` (currently `ingest`, `search`, `list_docs`, `inspect_doc`, `inspect_chunk`) with real bodies that compose the libraries shipped in P1–P3. After this task, `kb index` actually walks a workspace and persists chunks, and `kb search --mode {lexical,vector,hybrid}` returns real `SearchHit`s. `kb-app::ask` stays stubbed (P4-3 owns it). +Replace the `bail!("not yet wired")` stubs in `kebab-app` (currently `ingest`, `search`, `list_docs`, `inspect_doc`, `inspect_chunk`) with real bodies that compose the libraries shipped in P1–P3. After this task, `kebab index` actually walks a workspace and persists chunks, and `kebab search --mode {lexical,vector,hybrid}` returns real `SearchHit`s. `kebab-app::ask` stays stubbed (P4-3 owns it). 
## Why now / why this size -P1–P3 shipped libraries but the CLI is unusable: every facade method bails. Inserting this glue task between P3-4 (last library that completes the retrieval stack) and P4-1 (LLM trait, where `kb-app::ask` will need this same wiring anyway) lets the user run the tool end-to-end for the first time and validates that the library boundaries actually compose. Limiting it to non-LLM commands keeps it a single-session task; `ask` wiring lives in P4-3. +P1–P3 shipped libraries but the CLI is unusable: every facade method bails. Inserting this glue task between P3-4 (last library that completes the retrieval stack) and P4-1 (LLM trait, where `kebab-app::ask` will need this same wiring anyway) lets the user run the tool end-to-end for the first time and validates that the library boundaries actually compose. Limiting it to non-LLM commands keeps it a single-session task; `ask` wiring lives in P4-3. ## Allowed dependencies -`kb-app` may depend on: +`kebab-app` may depend on: -- `kb-core`, `kb-config` (already) -- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-sqlite` (P1) -- `kb-search` (P2-2, P3-4) -- `kb-store-vector`, `kb-embed`, `kb-embed-local` (P3-2, P3-3, P3-4) +- `kebab-core`, `kebab-config` (already) +- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-sqlite` (P1) +- `kebab-search` (P2-2, P3-4) +- `kebab-store-vector`, `kebab-embed`, `kebab-embed-local` (P3-2, P3-3, P3-4) - `tracing`, `anyhow`, `serde`, `serde_json`, `time`, `dirs`, `toml` (existing) ## Forbidden dependencies -- `kb-llm*`, `kb-rag` (P4 ownership — `ask` stays stubbed) -- `kb-tui`, `kb-desktop` (P9 — those *consume* `kb-app`, not the other way) -- `kb-parse-pdf`, `kb-parse-image`, `kb-parse-audio` (P6/P7/P8) -- network HTTP libs (model download is `kb-embed-local`'s responsibility via fastembed) +- `kebab-llm*`, `kebab-rag` (P4 ownership — `ask` stays stubbed) +- `kebab-tui`, `kebab-desktop` (P9 — those *consume* 
`kebab-app`, not the other way) +- `kebab-parse-pdf`, `kebab-parse-image`, `kebab-parse-audio` (P6/P7/P8) +- network HTTP libs (model download is `kebab-embed-local`'s responsibility via fastembed) ## Inputs | input | type | source | |-------|------|--------| -| `kb-config::Config` | `kb_config::Config` | loaded once at process start; threaded through every facade call | -| `SourceScope` | `kb_core::SourceScope` | CLI | -| `SearchQuery` | `kb_core::SearchQuery` | CLI | -| `DocFilter` | `kb_core::DocFilter` | CLI | -| `DocumentId` / `ChunkId` | `kb_core::ids::*` | CLI | +| `kebab-config::Config` | `kebab_config::Config` | loaded once at process start; threaded through every facade call | +| `SourceScope` | `kebab_core::SourceScope` | CLI | +| `SearchQuery` | `kebab_core::SearchQuery` | CLI | +| `DocFilter` | `kebab_core::DocFilter` | CLI | +| `DocumentId` / `ChunkId` | `kebab_core::ids::*` | CLI | ## Outputs | output | type | downstream | |--------|------|------------| -| `IngestReport` | `kb_core::IngestReport` | `kb-cli` (`ingest_report.v1`), `kb-eval` (P5) | -| `Vec` | `kb_core::SearchHit` | `kb-cli` (`search_hit.v1`), `kb-rag` (P4-3) | -| `Vec` | `kb_core::DocSummary` | `kb-cli` (`doc_summary.v1`) | -| `CanonicalDocument` / `Chunk` | `kb_core::*` | `kb-cli` (`chunk_inspection.v1`) | +| `IngestReport` | `kebab_core::IngestReport` | `kebab-cli` (`ingest_report.v1`), `kebab-eval` (P5) | +| `Vec` | `kebab_core::SearchHit` | `kebab-cli` (`search_hit.v1`), `kebab-rag` (P4-3) | +| `Vec` | `kebab_core::DocSummary` | `kebab-cli` (`doc_summary.v1`) | +| `CanonicalDocument` / `Chunk` | `kebab_core::*` | `kebab-cli` (`chunk_inspection.v1`) | ## Public surface (signatures only — no new types) -The signatures already live on `kb-app`. This task only swaps the bodies. Frozen surface: +The signatures already live on `kebab-app`. This task only swaps the bodies. 
Frozen surface: ```rust pub fn ingest(scope: SourceScope, summary_only: bool) -> anyhow::Result; @@ -69,7 +69,7 @@ pub fn search(query: SearchQuery) -> anyhow::Result`, `Arc`, `Arc` so cold-start cost is paid once per process. Document the lifetime model: a CLI invocation paths through `App::open(&Config)` once, runs the requested op, then drops everything on exit. The TUI (P9) holds the `App` for the session. @@ -100,17 +100,17 @@ Each call constructs (or reuses) the retrievers from a single `App` context that ### `list_docs(filter)` / `inspect_doc(id)` / `inspect_chunk(id)` -Direct delegation to `kb_core::DocumentStore` trait methods on `SqliteStore`. No vector / embedding involvement. +Direct delegation to `kebab_core::DocumentStore` trait methods on `SqliteStore`. No vector / embedding involvement. ### Lifecycle -Define an internal `pub(crate) struct App { config: Arc, sqlite: Arc, vector: Option>, embedder: Option>, /* retrievers built per call */ }` with a `pub(crate) fn open(config: &Config) -> Result`. Each public `kb-app::*` function builds an `App` (or accepts one in tests) and uses it. The free functions stay the public API so `kb-cli` and `kb-tui` don't need to refactor. +Define an internal `pub(crate) struct App { config: Arc, sqlite: Arc, vector: Option>, embedder: Option>, /* retrievers built per call */ }` with a `pub(crate) fn open(config: &Config) -> Result`. Each public `kebab-app::*` function builds an `App` (or accepts one in tests) and uses it. The free functions stay the public API so `kebab-cli` and `kebab-tui` don't need to refactor. `vector` and `embedder` are `Option` because the spec allows running KB without embeddings (lexical-only mode for resource-constrained boxes — `provider == "none"`). When absent, `search` with `Vector` / `Hybrid` modes returns `anyhow::Error` with a hint to enable embeddings; `ingest` skips the vector step. ### Versioning -- `parser_version` from `kb-parse-md` constant. 
+- `parser_version` from `kebab-parse-md` constant. - `chunker_version` from `MdHeadingV1Chunker::chunker_version()`. - `embedding_model` / `embedding_version` from the embedder. - `index_version` from the retriever (or `Lexical`/`Vector` IndexId). @@ -119,15 +119,15 @@ All wired through to the persisted records and the wire output. ## Storage / wire effects -- Writes: `data_dir/kb.sqlite` (assets, documents, blocks, chunks, embedding_records, jobs/ingest_runs), `data_dir/assets//` (when copied), `data_dir/lancedb/chunk_embeddings__.lance/` (when embeddings on), `data_dir/models/fastembed/` (model cache, first run only). +- Writes: `data_dir/kebab.sqlite` (assets, documents, blocks, chunks, embedding_records, jobs/ingest_runs), `data_dir/assets//` (when copied), `data_dir/lancedb/chunk_embeddings__.lance/` (when embeddings on), `data_dir/models/fastembed/` (model cache, first run only). - Reads on subsequent calls: same DB. -- Wire JSON conforms to `ingest_report.v1`, `search_hit.v1`, `doc_summary.v1`, `chunk_inspection.v1` — `kb-cli` already has the wrappers. +- Wire JSON conforms to `ingest_report.v1`, `search_hit.v1`, `doc_summary.v1`, `chunk_inspection.v1` — `kebab-cli` already has the wrappers. 
## Test plan | kind | description | fixture / data | |------|-------------|----------------| -| integration | `ingest` over a 3-file fixture workspace produces non-empty `IngestReport` and writes the expected SQLite rows + Lance rows | `crates/kb-app/tests/fixtures/workspace/`, tmp `data_dir` | +| integration | `ingest` over a 3-file fixture workspace produces non-empty `IngestReport` and writes the expected SQLite rows + Lance rows | `crates/kebab-app/tests/fixtures/workspace/`, tmp `data_dir` | | integration | `ingest` with `provider="none"` skips Lance and produces an `IngestReport` with `embeddings_indexed=0` | tmp config | | integration | `ingest` is idempotent: re-running on the same workspace updates `documents.updated_at` and bumps `doc_version` without duplicating rows | tmp `data_dir` | | integration | `search` with `mode=Lexical` returns hits whose `embedding_model` is `None` | tmp `data_dir` | @@ -136,36 +136,36 @@ All wired through to the persisted records and the wire output. | integration | `search` with `mode=Vector` and `provider="none"` returns a clear error | tmp config | | integration | `list_docs` with `tags_any=["rust"]` filters correctly | tmp `data_dir` | | integration | `inspect_doc` round-trips a document; `inspect_chunk` round-trips a chunk | tmp `data_dir` | -| smoke | end-to-end CLI smoke: `kb index` + `kb search --mode hybrid "..."` against the fixture workspace using `cargo run` (manual / `assert_cmd`) | fixture | +| smoke | end-to-end CLI smoke: `kebab index` + `kebab search --mode hybrid "..."` against the fixture workspace using `cargo run` (manual / `assert_cmd`) | fixture | -`#[ignore]` AVX-gated tests follow the P3-3/P3-4 pattern: `require_avx_or_panic()` at the top of each LanceDB-touching body. Default `cargo test -p kb-app` runs the lexical-only and SQLite-only paths. +`#[ignore]` AVX-gated tests follow the P3-3/P3-4 pattern: `require_avx_or_panic()` at the top of each LanceDB-touching body. 
Default `cargo test -p kebab-app` runs the lexical-only and SQLite-only paths. -All tests under `cargo test -p kb-app`. CLI smoke optional via `assert_cmd` if it doesn't add a heavyweight dep tree. +All tests under `cargo test -p kebab-app`. CLI smoke optional via `assert_cmd` if it doesn't add a heavyweight dep tree. ## Definition of Done -- [ ] `cargo check -p kb-app` passes. -- [ ] `cargo test -p kb-app` passes (default lane). -- [ ] `cargo test -p kb-app -- --ignored` passes on AVX hardware. -- [ ] `cargo run -p kb-cli -- index` succeeds against a non-empty fixture workspace. -- [ ] `cargo run -p kb-cli -- search --mode hybrid ""` returns real hits + citations. -- [ ] `cargo run -p kb-cli -- list` returns the indexed documents. +- [ ] `cargo check -p kebab-app` passes. +- [ ] `cargo test -p kebab-app` passes (default lane). +- [ ] `cargo test -p kebab-app -- --ignored` passes on AVX hardware. +- [ ] `cargo run -p kebab-cli -- index` succeeds against a non-empty fixture workspace. +- [ ] `cargo run -p kebab-cli -- search --mode hybrid ""` returns real hits + citations. +- [ ] `cargo run -p kebab-cli -- list` returns the indexed documents. - [ ] `cargo clippy --workspace --all-targets -- -D warnings` clean. - [ ] No imports outside Allowed dependencies. - [ ] PR links design §1.2, §1.5, §1.6, §7. ## Out of scope -- `kb-app::ask` body — P4-3 (RAG pipeline). -- `kb index --rebuild-fts` CLI option — wire `kb_store_sqlite::rebuild_chunks_fts` only when needed by a downstream task. -- `kb index --resume` checkpointing — design §1.2 mentions it but spec leaves to a P+ refinement; document but defer. +- `kebab-app::ask` body — P4-3 (RAG pipeline). +- `kebab index --rebuild-fts` CLI option — wire `kebab_store_sqlite::rebuild_chunks_fts` only when needed by a downstream task. +- `kebab index --resume` checkpointing — design §1.2 mentions it but spec leaves to a P+ refinement; document but defer. - `--watch` mode — P+. 
- TUI / desktop integration — P9 consumes the wired facade. ## Risks / notes -- Cold-start cost: first `kb index` run downloads the fastembed model (~470MB) and warms the ONNX session. Surface via `tracing::info!` (already wired in P3-2). -- **Post-merge fix (2026-05)**: original P3-5 implementation left every `kb-cli` subcommand calling the bare `kb_app::*` free functions, which silently re-loaded `Config::load(None)` (XDG default) and ignored the user's `--config ` flag. CLI now goes through `kb_app::*_with_config(cfg, ...)` everywhere. The `#[doc(hidden)] pub fn *_with_config` companions originally framed as "test seam" are now the official config-explicit API for CLI / TUI / integration tests. See [HOTFIXES.md](../HOTFIXES.md) for details. +- Cold-start cost: first `kebab index` run downloads the fastembed model (~470MB) and warms the ONNX session. Surface via `tracing::info!` (already wired in P3-2). +- **Post-merge fix (2026-05)**: original P3-5 implementation left every `kebab-cli` subcommand calling the bare `kebab_app::*` free functions, which silently re-loaded `Config::load(None)` (XDG default) and ignored the user's `--config ` flag. CLI now goes through `kebab_app::*_with_config(cfg, ...)` everywhere. The `#[doc(hidden)] pub fn *_with_config` companions originally framed as "test seam" are now the official config-explicit API for CLI / TUI / integration tests. See [HOTFIXES.md](../HOTFIXES.md) for details. - The `App` lifecycle struct is internal but its construction is the natural seam for adding caching / connection pooling later. Keep it `pub(crate)` so future refactors don't break the CLI. - Mismatched `index_version` across stored records and the live retriever should fail loud at `App::open` (not at first search). Reuse the `tracing::warn!` from `HybridRetriever::new` (P3-4). - The fastembed adapter holds a `tokio::runtime` (P3-3); `App` must be constructed from a synchronous context. Document on `App::open`. 
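
The lexical-only fallback described under Lifecycle (embeddings disabled via `provider == "none"`) can be sketched as follows. This is a minimal illustration under stated assumptions, not the facade's real code: the `App` here models only the optional embedder, the `String` payloads stand in for real hits, and `Result<_, String>` replaces the `anyhow::Result` the spec uses so that the block needs nothing beyond the standard library.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum SearchMode {
    Lexical,
    Vector,
    Hybrid,
}

/// Hypothetical stand-in for the internal `App` context.
/// Only the optional embedder is modeled; the real struct holds more.
struct App {
    embedder: Option<String>,
}

impl App {
    /// Assumption: a provider of "none" means no embedder is constructed.
    fn open(provider: &str) -> Self {
        let embedder = if provider == "none" {
            None
        } else {
            Some(provider.to_string())
        };
        App { embedder }
    }

    /// Lexical search always works; vector and hybrid need an embedder.
    fn search(&self, mode: SearchMode, query: &str) -> Result<String, String> {
        match (mode, &self.embedder) {
            (SearchMode::Lexical, _) => Ok(format!("lexical hits for {query:?}")),
            (_, Some(model)) => Ok(format!("{mode:?} hits for {query:?} via {model}")),
            (_, None) => Err(format!(
                "{mode:?} search needs embeddings; configure an embedding provider other than \"none\""
            )),
        }
    }
}
```

With `App::open("none")`, a `Lexical` search succeeds while `Vector` and `Hybrid` return the hint, which is the behavior the test plan row for `mode=Vector` with `provider="none"` expects.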
diff --git a/tasks/p4/p4-1-llm-trait.md b/tasks/p4/p4-1-llm-trait.md index 2363f72..78e68a5 100644 --- a/tasks/p4/p4-1-llm-trait.md +++ b/tasks/p4/p4-1-llm-trait.md @@ -1,12 +1,12 @@ --- phase: P4 -component: kb-llm (trait crate) +component: kebab-llm (trait crate) task_id: p4-1 title: "LanguageModel trait + GenerateRequest/TokenChunk" status: completed depends_on: [p0-1] unblocks: [p4-2, p4-3] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§7.1 GenerateRequest/TokenChunk, §7.2 LanguageModel, §0 Q5 streaming, §3.8 ModelRef] --- @@ -14,16 +14,16 @@ contract_sections: [§7.1 GenerateRequest/TokenChunk, §7.2 LanguageModel, §0 Q ## Goal -Provide the `kb-llm` crate that re-exports the `LanguageModel` trait and helper types (`GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage`, `ModelRef`), plus a `MockLanguageModel` for downstream tests. +Provide the `kebab-llm` crate that re-exports the `LanguageModel` trait and helper types (`GenerateRequest`, `TokenChunk`, `FinishReason`, `TokenUsage`, `ModelRef`), plus a `MockLanguageModel` for downstream tests. ## Why now / why this size -`kb-rag` (p4-3) consumes a `LanguageModel` trait object. Owning the trait + a deterministic mock here lets RAG tests run with no Ollama dependency. Real adapters (Ollama, llama.cpp, candle) live in p4-2 and beyond. +`kebab-rag` (p4-3) consumes a `LanguageModel` trait object. Owning the trait + a deterministic mock here lets RAG tests run with no Ollama dependency. Real adapters (Ollama, llama.cpp, candle) live in p4-2 and beyond. 
## Allowed dependencies -- `kb-core` -- `kb-config` +- `kebab-core` +- `kebab-config` - `serde` - `thiserror` - `tracing` @@ -31,13 +31,13 @@ Provide the `kb-llm` crate that re-exports the `LanguageModel` trait and helper ## Forbidden dependencies -- `reqwest`, `ureq`, `tokio`, `whisper-rs`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-rag`, `kb-tui`, `kb-desktop` +- `reqwest`, `ureq`, `tokio`, `whisper-rs`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-rag`, `kebab-tui`, `kebab-desktop` ## Inputs | input | type | source | |-------|------|--------| -| `GenerateRequest` | `kb_core::GenerateRequest` | RAG pipeline | +| `GenerateRequest` | `kebab_core::GenerateRequest` | RAG pipeline | | concrete adapter at runtime | `dyn LanguageModel` | p4-2+ | ## Outputs @@ -45,12 +45,12 @@ Provide the `kb-llm` crate that re-exports the `LanguageModel` trait and helper | output | type | downstream | |--------|------|------------| | streaming `TokenChunk` iterator | `Box> + Send>` | RAG pipeline | -| `ModelRef` identity | `kb_core::ModelRef` | Answer.model | +| `ModelRef` identity | `kebab_core::ModelRef` | Answer.model | ## Public surface (signatures only — no new types) ```rust -pub use kb_core::{LanguageModel, GenerateRequest, TokenChunk, FinishReason, TokenUsage, ModelRef}; +pub use kebab_core::{LanguageModel, GenerateRequest, TokenChunk, FinishReason, TokenUsage, ModelRef}; /// Test-only deterministic mock. Compiled only when `mock` feature is on. 
#[cfg(feature = "mock")] @@ -59,12 +59,12 @@ pub struct MockLanguageModel { pub provider: String, pub context_tokens: usize, pub canned_response: String, // emitted token-by-token - pub canned_finish: kb_core::FinishReason, - pub canned_usage: kb_core::TokenUsage, + pub canned_finish: kebab_core::FinishReason, + pub canned_usage: kebab_core::TokenUsage, } #[cfg(feature = "mock")] -impl kb_core::LanguageModel for MockLanguageModel { /* per §7.2 */ } +impl kebab_core::LanguageModel for MockLanguageModel { /* per §7.2 */ } ``` ## Behavior contract @@ -89,12 +89,12 @@ impl kb_core::LanguageModel for MockLanguageModel { /* per §7.2 */ } | unit | concatenation of streamed `TokenChunk::Token` equals canned text (truncated by stop strings) | inline | | contract | `model_ref()` populates `provider` and leaves `dimensions = None` | inline | -All tests under `cargo test -p kb-llm`. +All tests under `cargo test -p kebab-llm`. ## Definition of Done -- [ ] `cargo check -p kb-llm` passes -- [ ] `cargo test -p kb-llm` passes +- [ ] `cargo check -p kebab-llm` passes +- [ ] `cargo test -p kebab-llm` passes - [ ] No HTTP / async runtime deps present - [ ] PR links design §7.2 LanguageModel, §0 Q5 diff --git a/tasks/p4/p4-2-ollama-adapter.md b/tasks/p4/p4-2-ollama-adapter.md index 6dae6c0..cdd9538 100644 --- a/tasks/p4/p4-2-ollama-adapter.md +++ b/tasks/p4/p4-2-ollama-adapter.md @@ -1,12 +1,12 @@ --- phase: P4 -component: kb-llm-local (Ollama adapter) +component: kebab-llm-local (Ollama adapter) task_id: p4-2 title: "OllamaLanguageModel — streaming /api/generate" status: completed depends_on: [p4-1] unblocks: [p4-3] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [design §7.2 LanguageModel, report §11.2 Ollama, design §6.4 [models.llm], design §0 Q5 streaming, design §10 errors] --- @@ -18,13 +18,13 @@ Implement `OllamaLanguageModel` against 
Ollama's local HTTP API (`POST /api/gene ## Why now / why this size -First real LM. Required for `kb ask` to function. Isolated from RAG pipeline so swapping providers stays config-only. +First real LM. Required for `kebab ask` to function. Isolated from RAG pipeline so swapping providers stays config-only. ## Allowed dependencies -- `kb-core` -- `kb-config` -- `kb-llm` +- `kebab-core` +- `kebab-config` +- `kebab-llm` - `reqwest = { version = "0.12", default-features = false, features = ["blocking", "json", "rustls-tls"] }` - `serde`, `serde_json` - `tracing` @@ -32,21 +32,21 @@ First real LM. Required for `kb ask` to function. Isolated from RAG pipeline so ## Forbidden dependencies -- `tokio`, `async-std`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-rag`, `kb-tui`, `kb-desktop`. (Streaming uses `reqwest::blocking::Response::bytes_stream` via line-delimited JSON; no async runtime needed.) +- `tokio`, `async-std`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-rag`, `kebab-tui`, `kebab-desktop`. (Streaming uses `reqwest::blocking::Response::bytes_stream` via line-delimited JSON; no async runtime needed.) 
## Inputs | input | type | source | |-------|------|--------| -| `kb-config::Config.models.llm` | endpoint, model, context, temperature, seed | runtime | -| `GenerateRequest` | `kb_core::GenerateRequest` | RAG pipeline | +| `kebab-config::Config.models.llm` | endpoint, model, context, temperature, seed | runtime | +| `GenerateRequest` | `kebab_core::GenerateRequest` | RAG pipeline | | Ollama HTTP server (local) | `http://127.0.0.1:11434` | external process | ## Outputs | output | type | downstream | |--------|------|------------| -| streaming `TokenChunk` iterator | per §7.2 | `kb-rag` | +| streaming `TokenChunk` iterator | per §7.2 | `kebab-rag` | | `ModelRef` | `{ id, provider="ollama", dimensions=None }` | `Answer.model` | ## Public surface (signatures only — no new types) @@ -55,14 +55,14 @@ First real LM. Required for `kb ask` to function. Isolated from RAG pipeline so pub struct OllamaLanguageModel { /* internal: reqwest::blocking::Client + config */ } impl OllamaLanguageModel { - pub fn new(config: &kb_config::Config) -> anyhow::Result; + pub fn new(config: &kebab_config::Config) -> anyhow::Result; } -impl kb_core::LanguageModel for OllamaLanguageModel { - fn model_ref(&self) -> kb_core::ModelRef; +impl kebab_core::LanguageModel for OllamaLanguageModel { + fn model_ref(&self) -> kebab_core::ModelRef; fn context_tokens(&self) -> usize; - fn generate_stream(&self, req: kb_core::GenerateRequest) - -> anyhow::Result> + Send>>; + fn generate_stream(&self, req: kebab_core::GenerateRequest) + -> anyhow::Result> + Send>>; } ``` @@ -93,7 +93,7 @@ impl kb_core::LanguageModel for OllamaLanguageModel { - UTF-8 boundary: buffer incomplete byte sequences across stream lines before emitting `TokenChunk::Token`. - Determinism: with `temperature=0` and fixed `seed`, Ollama's output is reproducible (modulo nondeterminism in the model itself); tests that verify determinism use a fixed seed and may rely on aggregate hash with tolerance, NOT byte equality. 
- `model_ref().provider = "ollama"`, `dimensions = None`. -- Reachability check: `OllamaLanguageModel::new` does NOT eagerly hit the network; first failure surfaces on `generate_stream`. Use `kb doctor` (separate task) to probe. +- Reachability check: `OllamaLanguageModel::new` does NOT eagerly hit the network; first failure surfaces on `generate_stream`. Use `kebab doctor` (separate task) to probe. ## Storage / wire effects @@ -112,12 +112,12 @@ impl kb_core::LanguageModel for OllamaLanguageModel { | determinism | identical request + temperature=0 + seed=0 produces identical token stream against mock | mocked HTTP | | `#[ignore]` integration | real Ollama on `localhost:11434` with `qwen2.5:14b-instruct` produces non-empty output | requires user opt-in | -All non-ignored tests under `cargo test -p kb-llm-local`. Real-LM integration runs via `cargo test -p kb-llm-local -- --ignored`. +All non-ignored tests under `cargo test -p kebab-llm-local`. Real-LM integration runs via `cargo test -p kebab-llm-local -- --ignored`. ## Definition of Done -- [ ] `cargo check -p kb-llm-local` passes -- [ ] `cargo test -p kb-llm-local` passes (mocked tests; real LM behind `#[ignore]`) +- [ ] `cargo check -p kebab-llm-local` passes +- [ ] `cargo test -p kebab-llm-local` passes (mocked tests; real LM behind `#[ignore]`) - [ ] No async runtime present (uses `reqwest::blocking`) - [ ] No imports outside Allowed dependencies - [ ] PR links design §11.2, §0 Q5, §10 @@ -125,7 +125,7 @@ All non-ignored tests under `cargo test -p kb-llm-local`. Real-LM integration ru ## Out of scope - llama.cpp / candle adapters (P+). -- Embedding via Ollama's `/api/embed` endpoint (alternate adapter inside `kb-embed-local` if requested later). +- Embedding via Ollama's `/api/embed` endpoint (alternate adapter inside `kebab-embed-local` if requested later). - Cancellation / abort tokens (P+). - Connection pooling tuning (default `reqwest::blocking` is sufficient for single-user CLI). 
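
The UTF-8 boundary rule in the behavior contract (buffer incomplete byte sequences before emitting `TokenChunk::Token`) can be sketched with the standard library alone. This is an illustration, not the adapter's actual code: `Utf8Buffer` and `push` are hypothetical names, and flushing genuinely invalid bytes lossily (rather than returning an error) is an assumed policy.

```rust
/// Hypothetical helper: accumulates raw bytes and releases only complete UTF-8 text.
struct Utf8Buffer {
    pending: Vec<u8>,
}

impl Utf8Buffer {
    fn new() -> Self {
        Utf8Buffer { pending: Vec::new() }
    }

    /// Append a chunk and return the text that is safe to emit now.
    /// An incomplete trailing sequence stays buffered for the next call.
    fn push(&mut self, bytes: &[u8]) -> String {
        self.pending.extend_from_slice(bytes);
        let cut = match std::str::from_utf8(&self.pending) {
            // Everything buffered so far is valid text.
            Ok(_) => self.pending.len(),
            // `error_len() == None` means the input ended mid-sequence:
            // emit the valid prefix and keep the tail.
            Err(err) if err.error_len().is_none() => err.valid_up_to(),
            // Genuinely invalid bytes: flush everything lossily (assumed policy).
            Err(_) => self.pending.len(),
        };
        let ready: Vec<u8> = self.pending.drain(..cut).collect();
        String::from_utf8_lossy(&ready).into_owned()
    }
}
```

Feeding the two halves of a split `é` (`0xC3`, then `0xA9`) in separate calls yields no replacement characters: the first call holds the lone lead byte back, and the second emits the completed character.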
diff --git a/tasks/p4/p4-3-rag-pipeline.md b/tasks/p4/p4-3-rag-pipeline.md index 4eae6b8..aaaf77c 100644 --- a/tasks/p4/p4-3-rag-pipeline.md +++ b/tasks/p4/p4-3-rag-pipeline.md @@ -1,12 +1,12 @@ --- phase: P4 -component: kb-rag +component: kebab-rag task_id: p4-3 title: "RAG pipeline: retrieve → gate → pack → generate → cite-validate" status: completed depends_on: [p3-4, p4-2] unblocks: [p5-1] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§0 Q4 refusal (two-layer), §0 Q7 footer, §1.1–1.4 ask scenes, §2.3 Answer wire, §3.8 internal Answer, §6.4 [rag], §10 errors] --- @@ -22,11 +22,11 @@ This is the user-facing payoff. Splitting it further would couple too many inter ## Allowed dependencies -- `kb-core` -- `kb-config` -- `kb-search` (Retriever trait object) -- `kb-llm` (LanguageModel trait object) -- `kb-store-sqlite` (read chunk full text/section + write `answers` row) +- `kebab-core` +- `kebab-config` +- `kebab-search` (Retriever trait object) +- `kebab-llm` (LanguageModel trait object) +- `kebab-store-sqlite` (read chunk full text/section + write `answers` row) - `serde`, `serde_json` - `regex` (for citation marker extraction) - `time` @@ -35,51 +35,51 @@ This is the user-facing payoff. 
Splitting it further would couple too many inter ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector` (only via Retriever trait), `kb-embed*` (only via Retriever), `kb-llm-local` (only via LanguageModel trait), `kb-tui`, `kb-desktop` +- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector` (only via Retriever trait), `kebab-embed*` (only via Retriever), `kebab-llm-local` (only via LanguageModel trait), `kebab-tui`, `kebab-desktop` ## Inputs | input | type | source | |-------|------|--------| -| `query: &str` | text | `kb-app::ask` | +| `query: &str` | text | `kebab-app::ask` | | `AskOpts` | k, explain, mode, temperature, seed | CLI | | `dyn Retriever` | hybrid retriever from p3-4 | runtime injection | | `dyn LanguageModel` | from p4-2 (or mock) | runtime injection | | `dyn DocumentStore` | for chunk full-text fetch | from p1-6 | -| `kb-config::Config.rag` | `prompt_template_version`, `score_gate`, `max_context_tokens` | runtime | +| `kebab-config::Config.rag` | `prompt_template_version`, `score_gate`, `max_context_tokens` | runtime | ## Outputs | output | type | downstream | |--------|------|------------| -| `Answer` | `kb_core::Answer` | `kb-cli` printer, `answers` table | +| `Answer` | `kebab_core::Answer` | `kebab-cli` printer, `answers` table | | `answers` table row | SQLite | history, eval | ## Public surface (signatures only — no new types) ```rust pub struct RagPipeline { - retriever: std::sync::Arc, - llm: std::sync::Arc, - docs: std::sync::Arc, - config: kb_config::Config, + retriever: std::sync::Arc, + llm: std::sync::Arc, + docs: std::sync::Arc, + config: kebab_config::Config, } impl RagPipeline { pub fn new( - config: kb_config::Config, - retriever: std::sync::Arc, - llm: std::sync::Arc, - docs: std::sync::Arc, + config: kebab_config::Config, + retriever: std::sync::Arc, + llm: std::sync::Arc, + docs: std::sync::Arc, ) -> Self; - pub fn ask(&self, query: &str, 
opts: AskOpts) -> anyhow::Result; + pub fn ask(&self, query: &str, opts: AskOpts) -> anyhow::Result; } pub struct AskOpts { pub k: usize, pub explain: bool, - pub mode: kb_core::SearchMode, + pub mode: kebab_core::SearchMode, pub temperature: Option, pub seed: Option, pub stream_sink: Option>, // tty/UI token streaming @@ -158,12 +158,12 @@ pub struct AskOpts { | determinism | identical inputs + temperature=0 + seed=0 → identical Answer (snapshot) | mock | | snapshot | `Answer` JSON for fixed query stable | `fixtures/rag/run-1.json` | -All tests under `cargo test -p kb-rag` with no real Ollama (mock LM only). +All tests under `cargo test -p kebab-rag` with no real Ollama (mock LM only). ## Definition of Done -- [ ] `cargo check -p kb-rag` passes -- [ ] `cargo test -p kb-rag` passes +- [ ] `cargo check -p kebab-rag` passes +- [ ] `cargo test -p kebab-rag` passes - [ ] No imports outside Allowed dependencies - [ ] All paths write an `answers` row - [ ] Output JSON conforms to `answer.v1` @@ -178,9 +178,9 @@ All tests under `cargo test -p kb-rag` with no real Ollama (mock LM only). ## Risks / notes -- Citation regex is STRICT `\[#(\d{1,3})\]` only. Models that emit `[1]`/`[ #1 ]`/`[foo]` are treated as no-marker → refusal. This is intentional: a noisy citation grammar lets prose `[1]` or `vec![1]` slip through as false positives, which corrupts both `grounded` and `kb eval` `citation_coverage`. The prompt template (`rag-v1`) explicitly instructs `[#번호]`. -- **Post-merge fix (2026-05)**: kb-cli's `Cmd::Ask` arm originally called bare `kb_app::ask(query, opts)`, ignoring `--config ` and silently using XDG defaults (manifested as wrong model id / score_gate / data_dir surfacing in `Answer.retrieval`). Fixed by routing through `kb_app::ask_with_config(cfg, query, opts)`. See [HOTFIXES.md](../HOTFIXES.md). 
-- **Post-merge fix (2026-05)**: `config.rag.score_gate` default `0.05` was incompatible with hybrid RRF `fusion_score` (raw range `(0, 2/k_rrf]` ≈ `≤ 0.033` at default k_rrf=60), refusing every hybrid `kb ask`. Closed by normalizing RRF in p3-4 to `[0, 1]`. See p3-4 spec + HOTFIXES.md.
+- Citation regex is STRICT `\[#(\d{1,3})\]` only. Models that emit `[1]`/`[ #1 ]`/`[foo]` are treated as no-marker → refusal. This is intentional: a noisy citation grammar lets prose `[1]` or `vec![1]` slip through as false positives, which corrupts both `grounded` and `kebab eval` `citation_coverage`. The prompt template (`rag-v1`) explicitly instructs `[#번호]`.
+- **Post-merge fix (2026-05)**: kebab-cli's `Cmd::Ask` arm originally called bare `kebab_app::ask(query, opts)`, ignoring `--config ` and silently using XDG defaults (manifested as wrong model id / score_gate / data_dir surfacing in `Answer.retrieval`). Fixed by routing through `kebab_app::ask_with_config(cfg, query, opts)`. See [HOTFIXES.md](../HOTFIXES.md).
+- **Post-merge fix (2026-05)**: `config.rag.score_gate` default `0.05` was incompatible with hybrid RRF `fusion_score` (raw range `(0, 2/k_rrf]` ≈ `≤ 0.033` at default k_rrf=60), refusing every hybrid `kebab ask`. Closed by normalizing RRF in p3-4 to `[0, 1]`. See p3-4 spec + HOTFIXES.md.
 - `stream_sink` channel: pipeline `send`s tokens; if the receiver is dropped (caller cancelled), `SendError` is silently swallowed and generation continues to completion (so the `Answer` row still gets persisted). Pipeline does NOT panic on a dead sink.
 - `temperature=0` does not fully eliminate stochasticity in some quantized Ollama models; document this and rely on `must_contain` rule-based metrics in P5 instead of exact match.
 - Prompt-injection defense lives entirely in the system prompt; do NOT mutate `[근거]` text. If chunk text contains `<|system|>` or similar tokens, do not strip them — they are inert when wrapped.
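To make the `score_gate` hotfix noted in this file concrete, here is a minimal sketch of the arithmetic. It is not the actual `kebab-search` code. It assumes 1-based ranks, exactly two fused lists (lexical and vector), and an inclusive gate; `rrf_raw`, `rrf_normalized`, and `passes_gate` are hypothetical names introduced only for illustration.

```rust
/// Raw reciprocal-rank-fusion score for one chunk.
/// `ranks` holds the chunk's rank in each retriever's list, or `None`
/// when that retriever did not return the chunk.
/// ASSUMPTION: ranks are 1-based, so the best single contribution is 1 / (k_rrf + 1).
pub fn rrf_raw(ranks: &[Option<usize>], k_rrf: f64) -> f64 {
    ranks
        .iter()
        .flatten() // skip lists that did not return the chunk
        .map(|&rank| 1.0 / (k_rrf + rank as f64))
        .sum()
}

/// Normalize a raw two-list RRF score into [0, 1] by dividing by the
/// theoretical maximum 2 / (k_rrf + 1), i.e. rank 1 in both lists.
pub fn rrf_normalized(raw: f64, k_rrf: f64) -> f64 {
    let max_raw = 2.0 / (k_rrf + 1.0);
    (raw / max_raw).clamp(0.0, 1.0)
}

/// Score gate: the pipeline refuses when the top hit falls below the threshold.
/// ASSUMPTION: the comparison is inclusive (`>=`).
pub fn passes_gate(top_score: f64, score_gate: f64) -> bool {
    top_score >= score_gate
}
```

With `k_rrf = 60`, a chunk ranked first in both lists scores `2/61 ≈ 0.0328` raw, which fails a `0.05` gate. After dividing by `2 / (k_rrf + 1)` the same chunk scores `1.0` and passes, while a chunk ranked first in only one list lands at `0.5`.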
diff --git a/tasks/p5/p5-1-golden-fixture-runner.md b/tasks/p5/p5-1-golden-fixture-runner.md index 5cc7e47..66b196b 100644 --- a/tasks/p5/p5-1-golden-fixture-runner.md +++ b/tasks/p5/p5-1-golden-fixture-runner.md @@ -1,12 +1,12 @@ --- phase: P5 -component: kb-eval (runner) +component: kebab-eval (runner) task_id: p5-1 title: "Golden query fixture loader + per-query runner" status: completed depends_on: [p4-3] unblocks: [p5-2] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§5.7 eval_runs/eval_query_results, §6.3 runs_dir, phase epic tasks/phase-5-evaluation.md] --- @@ -14,7 +14,7 @@ contract_sections: [§5.7 eval_runs/eval_query_results, §6.3 runs_dir, phase ep ## Goal -Load `fixtures/golden_queries.yaml`, run each query through `kb-app` (lexical / vector / hybrid / rag), and persist results into `eval_query_results` + `runs_dir//per_query.jsonl`. +Load `fixtures/golden_queries.yaml`, run each query through `kebab-app` (lexical / vector / hybrid / rag), and persist results into `eval_query_results` + `runs_dir//per_query.jsonl`. ## Why now / why this size @@ -22,10 +22,10 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t ## Allowed dependencies -- `kb-core` -- `kb-config` -- `kb-app` (calls facade for search / ask) -- `kb-store-sqlite` (writes eval rows) +- `kebab-core` +- `kebab-config` +- `kebab-app` (calls facade for search / ask) +- `kebab-store-sqlite` (writes eval rows) - `serde`, `serde_yaml`, `serde_json` - `time` - `tracing` @@ -33,7 +33,7 @@ The runner is the data collector; metrics computation is p5-2's job. 
Splitting t ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (all reached via `kb-app` facade only), `kb-tui`, `kb-desktop` +- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (all reached via `kebab-app` facade only), `kebab-tui`, `kebab-desktop` ## Inputs @@ -41,7 +41,7 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t |-------|------|--------| | `fixtures/golden_queries.yaml` | YAML | repo-shipped | | `EvalRunOpts` | suite, mode, with_rag, k, temperature, seed | CLI | -| `kb-app` facade | search/ask | runtime | +| `kebab-app` facade | search/ask | runtime | ## Outputs @@ -50,7 +50,7 @@ The runner is the data collector; metrics computation is p5-2's job. Splitting t | `eval_runs` row | SQLite | p5-2, history | | `eval_query_results` rows | SQLite | p5-2 | | `runs_dir//per_query.jsonl` | filesystem | external tools, audits | -| `EvalRun` struct | `kb_eval::EvalRun` | caller | +| `EvalRun` struct | `kebab_eval::EvalRun` | caller | ## Public surface (signatures only — no new types) @@ -58,9 +58,9 @@ The runner is the data collector; metrics computation is p5-2's job. 
Splitting t pub struct GoldenQuery { pub id: String, pub query: String, - pub lang: kb_core::Lang, - pub expected_doc_ids: Vec, - pub expected_chunk_ids: Vec, + pub lang: kebab_core::Lang, + pub expected_doc_ids: Vec, + pub expected_chunk_ids: Vec, pub must_contain: Vec, pub forbidden: Vec, pub difficulty: Option, @@ -68,7 +68,7 @@ pub struct GoldenQuery { pub struct EvalRunOpts { pub suite: String, // "golden" default - pub mode: kb_core::SearchMode, + pub mode: kebab_core::SearchMode, pub with_rag: bool, pub k: usize, pub temperature: Option, @@ -86,9 +86,9 @@ pub struct EvalRun { pub struct QueryResult { pub query_id: String, pub query: String, - pub mode: kb_core::SearchMode, - pub hits_top_k: Vec, - pub answer: Option, + pub mode: kebab_core::SearchMode, + pub hits_top_k: Vec, + pub answer: Option, pub elapsed_ms: u32, pub error: Option, } @@ -103,10 +103,10 @@ pub fn run_eval(opts: &EvalRunOpts) -> anyhow::Result; - Parses YAML; required fields: `id`, `query`. Optional: everything else (defaults to empty / `None`). - Validates uniqueness of `id` and that `expected_doc_ids` / `expected_chunk_ids` exist in DB; missing → return error listing the offenders. - `run_eval`: - - Loads `fixtures/golden_queries.yaml` (path overridable via env `KB_EVAL_GOLDEN`). + - Loads `fixtures/golden_queries.yaml` (path overridable via env `KEBAB_EVAL_GOLDEN`). - Generates `run_id = "run_" + ulid_lower()`. - - Captures `config_snapshot_json`: serialized `kb_config::Config` plus `chunker_version`, `embedding_model+version+dims`, `llm.model_id`, `prompt_template_version`, `score_gate`, `rrf_k`, `index_version`. - - For each query: call `kb_app::search(SearchQuery { mode: opts.mode, k: opts.k, .. })`. If `opts.with_rag`, also call `kb_app::ask(query, AskOpts { mode: opts.mode, k: opts.k, explain: true, temperature: opts.temperature, seed: opts.seed, .. })`. 
+ - Captures `config_snapshot_json`: serialized `kebab_config::Config` plus `chunker_version`, `embedding_model+version+dims`, `llm.model_id`, `prompt_template_version`, `score_gate`, `rrf_k`, `index_version`. + - For each query: call `kebab_app::search(SearchQuery { mode: opts.mode, k: opts.k, .. })`. If `opts.with_rag`, also call `kebab_app::ask(query, AskOpts { mode: opts.mode, k: opts.k, explain: true, temperature: opts.temperature, seed: opts.seed, .. })`. - Each `QueryResult` measured by elapsed wall-clock (ms). - Errors are caught per-query (do not abort the run). Failed queries record `error: Some(msg)` and `hits_top_k = vec![]`. - Determinism: with `temperature=0` and fixed `seed`, two consecutive runs produce byte-identical `per_query.jsonl` for non-RAG queries; RAG queries may differ in negligible token budget telemetry. @@ -130,12 +130,12 @@ pub fn run_eval(opts: &EvalRunOpts) -> anyhow::Result; | determinism | re-running same suite + fixed seed → identical `per_query.jsonl` (lexical only) | tmp DB, fixed corpus | | snapshot | `EvalRun` (with mock LM for `with_rag`) JSON stable | `fixtures/eval/run-1.json` | -All tests under `cargo test -p kb-eval runner`. +All tests under `cargo test -p kebab-eval runner`. 
## Definition of Done -- [ ] `cargo check -p kb-eval` passes -- [ ] `cargo test -p kb-eval runner` passes +- [ ] `cargo check -p kebab-eval` passes +- [ ] `cargo test -p kebab-eval runner` passes - [ ] `fixtures/golden_queries.yaml` template shipped (≥ 5 example entries) - [ ] No imports outside Allowed dependencies - [ ] PR links design §5.7 diff --git a/tasks/p5/p5-2-metrics-compare.md b/tasks/p5/p5-2-metrics-compare.md index cc6e6e2..d525ac7 100644 --- a/tasks/p5/p5-2-metrics-compare.md +++ b/tasks/p5/p5-2-metrics-compare.md @@ -1,12 +1,12 @@ --- phase: P5 -component: kb-eval (metrics + compare) +component: kebab-eval (metrics + compare) task_id: p5-2 title: "Metrics computation + compare report" status: completed depends_on: [p5-1] unblocks: [] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§5.7 eval_runs.aggregate_json, phase epic tasks/phase-5-evaluation.md] --- @@ -14,7 +14,7 @@ contract_sections: [§5.7 eval_runs.aggregate_json, phase epic tasks/phase-5-eva ## Goal -Compute hit@k, MRR, recall@k_doc, citation_coverage, groundedness, empty_result_rate, refusal_correctness from stored `eval_query_results`. Write `aggregate_json` back into `eval_runs`. Provide `kb eval compare a b` that diffs two runs. +Compute hit@k, MRR, recall@k_doc, citation_coverage, groundedness, empty_result_rate, refusal_correctness from stored `eval_query_results`. Write `aggregate_json` back into `eval_runs`. Provide `kebab eval compare a b` that diffs two runs. ## Why now / why this size @@ -22,16 +22,16 @@ Metric formulas + comparison logic are pure computation. 
Splitting them from p5- ## Allowed dependencies -- `kb-core` -- `kb-config` -- `kb-store-sqlite` (read eval rows, write `aggregate_json`) +- `kebab-core` +- `kebab-config` +- `kebab-store-sqlite` (read eval rows, write `aggregate_json`) - `serde`, `serde_json` - `tracing` - `thiserror` ## Forbidden dependencies -- `kb-app`, `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-vector`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` +- `kebab-app`, `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-vector`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop` ## Inputs @@ -46,7 +46,7 @@ Metric formulas + comparison logic are pure computation. Splitting them from p5- | output | type | downstream | |--------|------|------------| | `eval_runs.aggregate_json` updated | SQLite | history, CI checks | -| `CompareReport` | `kb_eval::CompareReport` | `kb-cli` printer | +| `CompareReport` | `kebab_eval::CompareReport` | `kebab-cli` printer | | optional `runs_dir//report.md` | filesystem | human-readable summary | ## Public surface (signatures only — no new types) @@ -127,15 +127,15 @@ pub fn render_report_md(report: &CompareReport) -> String; | determinism | running `compute_aggregate` twice produces identical `AggregateMetrics` | inline | | snapshot | `CompareReport` JSON for a fixed pair of runs stable | `fixtures/eval/compare-1.json` | -All tests under `cargo test -p kb-eval metrics`. +All tests under `cargo test -p kebab-eval metrics`. 
## Definition of Done -- [ ] `cargo check -p kb-eval` passes -- [ ] `cargo test -p kb-eval metrics` passes +- [ ] `cargo check -p kebab-eval` passes +- [ ] `cargo test -p kebab-eval metrics` passes - [ ] No imports outside Allowed dependencies - [ ] `eval_runs.aggregate_json` always populated after `store_aggregate` -- [ ] `kb eval compare` CLI surface integrated via `kb-app` (call `compare_runs` + `render_report_md`) +- [ ] `kebab eval compare` CLI surface integrated via `kebab-app` (call `compare_runs` + `render_report_md`) - [ ] PR links phase epic tasks/phase-5-evaluation.md ## Out of scope @@ -172,11 +172,11 @@ same one this spec defines, the names / wiring just differ. / `compare_runs(a, b)` so integration tests can drive the pipeline against a TempDir-backed `Config`. The no-arg forms wrap them with `Config::load(None)`. -- **CLI surface lives on `kb-cli` directly, not via `kb-app`.** DoD - asks for `kb eval compare` to be reached "via kb-app", but `kb-app` - already depends on `kb-eval` (the P5-1 runner uses the App facade), - so routing the CLI through `kb-app` would form a cycle. `kb-cli` → - `kb-eval` is wired directly; `kb-app` is unchanged. +- **CLI surface lives on `kebab-cli` directly, not via `kebab-app`.** DoD + asks for `kebab eval compare` to be reached "via kebab-app", but `kebab-app` + already depends on `kebab-eval` (the P5-1 runner uses the App facade), + so routing the CLI through `kebab-app` would form a cycle. `kebab-cli` → + `kebab-eval` is wired directly; `kebab-app` is unchanged. - **`AggregateMetrics` is `Serialize + Deserialize`.** The spec defines only the field shape; we add `Deserialize` so the stored `aggregate_json` can round-trip back into the type for follow-up @@ -184,12 +184,12 @@ same one this spec defines, the names / wiring just differ. - **`anyhow`** is used in `Result` returns since the rest of the workspace already speaks anyhow; not in the spec's Allowed list but matches every other crate. 
-- **`kb-eval` crate-level `kb-app` dep stays.** The crate already - depends on `kb-app` from P5-1 (the runner uses the `App` facade), so +- **`kebab-eval` crate-level `kebab-app` dep stays.** The crate already + depends on `kebab-app` from P5-1 (the runner uses the `App` facade), so the Cargo.toml entry remains. The new modules (`metrics.rs`, - `compare.rs`) do not import `kb-app` themselves — they're behind the + `compare.rs`) do not import `kebab-app` themselves — they're behind the same crate boundary as the runner, but the metric/compare *surface* - is `kb-app`-clean. Splitting the crate to avoid a transitive Cargo + is `kebab-app`-clean. Splitting the crate to avoid a transitive Cargo edge would be churn for no behavior gain. - **`citation_coverage` is intentionally weaker than the spec literal.** Spec calls for "every citation resolves to a real chunk in the DB". diff --git a/tasks/p6/p6-1-image-extractor-exif.md b/tasks/p6/p6-1-image-extractor-exif.md index 390f3aa..ff6449a 100644 --- a/tasks/p6/p6-1-image-extractor-exif.md +++ b/tasks/p6/p6-1-image-extractor-exif.md @@ -1,12 +1,12 @@ --- phase: P6 -component: kb-parse-image (image extractor + EXIF) +component: kebab-parse-image (image extractor + EXIF) task_id: p6-1 title: "Image Extractor producing single-block CanonicalDocument + EXIF metadata" status: planned depends_on: [p0-1, p1-6] unblocks: [p6-2, p6-3] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§3.4 Block::ImageRef + ImageRefBlock, §3.7a OcrText/ModelCaption stubs, §9.1 image extraction policy, §9 versioning] --- @@ -22,8 +22,8 @@ Establishes the image-as-document contract and decouples extraction (asset → I ## Allowed dependencies -- `kb-core` -- `kb-config` +- `kebab-core` +- `kebab-config` - `image = "0.25"` (decoding for size + format detect) - `kamadak-exif` for EXIF - `serde`, `serde_json` @@ -33,31 
+33,31 @@ Establishes the image-as-document contract and decouples extraction (asset → I ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, OCR libs, LLM libs +- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, OCR libs, LLM libs ## Inputs | input | type | source | |-------|------|--------| -| `RawAsset` | `kb_core::RawAsset` | from `kb-source-fs` | +| `RawAsset` | `kebab_core::RawAsset` | from `kebab-source-fs` | | image bytes | `&[u8]` | filesystem | -| `parser_version` | `kb_core::ParserVersion` | constant in this crate (`"image-meta-v1"`) | +| `parser_version` | `kebab_core::ParserVersion` | constant in this crate (`"image-meta-v1"`) | ## Outputs | output | type | downstream | |--------|------|------------| -| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk` (image-region chunker) → `kb-store-sqlite` | +| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk` (image-region chunker) → `kebab-store-sqlite` | ## Public surface (signatures only — no new types) ```rust pub struct ImageExtractor; -impl kb_core::Extractor for ImageExtractor { - fn supports(&self, m: &kb_core::MediaType) -> bool { matches!(m, kb_core::MediaType::Image(_)) } - fn parser_version(&self) -> kb_core::ParserVersion { kb_core::ParserVersion("image-meta-v1".into()) } - fn extract(&self, ctx: &kb_core::ExtractContext, bytes: &[u8]) -> anyhow::Result; +impl kebab_core::Extractor for ImageExtractor { + fn supports(&self, m: &kebab_core::MediaType) -> bool { matches!(m, kebab_core::MediaType::Image(_)) } + fn parser_version(&self) -> kebab_core::ParserVersion { kebab_core::ParserVersion("image-meta-v1".into()) } + fn extract(&self, ctx: &kebab_core::ExtractContext, bytes: &[u8]) -> anyhow::Result; } ``` @@ -69,7 +69,7 @@ 
impl kb_core::Extractor for ImageExtractor { - `metadata.source_type = SourceType::Reference` (per design enum); `trust_level = TrustLevel::Primary`; `tags`/`aliases` empty. - `metadata.user["exif"]` = JSON object with whitelisted EXIF tags (DateTimeOriginal, GPS lat/lon, Make, Model, Orientation, Software). Missing tags omitted. - `metadata.user["dimensions"] = { "w": , "h": , "format": "" }`. -- `provenance` includes `Discovered`, `Parsed` events (no Normalized — ID assignment happens here directly per §3.4 stub from p1-4 logic, OR pipe through `kb-normalize` if available; this task's choice: emit a fully formed CanonicalDocument with deterministic IDs by calling `kb_core::id_for_doc` and `kb_core::id_for_block` directly). +- `provenance` includes `Discovered`, `Parsed` events (no Normalized — ID assignment happens here directly per §3.4 stub from p1-4 logic, OR pipe through `kebab-normalize` if available; this task's choice: emit a fully formed CanonicalDocument with deterministic IDs by calling `kebab_core::id_for_doc` and `kebab_core::id_for_block` directly). - Failure modes: - Truncated/corrupt image → still emits a CanonicalDocument with `dimensions = null`, EXIF empty, `Provenance` warning event with the decoder error message. - Unsupported format → `anyhow::Error` (caller skips). @@ -77,7 +77,7 @@ impl kb_core::Extractor for ImageExtractor { ## Storage / wire effects -- None directly (the caller persists via `kb-store-sqlite`). +- None directly (the caller persists via `kebab-store-sqlite`). ## Test plan @@ -90,12 +90,12 @@ impl kb_core::Extractor for ImageExtractor { | determinism | identical bytes → identical `doc_id`, `block_id` across two runs | inline | | snapshot | `CanonicalDocument` JSON stable for fixture | `fixtures/image/red-100x50.png` | -All tests under `cargo test -p kb-parse-image`. +All tests under `cargo test -p kebab-parse-image`. 
## Definition of Done -- [ ] `cargo check -p kb-parse-image` passes -- [ ] `cargo test -p kb-parse-image` passes +- [ ] `cargo check -p kebab-parse-image` passes +- [ ] `cargo test -p kebab-parse-image` passes - [ ] No OCR/caption/embedding code present - [ ] No imports outside Allowed dependencies - [ ] PR links design §3.4, §9.1 diff --git a/tasks/p6/p6-2-ocr-adapter.md b/tasks/p6/p6-2-ocr-adapter.md index 809f32d..f349773 100644 --- a/tasks/p6/p6-2-ocr-adapter.md +++ b/tasks/p6/p6-2-ocr-adapter.md @@ -1,12 +1,12 @@ --- phase: P6 -component: kb-parse-image (OCR adapter) +component: kebab-parse-image (OCR adapter) task_id: p6-2 title: "OcrEngine trait + Tesseract adapter (Apple Vision feature-gated)" status: planned depends_on: [p6-1] unblocks: [p6-3] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§3.4 ImageRefBlock.ocr, §3.7a OcrText/OcrRegion, §9.1 OCR vs caption provenance] --- @@ -22,9 +22,9 @@ Strict separation of OCR (observed text) from caption (model-generated). Confini ## Allowed dependencies -- `kb-core` -- `kb-config` -- `kb-parse-image` (consumes its types) +- `kebab-core` +- `kebab-config` +- `kebab-parse-image` (consumes its types) - `tesseract = "0.13"` (feature `tesseract`, default ON) - For feature `apple-vision`: `std::process::Command` only (sidecar binary, not a Rust dep) - `serde`, `serde_json` @@ -34,21 +34,21 @@ Strict separation of OCR (observed text) from caption (model-generated). 
Confini ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` +- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop` ## Inputs | input | type | source | |-------|------|--------| | image bytes | `&[u8]` | from extractor | -| optional language hint | `kb_core::Lang` | metadata | -| `kb-config` OCR settings | engine name, languages | runtime | +| optional language hint | `kebab_core::Lang` | metadata | +| `kebab-config` OCR settings | engine name, languages | runtime | ## Outputs | output | type | downstream | |--------|------|------------| -| `OcrText` | `kb_core::OcrText` | merged into `ImageRefBlock.ocr` | +| `OcrText` | `kebab_core::OcrText` | merged into `ImageRefBlock.ocr` | ## Public surface (signatures only — no new types) @@ -56,11 +56,11 @@ Strict separation of OCR (observed text) from caption (model-generated). 
Confini pub trait OcrEngine: Send + Sync { fn engine_name(&self) -> &'static str; fn engine_version(&self) -> String; - fn recognize(&self, image_bytes: &[u8], lang_hint: Option<&kb_core::Lang>) -> anyhow::Result; + fn recognize(&self, image_bytes: &[u8], lang_hint: Option<&kebab_core::Lang>) -> anyhow::Result; } pub struct TesseractOcr { /* internal: lazy api handle */ } -impl TesseractOcr { pub fn new(config: &kb_config::Config) -> anyhow::Result; } +impl TesseractOcr { pub fn new(config: &kebab_config::Config) -> anyhow::Result; } impl OcrEngine for TesseractOcr { /* per trait */ } #[cfg(feature = "apple-vision")] @@ -71,8 +71,8 @@ impl OcrEngine for AppleVisionOcr { /* per trait */ } pub fn apply_ocr( engine: &dyn OcrEngine, image_bytes: &[u8], - block: &mut kb_core::ImageRefBlock, - lang_hint: Option<&kb_core::Lang>, + block: &mut kebab_core::ImageRefBlock, + lang_hint: Option<&kebab_core::Lang>, ) -> anyhow::Result<()>; ``` @@ -85,7 +85,7 @@ pub fn apply_ocr( - `joined` = `regions.iter().map(|r| r.text).join(" ")` (no smart layout reconstruction in v1). - `engine = "tesseract"`, `engine_version = `. The `tesseract` crate (0.13+) does NOT expose a stable Rust `version()` accessor. Use one of: (a) call libtesseract's `TessVersion()` via the bundled FFI surface, OR (b) at adapter construction, shell-out `tesseract --version` once and cache the parsed `"5.3.4"`-style string. Both are deterministic for a fixed install. Pin the chosen approach in the implementation PR. - Apple Vision sidecar (feature `apple-vision`): - - Spawn a small Swift binary `kb-vision-ocr` (path from `config.ocr.apple_vision_binary`) feeding the image via stdin and reading JSON `{ regions: [{x,y,w,h,text,confidence}, ...] }` from stdout. + - Spawn a small Swift binary `kebab-vision-ocr` (path from `config.ocr.apple_vision_binary`) feeding the image via stdin and reading JSON `{ regions: [{x,y,w,h,text,confidence}, ...] }` from stdout. - Same threshold and `joined` rules as Tesseract. 
`engine = "apple-vision"`, `engine_version = sidecar's --version`. - This subagent task does NOT write the Swift sidecar; it only wires the Rust side. Document the expected sidecar interface in `docs/spec/sidecar-vision.md` (separate doc spec stub, optional). - `apply_ocr` calls `engine.recognize`, sets `block.ocr = Some(text)`, and appends a `Provenance::OcrApplied` event in the caller's CanonicalDocument (caller responsibility — this task exposes a helper). @@ -109,12 +109,12 @@ pub fn apply_ocr( | determinism | two runs of recognize on same input → identical OcrText | fixture | | `#[cfg(feature = "apple-vision")]` smoke | sidecar invocation captured (mock binary echoes fixed JSON) | inline mock | -All tests under `cargo test -p kb-parse-image ocr`. Tesseract install required on CI host. +All tests under `cargo test -p kebab-parse-image ocr`. Tesseract install required on CI host. ## Definition of Done -- [ ] `cargo check -p kb-parse-image --features tesseract` passes -- [ ] `cargo test -p kb-parse-image ocr` passes +- [ ] `cargo check -p kebab-parse-image --features tesseract` passes +- [ ] `cargo test -p kebab-parse-image ocr` passes - [ ] `apple-vision` feature compiles on macOS and gracefully no-ops on Linux - [ ] No imports outside Allowed dependencies - [ ] PR links design §3.4, §3.7a, §9.1 @@ -129,5 +129,5 @@ All tests under `cargo test -p kb-parse-image ocr`. Tesseract install required o ## Risks / notes - Tesseract performance varies wildly with image quality; document `min_confidence` and default page-segmentation mode. -- Apple Vision sidecar requires code signing for distribution; for v1 dev builds, accept unsigned binary from `~/.local/bin/kb-vision-ocr`. +- Apple Vision sidecar requires code signing for distribution; for v1 dev builds, accept unsigned binary from `~/.local/bin/kebab-vision-ocr`. - Large image downscale loses small-text recognition; expose `config.ocr.max_pixels` so power users can tune. 
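The confidence threshold and the `joined` rule described in this task can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not the `kebab-parse-image` implementation: `Region` is a hypothetical stand-in for `kebab_core::OcrRegion` whose fields mirror the sidecar JSON shape, confidence is assumed to be on a 0.0 to 1.0 scale, and the threshold is assumed to be inclusive.

```rust
/// Hypothetical stand-in for `kebab_core::OcrRegion`. The fields mirror the
/// sidecar JSON `{x, y, w, h, text, confidence}` and are an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct Region {
    pub x: u32,
    pub y: u32,
    pub w: u32,
    pub h: u32,
    pub text: String,
    pub confidence: f32,
}

/// Keep regions at or above `min_confidence`, preserving the engine's order.
/// ASSUMPTION: low-confidence regions are dropped outright rather than flagged.
pub fn filter_regions(regions: Vec<Region>, min_confidence: f32) -> Vec<Region> {
    regions
        .into_iter()
        .filter(|r| r.confidence >= min_confidence)
        .collect()
}

/// Build `joined` by concatenating region text with a single space.
/// No layout reconstruction is attempted, matching the v1 rule.
pub fn joined_text(regions: &[Region]) -> String {
    regions
        .iter()
        .map(|r| r.text.as_str())
        .collect::<Vec<_>>()
        .join(" ")
}
```

Collecting into a `Vec<&str>` before calling `join` keeps the sketch dependency-free; the spec's `.map(...).join(" ")` shorthand would need an iterator extension such as `itertools`, which is not in the Allowed dependencies list.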
diff --git a/tasks/p6/p6-3-caption-adapter.md b/tasks/p6/p6-3-caption-adapter.md index 5616307..c4a99fd 100644 --- a/tasks/p6/p6-3-caption-adapter.md +++ b/tasks/p6/p6-3-caption-adapter.md @@ -1,12 +1,12 @@ --- phase: P6 -component: kb-parse-image (caption adapter) +component: kebab-parse-image (caption adapter) task_id: p6-3 title: "ModelCaption adapter (LanguageModel-driven, feature-gated)" status: planned depends_on: [p6-1, p4-2] unblocks: [] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§3.4 ImageRefBlock.caption, §3.7a ModelCaption, §9.1 caption (model-generated, low trust)] --- @@ -22,10 +22,10 @@ Captioning closes the multimodal loop. Strict separation from OCR keeps trust le ## Allowed dependencies -- `kb-core` -- `kb-config` -- `kb-parse-image` -- `kb-llm` (LanguageModel trait) +- `kebab-core` +- `kebab-config` +- `kebab-parse-image` +- `kebab-llm` (LanguageModel trait) - `base64` - `serde`, `serde_json` - `image` (resize for prompt cost control) @@ -34,7 +34,7 @@ Captioning closes the multimodal loop. Strict separation from OCR keeps trust le ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-rag`, `kb-llm-local` (only via trait), `kb-tui`, `kb-desktop` +- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-rag`, `kebab-llm-local` (only via trait), `kebab-tui`, `kebab-desktop` ## Inputs @@ -42,28 +42,28 @@ Captioning closes the multimodal loop. 
Strict separation from OCR keeps trust le |-------|------|--------| | image bytes | `&[u8]` | extractor | | `dyn LanguageModel` (vision-capable) | runtime | injected | -| `kb-config.image.caption` | `{ enabled, max_pixels, prompt_template_version }` | runtime | +| `kebab-config.image.caption` | `{ enabled, max_pixels, prompt_template_version }` | runtime | ## Outputs | output | type | downstream | |--------|------|------------| -| `ModelCaption` | `kb_core::ModelCaption` | merged into `ImageRefBlock.caption` | +| `ModelCaption` | `kebab_core::ModelCaption` | merged into `ImageRefBlock.caption` | ## Public surface (signatures only — no new types) ```rust pub fn caption_image( - llm: &dyn kb_core::LanguageModel, + llm: &dyn kebab_core::LanguageModel, image_bytes: &[u8], - cfg: &kb_config::Config, -) -> anyhow::Result; + cfg: &kebab_config::Config, +) -> anyhow::Result; pub fn apply_caption( - llm: &dyn kb_core::LanguageModel, + llm: &dyn kebab_core::LanguageModel, image_bytes: &[u8], - block: &mut kb_core::ImageRefBlock, - cfg: &kb_config::Config, + block: &mut kebab_core::ImageRefBlock, + cfg: &kebab_config::Config, ) -> anyhow::Result<()>; ``` @@ -86,7 +86,7 @@ pub fn apply_caption( ## Storage / wire effects -- None directly. Caller persists via `kb-store-sqlite`. +- None directly. Caller persists via `kebab-store-sqlite`. ## Test plan @@ -99,12 +99,12 @@ pub fn apply_caption( | unit | downscale honors `max_pixels` (resulting bytes < some threshold) | fixture large image | | determinism | identical input + temperature=0 + seed=0 → identical caption (mock) | inline | -All tests under `cargo test -p kb-parse-image caption` with mock LM only. +All tests under `cargo test -p kebab-parse-image caption` with mock LM only. 
## Definition of Done -- [ ] `cargo check -p kb-parse-image --features caption` passes -- [ ] `cargo test -p kb-parse-image caption` passes +- [ ] `cargo check -p kebab-parse-image --features caption` passes +- [ ] `cargo test -p kebab-parse-image caption` passes - [ ] No imports outside Allowed dependencies - [ ] Feature default OFF; only on when user opts in via config - [ ] PR links design §3.4 ImageRefBlock.caption, §9.1 diff --git a/tasks/p7/p7-1-pdf-text-extractor.md b/tasks/p7/p7-1-pdf-text-extractor.md index df6c13f..518c2d0 100644 --- a/tasks/p7/p7-1-pdf-text-extractor.md +++ b/tasks/p7/p7-1-pdf-text-extractor.md @@ -1,12 +1,12 @@ --- phase: P7 -component: kb-parse-pdf (text extractor) +component: kebab-parse-pdf (text extractor) task_id: p7-1 title: "Text PDF extractor → CanonicalDocument with page-level blocks" status: planned depends_on: [p0-1, p1-6] unblocks: [p7-2] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§3.4 SourceSpan::Page, §3.4 Block::Paragraph, §9.2 PDF text extraction, §9 versioning] --- @@ -22,8 +22,8 @@ Strict scope: page text + page numbers. Layout reconstruction (multi-column merg ## Allowed dependencies -- `kb-core` -- `kb-config` +- `kebab-core` +- `kebab-config` - `pdf-extract = "0.7"` (or current stable) - `lopdf = "0.32"` for page metadata (count, optional title from /Info) - `serde`, `serde_json` @@ -33,30 +33,30 @@ Strict scope: page text + page numbers. 
Layout reconstruction (multi-column merg ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`, OCR libraries (OCR fallback is a separate task, not this one) +- `kebab-source-fs`, `kebab-parse-md`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`, OCR libraries (OCR fallback is a separate task, not this one) ## Inputs | input | type | source | |-------|------|--------| -| `RawAsset` | `kb_core::RawAsset` | `kb-source-fs` | +| `RawAsset` | `kebab_core::RawAsset` | `kebab-source-fs` | | PDF bytes | `&[u8]` | filesystem | ## Outputs | output | type | downstream | |--------|------|------------| -| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk` (`pdf-page-v1` chunker in p7-2) | +| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk` (`pdf-page-v1` chunker in p7-2) | ## Public surface (signatures only — no new types) ```rust pub struct PdfTextExtractor; -impl kb_core::Extractor for PdfTextExtractor { - fn supports(&self, m: &kb_core::MediaType) -> bool { matches!(m, kb_core::MediaType::Pdf) } - fn parser_version(&self) -> kb_core::ParserVersion { kb_core::ParserVersion("pdf-text-v1".into()) } - fn extract(&self, ctx: &kb_core::ExtractContext, bytes: &[u8]) -> anyhow::Result; +impl kebab_core::Extractor for PdfTextExtractor { + fn supports(&self, m: &kebab_core::MediaType) -> bool { matches!(m, kebab_core::MediaType::Pdf) } + fn parser_version(&self) -> kebab_core::ParserVersion { kebab_core::ParserVersion("pdf-text-v1".into()) } + fn extract(&self, ctx: &kebab_core::ExtractContext, bytes: &[u8]) -> anyhow::Result; } ``` @@ -100,12 +100,12 @@ impl kb_core::Extractor for PdfTextExtractor { | determinism | identical bytes → identical CanonicalDocument JSON across two runs | inline | | snapshot | CanonicalDocument JSON for fixture stable 
| `fixtures/pdf/three-page-en.pdf` | -All tests under `cargo test -p kb-parse-pdf`. +All tests under `cargo test -p kebab-parse-pdf`. ## Definition of Done -- [ ] `cargo check -p kb-parse-pdf` passes -- [ ] `cargo test -p kb-parse-pdf` passes +- [ ] `cargo check -p kebab-parse-pdf` passes +- [ ] `cargo test -p kebab-parse-pdf` passes - [ ] No OCR / LLM code present - [ ] No imports outside Allowed dependencies - [ ] PR links design §3.4 SourceSpan::Page, §9.2 diff --git a/tasks/p7/p7-2-pdf-page-chunker.md b/tasks/p7/p7-2-pdf-page-chunker.md index ff6c6e7..d15c9fc 100644 --- a/tasks/p7/p7-2-pdf-page-chunker.md +++ b/tasks/p7/p7-2-pdf-page-chunker.md @@ -1,12 +1,12 @@ --- phase: P7 -component: kb-chunk (pdf-page-v1) +component: kebab-chunk (pdf-page-v1) task_id: p7-2 title: "PDF page-aware chunker (pdf-page-v1)" status: planned depends_on: [p7-1] unblocks: [] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§3.5 Chunk, §4.2 chunk_id recipe, §0 Q3 citation, §9 versioning] --- @@ -22,8 +22,8 @@ Per-medium chunkers must stay tiny and obvious. Page-aware logic is small but it ## Allowed dependencies -- `kb-core` -- `kb-config` +- `kebab-core` +- `kebab-config` - `serde`, `serde_json` - `blake3` (policy_hash) - `serde-json-canonicalizer` @@ -31,30 +31,30 @@ Per-medium chunkers must stay tiny and obvious. 
Page-aware logic is small but it ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-md`, `kb-parse-pdf` (consumes `CanonicalDocument` via `kb-core` only), `kb-normalize`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` +- `kebab-source-fs`, `kebab-parse-md`, `kebab-parse-pdf` (consumes `CanonicalDocument` via `kebab-core` only), `kebab-normalize`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop` ## Inputs | input | type | source | |-------|------|--------| -| `CanonicalDocument` (produced by `pdf-text-v1`) | `kb_core::CanonicalDocument` | p7-1 | -| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` | +| `CanonicalDocument` (produced by `pdf-text-v1`) | `kebab_core::CanonicalDocument` | p7-1 | +| `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` | ## Outputs | output | type | downstream | |--------|------|------------| -| `Vec` | `kb_core::Chunk` | `kb-store-sqlite`, `kb-embed*` | +| `Vec` | `kebab_core::Chunk` | `kebab-store-sqlite`, `kebab-embed*` | ## Public surface (signatures only — no new types) ```rust pub struct PdfPageV1Chunker; -impl kb_core::Chunker for PdfPageV1Chunker { - fn chunker_version(&self) -> kb_core::ChunkerVersion { kb_core::ChunkerVersion("pdf-page-v1".into()) } - fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String; - fn chunk(&self, doc: &kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result>; +impl kebab_core::Chunker for PdfPageV1Chunker { + fn chunker_version(&self) -> kebab_core::ChunkerVersion { kebab_core::ChunkerVersion("pdf-page-v1".into()) } + fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String; + fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result>; } ``` @@ -62,7 +62,7 @@ impl kb_core::Chunker for PdfPageV1Chunker { ## Behavior contract -- Only operates on documents whose blocks all carry `SourceSpan::Page` (i.e., from 
`kb-parse-pdf`). Other documents → return `anyhow::Error("PdfPageV1Chunker only handles PDF docs")`. +- Only operates on documents whose blocks all carry `SourceSpan::Page` (i.e., from `kebab-parse-pdf`). Other documents → return `anyhow::Error("PdfPageV1Chunker only handles PDF docs")`. - For each page block (1 block per page after p7-1): - If `text.len()` (byte estimate) ≤ `policy.target_tokens * 4` (proxy for tokens) → emit one chunk for the entire page. - Else → split by paragraphs (split text on `\n\n` or sentence-ending punctuation followed by whitespace) and group adjacent paragraphs until the running byte total approaches `policy.target_tokens * 4`. Apply `policy.overlap_tokens * 4` bytes of trailing overlap into the next chunk's prefix. @@ -91,12 +91,12 @@ impl kb_core::Chunker for PdfPageV1Chunker { | determinism | same input → same chunk_ids twice | inline | | snapshot | `Vec` JSON for fixture stable | `fixtures/pdf/three-page-en.pdf` (chunked) | -All tests under `cargo test -p kb-chunk pdf`. +All tests under `cargo test -p kebab-chunk pdf`. 
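The page-splitting rule in the behavior contract above can be sketched roughly as follows. This is a minimal illustration, not the task's implementation: `group_page_text` is a hypothetical helper name, it splits on `\n\n` only (the sentence-punctuation split is left out), and it does not apply the `overlap_tokens * 4` trailing overlap.

```rust
/// Hypothetical helper, not part of the task's public surface.
/// Greedily groups the `\n\n`-separated paragraphs of one page into chunks
/// whose byte length stays within `target_tokens * 4` where possible.
pub fn group_page_text(text: &str, target_tokens: usize) -> Vec<String> {
    // Bytes-per-token proxy taken from the behavior contract.
    let budget = target_tokens * 4;

    // Small page: emit one chunk for the entire page.
    if text.len() <= budget {
        return vec![text.to_string()];
    }

    let mut chunks = Vec::new();
    let mut current = String::new();
    for para in text.split("\n\n").filter(|p| !p.trim().is_empty()) {
        // `+ 2` accounts for the "\n\n" separator re-inserted between paragraphs.
        if !current.is_empty() && current.len() + 2 + para.len() > budget {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push_str("\n\n");
        }
        current.push_str(para);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

In this sketch a single paragraph larger than the budget becomes its own oversized chunk; a real chunker would presumably split it further.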
## Definition of Done -- [ ] `cargo check -p kb-chunk` passes (existing `md-heading-v1` continues to pass) -- [ ] `cargo test -p kb-chunk pdf` passes +- [ ] `cargo check -p kebab-chunk` passes (existing `md-heading-v1` continues to pass) +- [ ] `cargo test -p kebab-chunk pdf` passes - [ ] Snapshot stable across two runs - [ ] No imports outside Allowed dependencies - [ ] PR links design §3.5, §0 Q3, §9 diff --git a/tasks/p8/p8-1-whisper-adapter.md b/tasks/p8/p8-1-whisper-adapter.md index 286cd77..95c6841 100644 --- a/tasks/p8/p8-1-whisper-adapter.md +++ b/tasks/p8/p8-1-whisper-adapter.md @@ -1,12 +1,12 @@ --- phase: P8 -component: kb-parse-audio (whisper adapter) +component: kebab-parse-audio (whisper adapter) task_id: p8-1 title: "Audio Extractor + Transcriber trait + whisper.cpp adapter" status: planned depends_on: [p0-1, p1-6] unblocks: [p8-2] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§3.4 Block::AudioRef + AudioRefBlock, §3.7a Transcript + TranscriptSegment, §9.3 audio policy, §9 versioning] --- @@ -22,8 +22,8 @@ Audio stays a single, replaceable engine boundary (Transcriber trait). Extractor ## Allowed dependencies -- `kb-core` -- `kb-config` +- `kebab-core` +- `kebab-config` - `whisper-rs = "0.13"` (or current stable) - `symphonia = { version = "0.5", features = ["all"] }` — decode `.m4a/.mp3/.wav/.flac/.ogg` to interleaved f32 PCM at the source's native sample rate / channel layout. Symphonia does NOT resample; that is rubato's job. - `rubato = "0.15"` — sample-rate conversion to 16 kHz mono f32 (the input shape whisper.cpp expects). Use `rubato::FftFixedIn::new(input_sample_rate, 16_000, frames_per_chunk, sub_chunks, 1 /* channels after downmix */)` for fixed-input streaming; pre-mix multi-channel to mono via simple averaging before the resampler. 
@@ -34,21 +34,21 @@ Audio stays a single, replaceable engine boundary (Transcriber trait). Extractor ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-md`, `kb-parse-pdf`, `kb-parse-image`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` +- `kebab-source-fs`, `kebab-parse-md`, `kebab-parse-pdf`, `kebab-parse-image`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop` ## Inputs | input | type | source | |-------|------|--------| -| `RawAsset` | `kb_core::RawAsset` | `kb-source-fs` | +| `RawAsset` | `kebab_core::RawAsset` | `kebab-source-fs` | | audio bytes | `&[u8]` | filesystem | -| `kb-config.audio` | `{ model_path, language, chunk_seconds, n_threads, gpu }` | runtime | +| `kebab-config.audio` | `{ model_path, language, chunk_seconds, n_threads, gpu }` | runtime | ## Outputs | output | type | downstream | |--------|------|------------| -| `CanonicalDocument` | `kb_core::CanonicalDocument` | `kb-chunk` (`audio-segment-v1` chunker in p8-2) | +| `CanonicalDocument` | `kebab_core::CanonicalDocument` | `kebab-chunk` (`audio-segment-v1` chunker in p8-2) | ## Public surface (signatures only — no new types) @@ -56,19 +56,19 @@ Audio stays a single, replaceable engine boundary (Transcriber trait). 
Extractor pub trait Transcriber: Send + Sync { fn engine(&self) -> &'static str; fn engine_version(&self) -> String; - fn transcribe(&self, pcm_f32_16khz: &[f32], language_hint: Option<&kb_core::Lang>) -> anyhow::Result; + fn transcribe(&self, pcm_f32_16khz: &[f32], language_hint: Option<&kebab_core::Lang>) -> anyhow::Result; } pub struct WhisperCppTranscriber { /* internal: whisper_rs::WhisperContext */ } -impl WhisperCppTranscriber { pub fn new(config: &kb_config::Config) -> anyhow::Result; } +impl WhisperCppTranscriber { pub fn new(config: &kebab_config::Config) -> anyhow::Result; } impl Transcriber for WhisperCppTranscriber { /* per trait */ } pub struct AudioExtractor { transcriber: std::sync::Arc } impl AudioExtractor { pub fn new(transcriber: std::sync::Arc) -> Self; } -impl kb_core::Extractor for AudioExtractor { - fn supports(&self, m: &kb_core::MediaType) -> bool { matches!(m, kb_core::MediaType::Audio(_)) } - fn parser_version(&self) -> kb_core::ParserVersion { kb_core::ParserVersion("audio-whisper-v1".into()) } - fn extract(&self, ctx: &kb_core::ExtractContext, bytes: &[u8]) -> anyhow::Result; +impl kebab_core::Extractor for AudioExtractor { + fn supports(&self, m: &kebab_core::MediaType) -> bool { matches!(m, kebab_core::MediaType::Audio(_)) } + fn parser_version(&self) -> kebab_core::ParserVersion { kebab_core::ParserVersion("audio-whisper-v1".into()) } + fn extract(&self, ctx: &kebab_core::ExtractContext, bytes: &[u8]) -> anyhow::Result; } ``` @@ -87,7 +87,7 @@ impl kb_core::Extractor for AudioExtractor { - `provenance` events: `Discovered`, `Parsed`, `Transcribed`. - `block_id` per design §4.2 with `block_kind = "audio_ref"`, `heading_path = []`, `ordinal = 0`, `source_span = SourceSpan::Time { start_ms: 0, end_ms: duration_ms }`. - `WhisperCppTranscriber`: - - Loads model from `config.audio.model_path` (e.g., `~/.local/share/kb/models/whisper/ggml-large-v3.bin`). 
+ - Loads model from `config.audio.model_path` (e.g., `~/.local/share/kebab/models/whisper/ggml-large-v3.bin`). - Runs with `WhisperFullParams::new(SamplingStrategy::Greedy { best_of: 1 })` — deterministic. - Streams in chunks of `config.audio.chunk_seconds` (default 30) to bound memory; aggregates segments. - `Transcript.segments` populated with `start_ms`, `end_ms`, `text`, `confidence: Some(p)` from whisper's per-token probabilities (averaged), `speaker: None` (diarization is P+). @@ -115,12 +115,12 @@ impl kb_core::Extractor for AudioExtractor { | `#[ignore]` integration | 30-second Korean audio → segments_count > 1, language = "ko" | requires `large-v3` model | | snapshot | CanonicalDocument JSON stable for short fixture | `fixtures/audio/hello.wav` | -All tests under `cargo test -p kb-parse-audio`. Mark slow/large-model tests `#[ignore]`. +All tests under `cargo test -p kebab-parse-audio`. Mark slow/large-model tests `#[ignore]`. ## Definition of Done -- [ ] `cargo check -p kb-parse-audio` passes -- [ ] `cargo test -p kb-parse-audio` passes (excluding `#[ignore]`) +- [ ] `cargo check -p kebab-parse-audio` passes +- [ ] `cargo test -p kebab-parse-audio` passes (excluding `#[ignore]`) - [ ] No imports outside Allowed dependencies (resampler crate may be added — record in PR) - [ ] First-run model download path documented (NOT performed by code; user responsibility) - [ ] PR links design §3.4, §3.7a, §9.3 diff --git a/tasks/p8/p8-2-segment-chunker.md b/tasks/p8/p8-2-segment-chunker.md index 724788d..6ec5241 100644 --- a/tasks/p8/p8-2-segment-chunker.md +++ b/tasks/p8/p8-2-segment-chunker.md @@ -1,12 +1,12 @@ --- phase: P8 -component: kb-chunk (audio-segment-v1) +component: kebab-chunk (audio-segment-v1) task_id: p8-2 title: "Audio segment chunker (audio-segment-v1)" status: planned depends_on: [p8-1] unblocks: [] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: 
../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§3.5 Chunk, §3.4 SourceSpan::Time, §4.2 chunk_id recipe, §0 Q3 citation, §9 versioning] --- @@ -22,8 +22,8 @@ Per-medium chunker. Tiny but versioned — `chunk_id` depends on `chunker_versio ## Allowed dependencies -- `kb-core` -- `kb-config` +- `kebab-core` +- `kebab-config` - `serde`, `serde_json` - `blake3` (policy_hash) - `serde-json-canonicalizer` @@ -31,30 +31,30 @@ Per-medium chunker. Tiny but versioned — `chunk_id` depends on `chunker_versio ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-md`, `kb-parse-pdf`, `kb-parse-image`, `kb-parse-audio` (consumes via `kb-core` only), `kb-normalize`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop` +- `kebab-source-fs`, `kebab-parse-md`, `kebab-parse-pdf`, `kebab-parse-image`, `kebab-parse-audio` (consumes via `kebab-core` only), `kebab-normalize`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop` ## Inputs | input | type | source | |-------|------|--------| -| `CanonicalDocument` containing one `AudioRefBlock` with `Transcript` | `kb_core::CanonicalDocument` | p8-1 | -| `ChunkPolicy` | `kb_core::ChunkPolicy` | `kb-app` | +| `CanonicalDocument` containing one `AudioRefBlock` with `Transcript` | `kebab_core::CanonicalDocument` | p8-1 | +| `ChunkPolicy` | `kebab_core::ChunkPolicy` | `kebab-app` | ## Outputs | output | type | downstream | |--------|------|------------| -| `Vec` | `kb_core::Chunk` | `kb-store-sqlite`, embedders | +| `Vec` | `kebab_core::Chunk` | `kebab-store-sqlite`, embedders | ## Public surface (signatures only — no new types) ```rust pub struct AudioSegmentV1Chunker; -impl kb_core::Chunker for AudioSegmentV1Chunker { - fn chunker_version(&self) -> kb_core::ChunkerVersion { kb_core::ChunkerVersion("audio-segment-v1".into()) } - fn policy_hash(&self, policy: &kb_core::ChunkPolicy) -> String; - fn chunk(&self, doc: 
&kb_core::CanonicalDocument, policy: &kb_core::ChunkPolicy) -> anyhow::Result>; +impl kebab_core::Chunker for AudioSegmentV1Chunker { + fn chunker_version(&self) -> kebab_core::ChunkerVersion { kebab_core::ChunkerVersion("audio-segment-v1".into()) } + fn policy_hash(&self, policy: &kebab_core::ChunkPolicy) -> String; + fn chunk(&self, doc: &kebab_core::CanonicalDocument, policy: &kebab_core::ChunkPolicy) -> anyhow::Result>; } ``` @@ -92,12 +92,12 @@ impl kb_core::Chunker for AudioSegmentV1Chunker { | determinism | same input → same chunk_ids twice | inline | | snapshot | `Vec` JSON for fixture transcript stable | `fixtures/audio/transcript-1.json` (constructed) | -All tests under `cargo test -p kb-chunk audio`. +All tests under `cargo test -p kebab-chunk audio`. ## Definition of Done -- [ ] `cargo check -p kb-chunk` passes (md-heading-v1 + pdf-page-v1 + audio-segment-v1 all coexist) -- [ ] `cargo test -p kb-chunk audio` passes +- [ ] `cargo check -p kebab-chunk` passes (md-heading-v1 + pdf-page-v1 + audio-segment-v1 all coexist) +- [ ] `cargo test -p kebab-chunk audio` passes - [ ] Snapshot stable across two runs - [ ] No imports outside Allowed dependencies - [ ] PR links design §3.5, §3.4 SourceSpan::Time, §4.2 diff --git a/tasks/p9/p9-1-tui-library.md b/tasks/p9/p9-1-tui-library.md index 610a9d8..4ff3ade 100644 --- a/tasks/p9/p9-1-tui-library.md +++ b/tasks/p9/p9-1-tui-library.md @@ -1,12 +1,12 @@ --- phase: P9 -component: kb-tui (library view) +component: kebab-tui (library view) task_id: p9-1 title: "Ratatui library list view + tag filter" status: planned depends_on: [p1-6] unblocks: [p9-2, p9-3, p9-4] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [report §16.2 TUI (also tasks/phase-9-ui.md epic), design §3.7 SearchHit, design §1 UX scenes for shared key bindings] --- @@ -14,7 +14,7 @@ contract_sections: [report §16.2 TUI 
(also tasks/phase-9-ui.md epic), design § ## Goal -Stand up a Ratatui app skeleton with a "Library" pane: list documents, filter by tag/lang, navigate. Establishes the global app loop, key dispatch, and `kb-app` integration point that the search/ask/inspect panes (p9-2..p9-4) extend. +Stand up a Ratatui app skeleton with a "Library" pane: list documents, filter by tag/lang, navigate. Establishes the global app loop, key dispatch, and `kebab-app` integration point that the search/ask/inspect panes (p9-2..p9-4) extend. ## Why now / why this size @@ -22,9 +22,9 @@ Library is the cheapest screen and the natural anchor for the TUI shell. Subsequ ## Allowed dependencies -- `kb-core` -- `kb-config` -- `kb-app` (facade — the only crate this binary touches besides `kb-core`/`kb-config`) +- `kebab-core` +- `kebab-config` +- `kebab-app` (facade — the only crate this binary touches besides `kebab-core`/`kebab-config`) - `ratatui = "0.28"` - `crossterm` - `tracing` @@ -32,15 +32,15 @@ Library is the cheapest screen and the natural anchor for the TUI shell. Subsequ ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (UI must go through `kb-app` only — this is the design §8 boundary) +- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (UI must go through `kebab-app` only — this is the design §8 boundary) ## Inputs | input | type | source | |-------|------|--------| -| `kb-app::list_docs(filter)` | facade call | runtime | +| `kebab-app::list_docs(filter)` | facade call | runtime | | keyboard events | `crossterm` | terminal | -| `kb-config::Config` | runtime | env / file | +| `kebab-config::Config` | runtime | env / file | ## Outputs @@ -57,7 +57,7 @@ Library is the cheapest screen and the natural anchor for the TUI shell. Subsequ // state in WITHOUT modifying the App struct definition. 
This avoids merge // conflicts when p9-2/3/4 land in parallel; only p9-1 ever changes `App`. pub struct App { - pub config: kb_config::Config, + pub config: kebab_config::Config, pub focus: Pane, pub library: LibraryState, // owned by p9-1 pub search: Option, // populated by p9-2 (None until that crate links in) @@ -73,7 +73,7 @@ pub struct AskState; // body filled by p9-3 pub struct InspectState; // body filled by p9-4 impl App { - pub fn new(config: kb_config::Config) -> anyhow::Result; + pub fn new(config: kebab_config::Config) -> anyhow::Result; pub fn run(&mut self) -> anyhow::Result<()>; // blocking loop until quit } @@ -102,12 +102,12 @@ pub enum KeyOutcome { Continue, Quit, SwitchPane(Pane), Refresh } - `Enter` → switch to Inspect pane (p9-4) on selected doc - `q` or `Esc` → quit - All facade calls run on the main thread (no async). For long calls, render a "loading…" state and call from a worker thread; bridge via `mpsc::channel` (this task may keep things synchronous and accept brief UI hangs for v1). -- Logging: `tracing` initialized to a file under `~/.local/state/kb/logs/`; never to stdout/stderr (so the TUI is not corrupted). +- Logging: `tracing` initialized to a file under `~/.local/state/kebab/logs/`; never to stdout/stderr (so the TUI is not corrupted). - Error rendering: a popup overlay shows `error: {msg}\nhint: {hint}` from `anyhow::Error` chain; press any key to dismiss. ## Storage / wire effects -- Reads: `kb-app::list_docs` only. +- Reads: `kebab-app::list_docs` only. - Writes: none. 
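The Library key bindings visible in this diff (`Enter`, `q`, `Esc`) could be dispatched along these lines. This is a sketch under stated assumptions: `Key` is a stand-in for `crossterm::event::KeyEvent`, the `Pane` variants are assumed from the pane names used in the task files, and `handle_key_library` is a hypothetical function name.

```rust
/// Assumed pane set, inferred from the pane names in the task files.
#[derive(Debug, PartialEq)]
pub enum Pane {
    Library,
    Search,
    Ask,
    Inspect,
}

/// Mirrors the `KeyOutcome` enum shown in the task's public surface.
#[derive(Debug, PartialEq)]
pub enum KeyOutcome {
    Continue,
    Quit,
    SwitchPane(Pane),
    Refresh,
}

/// Stand-in for `crossterm::event::KeyEvent`, so the sketch needs no external crates.
pub enum Key {
    Char(char),
    Enter,
    Esc,
}

/// Hypothetical Library-pane dispatcher covering only the bindings shown above.
pub fn handle_key_library(key: Key) -> KeyOutcome {
    match key {
        // `q` or `Esc` quits.
        Key::Char('q') | Key::Esc => KeyOutcome::Quit,
        // `Enter` switches to the Inspect pane for the selected doc.
        Key::Enter => KeyOutcome::SwitchPane(Pane::Inspect),
        // Everything else is ignored in this reduced sketch.
        _ => KeyOutcome::Continue,
    }
}
```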
## Test plan @@ -118,16 +118,16 @@ pub enum KeyOutcome { Continue, Quit, SwitchPane(Pane), Refresh } | unit | filter `f` opens edit overlay; `Enter` triggers refresh | inline | | snapshot | rendered library with 3 docs + filter open produces stable frame buffer (use `ratatui::backend::TestBackend`) | inline | | unit | error popup renders without panic on injected `anyhow::Error` | inline | -| integration | mocked `kb-app::list_docs` returning N docs renders all rows | inline | +| integration | mocked `kebab-app::list_docs` returning N docs renders all rows | inline | -All tests under `cargo test -p kb-tui library`. +All tests under `cargo test -p kebab-tui library`. ## Definition of Done -- [ ] `cargo check -p kb-tui` passes -- [ ] `cargo test -p kb-tui library` passes -- [ ] No imports outside `kb-core`, `kb-config`, `kb-app` -- [ ] `kb tui` (or `kb` if TUI is the default) launches and shows Library on a real terminal (manual smoke) +- [ ] `cargo check -p kebab-tui` passes +- [ ] `cargo test -p kebab-tui library` passes +- [ ] No imports outside `kebab-core`, `kebab-config`, `kebab-app` +- [ ] `kebab tui` (or `kebab` if TUI is the default) launches and shows Library on a real terminal (manual smoke) - [ ] PR links design §8 module boundary, report §16.2 (TUI epic) ## Out of scope diff --git a/tasks/p9/p9-2-tui-search.md b/tasks/p9/p9-2-tui-search.md index 454f0c8..dfb26c6 100644 --- a/tasks/p9/p9-2-tui-search.md +++ b/tasks/p9/p9-2-tui-search.md @@ -1,12 +1,12 @@ --- phase: P9 -component: kb-tui (search pane) +component: kebab-tui (search pane) task_id: p9-2 title: "TUI Search pane: input + result list + preview + editor jump" status: planned depends_on: [p2-2, p3-4, p9-1] unblocks: [] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§1.5/1.6 search output, §3.7 SearchHit, §0 Q3 citation] --- @@ -14,7 +14,7 @@ contract_sections: 
[§1.5/1.6 search output, §3.7 SearchHit, §0 Q3 citation] ## Goal -Add a Search pane to the TUI that drives `kb-app::search`, renders dense results (rank+score / path#frag / heading / snippet), and supports `g` (editor jump to citation) for the selected hit. +Add a Search pane to the TUI that drives `kebab-app::search`, renders dense results (rank+score / path#frag / heading / snippet), and supports `g` (editor jump to citation) for the selected hit. ## Why now / why this size @@ -22,25 +22,25 @@ Search is the most-used surface. Confining it to one pane leverages the App skel ## Allowed dependencies -- `kb-core` -- `kb-config` -- `kb-app` -- `kb-tui` (extends p9-1) +- `kebab-core` +- `kebab-config` +- `kebab-app` +- `kebab-tui` (extends p9-1) - `ratatui`, `crossterm` - `tracing` - `thiserror` ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-desktop` +- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-desktop` ## Inputs | input | type | source | |-------|------|--------| -| `kb-app::search(query)` | facade | runtime | +| `kebab-app::search(query)` | facade | runtime | | keyboard events | `crossterm` | terminal | -| selected hit's citation | `kb_core::Citation` | App state | +| selected hit's citation | `kebab_core::Citation` | App state | ## Outputs @@ -54,16 +54,16 @@ Search is the most-used surface. 
Confining it to one pane leverages the App skel ```rust pub fn render_search(f: &mut ratatui::Frame, area: ratatui::layout::Rect, state: &App); pub fn handle_key_search(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome; -pub fn jump_to_citation(citation: &kb_core::Citation, editor_env: &str /* $EDITOR */) -> anyhow::Result<()>; +pub fn jump_to_citation(citation: &kebab_core::Citation, editor_env: &str /* $EDITOR */) -> anyhow::Result<()>; ``` -This task fills the body of `kb_tui::SearchState` (forward-declared in p9-1). The `App` struct itself is NOT edited — only `SearchState` gets fields: +This task fills the body of `kebab_tui::SearchState` (forward-declared in p9-1). The `App` struct itself is NOT edited — only `SearchState` gets fields: ```rust pub struct SearchState { pub input: String, - pub mode: kb_core::SearchMode, - pub hits: Vec, + pub mode: kebab_core::SearchMode, + pub hits: Vec, pub selected_hit: usize, pub last_query_at: Option, // debounce timer } @@ -73,7 +73,7 @@ The Library pane's keypress handler (in p9-1) sets `app.search = Some(SearchStat ## Behavior contract -- Layout: top input bar (search query + mode badge `[hybrid|lexical|vector]`), middle result list (one hit per 4 lines per design §1.5 dense format), bottom preview pane (full chunk text fetched lazily via `kb-app::inspect_chunk`). +- Layout: top input bar (search query + mode badge `[hybrid|lexical|vector]`), middle result list (one hit per 4 lines per design §1.5 dense format), bottom preview pane (full chunk text fetched lazily via `kebab-app::inspect_chunk`). 
- Key bindings (Search pane): - typing → updates `search_input`; debounced (200 ms) re-search - `Tab` → cycles `search_mode` Lexical → Vector → Hybrid → Lexical @@ -104,14 +104,14 @@ The Library pane's keypress handler (in p9-1) sets `app.search = Some(SearchStat | unit | `j` / `k` move selection within bounds | inline | | unit | `jump_to_citation` for `Line` builds `+ ` command (assert via mocked Command runner) | inline | | snapshot | rendered Search pane with 3 hits + preview stable | TestBackend | -| integration | mocked `kb-app::search` returning fixture hits drives render | inline | +| integration | mocked `kebab-app::search` returning fixture hits drives render | inline | -All tests under `cargo test -p kb-tui search`. +All tests under `cargo test -p kebab-tui search`. ## Definition of Done -- [ ] `cargo check -p kb-tui` passes -- [ ] `cargo test -p kb-tui search` passes +- [ ] `cargo check -p kebab-tui` passes +- [ ] `cargo test -p kebab-tui search` passes - [ ] `g` keybinding launches `$EDITOR` with correct `+` argument (manual smoke against vim) - [ ] No imports outside Allowed dependencies - [ ] PR links design §1.5/1.6, §3.7 @@ -125,5 +125,5 @@ All tests under `cargo test -p kb-tui search`. ## Risks / notes - Suspending and restoring crossterm raw mode around the editor spawn is finicky; code defensively (RAII guard). -- Different editors take different jump syntaxes. Provide an env override `KB_EDITOR_JUMP_FORMAT="vim"` for users on exotic editors. +- Different editors take different jump syntaxes. Provide an env override `KEBAB_EDITOR_JUMP_FORMAT="vim"` for users on exotic editors. - Long snippet text wrap: clamp to viewport width and ellipsize per design §1.5 (`…` already in dense template). 
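The `Tab` mode cycle and the 200 ms debounce from the Search pane's behavior contract might look like the sketch below. The local `SearchMode` enum is a stand-in for `kebab_core::SearchMode` (its real variants are assumed from the cycle described above), and `debounce_elapsed` is a hypothetical helper name.

```rust
use std::time::{Duration, Instant};

/// Stand-in for `kebab_core::SearchMode`; variants assumed from the Tab cycle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SearchMode {
    Lexical,
    Vector,
    Hybrid,
}

impl SearchMode {
    /// `Tab` cycles Lexical → Vector → Hybrid → Lexical.
    pub fn next(self) -> Self {
        match self {
            SearchMode::Lexical => SearchMode::Vector,
            SearchMode::Vector => SearchMode::Hybrid,
            SearchMode::Hybrid => SearchMode::Lexical,
        }
    }
}

/// Hypothetical helper: true once 200 ms have passed since the last keystroke.
/// Returns false when nothing has been typed yet.
pub fn debounce_elapsed(last_input_at: Option<Instant>, now: Instant) -> bool {
    match last_input_at {
        Some(t) => now.duration_since(t) >= Duration::from_millis(200),
        None => false,
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside the helper, keeps the debounce logic testable without sleeping.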
diff --git a/tasks/p9/p9-3-tui-ask.md b/tasks/p9/p9-3-tui-ask.md index 1c10b8e..bea3e29 100644 --- a/tasks/p9/p9-3-tui-ask.md +++ b/tasks/p9/p9-3-tui-ask.md @@ -1,12 +1,12 @@ --- phase: P9 -component: kb-tui (ask pane) +component: kebab-tui (ask pane) task_id: p9-3 title: "TUI Ask pane: streaming answer + citation links + --explain toggle" status: planned depends_on: [p4-3, p9-1] unblocks: [] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§1.1–1.4 ask scenes, §2.3 Answer wire, §3.8 Answer] --- @@ -14,7 +14,7 @@ contract_sections: [§1.1–1.4 ask scenes, §2.3 Answer wire, §3.8 Answer] ## Goal -Add an Ask pane that calls `kb-app::ask`, streams tokens into the answer area in real time, renders citation footnotes (default mode A), and toggles to `--explain` (mode B + retrieval trace) with a key. +Add an Ask pane that calls `kebab-app::ask`, streams tokens into the answer area in real time, renders citation footnotes (default mode A), and toggles to `--explain` (mode B + retrieval trace) with a key. 
## Why now / why this size @@ -22,23 +22,23 @@ Streaming UI is the only TUI piece that meaningfully differs from search/inspect ## Allowed dependencies -- `kb-core` -- `kb-config` -- `kb-app` -- `kb-tui` (extends p9-1) +- `kebab-core` +- `kebab-config` +- `kebab-app` +- `kebab-tui` (extends p9-1) - `ratatui`, `crossterm` - `tracing` - `thiserror` ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (only via `kb-app`), `kb-desktop` +- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (only via `kebab-app`), `kebab-desktop` ## Inputs | input | type | source | |-------|------|--------| -| `kb-app::ask(query, AskOpts)` | facade | runtime | +| `kebab-app::ask(query, AskOpts)` | facade | runtime | | keyboard events | `crossterm` | terminal | ## Outputs @@ -46,7 +46,7 @@ Streaming UI is the only TUI piece that meaningfully differs from search/inspect | output | type | downstream | |--------|------|------------| | Ratatui Ask pane render | terminal | user | -| `kb-app::ask` invocation with streaming closure | facade | RAG pipeline | +| `kebab-app::ask` invocation with streaming closure | facade | RAG pipeline | ## Public surface (signatures only — no new types) @@ -55,7 +55,7 @@ pub fn render_ask(f: &mut ratatui::Frame, area: ra pub fn handle_key_ask(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome; ``` -This task fills the body of `kb_tui::AskState` (forward-declared in p9-1). `App` is NOT edited — only `AskState` gets fields: +This task fills the body of `kebab_tui::AskState` (forward-declared in p9-1). 
`App` is NOT edited — only `AskState` gets fields: ```rust pub struct AskState { @@ -63,8 +63,8 @@ pub struct AskState { pub explain: bool, pub streaming: bool, pub partial: String, - pub answer: Option, - pub thread: Option>>, + pub answer: Option, + pub thread: Option>>, pub rx: Option>, } ``` @@ -74,7 +74,7 @@ pub struct AskState { ## Behavior contract - Layout: top input bar (`?` prompt, query text), middle answer area (rendered Markdown-light: paragraphs + inline `[N]` markers), bottom-right citations panel (numbered list of citations with `path#fragment` and section label), bottom-left status (`grounded ✓/✗ model prompt_v k chunks`). -- Submission: `Enter` triggers a worker thread that calls `kb-app::ask` with `AskOpts.stream_sink: Some(tx)` (`tx: mpsc::Sender`). The thread holds the `tx`, the TUI holds the matching `rx` (set on `AskState.rx`). On each render frame the TUI drains `rx.try_iter()` into `state.partial`, no blocking. +- Submission: `Enter` triggers a worker thread that calls `kebab-app::ask` with `AskOpts.stream_sink: Some(tx)` (`tx: mpsc::Sender`). The thread holds the `tx`, the TUI holds the matching `rx` (set on `AskState.rx`). On each render frame the TUI drains `rx.try_iter()` into `state.partial`, no blocking. - Streaming: while `ask_streaming = true`, the Answer area shows `ask_partial` and a small "▍" cursor. When the worker finishes, `ask_answer` is populated and the citations panel switches to the final list. - Refusal rendering: - `grounded = false` and `refusal_reason = ScoreGate` → render the answer (which is the human-friendly "근거 부족…" message), citations show "가까운 후보". @@ -85,12 +85,12 @@ pub struct AskState { - `e` → toggle `ask_explain`; resubmit on next `Enter`. While explain ON, citations panel is replaced by the per-claim breakdown (mode B in design §1.2) and a footer shows the retrieval trace summary. 
- `Esc` → switch back to Library pane (cancellation of an in-flight ask is best-effort: the worker thread continues but its final answer is dropped). - `j` / `k` → scroll the answer area when oversized. -- All facade calls stay within `kb-app::ask` — never reach into `kb-rag` directly. +- All facade calls stay within `kebab-app::ask` — never reach into `kebab-rag` directly. - Errors render as a popup overlay; do not crash the pane. ## Storage / wire effects -- Reads/writes via `kb-app::ask` which itself writes the `answers` row in `kb.sqlite`. The pane has no direct DB access. +- Reads/writes via `kebab-app::ask` which itself writes the `answers` row in `kebab.sqlite`. The pane has no direct DB access. ## Test plan @@ -102,14 +102,14 @@ pub struct AskState { | unit | refusal answer renders without citations panel index errors | inline | | snapshot | rendered Ask pane mid-stream is stable | TestBackend | | snapshot | rendered Ask pane after finished grounded answer is stable | TestBackend | -| integration | mocked `kb-app::ask` returning a canned `Answer` populates final state correctly | inline | +| integration | mocked `kebab-app::ask` returning a canned `Answer` populates final state correctly | inline | -All tests under `cargo test -p kb-tui ask`. +All tests under `cargo test -p kebab-tui ask`. 
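
A minimal sketch of the submission and per-frame drain described in the behavior contract above, under stated assumptions: `AskStream`, `submit`, and `drain_frame` are hypothetical names, the token type is assumed to be `String` (the spec's channel item type is not visible here), and the `produce` closure stands in for the `kebab-app::ask` call with a stream sink. It uses `try_recv` rather than `try_iter` so the same non-blocking loop can also notice that the worker dropped its sender.

```rust
use std::sync::mpsc::{self, Receiver, Sender, TryRecvError};
use std::thread::{self, JoinHandle};

/// Hypothetical, trimmed-down stand-in for the streaming fields of `AskState`.
pub struct AskStream {
    pub streaming: bool,
    pub partial: String,
    pub rx: Option<Receiver<String>>,
}

/// Start a worker thread. The worker owns the sender, the UI keeps the receiver.
/// `produce` is a placeholder for the facade call that pushes tokens into `tx`.
pub fn submit<F>(state: &mut AskStream, produce: F) -> JoinHandle<()>
where
    F: FnOnce(Sender<String>) + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    state.partial.clear();
    state.streaming = true;
    state.rx = Some(rx);
    thread::spawn(move || produce(tx))
}

/// Called once per render frame. Never blocks: it takes whatever tokens are
/// already queued and returns as soon as the channel is empty or closed.
pub fn drain_frame(state: &mut AskStream) {
    let mut finished = false;
    if let Some(rx) = state.rx.as_ref() {
        loop {
            match rx.try_recv() {
                Ok(token) => state.partial.push_str(&token),
                Err(TryRecvError::Empty) => break,
                Err(TryRecvError::Disconnected) => {
                    // The worker dropped its sender, so the stream is over.
                    finished = true;
                    break;
                }
            }
        }
    }
    if finished {
        state.streaming = false;
        state.rx = None;
    }
}
```

In a real pane the render loop would call `drain_frame` before drawing, and `Esc` would simply drop the receiver, so the worker's later sends fail and its result is discarded.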
## Definition of Done -- [ ] `cargo check -p kb-tui` passes -- [ ] `cargo test -p kb-tui ask` passes +- [ ] `cargo check -p kebab-tui` passes +- [ ] `cargo test -p kebab-tui ask` passes - [ ] No imports outside Allowed dependencies - [ ] Manual smoke: stream tokens visible character-by-character against a real Ollama (or `MockLanguageModel`) - [ ] PR links design §1.1–1.4, §2.3 diff --git a/tasks/p9/p9-4-tui-inspect.md b/tasks/p9/p9-4-tui-inspect.md index 5289cc6..e92dad8 100644 --- a/tasks/p9/p9-4-tui-inspect.md +++ b/tasks/p9/p9-4-tui-inspect.md @@ -1,12 +1,12 @@ --- phase: P9 -component: kb-tui (inspect pane) +component: kebab-tui (inspect pane) task_id: p9-4 title: "TUI Inspect pane: document & chunk detail render" status: planned depends_on: [p1-6, p9-1] unblocks: [] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [§1 inspect output, §3.5 Chunk, §2.5 DocSummary, §2.6 ChunkInspection] --- @@ -22,24 +22,24 @@ Inspect is read-only and has no external interactions; smallest possible pane. 
U ## Allowed dependencies -- `kb-core` -- `kb-config` -- `kb-app` -- `kb-tui` (extends p9-1) +- `kebab-core` +- `kebab-config` +- `kebab-app` +- `kebab-tui` (extends p9-1) - `ratatui`, `crossterm` - `tracing` - `thiserror` ## Forbidden dependencies -- `kb-source-fs`, `kb-parse-*`, `kb-normalize`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag` (only via `kb-app`), `kb-desktop` +- `kebab-source-fs`, `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag` (only via `kebab-app`), `kebab-desktop` ## Inputs | input | type | source | |-------|------|--------| -| `kb-app::inspect_doc(id)` | facade | runtime | -| `kb-app::inspect_chunk(id)` | facade | runtime | +| `kebab-app::inspect_doc(id)` | facade | runtime | +| `kebab-app::inspect_chunk(id)` | facade | runtime | | keyboard events | `crossterm` | terminal | ## Outputs @@ -51,19 +51,19 @@ Inspect is read-only and has no external interactions; smallest possible pane. U ## Public surface (signatures only — no new types) ```rust -pub enum InspectTarget { Doc(kb_core::DocumentId), Chunk(kb_core::ChunkId) } +pub enum InspectTarget { Doc(kebab_core::DocumentId), Chunk(kebab_core::ChunkId) } pub fn render_inspect(f: &mut ratatui::Frame, area: ratatui::layout::Rect, state: &App); pub fn handle_key_inspect(state: &mut App, key: crossterm::event::KeyEvent) -> KeyOutcome; ``` -This task fills the body of `kb_tui::InspectState` (forward-declared in p9-1). `App` is NOT edited. +This task fills the body of `kebab_tui::InspectState` (forward-declared in p9-1). `App` is NOT edited. 
```rust pub struct InspectState { pub target: Option, - pub doc: Option, - pub chunk: Option, + pub doc: Option, + pub chunk: Option, pub collapsed: std::collections::HashSet<&'static str>, pub scroll: u16, } @@ -89,7 +89,7 @@ pub struct InspectState { - `c` → collapse / expand currently focused section (focus is implicit by current scroll position; v1 may simplify by toggling all sections) - `Esc` → return to previous pane (Library or Search) - `Enter` → no-op (Inspect is terminal — no editor jump here; users use Search pane for jump) -- Loading: while `kb-app::inspect_doc` or `inspect_chunk` runs, show "loading…". On error, popup with hint. +- Loading: while `kebab-app::inspect_doc` or `inspect_chunk` runs, show "loading…". On error, popup with hint. - Renders must conform to wire schemas `doc_summary.v1` (subset for header) and `chunk_inspection.v1`. ## Storage / wire effects @@ -100,18 +100,18 @@ pub struct InspectState { | kind | description | fixture / data | |------|-------------|----------------| -| unit | switching to InspectTarget::Doc triggers `kb-app::inspect_doc` once | inline mock | +| unit | switching to InspectTarget::Doc triggers `kebab-app::inspect_doc` once | inline mock | | unit | scroll bounded by content height | inline | | unit | collapse toggle via `c` flips state | inline | | snapshot | doc-view rendered for fixture stable | TestBackend + fixture | | snapshot | chunk-view rendered for fixture stable | TestBackend + fixture | -All tests under `cargo test -p kb-tui inspect`. +All tests under `cargo test -p kebab-tui inspect`. 
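
The two unit-test rows above (scroll bounded by content height, collapse toggle via `c`) could be satisfied by logic along these lines. This is a sketch only: `InspectView`, `max_scroll`, `scroll_by`, and `toggle_section` are hypothetical names, and the struct mirrors just the `collapsed` and `scroll` fields of `InspectState`.

```rust
use std::collections::HashSet;

/// Hypothetical subset of `InspectState` that matters for scrolling and collapsing.
pub struct InspectView {
    pub collapsed: HashSet<&'static str>,
    pub scroll: u16,
}

/// Largest scroll offset that still keeps the viewport filled with content.
pub fn max_scroll(content_height: u16, viewport_height: u16) -> u16 {
    content_height.saturating_sub(viewport_height)
}

/// Move the scroll offset by `delta` rows, clamped to `[0, max_scroll]`.
pub fn scroll_by(view: &mut InspectView, delta: i32, content_height: u16, viewport_height: u16) {
    let max = i32::from(max_scroll(content_height, viewport_height));
    let next = i32::from(view.scroll).saturating_add(delta).clamp(0, max);
    // `next` lies in [0, max] and `max` came from a u16, so the cast is lossless.
    view.scroll = next as u16;
}

/// Flip the collapsed state of one section (what the `c` key would trigger).
pub fn toggle_section(view: &mut InspectView, section: &'static str) {
    // `remove` returns false when the section was not collapsed yet.
    if !view.collapsed.remove(section) {
        view.collapsed.insert(section);
    }
}
```

Keeping these as pure functions over a small struct means the unit tests need no `TestBackend`; only the snapshot tests have to render.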
## Definition of Done -- [ ] `cargo check -p kb-tui` passes -- [ ] `cargo test -p kb-tui inspect` passes +- [ ] `cargo check -p kebab-tui` passes +- [ ] `cargo test -p kebab-tui inspect` passes - [ ] No imports outside Allowed dependencies - [ ] Manual smoke: inspect a doc with multiple chunks, scroll, return to library - [ ] PR links design §3.5, §2.5, §2.6 diff --git a/tasks/p9/p9-5-desktop-tauri.md b/tasks/p9/p9-5-desktop-tauri.md index f1d2ec4..90f0f0a 100644 --- a/tasks/p9/p9-5-desktop-tauri.md +++ b/tasks/p9/p9-5-desktop-tauri.md @@ -1,12 +1,12 @@ --- phase: P9 -component: kb-desktop (Tauri) +component: kebab-desktop (Tauri) task_id: p9-5 -title: "Tauri desktop app: backend commands wrapping kb-app + multimodal source viewer" +title: "Tauri desktop app: backend commands wrapping kebab-app + multimodal source viewer" status: planned depends_on: [p9-1, p9-2, p9-3, p9-4] unblocks: [] -contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md +contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md contract_sections: [report §16.3 desktop (also tasks/phase-9-ui.md epic), design §1 ask/search scenes, design §2 wire schemas v1, design §8 module boundaries] --- @@ -14,23 +14,23 @@ contract_sections: [report §16.3 desktop (also tasks/phase-9-ui.md epic), desig ## Goal -Stand up a Tauri 2.x app (`kb-desktop` crate as backend, `kb-desktop-frontend/` as web assets) whose Tauri commands wrap `kb-app` 1:1. The frontend renders multimodal source viewers (Markdown render, PDF page viewer, image viewer with region overlay, audio player with seek). Citation clicks route to the appropriate viewer. +Stand up a Tauri 2.x app (`kebab-desktop` crate as backend, `kebab-desktop-frontend/` as web assets) whose Tauri commands wrap `kebab-app` 1:1. The frontend renders multimodal source viewers (Markdown render, PDF page viewer, image viewer with region overlay, audio player with seek). Citation clicks route to the appropriate viewer. 
## Why now / why this size -Last task. Combines all backend phases into a single user-facing surface. Strict policy: backend commands are thin wrappers over `kb-app`; no new business logic. +Last task. Combines all backend phases into a single user-facing surface. Strict policy: backend commands are thin wrappers over `kebab-app`; no new business logic. ## Allowed dependencies -- backend (`kb-desktop`): - - `kb-core` - - `kb-config` - - `kb-app` +- backend (`kebab-desktop`): + - `kebab-core` + - `kebab-config` + - `kebab-app` - `tauri = "2"` + `tauri-build` - `serde`, `serde_json` - `tracing` - `thiserror` -- frontend (`kb-desktop-frontend/`): vanilla TypeScript + Vite (default; user may swap to Svelte/Solid in a follow-up). +- frontend (`kebab-desktop-frontend/`): vanilla TypeScript + Vite (default; user may swap to Svelte/Solid in a follow-up). - PDF rendering: `pdfjs-dist` - Markdown rendering: `marked` + `dompurify` - Audio: HTML `