From d1b99b29946ce0a884160818632739bf20150639 Mon Sep 17 00:00:00 2001
From: altair823
Date: Fri, 1 May 2026 16:32:28 +0000
Subject: [PATCH] =?UTF-8?q?docs:=20mark=20P0=E2=80=93P4=20done,=20add=20SM?=
 =?UTF-8?q?OKE=20recipe,=20refresh=20README?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

State drift after P0–P4 completion + 3 post-merge hotfixes (PR #20 --config
across subcommands, PR #24 --config in kb ask, PR #25 RRF fusion_score
normalization). README still framed the project as "spec frozen, code 0
lines"; phase docs and task specs all carried status: planned.

Sweep:

- README.md: top banner now "P0–P4 done (17/31 tasks) + 3 hotfixes
  applied"; command table marks each subcommand's owning phase and current
  status (kb ask = ✅ via P4-3, kb eval = ⏳ P5); phase roadmap table grew a
  Status column (P0–P4 completed, P5 next, P6–P9 pending); component count
  bumped 30 → 31 to reflect P3-5 (app-wiring, post-spec); core decisions
  table notes the RRF [0,1] normalization invariant; build+실행 section
  drops the "P0 미시작" caveat; new pointers to HOTFIXES.md and SMOKE.md.
- docs/SMOKE.md (new): ~160-line recipe for running the full pipeline
  against an isolated /tmp/kb-smoke/ workspace via --config, without
  polluting ~/.config/kb/ or ~/.local/share/kb/. Covers fixture seeding,
  sample config.toml with the post-merge defaults, doctor → ingest → list
  → search × 3 modes → inspect → ask sequence, verification checklist, and
  known-behaviour notes (fastembed model download, RAG response time,
  --config hard-fail on missing path).
- tasks/phase-{0..4}-*.md: status frontmatter flipped planned → completed.
- tasks/p0/, tasks/p1/, tasks/p2/, tasks/p3/, tasks/p4/: same status flip
  across all 17 component task specs (1+6+2+5+3). P5–P9 stay planned.

cargo test --workspace: 319 passed; clippy clean (no source changes in this
commit, just docs + frontmatter).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 README.md                             | 118 ++++++++++---------
 docs/SMOKE.md                         | 158 ++++++++++++++++++++++++++
 tasks/p0/p0-1-skeleton.md             |   2 +-
 tasks/p1/p1-1-source-fs.md            |   2 +-
 tasks/p1/p1-2-parse-md-frontmatter.md |   2 +-
 tasks/p1/p1-3-parse-md-blocks.md      |   2 +-
 tasks/p1/p1-4-normalize.md            |   2 +-
 tasks/p1/p1-5-chunk.md                |   2 +-
 tasks/p1/p1-6-store-sqlite.md         |   2 +-
 tasks/p2/p2-1-fts-schema.md           |   2 +-
 tasks/p2/p2-2-lexical-retriever.md    |   2 +-
 tasks/p3/p3-1-embedder-trait.md       |   2 +-
 tasks/p3/p3-2-fastembed-adapter.md    |   2 +-
 tasks/p3/p3-3-lancedb-store.md        |   2 +-
 tasks/p3/p3-4-hybrid-fusion.md        |   2 +-
 tasks/p3/p3-5-app-wiring.md           |   2 +-
 tasks/p4/p4-1-llm-trait.md            |   2 +-
 tasks/p4/p4-2-ollama-adapter.md       |   2 +-
 tasks/p4/p4-3-rag-pipeline.md         |   2 +-
 tasks/phase-0-skeleton.md             |   2 +-
 tasks/phase-1-markdown-ingestion.md   |   2 +-
 tasks/phase-2-lexical-search.md       |   2 +-
 tasks/phase-3-vector-hybrid.md        |   2 +-
 tasks/phase-4-local-llm-rag.md        |   2 +-
 24 files changed, 245 insertions(+), 75 deletions(-)
 create mode 100644 docs/SMOKE.md

diff --git a/README.md b/README.md
index 16c0ef3..c75c85d 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # kb — Local-first Knowledge Base
 
-> **상태:** spec 동결 단계. 코드 0줄. 30 component task spec 모두 작성/리뷰/머지 완료. 다음 단계 = P0 구현.
+> **상태:** P0–P4 구현 완료 (31 component task 중 17 완료) + 3건 post-merge hotfix 적용. `kb index` / `kb search --mode {lexical,vector,hybrid}` / `kb ask` 모두 실 동작. 다음 단계 = P5 (eval suite). 자세한 진행 상황은 [tasks/INDEX.md](tasks/INDEX.md), 머지 후 발견된 버그와 fix는 [tasks/HOTFIXES.md](tasks/HOTFIXES.md).
 
 `kb` 는 개인용 로컬 knowledge base + RAG 도구다. Markdown / PDF / 이미지 / 음성을 한 곳에 색인하고, 의미 검색 + citation 포함 LLM 답변을 단일 binary 로 제공한다. 모든 추론은 로컬 (Ollama / fastembed / whisper.cpp) 에서 돌아간다.
 
@@ -10,17 +10,18 @@
 
 ## 무엇인가
 
-| 명령 | 동작 |
-|------|------|
-| `kb init` | XDG 경로에 데이터 디렉토리 + config.toml 생성 |
-| `kb ingest ` | Markdown/PDF/이미지/음성 색인 (idempotent) |
-| `kb search ""` | hybrid (lexical + vector) top-k 검색 — citation 포함 |
-| `kb ask ""` | RAG 답변 + 근거 인용. 근거 부족 시 거절 |
-| `kb inspect doc/chunk ` | 디버그용 raw record 보기 |
-| `kb doctor` | 설정/모델/DB 헬스 체크 |
-| `kb eval run / compare` | golden query 회귀 측정 (chunker/모델 교체 평가) |
+| 명령 | 동작 | 상태 |
+|------|------|------|
+| `kb init` | XDG 경로에 데이터 디렉토리 + config.toml 생성 | ✅ P0 |
+| `kb ingest []` | Markdown 색인 (idempotent). PDF/이미지/음성은 P6+. | ✅ P3-5 |
+| `kb search --mode {lexical,vector,hybrid} ""` | 검색 — citation 포함, hybrid는 RRF fusion | ✅ P3-5 |
+| `kb list docs` | 색인된 문서 목록 | ✅ P3-5 |
+| `kb inspect doc ` / `kb inspect chunk ` | raw record 보기 | ✅ P3-5 |
+| `kb ask ""` | RAG 답변 + 근거 인용. 근거 부족 시 거절. Ollama 필요. | ✅ P4-3 |
+| `kb doctor` | 설정/모델/DB 헬스 체크 | ✅ P0 |
+| `kb eval run / compare` | golden query 회귀 측정 | ⏳ P5 |
 
-기계 친화 모드: 모든 명령에 `--json` 플래그. 출력은 frozen wire schema v1 (`schema_version` 필드 항상 포함).
+기계 친화 모드: 모든 명령에 `--json` 플래그. 출력은 frozen wire schema v1 (`schema_version` 필드 항상 포함, 예: `ingest_report.v1`, `search_hit.v1`, `answer.v1`, `doctor.v1`).
 
 ---
 
@@ -35,13 +36,14 @@
 | vector | LanceDB (embedded, model 별 분리 table) |
 | Markdown parser | `pulldown-cmark` |
 | embedding | `fastembed-rs` (`multilingual-e5-small`, 384d) |
-| LLM | Ollama HTTP (default `qwen2.5:14b-instruct`) |
-| 음성 ASR | `whisper.cpp` (via `whisper-rs`) |
-| OCR | Tesseract (default) + macOS Apple Vision sidecar (feature gate) |
-| TUI | Ratatui + crossterm |
-| Desktop | Tauri 2 + `pdfjs-dist` (native PDF render backend 금지) |
+| LLM | Ollama HTTP (default `qwen2.5:7b-instruct` ─ 사용자 환경에 맞춰 `gemma4:26b` 등으로 교체 가능) |
+| 음성 ASR | `whisper.cpp` (via `whisper-rs`) — P8 |
+| OCR | Tesseract (default) + macOS Apple Vision sidecar (feature gate) — P6 |
+| TUI | Ratatui + crossterm — P9 |
+| Desktop | Tauri 2 + `pdfjs-dist` (native PDF render backend 금지) — P9 |
 | citation 형식 | URI fragment (`path#L12-L34`, W3C Media Fragments) |
 | ID 생성 | `blake3(canonical_json(tuple))[..32]` hex |
+| RRF fusion_score | `[0, 1]` 정규화 — `2 / (k_rrf + 1)` 로 나눠 mode 간 비교 가능 (post-merge hotfix) |
 | layout | XDG (`~/.local/share/kb/`, `~/.config/kb/`, …) |
 
 전체는 [docs/superpowers/specs/2026-04-27-kb-final-form-design.md](docs/superpowers/specs/2026-04-27-kb-final-form-design.md) 참조.
@@ -61,37 +63,37 @@ kb-cli, kb-tui, kb-desktop
  ├─> kb-chunk
  ├─> kb-store-sqlite
  ├─> kb-store-vector
- ├─> kb-embed-local
+ ├─> kb-embed-local (kb-embed trait crate)
  ├─> kb-search
- ├─> kb-llm-local
+ ├─> kb-llm-local (kb-llm trait crate)
  ├─> kb-rag
  ├─> kb-eval
  └─> kb-config
  └─> kb-core (모두 의존)
 ```
 
-UI → store/llm/parse 직접 의존 금지. 모든 user-facing 진입은 `kb-app` facade 만 통한다 (design §8).
+UI → store/llm/parse 직접 의존 금지. 모든 user-facing 진입은 `kb-app` facade 만 통한다 (design §8). `kb-cli` 가 `--config ` flag 를 honor 하려면 `kb_app::*_with_config(cfg, …)` companion 을 통해 Config 을 명시적으로 thread 하는 패턴 — 자세한 이유는 [tasks/HOTFIXES.md](tasks/HOTFIXES.md) 의 `--config` 항목.
 
 ---
 
 ## Phase 로드맵
 
-| Phase | 내용 | 핵심 산출 crate | 선행 |
-|-------|------|----------------|------|
-| **P0** | Workspace 뼈대 + 도메인 계약 + ID recipe | `kb-core`, `kb-parse-types`, `kb-config`, `kb-app`, `kb-cli` | – |
-| **P1** | Markdown ingestion (walk → parse → chunk → SQLite) | `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-sqlite` | P0 |
-| **P2** | SQLite FTS5 lexical 검색 + citation | `kb-search` (lexical) | P1 |
-| **P3** | Local embedding + LanceDB + hybrid (RRF) | `kb-embed`, `kb-embed-local`, `kb-store-vector`, `kb-search` | P2 |
-| **P4** | Local LLM + RAG + grounded answer | `kb-llm`, `kb-llm-local`, `kb-rag` | P3 |
-| **P5** | Golden query / regression eval | `kb-eval` | P4 |
-| **P6** | 이미지 ingestion (OCR + caption) | `kb-parse-image` | P5 |
-| **P7** | PDF text + page citation | `kb-parse-pdf` | P5 |
-| **P8** | 음성 transcription + timestamp citation | `kb-parse-audio` | P5 |
-| **P9** | TUI + desktop app | `kb-tui`, `kb-desktop` | P5 |
+| Phase | 내용 | 핵심 산출 crate | 선행 | 상태 |
+|-------|------|----------------|------|------|
+| **P0** | Workspace 뼈대 + 도메인 계약 + ID recipe | `kb-core`, `kb-parse-types`, `kb-config`, `kb-app`, `kb-cli` | – | ✅ 완료 |
+| **P1** | Markdown ingestion (walk → parse → chunk → SQLite) | `kb-source-fs`, `kb-parse-md`, `kb-normalize`, `kb-chunk`, `kb-store-sqlite` | P0 | ✅ 완료 |
+| **P2** | SQLite FTS5 lexical 검색 + citation | `kb-search` (lexical) | P1 | ✅ 완료 |
+| **P3** | Local embedding + LanceDB + hybrid (RRF) + kb-app wiring | `kb-embed`, `kb-embed-local`, `kb-store-vector`, `kb-search` | P2 | ✅ 완료 |
+| **P4** | Local LLM + RAG + grounded answer | `kb-llm`, `kb-llm-local`, `kb-rag` | P3 | ✅ 완료 |
+| **P5** | Golden query / regression eval | `kb-eval` | P4 | ⏳ 다음 |
+| **P6** | 이미지 ingestion (OCR + caption) | `kb-parse-image` | P5 | ⏳ |
+| **P7** | PDF text + page citation | `kb-parse-pdf` | P5 | ⏳ |
+| **P8** | 음성 transcription + timestamp citation | `kb-parse-audio` | P5 | ⏳ |
+| **P9** | TUI + desktop app | `kb-tui`, `kb-desktop` | P5 | ⏳ |
 
 P0~P5 직렬. P6~P9 P5 이후 병렬 가능.
 
-각 phase 는 component-level 단위로 더 분해되어 있다 (총 30 component task). 자세한 분해는 [tasks/INDEX.md](tasks/INDEX.md).
+각 phase 는 component-level 단위로 더 분해되어 있다 (총 31 component task — P3-5 app-wiring 추가). 자세한 분해는 [tasks/INDEX.md](tasks/INDEX.md). 머지 후 발견된 버그/fix 의 dated 로그는 [tasks/HOTFIXES.md](tasks/HOTFIXES.md).
 
 ---
 
@@ -107,59 +109,66 @@ kb/
 │ │ │ └── 2026-04-27-kb-final-form-design.md # frozen design (12 sections)
 │ │ └── plans/
 │ │ └── 2026-04-27-task-decomposition.md # task 분해 implementation plan
-│ ├── spec/ # P0 에서 생성 — 도메인 모델 / ID / module-boundary 등 stub
-│ └── wire-schema/v1/ # P0 에서 생성 — JSON Schema 7개 (citation, search_hit, answer, …)
+│ ├── SMOKE.md # 로컬 워크스페이스에 직접 돌려보는 절차
+│ └── wire-schema/v1/ # JSON Schema 7 (citation, search_hit, answer, …)
 ├── tasks/
 │ ├── INDEX.md # phase 인덱스 + component task 트리
+│ ├── HOTFIXES.md # post-merge dated fix 로그
 │ ├── _template.md # task spec 작성 템플릿
 │ ├── phase-0-skeleton.md … phase-9-ui.md # phase epic (high-level)
-│ ├── p0/p0-1-skeleton.md # component task (1 spec)
-│ ├── p1/p1-1 … p1-6 # component tasks (6)
+│ ├── p0/p0-1-skeleton.md # component task (1)
+│ ├── p1/p1-1 … p1-6 # (6)
 │ ├── p2/p2-1, p2-2 # (2)
-│ ├── p3/p3-1 … p3-4 # (4)
+│ ├── p3/p3-1 … p3-5 # (5 — p3-5 = app-wiring, post-spec 추가)
 │ ├── p4/p4-1 … p4-3 # (3)
 │ ├── p5/p5-1, p5-2 # (2)
 │ ├── p6/p6-1 … p6-3 # (3)
 │ ├── p7/p7-1, p7-2 # (2)
 │ ├── p8/p8-1, p8-2 # (2)
 │ └── p9/p9-1 … p9-5 # (5)
-├── crates/ # P0 에서 생성 — Rust crates
-│ ├── kb-core/
-│ ├── kb-parse-types/
-│ ├── kb-config/
-│ ├── kb-app/
-│ └── kb-cli/
-└── fixtures/ # P0 에서 생성 — 테스트 fixture 트리
- ├── markdown/ source-fs/ search/ embed/ vector/
- ├── rag/ eval/ image/ pdf/ audio/
- └── …
+├── crates/
+│ ├── kb-core/ kb-parse-types/ kb-config/ # 도메인 + 설정 (P0)
+│ ├── kb-source-fs/ # 워크스페이스 walk + checksum (P1-1)
+│ ├── kb-parse-md/ # Markdown frontmatter + blocks (P1-2/3)
+│ ├── kb-normalize/ # ParsedBlock → CanonicalDocument (P1-4)
+│ ├── kb-chunk/ # heading-aware chunker (P1-5)
+│ ├── kb-store-sqlite/ # SQLite + FTS5 (V001/V002/V003) (P1-6, P2-1, P3-3)
+│ ├── kb-search/ # Lexical + Vector + Hybrid retriever (P2-2, P3-4)
+│ ├── kb-embed/ kb-embed-local/ # Embedder trait + fastembed adapter (P3-1, P3-2)
+│ ├── kb-store-vector/ # LanceDB VectorStore (P3-3)
+│ ├── kb-llm/ kb-llm-local/ # LanguageModel trait + Ollama adapter (P4-1, P4-2)
+│ ├── kb-rag/ # RAG pipeline (P4-3)
+│ ├── kb-app/ # facade (P0 시그니처 + P3-5 본체)
+│ └── kb-cli/ # binary (P0 → 핫픽스로 --config flag wiring 강화)
+├── migrations/ # SQLite refinery V001/V002/V003
+└── fixtures/ # 테스트 fixture 트리
 ```
 
 ---
 
-## 빌드 + 실행 (P0 완료 후)
+## 빌드 + 실행
 
 ```bash
 # build
 cargo build --release
 
-# 첫 실행
+# 첫 실행 — XDG 경로에 config.toml 생성
 ./target/release/kb init
 
 # config 손보고
 ${EDITOR:-vi} ~/.config/kb/config.toml
 
 # 색인
-./target/release/kb ingest ~/KnowledgeBase
+./target/release/kb ingest
 
 # 검색
-./target/release/kb search "Markdown chunking 규칙"
+./target/release/kb search "Markdown chunking 규칙" --mode hybrid
 
-# 질문
+# 질문 (Ollama 필요)
 ./target/release/kb ask "내 KB 설계에서 저장소 전략은?"
 ```
 
-**현재는 P0 미시작 — 위 명령 모두 동작하지 않는다.** spec 만 동결됐다.
+워크스페이스를 격리해서 직접 돌려보는 패턴은 [docs/SMOKE.md](docs/SMOKE.md) 참조 — `--config ` 로 임시 디렉토리에 격리된 KB 를 만들 수 있다.
 
 ---
 
@@ -194,6 +203,7 @@ ${EDITOR:-vi} ~/.config/kb/config.toml
 2. **새 component task 추가** — `tasks/_template.md` 복사 후 `tasks/p/p--.md` 생성. `contract_sections` 에 design doc 섹션 명시. `Allowed/Forbidden dependencies` 는 design §8 module-boundary 표 따름.
 3. **구현** — component task 1개당 sub-agent 1세션 권장. `cargo test -p ` + DoD 체크리스트 통과. PR 으로 머지.
 4. **버전 변경** — `parser_version` / `chunker_version` / `embedding_version` 등 변경은 design §9 의 cascade rule 따름. 영향 받는 record 는 재처리 필요.
+5. **post-merge 핫픽스** — 머지 후 발견된 버그는 [tasks/HOTFIXES.md](tasks/HOTFIXES.md) 에 dated entry 추가 + 영향 받는 task spec 의 `Risks/notes` 에 cross-link 한 줄 추가. 원래 spec 본문은 frozen 으로 두고 HOTFIXES.md 가 live source of truth.
 
 ---
 
@@ -209,3 +219,5 @@ ${EDITOR:-vi} ~/.config/kb/config.toml
 - Frozen design: [docs/superpowers/specs/2026-04-27-kb-final-form-design.md](docs/superpowers/specs/2026-04-27-kb-final-form-design.md)
 - Task 분해 plan: [docs/superpowers/plans/2026-04-27-task-decomposition.md](docs/superpowers/plans/2026-04-27-task-decomposition.md)
 - Task 인덱스: [tasks/INDEX.md](tasks/INDEX.md)
+- Post-merge 핫픽스 로그: [tasks/HOTFIXES.md](tasks/HOTFIXES.md)
+- Smoke recipe: [docs/SMOKE.md](docs/SMOKE.md)
diff --git a/docs/SMOKE.md b/docs/SMOKE.md
new file mode 100644
index 0000000..ddc9aca
--- /dev/null
+++ b/docs/SMOKE.md
@@ -0,0 +1,158 @@
+---
+title: "kb 스모크 실행 가이드"
+date: 2026-05-01
+---
+
+# kb 스모크 실행 가이드
+
+P3-5 머지 후 (`kb-app::ingest` / `search` / `list` / `inspect` 와이어링) 부터, 그리고 P4-3 머지 후 (`kb ask` 와이어링) 부터 사용자가 자기 설치본을 직접 검증할 수 있다. 이 문서는 사용자 환경 (`~/.config/kb/`, `~/.local/share/kb/`) 을 건드리지 않고 임시 디렉토리에 격리된 KB 를 띄워 전체 파이프라인을 1세션 안에 한 번 돌리는 절차다.
+
+## 준비
+
+빌드:
+
+```bash
+cargo build --release -p kb-cli # debug 도 무방. 디버그가 더 빠르게 빌드됨.
+```
+
+원격 Ollama (선택, `kb ask` 만 필요):
+
+```bash
+# Mac 등 별도 호스트에서
+OLLAMA_HOST=0.0.0.0:11434 ollama serve
+ollama pull gemma4:26b # 또는 qwen2.5:32b 등 — 자세한 비교는 README
+```
+
+본 머신에서 reachability 검증:
+
+```bash
+curl http://:11434/api/tags
+```
+
+`{"models": [...]}` 가 나오면 네트워크 + 방화벽 OK.
+
+## 격리된 워크스페이스 생성
+
+```bash
+mkdir -p /tmp/kb-smoke/{workspace,data}
+cat > /tmp/kb-smoke/workspace/intro.md <<'EOF'
+---
+title: 인사말
+tags: [demo]
+lang: ko
+---
+# 안녕
+
+이 문서는 스모크 테스트 fixture 다.
+EOF
+```
+
+여러 파일을 시드하고 싶으면 본인 KB 일부를 `cp -r` 으로 복사해도 좋다 (다음 절차는 6개 markdown 가정).
+
+## 격리된 config
+
+`/tmp/kb-smoke/config.toml`:
+
+```toml
+schema_version = 1
+
+[workspace]
+root = "/tmp/kb-smoke/workspace"
+include = ["**/*.md"]
+exclude = [".git/**", "node_modules/**", ".obsidian/**"]
+
+[storage]
+data_dir = "/tmp/kb-smoke/data"
+sqlite = "{data_dir}/kb.sqlite"
+vector_dir = "{data_dir}/lancedb"
+asset_dir = "{data_dir}/assets"
+artifact_dir = "{data_dir}/artifacts"
+model_dir = "{data_dir}/models"
+runs_dir = "{data_dir}/runs"
+copy_threshold_mb = 100
+
+[indexing]
+max_parallel_extractors = 2
+max_parallel_embeddings = 1
+watch_filesystem = false
+
+[chunking]
+target_tokens = 500
+overlap_tokens = 80
+respect_markdown_headings = true
+chunker_version = "md-heading-v1"
+
+[models.embedding]
+provider = "fastembed" # "none" 으로 두면 lexical-only — Ollama 불필요
+model = "multilingual-e5-small"
+version = "v1"
+dimensions = 384
+batch_size = 64
+
+[models.llm]
+provider = "ollama"
+model = "gemma4:26b" # 사용자 환경에 맞춰 교체
+context_tokens = 16384
+endpoint = "http://192.168.0.47:11434"
+temperature = 0.2
+seed = 42
+
+[search]
+default_k = 10
+hybrid_fusion = "rrf"
+rrf_k = 60
+snippet_chars = 220
+
+[rag]
+prompt_template_version = "rag-v1"
+score_gate = 0.05 # RRF 정규화 후 [0, 1] 범위라 default 그대로 OK
+explain_default = false
+max_context_tokens = 6000
+```
+
+`KB_*` 환경변수로 override 가능 (`KB_MODELS_LLM_MODEL=qwen2.5:32b kb …` 등). 자세한 키 목록은 `crates/kb-config/src/lib.rs` 의 `apply_env` 매치 암.
+
+## 명령 시퀀스
+
+```bash
+KB() { ./target/debug/kb --config /tmp/kb-smoke/config.toml "$@"; }
+
+KB doctor # 1. health check
+KB ingest # 2. 워크스페이스 색인
+KB list docs # 3. 색인 결과 목록
+KB search --mode lexical "코루틴" --k 3 # 4. lexical 검색
+KB search --mode vector "memory safety" --k 3 # 5. vector 검색
+KB search --mode hybrid "Cargo workspace" --k 3 # 6. hybrid 검색
+KB inspect chunk # 7. raw chunk 보기
+KB ask "이 KB 안에서 ..." --mode hybrid --k 5 # 8. RAG 답변 (Ollama 필요)
+KB --json ask "..." --mode hybrid # 9. 기계 친화 출력 검증
+```
+
+각 명령은 0 종료 코드면 정상. `kb ask` 는 거절 시 종료 코드 1 (`RefusalSignal`) — 의도된 동작.
+
+## 검증 체크리스트
+
+- `kb doctor` 가 `--config` path 를 honor 하고 그 안의 `storage.data_dir` 를 출력 (XDG default 가 아님).
+- `kb ingest` idempotent — 두 번째 실행이 `new=0 updated=N`.
+- `kb list docs` 출력에 frontmatter 의 `title` 이 아닌 deterministic `doc_id` (32-hex) + `workspace_path` 가 보임.
+- `kb search --mode hybrid` 의 `fusion_score` 가 `[0, 1]` 범위 (top-1 종종 1.0 — 두 retriever 모두 rank 1 일 때).
+- `kb ask` JSON 응답에 `model.id` 가 config 의 모델 (`gemma4:26b` 등) 과 일치, `embedding.id = multilingual-e5-small`, `citations[].marker` 가 `[1]` / `[2]` 형식 (square-bracketed bare index).
+- 코퍼스에 없는 주제로 `kb ask` → `refusal_reason: "llm_self_judge"` (또는 `no_chunks` / `score_gate`) + `grounded: false`.
+
+## 정리
+
+```bash
+rm -rf /tmp/kb-smoke/data # 데이터만 날리고 다시 ingest 가능
+rm -rf /tmp/kb-smoke # 통째로 정리
+```
+
+`~/.config/kb/` 와 `~/.local/share/kb/` 는 한 번도 터치되지 않는다 (`--config` flag 가 정확히 honor 되는 경우 — P3-5 hotfix 이후 보장).
+
+## 알려진 동작
+
+- 첫 `kb ingest` 시 fastembed 모델 다운로드 (~470MB) — `data_dir/models/fastembed/` 에 캐시.
+- `kb ask` 응답 시간 = LLM 토큰 throughput 에 종속. M4 Pro 48GB + gemma4:26b 기준 답변 50–100 토큰에 20–55초.
+- `--config` path 가 존재하지 않거나 malformed 면 `kb doctor` 가 hard fail (defaults 가 silently mask 하지 않게 하는 hotfix 동작).
+- 매 CLI invocation 마다 fastembed 모델 init 비용 (~4초) — process-level 캐시 부재 때문. P9 TUI 진입 시 `App` 의 `OnceLock` 으로 세션 동안 한 번만 init.
+
+자세한 history 와 발견된 버그는 [tasks/HOTFIXES.md](../tasks/HOTFIXES.md) 참조.
diff --git a/tasks/p0/p0-1-skeleton.md b/tasks/p0/p0-1-skeleton.md
index e829c89..15b5665 100644
--- a/tasks/p0/p0-1-skeleton.md
+++ b/tasks/p0/p0-1-skeleton.md
@@ -3,7 +3,7 @@ phase: P0
 component: workspace + kb-core + kb-config + kb-app + kb-cli
 task_id: p0-1
 title: "Workspace skeleton + frozen domain types/traits + ID recipe + facade"
-status: planned
+status: completed
 depends_on: []
 unblocks: [p1-1, p1-2, p1-3, p1-4, p1-5, p1-6, p2-1, p2-2, p3-1, p3-2, p3-3, p3-4, p4-1, p4-2, p4-3, p5-1, p5-2, p6-1, p6-2, p6-3, p7-1, p7-2, p8-1, p8-2, p9-1, p9-2, p9-3, p9-4, p9-5]
 contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
diff --git a/tasks/p1/p1-1-source-fs.md b/tasks/p1/p1-1-source-fs.md
index 1ef57af..f849e12 100644
--- a/tasks/p1/p1-1-source-fs.md
+++ b/tasks/p1/p1-1-source-fs.md
@@ -3,7 +3,7 @@ phase: P1
 component: kb-source-fs
 task_id: p1-1
 title: "Local filesystem source connector"
-status: planned
+status: completed
 depends_on: [p0-1]
 unblocks: [p1-2, p1-3, p1-4, p1-5, p1-6]
 contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
diff --git a/tasks/p1/p1-2-parse-md-frontmatter.md b/tasks/p1/p1-2-parse-md-frontmatter.md
index e759e9f..07f2e02 100644
--- a/tasks/p1/p1-2-parse-md-frontmatter.md
+++ b/tasks/p1/p1-2-parse-md-frontmatter.md
@@ -3,7 +3,7 @@ phase: P1
 component: kb-parse-md (frontmatter submodule)
 task_id: p1-2
 title: "Markdown frontmatter parsing → Metadata"
-status: planned
+status: completed
 depends_on: [p0-1]
 unblocks: [p1-4]
 contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
diff --git a/tasks/p1/p1-3-parse-md-blocks.md b/tasks/p1/p1-3-parse-md-blocks.md
index d668bd0..dcd4813 100644
--- a/tasks/p1/p1-3-parse-md-blocks.md
+++ b/tasks/p1/p1-3-parse-md-blocks.md
@@ -3,7 +3,7 @@ phase: P1
 component: kb-parse-md (blocks submodule)
 task_id: p1-3
 title: "Markdown body → Block tree with line spans"
-status: planned
+status: completed
 depends_on: [p0-1]
 unblocks: [p1-4]
 contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
diff --git a/tasks/p1/p1-4-normalize.md b/tasks/p1/p1-4-normalize.md
index 14133bd..dd8bddf 100644
--- a/tasks/p1/p1-4-normalize.md
+++ b/tasks/p1/p1-4-normalize.md
@@ -3,7 +3,7 @@ phase: P1
 component: kb-normalize
 task_id: p1-4
 title: "Lift parser output → CanonicalDocument with deterministic IDs"
-status: planned
+status: completed
 depends_on: [p1-2, p1-3]
 unblocks: [p1-5, p1-6]
 contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
diff --git a/tasks/p1/p1-5-chunk.md b/tasks/p1/p1-5-chunk.md
index 11677d8..3eb7c8c 100644
--- a/tasks/p1/p1-5-chunk.md
+++ b/tasks/p1/p1-5-chunk.md
@@ -3,7 +3,7 @@ phase: P1
 component: kb-chunk
 task_id: p1-5
 title: "Markdown heading-aware chunker (md-heading-v1)"
-status: planned
+status: completed
 depends_on: [p1-4]
 unblocks: [p1-6, p2-2, p3-2]
 contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
diff --git a/tasks/p1/p1-6-store-sqlite.md b/tasks/p1/p1-6-store-sqlite.md
index 5857195..93b7e7c 100644
--- a/tasks/p1/p1-6-store-sqlite.md
+++ b/tasks/p1/p1-6-store-sqlite.md
@@ -3,7 +3,7 @@ phase: P1
 component: kb-store-sqlite (P1 subset)
 task_id: p1-6
 title: "SQLite store: assets/documents/blocks/chunks + asset writer + migrations"
-status: planned
+status: completed
 depends_on: [p1-1, p1-4, p1-5]
 unblocks: [p2-1, p3-3, p4-3]
 contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
diff --git a/tasks/p2/p2-1-fts-schema.md b/tasks/p2/p2-1-fts-schema.md
index 0552797..4e46623 100644
--- a/tasks/p2/p2-1-fts-schema.md
+++ b/tasks/p2/p2-1-fts-schema.md
@@ -3,7 +3,7 @@ phase: P2
 component: kb-store-sqlite (FTS5 migration)
 task_id: p2-1
 title: "FTS5 virtual table + triggers (V002 migration)"
-status: planned
+status: completed
 depends_on: [p1-6]
 unblocks: [p2-2]
 contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
diff --git a/tasks/p2/p2-2-lexical-retriever.md b/tasks/p2/p2-2-lexical-retriever.md
index 5c8529d..75524a5 100644
--- a/tasks/p2/p2-2-lexical-retriever.md
+++ b/tasks/p2/p2-2-lexical-retriever.md
@@ -3,7 +3,7 @@ phase: P2
 component: kb-search (lexical mode)
 task_id: p2-2
 title: "Lexical Retriever via SQLite FTS5 + bm25 + citation"
-status: planned
+status: completed
 depends_on: [p2-1]
 unblocks: [p3-4, p4-3]
 contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
diff --git a/tasks/p3/p3-1-embedder-trait.md b/tasks/p3/p3-1-embedder-trait.md
index f5dc211..f84e2a1 100644
--- a/tasks/p3/p3-1-embedder-trait.md
+++ b/tasks/p3/p3-1-embedder-trait.md
@@ -3,7 +3,7 @@ phase: P3
 component: kb-embed (trait crate)
 task_id: p3-1
 title: "Embedder trait + EmbeddingInput/Kind validation"
-status: planned
+status: completed
 depends_on: [p0-1]
 unblocks: [p3-2, p3-3, p3-4]
 contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
diff --git a/tasks/p3/p3-2-fastembed-adapter.md b/tasks/p3/p3-2-fastembed-adapter.md
index f7ec71f..1b4510e 100644
--- a/tasks/p3/p3-2-fastembed-adapter.md
+++ b/tasks/p3/p3-2-fastembed-adapter.md
@@ -3,7 +3,7 @@ phase: P3
 component: kb-embed-local (fastembed adapter)
 task_id: p3-2
 title: "fastembed-rs Embedder for multilingual-e5-small"
-status: planned
+status: completed
 depends_on: [p3-1]
 unblocks: [p3-3, p3-4]
 contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
diff --git a/tasks/p3/p3-3-lancedb-store.md b/tasks/p3/p3-3-lancedb-store.md
index 946b962..3a80a3f 100644
--- a/tasks/p3/p3-3-lancedb-store.md
+++ b/tasks/p3/p3-3-lancedb-store.md
@@ -3,7 +3,7 @@ phase: P3
 component: kb-store-vector (LanceDB)
 task_id: p3-3
 title: "LanceDB VectorStore + embedding_records writer"
-status: planned
+status: completed
 depends_on: [p3-2, p1-6]
 unblocks: [p3-4]
 contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
diff --git a/tasks/p3/p3-4-hybrid-fusion.md b/tasks/p3/p3-4-hybrid-fusion.md
index 088f11a..0d34a6d 100644
--- a/tasks/p3/p3-4-hybrid-fusion.md
+++ b/tasks/p3/p3-4-hybrid-fusion.md
@@ -3,7 +3,7 @@ phase: P3
 component: kb-search (hybrid)
 task_id: p3-4
 title: "Hybrid Retriever (RRF) over lexical + vector"
-status: planned
+status: completed
 depends_on: [p2-2, p3-3]
 unblocks: [p4-3]
 contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
diff --git a/tasks/p3/p3-5-app-wiring.md b/tasks/p3/p3-5-app-wiring.md
index 60ef05d..206f855 100644
--- a/tasks/p3/p3-5-app-wiring.md
+++ b/tasks/p3/p3-5-app-wiring.md
@@ -3,7 +3,7 @@ phase: P3
 component: kb-app (facade wiring)
 task_id: p3-5
 title: "Wire kb-app facade — ingest / search / list / inspect end-to-end"
-status: planned
+status: completed
 depends_on: [p1-6, p2-2, p3-2, p3-3, p3-4]
 unblocks: [p4-3, p9-1, p9-2, p9-4]
 contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
diff --git a/tasks/p4/p4-1-llm-trait.md b/tasks/p4/p4-1-llm-trait.md
index 1476505..2363f72 100644
--- a/tasks/p4/p4-1-llm-trait.md
+++ b/tasks/p4/p4-1-llm-trait.md
@@ -3,7 +3,7 @@ phase: P4
 component: kb-llm (trait crate)
 task_id: p4-1
 title: "LanguageModel trait + GenerateRequest/TokenChunk"
-status: planned
+status: completed
 depends_on: [p0-1]
 unblocks: [p4-2, p4-3]
 contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
diff --git a/tasks/p4/p4-2-ollama-adapter.md b/tasks/p4/p4-2-ollama-adapter.md
index 8d347ba..6dae6c0 100644
--- a/tasks/p4/p4-2-ollama-adapter.md
+++ b/tasks/p4/p4-2-ollama-adapter.md
@@ -3,7 +3,7 @@ phase: P4
 component: kb-llm-local (Ollama adapter)
 task_id: p4-2
 title: "OllamaLanguageModel — streaming /api/generate"
-status: planned
+status: completed
 depends_on: [p4-1]
 unblocks: [p4-3]
 contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
diff --git a/tasks/p4/p4-3-rag-pipeline.md b/tasks/p4/p4-3-rag-pipeline.md
index c841ba2..4eae6b8 100644
--- a/tasks/p4/p4-3-rag-pipeline.md
+++ b/tasks/p4/p4-3-rag-pipeline.md
@@ -3,7 +3,7 @@ phase: P4
 component: kb-rag
 task_id: p4-3
 title: "RAG pipeline: retrieve → gate → pack → generate → cite-validate"
-status: planned
+status: completed
 depends_on: [p3-4, p4-2]
 unblocks: [p5-1]
 contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
diff --git a/tasks/phase-0-skeleton.md b/tasks/phase-0-skeleton.md
index 53bcccf..17f4833 100644
--- a/tasks/phase-0-skeleton.md
+++ b/tasks/phase-0-skeleton.md
@@ -1,7 +1,7 @@
 ---
 phase: P0
 title: "Workspace 뼈대 + 도메인 계약"
-status: planned
+status: completed
 depends_on: []
 source: kb_local_rust_report.md §3, §4, §6, §7, §13
 ---
diff --git a/tasks/phase-1-markdown-ingestion.md b/tasks/phase-1-markdown-ingestion.md
index bac534b..72a639a 100644
--- a/tasks/phase-1-markdown-ingestion.md
+++ b/tasks/phase-1-markdown-ingestion.md
@@ -1,7 +1,7 @@
 ---
 phase: P1
 title: "Markdown ingestion 파이프라인"
-status: planned
+status: completed
 depends_on: [P0]
 source: kb_local_rust_report.md §8, §14, §17 Phase 1
 ---
diff --git a/tasks/phase-2-lexical-search.md b/tasks/phase-2-lexical-search.md
index fd65119..62f575e 100644
--- a/tasks/phase-2-lexical-search.md
+++ b/tasks/phase-2-lexical-search.md
@@ -1,7 +1,7 @@
 ---
 phase: P2
 title: "SQLite FTS5 lexical 검색 + citation"
-status: planned
+status: completed
 depends_on: [P1]
 source: kb_local_rust_report.md §10, §15, §17 Phase 2
 ---
diff --git a/tasks/phase-3-vector-hybrid.md b/tasks/phase-3-vector-hybrid.md
index 1e363bd..bb36513 100644
--- a/tasks/phase-3-vector-hybrid.md
+++ b/tasks/phase-3-vector-hybrid.md
@@ -1,7 +1,7 @@
 ---
 phase: P3
 title: "Local embedding + LanceDB + hybrid search"
-status: planned
+status: completed
 depends_on: [P2]
 source: kb_local_rust_report.md §10, §11, §15, §17 Phase 3
 ---
diff --git a/tasks/phase-4-local-llm-rag.md b/tasks/phase-4-local-llm-rag.md
index ac8a84c..2d288d6 100644
--- a/tasks/phase-4-local-llm-rag.md
+++ b/tasks/phase-4-local-llm-rag.md
@@ -1,7 +1,7 @@
 ---
 phase: P4
 title: "Local LLM + RAG + grounded answer"
-status: planned
+status: completed
 depends_on: [P3]
 source: kb_local_rust_report.md §11, §15.2, §17 Phase 4
 ---
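
Review note on the RRF `fusion_score` invariant that this patch documents (divide by `2 / (k_rrf + 1)` so the score lands in `[0, 1]`). The sketch below is only an illustration of that arithmetic, not the code in `kb-search`: the function names `rrf_term` and `fusion_score` are hypothetical, ranks are assumed to be 1-based, and exactly two retrievers (lexical and vector) are assumed.

```rust
/// Contribution of one retriever under Reciprocal Rank Fusion.
/// `rank` is assumed to be 1-based; `None` means the chunk was not
/// returned by that retriever and contributes nothing.
fn rrf_term(rank: Option<usize>, k_rrf: f64) -> f64 {
    rank.map_or(0.0, |r| 1.0 / (k_rrf + r as f64))
}

/// Hypothetical normalized fusion score for two retrievers.
/// The raw RRF sum is divided by its largest attainable value,
/// 2 / (k_rrf + 1), which is reached when both retrievers put the
/// chunk at rank 1. The result therefore lies in [0, 1].
fn fusion_score(lexical_rank: Option<usize>, vector_rank: Option<usize>, k_rrf: f64) -> f64 {
    let raw = rrf_term(lexical_rank, k_rrf) + rrf_term(vector_rank, k_rrf);
    let max = 2.0 / (k_rrf + 1.0);
    raw / max
}
```

With `rrf_k = 60` from the sample config, the unnormalized maximum is `2 / 61 ≈ 0.0328`, which sits below `score_gate = 0.05`. After normalization, a chunk ranked first by both retrievers scores 1.0 and a chunk ranked first by only one retriever scores 0.5, so the gate is compared against a value on the `[0, 1]` scale.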