Files
kebab/CLAUDE.md
altair823 6bfa9795c6 docs: split user-facing docs by audience — README narrow + HANDOFF 진척도 + ARCHITECTURE 내부
사용자 결정 (2026-05-02): \"README.md는 사용자가 가장 빠르게 이 앱을
사용할 수 있도록 하는 내용만 포함하자. mermaid 다이어그램으로 논리적인
아키텍처 다이어그램 하나 정도만 들어가면 충분할 것 같아\".

세 문서로 분리, audience 겹치지 않음:

1. **README.md (narrow)** — 사용자 first stop. Quick start / 명령 표 /
   Mermaid 1개 (논리 아키텍처) / Configuration pointer / 비-목표 / 라이선스.
   진척도 / crate 그래프 / 디렉토리 트리 / 핵심 결정 표 모두 빠짐.

2. **HANDOFF.md (신규)** — phase-level 진척 dashboard. Phase status table,
   component count (33), \"다음 task 후보\" (P9-2/3/4/5, P8 보류), 머지 후
   발견된 deviation 짧은 요약 (P3-5/P4-3 --config, P6-2 OCR, P6-3 caption,
   P7-2 chunk_id, P7-3 storage UNIQUE, P9-1 ratatui generic). 본문 detail
   은 tasks/HOTFIXES.md.

3. **docs/ARCHITECTURE.md (신규)** — crate 의존성 그래프, 디렉토리 트리,
   핵심 기술 결정 표, 외부 AI 통합 절. README 의 Mermaid 가 여기로 링크.

CLAUDE.md 의 \"User-facing docs\" 절 갱신:
- 세 문서 audience 분리 명시.
- implementation PR 이 셋 다 sync 의무, spec PR 은 안 건드림.
- 갱신 trigger 별 (CLI / TUI / Configuration / phase epic / crate 추가 /
  load-bearing deviation) 어느 문서를 손대는지 매핑.
- Out of scope (HOTFIXES detail / version cascade / per-task spec
  rationale) 어디에도 안 적힘 명시.

CLAUDE.md `## Project` 절도 새 문서 layout 반영. 18 crates → ~20 crates.

Memory feedback 갱신 (`feedback_readme_sync_rule.md`) — 미래 conversation
에서 자동 적용.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 13:51:51 +00:00

113 lines
9.0 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project
Single-user local-first knowledge base + RAG. Rust 2024 workspace, ~20 crates, single binary (`kebab`). All inference is local (Ollama + fastembed + whisper.cpp).
The repo's documentation is split by audience — don't duplicate across them:
- **[README.md](README.md)** — first stop for an end user. Quick start, command table, one Mermaid logical-architecture diagram, configuration pointers, license. Stays narrow.
- **[HANDOFF.md](HANDOFF.md)** — phase-level progress dashboard for someone picking the project up. Phase status table, component count, "next task candidates", short summary of post-merge deviations. The README never duplicates this.
- **[docs/ARCHITECTURE.md](docs/ARCHITECTURE.md)** — internal structure: crate dependency graph, directory tree, locked-in technical decisions. The README links here from the Mermaid diagram.
- **[docs/superpowers/specs/2026-04-27-kebab-final-form-design.md](docs/superpowers/specs/2026-04-27-kebab-final-form-design.md)** — frozen design contract.
- **[tasks/INDEX.md](tasks/INDEX.md)** — per-component task tree.
- **[tasks/HOTFIXES.md](tasks/HOTFIXES.md)** — dated post-merge deviation log; live source of truth where behavior and the frozen spec disagree.
## Build / test / lint
```bash
cargo test -p <crate> # preferred — workspace has 18 crates
cargo test -p <crate> <test_name> # single test (substring match)
cargo test --workspace --no-fail-fast -j 1 # full suite — see -j 1 below
cargo clippy --workspace --all-targets -- -D warnings # CI gate
cargo build --release # produces target/release/kebab
```
`-j 1` for the full workspace test isn't optional: 18 integration-test binaries each link `lance` + `datafusion` + `arrow` + `tantivy` and the parallel link step exhausts memory (linker gets SIGKILL'd, build silently fails partway). Per-crate runs are fine in parallel.
`target/` is 610 GB after a fresh build (DataFusion + Lance + fastembed + 18 × test-binary debug info). The dev/test profile is already trimmed (`debug = "line-tables-only"`, `split-debuginfo = "unpacked"` — see workspace `Cargo.toml`). Run `cargo clean` after phase merges if disk pressure shows up; backtraces still resolve to function + line.
## The facade rule
`kebab-app` is the only crate UI binaries (`kebab-cli`, future `kebab-tui`, `kebab-desktop`) may touch. Every user-facing entry has a `*_with_config(cfg, …)` companion that takes an explicit `Config`:
- `kebab-cli` calls the `*_with_config` form so `--config <path>` is honored.
- The bare `kebab_app::ingest(...)` / `search(...)` / `ask(...)` form re-loads `Config::load(None)` (XDG default) and silently bypasses any explicit path. Two regressions of exactly this shape are recorded in `tasks/HOTFIXES.md` (P3-5 + P4-3 follow-ups). When wiring a new CLI subcommand, always thread the `Config` through.
`*_with_config` is `#[doc(hidden)] pub fn` but it's the **official** config-explicit API, not a test seam.
## Spec contract
`docs/superpowers/specs/2026-04-27-kebab-final-form-design.md` (12 sections) is the single contract for the whole workspace. Every component task spec under `tasks/p<N>/` lists which `contract_sections` it implements.
- Changing the design doc requires updating every referencing task spec in the same PR.
- Task specs themselves stay **frozen** as the historical contract once the task is merged. Don't edit them retroactively to match what shipped.
- Live deviations from the original contract go in `tasks/HOTFIXES.md` as dated entries, plus a one-line cross-link in the original spec's `Risks / notes`. Treat HOTFIXES.md as the live source of truth when behavior and spec disagree.
`tasks/INDEX.md` is the dashboard for which phases / components are done; update its phase status when a phase epic completes.
## Allowed / forbidden deps
Each task spec lists `Allowed dependencies` and `Forbidden dependencies` per design §8. The most load-bearing ones:
- `kebab-core` MUST NOT depend on any other `kebab-*` crate. Domain types only.
- `kebab-eval`'s `metrics` and `compare` modules MUST NOT import retrieval / embedding / LLM crates directly. The runner is allowed to use `kebab-app`'s facade (P5-1 inheritance — see deviations in that task spec).
- UI crates (`kebab-cli`, future `kebab-tui`, `kebab-desktop`) MUST NOT import `kebab-store-*` / `kebab-llm-*` / `kebab-parse-*` directly — only `kebab-app`.
Read the relevant task spec's deps section before adding an import. New crates inherit the same boundary rules.
## Wire schema v1
All `--json` output carries a `schema_version` field (`ingest_report.v1`, `search_hit.v1`, `answer.v1`, `doctor.v1`, …). Schemas live in `docs/wire-schema/v1/`. The wire shape is the contract for external integrations (Claude Code skills, MCP, etc.); breaking it requires a `*.v2` major bump and parallel-running both for one phase.
## Versioning cascade
`parser_version` / `chunker_version` / `embedding_version` / `prompt_template_version` / `index_version` follow the cascade rule in design §9. Changing any of these invalidates downstream records (chunks, embeddings, eval runs, …). When changing a version: either ship a re-process job or treat it as a breaking schema bump. The eval runner snapshots all five into `eval_runs.config_snapshot_json`.
## Naming + paths
- Crate prefix: `kebab-` (kebab-case package, `kebab_` snake_case in Rust modules).
- Binary: `kebab`.
- Env var prefix: `KEBAB_*` (e.g. `KEBAB_RAG_SCORE_GATE`, `KEBAB_EVAL_GOLDEN`, `KEBAB_COMMIT_HASH`).
- XDG paths: `~/.config/kebab/`, `~/.local/share/kebab/`, `~/.cache/kebab/`, `~/.local/state/kebab/`.
- SQLite filename: `kebab.sqlite` (under `data_dir`).
- Workspace ignore: `.kebabignore` (per directory).
The migration from the old `kb` name lives in commits `911fb49 / f1a448d / f9714aa`. If you spot a leftover `kb` reference, treat it as a leftover and fix it (the rename PR sweep covered crates/, docs/, tasks/, README, design doc, fixtures — but workspace root `Cargo.toml` comments needed a follow-up; assume similar misses are possible).
## Smoke + integration
`docs/SMOKE.md` walks through running the full pipeline against an isolated TempDir KB via `--config /tmp/kebab-smoke/config.toml`. Use this instead of touching `~/.local/share/kebab/` when verifying a fresh clone or a CLI flag change. Most CLI regressions surface here, not in unit tests (see HOTFIXES.md).
## User-facing docs (README + HANDOFF + ARCHITECTURE)
Three sibling docs split the audience. Every implementation PR (`feat/*`) keeps them in sync; spec PRs (`spec/*`) don't touch any of the three.
**[README.md](README.md) — end user.** Stays narrow. The three surfaces a user touches:
- **CLI** — new `kebab <subcommand>`, flag, `--json` field, or exit-code change. Update the **명령** table and the **Quick start** block if the new flow needs a different invocation.
- **TUI** — new pane, key binding, or run-time behavior visible to a `kebab tui` user. Update the row in the **명령** table and the Mermaid diagram if a new external surface lands.
- **Configuration** — new `config.toml` field, `KEBAB_*` env, default change, or XDG path. Update the **Configuration** section AND the config example block in `docs/SMOKE.md`.
The Mermaid logical-architecture diagram stays the only diagram in the README. If a new media type / external service / store crosses the diagram boundary, update it; otherwise leave it alone.
The README does NOT carry: phase status, component count, post-merge deviations, crate dependency graph, directory tree, locked-in technical decisions. Those live in HANDOFF or ARCHITECTURE.
**[HANDOFF.md](HANDOFF.md) — handing off.** Phase-level progress + next-task candidates. Flip the relevant phase row from ⏳ to ✅ when a phase epic completes. Add a one-line entry under "머지 후 발견된 버그 / 결정 (요약)" when a HOTFIXES entry lands that's load-bearing for someone picking up the project. Per-component progress lives in `tasks/INDEX.md`, not here.
**[docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) — implementation detail.** Crate dependency graph, directory tree, locked-in technical decisions. Update when:
- A new crate is added — extend the graph + directory tree.
- A locked-in decision flips (e.g. OCR engine default changes per a HOTFIXES entry) — update the table and link the HOTFIXES entry.
- A directory moves — update the tree.
Out of scope for all three: HOTFIXES detail (`tasks/HOTFIXES.md`), version cascade mechanics (CLAUDE.md §Versioning cascade), per-task spec rationale (`tasks/p<N>/`).
If a feature ships behind a flag that's off-by-default, mention the flag explicitly in the README so a user reading only the README knows the surface exists but is gated.
## Remote
Git remote is Gitea: `https://gitea.altair823.xyz/altair823-org/kebab.git`. PRs are created via the Gitea REST API (`POST /repos/altair823-org/kebab/pulls`) — `gh` CLI does not work against this host. Auth uses `~/.netrc` (populated via `git credential fill`).