feat(p0-1): workspace skeleton + frozen contracts #5
Summary
Implements the P0-1 task spec. Introduces a Cargo workspace (Rust 2024, resolver=3) with 5 crates, and freezes every domain type, trait, ID recipe, the Citation grammar, the Config schema, and the CLI entry point 1:1 against the frozen design doc.
Reference documents: `tasks/p0/p0-1-skeleton.md` and `docs/superpowers/specs/2026-04-27-kb-final-form-design.md` (see design §3 / §3.7b / §4 / §6 / §7 / §8 / §9 / §10 and the §2 wire schema).
Implementation

| Crate | Contents |
|---|---|
| `kb-core` | `id_from` + 6 ID recipes; Citation (5 variants) with a W3C Media Fragments codec |
| `kb-parse-types` | `ParsedBlock` / `ParsedPayload` / `Warning`. Depends on `kb-core` + serde + thiserror only |
| `kb-config` | `Config::{defaults, load, from_file, apply_env}`, 4 XDG resolvers |
| `kb-app` | `init_workspace` / `doctor` fully implemented; the remaining 6 `bail!("not yet wired (P<n>-<i>)")` |
| `kb-cli` | `--config` / `--verbose` / `--debug` / `--json`, §10 exit-code 0/1/2/3 mapping |

Additional deliverables:

- `docs/wire-schema/v1/*.schema.json`: 7 stubs
- `docs/spec/*.md`: 7 stubs
- `migrations/V001__init.sql` (`schema_meta` + `migrations`)
- `fixtures/`: 11 subdirectories + 3 phase-0 markdown fixtures
- `kb-cli/src/wire.rs`: establishes the `schema_version` wrapper convention for `--json` mode

Verification
Smoke:
`XDG_*=/tmp/kbtest cargo run -p kb-cli -- doctor --json` → `{"checks":[...],"ok":true,"schema_version":"doctor.v1"}`, exit 0.

Definition of Done

- `kb --help`: 8 subcommands
- `kb init`: XDG dirs + idempotent
- `kb doctor --json`: `doctor.v1`

Review
`single_char_add_str`) all addressed (commit `5af07c1`).

Next
After merge, branch off `feat/p1-1-source-fs` and proceed with P1-1 (`kb-source-fs`).
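As a reading aid for the §10 exit-code mapping that `kb-cli` implements, here is a minimal std-only sketch. The `Outcome` enum and the `exit_code` function are hypothetical stand-ins introduced for illustration; per the commit messages, the actual CLI downcasts `anyhow::Error` to the `RefusalSignal` / `NoHitSignal` / `DoctorUnhealthy` signal types exported by `kb-app`.

```rust
/// Hypothetical stand-in for what a `kb` subcommand can end with.
/// The real CLI inspects an `anyhow::Error` instead of an enum.
#[allow(dead_code)]
#[derive(Debug)]
pub enum Outcome {
    Success,
    /// `kb ask` refused to answer.
    Refusal,
    /// `kb search` returned no hit.
    NoHit,
    /// `kb doctor` found a failing check.
    DoctorUnhealthy,
    /// Any other error, including the "not yet wired" stubs.
    OtherError(String),
}

/// Map an outcome to the process exit code described for design §10.
pub fn exit_code(outcome: &Outcome) -> i32 {
    match outcome {
        Outcome::Success => 0,
        Outcome::Refusal | Outcome::NoHit => 1,
        Outcome::OtherError(_) => 2,
        Outcome::DoctorUnhealthy => 3,
    }
}
```

Keeping the mapping in one `match` means a new signal type forces a compile-time decision about its exit code rather than silently falling into the generic error bucket.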
Stand up the Cargo workspace (Rust 2024 / resolver=3) with the kb-core crate per the frozen design (§3, §4, §7, §10). kb-core has zero deps on other kb-* crates and exposes:

- Newtype IDs (AssetId / DocumentId / BlockId / ChunkId / EmbeddingId / IndexId) with Display + FromStr that reject anything but 32 lower-hex.
- id_from + id_for_{asset,doc,block,chunk,embedding,index} per §4.2; pinned hex test values computed via an independent JCS+blake3 tool.
- CanonicalDocument, Block (8 variants), SourceSpan, Inline (§3.4).
- Citation (5 variants) with W3C Media Fragments to_uri / parse; round-trip property holds for every variant.
- Metadata + Provenance (§3.6); SearchQuery / SearchHit / RetrievalDetail (§3.7); DocFilter / DocSummary mirrors of wire §2.5.
- Answer / AnswerCitation / RefusalReason / ModelRef (§3.8).
- IngestReport, JobRepo support types, VectorRecord / VectorHit.
- Component traits (SourceConnector / Extractor / Chunker / Embedder / Retriever / LanguageModel / DocumentStore / VectorStore / JobRepo) plus their input helpers (SourceScope / ExtractContext / ChunkPolicy / EmbeddingInput / GenerateRequest / TokenChunk / FinishReason).
- CoreError (§10).
- nfc + to_posix helpers (§4.1, §6.6).

20 unit tests cover ID determinism (1000-run regression), key-order invariance, two pinned hex values, newtype rejection of bad input, Citation round-trip for all 5 variants, and to_posix collapsing + Korean NFC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds the kb-app crate (§7) as the single facade between UI crates (kb-cli / kb-tui / kb-desktop) and the rest of the workspace. Public surface mirrors the task spec exactly:

- init_workspace(force): XDG dir creation + config.toml seed; idempotent unless force=true. Honors XDG envs and tilde-expands the workspace root to $HOME/KnowledgeBase.
- doctor(): emits a doctor.v1 report with config_loaded + data_dir_writable checks; downstream checks land in later phases.
- ingest / list_docs / inspect_doc / inspect_chunk / search / ask: bail!("not yet wired (P<n>-<i>)") so kb-cli surfaces exit code 2 cleanly per §10.
- AskOpts + DoctorReport + DoctorCheck.
- doctor_signal::{DoctorUnhealthy, RefusalSignal, NoHitSignal}: signal types the CLI downcasts on for §10 exit-code mapping.
- logging::init(): daily-rolling file appender at $XDG_STATE_HOME/kb/logs/kb.log, plus stderr-fallback EnvFilter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds the kb binary with clap v4 derive subcommands mapping 1:1 to kb-app facade functions:

  init | ingest | list docs | inspect (doc|chunk) | search | ask | doctor | eval run

Global flags: --config, --verbose, --debug, --json. On --json, output conforms to wire schema v1 (e.g. doctor.v1 emitted by kb-app::doctor).

Exit-code mapping per design §10:

  0  success
  1  RefusalSignal / NoHitSignal (kb ask refusal, kb search no-hit)
  2  any other anyhow::Error
  3  DoctorUnhealthy

Tracing initialized at startup with the file appender from kb-app.

Verified via:

  XDG_*=… cargo run -p kb-cli -- init → idempotent
  XDG_*=… cargo run -p kb-cli -- doctor --json → {"schema_version":"doctor.v1","ok":true,…} exit 0
  XDG_*=… cargo run -p kb-cli -- doctor (human form, ✓ marks)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- docs/wire-schema/v1/ ships 7 schema stubs (citation, search_hit, answer, ingest_report, doc_summary, chunk_inspection, doctor) that pin schema_version + required fields per design §2. Full property validation lands in later phases.
- docs/spec/ ships 7 markdown stubs each linking to the canonical frozen design (domain-model, ids, canonical-document, chunk-policy, citation-policy, module-boundaries, ai-generation-guidelines).
- migrations/V001__init.sql contains only schema_meta + migrations tables per design §5.1; data tables ship in P1-6/P2-1/P3-3.
- fixtures/ has the 11 subdirectories every downstream task references (markdown, source-fs, search/{lexical,hybrid}, embed, vector, rag, eval, image, pdf, audio). Empty subdirs use .gitkeep so they track. fixtures/markdown/ ships the 3 phase-0 fixtures: simple-note.md, nested-headings.md, code-and-table.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three follow-ups from the code-quality review pass on P0-1:

- Re-export `IngestItemKind` from `kb-core` so downstream tasks constructing `IngestItem` don't need `kb_core::ingest::IngestItemKind`.
- Document the `--json` wire-schema convention by introducing `kb-cli/src/wire.rs` with `wire_*` helpers paralleling the existing inline `wire_ingest`. Each Ok-path `--json` branch now routes through these helpers so future P1-5/P3/P4/P5 implementations slot the `schema_version` envelope in automatically. `DoctorReport` keeps its struct-field `schema_version` (the documented exception), and the helper round-trips it idempotently. Records the convention in `kb-app/src/lib.rs`'s top docstring.
- Fix clippy `single_char_add_str` in `kb_core::normalize` (replace `out.push_str(".")` with `out.push('.')`).

Verified: `cargo check`, `cargo test` (5 new wire-helper tests), `cargo clippy -D warnings`, and `RUSTFLAGS=-D warnings cargo build` all clean. Smoke-tested `kb doctor --json` still emits `{"schema_version":"doctor.v1",...}`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Code review (COMMENT only)
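After the internal spec-compliance and code-quality reviews passed, confirmed that the fix commit (`5af07c1`) for the 3 important items has landed. Additionally attached 9 PR-level inline comments (6 problems + 3 strengths).

Key verdict: no must-fix items. Only nice-to-fix items remain, and they can be resolved naturally in later phases.

Verification passed

- `cargo check` / `test` (30 pass) / `clippy` / `build` (warnings-as-errors) all clean
- `cargo tree -p kb-parse-types` → kb-core + serde + thiserror only
- `schema_version` contract matches 1:1
- `kb doctor --json` exit 0 + `doctor.v1` shape

Recommended follow-ups (can be resolved naturally while P1-* is in progress)

- Add 4 pinned hex tests for `id_for_block` / `chunk` / `embedding` / `index`
- Decide how `Citation::parse` handles `#` inside a path (pin with a test, or reject)
- Quote the input in `Citation::parse` error messages (8 sites, mechanical)
- Generic `KB_<SECTION>_<KEY>` mapping in `kb-config/src/lib.rs::apply_env`
- `thiserror` in `kb-config`, `kb-parse-types`, `kb-app`)

Approval is handled separately by the author (who is also the reviewer).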
@@ -0,0 +17,4 @@

    /// Initialize tracing. Returns a guard to keep alive until exit. Idempotent
    /// — a second call is a no-op.
    pub fn init(level: LogLevel) -> Result<Option<WorkerGuard>> {

🟡 minor: returns `Result<Option<WorkerGuard>>`, but the value is always `Ok(Some(_))`

The function currently never returns `Ok(None)`, and returns `Err` only when `fs::create_dir_all` fails, so the `Option` layer is dead surface. The caller (`kb-cli/src/main.rs:159`) peeling off both layers at once with `.ok().flatten()` is the same kind of noise.

Option:

- Simplify to `Result<WorkerGuard>` (the caller then needs only a single `.ok()`).

This does not affect behaviour in the current phase, so a follow-up cleanup is fine.
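The suggested simplification can be sketched as follows. This is a std-only illustration under stated assumptions: `WorkerGuard` here is an empty stand-in for the real guard type, and the functions take a log directory rather than the `LogLevel` argument of the actual `init`, so only the shape of the return type and the caller is being shown.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Stand-in for the real `WorkerGuard` (assumption: the real one flushes
/// the log writer when dropped; this one does nothing).
#[derive(Debug)]
pub struct WorkerGuard;

/// Before: the `Option` layer can never be `None`.
pub fn init_before(log_dir: &Path) -> io::Result<Option<WorkerGuard>> {
    fs::create_dir_all(log_dir)?;
    Ok(Some(WorkerGuard))
}

/// After: the single failure mode (directory creation) stays in `Result`.
pub fn init_after(log_dir: &Path) -> io::Result<WorkerGuard> {
    fs::create_dir_all(log_dir)?;
    Ok(WorkerGuard)
}

/// Caller side: `.ok().flatten()` collapses to a single `.ok()`.
pub fn keep_guards(log_dir: &Path) -> (Option<WorkerGuard>, Option<WorkerGuard>) {
    let before = init_before(log_dir).ok().flatten();
    let after = init_after(log_dir).ok();
    (before, after)
}
```

Both callers end up holding an `Option<WorkerGuard>` for the lifetime of the process; the difference is only that the second signature no longer advertises a state that cannot occur.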
@@ -0,0 +10,4 @@

    //! Future tasks (P1-5, P3, P4, P5) replacing stub `bail!` paths must call
    //! these helpers from the relevant CLI subcommand handler before
    //! `serde_json::to_string`.
    //!

👍 strength: wire convention established

Introducing `kb-cli/src/wire.rs` after the quality review was the right call. Wrapping at CLI emit time, instead of baking a `schema_version` field into the domain types (`kb-core`), matches the intent of design §2 (domain ↔ wire separation).

In particular, the logic in `wire_search_hit` that promotes `retrieval.fusion_score` to the top-level `score` (consistent with the §2.2 wire schema) now lives in one place. Whichever of the P1-5/P3/P4/P5 tasks calls this function, the same wire shape is guaranteed without spec drift.

Because there is a single entry point, the blast radius of a future wire-schema change is also clear.
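The domain-to-wire promotion described above can be illustrated with a small std-only sketch. Everything except the names `wire_search_hit`, `retrieval.fusion_score`, `score`, and `schema_version` is an assumption: the struct layouts, the `chunk_id` field, and the `"search_hit.v1"` version string (chosen by analogy with `doctor.v1`) are hypothetical, and the real helper presumably produces JSON rather than a struct.

```rust
/// Hypothetical domain-side types (the real ones live in `kb-core`).
pub struct RetrievalDetail {
    pub fusion_score: f32,
}

pub struct SearchHit {
    pub chunk_id: String,
    pub retrieval: RetrievalDetail,
}

/// Hypothetical wire-side shape: the version tag exists only here,
/// never on the domain type.
#[derive(Debug, PartialEq)]
pub struct WireSearchHit {
    pub schema_version: &'static str,
    pub chunk_id: String,
    /// Promoted from `retrieval.fusion_score`.
    pub score: f32,
}

/// Wrap a domain hit for `--json` output at emit time.
pub fn wire_search_hit(hit: &SearchHit) -> WireSearchHit {
    WireSearchHit {
        schema_version: "search_hit.v1", // assumed version string
        chunk_id: hit.chunk_id.clone(),
        score: hit.retrieval.fusion_score,
    }
}
```

Because the conversion is a plain function from the domain type to the wire type, a schema revision only has to touch this one mapping.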
@@ -0,0 +8,4 @@

    description = "Config schema + XDG path resolution"
    [dependencies]
    kb-core = { path = "../kb-core" }

🟡 minor: declared but unused dependencies

`kb-core` (line 11) and `thiserror` (line 13) are declared but never imported in `kb-config/src/lib.rs`. The same pattern appears in `kb-parse-types/Cargo.toml` (`thiserror`) and `kb-app/Cargo.toml` (`thiserror`).

This increases build time and invites future "why is this dependency here?" confusion when grepping. Two options:

- A `# reserved for P1-*: <reason>` comment, only when the intent is clear.

For `kb-config`, `kb-core::CoreError` will most likely end up being used, so option 2 plus the comment is recommended.

@@ -0,0 +196,4 @@

    }
    // Match a small, intentional whitelist for P0 — full env→config
    // mapping lands when the rest of the schema is wired up.
    match k.as_str() {

🟡 nice-to-fix: `apply_env` 4-key whitelist

Only the 4 keys that the P0 unit tests depend on are handled. The design §6.4 contract is the generic `KB_<SECTION>_<KEY>` pattern, so this code has to be touched every time a key is added in P1+. The intent is stated in the comment and P0 behaviour is unaffected, so this does not block the PR.

Recommended: either extend it naturally whenever a P1-* component task introduces a new config key, or switch to a reflection-style generic mapping (e.g. serde + nested key path) in a follow-up cleanup PR.
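One way a generic `KB_<SECTION>_<KEY>` mapping could look is sketched below, using only the standard library. This is not the serde-based approach mentioned above, just an illustration of the key-splitting step. The flat `section.key` map, the `parse_env_key` helper, and the assumption that section names contain no underscore are all hypothetical and not taken from design §6.4.

```rust
use std::collections::BTreeMap;

/// Split `KB_<SECTION>_<KEY>` into a lowercase `(section, key)` pair.
/// Assumption: the section name has no underscore, so the first `_`
/// after the `KB_` prefix separates section from key.
pub fn parse_env_key(name: &str) -> Option<(String, String)> {
    let rest = name.strip_prefix("KB_")?;
    let (section, key) = rest.split_once('_')?;
    if section.is_empty() || key.is_empty() {
        return None;
    }
    Some((section.to_ascii_lowercase(), key.to_ascii_lowercase()))
}

/// Apply environment overrides onto a flat `section.key -> value` map.
/// Only keys already present in the map (the known schema) are
/// overridden; anything else is ignored.
pub fn apply_env<I>(config: &mut BTreeMap<String, String>, vars: I)
where
    I: IntoIterator<Item = (String, String)>,
{
    for (name, value) in vars {
        if let Some((section, key)) = parse_env_key(&name) {
            let path = format!("{section}.{key}");
            if let Some(slot) = config.get_mut(&path) {
                *slot = value;
            }
        }
    }
}
```

In real use the `vars` argument would be `std::env::vars()`; passing an iterator keeps the function testable without mutating the process environment.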
Also, a single negative test (`env_unknown_key_is_ignored`) would explicitly pin the whitelist behaviour.

@@ -0,0 +88,4 @@

    /// `section = None` and `model = ""` for the relevant variants.
    /// Round-trip property holds for citations whose non-URI fields are at
    /// their default values (see test).
    pub fn parse(s: &str) -> Result<Self> {

🟡 minor: parse error messages do not quote the input

None of the `bail!("bad line start")` / `bail!("unknown fragment")` style errors in `citation.rs` (8 sites) quote the input string. A user or debugger who receives only "bad line start" has to guess the value of `s`.

Changing all sites to the same pattern cuts debugging cost in one pass, with zero regression risk.
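A minimal sketch of an error message that quotes its input, assuming an `L<start>-L<end>` line fragment like the `#L1-L5` examples in this review. The `parse_line_start` helper and the `String` error type are hypothetical; the real code uses `bail!` from `anyhow`, where the same `{s:?}` formatting would apply.

```rust
/// Parse the start line out of an `L<start>-L<end>` fragment.
/// Every error path includes the offending input, formatted with `{:?}`
/// so that whitespace and empty strings stay visible.
pub fn parse_line_start(frag: &str) -> Result<u32, String> {
    let digits = frag
        .strip_prefix('L')
        .and_then(|rest| rest.split('-').next())
        .ok_or_else(|| format!("bad line start in fragment {frag:?}"))?;
    digits
        .parse::<u32>()
        .map_err(|e| format!("bad line start {digits:?} in fragment {frag:?}: {e}"))
}
```

With this shape, a failure on `"Lx-L3"` names both the bad token and the full fragment instead of only "bad line start".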
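@@ -0,0 +89,4 @@

    /// Round-trip property holds for citations whose non-URI fields are at
    /// their default values (see test).
    pub fn parse(s: &str) -> Result<Self> {
    let (path_str, frag) = match s.rsplit_once('#') {

🟡 nice-to-fix: behaviour undefined when the path contains `#`

`rsplit_once('#')` picks the last `#` as the fragment delimiter. `#` is legal (if rare) in filenames under POSIX/NFC, so an input like `notes/foo#bar.md#L1-L5` correctly parses the path as `notes/foo#bar.md`, but `notes/foo#bar.md` (no fragment), with its `#` in the wrong position, can be silently mis-parsed.

One of the following is recommended:

- Explicitly pin the behaviour `a#b#L5-L10` → path=`a#b`, fragment=`L5-L10`.
- Reject `#` when constructing `to_posix` / `WorkspacePath`. This is safer and also consistent with the W3C Media Fragments grammar in design §0 Q3.

The problem is less the decision itself than the fact that it is undecided. It should be settled before P1-1 source-fs starts.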
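The two behaviours under discussion can be checked directly against the standard library. In this sketch `split_fragment` simply wraps the `rsplit_once('#')` call quoted above, while `validate_workspace_path` is a hypothetical illustration of the "reject `#`" option and is not the actual `WorkspacePath` constructor.

```rust
/// Split a citation string at the last `#`, as `str::rsplit_once` does.
pub fn split_fragment(s: &str) -> Option<(&str, &str)> {
    s.rsplit_once('#')
}

/// Hypothetical "reject" option: refuse any workspace path containing `#`
/// so that the last `#` in a citation is always the fragment delimiter.
pub fn validate_workspace_path(path: &str) -> Result<&str, String> {
    if path.contains('#') {
        Err(format!("workspace path {path:?} must not contain '#'"))
    } else {
        Ok(path)
    }
}
```

`split_fragment("a#b#L5-L10")` yields the path `a#b` and the fragment `L5-L10`, whereas `split_fragment("notes/foo#bar.md")` yields the path `notes/foo` and the bogus fragment `bar.md`, which is the silent mis-parse described above. Rejecting `#` at path construction removes the second case entirely.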
@@ -0,0 +51,4 @@

    s.len())));
    }
    if !s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f')) {

🟡 nice-to-fix: `FromStr` accepts lowercase hex only

`blake3::Hash::to_hex()` emits lowercase, so the internal round-trip is safe. However, an ID that arrives in uppercase hex from an external system (another project, or a hand-typed ID) fails with `InvalidId`. For robustness it is safer either to normalize during wire-schema input validation, or to compare case-insensitively here and store the lowercase-normalized form.

In the current phase every ID goes through `id_from`, so there is no problem. A decision is needed once P1+ starts accepting external input.

@@ -0,0 +283,4 @@

    // → cec9353553efb238a7919d38d3e148f1...
    let id = id_for_asset("deadbeef");
    assert_eq!(id.0, "cec9353553efb238a7919d38d3e148f1");
    }

👍 strength: the pinned hex test is structured correctly

The hex literal was computed with an external tool (`b3sum`), and the comment spells out both the canonical-JSON input and the `b3sum` command. Because the test is not self-referential, a regression in either the JCS or the blake3 pipeline is genuinely caught. This is the right shape for an ID recipe test.

Adding 4 more tests of the same pattern for `id_for_block` / `id_for_chunk` / `id_for_embedding` / `id_for_index` would bring every ID function under the external-verification safety net, and would satisfy the P0-1 spec test-plan item "each `id_for_*` recipe matches design §4.2 byte-for-byte" more strongly.

@@ -0,0 +1,70 @@

    //! `kb-core` — frozen domain types, traits, and ID recipe.

👍 strength: clean module split

The 16 `kb-core` modules follow the design §3 sub-section boundaries exactly (ids/citation/document/chunk/metadata/search/answer/ingest/jobs/vector/errors/traits/media/asset/versions/normalize). There is no giant dump file. Even the largest, `citation.rs`, is 316 lines and fits in context. `lib.rs` itself is 70 lines and contains only `pub mod` declarations and curated re-exports, with no logic. P1-* component tasks can import through the single `kb_core::*` facade, so the downstream dependency graph stays stable.

Re-review: all items resolved, APPROVE recommended (registered as COMMENT under the gate-bypass policy)
All 9 items from the previous review (6 problems + 3 minor) are addressed in commits `286e627` / `ca0eb2f` / `d91b603` / `d2c8728`. A fresh subagent re-read the code item by item and completed verification.

Verification results

| Item | Location |
|---|---|
| `FromStr` lowercase-only hex → accepts upper and lower case, normalizes to lowercase | `ids.rs:34-37,49-62` |
| | `ids.rs:331-476` |
| `WorkspacePath::new` rejects `#`; `to_posix` now has a `Result` signature; all callers updated | `asset.rs:33-40`, `normalize.rs:24-61`, `citation.rs:99` |
| `apply_env`: 27 arms mapping the full schema + a negative test + 3 positive tests | `config/lib.rs:192-329,431-469` |
| `logging::init` → `Result<WorkerGuard>` (dead `Option` removed) + callers cleaned up | `logging.rs:21`, `cli/main.rs:163` |
| `Citation::parse`: input quoted at all 19 `bail!` / `anyhow!` sites | |
| Unused `thiserror` dependency removed (3 `Cargo.toml` files) + `kb-core` reservation comment | |

External b3sum re-verification

| Test | Pinned hex |
|---|---|
| `id_for_asset_pinned` | `cec9353553efb238a7919d38d3e148f1` |
| `id_for_doc_pinned` | `8547fe58cb42d593fd761d77242401db` |
| `id_for_block_pinned` | `8a7bf22de7ec3293a792028c829b3812` |
| `id_for_chunk_pinned` | `8809f627777fe7ca5c4433b97dd88ce9` |
| `id_for_embedding_pinned` | `71992c457a5da39880a6d17d646ed0fd` |
| `id_for_index_pinned` | `e733ee2f9936f0e1ac5143cdbf0f2b54` |

Each test also carries a separate `serde_json_canonicalizer::to_vec(&tuple) == expected_json_bytes` assertion, so JSON canonicalization regressions and hash regressions are caught independently.

Gates passed

- `cargo check --workspace` clean
- `RUSTFLAGS="-D warnings" cargo build --workspace` clean
- `cargo test --workspace` → 41 passed / 0 failed (kb-core 27, kb-config 9, kb-cli 5; +11 vs. the previous run)
- `cargo clippy --workspace --all-targets -- -D warnings` clean
- `kb doctor --json` exit 0 + `doctor.v1`

Newly found issues
None.
Conclusion
Every item is a root-cause fix rather than something papered over. APPROVE recommended. Under the self-review gate policy this comment is registered as COMMENT, so the author is asked to APPROVE and merge manually.