refactor(spec): introduce kb-parse-types thin crate

PR #1 review left a design-debt note: landing ParsedBlock in kb-core would
(a) force every crate to recompile on parser-internal changes, and
(b) pollute the namespace when the P6/P7/P8 parsers add their own variants.

Resolution: a new thin crate, kb-parse-types, sits between kb-core and the parsers. It owns ParsedBlock + ParsedPayload + Warning + forward-refs for image/pdf/audio parser intermediates, and depends on kb-core only (for SourceSpan / Inline).

Updates:
- design §3.7b: add new section defining kb-parse-types
- design §8: add kb-parse-types to module-boundary diagram + forbidden list
- design §3.4: Inline stays in kb-core; kb-parse-types references it (no duplication)
- p0-1 skeleton: workspace + Cargo deps + public surface block
- p1-3 parse-md-blocks: outputs Vec<kb_parse_types::ParsedBlock> directly
- p1-4 normalize: Allowed gains kb-parse-types, drops cross-coupling note
- INDEX + phase-0 epic: list kb-parse-types in P0 deliverables
@@ -581,7 +581,7 @@ pub struct RetrievalDetail {

### 3.7a Forward-declared types

The `Block::ImageRef` / `AudioRef` variants exist from v1, but their `ocr` / `caption` / `transcript` fields are always `None` in P1. The following types are kept in `kb-core` as stubs:
The `Block::ImageRef` / `AudioRef` variants exist from v1, but their `ocr` / `caption` / `transcript` fields are always `None` in P1. The following types are kept in `kb-core` as stubs (slots for the final domain model):

```rust
pub struct OcrText { pub joined: String, pub regions: Vec<OcrRegion>, pub engine: String, pub engine_version: String }
@@ -600,6 +600,69 @@ pub enum AudioType { M4a, Mp3, Wav, Flac, Ogg, Other(String) }

`OffsetDateTime` is `time::OffsetDateTime`; `Result` is a crate-local alias.

### 3.7b Parser intermediate types: `kb-parse-types`

Parser *intermediate* representations (the `ParsedBlock` family) live not in `kb-core` but in a separate thin crate, **`kb-parse-types`**. Rationale: `kb-normalize` is responsible for the medium-agnostic ID/Provenance lift and must not import any parser directly. The input type that normalize receives still has to be defined somewhere, and pinning it into `kb-core` would mean that (a) the core namespace explodes once parser-specific ParsedBlock variants (`ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`) join later, and (b) any semantic change in a parser becomes a core change that affects every dependent.

`kb-parse-types` is the **only layer** between the two. Dependency graph:

```text
kb-core (domain model: Block, Chunk, SourceSpan, IDs, …)
   ▲
   │
kb-parse-types (parser intermediates: ParsedBlock, ParsedImageRegion[P+], ParsedPdfPage[P+], ParsedAudioSegment[P+], Inline)
   ▲           ▲
   │           │
kb-parse-md, kb-parse-pdf, kb-normalize
kb-parse-image, kb-parse-audio
```

`kb-parse-types`:
- depends only on `kb-core` (it uses domain types such as `Block`, `SourceSpan`, `Lang`).
- depends on no other `kb-*` crate.
- depends on no parser's concrete library (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`).
- has only light external dependencies, on the order of serde + thiserror.

Types defined in P1:

```rust
// kb-parse-types: depends on kb-core only.
pub struct ParsedBlock {
    pub kind: ParsedBlockKind,
    pub heading_path: Vec<String>,
    pub source_span: kb_core::SourceSpan,
    pub payload: ParsedPayload,
}

pub enum ParsedBlockKind { Heading, Paragraph, List, Code, Table, Quote, ImageRef, AudioRef }

pub enum ParsedPayload {
    Heading { level: u8, text: String },
    Paragraph { text: String, inlines: Vec<kb_core::Inline> },
    List { ordered: bool, items: Vec<Vec<kb_core::Inline>> },
    Code { lang: Option<String>, code: String },
    Table { headers: Vec<String>, rows: Vec<Vec<String>> },
    Quote { text: String, inlines: Vec<kb_core::Inline> },
    ImageRef { src: String, alt: String },
    AudioRef { src: String }, // duration_ms filled by extractor before chunking
}

pub struct Warning { pub kind: WarningKind, pub note: String }
pub enum WarningKind { MalformedFrontmatter, MalformedTable, EncodingFallback, ExtractFailed }
```

`Inline` is a domain type that lives in `kb-core` (§3.4). `kb-parse-types` only *references* it; the same meaning is not defined twice in two crates (otherwise normalize would have to perform a pointless identity conversion).

Types to be added in P6/P7/P8 (forward-ref):

```rust
pub struct ParsedImageRegion { /* stage before OCR/EXIF extraction */ }
pub struct ParsedPdfPage { pub page: u32, pub text: String }
pub struct ParsedAudioSegment { pub start_ms: u64, pub end_ms: u64, pub text: String }
```

→ When a new medium is added, the `kb-core::Block` variants do not change; only `kb-parse-types` grows.
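
To make the shape of these intermediates concrete, here is a minimal sketch of a parser filling in one `ParsedBlock`. The `kb_core` module below is a hypothetical stand-in (the real `SourceSpan` and `Inline` are defined in §3.4 and may differ), the enums are trimmed to three variants, and the `paragraph` helper is an illustration only, not part of the spec.

```rust
// Hypothetical stand-ins for the two kb-core types that kb-parse-types borrows.
// The real definitions live in design §3.4 and may differ.
mod kb_core {
    #[derive(Debug, Clone, PartialEq)]
    pub enum SourceSpan {
        Line { start: u32, end: u32 },
    }

    #[allow(dead_code)]
    #[derive(Debug, Clone, PartialEq)]
    pub enum Inline {
        Text(String),
        Code(String),
    }
}

// Trimmed copies of the §3.7b types (three variants only, for brevity).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ParsedBlockKind { Heading, Paragraph, Code }

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
pub enum ParsedPayload {
    Heading { level: u8, text: String },
    Paragraph { text: String, inlines: Vec<kb_core::Inline> },
    Code { lang: Option<String>, code: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct ParsedBlock {
    pub kind: ParsedBlockKind,
    pub heading_path: Vec<String>,
    pub source_span: kb_core::SourceSpan,
    pub payload: ParsedPayload,
}

/// Hypothetical helper: build the intermediate block for one plain paragraph.
pub fn paragraph(heading_path: &[&str], start: u32, end: u32, text: &str) -> ParsedBlock {
    ParsedBlock {
        kind: ParsedBlockKind::Paragraph,
        heading_path: heading_path.iter().map(|s| s.to_string()).collect(),
        source_span: kb_core::SourceSpan::Line { start, end },
        payload: ParsedPayload::Paragraph {
            text: text.to_string(),
            inlines: vec![kb_core::Inline::Text(text.to_string())],
        },
    }
}

fn main() {
    let block = paragraph(&["Architecture", "Chunking policy"], 12, 14, "Chunks are heading-aware.");
    println!("{block:?}");
}
```

Note that the block carries no `BlockId`: ID assignment is left to `kb-normalize`, which is why this type can stay outside `kb-core`.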

### 3.8 Answer / RAG types

```rust
@@ -1151,7 +1214,9 @@ kb-cli, kb-tui, kb-desktop
  └─> kb-app
        ├─> kb-source-fs
        ├─> kb-parse-md / kb-parse-pdf / kb-parse-image / kb-parse-audio
        │     └─> kb-parse-types (parser intermediate)
        ├─> kb-normalize
        │     └─> kb-parse-types
        ├─> kb-chunk
        ├─> kb-store-sqlite (DocumentStore, JobRepo, Retriever[lexical])
        ├─> kb-store-vector (VectorStore)
@@ -1164,11 +1229,15 @@ kb-cli, kb-tui, kb-desktop
  └─> kb-core (everything depends on it)
```

`kb-parse-types` is a thin layer between `kb-core` and the parsers/normalize (see §3.7b). It gathers the per-parser intermediate representations (`ParsedBlock`, `ParsedImageRegion`, `ParsedPdfPage`, `ParsedAudioSegment`, `Inline`) in one place so that (a) the `kb-core` namespace does not explode and (b) `kb-normalize` never imports a parser directly.

Core prohibitions:
- UI → store/llm/parse direct dependency ✗
- parse-* → store/llm/embed ✗
- parse-* → kb-normalize ✗ (one-way: parsers → kb-parse-types ← normalize)
- chunk → llm/embed ✗
- normalize → store ✗
- normalize → store / parse-* ✗
- kb-parse-types → any parser/normalize/store/llm/embed/search/rag/ui ✗ (depends on `kb-core` only)
- cross-write into another store ✗

Enforced by `cargo deny` + the workspace deny.toml + CI checks.
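
The parser-layer rules above can be read as a predicate over crate-graph edges. The sketch below only illustrates the intended direction of dependencies; it is not how `cargo deny` works, and the `violations` function and its edge-list input (workspace crates only, as `(from, to)` pairs) are hypothetical.

```rust
/// One edge of the workspace crate graph: `from` depends on `to`.
type Edge<'a> = (&'a str, &'a str);

/// A concrete parser crate, i.e. any `kb-parse-*` except the shared types crate.
fn is_parser(name: &str) -> bool {
    name.starts_with("kb-parse-") && name != "kb-parse-types"
}

/// Return every edge that breaks one of the parser-layer rules:
/// - kb-parse-types may depend on kb-core only,
/// - kb-normalize may not depend on parsers or stores,
/// - parsers may not depend on kb-normalize, stores, llm, or embed crates.
pub fn violations<'a>(edges: &[Edge<'a>]) -> Vec<Edge<'a>> {
    edges
        .iter()
        .copied()
        .filter(|&(from, to)| match from {
            "kb-parse-types" => to != "kb-core",
            "kb-normalize" => is_parser(to) || to.starts_with("kb-store"),
            _ if is_parser(from) => {
                to == "kb-normalize"
                    || to.starts_with("kb-store")
                    || to.starts_with("kb-llm")
                    || to.starts_with("kb-embed")
            }
            _ => false,
        })
        .collect()
}

fn main() {
    let edges = [("kb-parse-md", "kb-parse-types"), ("kb-normalize", "kb-parse-md")];
    for (from, to) in violations(&edges) {
        println!("forbidden: {from} -> {to}");
    }
}
```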
@@ -25,7 +25,7 @@ P0~P5 are serial. P6~P9 can run in parallel after P5.

| # | Code | Title | Key output crates | Depends on |
|---|------|------|----------------|------|
| P0 | [phase-0-skeleton.md](phase-0-skeleton.md) | Workspace skeleton + domain contracts | kb-core, kb-config, kb-app, kb-cli | – |
| P0 | [phase-0-skeleton.md](phase-0-skeleton.md) | Workspace skeleton + domain contracts | kb-core, kb-parse-types, kb-config, kb-app, kb-cli | – |
| P1 | [phase-1-markdown-ingestion.md](phase-1-markdown-ingestion.md) | Markdown ingestion pipeline | kb-source-fs, kb-parse-md, kb-normalize, kb-chunk, kb-store-sqlite | P0 |
| P2 | [phase-2-lexical-search.md](phase-2-lexical-search.md) | SQLite FTS5 lexical search + citation | kb-search (lexical) | P1 |
| P3 | [phase-3-vector-hybrid.md](phase-3-vector-hybrid.md) | Local embedding + LanceDB + hybrid | kb-embed, kb-embed-local, kb-store-vector, kb-search | P2 |

@@ -14,7 +14,7 @@ contract_sections: [§3 (all), §4, §5.1 schema_meta+migrations, §6 (config +

## Goal

Stand up the Cargo workspace (Rust 2024, resolver=3) with `kb-core`, `kb-config`, `kb-app`, `kb-cli` crates. Freeze every domain type, trait, ID recipe, error type, and CLI entry shape per the frozen design doc so that all subsequent component tasks compile against stable contracts.
Stand up the Cargo workspace (Rust 2024, resolver=3) with `kb-core`, `kb-parse-types`, `kb-config`, `kb-app`, `kb-cli` crates. Freeze every domain type, trait, ID recipe, error type, and CLI entry shape per the frozen design doc so that all subsequent component tasks compile against stable contracts.

## Why now / why this size

@@ -25,6 +25,7 @@ Every other task imports `kb-core`. If types or trait signatures wobble after th
- workspace `[workspace.dependencies]`: `anyhow = "1"`, `thiserror = "2"`, `serde = { version = "1", features = ["derive"] }`, `serde_json = "1"`, `time = { version = "0.3", features = ["serde", "macros"] }`, `uuid = { version = "1", features = ["v7", "serde"] }`, `blake3 = "1"`, `tracing = "0.1"`
- per crate:
  - `kb-core`: workspace deps + `serde_json::Map`, `serde-json-canonicalizer`, `unicode-normalization`
  - `kb-parse-types`: workspace deps + `kb-core` ONLY (no parsers, no stores, no normalize). Defines parser intermediate representations per design §3.7b.
  - `kb-config`: workspace deps + `toml = "0.8"`, `dirs = "5"` (XDG paths)
  - `kb-app`: workspace deps + `kb-core`, `kb-config`, `tracing-subscriber`, `tracing-appender`
  - `kb-cli`: workspace deps + `kb-core`, `kb-config`, `kb-app`, `clap = { version = "4", features = ["derive"] }`
@@ -32,6 +33,7 @@ Every other task imports `kb-core`. If types or trait signatures wobble after th
## Forbidden dependencies

- `kb-core` MUST NOT depend on any other `kb-*` crate.
- `kb-parse-types` MUST depend ONLY on `kb-core`. No parser libraries (`pulldown-cmark`, `pdf-extract`, `image`, `whisper-rs`, …), no other `kb-*` crate.
- `kb-config` MUST NOT depend on `kb-app`, `kb-cli`, parsers, stores, embedders, search, llm, rag, tui, desktop.
- `kb-app` MUST NOT yet depend on parsers/stores/embedders/search/llm/rag (those crates do not exist yet; facade methods stub out and return `unimplemented!()` or `anyhow::bail!("not yet wired (Pn-i)")`).
- `kb-cli` MUST NOT call any non-`kb-app` crate directly.
@@ -112,8 +114,7 @@ pub struct AudioRefBlock { /* per §3.4 */ }
pub enum Inline { /* per §3.4 */ }
pub enum SourceSpan { /* per §3.4 */ }

// ParsedBlock (intermediate, exposed via core for normalize; see p1-4 spec)
pub struct ParsedBlock { /* mirror of Block without BlockId */ }
// (ParsedBlock + parser intermediates live in kb-parse-types per design §3.7b, NOT in kb-core.)

// Chunk + Citation (§3.5)
pub struct Chunk { /* per §3.5 */ }
@@ -230,6 +231,42 @@ pub fn to_posix(path: &std::path::Path) -> WorkspacePath; // §6.6
pub fn nfc(input: &str) -> String; // §4.1
```

```rust
// ── kb-parse-types ──────────────────────────────────────────────────────────
// Per design §3.7b. Defines parser intermediate representations consumed by
// kb-normalize. Depends on kb-core only, never on parser libraries.

pub struct ParsedBlock {
    pub kind: ParsedBlockKind,
    pub heading_path: Vec<String>,
    pub source_span: kb_core::SourceSpan,
    pub payload: ParsedPayload,
}

pub enum ParsedBlockKind { Heading, Paragraph, List, Code, Table, Quote, ImageRef, AudioRef }

pub enum ParsedPayload {
    Heading { level: u8, text: String },
    Paragraph { text: String, inlines: Vec<kb_core::Inline> },
    List { ordered: bool, items: Vec<Vec<kb_core::Inline>> },
    Code { lang: Option<String>, code: String },
    Table { headers: Vec<String>, rows: Vec<Vec<String>> },
    Quote { text: String, inlines: Vec<kb_core::Inline> },
    ImageRef { src: String, alt: String },
    AudioRef { src: String },
}

// `Inline` itself lives in kb-core (§3.4); parse-types references it, never duplicates it.

pub struct Warning { pub kind: WarningKind, pub note: String }
pub enum WarningKind { MalformedFrontmatter, MalformedTable, EncodingFallback, ExtractFailed }

// Forward-ref for P6/P7/P8; defined when those phases land.
pub struct ParsedImageRegion;
pub struct ParsedPdfPage;
pub struct ParsedAudioSegment;
```

```rust
// ── kb-config ───────────────────────────────────────────────────────────────
pub struct Config { /* full schema per §6.4 */ }
@@ -306,16 +343,17 @@ All tests must run with no network, no Ollama, no models.

## Definition of Done

- [ ] `Cargo.toml` workspace lists `kb-core`, `kb-config`, `kb-app`, `kb-cli` and resolver=3, edition 2024
- [ ] `Cargo.toml` workspace lists `kb-core`, `kb-parse-types`, `kb-config`, `kb-app`, `kb-cli` and resolver=3, edition 2024
- [ ] `cargo check --workspace` passes
- [ ] `cargo test --workspace` passes
- [ ] `kb-parse-types` `cargo tree` shows ONLY `kb-core` + `serde`/`thiserror` style deps (no parser libs, no other `kb-*`)
- [ ] `kb --help` prints subcommands
- [ ] `kb init` creates XDG dirs idempotently and writes `config.toml`
- [ ] `kb doctor` returns wire JSON conforming to `doctor.v1` (in `--json` mode)
- [ ] `docs/wire-schema/v1/*.schema.json` stubs exist (7 files: citation, search_hit, answer, ingest_report, doc_summary, chunk_inspection, doctor)
- [ ] `docs/spec/` stubs exist linking to the frozen design (one file per: domain-model, ids, canonical-document, chunk-policy, citation-policy, module-boundaries, ai-generation-guidelines)
- [ ] No imports outside Allowed dependencies (CI deny check)
- [ ] PR body links design §3, §4, §6, §7, §8, §9, §10
- [ ] PR body links design §3, §3.7b, §4, §6, §7, §8, §9, §10

## Out of scope

@@ -7,14 +7,14 @@ status: planned
depends_on: [p0-1]
unblocks: [p1-4]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_sections: [§3.4 Block, §3.4 SourceSpan, §0 Q3 citation]
contract_sections: [§3.4 Block, §3.4 SourceSpan, §3.7b kb-parse-types, §0 Q3 citation]
---

# p1-3: Markdown body → Block tree

## Goal

Parse Markdown body bytes into a flat `Vec<ParsedBlock>` (intermediate, crate-private) with heading paths and line ranges preserved, ready for `kb-normalize` to lift into `CanonicalDocument`.
Parse Markdown body bytes into a flat `Vec<kb_parse_types::ParsedBlock>` with heading paths and line ranges preserved, ready for `kb-normalize` to lift into `CanonicalDocument`.

## Why now / why this size

@@ -23,6 +23,7 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from
## Allowed dependencies

- `kb-core`
- `kb-parse-types` (defines `ParsedBlock`, `ParsedPayload`, `Warning`)
- `pulldown-cmark` (CommonMark with source-map; GFM tables enabled via feature)
- `serde`
- `thiserror`
@@ -42,27 +43,30 @@ This is the heaviest part of P1 parser. Separating it from frontmatter and from

| output | type | downstream |
|--------|------|------------|
| `Vec<ParsedBlock>` (intermediate type, crate-private) | – | `kb-normalize` |
| `Vec<Warning>` | – | propagated into Provenance |
| `Vec<kb_parse_types::ParsedBlock>` | shared type from `kb-parse-types` | `kb-normalize` |
| `Vec<kb_parse_types::Warning>` | shared type | propagated into Provenance |

## Public surface (signatures only, no new types)

```rust
pub fn parse_blocks(body: &[u8], body_offset_lines: u32) -> anyhow::Result<(Vec<ParsedBlock>, Vec<Warning>)>;
pub fn parse_blocks(
    body: &[u8],
    body_offset_lines: u32,
) -> anyhow::Result<(Vec<kb_parse_types::ParsedBlock>, Vec<kb_parse_types::Warning>)>;
```

`ParsedBlock` is a crate-internal mirror that maps 1:1 to `kb_core::Block` variants once `kb-normalize` assigns `BlockId`s.
`ParsedBlock` is defined in `kb-parse-types` (design §3.7b). `kb-parse-md` does NOT define its own; it consumes the shared type. Lift to `kb_core::Block` (with `BlockId` assignment) is `kb-normalize`'s job (p1-4).

## Behavior contract

- Source-map: each `ParsedBlock` carries `SourceSpan::Line { start, end }` relative to the original file (i.e., add `body_offset_lines`).
- Heading tree: every block records its ancestor heading texts in order (e.g., `["Architecture", "Chunking policy"]`).
- Code blocks: language tag preserved (` ```rust ` → `Some("rust")`), fenced content not split.
- Tables: GFM tables produce `TableBlock` with header row + body rows; if a table cell is malformed, fall back to a `Paragraph` block + warning.
- Image references: `![alt](src)` produces `ImageRefBlock` with `asset_id = None`, `src = "..."`, `alt = "..."`. Resolution to `AssetId` happens later in `kb-normalize`.
- Lists: ordered/unordered preserved; nested list items flattened into one `ListBlock` with each top-level item's text.
- Inline elements: only `Text`, `Code`, `Link`, `Strong`, `Emph` (per design §3.4). Drop other inlines silently.
- Malformed input never panics. Worst case: empty `Vec<ParsedBlock>` + warning.
- Code blocks: `ParsedPayload::Code { lang: Some("rust"), code }`; fenced content not split.
- Tables: GFM tables produce `ParsedPayload::Table { headers, rows }`; if a table cell is malformed, fall back to `ParsedPayload::Paragraph` + a `Warning` of kind `WarningKind::MalformedTable`.
- Image references: `![alt](src)` produces `ParsedPayload::ImageRef { src, alt }`. `AssetId` resolution happens later in `kb-normalize` (when image src can be matched to a workspace asset).
- Lists: ordered/unordered preserved via `ParsedPayload::List { ordered, items }`; nested list items flattened so each `items[i]` is a `Vec<kb_core::Inline>` for one top-level item.
- Inline elements: only `Text`, `Code`, `Link`, `Strong`, `Emph` (the `kb_core::Inline` variants per design §3.4). Drop other inlines silently.
- Malformed input never panics. Worst case: empty `Vec<ParsedBlock>` + a `Warning` of kind `WarningKind::ExtractFailed`.
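
The heading-tree and source-map rules can be sketched without any parser library. The following is a minimal illustration under stated assumptions: `HeadingStack` and `to_file_lines` are hypothetical names, headings arrive in document order as `(level, text)` pairs, and a heading is treated as part of its own path (the spec does not say which way that goes).

```rust
/// Tracks the chain of open headings while blocks are emitted in document order.
#[derive(Default)]
pub struct HeadingStack {
    open: Vec<(u8, String)>, // (level, text), outermost first
}

impl HeadingStack {
    /// Register a heading: close every open heading at the same or a deeper
    /// level, then push the new one.
    pub fn enter(&mut self, level: u8, text: &str) {
        while self.open.last().is_some_and(|(l, _)| *l >= level) {
            self.open.pop();
        }
        self.open.push((level, text.to_string()));
    }

    /// The heading texts a block emitted right now would record, in order.
    pub fn path(&self) -> Vec<String> {
        self.open.iter().map(|(_, t)| t.clone()).collect()
    }
}

/// Shift a body-relative line range to file-relative lines by adding the
/// number of lines that precede the body (e.g. the frontmatter).
pub fn to_file_lines(body_start: u32, body_end: u32, body_offset_lines: u32) -> (u32, u32) {
    (body_start + body_offset_lines, body_end + body_offset_lines)
}

fn main() {
    let mut stack = HeadingStack::default();
    stack.enter(1, "Architecture");
    stack.enter(2, "Chunking policy");
    println!("{:?} {:?}", stack.path(), to_file_lines(1, 3, 5));
}
```

A sibling heading at the same level replaces the previous one on the stack, so paragraphs under the second `##` do not inherit the first `##` as an ancestor.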

## Storage / wire effects

@@ -95,7 +99,7 @@ All tests under `cargo test -p kb-parse-md --lib blocks`.
## Out of scope

- Frontmatter (p1-2).
- Lifting `ParsedBlock` → `kb_core::Block` with `BlockId` (p1-4).
- Lifting `kb_parse_types::ParsedBlock` → `kb_core::Block` with `BlockId` (p1-4 normalize).
- Chunking (p1-5).

## Risks / notes

@@ -7,14 +7,14 @@ status: planned
depends_on: [p1-2, p1-3]
unblocks: [p1-5, p1-6]
contract_source: ../../docs/superpowers/specs/2026-04-27-kb-final-form-design.md
contract_sections: [§3.4, §4 ID recipe, §3.6 Provenance]
contract_sections: [§3.4, §3.7b kb-parse-types, §4 ID recipe, §3.6 Provenance, §8 module boundaries]
---

# p1-4: Lift to CanonicalDocument

## Goal

Combine `Metadata` (p1-2) + `Vec<ParsedBlock>` (p1-3) + `RawAsset` (p1-1) into a `CanonicalDocument` with deterministic `doc_id` and `block_id`s per design §4 recipe.
Combine `Metadata` (p1-2) + `Vec<kb_parse_types::ParsedBlock>` (p1-3) + `RawAsset` (p1-1) into a `kb_core::CanonicalDocument` with deterministic `doc_id` and `block_id`s per design §4 recipe.

## Why now / why this size

@@ -23,6 +23,7 @@ Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` pure
## Allowed dependencies

- `kb-core`
- `kb-parse-types` (input shapes: `ParsedBlock`, `ParsedPayload`, `Warning`)
- `kb-config`
- `serde`
- `serde-json-canonicalizer` (canonical JSON for ID hashing)
@@ -33,17 +34,15 @@ Single responsibility: ID generation + struct assembly. Keeps `kb-parse-md` pure

## Forbidden dependencies

- `kb-source-fs`, `kb-parse-md` (consumed via plain types only; must not couple back), `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`

Note: this crate accepts `ParsedBlock` from `kb-parse-md` either by (a) exposing `ParsedBlock` as a `kb-core` type, or (b) `kb-parse-md` re-exporting via a public DTO. Pick (a): move `ParsedBlock` into `kb-core` so this task does not import `kb-parse-md`.
- `kb-source-fs`, `kb-parse-md` (consumed via shared `kb-parse-types` only; `kb-parse-md` must NOT appear in this crate's `cargo tree`), `kb-parse-pdf`, `kb-parse-image`, `kb-parse-audio`, `kb-chunk`, `kb-store-*`, `kb-embed*`, `kb-search`, `kb-llm*`, `kb-rag`, `kb-tui`, `kb-desktop`

## Inputs

| input | type | source |
|-------|------|--------|
| `RawAsset` | `kb_core::RawAsset` | p1-1 |
| `Metadata` + frontmatter span + warnings | from p1-2 | parser caller |
| `Vec<ParsedBlock>` + warnings | from p1-3 | parser caller |
| `Metadata` + frontmatter span + warnings | `(kb_core::Metadata, Option<FrontmatterSpan>, Vec<kb_parse_types::Warning>)` | p1-2 |
| `Vec<ParsedBlock>` + warnings | `(Vec<kb_parse_types::ParsedBlock>, Vec<kb_parse_types::Warning>)` | p1-3 |
| `parser_version` | `kb_core::ParserVersion` | constant in `kb-parse-md` |

## Outputs
@@ -58,9 +57,9 @@ Note: this crate accepts `ParsedBlock` from `kb-parse-md` either by (a) exposing
pub fn build_canonical_document(
    asset: &kb_core::RawAsset,
    metadata: kb_core::Metadata,
    blocks: Vec<kb_core::ParsedBlock>,
    blocks: Vec<kb_parse_types::ParsedBlock>,
    parser_version: &kb_core::ParserVersion,
    warnings: Vec<Warning>,
    warnings: Vec<kb_parse_types::Warning>,
) -> anyhow::Result<kb_core::CanonicalDocument>;

pub fn id_for_doc(workspace_path: &kb_core::WorkspacePath, asset: &kb_core::AssetId, parser_version: &kb_core::ParserVersion) -> kb_core::DocumentId;
@@ -99,8 +98,8 @@ All tests under `cargo test -p kb-normalize`.
- [ ] `cargo check -p kb-normalize` passes
- [ ] `cargo test -p kb-normalize` passes
- [ ] Determinism test runs ≥ 1000 iterations under 1 second
- [ ] No `kb-parse-md` import (consumed via `kb-core::ParsedBlock`)
- [ ] PR links design §4.2, §4.3
- [ ] No `kb-parse-md` (or any other parser crate) appears in `cargo tree -p kb-normalize`; input types come from `kb-parse-types` only
- [ ] PR links design §3.7b, §4.2, §4.3, §8

## Out of scope

@@ -17,6 +17,7 @@ Establish a compiling Rust 2024 workspace and the domain spec. Every later phase
| crate | Role |
|-------|------|
| `kb-core` | domain types, traits, errors, ID rules |
| `kb-parse-types` | parser intermediate representations (`ParsedBlock` etc.); thin layer between `kb-core` and parsers/normalize (design §3.7b) |
| `kb-config` | config loading, defaults, path expansion |
| `kb-app` | facade. Common entry point for CLI/TUI/desktop |
| `kb-cli` | `kb` binary skeleton (only `--help` works) |
@@ -28,7 +29,7 @@ Establish a compiling Rust 2024 workspace and the domain spec. Every later phase
```toml
[workspace]
resolver = "3"
members = ["crates/kb-core", "crates/kb-config", "crates/kb-app", "crates/kb-cli"]
members = ["crates/kb-core", "crates/kb-parse-types", "crates/kb-config", "crates/kb-app", "crates/kb-cli"]

[workspace.package]
edition = "2024"