Add policy_hash: String to kb_core::Chunk to align with the §5.5 SQLite
schema (chunks.policy_hash NOT NULL), so kb-store-sqlite persistence is a
straight field copy rather than a recompute.
This is a §9 schema migration:
- §5.5 (the persistence schema) is authoritative.
- §3.5 (the domain model) must accommodate.
The chunker already computed policy_hash for the chunk_id recipe (§4.2);
P1-5 stored it implicitly. We now hold it explicitly on the Chunk so any
DocumentStore::put_chunks impl can read it directly.
Follow-up commits update kb-chunk to populate the field and refresh the
P1-5 snapshot baseline accordingly.
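
A minimal sketch of what "straight field copy" could look like, assuming a trimmed-down `Chunk` and a hypothetical `ChunkRow` that mirrors the `chunks` table. Every name here except `policy_hash` and `chunk_id` is illustrative, and the real `kb_core::Chunk` has more fields.

```rust
/// Hypothetical, trimmed-down stand-in for `kb_core::Chunk`.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub chunk_id: String,
    pub text: String,
    /// Hash of the chunking policy, now carried on the chunk itself.
    pub policy_hash: String,
}

/// Hypothetical row shape for a `chunks` table whose `policy_hash` is NOT NULL.
#[derive(Debug, PartialEq)]
pub struct ChunkRow {
    pub chunk_id: String,
    pub text: String,
    pub policy_hash: String,
}

/// Persistence is a field-for-field copy; nothing is recomputed here.
pub fn to_row(chunk: &Chunk) -> ChunkRow {
    ChunkRow {
        chunk_id: chunk.chunk_id.clone(),
        text: chunk.text.clone(),
        policy_hash: chunk.policy_hash.clone(),
    }
}
```

Because the field is a plain `String` rather than an `Option<String>`, the NOT NULL constraint is met by construction.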
`#[serde(tag = "kind")]` rejects newtype variants whose payload is not a
struct, so 4 of 5 `Inline` variants (`Text(String)`, `Code(String)`,
`Strong(Vec<…>)`, `Emph(Vec<…>)`) failed to serialize at runtime — only
`Link { text, href }` worked. Convert every variant to struct form so the
internally-tagged shape is well-formed and round-trips through JSON.
Add `inline_serde_round_trip` covering all five variants. Per design §9,
this is a wire-schema migration, but no `docs/wire-schema/v1/*.json` change
is required because `Inline` is not directly referenced there. Callers in
kb-parse-md follow in the next commit.
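
To illustrate why struct form matters: an internally tagged value needs a JSON object to hold `kind` next to the payload, and a bare string or array has no place for it. The sketch below renders that shape by hand, without serde, so it stays dependency-free. The field names `value` and `children` are assumptions; only `Link { text, href }` is named above.

```rust
/// Hypothetical struct-form `Inline`; `value` and `children` are assumed names.
pub enum Inline {
    Text { value: String },
    Code { value: String },
    Strong { children: Vec<Inline> },
    Emph { children: Vec<Inline> },
    Link { text: String, href: String },
}

/// Minimal JSON string quoting (quotes and backslashes only; a sketch).
fn quote(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            _ => out.push(c),
        }
    }
    out.push('"');
    out
}

fn array(items: &[Inline]) -> String {
    let parts: Vec<String> = items.iter().map(to_json).collect();
    format!("[{}]", parts.join(","))
}

/// Every variant becomes an object, so the `kind` tag always has a home.
pub fn to_json(node: &Inline) -> String {
    match node {
        Inline::Text { value } => {
            format!(r#"{{"kind":"Text","value":{}}}"#, quote(value))
        }
        Inline::Code { value } => {
            format!(r#"{{"kind":"Code","value":{}}}"#, quote(value))
        }
        Inline::Strong { children } => {
            format!(r#"{{"kind":"Strong","children":{}}}"#, array(children))
        }
        Inline::Emph { children } => {
            format!(r#"{{"kind":"Emph","children":{}}}"#, array(children))
        }
        Inline::Link { text, href } => format!(
            r#"{{"kind":"Link","text":{},"href":{}}}"#,
            quote(text),
            quote(href)
        ),
    }
}
```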
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Document AssetStorage::Copied / Reference path semantics so P1-6 (asset
writer) knows that at scan time `Copied.path` is the SOURCE path; the
writer is responsible for both copying the bytes AND overwriting `path`
with the destination.
- Rename the dangling-symlink test to make its scope explicit
(`dangling_symlink_pseudo_cycle_does_not_crash`); the prior name implied
a real two-step directory cycle but the targets were broken links.
- Add `two_step_directory_cycle_visited_set_breaks_loop`: builds
`a/loop -> ../b` and `b/loop -> ../a` over real directories with real
files, asserting scan terminates with a finite, deterministic asset
list — exercises the canonical-path visited-set in walker::walk_files.
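
A rough sketch of a canonical-path visited set, assuming a recursive walk over `std::fs`. This is not the actual `walker::walk_files`; the signature and error handling are illustrative. Files are reported by canonical path and sorted, so the result does not depend on directory iteration order.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collects regular files under `root`, following directory symlinks but
/// entering each canonical directory at most once.
pub fn walk_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut visited = HashSet::new();
    let mut files = Vec::new();
    visit(root, &mut visited, &mut files)?;
    files.sort(); // deterministic output regardless of read_dir order
    Ok(files)
}

fn visit(
    dir: &Path,
    visited: &mut HashSet<PathBuf>,
    files: &mut Vec<PathBuf>,
) -> io::Result<()> {
    // Canonicalizing resolves symlinks, so `a/loop` and `b` share one key.
    let canonical = fs::canonicalize(dir)?;
    if !visited.insert(canonical) {
        return Ok(()); // already walked: this is what breaks the cycle
    }
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // `fs::metadata` follows symlinks; a dangling link yields Err and
        // falls through to the catch-all arm instead of aborting the scan.
        match fs::metadata(&path) {
            Ok(meta) if meta.is_dir() => visit(&path, visited, files)?,
            Ok(meta) if meta.is_file() => files.push(fs::canonicalize(&path)?),
            _ => {}
        }
    }
    Ok(())
}
```

With `a/loop -> ../b` and `b/loop -> ../a`, the second attempt to enter either directory hits the visited set and returns immediately, so the walk terminates with each file listed once.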
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Cargo.toml: remove `thiserror` from kb-config, kb-parse-types, kb-app
(unused — none of those crates' src trees reference thiserror; CoreError
in kb-core is the only consumer).
- kb-config keeps the `kb-core` dep with a one-line comment marking
CoreError reserved for P1-* config-error wiring per the review thread.
- ids.rs: switch `validate_hex32` from a hand-rolled `matches!` byte range
to `is_ascii_hexdigit()` so the hex check is the canonical idiom (and
satisfies `clippy::manual_is_ascii_check` under `-D warnings`).
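
For reference, a small sketch of the two per-byte checks side by side. The exact byte ranges in the old `matches!` are an assumption; the function names are illustrative.

```rust
/// Hand-rolled range check, the shape `clippy::manual_is_ascii_check` flags.
/// The exact ranges used in the original code are assumed here.
pub fn is_hex_manual(b: u8) -> bool {
    matches!(b, b'0'..=b'9' | b'a'..=b'f' | b'A'..=b'F')
}

/// Standard-library idiom covering the same set of bytes.
pub fn is_hex_idiomatic(b: u8) -> bool {
    b.is_ascii_hexdigit()
}
```

Under that assumption the two predicates agree on every `u8` value, so the switch is behavior-preserving.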
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- WorkspacePath: add `WorkspacePath::new(s)` validating constructor that
rejects any string containing `#` (collides with the W3C-Media-Fragments
separator that Citation URIs depend on). Doc-comment on the type now
explains the invariant.
- normalize::to_posix changes signature to `Result<WorkspacePath, CoreError>`
and now flows through `WorkspacePath::new`, so a path with `#` is rejected
at construction rather than at every reader. Only one caller existed
outside tests, so the signature change is contained.
- Citation::parse uses `WorkspacePath::new` on the path side. With multiple
`#` separators in the input, `rsplit_once` would otherwise leave a `#` on
the path; the new constructor closes the hole.
- Citation::parse + parse_hms_ms: every `bail!` / `anyhow!` site now quotes
the offending substring (and the full input where useful) so error
messages identify what went wrong without re-deriving from context.
- New tests: `to_posix_rejects_hash_in_path`,
`parse_path_with_hash_rejected_at_to_posix_layer`. Existing
`to_posix(...).0` call sites updated for the Result signature.
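
A minimal sketch of the constructor and the `rsplit_once` hole it closes. Errors are plain `String`s here for brevity (the real code uses `CoreError` and `anyhow`), and `parse_citation` is a hypothetical stand-in for `Citation::parse` that only splits path from fragment.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WorkspacePath(String);

impl WorkspacePath {
    /// Validating constructor: `#` is reserved as the fragment separator.
    pub fn new(s: &str) -> Result<Self, String> {
        if s.contains('#') {
            return Err(format!("workspace path {s:?} must not contain '#'"));
        }
        Ok(WorkspacePath(s.to_string()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Splits `path#fragment` at the LAST `#`, then validates the path side.
pub fn parse_citation(input: &str) -> Result<(WorkspacePath, String), String> {
    let (path, fragment) = input
        .rsplit_once('#')
        .ok_or_else(|| format!("citation {input:?} has no '#' separator"))?;
    // With two or more `#` in `input`, `path` still contains one here;
    // the constructor rejects it and the error quotes the offending text.
    let path = WorkspacePath::new(path)
        .map_err(|e| format!("{e} (in citation {input:?})"))?;
    Ok((path, fragment.to_string()))
}
```

For example, `parse_citation("notes/a#b.md#t=10")` splits into `notes/a#b.md` and `t=10`, and the path side is then rejected rather than silently accepted.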
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- ids.rs: validate_hex32 now accepts upper+lower hex; FromStr lowercases
the stored representation so equality and hashing stay canonical.
- Renamed test newtype_rejects_uppercase →
newtype_accepts_uppercase_normalizes_to_lowercase and added
newtype_rejects_invalid_chars_after_uppercase_pass.
- Added pinned-hex independence tests for id_for_block / id_for_chunk /
id_for_embedding / id_for_index. Each test also asserts that the bytes
serde-json-canonicalizer emits for the tuple match the literal JSON we
hashed externally with b3sum, so a future field-rename can't silently
drift the hash without flagging the JSON layer first.
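
A sketch of the accept-then-normalize behavior, assuming a hypothetical `BlockId` newtype and assuming "hex32" means exactly 32 hex characters. The error type is a plain `String` for brevity.

```rust
use std::str::FromStr;

/// Hypothetical ID newtype; the stored string is always lowercase hex.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct BlockId(String);

impl BlockId {
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

impl FromStr for BlockId {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Accept upper- and lowercase hex digits (assumed length: 32).
        if s.len() != 32 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(format!("expected 32 hex digits, got {s:?}"));
        }
        // Normalize so `==` and `Hash` see one canonical representation.
        Ok(BlockId(s.to_ascii_lowercase()))
    }
}
```

Two inputs that differ only in case parse to equal values, which keeps them interchangeable as map keys.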
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three follow-ups from the code-quality review pass on P0-1:
- Re-export `IngestItemKind` from `kb-core` so downstream tasks
constructing `IngestItem` don't need `kb_core::ingest::IngestItemKind`.
- Document the `--json` wire-schema convention by introducing
`kb-cli/src/wire.rs` with `wire_*` helpers paralleling the existing
inline `wire_ingest`. Each Ok-path `--json` branch now routes through
these helpers so future P1-5/P3/P4/P5 implementations slot the
`schema_version` envelope in automatically. `DoctorReport` keeps its
struct-field `schema_version` (the documented exception), and the
helper round-trips it idempotently. The convention is also recorded in
the top docstring of `kb-app/src/lib.rs`.
- Fix clippy `single_char_add_str` in `kb_core::normalize` (replace
`out.push_str(".")` with `out.push('.')`).
Verified: `cargo check`, `cargo test` (5 new wire-helper tests),
`cargo clippy -- -D warnings`, and `RUSTFLAGS="-D warnings" cargo build` all
clean. Smoke-tested that `kb doctor --json` still emits
`{"schema_version":"doctor.v1",...}`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>