Add the integration snapshot test pinning the full `CanonicalDocument`
JSON for `fixtures/markdown/code-and-table.md`, parsed with the real
`kb_parse_md::parse_frontmatter` + `parse_blocks` (dev-dep only).
Non-deterministic `provenance.events[*].at` for the Parsed and
Normalized events is stripped before comparison; the Discovered
event's `at` is pinned by constructing the test `RawAsset` with a
fixed `discovered_at`. Run with `UPDATE_SNAPSHOTS=1` to regenerate.
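The compare-or-regenerate flow can be sketched with the standard library
alone. This is a minimal illustration, not the test's actual code:
`check_snapshot`, `SnapshotOutcome`, and `regenerate_requested` are
hypothetical names; only the `UPDATE_SNAPSHOTS=1` switch comes from this
commit.

```rust
use std::{env, fs, io, path::Path};

/// Result of comparing freshly produced JSON against the committed baseline.
#[derive(Debug, PartialEq)]
enum SnapshotOutcome {
    Matched,
    Regenerated,
    Mismatch,
}

/// True when the caller asked for baselines to be rewritten.
fn regenerate_requested() -> bool {
    env::var("UPDATE_SNAPSHOTS").map(|v| v == "1").unwrap_or(false)
}

/// Either overwrite the baseline with `actual`, or compare against it.
/// `actual` is assumed to have had its volatile timestamps stripped already.
fn check_snapshot(
    baseline: &Path,
    actual: &str,
    regenerate: bool,
) -> io::Result<SnapshotOutcome> {
    if regenerate {
        fs::write(baseline, actual)?;
        return Ok(SnapshotOutcome::Regenerated);
    }
    let expected = fs::read_to_string(baseline)?;
    Ok(if expected == actual {
        SnapshotOutcome::Matched
    } else {
        SnapshotOutcome::Mismatch
    })
}
```

A test would call `check_snapshot(path, &json, regenerate_requested())` and
fail on `Mismatch`.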
Add the 1000-iteration determinism property: same inputs ⇒ byte-
identical JSON (modulo the same stripped timestamps), in under one
second of wall-clock time. A regression in canonical JSON, BLAKE3
hashing, ordinal counting, or any other deterministic field would
surface here immediately.
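The shape of such a determinism property, as a rough sketch:
`render_canonical` is a hypothetical stand-in for the real normalize +
serialize pipeline, and `first_divergence` is an illustrative helper, not
a function in this crate.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the pipeline under test. `BTreeMap` iterates
/// in key order, so the output does not depend on the input order.
fn render_canonical(fields: &[(&str, &str)]) -> String {
    let sorted: BTreeMap<&str, &str> = fields.iter().copied().collect();
    let body: Vec<String> = sorted
        .iter()
        .map(|(k, v)| format!("{k:?}:{v:?}"))
        .collect();
    format!("{{{}}}", body.join(","))
}

/// Runs `produce` once as a baseline, then `iterations - 1` more times.
/// Returns the index of the first run whose bytes differ, if any.
fn first_divergence<F: Fn() -> String>(iterations: usize, produce: F) -> Option<usize> {
    let baseline = produce();
    (1..iterations).find(|_| produce().as_bytes() != baseline.as_bytes())
}
```

The property then reduces to asserting that
`first_divergence(1000, || ...)` returns `None`.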
The integration test depends on `kb-parse-md` only as a dev-dep, so
`cargo tree -p kb-normalize --depth 1 --edges normal` confirms no
parser implementation appears in the production dep tree per design
§8.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mechanical sweep over `Inline::Text(_)` / `Code(_)` / `Strong(_)` / `Emph(_)`
construction and match sites under the new struct-variant shape introduced
in the previous commit. `Inline::Link { text, href }` is unchanged.
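What the sweep looks like at one construction site and one match site,
as a sketch. The enum below is a trimmed, hypothetical mirror of
`kb_core::Inline`: only `Link { text, href }` is confirmed by this commit,
the `text` and `code` field names are assumptions, and `Strong`/`Emph`
are omitted.

```rust
// Hypothetical, trimmed-down mirror of the struct-variant `Inline`.
#[derive(Debug, Clone, PartialEq)]
enum Inline {
    Text { text: String },
    Code { code: String },
    Link { text: String, href: String },
}

// Construction site: `Inline::Text(s)` becomes `Inline::Text { text: s }`.
fn text(s: &str) -> Inline {
    Inline::Text { text: s.to_string() }
}

// Match site: `Inline::Text(t)` becomes `Inline::Text { text }`.
fn visible_text(inline: &Inline) -> &str {
    match inline {
        Inline::Text { text } => text.as_str(),
        Inline::Code { code } => code.as_str(),
        // `Link` was already a struct variant, so this arm is untouched.
        Inline::Link { text, .. } => text.as_str(),
    }
}
```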
The snapshot test in `tests/blocks_snapshots.rs` previously projected
`ParsedBlock` into a `BlockView`/`PayloadView` shim because the old
`Inline` could not serialize. With the schema fix in place we now
serialize `ParsedBlock` directly through serde — the shim and its
`flatten_inline` helper are removed. Inlines surface as structured
objects (`{"kind":"text","text":"…"}` etc.). Regenerated
`nested-headings.blocks.snapshot.json` to reflect the new shape via
the existing `--ignored` emitter; `code-and-table.blocks.snapshot.json`
has no inlines and is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two snapshot tests (`nested-headings.md`, `code-and-table.md`) under
crates/kb-parse-md/tests/blocks_snapshots.rs, with matching baseline JSON
next to each fixture. The snapshot view projects `kb_core::Inline` to
flat strings — `Inline` carries `serde(tag = "kind")` which is
incompatible with newtype variants holding a primitive (`Text(String)`),
so direct serialization of `ParsedBlock` would fail today. The view
preserves the contract that matters for P1-3 (heading paths, source
spans, payload kinds, payload text/code/table content) and will keep
working once kb-core fixes the Inline schema in a later task.
Also tightens `level_to_use >= 1 && level_to_use <= 6` into
`(1..=6).contains(&level_to_use)` to satisfy
`clippy::manual_range_contains`.
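For reference, the range form in isolation. `is_valid_heading_level` is a
hypothetical wrapper used only for illustration; the `u8` parameter type
is likewise an assumption.

```rust
/// True for heading levels 1 through 6 inclusive.
fn is_valid_heading_level(level: u8) -> bool {
    // Equivalent to `level >= 1 && level <= 6`, in the form clippy prefers.
    (1..=6).contains(&level)
}
```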
Spec §"Behavior contract" line 74 says `id:` is captured into
`metadata.user_id_alias` only. Remove the redundant `user.insert`
that was also writing it into the user map, and update the snapshot
baseline accordingly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two markdown fixtures with hand-authored JSON baselines that pin the
§0 Q9 derive output across runs:
- frontmatter-only.md exercises the YAML happy path with most fields,
  unknown keys, an `id:` field, and a non-UTC created_at (so the
  baseline shows original_timestamps preservation).
- mixed-lang.md is body-only with no `lang:` field; baseline pins the
  lingua autodetect result for our enabled language set.
A separate `emit_snapshots` test (marked `#[ignore]`) regenerates the
baselines from the current parser output. A determinism test parses
the fixture twice and asserts equality so any non-determinism (e.g.
key ordering, lingua nondeterminism) fails fast.
fixtures/source-fs/tree-1/:
  README.md
  notes/alpha.md
  notes/beta.md
  ignored/skip.tmp   (excluded by .kbignore *.tmp)
  .kbignore          ("*.tmp")
  .DS_Store          (implicitly excluded by FsSourceConnector)
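How the two exclusions in this tree could be decided, as a deliberately
simplified sketch. This is not `FsSourceConnector`'s actual matching
logic: `is_excluded` is a hypothetical helper that understands only
`*suffix` patterns and exact file names, and it treats `.DS_Store` as the
sole implicit exclude.

```rust
/// Decide whether a path relative to the tree root should be skipped.
/// `patterns` holds the lines of a `.kbignore`-style file.
fn is_excluded(rel_path: &str, patterns: &[&str]) -> bool {
    // Match against the final path component only (a simplification).
    let file_name = rel_path.rsplit('/').next().unwrap_or(rel_path);

    // Implicit exclude, independent of any ignore file (assumed list).
    if file_name == ".DS_Store" {
        return true;
    }

    patterns.iter().any(|p| match p.strip_prefix('*') {
        // "*.tmp" matches any file name ending in ".tmp".
        Some(suffix) => file_name.ends_with(suffix),
        // Anything else must match the file name exactly.
        None => file_name == *p,
    })
}
```

With `patterns = ["*.tmp"]`, `ignored/skip.tmp` and `.DS_Store` are
skipped while the three Markdown files pass through.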
The committed baseline (tree-1.snapshot.json) has discovered_at,
source_uri.value, and stored.path replaced with "<stripped>" so the
JSON is portable across checkout locations and CI runs. The test
applies the same stripping to scan output before comparing.
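The stripping step might look roughly like this. `AssetView` is a
hypothetical flattened stand-in for `RawAsset` (whose real fields are
nested), and `relative_path` is an assumed stable field added for the
illustration.

```rust
const STRIPPED: &str = "<stripped>";

// Hypothetical flattened view of one scanned asset.
#[derive(Debug, Clone, PartialEq)]
struct AssetView {
    relative_path: String, // stable across machines (assumed field)
    discovered_at: String,
    source_uri_value: String,
    stored_path: String,
}

/// Replace every machine- or time-dependent field with a fixed marker so
/// that scans of the same tree compare equal regardless of checkout path.
fn strip_volatile(mut asset: AssetView) -> AssetView {
    asset.discovered_at = STRIPPED.to_string();
    asset.source_uri_value = STRIPPED.to_string();
    asset.stored_path = STRIPPED.to_string();
    asset
}
```

Applying the same function to both the baseline inputs and the live scan
output keeps the comparison symmetric.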
The determinism test runs scan twice and asserts byte-identical
serialized JSON (post-strip) — same filesystem state must yield the
same Vec<RawAsset>.
Regenerate baseline with `KB_REGEN_SNAPSHOT=1 cargo test -p kb-source-fs
--test snapshot_tree1 -- tree_1_snapshot_matches_baseline`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>