Two markdown fixtures with hand-authored JSON baselines that pin the §0 Q9 derive output across runs:

- frontmatter-only.md exercises the YAML happy path with most fields, unknown keys, an `id:` field, and a non-UTC created_at (so the baseline shows original_timestamps preservation).
- mixed-lang.md is body-only with no `lang:` field; baseline pins the lingua autodetect result for our enabled language set.

A separate `emit_snapshots` test (marked `#[ignore]`) regenerates the baselines from the current parser output. A determinism test parses the fixture twice and asserts equality so any non-determinism (e.g. key ordering, lingua nondeterminism) fails fast.
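The determinism check described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual test code: `derive` and `snapshot` are hypothetical stand-ins for the real parser and baseline serializer, the inline fixture string is invented for the example, and the frontmatter handling is a naive `key: value` split rather than real YAML parsing. The sketch uses a `BTreeMap` so that key ordering is stable by construction, which is one way to avoid the ordering non-determinism the test guards against.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the real derive step: collects `key: value`
/// lines between the opening and closing `---` fences into a sorted map.
fn derive(markdown: &str) -> BTreeMap<String, String> {
    let mut fields = BTreeMap::new();
    let mut lines = markdown.lines();
    // A body-only document (no opening fence) yields no fields.
    if lines.next().map(str::trim) != Some("---") {
        return fields;
    }
    for line in lines {
        if line.trim() == "---" {
            break; // closing fence: the rest is body text
        }
        // Split on the first colon only, so timestamps keep their colons.
        if let Some((key, value)) = line.split_once(':') {
            fields.insert(key.trim().to_string(), value.trim().to_string());
        }
    }
    fields
}

/// Hypothetical baseline format: one `key=value` line per field, in key order.
fn snapshot(fields: &BTreeMap<String, String>) -> String {
    fields.iter().map(|(k, v)| format!("{k}={v}\n")).collect()
}

#[test]
fn derive_is_deterministic() {
    // Invented fixture with a non-UTC timestamp, as in the description above.
    let fixture = "---\nid: note-1\ncreated_at: 2024-01-02T03:04:05+09:00\n---\nbody\n";
    // Parse twice and compare the serialized output.
    assert_eq!(snapshot(&derive(fixture)), snapshot(&derive(fixture)));
}

#[test]
#[ignore] // opt-in only, e.g. `cargo test -- --ignored`
fn emit_snapshots() {
    let fixture = "---\nid: note-1\n---\nbody\n";
    // A real regeneration test would write this to the baseline file.
    print!("{}", snapshot(&derive(fixture)));
}
```

Keeping the regeneration step behind `#[ignore]` means a normal `cargo test` run only compares against the baselines and never rewrites them.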
490 B
Mixed Language Note
이 문서는 한국어와 영어가 섞여 있습니다. The body has both Korean
sentences and English sentences. lingua는 통계적 언어 감지기를 제공합니다.
This is to test that auto-detect picks one of ko or en deterministically
when no lang: field is present in the frontmatter.
본문은 첫 4 KB만 분석되지만, 짧은 문서에서도 잘 동작해야 합니다. The detector should pick the dominant language across the sample window.