Blank-line paragraph segmentation (whitespace-only lines as boundaries, blank lines themselves never in any chunk's range). Paragraphs > 80 lines split into 80-line windows with 20-line overlap (stride 60), sharing the input lang and symbol=None per spec §9.3. tier2_shared exposes a new build_chunk_no_symbol helper so Chunk id/hash/token semantics stay identical with Tier 1/2. Extracts build_chunk_from_span as private core so build_chunk and build_chunk_no_symbol share mechanics without drift. 4 unit tests cover multi-paragraph shell (4 paragraphs, blank-line boundaries verified), 200-line oversize line-window split (chunks 1-80 / 61-140 / 121-200), empty file, and lang preservation when input is yaml. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
16 lines
344 B
Bash
16 lines
344 B
Bash
#!/usr/bin/env bash
|
|
set -euo pipefail
|
|
|
|
# First paragraph: env setup
|
|
export KEBAB_HOME="${KEBAB_HOME:-$HOME/.local/share/kebab}"
|
|
mkdir -p "$KEBAB_HOME"
|
|
cd "$KEBAB_HOME"
|
|
|
|
# Second paragraph: ingest
|
|
echo "ingesting workspace..."
|
|
kebab ingest --config /etc/kebab/config.toml
|
|
|
|
# Third paragraph: report
|
|
echo "done"
|
|
kebab schema --json | jq '.stats'
|