feat(p10-3): code-text-paragraph-v1 chunker — paragraph + line-window fallback
Blank-line paragraph segmentation (whitespace-only lines as boundaries, blank lines themselves never in any chunk's range). Paragraphs > 80 lines split into 80-line windows with 20-line overlap (stride 60), sharing the input lang and symbol=None per spec §9.3.

tier2_shared exposes a new build_chunk_no_symbol helper so Chunk id/hash/token semantics stay identical with Tier 1/2. Extracts build_chunk_from_span as private core so build_chunk and build_chunk_no_symbol share mechanics without drift.

4 unit tests cover multi-paragraph shell (4 paragraphs, blank-line boundaries verified), 200-line oversize line-window split (chunks 1-80 / 61-140 / 121-200), empty file, and lang preservation when input is yaml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
200
crates/kebab-chunk/tests/fixtures/sample_long_paragraph.txt
vendored
Normal file
@@ -0,0 +1,200 @@
line 001
line 002
line 003
line 004
line 005
line 006
line 007
line 008
line 009
line 010
line 011
line 012
line 013
line 014
line 015
line 016
line 017
line 018
line 019
line 020
line 021
line 022
line 023
line 024
line 025
line 026
line 027
line 028
line 029
line 030
line 031
line 032
line 033
line 034
line 035
line 036
line 037
line 038
line 039
line 040
line 041
line 042
line 043
line 044
line 045
line 046
line 047
line 048
line 049
line 050
line 051
line 052
line 053
line 054
line 055
line 056
line 057
line 058
line 059
line 060
line 061
line 062
line 063
line 064
line 065
line 066
line 067
line 068
line 069
line 070
line 071
line 072
line 073
line 074
line 075
line 076
line 077
line 078
line 079
line 080
line 081
line 082
line 083
line 084
line 085
line 086
line 087
line 088
line 089
line 090
line 091
line 092
line 093
line 094
line 095
line 096
line 097
line 098
line 099
line 100
line 101
line 102
line 103
line 104
line 105
line 106
line 107
line 108
line 109
line 110
line 111
line 112
line 113
line 114
line 115
line 116
line 117
line 118
line 119
line 120
line 121
line 122
line 123
line 124
line 125
line 126
line 127
line 128
line 129
line 130
line 131
line 132
line 133
line 134
line 135
line 136
line 137
line 138
line 139
line 140
line 141
line 142
line 143
line 144
line 145
line 146
line 147
line 148
line 149
line 150
line 151
line 152
line 153
line 154
line 155
line 156
line 157
line 158
line 159
line 160
line 161
line 162
line 163
line 164
line 165
line 166
line 167
line 168
line 169
line 170
line 171
line 172
line 173
line 174
line 175
line 176
line 177
line 178
line 179
line 180
line 181
line 182
line 183
line 184
line 185
line 186
line 187
line 188
line 189
line 190
line 191
line 192
line 193
line 194
line 195
line 196
line 197
line 198
line 199
line 200
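The commit message says this 200-line fixture should split into chunks 1-80 / 61-140 / 121-200 (80-line windows, 20-line overlap, stride 60). The arithmetic can be sketched as below. This is only an illustration, not the crate's code: `line_windows` is a hypothetical name, and truncating the final window at the paragraph end is an assumption the commit does not spell out.

```rust
/// Hypothetical helper (not the crate's API): split the 1-based inclusive
/// line range [start, end] into windows of `window` lines that overlap by
/// `overlap` lines, i.e. advance by `window - overlap` lines per step.
fn line_windows(start: usize, end: usize, window: usize, overlap: usize) -> Vec<(usize, usize)> {
    assert!(window > overlap, "overlap must be smaller than the window");
    let stride = window - overlap;
    let mut out = Vec::new();
    if start > end {
        return out; // empty range, nothing to emit
    }
    let mut lo = start;
    loop {
        // Assumption: the last window is cut short at the paragraph end.
        let hi = (lo + window - 1).min(end);
        out.push((lo, hi));
        if hi == end {
            break;
        }
        lo += stride;
    }
    out
}

fn main() {
    // The fixture above: one paragraph spanning lines 1..=200.
    for (lo, hi) in line_windows(1, 200, 80, 20) {
        println!("{lo}-{hi}");
    }
}
```

With the fixture's 200 lines the windows line up exactly, so the truncation assumption only matters for paragraph lengths that do not land on a window boundary.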
15
crates/kebab-chunk/tests/fixtures/sample_shell.sh
vendored
Normal file
@@ -0,0 +1,15 @@
#!/usr/bin/env bash
set -euo pipefail

# First paragraph: env setup
export KEBAB_HOME="${KEBAB_HOME:-$HOME/.local/share/kebab}"
mkdir -p "$KEBAB_HOME"
cd "$KEBAB_HOME"

# Second paragraph: ingest
echo "ingesting workspace..."
kebab ingest --config /etc/kebab/config.toml

# Third paragraph: report
echo "done"
kebab schema --json | jq '.stats'
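The commit message describes this fixture as four paragraphs separated by blank lines, with the blank lines excluded from every chunk's range. A minimal sketch of that segmentation rule follows. `paragraph_ranges` is a hypothetical name rather than the crate's function, and 1-based inclusive ranges are assumed to match the "1-80" style used in the commit message.

```rust
/// Hypothetical helper (not the crate's API): return 1-based inclusive
/// (start, end) line ranges of paragraphs, where any whitespace-only line
/// is a boundary and is never part of a range.
fn paragraph_ranges(text: &str) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let mut start: Option<usize> = None; // first line of the open paragraph
    let mut last = 0; // number of the most recent line seen
    for (idx, line) in text.lines().enumerate() {
        let n = idx + 1;
        last = n;
        if line.trim().is_empty() {
            // A blank line closes the open paragraph, if there is one.
            if let Some(s) = start.take() {
                out.push((s, n - 1));
            }
        } else if start.is_none() {
            start = Some(n);
        }
    }
    // Close a paragraph that runs to the end of the input.
    if let Some(s) = start {
        out.push((s, last));
    }
    out
}

// Contents of the shell fixture above, reproduced as a raw string.
const SAMPLE: &str = r#"#!/usr/bin/env bash
set -euo pipefail

# First paragraph: env setup
export KEBAB_HOME="${KEBAB_HOME:-$HOME/.local/share/kebab}"
mkdir -p "$KEBAB_HOME"
cd "$KEBAB_HOME"

# Second paragraph: ingest
echo "ingesting workspace..."
kebab ingest --config /etc/kebab/config.toml

# Third paragraph: report
echo "done"
kebab schema --json | jq '.stats'
"#;

fn main() {
    for (lo, hi) in paragraph_ranges(SAMPLE) {
        println!("{lo}-{hi}");
    }
}
```

Under these assumptions the shebang block, the env setup, the ingest step, and the report step each come out as one range, and an empty input yields no ranges at all.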