feat(p10-3): Tier 3 paragraph + line-window fallback chunker (shell direct + Tier 1/2 0-chunk/Err picked up automatically) #155
Summary
- `code-text-paragraph-v1` chunker enabled (design §9.3). Text is split into paragraphs on blank (whitespace-only) lines; a paragraph longer than 80 lines is further split into 80-line windows with a 20-line overlap (stride 60). `symbol = None`, and `lang` is preserved from the input.
- `.sh` / `.bash` / `.zsh` files go straight to the Tier 3 chunker through the new `"shell"` arm of `ingest_one_code_asset`. `parser_version = "none-v1"` (reuses the Tier 2 sentinel).
- When a Tier 1/2 chunker returns `Ok(empty)` or `Err`, ingest retries with Tier 3. An `Err` from extract (Tier 1 extractor failure) also takes the Tier 3 synthesize fallback. The non-k8s YAML, invalid YAML, and Rust AST failure cases from p10-2 are picked up automatically.
- On retry, `chunker_version` is swapped to `"code-text-paragraph-v1"` and `canonical.parser_version` to `"none-v1"`, so downstream stamping and the `try_skip_unchanged` key stay consistent.
- `shell` is exempt from the fallback chain: it already goes directly to Tier 3, so this prevents a doubled retry.
- Added the `tier2_shared::build_chunk_no_symbol` helper. The internal `build_chunk_from_span` helper was split out so that code is shared with `build_chunk`. Chunk id / hash / token / policy_hash semantics are identical to Tier 1/2.
- Bug fix: short paragraphs were built with `split_key: None`, so paragraphs of the same doc shared the same chunk_id and violated the SQLite UNIQUE constraint. The chunker now always passes `Some(para.line_start)`, which guarantees distinct ids.
- Docs: `--code-lang shell` + Mermaid diagram, HANDOFF (p10-3 ✅), docs/ARCHITECTURE (chunker tree + flowchart), docs/SMOKE (P10-3 walkthrough), tasks/INDEX + tasks/p10/INDEX (✅).

Test plan
- `cargo test --workspace --no-fail-fast -j 1` PASS (memory-conscious `-j 1`)
- `cargo clippy --workspace --all-targets -- -D warnings` clean
- `code_ingest_smoke`: 2 new integration tests (`tier3_shell_ingest_searchable` + `tier3_yaml_fallback_picks_up_non_k8s_yaml`), 12 + 2 = 14 integration tests
- Search results checked with `--code-lang shell` / `--code-lang yaml`

Branch
`feat/p10-3-tier3-paragraph` (head: `46f408d`). 9 commits A-I. Spec: `tasks/p10/p10-3-tier3-paragraph-fallback.md`. Plan: `docs/superpowers/plans/2026-05-21-p10-3-tier3-paragraph-fallback.md`.

🤖 Generated with Claude Code
Two new tests verify end-to-end Tier 3 wiring:

- tier3_shell_ingest_searchable: .sh file → --code-lang shell search → Citation::Code { symbol: None, lang: "shell" }, chunker_version "code-text-paragraph-v1".
- tier3_yaml_fallback_picks_up_non_k8s_yaml: docker-compose-shaped yaml (no apiVersion/kind) triggers k8s chunker's Ok(vec![]) result, fallback retries with Tier 3 → Citation::Code { symbol: None, lang: "yaml" } and chunker_version "code-text-paragraph-v1".

Also fixes a bug in CodeTextParagraphV1Chunker (Task B): short paragraphs (≤80 lines) were emitted with split_key=None, causing all paragraphs from the same document to share the same chunk_id (UNIQUE constraint violation at put_chunks). Fix: always use para.line_start as split_key so every paragraph gets a distinct id regardless of size.

Brings code_ingest_smoke to 14 tests (Tier 1: 9, Tier 2: 3, Tier 3: 2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
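
For reviewers, the splitting rule from the Summary (blank-line paragraphs, then 80-line windows with a 20-line overlap for long paragraphs) and the split_key fix can be sketched roughly as below. This is an illustration only, not the code in this PR: `Span`, `paragraphs`, and `chunk_spans` are hypothetical names, and the handling of the final partial window is an assumption that may differ from CodeTextParagraphV1Chunker.

```rust
// Hypothetical sketch; names and tail-window behaviour are assumptions.

/// A chunk candidate: 1-based, inclusive line range within one document.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    line_start: usize,
    line_end: usize,
    /// Always Some(line_start), so every span of a document gets a distinct key.
    split_key: Option<usize>,
}

const WINDOW: usize = 80; // maximum lines per chunk
const OVERLAP: usize = 20; // lines shared by consecutive windows
const STRIDE: usize = WINDOW - OVERLAP; // 60

/// Split text into paragraphs, treating whitespace-only lines as separators.
/// Returns (first_line, last_line) pairs, 1-based and inclusive.
fn paragraphs(text: &str) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let mut start: Option<usize> = None;
    let mut last = 0;
    for (i, line) in text.lines().enumerate() {
        let n = i + 1;
        if line.trim().is_empty() {
            // A blank line closes the paragraph that is currently open, if any.
            if let Some(s) = start.take() {
                out.push((s, n - 1));
            }
        } else if start.is_none() {
            start = Some(n);
        }
        last = n;
    }
    // Close a paragraph that runs to the end of the text.
    if let Some(s) = start {
        out.push((s, last));
    }
    out
}

/// Emit one span per short paragraph, or overlapping windows for long ones.
fn chunk_spans(text: &str) -> Vec<Span> {
    let mut spans = Vec::new();
    for (start, end) in paragraphs(text) {
        let len = end - start + 1;
        if len <= WINDOW {
            // Short paragraph: still keyed by its start line (never None).
            spans.push(Span { line_start: start, line_end: end, split_key: Some(start) });
            continue;
        }
        let mut w = start;
        loop {
            let w_end = (w + WINDOW - 1).min(end);
            spans.push(Span { line_start: w, line_end: w_end, split_key: Some(w) });
            if w_end == end {
                break;
            }
            w += STRIDE;
        }
    }
    spans
}

fn main() {
    // Two-line paragraph, a blank line, then a 200-line paragraph.
    let text = format!("echo a\necho b\n\n{}", vec!["echo x"; 200].join("\n"));
    for span in chunk_spans(&text) {
        println!("{:?}", span);
    }
}
```

Because paragraphs and windows within one document never start on the same line, keying each span by its start line is enough to keep the keys distinct in this sketch, which is the property the split_key fix relies on.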