feat(p10-1A-2): Rust AST chunker — tree-sitter-rust 코드 색인 활성화 #140

Merged
altair823 merged 19 commits from feat/p10-1a-2-rust-ast-chunker into main 2026-05-19 23:40:19 +00:00

19 Commits

Author SHA1 Message Date
c780aca904 fix(p10-1a-2): PR review round 2 — README wire fields + SMOKE config completeness + edge-case note + gitignore dedup
PR #140 회차 2 actionable 4건:
- README.md: `citation.kind = "code"` 행에서 wire 필드 구조 정정 — citation 안에는 `lang`, SearchHit top-level 에는 `code_lang`/`repo` (round 1 SMOKE 정정과 동일 클래스)
- docs/SMOKE.md: 격리 config 블록에 `extra_skip_globs = []` 추가 (P10 섹션의 "위 격리 config 블록 참조" 와 정합)
- crates/kebab-parse-code/src/rust.rs: comment-only 파일 → 0 blocks 동작을 module doc 에 한 줄 명시 (pdf-page-v1 의 "empty page produces no chunks" 패턴과 동일)
- .gitignore: `/target/` 제거 — `/target` (no trailing slash) 이 디렉토리 + 파일 + 심링크 모두 매칭하므로 `/target/` (dir 전용) 는 redundant

verify: `cargo check -p kebab-parse-code` clean (주석/문서 외 영향 없음).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 23:35:00 +00:00
b1d5047399 fix(p10-1a-2): PR review round 1 — doc inconsistencies + observable backfill error path
PR #140 회차 1 actionable 7건 반영:
- docs/SMOKE.md: parser_version "code-rust-ast-v1" → "code-rust-v1" (chunker_version 과 혼동); jq path .citation.code_lang → .citation.lang (wire 의 code_lang 은 SearchHit top-level)
- docs/ARCHITECTURE.md: Mermaid pcode→ptypes 잘못된 edge → pcode→core 로 정정 (kebab-parse-code Cargo.toml 실제 dep 와 일치); 디렉토리 트리에서 code-rust-ast-v1 chunker 표기 위치 kebab-parse-code → kebab-chunk 로 정정
- crates/kebab-app/src/app.rs: backfill_repo 의 .ok().flatten() 실패 silent swallow → tracing::warn 로 관측 가능, 비-abort 의도 보존
- crates/kebab-parse-code/src/rust.rs: impl_item arm 의 "function_item 만 unit 생성" 1A scope 한정 주석을 외부에서도 보이도록 arm 상단에 한 줄 추가 (내부 주석은 유지)

verify: kebab-parse-code 7/7 / kebab-app --lib 51/51 / code_ingest_smoke 3/3 green; touched-crate clippy clean (재부팅 전 검증).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 23:24:20 +00:00
80c2d31fb3 docs(p10-1a-2): README/HANDOFF/ARCHITECTURE/SMOKE/INDEX + HOTFIXES; chore: bump version 0.6.0 → 0.7.0
- README: note Rust .rs ingest active (code-rust-ast-v1), update Mermaid parse node + chunker labels, update supported formats note in Quick start and ingest command table; add code citation fields (symbol, code_lang, repo) and filter flags note
- HANDOFF: flip P10 row to note 1A-1  + 1A-2 PR open; add one-liner cross-link to HOTFIXES 2026-05-19 entries
- ARCHITECTURE: add kebab-parse-code node + edge (app → pcode, pcode → ptypes) to Mermaid graph; add directory tree entry; add code parser locked-in decision row (tree-sitter lives parser-side, design §6.3)
- SMOKE: add P10-1A-2 Rust code ingest section (ingest.code config keys, verification steps, known behaviors); add checklist item
- tasks/INDEX.md: flip p10-1A-1 to , update p10-1A-2 to 🟡 PR open
- tasks/p10/INDEX.md: same flips
- tasks/HOTFIXES.md: add two 2026-05-19 dated entries (AST_CHUNK_MAX_LINES constant vs config deviation + SourceType::Code deferred)
- tasks/p10/p10-1a-2-rust-ast-chunker.md: append two HOTFIXES cross-link lines in Risks/notes
- docs/superpowers/specs/2026-04-27-kebab-final-form-design.md §10.1: note p10-1A-2 surface activation
- Cargo.toml: version 0.6.0 → 0.7.0 (dogfooding-ready = minor bump trigger per CLAUDE.md)
- Cargo.lock: regenerated

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 22:48:11 +00:00
97e9f558f4 test(p10-1a-2): code-rust-ast-v1 chunker snapshot + full-suite gate
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 22:14:57 +00:00
da51e59081 feat(p10-1a-2): populate schema.v1 code_lang_breakdown
Add `SqliteStore::code_lang_breakdown()` that queries
`json_extract(metadata_json, '$.code_lang')`, groups by it, and
skips NULL rows — returning `BTreeMap<String, u32>`.

Wire it into `collect_stats` in `kebab-app::schema`, replacing the
`BTreeMap::new()` placeholder inserted by 1A-1.

Test: `store::tests::code_lang_breakdown_counts_by_code_lang` asserts
rust=1 and that a null-code_lang doc does NOT appear in the map.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 21:41:52 +00:00
11a0fc758f docs(p10-1a-2): note backfill invariant at search_with_opts non-trace path (Task 8 review)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 21:20:13 +00:00
b5d1fe8c1e feat(p10-1a-2): backfill SearchHit.repo from doc metadata (Task 8b)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 21:13:01 +00:00
580576c2c6 feat(p10-1a-2): wire code ingest dispatch (ingest_one_code_asset)
Add `MediaType::Code("rust")` dispatch arm in `ingest_one_asset`,
`ingest_one_code_asset` fn (faithful mirror of `ingest_one_pdf_asset`),
and `backfill_code_lang` post-processing in `App::search_uncached`.
Integration test `code_ingest_smoke.rs` verifies full pipeline:
ingest `.rs` → Citation::Code hit with lang/symbol/line_start.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 20:14:59 +00:00
808b92a6c5 feat(p10-1a-2): code-rust-ast-v1 chunker (1:1 + oversize split)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 17:40:11 +00:00
c74f8d269e chore(p10-1a-2): sync Cargo.lock for kebab-parse-code deps (Task 6 follow-up)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 17:36:54 +00:00
df85bafa7f fix(p10-1a-2): module-prefix glue symbols + crate desc + invariant hardening (Task 6 review)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 17:35:52 +00:00
a93b33ffbe fix(p10-1a-2): correct <module> label scope + de-dup leading attribute (Task 6 review)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 17:28:49 +00:00
402a4506a2 feat(p10-1a-2): tree-sitter-rust AST extractor (parser-side)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 17:22:09 +00:00
a531dc37dc feat(p10-1a-2): route .rs files to MediaType::Code(rust)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 17:17:36 +00:00
7a6a24ad10 feat(p10-1a-2): add MediaType::Code(lang) variant
TDD: red → green cycle confirmed. New `Code(String)` variant serializes
as `{"code":"rust"}` via serde `rename_all = "lowercase"`. All exhaustive
`match` sites updated (`media_label`, `ingest_one_asset` catch-all →
explicit or-pattern). Design §3.5 enum listing synced. Also fix
`/target` symlink gitignore pattern so integration-test binary lookup
via workspace-relative path works with CARGO_TARGET_DIR redirect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 17:14:45 +00:00
42712b50c2 feat(p10-1a-2): map SourceSpan::Code -> Citation::Code in citation_helper
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 15:59:03 +00:00
9f3edb7e24 feat(p10-1a-2): add internal SourceSpan::Code variant + design §3.4 sync
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 15:52:01 +00:00
5c265bb59f build(p10-1a-2): add tree-sitter + tree-sitter-rust workspace deps
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 15:38:19 +00:00
a08ed32199 docs(p10-1a-2): task spec + implementation plan
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 15:36:08 +00:00