# p10-1B: Python + TS/JS AST chunkers

**Status:** 🟡 In progress

**Contract sections:** §3.3 (chunker_version `code-python-ast-v1` / `code-ts-ast-v1` / `code-js-ast-v1`), §3.4 (symbol path: Python `pkg.module.Class.method`, TS/JS `module/Class.method` / `module/default`), §3.5 (code_lang `python` / `typescript` / `javascript`), §5 (extension routing enabled), §6.1 (`kebab-parse-code/src/{python,typescript,javascript}.rs`), §6.2 (`kebab-chunk/src/code_{python,ts,js}_ast_v1.rs`), §9.1 (Tier 1 AST per-language + oversize fallback).

**Design:** [2026-05-15-kebab-code-ingest-design.md](../../docs/superpowers/specs/2026-05-15-kebab-code-ingest-design.md) §1B.

**Plan:** [2026-05-20-p10-1b-py-ts-js-ast-chunkers.md](../../docs/superpowers/plans/2026-05-20-p10-1b-py-ts-js-ast-chunkers.md).

## Goal

On top of the infrastructure laid by 1A-2 (`SourceSpan::Code`, `MediaType::Code(String)`, the `Citation::Code` mapping, the `citation_helper` arm, `backfill_code_lang` + `backfill_repo`, `schema.v1.code_lang_breakdown`, the `[ingest.code]` section, HOTFIXES), enable the extractor + chunker for three languages: **Python + TypeScript + JavaScript**. This is a single PR that matches the design §1B bundle. From the moment it merges, Python / TS / JS projects can be dogfooded as well.

## Frozen design decisions (confirmed by this task)

- **Module prefix of the symbol path = workspace path → module path conversion** (faithful to the design §3.4 examples; explicit user decision):
  - **Python**: a workspace_path such as `crates/x/src/foo/bar.py` is used as a dotted prefix after `/` and `__init__.py` handling, stripping `.py`·`.pyi`, and converting `/` → `.`. Example: `def compute_mrr()` in `kebab_eval/metrics.py` → symbol `kebab_eval.metrics.compute_mrr`. `pkg/__init__.py` is the module `pkg` itself. The conversion lives in a single function, `kebab-parse-code::lang::module_path_for_python(workspace_path)` (source of truth).
  - **TS/JS**: `src/search/retriever/Retriever.ts` → prefix `src/search/retriever/Retriever`; the `/` separator is preserved and `.ts`/`.tsx`/`.js`/`.jsx`/`.mjs`/`.cjs` is stripped.
    Example: method `search` in `src/search/retriever/Retriever.ts` → `src/search/retriever/Retriever.search`. `module/default` is used for the `export default function/class` case. The conversion is `module_path_for_tsjs(workspace_path)`.
- **Rust 1A-2 is not retrofitted**: 1A uses file-scope nesting only (no workspace prefix). The inconsistency is accepted; it is recorded in HOTFIXES 2026-05-20 and will be retrofitted only on explicit user request (requires a chunker_version bump + re-ingest cascade).
- **TypeScript grammar selection**: from the `tree-sitter-typescript` crate, `LANGUAGE_TYPESCRIPT` is used for `.ts` and `LANGUAGE_TSX` for `.tsx`. The choice is made by file extension. A single chunker, `code-ts-ast-v1`, handles both (parser_version `code-ts-v1`).
- **JavaScript grammar**: the single LanguageFn from `tree-sitter-javascript` handles `.js` / `.mjs` / `.cjs` / `.jsx` alike. No separate branching is needed.
- **Expression-level functions (arrow fn / function expression assigned to const)**: in the first pass of 1B, *only declaration-level* nodes are units (function_declaration / class_declaration / method_definition / interface_declaration / type_alias_declaration / decorated_definition, etc.). Expression-level forms such as `const foo = () => {...}` are captured as glue. Recorded in HOTFIXES 2026-05-20; a later phase will consider unwrapping function expressions inside lexical_declaration.
- **App dispatch generalization**: `ingest_one_code_asset` currently hard-codes RustAstExtractor + CodeRustAstV1Chunker. In 1B it takes `lang: &str` and dispatches on it (Rust is absorbed into the same function). The Extractor and Chunker are selected with enum/match rather than trait objects (only kebab-app changes; kebab-core and the Chunker trait are unchanged). No impact on the frozen design.
- The frozen design itself is unchanged (the symbol path examples in §3.4 already agree with this decision). One line for the 1B activation is added to §10.1 (post-merge surface).

## Acceptance criteria

- `cargo test --workspace --no-fail-fast -j 1` passes (to stay memory-conscious, run per-crate; the full-suite gate runs once, right before Task K).
- `cargo clippy --workspace --all-targets -- -D warnings` passes.
- For each of the 3 languages, ingesting its fixture (`tests/fixtures/sample.{py,ts,js}`) yields a stable chunk snapshot, and the symbol/line in `Citation::Code` matches the §3.4 convention (workspace path → module path).
- With one Python, one TS, and one JS file placed in an isolated TempDir KB, `kebab search --code-lang {python|typescript|javascript} --json` returns valid results.
- `kebab schema --json | jq .stats.code_lang_breakdown` shows counts for `python`, `typescript`, and `javascript`.
- README + HANDOFF + ARCHITECTURE + SMOKE + tasks/INDEX + tasks/p10/INDEX are updated.
- One line is added to frozen design §10.1 (1B activation).
- HOTFIXES 2026-05-20 records two deviations: (a) the Rust 1A-2 symbol path inconsistency (differs from 1B), and (b) the exclusion of expression-level functions as units.
- workspace `Cargo.toml` minor bump (0.7.0 → 0.8.0), since the dogfoodable surface expands.

## Allowed dependencies

- `kebab-parse-code` adds `tree-sitter-python`, `tree-sitter-typescript`, and `tree-sitter-javascript` (via workspace deps). The existing `kebab-core` / `anyhow` / `gix` / `tree-sitter` / `tree-sitter-rust` / `serde_json` / `time` / `tracing` are kept.
- The 3 new modules in `kebab-chunk` (`code_python_ast_v1.rs` / `code_ts_ast_v1.rs` / `code_js_ast_v1.rs`) use the same deps as the 1A-2 chunker (kebab-core + serde_json_canonicalizer + blake3 + anyhow + tracing). Importing tree-sitter is strictly forbidden.
- `kebab-app` changes, with no new crate deps.
- `kebab-source-fs`: extensions are added only; no new deps.

## Forbidden dependencies

- `kebab-chunk` must not import `tree-sitter-*` directly (the AST is parser-side).
- UI crates (cli / mcp / tui) must not import `kebab-parse-code` directly.
- `kebab-parse-code` must not import store / embed / llm / rag directly (design §8 inheritance).

## Risks / notes

- In tree-sitter-typescript, `LANGUAGE_TYPESCRIPT` and `LANGUAGE_TSX` are separate LanguageFns; choosing the wrong one makes JSX in TSX files fail to parse. The extension-based choice is made in a single function (pinned by a test).
- tree-sitter-python 의 `decorated_definition` λ…Έλ“œ 처리 β€” λ°μ½”λ ˆμ΄ν„°κ°€ wrap ν•˜λŠ” ν˜•νƒœλΌ `function_definition` / `class_definition` κ°€ child. unwrap ν•„μš” (decorator 라인은 unit_start backward extension 으둜 μžμ—°μŠ€λŸ½κ²Œ 포함됨). - Python `pkg/__init__.py` 의 module path = `pkg` 자체 (basename 제거). `module_path_for_python` κ°€ 이걸 처리. - TS/JS 의 `export default function/class` β€” name 이 없을 수 있음 (`export default function () {...}`). symbol `module/default` 둜 ν‘œκΈ° (design Β§3.4). - `module_path_for_python` / `module_path_for_tsjs` κ°€ workspace_path 의 λΉ„-ASCII / 곡백 / 특수문자 처리 ν•„μš”. 1B 1μ°¨μ—μ„œλŠ” κ·ΈλŒ€λ‘œ 전달 (sanitize μ—†μŒ); HOTFIXES 에 path-sanitize λΆ€μž¬ 기둝. - 1A-2 `ingest_one_code_asset` μΌλ°˜ν™”λ‘œ μΈν•œ dispatch μ½”λ“œ λ³€κ²½ β€” Rust κΈ°μ‘΄ λ™μž‘ byte-identical μœ μ§€λ₯Ό 톡합 ν…ŒμŠ€νŠΈλ‘œ 확인. - λ¨Έμ§€ ν›„ deviation 은 `tasks/HOTFIXES.md` 에 dated 둜그 + λ³Έ spec `Risks / notes` 에 one-line cross-link. - **[HOTFIXES 2026-05-20]** Rust 1A-2 symbol 은 file-scope nesting 만 (workspace prefix μ—†μŒ); 1B 의 Python/TypeScript/JavaScript 와 비일관 β€” retrofit 은 μ‚¬μš©μž λͺ…μ‹œ μš”μ²­ μ‹œ. μžμ„Έν•œ λ‚΄μš©: `tasks/HOTFIXES.md` (2026-05-20, "Rust 1A-2 symbol path"). - **[HOTFIXES 2026-05-20]** TypeScript/JavaScript 의 expression-level ν•¨μˆ˜ (`const foo = () => {}` λ“±) λŠ” `` glue 둜 처리됨, 독립 unit 미방좜 β€” 후속 phase μ—μ„œ `lexical_declaration` unwrap κ²€ν† . μžμ„Έν•œ λ‚΄μš©: `tasks/HOTFIXES.md` (2026-05-20, "expression-level functions"). - **[HOTFIXES 2026-05-20]** `module_path_for_python` / `module_path_for_tsjs` κ°€ path-sanitize μ•ˆ 함 (특수문자/곡백 κ·ΈλŒ€λ‘œ prefix 에 듀어감) β€” 후속 phase μ—μ„œ NFKC + μ‚¬μš©κΈˆμ§€ 문자 λ³€ν™˜ κ²€ν† . μžμ„Έν•œ λ‚΄μš©: `tasks/HOTFIXES.md` (2026-05-20, "module_path_for_python / _tsjs do not sanitize").