From 70a5068c0dd8edaa1814cb91c5f47a63664a02bd Mon Sep 17 00:00:00 2001
From: altair823
Date: Sun, 24 May 2026 14:15:32 +0000
Subject: [PATCH] docs(v0.17.0/PR-B/B2): HOTFIXES 2026-05-24 closure + p10-1d Risks update
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- tasks/HOTFIXES.md: new 2026-05-24 PR-B closure entry covering the
  extractor's type_definition branch, the PARSER_VERSION bump, the
  same-workspace_path orphan purge, user impact, and the remaining
  nested typedef Risks.
- tasks/HOTFIXES.md: update the Status / Next step of the existing
  2026-05-21 typedef entry to reflect the v0.17.0 closure (the
  observation record stays frozen).
- tasks/p10/p10-1d-c-cpp-ast-chunker.md: update the typedef idiom line
  in Risks to closure ✅ plus a note on the remaining nested typedef case.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 tasks/HOTFIXES.md                     | 20 +++++++++++++++++---
 tasks/p10/p10-1d-c-cpp-ast-chunker.md |  2 +-
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/tasks/HOTFIXES.md b/tasks/HOTFIXES.md
index 023126c..598d0c7 100644
--- a/tasks/HOTFIXES.md
+++ b/tasks/HOTFIXES.md
@@ -64,6 +64,20 @@ Observed in multi-root dogfooding (2026-05-20): body vs test / glue c
 
 Cross-link: `tasks/p10/INDEX.md`, `migrations/V002__fts.sql`, design §5.5 / §3.5.
 
+## 2026-05-24 — v0.17.0 PR-B: C typedef-wrapped struct/enum/union emitted as a typedef alias unit (closure of 2026-05-21)
+
+Added a `type_definition` branch to `crates/kebab-parse-code/src/c.rs::extract_blocks`. When a typedef's inner `struct_specifier` / `enum_specifier` / `union_specifier` is anonymous (no name field), the extractor takes the typedef alias identifier from the declarator and emits a synthetic unit. Named inner aggregates (`typedef struct Pt { ... } P;`) and plain aliases (`typedef int MyInt;`) stay glue as before; v2's first-pass scope is limited to top-level typedef-wrapped anonymous aggregates.
+
+**parser_version cascade**: `PARSER_VERSION` bumped `code-c-v1` → `code-c-v2`. Per design §9, `doc_id = (workspace_path, asset_id, parser_version)`. Same file (asset_id unchanged) + new parser_version → new doc_id. A re-ingest would therefore INSERT the new doc_id at a workspace_path still occupied by the old doc_id, violating the `idx_docs_workspace_path` UNIQUE index.
+
+**Same-workspace_path orphan purge (B1 Step 5b)**: two new helpers in `crates/kebab-store-sqlite/src/store.rs`: `stale_chunk_ids_for_workspace_path_except_doc_id(workspace_path, keep_doc_id)` (collects the chunk_ids) and `purge_document_at_workspace_path_except_doc_id(workspace_path, keep_doc_id)` (removes the document and its chunks via CASCADE). The parser_mismatch branch of `crates/kebab-app/src/lib.rs::try_skip_unchanged` calls the `purge_workspace_path_for_parser_bump` wrapper, which first removes the LanceDB orphans of the old chunk_ids with `delete_by_chunk_ids` and then deletes the SQLite document row. `try_skip_unchanged` then returns `Ok(None)`, and the caller INSERTs under the new doc_id. The existing `purge_orphan_at_workspace_path` (asset_id-change case) is unchanged, so the bytes-changed path does not regress.
+
+**User impact**: C files in an existing v0.16.x KB are reprocessed automatically on the next ingest with the v0.17.0 binary (parser_version mismatch → cleanup → new doc). No explicit re-ingest command is needed; the next `kebab ingest` handles it. `typedef struct {...} Foo;` now surfaces in search as `Citation::Code.symbol = "Foo"`.
+
+**Unresolved (Risks)**: in a nested typedef (`typedef struct { struct {...} inner; } Outer;`), the inner anonymous struct is still glue; v2's first-pass scope is top-level typedef aliases only.
+
+Cross-link: `crates/kebab-parse-code/src/c.rs::recover_typedef_alias`, `tasks/p10/p10-1d-c-cpp-ast-chunker.md` Risks/notes section.
+
 ## 2026-05-21 — p10-2: k8s multi-resource YAML chunk_id collision
 
 **Origin**: P10 comprehensive dogfooding (`/tmp/kebab-p10-dogfood/`, 16 files). A YAML file containing 2+ k8s documents (Deployment + Service, separated by `---`) fails to ingest.
@@ -84,11 +98,11 @@ Cross-link: `tasks/p10/p10-2-tier2-resource-aware.md` Risks/notes section.
 
 **Symptom**: `typedef struct { ... } Foo;` in a `.c` file does NOT emit a struct-level unit. tree-sitter-c classifies the construct as a top-level `type_definition` with an *anonymous* inner `struct_specifier` (no `name` field), so the extractor's `struct_specifier` arm doesn't fire — the whole declaration falls into `` glue. The named typedef alias `Foo` is therefore not searchable as a symbol.
-**Status**: Consistent with spec p10-1d-c-cpp-ast-chunker.md's Risks/notes ("Anonymous union / struct … anonymous → glue"), but the spec's main body line 22 ("struct_specifier (named, top-level) → 1 unit") suggests this idiom WOULD emit. Tension noted, not yet fixed.
+**Status**: ✅ closed: resolved in v0.17.0 (2026-05-24) PR-B by adding a `type_definition` branch to the extractor. For the impact, see the 2026-05-24 PR-B entry above. The remainder of this entry is the pre-closure round-2 dogfood observation record (frozen).
 
-**Workaround**: search the struct by its field/function names, or use `--code-lang c` to broaden scope. Typedef-aliased struct names won't surface as `Citation::Code.symbol`.
+**Workaround (pre-v0.17.0)**: search the struct by its field/function names, or use `--code-lang c` to broaden scope. Typedef-aliased struct names won't surface as `Citation::Code.symbol`.
 
-**Next step**: dogfood real C code for a week+; if this turns out to be a frequent pain point (kernel-style code, libuv, etc.), revisit the extractor to detect `type_definition` → inner `struct_specifier` and emit a synthetic unit named after the typedef alias.
+**Resolution (v0.17.0)**: when the extractor meets a top-level `type_definition` node whose inner `struct_specifier` / `enum_specifier` / `union_specifier` is anonymous, it emits a synthetic unit named after the typedef alias taken from the `declarator` field. `PARSER_VERSION` bumped `code-c-v1` → `code-c-v2`. Through the design §9 cascade, the `doc_id` of the same `(workspace_path, asset_id)` is computed differently under the new parser_version, so the change ships with a same-workspace_path orphan purge helper pair (`stale_chunk_ids_for_workspace_path_except_doc_id` + `purge_document_at_workspace_path_except_doc_id`) that prevents leftover old doc/chunks rows and LanceDB orphans.
 
 Cross-link: `tasks/p10/p10-1d-c-cpp-ast-chunker.md` Risks/notes section.
 
diff --git a/tasks/p10/p10-1d-c-cpp-ast-chunker.md b/tasks/p10/p10-1d-c-cpp-ast-chunker.md
index cad3208..f60dfb6 100644
--- a/tasks/p10/p10-1d-c-cpp-ast-chunker.md
+++ b/tasks/p10/p10-1d-c-cpp-ast-chunker.md
@@ -113,7 +113,7 @@ crates/kebab-parse-code/Cargo.toml [edit] — new entries for the 2 deps above.
 - **Template specialization** (`template<> class Foo`): only the `class_specifier` name inside tree-sitter-cpp's `template_declaration` is extracted, so the symbol is just `Foo` and `` is not included. Consistent with the design's rule of ignoring generics.
 - **fn inside an `extern "C"` block**: processed as a regular fn. The outer wrapping block is glue.
 - **Anonymous union / struct** (inside a `struct { int x; }` variable): uncommon, and only named ones become units; anonymous ones are glue.
-- **typedef-wrapped struct/enum idiom** (`typedef struct { ... } Foo;`) — anonymous inner struct → glue. Named typedef alias not captured. Revisit in HOTFIXES after dogfooding. See [HOTFIXES.md 2026-05-21 entry](../HOTFIXES.md).
+- **typedef-wrapped struct/enum idiom** (`typedef struct { ... } Foo;`): ✅ resolved in v0.17.0 (2026-05-24) PR-B. The extractor's `type_definition` branch detects an inner anonymous `struct_specifier` / `enum_specifier` / `union_specifier` and emits a synthetic unit named after the declarator's typedef alias. Ships with the `PARSER_VERSION` `code-c-v1` → `code-c-v2` bump and the same-workspace_path orphan purge cascade. **Still unresolved**: in a nested typedef (`typedef struct { struct {...} inner; } Outer;`), the inner anonymous struct is still glue; v2's first-pass scope is top-level typedef aliases only. See [HOTFIXES.md 2026-05-21 entry](../HOTFIXES.md) (frozen observation) + the 2026-05-24 closure entry.
 - **Macro-heavy code** (Linux kernel, etc.): the parser does not recognize a `#define FOO(x) ...` macro as a fn, even a function-like one. It is handled as preprocessor glue, so no symbol is captured. This is intended (the parser does not perform macro expansion).
 - **`__attribute__((...))`** annotations: tree-sitter-c's attribute node is a sibling next to the declarator. Safe to ignore; it does not affect function name extraction.
 - **fixture size**: sample.c is ~30 lines (top-level fn + struct + enum + preprocessor), sample.cpp is ~50 lines (nested namespace + class + method + template + free fn). Oversize fallback needs no separate check here: 1A-2's long_section_snapshot pattern already covers it (add a dedicated fixture if needed).
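
Illustrative sketch, not part of the patch: the extractor rule described above (emit a unit only for a top-level `type_definition` whose inner `struct_specifier` / `enum_specifier` / `union_specifier` has no `name`, and name the unit after the declarator's alias) might look roughly like this. It is not the code in `c.rs`. `Node`, its fields, and `sketch_typedef_alias` are hypothetical stand-ins for tree-sitter nodes so the example runs without the tree-sitter crates, and only a bare `type_identifier` declarator is handled (a pointer declarator such as `typedef struct {..} *FooPtr;` stays glue in this sketch).

```rust
// Hypothetical, simplified stand-in for a tree-sitter-c node: only the kind,
// the `name` field of an aggregate specifier, the `type` / `declarator`
// fields of a `type_definition`, and identifier text are modelled.
struct Node {
    kind: &'static str,
    text: &'static str,
    name: Option<Box<Node>>,
    ty: Option<Box<Node>>,
    declarator: Option<Box<Node>>,
}

impl Node {
    fn leaf(kind: &'static str, text: &'static str) -> Node {
        Node { kind, text, name: None, ty: None, declarator: None }
    }

    // struct/enum/union specifier, optionally with a `name` field
    fn aggregate(kind: &'static str, name: Option<&'static str>) -> Node {
        let name = name.map(|n| Box::new(Node::leaf("type_identifier", n)));
        Node { name, ..Node::leaf(kind, "") }
    }

    // `typedef <ty> <declarator>;`
    fn typedef(ty: Node, declarator: Node) -> Node {
        Node {
            ty: Some(Box::new(ty)),
            declarator: Some(Box::new(declarator)),
            ..Node::leaf("type_definition", "")
        }
    }
}

/// Returns the alias when `node` is a typedef wrapping an anonymous
/// struct/enum/union; every other shape returns None (stays glue).
fn sketch_typedef_alias(node: &Node) -> Option<&'static str> {
    if node.kind != "type_definition" {
        return None;
    }
    let inner = node.ty.as_deref()?;
    let is_aggregate = matches!(
        inner.kind,
        "struct_specifier" | "enum_specifier" | "union_specifier"
    );
    // Named inner aggregates and plain aliases fall through, as in v1.
    if !is_aggregate || inner.name.is_some() {
        return None;
    }
    // Only a bare alias identifier is accepted in this sketch.
    let decl = node.declarator.as_deref()?;
    (decl.kind == "type_identifier").then_some(decl.text)
}

fn main() {
    // typedef struct { ... } Foo;
    let anon = Node::typedef(
        Node::aggregate("struct_specifier", None),
        Node::leaf("type_identifier", "Foo"),
    );
    // typedef struct Pt { ... } P;
    let named = Node::typedef(
        Node::aggregate("struct_specifier", Some("Pt")),
        Node::leaf("type_identifier", "P"),
    );
    // typedef int MyInt;
    let plain = Node::typedef(
        Node::leaf("primitive_type", "int"),
        Node::leaf("type_identifier", "MyInt"),
    );
    for n in [&anon, &named, &plain] {
        println!("{:?}", sketch_typedef_alias(n));
    }
}
```

Because the check only looks at the top-level node, an anonymous struct nested inside the typedef body is never visited, which matches the remaining nested-typedef risk noted above.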
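
Illustrative sketch, not part of the patch: the parser_version cascade and the purge-before-insert order can be modelled with an in-memory store. Everything here is hypothetical: `doc_id` joins the three identity fields instead of hashing them, a `HashMap` keyed by workspace_path plays the role of the `idx_docs_workspace_path` UNIQUE index, a `HashSet` of chunk_ids stands in for LanceDB, and `ingest` folds the skip check and the caller's INSERT into one function rather than mirroring `try_skip_unchanged` returning `Ok(None)`.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical identity: a real store would hash these three fields.
fn doc_id(workspace_path: &str, asset_id: &str, parser_version: &str) -> String {
    format!("{workspace_path}|{asset_id}|{parser_version}")
}

#[derive(Default)]
struct Store {
    docs: HashMap<String, Vec<String>>,  // doc_id -> chunk_ids it owns
    by_path: HashMap<String, String>,    // UNIQUE stand-in: one doc_id per path
    vectors: HashSet<String>,            // vector-store stand-in
}

impl Store {
    fn insert_document(&mut self, path: &str, id: &str, chunks: Vec<String>) -> Result<(), String> {
        // Reject a second doc_id at an occupied path, like the UNIQUE index.
        if matches!(self.by_path.get(path), Some(existing) if existing != id) {
            return Err(format!("UNIQUE constraint failed: workspace_path = {path}"));
        }
        self.vectors.extend(chunks.iter().cloned());
        self.by_path.insert(path.to_string(), id.to_string());
        self.docs.insert(id.to_string(), chunks);
        Ok(())
    }

    // Chunk ids of the document at `path`, unless that document is `keep_id`.
    fn stale_chunk_ids_except(&self, path: &str, keep_id: &str) -> Vec<String> {
        match self.by_path.get(path) {
            Some(id) if id != keep_id => self.docs[id].clone(),
            _ => Vec::new(),
        }
    }

    // Drops the document at `path` unless it is `keep_id`; its chunk list
    // goes with it, standing in for the CASCADE delete.
    fn purge_except(&mut self, path: &str, keep_id: &str) {
        if let Some(id) = self.by_path.get(path).cloned() {
            if id != keep_id {
                self.by_path.remove(path);
                self.docs.remove(&id);
            }
        }
    }

    fn delete_by_chunk_ids(&mut self, ids: &[String]) {
        for id in ids {
            self.vectors.remove(id);
        }
    }
}

// Ok(false): unchanged doc skipped. Ok(true): (re)indexed.
fn ingest(store: &mut Store, path: &str, asset_id: &str, parser_version: &str,
          chunks: Vec<String>) -> Result<bool, String> {
    let id = doc_id(path, asset_id, parser_version);
    if store.docs.contains_key(&id) {
        return Ok(false);
    }
    // New doc_id at an occupied path: collect old chunk ids, clear their
    // vectors, then drop the old row before inserting.
    let stale = store.stale_chunk_ids_except(path, &id);
    store.delete_by_chunk_ids(&stale);
    store.purge_except(path, &id);
    store.insert_document(path, &id, chunks)?;
    Ok(true)
}

fn main() {
    let mut store = Store::default();
    let path = "src/list.c";
    ingest(&mut store, path, "asset-1", "code-c-v1", vec!["v1-chunk".to_string()]).unwrap();

    // Inserting the v2 doc_id directly, with no purge, trips the UNIQUE stand-in.
    let v2_id = doc_id(path, "asset-1", "code-c-v2");
    println!("{:?}", store.insert_document(path, &v2_id, vec!["v2-chunk".to_string()]));

    // Going through ingest purges the v1 row and its vectors first.
    println!("{:?}", ingest(&mut store, path, "asset-1", "code-c-v2", vec!["v2-chunk".to_string()]));
    println!("{:?}", store.vectors);
}
```

Collecting the stale chunk_ids before the purge follows the order described in the patch: once the document row (and, via CASCADE, its chunks) is gone, the ids that `delete_by_chunk_ids` needs could no longer be read from SQLite.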