Commit Graph

17 Commits

Author SHA1 Message Date
685007789a style: cargo fmt --all (round 4 ingest log feature follow-up)
Phase C4 executor 의 마지막 `fix(test): clippy + fmt fixes` commit 이
test file 부분만 fmt 적용. workspace 전체 fmt 누락 발견 → cargo fmt --all
적용. 모든 import alphabetical reorder + line wrapping 정합.

추가 untracked artifact 동시 commit:
- docs/superpowers/specs/2026-05-28-v0.20-ingest-log-spec.md (491 line, ACCEPT)
- docs/superpowers/plans/2026-05-28-v0.20-ingest-log-plan.md (616 line, ACCEPT)

workspace test: 1370 passed / 0 failed / 50 ignored, ingest_log_smoke green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 04:18:40 +00:00
7c85de065a chore: workspace-wide cleanup — clippy::pedantic baseline + auto-fix
cut PR v0.18.0 전 마지막 정리. 사용자 요청: "전체 코드베이스를 깔끔하고 알아보기 쉽게".

## Workspace lints

- `Cargo.toml` 의 `[workspace.lints.clippy]` 에 `pedantic = "warn"` (priority -1) + 의도적 allow-list 추가:
  - cast_possible_truncation / cast_possible_wrap / cast_sign_loss / cast_precision_loss — ONNX i64 / hash modular reduction 등 의도적 truncation.
  - doc_markdown / missing_errors_doc / missing_panics_doc — cosmetic doc style.
  - too_many_lines / module_name_repetitions / must_use_candidate / needless_pass_by_value / manual_let_else / items_after_statements / similar_names — informational only.
  - format_collect / match_wildcard_for_single_variants / trivially_copy_pass_by_ref / unnecessary_wraps — intentional patterns (exhaustive match, future Result variants 등).
  - default_trait_access — `Foo::default()` 가 idiomatic.
  - float_cmp — NLI / RRF score 의 explicit threshold 비교 의도.
  - struct_excessive_bools / case_sensitive_file_extension_comparisons / naive_bytecount / ignore_without_reason — domain-specific 의도.
  - format_push_string / return_self_not_must_use / match_same_arms — builder / wire-label / hot-path 패턴 보존.
  - needless_continue / used_underscore_binding / nonminimal_bool / unreadable_literal / many_single_char_names / doc_link_with_quotes / assigning_clones / collapsible_str_replace / trivial_regex / elidable_lifetime_names / range_plus_one / explicit_iter_loop / implicit_hasher / ref_option — remaining low-value style.
- 각 24 crate `Cargo.toml` 에 `[lints] workspace = true` 추가.

## Auto-fix

`cargo clippy --workspace --all-targets --fix` 적용 — 128 files changed, 552 insertions / 472 deletions. 주로:
- uninlined_format_args (~18): `format!("{}", x)` → `format!("{x}")`.
- redundant_closure_for_method_calls (~33): `.map(|x| x.foo())` → `.map(T::foo)`.
- 그 외 mechanical refactor.

## 검증

- `cargo clippy --workspace --all-targets -j 1 -- -D warnings` clean (pedantic + 모든 lint group).
- `cargo test --workspace --no-fail-fast -j 1` — **1293 tests pass + 1 pre-existing flaky fail** (`kebab-mcp::tools_call_ask_multi_hop::ask_tool_routes_multi_hop_true_to_decompose_first`, HOTFIX candidate, cleanup 무관). 회귀 0.

Wire 영향: 없음.
Behavior 영향: 없음 (mechanical refactor only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 03:01:58 +00:00
93ddece111 feat(v0.17.0/PR-B/B1): C typedef extractor + parser_version bump + orphan purge cascade
closure of HOTFIXES 2026-05-21. C typedef-wrapped anonymous
struct/enum/union 이 typedef alias 이름으로 symbol unit 방출.

- crates/kebab-parse-code/src/c.rs: type_definition 분기 추가.
  inner anonymous struct_specifier / enum_specifier / union_specifier
  탐지 → declarator field 의 type_identifier 재귀 추출 → synthetic
  unit (typedef alias). named inner aggregate / plain alias 는
  기존대로 glue. PARSER_VERSION code-c-v1 → code-c-v2.
  recover_typedef_alias + extract_typedef_alias_name helper 추가.

- crates/kebab-store-sqlite/src/store.rs: 두 helper 신규
  (parser_version bump cascade 용 doc-id 기반 orphan purge).
  - stale_chunk_ids_for_workspace_path_except_doc_id(workspace_path,
    keep_doc_id) — sister of stale_chunk_ids_at, doc_id 기반.
  - purge_document_at_workspace_path_except_doc_id(workspace_path,
    keep_doc_id) — CASCADE document/chunks 제거, assets 보존.
  keep_doc_id="" 가 "모든 doc 제거" 사용.

- crates/kebab-app/src/lib.rs: try_skip_unchanged 의 parser_mismatch
  분기에서 purge_workspace_path_for_parser_bump 호출. helper 가
  app.vector() 로 lazy 접근 + delete_by_chunk_ids + SQLite document
  row 제거. Ok(None) 반환 전 cleanup 끝나서 caller 의 새 INSERT 시
  idx_docs_workspace_path UNIQUE 충돌 회피.

- tests:
  - c.rs unit tests 4 신규 — typedef_struct_emits_unit /
    typedef_enum_emits_unit / typedef_union_emits_unit /
    typedef_to_existing_type_stays_glue (negative).
  - tier1_c_ingest_searchable: parser_version assertion code-c-v1 →
    code-c-v2.
- 회귀: bytes-edit 경로 (asset_id 변경) 의 기존 purge_orphan_at_workspace_path
  + purge_vector_orphans_for_workspace_path 는 그대로 — 신규 분기와
  공존, 기존 test 모두 PASS.

미해결 (Risks): nested typedef (typedef struct { struct {...} inner; }
Outer;) 의 inner 익명 struct 는 여전히 glue — v2 의 1차 범위는
top-level typedef alias 만.

cargo test --workspace --no-fail-fast -j 1 + clippy 통과.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 20:30:57 +00:00
1969c8e3b5 fix(dogfood): k8s multi-resource YAML chunk_id collision
P10 dogfooding found that a k8s manifest with 2+ documents (e.g.
Deployment + Service in one file) fails to ingest:
  UNIQUE constraint failed: chunks.chunk_id

Root cause: tier2_shared::push_chunks_with_oversize's non-oversize branch
hardcoded split_key = None. K8sManifestResourceV1Chunker calls it once per
resource; with split_key None every resource from the same document gets
the same id_hash (= base_policy_hash) → identical chunk_id. p10-3's
code_text_paragraph_v1 had the same bug (fixed in df3c5b8) but it calls
build_chunk_no_symbol directly — the push_chunks_with_oversize path was
never fixed.

Fix: push_chunks_with_oversize gains a base_split_key parameter for the
non-oversize single-chunk case. k8s chunker passes Some(resource.line_start)
so each resource gets a distinct chunk_id; dockerfile / manifest pass None
(1 chunk per file — no sibling collision, chunk_id stays stable).

Regression coverage: k8s_multi_doc_emits_one_chunk_per_resource now asserts
chunk_id distinctness; new integration test
tier2_k8s_multi_resource_yaml_ingests_without_collision ingests a real
2-document YAML end-to-end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 23:49:37 +00:00
192835e5bf test(p10-1d): integration smoke tests for C + C++
Verifies end-to-end ingest + search + Citation::Code shape:
- tier1_c_ingest_searchable: .c file → --code-lang c search → symbol
  = function name (no nesting), lang = "c", chunker_version = "code-c-ast-v1".
- tier1_cpp_ingest_searchable: .cpp file → --code-lang cpp search →
  symbol starts with namespace::Class prefix, lang = "cpp",
  chunker_version = "code-cpp-ast-v1".

Brings code_ingest_smoke to 18 tests (Tier 1: 9 → 11, Tier 2: 3,
Tier 3: 4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 14:31:35 +00:00
1034de25a2 fix(p10-3+p10-1d): land the missing try_skip_unchanged fallback-aware fix
PR #155 (p10-3) merged WITHOUT the reviewer's required Option B1 fix —
the implementer reported a commit SHA (2a39513) that never made it to main.
Result: every reingest of a Tier 3-fallback file (non-k8s YAML, invalid
YAML, AST extractor failure) re-runs full extract + chunk + embed because
the parser/chunker version comparison can never match (stored is
code-text-paragraph-v1 / none-v1, but caller uses Tier 1/2 dispatch
values).

This commit:
1. Adds the 7th param `fallback_chunker_version: Option<&ChunkerVersion>`
   to try_skip_unchanged + the stored_is_tier3_fallback detection branch
   (skip parser/chunker equality, keep embedder check).
2. Threads `None` through non-code call sites (md / image / pdf).
3. Code call site computes tier3_fallback_cv covering all Tier 1/2 langs
   that can fall back: rust / python / ts / js / go / java / kotlin /
   yaml / dockerfile / toml / json / xml / groovy / go-mod / c / cpp
   (p10-1D additions).
4. Adds tier3_yaml_fallback_reingest_is_unchanged + tier3_shell_reingest_is_unchanged
   regression tests (the originally-promised PR #155 regression coverage
   that also never made it to main).

Smoke tests: 14 + 2 = 16 PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 14:19:17 +00:00
df3c5b8caf test(p10-3): integration smoke tests for Tier 3 (shell + yaml fallback)
Two new tests verify end-to-end Tier 3 wiring:
- tier3_shell_ingest_searchable: .sh file → --code-lang shell search →
  Citation::Code { symbol: None, lang: "shell" }, chunker_version
  "code-text-paragraph-v1".
- tier3_yaml_fallback_picks_up_non_k8s_yaml: docker-compose-shaped yaml
  (no apiVersion/kind) triggers k8s chunker's Ok(vec![]) result, fallback
  retries with Tier 3 → Citation::Code { symbol: None, lang: "yaml" } and
  chunker_version "code-text-paragraph-v1".

Also fixes a bug in CodeTextParagraphV1Chunker (Task B): short paragraphs
(≤80 lines) were emitted with split_key=None, causing all paragraphs from the
same document to share the same chunk_id (UNIQUE constraint violation at
put_chunks). Fix: always use para.line_start as split_key so every paragraph
gets a distinct id regardless of size.

Brings code_ingest_smoke to 14 tests (Tier 1: 9, Tier 2: 3, Tier 3: 2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 11:37:44 +00:00
166e1ddfaf test(p10-2): integration smoke tests for Tier 2 (k8s yaml + Dockerfile + Cargo.toml)
Three new tests in code_ingest_smoke.rs verifying isolated-TempDir ingest +
--code-lang filter + Citation::Code.lang / .symbol shape for each Tier 2
chunker. Brings the suite to 12 tests (Rust 3 + Python 1 + TS 1 + JS 1 +
Go 1 + Java 1 + Kotlin 1 + yaml 1 + dockerfile 1 + manifest 1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 13:23:01 +00:00
ff1bedbef5 feat(p10-1c-jk): activate Kotlin in ingest_one_code_asset dispatch
Replaces Kotlin bail! arms with KotlinAstExtractor + CodeKotlinAstV1Chunker.
Adds kotlin_file_ingests_and_searches_as_code_citation integration test —
asserts citation.lang=kotlin, symbol=com.foo.Foo.bar, code_lang=kotlin.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 10:54:55 +00:00
ebc4ef2eea feat(p10-1c-jk): activate Java in ingest_one_code_asset dispatch
Replaces Java bail! arms with JavaAstExtractor + CodeJavaAstV1Chunker. Adds
java_file_ingests_and_searches_as_code_citation integration test — asserts
citation.lang=java, symbol=com.foo.Foo.bar, code_lang=java.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 10:44:05 +00:00
c19aa006d0 feat(p10-1c-go): activate Go in ingest_one_code_asset dispatch
Replaces Go bail! arms with GoAstExtractor + CodeGoAstV1Chunker. Adds
go_file_ingests_and_searches_as_code_citation integration test — asserts
citation.lang=go, symbol=chunk.ParseDoc, code_lang=go.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 09:13:47 +00:00
d53995a6d4 feat(p10-1b): code-js-ast-v1 chunker + activate JavaScript in app dispatch
Chunker: duplicate-with-substitution from code-ts-ast-v1 / code-rust-ast-v1.
Dispatch: replaces JS bail! arms with JavascriptAstExtractor + CodeJsAstV1Chunker.
Integration test javascript_file_ingests_and_searches_as_code_citation asserts
citation.lang=javascript, symbol=src/Bar.Bar.baz, code_lang=javascript.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 01:16:07 +00:00
31245a4328 fix(p10-1b): TS parser_version code-typescript-v1 → code-ts-v1 (naming consistency)
Task H implementer chose code-typescript-v1 but plan + design §3.3 use the
short form (chunker is code-ts-ast-v1 / code-js-ast-v1). Aligning parser
versions to match: rust=code-rust-v1 / python=code-python-v1 / ts=code-ts-v1
/ js=code-js-v1 (Task K). Fixes 2 sites: const PARSER_VERSION + integration
test assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 01:05:17 +00:00
acb61b6830 feat(p10-1b): activate TypeScript in ingest_one_code_asset dispatch
Replaces TS bail! arms with TypescriptAstExtractor + CodeTsAstV1Chunker.
Adds typescript_file_ingests_and_searches_as_code_citation integration test —
asserts citation.lang=typescript, symbol=src/Foo.Foo.bar, code_lang=typescript.
JS arms remain bail!() (Task L).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 00:59:41 +00:00
1815091247 feat(p10-1b): activate Python in ingest_one_code_asset dispatch
Replaces Python bail! arms with PythonAstExtractor + CodePythonAstV1Chunker.
Adds python_file_ingests_and_searches_as_code_citation integration test —
asserts citation.lang=python, symbol=kebab_eval.metrics.compute_mrr,
code_lang=python. TS/JS arms remain bail!() (Tasks J/L).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 00:49:01 +00:00
b5d1fe8c1e feat(p10-1a-2): backfill SearchHit.repo from doc metadata (Task 8b)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 21:13:01 +00:00
580576c2c6 feat(p10-1a-2): wire code ingest dispatch (ingest_one_code_asset)
Add `MediaType::Code("rust")` dispatch arm in `ingest_one_asset`,
`ingest_one_code_asset` fn (faithful mirror of `ingest_one_pdf_asset`),
and `backfill_code_lang` post-processing in `App::search_uncached`.
Integration test `code_ingest_smoke.rs` verifies full pipeline:
ingest `.rs` → Citation::Code hit with lang/symbol/line_start.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 20:14:59 +00:00