Commit Graph

709 Commits

Author SHA1 Message Date
27baec82ea fix(dogfood): auto-purge stored docs for filesystem-deleted files
Files deleted from disk (rm a.md) were leaving stale documents + chunks +
embeddings in the store, surfacing as ghost citations in search/ask.
Existing purge_orphan_at_workspace_path only handled content-changed
stale (WHERE workspace_path=? AND asset_id != ?) — file deletion has no
new asset_id.

Fix: post-walker-scan sweep. Compute (stored_paths - scanned_paths),
for each candidate check filesystem existence — only purge when the
file is TRULY missing. Scope-narrowing case (file on disk but outside
include glob) is explicitly NOT purged to protect users from accidental
data loss via config edits.

Adds:
- DocumentStore::all_workspace_paths trait method + SqliteStore impl
- purge_deleted_workspace_path in store-sqlite (returns chunk_ids for
  vector delete; deletes doc CASCADE + asset row + copied storage file)
- sweep_deleted_files in kebab-app::ingest path; called once per ingest
  before the per-asset loop
- IngestReport.purged_deleted_files counter (additive, serde default)
- CLI ingest summary mentions purge count when > 0
- 2 integration tests: file_deletion_auto_purge + include_scope_narrowing_does_NOT_purge

dogfood discovery (PR #142 1B + multi-root: kebab-docs + httpx + zod
+ lodash). Per user decision: only filesystem deletion auto-purges;
scope narrowing requires explicit kebab reset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 06:51:07 +00:00
acf8cf3be2 chore: bump version 0.8.3 → 0.9.0
dogfood-discovered routing additions (PR #147) land:
- .mts / .cts → MediaType::Code(typescript)
- .mdx → MediaType::Markdown

minor bump 사유: 사용자 도그푸딩 surface 확장 — 이전에 skip 되던 28+ 파일이
이제 색인됨. design §10.4 dogfooding-ready surface 확장 = minor trigger.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.9.0
2026-05-20 06:29:27 +00:00
ea5f7b22c8 Merge pull request 'feat(dogfood): route .mts/.cts → typescript + .mdx → markdown' (#147) from feat/dogfood-routing-cts-mts-mdx into main 2026-05-20 06:28:41 +00:00
5497c6e7b5 feat(dogfood): route .mts/.cts to typescript + .mdx to markdown
Dogfood (PR #142 1B + multi-root: kebab-docs + httpx + zod + lodash)
showed 28 files skipped by extension that are routable to existing
extractors:
- .mts (ESM TypeScript) / .cts (CommonJS TypeScript) — same grammar as
  .ts in tree-sitter-typescript 0.23 (LANGUAGE_TYPESCRIPT covers JSX-
  agnostic variants; LANGUAGE_TSX stays for .tsx only)
- .mdx (Markdown + JSX) — routed as MediaType::Markdown; the md parser
  folds JSX islands through as raw passthrough

Changes:
- crates/kebab-source-fs/src/media.rs: 'mts'|'cts' → Code(typescript),
  'mdx' → Markdown. +2 unit tests.
- crates/kebab-parse-code/src/lang.rs: code_lang_for_path matches mts/cts;
  module_path_for_tsjs strips .mts/.cts as well. Test cases extended.
- crates/kebab-parse-code/src/typescript.rs: doc comment on select_grammar
  refreshed to mention .mts/.cts.
- crates/kebab-parse-code/tests/lang.rs: 2 new assertions.

verify: kebab-source-fs 44 / kebab-parse-code lib 20 + lang 4 all pass; clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 06:24:21 +00:00
5a90940f1c chore: bump version 0.8.2 → 0.8.3
dogfood-discovered fix (PR #146) lands: idempotent re-ingest now correctly
returns Unchanged for twin files (identical content at different paths)
via document-centric try_skip_unchanged lookup.

patch bump 사유: advertised idempotency 의 정상 동작 복원. 새 wire / config / surface 변경 없음.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.8.3
2026-05-20 06:20:34 +00:00
4389b887f0 Merge pull request 'fix(dogfood): document-centric try_skip_unchanged for twin-file idempotency' (#146) from fix/dogfood-bug4-idempotent-twin-files into main 2026-05-20 06:16:28 +00:00
360f825f3a docs(dogfood): refresh try_skip_unchanged doc-comment to match new flow (PR #146 review)
Round 1 review found the function-level doc-comment still described the
old asset-side algorithm (item 2 asset-row checksum, item 3 id_for_doc
miss). Updated to the document-centric flow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 05:35:17 +00:00
641b92af7d fix(dogfood): document-centric try_skip_unchanged for twin-file idempotency
Identical-content files at different workspace paths share one assets row
(assets.asset_id = blake3 content hash, PRIMARY KEY). The UPSERT
`ON CONFLICT(asset_id) DO UPDATE SET workspace_path = excluded` made
twin files overwrite each other's workspace_path on every ingest, so
`get_asset_by_workspace_path(path1)` returned the OTHER twin's row (or
None) — break idempotent unchanged-detection for both files.

Fix: switch try_skip_unchanged to document-centric lookup. `documents.
workspace_path` is already UNIQUE (V001) and `id_for_doc(path, ...)`
includes path, so each twin has its own stable document row. Compare
`doc.source_asset_id` with the new asset's checksum instead of going
through the assets table.

Dogfood (multi-root: kebab-docs + httpx + zod + lodash) showed 27 of
726 docs marked Updated on every idempotent re-ingest — all 27 are
twin-file victims (empty `__init__.py` ×3, AGENTS.md ↔ CLAUDE.md
same content, duplicate logo PDFs/JPGs).

After: re-ingest reports 0 new / 0 updated / 726 unchanged.

No schema migration needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 05:27:21 +00:00
08fb743598 chore: bump version 0.8.1 → 0.8.2
dogfood-discovered fixes (PR #145) land in production:
- schema.v1.repo_breakdown 가 실제로 채워짐 (이전: 항상 빈 BTreeMap)
- workspace.include glob 가 walker 에서 enforce 됨 (이전: 완전 무시)

patch bump 사유: 둘 다 advertised surface 의 정상 동작 복원.
새 wire / config / surface 변경 없음.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.8.2
2026-05-20 05:20:48 +00:00
0a2a7ae214 Merge pull request 'fix(dogfood): schema.repo_breakdown + workspace.include walker enforcement (dogfood-discovered)' (#145) from fix/dogfood-bugs-schema-walker-incremental into main 2026-05-20 05:18:59 +00:00
803d02b68b fix(dogfood): enforce workspace.include in walker (allow-list semantics)
config.workspace.include was completely ignored by the walker — connector.rs
log_scope_include_warning literally said "handled by extractor router" but
no extractor router exists. Dogfooding (PR #142 1B + multi-root corpus
kebab-docs + httpx + zod + lodash) showed user-set include of code+md still
ingested 84 .png + 8 .pdf files.

Fix: walker treats scope.include as an allow-list — empty Vec preserves
backward-compat (all files pass), non-empty requires file path to match at
least one pattern (AND with the existing exclude rules). Removed the
misleading debug log.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 05:15:04 +00:00
4e8b84c4e0 fix(dogfood): populate schema.v1.repo_breakdown (Task 9 follow-up)
Dogfooding (PR #142 1B + multi-root corpus: kebab-docs + httpx + zod + lodash)
revealed schema.v1.repo_breakdown is always {} despite the 1A-2 Task 9
having added the code_lang_breakdown sibling. The schema.rs:171 placeholder
`BTreeMap::new()` was left in place. Mirror Task 9's code_lang_breakdown
query for the repo field — same metadata_json JSON-path pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 05:09:19 +00:00
16dc02cfa2 chore: bump version 0.8.0 → 0.8.1
dogfood-discovered code_lang/repo filter bug (PR #144) fix lands in
production. patch bump because:
- 1A-1 advertised CLI flags --code-lang / --repo were live but inert
  (SearchFilters fields propagated but never applied to retriever SQL)
- fix restores intended behavior; no new wire surface
- user has dogfooded against httpx + zod + lodash and re-validating
  needs the fixed binary

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.8.1
2026-05-20 03:35:36 +00:00
74f1b0571b Merge pull request 'fix(p10-1a-1): apply code_lang + repo filters in lexical SQL and filter_chunks (dogfood)' (#144) from fix/p10-1a-1-code-lang-repo-filter-sql into main 2026-05-20 03:34:53 +00:00
918ee6c0be fix(p10-1a-1): apply code_lang + repo filters in lexical SQL and filter_chunks (dogfood-discovered)
p10-1A-1 (PR #139) added SearchFilters.code_lang + .repo fields and the CLI
--code-lang / --repo flags propagate them correctly into SearchFilters, but
neither the lexical retriever's FTS SQL nor the shared filter_chunks helper
(used by the vector retriever) ever applied them — so a code-lang-filtered
search returned all-doc hits (markdown / pdf / code mixed).

Discovered while dogfooding p10-1B with httpx + zod + lodash clones:
`kebab search 'AsyncClient' --code-lang python --json` returned markdown
hits from httpx/docs/ first.

Fix: add IN-list filters on json_extract(d.metadata_json, '$.code_lang')
and '$.repo' to both lexical.rs and filters.rs, mirroring the existing
media filter pattern. Two regression tests added in each crate covering
the new filter behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 03:27:01 +00:00
68ada396f3 Merge pull request 'fix(p10-1b): apply round-1 lang.rs doc + tests/ test case missed in 4503b5b' (#143) from fix/p10-1b-lang-doc-test-staging-miss into main v0.8.0 2026-05-20 02:31:13 +00:00
23c4ad97b9 fix(p10-1b): apply round-1 lang.rs doc + tests/ test case missed in 4503b5b
PR #142 round-1 fix commit 4503b5b 보고에는 lang.rs 의 (a) module_path_for_python
doc comment 갱신 (tests/examples/benches 가 의도적으로 strip 안 됨 명시) 과
(b) tests/test_foo.py → tests.test_foo 단언 추가가 포함됐다고 적혔으나,
실제 commit 에는 lang.rs 변경이 staging 되지 않아 main 에 안 들어감 (review
loop round 2 이 working tree 상태만 신뢰하고 commit 검증을 안 함).

이번 PR 이 누락된 (5)+(6) 항목만 retro 적용. lang.rs +9 lines (test 1 +
doc 4 + 주석 2 + 빈줄 2). cargo test -p kebab-parse-code --lib → 20/20 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 02:28:53 +00:00
1f566b8bfa Merge pull request 'feat(p10-1B): Python + TS/JS AST chunkers — tree-sitter-{python,typescript,javascript} 코드 색인 활성화' (#142) from feat/p10-1b-py-ts-js into main 2026-05-20 02:26:24 +00:00
26562588e3 fix(p10-1b): PR review round 2 — fold TS class-method decorators into unit line range
Round 1 push-back on TS/JS class-method decorator handling was based on
an inaccurate doc comment in typescript.rs that claimed decorators are
method_definition children; tree-sitter-typescript 0.23 actually places
them as class_body preceding siblings. Round 2 correctly identified the
cross-language inconsistency with Python's decorated_definition arm.

Fix: extend unit_start backward walk in typescript.rs to also accept
'decorator' siblings (three-line change + corrected doc comment).
javascript.rs is unaffected: tree-sitter-javascript stores the decorator
as a named child INSIDE method_definition, so method_definition.start_row
already covers the decorator line without any sibling walk.

Adds three regression tests:
- class_method_decorator_folded_into_method_unit (TS): asserts @Log() is
  inside the emitted method unit code and line_start == 2.
- ts_class_decorator_folded_into_class_unit (TS): class-level @Injectable()
  folded into the class unit, line_start == 1.
- js_class_method_decorator_already_folded_by_grammar (JS): documents
  that JS already includes the decorator via grammar semantics.

verify: per-crate cargo test (20 passed) + clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 02:20:22 +00:00
4503b5b12f fix(p10-1b): PR review round 1 — 5 actionable items
(1) tasks/HOTFIXES.md: add 2026-05-20 entry for path-sanitize gap in
    module_path_for_python / _tsjs (promised in task spec line 55 but
    not landed in round 0). Bidirectional cross-link added.

(2) crates/kebab-parse-code: dedup filename_from_workspace_path /
    strip_extension / join_symbol via new pub(crate) module scaffold.rs.
    Removed 9 byte-identical fn copies across rust/python/typescript/
    javascript extractors. Pure refactor — no behavior change.

(3) crates/kebab-parse-code/tests/fixtures/sample.py: @staticmethod was
    semantically inappropriate on a module-level fn (class-method
    decorator). Changed to @no_type_check; test assertion updated.

(5)+(6) crates/kebab-parse-code/src/lang.rs: add tests/test_foo.py case
    to module_path_for_python test + doc clarifying that tests/ /
    examples/ / benches/ are intentionally not stripped.

(4) PUSH BACK — TS/JS class decorator handling is design intent of 1B
    1차 (typescript.rs:242-244 + HOTFIXES entry 2 already in place).
    No code change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 02:03:52 +00:00
44813df052 docs(p10-1b): README/HANDOFF/ARCHITECTURE/SMOKE/INDEX + HOTFIXES; chore: bump version 0.7.0 → 0.8.0
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 01:48:06 +00:00
d6bb6cfd3b test(p10-1b): per-language chunker snapshots (python/ts/js)
Mirrors code_rust_ast_snapshot pattern. In-memory CanonicalDocument build so
no kebab-parse-code dep (boundary §6.3 respected).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 01:39:17 +00:00
d53995a6d4 feat(p10-1b): code-js-ast-v1 chunker + activate JavaScript in app dispatch
Chunker: duplicate-with-substitution from code-ts-ast-v1 / code-rust-ast-v1.
Dispatch: replaces JS bail! arms with JavascriptAstExtractor + CodeJsAstV1Chunker.
Integration test javascript_file_ingests_and_searches_as_code_citation asserts
citation.lang=javascript, symbol=src/Bar.Bar.baz, code_lang=javascript.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 01:16:07 +00:00
c215034653 feat(p10-1b): tree-sitter-javascript AST extractor (JS + JSX)
Single-grammar variant of typescript.rs — JS handles .jsx via the same
LanguageFn. No interface/type/enum arms; otherwise identical mapping +
workspace-path prefix via module_path_for_tsjs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 01:09:22 +00:00
31245a4328 fix(p10-1b): TS parser_version code-typescript-v1 → code-ts-v1 (naming consistency)
Task H implementer chose code-typescript-v1 but plan + design §3.3 use the
short form (chunker is code-ts-ast-v1 / code-js-ast-v1). Aligning parser
versions to match: rust=code-rust-v1 / python=code-python-v1 / ts=code-ts-v1
/ js=code-js-v1 (Task K). Fixes 2 sites: const PARSER_VERSION + integration
test assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 01:05:17 +00:00
acb61b6830 feat(p10-1b): activate TypeScript in ingest_one_code_asset dispatch
Replaces TS bail! arms with TypescriptAstExtractor + CodeTsAstV1Chunker.
Adds typescript_file_ingests_and_searches_as_code_citation integration test —
asserts citation.lang=typescript, symbol=src/Foo.Foo.bar, code_lang=typescript.
JS arms remain bail!() (Task L).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 00:59:41 +00:00
20feb3133e feat(p10-1b): code-ts-ast-v1 chunker (1:1 + oversize split)
Duplicate of code-rust-ast-v1 / code-python-ast-v1 with language-agnostic body unchanged.
Cross-chunker policy_hash identity asserted vs md-heading-v1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 00:56:41 +00:00
de63f161ac feat(p10-1b): tree-sitter-typescript AST extractor (TS + TSX via grammar selection)
Adds `kebab_parse_code::typescript::TypescriptAstExtractor` (PARSER_VERSION
`code-typescript-v1`), mirroring the Python extractor (P10-1B Task E) and
the Rust scaffold (P10-1A-2). One `Block::Code` per top-level AST semantic
unit (free fn / class / each method / interface / type alias / enum,
recursively per nested class), each carrying `SourceSpan::Code` with the
unit's dotted symbol path prefixed by `module_path_for_tsjs`.

Grammar selection per `tree-sitter-typescript` 0.23: the workspace path's
`.tsx` extension routes to `LANGUAGE_TSX`, everything else to
`LANGUAGE_TYPESCRIPT`. The `export_statement` arm unwraps a `declaration`
field (`function_declaration` / `class_declaration` / `interface_declaration`
/ `type_alias_declaration` / `enum_declaration`) using the OUTER statement's
line range so `export ` is folded in; for `export default function () {}`
and `export default class {}` (where the inner node sits under the `value`
field as `function_expression` / `class` with no `name`), the symbol leaf
is `default`. Bare value exports / re-exports fall into glue.

Glue grouping reuses the Python post-pass: `<module>` only when the entire
group is imports + bare re-exports; demoted to `<top-level>` if the file
produced any real unit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 00:54:27 +00:00
1815091247 feat(p10-1b): activate Python in ingest_one_code_asset dispatch
Replaces Python bail! arms with PythonAstExtractor + CodePythonAstV1Chunker.
Adds python_file_ingests_and_searches_as_code_citation integration test —
asserts citation.lang=python, symbol=kebab_eval.metrics.compute_mrr,
code_lang=python. TS/JS arms remain bail!() (Tasks J/L).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 00:49:01 +00:00
6a0b340941 feat(p10-1b): code-python-ast-v1 chunker (1:1 + oversize split)
Duplicate of code-rust-ast-v1 with language-agnostic body unchanged. Cross-chunker
policy_hash identity asserted vs md-heading-v1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 00:46:17 +00:00
9664e97497 feat(p10-1b): tree-sitter-python AST extractor (PythonAstExtractor)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 00:41:35 +00:00
8bdb3e8090 refactor(p10-1b): generalize ingest_one_code_asset for multi-language dispatch
Rust path observably unchanged (verified by existing code_ingest_smoke tests).
Python/TS/JS arms bail with TODO; per-lang extractor + chunker land in subsequent tasks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 00:35:53 +00:00
dcad9ccda2 feat(p10-1b): module_path_for_python / _tsjs helpers (workspace path → module prefix)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 00:31:33 +00:00
ed0f4769b3 feat(p10-1b): route .py/.pyi/.ts/.tsx/.js/.mjs/.cjs/.jsx to MediaType::Code
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 00:30:07 +00:00
0c61758931 build(p10-1b): add tree-sitter-python/-typescript/-javascript workspace deps
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 00:28:31 +00:00
39b766ea59 docs(p10-1b): task spec + implementation plan
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 00:26:58 +00:00
7f287abacb Merge pull request 'test(eval): normalize elapsed_ms before determinism comparison (flake fix)' (#141) from fix/eval-runner-timing-flake into main 2026-05-20 00:08:40 +00:00
d715631928 test(eval): normalize elapsed_ms before determinism comparison (flake fix)
`runner_lexical_is_deterministic_per_query_payload` 가 full-suite 첫 실행에서
간헐적으로 `elapsed_ms: 0` vs `elapsed_ms: 1` 차이로 깨지는 timing flake 가
있었음 (PR #140 회차 0 의 full-suite 실행에서 관찰).

원인: per_query 전체 JSON 을 byte-identical 비교하는데 QueryResult.elapsed_ms
가 timing 기반이라 µs-scale wall-clock jitter 가 그대로 비교에 들어감. 의도는
"timing 외에 byte-identical" — 인접 snapshot test #7 은 projection 으로
timing 을 명시적으로 제외하지만 #6 은 누락.

Fix: 비교 직전 양쪽 run 의 elapsed_ms 를 0 으로 normalize. 의도 그대로
표현하고 다른 field 의 결정성 검증은 보존. 50회 반복 stress 통과 (이전:
간헐 실패).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 00:01:41 +00:00
73e5b359d8 Merge pull request 'feat(p10-1A-2): Rust AST chunker — tree-sitter-rust 코드 색인 활성화' (#140) from feat/p10-1a-2-rust-ast-chunker into main v0.7.0 2026-05-19 23:40:15 +00:00
c780aca904 fix(p10-1a-2): PR review round 2 — README wire fields + SMOKE config completeness + edge-case note + gitignore dedup
PR #140 회차 2 actionable 4건:
- README.md: `citation.kind = "code"` 행에서 wire 필드 구조 정정 — citation 안에는 `lang`, SearchHit top-level 에는 `code_lang`/`repo` (round 1 SMOKE 정정과 동일 클래스)
- docs/SMOKE.md: 격리 config 블록에 `extra_skip_globs = []` 추가 (P10 섹션의 "위 격리 config 블록 참조" 와 정합)
- crates/kebab-parse-code/src/rust.rs: comment-only 파일 → 0 blocks 동작을 module doc 에 한 줄 명시 (pdf-page-v1 의 "empty page produces no chunks" 패턴과 동일)
- .gitignore: `/target/` 제거 — `/target` (no trailing slash) 이 디렉토리 + 파일 + 심링크 모두 매칭하므로 `/target/` (dir 전용) 는 redundant

verify: `cargo check -p kebab-parse-code` clean (주석/문서 외 영향 없음).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 23:35:00 +00:00
b1d5047399 fix(p10-1a-2): PR review round 1 — doc inconsistencies + observable backfill error path
PR #140 회차 1 actionable 7건 반영:
- docs/SMOKE.md: parser_version "code-rust-ast-v1" → "code-rust-v1" (chunker_version 과 혼동); jq path .citation.code_lang → .citation.lang (wire 의 code_lang 은 SearchHit top-level)
- docs/ARCHITECTURE.md: Mermaid pcode→ptypes 잘못된 edge → pcode→core 로 정정 (kebab-parse-code Cargo.toml 실제 dep 와 일치); 디렉토리 트리에서 code-rust-ast-v1 chunker 표기 위치 kebab-parse-code → kebab-chunk 로 정정
- crates/kebab-app/src/app.rs: backfill_repo 의 .ok().flatten() 실패 silent swallow → tracing::warn 로 관측 가능, 비-abort 의도 보존
- crates/kebab-parse-code/src/rust.rs: impl_item arm 의 "function_item 만 unit 생성" 1A scope 한정 주석을 외부에서도 보이도록 arm 상단에 한 줄 추가 (내부 주석은 유지)

verify: kebab-parse-code 7/7 / kebab-app --lib 51/51 / code_ingest_smoke 3/3 green; touched-crate clippy clean (재부팅 전 검증).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 23:24:20 +00:00
80c2d31fb3 docs(p10-1a-2): README/HANDOFF/ARCHITECTURE/SMOKE/INDEX + HOTFIXES; chore: bump version 0.6.0 → 0.7.0
- README: note Rust .rs ingest active (code-rust-ast-v1), update Mermaid parse node + chunker labels, update supported formats note in Quick start and ingest command table; add code citation fields (symbol, code_lang, repo) and filter flags note
- HANDOFF: flip P10 row to note 1A-1  + 1A-2 PR open; add one-liner cross-link to HOTFIXES 2026-05-19 entries
- ARCHITECTURE: add kebab-parse-code node + edge (app → pcode, pcode → ptypes) to Mermaid graph; add directory tree entry; add code parser locked-in decision row (tree-sitter lives parser-side, design §6.3)
- SMOKE: add P10-1A-2 Rust code ingest section (ingest.code config keys, verification steps, known behaviors); add checklist item
- tasks/INDEX.md: flip p10-1A-1 to , update p10-1A-2 to 🟡 PR open
- tasks/p10/INDEX.md: same flips
- tasks/HOTFIXES.md: add two 2026-05-19 dated entries (AST_CHUNK_MAX_LINES constant vs config deviation + SourceType::Code deferred)
- tasks/p10/p10-1a-2-rust-ast-chunker.md: append two HOTFIXES cross-link lines in Risks/notes
- docs/superpowers/specs/2026-04-27-kebab-final-form-design.md §10.1: note p10-1A-2 surface activation
- Cargo.toml: version 0.6.0 → 0.7.0 (dogfooding-ready = minor bump trigger per CLAUDE.md)
- Cargo.lock: regenerated

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 22:48:11 +00:00
97e9f558f4 test(p10-1a-2): code-rust-ast-v1 chunker snapshot + full-suite gate
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 22:14:57 +00:00
da51e59081 feat(p10-1a-2): populate schema.v1 code_lang_breakdown
Add `SqliteStore::code_lang_breakdown()` that queries
`json_extract(metadata_json, '$.code_lang')`, groups by it, and
skips NULL rows — returning `BTreeMap<String, u32>`.

Wire it into `collect_stats` in `kebab-app::schema`, replacing the
`BTreeMap::new()` placeholder inserted by 1A-1.

Test: `store::tests::code_lang_breakdown_counts_by_code_lang` asserts
rust=1 and that a null-code_lang doc does NOT appear in the map.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 21:41:52 +00:00
11a0fc758f docs(p10-1a-2): note backfill invariant at search_with_opts non-trace path (Task 8 review)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 21:20:13 +00:00
b5d1fe8c1e feat(p10-1a-2): backfill SearchHit.repo from doc metadata (Task 8b)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 21:13:01 +00:00
580576c2c6 feat(p10-1a-2): wire code ingest dispatch (ingest_one_code_asset)
Add `MediaType::Code("rust")` dispatch arm in `ingest_one_asset`,
`ingest_one_code_asset` fn (faithful mirror of `ingest_one_pdf_asset`),
and `backfill_code_lang` post-processing in `App::search_uncached`.
Integration test `code_ingest_smoke.rs` verifies full pipeline:
ingest `.rs` → Citation::Code hit with lang/symbol/line_start.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 20:14:59 +00:00
808b92a6c5 feat(p10-1a-2): code-rust-ast-v1 chunker (1:1 + oversize split)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 17:40:11 +00:00
c74f8d269e chore(p10-1a-2): sync Cargo.lock for kebab-parse-code deps (Task 6 follow-up)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 17:36:54 +00:00
df85bafa7f fix(p10-1a-2): module-prefix glue symbols + crate desc + invariant hardening (Task 6 review)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 17:35:52 +00:00