kebab

Author	SHA1	Message	Date
altair823	192da45dbf	chore(rag): address PR #167 review round 1 - New regression pin `parse_decompose_response_drops_partial_empty_keeps_valid`: `["", "valid q", " "]` → `["valid q"]` (pins the trim+filter chain behavior). - Added a doc comment next to `stop: Vec::new()` in `multi_hop_decompose` to state the intent (an instruction-following model is expected, and refusing with MultiHopDecomposeFailed when prose is appended is the policy). This answers the round 1 question. - Added the round 1 carry-overs to the PR-3 implementation order in the plan: 1) extract a common helper for the §4-§9 mirror between ask and ask_multi_hop, 2) replace the substitution corner case of the decompose template with format! named args. The other round 1 suggestions (mirror refactor, substitution corner case, history block helper) are recorded in the plan with PR-3 as the reasonable timing; summarized in the round 2 reply. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 06:49:21 +00:00
altair823	cf35f36f88	feat(rag): fb-41 PR-2: RagPipeline::ask_multi_hop skeleton (fixed depth=2) PR-2 of fb-41 multi-hop RAG. The three-stage decompose + retrieve + synthesize pipeline is dispatched when `opts.multi_hop=true`. The dynamic decide loop comes in PR-3. - Added the `AskOpts.multi_hop: bool` field and introduced `impl Default for AskOpts` (resolves the known limitation from HOTFIXES 2026-05-07). All 9 explicit init sites gain `multi_hop: false`; with Default in place they can migrate gradually to `..Default::default()`. - One-line dispatcher at the entry of `RagPipeline::ask` (`if opts.multi_hop { return self.ask_multi_hop(...) }`). - New method `RagPipeline::ask_multi_hop`. 1) decompose LLM call → parse a JSON array of strings, 2) retrieve with each sub-query + pool deduplicated by chunk_id, 3) score gate / no-chunks guard, 4) pack_context (helper shared with single-pass), 5) synthesize LLM call w/ MULTI_HOP_SYNTHESIZE_SYSTEM_PROMPT, 6) citation extract + Answer build. `prompt_template_version` is stamped as "rag-multi-hop-v1" so that eval `compare` separates single-pass from multi-hop. - New prompt consts: MULTI_HOP_DECOMPOSE_SYSTEM_PROMPT + MULTI_HOP_DECOMPOSE_USER_TEMPLATE + MULTI_HOP_SYNTHESIZE_SYSTEM_PROMPT + PROMPT_TEMPLATE_VERSION_MULTI_HOP + MULTI_HOP_MAX_SUB_QUERIES_DEFAULT. - New variant `kebab_core::RefusalReason::MultiHopDecomposeFailed`. Cascade: updated the exhaustive matches in kebab-store-sqlite `refusal_reason_label` and kebab-tui `ask refusal render`. - `parse_decompose_response` + `strip_markdown_json_fence` helpers: strip the markdown code fence (```json / ```) + parse a JSON array of strings + trim + drop empty + cap at MULTI_HOP_MAX_SUB_QUERIES_DEFAULT. When None is returned, the caller refuses with `MultiHopDecomposeFailed`. Tests (55 passing total, 8 new): - 6 unit (regression pins for parse_decompose_response: bare array / fence variants / garbage / cap / trim). - 2 integration: `ask_multi_hop_dispatches_and_decompose_garbage_refuses` (decompose garbage → MultiHopDecomposeFailed + exactly 1 LLM call) + `ask_with_multi_hop_false_keeps_single_pass_path` (regression pin; existing callers stay backwards-compatible automatically). 
The integration test for the happy-path multi-hop (decompose succeeds → synthesize) will be added when the ScriptedLm helper is introduced together with the PR-3 decide loop. The current `MockLanguageModel` returns a single canned response, so a 2-LLM-call sequence cannot be pinned. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 06:45:32 +00:00
altair823	ed34f2e03f	Merge pull request 'feat(eval): fb-41 multi-hop golden set + spec/plan' (#166) from feat/fb-41-multi-hop-eval-golden into main	2026-05-25 06:27:06 +00:00
altair823	624b44c46b	chore(eval): address PR #166 review round 1 - `mh-s-004`: strengthened the single-character `must_contain: ["i"]` to `["INSERT", "i 입력모드"]`. Removes the risk of trigram 0-hit and noise matches. - Changed 3 questions to English (`mh-c-005` / `mh-i-001` / `mh-s-002`) for a language mix in the fixture (12 ko + 3 en). Avoids a measurement gap when dogfooding in English. - The PR-1 paragraph of the plan was outdated (written before the kebab-eval crate had been surveyed, so it deviated from the actual PR). Stated the actual changes and the deviations from the draft. The other 2 round 1 suggestions (the hard-coded `v0.17.2` in mh-c-002, the frozen size of the 15-question / 5-per-bucket regression pin) are intentionally frozen as baseline anchors; stated in the round 2 reply. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 06:26:15 +00:00
altair823	caf690dc72	feat(eval): fb-41 multi-hop golden set + spec/plan PR-1 of fb-41 multi-hop RAG (spec: docs/superpowers/specs/2026-05-25-p9-fb-41-multi-hop-rag-design.md, plan: docs/superpowers/plans/2026-05-25-p9-fb-41-multi-hop-rag.md). First PR of an XL task: adds only the baseline measurement anchor. RAG pipeline unchanged; fixture file + parse regression pin. User decisions on 4 axes (2026-05-25): - approach: query decomposition (LLM sub-questions) - trigger: explicit `--multi-hop` flag - MVP scope: dynamic N-hop (the LLM decides depth; hybrid of decompose seed + ReAct-style decide loop) - eval: multi-hop golden set first (this PR) This PR: - New `fixtures/multi_hop_golden.yaml`. 15 questions (5 cross-doc + 5 intra-doc + 5 single-fact negative). Uses the existing `GoldenQuery` struct as-is, with no separate loader or type change. `expected_chunk_ids` is left empty as a template the curator can fill after `kebab ingest`. `must_contain` allows baseline measurement (P5-2 metric). - New regression pin `crates/kebab-eval/tests/loader.rs::loads_multi_hop_golden_fixture`: fixture parses OK + 15 questions + 5/5/5 bucket distribution + at least 1 must_contain on every question. Baseline measurement protocol (separate run; artifacts not included in the commit): 1. Run single-pass `kebab eval run --fixture multi_hop_golden.yaml` with the v0.17.2 binary 2. Capture P@5, P@10, must_contain pass rate, citation_coverage 3. After PR-3 (dynamic iter merged), rerun with the same fixture + `multi_hop=true` → compare Δ PR split in 6 stages (see plan): PR-1 (this PR, fixture only), PR-2 (RagPipeline::ask_multi_hop fixed depth=2), PR-3 (dynamic iter), PR-4 (CLI flag + wire), PR-5 (MCP + SKILL.md), PR-6 (TUI toggle + trace render). Cut v0.18.0 after the last PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 06:22:08 +00:00
altair823	1640ecf288	chore: bump version 0.17.1 → 0.17.2 v0.17.1 post-dogfood polish cut. Releases two PRs together: - PR #164: separate `[image.ocr] request_timeout_secs` knob (closes the item left open in v0.17.1). Applies the LLM pattern to the OCR adapter as a separate knob (OCR and LLM differ in cold-start pattern, so they are tuned independently). - PR #165: `heading_path` FTS5 column filter for text-only matching + raw-mode escape hatch (closes the JSON noise noted in the 2026-05-24 v0.17.0 trigram entry). lexical.rs wraps the non-raw branch result in `text : (<expr>)`; the index itself stays V007 verbatim. Opt-in is available in raw mode via `'heading_path : <token>'`. Both are additive (compatible with old configs) and need no re-ingest; only the binary is replaced. Added a v0.17.2 entry to the HANDOFF one-line summary and the post-merge decisions section. The anchor of the two HOTFIXES entries is updated from `post-v0.17.1 dogfood` → `v0.17.2`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> v0.17.2	2026-05-25 05:55:50 +00:00
altair823	90e77631a8	Merge pull request 'feat(search): heading_path FTS5 text column filter' (#165) from feat/heading-text-column-filter into main	2026-05-25 05:48:22 +00:00
altair823	fa251db48f	chore(search): address PR #165 review round 2 The MCP / agent visibility paragraph of the HOTFIXES entry contradicted the round 1 decision to add a SKILL.md note (it wrongly stated `no separate SKILL.md update needed`). Stated the update and the fact that the new escape hatch is built on top of the v0.17.0 raw mode pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 05:45:41 +00:00
altair823	3114c31841	chore(search): address PR #165 review round 1 - Corrected the HOTFIXES test count: the arithmetic ambiguity of `9 new / updated unit tests` is now stated as `9 unit tests (8 updated + 1 new) + 2 new integration tests = 11 total`. - Added one bullet on column scoping + the heading_path raw-mode escape hatch to the search section of SKILL.md (Claude Code integration). Reflects the round 1 follow-up suggestion, so that an agent intending a heading search can discover the new escape hatch `'heading_path : <token>'`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 05:44:21 +00:00
altair823	271329efbd	feat(search): heading_path FTS5 text column filter (default text-only matching) Closes the heading_path_json JSON noise (HOTFIXES 2026-05-24) that the v0.17.0 trigram tokenizer entry left unfixed. The problem: trigram indexes the chunks_fts.heading_path column (the V002/V007 triggers INSERT chunks.heading_path_json as-is) as 3-grams, including the JSON notation and the path segments inside it (app, src), so queries hit false positives by chance. Adopted a column filter: heading indexing is kept (V007 verbatim unchanged) and only the match target is restricted to the text column. - build_match_string wraps the combined expression in `text : (<expr>)` in the non-raw branch. FTS5 column filter syntax allows OR/AND sub-expressions. - Raw mode (`'...'`) is unchanged, so users can explicitly opt in with something like `'heading_path : agent'` (escape hatch). - Updated the expected strings of 8 existing build_match_string unit tests + new `build_match_string_raw_mode_preserves_heading_filter`. - New regression pin `lexical_heading_only_token_does_not_hit_default_mode` (a heading-only unique token gets 0 hits in default mode). - New `lexical_raw_mode_can_opt_into_heading_path_filter`: confirms the same fixture hits in raw mode (pins the escape hatch behavior). User impact: body-text precision ↑ for lexical / hybrid search. No recall change (body-text token matching is the same). No re-ingest needed (only query-time FTS matching changes). lexical_snapshot_run_1 + hybrid_snapshot need no fixture regeneration either (the queries match body text, so BM25 is identical). HOTFIXES: marked the `heading_path_json` noise item of the 2026-05-24 v0.17.0 entry as closed + added a new 2026-05-25 post-v0.17.1 dogfood entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 05:40:51 +00:00
altair823	f2867540d2	Merge pull request 'feat(ocr): request_timeout_secs config knob' (#164) from feat/ocr-timeout-config into main	2026-05-25 05:14:27 +00:00
altair823	e118844256	chore(ocr): address PR #164 review round 1 - Changed the HOTFIXES header `v0.17.2` (vaporware) → `post-v0.17.1 dogfood`, an accurate anchor regardless of the release tag decision. - Corrected the HOTFIXES caller count `6 (5+3)` → `9 call site (6+3)`. - Strengthened the edge case in the OcrCfg.request_timeout_secs doc with the same concrete examples as the LlmCfg sister doc (`u64::MAX`, `86400`) + a comment naming reqwest 0.12.x. - Extracted the legacy TOML fixture shared by LLM + OCR (78 nearly identical lines) into a module-level `LEGACY_PRE_TIMEOUT_TOML` const. The two tests share the same source → if the old schema changes again, only one place needs editing. The reqwest::Duration::ZERO fact-check (round 1 point 5) will be reported with verification results in the round 2 reply. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 05:13:09 +00:00
altair823	41c5edc517	feat(image.ocr): request_timeout_secs config knob + closure of the v0.17.1 open items Applies to the OCR adapter the same pattern by which v0.17.1 (PR #162) turned the LLM-side hard-coded 300s into [models.llm] request_timeout_secs. By user decision it is a separate knob ([image.ocr] request_timeout_secs): OCR has a different cold-start pattern from the LLM, so independent tuning is convenient. - OcrCfg.request_timeout_secs: u64 (serde default 300) - KEBAB_IMAGE_OCR_REQUEST_TIMEOUT_SECS env override - Added a timeout argument to the OllamaVisionOcr::build / from_parts signatures - Removed the REQUEST_TIMEOUT constant - 3 new unit tests (default / env / legacy parse), following the LlmCfg pattern - Closed both open items of the HOTFIXES 2026-05-25 v0.17.1 entry (OCR timeout = this PR, --stream docs = already done in PR #163) No impact on existing configs / old KBs: the new field is filled with its default and behavior is the same (300s). If the vision model's cold start is long, it can be raised via env or config. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 05:06:53 +00:00
altair823	d02149c010	docs(v0.17.1): HANDOFF + INDEX: v0.17.1 cut sync - HANDOFF one-line summary v0.17.0 → v0.17.1 + added the release URL (v0.17.1 cut: notes that PR #162 + #163 ship as one bundle). - Post-merge deviations section: added the 2026-05-25 v0.17.1 entry. - Added a 'v0.17.1 post-dogfood polish' subsection at the bottom of the INDEX P10 Dogfooding Feedback section (one line each for PR #162, #163). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 03:35:58 +00:00
altair823	0c69b9621b	chore: bump version 0.17.0 → 0.17.1 v0.17.1 patch release, after merging the two v0.17.0 post-dogfood follow-up PRs. - PR #162: [models.llm] request_timeout_secs config + recommended model guide - PR #163: guide to installing ollama without sudo + recommended kebab ask --stream UX Both are additive only (config field) + docs only: no wire breaking change, no impact on existing users. Patch bump. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> v0.17.1	2026-05-25 03:34:12 +00:00
altair823	0d69d85757	Merge pull request 'docs: install ollama without sudo + recommend ask --stream (v0.17.0 post-dogfood)' (#163) from docs/ollama-install-and-stream into main Reviewed-on: #163	2026-05-25 03:26:24 +00:00
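altair823	a67300317b	docs(ollama): guide to installing without sudo + recommend ask --stream (v0.17.0 post-dogfood) Moves two patterns used in extended dogfooding into README + SMOKE. (1) Installing ollama in an isolated directory without sudo / systemd: download the tarball, unpack it into a user directory such as /opt/ollama/{bin,models,logs}, and separate the model location with the OLLAMA_MODELS env. Useful in environments with restricted root privileges such as containers / WSL2 / company machines. The same pattern was verified on the dogfooding machine with /build/cache/ollama. (2) For models with a long cold start (8B+ or the first call), `kebab ask --stream` is recommended: even with the same inference time, progressive tokens surface quickly within the 5-minute timeout limit. States the p9-fb-33 streaming path as a UX improvement recommendation. No code changes, docs only. README + SMOKE both get the same pattern as a sub-bullet + bash snippet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 03:23:35 +00:00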
altair823	abb05ebc23	Merge pull request 'feat: [models.llm] request_timeout_secs config + recommended model guide' (#162) from feat/llm-timeout-config into main Reviewed-on: #162	2026-05-25 03:21:19 +00:00
altair823	26fdc4f344	docs(llm-timeout): document the 0-as-disable pitfall + HOTFIXES typo + terminology cleanup Addresses the PR #162 worker review. - MEDIUM (W2) + LOW (W1): request_timeout_secs = 0 does not mean "disabled" in reqwest but an instant timeout (every request fails immediately). Stated in three places (LlmCfg field rustdoc + ollama.rs module-level comment + README) + recommends a large finite value such as u64::MAX / 86400. - NIT (W1): typo '답변이 인 5분' in the HOTFIXES 2026-05-25 entry → '답변이 5분' (1 character removed). - NIT (W1): unified the internal jargon '확장 도그푸딩' (extended dogfooding) in README + HOTFIXES to '후속 도그푸딩' (follow-up dogfooding). No change in code behavior, doc only. cargo test request_timeout 3 PASS. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 03:14:41 +00:00
altair823	3f5e0e6e90	feat(llm): [models.llm] request_timeout_secs config + recommended model guide Bundles into one PR two findings from the v0.17.0 extended dogfooding (2026-05-25). (1) Pulled the hard-coded 300s timeout of llm.generate_stream out into a config knob. 8B+ models (gemma4:e4b etc.) could not finish the first RAG answer within 5 minutes in CPU-only environments and failed with `error: kb-rag: llm.generate_stream`. - Additive field request_timeout_secs: u64 on kebab-config::LlmCfg (#[serde(default = "default_llm_request_timeout_secs")] default 300). Old configs missing the key still parse and behave the same. - env override KEBAB_MODELS_LLM_REQUEST_TIMEOUT_SECS. - Removed the REQUEST_TIMEOUT constant in kebab-llm-local::ollama.rs → OllamaLanguageModel::new builds the reqwest client with Duration::from_secs(llm.request_timeout_secs). Doc comment updated accordingly. - 3 new unit tests: default 300 pin / env override / legacy config (field missing) backward-compat. (2) docs: one paragraph in the README prerequisites section + the docs/SMOKE.md ollama notes: CPU only / RAM ≤ 16 GB environments ⇒ ≤ 4B Q4 models recommended (gemma3:4b / qwen2.5:3b / phi3:mini). Advance notice of the timeout pattern when trying 8B+. How to use the request_timeout_secs knob. HOTFIXES 2026-05-25 entry: records the two changes above + open items (the same hard-coded 300s in kebab-parse-image OCR is listed as an out-of-scope follow-up + a follow-up to emphasize the ask --stream recommendation). workspace cargo test -j 1 + clippy pass. The code change is backwards-compat (additive serde field), so there is no impact on existing users. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 03:01:03 +00:00
altair823	578a60e3bb	docs(v0.17.0): HANDOFF: version + PR-A/B/C closure entries (R1) - One-line summary v0.16.1 → v0.17.0 + release notes URL + one-line summary of PR-A/B/C. - Post-merge deviations section: added the 2026-05-24 closure entries for PR-B / PR-C in addition to PR-A. - P10 round 2 follow-up line under 'Next task candidates': all three items marked as closed in v0.17.0. - The chunk_breakdown + C typedef items in the 'P10 dogfooding backlog' are also ✅ closed in v0.17.0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 20:55:47 +00:00
altair823	64f518e08e	docs(v0.17.0): HANDOFF + INDEX: v0.17.0 cut sync (R1) - HANDOFF one-line summary v0.16.1 → v0.17.0, release notes URL, one-line summary of the three PRs A/B/C. Added PR-B / PR-C closure entries to the post-merge deviations section. Marked the three items under "Next task candidates" + "P10 backlog" as ✅ closed in v0.17.0. - New "P10 Dogfooding Feedback (v0.17.0)" subsection at the bottom of the P10 section of INDEX, listing the 3 items PR-A/B/C (format recommended in Gemini round 2). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 20:54:39 +00:00
altair823	fa9f91ead4	chore: bump version 0.16.1 → 0.17.0 v0.17.0 release cut, after merging three PRs: PR-A (Korean trigram FTS tokenizer + lexical builder + hint surface) + PR-B (C typedef alias unit + parser_version cascade + orphan purge) + PR-C (code_lang_chunk_breakdown additive wire field). Breaking changes: - V007 migration (chunks_fts unicode61 → trigram): chunk originals / embeddings / vectors unchanged, FTS shadow backfilled automatically. V007 is applied immediately on the user's next open (no re-ingest needed). The kebab.sqlite file size grows ~2-5x or by several hundred MB. - English lexical search changes to substring matching (token → also hits tokenization/tokenizer; recall ↑ / word boundary ↓). - C parser_version code-c-v1 → code-c-v2 (typedef alias extraction cascade). Old doc/chunks/vectors of the same file are cleaned up automatically by the same-workspace_path orphan purge. Additive (backwards-compat): - SearchResponse.hint additive field: guidance when the query is trigram-incompatible, e.g. a 2-character Korean query. - schema.v1.stats.code_lang_chunk_breakdown additive field: per-language distribution at chunk granularity. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> v0.17.0	2026-05-24 20:52:14 +00:00
altair823	9ee89c2a94	Merge pull request 'feat: v0.17.0 PR-C - code_lang_chunk_breakdown additive wire field' (#161) from feat/code-lang-chunk-breakdown into main Reviewed-on: #161	2026-05-24 20:35:28 +00:00
altair823	13a3361ba2	docs(v0.17.0/PR-C): rustdoc: state that code_lang_breakdown / repo_breakdown are actually doc counts (addresses PR #161 worker review MEDIUM) The JSON schema description was corrected from 'code chunk count' → 'doc count' in the PR-C body, but the rustdoc on the Rust struct fields carried the same misstatement: Gemini round 2 looked only at the JSON schema and missed the rustdoc. Both workers had the same finding (MEDIUM). No implementation change: the meaning was doc count consistently from the start. Only the wording is aligned. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 20:35:01 +00:00
altair823	0def913abd	feat(v0.17.0/PR-C): code_lang_chunk_breakdown additive wire field closure of HOTFIXES 2026-05-22 "code_lang_breakdown chunk granularity" LOW. Chunk-level companion of the existing doc-count metric. - crates/kebab-store-sqlite/src/store.rs: code_lang_chunk_breakdown() method. chunks INNER JOIN documents → COUNT(c.chunk_id) GROUP BY metadata_json.code_lang, NULL skipped. BTreeMap<String, u32>. + lib unit test code_lang_chunk_breakdown_counts_chunks_not_docs (1 rust doc + 3 chunks → rust=3 chunks vs rust=1 doc). - crates/kebab-app/src/schema.rs: Stats.code_lang_chunk_breakdown additive field + collect_stats builder. stats_includes_code_lang_and_repo_breakdown_fields in tests_stats_ext also verifies the new field. - docs/wire-schema/v1/schema.schema.json: spec for the new additive field + corrected the description of the existing code_lang_breakdown / repo_breakdown ("code chunk count" → "doc count", recommended in Gemini round 2). - tasks/HOTFIXES.md: 2026-05-24 PR-C closure entry. wire additive, no schema_version bump needed. Compatible with v0.16.x callers. cargo test --workspace --no-fail-fast -j 1 + clippy pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 20:35:01 +00:00
altair823	ff9d5f5f86	Merge pull request 'feat: v0.17.0 PR-B - C typedef-wrapped struct/enum/union → typedef alias unit' (#160) from feat/c-typedef-struct-unit into main Reviewed-on: #160	2026-05-24 20:33:15 +00:00
altair823	70a5068c0d	docs(v0.17.0/PR-B/B2): HOTFIXES 2026-05-24 closure + p10-1d Risks update - tasks/HOTFIXES.md: new 2026-05-24 PR-B closure entry covering the extractor's type_definition branch, PARSER_VERSION bump, same-workspace_path orphan purge, user impact, and the remaining nested typedef Risks. - tasks/HOTFIXES.md: updated Status / Next step of the existing 2026-05-21 typedef item to v0.17.0 closure wording (the observation record stays frozen). - tasks/p10/p10-1d-c-cpp-ast-chunker.md: updated the typedef idiom line in Risks to closure ✅ + a note on the remaining nested typedef. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 20:32:36 +00:00
altair823	93ddece111	feat(v0.17.0/PR-B/B1): C typedef extractor + parser_version bump + orphan purge cascade closure of HOTFIXES 2026-05-21. C typedef-wrapped anonymous struct/enum/union now emit a symbol unit under the typedef alias name. - crates/kebab-parse-code/src/c.rs: added a type_definition branch. Detects an inner anonymous struct_specifier / enum_specifier / union_specifier → recursively extracts the type_identifier of the declarator field → synthetic unit (typedef alias). Named inner aggregates / plain aliases remain glue as before. PARSER_VERSION code-c-v1 → code-c-v2. Added recover_typedef_alias + extract_typedef_alias_name helpers. - crates/kebab-store-sqlite/src/store.rs: two new helpers (doc-id-based orphan purge for the parser_version bump cascade). - stale_chunk_ids_for_workspace_path_except_doc_id(workspace_path, keep_doc_id): sister of stale_chunk_ids_at, doc_id-based. - purge_document_at_workspace_path_except_doc_id(workspace_path, keep_doc_id): CASCADE removal of document/chunks, assets preserved. keep_doc_id="" is used for "remove all docs". - crates/kebab-app/src/lib.rs: the parser_mismatch branch of try_skip_unchanged calls purge_workspace_path_for_parser_bump. The helper accesses app.vector() lazily + delete_by_chunk_ids + removes the SQLite document row. Cleanup finishes before Ok(None) is returned, which avoids an idx_docs_workspace_path UNIQUE conflict on the caller's new INSERT. - tests: - 4 new c.rs unit tests: typedef_struct_emits_unit / typedef_enum_emits_unit / typedef_union_emits_unit / typedef_to_existing_type_stays_glue (negative). - tier1_c_ingest_searchable: parser_version assertion code-c-v1 → code-c-v2. - Regression: the existing purge_orphan_at_workspace_path + purge_vector_orphans_for_workspace_path of the bytes-edit path (asset_id change) are unchanged; they coexist with the new branch and all existing tests PASS. Unresolved (Risks): the inner anonymous struct of a nested typedef (typedef struct { struct {...} inner; } Outer;) is still glue; the first scope of v2 is top-level typedef aliases only. cargo test --workspace --no-fail-fast -j 1 + clippy pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 20:30:57 +00:00
altair823	67559fb3ce	Merge pull request 'feat: v0.17.0 Korean trigram FTS tokenizer + lexical builder + hint surface' (#159) from feat/korean-trigram-tokenizer into main Reviewed-on: #159	2026-05-24 20:29:00 +00:00
altair823	d79e432916	test(v0.17.0/A5): CLI hint surface e2e coverage (worker-1 nit) Addresses the LOW readability nit from the PR #159 worker-1 review: there was no integration test for the CLI stderr [hint] line + --json hint shape. - search_plain_emits_short_query_hint_to_stderr: empty KB + 2-character query → confirms stderr contains "[hint]" + "3자 이상". - search_json_emits_hint_field_for_short_query: same input with --json → search_response.v1.hint field is set + matches the standard advisory string. - search_json_omits_hint_field_when_query_is_long_enough: 3-character query → hint field absent (the additive serializer omits None). wire_search_response 5 → 8 PASS. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 12:45:11 +00:00
altair823	0ee18149e7	test(v0.17.0/A5 follow-up): trigram tokenizer downstream test fixes The trigram tokenizer changes snippet units + word boundaries + BM25 raw score distribution all at once, so 3 tests based on unicode61 assumptions regressed. - wire_search_response::search_json_truncates_with_max_tokens + search_plain_emits_truncated_hint_to_stderr: with a single doc + small max_tokens the snippet is short, so the budget loop never trips. A multi-doc fixture (5 docs) + a 30-token budget guarantees truncated=true through the hit-pop path. - fetch_integration::fetch_chunk_with_context_returns_neighbors: the 2-char tokens (A1/A3 etc.) in the fixture body got 0 hits as trigram-incompatible. Updated to apples/banana/cherry/durian/elder 5-char unique words, with the query deterministically pinned to cherry. - eval/runner::runner_per_query_snapshot_matches_fixture: BM25 raw scores shifted with the trigram token stream. Regenerated with UPDATE_SNAPSHOTS=1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 12:21:34 +00:00
altair823	8a68289499	docs(v0.17.0/A6): HANDOFF + HOTFIXES + README + SMOKE + SKILL: Korean trigram closure - HOTFIXES: new 2026-05-24 section on the impact of the v0.17.0 closure (Korean lexical 3-gram, English substring change, BM25 distribution, disk usage, heading_path JSON noise observation). Updated Status / Next step of the existing 2026-05-22 Korean lexical item to closure wording. - HANDOFF: 2026-05-24 entry in the post-merge deviations section + the existing 2026-05-22 item tidied into a closure cross-link. P10 backlog Korean tokenizer item ✅ v0.17.0 + status update of the follow-up line under "Next task candidates". - README: one line on trigram behavior + hint + disk usage in the search command row. - SMOKE: new "Korean trigram search (v0.17.0)" section with the dogfooding query sequence (충돌은 raw / 해시 충돌 multi-token / Rust 충돌은 mixed / 충돌 2-character + stderr / --json hint verification) + a note on the English substring behavior change. - SKILL.md: one line on the hint field in the search section, so that an agent surfaces it to the user in the short-query case instead of retrying the same query. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 11:54:44 +00:00
altair823	6ac7fea7b9	feat(v0.17.0/A5): trigram-aware build_match_string + SearchResponse.hint Main body of PR-A. plan Task A4 Step 1c + A5. - Redesigned lexical.rs::build_match_string: whole-phrase + token-AND, OR-combined; tokens under 3 characters are dropped; None when there are no candidates (avoids an empty MATCH). Raw single-quote mode is kept. - SearchResponse.hint additive: set by the short_query_hint helper for the case empty result + trimmed < 3 chars + non-raw. - CLI 'kebab search' prints one [hint] line to stderr (text mode). - TUI SearchState.short_query_hint + poll_worker stale-aware set + fire_search/mark_input_changed reset + shown in dynamic_status. - docs/wire-schema/v1/search_response.schema.json hint additive. - New unit tests (lexical 9 PASS, 2 existing expectations updated) + integration regression (search_korean: multi_token + mixed, 3 PASS) + BM25 snapshot regen (trigram token stream). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 11:54:25 +00:00
altair823	fe123c0c6d	test(A4): korean + english trigram matching at FTS level 3 new unit tests in tests/fts.rs §7: 1. fts_trigram_korean_3char_substring_hits: pins 5 asserts of behavior verified by Codex on sqlite 3.45.1: raw 3-character substring hit (충돌은/발생한), quoted phrase hit (\"해시 충돌\"/\"시 충\"), raw 해시충 0-hit (not present in the source text). 2. fts_trigram_korean_short_query_zero_hit_pinned: detects regressions of the 0-hit for 2-character Korean queries (충돌·키). Fails first if the trigram structure changes. 3. fts_trigram_english_substring_hits: pins the substring recall behavior change (token→tokenizer, to 0-hit). Verification: cargo test -p kebab-store-sqlite --test fts → 13/13 PASS (3 new + 10 existing). Step 1c (multi-token Korean query e.g. \"해시 충돌\") and Step 5 (lexical BM25 snapshot update) proceed after the build_match_string() redesign in Task A5. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 00:57:37 +00:00
altair823	753b1ff5e5	task(A4-step0): synthetic korean fixture for trigram tests The real Korean wiki document from dogfooding (hash-table.md, 4512 lines of mediawiki HTML, CC-BY-SA) is not committed directly because of its size and license burden. Instead, wrote a synthetic fixture that covers all the dogfooding queries (해시 충돌·충돌은·시 충·해시충·충돌). It is for verifying the exact matching behavior of the trigram tokenizer (3-character substring hit, 2-character 0-hit, raw vs quoted phrase). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 00:54:30 +00:00
altair823	8dcedc4b11	feat(p10-r2): V007 trigram migration + design §5.5 + fts diff-check Task A2 + A3 in one bundle. New migrations/V007__fts_trigram.sql: - DROP + recreate the chunks_fts shadow (tokenize = trigram). - Recreate the chunks_ai/ad/au triggers (same as V002). - Backfill INSERT from chunks: no user re-ingest needed, V007 handles it automatically. - V002 is kept as-is for historical cold-upgrade replay. design §5.5 update: - Only the tokenize of the verbatim block is replaced with trigram. - Added one prose paragraph at the top of the §5.5 body on why trigram was adopted for Korean + the trade-offs (English lexical change, BM25 distribution, disk ~2-10x, not contentless). crates/kebab-store-sqlite/tests/fts.rs: - Renamed fts_v002_matches_design_section_5_5_verbatim → fts_v007_matches_design_section_5_5_verbatim. - Changed the include_str! path of extract_migration_5_5_verbatim_block() to V007__fts_trigram.sql. Comments/assertion msg now say V007. - The V002 cold-upgrade tests (fts_v002_backfill_*) are kept as-is. Verification: cargo test -p kebab-store-sqlite --test fts → 10/10 PASS (including `fts_v007_matches_design_section_5_5_verbatim`). Reflects the Codex round 1/2 findings: the design §5.5 contentless correction and stating the reason for adopting the trigram tokenizer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 00:52:40 +00:00
altair823	8781c6112b	task(A1): builder baseline + sqlite version + snapshot locations Task A1 steps 1-3 done. Fills the baseline note slot of plan A5. Key findings: - build_match_string() (lexical.rs:177-200): trim → strip_single_quotes raw FTS verbatim / otherwise whitespace split + escape_fts5_token (\"...\" + inner doubling) + space join (implicit AND). - raw mode = single quotes '...' wrap the entire trimmed query (lexical.rs:167). - SQLite: rusqlite 0.32 + libsqlite3-sys 0.30.1 bundled (in-tree, SQLite ~3.46.x) → trigram available. - Snapshot: tests/lexical.rs::lexical_snapshot_run_1 + tests/hybrid.rs::hybrid_snapshot_run_1 (regenerate with KEBAB_UPDATE_SNAPSHOTS=1). The inline normalize_bm25_top_score is independent of the numerical values. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 00:47:24 +00:00
altair823	14197b5e02	docs(p10-round-2): HANDOFF + HOTFIXES sync for v0.17.0 follow-up Reflects the follow-up candidates from P10 dogfooding round 2 in the HANDOFF "Next task" / "P10 backlog" sections. Aligns the round 2 items in HOTFIXES (Korean lexical limitation + code_lang_breakdown + ranking deferred). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 00:43:31 +00:00
altair823	584247f1ea	spec+plan(v0.17.0): korean trigram tokenizer + dogfood fixes Follow-up to P10 dogfooding round 2 (2026-05-22). Replaces the SQLite FTS5 tokenizer unicode61 → trigram to support Korean lexical search + 2 small bug fixes (C typedef-wrapped struct not exposed, aggregation unit of code_lang_breakdown). Reflects Codex + Gemini review rounds 1/2/3: - [r1] 0-hit for 2-character Korean queries, build_match_string() breaking on multi-token queries, contentless → shadow, parser_version cascade, BM25/heading_path/disk - [r2] same-workspace_path orphan purge (so the parser bump cascade actually works), trigram test examples verified on sqlite 3.45.1, recommended builder design (whole phrase OR) - [r3] SMOKE scenario correction, preventing stale TUI hints, search_response.v1 hint field, new purge helpers, unified single-quote raw mode, fixture introduction PR layout: PR-A (trigram + builder + guidance), PR-B (C typedef + orphan purge), PR-C (stats + wire). v0.17.0 release cut after all three merge. design: docs/superpowers/specs/2026-05-22-korean-trigram-tokenizer-design.md plan: docs/superpowers/plans/2026-05-22-korean-trigram-tokenizer.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 00:43:31 +00:00
altair823	a0c0dca321	fix(dogfood): k8s multi-resource YAML chunk_id collision (#158) v0.16.1	2026-05-21 23:57:49 +00:00
altair823	667495ae6a	docs(dogfood): HOTFIXES entry for k8s multi-resource chunk_id collision PR #158 code-reviewer recommendation. Records the dogfood-discovered k8s multi-resource chunk_id collision + the deliberate decision NOT to bump chunker_version (dogfood-only stage, single-resource k8s chunk_id shift is benign churn). Cross-link added to p10-2 spec Risks/notes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 23:57:34 +00:00
altair823	08d72a12e0	chore: bump version 0.16.0 → 0.16.1 (k8s multi-resource chunk_id fix) Patch bump — bug fix only (P10 dogfood-discovered k8s multi-resource chunk_id collision). New binary needed to resume dogfooding. No wire schema change, no DB migration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 23:54:33 +00:00
altair823	1969c8e3b5	fix(dogfood): k8s multi-resource YAML chunk_id collision P10 dogfooding found that a k8s manifest with 2+ documents (e.g. Deployment + Service in one file) fails to ingest: UNIQUE constraint failed: chunks.chunk_id Root cause: tier2_shared::push_chunks_with_oversize's non-oversize branch hardcoded split_key = None. K8sManifestResourceV1Chunker calls it once per resource; with split_key None every resource from the same document gets the same id_hash (= base_policy_hash) → identical chunk_id. p10-3's code_text_paragraph_v1 had the same bug (fixed in `df3c5b8`) but it calls build_chunk_no_symbol directly — the push_chunks_with_oversize path was never fixed. Fix: push_chunks_with_oversize gains a base_split_key parameter for the non-oversize single-chunk case. k8s chunker passes Some(resource.line_start) so each resource gets a distinct chunk_id; dockerfile / manifest pass None (1 chunk per file — no sibling collision, chunk_id stays stable). Regression coverage: k8s_multi_doc_emits_one_chunk_per_resource now asserts chunk_id distinctness; new integration test tier2_k8s_multi_resource_yaml_ingests_without_collision ingests a real 2-document YAML end-to-end. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 23:49:37 +00:00
altair823	c6207d196e	chore(p10-1d-followup): reviewer nit cleanup - C extractor tests + HOTFIXES + cpp snapshot (#157)	2026-05-21 22:47:38 +00:00
altair823	840c6c40a6	test(p10-1d-followup): cpp snapshot exercises actual CppAstExtractor Reviewer nit #3: the hand-built fixed_doc() only verified chunker 1:1 mapping. New tests invoke CppAstExtractor against tests/fixtures/sample.cpp and snapshot the real extractor → chunker pipeline (14 blocks emitted covering namespace::chunk::Class, ctor/dtor/operator/template/free-fn convention, glue <top-level> blocks between units). Adds kebab-parse-code as a dev-dep of kebab-chunk (same precedent as kebab-parse-md). Both the existing hand-built test AND the new extractor-driven tests are kept — the former for fast chunker-only validation, the latter for end-to-end regression detection. Added tests: - code_cpp_ast_extractor_snapshot: asserts all 8 named symbol units are present - code_cpp_ast_extractor_chunks_deterministic: chunker output is stable Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 22:43:57 +00:00
altair823	b81574afa9	docs(p10-1d-followup): HOTFIXES entry — typedef-wrapped struct/enum in C falls into glue PR #156 reviewer nit #2. Documents the tension between spec body ("struct_specifier (named, top-level) → 1 unit") and the actual behavior for the C idiom `typedef struct { ... } Foo;` — the inner struct_specifier is anonymous, so the extractor falls into glue. Workaround: dogfood-driven revisit if frequent pain point emerges. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 22:40:04 +00:00
altair823	6beff35a2f	test(p10-1d-followup): add in-file unit tests to C AST extractor Mirrors the cpp.rs 15-test pattern. Covers function_definition (incl. pointer-return, static/extern/inline), struct_specifier / enum_specifier / union_specifier (named), anonymous struct/enum/union → glue, typedef-wrapped struct → glue (per spec risks note), preprocessor directives → glue, empty file → <module> post-pass, preprocessor-only → <module>, mixed fn + glue → <top-level> present, determinism (20 runs). 17 tests total. Reviewer nit #1 (PR #156 code-reviewer). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 22:39:36 +00:00
altair823	75a4207aa1	feat(p10-1d): C + C++ AST chunkers - P10 Tier 1 chunker family complete (#156) v0.16.0	2026-05-21 15:48:34 +00:00
altair823	86aa180ad7	chore: bump version 0.15.0 → 0.16.0 (p10-1d C + C++ AST chunkers) Minor bump — additive new chunker_versions code-c-ast-v1 + code-cpp-ast-v1 + new routing langs c / cpp + new tree-sitter-c / tree-sitter-cpp workspace deps. P10 Tier 1 chunker family complete. No DB migration, no wire schema major bump. Also lands the missing p10-3 try_skip_unchanged fallback-aware fix (Option B1 — 7th param) that PR #155 was supposed to ship but never made it to main (implementer reported commit SHA 2a39513 that didn't exist in the merged branch). Same commit extends tier3_fallback_cv to include c/cpp. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 15:38:00 +00:00
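
The query-building rules described across the search commits above (8781c6112b, 6ac7fea7b9, 271329efbd) can be sketched roughly as follows. This is a minimal illustration in Rust, not the actual kebab code: the function bodies, the exact operator layout of the output, and the choice to skip the token-AND group for single-token queries are assumptions made for the example. Only the names `build_match_string` and `escape_fts5_token`, the 3-character cutoff, the `None` result for empty candidates, the single-quote raw mode, and the `text : (<expr>)` wrapper come from the commit messages.

```rust
/// Quote one token for FTS5: wrap it in double quotes and double any inner quote.
fn escape_fts5_token(token: &str) -> String {
    format!("\"{}\"", token.replace('"', "\"\""))
}

/// Hypothetical sketch of a trigram-aware match-string builder.
/// Returns None when nothing in the query can produce a trigram.
fn build_match_string(query: &str) -> Option<String> {
    let trimmed = query.trim();

    // Raw mode: a query fully wrapped in single quotes is passed through verbatim,
    // which is what lets `'heading_path : agent'` opt into the heading column.
    if trimmed.len() >= 2 && trimmed.starts_with('\'') && trimmed.ends_with('\'') {
        let inner = trimmed[1..trimmed.len() - 1].trim();
        return if inner.is_empty() { None } else { Some(inner.to_string()) };
    }

    let tokens: Vec<&str> = trimmed.split_whitespace().collect();
    let mut candidates: Vec<String> = Vec::new();

    // Whole-phrase candidate: needs at least 3 characters to form one trigram.
    if trimmed.chars().count() >= 3 {
        candidates.push(escape_fts5_token(trimmed));
    }

    // Token-AND candidate (assumed to apply to multi-token queries only);
    // tokens shorter than 3 characters are dropped.
    if tokens.len() > 1 {
        let long: Vec<String> = tokens
            .iter()
            .filter(|t| t.chars().count() >= 3)
            .map(|t| escape_fts5_token(t))
            .collect();
        if !long.is_empty() {
            candidates.push(format!("({})", long.join(" AND ")));
        }
    }

    if candidates.is_empty() {
        return None; // avoids issuing an empty MATCH
    }

    // Default mode restricts matching to the `text` column via an FTS5 column filter.
    Some(format!("text : ({})", candidates.join(" OR ")))
}
```

Under these assumptions, `build_match_string("hash collision")` yields `text : ("hash collision" OR ("hash" AND "collision"))`, a 2-character query such as `ab` yields `None`, and `'heading_path : agent'` is returned without the quotes and without the `text` column wrapper.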


834 Commits