kebab

Author	SHA1	Message	Date
altair823	e2ae9a4589	docs(spec): arctic-embed-l-v2.0 임베더 통합 spec + plan 별칭 제거 후 설명형 recall 보강 최선책(측정 arctic recall@10 130/132, 용어 무손실). candle 다중모델화(CLS pooling+query: prefix) 우선 + Ollama embed provider 폴백. correctness 게이트 = candle≈Ollama 코사인>0.99. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 04:14:26 +00:00
altair823	fc5103642e	docs: 별칭 제거 문서 동기화 + version 0.25.0 HOTFIXES 2026-06-03 dated entry, 2026-05-30 design spec 제거 banner, HANDOFF 1줄, README(별칭 섹션/config/명령표 정리), ARCHITECTURE(결정 표 + 디렉토리 트리), SMOKE/DOGFOOD config-migrate 예시 정정. workspace version 0.24.0 → 0.25.0 (+ Cargo.lock). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 21:37:58 +00:00
altair823	ca8c0645fb	docs(spec): doc-side expansion(별칭) 제거 spec + plan 연구문서(2026-06-03) 결론 따라 별칭 기능 제거 설계. 유지(Metadata.aliases, AssetChunked/AssetTimings, embedding 캐시)/제거(Chunk.aliases, expansion.rs, ExpansionCfg, chunk_aliases_fts, run_alias_query, ExpansionProgress) 경계 + 신규 DROP 마이그레이션 + wire expansion_progress 제거 결정. 구현 plan 8 task. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 20:13:12 +00:00
altair823	c7af6612b7	docs(research): expansion 비용 재고 + 별칭 대체 딥리서치 별칭(doc-side per-chunk LLM expansion)이 ingest 임계경로 병목으로 확정된 뒤 대안 조사. 동시성(OLLAMA_NUM_PARALLEL 최대 1.28×)·모델스왑(qwen3.5 중국어/ degeneration)·백그라운드(총량·treadmill 불변) 모두 실측 소진. Step 0 측정: 별칭 없이도 cross-lingual recall@10 완벽(en/ko/syn/abbr), 약점은 설명형뿐 → 별칭 ROI 음수. Step 1: bge-m3 dense 는 lateral(설명형 +3 / 용어 -3, 순0). 4-agent 딥리서치: 잔존 = reverse-dictionary 과제, 측정-우선 계층(heading enrichment → arctic-ko 임베더 → bge-reranker-v2-m3 → near-tie 게이트 expansion). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 20:07:42 +00:00
altair823	6ec4e6809f	fix(embed-candle): address round-1 review - commit track-spec + meta-spec/plan into branch (HIGH: dangling `amends:` ref) - inline parity evidence (cosine 1.0, max_abs_diff 2.01e-7) into HOTFIXES + release notes; drop refs to deleted IMPL_REPORT/SPIKE_REPORT (MEDIUM) - model guard: reject non-e5-large `model` before the 2GB download so model_id() can't mislabel vectors (MEDIUM) + unit test - parity test now covers BOTH query: and passage: prefixes (MEDIUM) - guard encodings.first() index; document zero-attention/pooling invariant; clarify embed_batch prefixing doc (LOW) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-01 16:54:20 +00:00
altair823	6d86214060	docs(plan): config 마이그레이션 구현 계획 (TDD, 13 tasks) reconcile(additive)+step 체인(non-additive) 분리, init/migrate 공유 annotated_default_document, app facade 백업+atomic write, doctor 체크, CLI config migrate, wire config_migration.v1. bite-sized TDD steps. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 11:39:31 +00:00
altair823	6bbb8f854b	docs(spec): config 마이그레이션 설계 계약 kickoff 인계(#197)의 brainstorm 결과를 확정한 spec. 트리거=명시 명령 `kebab config migrate`+doctor 안내, 주석 보존=toml_edit 부분 편집, 메커니즘=reconciliation(additive)+step 체인(non-additive) 하이브리드. init/migrate 가 주석 달린 default 문서를 공유. 안전 3축(멱등·백업·dry-run) + atomic write. wire schema config_migration.v1 신설. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 11:34:19 +00:00
altair823	16f3d6eef2	docs: config 마이그레이션 작업 인계 kickoff config.toml 스키마 진화 시 기존 사용자 파일 자동 마이그레이션 기능의 별도 세션 인계 문서. 현황(serde default forward-compat 있음/파일 마이그레이션 없음/schema_version 장식), 핵심 난점(주석 보존), 설계 3안(전체재작성/toml_edit append/백업), 트리거(명령 vs 자동), 방법론(v0.21.0 PR #195/#196 패턴) 정리. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 11:11:08 +00:00
altair823	88c5b83dea	docs: derivation-cache spec/handoff 독자 관점 보강 PR #195 구현(`e9b5202`) 기준으로 빠졌던 디테일 보강: - chunk_id(위치 기반 벡터 식별자) vs cache_key(내용 해시 조회 키) 구분 callout - §7 호환성/마이그레이션 신설: 본문 재색인 불필요, V012 가산이나 binary 교체 필요, 별칭 sentinel 묶음→개별 변경의 기존 KB 영향(레거시 호환) - version_key 에 kind 토큰("doc\|") 반영, orphan sentinel cleanup(LIKE prefix) 명시 - embed_with_cache 순서 보존 불변, 별칭 개별 벡터 근거(희석 13/18→16/18) - 정정: derivation_cache_gc 는 메서드만 존재하고 미연결(캐시 현재 무한 누적, 후속) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 10:25:00 +00:00
altair823	a8fd76499c	feat(expansion): doc-side expansion 별칭 개별 dense 벡터 + 파생물 캐시(V012) 별칭을 줄별 개별 dense 벡터(sentinel `{chunk}#alias#N`)로 색인하고 boilerplate 청크는 별칭 생성을 skip. 묶음 1벡터 방식은 평균화로 특정 표현이 희석돼 오히려 회귀(13/18)했던 것을 폐기. 변형 일관성 14/18 → 16/18, mean_spread@10 0.222 → 0.111 (나무위키 ~1000 문서 CS corpus). `kebab-core::strip_alias_suffix` 가 suffix 형과 per-alias 형 둘 다 처리. 파생물 캐시(V012): embedding 벡터 + 별칭 LLM 결과를 청크 내용 해시 키로 캐싱해 재색인 시 내용 불변 청크의 재계산을 skip. cache_key = blake3(kind ‖ text_blake3 ‖ version_key)[:32], version_key 에 model/prompt/dimensions 포함 → §9 cascade 와 정합(버전 bump 시 자동 miss). 측정: 정답 3개 cold 1879s → warm 13s ≈ 145배. 순수 가산이라 corpus_revision bump 없음. search/ask 는 kebab.sqlite+lancedb 만으로 동작 → 외부 서버 색인 후 DB 만 복사하는 이식 워크플로 가능. V012 schema migration + 신규 surface 로 workspace version 0.20.2 → 0.21.0 (minor) bump. README/HANDOFF/ARCHITECTURE/HOTFIXES sync. known limitation: stack·svm 설명형 2개 잔존 + grounded 판정이 부분 인용을 grounded 로 오분류(후속 후보). 측정 상세: docs/superpowers/handoffs/2026-05-31-namu-wiki-alias-cache-study.md Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 08:24:04 +00:00
altair823	d279f343e7	docs(spec,plan): 별도 벡터 인프라 — FK 제거(V011) + CASCADE 대체 + filter_chunks PoC: 별칭 순수 벡터가 영어 설명형 rank 7~30 (concat 본문 희석으로 미회복) → 별도 벡터 명분. 차단요인 3건: embedding_records FK(787, V011 재생성), CASCADE 대체(명시 DELETE), filter_chunks sentinel strip. plan Task 4.5/4.6. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 13:25:45 +00:00
altair823	825543549d	docs(plan): 별칭 dense 별도 벡터 구현 plan ALIAS_SUFFIX(core) → embed_aliases flag → ingest sentinel 벡터+purge → VectorRetriever strip+dedup → 측정. TDD, 완성 코드. doc-side expansion PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 10:28:43 +00:00
altair823	bcb8b93751	docs(spec): 별칭 dense 별도 벡터 설계 spec PoC(concat) 측정: dense 별칭이 6/0/2/0.25 (설명형은 dense 본령 실증), 단 영어 설명형 2개는 concat 본문 희석으로 미회복. 처방: 별칭을 sentinel chunk_id 별도 벡터로 색인(본문 벡터 불변=회귀 안전, 별칭 순수 신호). flag ingest.expansion.embed_aliases default off. lexical 완화는 폐기. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 10:26:24 +00:00
altair823	69b53d1c97	docs(spec): doc-side expansion 검색 메커니즘을 shipped 구현에 맞춰 정정 Task 6 리뷰 MINOR-1: spec 본문이 단일 UNION ALL+GROUP BY 로 기술됐으나 shipped = 2-query(run_query+run_alias_query) + Rust merge_body_alias(body 우선). 서로 다른 FTS 테이블 bm25 절대값 비교가 무의미해 body-우선 merge 가 더 깨끗. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 03:20:13 +00:00
altair823	467a974901	docs(plan): doc-side expansion 구현 plan + spec 정제 (별도 FTS 테이블) spec: chunks_fts §5.5 verbatim 충돌 회피 → 별도 chunk_aliases_fts 테이블 + lexical 내부 body+alias 병합(RetrievalDetail/wire schema 무변경)으로 정제. plan: 7 task TDD (Chunk 필드 → V010 → config → ExpansionGenerator → ingest hook → lexical 병합 → 측정/문서). 완성 코드 + 빌드 규약. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 02:04:58 +00:00
altair823	098413922b	docs(spec): 색인시 doc-side expansion 설계 spec (Phase 2) brainstorm 확정: 청크당 별칭 생성(같은언어+한↔영 번역), additive+수동 재색인, 1차 단순 품질제어. 별도 FTS5 aliases 채널 → RRF 3채널 융합. flag off 기본, kebab eval variants 로 on/off 측정. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 01:54:46 +00:00
altair823	8bb7c276d0	docs: Phase 2 doc-side expansion 킥오프 + 구현 방법론 핸드오프 새 세션이 Phase 2(색인시 doc-side expansion)를 자립적으로 이어받을 컨텍스트 문서. 배경(rerank 반증→재정의→Phase1 진단 B우세→딥리서치→PoC), 설계 방향(KO↔EN 번역 별칭 + 별도 FTS5 필드 + RRF, flag off), 이미 만든 측정 도구(kebab eval variants + dogfood golden), 그리고 지금까지와 동일한 구현 방법론(brainstorm→spec→plan→OMC teammate sequential 구현+리뷰 +독립검증, 모델 라우팅, 빌드 redirect+exit, 측정=variant eval 프록시금지, gitea-pr 리뷰루프)을 담음. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 01:19:14 +00:00
altair823	5ad1f98227	docs(handoff): doc-side expansion 딥리서치 + PoC 결과 (Phase 2 방향 확정) 딥리서치(104 agent): 어휘격차 pool-miss 최선책 = 색인시 doc-side expansion. PoC(dogfood KB): recall@50=0 이던 3쿼리가 별칭 추가로 rank1~2 부활(hybrid+vector, 골든 verbatim 아님=일반화). 핵심 미검증 고리 실 corpus 정량 확인. Phase 2 = 색인시 doc-side expansion(KO↔EN 번역 별칭) → 별도 FTS5 필드 → RRF, flag off. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 00:53:24 +00:00
altair823	a58cae2ff3	docs(research): 어휘격차 pool-miss 해결 딥리서치 레퍼런스 deep-research 워크플로(104 agent, 5각도, 22소스, 25 claim 3-vote 검증, 22 confirmed/3 killed). 결론: 색인시 doc-side expansion(doc2query)이 pool-miss 최선책 — pool 자체를 키우고 per-query 지연 ~0(색인시 1회), 정확매칭 보존(별도 필드 append). 단 vanilla mt5는 같은언어라 한/영 갭은 색인시 KO↔EN 대체 query 생성 필요. query-side(HyDE=거부된 per-query LLM, Vector-PRF=recall 주장 기각)는 부적합. 검증은 기존 variant eval 로 가능. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 00:53:24 +00:00
altair823	7a1dff1684	docs(handoff): query-paraphrase robustness Phase 1 완료 + (A)/(B) 진단 8그룹×4변형(dogfood) 측정: groups=8 A_dominant=2 B_dominant=4 spread@10=0.750. 진단 — 문제는 한/영이 아니라 어휘 거리(영어 풀어쓴 문장도 miss, 일부 한국어는 OK). B(어휘격차, recall@50=0, rerank 불가) 우세 → 쿼리 확장/번역 처방 신호. A(순위출렁)는 cap_theorem/vector_database 2그룹뿐. "측정 먼저" 논제 정량 검증(rerank 단독은 부분해법). Phase 2 처방 결정 대기. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 00:53:24 +00:00
altair823	fe4c854673	docs(plan): query-paraphrase robustness Phase 1 구현 계획 5개 task: (1) GoldenQuery.group + 그룹 정합성 검증, (2) 변형 일관성 메트릭 모듈 + A/B(순위출렁/어휘격차) 분류, (3) kebab eval variants CLI, (4) dogfood golden 변형 그룹 큐레이션, (5) 측정 + 진단 리포트. TDD bite-sized, 완성 코드. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 00:53:24 +00:00
altair823	1de3f4ffca	docs(spec): query-paraphrase robustness 평가 프레임워크 설계 (측정 먼저) 목표 재정의: 한/영 overlap → 같은 의미의 다양한 표현(동의어·다른 어휘·풀어쓴 문장·한영)에서 일관된 답변 품질. 지난 reranker 실험이 overlap 프록시 최적화로 헛돈 교훈 반영 — 처방 전 진짜 지표(변형 일관성)를 직접 재는 평가부터. Phase 1(본 spec 구현): kebab-eval golden suite에 변형 그룹(intent group) + 변형 일관성 메트릭(recall_spread, answer_consistency) + recall@pool vs recall@k로 (A)순위출렁/(B)어휘격차 자동 판별. Phase 2(처방)는 측정 결과 게이트 뒤 조건부. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 00:53:24 +00:00
altair823	2429189447	docs(spec): reflect search-quality critic round-1 (eval --config, lang-filter non-goal, curation) Incorporates all critic (opus) round-1 findings into the dogfood search-quality eval design spec: BLOCKER-1: §4.4 execution commands now use --config /build/dogfood/config.toml (Task A facade-rule patch makes this the canonical path). §5.1 re-titled from "(후속 패치)" to "Task A로 적용됨 — 권장 운영 경로"; XDG workarounds demoted to "패치 전 fallback". Intro paragraph updated accordingly. MAJOR-1: §3 Non-Goals gains an explicit bullet: lang/media/code_lang SearchFilters validation is out of scope for this harness (runner uses SearchFilters::default(), runner.rs:151). §4.1 "code 검색" row no longer claims code_lang filter coverage. MINOR-1: §4.3 step 3 now names kebab inspect doc <id> as the primary chunk-selection path (breaks chunk-level curation loop); search hits demoted to "보조 확인용". MINOR-2: §4.1 golden category table gains two new rows — 한국어 N-gram fallback query (복합어/신조어 coverage) and 영어 whole-token exact query (separates substring artefacts). MINOR-3: §4.1 YAML header note added: record corpus_revision in golden file so stale-bail root cause is immediately traceable. NIT: §9 References line numbers corrected (runner.rs:31, metrics.rs:116/144); runner.rs:151 SearchFilters::default() reference added. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-29 03:43:00 +00:00
altair823	571996938c	docs(contract): bump default prompt_template_version to rag-v3 (Todo #1 ) line 899: V1만 legacy → V1/V2 둘 다 legacy, v0.20.2 부터 rag-v3 default 선언. line 1349 (★): config 예시 default rag-v2 → rag-v3. line 1533 (★): §9 cascade table 코드 상수 rag-v2 → rag-v3. line 287 이후: answer.v1 예시 블록에 historical snapshot 주석 추가 (n1 — model+ptv stale, 값 변경 안 함). task spec grep 판단: tasks/p9/p9-fb-15 의 rag-v2 언급 2줄은 rag-v2 도입 시점 historical 기술 → frozen 유지. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-29 02:45:13 +00:00
altair823	85efeeca3e	docs(plan): v0.20.2 dogfood findings 구현 plan (15 task) planner(opus) 작성 → critic 리뷰 시도 → leader 좌표 검증. 8 todo → 15 task: 코드 4 (rag-v3 / list docs / bulk / init) + 각 finding 후 전체 도그푸딩 검증 task 4 + docs-only 3 + contract + HOTFIXES/release-notes + version bump. plan critic round-1 은 환경 도구 손상으로 좌표 blocker(B-1/B-2/M-1/M-2)를 오진 → leader 가 pipeline.rs/config/cli/bulk/Cargo.toml 을 직접 grep 검증해 plan 좌표 정확 확인, executor 용 "anchor grep 재확인" + binary 경로 주의 헤더 추가. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-28 21:09:39 +00:00
altair823	2b4ba8e104	docs(spec): v0.20.2 dogfood findings 설계 + round-1 critic 반영 v0.20.1 전체 도그푸딩에서 발견된 8 todo (Ask 응답언어 rag-v3 / doc.lang docs / bulk input / list title / fusion_score·score_kind / schema index_version / Ollama hint) 를 단일 patch release 로 설계. writer worker 초안 → opus critic round-1 리뷰 반영: - B1: top-level score placeholder → 확정 (score_kind 가 의미 선언, search.rs:95-99) - M1: 이미지 caption 언어 강제 out-of-scope 명시 - M2: config default 테스트(lib.rs:1316) 갱신 필요 명시 - M3: bulk input 전체 필드 (query/mode/k/trust_min/ingested_after/media/tag/lang) - M4: rag-v3 의 eval_runs.config_snapshot_json cascade 영향 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-28 20:37:42 +00:00
altair823	b08941d6ab	docs(handoff): mark brainstorming-needed items in v0.20.1 findings todo 각 todo 에 fix path classification 추가: - 🧠 필수: design / user expectation 결정 필요 — brainstorming skill 우선 - 🔧 mechanical: spec drift 또는 명확한 fix — 별 brainstorming X - 📝 mild discussion: spec drafter self-review 로 trade-off 결정 가능 Classification: - 🧠 필수 (2): #1 Ask 영어→한국어 response policy, #4 doc.lang semantic - 🔧 mechanical (4): #2 bulk schema, #5/#6 docs sync, #7 schema rename - 📝 mild (3): #3 list title, #8 Ollama default 추가 brainstorming 후보 (직접 finding 외): - BS-A: HTML corpus 지원 (1415 file skipped) - BS-B: Tier 1/2/3 chunker UX visibility - BS-C: kebab dogfood subcommand (자동화) - BS-D: 영문 code chunk 의 tokenized_korean_text 효율 - BS-E: builtin_blacklist 명세 노출 권장 워크플로: 1. brainstorming 단계 먼저 (#1, #4 + BS-x 별 검토) 2. mechanical batch (#2, #5, #6, #7) — 한 PR 3. mild discussion batch (#3, #8) 4. dogfood retest → v0.20.2 patch release	2026-05-28 19:45:53 +00:00
altair823	6bf4e82e62	docs(handoff): v0.20.1 full dogfood findings todo for next session 머지 후 v0.20.1 의 full dogfood (사용자 실제 corpus 6293 file, 3.5 시간 ingest, §1~§11 시나리오) 발견된 findings 를 새 session 의 self- contained todo handoff 로 정리. P0 (bug / 의도와 다른 동작): - #1 Ask 영어 query → 한국어 응답 (rag-v2 prompt template 강제) - #2 bulk search input format 불명확 (wire schema 미명시) - #3 list docs title 중복 (heading-based, doc_path 보조 필요) - #4 doc.lang = und 53% (code file 의 lang detection 실패) P1 (docs drift): - #5 fusion_score 위치 (.retrieval.fusion_score) - #6 score_kind="bm25" 의미 (lexical mode 의 fusion_score) - #7 schema index_version vs lexical_index_version 혼동 P2 (setup): - #8 Ollama endpoint default 가 localhost (사용자 환경 remote) 각 todo 별 severity, scenario, suspected location, action item 명시. 새 session 시작 명령 + branch 권장 + 도그푸딩 재실행 절차 + finding cumulative table 포함. Repo state: main HEAD=a0c7fa3, clean. v0.20.1 binary OK. /build/dogfood/ KB (3940 docs, 34896 chunks) preserved for regression test.	2026-05-28 19:40:10 +00:00
altair823	028d9ad4ea	docs(release): v0.20.1 release notes draft + spec/plan dogfood cross-link #1 (사용자 요청): release notes draft 작성 + spec/plan 의 dogfood evidence cross-link 보강. docs/release-notes/v0.20.1-draft.md (신규): - 4 단락 본문 (한국어 2자 query 지원 + 영어 substring 회귀 + V007→V009 자동 backfill + ingest 성능 영향). - Migration cascade table (lexical_index_version, corpus_revision, wire schema shape preservation). - API + dependency 변경 (lindera v3, lindera-ko-dic v3, retired short_query_hint helper, 새 facade APIs). - Breaking changes 명시 (영어 substring 회귀, 첫 부팅 latency, DB/ binary 크기 증가). - Upgrade 절차 + Known limitation + 14 dogfood scenario reference. spec Appendix B (segmentation evidence): - "Empirical verification (2026-05-28 dogfood — post-merge update)" subsection 신규. prior-knowledge 가정 vs 실측 결과 table. Scenario 1-4 모두 verified 표시. ko-dic 의 '서울특별시' → '[서울, 특별시]' 분해 증거 명시. plan Changelog: - post-implementation entry: 22 commit on branch, S3 blockers, S7 cascade, S11 sanity regression updates, opus PR review 4 finding fixes. - dogfood evidence entry: 14 scenario verify pass, ko-dic 분해 evidence, HOTFIXES + spec Appendix B cross-link. Spec: …spec…md Appendix B Plan: …plan…md (post-implementation + dogfood evidence Changelog) Release notes: docs/release-notes/v0.20.1-draft.md	2026-05-28 13:34:33 +00:00
altair823	8c56ef3010	docs(superpowers): v0.20.x C 한국어 morphological tokenizer spec + plan artifacts 본 commit 은 v0.20.x C task (Bug #8 — 한국어 2자 query 0-hit) 의 4-stage workflow artifact 5 파일을 archive: - spec.md (668 line, status=accepted): Option A/B/C 비교 + lindera Path A (영어 substring 회귀 인정) 결정 + 12 section + 4 Appendix (B segmentation evidence, C cost evidence, D license evidence). - spec-critic-r1.md: 3 critical + 6 major finding (NEEDS_REWRITE). - spec-critic-r2.md: r1c rewrite 후 traceability matrix (ACCEPT). - plan.md (750 line, status=accepted): 11 step + dependencies + cost optimization routing + 9 closure micro-patches 적용. - plan-closure-r1.md: traceability matrix + 9 MP 의 origin. 이 artifact 들은 implementation 머지 후 frozen reference. 후속 deviation 은 tasks/HOTFIXES.md 가 source of truth. Workflow stage: 1. spec drafter (omc team writer, opus) 2. spec critic R1 (omc team critic, opus) — NEEDS_REWRITE 3. spec rewriter r1c (omc team writer, opus) — 7 item fix 4. spec closure R2 (in-process verifier, sonnet) — ACCEPT 5. plan drafter (omc team planner, opus) 6. plan closure (in-process verifier, sonnet) — ACCEPT + 9 MP 7. subagent-driven-development implementation (11 step + 5 follow-up + 1 docs polish = 17 commit) 8. PR-level final code review (in-process code-reviewer, opus) — Approved with notes (4 minor docs finding, merge as-is) Branch: feat/korean-morphological-tokenizer Version: 0.20.1	2026-05-28 12:53:31 +00:00
altair823	b106120e93	feat(fts): add V009 korean morphological tokenizer migration V007 trigram tokenizer 의 한국어 2자 query 0-hit 한계 (Bug #8) 해소를 위한 V009 migration 추가. unicode61 tokenizer 로 환원 + 한국어 형태소 분해 결과를 별 column `tokenized_korean_text` 에 pre-fill 하는 방식. - migrations/V009__fts_korean_morphological.sql 신규: column ADD, chunks_fts DROP+재정의, 3 trigger CASE expression, backfill INSERT, corpus_revision bump. - design §5.5 갱신: trigram → unicode61 + 형태소 column. CASE expression trigger 본문. - crates/kebab-store-sqlite/tests/fts.rs: V007 verbatim test 를 V009 source-of-truth 로 rename. v009_bumps_corpus_revision unit test 추가. - store.rs: clippy bool_to_int_with_if + cast_lossless 기존 경고 수정 (pdf_ocr_events 관련 코드, S1 작업 중 발견). 영어 substring 매칭은 V002 (whole-token only) 로 회귀 — spec §3 Non-Goals + 후속 release notes (v0.20.1) 에서 정직히 기술. Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (S1) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 09:48:46 +00:00
altair823	43366b1b15	docs(handoff): C 한국어 morphological tokenizer (Bug #8 ) 새 session handoff v0.20.0 sub-item 1 + bugfix 1~4 + ingest log r1+r2 머지 후, 다음 우선순위 C (한국어 morphological tokenizer) 의 self-contained context. 새 session 의 첫 step + workflow patterns + 환경/memory references + cascade risk + 가능한 fix paths (Option A jieba-rs / B bi-gram supplement / C query-side expansion). spec/plan/executor cycle 동일 패턴. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 08:18:04 +00:00
altair823	7c24734cc7	docs(superpowers): v0.20.x logging r2 spec + plan artifacts logging round 2 (4 enhancement: image_w/h + V008 SQLite mirror + CLI inspect + retention) 의 spec/plan ACCEPT 후 round artifacts. - spec: 751 line (ACCEPT, 7/7 critic round 1 finding + 7/7 closure r2 traceability) - plan: 576 line (ACCEPT, 6/6 step + 13/13 AC + G1/G2/G3 plan-level resolve) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 08:04:32 +00:00
altair823	685007789a	style: cargo fmt --all (round 4 ingest log feature follow-up) Phase C4 executor 의 마지막 `fix(test): clippy + fmt fixes` commit 이 test file 부분만 fmt 적용. workspace 전체 fmt 누락 발견 → cargo fmt --all 적용. 모든 import alphabetical reorder + line wrapping 정합. 추가 untracked artifact 동시 commit: - docs/superpowers/specs/2026-05-28-v0.20-ingest-log-spec.md (491 line, ACCEPT) - docs/superpowers/plans/2026-05-28-v0.20-ingest-log-plan.md (616 line, ACCEPT) workspace test: 1370 passed / 0 failed / 50 ignored, ingest_log_smoke green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 04:18:40 +00:00
altair823	46e99470eb	docs(superpowers): v0.20 sub-item 1 bugfix1/2/3 specs + plans + DOGFOOD.md 3-round dogfood-driven fix cycle 의 산출물: - bugfix1 (Bug #2/#3/#4): spec 964 line + plan 848 line - bugfix2 (Bug #6/#7, #8 falsified): spec 308 line + plan 388 line - bugfix3 (Bug #9/#10/#11/#13/#14, #12 falsified): spec 410 line + plan 1043 line - docs/DOGFOOD.md: 전방위 dogfood checklist 의 전체 (§0 environment ~ §13 reference corpus) 각 round 의 spec/plan 가 critic + verifier round 2 closure ACCEPT 후 frozen. dogfood-driven evidence 기반. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 01:21:34 +00:00
altair823	5bba95fd71	docs(spec): HOTFIXES entry + parent spec cross-link for Bug #11 timeout deviation Bug #11 (이전 commit `fix(config): pdf.ocr.request_timeout_secs default 600 → 60`) 의 frozen-spec deviation handoff. - tasks/HOTFIXES.md: 2026-05-27 dated subsection — Discovered / Symptom / Root cause / Fix / Amends 5-field 포맷 (기존 entries 와 일치). - docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md: PDF OCR config block line 1000 (default value) + OQ-1 line 1628 에 inline HTML 주석 2 줄 cross-link. prose 변경 0 — parent spec frozen contract 보존, HTML 주석은 markdown render 시 invisible. HOTFIXES entry 가 live source of truth (CLAUDE.md "Spec contract" 규칙). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 23:16:18 +00:00
altair823	b4d9e60816	chore(release): bump version 0.19.0 → 0.20.0 — v0.20.0 sub-item 1 scanned PDF OCR # v0.20.0 — scanned PDF OCR via Ollama vision LLM v0.20.0 의 핵심 변경 = embedded text 가 없는 scanned PDF (책 스캔, 영수증, 카메라 page) 의 OCR ingest. PoC 의 5 engine 비교 (Tesseract / EasyOCR / PaddleOCR / gemma4:e4b / qwen2.5vl:3b) 에서 qwen2.5vl:3b 의 alnum 94.79% (page1) / 81.56% (받침) 가 모든 다른 engine 을 능가 — 본 release 의 default vision OCR. ## 1. OCR opt-in 사용법 `[pdf.ocr]` config 의 `enabled = true` 또는 `KEBAB_PDF_OCR_ENABLED=true` env 로 활성화. default off — OCR 한 page 당 45-100s (qwen2.5vl:3b on CPU, remote Ollama) 의 cost 가 책 archive 외 비-OCR KB 에 부적합. ```toml [pdf.ocr] enabled = true model = "qwen2.5vl:3b" # 다른 default 는 README 참조 ``` qwen2.5vl:3b 의 Ollama pull: ```bash ollama pull qwen2.5vl:3b # 3GB Ollama image ``` ## 2. v0.19 indexed scanned PDF 의 force-reingest v0.19 binary 로 scanned PDF 를 ingest 한 KB 는 자동으로 OCR path 진입 안 함 — parser_version "pdf-text-v1" 보존 (CLAUDE.md §Versioning cascade 의 trigger 회피 결정, H-4). 따라서 v0.20 binary upgrade + config `pdf.ocr.enabled = true` 만 적용 시 try_skip_unchanged 의 Unchanged path 가 OCR 실행을 skip. 명시적 재처리: ```bash kebab ingest --root /path/to/kb --force ``` ## 3. DCTDecode-only v1 scope (FlateDecode / CCITTFax page 처리) v0.20.0 의 PDF page image extract = lopdf 의 image XObject 의 /Filter == DCTDecode 만 cover (JPEG passthrough). 다른 encoding (FlateDecode raw pixel, CCITTFaxDecode bilevel, JPXDecode JPEG2000) 은 warning event 발행 + 해당 page skip. scanned PDF 의 일부 page 가 FlateDecode 또는 CCITTFax 로 encoded 시: ```bash qpdf --object-streams=disable --recompress-flate input.pdf normalized.pdf ``` v1 의 의도 = single binary 원칙 (image crate 도입 0). v1.1+ 또는 별 sub-item 에서 multi-filter 지원 검토. ## 4. Family asymmetry (image OCR gemma4:e4b vs PDF OCR qwen2.5vl:3b) image OCR (P6) 의 default 는 gemma4:e4b 그대로 (변경 0). PDF OCR (v0.20) 만 qwen2.5vl:3b. 사용자가 [image.ocr] model = "qwen2.5vl:3b" 으로 통일 가능 단 default 는 family asymmetric 보존. ## Dogfood + test 결과 - workspace test: 178 result lines, 0 failure. - workspace clippy (-D warnings): exit 0. - alnum e2e (real Ollama, manual invoke): - F1 (한국어 page1): 94.79% (≥ 0.85 threshold). - F2 (받침-intensive): 81.56% (≥ 0.70 threshold). - integration smoke + vector PDF regression: pass. ## 변경된 surface - new config: [pdf.ocr] (11 field) + 11 env override KEBAB_PDF_OCR_*. - new wire: IngestEvent::PdfOcrStarted/Finished (additive minor). - new wire: IngestItem.pdf_ocr_pages/ms_total (additive minor). - new CLI line: "📷 OCR page N..." / "✓ OCR page N (chars chars, msms via ollama-vision)". - new module: kebab-parse-pdf::{page_image, text_quality} + kebab-app::pdf_ocr_apply. - dep: workspace lopdf = "0.32" 통합. - fixture: 5 PDF (F1/F2/F4/F6/F7) under crates/kebab-parse-pdf/tests/fixtures/. ## 변경되지 않은 surface (invariant) - Extractor::extract trait body byte-identical (PR #187). - PdfTextExtractor body 변경 0 — post-extract enrichment pattern 으로 분리. - parser_version "pdf-text-v1" 보존. - chunker_version "pdf-page-v1" 보존. - workspace.dependencies 의 production dep graph 변경 0 (-e normal baseline 보존). ## sub-item 의 11 commit history `9d7faab` Step 1: foundation + cargo tree baselines `aeeff36` Step 2: lopdf /Filter probe + 5 fixture commit (F1/F2/F4/F6/F7) `fb3952d` Step 2 fix: F7 conversion engine record correction `c2cd3a7` Step 3: page_image + text_quality modules (10 test) `8d81bc1` Step 3 fix: clippy pedantic in page_image `9f003ef` Step 4: pdf_ocr_apply helper (10 test, F7 split + cancel) `fd918a6` Step 5: [pdf.ocr] config section + PdfOcrOpts doc `4672cba` Step 5 fix: clippy::bool_assert_comparison in pdf_ocr tests `b9ee09f` Step 6: wire PDF OCR enrichment + cancel propagation `4c5ccd5` Step 7: wire schema additive — IngestEvent + IngestItem + skipped `c9e0594` Step 8: CLI printer activation + ingest_progress test + spec literal `4819768` Step 9: integration smoke + vector regression + alnum e2e `1d4e301` Step 9 follow-up: Cargo.lock for dev-dep additions `90726ab` Step 10: docs sync (README + HANDOFF + ARCHITECTURE + SMOKE) ## § Acceptance §9 verifier evidence K5 의 15 row scriptable verifier 모두 green (또는 manual real-Ollama row 의 결과 보고): - Row #4 (vector PDF byte-identical): pass. - Row #5 (Extractor::extract trait byte-identical): 0 line diff. - Row #6 (wire schema additive): jq + diff exit 0. - Row #7-#8 (clippy / workspace test): exit 0. - Row #9-#10 (dep graph baseline -e normal): empty diff. - Row #11 (docs sync): grep evidence. - Row #12 (version bump): "0.20.0" + Cargo.lock cascade ≥ 22. - Row #14 (PR #187 invariant): extract_for(&asset.media_type) ≥ 1. - Row #15 (DCTDecode-only v1, F6/F7 skip): test green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 11:03:44 +00:00
altair823	c9e05941c5	feat(cli): activate per-page PDF OCR progress printer + test(app): ingest_progress emit verify + spec(pdf-ocr): align §4.6.1 literal with option_A (ms/chars) Step 8 (Group H) of v0.20.0 sub-item 1 (scanned PDF OCR) plan + Step 7 reviewer concern fix (spec literal deviation). H1 — kebab-cli/src/progress.rs printer activation: - 구 no-op stub `IngestEvent::PdfOcr* { .. } => {}` (Step 6 placeholder) 를 사람-친화 stderr line printer 로 활성화. - spec §4.6.1 line 1085-1086 wording 그대로: - PdfOcrStarted → ` 📷 OCR page {page}...` - PdfOcrFinished (skipped=false) → ` ✓ OCR page {page} ({chars} chars, {ms}ms via {ocr_engine})` - PdfOcrFinished (skipped=true) → ` ⊘ OCR page {page} skipped (no DCTDecode or engine fail, {ms}ms)` (M-4 의 skipped field carry 활용) - `!quiet` gate 정합 (AssetStarted/Finished pattern mirror). H2 — crates/kebab-app/tests/ingest_progress.rs 의 새 test: - pdf_ocr_progress_emits_started_finished_events (real Ollama 의존, `#[ignore]`). - F1 fixture (scanned_page1.pdf) ingest 시 pdf_ocr_started + pdf_ocr_finished event 가 emit 됨을 verify. Started count == Finished count invariant. - Manual invoke: `KEBAB_PDF_OCR_ENABLED=true cargo test -p kebab-app --test ingest_progress --ignored`. - mock OcrEngine inject path 부재 (Step 6 의 eager build), Step 9 I5 의 ocr_e2e pattern (real Ollama + `#[ignore]`) 와 동일. Step 7 reviewer concern fix — spec §4.6.1 literal: - line 1076-1077 의 `ocr_ms` / `ocr_chars` literal 을 wire schema 의 실제 field name `ms` / `chars` (option_A, Rust serde 와 정합) 로 갱신. - line 1087 의 printer wording 도 `{ocr_chars}` / `{ocr_ms}` → `{chars}` / `{ms}`. - line 1556 의 rationale 참조 `pdf_ocr_finished.ocr_ms` → `.ms`. - `skipped` field 도 명시 (Step 6 reviewer M-4 결과). spec: docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md (§4.6.1) plan: docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 8 H1+H2) prior: `4c5ccd5` (Step 7 wire schema) — Step 7 reviewer concern 1 의 fix contract: §9 (additive minor wire bump — Step 7 commit 에서 완료) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 09:18:49 +00:00
altair823	fb3952d54f	docs(pdf-ocr): correct F7 conversion engine record in PoC doc (gs, not ImageMagick) `aeeff36` 의 PoC doc append (engine-comparison.md L134, L141) 가 F7 (`ccitt.pdf`) 의 conversion engine 을 "ImageMagick `convert -compress Group4`" 로 기록했으나, 실제 tests/fixtures/_synth/flate_ccittfax.sh:77-83 은 `gs -sDEVICE=pdfwrite -dMonoImageFilter=/CCITTFaxEncode -dEncodeMonoImages=true` flag 사용 (ImageMagick `convert` 호출 0회). fixture binary (`/Filter [ /CCITTFaxDecode ]`, 2060 bytes) 는 invariant 충족 OK (Step 2 spec compliance + code quality review verified). historical record 의 factual correction only. review trail: - impl result: .omc/reviews/2026-05-27-pdf-ocr-step-02-impl-result.md - spec review: .omc/reviews/2026-05-27-pdf-ocr-step-02-spec-review-result.md - code review: .omc/reviews/2026-05-27-pdf-ocr-step-02-code-review-result.md (I1) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 05:36:56 +00:00
altair823	aeeff3635b	poc+test(pdf-ocr): lopdf /Filter probe + 5 fixture commit (F1/F2/F4/F6/F7) for v0.20 sub-item 1 Step 2 (Group B) of v0.20.0 sub-item 1 (scanned PDF OCR) plan. B1 — lopdf /Filter probe (Python re + shell grep on synthesized fixtures, result appended to docs/superpowers/poc/2026-05-27-pdf-ocr-engine-comparison.md). Key findings: - reportlab default (useA85=1) yields /Filter [ /ASCII85Decode /DCTDecode ]; useA85=0 gives pure /Filter [ /DCTDecode ] with JPEG magic ffd8ffe0. - Pillow RGB.save('.pdf','PDF') uses DCTDecode — F6 FlateDecode requires manual PDF construction via zlib.compress. - ghostscript pdfwrite rejects TIFF input (/undefined in II*) — ImageMagick `convert -compress Group4` used for F7 CCITTFax. B2 — 5 fixture 합성·commit under crates/kebab-parse-pdf/tests/fixtures/: - F1 scanned_page1.pdf — /Filter [ /DCTDecode ], JPEG magic ffd8ffe0 (page1-clean.png, 한국어). - F2 scanned_page2.pdf — /Filter [ /DCTDecode ], JPEG magic ffd8ffe0 (page2-clean.png, 받침). - F4 mojibake.pdf — DejaVu TTF + ToUnicode CMap stripped (count=0); Noto CJK TTC has PostScript outlines unsupported by reportlab. - F6 flate_raw.pdf — /Filter /FlateDecode, DCTDecode absent (skip path input). - F7 ccitt.pdf — /Filter [ /CCITTFaxDecode ], DCTDecode absent (skip path input). Synth scripts under tests/fixtures/_synth/: - scanned_pdf.py — F1/F2 reportlab drawImage + JPEG passthrough (useA85=0). - mojibake.py — F4 reportlab DejaVu TTF + ToUnicode strip. - flate_ccittfax.sh — F6 manual zlib PDF + F7 Pillow TIFF group4 + ImageMagick convert. spec: docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md (§5.1) plan: docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (Step 2 B1+B2) contract: §9 (additive minor wire bump — 후속 step) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 04:04:47 +00:00
altair823	9d7faab650	docs+chore(plan-bootstrap): apply spec L-1 cosmetic fix + capture cargo tree baselines for v0.20 sub-item 1 verifier gates Step 1 (Group A) of v0.20.0 sub-item 1 (scanned PDF OCR) implementation plan. A1 — spec §4.2 line 740 prose pseudo-code fix: `app.pdf_ocr_engine.as_ref()` → local `pdf_ocr_engine: Option<OllamaVisionOcr>` built in `ingest_with_config_opts` (정합 with §4.4 eager init, App field 도입 0). A2 — Cargo.toml dep invariant verified (image crate 미도입 — H-3 DCTDecode-only v1 invariant 보존; kebab-parse-pdf + kebab-parse-image 가 kebab-app 의 기존 dep). description 갱신은 Step 3 (module 추가 후) 으로 이연. A3 — cargo tree baseline 캡처 — K5 row #9/#10 의 ground-truth (.omc/state/pdf-ocr-{app-parse,parse-pdf}-deps.baseline.txt). 본 sub-item 의 다른 step 의 dep graph 변경 0 invariant 의 verifier 의 baseline. Note: .omc/ 는 .gitignore 대상 — baseline files 는 로컬 파일로 존재. spec: docs/superpowers/specs/2026-05-27-pdf-scanned-ocr-spec.md plan: docs/superpowers/plans/2026-05-27-pdf-scanned-ocr-plan.md (round 1c ACCEPT) contract: §9 (additive minor wire bump — 후속 step) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 03:40:49 +00:00
altair823	574e1b1ca1	docs: v0.20 image+pdf handoff + sub-item 3 spec/plan backfill v0.19.0 release 후 다음 session 인계용 handoff 문서 + 사후 backfill. - docs/superpowers/handoffs/2026-05-26-v0.20-image-pdf-normalize-handoff.md (540 lines, 9 section) - sub-item 1/2/3 머지 결과 + 도그푸딩 baseline (1781 doc / 9050 chunks) + user memory + OMC workflow + 빌드 환경 - 현재 구현 상태 (v0.19.0, image+pdf) — 정확한 file:line + struct/fn signature + flow - 8 TODO 상세 (problem + scope + affected files + risk + trigger 조건) - 우선순위 + sequencing 권장 + 새 session 첫 단계 제안 - docs/superpowers/specs/2026-05-26-extractor-dispatch-unification-spec.md (sub-item 3 spec) - docs/superpowers/plans/2026-05-26-extractor-dispatch-unification-plan.md (sub-item 3 plan) PR #187 머지 시 source code 만 들어가고 spec/plan 누락 — 동일 PR 의 reference link 가 main 에서 404. 본 commit 으로 backfill. Assisted-by: Claude Code	2026-05-26 23:34:17 +00:00
altair823	710945c4b0	refactor(parse-md): absorb kebab-normalize + kebab-parse-types — 24 → 22 crates + §3.7b 재작성 design §3.7b 의 thin layer (ParsedBlock 류) 가 4 parser 중 1개 (markdown) 만 lift 를 경유하는 현실 — fan-in/fan-out 모두 1 → layer 의미 잃음. kebab-normalize (1097 LOC) + kebab-parse-types (98 LOC) 둘을 kebab-parse-md 로 흡수. 설계: docs/superpowers/specs/2026-05-26-normalize-absorption-spec.md 플랜: docs/superpowers/plans/2026-05-26-normalize-absorption-plan.md HOTFIXES: tasks/HOTFIXES.md 의 2026-05-26 entry (design deviation) - 5 사용 type + 3 forward-declared struct → kebab-parse-md::types module 의 pub explicit re-export. - build_canonical_document + derive_title + warning_agent → kebab-parse-md::normalize module. - 4 hard-coded agent literal (lib.rs:122/128/134/153) + warning_agent body return + tracing target literal 모두 보존 — stage label 일관성. - kebab-app callsite (lib.rs:51 use + :1119 context string) + Cargo.toml 의 2 dep (regular + dead) 제거. - kebab-chunk + kebab-store-sqlite 의 [dev-dependencies] kebab-normalize → 제거 (kebab-parse-md 로 갈음). 통합 test source 의 use shift. - test file 이동 (kebab-normalize/tests/normalize_snapshot.rs → kebab-parse-md/tests/). - workspace Cargo.toml: Hunk (a) members 2 entry 삭제 + Hunk (b) version 0.18.0 → 0.19.0 (frozen contract 변경). - design §3.7b 4-단락 재작성 (원래 intent 보존 + 현재 상태 + 보존된 surface + future re-extraction trigger). - design §8 graph 갱신 (3 edge 제거 + 2 forbidden bullet 의미 갱신 + commentary). - ARCHITECTURE.md crate graph + directory tree mechanical 갱신. - tasks/INDEX.md L169 closure mention + "Future work / deferred" 섹션 신설 (image/pdf normalize integration entry). - tasks/HOTFIXES.md 신규 entry (4-block — design deviation Symptom). - HANDOFF.md cross-link 한 줄. - 3 dead struct (ParsedImageRegion / ParsedPdfPage / ParsedAudioSegment) 는 보존 — v0.20+ image/pdf normalize integration 의 future surface (spec §11). Wire / surface impact: 0건. CLI / TUI / MCP / --json 출력 / config / XDG path / parser_version 모두 unchanged. wire-invisible provenance.events[].agent + tracing target literal "kb-normalize" 도 보존 — old DB row 와 new DB row 의 audit log 일관성. Verification: cargo test --workspace --no-fail-fast -j 1 → 1313 passed / 0 failed (172 result blocks). cargo clippy --workspace --all-targets -j 1 -- -D warnings → 0 warning (5m 46s). cargo metadata --no-deps --format-version 1 \| jq '.workspace_members \| length' = 22. cargo tree -p kebab-app --depth 2 \| grep -E "kebab_(parse_types\|normalize)" = 0 줄.	2026-05-26 15:00:59 +00:00
altair823	bd48baa19a	refactor(source-fs): drop kebab-parse-code dep — 9 tree-sitter grammars drag 제거 kebab-source-fs 가 kebab-parse-code 의 9 tree-sitter grammars 를 drag 했던 무거운 의존성 제거. 4 surface (code_lang_for_path / is_generated_file / is_oversized / BUILTIN_BLACKLIST) 만 사용하지만 dep 그래프에서 9 grammar 전체 link → kebab-source-fs::code_meta 로 이전 + kebab-parse-code 측 cleanup. 핵심 변경: - kebab-source-fs::code_meta 신설: 4 surface 이전 (BUILTIN_BLACKLIST `pub` for frozen contract + 3 helper fn `pub(crate)`). lib.rs 의 `pub use code_meta::BUILTIN_BLACKLIST` 1 줄 추가 (Option A — 다른 mod surface 무근거 확장 0). - callsite migration: media.rs (1) + walker.rs (2) + connector.rs (2) 모두 `kebab_source_fs::code_meta::` 로 갱신. - kebab-parse-code 측 cleanup: skip.rs 삭제 + lang.rs narrow edit (code_lang_for_path body + unit test 2 + Path import 삭제, module_path_for_ 보존) + lib.rs 헤더 doc rewrite (migration breadcrumb 포함). - tests/{lang,skip}.rs 13 test 이동 — 12 unit (`src/code_meta.rs::tests`) + 1 integration (`tests/code_meta.rs` for BUILTIN_BLACKLIST frozen contract). - design §8 graph: edge 제거 + p10-2 inline note. ARCHITECTURE.md 산문 1 줄 갱신. kebab-core::metadata.rs:36 stale dep reference 정정. G1+G5: cargo tree -p kebab-source-fs \| grep tree-sitter = 0 줄. G2+G3: workspace test 회귀 0 + 13 test 1:1 이동. G4: design §8 + ARCHITECTURE.md 갱신. Wire 영향: 없음 (internal Rust crate-API surface 만, user-facing 0). Cargo workspace.version bump 불필요. Refs: - docs/superpowers/specs/2026-05-26-source-fs-dep-lightening-spec.md (v3, 4-round APPROVE) - docs/superpowers/plans/2026-05-26-source-fs-dep-lightening-plan.md (v4, 4-round ACCEPT) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 12:19:32 +00:00
altair823	336962715a	fix(rag): S3 NLI unavailable — hypothesis char budget + token-count fallback retry S3 dogfood query 의 `nli_model_unavailable` consistent fail root cause = mDeBERTa-v3 tokenizer 의 `OnlyFirst` strategy + 949-token hypothesis. 기존 char-budget 단독 fix 의 KR-extreme density 미해결 → token-count fallback retry + RC1-residual trait dispatch 정합. 핵심 변경: - kebab-nli::NliVerifier: `hypothesis_token_count(&str) -> Result<usize>` trait method 추가 (default `Ok(0)` backward-compat). `OnnxNliVerifier` 가 trait impl block 안에서 real mDeBERTa tokenize override — vtable 등록 보장 (round-3 critic RC1-residual closure). - kebab-rag::pipeline: `MAX_NLI_HYPOTHESIS_CHARS_INITIAL = 1200` + `MAX_NLI_HYPOTHESIS_CHARS_MIN = 150` const + `pub(crate) fn truncate_chars` pure-fn + `pub fn truncate_hypothesis_for_nli_with_budget` retry helper (char budget 반감 retry, min floor 시 graceful unavailable). step 8.5 hook 의 callsite explicit `match` + `return self.refuse_nli_model_unavailable` 패턴 (`?` 금지 — round-2 plan critic CRITICAL #1 closure). - SpyNliVerifier 신규 helper (closure score_fn + hypothesis_token_count_fn, 2-arg constructor). - §5.1 의 2 ignored test (EN-long err + vtable dispatch RC1-residual pin) + §5.2 의 4 boundary test (truncate_chars) + §5.3 의 3 mock multi-hop test (long_en_grounded / long_kr_retries / unrelenting_fallback). +7 new tests (2 ignored default skip). - tasks/HOTFIXES.md 신규 dated entry `## 2026-05-26 — S3 NLI unavailable ...` — Symptom / Root cause / Action / Amends 4-block. - spec + plan (`docs/superpowers/{specs,plans}/2026-05-26-s3-nli-model-unavailable-diagnose-.md`) — 4 round spec + 3 round plan OMC reviewer ACCEPT 산출물. 검증: - cargo test -p kebab-nli -j 1 → 11/11 pass + 7 ignored default skip. - cargo test -p kebab-rag -j 1 → 19+3+3+... 전체 pass + 3 new mock + 4 new boundary. - cargo test --workspace --no-fail-fast -j 1 → 1313 pass (+7 new)*, 0 failed. 회귀 0 (HOTFIX #15 이미 fixed, no remaining flaky). - cargo clippy --workspace --all-targets -j 1 -- -D warnings clean (type_complexity allow on Arc<dyn Fn> type aliases). KR safe (token-count retry path) + graceful fallback (min floor 시 기존 unavailable wire 유지, regression 0). Wire 영향 없음 (additive trait method). Cargo bump 불필요. Refs: - spec: docs/superpowers/specs/2026-05-26-s3-nli-model-unavailable-diagnose-spec.md (4 round APPROVE — analyst → critic + verifier × 4 rounds) - plan: docs/superpowers/plans/2026-05-26-s3-nli-model-unavailable-diagnose-plan.md (3 round ACCEPT — planner → critic-plan + verifier-plan × 3 rounds) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 09:12:21 +00:00
altair823	a210bf5d52	docs(rag): HOTFIX #15 spec + plan (3 round OMC reviewer approve) OMC team `hotfix-15-mcp-flaky` 의 spec + plan 작성 + 리뷰 산출물. - spec: analyst 가 진단 (root cause = PR-7 probe-first 가 PR-5 test 의 stale empty-KB contract 와 mismatch) + Option A 권장 (test-only fix). 3 round review (critic + verifier): CRITICAL C1 (fixture/query FTS5 0 hits) + MAJOR M1/M2 + 등 closure. - plan: planner 가 7 steps + subagent dispatch task 작성. 3 round review (critic-plan + verifier-plan): empirical SQLite REPL 검증, level-1 dated entry placement, actual KebabHandler/KebabAppState pattern 정합. implementation = `429287f` commit (executor). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 06:52:04 +00:00
altair823	98cf4e8a04	chore(release): bump version 0.17.2 → 0.18.0 + cut fb-41 multi-hop v0.18.0 cut PR. fb-41 multi-hop RAG + NLI verification 의 user-visible surface (PR #176-180) + post-PR9 cleanup/refactor (PR #181) ship 마무리. ## 변경 사항 ### Version - workspace `Cargo.toml`: 0.17.2 → 0.18.0. Cargo.lock 자동 cascade (24 kebab-* crate 모두 0.18.0). ### Frozen design contract - `docs/superpowers/specs/2026-04-27-kebab-final-form-design.md`: - §3.8 RAG types — RefusalReason 에 NliVerificationFailed + NliModelUnavailable + MultiHopDecomposeFailed 추가 + Multi-hop RAG + NLI verification 의 ask_multi_hop facade + step 8.5 NLI hook + HopRecord / VerificationSummary 명시. - §9 versioning rules 표 — nli_model_version row 신규 (선택 — v0.19+ second adapter 시 wire surface candidate). ### Status transitions - `docs/superpowers/specs/2026-05-25-p9-fb-41-finalize-spec.md`: status approved-by-team → completed. - `docs/superpowers/plans/2026-05-25-p9-fb-41-finalize-plan.md`: status approved-by-team → completed (spec_status 도). ### User-facing docs - `README.md`: 명령 표의 `kebab ask` row 에 `--multi-hop` flag + NLI 옵션 안내 한 단락 (mDeBERTa-v3 XNLI 280 MB 자동 다운로드 / RAM peak ~7-8 GB / threshold tuning 0.5 prod / 0.0 disable). - `docs/SMOKE.md`: `[rag] nli_threshold = 0.0` config 예시 + 활성화 절차 + first-run download + RAM 권장 inline 안내. ### Handoff + dashboard - `HANDOFF.md`: 한 줄 요약 의 현재 version 0.17.2 → 0.18.0. v0.18.0 cut entry 추가 (fb-41 multi-hop + NLI + cleanup ship). Component 카운트 단락에 fb-41 PR-9 의 kebab-nli + ask_multi_hop 추가 명시. 머지 후 결정 절 맨 위에 v0.18.0 fb-41 entry 신규. - `tasks/INDEX.md`: p9-fb-41 ⏳ → ✅ 머지 (v0.18.0). v0.18.0 subsection 신규 — PR #176-181 의 6 sub-PR + cleanup 각 한 줄 요약. ## 비범위 / 별 작업 - HOTFIXES.md 의 fb-41 entry 는 이미 PR #180 (PR-9d closure) 에서 작성 완료 — 본 cut PR 에서 추가 anchor 불필요. - SKILL.md 의 v0.18+ NLI 안내는 이미 PR-9c-2 에서 inline 추가 완료. ## 검증 - `cargo check --workspace -j 1` 통과 (모든 24 crate v0.18.0 확인). - frozen design 의 RefusalReason enum 확장이 kebab-core 의 production code 와 정합 (PR-9c-1 시점부터 동일 variants 있음). Wire 영향: 없음 (additive minor 는 PR-9c-1 에서 이미 ship, 본 commit 은 documentation cascade only). Behavior 영향: 없음. 머지 후 `gitea-release v0.18.0` 으로 tag + release notes 작성. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 05:18:08 +00:00
altair823	44fbffff26	docs(rag): fb-41 PR-9 spec + plan — NLI verification + v0.18.0 cut fb-41 multi-hop RAG 의 dogfood S7 hallucination root cause = LLM-self-judge ceiling. 대응 = NLI-based post-synthesis verification (mDeBERTa-v3 XNLI, 280 MB ONNX). 산출물: - docs/superpowers/specs/2026-05-25-p9-fb-41-finalize-spec.md (review_round=5, 4 OMC reviewer APPROVE: 1 CRITICAL + 9 MAJOR + 3 MINOR → 1 NIT carry-forward). - docs/superpowers/plans/2026-05-25-p9-fb-41-finalize-plan.md (plan_review_round=3, 4 OMC reviewer APPROVE: 15 issues → 0 actionable). 5 sub-PR (PR-9a~9d) + cut PR. 작업 21-31h / wall time 28-44h. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 21:22:20 +00:00
altair823	7150c376bb	feat(rag): fb-41 PR-3a — HopRecord wire + RagCfg multi-hop knobs PR-3 의 분할 첫 PR. wire additive (HopRecord + HopKind + Answer.hops field) + RagCfg 의 multi_hop_* 3 노브. RAG pipeline 동작 미변경 — 모든 Answer literal 의 `hops = None`. PR-3b (후속) 가 ask_multi_hop 의 happy path 에서 dynamic decide loop 구현 + hops trace 채움. 분할 이유: 원래 PR-3 가 wire + cfg + decide loop + ScriptedLm + helper refactor + 5+ tests 단일 PR 였는데 ~1500 줄 단일 patch 가 review 부담 + 회기 위험 ↑. additive foundation 부터 ship 후 decide loop 별 PR — 사용자 결정 (2026-05-25). - `kebab_core::HopRecord` (iter, kind, sub_queries, context_chunks_added, forced_stop, llm_call_ms) + `HopKind` (Decompose / Decide / Synthesize) — wire-additive shape. - `kebab_core::Answer.hops: Option<Vec<HopRecord>>` — `#[serde(default, skip_serializing_if = "Option::is_none")]`, single-pass / refusal path 는 None, PR-3b 의 multi-hop happy path 가 Some. - `kebab_config::RagCfg` 에 3 신규 노브: - `multi_hop_max_depth: u32` (default 3) - `multi_hop_max_sub_queries_per_iter: u32` (default 5) - `multi_hop_max_pool_chunks: u32` (default 30) 3 모두 `#[serde(default)]` + env override (`KEBAB_RAG_MULTI_HOP_MAX_*`) + legacy parse 핀 (`LEGACY_PRE_TIMEOUT_TOML` 공유). - 9 Answer literal site (pipeline.rs ×6 + kebab-cli + kebab-tui tests + kebab-eval test) 에 `hops: None` 명시 추가. exhaustive field check 가 자동 guard — 빠진 site 시 compile fail. - plan 의 PR-3 단락 → PR-3a / PR-3b 분할 명시 + scope 정정. Tests (163 passing across kebab-config + kebab-core + kebab-rag): - 5 신규 multi-hop knob test (default / env override / legacy parse). - 기존 50+57+31+19+3+3 test 모두 hops:None 추가 후도 통과. Wire 영향: `answer.v1` 의 optional `hops` 필드 — `skip_serializing_ if = None` 이라 single-pass response 에 emit 안 됨. wire breaking 아님, JSON Schema 갱신은 PR-3b 또는 PR-4 (실제 emit 시점). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 07:15:01 +00:00
altair823	192da45dbf	chore(rag): PR #167 회차 1 리뷰 반영 - `parse_decompose_response_drops_partial_empty_keeps_valid` 신규 회귀 핀 — `["", "valid q", " "]` → `["valid q"]` (trim+filter chain 동작 pin). - `multi_hop_decompose` 의 `stop: Vec::new()` 옆 doc comment 추가 — 의도 명시 (instruction-following 모델 기대 + prose 추가 시 MultiHopDecomposeFailed refusal 가 policy). 회차 1 question 의 답변. - plan 의 PR-3 implementation order 에 회차 1 carry-over 추가: 1) ask + ask_multi_hop 의 §4-§9 mirror → 공통 helper 추출, 2) decompose template 의 substitution corner case → format! named arg 으로 교체. 회차 1 의 다른 suggestion (mirror refactor, substitution corner case, history block helper) 는 PR-3 합리적 timing 으로 plan 에 명시 — 회차 2 reply 에 정리. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 06:49:21 +00:00

1 2 3

139 Commits