kebab

Author	SHA1	Message	Date
altair823	16ddb1dfc3	docs: sync arctic embedder docs (README/ARCHITECTURE/HANDOFF/HOTFIXES). README Configuration: provider candle/ollama + arctic model (candle CLS / ollama tag) + endpoint + e5→arctic cascade warning. ARCHITECTURE: backend graph node (embedollama) + embedding backend decision table (adoption rationale: measured recall@10 130) + directory tree. HANDOFF: 1 line. HOTFIXES: 2026-06-03 arctic dated entry (registry/pooling/prefix/cascade + manually measured cosine 0.999984). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 04:59:23 +00:00
altair823	e2ae9a4589	docs(spec): arctic-embed-l-v2.0 embedder integration spec + plan. Best option for shoring up descriptive-query recall after alias removal (measured arctic recall@10 130/132, no loss on term queries). Prefer making candle multi-model (CLS pooling + query: prefix), with an Ollama embed provider as fallback. Correctness gate: candle≈Ollama cosine > 0.99. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 04:14:26 +00:00
altair823	fc5103642e	docs: sync docs for alias removal + version 0.25.0. HOTFIXES 2026-06-03 dated entry, removal banner on the 2026-05-30 design spec, HANDOFF 1 line, README (alias section/config/command table cleanup), ARCHITECTURE (decision table + directory tree), corrected config-migrate examples in SMOKE/DOGFOOD. workspace version 0.24.0 → 0.25.0 (+ Cargo.lock). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 21:37:58 +00:00
altair823	a48c405826	refactor(wire): remove ExpansionProgress event + rendering. Remove the IngestEvent::ExpansionProgress variant and its serialization test (AssetChunked/AssetTimings kept). Remove expansion rendering from CLI/TUI and the expand segment from the AssetTimings line. Remove the expansion_progress kind from the ingest_progress.v1 schema and update the expansion_ms description to "value stays 0". Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 21:37:44 +00:00
altair823	ca8c0645fb	docs(spec): doc-side expansion (alias) removal spec + plan. Design for removing the alias feature, following the conclusion of the research doc (2026-06-03). Keep (Metadata.aliases, AssetChunked/AssetTimings, embedding cache) / remove (Chunk.aliases, expansion.rs, ExpansionCfg, chunk_aliases_fts, run_alias_query, ExpansionProgress) boundary + new DROP migration + decision to remove wire expansion_progress. Implementation plan: 8 tasks. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 20:13:12 +00:00
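altair823	c7af6612b7	docs(research): expansion cost reassessment + deep research on alias alternatives. Survey of alternatives after aliases (doc-side per-chunk LLM expansion) were confirmed as the bottleneck on the ingest critical path. Concurrency (OLLAMA_NUM_PARALLEL, at most 1.28×), model swap (qwen3.5 Chinese/degeneration), and background processing (total work and treadmill unchanged) were all measured and exhausted. Step 0 measurement: cross-lingual recall@10 is perfect even without aliases (en/ko/syn/abbr); the only weakness is descriptive queries → alias ROI is negative. Step 1: bge-m3 dense is a lateral move (descriptive +3 / term -3, net 0). 4-agent deep research: what remains is a reverse-dictionary task; measurement-first ladder (heading enrichment → arctic-ko embedder → bge-reranker-v2-m3 → near-tie-gated expansion). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 20:07:42 +00:00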
altair823	8bfa4ba76e	fix(ingest-progress): address review: correct store_ms boundary + guard duplicate expansion frame - Move the stale-vector orphan purge (LanceDB I/O) out of store_ms into the embed/vector phase (embed_ms). store_ms now covers only SQLite put_* (diagnostic accuracy; removes a 920ms misattribution when re-indexing an edited document). The purge is still unconditional and runs before the upsert. - Guard the final expansion_progress frame with done != last_done (removes the duplicate frame when chunks is a multiple of the throttle step, and the 0/0 frame when chunks==0). - schema/HOTFIXES: correct the store_ms/embed_ms descriptions + remove the dangling IMPL_REPORT reference. clippy -D warnings 0, test 312 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 14:49:02 +00:00
altair823	a48b055358	feat(ingest): in-asset phase progress logging (asset_chunked/expansion_progress/asset_timings) + v0.24.0. Adds visibility into the phases inside a document to the ingest progress events, which were previously per-asset (document) only. Fixes the progress bar appearing stuck at 1/N when a large document spends tens of minutes in expansion (alias LLM, sequential per chunk). wire ingest_progress.v1 additive (backward-compat): - asset_chunked {idx,total,chunks}: right after chunking, on all markdown/image/pdf paths - expansion_progress {idx,total,done,chunks}: throttled in the expansion loop (every 25 chunks or 1s; done==chunks at the end). Cache hits also count toward done - asset_timings {idx,total,parse_ms,chunk_ms,expansion_ms,embed_ms,store_ms}: per-phase wall-clock on the markdown path. Design: to avoid changing kebab_core::IngestItem (wire-stable), timing is emitted directly by ingest_one_asset as a new AssetTimings event (AssetFinished unchanged). CLI (progress.rs): progress-bar sub-message (→ N chunks / alias expansion done/chunks) + one line of phase timings when an asset finishes (fmt_ms). TUI reducer: no-op arm. Verification: clippy -D warnings exit 0; cargo test -p kebab-app -p kebab-cli 312 passed/0 failed. Rewrote the ordering-invariant test + added a new serialization test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 13:58:27 +00:00
altair823	d85d7348a5	docs(embed-candle): record evidence for dogfooding + A1 refutation + MKL negative result - HOTFIXES + release-notes: full candle dogfood run, 997 docs/23,151 chunks/0 errors (9.5h) - A1 (taskset -c 0-3) refuted on the real server: onnxruntime segfaults even when limited to 4 cores → candle is the only real fix - MKL acceleration negative result: uses more cores but is 38~50% slower → not adopted, stay pure-Rust - Parity 2.01e-7 reconfirmed; performance trade-off stated Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 09:08:12 +00:00
altair823	edac3ae737	chore(embed-candle): address PR #199 round-1 review: add candle model caveat to SMOKE.md Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-01 17:01:35 +00:00
altair823	6ec4e6809f	fix(embed-candle): address round-1 review - commit track-spec + meta-spec/plan into branch (HIGH: dangling `amends:` ref) - inline parity evidence (cosine 1.0, max_abs_diff 2.01e-7) into HOTFIXES + release notes; drop refs to deleted IMPL_REPORT/SPIKE_REPORT (MEDIUM) - model guard: reject non-e5-large `model` before the 2GB download so model_id() can't mislabel vectors (MEDIUM) + unit test - parity test now covers BOTH query: and passage: prefixes (MEDIUM) - guard encodings.first() index; document zero-attention/pooling invariant; clarify embed_batch prefixing doc (LOW) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-01 16:54:20 +00:00
altair823	8f7b6ee538	feat(embed): candle embedding provider (NUMA-safe, opt-in) + v0.22.0. Adds an opt-in embedding provider that runs the same multilingual-e5-large model in pure Rust (candle), to work around ingest dying on dual-socket NUMA servers, where fastembed (onnxruntime) hardcodes 48 intra-op threads, corrupting the NUMA heap and ending in a double-free. - New crate kebab-embed-candle: CandleEmbedder (kebab_core::Embedder). hf-hub safetensors → XLMRobertaModel forward → mask mean-pool → L2 → e5 prefix. The candle dependency tree is isolated in this crate (0 kebab-* dependencies other than core/config). - Thread cap: [models.embedding].num_threads + env KEBAB_EMBED_THREADS → caps the global rayon pool once (the NUMA-safety lever). - kebab-app::embedder() branches on provider (fastembed/onnx/"" → existing path unchanged, candle → CandleEmbedder, unknown value → error). - Phase 0 spike crate removed (absorbed into production). - Version 0.21.1 → 0.22.0 (new config surface, pre-1.0 minor bump). Parity: cosine_min=1.000000, max abs diff=2.01e-7 (< 1e-5) → embedding_version kept, 0 re-indexing. fastembed default behavior/vectors unchanged. No wire schema change. Verification (file + exit code): clippy -D warnings EXIT=0 (0 warnings), test EXIT=0 (candle unit 5 + thread_cap rayon=4 + config 68), parity #[ignore] EXIT=0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-01 14:52:25 +00:00
altair823	980e20fd8d	docs: add config migrate playbook to SMOKE/DOGFOOD. SMOKE: config migrate smoke steps (dry-run/apply/idempotency/--json). DOGFOOD §9: schema migration scenario (.bak byte-identical, values preserved, visibility, idempotency, doctor). Tag moved so that this is included in v0.21.1. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 13:58:08 +00:00
altair823	f2cc325cf3	feat(cli): kebab config migrate subcommand + wire config_migration.v1 - Cmd::Config { Migrate { --dry-run } }; emits config_migration.v1 with --json. - wire_config_migration (ConfigMigrationReport carries its own schema_version). - Register config_migration.v1 in schema.rs WIRE_SCHEMAS + JSON schema file. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 12:09:31 +00:00
altair823	6d86214060	docs(plan): config migration implementation plan (TDD, 13 tasks). Split reconcile (additive) from the step chain (non-additive), annotated_default_document shared by init/migrate, app facade backup + atomic write, doctor check, CLI config migrate, wire config_migration.v1. Bite-sized TDD steps. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 11:39:31 +00:00
altair823	6bbb8f854b	docs(spec): config migration design contract. Spec that finalizes the brainstorm results from the kickoff handoff (#197). Trigger = explicit command `kebab config migrate` + doctor hint; comment preservation = partial edits via toml_edit; mechanism = hybrid of reconciliation (additive) + step chain (non-additive). init/migrate share the annotated default document. Three safety axes (idempotency, backup, dry-run) + atomic write. New wire schema config_migration.v1. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 11:34:19 +00:00
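altair823	16f3d6eef2	docs: config migration work handoff kickoff. Handoff document for a separate session on automatically migrating existing user files when the config.toml schema evolves. Summarizes the current state (serde default forward-compat exists / no file migration / schema_version is decorative), the core difficulty (comment preservation), three design options (full rewrite / toml_edit append / backup), the trigger (command vs automatic), and the methodology (v0.21.0 PR #195/#196 pattern). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 11:11:08 +00:00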
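altair823	88c5b83dea	docs: strengthen derivation-cache spec/handoff from the reader's perspective. Fill in details that were missing relative to the PR #195 implementation (`e9b5202`): - callout distinguishing chunk_id (position-based vector identifier) from cache_key (content-hash lookup key) - new §7 compatibility/migration: no body re-index needed; V012 is additive but the binary must be replaced; impact on existing KBs of the alias sentinel change from bundled to per-alias (legacy compatible) - reflect the kind token ("doc\|") in version_key; document orphan sentinel cleanup (LIKE prefix) - embed_with_cache order-preservation invariant; rationale for per-alias vectors (dilution 13/18→16/18) - correction: derivation_cache_gc exists only as a method and is not wired up (the cache currently grows without bound; follow-up) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 10:25:00 +00:00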
altair823	a8fd76499c	feat(expansion): per-alias dense vectors for doc-side expansion + derivation cache (V012). Index aliases as individual dense vectors, one per line (sentinel `{chunk}#alias#N`), and skip alias generation for boilerplate chunks. The bundled single-vector approach is dropped: averaging diluted specific phrasings and actually regressed (13/18). Variant consistency 14/18 → 16/18, mean_spread@10 0.222 → 0.111 (CS corpus of ~1000 Namuwiki documents). `kebab-core::strip_alias_suffix` handles both the suffix form and the per-alias form. Derivation cache (V012): caches embedding vectors + alias LLM results under a chunk content-hash key, so re-indexing skips recomputation for chunks whose content is unchanged. cache_key = blake3(kind ‖ text_blake3 ‖ version_key)[:32]; version_key includes model/prompt/dimensions → consistent with the §9 cascade (automatic miss on a version bump). Measured: 3 ground-truth items, cold 1879s → warm 13s ≈ 145×. Purely additive, so no corpus_revision bump. search/ask work with only kebab.sqlite + lancedb → enables a portable workflow of indexing on an external server and copying just the DB. workspace version bumped 0.20.2 → 0.21.0 (minor) for the V012 schema migration + new surface. README/HANDOFF/ARCHITECTURE/HOTFIXES sync. Known limitation: 2 descriptive queries (stack, svm) remain unresolved + the grounded check misclassifies partial citations as grounded (follow-up candidate). Measurement details: docs/superpowers/handoffs/2026-05-31-namu-wiki-alias-cache-study.md Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 08:24:04 +00:00
altair823	d279f343e7	docs(spec,plan): separate-vector infrastructure: FK removal (V011) + CASCADE replacement + filter_chunks. PoC: with alias-only vectors, English descriptive queries land at rank 7~30 (not recovered under concat because of body dilution) → justification for separate vectors. Three blockers: embedding_records FK (787, recreated in V011), CASCADE replacement (explicit DELETE), filter_chunks sentinel strip. plan Task 4.5/4.6. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 13:25:45 +00:00
altair823	825543549d	docs(plan): implementation plan for separate dense alias vectors. ALIAS_SUFFIX (core) → embed_aliases flag → ingest sentinel vectors + purge → VectorRetriever strip + dedup → measurement. TDD, complete code. doc-side expansion PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 10:28:43 +00:00
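altair823	bcb8b93751	docs(spec): design spec for separate dense alias vectors. PoC (concat) measurement: dense aliases score 6/0/2/0.25 (demonstrating that descriptive queries are dense retrieval's home turf), but 2 English descriptive queries were not recovered because of body dilution under concat. Remedy: index aliases as separate vectors under sentinel chunk_ids (body vectors unchanged = regression-safe; alias signal stays pure). Flag ingest.expansion.embed_aliases, default off. Lexical relaxation dropped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 10:26:24 +00:00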
altair823	69b53d1c97	docs(spec): correct the doc-side expansion search mechanism to match the shipped implementation. Task 6 review MINOR-1: the spec body described a single UNION ALL + GROUP BY, but what shipped is 2 queries (run_query + run_alias_query) + a Rust merge_body_alias (body first). Comparing absolute bm25 values across different FTS tables is meaningless, so a body-first merge is cleaner. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 03:20:13 +00:00
altair823	467a974901	docs(plan): doc-side expansion implementation plan + spec refinement (separate FTS table). spec: refined to avoid the chunks_fts §5.5 verbatim conflict → separate chunk_aliases_fts table + body+alias merge inside lexical (RetrievalDetail/wire schema unchanged). plan: 7-task TDD (Chunk field → V010 → config → ExpansionGenerator → ingest hook → lexical merge → measurement/docs). Complete code + build conventions. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 02:04:58 +00:00
altair823	098413922b	docs(spec): index-time doc-side expansion design spec (Phase 2). Brainstorm decisions: per-chunk alias generation (same language + KO↔EN translation), additive + manual re-index, simple first-pass quality control. Separate FTS5 aliases channel → 3-channel RRF fusion. Flag off by default; measure on/off with kebab eval variants. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 01:54:46 +00:00
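altair823	8bb7c276d0	docs: Phase 2 doc-side expansion kickoff + implementation-methodology handoff. Context document that lets a new session pick up Phase 2 (index-time doc-side expansion) on its own. Covers the background (rerank refuted → goal redefined → Phase 1 diagnosis, B dominant → deep research → PoC), the design direction (KO↔EN translation aliases + separate FTS5 field + RRF, flag off), the measurement tools already built (kebab eval variants + dogfood golden), and the same implementation methodology used so far (brainstorm → spec → plan → OMC teammate sequential implementation + review + independent verification, model routing, build redirect + exit, measurement = variant eval with no proxies, gitea-pr review loop). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 01:19:14 +00:00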
altair823	b6ad947378	docs: slim README command table + move/sync details into ARCHITECTURE. Shrink the README's monster cells (ingest 2891→544, search 2952→687, ask 1244→415, tui 2300→453 chars) to "what + key flags + pointer". Structural detail dropped from the README moves to ARCHITECTURE: - add Go/Java/Kotlin/C/C++ to the symbol path format + code chunk provenance (citation.kind/code_lang/repo) - Markdown title auto-fill order (md-frontmatter-v2) - new decision row for RAG groundedness verification (mDeBERTa-v3 XNLI, nli_threshold gate) - TUI row updated to P9-1~4 done + F1 cheatsheet (stale "planned" removed). Exhaustive flag listings are delegated to --help and TUI keys to the in-app F1 cheatsheet (the authoritative runtime source), to avoid staleness. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 01:10:39 +00:00
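altair823	5ad1f98227	docs(handoff): doc-side expansion deep research + PoC results (Phase 2 direction settled). Deep research (104 agents): the best remedy for vocabulary-gap pool misses is index-time doc-side expansion. PoC (dogfood KB): 3 queries that had recall@50=0 came back at rank 1~2 once aliases were added (hybrid + vector; the aliases were not verbatim from the golden set, so this generalizes). The key unverified link was confirmed quantitatively on a real corpus. Phase 2 = index-time doc-side expansion (KO↔EN translation aliases) → separate FTS5 field → RRF, flag off. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 00:53:24 +00:00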
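altair823	a58cae2ff3	docs(research): deep-research reference on fixing vocabulary-gap pool misses. deep-research workflow (104 agents, 5 angles, 22 sources, 25 claims verified by 3-vote, 22 confirmed/3 killed). Conclusion: index-time doc-side expansion (doc2query) is the best remedy for pool misses: it enlarges the pool itself, adds ~0 per-query latency (done once at index time), and preserves exact matching (appended to a separate field). However, vanilla mt5 stays within one language, so the KO/EN gap requires generating KO↔EN alternative queries at index time. Query-side approaches (HyDE = the already-rejected per-query LLM; Vector-PRF = recall claim rejected) are unsuitable. Verification is possible with the existing variant eval. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 00:53:24 +00:00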
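altair823	7a1dff1684	docs(handoff): query-paraphrase robustness Phase 1 complete + (A)/(B) diagnosis. Measured 8 groups × 4 variants (dogfood): groups=8 A_dominant=2 B_dominant=4 spread@10=0.750. Diagnosis: the problem is not KO vs EN but vocabulary distance (English paraphrased sentences also miss, while some Korean queries are fine). B (vocabulary gap, recall@50=0, cannot be fixed by rerank) dominates → signals a query expansion/translation remedy. A (rank instability) occurs in only 2 groups, cap_theorem and vector_database. Quantitatively validates the "measure first" thesis (rerank alone is a partial fix). Awaiting the Phase 2 remedy decision. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 00:53:24 +00:00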
altair823	fe4c854673	docs(plan): query-paraphrase robustness Phase 1 implementation plan. 5 tasks: (1) GoldenQuery.group + group consistency validation, (2) variant-consistency metrics module + A/B (rank instability / vocabulary gap) classification, (3) kebab eval variants CLI, (4) curation of dogfood golden variant groups, (5) measurement + diagnostic report. TDD, bite-sized, complete code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 00:53:24 +00:00
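altair823	1de3f4ffca	docs(spec): design of a query-paraphrase robustness evaluation framework (measure first). Goal redefined: from KO/EN overlap to consistent answer quality across different phrasings of the same meaning (synonyms, different vocabulary, paraphrased sentences, KO/EN). Reflects the lesson that the previous reranker experiment went nowhere by optimizing an overlap proxy: before prescribing a fix, start with an evaluation that directly measures the real metric (variant consistency). Phase 1 (implemented by this spec): add variant groups (intent groups) to the kebab-eval golden suite + variant-consistency metrics (recall_spread, answer_consistency) + automatic (A) rank instability / (B) vocabulary gap classification via recall@pool vs recall@k. Phase 2 (the remedy) is conditional, gated on the measurement results. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 00:53:24 +00:00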
altair823	166b1404e4	docs(release-notes): correct refusal-detection mechanism + O-2 phrasing. Leader review of the writer draft: refusal detection is based on the presence or absence of a citation marker (`[#number]`), not a special `<REFUSE>` marker. The O-2 phrasing example is also corrected to the actual rag-v3 rule. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 04:58:08 +00:00
altair823	4afcaf96d2	docs(release-notes): add v0.20.2-draft (rag-v3 response language + search-quality eval infrastructure). Draft of the v0.20.2 release notes. Each finding is described in a four-paragraph user-impact structure. - Finding #1/O-2: rag-v3 automatic response-language matching + language-neutral refusal - Finding #2: bulk search input schema finalized (15 fields) - Finding #3: human-readable path added to list docs - Finding #7: distinguish the two index_version fields (vector vs FTS5) - eval --config facade + search-quality baseline (hybrid hit@3=1.0 / MRR=0.833) - Finding #4/#5/#6/#8: docs/schema cleanup - version cascade caveat (rag-v3 → eval compare) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-29 04:54:19 +00:00
altair823	40d7faee71	docs(dogfood): add §10.2 search quality baseline scenario (v0.20.2 golden suite). Now that the eval --config facade patch makes it possible to evaluate the dogfood KB directly, add a §10.2 search-quality baseline section to §10 Eval. - golden suite run commands (hybrid + lexical eval run → aggregate) - v0.20.2 metric baseline table (hybrid hit@3=1.0 / MRR=0.833) - qualitative checklist (Korean 2-char hit@3, empty=0, MRR threshold) - golden curation procedure + lessons from the dispatch.py error - existing basic eval run reorganized as §10.1 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-29 04:54:03 +00:00
altair823	2429189447	docs(spec): reflect search-quality critic round-1 (eval --config, lang-filter non-goal, curation) Incorporates all critic (opus) round-1 findings into the dogfood search-quality eval design spec: BLOCKER-1: §4.4 execution commands now use --config /build/dogfood/config.toml (Task A facade-rule patch makes this the canonical path). §5.1 re-titled from "(follow-up patch)" to "Applied by Task A: recommended operating path"; XDG workarounds demoted to "pre-patch fallback". Intro paragraph updated accordingly. MAJOR-1: §3 Non-Goals gains an explicit bullet: lang/media/code_lang SearchFilters validation is out of scope for this harness (runner uses SearchFilters::default(), runner.rs:151). §4.1 "code search" row no longer claims code_lang filter coverage. MINOR-1: §4.3 step 3 now names kebab inspect doc <id> as the primary chunk-selection path (breaks chunk-level curation loop); search hits demoted to "for secondary confirmation". MINOR-2: §4.1 golden category table gains two new rows: Korean N-gram fallback query (compound-word/neologism coverage) and English whole-token exact query (separates substring artefacts). MINOR-3: §4.1 YAML header note added: record corpus_revision in golden file so stale-bail root cause is immediately traceable. NIT: §9 References line numbers corrected (runner.rs:31, metrics.rs:116/144); runner.rs:151 SearchFilters::default() reference added. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-29 03:43:00 +00:00
altair823	571996938c	docs(contract): bump default prompt_template_version to rag-v3 (Todo #1) line 899: only V1 legacy → both V1/V2 legacy; declares rag-v3 the default from v0.20.2. line 1349 (★): config example default rag-v2 → rag-v3. line 1533 (★): §9 cascade table code constant rag-v2 → rag-v3. After line 287: add a historical-snapshot comment to the answer.v1 example block (n1: model+ptv are stale, values not changed). task spec grep decision: the 2 lines mentioning rag-v2 in tasks/p9/p9-fb-15 are a historical description from when rag-v2 was introduced → kept frozen. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-29 02:45:13 +00:00
altair823	be79bdb83d	docs(#7): distinguish vector-store vs FTS5 index_version (Todo #7) schema.schema.json models.index_version: state that it is the vector store (LanceDB) version. search_hit.schema.json index_version: state that it is the lexical (FTS5) version. search_hit.schema.json retrieval: add the list of inner fields + a description of the hybrid-only fusion (shared hunk). README kebab schema row: add a caution that index_version means different things in the two places. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-29 02:45:04 +00:00
altair823	4e76f103c1	docs(#5,#6): clarify retrieval.* nesting + single-mode score relation (Todo #5/#6) Add an explanation of the score ↔ retrieval.* structure to the README Score interpretation section: - fusion_score/lexical_score/vector_score/lexical_rank/vector_rank live inside retrieval (not top-level). - In single-mode it is normal for score==fusion_score==lexical/vector_score to hold the same value (Finding X). Add the score_kind relation + the reason for identical single-mode values to the score field in search_hit.schema.json. The retrieval/index_version descriptions in search_hit.schema.json are included in the Task 12 commit (same hunk). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-29 02:44:56 +00:00
altair823	4fd672193f	docs(#4): clarify lang vs code_lang semantic and und=code (Todo #4) Add to the lang_breakdown description the fact that natural-language detection is not run on code documents (lang="und" is normal). Add a new README section explaining lang vs code_lang. task spec grep: the rag-v2 mentions in tasks/p9/p9-fb-15 are historical descriptions → kept frozen. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-29 02:44:34 +00:00
altair823	dece5e89fc	feat(bulk): document bulk search input schema + error shape hint (Todo #2) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-29 02:16:21 +00:00
altair823	f5ff823984	docs(dogfood): add RAG response-language scenario (Todo #1 verification)	2026-05-28 21:28:22 +00:00
altair823	85efeeca3e	docs(plan): v0.20.2 dogfood findings implementation plan (15 tasks). Written by planner (opus) → critic review attempted → locations verified by leader. 8 todos → 15 tasks: 4 code (rag-v3 / list docs / bulk / init) + 4 full-dogfood verification tasks, one after each finding + 3 docs-only + contract + HOTFIXES/release-notes + version bump. plan critic round-1 misdiagnosed location blockers (B-1/B-2/M-1/M-2) because of broken environment tooling → leader verified pipeline.rs/config/cli/bulk/Cargo.toml directly with grep and confirmed the plan's locations are accurate, then added an "anchor grep re-check" note + binary path caveat header for the executor. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-28 21:09:39 +00:00
altair823	2b4ba8e104	docs(spec): v0.20.2 dogfood findings design + round-1 critic feedback. Designs the 8 todos found in the full v0.20.1 dogfood run (Ask response language rag-v3 / doc.lang docs / bulk input / list title / fusion_score, score_kind / schema index_version / Ollama hint) as a single patch release. writer worker draft → incorporates the opus critic round-1 review: - B1: top-level score placeholder → finalized (score_kind declares the meaning, search.rs:95-99) - M1: state that forcing the image caption language is out of scope - M2: state that the config default test (lib.rs:1316) needs updating - M3: full set of bulk input fields (query/mode/k/trust_min/ingested_after/media/tag/lang) - M4: cascade impact of rag-v3 on eval_runs.config_snapshot_json Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-28 20:37:42 +00:00
altair823	b08941d6ab	docs(handoff): mark brainstorming-needed items in v0.20.1 findings todo. Add a fix path classification to each todo: - 🧠 required: needs a design / user expectation decision; brainstorming skill first - 🔧 mechanical: spec drift or an obvious fix; no separate brainstorming - 📝 mild discussion: the trade-off can be decided in the spec drafter's self-review Classification: - 🧠 required (2): #1 Ask English→Korean response policy, #4 doc.lang semantic - 🔧 mechanical (4): #2 bulk schema, #5/#6 docs sync, #7 schema rename - 📝 mild (3): #3 list title, #8 Ollama default Additional brainstorming candidates (beyond the direct findings): - BS-A: HTML corpus support (1415 file skipped) - BS-B: Tier 1/2/3 chunker UX visibility - BS-C: kebab dogfood subcommand (automation) - BS-D: efficiency of tokenized_korean_text for English code chunks - BS-E: expose the builtin_blacklist specification Recommended workflow: 1. brainstorming stage first (#1, #4 + review each BS-x) 2. mechanical batch (#2, #5, #6, #7): one PR 3. mild discussion batch (#3, #8) 4. dogfood retest → v0.20.2 patch release	2026-05-28 19:45:53 +00:00
altair823	6bf4e82e62	docs(handoff): v0.20.1 full dogfood findings todo for next session. Collects the findings from the post-merge full dogfood of v0.20.1 (the user's real corpus of 6293 files, 3.5-hour ingest, §1~§11 scenarios) into a self-contained todo handoff for a new session. P0 (bug / behavior that differs from intent): - #1 Ask: English query → Korean response (forced by the rag-v2 prompt template) - #2 bulk search input format unclear (not specified in the wire schema) - #3 duplicate titles in list docs (heading-based; needs doc_path as a supplement) - #4 doc.lang = und for 53% (lang detection fails on code files) P1 (docs drift): - #5 location of fusion_score (.retrieval.fusion_score) - #6 meaning of score_kind="bm25" (the fusion_score in lexical mode) - #7 confusion between schema index_version and lexical_index_version P2 (setup): - #8 Ollama endpoint defaults to localhost (remote in the user's environment) Each todo lists severity, scenario, suspected location, and action item. Includes the command to start the new session + recommended branch + procedure for re-running the dogfood + cumulative finding table. Repo state: main HEAD=a0c7fa3, clean. v0.20.1 binary OK. /build/dogfood/ KB (3940 docs, 34896 chunks) preserved for regression test.	2026-05-28 19:40:10 +00:00
altair823	028d9ad4ea	docs(release): v0.20.1 release notes draft + spec/plan dogfood cross-link #1 (user request): write the release notes draft + strengthen the dogfood evidence cross-links in spec/plan. docs/release-notes/v0.20.1-draft.md (new): - 4-paragraph body (Korean 2-char query support + English substring regression + automatic V007→V009 backfill + ingest performance impact). - Migration cascade table (lexical_index_version, corpus_revision, wire schema shape preservation). - API + dependency changes (lindera v3, lindera-ko-dic v3, retired short_query_hint helper, new facade APIs). - Breaking changes stated (English substring regression, first-boot latency, DB/binary size increase). - Upgrade procedure + Known limitation + 14 dogfood scenario reference. spec Appendix B (segmentation evidence): - new "Empirical verification (2026-05-28 dogfood - post-merge update)" subsection. Table of prior-knowledge assumptions vs measured results. Scenarios 1-4 all marked verified. Records the evidence that ko-dic splits '서울특별시' → '[서울, 특별시]'. plan Changelog: - post-implementation entry: 22 commit on branch, S3 blockers, S7 cascade, S11 sanity regression updates, opus PR review 4 finding fixes. - dogfood evidence entry: 14 scenario verify pass, ko-dic segmentation evidence, HOTFIXES + spec Appendix B cross-link. Spec: …spec…md Appendix B Plan: …plan…md (post-implementation + dogfood evidence Changelog) Release notes: docs/release-notes/v0.20.1-draft.md	2026-05-28 13:34:33 +00:00
altair823	f2a76cfe94	docs(dogfood): V009 morphological tokenizer scenarios + fixture evidence. Reflect the fixtures + scenarios from the v0.20.1 dogfood verification in DOGFOOD.md / SMOKE.md. State that 0 hits for Korean is normal (the tokens are absent) in an English/code-centric KB such as the user's KnowledgeBase, and provide inline reference fixtures (korea-overview.md + korea-compound.md) for verifying ko-dic's morpheme segmentation. DOGFOOD.md §2.1 updated: - description: trigram → unicode61 + morpheme column. - scenarios: expanded to 7 cases, including Korean 2-char (한국, 서울) + compound noun (서울특별시) + English whole-token reversion + 1-char filter. - new §2.1bis: V009 dogfood evidence reference corpus + verification commands + expected snippets (evidence of lindera segmentation) + known limitation (ko-dic's policy of treating compounds as a single token, Option α acceptance). SMOKE.md 'V009 morphological search' updated: - remove the trigram-era hint advisory + the scenario recommending 3-char keywords. - replace with the v0.20.1 scenarios: 2-char Korean / compound noun / 1-char filter / English whole-token reversion. Reference fixture (measured, verify pass): - korea-overview.md: '한국' / '서울' / '지하철' all hit. - korea-compound.md: '한국어' / '한국문화' / '서울특별시' compound hit. Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md §9 Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (dogfood evidence)	2026-05-28 13:22:43 +00:00
altair823	8c56ef3010	docs(superpowers): v0.20.x C Korean morphological tokenizer spec + plan artifacts. This commit archives the 5 artifact files of the 4-stage workflow for the v0.20.x C task (Bug #8: Korean 2-char query 0-hit): - spec.md (668 lines, status=accepted): Option A/B/C comparison + decision for lindera Path A (accepting the English substring regression) + 12 sections + 4 Appendices (B segmentation evidence, C cost evidence, D license evidence). - spec-critic-r1.md: 3 critical + 6 major findings (NEEDS_REWRITE). - spec-critic-r2.md: traceability matrix after the r1c rewrite (ACCEPT). - plan.md (750 lines, status=accepted): 11 steps + dependencies + cost optimization routing + 9 closure micro-patches applied. - plan-closure-r1.md: traceability matrix + origin of the 9 MPs. These artifacts are a frozen reference once the implementation is merged. For later deviations, tasks/HOTFIXES.md is the source of truth. Workflow stage: 1. spec drafter (omc team writer, opus) 2. spec critic R1 (omc team critic, opus): NEEDS_REWRITE 3. spec rewriter r1c (omc team writer, opus): 7 item fix 4. spec closure R2 (in-process verifier, sonnet): ACCEPT 5. plan drafter (omc team planner, opus) 6. plan closure (in-process verifier, sonnet): ACCEPT + 9 MP 7. subagent-driven-development implementation (11 step + 5 follow-up + 1 docs polish = 17 commit) 8. PR-level final code review (in-process code-reviewer, opus): Approved with notes (4 minor docs finding, merge as-is) Branch: feat/korean-morphological-tokenizer Version: 0.20.1	2026-05-28 12:53:31 +00:00
altair823	5d9ea588ed	docs(v0.20.1): polish PR-review findings (README/HOTFIXES/schema/SKILL). Mechanical fixes for the 4 minor findings from the opus PR-level final review (Approved with notes): 1. README.md: the `kebab search` row still described English substring matching as in the V007 era. State honestly that V009 reverts to whole-token matching (substring → V002 behavior) + recommend vector/hybrid mode. 2. tasks/HOTFIXES.md: correct the file paths in the 2026-05-28 entry. lexical.rs does not call lindera; it only updates MIN_QUERY_CHARS 3→2 in build_match_string. The actual owner of the lindera helper is kebab-chunk/src/lib.rs. ingest.rs is outside this PR's scope; the eager backfill hook lives in kebab-app/src/app.rs::App::open_with_config. 3. docs/wire-schema/v1/search_response.schema.json: the `hint` field description still carried the advisory wording from the V007 trigram 3-char-minimum era. State that in v0.20.1 the helper is retired and the field is always omitted (the field is kept in the schema only for forward compatibility). 4. integrations/claude-code/kebab/SKILL.md: resolve the self-contradiction in the `hint` field description ("present only with trigram in edge cases" vs "Korean 2-char now supported"). State that it is retired and may be reused. PR-level reviewer recommendation: "Merge as-is: not grounds for blocking (all findings minor)". This commit takes the reviewer's option 1 (a separate docs hotfix commit). Spec: docs/superpowers/specs/2026-05-28-v0.20.x-korean-morphological-tokenizer-spec.md Plan: docs/superpowers/plans/2026-05-28-v0.20.x-korean-morphological-tokenizer-plan.md (PR-level finding follow-up)	2026-05-28 12:53:00 +00:00

217 Commits