kebab

Author	SHA1	Message	Date
altair823	cbcae69abf	feat(embed): candle model registry + arctic-embed-l-v2.0 (CLS pooling) Replace the e5 hardcoding (HF_MODEL/SUPPORTED_MODEL/mean/query:+passage:) with a model registry EmbedModelSpec{name,hf_repo,pooling,query_prefix,doc_prefix,dim,version_tag}. e5 (mean, query:/passage:) + arctic (CLS, query:/no document prefix). Pooling branches per model (mean = attention-mask-weighted / CLS = hidden[:,0,:]); tokenize/forward/L2 are shared. arctic pooling=CLS confirmed from HF 1_Pooling/config.json (pooling_mode_cls_token:true). model_version gets a +arctic-cls tag for arctic (triggers the embedding_version cascade); e5 keeps plain config.version to stay compatible with fastembed-e5 (NUMA drop-in). Correctness gate: tests/arctic_ollama_parity.rs (#[ignore], live Ollama) compares candle arctic against Ollama snowflake-arctic-embed2, requiring per-sentence cosine > 0.99. Manually measured cosine_min=0.999984 (guarantees the recall@10 result of 130 reproduces). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 04:59:11 +00:00
altair823	369aeb3d24	feat(embed): candle Metal (Apple Silicon GPU) opt-in build feature + v0.23.0 - kebab-embed-candle: `metal` feature → candle metal backend; select_device() picks Device::new_metal(0) (CPU fallback) under the feature, else Device::Cpu. .contiguous() before to_vec2 (Metal rejects strided views; CPU tolerates). - feature passthrough: kebab-app/embed_metal → kebab-cli/embed_metal. Build on macOS: cargo build --release --features embed_metal. - default (non-metal) path unchanged: clippy 0, candle units + thread_cap + parity pass. - README + HOTFIXES: Mac-GPU-ingest → copy sqlite+lancedb → server CPU-query workflow. - version 0.22.0 → 0.23.0 (opt-in build surface). macOS-only compile; Metal execution/speed/parity validated by user on M4 Pro (not buildable on the Linux CI/dev machine). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 11:37:08 +00:00
altair823	6ec4e6809f	fix(embed-candle): address round-1 review - commit track-spec + meta-spec/plan into branch (HIGH: dangling `amends:` ref) - inline parity evidence (cosine 1.0, max_abs_diff 2.01e-7) into HOTFIXES + release notes; drop refs to deleted IMPL_REPORT/SPIKE_REPORT (MEDIUM) - model guard: reject non-e5-large `model` before the 2GB download so model_id() can't mislabel vectors (MEDIUM) + unit test - parity test now covers BOTH query: and passage: prefixes (MEDIUM) - guard encodings.first() index; document zero-attention/pooling invariant; clarify embed_batch prefixing doc (LOW) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-01 16:54:20 +00:00
altair823	8f7b6ee538	feat(embed): candle embedding provider (NUMA-safe, opt-in) + v0.22.0 On dual-socket NUMA servers, fastembed (onnxruntime) hardcodes 48 intra-op threads, which corrupts the NUMA heap and kills ingest with a double-free. To work around this, add an opt-in embedding provider that runs the same multilingual-e5-large model in pure Rust (candle). - New crate kebab-embed-candle: CandleEmbedder (kebab_core::Embedder). hf-hub safetensors → XLMRobertaModel forward → mask mean-pool → L2 → e5 prefix. The candle dependency tree is isolated in this crate (no kebab-* dependencies other than core/config). - Thread cap: [models.embedding].num_threads + env KEBAB_EMBED_THREADS → caps the global rayon pool once (the NUMA-safety lever). - kebab-app::embedder() branches on provider (fastembed/onnx/"" → existing path unchanged, candle → CandleEmbedder, unknown value → error). - Remove the Phase 0 spike crate (absorbed into production). - Version 0.21.1 → 0.22.0 (new config surface, pre-1.0 minor bump). Parity: cosine_min=1.000000, max abs diff=2.01e-7 (< 1e-5) → embedding_version unchanged, no reindex needed. fastembed default behavior and vectors unchanged. No wire schema change. Verification (file + exit code): clippy -D warnings EXIT=0 (0 warnings), test EXIT=0 (5 candle unit tests + thread_cap rayon=4 + 68 config tests), parity #[ignore] EXIT=0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-01 14:52:25 +00:00

4 Commits
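Commit cbcae69abf selects the pooling strategy per model. e5 uses an attention-mask-weighted mean, and arctic uses the first token's hidden state (CLS). Both are followed by a shared L2 normalization. The sketch below shows that math over plain `Vec<f32>` rows. It deliberately avoids candle's `Tensor` API, and the `Pooling` enum and the function names are hypothetical, not the crate's actual types.

```rust
/// Hypothetical pooling strategy enum (not the crate's real type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Pooling {
    /// Mean over positions whose attention mask is 1 (e5).
    Mean,
    /// Hidden state of the first token, i.e. hidden[:, 0, :] for one sequence (arctic).
    Cls,
}

/// Pool one sequence. `hidden` is [seq_len][dim] (assumed non-empty);
/// `mask` is [seq_len] with 1 for real tokens and 0 for padding.
fn pool(hidden: &[Vec<f32>], mask: &[u32], pooling: Pooling) -> Vec<f32> {
    match pooling {
        Pooling::Cls => hidden[0].clone(),
        Pooling::Mean => {
            let dim = hidden[0].len();
            let mut sum = vec![0.0f32; dim];
            let mut count = 0.0f32;
            for (row, &m) in hidden.iter().zip(mask) {
                if m == 0 {
                    continue; // padding positions do not contribute
                }
                for (s, v) in sum.iter_mut().zip(row) {
                    *s += v;
                }
                count += 1.0;
            }
            // Guard against an all-padding mask to avoid dividing by zero.
            let count = count.max(1.0);
            sum.iter().map(|s| s / count).collect()
        }
    }
}

/// Scale to unit length so that a dot product equals cosine similarity.
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}

fn main() {
    let hidden = vec![vec![1.0f32, 0.0], vec![3.0, 4.0], vec![100.0, 100.0]];
    let mask = [1u32, 1, 0]; // the last position is padding
    let mean = l2_normalize(&pool(&hidden, &mask, Pooling::Mean));
    let cls = l2_normalize(&pool(&hidden, &mask, Pooling::Cls));
    println!("mean={mean:?} cls={cls:?}");
}
```

Only the pooling step differs between the two models, which is why tokenization, the forward pass, and L2 normalization can be shared.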
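The registry introduced in cbcae69abf can be pictured as a static table of `EmbedModelSpec` values. The field names come from the commit message. Everything else is an assumption for illustration: the registry keys, the Hugging Face repo ids, the 1024 dimensions, the `lookup`/`prefixed`/`model_version` helpers, and the version string `v1`. In this sketch, unknown names are rejected before any weights would be fetched, in the spirit of the model guard mentioned in 6ec4e6809f.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Pooling {
    Mean,
    Cls,
}

/// Field names follow the commit message; the values below are illustrative assumptions.
#[derive(Debug)]
struct EmbedModelSpec {
    name: &'static str,
    hf_repo: &'static str,
    pooling: Pooling,
    query_prefix: &'static str,
    doc_prefix: &'static str,
    dim: usize,
    version_tag: &'static str,
}

const REGISTRY: &[EmbedModelSpec] = &[
    EmbedModelSpec {
        name: "multilingual-e5-large",             // hypothetical registry key
        hf_repo: "intfloat/multilingual-e5-large", // assumed HF repo id
        pooling: Pooling::Mean,
        query_prefix: "query: ",
        doc_prefix: "passage: ",
        dim: 1024,       // assumed
        version_tag: "", // e5 keeps plain config.version
    },
    EmbedModelSpec {
        name: "arctic-embed-l-v2.0",                      // hypothetical registry key
        hf_repo: "Snowflake/snowflake-arctic-embed-l-v2.0", // assumed HF repo id
        pooling: Pooling::Cls,
        query_prefix: "query: ",
        doc_prefix: "", // arctic documents take no prefix
        dim: 1024,      // assumed
        version_tag: "+arctic-cls",
    },
];

/// Reject unknown models up front, before any download would start.
fn lookup(name: &str) -> Result<&'static EmbedModelSpec, String> {
    REGISTRY
        .iter()
        .find(|spec| spec.name == name)
        .ok_or_else(|| format!("unsupported embedding model: {name}"))
}

/// Prepend the query or document prefix that the model expects.
fn prefixed(spec: &EmbedModelSpec, text: &str, is_query: bool) -> String {
    let prefix = if is_query { spec.query_prefix } else { spec.doc_prefix };
    format!("{prefix}{text}")
}

/// A non-empty version_tag changes model_version, which is what would force re-embedding.
fn model_version(config_version: &str, spec: &EmbedModelSpec) -> String {
    format!("{config_version}{}", spec.version_tag)
}

fn main() {
    for name in ["multilingual-e5-large", "arctic-embed-l-v2.0", "unknown-model"] {
        match lookup(name) {
            Ok(spec) => println!(
                "{} ({}, {:?}, dim {}): version {}, doc {:?}",
                spec.name,
                spec.hf_repo,
                spec.pooling,
                spec.dim,
                model_version("v1", spec),
                prefixed(spec, "hello", false)
            ),
            Err(err) => println!("{err}"),
        }
    }
}
```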
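The device choice in 369aeb3d24 reduces to this rule: try Metal GPU 0 when the `metal` feature is compiled in, and otherwise (or if that fails) use the CPU. The sketch below uses a stand-in `Device` enum and an injected constructor, so the fallback logic can run without candle or a Mac. In the real crate, the flag would presumably come from `cfg!(feature = "metal")` and the constructor would be candle's `Device::new_metal`. Both of those mappings are assumptions here.

```rust
/// Hypothetical stand-in for candle's Device, for illustration only.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Device {
    Cpu,
    Metal(usize),
}

/// Prefer Metal GPU 0 when enabled; on any error, log it and fall back to the CPU.
fn select_device<F>(metal_enabled: bool, new_metal: F) -> Device
where
    F: Fn(usize) -> Result<Device, String>,
{
    if metal_enabled {
        match new_metal(0) {
            Ok(dev) => return dev,
            Err(err) => eprintln!("Metal unavailable ({err}); falling back to CPU"),
        }
    }
    Device::Cpu
}

fn main() {
    // A non-metal build: the constructor is never called.
    println!("{:?}", select_device(false, |i| Ok(Device::Metal(i))));
    // A metal build on a machine where GPU init fails.
    println!("{:?}", select_device(true, |_| Err("no GPU".to_string())));
}
```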
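The provider branch that 8f7b6ee538 adds to `kebab-app::embedder()` amounts to a string match with an explicit error for unknown values. The `Provider` enum and the `select_provider` name below are hypothetical; only the accepted strings come from the commit.

```rust
/// Hypothetical enum naming which embedding backend to construct.
#[derive(Debug, PartialEq)]
enum Provider {
    FastEmbed,
    Candle,
}

fn select_provider(name: &str) -> Result<Provider, String> {
    match name {
        // The empty string keeps the pre-existing default (fastembed) path.
        "fastembed" | "onnx" | "" => Ok(Provider::FastEmbed),
        "candle" => Ok(Provider::Candle),
        // Fail loudly instead of silently falling back to a default.
        other => Err(format!("unknown embedding provider: {other:?}")),
    }
}

fn main() {
    for name in ["", "onnx", "candle", "torch"] {
        println!("{name:?} -> {:?}", select_provider(name));
    }
}
```

Returning an error for unknown values, rather than defaulting, means a typo in the config cannot silently route ingest back to the NUMA-unsafe path.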
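Commit 8f7b6ee538 caps the global rayon pool once, using `[models.embedding].num_threads` or the `KEBAB_EMBED_THREADS` environment variable. The commit does not say which source wins. The sketch below therefore assumes that the environment variable overrides the config and that zero or unparsable values are ignored. It uses only `std`. A comment marks where rayon's `ThreadPoolBuilder::build_global` would be called; that call can succeed only once per process, which is why the cap is applied once.

```rust
use std::sync::OnceLock;

/// Assumed precedence: a valid env value beats the config value; 0 or garbage is ignored.
fn resolve_thread_cap(env_value: Option<&str>, config_num_threads: Option<usize>) -> Option<usize> {
    env_value
        .and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .or(config_num_threads.filter(|&n| n > 0))
}

static APPLIED_CAP: OnceLock<Option<usize>> = OnceLock::new();

/// Apply the cap at most once per process; later calls return the first decision unchanged.
fn apply_thread_cap_once(cap: Option<usize>) -> Option<usize> {
    *APPLIED_CAP.get_or_init(|| {
        // A real provider would build the global pool here, e.g.
        // rayon::ThreadPoolBuilder::new().num_threads(n).build_global(),
        // which returns an error if the global pool already exists.
        cap
    })
}

fn main() {
    let env = std::env::var("KEBAB_EMBED_THREADS").ok();
    let cap = resolve_thread_cap(env.as_deref(), Some(4)); // 4 = hypothetical config value
    println!("thread cap: {:?}", apply_thread_cap_once(cap));
}
```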
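The parity figures quoted in these commits (cosine_min, max abs diff) can be computed as shown below. The thresholds mirror the commit messages: per-sentence cosine > 0.99 for the arctic-vs-Ollama gate, and max abs diff < 1e-5 for treating candle e5 as a drop-in that keeps embedding_version. The function and struct names are hypothetical. In practice, the two input slices would hold per-sentence embeddings from the reference backend and from candle.

```rust
/// Cosine similarity of two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

#[derive(Debug)]
struct ParityReport {
    cosine_min: f32,
    max_abs_diff: f32,
}

/// Compare per-sentence embeddings from a reference backend and a candidate backend.
fn parity(reference: &[Vec<f32>], candidate: &[Vec<f32>]) -> ParityReport {
    assert_eq!(reference.len(), candidate.len(), "sentence count mismatch");
    let mut cosine_min = f32::INFINITY;
    let mut max_abs_diff = 0.0f32;
    for (r, c) in reference.iter().zip(candidate) {
        assert_eq!(r.len(), c.len(), "dimension mismatch");
        cosine_min = cosine_min.min(cosine(r, c));
        for (x, y) in r.iter().zip(c) {
            max_abs_diff = max_abs_diff.max((x - y).abs());
        }
    }
    ParityReport { cosine_min, max_abs_diff }
}

/// Gate used for a different-runtime comparison (arctic vs Ollama).
fn passes_arctic_gate(report: &ParityReport) -> bool {
    report.cosine_min > 0.99
}

/// Stricter check: close enough element-wise to keep the existing embedding_version.
fn is_drop_in(report: &ParityReport) -> bool {
    report.max_abs_diff < 1e-5
}

fn main() {
    let reference = vec![vec![1.0f32, 0.0], vec![0.6, 0.8]];
    let candidate = vec![vec![1.0f32, 0.0], vec![0.6, 0.8]];
    let report = parity(&reference, &candidate);
    println!("{report:?} gate={} drop_in={}", passes_arctic_gate(&report), is_drop_in(&report));
}
```

The cosine gate tolerates small numeric drift between runtimes. The element-wise bound is the stronger condition needed to skip re-indexing.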