kebab/crates/kebab-embed-candle/Cargo.toml
altair823 8f7b6ee538 feat(embed): candle embedding provider (NUMA-safe, opt-in) + v0.22.0
On a dual-socket NUMA server, fastembed (onnxruntime) hard-codes 48 intra-op
threads, which corrupts the NUMA heap and kills ingest with a double-free.
To work around this, add an opt-in embedding provider that runs the same
multilingual-e5-large model in pure Rust (candle).

- New crate kebab-embed-candle: CandleEmbedder (kebab_core::Embedder).
  hf-hub safetensors → XLMRobertaModel forward → mask mean-pool → L2 → e5
  prefix. The candle dependency tree is isolated in this crate (zero kebab-*
  dependencies other than core/config).
- Thread cap: [models.embedding].num_threads + env KEBAB_EMBED_THREADS →
  caps the global rayon pool once (the NUMA-safe lever).
- kebab-app::embedder() branches on provider (fastembed/onnx/"" → existing
  path unchanged, candle → CandleEmbedder, unknown value → error).
- Remove the Phase 0 spike crate (absorbed into production).
- Version 0.21.1 → 0.22.0 (new config surface, pre-1.0 minor bump).
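
The pooling steps named in the first bullet can be illustrated with a minimal, dependency-free sketch. This is not the crate's code: the helper names are hypothetical, plain Vec<f32> rows stand in for candle tensors, and the "query: " / "passage: " prefixes follow the usual e5 convention of being prepended to the text before tokenization.

```rust
/// Prepend the e5 role prefix to the raw text (hypothetical helper).
fn with_e5_prefix(text: &str, is_query: bool) -> String {
    let prefix = if is_query { "query: " } else { "passage: " };
    format!("{prefix}{text}")
}

/// Average the token vectors whose attention-mask entry is 1, skipping padding.
fn masked_mean_pool(hidden: &[Vec<f32>], mask: &[u8]) -> Vec<f32> {
    let dim = hidden.first().map_or(0, |row| row.len());
    let mut sum = vec![0.0f32; dim];
    let mut count = 0.0f32;
    for (row, &m) in hidden.iter().zip(mask) {
        if m == 1 {
            for (acc, &x) in sum.iter_mut().zip(row) {
                *acc += x;
            }
            count += 1.0;
        }
    }
    // Guard against an all-padding input so we never divide by zero.
    let denom = count.max(1.0);
    sum.iter().map(|s| s / denom).collect()
}

/// Scale a vector to unit L2 norm (returned unchanged if the norm is ~0).
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm < 1e-12 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}

fn main() {
    // Three token positions; the last one is padding (mask = 0).
    let hidden: Vec<Vec<f32>> = vec![vec![1.0, 2.0], vec![5.0, 6.0], vec![100.0, 100.0]];
    let mask = [1u8, 1, 0];
    let pooled = masked_mean_pool(&hidden, &mask);
    let unit = l2_normalize(&pooled);
    println!("{}", with_e5_prefix("hello", true));
    println!("pooled={pooled:?} unit={unit:?}");
}
```

Because the padding row is masked out, it does not influence the pooled vector, which is the property that makes batched and unbatched inputs agree.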

Parity: cosine_min=1.000000, max abs diff=2.01e-7 (< 1e-5) → embedding_version
kept, no reindexing. fastembed default behavior/vectors unchanged. No wire schema change.
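
A rough sketch of how the two parity figures above could be computed from paired embeddings follows. The function names, the sample vectors, and the cosine threshold are all hypothetical; only the 1e-5 tolerance comes from the message itself, and the actual tests/parity.rs may measure things differently.

```rust
/// Cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Returns (minimum cosine similarity, maximum absolute element difference)
/// over the paired reference/candidate vectors.
fn parity(reference: &[Vec<f32>], candidate: &[Vec<f32>]) -> (f32, f32) {
    let mut cos_min = f32::INFINITY;
    let mut max_abs = 0.0f32;
    for (r, c) in reference.iter().zip(candidate) {
        cos_min = cos_min.min(cosine(r, c));
        for (x, y) in r.iter().zip(c) {
            max_abs = max_abs.max((x - y).abs());
        }
    }
    (cos_min, max_abs)
}

fn main() {
    // Made-up vectors standing in for fastembed (reference) and candle (candidate).
    let reference: Vec<Vec<f32>> = vec![vec![0.6, 0.8], vec![1.0, 0.0]];
    let candidate: Vec<Vec<f32>> = vec![vec![0.6000001, 0.8], vec![1.0, 0.0000001]];
    let (cos_min, max_abs) = parity(&reference, &candidate);
    // Hypothetical gate: the 1e-5 bound is from the commit message, 0.9999 is assumed.
    let ok = max_abs < 1e-5 && cos_min > 0.9999;
    println!("cos_min={cos_min:.6} max_abs={max_abs:.2e} ok={ok}");
}
```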

Verification (file + exit code): clippy -D warnings EXIT=0 (0 warnings), test EXIT=0
(candle unit 5 + thread_cap rayon=4 + config 68), parity #[ignore] EXIT=0.
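
The one-shot thread cap that the thread_cap test exercises might look roughly like the sketch below. It uses only the standard library; the function names are hypothetical, and the precedence (env value over config value, zero or unparsable values ignored) is an assumption, since the message does not state it. A real provider would presumably size the pool at the marked point with rayon::ThreadPoolBuilder::new().num_threads(n).build_global(), which succeeds at most once per process.

```rust
use std::sync::OnceLock;

/// Process-wide record of the cap chosen by the first caller (None = default).
static THREAD_CAP: OnceLock<Option<usize>> = OnceLock::new();

/// Pick a thread count. ASSUMPTION: the env value wins over the config value;
/// zero or unparsable values are ignored.
fn resolve_threads(config: Option<usize>, env: Option<&str>) -> Option<usize> {
    let from_env = env
        .and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0);
    from_env.or(config.filter(|&n| n > 0))
}

/// Apply the cap once; later calls return whatever the first call decided.
fn apply_thread_cap_once(config: Option<usize>, env: Option<&str>) -> Option<usize> {
    // The global pool would be built inside this closure in a real provider.
    *THREAD_CAP.get_or_init(|| resolve_threads(config, env))
}

fn main() {
    let env = std::env::var("KEBAB_EMBED_THREADS").ok();
    let first = apply_thread_cap_once(Some(4), env.as_deref());
    // A second call with a different request does not change the cap.
    let second = apply_thread_cap_once(Some(16), None);
    println!("first={first:?} second={second:?}");
}
```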

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 14:52:25 +00:00


[package]
name = "kebab-embed-candle"
version = { workspace = true }
edition = { workspace = true }
rust-version = { workspace = true }
license = { workspace = true }
repository = { workspace = true }
description = "Pure-Rust candle adapter implementing kebab_core::Embedder (multilingual-e5-large, NUMA-safe thread cap)"

[dependencies]
kebab-core = { path = "../kebab-core" }
kebab-config = { path = "../kebab-config" }
# candle stack — pinned to the workspace-locked crates.io release (0.10.x),
# same versions the Phase 0 spike compiled so build artifacts are reused.
candle-core = "0.10.2"
candle-nn = "0.10.2"
candle-transformers = "0.10.2"
tokenizers = "0.21"
hf-hub = { version = "0.4", features = ["ureq"] }
serde_json = { workspace = true }
# Thread cap: a one-shot global rayon pool sizes candle's CPU threads
# (the Phase 0 spike proved RAYON_NUM_THREADS caps candle), so a NUMA host
# can avoid the heap corruption caused by onnxruntime's hard-coded 48 intra-op threads.
rayon = "1"
anyhow = { workspace = true }
tracing = { workspace = true }

[dev-dependencies]
# Integration-test binaries can use the library's public API, its regular
# [dependencies], and these dev-dependencies, so repeating
# rayon/kebab-config/kebab-core here is redundant but harmless; they are
# listed for tests/parity.rs and tests/thread_cap.rs.
kebab-embed-local = { path = "../kebab-embed-local" }
kebab-config = { path = "../kebab-config" }
kebab-core = { path = "../kebab-core" }
rayon = "1"
tempfile = { workspace = true }

[lints]
workspace = true