On dual-socket NUMA servers, fastembed (onnxruntime) hard-codes 48 intra-op threads, which corrupts the NUMA heap and kills ingest with a double-free. To work around this, add an opt-in embedding provider that runs the same multilingual-e5-large model in pure Rust (candle).

- New crate kebab-embed-candle: CandleEmbedder (implements kebab_core::Embedder). hf-hub safetensors → XLMRobertaModel forward → masked mean-pool → L2 → e5 prefix. The candle dependency tree is isolated in this crate (no kebab-* dependencies other than core/config).
- Thread cap: [models.embedding].num_threads + env KEBAB_EMBED_THREADS → caps the global rayon pool once (the NUMA-safe lever).
- kebab-app::embedder() branches on provider (fastembed/onnx/"" → existing path unchanged, candle → CandleEmbedder, unknown value → error).
- Remove the Phase 0 spike crate (absorbed into production).
- Version 0.21.1 → 0.22.0 (new config surface, pre-1.0 minor bump).

Parity: cosine_min=1.000000, max abs diff=2.01e-7 (< 1e-5) → embedding_version kept, no reindexing. fastembed default behavior and vectors unchanged. No wire schema change.

Verification (file + exit code): clippy -D warnings EXIT=0 (0 warnings), test EXIT=0 (candle unit 5 + thread_cap rayon=4 + config 68), parity #[ignore] EXIT=0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
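For orientation, the post-processing steps named in the commit message (masked mean-pool, L2 normalization) and the two parity metrics (cosine similarity, max abs diff) can be sketched on plain slices. This is only an illustration under stated assumptions: the real crate presumably operates on candle tensors, and the function names `masked_mean_pool`, `l2_normalize`, `cosine`, and `max_abs_diff` are hypothetical, not taken from kebab-embed-candle.

```rust
/// Mean-pool token embeddings over positions where the attention mask is non-zero.
/// `tokens` is [seq_len][hidden]; `mask` is [seq_len] with 1 = real token, 0 = padding.
/// (Hypothetical helper; the actual crate likely does this with tensor ops.)
pub fn masked_mean_pool(tokens: &[Vec<f32>], mask: &[u8]) -> Vec<f32> {
    let hidden = tokens.first().map_or(0, |t| t.len());
    let mut sum = vec![0.0f32; hidden];
    let mut count = 0.0f32;
    for (tok, &m) in tokens.iter().zip(mask) {
        if m != 0 {
            for (s, &v) in sum.iter_mut().zip(tok) {
                *s += v;
            }
            count += 1.0;
        }
    }
    // Guard against an all-padding input so we never divide by zero.
    let denom = count.max(1.0);
    sum.iter().map(|s| s / denom).collect()
}

/// Scale a vector to unit L2 norm (with a small floor on the norm).
pub fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt().max(1e-12);
    v.iter().map(|x| x / norm).collect()
}

/// Cosine similarity between two vectors of equal length.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb).max(1e-12)
}

/// Largest element-wise absolute difference between two vectors.
pub fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max)
}
```

A parity check in this style would embed the same texts with both providers, then require the minimum `cosine` across pairs to stay near 1 and the largest `max_abs_diff` to stay under the 1e-5 threshold quoted above.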
33 lines
1.1 KiB
Rust
//! Thread-cap test (spec §7). Own integration binary → clean process, so the
//! one-shot global rayon pool is initialized exactly once, by us.
//!
//! Verifies that `apply_thread_cap(4)` sizes the global rayon pool to 4, which
//! is the lever that keeps candle's CPU backend NUMA-safe (vs onnxruntime's
//! hard-coded 48 intra-op threads).

use kebab_embed_candle::apply_thread_cap;

#[test]
fn thread_cap_sizes_global_rayon_pool() {
    // Must run before any other rayon use in this process. As the only test in
    // this binary that touches rayon, that holds.
    let applied = apply_thread_cap(4);
    assert!(applied, "first build_global call should succeed");
    assert_eq!(
        rayon::current_num_threads(),
        4,
        "global rayon pool must be capped at the requested 4 threads"
    );

    // A second cap attempt is a no-op (pool already built), not a panic.
    assert!(
        !apply_thread_cap(8),
        "second build_global must report not-applied"
    );
    assert_eq!(
        rayon::current_num_threads(),
        4,
        "thread count must stay at the first cap"
    );
}
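The body of `apply_thread_cap` is not shown on this page. The test's assertion messages suggest it wraps rayon's `ThreadPoolBuilder::new().num_threads(n).build_global()` and returns whether that call succeeded, but that is an inference. The "first call wins, later calls report not-applied" contract the test relies on can be sketched with the standard library alone, using `OnceLock` as a stand-in for the global pool; `apply_cap_once` and `current_cap` are hypothetical names used only for this illustration.

```rust
use std::sync::OnceLock;

// Stand-in for the process-wide rayon pool size: it can be set once per process.
static POOL_SIZE: OnceLock<usize> = OnceLock::new();

/// Hypothetical analogue of `apply_thread_cap`.
/// Returns true only if this call was the one that set the cap.
pub fn apply_cap_once(n: usize) -> bool {
    // `OnceLock::set` returns Err if a value was already stored, which mirrors
    // a global pool builder failing once the pool has been built.
    POOL_SIZE.set(n).is_ok()
}

/// The cap currently in effect, if any call has set one.
pub fn current_cap() -> Option<usize> {
    POOL_SIZE.get().copied()
}
```

Because the state is process-global, any check of this contract needs its own process, which is the same reason the test above lives in its own integration binary.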