Tasks 5-8: new `kebab-parse-code` crate with three infrastructure modules for the code ingest framework.

Ships:
- lang.rs (extension→language identifier mapping)
- repo.rs (.git walk-up via gix 0.70 for RepoMeta)
- skip.rs (BUILTIN_BLACKLIST, is_generated_file, is_oversized)

14 integration tests across three test files, all passing; clippy -D warnings clean.

Note: gix is pinned to 0.70 (not 0.83 as originally suggested) because 0.83 fails to compile against Rust 1.94.1 due to non-exhaustive match patterns in gix-hash. 0.70 resolves cleanly and has an identical head_name/head_id API.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
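As an illustration of the extension→language mapping that lang.rs is described as providing, here is a minimal sketch. The function name `lang_for_path_sketch`, its return type, and the three-entry table are assumptions made for this example; the real `code_lang_for_path` signature and language list are not shown in this commit view.

```rust
use std::path::Path;

/// Hypothetical stand-in for an extension -> language lookup.
/// Returns `None` for paths without an extension or with an unknown one.
fn lang_for_path_sketch(path: &Path) -> Option<&'static str> {
    // Lowercase the extension so `Main.TSX` and `main.tsx` map alike
    // (case-insensitivity is an assumption of this sketch).
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "rs" => Some("rust"),
        "py" => Some("python"),
        "ts" | "tsx" => Some("typescript"),
        _ => None,
    }
}
```

Returning `Option` lets a caller treat "no known language" as an ordinary outcome rather than an error, which fits a pre-ingest filter that only has to decide whether a file is worth parsing.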
23 lines
899 B
Rust
//! `kebab-parse-code` — language-aware parsing for code corpora.
//!
//! Phase 1A-1 ships infrastructure only:
//!
//! - [`lang::code_lang_for_path`] — extension → language identifier.
//! - [`repo::detect_repo`] — `.git/` walk-up → repo / branch / commit metadata.
//! - [`skip::is_generated_file`] / [`skip::is_oversized`] — pre-ingest skip
//!   helpers consulted by `kebab-source-fs`.
//! - [`skip::BUILTIN_BLACKLIST`] — 6-entry safety-net pattern list.
//!
//! Per-language parser modules (`rust`, `python`, `typescript`, …) land in
//! later phases (1A-2 onwards). The crate boundary follows other
//! `kebab-parse-*` crates per design §8: must NOT depend on store / embed
//! / llm / rag.

pub mod lang;
pub mod repo;
pub mod skip;

pub use lang::code_lang_for_path;
pub use repo::{RepoMeta, detect_repo};
pub use skip::{BUILTIN_BLACKLIST, is_generated_file, is_oversized};
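
The `.git/` walk-up mentioned in the module docs can be sketched with the standard library alone. This is not the crate's implementation: the commit states that repo.rs uses gix 0.70, and this sketch neither calls gix nor reads branch or commit metadata. The name `find_repo_root_sketch` is hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Walk from `start` toward the filesystem root and return the first
/// directory that contains a `.git` entry (a directory or a file).
fn find_repo_root_sketch(start: &Path) -> Option<PathBuf> {
    start
        .ancestors() // yields `start` itself, then each parent in turn
        .find(|dir| dir.join(".git").exists())
        .map(Path::to_path_buf)
}
```

Checking `exists()` rather than `is_dir()` is a deliberate choice in this sketch, since `.git` is a plain file in linked worktrees and submodules.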