---
phase: P1
component: kebab-source-fs
task_id: p1-1
title: "Local filesystem source connector"
status: completed
depends_on: [p0-1]
unblocks: [p1-2, p1-3, p1-4, p1-5, p1-6]
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: [§3.3, §6.2, §6.6, §7.1, §7.2 SourceConnector, §8]
---

# p1-1 — Local filesystem source connector

## Goal

Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums, and produce `Vec<RawAsset>`.

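Below is a minimal, std-only sketch of the walk step. It assumes the real crate will build on `ignore`/`walkdir` and add filtering, hashing and `RawAsset` construction on top. The sketch shows two properties the behavior contract relies on:

- Symlinked directories are followed, but any directory whose canonical path was already visited is skipped. This is what breaks cycles.
- The output is sorted, so repeated scans agree.

`walk_files` and `walk_dir` are hypothetical names.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: collect regular files under `root` as root-relative
/// paths, following symlinks but never re-entering a directory whose canonical
/// path has already been walked. The result is sorted for determinism.
pub fn walk_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut visited = HashSet::new();
    let mut out = Vec::new();
    walk_dir(root, root, &mut visited, &mut out)?;
    out.sort();
    Ok(out)
}

fn walk_dir(
    root: &Path,
    dir: &Path,
    visited: &mut HashSet<PathBuf>,
    out: &mut Vec<PathBuf>,
) -> io::Result<()> {
    // canonicalize resolves symlinks, so every route to the same directory maps
    // to one key; a failed insert means a cycle or a duplicate route.
    if !visited.insert(fs::canonicalize(dir)?) {
        return Ok(());
    }
    // Sort entries so the first route to a shared directory is always the same.
    let mut entries = fs::read_dir(dir)?
        .map(|e| e.map(|e| e.path()))
        .collect::<io::Result<Vec<_>>>()?;
    entries.sort();
    for path in entries {
        // fs::metadata follows symlinks; dangling links are skipped.
        let Ok(meta) = fs::metadata(&path) else { continue };
        if meta.is_dir() {
            walk_dir(root, &path, visited, out)?;
        } else if meta.is_file() {
            // `path` was built by joining onto `root`, so the prefix always matches.
            out.push(path.strip_prefix(root).expect("path under root").to_path_buf());
        }
    }
    Ok(())
}
```

The visited set is shared across the whole walk. As a result, a file reachable through two symlinked routes is emitted once, under whichever route sorts first.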
## Why now / why this size

`SourceConnector` is the entry point of every ingest. Stable `RawAsset` output unblocks every downstream P1 task (parser, normalize, chunk, store). Small enough to deliver in one PR with full test coverage.

## Allowed dependencies

- `kebab-core`
- `kebab-config`
- `ignore` (gitignore semantics)
- `blake3`
- `walkdir`
- `time`
- `serde`
- `thiserror`
- `tracing`

## Forbidden dependencies

- `kebab-parse-*`, `kebab-normalize`, `kebab-chunk`, `kebab-store-*`, `kebab-embed*`, `kebab-search`, `kebab-llm*`, `kebab-rag`, `kebab-tui`, `kebab-desktop`

## Inputs

| input | type | source |
|-------|------|--------|
| `SourceScope` | `kebab_core::SourceScope` | `kebab-app` from config |
| filesystem | `&Path` | OS |
| `.kebabignore` | text file | workspace root, optional |

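The optional `.kebabignore` input has to be merged with the configured excludes before filtering. Below is a minimal sketch of that merge, assuming patterns are passed on unchanged to the `ignore` crate for matching (for example, line by line via `ignore::gitignore::GitignoreBuilder::add_line`).

A caveat on ordering: under gitignore semantics, order becomes significant once a `!` negation appears. For that reason the sketch keeps a stable order (config first, then the file) instead of sorting.

`collect_excludes` is a hypothetical name. Backslash-escaped trailing spaces are not handled.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: gather exclude patterns from config and the optional
/// `<root>/.kebabignore`. Blank lines and `#` comments are skipped, trailing
/// whitespace is trimmed, and exact duplicates keep their first position.
pub fn collect_excludes(root: &Path, config_excludes: &[String]) -> io::Result<Vec<String>> {
    let file_lines: Vec<String> = match fs::read_to_string(root.join(".kebabignore")) {
        Ok(text) => text.lines().map(str::to_owned).collect(),
        // The file is optional: absence is not an error.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Vec::new(),
        Err(e) => return Err(e),
    };
    let mut out: Vec<String> = Vec::new();
    for line in config_excludes.iter().cloned().chain(file_lines) {
        let pat = line.trim_end();
        if pat.is_empty() || pat.starts_with('#') {
            continue;
        }
        if !out.iter().any(|p| p == pat) {
            out.push(pat.to_owned());
        }
    }
    Ok(out)
}
```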
## Outputs

| output | type | downstream consumer |
|--------|------|---------------------|
| `Vec<RawAsset>` | `kebab_core::RawAsset` | `kebab-parse-md`, asset writer in `kebab-store-sqlite` (via `kebab-app`) |

## Public surface (signatures only — no new types)

```rust
pub struct FsSourceConnector { /* internal */ }

impl FsSourceConnector {
    /// Build a connector from the workspace config.
    pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
}

impl kebab_core::SourceConnector for FsSourceConnector {
    /// Scan `scope` and return its assets sorted by `workspace_path`.
    fn scan(&self, scope: &kebab_core::SourceScope) -> anyhow::Result<Vec<kebab_core::RawAsset>>;
}
```

## Behavior contract

- POSIX-normalize every emitted `workspace_path` (Unicode NFC, leading `./` stripped, repeated `/` collapsed to one).
- `asset_id` is derived per design §4.2 from the full hex digest of `blake3(raw bytes)`.
- `media_type` is selected from the file extension, with a libmagic-style content sniff as fallback (`.md` → Markdown; everything else falls through to `MediaType::Other`).
- `discovered_at` is `OffsetDateTime::now_utc()` taken at scan time.
- The filter is the union of `config.workspace.exclude` and `.kebabignore` (ordering does not matter).
- Symbolic links: follow once; detect cycles via `canonicalize` plus a visited set.
- Files larger than `storage.copy_threshold_mb` MB → emit `AssetStorage::Reference { path, sha }`. Do not copy bytes here; copying is done by the asset writer task.
- Idempotent: the same input yields the same `Vec<RawAsset>`, sorted by `workspace_path` (`discovered_at` aside, since it records scan time).

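Below is a minimal sketch of the path-normalization rule from the first bullet, assuming the input is already workspace-relative. It relies on `Path::components`, which collapses repeated separators and drops interior `.` segments. A leading `./` surfaces as `CurDir` and is skipped.

Unicode NFC is not applied here. With the `unicode-normalization` crate it would be a `UnicodeNormalization::nfc()` pass over each segment. `normalize_workspace_path` is a hypothetical name.

```rust
use std::path::{Component, Path};

/// Hypothetical helper: turn a workspace-relative path into `/`-joined form.
/// Returns `None` for paths that are not plain relative UTF-8 paths.
/// NFC normalization is intentionally not performed in this sketch.
pub fn normalize_workspace_path(p: &Path) -> Option<String> {
    let mut parts: Vec<&str> = Vec::new();
    for c in p.components() {
        match c {
            // Leading `./` appears as CurDir; interior `.` is already removed.
            Component::CurDir => {}
            Component::Normal(seg) => parts.push(seg.to_str()?),
            // `..`, a root, or a Windows prefix would escape or re-anchor the workspace.
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(parts.join("/"))
}
```

Because the sketch goes through `Path::components`, the same code also yields `/`-joined output on Windows, where `\` is the native separator.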
## Storage / wire effects

- Reads: filesystem under `config.workspace.root`.
- Writes: nothing. (Asset copy is handled by the asset writer in `kebab-store-sqlite`.)

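This task writes nothing, but it does decide which storage form each asset gets. Below is a sketch of that threshold check under two assumptions:

- `copy_threshold_mb` is read as mebibytes (1 MiB = 1,048,576 bytes).
- The non-reference case is modelled by a hypothetical `StorageChoice::Copy`, since the spec names only `AssetStorage::Reference`.

```rust
/// Hypothetical stand-in for the two storage outcomes; only `Reference`
/// corresponds to a variant named in the spec (`AssetStorage::Reference`).
#[derive(Debug, PartialEq, Eq)]
pub enum StorageChoice {
    Copy,
    Reference,
}

/// Files strictly larger than the threshold are referenced, not copied.
/// Assumes the threshold is in MiB; `saturating_mul` keeps huge configs from overflowing.
pub fn choose_storage(size_bytes: u64, copy_threshold_mb: u64) -> StorageChoice {
    let threshold_bytes = copy_threshold_mb.saturating_mul(1024 * 1024);
    if size_bytes > threshold_bytes {
        StorageChoice::Reference
    } else {
        StorageChoice::Copy
    }
}
```

A file exactly at the threshold is still copied, matching the contract's "larger than" wording.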
## Test plan

| kind | description | fixture / data |
|------|-------------|----------------|
| unit | POSIX path normalization | inline cases incl. `./a/b.md`, `a//b.md`, `a/b.md` → identical |
| unit | blake3 of known bytes matches expected hex | inline |
| unit | gitignore filter (`*.tmp`, `node_modules/**`) excludes correctly | tmp tree built in test |
| unit | `.kebabignore` ∪ config exclude works | tmp tree |
| unit | symlink cycle does not loop | tmp tree with `a -> b -> a` |
| snapshot | `Vec<RawAsset>` serialized JSON for fixture tree is stable | `fixtures/source-fs/tree-1` |
| determinism | re-running scan twice produces byte-identical JSON | `fixtures/source-fs/tree-1` |

All tests run under `cargo test -p kebab-source-fs` with no network and no model.

## Definition of Done

- [ ] `cargo check -p kebab-source-fs` passes
- [ ] `cargo test -p kebab-source-fs` passes
- [ ] Snapshot test `fixtures/source-fs/tree-1` round-trips deterministically
- [ ] No dependencies outside the Allowed dependencies list (verified via `cargo tree -p kebab-source-fs`)
- [ ] PR description links to design §3.3, §6.2, §7.2

## Out of scope

- File watching (P+).
- Asset copy/reference storage on disk (`kebab-store-sqlite` task p1-6).
- Non-fs source connectors (HTTP, S3 — P+).

## Risks / notes

- BLAKE3 hashing of large files (>1 GB) is fast, but read them as a stream; never load a whole file into memory.
- macOS metadata files (`.DS_Store`, AppleDouble `._*` resource-fork files) should be excluded by default.

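Below is a minimal sketch of the streaming read the first note asks for. The file is fed to a hasher one fixed-size buffer at a time, so memory use stays constant regardless of file size.

`stream_chunks` is a hypothetical helper. With the `blake3` crate, the sink would call `blake3::Hasher::update`, for example: `let mut h = blake3::Hasher::new(); stream_chunks(File::open(path)?, 64 * 1024, |c| { h.update(c); })?; let hex = h.finalize().to_hex();`.

```rust
use std::io::{self, Read};

/// Hypothetical helper: feed `reader` to `sink` in chunks of at most `buf_size`
/// bytes, so only one buffer is ever held in memory. Returns total bytes read.
pub fn stream_chunks<R: Read>(
    mut reader: R,
    buf_size: usize,
    mut sink: impl FnMut(&[u8]),
) -> io::Result<u64> {
    // A zero-sized buffer would make every read return 0 and silently stop.
    assert!(buf_size > 0, "buf_size must be non-zero");
    let mut buf = vec![0u8; buf_size];
    let mut total = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            // Retry reads interrupted by a signal, as `Read` docs recommend.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        sink(&buf[..n]);
        total += n as u64;
    }
    Ok(total)
}
```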