kebab/tasks/p1/p1-1-source-fs.md
altair823 f9714aa5cb docs(rename): kb → kebab — README, tasks/, docs/, design doc, report
Final commit. Bulk update of the word `kb` in every .md file.

- 19 crate names (`kb-core`, `kb-app`, …) → `kebab-*` (including the
  Rust module path notation `kb_*` → `kebab_*`).
- Future components (`kb-tui`, `kb-desktop`, `kb-asr-whisper`, `kb-ocr`,
  `kb-mcp`, `kb-vlm`, `kb-rerank`, `kb-vision-ocr`, `kb-index`,
  `kb-smoke`, `kb-architecture`) → `kebab-*` (the same prefix will be
  used when P6+ starts).
- CLI command examples: `kb ingest` / `kb search` / `kb ask` / `kb init` /
  `kb doctor` / `kb inspect` / `kb list` / `kb eval` →
  `kebab <verb>`. Both fenced code blocks and inline backticks.
- XDG paths + env vars + binary path (`target/release/kb` →
  `target/release/kebab`) synchronized.
- design doc / initial report / SMOKE / HOTFIXES / phase epic / task
  spec: all references unified.
- The `git -c user.name=kb` in task-decomposition.md is author
  information recorded from past git history, so it is kept as-is (the
  author in the actual git history cannot be changed).
- `tasks/phase-5-evaluation.md`: `status: planned` → `completed` as
  well (left unreflected after the P5-1 + P5-2 PRs were merged).

## Verification

- `grep -rEn "\bkb-[a-z]|\bkb_[a-z]|\.config/kb\b|kb\.sqlite|\bKB_[A-Z]"
   --include="*.md"` 0 hits (excluding the git author in
  task-decomposition.md).
- All file path references are live (every renamed file is updated to
  its new path).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:01:55 +00:00


phase: P1
component: kebab-source-fs
task_id: p1-1
title: Local filesystem source connector
status: completed
depends_on: p0-1
unblocks: p1-2, p1-3, p1-4, p1-5, p1-6
contract_source: ../../docs/superpowers/specs/2026-04-27-kebab-final-form-design.md
contract_sections: §3.3, §6.2, §6.6, §7.1, §7.2 SourceConnector, §8

p1-1 — Local filesystem source connector

Goal

Walk the workspace root, apply gitignore-style filters, compute BLAKE3 checksums, and produce Vec<RawAsset>.

Why now / why this size

SourceConnector is the entry point of every ingest. Stable RawAsset output unblocks every downstream P1 task (parser, normalize, chunk, store). Small enough to deliver in one PR with full test coverage.

Allowed dependencies

  • kebab-core
  • kebab-config
  • ignore (gitignore semantics)
  • blake3
  • walkdir
  • time
  • serde
  • thiserror
  • tracing

Forbidden dependencies

  • kebab-parse-*, kebab-normalize, kebab-chunk, kebab-store-*, kebab-embed*, kebab-search, kebab-llm*, kebab-rag, kebab-tui, kebab-desktop

Inputs

| input | type | source |
| --- | --- | --- |
| SourceScope | kebab_core::SourceScope | kebab-app from config |
| filesystem | &Path | OS |
| .kebabignore | text file | workspace root, optional |
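
Since .kebabignore is optional and is combined with config.workspace.exclude as a union, gathering the filter patterns could look roughly like the sketch below. The function names are hypothetical, and the sketch only collects pattern strings; actual gitignore matching would be left to the ignore crate. It assumes the usual ignore-file conventions (one pattern per line, blank lines and # comments skipped).

```rust
use std::collections::BTreeSet;

/// Parse ignore-file text into patterns: one per line, skipping blank
/// lines and lines that start with `#` (comments).
/// Hypothetical helper, not part of the spec.
pub fn parse_ignore_lines(text: &str) -> Vec<String> {
    text.lines()
        .map(str::trim_end)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(String::from)
        .collect()
}

/// Union of the config excludes and the optional .kebabignore contents.
/// A BTreeSet removes duplicates and makes the result independent of the
/// order in which the two sources are combined.
pub fn merged_patterns(
    config_exclude: &[String],
    kebabignore_text: Option<&str>,
) -> BTreeSet<String> {
    let mut set: BTreeSet<String> = config_exclude.iter().cloned().collect();
    if let Some(text) = kebabignore_text {
        set.extend(parse_ignore_lines(text));
    }
    set
}
```

Passing None for the file text models a workspace without a .kebabignore, in which case only the config excludes apply.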

Outputs

| output | type | downstream consumer |
| --- | --- | --- |
| Vec<RawAsset> | kebab_core::RawAsset | kebab-parse-md, asset writer in kebab-store-sqlite (via kebab-app) |

Public surface (signatures only — no new types)

pub struct FsSourceConnector { /* internal */ }

impl FsSourceConnector {
    pub fn new(config: &kebab_config::Config) -> anyhow::Result<Self>;
}

impl kebab_core::SourceConnector for FsSourceConnector {
    fn scan(&self, scope: &kebab_core::SourceScope) -> anyhow::Result<Vec<kebab_core::RawAsset>>;
}

Behavior contract

  • POSIX-normalize every emitted workspace_path (NFC, leading ./ stripped, single /).
  • asset_id derived per design §4.2 from blake3(raw bytes) full hex.
  • media_type selected from extension + libmagic-like sniff fallback (.md → Markdown, others fall through to MediaType::Other).
  • discovered_at = current OffsetDateTime::now_utc() at scan time.
  • Combine config.workspace.exclude and .kebabignore into a single filter (union; ordering does not matter).
  • Symbolic links: follow once, detect cycles via canonicalize + visited set.
  • Files larger than storage.copy_threshold_mb MB → emit AssetStorage::Reference { path, sha } (do not copy bytes here; copying is done by the asset writer task).
  • Idempotent: same input → same Vec<RawAsset> (sort by workspace_path).
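
The path-normalization rule at the top of this list can be sketched with the standard library alone. The function name is hypothetical. The NFC step is left out because it needs an external Unicode crate that is not in the allowed list, so this covers only the ./ and duplicate-slash parts. It also drops every "." segment, not just a leading one, which is a slightly broader rule than the contract states.

```rust
/// Normalize a workspace-relative path: no leading "./", no empty segments
/// (so "a//b.md" collapses to "a/b.md"), single "/" separators.
/// NOTE: NFC Unicode normalization is NOT applied in this sketch; a real
/// implementation would have to add it.
pub fn normalize_workspace_path(raw: &str) -> String {
    raw.split('/')
        // Empty segments come from "//"; "." segments come from "./".
        .filter(|seg| !seg.is_empty() && *seg != ".")
        .collect::<Vec<_>>()
        .join("/")
}
```

The three inputs named in the test plan (./a/b.md, a//b.md, a/b.md) all map to a/b.md under this sketch, and applying it twice gives the same result as applying it once.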

Storage / wire effects

  • Reads: filesystem under config.workspace.root.
  • Writes: nothing. (Asset copy is handled by the asset writer in kebab-store-sqlite.)
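
Because the connector writes nothing, the copy threshold only decides which storage variant is emitted. A minimal sketch of that decision follows. The enum is a hypothetical stand-in for kebab_core::AssetStorage (only the Reference { path, sha } shape comes from this spec; the Copied variant is invented for illustration), and it assumes 1 MB means 1024 × 1024 bytes.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for kebab_core::AssetStorage. The real enum lives
/// in kebab-core and may have different variants or fields.
#[derive(Debug, PartialEq)]
pub enum AssetStorage {
    /// At or below the threshold: the asset writer copies the bytes later.
    Copied,
    /// Above the threshold: record only the location and the BLAKE3 hex.
    Reference { path: PathBuf, sha: String },
}

/// Pick the storage variant from the file size. No bytes are copied here.
/// ASSUMPTION: copy_threshold_mb is interpreted as MiB (1024 * 1024 bytes).
pub fn choose_storage(
    path: PathBuf,
    size_bytes: u64,
    sha: String,
    copy_threshold_mb: u64,
) -> AssetStorage {
    let threshold_bytes = copy_threshold_mb.saturating_mul(1024 * 1024);
    if size_bytes > threshold_bytes {
        AssetStorage::Reference { path, sha }
    } else {
        AssetStorage::Copied
    }
}
```

The comparison is strict, matching "larger than" in the behavior contract: a file exactly at the threshold is still treated as copyable.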

Test plan

| kind | description | fixture / data |
| --- | --- | --- |
| unit | POSIX path normalization | inline cases incl. ./a/b.md, a//b.md, a/b.md → identical |
| unit | blake3 of known bytes matches expected hex | inline |
| unit | gitignore filter (*.tmp, node_modules/**) excludes correctly | tmp tree built in test |
| unit | .kebabignore and config exclude are both applied (union) | tmp tree |
| unit | symlink cycle does not loop | tmp tree with a -> b -> a |
| snapshot | Vec<RawAsset> serialized JSON for fixture tree is stable | fixtures/source-fs/tree-1 |
| determinism | re-running scan twice produces byte-identical JSON | fixtures/source-fs/tree-1 |

All tests run under cargo test -p kebab-source-fs with no network and no model.
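
The determinism case rests on the sort-by-workspace_path rule in the behavior contract. The idea can be sketched as follows, using a trimmed-down, hypothetical RawAsset with only two fields (the real type is kebab_core::RawAsset and carries more). The point is that directory iteration order must not leak into the output.

```rust
/// Hypothetical, reduced stand-in for kebab_core::RawAsset.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RawAsset {
    pub workspace_path: String,
    pub asset_id: String,
}

/// Sort the scan result by workspace_path so that two scans of the same
/// tree yield the same sequence regardless of filesystem iteration order.
pub fn finalize(mut assets: Vec<RawAsset>) -> Vec<RawAsset> {
    assets.sort_by(|a, b| a.workspace_path.cmp(&b.workspace_path));
    assets
}
```

A determinism test can then feed the same assets in two different orders and assert that the finalized vectors are equal.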

Definition of Done

  • cargo check -p kebab-source-fs passes
  • cargo test -p kebab-source-fs passes
  • Snapshot test fixtures/source-fs/tree-1 round-trips deterministically
  • No imports outside Allowed dependencies (verified via cargo tree -p kebab-source-fs)
  • PR description links to design §3.3, §6.2, §7.2

Out of scope

  • File watching (P+).
  • Asset copy/reference storage on disk (kebab-store-sqlite task p1-6).
  • Non-fs source connectors (HTTP, S3 — P+).

Risks / notes

  • Hashing large files (>1 GB) with BLAKE3 is fast, but the hash must be computed by streaming; do not load the whole file into memory.
  • macOS resource forks / .DS_Store should be excluded by default.
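
The streaming note above can be illustrated with a chunked read loop. This is a sketch under stated assumptions: the function name and the 64 KiB buffer size are arbitrary choices, and the hasher is abstracted as a callback so the block needs only the standard library. With the blake3 crate, the callback would typically forward each chunk to blake3::Hasher::update.

```rust
use std::io::{self, Read};

/// Feed `reader` to `update` in fixed-size chunks so memory use stays
/// bounded regardless of file size. Returns the total bytes read.
/// Hypothetical helper; the `update` callback stands in for a hasher.
pub fn stream_chunks<R: Read>(
    mut reader: R,
    mut update: impl FnMut(&[u8]),
) -> io::Result<u64> {
    let mut buf = [0u8; 64 * 1024]; // buffer size is an arbitrary choice
    let mut total = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            // Retry reads that were interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        update(&buf[..n]);
        total += n as u64;
    }
    Ok(total)
}
```

In use, the reader would be an opened std::fs::File, and the chunks passed to the callback concatenate to exactly the file contents, so the digest matches a whole-file hash.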