feat(p1-1): kb-source-fs filesystem source connector #6

Merged
altair823 merged 5 commits from feat/p1-1-source-fs into main 2026-04-30 12:45:47 +00:00
Owner

Summary

Implements the P1-1 task spec. Adds the new kb-source-fs crate: it walks the workspace root, filters with the union .kbignore ∪ config.workspace.exclude, streams a BLAKE3 hash over each file, and deterministically produces a Vec<RawAsset>. This is the first implementation of the SourceConnector trait.

Basis: tasks/p1/p1-1-source-fs.md, docs/superpowers/specs/2026-04-27-kb-final-form-design.md (§3.3/§6.2/§6.6/§7.1/§7.2 SourceConnector/§8).

Implementation

crates/kb-source-fs:

  • hash.rs: streaming BLAKE3 with a 64 KiB buffer (no whole-file loads)
  • media.rs: extension → MediaType
  • walker.rs: walkdir + canonical-path visited set (blocks sibling-subtree symlink cycles), ignore::Override union
  • connector.rs: FsSourceConnector::new(&Config) + SourceConnector::scan(&SourceScope)
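
The streaming hash in hash.rs can be pictured with the following minimal sketch. It is std-only, so a hand-written FNV-1a hasher stands in for BLAKE3 (the real crate uses blake3); the names `Fnv1a`, `hash_stream`, and `hash_bytes` are hypothetical and only the 64 KiB buffer size comes from the description above.

```rust
use std::io::{self, Read};

/// Read buffer size from the PR description (64 KiB).
const READ_BUFFER_BYTES: usize = 64 * 1024;

/// Stand-in incremental hasher (64-bit FNV-1a). This is NOT BLAKE3;
/// it only keeps the sketch free of external crates.
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }

    fn update(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }

    fn finalize(&self) -> u64 {
        self.0
    }
}

/// Hash everything `reader` yields, one fixed-size chunk at a time,
/// so memory use stays bounded regardless of file size.
fn hash_stream<R: Read>(mut reader: R) -> io::Result<u64> {
    let mut hasher = Fnv1a::new();
    let mut buf = vec![0u8; READ_BUFFER_BYTES];
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        hasher.update(&buf[..n]);
    }
    Ok(hasher.finalize())
}

/// One-shot hash over an in-memory slice, used only to cross-check
/// the streaming path.
fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = Fnv1a::new();
    hasher.update(bytes);
    hasher.finalize()
}
```

Feeding an input of `READ_BUFFER_BYTES * 3 + 17` bytes through `hash_stream` and comparing against `hash_bytes` checks that chunk boundaries do not change the digest, which is the same shape as the cross-buffer-boundary test mentioned in the review.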

Other:

  • tests/symlink_cycle.rs: dangling 1-step + real 2-step directory cycle
  • tests/snapshot_tree1.rs: deterministic round-trip against tree-1.snapshot.json
  • fixtures/source-fs/tree-1/
  • kb-core::AssetStorage rustdoc updated (Copied.path semantics made explicit)

Verification

  • cargo check / build (warnings-as-errors) / test / clippy all clean
  • cargo test --workspace → 68 passed (26 new in kb-source-fs)
  • cargo tree -p kb-source-fs --depth 1 → allowed deps only

Review

  • spec PASS (must-fix 0)
  • code quality APPROVED-WITH-MINOR: important 2 + minor 5 all addressed (f8d00bd, 967a6a6)
  • To be resolved naturally in follow-ups: scope.include glob (P1-2/P1-3 router), tilde + ${VAR} expansion (once a kb-config helper is introduced)

Next

After merge, branch feat/p1-2-parse-md-frontmatter and proceed with P1-2.

altair823 added 5 commits 2026-04-30 12:44:02 +00:00
Walk config.workspace.root, apply gitignore-style filters
(config.workspace.exclude ∪ .kbignore ∪ baked-in defaults for
.DS_Store / ._*), stream BLAKE3 over each file, and emit a
deterministic Vec<RawAsset> sorted by workspace_path.

Modules:
  - hash:      streaming blake3::Hasher + 64 KiB read buffer (no whole-file
               loads); pinned digests for empty input and "hello world".
  - media:     extension → MediaType (markdown/pdf/image/audio/other).
  - walker:    ignore::OverrideBuilder for filter union; walkdir with
               manual visited-set cycle protection on top of follow_links.
  - connector: public FsSourceConnector::new(&Config) +
               SourceConnector::scan(&SourceScope) impl. Uses
               kb_core::to_posix for WorkspacePath construction (carries
               P0-1 # rejection through unchanged) and kb_core::id_for_asset
               for AssetId derivation. Storage variant signals intent only;
               actual byte copy is P1-6's responsibility.

Per design §3.3, §6.2, §6.6, §7.1, §7.2, §8.
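
A minimal sketch of the extension → MediaType mapping, for illustration only. The five variants follow the list above; the concrete extension sets, the case-insensitive match, and the function name `media_type_for` are assumptions, not the crate's actual table.

```rust
use std::path::Path;

/// Variants mirror the list in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MediaType {
    Markdown,
    Pdf,
    Image,
    Audio,
    Other,
}

/// Classify a path by its extension (hypothetical extension sets).
fn media_type_for(path: &Path) -> MediaType {
    // Lowercase the extension so "ALPHA.MD" and "alpha.md" agree
    // (case-insensitivity is an assumption of this sketch).
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());

    match ext.as_deref() {
        Some("md") | Some("markdown") => MediaType::Markdown,
        Some("pdf") => MediaType::Pdf,
        Some("png") | Some("jpg") | Some("jpeg") | Some("gif") | Some("webp") => MediaType::Image,
        Some("mp3") | Some("wav") | Some("flac") | Some("ogg") => MediaType::Audio,
        // No extension, non-UTF-8 extension, or anything unrecognized.
        _ => MediaType::Other,
    }
}
```

Dotfiles such as `.DS_Store` have no extension according to `Path::extension`, so they fall through to `Other` here; in the connector they are excluded by the filter before classification matters.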

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two cases:
  - root/notes -> root  (single-link cycle through workspace root).
  - root/a -> b, root/b -> a  (two-step cycle of dangling symlinks).

Both must complete in O(seconds) and surface alpha.md exactly once.
Proves walker::walk_files's visited-set guard catches realistic cycle
shapes via the public SourceConnector API.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fixtures/source-fs/tree-1/:
  README.md
  notes/alpha.md
  notes/beta.md
  ignored/skip.tmp        (excluded by .kbignore *.tmp)
  .kbignore               ("*.tmp")
  .DS_Store               (implicitly excluded by FsSourceConnector)

The committed baseline (tree-1.snapshot.json) has discovered_at,
source_uri.value, and stored.path replaced with "<stripped>" so the
JSON is portable across checkout locations and CI runs. The test
applies the same stripping to scan output before comparing.

The determinism test runs scan twice and asserts byte-identical
serialized JSON (post-strip) — same filesystem state must yield the
same Vec<RawAsset>.
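
The stripping and determinism check described above might look roughly like this. It is a std-only sketch: `AssetRecord` is a hypothetical flattened stand-in for RawAsset, `fake_scan` stands in for a real scan, and a tab-separated rendering replaces the JSON serialization used by the actual test.

```rust
/// Hypothetical flattened record; the real RawAsset is richer and
/// is serialized to JSON rather than to tab-separated text.
#[derive(Debug, Clone, PartialEq, Eq)]
struct AssetRecord {
    workspace_path: String,
    content_hash: String,
    discovered_at: String,    // varies per run
    source_uri_value: String, // varies per checkout location
    stored_path: String,      // varies per checkout location
}

const STRIPPED: &str = "<stripped>";

/// Replace the run- and location-dependent fields with a fixed marker.
fn strip_volatile(mut assets: Vec<AssetRecord>) -> Vec<AssetRecord> {
    for asset in &mut assets {
        asset.discovered_at = STRIPPED.to_string();
        asset.source_uri_value = STRIPPED.to_string();
        asset.stored_path = STRIPPED.to_string();
    }
    assets
}

/// Stable textual form used for byte-for-byte comparison.
fn render(assets: &[AssetRecord]) -> String {
    assets
        .iter()
        .map(|a| {
            format!(
                "{}\t{}\t{}\t{}\t{}\n",
                a.workspace_path, a.content_hash, a.discovered_at, a.source_uri_value, a.stored_path
            )
        })
        .collect()
}

/// Stand-in for a scan: same files and hashes, but the checkout
/// location and timestamp differ between calls.
fn fake_scan(checkout: &str, timestamp: &str) -> Vec<AssetRecord> {
    ["README.md", "notes/alpha.md", "notes/beta.md"]
        .iter()
        .map(|p| AssetRecord {
            workspace_path: p.to_string(),
            content_hash: format!("hash-of-{}", p),
            discovered_at: timestamp.to_string(),
            source_uri_value: format!("{}/{}", checkout, p),
            stored_path: format!("{}/{}", checkout, p),
        })
        .collect()
}
```

Two scans taken at different times or from different checkout directories render differently as-is, but identically once both sides pass through `strip_volatile`, which is why the baseline and the scan output are stripped the same way before comparison.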

Regenerate baseline with `KB_REGEN_SNAPSHOT=1 cargo test -p kb-source-fs
--test snapshot_tree1 -- tree_1_snapshot_matches_baseline`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Document AssetStorage::Copied / Reference path semantics so P1-6 (asset
  writer) knows that at scan time `Copied.path` is the SOURCE path and the
  writer is responsible for both copying bytes AND overwriting `path`
  with the destination.
- Rename the dangling-symlink test to make its scope explicit
  (`dangling_symlink_pseudo_cycle_does_not_crash`); the prior name implied
  a real two-step directory cycle but the targets were broken links.
- Add `two_step_directory_cycle_visited_set_breaks_loop`: builds
  `a/loop -> ../b` and `b/loop -> ../a` over real directories with real
  files, asserting scan terminates with a finite, deterministic asset
  list — exercises the canonical-path visited-set in walker::walk_files.
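
The canonical-path visited set can be illustrated with a small std-only walker. This is a sketch under assumptions, not the crate's walker::walk_files: it uses std::fs::read_dir instead of walkdir, silently skips entries that fail to resolve (the real code logs them via tracing::debug!), and returns canonical file paths.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collect regular files under `root`, following symlinks, while
/// entering each canonical directory at most once.
fn walk_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut visited_dirs: HashSet<PathBuf> = HashSet::new();
    let mut seen_files: HashSet<PathBuf> = HashSet::new();
    let mut stack = vec![root.to_path_buf()];

    while let Some(dir) = stack.pop() {
        // Resolve symlinks so `a/loop` and `b` compare equal.
        let canonical = match fs::canonicalize(&dir) {
            Ok(p) => p,
            Err(_) => continue, // dangling or unreadable target
        };
        // `insert` returns false when the directory was already walked:
        // this is the guard that breaks symlink cycles.
        if !visited_dirs.insert(canonical.clone()) {
            continue;
        }
        for entry in fs::read_dir(&canonical)? {
            let path = entry?.path();
            // fs::metadata follows symlinks and fails on dangling links.
            let meta = match fs::metadata(&path) {
                Ok(m) => m,
                Err(_) => continue,
            };
            if meta.is_dir() {
                stack.push(path);
            } else if meta.is_file() {
                if let Ok(real) = fs::canonicalize(&path) {
                    seen_files.insert(real);
                }
            }
        }
    }

    // Sort so the result does not depend on directory iteration order.
    let mut files: Vec<PathBuf> = seen_files.into_iter().collect();
    files.sort();
    Ok(files)
}
```

With `a/loop -> ../b` and `b/loop -> ../a` over real directories, the second visit to either directory resolves to a canonical path that is already in the set, so the walk terminates and each file appears once.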

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- walker.rs: document why we pick walkdir over ignore::WalkBuilder
  (explicit canonical-path comparison for sibling-subtree symlinks).
- walker.rs: log canonicalize failures via tracing::debug! (was a silent
  `Err(_) => continue`) so broken/permission-denied symlink targets are
  observable at debug verbosity.
- connector.rs: TODO marker on the scope.include debug-log noting the
  filter belongs at the extractor router (P1-2/P1-3).
- connector.rs: TODO marker on expand_tilde to hoist tilde + ${VAR}
  expansion into a kb-config helper once available.
- connector.rs: comment on the .kbignore read documenting the
  re-read-on-every-scan() contract.
- connector.rs test: tighten the `.kbignore`-itself ADR comment and
  upgrade the assertion to actively pin "`.kbignore` IS emitted" instead
  of "either is fine"; future drift will now fail the test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
claude-reviewer-01 reviewed 2026-04-30 12:44:29 +00:00
claude-reviewer-01 left a comment
Member

Review: all items resolved, APPROVE recommended (registered as COMMENT per gate policy)

After passing the internal two-stage review (spec compliance + code quality), all important 2 + minor 5 items have been addressed (commits f8d00bd, 967a6a6).

Verification summary

| Item | Status |
|------|------|
| Spec compliance (6 clusters + DoD + 3 concerns) | PASS |
| Code quality important 2 (AssetStorage.path semantics, real dir-cycle test) | RESOLVED |
| Code quality minor 5 (walker doc, TODO markers, canonicalize debug-log, expand_tilde TODO, .kbignore ADR) | RESOLVED |
| cargo check / build (RUSTFLAGS=-D warnings) / test / clippy --all-targets -D warnings | clean |
| Test count: 68 pass (P0-1 41 + P1-1 26 new + 1 new dir-cycle test) | ✓ |
| cargo tree -p kb-source-fs --depth 1: allowed deps only (kb-core, kb-config, anyhow, serde, time, blake3, tracing, walkdir, ignore) | ✓ |

Key strengths

  • BLAKE3 64 KiB streaming + cross-buffer-boundary test (an input of READ_BUFFER_BYTES * 3 + 17 bytes verifies that real buffer boundaries are crossed)
  • ignore::Override negation (!pat) semantics are correct (prevents accidentally switching into whitelist mode)
  • POSIX normalization + rejection of # in paths (inherits the P0-1 invariant)
  • Deterministic output: lexicographic sort by workspace_path, verified twice by scan_idempotent_modulo_timestamp and tree_1_scan_is_deterministic
  • Portable snapshot: discovered_at/source_uri.value/stored.path are stripped consistently on both sides
  • Symlink cycle: a new test with a real 2-step directory cycle (a/loop -> ../b, b/loop -> ../a) verifies the visited-set behavior
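
The POSIX normalization, `#` rejection, and lexicographic ordering listed above can be sketched as follows. This is a hypothetical stand-in, not kb_core::to_posix: the function names, the error type (a plain String), and the decision to reject `..` and absolute components are assumptions made for the example.

```rust
use std::path::{Component, Path};

/// Turn a workspace-relative path into a '/'-separated string,
/// rejecting any component that contains '#'.
fn to_posix_sketch(relative: &Path) -> Result<String, String> {
    let mut parts: Vec<String> = Vec::new();
    for component in relative.components() {
        match component {
            Component::Normal(os) => {
                let s = os
                    .to_str()
                    .ok_or_else(|| format!("non-UTF-8 path: {:?}", relative))?;
                if s.contains('#') {
                    return Err(format!("'#' is not allowed in path: {}", s));
                }
                parts.push(s.to_string());
            }
            // "./" prefixes carry no information; drop them.
            Component::CurDir => {}
            // Assumption: "..", roots, and drive prefixes are rejected.
            other => return Err(format!("unsupported path component: {:?}", other)),
        }
    }
    Ok(parts.join("/"))
}

/// Normalize every path, then sort byte-wise so the output order
/// does not depend on filesystem iteration order.
fn sorted_workspace_paths(paths: &[&Path]) -> Result<Vec<String>, String> {
    let mut out = paths
        .iter()
        .map(|p| to_posix_sketch(p))
        .collect::<Result<Vec<_>, _>>()?;
    out.sort();
    Ok(out)
}
```

Sorting the normalized strings rather than the platform paths keeps the order identical across operating systems; note that byte-wise ordering places `README.md` before `notes/alpha.md` because uppercase ASCII sorts first.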

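Follow-ups to be resolved naturally (non-blocking)

  1. Applying the scope.include glob: responsibility of the P1-2/P1-3 router (TODO marker in place)
  2. tilde + ${VAR} path expansion: hoist once the kb-config path-expander helper is introduced
  3. pedantic clippy nits: opportunistic, in a separate PR

Conclusion

Nothing has been papered over. Ready to proceed to P1-2/P1-3. Under the self-review gate policy this comment is registered as COMMENT; the author is asked to APPROVE and merge manually.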

altair823 merged commit 69a0dbb79d into main 2026-04-30 12:45:47 +00:00
altair823 deleted branch feat/p1-1-source-fs 2026-04-30 12:45:49 +00:00

Reference: altair823-org/kebab#6