Commit Graph

5 Commits

Author SHA1 Message Date
967a6a62c5 p1-1: address review (walker module doc, TODO markers, .kbignore ADR)
- walker.rs: document why we pick walkdir over ignore::WalkBuilder
  (explicit canonical-path comparison for sibling-subtree symlinks).
- walker.rs: log canonicalize failures via tracing::debug! (was a silent
  `Err(_) => continue`) so broken/permission-denied symlink targets are
  observable at debug verbosity.
- connector.rs: TODO marker on the scope.include debug-log noting the
  filter belongs at the extractor router (P1-2/P1-3).
- connector.rs: TODO marker on expand_tilde to hoist tilde + ${VAR}
  expansion into a kb-config helper once available.
- connector.rs: comment on the .kbignore read documenting the
  re-read-on-every-scan() contract.
- connector.rs test: tighten the `.kbignore`-itself ADR comment and
  upgrade the assertion to actively pin "`.kbignore` IS emitted" instead
  of "either is fine"; future drift will now fail the test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 12:42:25 +00:00
f8d00bdaf6 p1-1: address review (AssetStorage doc-comment, real directory-cycle test)
- Document AssetStorage::Copied / Reference path semantics so P1-6 (asset
  writer) knows that at scan time `Copied.path` is the SOURCE path and the
  writer is responsible for both copying bytes AND overwriting `path`
  with the destination.
- Rename the dangling-symlink test to make its scope explicit
  (`dangling_symlink_pseudo_cycle_does_not_crash`); the prior name implied
  a real two-step directory cycle but the targets were broken links.
- Add `two_step_directory_cycle_visited_set_breaks_loop`: builds
  `a/loop -> ../b` and `b/loop -> ../a` over real directories with real
  files, asserting scan terminates with a finite, deterministic asset
  list — exercises the canonical-path visited-set in walker::walk_files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 12:42:12 +00:00
0699220d5d p1-1: tree-1 fixture + snapshot/determinism tests
fixtures/source-fs/tree-1/:
  README.md
  notes/alpha.md
  notes/beta.md
  ignored/skip.tmp        (excluded by .kbignore *.tmp)
  .kbignore               ("*.tmp")
  .DS_Store               (implicitly excluded by FsSourceConnector)

The committed baseline (tree-1.snapshot.json) has discovered_at,
source_uri.value, and stored.path replaced with "<stripped>" so the
JSON is portable across checkout locations and CI runs. The test
applies the same stripping to scan output before comparing.

The determinism test runs scan twice and asserts byte-identical
serialized JSON (post-strip) — same filesystem state must yield the
same Vec<RawAsset>.

Regenerate baseline with `KB_REGEN_SNAPSHOT=1 cargo test -p kb-source-fs
--test snapshot_tree1 -- tree_1_snapshot_matches_baseline`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 12:28:00 +00:00
3d4c485415 p1-1: integration test — symlink cycles do not loop
Two cases:
  - root/notes -> root  (single-link cycle through workspace root).
  - root/a -> b, root/b -> a  (two-step cycle of dangling symlinks).

Both must complete in O(seconds) and surface alpha.md exactly once.
Proves walker::walk_files's visited-set guard catches realistic cycle
shapes via the public SourceConnector API.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 12:27:44 +00:00
7c75e10b2c p1-1: scaffold kb-source-fs crate (FsSourceConnector)
Walk config.workspace.root, apply gitignore-style filters
(config.workspace.exclude ∪ .kbignore ∪ baked-in defaults for
.DS_Store / ._*), stream BLAKE3 over each file, and emit a
deterministic Vec<RawAsset> sorted by workspace_path.

Modules:
  - hash:      streaming blake3::Hasher + 64 KiB read buffer (no whole-file
               loads); pinned digests for empty input and "hello world".
  - media:     extension → MediaType (markdown/pdf/image/audio/other).
  - walker:    ignore::OverrideBuilder for filter union; walkdir with
               manual visited-set cycle protection on top of follow_links.
  - connector: public FsSourceConnector::new(&Config) +
               SourceConnector::scan(&SourceScope) impl. Uses
               kb_core::to_posix for WorkspacePath construction (carries
               P0-1 # rejection through unchanged) and kb_core::id_for_asset
               for AssetId derivation. Storage variant signals intent only;
               actual byte copy is P1-6's responsibility.

Per design §3.3, §6.2, §6.6, §7.1, §7.2, §8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 12:27:34 +00:00