Commit Graph

405 Commits

ec8a4ddb1b p0-1: kb-cli clap entry with §10 exit-code mapping
Adds the kb binary with clap v4 derive subcommands mapping 1:1 to
kb-app facade functions:

  init | ingest | list docs | inspect (doc|chunk) | search | ask
  | doctor | eval run

Global flags: --config, --verbose, --debug, --json. With --json, output
conforms to wire schema v1 (e.g. doctor.v1 emitted by kb-app::doctor).

Exit-code mapping per design §10:
  0 success
  1 RefusalSignal / NoHitSignal (kb ask refusal, kb search no-hit)
  2 any other anyhow::Error
  3 DoctorUnhealthy
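A minimal, std-only sketch of this mapping follows. It substitutes
`Box<dyn Error + Send + Sync>` for `anyhow::Error` (which supports the same
kind of check through its own `downcast_ref`), and the signal structs and
their messages are hypothetical stand-ins for kb-app's doctor_signal types,
not their actual definitions.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical stand-ins for kb-app's doctor_signal types. The real ones may
// carry data; unit structs are enough to show the downcast-based mapping.
macro_rules! signal {
    ($name:ident, $msg:expr) => {
        #[derive(Debug)]
        struct $name;

        impl fmt::Display for $name {
            fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
                f.write_str($msg)
            }
        }

        impl Error for $name {}
    };
}

signal!(RefusalSignal, "answer refused");
signal!(NoHitSignal, "no search hits");
signal!(DoctorUnhealthy, "doctor reported failing checks");

/// Map a top-level error to the §10 exit codes (0 is reserved for Ok).
fn exit_code(err: &(dyn Error + Send + Sync + 'static)) -> u8 {
    if err.is::<RefusalSignal>() || err.is::<NoHitSignal>() {
        1
    } else if err.is::<DoctorUnhealthy>() {
        3
    } else {
        2 // any other error
    }
}

fn main() {
    let outcomes: Vec<Box<dyn Error + Send + Sync>> = vec![
        Box::new(RefusalSignal),
        Box::new(NoHitSignal),
        Box::new(DoctorUnhealthy),
        Box::<dyn Error + Send + Sync>::from("config.toml failed to parse"),
    ];
    for err in &outcomes {
        // In a real main this value would feed std::process::ExitCode::from.
        println!("{err} -> exit {}", exit_code(&**err));
    }
}
```

Checking the concrete signal types first means a refusal is never
misreported as a generic failure, and every unrecognized error falls
through to 2.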

Tracing initialized at startup with the file appender from kb-app.

Verified via:
  XDG_*=… cargo run -p kb-cli -- init    → idempotent
  XDG_*=… cargo run -p kb-cli -- doctor --json
    → {"schema_version":"doctor.v1","ok":true,…} exit 0
  XDG_*=… cargo run -p kb-cli -- doctor   (human form, ✓ marks)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 05:17:18 +00:00
237ada6e21 p0-1: kb-app facade stubs + tracing init helper
Adds the kb-app crate (§7) as the single facade between UI crates
(kb-cli / kb-tui / kb-desktop) and the rest of the workspace. Public
surface mirrors the task spec exactly:

- init_workspace(force) — XDG dir creation + config.toml seed; idempotent
  unless force=true. Honors the XDG_* environment variables and
  tilde-expands the workspace root (~/KnowledgeBase → $HOME/KnowledgeBase).
- doctor() — emits a doctor.v1 report with config_loaded +
  data_dir_writable checks; downstream checks land in later phases.
- ingest / list_docs / inspect_doc / inspect_chunk / search / ask —
  bail!("not yet wired (P<n>-<i>)") so kb-cli surfaces exit code 2
  cleanly per §10.
- AskOpts + DoctorReport + DoctorCheck.
- doctor_signal::{DoctorUnhealthy, RefusalSignal, NoHitSignal} —
  signal types the CLI downcasts to for the §10 exit-code mapping.
- logging::init() — daily-rolling file appender at
  $XDG_STATE_HOME/kb/logs/kb.log, plus stderr-fallback EnvFilter.
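The idempotence rule for init_workspace(force) can be sketched with
`std::fs` alone. This is an illustration under assumptions: it takes the
config directory as a parameter instead of resolving it from the XDG
variables, and `SEED_CONFIG` is a placeholder for whatever kb-config
actually writes.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Placeholder seed; the real seed would be rendered from kb-config defaults.
const SEED_CONFIG: &str = "# kb configuration (seeded by `kb init`)\n";

/// Expand a leading `~` against `home`; other paths pass through unchanged.
fn expand_tilde(raw: &str, home: &Path) -> PathBuf {
    if raw == "~" {
        home.to_path_buf()
    } else if let Some(rest) = raw.strip_prefix("~/") {
        home.join(rest)
    } else {
        PathBuf::from(raw)
    }
}

/// Create `config_dir` and seed config.toml. Returns whether the file was written.
/// Re-running without `force` leaves an existing config.toml untouched.
fn init_workspace(config_dir: &Path, force: bool) -> io::Result<bool> {
    fs::create_dir_all(config_dir)?; // succeeds if the directory already exists
    let config_path = config_dir.join("config.toml");
    if config_path.exists() && !force {
        return Ok(false);
    }
    fs::write(&config_path, SEED_CONFIG)?;
    Ok(true)
}

fn main() -> io::Result<()> {
    let home = PathBuf::from(std::env::var_os("HOME").unwrap_or_default());
    println!("workspace root: {}", expand_tilde("~/KnowledgeBase", &home).display());

    // Use a throwaway directory instead of the real XDG config dir.
    let dir = std::env::temp_dir().join(format!("kb-init-demo-{}", std::process::id()));
    let _ = fs::remove_dir_all(&dir);
    println!("first run wrote config: {}", init_workspace(&dir, false)?);
    println!("second run wrote config: {}", init_workspace(&dir, false)?);
    println!("forced run wrote config: {}", init_workspace(&dir, true)?);
    fs::remove_dir_all(&dir)?;
    Ok(())
}
```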

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 05:17:11 +00:00
76a860296e p0-1: kb-config schema + XDG path resolution
Adds the kb-config crate per design §6. Provides the frozen Config
schema (§6.4) with serde + toml round-trip, defaults() that exactly
match the reference values (e.g. score_gate=0.30, target_tokens=500,
embedding.dimensions=384, rrf_k=60), and XDG path resolvers that honor
XDG_CONFIG_HOME / XDG_DATA_HOME / XDG_CACHE_HOME / XDG_STATE_HOME.
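The resolvers presumably follow the XDG Base Directory fallbacks. A sketch
of that rule, assuming each base gets a `kb` subdirectory (the messages here
confirm this only for $XDG_STATE_HOME/kb/logs). `Base` and `kb_dir` are
hypothetical names, and the env value is passed in so the logic stays
testable.

```rust
use std::path::{Path, PathBuf};

/// The four XDG base directories the design mentions.
#[derive(Debug, Clone, Copy)]
enum Base {
    Config,
    Data,
    Cache,
    State,
}

impl Base {
    fn var(self) -> &'static str {
        match self {
            Base::Config => "XDG_CONFIG_HOME",
            Base::Data => "XDG_DATA_HOME",
            Base::Cache => "XDG_CACHE_HOME",
            Base::State => "XDG_STATE_HOME",
        }
    }

    /// Spec-defined fallback, relative to $HOME.
    fn fallback(self) -> &'static str {
        match self {
            Base::Config => ".config",
            Base::Data => ".local/share",
            Base::Cache => ".cache",
            Base::State => ".local/state",
        }
    }
}

/// Resolve `<base>/kb`. Per the XDG Base Directory spec, an unset, empty, or
/// relative value is ignored and the $HOME-based fallback is used instead.
fn kb_dir(base: Base, env_value: Option<&str>, home: &Path) -> PathBuf {
    let root = match env_value {
        Some(v) if !v.is_empty() && Path::new(v).is_absolute() => PathBuf::from(v),
        _ => home.join(base.fallback()),
    };
    root.join("kb")
}

fn main() {
    let home = PathBuf::from(std::env::var_os("HOME").unwrap_or_default());
    for base in [Base::Config, Base::Data, Base::Cache, Base::State] {
        let value = std::env::var(base.var()).ok();
        println!("{:<16} {}", base.var(), kb_dir(base, value.as_deref(), &home).display());
    }
}
```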

Layer order in load(): defaults → file → env (KB_<SECTION>_<KEY>);
CLI overrides apply later in kb-cli. Env mapping covers the keys
needed by P0 smoke tests; the rest land as their config sections wire
up.
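A sketch of the env layer, covering only KB_RAG_SCORE_GATE and
KB_SEARCH_DEFAULT_K. `Config`, `env_key`, and `apply_env` are hypothetical;
the real schema has more sections, the file layer is omitted, and
`default_k: 10` is an invented placeholder because its design default is not
given here. Only `score_gate = 0.30` comes from this commit.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct RagConfig {
    score_gate: f64,
}

#[derive(Debug, Clone, PartialEq)]
struct SearchConfig {
    default_k: usize,
}

#[derive(Debug, Clone, PartialEq)]
struct Config {
    rag: RagConfig,
    search: SearchConfig,
}

impl Default for Config {
    fn default() -> Self {
        Config {
            rag: RagConfig { score_gate: 0.30 }, // pinned default from the design
            search: SearchConfig { default_k: 10 }, // HYPOTHETICAL placeholder value
        }
    }
}

/// ("rag", "score_gate") -> "KB_RAG_SCORE_GATE"
fn env_key(section: &str, key: &str) -> String {
    format!("KB_{}_{}", section.to_uppercase(), key.to_uppercase())
}

/// Last layer before CLI overrides: apply KB_<SECTION>_<KEY> values on top of
/// a config already merged from defaults and the file. Bad values are errors.
fn apply_env(mut cfg: Config, env: &HashMap<String, String>) -> Result<Config, String> {
    let key = env_key("rag", "score_gate");
    if let Some(v) = env.get(&key) {
        cfg.rag.score_gate = v.parse::<f64>().map_err(|e| format!("{key}={v}: {e}"))?;
    }
    let key = env_key("search", "default_k");
    if let Some(v) = env.get(&key) {
        cfg.search.default_k = v.parse::<usize>().map_err(|e| format!("{key}={v}: {e}"))?;
    }
    Ok(cfg)
}

fn main() {
    // Only KB_* variables matter here; the file layer is skipped in this sketch.
    let env: HashMap<String, String> =
        std::env::vars().filter(|(k, _)| k.starts_with("KB_")).collect();
    match apply_env(Config::default(), &env) {
        Ok(cfg) => println!("{cfg:?}"),
        Err(e) => eprintln!("config error: {e}"),
    }
}
```

Rejecting an unparseable value, rather than silently keeping the default,
makes a typo in an env override visible at load time.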

5 unit tests cover serde round-trip, defaults pinned to design,
KB_RAG_SCORE_GATE / KB_SEARCH_DEFAULT_K env overrides, and
XDG_CONFIG_HOME handling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 05:16:57 +00:00
030986b37c p0-1: kb-parse-types thin parser-intermediate crate
Adds the kb-parse-types crate per design §3.7b. Depends only on kb-core
+ serde/thiserror — never on parser libraries. Defines:

- ParsedBlock + ParsedBlockKind + ParsedPayload (8 variants matching
  Block variants in kb-core).
- Warning + WarningKind for parser diagnostics.
- Forward-declared ParsedImageRegion / ParsedPdfPage / ParsedAudioSegment
  shells for P6/P7/P8.

`cargo tree -p kb-parse-types` shows only kb-core, serde, and thiserror.
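To illustrate the kind/payload split, here is a sketch with three
hypothetical variants (the crate has eight, mirroring kb-core's Block).
Deriving `ParsedBlockKind` from `ParsedPayload` is an assumed design choice,
not something this commit states; it keeps the two from drifting apart.

```rust
/// HYPOTHETICAL variant names: the real crate has 8, mirroring kb-core's Block.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ParsedBlockKind {
    Heading,
    Paragraph,
    Code,
}

#[derive(Debug, Clone, PartialEq)]
enum ParsedPayload {
    Heading { level: u8, text: String },
    Paragraph { text: String },
    Code { lang: Option<String>, text: String },
}

impl ParsedPayload {
    /// Derive the fieldless kind from the payload so the two cannot disagree.
    fn kind(&self) -> ParsedBlockKind {
        match self {
            ParsedPayload::Heading { .. } => ParsedBlockKind::Heading,
            ParsedPayload::Paragraph { .. } => ParsedBlockKind::Paragraph,
            ParsedPayload::Code { .. } => ParsedBlockKind::Code,
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
struct ParsedBlock {
    payload: ParsedPayload,
}

/// Count blocks of one kind, e.g. for a parser-level summary.
fn count_kind(blocks: &[ParsedBlock], kind: ParsedBlockKind) -> usize {
    blocks.iter().filter(|b| b.payload.kind() == kind).count()
}

fn main() {
    let blocks = vec![
        ParsedBlock { payload: ParsedPayload::Heading { level: 1, text: "Intro".into() } },
        ParsedBlock { payload: ParsedPayload::Paragraph { text: "Hello.".into() } },
        ParsedBlock {
            payload: ParsedPayload::Code { lang: Some("rust".into()), text: "fn main() {}".into() },
        },
    ];
    for b in &blocks {
        println!("{:?}", b.payload.kind());
    }
    println!("paragraphs: {}", count_kind(&blocks, ParsedBlockKind::Paragraph));
}
```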

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 05:16:50 +00:00
f86df99fe9 p0-1: workspace + kb-core domain types, traits, and ID recipe
Stand up the Cargo workspace (Rust 2024 / resolver=3) with the kb-core
crate per the frozen design (§3, §4, §7, §10). kb-core has zero
deps on other kb-* crates and exposes:

- Newtype IDs (AssetId / DocumentId / BlockId / ChunkId / EmbeddingId /
  IndexId) with Display + FromStr; FromStr rejects anything but exactly
  32 lowercase hex characters.
- id_from + id_for_{asset,doc,block,chunk,embedding,index} per §4.2;
  pinned hex test values computed via an independent JCS+blake3 tool.
- CanonicalDocument, Block (8 variants), SourceSpan, Inline (§3.4).
- Citation (5 variants) with W3C Media Fragments to_uri / parse;
  round-trip property holds for every variant.
- Metadata + Provenance (§3.6); SearchQuery / SearchHit / RetrievalDetail
  (§3.7); DocFilter / DocSummary mirrors of wire §2.5.
- Answer / AnswerCitation / RefusalReason / ModelRef (§3.8).
- IngestReport, JobRepo support types, VectorRecord / VectorHit.
- Component traits (SourceConnector / Extractor / Chunker / Embedder /
  Retriever / LanguageModel / DocumentStore / VectorStore / JobRepo)
  plus their input helpers (SourceScope / ExtractContext / ChunkPolicy
  / EmbeddingInput / GenerateRequest / TokenChunk / FinishReason).
- CoreError (§10).
- nfc + to_posix helpers (§4.1, §6.6).
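One way to write such an ID newtype is sketched below. It stores 16 bytes
and accepts only exactly 32 lowercase hex characters, so Display and FromStr
round-trip. The byte representation and the `IdParseError` type are
assumptions; kb-core may store the string directly, and the other five ID
types would follow the same pattern (for example via a macro).

```rust
use std::fmt;
use std::str::FromStr;

/// 128-bit ID rendered as 32 lowercase hex characters.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct DocumentId([u8; 16]);

#[derive(Debug, PartialEq, Eq)]
struct IdParseError;

impl fmt::Display for IdParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("expected exactly 32 lowercase hex characters")
    }
}

impl std::error::Error for IdParseError {}

/// Decode one lowercase hex digit; uppercase, whitespace, and non-ASCII fail.
fn nibble(c: u8) -> Result<u8, IdParseError> {
    match c {
        b'0'..=b'9' => Ok(c - b'0'),
        b'a'..=b'f' => Ok(c - b'a' + 10),
        _ => Err(IdParseError),
    }
}

impl FromStr for DocumentId {
    type Err = IdParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let bytes = s.as_bytes();
        if bytes.len() != 32 {
            return Err(IdParseError);
        }
        let mut out = [0u8; 16];
        for (i, pair) in bytes.chunks_exact(2).enumerate() {
            out[i] = (nibble(pair[0])? << 4) | nibble(pair[1])?;
        }
        Ok(DocumentId(out))
    }
}

impl fmt::Display for DocumentId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for b in self.0 {
            write!(f, "{b:02x}")?;
        }
        Ok(())
    }
}

fn main() {
    for input in ["0123456789abcdef0123456789abcdef", "0123456789ABCDEF0123456789ABCDEF", "abc"] {
        match input.parse::<DocumentId>() {
            Ok(id) => println!("ok:       {id}"),
            Err(e) => println!("rejected: {input:?} ({e})"),
        }
    }
}
```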

20 unit tests cover ID determinism (1000-run regression), key-order
invariance, two pinned hex values, newtype rejection of bad input,
Citation round-trip for all 5 variants, and to_posix collapsing +
Korean NFC.
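W3C Media Fragments encode temporal ranges as `#t=start,end` (seconds) and
spatial regions as `#xywh=x,y,w,h` (pixels by default). A sketch of a
round-tripping to_uri / parse pair follows, with two hypothetical variants
standing in for kb-core's five Citation variants. It omits the optional
`npt:` / `percent:` prefixes and does not check that start < end.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Citation {
    /// Hypothetical: a time range in an audio/video asset, in seconds.
    TimeRange { uri: String, start: f64, end: f64 },
    /// Hypothetical: a pixel rectangle in an image.
    Region { uri: String, x: u32, y: u32, w: u32, h: u32 },
}

impl Citation {
    fn to_uri(&self) -> String {
        match self {
            // f64 Display prints the shortest form that parses back exactly.
            Citation::TimeRange { uri, start, end } => format!("{uri}#t={start},{end}"),
            Citation::Region { uri, x, y, w, h } => format!("{uri}#xywh={x},{y},{w},{h}"),
        }
    }

    fn parse(s: &str) -> Option<Citation> {
        let (uri, frag) = s.split_once('#')?;
        if let Some(t) = frag.strip_prefix("t=") {
            let (a, b) = t.split_once(',')?;
            return Some(Citation::TimeRange {
                uri: uri.to_string(),
                start: a.parse::<f64>().ok()?,
                end: b.parse::<f64>().ok()?,
            });
        }
        if let Some(r) = frag.strip_prefix("xywh=") {
            let nums = r
                .split(',')
                .map(|n| n.parse::<u32>().ok())
                .collect::<Option<Vec<u32>>>()?;
            if let &[x, y, w, h] = nums.as_slice() {
                return Some(Citation::Region { uri: uri.to_string(), x, y, w, h });
            }
        }
        None
    }
}

fn main() {
    let cites = [
        Citation::TimeRange { uri: "file:///talk.mp3".into(), start: 12.5, end: 30.0 },
        Citation::Region { uri: "file:///fig.png".into(), x: 160, y: 120, w: 320, h: 240 },
    ];
    for c in &cites {
        let uri = c.to_uri();
        println!("{uri} round-trips: {}", Citation::parse(&uri).as_ref() == Some(c));
    }
}
```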

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 05:16:37 +00:00