diff --git a/docs/superpowers/plans/2026-05-10-v031-cut-f-vision.md b/docs/superpowers/plans/2026-05-10-v031-cut-f-vision.md new file mode 100644 index 0000000..8d9418c --- /dev/null +++ b/docs/superpowers/plans/2026-05-10-v031-cut-f-vision.md @@ -0,0 +1,841 @@ +# v0.3.1 Cut F — 멀티모달 vision AI Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** F24 — Ollama vision 모델 (gemma3 family default) 활용. 이미지 + raw_text 결합 prompt → title/summary/tags 자동 생성. Capability detection (app launch + manual refresh) + InferenceProvider 확장 + AiWorker 통합 + Configure UI dropdown. + +**Architecture:** `isVisionCapable(model)` pure 함수 가 family/families/name 기반으로 vision 가능 모델 판정. `refreshVisionCache(deps)` 가 `/api/tags` 호출 후 settings 에 cache. AiWorker 가 `note.media.length > 0 && visionModel` 둘 다 충족 시 vision path (5MB cap + base64 변환). Configure UI 가 cache 기반 dropdown + manual refresh. + +**Tech Stack:** undici/fetch (Ollama API), Node fs/promises (이미지 base64 변환), Electron IPC, React 19 + zustand 5, vitest 4 + RTL. + +**선행 문서:** + +- `docs/superpowers/specs/2026-05-09-v031-cut-f-design.md` — source spec (Cut F 정정 반영: 단위 679, 실제 SettingsService API, 'skipped' enum 미도입, fallback 미구현) +- `docs/superpowers/specs/2026-04-25-dogfood-feedback.md` — F24 +- `docs/superpowers/strategy/v028plus-roadmap.md` — Cut F 위치 + +--- + +## File Structure + +**Create:** + +- `src/main/services/VisionDetect.ts` — `isVisionCapable(model)` pure + `refreshVisionCache(deps)` async (Ollama /api/tags) +- `src/main/ai/visionPrompt.ts` — `buildVisionPrompt(text, todayKst, dueCandidates, vocab)` pure +- `src/renderer/inbox/components/settings/VisionSection.tsx` — AI 제공자 섹션 안 또는 별도 sub-section. 
dropdown + 다시 감지 버튼 +- `tests/unit/VisionDetect.test.ts` — isVisionCapable 5 + refreshVisionCache 4 +- `tests/unit/visionPrompt.test.ts` — buildVisionPrompt 2 (text only / image-only fallback) +- `tests/unit/AiWorker.vision.test.ts` — vision path 3 (text-only / vision body / 5MB cap) +- `tests/unit/VisionSection.test.tsx` — UI 1 (dropdown + 다시 감지) + +**Modify:** + +- `src/main/services/SettingsService.ts` — zod schema vision_model / vision_capable_cache / vision_cache_at + 4 메서드 +- `src/main/ai/InferenceProvider.ts` — `GenerateInput.images?: Array<{ base64: string; mime: string }>` + `generate(input, opts?: { visionModel?: string | null })` +- `src/main/ai/LocalOllamaProvider.ts` — `generate` body 에 `images` 필드 (vision path) + 모델 분기 +- `src/main/ai/AiWorker.ts` — `note.media + visionModel` vision path + 5MB cap + base64 변환. 생성자에 `settings: SettingsService` 의존성 추가 +- `src/main/ipc/settingsApi.ts` — 3 IPC: `settings:get-vision-models` / `settings:set-vision-model` / `settings:refresh-vision-cache` +- `src/preload/index.ts` — 3 bridge +- `src/shared/types.ts` — `getSettings()` 반환에 vision_* 3 필드 + InboxApi 3 메서드 +- `src/main/index.ts` — `void refreshVisionCache(...)` whenReady 안 + AiWorker 생성자에 settings 주입 +- `src/renderer/inbox/components/settings/AiProviderSection.tsx` 또는 SettingsPage — VisionSection 마운트 +- `tests/unit/SettingsService.test.ts` — vision 4 메서드 round-trip +- `tests/unit/LocalOllamaProvider.test.ts` — vision body 분기 회귀 +- `tests/unit/AiWorker.test.ts` — 기존 mock 에 settings stub 추가 (생성자 변경) +- `package.json` — version 0.3.0 → 0.3.1 +- `docs/superpowers/specs/2026-04-25-dogfood-feedback.md` — F24 promoted + +--- + +## 단위 목표 + +679 (v0.3.0) → 약 701 (+22), typecheck 0. + +--- + +## Task 1: VisionDetect — `isVisionCapable` + `refreshVisionCache` + +**Files:** + +- Create: `src/main/services/VisionDetect.ts` +- Create: `tests/unit/VisionDetect.test.ts` + +`isVisionCapable(model)` pure 함수 — family/families/name hints 기반 판정. 
`refreshVisionCache(deps)` async — `/api/tags` 호출 후 capable 추출 + settings cache 저장. fetch 주입 가능 (테스트). + +- [ ] **Step 1: failing test** — `tests/unit/VisionDetect.test.ts`: + +```ts +import { describe, it, expect, vi } from 'vitest'; +import { isVisionCapable, refreshVisionCache } from '../../src/main/services/VisionDetect.js'; + +describe('isVisionCapable', () => { + it('family=gemma3 → true', () => { + expect(isVisionCapable({ name: 'gemma3:12b', details: { family: 'gemma3' } })).toBe(true); + }); + it('families=[llava] → true', () => { + expect(isVisionCapable({ name: 'llava-13b', details: { families: ['llava'] } })).toBe(true); + }); + it('name hint "vision" → true', () => { + expect(isVisionCapable({ name: 'custom-vision-7b' })).toBe(true); + }); + it('text-only family=gemma → false', () => { + expect(isVisionCapable({ name: 'gemma4:e4b', details: { family: 'gemma' } })).toBe(false); + }); + it('no hints + unknown family → false', () => { + expect(isVisionCapable({ name: 'mistral:7b', details: { family: 'mistral' } })).toBe(false); + }); +}); + +describe('refreshVisionCache', () => { + it('happy path — capable 추출 + settings cache 저장', async () => { + const settings = { + isAiEnabled: vi.fn(async () => true), + setVisionCapableCache: vi.fn(async () => {}) + }; + const fetchImpl = vi.fn(async () => ({ + ok: true, + status: 200, + json: async () => ({ + models: [ + { name: 'gemma4:e4b', details: { family: 'gemma' } }, + { name: 'gemma3:12b-vision', details: { family: 'gemma3' } }, + { name: 'llava:13b', details: { families: ['llava'] } } + ] + }) + })) as unknown as typeof fetch; + const r = await refreshVisionCache({ + settings: settings as never, + endpoint: 'http://localhost:11434', + fetchImpl + }); + expect(r).toEqual({ ok: true, models: ['gemma3:12b-vision', 'llava:13b'] }); + expect(settings.setVisionCapableCache).toHaveBeenCalledWith(['gemma3:12b-vision', 'llava:13b'], expect.any(Date)); + }); + + it('ai_disabled → 스킵', async () => { + const settings = { 
+ isAiEnabled: vi.fn(async () => false), + setVisionCapableCache: vi.fn(async () => {}) + }; + const r = await refreshVisionCache({ settings: settings as never, endpoint: 'http://x' }); + expect(r).toEqual({ ok: false, reason: 'ai_disabled' }); + expect(settings.setVisionCapableCache).not.toHaveBeenCalled(); + }); + + it('http error → ok:false', async () => { + const settings = { + isAiEnabled: vi.fn(async () => true), + setVisionCapableCache: vi.fn(async () => {}) + }; + const fetchImpl = vi.fn(async () => ({ + ok: false, + status: 500, + json: async () => ({}) + })) as unknown as typeof fetch; + const r = await refreshVisionCache({ settings: settings as never, endpoint: 'http://x', fetchImpl }); + expect(r).toMatchObject({ ok: false }); + expect(settings.setVisionCapableCache).not.toHaveBeenCalled(); + }); + + it('unreachable → ok:false', async () => { + const settings = { + isAiEnabled: vi.fn(async () => true), + setVisionCapableCache: vi.fn(async () => {}) + }; + const fetchImpl = vi.fn(async () => { throw new Error('ECONNREFUSED'); }) as unknown as typeof fetch; + const r = await refreshVisionCache({ settings: settings as never, endpoint: 'http://x', fetchImpl }); + expect(r).toMatchObject({ ok: false }); + }); +}); +``` + +- [ ] **Step 2: implementation** — `src/main/services/VisionDetect.ts`: + +```ts +import type { SettingsService } from './SettingsService.js'; + +const VISION_FAMILIES = new Set(['gemma3', 'llava', 'llama3.2-vision', 'minicpm-v', 'pixtral']); +const VISION_NAME_HINTS = ['vision', 'vl', 'multimodal', 'gemma3']; + +export interface OllamaModel { + name: string; + details?: { family?: string; families?: string[] }; +} + +export function isVisionCapable(model: OllamaModel): boolean { + if (model.details?.family && VISION_FAMILIES.has(model.details.family)) return true; + if (model.details?.families?.some((f) => VISION_FAMILIES.has(f))) return true; + const lower = model.name.toLowerCase(); + return VISION_NAME_HINTS.some((h) => 
lower.includes(h)); +} + +export interface RefreshDeps { + settings: SettingsService; + endpoint: string; + now?: () => Date; + fetchImpl?: typeof fetch; +} + +export async function refreshVisionCache( + deps: RefreshDeps +): Promise<{ ok: true; models: string[] } | { ok: false; reason: string }> { + if (!(await deps.settings.isAiEnabled())) { + return { ok: false, reason: 'ai_disabled' }; + } + const fetchFn = deps.fetchImpl ?? fetch; + let body: { models?: OllamaModel[] }; + try { + const r = await fetchFn(`${deps.endpoint}/api/tags`); + if (!r.ok) return { ok: false, reason: `tags http ${r.status}` }; + body = (await r.json()) as { models?: OllamaModel[] }; + } catch (e) { + return { ok: false, reason: `unreachable: ${(e as Error).message}` }; + } + const capable = (body.models ?? []).filter(isVisionCapable).map((m) => m.name); + const now = deps.now ? deps.now() : new Date(); + await deps.settings.setVisionCapableCache(capable, now); + return { ok: true, models: capable }; +} +``` + +- [ ] **Step 3: PASS + commit** + +```bash +npm run typecheck +npx vitest run tests/unit/VisionDetect.test.ts +git add src/main/services/VisionDetect.ts tests/unit/VisionDetect.test.ts +git commit -m "feat(v031): VisionDetect — isVisionCapable + refreshVisionCache (fetch 주입)" +``` + +--- + +## Task 2: SettingsService — vision_model / vision_capable_cache + 4 메서드 + +**Files:** + +- Modify: `src/main/services/SettingsService.ts` +- Modify: `tests/unit/SettingsService.test.ts` + +zod schema 확장 + 4 메서드 추가 (Cut E sync_* 패턴). + +- [ ] **Step 1: zod schema 확장** — `src/main/services/SettingsService.ts`: + +```ts +const SettingsSchema = z.object({ + ollama: OllamaSettingsSchema.optional(), + ai_enabled: z.boolean().optional(), + onboarding_completed: z.boolean().optional(), + sync_repo_url: z.string().nullable().optional(), + sync_auto_enabled: z.boolean().optional(), + sync_interval_min: z.number().int().min(5).optional(), + // v0.3.1 Cut F — vision 모델 (이미지 분석). null/없음 = 비활성. 
+ vision_model: z.string().nullable().optional(), + vision_capable_cache: z.array(z.string()).optional(), + vision_cache_at: z.string().optional() +}).strict(); +``` + +- [ ] **Step 2: 4 메서드 추가** (`setSyncIntervalMin` 다음): + +```ts +async getVisionModel(): Promise<string | null> { + const s = await this.load(); + return s.vision_model ?? null; +} + +async setVisionModel(value: string | null): Promise<void> { + const current = await this.load(); + const next: Settings = { ...current, vision_model: value }; + await this.persist(next); +} + +async getVisionCapableCache(): Promise<{ models: string[]; at: string | null }> { + const s = await this.load(); + return { models: s.vision_capable_cache ?? [], at: s.vision_cache_at ?? null }; +} + +async setVisionCapableCache(models: string[], now: Date): Promise<void> { + const current = await this.load(); + const next: Settings = { ...current, vision_capable_cache: models, vision_cache_at: now.toISOString() }; + await this.persist(next); +} +``` + +- [ ] **Step 3: failing test** — `tests/unit/SettingsService.test.ts` 의 마지막 describe (Cut E sync) 다음에 추가: + +```ts +describe('v0.3.1 Cut F — vision settings', () => { + it('getVisionModel() 기본 null', async () => { + expect(await svc.getVisionModel()).toBeNull(); + }); + + it('setVisionModel / getVisionModel round-trip + null clear', async () => { + await svc.setVisionModel('gemma3:12b-vision'); + expect(await svc.getVisionModel()).toBe('gemma3:12b-vision'); + await svc.setVisionModel(null); + expect(await svc.getVisionModel()).toBeNull(); + }); + + it('getVisionCapableCache() 기본 빈 배열 + null at', async () => { + expect(await svc.getVisionCapableCache()).toEqual({ models: [], at: null }); + }); + + it('setVisionCapableCache 저장 + at ISO', async () => { + const at = new Date('2026-05-10T05:00:00Z'); + await svc.setVisionCapableCache(['gemma3:12b', 'llava:13b'], at); + const r = await svc.getVisionCapableCache(); + expect(r.models).toEqual(['gemma3:12b', 'llava:13b']); + 
expect(r.at).toBe('2026-05-10T05:00:00.000Z'); + }); +}); +``` + +- [ ] **Step 4: PASS + commit** + +```bash +npm run typecheck +npx vitest run tests/unit/SettingsService.test.ts +git add src/main/services/SettingsService.ts tests/unit/SettingsService.test.ts +git commit -m "feat(v031): SettingsService.{getVisionModel,setVisionModel,getVisionCapableCache,setVisionCapableCache}" +``` + +--- + +## Task 3: visionPrompt + InferenceProvider 인터페이스 확장 + +**Files:** + +- Create: `src/main/ai/visionPrompt.ts` +- Modify: `src/main/ai/InferenceProvider.ts` +- Create: `tests/unit/visionPrompt.test.ts` + +`buildVisionPrompt(text, todayKst, dueCandidates, vocab)` pure — 이미지 + raw_text 결합 시나리오. 빈 text 도 처리 ("(이미지만 있음)" placeholder). + +- [ ] **Step 1: failing test** — `tests/unit/visionPrompt.test.ts`: + +```ts +import { describe, it, expect } from 'vitest'; +import { buildVisionPrompt } from '../../src/main/ai/visionPrompt.js'; + +describe('buildVisionPrompt', () => { + it('text + 이미지 시 메모 본문 포함', () => { + const r = buildVisionPrompt('회의 메모', '2026-05-10', ['2026-05-10'], ['회의']); + expect(r).toContain('회의 메모'); + expect(r).toContain('2026-05-10'); + expect(r).toContain('회의'); + }); + + it('빈 text → "(이미지만 있음)" placeholder', () => { + const r = buildVisionPrompt('', '2026-05-10', [], []); + expect(r).toContain('(이미지만 있음)'); + }); +}); +``` + +- [ ] **Step 2: implementation** — `src/main/ai/visionPrompt.ts`: + +```ts +/** + * v0.3.1 Cut F — 멀티모달 vision prompt. 이미지 + raw_text 결합 분석 후 + * title/summary/tags/due_date JSON 응답 요청. 빈 raw_text 도 처리. + */ +export function buildVisionPrompt( + text: string, + todayKst: string, + dueCandidates: string[], + vocab: string[] +): string { + return `다음 메모와 첨부 이미지를 종합 분석해 한국어로 요약하세요. + +메모 본문 (비어 있을 수 있음): +${text || '(이미지만 있음)'} + +이미지 분석 시 주요 시각적 정보 (텍스트, 사람, 장면) 도 포함해 요약하세요. +출력 JSON: { "title": "...", "summary": "...", "tags": [...], "due_date": "..." 
} +오늘: ${todayKst} +가능한 due 후보: ${dueCandidates.join(', ')} +빈출 태그: ${vocab.slice(0, 20).join(', ')}`; +} +``` + +- [ ] **Step 3: InferenceProvider 인터페이스 확장** — `src/main/ai/InferenceProvider.ts`: + +```ts +export interface GenerateInput { + text: string; + todayKst: string; + dueDateCandidates: string[]; + vocab?: string[]; + // v0.3.1 Cut F — 멀티모달 vision (옵션). LocalOllamaProvider 가 visionModel 과 함께 처리. + images?: Array<{ base64: string; mime: string }>; +} + +export interface GenerateOptions { + visionModel?: string | null; +} + +export interface InferenceProvider { + generate(input: GenerateInput, opts?: GenerateOptions): Promise; + // ... 기존 abort / generateRaw +} +``` + +(기존 호출자는 `opts` 미전달이라 호환 — vision path off.) + +- [ ] **Step 4: PASS + commit** + +```bash +npm run typecheck +npx vitest run tests/unit/visionPrompt.test.ts +git add src/main/ai/visionPrompt.ts src/main/ai/InferenceProvider.ts tests/unit/visionPrompt.test.ts +git commit -m "feat(v031): buildVisionPrompt + GenerateInput.images + GenerateOptions.visionModel" +``` + +--- + +## Task 4: LocalOllamaProvider — vision path + +**Files:** + +- Modify: `src/main/ai/LocalOllamaProvider.ts` +- Modify: `tests/unit/LocalOllamaProvider.test.ts` + +`generate(input, opts)` 가 `opts.visionModel + input.images` 둘 다 있으면 vision body 생성 (model = visionModel, prompt = buildVisionPrompt, body.images = base64 array). 그 외는 기존 text-only path. 
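The selection rule in the paragraph above can be sketched as a small pure function. This is a minimal sketch under stated assumptions, not the plan's implementation: `selectGenerateTarget`, `SketchInput`, `SketchOptions`, and `GenerateTarget` are hypothetical names that mirror only the `images` and `visionModel` fields introduced in Task 3.

```typescript
// Hypothetical local types; they mirror only the fields this rule reads.
interface SketchImage { base64: string; mime: string }
interface SketchInput { text: string; images?: SketchImage[] }
interface SketchOptions { visionModel?: string | null }

interface GenerateTarget {
  useVision: boolean;
  model: string;
  images?: string[]; // base64 payloads, present only on the vision path
}

// The vision path needs BOTH a configured vision model and at least one image.
// Anything else falls back to the default text-only model.
function selectGenerateTarget(
  defaultModel: string,
  input: SketchInput,
  opts?: SketchOptions
): GenerateTarget {
  const visionModel = opts?.visionModel ?? null;
  const images = input.images ?? [];
  if (visionModel !== null && visionModel !== '' && images.length > 0) {
    return { useVision: true, model: visionModel, images: images.map((i) => i.base64) };
  }
  return { useVision: false, model: defaultModel };
}
```

Keeping the decision in one pure function would let the three regression cases (vision body, vision model without images, no options) be checked without mocking `undici`.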
+ +- [ ] **Step 1: failing test** — 기존 `LocalOllamaProvider.test.ts` 의 적절한 describe 안: + +```ts +describe('vision path (v0.3.1 Cut F)', () => { + it('opts.visionModel + input.images 둘 다 있으면 vision body', async () => { + let captured: { model?: string; prompt?: string; images?: string[] } = {}; + const undici = await import('undici'); + const requestSpy = vi.spyOn(undici, 'request').mockImplementation(async (_url, init) => { + captured = JSON.parse(init?.body as string); + return { + statusCode: 200, + body: { json: async () => ({ response: '{"title":"t","summary":"s","tags":[],"due_date":null}' }) } + } as never; + }); + const provider = new LocalOllamaProvider({ endpoint: 'http://x', model: 'gemma4:e4b' }); + await provider.generate( + { text: 'hi', todayKst: '2026-05-10', dueDateCandidates: [], images: [{ base64: 'AAAA', mime: 'image/png' }] }, + { visionModel: 'gemma3:12b-vision' } + ); + expect(captured.model).toBe('gemma3:12b-vision'); + expect(captured.prompt).toContain('이미지'); + expect(captured.images).toEqual(['AAAA']); + requestSpy.mockRestore(); + }); + + it('visionModel 있어도 images 없으면 text-only path', async () => { + let captured: { model?: string; images?: unknown } = {}; + const undici = await import('undici'); + const requestSpy = vi.spyOn(undici, 'request').mockImplementation(async (_url, init) => { + captured = JSON.parse(init?.body as string); + return { + statusCode: 200, + body: { json: async () => ({ response: '{"title":"t","summary":"s","tags":[],"due_date":null}' }) } + } as never; + }); + const provider = new LocalOllamaProvider({ endpoint: 'http://x', model: 'gemma4:e4b' }); + await provider.generate( + { text: 'hi', todayKst: '2026-05-10', dueDateCandidates: [] }, + { visionModel: 'gemma3:12b-vision' } + ); + expect(captured.model).toBe('gemma4:e4b'); + expect(captured.images).toBeUndefined(); + requestSpy.mockRestore(); + }); + + it('opts 미전달 → 기존 text-only (회귀)', async () => { + let captured: { model?: string; images?: unknown } = {}; + 
const undici = await import('undici'); + const requestSpy = vi.spyOn(undici, 'request').mockImplementation(async (_url, init) => { + captured = JSON.parse(init?.body as string); + return { + statusCode: 200, + body: { json: async () => ({ response: '{"title":"t","summary":"s","tags":[],"due_date":null}' }) } + } as never; + }); + const provider = new LocalOllamaProvider({ endpoint: 'http://x', model: 'gemma4:e4b' }); + await provider.generate({ text: 'hi', todayKst: '2026-05-10', dueDateCandidates: [] }); + expect(captured.model).toBe('gemma4:e4b'); + expect(captured.images).toBeUndefined(); + requestSpy.mockRestore(); + }); +}); +``` + +(기존 LocalOllamaProvider.test.ts 의 mock 패턴 따름. test file 의 imports + vi.mock 은 그대로 사용.) + +- [ ] **Step 2: implementation** — `LocalOllamaProvider.generate` body 분기: + +```ts +import { buildVisionPrompt } from './visionPrompt.js'; +// ... + +async generate(input: GenerateInput, opts?: GenerateOptions): Promise { + const useVision = !!opts?.visionModel && (input.images?.length ?? 0) > 0; + const model = useVision ? opts!.visionModel! : this.model; + const prompt = useVision + ? buildVisionPrompt(input.text, input.todayKst, input.dueDateCandidates, input.vocab ?? []) + : buildPrompt(input.text, input.todayKst, input.dueDateCandidates, input.vocab ?? []); + + this.abortController = new AbortController(); + const timer = setTimeout(() => this.abortController?.abort(), this.timeoutMs); + try { + const body: Record = { + model, + prompt, + format: 'json', + stream: false, + options: { temperature: this.temperature, num_predict: this.numPredict } + }; + if (useVision) { + body.images = input.images!.map((i) => i.base64); + } + const res = await request(`${this.endpoint}/api/generate`, { + method: 'POST', + headers: { 'content-type': 'application/json' }, + body: JSON.stringify(body), + signal: this.abortController.signal + }); + // ... 기존 parse + } finally { + // ... 
+ } +} +``` + +- [ ] **Step 3: PASS + commit** + +```bash +npm run typecheck +npx vitest run tests/unit/LocalOllamaProvider.test.ts +git add src/main/ai/LocalOllamaProvider.ts tests/unit/LocalOllamaProvider.test.ts +git commit -m "feat(v031): LocalOllamaProvider vision path (visionModel + images → body.images base64)" +``` + +--- + +## Task 5: AiWorker — vision integration + 5MB cap + settings 의존성 + +**Files:** + +- Modify: `src/main/ai/AiWorker.ts` +- Modify: `tests/unit/AiWorker.test.ts` +- Create: `tests/unit/AiWorker.vision.test.ts` + +AiWorker 가 `note.media + visionModel` 조건에서 base64 변환 (5MB cap) + provider.generate 에 images + visionModel 전달. 생성자에 `settings: SettingsService` 의존성 추가. + +- [ ] **Step 1: AiWorker 생성자 변경** — settings 파라미터 추가. `src/main/index.ts` 의 인스턴스 생성도 갱신. + +- [ ] **Step 2: AiWorker.processJob 갱신**: + +```ts +import { readFile } from 'node:fs/promises'; +// 클래스 안 generate 호출 직전: +const visionModel = await this.settings.getVisionModel(); +let images: Array<{ base64: string; mime: string }> | undefined; +if (visionModel && note.media.length > 0) { + images = await Promise.all( + note.media.map(async (m) => { + const buf = await readFile(this.mediaStore.absolutePath(m.relPath)); + if (buf.byteLength > 5 * 1024 * 1024) { + throw new Error(`image ${m.relPath} exceeds 5MB cap`); + } + return { base64: buf.toString('base64'), mime: m.mime }; + }) + ); +} +const res = await this.holder.get().generate( + { text: note.rawText, images, todayKst: todayIso, dueDateCandidates: candidates, vocab }, + { visionModel: visionModel ?? undefined } +); +``` + +`mediaStore: MediaStore` 도 AiWorker 생성자에 신규 파라미터 (현재 없으면 추가; main 에서 주입). 
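As a rough check on the 5MB cap used in `processJob`, the sketch below pairs the size guard with a base64 size estimate. It is an illustration under assumptions: `assertWithinCap`, `base64Length`, and `worstCaseBase64Chars` are hypothetical helpers that are not part of `AiWorker`, and the `4 * ceil(n / 3)` formula is the standard length of padded base64 output.

```typescript
const IMAGE_CAP_BYTES = 5 * 1024 * 1024; // per-image cap from the plan

// Hypothetical guard: throws with the same message shape as the plan's processJob.
// A file of exactly IMAGE_CAP_BYTES is still accepted, because the check is `>`.
function assertWithinCap(relPath: string, byteLength: number): void {
  if (byteLength > IMAGE_CAP_BYTES) {
    throw new Error(`image ${relPath} exceeds 5MB cap`);
  }
}

// Padded base64 emits 4 characters for every 3 input bytes, rounded up.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Worst-case number of base64 characters held at once for `count` images at the cap.
function worstCaseBase64Chars(count: number): number {
  return count * base64Length(IMAGE_CAP_BYTES);
}
```

At the cap, a single image encodes to 6,990,508 base64 characters, so a note with three such images holds roughly 21 million characters in memory until the request finishes.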
+ +- [ ] **Step 3: failing test** — `tests/unit/AiWorker.vision.test.ts`: + +```ts +import { describe, it, expect, beforeEach, afterEach, vi } from 'vitest'; +import { writeFile, mkdtemp, mkdir, rm } from 'node:fs/promises'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; +import Database from 'better-sqlite3'; +import { runMigrations } from '../../src/main/db/migrations/index.js'; +import { NoteRepository } from '../../src/main/repository/NoteRepository.js'; +import { AiWorker } from '../../src/main/ai/AiWorker.js'; +import { MediaStore } from '../../src/main/services/MediaStore.js'; + +describe('AiWorker — vision path (v0.3.1 Cut F)', () => { + let db: Database.Database; + let repo: NoteRepository; + let workDir: string; + let mediaStore: MediaStore; + + beforeEach(async () => { + db = new Database(':memory:'); + db.pragma('foreign_keys = ON'); + runMigrations(db); + repo = new NoteRepository(db); + workDir = await mkdtemp(join(tmpdir(), 'inkling-vision-')); + mediaStore = new MediaStore(workDir); + }); + + afterEach(async () => { + db.close(); + await rm(workDir, { recursive: true, force: true }); + }); + + it('visionModel + media 있음 → provider.generate 가 images + opts 받음', async () => { + const { id } = repo.create({ rawText: '이미지 메모' }); + const mediaPath = join(workDir, 'media', id, '1.png'); + await mkdir(join(workDir, 'media', id), { recursive: true }); + await writeFile(mediaPath, Buffer.from([0x89, 0x50, 0x4e, 0x47])); // 4 bytes PNG-ish + repo.insertMedia([{ noteId: id, kind: 'image', relPath: `media/${id}/1.png`, mime: 'image/png', bytes: 4 }]); + + const generate = vi.fn(async () => ({ title: 't', summary: 's', tags: [], dueDate: null })); + const provider = { name: 'fake', generate, abort: () => {} }; + const settings = { + getVisionModel: vi.fn(async () => 'gemma3:12b-vision'), + isAiEnabled: vi.fn(async () => true) + } as unknown as never; + const worker = new AiWorker(/* ...deps with settings + mediaStore + repo + holder = { get: () => 
provider } */); + await worker['processJob']({ noteId: id, attempts: 0, nextRunAt: '' }); + + expect(generate).toHaveBeenCalledWith( + expect.objectContaining({ images: expect.any(Array) }), + expect.objectContaining({ visionModel: 'gemma3:12b-vision' }) + ); + const callArg = generate.mock.calls[0]![0] as { images: Array<{ base64: string; mime: string }> }; + expect(callArg.images).toHaveLength(1); + expect(callArg.images[0]!.mime).toBe('image/png'); + }); + + it('visionModel 없으면 text-only (회귀)', async () => { + const { id } = repo.create({ rawText: 'just text' }); + const generate = vi.fn(async () => ({ title: 't', summary: 's', tags: [], dueDate: null })); + const provider = { name: 'fake', generate, abort: () => {} }; + const settings = { + getVisionModel: vi.fn(async () => null), + isAiEnabled: vi.fn(async () => true) + } as unknown as never; + const worker = new AiWorker(/* ... */); + await worker['processJob']({ noteId: id, attempts: 0, nextRunAt: '' }); + expect(generate).toHaveBeenCalledWith( + expect.not.objectContaining({ images: expect.anything() }), + expect.any(Object) + ); + }); + + it('5MB 초과 이미지 → throw → ai_status=failed', async () => { + const { id } = repo.create({ rawText: 'big image' }); + const mediaPath = join(workDir, 'media', id, '1.png'); + await mkdir(join(workDir, 'media', id), { recursive: true }); + await writeFile(mediaPath, Buffer.alloc(6 * 1024 * 1024)); // 6 MB + repo.insertMedia([{ noteId: id, kind: 'image', relPath: `media/${id}/1.png`, mime: 'image/png', bytes: 6 * 1024 * 1024 }]); + + const generate = vi.fn(async () => ({ title: 't', summary: 's', tags: [], dueDate: null })); + const settings = { + getVisionModel: vi.fn(async () => 'gemma3:12b-vision'), + isAiEnabled: vi.fn(async () => true) + } as unknown as never; + const worker = new AiWorker(/* ... 
*/); + await worker['processJob']({ noteId: id, attempts: 0, nextRunAt: '' }); + // 5MB cap 초과 throw → AiWorker 의 attempts 증가 분기 → ai_status='failed' + const note = repo.findById(id); + expect(['failed', 'pending']).toContain(note!.aiStatus); // attempts 모두 소진 시 'failed'; 첫 시도 throw 시 'pending' 유지 가능 — 구현 의존 + }); +}); +``` + +(NOTE: 정확한 AiWorker 생성자 인자 — 기존 test 의 setup 패턴 따라 deps 전체 stub 구성. 위 코드는 outline; 실수행자가 기존 `AiWorker.test.ts` setup 참고하여 정확한 deps 구조 채움.) + +- [ ] **Step 4: 기존 AiWorker.test.ts mock 갱신** — 생성자에 `settings` / `mediaStore` 파라미터 추가됨. 모든 기존 test 의 worker 생성 site 에 stub 추가. + +- [ ] **Step 5: PASS + commit** + +```bash +npm run typecheck +npx vitest run tests/unit/AiWorker.test.ts tests/unit/AiWorker.vision.test.ts +git add src/main/ai/AiWorker.ts \ + src/main/index.ts \ + tests/unit/AiWorker.test.ts \ + tests/unit/AiWorker.vision.test.ts +git commit -m "feat(v031): AiWorker vision integration — note.media + visionModel + 5MB cap" +``` + +--- + +## Task 6: types + IPC + preload + +**Files:** + +- Modify: `src/shared/types.ts` — `getSettings()` 반환에 vision_model / vision_capable_cache / vision_cache_at + InboxApi 3 메서드 +- Modify: `src/main/ipc/settingsApi.ts` — 3 IPC handler +- Modify: `src/preload/index.ts` — 3 bridge +- Create: `tests/unit/vision-ipc.test.ts` + +3 채널: +- `settings:get-vision-models` → `{ models: string[]; at: string | null; selected: string | null }` (cache 결과 + 현재 선택) +- `settings:set-vision-model` (value: string | null) → `{ ok: true }` +- `settings:refresh-vision-cache` → `{ ok: true; models: string[] } | { ok: false; reason: string }` (refreshVisionCache 호출) + +상세 패턴은 Cut E sync IPC 와 동일. 
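The three channel contracts listed above could be written down as types. This is a hedged sketch: `VisionChannels`, `ResultOf`, `toVisionModelsPayload`, and `modelsAfterRefresh` are hypothetical names, and only the channel names and payload shapes come from the plan.

```typescript
// Channel name -> argument tuple and result shape, as listed in Task 6.
interface VisionChannels {
  'settings:get-vision-models': {
    args: [];
    result: { models: string[]; at: string | null; selected: string | null };
  };
  'settings:set-vision-model': { args: [value: string | null]; result: { ok: true } };
  'settings:refresh-vision-cache': {
    args: [];
    result: { ok: true; models: string[] } | { ok: false; reason: string };
  };
}

type ResultOf<C extends keyof VisionChannels> = VisionChannels[C]['result'];

// Combines the cached detection result with the currently selected model.
function toVisionModelsPayload(
  cache: { models: string[]; at: string | null },
  selected: string | null
): ResultOf<'settings:get-vision-models'> {
  return { models: [...cache.models], at: cache.at, selected };
}

// Assumed renderer policy: keep the previous list when a refresh fails.
function modelsAfterRefresh(
  previous: string[],
  r: ResultOf<'settings:refresh-vision-cache'>
): string[] {
  return r.ok ? r.models : previous;
}
```

Typing the result of `settings:refresh-vision-cache` as a discriminated union on `ok` lets the renderer narrow to `models` or `reason` without casts.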
+ +- [ ] **Step 1: types** + **Step 2: failing test** + **Step 3: handlers** + **Step 4: preload bridges** — Cut E sync-ipc 패턴 그대로 + +- [ ] **Step 5: PASS + commit** + +```bash +git add src/shared/types.ts src/main/ipc/settingsApi.ts src/preload/index.ts tests/unit/vision-ipc.test.ts +git commit -m "feat(v031): vision IPC + preload (get-vision-models / set / refresh)" +``` + +--- + +## Task 7: VisionSection UI + AI 제공자 섹션 통합 + +**Files:** + +- Create: `src/renderer/inbox/components/settings/VisionSection.tsx` +- Modify: `src/renderer/inbox/components/settings/AiProviderSection.tsx` 또는 SettingsPage — 마운트 +- Create: `tests/unit/VisionSection.test.tsx` + +dropdown (cache 기반) + 다시 감지 버튼 + 마지막 감지 시각 표시. dropdown 변경 시 `setVisionModel` 호출. 다시 감지 → `refreshVisionCache` IPC + dropdown 갱신. + +```tsx +// 핵심 구조 (Cut E SyncSection 패턴) +const [models, setModels] = useState<string[]>([]); +const [at, setAt] = useState<string | null>(null); +const [selected, setSelected] = useState<string | null>(null); +const [busy, setBusy] = useState<'select' | 'refresh' | null>(null); + +useEffect(() => { + void (async () => { + const r = await inboxApi.getVisionModels(); + setModels(r.models); + setAt(r.at); + setSelected(r.selected); + })(); +}, []); + +async function onSelect(value: string) { + setBusy('select'); + await inboxApi.setVisionModel(value === '' ? null : value); + setSelected(value === '' ? 
null : value); + setBusy(null); +} + +async function onRefresh() { + setBusy('refresh'); + const r = await inboxApi.refreshVisionCache(); + setBusy(null); + if (r.ok) { + const cache = await inboxApi.getVisionModels(); + setModels(cache.models); + setAt(cache.at); + } +} +``` + +UI: + +```tsx + + +{at !== null && 마지막 감지: {new Date(at).toLocaleString('ko-KR')}} +``` + +- [ ] **Step 1-5: 컴포넌트 + test + 마운트 + commit** + +```bash +git add src/renderer/inbox/components/settings/VisionSection.tsx \ + src/renderer/inbox/components/settings/AiProviderSection.tsx \ + tests/unit/VisionSection.test.tsx +git commit -m "feat(v031): VisionSection — dropdown + 다시 감지 + 마지막 감지 시각" +``` + +--- + +## Task 8: main process — refreshVisionCache 자동 호출 + AiWorker settings 주입 + +**Files:** + +- Modify: `src/main/index.ts` + +`whenReady` 안 (Ollama provider 준비 후) `void refreshVisionCache(...)` fire-and-forget 호출. AiWorker 생성자에 settings + mediaStore 주입. + +- [ ] **Step 1: imports + 호출** — `src/main/index.ts`: + +```ts +import { refreshVisionCache } from './services/VisionDetect.js'; + +// whenReady 안, AiWorker.start() 직후 또는 직전 +const ollama = providerHolder.get(); +void refreshVisionCache({ + settings: settingsSvc, + endpoint: (ollama as LocalOllamaProvider).endpoint, // 또는 SettingsService 의 ollama 설정에서 가져옴 +}).catch(() => {}); +``` + +(LocalOllamaProvider 의 endpoint 가 private 이면 settings 에서 가져옴 또는 provider 에 getter 추가.) + +- [ ] **Step 2: AiWorker 생성자 인자 갱신** + +- [ ] **Step 3: typecheck + PASS + commit** + +```bash +npm run typecheck +npx vitest run +git add src/main/index.ts +git commit -m "feat(v031): main — refreshVisionCache whenReady + AiWorker settings/mediaStore 주입" +``` + +--- + +## Task 9: dogfood promoted + version bump + release commit + +- [ ] F24 promoted 마킹 (`docs/superpowers/specs/2026-04-25-dogfood-feedback.md`): + +```markdown +## F24. 멀티모달 vision (✅ promoted v0.3.1 Cut F) + +**상태:** ✅ promoted v0.3.1 Cut F — Ollama vision 모델 (gemma3 family default) 활용. 
capability detection (app launch + manual refresh) + Configure UI dropdown + AiWorker vision integration (5MB cap + base64 변환). 자동 fallback (caption → text) deferred v0.3.2+. +``` + +- [ ] package.json: 0.3.0 → 0.3.1 + package-lock.json +- [ ] full unit + typecheck + +```bash +git add docs/superpowers/specs/2026-04-25-dogfood-feedback.md package.json package-lock.json +git commit -m "chore(release): v0.3.1 — Cut F (멀티모달 vision AI)" +``` + +--- + +## Self-Review Checklist (수행자: 모든 task 완료 후 1회 점검) + +- [ ] **Spec coverage**: §3 Capability Detection (Task 1) / §3-2 SettingsService (Task 2) / §3-3 main wiring (Task 8) / §3-4 UI (Task 7) / §4 Provider (Tasks 3-4) / §5 AiWorker (Task 5) / §6 image-only fallback ('skipped' enum 미도입 → 기존 'failed' 분기 활용) +- [ ] **Single write path 강제 (Cut C/D/E 정책)**: 본 cut 은 새 데이터 path 추가 없음 — `notes_fts` / `note_revisions` / `note_tags` mutation 없음 (vision 결과는 기존 `updateAiResult` path 활용 → 이미 검증됨). 회귀 검사 4-path invariant 유지. +- [ ] **Type 일관성**: `GenerateInput.images` ↔ `GenerateOptions.visionModel` ↔ AiWorker 호출 ↔ LocalOllamaProvider body 모두 동일 shape +- [ ] **단위 카운트**: VisionDetect 9 (5+4) + SettingsService 4 + visionPrompt 2 + LocalOllamaProvider 3 + AiWorker 3 + IPC 3-5 + UI 1 = 약 25-27 신규. 목표 22 달성 + +--- + +## Risk + +- **vision 모델 한국어 정확도**: gemma3 family 가 한국어 약하면 다른 family 추천 갱신 (메모리 정책). dogfood 검증 필요 +- **Ollama 가 vision images 무시 (모델 misclassify)**: capability detection false-positive — 사용자가 dropdown 에서 다른 모델 선택해 우회. 자동 fallback 미구현 (YAGNI) +- **base64 메모리 폭주**: 5MB cap 적용. 다중 이미지 시 N×5MB = 메모리 누적 — vision 호출 후 image array 즉시 GC. 본 cut 의 dogfood 규모 (메모당 < 3 이미지) 무시 +- **capability detection 실패 silent**: 첫 launch 시 network 실패 → cache 빈 채로 진행. 사용자가 설정 페이지에서 "다시 감지" 클릭 → 직접 trigger 가능 +- **AiWorker 생성자 변경**: 기존 test 모두 mock 갱신 필요 (typecheck 가 catch). 누락 시 typecheck red +- **F23 OFF (ai_enabled=false) 시 자동 OFF**: refreshVisionCache 가 ai_enabled 체크 → ai_disabled 분기. 
AiWorker 의 vision path 진입 자체가 ai_enabled=true 가정 — F23 OFF 시 vision path 미도달 (자명) +- **e2e**: Cut C/D/E 와 동일 — 본 cut 미수행, main 머지 후 검증 diff --git a/docs/superpowers/specs/2026-05-09-v031-cut-f-design.md b/docs/superpowers/specs/2026-05-09-v031-cut-f-design.md index ae5767b..b924588 100644 --- a/docs/superpowers/specs/2026-05-09-v031-cut-f-design.md +++ b/docs/superpowers/specs/2026-05-09-v031-cut-f-design.md @@ -56,36 +56,59 @@ function isVisionCapable(model: { name: string; details?: { family?: string; fam } ``` -### 3-2. Settings storage +### 3-2. Settings storage (실제 SettingsService API) + +zod schema 확장 (기존 ai_enabled / sync_* 와 동일 strict 패턴): ```ts -interface SettingsSchema { - // ... 기존 - vision_model?: string; // 사용자 명시 모델 (빈 값 = 비활성) - vision_capable_cache?: string[]; // launch 시 detected 결과 cache - vision_cache_at?: string; // ISO timestamp -} +const SettingsSchema = z.object({ + // ... 기존 ollama / ai_enabled / onboarding_completed / sync_* + vision_model: z.string().nullable().optional(), + vision_capable_cache: z.array(z.string()).optional(), + vision_cache_at: z.string().optional() +}).strict(); +``` + +신규 SettingsService 메서드 (개별 setter/getter — `get/set` 일반화 X): + +```ts +async getVisionModel(): Promise<string | null>; +async setVisionModel(value: string | null): Promise<void>; +async getVisionCapableCache(): Promise<{ models: string[]; at: string | null }>; +async setVisionCapableCache(models: string[], now: Date): Promise<void>; ``` ### 3-3. 
AppLaunchDetect
 
-```ts
-// src/main/index.ts, inside whenReady (after settings initialization)
-async function refreshVisionCache(): Promise<void> {
-  if (!settingsService.get('ai_enabled', true)) return;
-  try {
-    const tags = await fetch(`${endpoint}/api/tags`).then(r => r.json());
-    const capable = tags.models.filter(isVisionCapable).map((m: any) => m.name);
-    settingsService.set('vision_capable_cache', capable);
-    settingsService.set('vision_cache_at', new Date().toISOString());
-  } catch {
-    // network failure: silent, keep the cache
-  }
-}
+New `src/main/services/VisionDetect.ts`: pure function + injected external fetch (testable):
 
-void refreshVisionCache();
+```ts
+export async function refreshVisionCache(deps: {
+  settings: SettingsService;
+  endpoint: string;
+  now?: () => Date;
+  fetchImpl?: typeof fetch;
+}): Promise<{ ok: true; models: string[] } | { ok: false; reason: string }> {
+  if (!(await deps.settings.isAiEnabled())) {
+    return { ok: false, reason: 'ai_disabled' };
+  }
+  const fetchFn = deps.fetchImpl ?? fetch;
+  let body: { models?: Array<{ name: string; details?: { family?: string; families?: string[] } }> };
+  try {
+    const r = await fetchFn(`${deps.endpoint}/api/tags`);
+    if (!r.ok) return { ok: false, reason: `tags http ${r.status}` };
+    body = await r.json();
+  } catch (e) {
+    return { ok: false, reason: `unreachable: ${(e as Error).message}` };
+  }
+  const capable = (body.models ?? []).filter(isVisionCapable).map((m) => m.name);
+  await deps.settings.setVisionCapableCache(capable, deps.now ? deps.now() : new Date());
+  return { ok: true, models: capable };
+}
 ```
 
+Called fire-and-forget inside the main process `whenReady`. Failures are silent (the cache is kept). The settings:refresh-vision-cache IPC calls the same function (manual "다시 감지" (re-detect) button).
+
 ### 3-4. Settings page UI (AI provider section extension)
 
 ```
@@ -166,44 +189,53 @@ ${text || '(이미지만 있음)'}
 
 ---
 
-## 5. AiWorker integration
+## 5. AiWorker integration (corrected to the actual API)
 
-If CaptureService attached an image at capture time → store it in notes.media + INSERT into pending_jobs.
When AiWorker processes the job:
+The existing `AiWorker.processJob` receives a `Note` hydrated via `repo.findById(noteId)`; `note.media` is already populated from the join, so a separate `listMediaByNote` call is unnecessary. The on-disk path comes from `MediaStore.absolutePath(relPath)`.
 
 ```ts
-// src/main/ai/AiWorker.ts
-async processJob(noteId: string): Promise<void> {
-  const note = this.repo.getById(noteId);
-  const media = this.repo.listMediaByNote(noteId);
-  const visionModel = this.settings.get('vision_model');
+// src/main/ai/AiWorker.ts processJob flow
+const note = this.repo.findById(job.noteId);
+if (!note || ...) return;
+const visionModel = await this.settings.getVisionModel();
 
-  let images: Array<{ base64: string; mime: string }> | undefined;
-  if (visionModel && media.length > 0) {
-    images = await Promise.all(media.map(async (m) => ({
-      base64: (await fs.readFile(this.mediaStore.absolutePath(m.relPath))).toString('base64'),
-      mime: m.mime
-    })));
-  }
-
-  const provider = this.providerHolder.get();
-  const response = await provider.generate({ text: note.rawText, images, ... }, { visionModel });
-  // ... apply the result as before
+let images: Array<{ base64: string; mime: string }> | undefined;
+if (visionModel && note.media.length > 0) {
+  images = await Promise.all(
+    note.media.map(async (m) => {
+      const buf = await readFile(this.mediaStore.absolutePath(m.relPath));
+      // 5MB cap per image (prevents base64 memory blowup)
+      if (buf.byteLength > 5 * 1024 * 1024) {
+        throw new Error(`image ${m.relPath} exceeds 5MB cap`);
+      }
+      return { base64: buf.toString('base64'), mime: m.mime };
+    })
+  );
 }
+
+const res = await this.holder.get().generate({
+  text: note.rawText,
+  images,
+  todayKst,
+  dueDateCandidates: candidates,
+  vocab
+}, { visionModel });
 ```
 
-The vision path runs only when both `media.length > 0 && visionModel` are true. Otherwise the existing text-only path.
+The vision path runs only when `visionModel && note.media.length > 0` holds. Otherwise the existing text-only path is kept (backward compatible). An image over the 5MB cap throws, which reuses AiWorker's existing attempts count + ai_status='failed' branch.
+
+AiWorker gains a `settings: SettingsService` dependency, added as a new parameter on the existing constructor.
 
 ---
 
-## 6. Image-only capture
+## 6. Image-only capture (corrected: no new enum introduced)
 
-Empty `raw_text` + media attachments only:
+The case of empty `raw_text` + media attachments only:
 
-- Existing behavior: notes INSERT (raw_text=''), AiWorker calls with an empty prompt → ai_status='failed' or a meaningless response
-- vision enabled: AiWorker sends vision prompt + images → meaningful title/summary/tags response
-- vision disabled (visionModel empty): store the note only, use the new ai_status='disabled' enum (similar in meaning to Cut B's ai_enabled false, but a partial disable, i.e. an "image-only, cannot process" state)
+- **vision enabled** (`visionModel` set + media present): AiWorker's vision path → meaningful title/summary/tags response
+- **vision disabled** (`visionModel` null): the existing text-only flow is unchanged: empty prompt → if the AI response is meaningless, the ai_status='failed' branch (retryable). Re-evaluate whether to introduce a 'skipped' enum after measuring frequency during dogfooding.
 
-Recommendation: for vision disabled + image-only capture, a new `ai_status='skipped'` enum (distinct from Cut B's 'disabled'). title fallback = "(이미지 N개)" ("N images") or the first image's filename.
+**No new 'skipped' enum (YAGNI)**: the m008 migration (CHECK relax via table recreate) is costly, and image-only capture is not this cut's main use case. The workaround of retrying after enabling vision, or reprocessing after adding raw_text, is sufficient. Policy review happens in a separate cut after dogfooding.
 
 ---
 
@@ -219,7 +251,7 @@ async processJob(noteId: string): Promise<void> {
 | `AiWorker.processJob` vision integration | base64 conversion only when media + visionModel are present |
 | Image-only capture | raw_text='' + media → normal vision result or 'skipped' branch |
 
-**Target**: unit tests 555 → about 575 (+20), typecheck 0.
+**Target**: unit tests 679 → about 701 (+22, isVisionCapable 5 + refreshVisionCache 4 + SettingsService vision 4 + LocalOllamaProvider vision path 3 + buildVisionPrompt 2 + AiWorker vision integration 3 + UI dropdown 1), typecheck 0.
 
 ---
 
@@ -231,7 +263,7 @@ async processJob(noteId: string): Promise<void> {
 | Image base64 memory load | Average < 1MB per media item. With multiple images, N×base64 = N times the memory. Apply a cap (max size 5MB per image) |
 | Fallback when capability detection fails | No cache → show the vision dropdown as empty + point to "다시 감지" (re-detect) |
 | Korean accuracy of vision models | Verify via dogfooding.
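If gemma3 is weak at Korean, update the recommendation to another family (update the memory policy) |
-| Ollama ignores the vision images field (model lacks multimodal support) | Automatic two-stage fallback: extract a caption with the vision model → synthesize with the text model (when capability is lacking) |
+| Ollama ignores the vision images field (model lacks multimodal support) | **Not implemented in this cut (YAGNI)**: automatic two-stage fallback (caption extraction → text-model synthesis) is to be reviewed for v0.3.2+. Prioritize capability detection accuracy during dogfooding |
 
 ---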