docs(plan): v0.3.1 Cut F - multimodal vision AI (spec corrections: 679 unit tests, individual SettingsService methods, no 'skipped' enum, fallback not implemented)

2026-05-10 04:38:45 +09:00
parent a54f134343
commit 7a56184ad2
2 changed files with 921 additions and 48 deletions
--- a/docs/superpowers/specs/2026-05-09-v031-cut-f-design.md
+++ b/docs/superpowers/specs/2026-05-09-v031-cut-f-design.md
@@ -56,36 +56,59 @@ function isVisionCapable(model: { name: string; details?: { family?: string; fam
 }
 ```

-### 3-2. Settings storage
+### 3-2. Settings storage (actual SettingsService API)
+
+Extend the zod schema (same strict pattern as the existing ai_enabled / sync_* fields):

 ```ts
-interface SettingsSchema {
-  // ... existing fields
-  vision_model?: string;       // user-specified model (empty = disabled)
-  vision_capable_cache?: string[];  // cache of the models detected at launch
-  vision_cache_at?: string;    // ISO timestamp
-}
+const SettingsSchema = z.object({
+  // ... existing ollama / ai_enabled / onboarding_completed / sync_*
+  vision_model: z.string().nullable().optional(),
+  vision_capable_cache: z.array(z.string()).optional(),
+  vision_cache_at: z.string().optional()
+}).strict();
+```
+
+New SettingsService methods (individual setters/getters, not a generalized `get/set`):
+
+```ts
+async getVisionModel(): Promise<string | null>;
+async setVisionModel(value: string | null): Promise<void>;
+async getVisionCapableCache(): Promise<{ models: string[]; at: string | null }>;
+async setVisionCapableCache(models: string[], now: Date): Promise<void>;
 ```

 ### 3-3. AppLaunchDetect

-```ts
-// src/main/index.ts, inside whenReady (after settings initialization)
-async function refreshVisionCache(): Promise<void> {
-  if (!settingsService.get('ai_enabled', true)) return;
-  try {
-    const tags = await fetch(`${endpoint}/api/tags`).then(r => r.json());
-    const capable = tags.models.filter(isVisionCapable).map((m: any) => m.name);
-    settingsService.set('vision_capable_cache', capable);
-    settingsService.set('vision_cache_at', new Date().toISOString());
-  } catch {
-    // network failure: stay silent and keep the existing cache
-  }
-}
+New file `src/main/services/VisionDetect.ts`: a pure function with an injectable external fetch (testable):

-void refreshVisionCache();
+```ts
+export async function refreshVisionCache(deps: {
+  settings: SettingsService;
+  endpoint: string;
+  now?: () => Date;
+  fetchImpl?: typeof fetch;
+}): Promise<{ ok: true; models: string[] } | { ok: false; reason: string }> {
+  if (!(await deps.settings.isAiEnabled())) {
+    return { ok: false, reason: 'ai_disabled' };
+  }
+  const fetchFn = deps.fetchImpl ?? fetch;
+  let body: { models?: Array<{ name: string; details?: { family?: string; families?: string[] } }> };
+  try {
+    const r = await fetchFn(`${deps.endpoint}/api/tags`);
+    if (!r.ok) return { ok: false, reason: `tags http ${r.status}` };
+    body = await r.json();
+  } catch (e) {
+    return { ok: false, reason: `unreachable: ${(e as Error).message}` };
+  }
+  const capable = (body.models ?? []).filter(isVisionCapable).map((m) => m.name);
+  await deps.settings.setVisionCapableCache(capable, deps.now ? deps.now() : new Date());
+  return { ok: true, models: capable };
+}
 ```

+Called fire-and-forget inside the main process `whenReady`. Failures are silent (the cache is kept). The `settings:refresh-vision-cache` IPC calls the same function (manual "다시 감지" ("Detect again") button).
+
 ### 3-4. Settings page UI (extending the AI provider section)

 ```
@@ -166,44 +189,53 @@ ${text || '(이미지만 있음)'}

 ---

-## 5. AiWorker integration
+## 5. AiWorker integration (corrected to the actual API)

-If CaptureService attaches an image at capture time, it is saved to notes.media and a pending_jobs row is INSERTed. When AiWorker processes the job:
+The existing `AiWorker.processJob` receives a `Note` hydrated via `repo.findById(noteId)`. `note.media` is already populated from the join, so no separate `listMediaByNote` call is needed. The on-disk path is obtained with `MediaStore.absolutePath(relPath)`.

 ```ts
-// src/main/ai/AiWorker.ts
-async processJob(noteId: string): Promise<void> {
-  const note = this.repo.getById(noteId);
-  const media = this.repo.listMediaByNote(noteId);
-  const visionModel = this.settings.get('vision_model');
+// src/main/ai/AiWorker.ts processJob flow
+const note = this.repo.findById(job.noteId);
+if (!note || ...) return;
+const visionModel = await this.settings.getVisionModel();

-  let images: Array<{ base64: string; mime: string }> | undefined;
-  if (visionModel && media.length > 0) {
-    images = await Promise.all(media.map(async (m) => ({
-      base64: (await fs.readFile(this.mediaStore.absolutePath(m.relPath))).toString('base64'),
-      mime: m.mime
-    })));
-  }
-
-  const provider = this.providerHolder.get();
-  const response = await provider.generate({ text: note.rawText, images, ... }, { visionModel });
-  // ... apply the result as before
+let images: Array<{ base64: string; mime: string }> | undefined;
+if (visionModel && note.media.length > 0) {
+  images = await Promise.all(
+    note.media.map(async (m) => {
+      const buf = await readFile(this.mediaStore.absolutePath(m.relPath));
+      // 5MB cap per image (prevents base64 memory blow-up)
+      if (buf.byteLength > 5 * 1024 * 1024) {
+        throw new Error(`image ${m.relPath} exceeds 5MB cap`);
+      }
+      return { base64: buf.toString('base64'), mime: m.mime };
+    })
+  );
 }
+
+const res = await this.holder.get().generate({
+  text: note.rawText,
+  images,
+  todayKst,
+  dueDateCandidates: candidates,
+  vocab
+}, { visionModel });
 ```

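-The vision path runs only when `media.length > 0 && visionModel` are both true. Otherwise the existing text-only path is used.
+The vision path runs only when both `visionModel` and `note.media.length > 0` are truthy. Otherwise the existing text-only path is kept (preserving compatibility). An image over the 5MB cap throws, which reuses the existing AiWorker attempts count and the ai_status='failed' branch.
+
+AiWorker gains a `settings: SettingsService` dependency, passed as a new parameter to the existing constructor.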

 ---

-## 6. Image-only capture
+## 6. Image-only capture (correction: no new enum)

-Empty `raw_text` with media attachments only:
+The case of an empty `raw_text` with media attachments only:

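- Existing behavior: notes INSERT (raw_text=''), AiWorker calls with an empty prompt → ai_status='failed' or a meaningless response
- vision enabled: AiWorker sends the vision prompt + images → a meaningful title/summary/tags response
- vision disabled (visionModel empty): the note is only saved, using a new ai_status='disabled' enum value (similar in meaning to ai_enabled false in Cut B, but a partial disable, i.e. an "image-only, cannot be processed" state)
+- **vision enabled** (`visionModel` set + media present): AiWorker's vision path → a meaningful title/summary/tags response
+- **vision disabled** (`visionModel` null): the existing text-only flow is kept as-is. The prompt is empty, so if the AI response is meaningless the job takes the ai_status='failed' branch (retryable). Whether to introduce a 'skipped' enum will be re-evaluated after measuring how often this happens during dogfooding.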

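-Recommendation: for vision disabled + image-only capture, use a new `ai_status='skipped'` enum value (distinct from Cut B's 'disabled'). Title fallback = "(이미지 N개)" (i.e. "(N images)") or the first image's file name.
+**No new 'skipped' enum (YAGNI)**: it would require an m008 migration (relaxing the CHECK via table recreate), and image-only capture is not the main use case of this cut. The workaround is sufficient: the user enables vision and retries, or adds raw_text and reprocesses. The policy will be reviewed in a separate cut after dogfooding.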

 ---

@@ -219,7 +251,7 @@ async processJob(noteId: string): Promise<void> {
 | `AiWorker.processJob` vision integration | base64 conversion only when both media and visionModel are present |
 | Image-only capture | raw_text='' + media → normal vision result or the 'skipped' branch |

-**Target**: unit tests 555 → about 575 (+20), typecheck 0.
+**Target**: unit tests 679 → about 701 (+22: isVisionCapable 5 + refreshVisionCache 4 + SettingsService vision 4 + LocalOllamaProvider vision path 3 + buildVisionPrompt 2 + AiWorker vision integration 3 + UI dropdown 1), typecheck 0.

 ---

@@ -231,7 +263,7 @@ async processJob(noteId: string): Promise<void> {
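 | Memory cost of base64 images | Each media item averages < 1MB. With multiple images, N×base64 = N times the memory. Apply a cap (max 5MB per image) |
 | Fallback when capability detection fails | No cache → show the vision dropdown as empty + point the user to "다시 감지" ("Detect again") |
 | Korean accuracy of vision models | Verify by dogfooding. If gemma3 is weak in Korean, update the recommendation to another family (and update the memory policy) |
-| Ollama ignores the vision images field (model lacks multimodal support) | Automatic two-stage fallback: extract a caption with the vision model → synthesize with the text model (when capability is insufficient) |
+| Ollama ignores the vision images field (model lacks multimodal support) | **Not implemented in this cut (YAGNI)**: the automatic two-stage fallback (caption extraction → text model synthesis) will be considered for v0.3.2+. Capability detection accuracy takes priority during dogfooding |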

 ---