Technical Decisions - aleksandar.app

# Technical Decisions Record

> Vibe Reader — Architecture Decision Records (ADRs)
> Documents the "why" behind significant technical choices
> Last Updated: 2026-02-09

---

## Format

Each decision follows this structure:

- **Context:** The situation and constraints
- **Options Considered:** Alternatives evaluated
- **Decision:** What we chose and why
- **Consequences:** Trade-offs and implications
- **Status:** `Accepted` | `Superseded` | `Proposed`

---

## TDR-001: Relational Database Schema

**Date:** 2026-01-10
**Status:** Accepted
**Related:** [[Feature Backlog#Completed]]

### Context

The initial MVP used a flat logging structure where Words and Quotes were stored independently. As the product evolved, we needed to:

1. Associate captures with specific reading sessions (time-based grouping)
2. Associate captures with books (entity-based grouping)
3. Support queries like "show all captures from The Great Gatsby" regardless of session

### Options Considered

| Option | Description | Pros | Cons |
|--------|-------------|------|------|
| **A. Flat Tables** | Words and Quotes as independent tables with `session_id` | Simple | Can't query by book without joins; no book-level aggregation |
| **B. Nested JSON** | Store captures as JSON blob per session | Flexible schema | Poor query performance; no relational integrity |
| **C. Relational Hierarchy** | Book → Session → Captures with foreign keys | Normalized; flexible queries; data integrity | More complex schema; requires migrations |

### Decision

**Option C: Relational Hierarchy**

Schema structure:

```
Book (book_id, title, created_at)
└── Session (session_id, book_id, display_name, start_time, end_time, status)
    ├── Word (word_id, book_id, session_id, term, definition, timestamp)
    └── Quote (quote_id, book_id, session_id, content, timestamp)
```

Key insight: Captures are **dual-tagged** to both Session (when) and Book (what).

This enables:

- "Show captures from this session" (time-scoped)
- "Show all captures from this book" (entity-scoped)

### Consequences

- ✅ Supports Book Layer view in Archive
- ✅ Enables future features like "book-level stats"
- ✅ Foreign key constraints prevent orphaned data
- ⚠️ Requires `CASCADE` delete rules (deleting a Session deletes its captures)
- ⚠️ Schema changes require a migration strategy (using `fallbackToDestructiveMigration` for now, which wipes local data whenever the schema version changes without a matching migration)

---

## TDR-002: Lock Screen Notification Strategy

**Date:** 2026-01-18
**Status:** Superseded by TDR-008 (2026-02-07)
**Related:** [[Bug Tracker#BUG-R008]], [[Feature_Backlog#Smart Capture]]

### Context

The core product hypothesis requires capturing words/quotes from the lock screen with minimal friction. Android provides several notification styles with different lock screen behaviors.

### Options Considered

| Option | Description | Pros | Cons |
|--------|-------------|------|------|
| **A. Standard Notification** | Basic notification with action buttons | Reliable; always renders | Small buttons; not prominent on lock screen |
| **B. MediaStyle Notification** | Mimics music player with large controls | Prominent lock screen placement; single-tap capture | Android 13+ aggressively hides "fake" media players; requires metadata |
| **C. Full-Screen Intent** | Like incoming calls; takes over screen | Guaranteed visibility | Too intrusive; user must dismiss |
| **D. Quick Settings Tile** | Custom tile in notification shade | Always accessible | Requires swipe down; 2 taps minimum |
| **E. Accessibility Service** | Intercept gestures/buttons | Zero-tap possible | Play Store policy concerns; user trust issues |

### Decision

**Option B (MediaStyle) as primary, with Option A as fallback**

Implementation approach ("Trojan Horse"):

1. Create a `MediaSessionCompat` to register as a media app
2. Set metadata (book title as "artist", "Tap ▶ to Capture" as title, app icon as album art)
3. Set `PlaybackState.STATE_PAUSED` so the play button (▶) shows
4. Play button action → launches `SpeechCaptureActivity`
5. Fallback: If the activity launch fails, show a high-priority notification with `fullScreenIntent`

Key learnings:

- Android 13+ requires sufficient metadata or it hides the player
- The `FOREGROUND_SERVICE_MEDIA_PLAYBACK` permission is required
- Launching the activity from the background (the full-screen intent fallback) requires the `USE_FULL_SCREEN_INTENT` permission
- The PendingIntent must target the Activity directly (not a Service → Activity trampoline)

### Consequences

- ✅ Prominent lock screen placement when it works
- ✅ Single-tap to capture (lowest C2C achievable)
- ⚠️ Inconsistent behavior on some devices/Android versions (see [[Bug Tracker#BUG-001]])
- ⚠️ Feels like a "hack" — may break with future Android updates
- 📋 TODO: Consider Option D (Quick Settings Tile) as user-selectable alternative

---

## TDR-003: Dictionary API Strategy

**Date:** 2026-01-18
**Status:** Accepted
**Related:** [[Feature_Backlog#Multi-Word Concept Support]]

### Context

Users want to define:

1. **Single words** (e.g., "hegemony") — standard dictionary lookup
2. **Multi-word phrases** (e.g., "carpe diem", "ad hoc") — may exist in dictionary
3. **Concepts/entities** (e.g., "United States", "Pythagorean theorem") — not in dictionary

The current implementation uses the Free Dictionary API, which handles (1) and some of (2) but fails on (3).

### Options Considered

| Option | Description | Pros | Cons |
|--------|-------------|------|------|
| **A. Dictionary API only** | Free Dictionary API for all lookups | Simple; fast | Fails on concepts and many phrases |
| **B. Dictionary → Wikipedia fallback** | Try dictionary first; if 404, try Wikipedia API | Covers most cases | Two API calls for concepts; latency |
| **C. Wikipedia only** | Use Wikipedia for everything | Comprehensive | Overkill for simple words; slower |
| **D. AI-powered definition** | Send to LLM for definition | Handles anything | Latency; cost; requires API key |
| **E. User choice** | Show "Define" vs "Look up" buttons | User controls intent | Adds friction; more UI complexity |

### Decision

**Option B: Dictionary → Wikipedia fallback**

### Implementation (2026-01-18)

New files created:

- `WikipediaApiService.kt` — Retrofit interface for Wikipedia REST API
- `WikipediaClient.kt` — Singleton client for `https://en.wikipedia.org/api/rest_v1/`

Modified `SpeechCaptureActivity.kt`:

- Added "Define This Instead" TextButton (only visible for detected quotes)
- Updated `defineWord()` with fallback chain:
  1. Try Free Dictionary API
  2. If that fails → Try Wikipedia API `/page/summary/{title}`
  3. If 404 → Show "Definition not found"
- Wikipedia results prefixed with `(Wikipedia)` for source clarity

### Consequences

- ✅ Covers words, phrases, and concepts
- ✅ No additional user friction for common case (single words)
- ✅ User can explicitly choose "Define This Instead" for phrases
- ⚠️ Wikipedia summaries may be longer than dictionary definitions (truncated at 200 chars)
- ⚠️ Need to handle Wikipedia disambiguation pages gracefully
- 📋 Future: Consider caching frequent lookups locally

---

## TDR-004: Dependency Injection Pattern

**Date:** 2026-01-10
**Status:** Accepted
**Related:** Product Diary 2025-11-12

### Context

The initial implementation used Hilt for dependency injection. Build failures caused by KSP/Kotlin version mismatches consumed significant development time.

### Options Considered

| Option | Description | Pros | Cons |
|--------|-------------|------|------|
| **A. Hilt** | Google's recommended DI framework | Powerful; scalable; Android-aware | Complex setup; version sensitivity; slower builds |
| **B. Koin** | Lightweight Kotlin DI | Simpler than Hilt; less boilerplate | Still a dependency; resolves dependencies at runtime, so wiring errors surface as crashes rather than build errors |
| **C. Manual DI / Singleton** | Hand-rolled dependency management | Zero dependencies; full control; fast builds | More boilerplate; must manage lifecycle manually |

### Decision

**Option C: Manual DI / Singleton Pattern**

Implementation:

- `AppDatabase.getDatabase(context)` — Singleton Room instance
- `SessionViewModelFactory` — Manual factory for ViewModel injection
- Direct instantiation in `MainActivity` and `ReadingSessionService`

### Consequences

- ✅ Build stability restored immediately
- ✅ Simpler codebase; easier to understand
- ✅ Faster build times
- ⚠️ Must manually ensure singleton lifecycle
- ⚠️ May need to revisit if app grows significantly
- 📝 Lesson: For MVPs, prefer simplicity over "best practices"

---

## TDR-007: Weekly Vibe Redesign — Structured Output & Safeguards

**Date:** 2026-02-05
**Status:** Accepted
**Related:** [[Feature Backlog#Weekly Vibe Redesign]]

### Context

The initial Weekly Vibe implementation (no TDR was recorded for it) used prose-based output from Gemini:

- **Problem 1:** Verbose 3-paragraph prose was hard to scan and didn't feel like a "vibe"
- **Problem 2:** High token usage (~1024 output tokens, ~200+ input tokens for definitions)
- **Problem 3:** No safeguard against redundant API calls (user could spam "Generate" with no new data)

Goal: Create a Spotify Wrapped-style experience that's visually engaging, token-efficient, and prevents wasteful API calls.

### Options Considered

**Output Format:**

| Option | Description | Pros | Cons |
|--------|-------------|------|------|
| **A. Prose** | Free-form paragraphs (current) | Natural language; flexible | Hard to style; inconsistent length |
| **B. Structured JSON** | Request specific fields in JSON format | Predictable UI; constrained output | Requires parsing; model may hallucinate structure |
| **C. Markdown with headers** | Semi-structured with `##` sections | Easy to parse | Still variable length; harder to style |

**Safeguard Strategy:**

| Option | Description | Pros | Cons |
|--------|-------------|------|------|
| **A. Cooldown timer** | Disable button for X minutes after generation | Simple | Arbitrary; blocks legitimate re-generation |
| **B. Data fingerprint** | Hash captures; only regenerate if hash changes | Precise | Complex; hash computation overhead |
| **C. Session count tracking** | Track session count at generation time; enable if new sessions | Simple; meaningful | Doesn't catch new captures within same session |

### Decision

**Output: Option B (Structured JSON)**

New prompt strategy:

- Input: Top 10 word terms (no definitions), top 5 quotes (truncated to 50 chars)
- Output: Strict JSON schema with constrained fields
- Config: `maxOutputTokens: 256` (down from 1024), `temperature: 0.5` (down from 0.7)

JSON schema requested:

```json
{
  "vibe_title": "2-3 word theme title",
  "vibe_emoji": "single emoji",
  "theme_tags": ["tag1", "tag2", "tag3"],
  "insights": ["insight1 (max 12 words)", "insight2"],
  "word_spotlight": "most interesting word",
  "quote_spotlight": "evocative quote (max 60 chars)"
}
```

**Safeguard: Option C (Session count tracking)**

Implementation:

- `lastVibeSessionCount: Int` — stores session count at last generation
- `lastVibeTimestamp: Long` — stores generation timestamp
- `canGenerateVibe: StateFlow<Boolean>` — derived state for UI

Button states:

- `CAN_GENERATE` → Primary button, enabled
- `GENERATING` → Disabled with spinner
- `UP_TO_DATE` → Muted outline with checkmark ("Vibe is current")
- `NO_DATA` → Disabled with lock icon ("Start reading to unlock")

### Consequences

- ✅ ~60-70% token reduction per API call
- ✅ Consistent, predictable UI layout
- ✅ Prevents accidental API spam
- ✅ Visual feedback on button state communicates system status
- ⚠️ JSON parsing requires a try-catch fallback (model may output malformed JSON)
- ⚠️ Session count safeguard doesn't detect new captures within existing sessions (acceptable trade-off)
- 📋 Future: Could add "Force Regenerate" option for power users

### Files Modified

- `WeeklyVibe.kt` — New structured data model
- `GeminiClient.kt` — JSON prompt + parsing + VibeResponse data class
- `SessionViewModel.kt` — Safeguard state tracking + canGenerateVibe flow
- `SessionComponents.kt` — Wrapped-style card UI + button states
- `ReviewScreen.kt` — Wire canGenerateVibe to LibraryView

---

## TDR-008: Lock Screen Trigger Architecture (The Platform Constraint)

**Date:** 2026-02-06
**Status:** Accepted (2026-02-07)
**Related:** [[Bug Tracker#BUG-R008]], [[Feature Backlog#Quick Settings Tile]]

### Context

The core product hypothesis requires capturing words/quotes from the lock screen with minimal friction. The current "Trojan Horse" MediaSession approach (TDR-002) has proven unreliable due to Android platform restrictions that are actively tightening.

**The Recurring Problem:**

1. Android 12 deprecated and restricted `ACTION_CLOSE_SYSTEM_DIALOGS` (the app can no longer force the notification shade to collapse)
2. Background Activity Launch restrictions, in place since Android 10, keep getting stricter with each release
3. Google continues to crack down on "fake media players" that abuse MediaSession

The MediaSession approach may work today and break tomorrow. This is not a sustainable foundation for the product's core interaction pattern.

### North Star Alignment

The product mantra is "Keep your vibe, no interruptions." The key metric is **C2C (Clicks-to-Capture)**:

| Trigger Method | C2C | Flow |
|----------------|-----|------|
| MediaSession (ideal) | 1 | Lock screen → Tap ▶ → Listening |
| MediaSession (actual) | 1-∞ | Lock screen → Tap ▶ → Maybe works? |
| Quick Settings Tile | 2 | Pull shade → Tap tile → Listening |
| Bubble (unlocked) | 1 | Tap bubble → Listening |
| Full-screen Intent | 1 | Lock screen → Tap notification → Full takeover |

### Options Analysis

```
                   LOCK SCREEN TRIGGER ARCHITECTURE
                   ════════════════════════════════

Current State                          The Fork in the Road
─────────────                          ────────────────────

┌─────────────────┐                    ┌─────────────────────────────┐
│  MediaSession   │                    │ OPTION A: Full-Screen       │
│  "Trojan Horse" │──────┬────────────▶│ Intent (Alarm-Style)        │
│                 │      │             │ C2C: 1 | Lock: ✓            │
│  ⚠️ Unreliable  │      │             │ Maintenance: Low            │
│  ⚠️ Shade stays │      │             │ UX: Intrusive takeover      │
│  ⚠️ Google      │      │             └─────────────────────────────┘
│  locking down   │      │
└─────────────────┘      │             ┌─────────────────────────────┐
         │               ├────────────▶│ OPTION B: Quick Settings    │
         │               │             │ Tile                        │
         ▼               │             │ C2C: 2 | Lock: ✓            │
┌─────────────────┐      │             │ Maintenance: Very Low       │
│  WE ARE HERE    │      │             │ UX: Reliable, standard      │
│  BUG-001        │      │             └─────────────────────────────┘
│  Critical       │      │
└─────────────────┘      │             ┌─────────────────────────────┐
                         ├────────────▶│ OPTION C: Bubble API        │
                         │             │ (Android 11+)               │
                         │             │ C2C: 1 | Lock: ✗            │
                         │             │ Maintenance: Medium         │
                         │             │ UX: Floating, persistent    │
                         │             └─────────────────────────────┘
                         │
                         │             ┌─────────────────────────────┐
                         ├────────────▶│ OPTION D: Hybrid            │
                         │             │ MediaSession + QS Fallback  │
                         │             │ C2C: 1-2 | Lock: ✓          │
                         │             │ Maintenance: High           │
                         │             │ UX: Best when works         │
                         │             └─────────────────────────────┘
                         │
                         │             ┌─────────────────────────────┐
                         └────────────▶│ OPTION E: Accessibility     │
                                       │ Service                     │
                                       │ C2C: 0 | Lock: ✓            │
                                       │ Maintenance: Low            │
                                       │ UX: Scary permissions       │
                                       │ ⚠️ Play Store risk          │
                                       └─────────────────────────────┘
```

### Detailed Option Breakdown

#### Option A: Full-Screen Intent (Alarm-Style)

**How it works:** Use `Notification.Builder.setFullScreenIntent(pendingIntent, true)` with a high-priority notification channel. This mimics incoming call/alarm behavior.
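
Option A only pays off when the platform actually lets the notification launch its full-screen activity. On Android 14 (API 34) and later, Google Play grants `USE_FULL_SCREEN_INTENT` by default only to calling and alarm apps, and `NotificationManager.canUseFullScreenIntent()` reports whether the app may currently use it. The sketch below assumes the app reads that check, plus a persisted "tile added" flag (for example one set in `TileService.onTileAdded()`), and falls back to the Quick Settings tile from Option B. `CaptureTrigger`, `TriggerEnvironment`, and `chooseTrigger` are hypothetical names, not part of the codebase:

```kotlin
// Hypothetical model of the lock screen trigger choice (TDR-008); names are illustrative only.
enum class CaptureTrigger(val clicksToCapture: Int?) {
    FULL_SCREEN_INTENT(1),  // Option A: alarm-style notification, one tap
    QUICK_SETTINGS_TILE(2), // Option B: pull the shade, then tap the tile
    NONE(null)              // nothing usable yet: show the "Add to Quick Settings" onboarding prompt
}

// Platform facts passed in as plain values so the decision logic runs without a device.
// In an app these might come from NotificationManager.canUseFullScreenIntent() (API 34+)
// and from a flag persisted in TileService.onTileAdded().
data class TriggerEnvironment(
    val canUseFullScreenIntent: Boolean,
    val tileAdded: Boolean,
)

// Prefer the lowest-C2C trigger that the platform currently allows.
fun chooseTrigger(env: TriggerEnvironment): CaptureTrigger = when {
    env.canUseFullScreenIntent -> CaptureTrigger.FULL_SCREEN_INTENT
    env.tileAdded -> CaptureTrigger.QUICK_SETTINGS_TILE
    else -> CaptureTrigger.NONE
}
```

Keeping the choice in a pure function leaves the Android-specific checks at the edges and makes the fallback order (C2C 1 before C2C 2) testable without a device.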

| Attribute | Assessment |
|-----------|------------|
| Lock Screen | ✓ Works reliably, system-guaranteed |
| Shade Collapse | ✓ Auto-dismisses notification shade |
| C2C | 1 (tap notification) |
| Maintenance | Low (stable API, used by system apps) |
| Scalability | High (no ongoing platform battles) |
| UX Trade-off | Feels intrusive; full screen takeover |
| Permissions | Requires `USE_FULL_SCREEN_INTENT` (already have) |

**Verdict:** Most technically sound. The UX concern is the takeover feel, but this is how alarm clocks and phone apps work. Users understand the pattern.

#### Option B: Quick Settings Tile

**How it works:** Register a `TileService` that appears in the quick settings panel of the notification shade. The user taps the tile to trigger capture.

| Attribute | Assessment |
|-----------|------------|
| Lock Screen | ✓ Accessible from lock screen shade |
| Shade Collapse | ✓ Tile tap launches activity, shade collapses |
| C2C | 2 (pull shade + tap tile) |
| Maintenance | Very Low (stable since Android 7.0) |
| Scalability | High (no platform restrictions) |
| UX Trade-off | Extra step; user must add tile manually |
| Permissions | No special permissions required |

**Verdict:** Most reliable fallback. The extra tap is a real cost, but it *always works*. Could be positioned as "power user mode" or as the default for users who experience MediaSession issues.

#### Option C: Bubble API

**How it works:** Use `Notification.BubbleMetadata` to create a floating overlay that persists across apps.

| Attribute | Assessment |
|-----------|------------|
| Lock Screen | ✗ Bubbles collapse when device locks |
| Shade Collapse | N/A (not in shade) |
| C2C | 1 (when visible) |
| Maintenance | Medium (API evolving) |
| Scalability | Medium (Android 11+ only) |
| UX Trade-off | Persistent floating icon may feel intrusive |
| Permissions | No special permissions required |

**Verdict:** Does NOT solve the lock screen problem. Useful only for quick capture while actively using the phone (between reading sessions).
Not a primary solution.

#### Option D: Hybrid (MediaSession + Quick Settings Fallback)

**How it works:** Keep MediaSession as the primary trigger, with the Quick Settings Tile as a documented fallback. Guide users to the tile if they experience issues.

| Attribute | Assessment |
|-----------|------------|
| Lock Screen | ⚠️ Partial (depends on device state) |
| C2C | 1-2 (varies) |
| Maintenance | High (two systems to maintain) |
| Scalability | Low (MediaSession may break further) |
| UX Trade-off | Inconsistent experience |

**Verdict:** Pragmatic short-term but not sustainable. Technical debt accumulates as Google continues tightening restrictions.

#### Option E: Accessibility Service

**How it works:** Register as an accessibility service, which gains the ability to launch activities from any state without restrictions.

| Attribute | Assessment |
|-----------|------------|
| Lock Screen | ✓ Works anywhere, no restrictions |
| C2C | 0 (could intercept gestures) |
| Maintenance | Low (stable API) |
| Scalability | High (no platform battles) |
| UX Trade-off | Scary permission dialog; user trust issue |
| Play Store | ⚠️ Policy risk; must justify accessibility use |

**Verdict:** Nuclear option. Maximum capability, but real trust and policy risks. Consider it only if all else fails and user research shows users are willing to grant the permission.

### Recommendation

**Short-term (Next Sprint):**

1. Implement **Full-Screen Intent** with `highPriority=true` as the primary fix for BUG-001
2. Test shade collapse behavior and lock screen launch reliability
3. If the intrusive feel is problematic, add a brief countdown before auto-launch

**Medium-term (V1 Release):**

1. Implement **Quick Settings Tile** as a first-class alternative
2. Add onboarding prompt: "Add Vibe Reader to Quick Settings for reliable capture"
3. Deprecate the MediaSession play button in favor of the tile

**Long-term (Post-V1):**

1. Monitor Google's platform direction
2. If MediaSession continues degrading, remove it entirely
3. Consider Accessibility Service only if user research supports it

### Consequences

- ✅ Unblocks critical lock screen reliability issue
- ✅ Reduces maintenance burden (stop fighting Android)
- ⚠️ C2C increases from 1 to 2 for Quick Settings path
- ⚠️ Full-screen intent may feel jarring to some users
- 📋 TODO: User test both approaches to measure perceived friction

---

## TDR-009: Asymmetric Verification (Words vs. Quotes)

**Date:** 2026-02-09
**Status:** Accepted
**Related:** [[Feature Backlog#Skip Confirmation]], [[Feature Backlog#Auto-Dismiss Definition]], [[User Feedback#Session-001]]

### Context

Smart Capture mode required verification (VERIFYING state) for all inputs, whether 1-word definitions or multi-word quotes. User feedback (Session-001) flagged this as unnecessary friction for words.

The question: should verification be skipped for all modes, or only for words?

### Decision

**Asymmetric verification: skip for words, keep for quotes.**

Words (utterances of up to 2 words) skip the VERIFYING screen and go directly to `defineWord()`. The definition screen auto-dismisses after 5 seconds, with a "Keep Open" button to opt out. Quotes (3+ words) retain the full verification flow with TTS playback.

### Rationale

The error consequences are asymmetric:

- **Wrong word definition:** User sees it immediately and the cost is low (glance, recognize the error, retry next time). A wrong definition doesn't persist as useful data.
- **Wrong quote transcription:** User may not notice until later. A misheard sentence saved to the Library looks like a real capture. TTS playback catches these errors in-flow.

Since error visibility and cost differ, the friction investment should differ too. Spending 3 taps to verify "hegemony" wastes time. Spending 3 taps to verify a 15-word quote prevents bad data.
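
The routing rule recorded under Implementation (`wordCount <= 2` in the SMART_CAPTURE branch) can be sketched as a pure Kotlin function. `CaptureRoute`, `countWords`, and `routeCapture` are hypothetical names, and counting words by splitting on whitespace is an assumption about how the app tokenizes recognizer output:

```kotlin
// Hypothetical sketch of TDR-009's asymmetric routing; names are illustrative only.
enum class CaptureRoute {
    DEFINE_DIRECTLY, // 1-2 words: skip VERIFYING and go straight to the definition lookup
    VERIFY_QUOTE     // 3+ words: keep the VERIFYING step with TTS playback
}

// Utterances up to this length count as "words" (mirrors wordCount <= 2).
val MAX_WORDS_FOR_DIRECT_DEFINE = 2

val WHITESPACE = Regex("\\s+")

// Counts whitespace-separated tokens; blank input yields 0.
fun countWords(text: String): Int =
    text.trim().split(WHITESPACE).count { it.isNotEmpty() }

fun routeCapture(text: String): CaptureRoute {
    // Assumes empty recognizer results are discarded before routing.
    require(text.isNotBlank()) { "Blank speech results should not reach routing" }
    return if (countWords(text) <= MAX_WORDS_FOR_DIRECT_DEFINE) {
        CaptureRoute.DEFINE_DIRECTLY
    } else {
        CaptureRoute.VERIFY_QUOTE
    }
}
```

With this rule, two-word concepts such as "carpe diem" or "United States" take the direct path and reach the TDR-003 dictionary → Wikipedia lookup without a verification step.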

### Implementation

- `handleSpeechResult()`: the SMART_CAPTURE branch calls `defineWord(text)` directly when `wordCount <= 2`
- DEFINING composable: `LaunchedEffect` countdown (5s) with `autoDismissActive` state
- `startListening()` resets `autoDismissActive` to `true` on each new capture cycle

### Consequences

- ✅ Word definition C2C drops from 4 interactions to 0 taps
- ✅ Quote verification unchanged (TTS playback still catches transcription errors)
- ✅ "Keep Open" button preserves user control for edge cases
- ⚠️ If speech recognition mishears a word, the user sees the wrong definition briefly before auto-dismiss. Acceptable because the word is still saved and can be looked up again from the Library.

---

## Pending Decisions

### TDR-005: Offline Support Strategy

**Status:** Not yet decided
**Context:** The dictionary API requires a network connection. What happens when the user is offline?

Options under consideration:

- A. Fail gracefully with "No connection" message
- B. Queue words for later definition (store as "pending")
- C. Bundle offline dictionary (large app size)
- D. Cache previous lookups for repeat words

### TDR-006: iOS Port Approach

**Status:** Not yet decided
**Context:** When/if we port to iOS, the lock screen strategy will differ significantly.

Options under consideration:

- A. Live Activities (iOS 16+)
- B. Widget with Siri Shortcuts
- C. Apple Watch companion app
- D. Wait for iOS to offer better lock screen APIs