# Technical Decisions Record
> Vibe Reader — Architecture Decision Records (ADRs)
> Documents the "why" behind significant technical choices
> Last Updated: 2026-02-09
---
## Format
Each decision follows this structure:
- **Context:** The situation and constraints
- **Options Considered:** Alternatives evaluated
- **Decision:** What we chose and why
- **Consequences:** Trade-offs and implications
- **Status:** `Accepted` | `Superseded` | `Proposed`
---
## TDR-001: Relational Database Schema
**Date:** 2026-01-10
**Status:** Accepted
**Related:** [[Feature Backlog#Completed]]
### Context
The initial MVP used a flat logging structure where Words and Quotes were stored independently. As the product evolved, we needed to:
1. Associate captures with specific reading sessions (time-based grouping)
2. Associate captures with books (entity-based grouping)
3. Support queries like "show all captures from The Great Gatsby" regardless of session
### Options Considered
| Option | Description | Pros | Cons |
|--------|-------------|------|------|
| **A. Flat Tables** | Words and Quotes as independent tables with session_id | Simple | Can't query by book without joins; no book-level aggregation |
| **B. Nested JSON** | Store captures as JSON blob per session | Flexible schema | Poor query performance; no relational integrity |
| **C. Relational Hierarchy** | Book → Session → Captures with foreign keys | Normalized; flexible queries; data integrity | More complex schema; requires migrations |
### Decision
**Option C: Relational Hierarchy**
Schema structure:
```
Book (book_id, title, created_at)
└── Session (session_id, book_id, display_name, start_time, end_time, status)
    ├── Word (word_id, book_id, session_id, term, definition, timestamp)
    └── Quote (quote_id, book_id, session_id, content, timestamp)
```
Key insight: Captures are **dual-tagged** to both Session (when) and Book (what). This enables:
- "Show captures from this session" (time-scoped)
- "Show all captures from this book" (entity-scoped)
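A minimal sketch of why dual-tagging avoids joins, using plain Kotlin data classes that mirror the schema above. These are illustrative stand-ins, not the app's Room entities, and `wordsForSession` / `wordsForBook` are hypothetical names. The two views differ only in which foreign key they filter on.

```kotlin
// Illustrative stand-ins for the TDR-001 tables (not the app's Room entities).
data class Book(val bookId: Long, val title: String)
data class Session(val sessionId: Long, val bookId: Long, val displayName: String)

// Each capture carries BOTH foreign keys: sessionId (when) and bookId (what).
data class Word(val wordId: Long, val bookId: Long, val sessionId: Long, val term: String)

// Time-scoped view: captures from one reading session.
fun wordsForSession(words: List<Word>, sessionId: Long): List<Word> =
    words.filter { it.sessionId == sessionId }

// Entity-scoped view: captures for a book across every session, with no Session lookup.
fun wordsForBook(words: List<Word>, bookId: Long): List<Word> =
    words.filter { it.bookId == bookId }
```

Quotes would follow the same pattern. In Room, each filter would roughly correspond to a DAO query such as `SELECT * FROM Word WHERE book_id = :bookId`.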
### Consequences
- ✅ Supports Book Layer view in Archive
- ✅ Enables future features like "book-level stats"
- ✅ Foreign key constraints prevent orphaned data
- ⚠️ Requires `CASCADE` delete rules (deleting a Session deletes its captures)
- ⚠️ Schema changes require migration strategy (using `fallbackToDestructiveMigration` for now)
---
## TDR-002: Lock Screen Notification Strategy
**Date:** 2026-01-18
**Status:** Superseded by TDR-008 (2026-02-07)
**Related:** [[Bug Tracker#BUG-R008]], [[Feature_Backlog#Smart Capture]]
### Context
The core product hypothesis requires capturing words/quotes from the lock screen with minimal friction. Android provides several notification styles with different lock screen behaviors.
### Options Considered
| Option | Description | Pros | Cons |
|--------|-------------|------|------|
| **A. Standard Notification** | Basic notification with action buttons | Reliable; always renders | Small buttons; not prominent on lock screen |
| **B. MediaStyle Notification** | Mimics music player with large controls | Prominent lock screen placement; single-tap capture | Android 13+ aggressively hides "fake" media players; requires metadata |
| **C. Full-Screen Intent** | Like incoming calls; takes over screen | Guaranteed visibility | Too intrusive; user must dismiss |
| **D. Quick Settings Tile** | Custom tile in notification shade | Always accessible | Requires swipe down; 2 taps minimum |
| **E. Accessibility Service** | Intercept gestures/buttons | Zero-tap possible | Play Store policy concerns; user trust issues |
### Decision
**Option B (MediaStyle) as primary, with Option A as fallback**
Implementation approach ("Trojan Horse"):
1. Create `MediaSessionCompat` to register as media app
2. Set metadata (book title as "artist", "Tap ▶ to Capture" as title, app icon as album art)
3. Set `PlaybackState.STATE_PAUSED` so play button (▶) shows
4. Play button action → launches `SpeechCaptureActivity`
5. Fallback: If activity launch fails, show high-priority notification with `fullScreenIntent`
Key learnings:
- Android 13+ requires sufficient metadata or it hides the player
- `FOREGROUND_SERVICE_MEDIA_PLAYBACK` permission required
- Background activity launches are blocked; the fallback full-screen intent notification requires the `USE_FULL_SCREEN_INTENT` permission
- PendingIntent must target Activity directly (not Service → Activity trampoline)
### Consequences
- ✅ Prominent lock screen placement when it works
- ✅ Single tap to capture (lowest Clicks-to-Capture (C2C) achievable)
- ⚠️ Inconsistent behavior on some devices/Android versions (see [[Bug Tracker#BUG-001]])
- ⚠️ Feels like a "hack" — may break with future Android updates
- 📋 TODO: Consider Option D (Quick Settings Tile) as user-selectable alternative
---
## TDR-003: Dictionary API Strategy
**Date:** 2026-01-18
**Status:** Accepted
**Related:** [[Feature_Backlog#Multi-Word Concept Support]]
### Context
Users want to define:
1. **Single words** (e.g., "hegemony") — standard dictionary lookup
2. **Multi-word phrases** (e.g., "carpe diem", "ad hoc") — may exist in dictionary
3. **Concepts/entities** (e.g., "United States", "Pythagorean theorem") — not in dictionary
The current implementation uses the Free Dictionary API, which handles (1) and some of (2) but fails on (3).
### Options Considered
| Option | Description | Pros | Cons |
|--------|-------------|------|------|
| **A. Dictionary API only** | Free Dictionary API for all lookups | Simple; fast | Fails on concepts and many phrases |
| **B. Dictionary → Wikipedia fallback** | Try dictionary first; if 404, try Wikipedia API | Covers most cases | Two API calls for concepts; latency |
| **C. Wikipedia only** | Use Wikipedia for everything | Comprehensive | Overkill for simple words; slower |
| **D. AI-powered definition** | Send to LLM for definition | Handles anything | Latency; cost; requires API key |
| **E. User choice** | Show "Define" vs "Look up" buttons | User controls intent | Adds friction; more UI complexity |
### Decision
**Option B: Dictionary → Wikipedia fallback**
### Implementation (2026-01-18)
New files created:
- `WikipediaApiService.kt` — Retrofit interface for Wikipedia REST API
- `WikipediaClient.kt` — Singleton client for `https://en.wikipedia.org/api/rest_v1/`
Modified `SpeechCaptureActivity.kt`:
- Added "Define This Instead" TextButton (only visible for detected quotes)
- Updated `defineWord()` with fallback chain:
  1. Try Free Dictionary API
  2. If that fails → try Wikipedia API `/page/summary/{title}`
  3. If Wikipedia also returns 404 → show "Definition not found"
- Wikipedia results prefixed with `(Wikipedia)` for source clarity
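The fallback chain can be sketched as a pure function, assuming each API call is wrapped in a lookup that returns `null` on a miss (e.g. a 404). `defineWithFallback`, `WikiSummary`, and the injected lambdas are hypothetical. The `type` / `extract` fields are assumed to mirror the Wikipedia REST summary response, and treating disambiguation pages as "not found" is only one possible answer to the open disambiguation item listed under Consequences.

```kotlin
// Assumed subset of the Wikipedia REST `/page/summary/{title}` response.
data class WikiSummary(val type: String, val extract: String)

fun defineWithFallback(
    term: String,
    dictionaryLookup: (String) -> String?,      // null = no dictionary entry (e.g. 404)
    wikipediaLookup: (String) -> WikiSummary?,  // null = no Wikipedia page (404)
    maxChars: Int = 200,
): String {
    // Free Dictionary API first: the common case for single words.
    dictionaryLookup(term)?.let { return it }

    // Fall back to Wikipedia for phrases and concepts.
    val summary = wikipediaLookup(term) ?: return "Definition not found"

    // Hypothetical handling: treat disambiguation pages as a miss.
    if (summary.type == "disambiguation") return "Definition not found"

    // Truncate long summaries and label the source.
    val extract = if (summary.extract.length > maxChars) {
        summary.extract.take(maxChars) + "…"
    } else {
        summary.extract
    }
    return "(Wikipedia) $extract"
}
```

Injecting the lookups as functions keeps the chain testable without network access; in the app they would wrap the Retrofit calls.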
### Consequences
- ✅ Covers words, phrases, and concepts
- ✅ No additional user friction for common case (single words)
- ✅ User can explicitly choose "Define This Instead" for phrases
- ⚠️ Wikipedia summaries may be longer than dictionary definitions (truncated at 200 chars)
- ⚠️ Need to handle Wikipedia disambiguation pages gracefully
- 📋 Future: Consider caching frequent lookups locally
---
## TDR-004: Dependency Injection Pattern
**Date:** 2026-01-10
**Status:** Accepted
**Related:** Product Diary 2025-11-12
### Context
Initial implementation used Hilt for dependency injection. Build failures occurred due to KSP/Kotlin version mismatches, consuming significant development time.
### Options Considered
| Option | Description | Pros | Cons |
|--------|-------------|------|------|
| **A. Hilt** | Google's recommended DI framework | Powerful; scalable; Android-aware | Complex setup; version sensitivity; slower builds |
| **B. Koin** | Lightweight Kotlin DI | Simpler than Hilt; less boilerplate | Still a dependency; resolves dependencies at runtime, so wiring errors surface as crashes rather than build errors |
| **C. Manual DI / Singleton** | Hand-rolled dependency management | Zero dependencies; full control; fast builds | More boilerplate; must manage lifecycle manually |
### Decision
**Option C: Manual DI / Singleton Pattern**
Implementation:
- `AppDatabase.getDatabase(context)` — Singleton Room instance
- `SessionViewModelFactory` — Manual factory for ViewModel injection
- Direct instantiation in `MainActivity` and `ReadingSessionService`
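A minimal sketch of the singleton shape behind `AppDatabase.getDatabase(context)`, using a stand-in class so it runs without Room. The double-checked locking (a `@Volatile` field plus `synchronized`) is the pattern commonly used for Room singletons; `FakeDatabase`, `createdCount`, and the default file name are hypothetical and exist only to make the single-creation guarantee observable.

```kotlin
// Stand-in for a Room database; the real AppDatabase would be built with Room.databaseBuilder(...).
class FakeDatabase private constructor(val name: String) {
    companion object {
        // @Volatile makes a write to INSTANCE visible to other threads immediately.
        @Volatile
        private var INSTANCE: FakeDatabase? = null

        // Demonstration-only counter of how many instances were ever created.
        var createdCount = 0
            private set

        fun getDatabase(name: String = "vibe_reader.db"): FakeDatabase =
            // Fast path without locking; re-check inside the lock so only one thread creates it.
            INSTANCE ?: synchronized(this) {
                INSTANCE ?: FakeDatabase(name).also {
                    createdCount++
                    INSTANCE = it
                }
            }
    }
}
```

Note that later calls ignore their arguments once the instance exists, which is the lifecycle detail the "must manually ensure singleton lifecycle" trade-off refers to.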
### Consequences
- ✅ Build stability restored immediately
- ✅ Simpler codebase; easier to understand
- ✅ Faster build times
- ⚠️ Must manually ensure singleton lifecycle
- ⚠️ May need to revisit if app grows significantly
- 📝 Lesson: For MVPs, prefer simplicity over "best practices"
---
## TDR-007: Weekly Vibe Redesign — Structured Output & Safeguards
**Date:** 2026-02-05
**Status:** Accepted
**Related:** [[Feature Backlog#Weekly Vibe Redesign]]
### Context
The initial Weekly Vibe implementation (TDR not documented) used prose-based output from Gemini:
- **Problem 1:** Verbose 3-paragraph prose was hard to scan; didn't feel like a "vibe"
- **Problem 2:** High token usage (~1024 output tokens, ~200+ input tokens for definitions)
- **Problem 3:** No safeguard against redundant API calls (user could spam "Generate" with no new data)
Goal: Create a Spotify Wrapped-style experience that's visually engaging, token-efficient, and prevents wasteful API calls.
### Options Considered
**Output Format:**
| Option | Description | Pros | Cons |
|--------|-------------|------|------|
| **A. Prose** | Free-form paragraphs (current) | Natural language; flexible | Hard to style; inconsistent length |
| **B. Structured JSON** | Request specific fields in JSON format | Predictable UI; constrained output | Requires parsing; model may hallucinate structure |
| **C. Markdown with headers** | Semi-structured with `##` sections | Easy to parse | Still variable length; harder to style |
**Safeguard Strategy:**
| Option | Description | Pros | Cons |
|--------|-------------|------|------|
| **A. Cooldown timer** | Disable button for X minutes after generation | Simple | Arbitrary; blocks legitimate re-generation |
| **B. Data fingerprint** | Hash captures; only regenerate if hash changes | Precise | Complex; hash computation overhead |
| **C. Session count tracking** | Track session count at generation time; enable if new sessions | Simple; meaningful | Doesn't catch new captures within same session |
### Decision
**Output: Option B (Structured JSON)**
New prompt strategy:
- Input: Top 10 word terms (no definitions), top 5 quotes (truncated to 50 chars)
- Output: Strict JSON schema with constrained fields
- Config: `maxOutputTokens: 256` (down from 1024), `temperature: 0.5` (down from 0.7)
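The input budget above can be enforced by a small helper before the prompt is assembled. This sketch assumes the caller has already ranked terms and quotes (the TDR does not say how "top" is determined); `buildVibeInput` and `VibePromptInput` are hypothetical names.

```kotlin
// Hypothetical container for what is actually sent to Gemini.
data class VibePromptInput(val terms: List<String>, val quotes: List<String>)

fun buildVibeInput(
    rankedTerms: List<String>,   // assumed pre-sorted, most relevant first
    rankedQuotes: List<String>,  // assumed pre-sorted, most relevant first
    maxTerms: Int = 10,
    maxQuotes: Int = 5,
    maxQuoteChars: Int = 50,
): VibePromptInput = VibePromptInput(
    // Terms only: definitions are deliberately left out to save input tokens.
    terms = rankedTerms.take(maxTerms),
    // Cap both the number of quotes and the length of each one.
    quotes = rankedQuotes.take(maxQuotes).map { it.take(maxQuoteChars) },
)
```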
JSON schema requested:
```json
{
  "vibe_title": "2-3 word theme title",
  "vibe_emoji": "single emoji",
  "theme_tags": ["tag1", "tag2", "tag3"],
  "insights": ["insight1 (max 12 words)", "insight2"],
  "word_spotlight": "most interesting word",
  "quote_spotlight": "evocative quote (max 60 chars)"
}
```
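Because the model may ignore the limits in the requested schema, a defensive step after parsing helps keep the card layout stable. The sketch assumes a `VibeResponse` whose fields mirror the JSON keys (the real class in `GeminiClient.kt` may differ) and takes the JSON parser as a parameter, since the TDR does not name the parsing library; `enforceLimits` and `parseVibeOrNull` are hypothetical.

```kotlin
// Field names mirror the requested JSON keys (assumed; the real class may differ).
data class VibeResponse(
    val vibeTitle: String,
    val vibeEmoji: String,
    val themeTags: List<String>,
    val insights: List<String>,
    val wordSpotlight: String,
    val quoteSpotlight: String,
)

// Clamp fields to the limits the prompt asks for, since the model may not respect them.
fun VibeResponse.enforceLimits(): VibeResponse = copy(
    themeTags = themeTags.take(3),
    insights = insights.map { insight ->
        insight.split(Regex("\\s+")).filter { it.isNotEmpty() }.take(12).joinToString(" ")
    },
    quoteSpotlight = quoteSpotlight.take(60),
)

// Wrap whichever JSON parser is used: malformed output yields null so the UI can
// show a fallback card instead of crashing.
fun parseVibeOrNull(raw: String, parser: (String) -> VibeResponse): VibeResponse? =
    try {
        parser(raw).enforceLimits()
    } catch (e: Exception) {
        null
    }
```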
**Safeguard: Option C (Session count tracking)**
Implementation:
- `lastVibeSessionCount: Int` — stores session count at last generation
- `lastVibeTimestamp: Long` — stores generation timestamp
- `canGenerateVibe: StateFlow<Boolean>` — derived state for UI
Button states:
- `CAN_GENERATE` → Primary button, enabled
- `GENERATING` → Disabled with spinner
- `UP_TO_DATE` → Muted outline with checkmark ("Vibe is current")
- `NO_DATA` → Disabled with lock icon ("Start reading to unlock")
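The four button states can be derived from the tracked values with one pure function. This is a sketch under assumptions: `vibeButtonState` and the `isGenerating` flag are hypothetical, and `lastVibeSessionCount` is assumed to start at 0 before the first generation. In the ViewModel, the same logic could sit inside a `combine(...)` that backs `canGenerateVibe`.

```kotlin
enum class VibeButtonState { CAN_GENERATE, GENERATING, UP_TO_DATE, NO_DATA }

fun vibeButtonState(
    sessionCount: Int,
    lastVibeSessionCount: Int,  // assumed 0 until the first generation
    isGenerating: Boolean,      // hypothetical flag for an in-flight API call
): VibeButtonState = when {
    isGenerating -> VibeButtonState.GENERATING
    sessionCount == 0 -> VibeButtonState.NO_DATA
    // Only new sessions re-enable generation; new captures in old sessions do not.
    sessionCount > lastVibeSessionCount -> VibeButtonState.CAN_GENERATE
    else -> VibeButtonState.UP_TO_DATE
}

// canGenerateVibe is then just a projection of the state.
fun canGenerateVibe(state: VibeButtonState): Boolean = state == VibeButtonState.CAN_GENERATE
```

Because only the session count is compared, captures added to an existing session leave the button at `UP_TO_DATE`; that is the trade-off recorded in the Consequences.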
### Consequences
- ✅ ~60-70% token reduction per API call
- ✅ Consistent, predictable UI layout
- ✅ Prevents accidental API spam
- ✅ Visual feedback on button state communicates system status
- ⚠️ JSON parsing requires try-catch fallback (model may output malformed JSON)
- ⚠️ Session count safeguard doesn't detect new captures within existing sessions (acceptable trade-off)
- 📋 Future: Could add "Force Regenerate" option for power users
### Files Modified
- `WeeklyVibe.kt` — New structured data model
- `GeminiClient.kt` — JSON prompt + parsing + VibeResponse data class
- `SessionViewModel.kt` — Safeguard state tracking + canGenerateVibe flow
- `SessionComponents.kt` — Wrapped-style card UI + button states
- `ReviewScreen.kt` — Wire canGenerateVibe to LibraryView
---
## TDR-008: Lock Screen Trigger Architecture (The Platform Constraint)
**Date:** 2026-02-06
**Status:** Accepted (2026-02-07)
**Related:** [[Bug Tracker#BUG-R008]], [[Feature Backlog#Quick Settings Tile]]
### Context
The core product hypothesis requires capturing words/quotes from the lock screen with minimal friction. The current "Trojan Horse" MediaSession approach (TDR-002) has proven unreliable due to Android platform restrictions that are actively tightening.
**The Recurring Problem:**
1. Android 12 restricted `ACTION_CLOSE_SYSTEM_DIALOGS` for third-party apps (the app can no longer collapse the notification shade)
2. Background Activity Launch restrictions, in place since Android 10, keep tightening in newer releases (e.g. Android 12 blocked notification trampolines)
3. Google continues to crack down on "fake media players" that abuse MediaSession
The MediaSession approach may work today and break tomorrow. That is not a sustainable foundation for the product's core interaction pattern.
### North Star Alignment
The product mantra is "Keep your vibe, no interruptions." The key metric is **C2C (Clicks-to-Capture)**:
| Trigger Method | C2C | Flow |
|----------------|-----|------|
| MediaSession (ideal) | 1 | Lock screen → Tap ▶ → Listening |
| MediaSession (actual) | 1-∞ | Lock screen → Tap ▶ → Maybe works? |
| Quick Settings Tile | 2 | Pull shade → Tap tile → Listening |
| Bubble (unlocked) | 1 | Tap bubble → Listening |
| Full-screen Intent | 1 | Lock screen → Tap notification → Full takeover |
### Options Analysis
```
                 LOCK SCREEN TRIGGER ARCHITECTURE
                 ════════════════════════════════
   Current State                    The Fork in the Road
   ─────────────                    ────────────────────
┌─────────────────┐             ┌─────────────────────────────┐
│  MediaSession   │             │  OPTION A: Full-Screen      │
│  "Trojan Horse" │──────┬─────▶│  Intent (Alarm-Style)       │
│                 │      │      │  C2C: 1  |  Lock: ✓         │
│  ⚠️ Unreliable  │      │      │  Maintenance: Low           │
│  ⚠️ Shade stays │      │      │  UX: Intrusive takeover     │
│  ⚠️ Google      │      │      └─────────────────────────────┘
│    locking down │      │
└─────────────────┘      │      ┌─────────────────────────────┐
         │               ├─────▶│  OPTION B: Quick Settings   │
         │               │      │  Tile                       │
         ▼               │      │  C2C: 2  |  Lock: ✓         │
┌─────────────────┐      │      │  Maintenance: Very Low      │
│  WE ARE HERE    │      │      │  UX: Reliable, standard     │
│  BUG-001        │      │      └─────────────────────────────┘
│  Critical       │      │
└─────────────────┘      │      ┌─────────────────────────────┐
                         ├─────▶│  OPTION C: Bubble API       │
                         │      │  (Android 11+)              │
                         │      │  C2C: 1  |  Lock: ✗         │
                         │      │  Maintenance: Medium        │
                         │      │  UX: Floating, persistent   │
                         │      └─────────────────────────────┘
                         │
                         │      ┌─────────────────────────────┐
                         ├─────▶│  OPTION D: Hybrid           │
                         │      │  MediaSession + QS Fallback │
                         │      │  C2C: 1-2  |  Lock: ✓       │
                         │      │  Maintenance: High          │
                         │      │  UX: Best when works        │
                         │      └─────────────────────────────┘
                         │
                         │      ┌─────────────────────────────┐
                         └─────▶│  OPTION E: Accessibility    │
                                │  Service                    │
                                │  C2C: 0  |  Lock: ✓         │
                                │  Maintenance: Low           │
                                │  UX: Scary permissions      │
                                │  ⚠️ Play Store risk         │
                                └─────────────────────────────┘
```
### Detailed Option Breakdown
#### Option A: Full-Screen Intent (Alarm-Style)
**How it works:** Use `Notification.Builder.setFullScreenIntent(pendingIntent, true)` with a high-priority notification channel. This mimics incoming call/alarm behavior.
| Attribute | Assessment |
|-----------|------------|
| Lock Screen | ✓ Works reliably, system-guaranteed |
| Shade Collapse | ✓ Auto-dismisses notification shade |
| C2C | 1 (tap notification) |
| Maintenance | Low (stable API, used by system apps) |
| Scalability | High (no ongoing platform battles) |
| UX Trade-off | Feels intrusive; full screen takeover |
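| Permissions | Requires `USE_FULL_SCREEN_INTENT` (already have); on Android 14+ it is no longer granted by default outside calling/alarm apps, so check `NotificationManager.canUseFullScreenIntent()` |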
**Verdict:** Most technically sound. UX concern is the takeover feel, but this is how alarm clocks and phone apps work. Users understand the pattern.
#### Option B: Quick Settings Tile
**How it works:** Register a `TileService` that appears in the notification shade quick settings. User taps tile to trigger capture.
| Attribute | Assessment |
|-----------|------------|
| Lock Screen | ✓ Accessible from lock screen shade |
| Shade Collapse | ✓ Tile tap launches activity, shade collapses |
| C2C | 2 (pull shade + tap tile) |
| Maintenance | Very Low (stable since Android 7.0) |
| Scalability | High (no platform restrictions) |
| UX Trade-off | Extra step; user must add tile manually |
| Permissions | None special required |
**Verdict:** Most reliable fallback. The extra tap is a real cost, but it *always works*. Could be positioned as "power user mode" or default for users who experience MediaSession issues.
#### Option C: Bubble API
**How it works:** Use `Notification.BubbleMetadata` to create a floating overlay that persists across apps.
| Attribute | Assessment |
|-----------|------------|
| Lock Screen | ✗ Bubbles collapse when device locks |
| Shade Collapse | N/A (not in shade) |
| C2C | 1 (when visible) |
| Maintenance | Medium (API evolving) |
| Scalability | Medium (Android 11+ only) |
| UX Trade-off | Persistent floating icon may feel intrusive |
| Permissions | None special required |
**Verdict:** Does NOT solve lock screen problem. Useful only for quick capture while actively using phone (between reading sessions). Not a primary solution.
#### Option D: Hybrid (MediaSession + Quick Settings Fallback)
**How it works:** Keep MediaSession as primary with Quick Settings Tile as documented fallback. Guide users to tile if they experience issues.
| Attribute | Assessment |
|-----------|------------|
| Lock Screen | ⚠️ Partial (depends on device state) |
| C2C | 1-2 (varies) |
| Maintenance | High (two systems to maintain) |
| Scalability | Low (MediaSession may break further) |
| UX Trade-off | Inconsistent experience |
**Verdict:** Pragmatic short-term but not sustainable. Technical debt accumulates as Google continues tightening restrictions.
#### Option E: Accessibility Service
**How it works:** Register as an accessibility service, which lets the app launch activities from any state without background-launch restrictions.
| Attribute | Assessment |
|-----------|------------|
| Lock Screen | ✓ Works anywhere, no restrictions |
| C2C | 0 (could intercept gestures) |
| Maintenance | Low (stable API) |
| Scalability | High (no platform battles) |
| UX Trade-off | Scary permission dialog; user trust issue |
| Play Store | ⚠️ Policy risk; must justify accessibility use |
**Verdict:** Nuclear option. Ultimate power but real trust and policy risks. Only consider if all else fails and user research shows willingness to grant permission.
### Recommendation
**Short-term (Next Sprint):**
1. Implement **Full-Screen Intent** with `highPriority=true` as primary fix for BUG-001
2. Test shade collapse behavior and lock screen launch reliability
3. If the takeover feels too intrusive, add a brief countdown before auto-launch
**Medium-term (V1 Release):**
1. Implement **Quick Settings Tile** as first-class alternative
2. Add onboarding prompt: "Add Vibe Reader to Quick Settings for reliable capture"
3. Deprecate MediaSession play button in favor of tile
**Long-term (Post-V1):**
1. Monitor Google's platform direction
2. If MediaSession continues degrading, remove it entirely
3. Consider Accessibility Service only if user research supports it
### Consequences
- ✅ Unblocks critical lock screen reliability issue
- ✅ Reduces maintenance burden (stop fighting Android)
- ⚠️ C2C increases from 1 to 2 for Quick Settings path
- ⚠️ Full-screen intent may feel jarring to some users
- 📋 TODO: User test both approaches to measure perceived friction
---
## TDR-009: Asymmetric Verification (Words vs. Quotes)
**Date:** 2026-02-09
**Status:** Accepted
**Related:** [[Feature Backlog#Skip Confirmation]], [[Feature Backlog#Auto-Dismiss Definition]], [[User Feedback#Session-001]]
### Context
Smart Capture mode required verification (VERIFYING state) for all inputs, whether 1-word definitions or multi-word quotes. User feedback (Session-001) flagged this as unnecessary friction for words. The question: should verification be skipped for all modes, or only for words?
### Decision
**Asymmetric verification: skip for words, keep for quotes.**
Words skip the VERIFYING screen and go directly to `defineWord()`. The definition screen auto-dismisses after 5 seconds with an opt-out "Keep Open" button. Quotes (3+ words) retain the full verification flow with TTS playback.
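The routing rule reduces to a word count. A minimal sketch using the `wordCount <= 2` threshold from the Implementation notes; `routeSmartCapture` and `CaptureRoute` are hypothetical names, and splitting on whitespace is an assumed definition of a "word".

```kotlin
enum class CaptureRoute { DEFINE_IMMEDIATELY, VERIFY_WITH_TTS }

fun routeSmartCapture(text: String, maxWordsForDefine: Int = 2): CaptureRoute {
    // Count whitespace-separated tokens; blank input counts as 0 words.
    val wordCount = text.trim().split(Regex("\\s+")).count { it.isNotEmpty() }
    return if (wordCount <= maxWordsForDefine) {
        CaptureRoute.DEFINE_IMMEDIATELY  // words: skip VERIFYING, go straight to defineWord()
    } else {
        CaptureRoute.VERIFY_WITH_TTS     // quotes: keep verification with TTS playback
    }
}
```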
### Rationale
The error consequences are asymmetric:
- **Wrong word definition:** User sees it immediately and the cost is low (glance, recognize the error, retry next time). A wrong definition doesn't persist as useful data.
- **Wrong quote transcription:** User may not notice until later. A misheard sentence saved to the Library looks like a real capture. TTS playback catches these errors in-flow.
Since error visibility and cost differ, the friction investment should differ too. Spending 3 taps to verify "hegemony" wastes time. Spending 3 taps to verify a 15-word quote prevents bad data.
### Implementation
- `handleSpeechResult()`: SMART_CAPTURE branch calls `defineWord(text)` directly for `wordCount <= 2`
- DEFINING composable: `LaunchedEffect` countdown (5s) with `autoDismissActive` state
- `startListening()` resets `autoDismissActive = true` on each new capture cycle
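The auto-dismiss behavior can be modelled without Compose as a small countdown driven by one tick per second. In the app the driver would be the `LaunchedEffect` loop; this class, its `tick()` / `keepOpen()` / `reset()` methods, and the once-per-second tick are hypothetical stand-ins used only to make the state transitions testable.

```kotlin
// Hypothetical plain-Kotlin model of the DEFINING screen's auto-dismiss behavior.
class AutoDismissCountdown(private val totalSeconds: Int = 5) {
    var autoDismissActive = true
        private set
    var secondsLeft = totalSeconds
        private set

    // Called once per second by the UI driver; returns true when the screen should dismiss now.
    fun tick(): Boolean {
        if (!autoDismissActive || secondsLeft == 0) return false
        secondsLeft--
        return secondsLeft == 0
    }

    // "Keep Open" button: stop the countdown for this capture.
    fun keepOpen() {
        autoDismissActive = false
    }

    // Mirrors startListening(): each new capture cycle re-arms the countdown.
    fun reset() {
        autoDismissActive = true
        secondsLeft = totalSeconds
    }
}
```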
### Consequences
- ✅ Word definition C2C drops from 4 interactions to 0 taps
- ✅ Quote verification unchanged (TTS playback still catches transcription errors)
- ✅ "Keep Open" button preserves user control for edge cases
- ⚠️ If speech recognition mishears a word, the user sees the wrong definition briefly before auto-dismiss. This is acceptable because the word is still saved and can be looked up again from the Library.
---
## Pending Decisions
### TDR-005: Offline Support Strategy
**Status:** Not yet decided
**Context:** The Dictionary API requires a network connection. What happens when the user is offline?
Options under consideration:
- A. Fail gracefully with "No connection" message
- B. Queue words for later definition (store as "pending")
- C. Bundle offline dictionary (large app size)
- D. Cache previous lookups for repeat words
### TDR-006: iOS Port Approach
**Status:** Not yet decided
**Context:** When/if we port to iOS, the lock screen strategy will differ significantly.
Options under consideration:
- A. Live Activities (iOS 16+)
- B. Widget with Siri Shortcuts
- C. Apple Watch companion app
- D. Wait for iOS to offer better lock screen APIs