We needed AI to transform voice transcriptions into narrative stories for a memory preservation app. The challenge: keep costs low while maintaining high-quality storytelling that preserves the user's authentic voice.

Google Gemini 2.5 Flash won over AWS Bedrock Claude for three critical reasons: 10x cheaper per story ($0.0003 vs $0.003), permanent free tier covering 1,500 stories per day, and excellent narrative generation quality. AWS's free tier expires after 2 months; Google's is permanent.

Implementation

The Vertex AI integration is straightforward. We authenticate with a service account and send the transcription with a carefully crafted prompt:

import 'dart:convert';

import 'package:http/http.dart' as http;

final client = http.Client();

final requestBody = {
  'contents': [{
    'role': 'user',
    'parts': [{
      'text': '''You are a compassionate storyteller helping someone preserve their memories.

The user was asked: "$prompt"
They responded: "$transcription"

Transform their response into a beautiful, narrative story in first person. Keep their voice and emotions, but enhance it with vivid details and reflective insights. Make it 2-3 paragraphs.'''
    }]
  }],
  'generationConfig': {
    'temperature': 0.7,
    'maxOutputTokens': 2048,
  }
};

final url = Uri.parse(
  'https://us-central1-aiplatform.googleapis.com/v1/projects/$projectId/'
  'locations/us-central1/publishers/google/models/gemini-2.5-flash:generateContent'
);

// accessToken comes from the service-account OAuth flow
// (e.g. package:googleapis_auth); Vertex AI rejects unauthenticated calls.
final response = await client.post(
  url,
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer $accessToken',
  },
  body: jsonEncode(requestBody),
);
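The generateContent response nests the story under candidates → content → parts. A minimal extraction helper (the name extractStory is ours, not part of the API) looks like:

```dart
import 'dart:convert';

/// Pulls the generated story text out of a generateContent response body.
/// Response shape: {"candidates":[{"content":{"parts":[{"text":"..."}]},
/// "finishReason":"STOP"}], ...}
String extractStory(String responseBody) {
  final json = jsonDecode(responseBody) as Map<String, dynamic>;
  final candidates = json['candidates'] as List<dynamic>;
  final content = candidates.first['content'] as Map<String, dynamic>;
  final parts = content['parts'] as List<dynamic>;
  // Join in case the model splits the story across multiple parts.
  return parts.map((p) => p['text'] as String).join();
}
```

In production you would also check the candidate's finishReason before trusting the text.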

Key Configuration

Temperature 0.7: Balances creativity with consistency. High enough for varied, engaging narratives but not so high that stories become unpredictable.

Max Tokens 2048: Allows 3-4 paragraph stories without truncation. Typical stories use 400-600 output tokens, but the buffer prevents responses being cut off with a "MAX_TOKENS" finish reason.

Prompt Engineering: The prompt defines role ("compassionate storyteller"), provides full context (original question + transcription), and sets clear formatting requirements (first person, 2-3 paragraphs, maintain voice).
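Those three pieces can be factored into a small helper so the template lives in one place; buildPrompt is a hypothetical name, and the template mirrors the one in the request body above:

```dart
/// Assembles the storytelling prompt from the original question and the
/// user's transcription. Illustrative helper, not part of any API.
String buildPrompt(String question, String transcription) {
  return '''You are a compassionate storyteller helping someone preserve their memories.

The user was asked: "$question"
They responded: "$transcription"

Transform their response into a beautiful, narrative story in first person. Keep their voice and emotions, but enhance it with vivid details and reflective insights. Make it 2-3 paragraphs.''';
}
```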

Cost Analysis

At 300 input tokens and 500 output tokens per story:

  • Input: $0.0000225 (300 × $0.075/1M)
  • Output: $0.00015 (500 × $0.30/1M)
  • Total: ≈$0.00017, rounded up to $0.0003 per story to leave headroom for longer responses

Monthly costs scale predictably:

  • 1,000 stories: $0.30
  • 10,000 stories: $3.00
  • 100,000 stories: $30.00
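The arithmetic above fits in a tiny cost model (rates and token counts from this post; the raw sum comes to roughly $0.00017, below the rounded $0.0003 budget):

```dart
// Per-token rates used in this post, in dollars per token.
const inputRate = 0.075 / 1e6; // $0.075 per 1M input tokens
const outputRate = 0.30 / 1e6; // $0.30 per 1M output tokens

/// Cost in dollars of one story, given its token counts.
double storyCost(int inputTokens, int outputTokens) =>
    inputTokens * inputRate + outputTokens * outputRate;

void main() {
  print(storyCost(300, 500));         // raw cost per story, ≈0.00017
  print(10000 * storyCost(300, 500)); // 10,000 stories, under the $3 budget
}
```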

Compare to AWS Bedrock Claude 3.5 Sonnet at $0.003 per story—10x more expensive for marginally better quality that users wouldn't notice.

Results

The permanent free tier covers 45,000 stories per month, meaning most users never hit the paid tier. Latency averages 2-3 seconds, acceptable for an async operation with a loading indicator. The low cost enables a "Re-Storify" feature where users can regenerate stories for different perspectives—three regenerations still cost less than one AWS story.

Quality is excellent: stories maintain the user's authentic voice while adding vivid sensory details and emotional depth. Users consistently report that the AI-enhanced stories feel like their memories, just better articulated.