Archive: This documents the V4 pipeline implementation. See Pipeline for the current documentation.

Pipeline V4 Documentation

Voice-to-Podcast Automation with AI - Featuring Fish Audio TTS

Last Updated: December 2025

Overview

The My Weird Prompts pipeline transforms voice-recorded prompts into full podcast episodes with AI-generated dialogue, cover art, and automatic publishing. The pipeline uses Fish Audio TTS with pre-trained voice models to create natural conversations between two AI hosts: Corn the Sloth and Herman the Donkey.

Pipeline Phases

1. Processing

  • Voice Upload: User's voice prompt uploaded to processing queue
  • Transcription: Google Gemini 2.5 Flash transcribes the prompt and extracts metadata
  • Audio Processing: FFmpeg normalizes and prepares prompt audio

Technology: Gemini 2.5 Flash for transcription, FFmpeg for audio processing

2. Generation

  • Script Generation: AI creates dialogue script between Corn and Herman
  • Cover Art: Flux AI generates unique episode artwork (3 variants)
  • TTS Dialogue: Fish Audio TTS generates voice audio with character personalities

Technology: Gemini for scripting, Flux Schnell for images, Fish Audio for TTS
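Before text-to-speech, the script has to be broken into per-host turns so that each line can be sent to the matching voice. The sketch below shows one way this could be done. It assumes a plain `Speaker: text` script format, which the source does not specify; `Turn` and `split_turns` are hypothetical names, not part of the pipeline's code.

```python
import re
from typing import NamedTuple


class Turn(NamedTuple):
    speaker: str
    text: str


# Assumption: the script has one "Speaker: text" entry per turn.
SPEAKERS = ("Corn", "Herman")
_TURN = re.compile(r"^(%s):\s*(.+)$" % "|".join(SPEAKERS))


def split_turns(script: str) -> list[Turn]:
    """Split a dialogue script into (speaker, text) turns."""
    turns: list[Turn] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between turns
        match = _TURN.match(line)
        if match:
            turns.append(Turn(match.group(1), match.group(2)))
        elif turns:
            # A line without a speaker label continues the previous turn.
            prev = turns[-1]
            turns[-1] = Turn(prev.speaker, prev.text + " " + line)
    return turns
```

Each resulting turn could then be passed to the TTS service with the voice model for that speaker.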

3. Assembly

  • Combines intro jingle, disclaimer, user prompt, AI dialogue, and outro
  • Loudness normalization to -16 LUFS (podcast standard)
  • MP3 encoding at 192 kbps, 44.1 kHz

Technology: FFmpeg for audio assembly and normalization
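The assembly step can be pictured as a single FFmpeg invocation that concatenates the segments, normalizes loudness, and encodes the MP3. The following is a minimal sketch, not the pipeline's actual command: the function name, file names, and the `TP` and `LRA` values are assumptions, and single-pass `loudnorm` only approximates the -16 LUFS target.

```python
def build_assembly_command(segments, output, lufs=-16.0):
    """Build an ffmpeg argument list that joins audio segments,
    normalizes loudness, and encodes a 192 kbps, 44.1 kHz MP3."""
    if not segments:
        raise ValueError("at least one segment is required")

    cmd = ["ffmpeg", "-y"]
    for path in segments:
        cmd += ["-i", path]

    # Concatenate the audio stream of every input, then normalize.
    inputs = "".join(f"[{i}:a]" for i in range(len(segments)))
    graph = (
        f"{inputs}concat=n={len(segments)}:v=0:a=1[cat];"
        # TP and LRA are assumed values; only I (-16 LUFS) is from the doc.
        f"[cat]loudnorm=I={lufs}:TP=-1.5:LRA=11[out]"
    )
    cmd += [
        "-filter_complex", graph,
        "-map", "[out]",
        "-c:a", "libmp3lame",
        "-b:a", "192k",
        "-ar", "44100",
        output,
    ]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, with the segments given in playback order (intro jingle, disclaimer, user prompt, AI dialogue, outro).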

4. Publishing

  • CDN Upload: Audio and images uploaded to Cloudinary
  • Archive: Full episode backed up to Wasabi S3-compatible storage
  • Database: Metadata inserted into Neon PostgreSQL
  • Blog Post: Markdown file generated for Astro static site

Technology: Cloudinary CDN, Wasabi object storage, Neon PostgreSQL
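The blog post step amounts to writing a Markdown file with front matter that the Astro site can read. The sketch below illustrates the idea under stated assumptions: the field names (`title`, `description`, `pubDate`, `audioUrl`, `tags`) and the function `render_post` are hypothetical, since the source does not list the site's actual schema.

```python
import json
from datetime import date


def render_post(title, description, pub_date, audio_url, tags, body):
    """Render a Markdown blog post with YAML front matter.

    json.dumps is used for quoting because JSON strings and lists
    are also valid YAML scalars and flow sequences.
    """
    lines = [
        "---",
        f"title: {json.dumps(title)}",
        f"description: {json.dumps(description)}",
        f"pubDate: {pub_date.isoformat()}",
        f"audioUrl: {json.dumps(audio_url)}",  # hypothetical field name
        f"tags: {json.dumps(tags)}",
        "---",
        "",
        body.strip(),
        "",
    ]
    return "\n".join(lines)
```

Writing the result into the site's content directory and committing it is what would make the new episode part of the next build.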

5. Deployment

  • Git push triggers automatic Vercel deployment
  • New episode goes live on website within minutes
  • RSS feed automatically updated for podcast apps

Technology: Vercel auto-deploy, Astro static site generator
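For podcast apps to pick up a new episode, the RSS feed needs an item whose enclosure points at the audio file. The source does not describe how the site builds its feed, so the following Python sketch only illustrates the shape of such an item; `build_rss_item` and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss_item(title, audio_url, size_bytes, published, guid):
    """Build an RSS 2.0 <item> element for one episode."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "guid").text = guid
    # RSS dates use the RFC 822 style produced by format_datetime.
    ET.SubElement(item, "pubDate").text = format_datetime(published)
    # The enclosure tells podcast apps where the audio file is.
    ET.SubElement(
        item,
        "enclosure",
        url=audio_url,
        length=str(size_bytes),
        type="audio/mpeg",
    )
    return item


item = build_rss_item(
    "Example Episode",
    "https://example.com/episode.mp3",  # placeholder URL
    12345678,
    datetime(2025, 12, 1, 12, 0, tzinfo=timezone.utc),
    "episode-example",
)
xml_text = ET.tostring(item, encoding="unicode")
```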

Technology Stack

AI Services

  • Google Gemini 2.5 Flash
  • Fish Audio TTS
  • Flux Schnell (via fal.ai)
  • Replicate (backup)

Storage

  • Cloudinary (CDN)
  • Wasabi S3 (Archive)
  • Neon PostgreSQL
  • GitHub (Source)

Deployment

  • Astro (Static Site)
  • Vercel (Hosting)
  • FFmpeg (Audio)
  • Python Pipeline

Episode Output

For each episode, the pipeline creates:

  • Final Audio: MP3 file with full podcast episode
  • Cover Art: 3 AI-generated cover image variants
  • Metadata: Title, description, tags, timestamps
  • Transcript: Full dialogue script
  • Blog Post: Markdown file for website

Cost Estimate

Service             Cost per Episode   Notes
Fish Audio TTS      ~$0.30-0.40        15-minute episode
Image Generation    ~$0.01-0.05        3 cover variants
Transcription       Minimal            Free tier
Storage             ~$0.01             Wasabi + Cloudinary
Total per Episode   ~$0.35-0.50        Approximate
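As a quick check on the estimate, the per-service ranges can be summed directly. This sketch uses the figures listed for each service and treats the "Minimal" transcription cost as zero, which is an assumption.

```python
# (low, high) cost per episode in USD, taken from the estimate above.
COSTS = {
    "Fish Audio TTS": (0.30, 0.40),
    "Image Generation": (0.01, 0.05),
    "Transcription": (0.00, 0.00),  # assumption: free tier counted as zero
    "Storage": (0.01, 0.01),
}


def total_range(costs):
    """Sum the low and high ends of each service's cost range."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return round(low, 2), round(high, 2)


low, high = total_range(COSTS)
print(f"${low:.2f}-{high:.2f} per episode")
```

The sum comes to roughly $0.32-0.46, which the stated total of ~$0.35-0.50 rounds up slightly.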

Key Features

  • Voice Cloning: Fish Audio TTS creates natural-sounding AI hosts with distinct personalities
  • AI Art Generation: Unique cover artwork for every episode using Flux AI
  • Fully Automated: Voice prompt to published episode in minutes
  • Production Quality: Professional audio normalization and podcast standards

Open Source

The entire pipeline is open source and available on GitHub. View the code, contribute improvements, or adapt it for your own podcast automation projects.