The Pipeline
Voice-to-Podcast Automation with AI
Last Updated: December 2025
Overview
My Weird Prompts transforms voice-recorded prompts into full podcast episodes with AI-generated dialogue, cover art, and automatic publishing. The pipeline uses Inworld AI TTS to create natural conversations between two AI hosts: Corn the Sloth and Herman the Donkey.
How It Works
Voice Input
Queue-Based Processing - Drop audio files into the processing queue
- Record your question or prompt as a voice message
- Place audio files in the prompts/to-process/ directory
- Run the episode generator to process queued prompts
- Supports MP3, WAV, and other common audio formats
Technology: Local filesystem queue with Python pipeline
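The queue step can be sketched as a simple directory scan. This is a minimal illustration, not the project's actual code: the function name `list_queued_prompts`, the extension set beyond MP3 and WAV, and the oldest-first ordering are all assumptions.

```python
from pathlib import Path

# Assumed extension set; the page only names MP3 and WAV explicitly.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac"}


def list_queued_prompts(queue_dir: Path) -> list[Path]:
    """Return audio files waiting in the queue, oldest first."""
    if not queue_dir.is_dir():
        return []
    files = [
        p for p in queue_dir.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    ]
    # Sort by modification time so prompts are handled in arrival order;
    # the file name breaks ties.
    return sorted(files, key=lambda p: (p.stat().st_mtime, p.name))


if __name__ == "__main__":
    for prompt in list_queued_prompts(Path("prompts/to-process")):
        print(prompt.name)
```

Non-audio files in the directory are skipped, and a missing directory yields an empty queue rather than an error.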
Processing
- Transcription: Google Gemini 3 Flash transcribes the voice prompt
- Metadata: Gemini 3 Flash generates episode metadata (title, description, tags)
- Audio Processing: FFmpeg normalizes and prepares prompt audio
- Format Conversion: Ensures compatible audio format for concatenation
Technology: Gemini 3 Flash for transcription and metadata, FFmpeg for audio processing
Generation
- Research: Tavily provides research augmentation for episode content
- Script Generation: Nano Banana creates dialogue script between Corn and Herman
- Cover Art: Nano Banana Pro generates unique episode artwork (3 variants)
- TTS Dialogue: Inworld AI TTS generates voice audio with character personalities
Technology: Tavily for research, Nano Banana for scripting, Nano Banana Pro for images, Inworld AI for TTS
Assembly
- Combines intro jingle, disclaimer, user prompt, AI dialogue, and outro
- Loudness normalization to -16 LUFS (podcast standard)
- MP3 encoding at 192kbps, 44.1kHz
Technology: FFmpeg for audio assembly and normalization
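One way the assembly step might be expressed is a single FFmpeg invocation that concatenates the segments and applies the `loudnorm` filter. The sketch below only builds the command list. The helper name `build_assembly_command`, the true-peak and loudness-range values (`TP=-1.5`, `LRA=11`), and the single-pass approach are assumptions; only the -16 LUFS target, 192 kbps bitrate, and 44.1 kHz sample rate come from the list above.

```python
from pathlib import Path


def build_assembly_command(segments: list[Path], output: Path) -> list[str]:
    """Build an ffmpeg command that joins segments and normalizes loudness."""
    if not segments:
        raise ValueError("at least one audio segment is required")

    cmd = ["ffmpeg", "-y"]
    for segment in segments:
        cmd += ["-i", str(segment)]

    # Concatenate every audio input, then normalize to -16 LUFS.
    # TP and LRA are assumed values, not taken from the pipeline.
    inputs = "".join(f"[{i}:a]" for i in range(len(segments)))
    filter_graph = (
        f"{inputs}concat=n={len(segments)}:v=0:a=1[joined];"
        "[joined]loudnorm=I=-16:TP=-1.5:LRA=11[out]"
    )

    cmd += [
        "-filter_complex", filter_graph,
        "-map", "[out]",
        "-c:a", "libmp3lame",  # MP3 encoder
        "-b:a", "192k",        # 192 kbps
        "-ar", "44100",        # 44.1 kHz
        str(output),
    ]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Passing the segments in playback order (intro jingle, disclaimer, user prompt, AI dialogue, outro) gives the episode layout described above.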
Publishing
- CDN Upload: Audio and images uploaded to Cloudinary
- Archive: Full episode backed up to Wasabi S3-compatible storage
- Database: Metadata inserted into Neon PostgreSQL
- Blog Post: Markdown file generated for Astro static site
Technology: Cloudinary CDN, Wasabi object storage, Neon PostgreSQL
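The blog post step could look roughly like the following sketch, which renders a Markdown file with YAML frontmatter using only the standard library. The function name and every frontmatter field (`title`, `description`, `pubDate`, `tags`, `audioUrl`, `coverUrl`) are hypothetical; the site's real content schema is not shown on this page.

```python
import json
from datetime import date


def render_episode_post(
    title: str,
    description: str,
    tags: list[str],
    audio_url: str,
    cover_url: str,
    published: date,
) -> str:
    """Render a Markdown post with YAML frontmatter (field names are hypothetical)."""
    def quote(value) -> str:
        # JSON strings and arrays are also valid YAML scalars and flow lists.
        return json.dumps(value, ensure_ascii=False)

    lines = [
        "---",
        f"title: {quote(title)}",
        f"description: {quote(description)}",
        f"pubDate: {published.isoformat()}",
        f"tags: {quote(tags)}",
        f"audioUrl: {quote(audio_url)}",
        f"coverUrl: {quote(cover_url)}",
        "---",
        "",
        description,
        "",
    ]
    return "\n".join(lines)
```

Quoting through `json.dumps` keeps titles containing colons or quotation marks from breaking the frontmatter.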
Deployment
- Git push triggers automatic Vercel deployment
- New episode goes live on website within minutes
- RSS feed automatically updated for podcast apps
Technology: Vercel auto-deploy, Astro static site generator
Voice Capture
The pipeline processes voice prompts from a local queue. Record your question using any audio recording app and drop it into the processing queue.
- Use any voice recorder app to capture your prompt
- Place the audio file in the prompts/to-process/ directory
- Run the episode generation script to process the queue
- The episode is then published and deployed to the website automatically
Technology Stack
AI Services
- Google Gemini 3 Flash (Transcription, Metadata)
- Tavily (Research Augmentation)
- Nano Banana (Episode Generation)
- Nano Banana Pro (Cover Art)
- Inworld AI TTS
- Flux Schnell (via fal.ai)
- Replicate (backup)
Input & Integration
- Local Queue Processing
- Python Pipeline
- FFmpeg (Audio)
Storage
- Cloudinary (CDN)
- Wasabi S3 (Archive)
- Neon PostgreSQL
- GitHub (Source)
Deployment
- Astro (Static Site)
- Vercel (Hosting)
- GitHub Actions
Episode Output
For each episode, the pipeline creates:
Cost Estimate
| Service | Cost per Episode | Notes |
|---|---|---|
| Inworld AI TTS | ~$0.30-0.40 | 15-minute episode |
| Image Generation | ~$0.01-0.05 | 3 cover variants |
| Transcription | Minimal | Free tier |
| Storage | ~$0.01 | Wasabi + Cloudinary |
| Total per Episode | ~$0.35-0.50 | Approximate |
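As a quick check on the arithmetic, the itemized rows can be summed directly. This sketch works in whole cents to avoid floating-point rounding and treats the "Minimal" transcription cost as zero, which is an assumption.

```python
# Per-episode cost ranges in US cents, taken from the table above.
# Transcription is assumed to be zero because the table lists it as free tier.
COSTS_CENTS = {
    "Inworld AI TTS": (30, 40),
    "Image Generation": (1, 5),
    "Transcription": (0, 0),
    "Storage": (1, 1),
}


def total_range_cents(costs: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low and high ends of every cost range."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


low, high = total_range_cents(COSTS_CENTS)
print(f"${low / 100:.2f} to ${high / 100:.2f}")  # $0.32 to $0.46
```

The itemized rows sum to about $0.32 to $0.46, so the table's ~$0.35-0.50 total reads as a rounded-up approximation.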
Key Features
- Cross-Platform Input: Send voice messages via Telegram from any device
- Voice Synthesis: Inworld AI TTS creates natural-sounding AI hosts with distinct personalities
- AI Art Generation: Unique cover artwork for every episode using Flux AI
- Fully Automated: Voice prompt to published episode in minutes
- Production Quality: Professional audio normalization and podcast standards
- Status Notifications: Get notified via Telegram when your episode is ready
Open Source
The entire pipeline is open source and available on GitHub. View the code, contribute improvements, or adapt it for your own podcast automation projects.
Previous Versions
Documentation for previous pipeline iterations is preserved for reference: