#text-to-speech
8 episodes
#2192: How We Built a Podcast Pipeline
Hilbert reveals the complete technical architecture behind 2,000+ episodes—from voice memos to GPU-powered TTS, with Claude models, LangGraph workf...
#2027: Text-In, Text-Out: The Missing Photoshop for Words
Why is editing text with AI so clunky? We explore the "TITO" paradigm—using small, local models for fast, private text transformation.
#1810: Why Your TTS Sounds Great in English, Terrible Everywhere Else
English AI voices are polished, but global languages hit a wall. Here's why text-to-speech breaks down for Hebrew, Hindi, and beyond.
#1809: The TTS Developer's Dilemma: Size vs. Speed
Stop guessing. We break down the critical trade-offs between model size, latency, and sample rate for production-ready voice apps.
#1808: The 82M Parameter Voice That Beat Billion-Dollar AI
How a model the size of a tweet outperforms billion-dollar giants in the race for perfect AI speech.
#1740: Chatterbox TTS: Open Source vs. ElevenLabs
We dissect Resemble AI's Chatterbox to see how its open-source TTS compares to commercial giants like ElevenLabs.
#1715: Why Voice Agents Need Frameworks (Not Just APIs)
Raw APIs handle models, but who manages the audio plumbing? We break down Vapi, LiveKit, and Pipecat.
#136: The Ghost in the Machine: Why AI Voices Hallucinate
Why does your AI suddenly start shouting or whispering like Darth Vader? Herman and Corn dive into the glitchy world of TTS hallucinations.