Text-to-Speech
TTS engines, voice synthesis, audio generation
9 episodes
#3189: Drawing the Melody: SSML's Hidden Power
How SSML gives developers narrative control over AI voices — and why ElevenLabs became its center of gravity.
#2982: Why Your TTS Model Nails "Shabbat" but Not "Keren Hishtalmut"
Why multilingual TTS models handle loanwords but fail at niche vocabulary — and what you can do about it.
#2914: Can AI Read the Room? TTS Prosody Explained
Can TTS models truly infer emotion from text, or just mimic patterns? We break down the science of prosody.
#2618: Text Normalization's Hidden Complexity
How to handle acronyms in text-to-speech pipelines using BERT models, lexicons, and layered preprocessing.
#2443: How Podcast RSS Feeds Can Speak Every Language
One RSS feed, a transcript tag, and TTS voice cloning — the emerging standard for letting any podcast speak any language.
#1809: The TTS Developer's Dilemma: Size vs. Speed
Stop guessing. We break down the critical trade-offs between model size, latency, and sample rate for production-ready voice apps.
#1808: The Architecture That Made AI Voices Run on a Raspberry Pi
How a model the size of a tweet outperforms billion-dollar giants in the race for perfect AI speech.
#1778: Audio Is the New "Read Later" Graveyard
Why listening to AI conversations beats reading dense PDFs, and how serverless GPUs make it cheap.
#1724: When AI Dubbing Swaps Your Gender
How does YouTube translate a video with one click? We explore the tech behind auto-dubbing, from sandwich models to voice cloning.