#text-to-speech

15 episodes

#2982: Why Your TTS Model Nails "Shabbat" but Not "Keren Hishtalmut"

Why multilingual TTS models handle loanwords but fail at niche vocabulary — and what you can do about it.

#text-to-speech #tokenization #fine-tuning

#2914: Can AI Read the Room? TTS Prosody Explained

Can TTS models truly infer emotion from text, or just mimic patterns? We break down the science of prosody.

#text-to-speech #speech-to-speech #audio-processing

#2618: Text Normalization's Hidden Complexity

How to handle acronyms in text-to-speech pipelines using BERT models, lexicons, and layered preprocessing.

#text-to-speech #speech-recognition #audio-processing

#2591: Decoupling Script from Voice

How dynamic voice replacement could let listeners choose who narrates each host's lines.

#voice-cloning #text-to-speech #audio-processing

#2534: Can AI Generate Diagrams Without Typo Disasters?

Why AI diagram tools still mangle text labels — and what to do about it today.

#image-generation #prompt-engineering #text-to-speech

#2311: Danish AI: Bridging the Localization Gap

How does AI handle Danish? Explore the challenges and progress in making AI tools work for small-language populations.

#speech-recognition #text-to-speech #large-language-models

#2303: The Serverless Paradox: Why TTS Eats Your Budget

How batch processing and smart queue management can slash TTS costs for episodic podcast production.

#text-to-speech #serverless-gpu #voice-cloning

#2192: How We Built a Podcast Pipeline

Hilbert reveals the complete technical architecture behind 2,000+ episodes—from voice memos to GPU-powered TTS, with Claude models, LangGraph workf...

#prompt-engineering #speech-recognition #text-to-speech

#2027: The Missing Photoshop for Words

Why is editing text with AI so clunky? We explore the "TITO" paradigm—using small, local models for fast, private text transformation.

#local-ai #text-to-speech #speech-recognition

#1810: Why Your TTS Sounds Great in English, Terrible Everywhere Else

English AI voices are polished, but global languages hit a wall. Here's why text-to-speech breaks down for Hebrew, Hindi, and beyond.

#text-to-speech #linguistics #data-integrity

#1809: The TTS Developer's Dilemma: Size vs. Speed

Stop guessing. We break down the critical trade-offs between model size, latency, and sample rate for production-ready voice apps.

#text-to-speech #gpu-acceleration #edge-computing

#1808: The Architecture That Made AI Voices Run on a Raspberry Pi

How a model the size of a tweet outperforms billion-dollar giants in the race for perfect AI speech.

#open-source-ai #small-language-models #text-to-speech

#1740: Why Open Source Is a Power Tool Strategy

We dissect Resemble AI's Chatterbox to see how its open-source TTS compares to commercial giants like ElevenLabs.

#text-to-speech #open-source #prosody-control

#1715: Why Voice Agents Need Frameworks (Not Just APIs)

Raw APIs handle models, but who manages the audio plumbing? We break down Vapi, LiveKit, and Pipecat.

#speech-recognition #text-to-speech #conversational-ai

#136: The Ghost in the Machine: Why AI Voices Hallucinate

Why does your AI suddenly start shouting or whispering like Darth Vader? Herman and Corn dive into the glitchy world of TTS hallucinations.

#text-to-speech #hallucinations #autoregressive-models #audio-glitches #latent-space