#text-to-speech

16 episodes

Jun 1

#3189: Drawing the Melody: SSML's Hidden Power

How SSML gives developers narrative control over AI voices — and why ElevenLabs became its center of gravity.

text-to-speech, audio-engineering, conversational-ai

May 22

#2982: Why Your TTS Model Nails "Shabbat" but Not "Keren Hishtalmut"

Why multilingual TTS models handle loanwords but fail at niche vocabulary — and what you can do about it.

text-to-speech, tokenization, fine-tuning

May 18

#2914: Can AI Read the Room? TTS Prosody Explained

Can TTS models truly infer emotion from text, or just mimic patterns? We break down the science of prosody.

text-to-speech, speech-to-speech, audio-processing

May 3

#2618: Text Normalization's Hidden Complexity

How to handle acronyms in text-to-speech pipelines using BERT models, lexicons, and layered preprocessing.

text-to-speech, speech-recognition, audio-processing

May 2

#2591: Decoupling Script from Voice

How dynamic voice replacement could let listeners choose who narrates each host's lines.

voice-cloning, text-to-speech, audio-processing

Apr 29

#2534: Can AI Generate Diagrams Without Typo Disasters?

Why AI diagram tools still mangle text labels — and what to do about it today.

image-generation, prompt-engineering, text-to-speech

Apr 19

#2311: Danish AI: Bridging the Localization Gap

How does AI handle Danish? Explore the challenges and progress in making AI tools work for small-language populations.

speech-recognition, text-to-speech, large-language-models

Apr 18

#2303: The Serverless Paradox: Why TTS Eats Your Budget

How batch processing and smart queue management can slash TTS costs for episodic podcast production.

text-to-speech, serverless-gpu, voice-cloning

Apr 12

#2192: How We Built a Podcast Pipeline

Hilbert reveals the complete technical architecture behind 2,000+ episodes—from voice memos to GPU-powered TTS, with Claude models, LangGraph workf...

prompt-engineering, speech-recognition, text-to-speech

Apr 5

#2027: The Missing Photoshop for Words

Why is editing text with AI so clunky? We explore the "TITO" paradigm—using small, local models for fast, private text transformation.

local-ai, text-to-speech, speech-recognition

Mar 31

#1810: Why Your TTS Sounds Great in English, Terrible Everywhere Else

English AI voices are polished, but global languages hit a wall. Here's why text-to-speech breaks down for Hebrew, Hindi, and beyond.

text-to-speech, linguistics, data-integrity

Mar 31

#1809: The TTS Developer's Dilemma: Size vs. Speed

Stop guessing. We break down the critical trade-offs between model size, latency, and sample rate for production-ready voice apps.

text-to-speech, gpu-acceleration, edge-computing

Mar 31

#1808: The Architecture That Made AI Voices Run on a Raspberry Pi

How a model the size of a tweet outperforms billion-dollar giants in the race for perfect AI speech.

open-source-ai, small-language-models, text-to-speech

Mar 29

#1740: Why Open Source Is a Power Tool Strategy

We dissect Resemble AI's Chatterbox to see how its open-source TTS compares to commercial giants like ElevenLabs.

text-to-speech, open-source, prosody-control

Mar 29

#1715: Why Voice Agents Need Frameworks (Not Just APIs)

Raw APIs handle models, but who manages the audio plumbing? We break down Vapi, LiveKit, and Pipecat.

speech-recognition, text-to-speech, conversational-ai

Jan 2

#136: The Ghost in the Machine: Why AI Voices Hallucinate

Why does your AI suddenly start shouting or whispering like Darth Vader? Herman and Corn dive into the glitchy world of TTS hallucinations.

text-to-speech, hallucinations, autoregressive-models, audio-glitches, latent-space

