← All Tags

#text-to-speech

8 episodes

#2192: How We Built a Podcast Pipeline

Hilbert reveals the complete technical architecture behind 2,000+ episodes—from voice memos to GPU-powered TTS, with Claude models, LangGraph workf...

prompt-engineeringspeech-recognitiontext-to-speech

#2027: Text-In, Text-Out: The Missing Photoshop for Words

Why is editing text with AI so clunky? We explore the "TITO" paradigm—using small, local models for fast, private text transformation.

local-aitext-to-speechspeech-recognition

#1810: Why Your TTS Sounds Great in English, Terrible Everywhere Else

English AI voices are polished, but global languages hit a wall. Here's why text-to-speech breaks down for Hebrew, Hindi, and beyond.

text-to-speechlinguisticsdata-integrity

#1809: The TTS Developer's Dilemma: Size vs. Speed

Stop guessing. We break down the critical trade-offs between model size, latency, and sample rate for production-ready voice apps.

text-to-speechgpu-accelerationedge-computing

#1808: The 82M Parameter Voice That Beat Billion-Dollar AI

How a model the size of a tweet outperforms billion-dollar giants in the race for perfect AI speech.

open-source-aismall-language-modelstext-to-speech

#1740: Chatterbox TTS: Open Source vs. ElevenLabs

We dissect Resemble AI's Chatterbox to see how its open-source TTS compares to commercial giants like ElevenLabs.

text-to-speechopen-sourceprosody-control

#1715: Why Voice Agents Need Frameworks (Not Just APIs)

Raw APIs handle models, but who manages the audio plumbing? We break down Vapi, LiveKit, and Pipecat.

speech-recognitiontext-to-speechconversational-ai

#136: The Ghost in the Machine: Why AI Voices Hallucinate

Why does your AI suddenly start shouting or whispering like Darth Vader? Herman and Corn dive into the glitchy world of TTS hallucinations.

text-to-speechhallucinationsautoregressive-modelsaudio-glitcheslatent-space