Speech & Audio

Voice AI, speech recognition, and audio processing

32 episodes · Page 2 of 2

#2272: The AI Transcription Sweet Spot

Does higher-quality audio make AI transcription worse? New research reveals a surprising "sweet spot" for bitrate, challenging a core assumption of...

speech-recognition · audio-processing · ai-training

Speech-to-Text

Apr 12

#2183: Making Voice Agents Feel Natural

Turn-taking, interruptions, and latency are destroying voice AI UX—and the fixes are deeply technical. Here's what's actually happening underneath.

speech-recognition · conversational-ai · latency

Audio Processing

Mar 31

#1809: The TTS Developer's Dilemma: Size vs. Speed

Stop guessing. We break down the critical trade-offs between model size, latency, and sample rate for production-ready voice apps.

text-to-speech · gpu-acceleration · edge-computing

Text-to-Speech

Mar 31

#1808: The Architecture That Made AI Voices Run on a Raspberry Pi

How a model the size of a tweet outperforms billion-dollar giants in the race for perfect AI speech.

open-source-ai · small-language-models · text-to-speech

Text-to-Speech

Mar 31

#1800: Hacking the Brain's Alarm System

Why some sounds make your skin crawl: the science of emergency alerts.

audio-processing · human-computer-interaction · emergency-preparedness

Audio Processing

Mar 30

#1778: Audio Is the New "Read Later" Graveyard

Why listening to AI conversations beats reading dense PDFs, and how serverless GPUs make it cheap.

audio-processing · serverless-gpu · rag

Text-to-Speech

Mar 29

#1752: Whisper Small Beats Whisper Large in Speed & Accuracy

A 4-GPU benchmark on Ubuntu shows the 1.5B-parameter Whisper Large is slower and less accurate than the much smaller Whisper Small.

speech-recognition · gpu-acceleration · latency

Speech-to-Text

Mar 29

#1724: When AI Dubbing Swaps Your Gender

How does YouTube translate a video with one click? We explore the tech behind auto-dubbing, from sandwich models to voice cloning.

speech-to-speech · voice-cloning · multimodal-ai

Text-to-Speech