Speech & Audio

Speech-to-Text

Whisper, ASR, transcription, voice typing

8 episodes

#2754: Why Your Dictation Setup Might Be Wrong

Modern ASR is shockingly robust. The biggest predictor of accuracy? How well your audio matches its training data.

automatic-speech-recognition, speech-recognition, audio-processing

#2707: Foot Pedals vs USB Buttons: The Ergonomics of Dictation

Foot pedals, USB buttons, and under-desk macro pads for voice dictation — a deep dive into the hardware that makes AI dictation work.

ergonomics, audio-engineering, hardware-engineering

#2512: How Speech-to-Speech Models Eliminate the Robot Voice

Why AI voice agents sound robotic, and how natively integrated speech-to-speech models fix it.

speech-to-speech, audio-processing, latency

#2510: The Design That Makes Voice Agents Tolerable

Drive-thru accuracy, healthcare triage, and the design secret that makes people *want* to talk to a machine.

voice-first, accessibility, speech-recognition

#2479: The Screaming Baby Stress Test

Choosing the right headset and control method for dictation when you're holding a baby who won't stop screaming.

speech-recognition, voice-first, diy

#2311: Danish AI: Bridging the Localization Gap

How does AI handle Danish? Explore the challenges and progress in making AI tools work for small-language populations.

speech-recognition, text-to-speech, large-language-models

#2272: The AI Transcription Sweet Spot

Does higher-quality audio make AI transcription worse? New research reveals a surprising "sweet spot" for bitrate, challenging a core assumption of...

speech-recognition, audio-processing, ai-training

#1752: Whisper Small Beats Whisper Large in Speed & Accuracy

A 4-GPU benchmark on Ubuntu shows the 1.5B-parameter Whisper Large is slower and less accurate than the much smaller Whisper Small.

speech-recognition, gpu-acceleration, latency