Speech & Audio
Voice AI, speech recognition, and audio processing
26 episodes
#332: Who’s Talking? The Tech of Speaker Identification
Herman and Corn break down the difference between speaker diarization and identification to help automate meeting transcripts.
#296: Sonic Sorcery: Mapping Spatial Audio in Small Spaces
Discover how spatial audio and room mapping can turn a tiny rental bedroom into a cinematic powerhouse without drilling a single hole.
#233: The Sound Spotlight: How Beamforming Redefines Audio
Discover how math and physics turn simple microphones into "sound spotlights" that can isolate a single voice in even the noisiest environments.
#196: Beyond the Robot: The Science of Modern Voice Cloning
Herman and Corn dive into the mechanics of neural text-to-speech, exploring how AI masters human prosody and the "average voice" accent problem.
#153: Designing the Voice-First Workspace: IKEA for AI Pros
Learn how to transform your home office into a high-performance voice-first workspace using acoustic hygiene and ergonomic IKEA furniture hacks.
#142: Breaking the Voice Wall: The Future of Native Speech AI
Explore why native speech-to-speech AI is 20x more expensive than text pipelines and how "semantic VAD" is solving the awkward silence problem.
#136: The Ghost in the Machine: Why AI Voices Hallucinate
Why does your AI suddenly start shouting or whispering like Darth Vader? Herman and Corn dive into the glitchy world of TTS hallucinations.
#120: Silencing the Siren: Real-Time AI Noise Reduction
How do phones remove sirens and crying babies in real time? Explore the neural networks and hardware making crystal-clear audio possible.
#109: Teaching AI to Hear: Solving the Custom Dictionary Dilemma
Tired of AI mishearing brand names? Learn how to build efficient custom dictionaries for Gemini 1.5 without breaking the bank.
#99: Beyond the Headset: Pro Audio for AI Voice Control
Tired of headsets? Herman and Corn explore professional microphone setups for seamless, high-accuracy AI voice dictation from a distance.
#69: Unsung Hero: The Gooseneck Mic's AI Power
The gooseneck mic: a humble hero with surprising AI power. Discover its secret to crystal-clear speech-to-text accuracy!
#58: Clean Audio, Messy Reality: Noise Removal for Voice-to-Text
Fussy baby, clean audio? We dive into noise removal for voice-to-text. Discover why cleaner audio can transcribe worse.
#57: From Lawyers in Limousines to Developers in Their PJs: The Voice Tech Revolution
From limo-riding lawyers to pajama-clad coders, voice tech is booming. Discover how AI is making it a force for good.
#33: The Unseen Magic of AI's Ears: Decoding VAD
Ever wonder how your AI knows you're talking? We're diving deep into voice activity detection (VAD), the unseen magic behind AI's ears.
#29: The Multimodal Audio Revolution: A Screen-Free Future?
Is multimodal audio the future? We explore if AI can truly displace traditional speech-to-text for a screen-free world.
#22: Mic Check: Mastering AI Dictation Hardware
Uncover the secrets to perfect AI dictation! Corn and Herman explore the ultimate speech-to-text hardware.
#26: Personalizing Whisper: The Voice Typing Revolution
Voice typing is changing everything. Join us as we explore the revolution of personalizing Whisper!
#15: AI Gets Personal: The Power of Voice Fine-Tuning
AI that understands *your* voice? Dive into the fascinating world of fine-tuning and discover how AI gets personal.
#9: Benchmarking Custom ASR Tools - Beyond The WER
Benchmarking custom ASR fine-tunes: We're diving deep beyond word error rate (WER) to truly measure performance.
#4: If Your Voice Ages, Does Your Fine-Tune Become Useless?
Your voice changes, but your fine-tuned model shouldn't become useless. We explore the biology of the larynx and ASR.
#7: Building Custom ASR Tools
Ever wondered how to build your own ASR tools from scratch? Discover the why and how in this episode!
#3: Safetensors or something else: STT inference formats explained
Unpacking ASR weight formats: Safetensors and beyond. Tune in to understand the distinctions.
#6: How To Fine Tune Whisper
Build your own AI transcription tool! We'll walk you through fine-tuning Whisper, from data to notebook.
#8: Building Your Own Whisper
Ever wondered if you could build your own speech recognition tool? We dive deep into crafting custom ASR.
#10: How ASR Went From Frustration To ... Whisper Magic
Speech to text: from frustrating to fantastic. Uncover the magic behind its rapid rise and connection to the AI boom!
#5: Fine-Tuning ASR For Maximal Usability
Fine-tuned ASR is just the start. Discover the next steps for deployment and maximizing usability.