Speech & Audio

Voice AI, speech recognition, and audio processing

26 episodes

#332: Who’s Talking? The Tech of Speaker Identification

Herman and Corn break down the difference between speaker diarization and identification to help automate meeting transcripts.

speaker-diarization, voice-embeddings, speaker-identification

#296: Sonic Sorcery: Mapping Spatial Audio in Small Spaces

Discover how spatial audio and room mapping can turn a tiny rental bedroom into a cinematic powerhouse without drilling a single hole.

spatial-audio, acoustic-telemetry, room-mapping

#233: The Sound Spotlight: How Beamforming Redefines Audio

Discover how math and physics turn simple microphones into "sound spotlights" that can isolate a single voice in even the noisiest environments.

beamforming-technology, microphone-arrays, digital-signal-processing

#196: Beyond the Robot: The Science of Modern Voice Cloning

Herman and Corn dive into the mechanics of neural text-to-speech, exploring how AI masters human prosody and the "average voice" accent problem.

neural-text-to-speech, voice-cloning, generative-modeling

#153: Designing the Voice-First Workspace: IKEA for AI Pros

Learn how to transform your home office into a high-performance voice-first workspace using acoustic hygiene and ergonomic IKEA furniture hacks.

voice-first, acoustic-hygiene, ikea, workspace, ergonomics

#142: Breaking the Voice Wall: The Future of Native Speech AI

Explore why native speech-to-speech AI is 20x more expensive than text pipelines and how "semantic VAD" is solving the awkward silence problem.

large-language-models, local-ai, speech-to-speech

#136: The Ghost in the Machine: Why AI Voices Hallucinate

Why does your AI suddenly start shouting or whispering like Darth Vader? Herman and Corn dive into the glitchy world of TTS hallucinations.

text-to-speech, hallucinations, autoregressive-models, audio-glitches, latent-space

#120: Silencing the Siren: Real-Time AI Noise Reduction

How do phones remove sirens and crying babies in real time? Explore the neural networks and hardware making crystal-clear audio possible.

noise-reduction, audio-engineering, neural-networks, mobile-devices, edge-computing

#109: Teaching AI to Hear: Solving the Custom Dictionary Dilemma

Tired of AI mishearing brand names? Learn how to build efficient custom dictionaries for Gemini 1.5 without breaking the bank.

automatic-speech-recognition, custom-dictionaries, gemini-15, context-bloat, dynamic-hint-system

#99: Beyond the Headset: Pro Audio for AI Voice Control

Tired of headsets? Herman and Corn explore professional microphone setups for seamless, high-accuracy AI voice dictation from a distance.

voice-dictation, ai-accuracy, microphones, audio-quality, signal-to-noise-ratio

#69: Unsung Hero: The Gooseneck Mic's AI Power

The gooseneck mic: a humble hero with surprising AI power. Discover its secret to crystal-clear speech-to-text accuracy!

gooseneck-mic, speech-to-text, microphone, ai-voice-capture, audio-technology

#58: Clean Audio, Messy Reality: Noise Removal for Voice-to-Text

Fussy baby, clean audio? We dive into noise removal for voice-to-text and discover why cleaner audio can sometimes produce worse transcripts.

noise-removal, voice-to-text, audio-processing, signal-processing, neural-networks

#57: From Lawyers in Limousines to Developers in Their PJs: The Voice Tech Revolution

From limo-riding lawyers to pajama-clad coders, voice tech is booming. Discover how AI is making it a force for good.

voice-technology, accessibility, productivity

#33: The Unseen Magic of AI's Ears: Decoding VAD

Ever wonder how your AI knows you're talking? We're diving deep into VAD, the unseen magic behind AI's ears.

voice-activity-detection, vad, speech-recognition, asr, speech-to-text

#29: The Multimodal Audio Revolution: A Screen-Free Future?

Is multimodal audio the future? We explore whether AI can truly displace traditional speech-to-text in a screen-free world.

multimodal-audio, speech-to-text, screen-free, audio-ai, accessibility

#22: Mic Check: Mastering AI Dictation Hardware

Uncover the secrets to perfect AI dictation! Corn and Herman explore the ultimate speech-to-text hardware.

large-language-models, speech-recognition, audio-hardware

#26: Personalizing Whisper: The Voice Typing Revolution

Voice typing is changing everything. Join us as we explore how personalizing Whisper is driving that revolution!

speech-recognition, fine-tuning, transformers

#15: AI Gets Personal: The Power of Voice Fine-Tuning

AI that understands *your* voice? Dive into the fascinating world of fine-tuning and discover how AI gets personal.

fine-tuning, speech-recognition, personalized-ai

#9: Benchmarking Custom ASR Tools - Beyond The WER

Benchmarking custom ASR fine-tunes: we look beyond the word error rate (WER) to truly measure performance.

asr, benchmarking, wer, speech-recognition, fine-tuning

#4: If Your Voice Ages, Does Your Fine-Tune Become Useless?

Your voice changes, but your fine-tuned model shouldn't become useless. We explore the biology of the larynx and what it means for ASR.

speech-recognition, fine-tuning, vocal-physiology

#7: Building Custom ASR Tools

Ever wondered how to build your own ASR tools from scratch? Discover the why and how in this episode!

asr, speech-recognition, custom-asr, machine-learning, speech-to-text

#3: Safetensors or something else: STT inference formats explained

Unpacking ASR weight formats: Safetensors and beyond. Tune in to understand the distinctions.

safetensors, asr, speech-recognition, inference, weight-formats

#6: How To Fine Tune Whisper

Build your own AI transcription tool! We'll walk you through fine-tuning Whisper, from data to notebook.

fine-tuning, speech-recognition, gpu-acceleration

#8: Building Your Own Whisper

Ever wondered if you could build your own speech recognition tool? We dive deep into crafting custom ASR.

asr, speech-recognition, whisper, machine-learning, audio-processing

#10: How ASR Went From Frustration To ... Whisper Magic

Speech-to-text: from frustrating to fantastic. Uncover the magic behind its rapid rise and its connection to the AI boom!

automatic-speech-recognition, speech-to-text, asr-technology

#5: Fine-Tuning ASR For Maximal Usability

Fine-tuned ASR is just the start. Discover the next steps for deployment and maximizing usability.

asr, speech-recognition, fine-tuning, deployment, usability