Speech & Audio

Speech-to-Text

Whisper, ASR, transcription, voice typing

14 episodes

#142: Breaking the Voice Wall: The Future of Native Speech AI

Explore why native speech-to-speech AI is 20x more expensive than text pipelines and how "semantic VAD" is solving the awkward silence problem.

large-language-models, local-ai, speech-to-speech

#109: Teaching AI to Hear: Solving the Custom Dictionary Dilemma

Tired of AI mishearing brand names? Learn how to build efficient custom dictionaries for Gemini 1.5 without breaking the bank.

automatic-speech-recognition, custom-dictionaries, gemini-15, context-bloat, dynamic-hint-system

#69: Unsung Hero: The Gooseneck Mic's AI Power

The gooseneck mic: a humble hero with surprising AI power. Discover its secret to crystal-clear speech-to-text accuracy!

gooseneck-mic, speech-to-text, microphone, ai-voice-capture, audio-technology

#57: From Lawyers in Limousines to Developers in Their PJs: The Voice Tech Revolution

From limo-riding lawyers to pajama-clad coders, voice tech is booming. Discover how AI is making it a force for good.

voice-technology, accessibility, productivity

#29: The Multimodal Audio Revolution: A Screen-Free Future?

Is multimodal audio the future? We explore if AI can truly displace traditional speech-to-text for a screen-free world.

multimodal-audio, speech-to-text, screen-free, audio-ai, accessibility

#26: Personalizing Whisper: The Voice Typing Revolution

Voice typing is changing everything. Join us as we explore the revolution of personalizing Whisper!

speech-recognition, fine-tuning, transformers

#22: Mic Check: Mastering AI Dictation Hardware

Uncover the secrets to perfect AI dictation! Corn and Herman explore the ultimate speech-to-text hardware.

large-language-models, speech-recognition, audio-hardware

#8: Building Your Own Whisper

Ever wondered if you could build your own speech recognition tool? We dive deep into crafting custom ASR.

asr, speech-recognition, whisper, machine-learning, audio-processing

#7: Building Custom ASR Tools

Ever wondered how to build your own ASR tools from scratch? Discover the why and how in this episode!

asr, speech-recognition, custom-asr, machine-learning, speech-to-text

#10: How ASR Went From Frustration To ... Whisper Magic

Speech to text: from frustrating to fantastic. Uncover the magic behind its rapid rise and connection to the AI boom!

automatic-speech-recognition, speech-to-text, asr-technology

#3: Safetensors or something else: STT inference formats explained

Unpacking ASR weight formats: Safetensors and beyond. Tune in to understand the distinctions.

safetensors, asr, speech-recognition, inference, weight-formats

#6: How To Fine Tune Whisper

Build your own AI transcription tool! We'll walk you through fine-tuning Whisper, from data to notebook.

fine-tuning, speech-recognition, gpu-acceleration

#9: Benchmarking Custom ASR Tools - Beyond The WER

Benchmarking custom ASR fine-tunes: We're diving deep beyond the WER to truly measure performance.

asr, benchmarking, wer, speech-recognition, fine-tuning

#5: Fine-Tuning ASR For Maximal Usability

Fine-tuned ASR is just the start. Discover the next steps for deployment and maximizing usability.

asr, speech-recognition, fine-tuning, deployment, usability