#speech-recognition
22 episodes
#2192: How We Built a Podcast Pipeline
Hilbert reveals the complete technical architecture behind 2,000+ episodes—from voice memos to GPU-powered TTS, with Claude models, LangGraph workf...
#2183: Making Voice Agents Feel Natural
Turn-taking, interruptions, and latency are destroying voice AI UX—and the fixes are deeply technical. Here's what's actually happening underneath.
#2027: Text-In, Text-Out: The Missing Photoshop for Words
Why is editing text with AI so clunky? We explore the "TITO" paradigm—using small, local models for fast, private text transformation.
#1752: Whisper Small Beats Whisper Large in Speed & Accuracy
A 4GPU benchmark on Ubuntu shows the 1.5B-parameter Whisper Large is slower and less accurate than the much smaller Whisper Small.
#1715: Why Voice Agents Need Frameworks (Not Just APIs)
Raw APIs handle models, but who manages the audio plumbing? We break down Vapi, LiveKit, and Pipecat.
#1634: Agent Interview: Inception Mercury 2
Meet Mercury 2, the Abu Dhabi-based AI using diffusion architecture to cut costs and boost wit.
#1601: Cohere: The Switzerland of Enterprise AI
While others chase viral memes, Cohere is quietly building the secure, cloud-agnostic infrastructure powering the global enterprise.
#1539: The Voice Keyboard: Killing the "Digital Sandwich"
Stop shouting at your phone. Discover how dedicated hardware and local AI are making instant, private voice-to-text a reality.
#868: Beyond the Digital Sandwich: Pro Mobile Mics for AI
Stop holding your phone like a piece of toast. Explore the best mobile microphone setups for high-quality AI voice transcription.
#682: The Secret Power of Your Smartphone’s Tiny Microphones
Why does a phone mic outperform a pro headset for AI transcription? Herman and Corn dive into the physics of MEMS and the truth about audio quality.
#33: The Unseen Magic of AI's Ears: Decoding VAD
Ever wonder how your AI knows you're talking? We're diving deep into VAD, the unseen magic behind AI's ears.
#22: Mic Check: Mastering AI Dictation Hardware
Uncover the secrets to perfect AI dictation! Corn and Herman explore the ultimate speech-to-text hardware.
#26: Personalizing Whisper: The Voice Typing Revolution
Voice typing is changing everything. Join us as we explore the revolution of personalizing Whisper!
#15: AI Gets Personal: The Power of Voice Fine-Tuning
AI that understands *your* voice? Dive into the fascinating world of fine-tuning and discover how AI gets personal.
#9: Benchmarking Custom ASR Tools - Beyond The WER
Benchmarking custom ASR fine-tunes: We're diving deep beyond the WER to truly measure performance.
#3: Safetensors or something else: STT inference formats explained
Unpacking ASR weight formats: Safetensors and beyond. Tune in to understand the distinctions.
#4: If Your Voice Ages, Does Your Fine-Tune Become Useless?
Your voice changes, but your fine-tuned model shouldn't become useless. We explore the biology of the larynx and ASR.
#7: Building Custom ASR Tools
Ever wondered how to build your own ASR tools from scratch? Discover the why and how in this episode!
#6: How To Fine Tune Whisper
Build your own AI transcription tool! We'll walk you through fine-tuning Whisper, from data to notebook.
#8: Building Your Own Whisper
Ever wondered if you could build your own speech recognition tool? We dive deep into crafting custom ASR.
#5: Fine-Tuning ASR For Maximal Usability
Fine-tuned ASR is just the start. Discover the next steps for deployment and maximizing usability.
#2: Local STT For AMD GPU Owners
AMD GPU? No problem! Dive into local AI adventures like on-device speech to text.