#speech-recognition

42 episodes · Page 2 of 2

#1715: Why Voice Agents Need Frameworks (Not Just APIs)

Raw APIs handle models, but who manages the audio plumbing? We break down Vapi, LiveKit, and Pipecat.

speech-recognition · text-to-speech · conversational-ai

#1634: Agent Interview: Inception Mercury 2

Meet Mercury 2, the Abu Dhabi-based AI using diffusion architecture to cut costs and boost wit.

generative-ai · ai-models · speech-recognition

#1601: Cohere: The Switzerland of Enterprise AI

While others chase viral memes, Cohere is quietly building the secure, cloud-agnostic infrastructure powering the global enterprise.

rag · speech-recognition · defense-technology

#1539: Escaping the Cloud Dictation Trap

Stop shouting at your phone. Discover how dedicated hardware and local AI are making instant, private voice-to-text a reality.

speech-recognition · edge-computing · hardware-engineering

#868: When Your Phone's Mic Beats Your Expensive Gear

Stop holding your phone like a piece of toast. Explore the best mobile microphone setups for high-quality AI voice transcription.

telecommunications · audio-engineering · speech-recognition

#682: Why Your Phone Mic Beats Your Studio Headset

Why does a phone mic outperform a pro headset for AI transcription? Herman and Corn dive into the physics of MEMS and the truth about audio quality.

speech-recognition · audio-hardware · semiconductors · signal-processing · hardware-engineering

#33: When AI Decides to Listen

Ever wonder how your AI knows you're talking? We're diving deep into VAD, the unseen magic behind AI's ears.

voice-activity-detection · vad · speech-recognition · asr · speech-to-text

#22: The Input Bottleneck: Why Your Mic Matters for AI

Uncover the secrets to perfect AI dictation! Corn and Herman explore the ultimate speech-to-text hardware.

large-language-models · speech-recognition · audio-hardware

#26: Fine-Tuning AI to Understand Your Voice

Voice typing is changing everything. Join us as we explore the revolution of personalizing Whisper!

speech-recognition · fine-tuning · transformers

#15: AI Gets Personal: The Power of Voice Fine-Tuning

AI that understands *your* voice? Dive into the fascinating world of fine-tuning and discover how AI gets personal.

fine-tuning · speech-recognition · personalized-ai

#9: Benchmarking Custom ASR Tools - Beyond The WER

Benchmarking custom ASR fine-tunes: We're diving deep beyond the WER to truly measure performance.

asr · benchmarking · wer · speech-recognition · fine-tuning

#7: Building Custom ASR Tools

Ever wondered how to build your own ASR tools from scratch? Discover the why and how in this episode!

asr · speech-recognition · custom-asr · speech-to-text

#8: Building Your Own Whisper

Ever wondered if you could build your own speech recognition tool? We dive deep into crafting custom ASR.

asr · speech-recognition · whisper · audio-processing · custom-asr

#5: Fine-Tuning ASR For Maximal Usability

Fine-tuned ASR is just the start. Discover the next steps for deployment and maximizing usability.

asr · speech-recognition · fine-tuning · deployment · usability

#6: How To Fine Tune Whisper

Build your own AI transcription tool! We'll walk you through fine-tuning Whisper, from data to notebook.

fine-tuning · speech-recognition · gpu-acceleration

#4: If Your Voice Ages, Does Your Fine-Tune Become Useless?

Your voice changes, but your fine-tuned model shouldn't become useless. We explore the biology of the larynx and ASR.

speech-recognition · fine-tuning · vocal-physiology

#2: Local STT For AMD GPU Owners

AMD GPU? No problem! Dive into local AI adventures like on-device speech to text.

speech-recognition · gpu-acceleration · local-ai

#3: Safetensors or something else: STT inference formats explained

Unpacking ASR weight formats: Safetensors and beyond. Tune in to understand the distinctions.

safetensors · asr · speech-recognition · weight-formats · speech-to-text