Audio & Speech

Speech recognition, TTS, voice cloning, audio engineering

42 episodes (page 2 of 3)

#1809: The TTS Developer's Dilemma: Size vs. Speed

Stop guessing. We break down the critical trade-offs between model size, latency, and sample rate for production-ready voice apps.

text-to-speech, gpu-acceleration, edge-computing

#1808: The Architecture That Made AI Voices Run on a Raspberry Pi

How a model the size of a tweet outperforms billion-dollar giants in the race for perfect AI speech.

open-source-ai, small-language-models, text-to-speech

#1800: Hacking the Brain's Alarm System

Why some sounds make your skin crawl: the science of emergency alerts.

audio-processing, human-computer-interaction, emergency-preparedness

#1778: Audio Is the New "Read Later" Graveyard

Why listening to AI conversations beats reading dense PDFs, and how serverless GPUs make it cheap.

audio-processing, serverless-gpu, rag

#1752: Whisper Small Beats Whisper Large in Speed & Accuracy

A 4-GPU benchmark on Ubuntu shows the 1.5B-parameter Whisper Large is slower and less accurate than the far smaller Whisper Small.

speech-recognition, gpu-acceleration, latency

#1724: When AI Dubbing Swaps Your Gender

How does YouTube translate a video with one click? We explore the tech behind auto-dubbing, from sandwich models to voice cloning.

speech-to-speech, voice-cloning, multimodal-ai

#1555: Beyond Whisper: NVIDIA’s Real-Time Speech Revolution

Move over Whisper. NVIDIA's new models offer 10x speed increases and better accuracy for real-time speech-to-text.

#1218: The Architectural Divide Between Batch and Live Speech

Why does voice typing feel so clunky compared to recording a memo? We explore the technical hurdles of real-time AI transcription.

#947: Pro Audio in Acoustic Nightmares: Mobile Recording Tips

Learn how to turn a marble-floored room into a studio using your phone, simple blankets, and the right USB-C gear.

audio-engineering, mobile-recording, acoustic-treatment

#868: When Your Phone's Mic Beats Your Expensive Gear

Stop holding your phone like a piece of toast. Explore the best mobile microphone setups for high-quality AI voice transcription.

telecommunications, audio-engineering, speech-recognition

#732: Why Your Recorded Voice Sounds Wrong

Use AI to find your perfect EQ profile and build a pro vocal chain. Fix nasality, master de-essing, and sound your best on any device.

audio-engineering, audio-processing, audio-quality, computational-audio

#727: The Math of Immersion: How 360-Degree Sound Actually Works

Learn how object-based audio and clever math trick your brain into hearing 360-degree sound from even the smallest mobile devices.

sensory-processing, spatial-audio, computational-audio

#725: Finding a Speaker That Loves Voices

Stop listening to podcasts through tinny speakers. Learn how to choose hardware optimized for the human voice and clear, room-filling audio.

smart-home, audio-engineering, computational-audio

#720: Why Your Ears Prefer Imperfect Plastic to Perfect Pixels

Why do we still buy plastic discs in an age of neural-link streaming? Explore the science of analog warmth and the "ritual" of the record.

sensory-processing, analog-audio, digital-compression

#660: The Bit Rate Dilemma: How Much Audio Data Do You Need?

Herman and Corn explore the science of audio compression, psychoacoustics, and finding the perfect bit rate for podcasts and AI.

audio-processing, data-integrity, psychoacoustics

#647: The Golden Rule of Audio Engineering

Why does digital data need to become analog? Explore the physics of sound and the critical role of the DAC in modern audio engineering.

audio-engineering, signal-processing, digital-to-analog

#598: Audio Engineering as Prompt Engineering: Better Sound, Better AI

Can better audio quality actually make an AI smarter? Discover how audio post-production functions as a new form of prompt engineering.

prompt-engineering, large-language-models, audio-engineering

#233: How Math Gives Microphones Directional Ears

Discover how math and physics turn simple microphones into "sound spotlights" that can isolate a single voice in even the noisiest environments.

beamforming-technology, microphone-arrays, digital-signal-processing

#196: Why Your Irish Accent Sounds American

Herman and Corn dive into the mechanics of neural text-to-speech, exploring how AI masters human prosody and the "average voice" accent problem.

neural-text-to-speech, voice-cloning, generative-modeling

#99: The Mic That Hears You from Across the Desk

Tired of headsets? Herman and Corn explore professional microphone setups for seamless, high-accuracy AI voice dictation from a distance.

voice-dictation, ai-accuracy, microphones, audio-quality, signal-to-noise-ratio