#multimodal-ai

20 episodes

Jun 3

#3234: Who Should Sponsor This Podcast? Open-Book Economics

We open the books on our AI-generated podcast: $200/month costs, 180K plays, zero sponsors. Who should we pitch?

#open-source #sustainability #multimodal-ai

May 7

#2688: Intelligent Frame Extraction for Multimodal AI

Use multimodal AI and smart frame extraction to turn a walk-through video into an actionable decluttering plan.

#multimodal-ai #computer-vision #prompt-engineering

Apr 25

#2409: When AI Cheats on Cultural Knowledge

Five benchmarks that reveal how AI systems fail at cultural knowledge — and what their methodologies tell us.

#cultural-bias #benchmarks #multimodal-ai

Apr 3

#1964: The Three Layers That Make AR Finally Work

See a 3D arrow pointing to the exact bolt you need, or read a street sign in real-time translation.

#multimodal-ai #augmented-reality #computer-vision

Mar 31

#1792: Google's Native Multimodal Embedding Kills the Fusion Layer

Google’s new embedding model maps text, images, audio, and video into a single vector space—cutting latency by 70%.

#multimodal-ai #rag #ai-models

Mar 29

#1724: When AI Dubbing Swaps Your Gender

How does YouTube translate a video with one click? We explore the tech behind auto-dubbing, from sandwich models to voice cloning.

#speech-to-speech #voice-cloning #multimodal-ai

Mar 27

#1592: The Vector Debt Trap: Choosing Embeddings That Last

Stop treating embedding models like plumbing. Learn how to navigate vector debt, multimodal retrieval, and database configuration for RAG.

#rag #vector-databases #multimodal-ai

Mar 27

#1586: The Rocketbook Sunset and the Search for a Clean Erase

Bridge the gap between handwritten notes and AI. Discover the best whiteboard notebooks and markers for seamless digital transcription.

#multimodal-ai #material-science #human-computer-interaction

Mar 26

#1568: The Signal Versus Symbol Gap

Is Gemini a brilliant audio engineer or just a talented lip-reader? Explore the "signal vs. symbol" gap in AI audio processing.

#multimodal-ai #audio-processing #hallucinations

Mar 26

#1564: The Death of the Cascaded Pipeline

Forget basic transcription. Explore how native omni-modal models are capturing the "soul" of speech with near-instant latency.

#multimodal-ai #speech-to-speech #voice-first

Mar 23

#1482: The Hidden Cost of Choosing an Embedding Model

From Matryoshka models to multimodal search, discover how the fundamental units of AI memory are being optimized for efficiency and scale.

#multimodal-ai #vector-databases #rag

Mar 10

#1085: The Tokenization Lie: How AI Actually Processes Media

Think 1,000 tokens equals 750 words? For audio and video, that rule is a lie. Discover the hidden math behind multimodal AI.

#large-language-models #quantization #multimodal-ai

Feb 22

#786: The Cost of a Touch: When Your Hoard Becomes a Liability

Learn how to manage thousands of parts without losing your mind using AI, QR codes, and professional logistics strategies.

#security-logistics #multimodal-ai #data-integrity

Feb 22

#769: When Manuals Learn to See in 3D

Discover how AI and spatial computing are turning complex hardware repairs into real-time, interactive experiences.

#multimodal-ai #computer-vision #hardware-engineering #industrial-automation #augmented-reality

Feb 21

#749: The Live vs. Scripted Trade-Off in AI Podcasting

Can AI podcasts move from polished scripts to raw, real-time conversation? Explore the technical and financial shift to live multimodal models.

#large-language-models #architecture #multimodal-ai

Jan 2

#132: How AI Learns to See Time as a Dimension

Discover how spatial-temporal tokenization and 3D world modeling are revolutionizing real-time video-to-video AI interaction.

#video-ai #multimodal-ai #real-time-video #spatial-temporal-tokenization #slam

Dec 18

#64: How AI Learns to See, Hear, and Think Together

AI is evolving beyond text, learning to see, hear, and understand our world. Discover the future of human-AI interaction!

#multimodal-ai #ai-senses #computer-vision #audio-processing #data-integration

Dec 11

#54: How AI Unifies Images, Audio, and Text

Omnimodal AI: How do models process images, audio, video, and text all at once? Discover the engineering behind AI that accepts anything.

#omnimodal-ai #tokenization #ai-models #multimodal-ai #data-types