
#multimodal-ai

17 episodes

#1964: AI Glasses That See Through Your Eyes

See a 3D arrow pointing to the exact bolt you need, or read a street sign in real-time translation.

multimodal-ai, augmented-reality, computer-vision

#1792: Google's Native Multimodal Embedding Kills the Fusion Layer

Google’s new embedding model maps text, images, audio, and video into a single vector space—cutting latency by 70%.

multimodal-ai, rag, ai-models

#1724: YouTube's Invisible AI Dubbing Machine

How does YouTube translate a video with one click? We explore the tech behind auto-dubbing, from sandwich models to voice cloning.

speech-to-speech, voice-cloning, multimodal-ai

#1592: Mastering Embedding Models: From Gemini 2 to Vector Debt

Stop treating embedding models like plumbing. Learn how to navigate vector debt, multimodal retrieval, and database configuration for RAG.

rag, vector-databases, multimodal-ai

#1586: Whiteboard Notebooks: Bridging the Pen and AI

Bridge the gap between handwritten notes and AI. Discover the best whiteboard notebooks and markers for seamless digital transcription.

multimodal-ai, material-science, human-computer-interaction

#1568: Is Your AI Listening or Just Lip-Reading?

Is Gemini a brilliant audio engineer or just a talented lip-reader? Explore the "signal vs. symbol" gap in AI audio processing.

multimodal-ai, audio-processing, hallucinations

#1564: Why AI is Trading Transcripts for Raw Audio

Forget basic transcription. Explore how native omni-modal models are capturing the "soul" of speech with near-instant latency.

multimodal-ai, speech-to-speech, voice-first

#1482: The Multimodal Shift: Navigating the New Vector Landscape

From Matryoshka models to multimodal search, discover how the fundamental units of AI memory are being optimized for efficiency and scale.

multimodal-ai, vector-databases, rag

#1085: The Tokenization Lie: How AI Actually Processes Media

Think 1,000 tokens equals 750 words? For audio and video, that rule is a lie. Discover the hidden math behind multimodal AI.

large-language-models, quantization, multimodal-ai

#786: Mastering the Hoard: AI-Powered Inventory Management

Learn how to manage thousands of parts without losing your mind using AI, QR codes, and professional logistics strategies.

security-logistics, multimodal-ai, data-integrity

#769: The Living Manual: AI and AR for High-Tech Repairs

Discover how AI and spatial computing are turning complex hardware repairs into real-time, interactive experiences.

multimodal-ai, computer-vision, hardware-engineering, industrial-automation, augmented-reality

#749: Breaking the Fourth Wall: Moving to Real-Time AI Audio

Can AI podcasts move from polished scripts to raw, real-time conversation? Explore the technical and financial shift to live multimodal models.

large-language-models, architecture, multimodal-ai

#132: Can AI Map Your House Just by Looking Around?

Discover how spatial-temporal tokenization and 3D world modeling are revolutionizing real-time video-to-video AI interaction.

video-ai, multimodal-ai, real-time-video, spatial-temporal-tokenization, slam

#64: AI's Senses: Seeing, Hearing, Understanding

AI is evolving beyond text, learning to see, hear, and understand our world. Discover the future of human-AI interaction!

multimodal-ai, ai-senses, computer-vision, audio-processing, data-integration

#54: Tokenizing Everything: How Omnimodal AI Handles Any Input

Omnimodal AI: How do models process images, audio, video, and text all at once? Discover the engineering behind AI that accepts anything.

omnimodal-ai, tokenization, ai-models, multimodal-ai, data-types

#53: Instructional vs. Conversational AI: The Distinction Nobody Talks About

Instructional vs. conversational AI: a crucial distinction reshaping how AI is built. Discover why it matters for the future of AI development.

instructional-ai, conversational-ai, ai-models, ai-training, multimodal-ai

#46: Pixels, Prompts & Pseudo-Text: AI's Word Problem

AI paints stunning images, but can't spell "cat." Why do advanced models struggle with simple text? Dive into AI's weird word problem!

image-generation, pseudo-text, text-in-images, multimodal-ai, language-models