#computer-vision
19 episodes
#2825: The Patient Who Filmed His Own Bloating
How to set up cameras, markers, and time-lapse to capture abdominal distension for clinical or AI analysis.
#2688: Intelligent Frame Extraction for Multimodal AI
Use multimodal AI and smart frame extraction to turn a walk-through video into an actionable decluttering plan.
#2668: When a Flamethrower Is Overkill
Tesseract, EasyOCR, or a cloud vision model? How to build a fast, reliable label scanner for real-world conditions.
#2657: When Puppeteers Stopped Hiding
Background removal isn't magic — it's multiple AI systems working in sequence. Here's what's actually happening under the hood.
#2546: The Invisible Engineering Behind a Single Click
The technical stack behind click-to-edit features in tools like Canva and Google Photos — from segmentation to inpainting.
#2539: When Does AI Stop Hallucinating and Start Reconstructing?
What happens when you feed hundreds of photos into an AI world generator — do you capture reality or just a convincing dream?
#2352: The Structured Output Gap in Vision APIs
How do object detection APIs like Gemini, AWS Rekognition, and YOLO compare for automated annotation workflows?
#2325: Why Depth Is the Hardest Thing for AI to See
Can AI turn your apartment photos into a precise 3D model? Explore the tech behind photogrammetry and spatial reconstruction.
#2089: Open-Source vs. Military ATR: The Drone Recognition Gap
A public GitHub model spotted by a listener reveals the massive gap between hobbyist AI and lethal military drone detection systems.
#1964: The Three Layers That Make AR Finally Work
See a 3D arrow pointing to the exact bolt you need, or read a street sign in real-time translation.
#1963: RPA: Dead or Just Getting Smart?
Traditional RPA is brittle and blind. See how AI vision and agentic orchestration are turning it into a self-healing powerhouse.
#1962: Moravec's Paradox: Why Robots Can Write Poetry but Can't Fold a Fitted Sheet
We explore the tech letting robots "reason" about physical tasks using vision-language-action models.
#1855: When AI Makes Game Assets, Who Owns the Art?
From blocky polygons to photorealistic assets, AI is transforming how 3D models are made.
#1817: The Hidden Taxonomy of AI: Why Specialized Models Outperform Giants
Explore the vast ecosystem of niche AI models for computer vision and document understanding, far beyond large language models.
#1799: The Original AI Blueprints: BERT & CLIP
Before ChatGPT, two models changed everything. Discover how BERT and CLIP taught machines to read and see the world.
#1541: Why Your Phone Beats Your PC at Video
Explore why mobile devices handle real-time video AI better than desktops and how the NPU gap is finally closing in 2026.
#769: When Manuals Learn to See in 3D
Discover how AI and spatial computing are turning complex hardware repairs into real-time, interactive experiences.
#768: The Missing Nail: When Tiny Parts Stop Big Projects
From tiny laptop screws to industrial rivnuts, discover why the smallest components are often the biggest hurdles in any DIY project.
#64: How AI Learns to See, Hear, and Think Together
AI is evolving beyond text, learning to see, hear, and understand our world. Discover the future of human-AI interaction!