#computer-vision

19 episodes

#2825: The Patient Who Filmed His Own Bloating

How to set up cameras, markers, and time-lapse to capture abdominal distension for clinical or AI analysis.

computer-vision, digestive-health, post-cholecystectomy-syndrome

#2688: Intelligent Frame Extraction for Multimodal AI

Use multimodal AI and smart frame extraction to turn a walk-through video into an actionable decluttering plan.

multimodal-ai, computer-vision, prompt-engineering

#2668: When a Flamethrower Is Overkill

Tesseract, EasyOCR, or a cloud vision model? How to build a fast, reliable label scanner for real-world conditions.

computer-vision, edge-computing, latency

#2657: When Puppeteers Stopped Hiding

Background removal isn't magic: it's multiple AI systems working in sequence. Here's what's actually happening under the hood.

image-generation, computer-vision, generative-ai

#2546: The Invisible Engineering Behind a Single Click

The technical stack behind click-to-edit features in tools like Canva and Google Photos, from segmentation to inpainting.

image-generation, computer-vision, generative-ai

#2539: When Does AI Stop Hallucinating and Start Reconstructing?

What happens when you feed hundreds of photos into an AI world generator? Do you capture reality or just a convincing dream?

urban-planning, cultural-bias, computer-vision

#2352: The Structured Output Gap in Vision APIs

How do object detection APIs like Gemini, AWS Rekognition, and YOLO compare for automated annotation workflows?

computer-vision, api-integration, benchmarks

#2325: Why Depth Is the Hardest Thing for AI to See

Can AI turn your apartment photos into a precise 3D model? Explore the tech behind photogrammetry and spatial reconstruction.

spatial-audio, computer-vision, digital-twins

#2089: Open-Source vs. Military ATR: The Drone Recognition Gap

A public GitHub model spotted by a listener reveals the massive gap between hobbyist AI and lethal military drone detection systems.

computer-vision, military-strategy, ai-agents

#1964: The Three Layers That Make AR Finally Work

See a 3D arrow pointing to the exact bolt you need, or read a street sign in real-time translation.

multimodal-ai, augmented-reality, computer-vision

#1963: RPA: Dead or Just Getting Smart?

Traditional RPA is brittle and blind. See how AI vision and agentic orchestration are turning it into a self-healing powerhouse.

ai-agents, legacy-systems, computer-vision

#1962: Moravec's Paradox: Why Robots Can Write Poetry but Can't Fold a Fitted Sheet

We explore the tech letting robots "reason" about physical tasks using vision-language-action models.

ai-agents, computer-vision, reasoning-models

#1855: When AI Makes Game Assets, Who Owns the Art?

From blocky polygons to photorealistic assets, AI is transforming how 3D models are made.

generative-ai, gaussian-splatting, computer-vision

#1817: The Hidden Taxonomy of AI: Why Specialized Models Outperform Giants

Explore the vast ecosystem of niche AI models for computer vision and document understanding, far beyond large language models.

computer-vision, rag, ai-models

#1799: The Original AI Blueprints: BERT & CLIP

Before GPT, two models changed everything. Discover how BERT and CLIP taught machines to read and see the world.

transformers, ai-history, computer-vision

#1541: Why Your Phone Beats Your PC at Video

Explore why mobile devices handle real-time video AI better than desktops and how the NPU gap is finally closing in 2026.

npu, edge-computing, computer-vision

#769: When Manuals Learn to See in 3D

Discover how AI and spatial computing are turning complex hardware repairs into real-time, interactive experiences.

multimodal-ai, computer-vision, hardware-engineering, industrial-automation, augmented-reality

#768: The Missing Nail: When Tiny Parts Stop Big Projects

From tiny laptop screws to industrial rivnuts, discover why the smallest components are often the biggest hurdles in any DIY project.

structural-engineering, computer-vision, hardware-standards

#64: How AI Learns to See, Hear, and Think Together

AI is evolving beyond text, learning to see, hear, and understand our world. Discover the future of human-AI interaction!

multimodal-ai, ai-senses, computer-vision, audio-processing, data-integration