#transformers

17 episodes

#2066: The Transformer Trinity: Why Three Architectures Rule AI

Why did decoder-only models like GPT dominate AI, while encoders and encoder-decoders still hold critical niches?

transformers, ai-models, large-language-models

#2062: How Transformers Learn Word Order: From Sine Waves to RoPE

Transformers can’t see word order by default. Here’s how positional encoding fixes that—from sine waves to RoPE and massive context windows.

transformers, context-window, large-language-models

#2061: How Attention Variants Keep LLMs From Collapsing

Attention is the engine of modern AI, but it’s also a memory hog. Here’s how MQA, GQA, and MLA evolved to fix it.

transformers, ai-models, attention-mechanisms

#2056: How Music Models Turn Sound Into Language

A look at how AI music models use audio tokens, transformers, and diffusion to turn text into songs.

audio-processing, transformers, generative-ai

#1799: The Original AI Blueprints: BERT & CLIP

Before GPT, two models changed everything. Discover how BERT and CLIP taught machines to read and see the world.

transformers, ai-history, computer-vision

#1679: Chinese AI Is Built Different—Here's How

DeepSeek and MiMo are topping developer charts, but they're not just cheaper clones. Here's why their design philosophy is fundamentally different.

ai-models, transformers, local-ai

#1666: Multi-Agent AI: One Model, Four Brains

Grok 4.20’s native multi-agent architecture cuts token costs by 75% and enables real-time cross-agent reasoning.

ai-agents, transformers, rag

#1633: Agent Interview: MiniMax M2.7

We grill MiniMax M2.7 to see if a model built for "virtual companions" can actually handle high-level comedy and complex character logic.

ai-agents, ai-reasoning, transformers

#1632: Agent Interview: DeepSeek V3.2

We interview DeepSeek V3.2 to see if this open-weight powerhouse can handle weird podcast prompts better than big tech’s flagship models.

ai-agents, open-source-ai, transformers

#1604: The $3 Billion Stealth Giant: AI21 Labs & Nvidia

Why is Nvidia eyeing a $3B deal for AI21 Labs? Discover the tech behind the "OpenAI of Israel" and their revolutionary hybrid architecture.

large-language-models, state-space-models, transformers

#1547: Why AI Stopped Reading and Started Seeing Everything

From sequential bottlenecks to parallel powerhouses, discover how the Transformer architecture revolutionized how machines process the world.

transformers, ai-history, parallel-computing

#1108: Beyond the Emoji: How Hugging Face Conquered AI

Discover how a quirky chatbot company became the central nervous system of AI, hosting millions of models and standardizing the entire industry.

open-source, ai-models, transformers

#135: Is OCR Dead? How Vision AI Is Redefining Text Extraction

Are specialized OCR tools obsolete? Herman and Corn explore how Vision Language Models are revolutionizing the way we turn images into data.

ocr, vision-ai, vlm, optical-character-recognition, text-extraction

#126: The Spotlight Effect: Understanding AI Attention Mechanisms

Why do AI models "lose the plot" after a few thousand words? Discover the mechanics of attention and the innovations solving context window limits.

attention-mechanism, context-window, quadratic-scaling, mamba, ring-attention

#26: Personalizing Whisper: The Voice Typing Revolution

Voice typing is changing how we write. Join us as we explore how fine-tuning Whisper on your own voice can make dictation truly personal.

speech-recognition, fine-tuning, transformers

#19: AI Images: The Jigsaw Beneath the Magic

Beyond the prompt, discover the intricate 'jigsaw puzzle' of AI image generation and how the magic actually works.

transformers, diffusion-models, latent-space

#12: The AI Breakthrough: Transformers & The Perfect Storm

AI is everywhere. How did chatbots, art, and video all emerge so suddenly? The secret lies in Transformers and a perfect storm.

transformers, fine-tuning, gpu-acceleration