How LLMs Actually Work: Inside the Models That Changed Everything

The most important technology of the 2020s is also one of the least understood. Large language models power everything from coding assistants to medical research tools, yet most people using them have no idea what’s actually happening inside. These fourteen episodes build a working mental model — from the decades of research that made modern AI possible to the fundamental limitations that still can’t be trained away.

The History That Made It Possible

  • AI: Not an Overnight Success Story is the essential corrective to the narrative that AI appeared from nowhere in 2022. The episode traced the decades of neural network research, AI winters, and incremental advances that accumulated before the modern era, explaining why the breakthroughs happened when they did and which conditions had to hold simultaneously for them to occur.

  • The Heavy Metal of Machine Learning: Inside PyTorch zoomed in on the software infrastructure that made modern AI research possible. PyTorch’s dynamic computation graphs, automatic differentiation, and research-friendly design didn’t just make training neural networks easier — they changed the pace of experimentation across the entire field. The episode covered the project’s origins at Facebook AI Research, its technical architecture, and why it became the dominant research framework despite TensorFlow’s head start.
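
To make "automatic differentiation" over a dynamically built graph a little more concrete, here is a minimal sketch in plain Python. It is not PyTorch code and does not use PyTorch's API; the `Value` class and its fields are hypothetical names chosen for illustration. The idea it tries to show is that the graph is recorded as ordinary code runs, and gradients are then pushed backward through it.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        # d(a * b)/da = b and d(a * b)/db = a
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Order the graph so a node's gradient is complete before it is
        # passed on to the nodes it was computed from.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


# The graph for y = w * x + b is built simply by running the expression.
x, w, b = Value(3.0), Value(2.0), Value(1.0)
y = w * x + b
y.backward()
```

After `y.backward()`, `w.grad` holds the value of `x` and `x.grad` holds the value of `w`, which is what the chain rule gives for this expression. Real frameworks do the same bookkeeping over tensors rather than single numbers.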

How Models Are Built and Trained

  • How Does Fine Tuning Work Anyway? explained the fundamental technique that turns a generic pretrained model into a useful specialized one. Rather than training from scratch, which for a foundation model can cost millions of dollars in compute or more, fine-tuning starts from an existing model’s learned representations and adapts them for a specific task or domain. The episode covered the mechanics, the data requirements, and the tradeoffs between different fine-tuning approaches.

  • Building an AI Model from Scratch: The Hidden Costs examined what’s actually involved when a company decides to train a foundation model from the ground up. The compute costs are the obvious part; the data acquisition, curation, labeling, evaluation infrastructure, and engineering team costs are often larger. The episode broke down where the money actually goes and why the economics favor a small number of very large players.

  • AI’s Blind Spot: Data, Bias & Common Crawl investigated the training data that shapes what models know and believe. Common Crawl — a massive web scrape that underlies most foundation model training — is not a neutral sample of human knowledge. It over-represents English, wealthy countries, and certain demographics, and under-represents languages, perspectives, and knowledge domains that are less visible online. The episode examined how this shapes model behavior and what the realistic options are for addressing it.

How Understanding Works

  • AI’s Secret Language: Vectors, Embeddings & Control pulled back the curtain on one of the most counterintuitive aspects of how language models work: they don’t understand words as words. Everything (text, images, audio, code) gets converted into numerical vectors in high-dimensional space, where meaning is represented as geometric relationships. The episode explained what embeddings actually are, how similarity search works, and why this approach enables capabilities that symbolic AI struggled to achieve.

  • From Keywords to Vectors: How AI Decodes Meaning traced the evolution from keyword matching (which treats “car” and “automobile” as completely different) to semantic understanding (which represents them as nearby points in vector space). This shift underlies everything from improved search to retrieval-augmented generation, and the episode explained the technical progression from word2vec through BERT to modern embedding models.
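
The contrast between keyword matching and vector similarity described above can be sketched in a few lines of Python. The three-dimensional vectors below are hand-picked, hypothetical values for illustration only; real embedding models learn hundreds or thousands of dimensions, and the helper names here are assumptions rather than any library's API.

```python
import math

# Hypothetical toy embeddings, invented for this example.
TOY_EMBEDDINGS = {
    "car":        [0.90, 0.10, 0.05],
    "automobile": [0.85, 0.15, 0.10],
    "banana":     [0.05, 0.20, 0.95],
}


def keyword_match(a, b):
    """Keyword search: two words match only if the strings are identical."""
    return a == b


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, embeddings):
    """Return the other word whose vector is most similar to the query's."""
    q = embeddings[query]
    scored = [
        (word, cosine_similarity(q, vec))
        for word, vec in embeddings.items()
        if word != query
    ]
    return max(scored, key=lambda pair: pair[1])[0]
```

With these toy vectors, `keyword_match("car", "automobile")` is false, while `nearest("car", TOY_EMBEDDINGS)` picks "automobile" because the two vectors point in nearly the same direction. Similarity search at scale follows the same idea, usually with an approximate nearest-neighbor index instead of a full scan.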

The Problems Models Can’t Escape

  • Why AI Lies: The Science of Digital Hallucinations explained why language models confidently state false things. LLMs are fundamentally next-token predictors, not truth machines — they generate plausible-sounding continuations of text, and sometimes plausible-sounding is not the same as accurate. The episode examined what’s happening mechanically when a model hallucinates, why the problem is hard to eliminate, and what detection and mitigation strategies actually work.

  • The Scaling Wall: Why Bigger AI Isn’t Always Smarter confronted the assumption that has driven AI investment for years: that more parameters and more data reliably produce better models. The episode examined the evidence that scaling returns are diminishing, the phenomenon of model collapse when training on AI-generated data, and what the plateau in benchmark performance improvements suggests about where the next gains will come from.
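
The "next-token predictor" framing from the hallucination episode can be illustrated with a small Python sketch. The scores below are invented for the example and do not come from any real model; the point is only that generation samples from a probability distribution over tokens, and nothing in that step checks whether the chosen token is true.

```python
import math
import random

# Hypothetical scores (logits) for candidate next tokens after the prompt
# "The capital of Australia is". The numbers are made up for illustration.
TOY_LOGITS = {"Sydney": 2.0, "Canberra": 1.8, "Melbourne": 0.5}


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_next_token(logits, temperature=1.0, rng=random):
    """Draw one token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    weights = [probs[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

In this toy distribution the most probable token is a plausible but incorrect answer, and even the correct one is only ever sampled some of the time. Lowering `temperature` makes the output more deterministic, but it concentrates probability on whatever the scores already favor, accurate or not.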

Model Design Decisions

  • The Price of Politeness: Should AI Guardrails Stay? examined one of the more heated debates in AI: the tradeoffs between safety filters and model capability. RLHF and constitutional AI training can make models safer and more aligned with human values — or they can produce models that refuse reasonable requests, give wishy-washy answers, and are less useful. The episode looked at the genuine tension between safety and utility, and at the market dynamics driving demand for uncensored models.

  • AI’s Secret: Decoding the .5 Updates decoded what actually changes in the incremental model updates that labs release. The naming conventions (GPT-4, 4.5, 4o) often obscure significant architectural or training changes, and the episode examined how to actually evaluate whether a new model version is meaningfully better for specific tasks — rather than relying on benchmark press releases.

The Competitive Landscape

  • The Benchmark Battle: Decoding the Rise of Chinese AI analyzed the emergence of Chinese AI labs as serious competitors. DeepSeek, Qwen, and others achieved benchmark results that matched or exceeded Western models at dramatically lower training costs — which raised questions about whether benchmark performance was being gamed, or whether the efficiency gap was genuinely closing. The episode examined the evidence for both interpretations.

  • The $5.5 Million Breakthrough: DeepSeek’s AI Disruption dove deeper into the DeepSeek moment specifically: a model that matched GPT-4-class performance yet was trained for a fraction of the cost, using a mixture-of-experts architecture that activates only a subset of its parameters for each token. The episode explained the technical choices that made this possible and what they mean for the economics of the entire AI industry.
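
As a rough illustration of how a mixture-of-experts layer can activate only a subset of its parameters for a given input, here is a minimal top-k routing sketch in Python. It is a generic textbook-style version and is not DeepSeek's actual router; the function names, the toy experts, and the router scores are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Convert a list of router scores into gate probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    gates = softmax(router_scores)
    top = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:k]
    norm = sum(gates[i] for i in top)
    return {i: gates[i] / norm for i in top}


def moe_forward(x, experts, router_scores, k=2):
    """Weighted sum over the selected experts; the others are never called."""
    selected = route_top_k(router_scores, k)
    return sum(weight * experts[i](x) for i, weight in selected.items())


# Hypothetical "experts": trivial functions standing in for feed-forward blocks.
EXPERTS = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: x - 3,
    lambda x: x * x,
]

# Hypothetical router scores for one input; in a real model a small learned
# network would produce these from the token's hidden state.
ROUTER_SCORES = [0.1, 2.0, -1.0, 1.5]

output = moe_forward(3.0, EXPERTS, ROUTER_SCORES, k=2)
```

Here only experts 1 and 3 run, so half of the "parameters" do no work for this input. That is the source of the cost savings: total capacity can grow with the number of experts while the compute spent per input grows only with `k`.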

What Comes Next

  • Deep Think: The Rise of Deliberate AI Reasoning examined the move beyond pattern matching toward chain-of-thought reasoning, tree search, and models that can allocate more compute to hard problems at inference time. Systems like OpenAI’s o1 and DeepSeek-R1 represent a different approach to intelligence than purely scaling parameter counts, and the episode explained what deliberate reasoning actually does architecturally and where it outperforms standard generation.

These episodes won’t turn you into a machine learning researcher, but they will give you the conceptual scaffolding to understand why AI systems behave the way they do — and to evaluate claims about AI capabilities with something more than vibes.

Episodes in this playlist

February 2026
#650 When AI Thinks Longer, Not Bigger
  Explore how Gemini 3.0’s Deep Think mode shifts AI from "fast" reflexes to "deliberate" reasoning to solve complex quantum physics problems. Feb 16, 2026
January 2026
#170 How PyTorch Beat TensorFlow and Became AI's Backbone
  Discover why PyTorch is the "oxygen" of AI. Herman and Corn explore its history, the magic of Autograd, and the move to the PyTorch Foundation. Jan 5, 2026
December 2025
#118 AI in 2025: Is Small the New Big?
  If the cost is the same, should you always use the biggest AI model? Discover why smaller models often win on speed, steering, and accuracy. Dec 28, 2025
#117 From Keywords to Vectors: How AI Decodes Meaning
  Why can AI write poetry but struggle to find a file? Explore the history and math of semantic understanding with Herman and Corn. Dec 28, 2025
#107 The $5.5 Million Breakthrough: DeepSeek’s AI Disruption
  Discover how DeepSeek-V3 is disrupting the AI market with massive cost savings and technical innovations like Multi-Head Latent Attention. Dec 26, 2025
#92 Is AI Eating Its Own Trash?
  Is brute force the only path to AGI? Corn and Herman explore the limits of scaling, the risk of model collapse, and the future of world models. Dec 23, 2025
#86 The Price of Politeness: Should AI Guardrails Stay?
  Herman and Corn debate the hidden costs of AI safety layers and what happens when we strip away the "corporate HR" personality of LLMs. Dec 23, 2025
#85 When Probability Beats Truth: Why AI Must Lie
  Why do smart AI systems make up fake facts? Corn and Herman explore the "feature" of digital hallucinations and how to spot them. Dec 23, 2025
#56 The Thought Experiment Nobody Runs
  Building an AI model from scratch? It's a brutal reality of trillions of tokens and millions in GPUs. Discover the hidden costs of modern AI. Dec 11, 2025
#37 From Keywords to Meaning: How AI Understands You
  Unlock AI's secret language! Discover how vectors and embeddings create a "semantic galaxy" for true understanding and control. Dec 9, 2025
#23 Common Crawl's Cultural Blindspot
  Uncover the unseen influences shaping AI. We dive deep into training data, bias, and Common Crawl. Dec 5, 2025
November 2025
#13 AI: Not an Overnight Success Story
  AI's "overnight success" is a myth. Unravel the true story behind its rise, from humble beginnings to today's innovations. Nov 28, 2025
#14 AGI's Crossroads: Are LLMs a "Dead End" to True AI?
  Are LLMs a dead end for true AGI? We dive into the electrifying debate with AI's forefathers. Nov 28, 2025
#11 How Does Fine Tuning Work Anyway?
  Unlock the secrets of AI fine-tuning. Discover how your small dataset can shape a giant model. Nov 24, 2025