1 episode
Attention is the engine of modern AI, but it’s also a memory hog. Here’s how MQA, GQA, and MLA evolved to fix it.