#1199: AlphaFold 3: The New Search Engine for Biology

From garage-made vaccines to 200 million protein structures, AlphaFold is turning the building blocks of life into a software problem.

Episode Details

Published: Mar 15
Duration: 22:12
Pipeline: V5
TTS Engine: chatterbox-regular
Topics: drug-discovery generative-chemistry ai-safety

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

For over fifty years, the "protein folding problem" stood as one of the greatest challenges in the biological sciences. The mystery was rooted in Levinthal’s paradox: the idea that a single protein has so many potential configurations that it would take longer than the age of the universe to find the correct one through random sampling. Yet, in nature, these strings of amino acids fold into functional shapes in anywhere from microseconds to seconds. Understanding this process is critical because a protein's shape dictates its function; even a tiny structural error can lead to disease or toxicity.
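To put a number on the paradox, here is a back-of-the-envelope sketch. The count of ten to the power of three hundred conformations is the figure quoted in the episode; the sampling rate of ten trillion conformations per second and the rounded age of the universe are assumptions chosen purely for illustration.

```python
import math

# Figure quoted in the episode for a relatively small protein.
CONFORMATIONS = 10**300
# Assumption: a very generous sampling rate, picked only for illustration.
SAMPLES_PER_SECOND = 10**13
# Rounded physical constants.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10


def years_to_enumerate(conformations, samples_per_second):
    """Years needed to visit every conformation once at a fixed rate."""
    seconds = conformations // samples_per_second  # exact integer arithmetic
    return seconds / SECONDS_PER_YEAR


years_needed = years_to_enumerate(CONFORMATIONS, SAMPLES_PER_SECOND)
universe_ages = years_needed / AGE_OF_UNIVERSE_YEARS

print(f"Exhaustive search: about 10^{math.floor(math.log10(years_needed))} years")
print(f"That is about 10^{math.floor(math.log10(universe_ages))} ages of the universe")
```

Even if the assumed sampling rate were off by many orders of magnitude, the conclusion would not change, which is the point of the paradox: folding cannot be a random search.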

The Shift to Digital Biology

The arrival of AlphaFold has transformed this "Everest" of biology into a largely solved problem, at least for predicting static structures. We have moved from a world of labor-intensive X-ray crystallography and cryo-electron microscopy, methods that are slow, expensive, and often fail, to a world of high-speed digital prediction. While early versions of AlphaFold focused solely on protein structures, AlphaFold 3 has expanded the horizon. It now models the interactions between proteins, DNA, RNA, and small molecules (ligands), effectively acting as a search engine for the entire biological system.

This democratization of high-level science is already manifesting in radical ways. With the barrier to entry dropping, individuals are beginning to experiment outside of traditional institutional labs. Cases have emerged of people using these models to design custom mRNA sequences and vaccines for pets in home settings. By identifying the most stable and accessible parts of a protein on a screen, the process of vaccine design has shifted from a series of physical "darts in the dark" to a precise software engineering task.

The Intelligence Behind the Fold

The technical breakthrough of AlphaFold lies in its Evoformer architecture. Rather than relying solely on the raw physics of atoms, the model utilizes an attention mechanism to analyze evolutionary data. By looking at how protein families have changed over millions of years, the AI identifies which amino acids must remain in contact to maintain function. It essentially treats evolution as a massive, pre-run experiment, learning the "cheat codes" nature has already established.
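The evolutionary signal described above can be illustrated with a toy example. This is not the Evoformer: it swaps in a much older and simpler technique, mutual information between columns of a multiple sequence alignment, to show what "positions that change together" looks like in data. The six aligned sequences are made up for the demonstration.

```python
from collections import Counter
from math import log2

# Hypothetical toy alignment: columns 1 and 3 swap K/E together,
# column 0 never changes, column 2 changes once on its own.
TOY_MSA = [
    "AKLEG",
    "AKLEG",
    "AELKG",
    "AELKG",
    "AKIEG",
    "AELKG",
]


def column(msa, index):
    """Return the residues found at one alignment position."""
    return [sequence[index] for sequence in msa]


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    counts_i = Counter(column(msa, i))
    counts_j = Counter(column(msa, j))
    counts_ij = Counter(zip(column(msa, i), column(msa, j)))
    total = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        total += p_ab * log2(p_ab / (p_a * p_b))
    return total


for i, j in [(1, 3), (1, 2), (0, 1)]:
    print(f"columns {i} and {j}: {mutual_information(TOY_MSA, i, j):.3f} bits")
```

A high score for a pair of columns hints that the two residues may be in contact in the folded protein. The attention mechanism in AlphaFold learns far richer patterns than this pairwise statistic, but the underlying intuition is the same.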

Furthermore, the latest iterations utilize diffusion-based approaches—the same technology behind modern AI image generators. By starting with a disordered cloud of atoms and gradually refining them into a high-resolution structure, the model can handle complex molecules that lack a clear evolutionary history, such as synthetic drugs or unique DNA sequences.
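The "disordered cloud refined into a structure" idea can be sketched in a few lines. The big assumption here is the denoiser: in a real model it is a trained neural network, while this toy uses an oracle that already knows the answer, so the code only demonstrates the shape of the sampling loop (a deterministic Euler step over a decreasing noise schedule), not AlphaFold 3 itself. The coordinates and the noise schedule are invented for the example.

```python
import random

# Hypothetical target: three atoms with made-up 3D coordinates.
TARGET = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]]
# Assumed noise schedule, from very noisy down to zero noise.
SIGMAS = [10.0, 5.0, 2.0, 1.0, 0.5, 0.0]


def oracle_denoiser(noisy_atoms, sigma):
    """Stand-in for a trained network: returns the clean structure directly."""
    return TARGET


def rms_error(atoms, reference):
    """Root-mean-square coordinate error against a reference structure."""
    squared = [(c - r) ** 2 for a, b in zip(atoms, reference) for c, r in zip(a, b)]
    return (sum(squared) / len(squared)) ** 0.5


def sample(sigmas, seed=0):
    """Start from pure noise and step toward the denoiser's estimate."""
    rng = random.Random(seed)
    atoms = [[rng.gauss(0.0, sigmas[0]) for _ in range(3)] for _ in TARGET]
    errors = [rms_error(atoms, TARGET)]
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        estimate = oracle_denoiser(atoms, sigma)
        shrink = sigma_next / sigma  # fraction of the remaining noise to keep
        atoms = [
            [e + shrink * (c - e) for c, e in zip(atom, atom_estimate)]
            for atom, atom_estimate in zip(atoms, estimate)
        ]
        errors.append(rms_error(atoms, TARGET))
    return atoms, errors


final_atoms, error_trace = sample(SIGMAS)
print("RMS error after each step:", [round(e, 3) for e in error_trace])
```

Because nothing in the loop depends on evolutionary data, the same procedure applies to any collection of atoms, which is one way to read the claim that diffusion copes better with ligands and nucleic acids.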

From Discovery to Design

The implications for the pharmaceutical industry are immense. The traditional cost-to-discovery curve is collapsing as "in silico" screening becomes the default. Researchers can now virtually test thousands of drug candidates against a target protein, shortlisting promising fits and flagging potential side effects before committing to experiments in a physical lab. This paves the way for precision medicine, where treatments can be designed for a specific individual's genetic mutations.

However, this power comes with significant risks. As a dual-use technology, the same tools used to design life-saving enzymes or plastic-eating proteins could theoretically be used to engineer harmful toxins or more transmissible viruses. Because these models are widely available, the primary barrier to biological design is no longer specialized knowledge, but simply compute power.

As we move forward, the focus is shifting from merely copying nature to "de novo" design—creating entirely new proteins that have never existed in the wild. We are no longer just reading the map of the protein universe; we are beginning to write our own biological code.


Daniel's Prompt

Custom topic: Let us do a deep dive on protein folding, specifically exploring AlphaFold from DeepMind. Discuss the technical importance of predicting protein folding from molecules in drug discovery. We can incl

You know, Herman, I was thinking recently about the concept of a solved problem. In most fields, when a major problem is solved, it kind of moves to the background. We do not think about how to build a bridge that does not fall down anymore; we just apply the engineering principles and build it. But in biology, solving a fifty-year grand challenge seems to have done the exact opposite. It has not closed the book; it has opened up a thousand new doors that are frankly a bit terrifying and exhilarating at the same time.

Herman Poppleberry here, and you are hitting on the exact tension that defines structural biology right now. For decades, the protein folding problem was the Everest of the field. It was this looming, impossible peak that everyone agreed was the key to understanding life, but nobody could quite climb it. Then along comes AlphaFold, and suddenly, it is like we have all been given a high-speed gondola to the summit. But now that we are at the top, we are realizing the view is a lot more complicated than we imagined.

It is a massive shift, and today's prompt from Daniel is about AlphaFold and how this technology is moving from the elite labs into the hands of, well, everyone. Daniel pointed us toward a particularly wild case of a dog owner who used AlphaFold to help develop a custom mRNA vaccine for their pet in a garage. It sounds like science fiction, or maybe just a very high-stakes hobby, but it represents a fundamental change in how we interact with the building blocks of life.

The democratization of this is what really gets me. We are talking about a transition from protein folding being a task that required a five-year doctoral project and a million-dollar crystallography lab to something you can run on a laptop while you are sitting in a coffee shop. To understand why that matters, we have to look at what proteins actually are. They are the workhorses of the body. Everything from the way you digest your breakfast to the way your neurons fire depends on the specific three-dimensional shape of a protein.

And the shape is everything. I have heard it described as a key and a lock, but it is more like a piece of complex biological origami. If the fold is off by even a few angstroms, the protein does not work, or worse, it becomes toxic.

The scale of the complexity is what really trips people up. There is this thing called Levinthal’s paradox, which was proposed back in the late nineteen sixties. Cyrus Levinthal pointed out that if a protein tried to find its correct shape by just randomly sampling all possible configurations, it would take longer than the age of the universe. We are talking about ten to the power of three hundred possible shapes for a relatively small protein. Yet, in your body, these strings of amino acids fold into their functional shapes in anywhere from microseconds to seconds.

So the paradox is that nature has a cheat code, and for fifty years, we were trying to figure out what that code was. Before AlphaFold, how were we actually doing this? It was mostly trial and error, right?

It was incredibly labor-intensive. You had X-ray crystallography, where you basically had to coax a protein into forming a crystal, which is notoriously difficult, and then hit it with radiation to see how it scattered. Or you had cryo-electron microscopy. These methods are brilliant, but they are slow and expensive. We only knew the structures of a tiny fraction of the proteins known to science. Most of biology was, and in many ways still is, a dark room where we are feeling around the furniture.

Then DeepMind enters the room and turns on a flashlight. But it was not just a better version of what we had. It was a fundamental shift in how we approach the prediction itself. When AlphaFold two came out, it blew the competition away at the Critical Assessment of Structure Prediction. But I think the real leap happened when we moved into the AlphaFold three era, which is what Daniel is really nudging us toward.

AlphaFold three is where the search engine for biology analogy really starts to stick. The earlier versions were focused almost exclusively on proteins. You give it a sequence of amino acids, and it tells you the shape. But life is not just proteins. Life is proteins interacting with DNA, with RNA, and with small molecules called ligands. AlphaFold three can predict those interactions with incredible accuracy. It is modeling the entire biological system, not just the individual parts.

This brings us to the dog owner Daniel mentioned. Let us break that down, because it sounds like the ultimate edge case. You have a person whose dog is sick, presumably with a condition that current veterinary medicine is not solving. They use AlphaFold to model a specific viral protein or perhaps a tumor-associated antigen. They identify a target, and then they use that data to design an mRNA sequence.

It is a remarkable example of what happens when the barrier to entry for high-level science drops to near zero. In the past, if you wanted to design a vaccine, you needed a deep understanding of the protein's surface. You needed to know where the immune system could actually grab onto it. If you do not have the structure, you are just guessing. You are throwing darts in the dark. But with AlphaFold, this individual could see the three-dimensional landscape of the target protein on their screen. They could identify the most stable, most accessible part of that protein to target with a vaccine.

And because of the mRNA revolution, which we talked about way back in episode four hundred ninety-one, the manufacturing part has also become somewhat modular. Once you have the digital blueprint of the protein from AlphaFold, you can translate that into an mRNA sequence. It is essentially turning biology into a software problem.
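As a rough illustration of the "translate that into an mRNA sequence" step, the sketch below back-translates a protein sequence into one possible mRNA coding sequence using the standard genetic code. The choice of a single fixed codon per amino acid is an arbitrary assumption, and the input peptide is made up. A real design involves far more (codon choice, untranslated regions, delivery), none of which is modeled here.

```python
# One valid codon per amino acid from the standard genetic code.
# Assumption: which synonymous codon to use is picked arbitrarily here.
CODON_FOR = {
    "A": "GCU", "R": "CGU", "N": "AAU", "D": "GAU", "C": "UGU",
    "Q": "CAA", "E": "GAA", "G": "GGU", "H": "CAU", "I": "AUU",
    "L": "CUG", "K": "AAA", "M": "AUG", "F": "UUU", "P": "CCU",
    "S": "UCU", "T": "ACU", "W": "UGG", "Y": "UAU", "V": "GUU",
}
STOP_CODON = "UAA"


def back_translate(protein):
    """Return one mRNA coding sequence that encodes the given protein."""
    codons = []
    for residue in protein.upper():
        if residue not in CODON_FOR:
            raise ValueError(f"unknown amino acid: {residue!r}")
        codons.append(CODON_FOR[residue])
    codons.append(STOP_CODON)  # terminate translation
    return "".join(codons)


toy_peptide = "MKT"  # hypothetical three-residue example
print(toy_peptide, "->", back_translate(toy_peptide))
```

Because most amino acids have several synonymous codons, many different mRNA sequences encode the same protein; this sketch simply returns one of them.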

There is a catch, though, and I think we need to be careful here. While AlphaFold is incredibly accurate, it is still a predictive model. It provides a high-confidence hypothesis. In a professional lab, you would take that prediction and then validate it with experiments. When you move that into a garage setting, you are skipping the validation steps that usually take years. The gap between a digital design and a biological reality is still a massive hurdle.

I wonder about the technical mechanism that makes this possible. I was reading about the Evoformer architecture in AlphaFold. It seems like the secret sauce is how the model handles evolutionary data. It is not just looking at the physics of the atoms; it is looking at the history of how that protein has changed over millions of years across different species.

That is the brilliant part of the design. The Evoformer uses an attention mechanism, similar to what you see in large language models, but it applies it to multiple sequence alignments. It looks at a whole family of related proteins and says, okay, if this amino acid at position fifty changed, did the amino acid at position one hundred also change to maintain the connection? It is using evolution as a giant, multi-million-year experiment that has already solved the physics for us. The model just has to learn how to read the results of that experiment.

It is like the model is saying, I do not need to calculate the exact electrostatic force between every atom from scratch because I can see that nature has consistently kept these two parts together for three hundred million years. Therefore, they must be touching.

It is bypassing the brute-force physics by using the biological context. And AlphaFold three takes that even further by using a diffusion-based approach, similar to how image generators like Midjourney work. It starts with a blurry, disordered mess of atoms and gradually refines them into a high-resolution structure. This allows it to be much more flexible when it deals with things like DNA or complex drug molecules that do not have the same evolutionary history that proteins do.

So, if I am a researcher at a big pharmaceutical company, my world just changed. Instead of spending five years and a hundred million dollars to find a single lead compound, I can now screen millions of candidates in a simulation.

The cost-to-discovery curve is basically collapsing. We are moving toward a world where in silico screening is the default. You can take a target protein that you have modeled in AlphaFold, and then you can virtually test how ten thousand different drug molecules would bind to it. You can see which ones fit perfectly and which ones have clunky interactions that might cause side effects. You are failing fast and failing cheap in a digital environment before you ever pick up a pipette.

This connects back to what we discussed in episode six hundred ninety about precision medicine. If we can model these things so quickly, we stop looking for the one-size-fits-all drug. We can start looking for the drug that fits your specific version of a protein. If you have a genetic mutation that changes the shape of a key enzyme, we could theoretically model your specific enzyme and design a molecule to fix it.

That is the dream of boutique medicine. But it also raises some really thorny questions about regulation and safety. If a dog owner can do this in a garage, what is stopping a bad actor from using the same tools to design something harmful? The same technology that lets you model a vaccine target lets you model a way to make a toxin more stable or a virus more transmissible.

It is the ultimate dual-use technology. And because it is open source, or at least the models are widely available, you cannot really put the genie back in the bottle. The barrier to entry for biological design is no longer the knowledge or the equipment; it is just the compute power. And as we know from our friends at Modal, compute is becoming a commodity.

I think we have to talk about the limitations, too, because I do not want people to think AlphaFold is a magic wand that has solved biology. One of the big issues is that proteins are not static. They are dynamic, breathing machines. They change shape when they bind to things. AlphaFold is very good at giving us a snapshot, but it is not as good at showing us the movie of how the protein moves and functions in a crowded cellular environment.

Right, it is like having a photo of a car versus having the blueprints for the engine while it is running at four thousand revolutions per minute. You can see the parts, but you might not understand the timing or the fluid dynamics.

There is also the problem of disordered proteins. A huge chunk of the human proteome consists of proteins that do not have a fixed shape until they interact with something else. They are like pieces of wet spaghetti. AlphaFold tends to struggle with those because there is no single correct structure to predict. It will give you a low-confidence score, which is helpful because it tells you something is fuzzy, but it does not solve the underlying mystery of how those proteins actually function.

Still, the progress is staggering. The AlphaFold Protein Structure Database now has over two hundred million predicted structures. That is nearly every known protein on the planet. We have gone from knowing almost nothing to having a rough map of the entire protein universe in just a few years.

And that map is being used in ways we never expected. Researchers are using it to design new enzymes that can break down plastic in the ocean. Others are using it to create de novo proteins, which are proteins that have never existed in nature, designed from scratch to perform a specific task, like a biological sensor or a new kind of material.

That de novo design part is where my mind really starts to melt. We are not just copying nature anymore; we are starting to write our own biological code. If we understand the rules of folding, we can build whatever we want.

It is a transition from discovery to engineering. For the last two hundred years, biology has been a descriptive science. We observe what is there and try to categorize it. Now, we are entering the era of constructive biology. We are the architects now.

I want to go back to the garage vaccine for a second. If you were that dog owner, how would you actually use this information? You get your AlphaFold structure, you see your target, you design your mRNA. Then what? You still have to get it synthesized, right?

You can actually order custom DNA or RNA sequences online. There are companies where you just upload a text file of the genetic code, and they mail you a vial of the material. The missing piece for a garage setup is usually the delivery mechanism, like the lipid nanoparticles that protect the mRNA and help it get into the cells. That part is still a bit of a dark art, but even that information is becoming more accessible.

It is a wild world where the most sophisticated technology on the planet is being used for a very personal, very small-scale project. It is the ultimate decentralization. But it also makes me wonder about the role of the big institutions. If the research can happen anywhere, what happens to the massive pharmaceutical labs and the university departments?

They have to pivot. Their value is no longer in having the map; everyone has the map now. Their value has to be in the validation, the clinical trials, and the complex manufacturing. They have to move up the stack. But I think we will see a lot more boutique firms and individual researchers making major breakthroughs. The next great drug might not come from a multi-billion-dollar lab; it might come from a small team using open-source tools and rented GPUs.

It reminds me of the early days of personal computing. You had these giant mainframes that only governments and big corporations could afford, and then suddenly, you had people building Apples in their garages. This feels like the Apple one moment for biotechnology.

I agree, and I think the implications for global health are massive. Think about neglected tropical diseases. These are diseases that big pharma often ignores because there is no profit in them. But a researcher in a developing country now has the same tools as a scientist at Harvard. They can model the proteins of a local parasite and start designing treatments without needing a massive infrastructure.

That is the optimistic side of the coin. The ability to tackle problems that are currently ignored because they are not economically viable. But I do keep coming back to that cheeky edge of yours, Herman. You mentioned earlier that AlphaFold is a hypothesis. How often is the hypothesis wrong?

It is wrong more often than people like to admit, especially when it comes to the fine details of how a drug molecule sits in a binding pocket. A shift of one or two angstroms can be the difference between a miracle cure and a useless molecule. AlphaFold three has improved this significantly, but we are still not at the point where you can trust the digital model one hundred percent. You still need the wet lab. You still need to see if the protein actually behaves the way the computer says it will.

So we are not at the point where we can just print a vaccine and be sure it works. We are at the point where we can narrow down the search from a billion possibilities to maybe ten.

And that is a massive acceleration. If you only have to test ten things instead of a billion, you have cut the experimental workload by a factor of a hundred million. That is where the revolution is. It is in the narrowing of the search space.

I also think about the AI literacy aspect of this. If you are a biologist today and you do not know how to use these tools, are you even a biologist? Or are you like a mathematician who refuses to use a calculator?

It is becoming a foundational skill. But it is not just about knowing how to run the model; it is about knowing how to interpret the results. You have to look at the confidence scores, the pLDDT scores as they are called in AlphaFold. You have to understand the underlying architecture enough to know when the model might be hallucinating a structure because it does not have enough evolutionary data to work with.
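For anyone who wants to inspect those confidence scores directly: AlphaFold model files in PDB format store the per-residue pLDDT in the B-factor column. The sketch below reads that column from alpha-carbon records and sorts residues into the confidence bands commonly shown alongside AlphaFold models (above 90, 70 to 90, 50 to 70, below 50). The band labels and the three-residue toy structure are assumptions for illustration, and the helper that builds the toy lines is hypothetical.

```python
def make_atom_line(serial, name, resname, chain, resseq, xyz, plddt):
    """Build one fixed-width PDB ATOM record (toy helper for this demo)."""
    x, y, z = xyz
    return (f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{plddt:6.2f}")


def plddt_per_residue(pdb_text):
    """Map residue number to pLDDT, read from the B-factor of each CA atom."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residue_number = int(line[22:26])    # PDB columns 23-26
            scores[residue_number] = float(line[60:66])  # PDB columns 61-66
    return scores


def confidence_band(plddt):
    """Assumed labels for the usual pLDDT thresholds."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


# Made-up three-residue structure with made-up scores.
TOY_PDB = "\n".join([
    make_atom_line(1, "CA", "MET", "A", 1, (0.0, 0.0, 0.0), 95.2),
    make_atom_line(2, "CA", "LYS", "A", 2, (3.8, 0.0, 0.0), 72.0),
    make_atom_line(3, "CA", "THR", "A", 3, (7.6, 0.0, 0.0), 41.5),
])

for residue, score in plddt_per_residue(TOY_PDB).items():
    print(f"residue {residue}: pLDDT {score:.1f} ({confidence_band(score)})")
```

Long stretches in the lowest band are often a sign of a flexible or disordered region rather than a failed prediction, so they are worth flagging before drawing conclusions from the model.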

We talked about AI hallucinations in the context of language models in episode one thousand ninety-seven. Does that happen in AlphaFold too? Will it just make up a shape that looks plausible but is physically impossible?

It is less likely to produce something physically impossible, because the model has learned the regularities of chemistry from its training data, although nothing strictly forbids the occasional clash. But it can certainly produce something that is plausible but wrong. It might predict a very rigid structure for a part of a protein that is actually very flexible. Or it might miss a crucial interaction that only happens when a specific ion like zinc or magnesium is present. The model is getting better at including those ions, but it is still an area where it can trip up.

It is interesting that we are using the same basic technology, the transformer architecture, to write poetry and to fold proteins. It suggests that there is a deep grammar to biology, just like there is a grammar to language.

That is a profound way to look at it. Amino acids are the letters, proteins are the words, and the three-dimensional structure is the meaning. AlphaFold is essentially a translation engine that turns the text of the genome into the physical reality of the body. And just like with a language translation, you can get the gist of it perfectly while still missing the subtle nuances and the metaphors.

So, for the listeners who want to dive into this, where do they even start? Is this something a layperson can actually play with?

You can actually go to the AlphaFold Protein Structure Database right now and look up almost any protein you can think of. It is a public resource. If you want to run your own predictions, there are versions like ColabFold that run in a Google Colab notebook. You just paste in your amino acid sequence, and it uses a cloud GPU to fold it for you. It is remarkably accessible.
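Before pasting a sequence into any prediction notebook, it helps to clean it up first. The sketch below is a hypothetical pre-flight check, not part of ColabFold or any other tool: it strips FASTA header lines and whitespace, uppercases the residues, and rejects anything outside the twenty standard amino acid letters. The example input is made up.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw_text):
    """Return a single uppercase sequence, or raise ValueError if invalid."""
    lines = [line.strip() for line in raw_text.splitlines()]
    # Drop empty lines and FASTA headers, which start with '>'.
    body = "".join(line for line in lines if line and not line.startswith(">"))
    residues = "".join(body.split()).upper()
    if not residues:
        raise ValueError("no residues found")
    unknown = sorted(set(residues) - STANDARD_AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residue letters: {', '.join(unknown)}")
    return residues


example_input = ">toy_protein made-up example\nmkt ay\nIAK"
print(clean_sequence(example_input))
```

Some tools accept extra letters, such as a separator for multiple chains, so the strict twenty-letter check here is an assumption that may need loosening for a given notebook.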

It is wild that we live in a time where that is a sentence you can say. Paste in your sequence and fold it in the cloud. I think the takeaway for me is that we are witnessing the birth of a new kind of literacy. It is not just about reading and writing code; it is about reading and writing life.

And that brings us back to the ethical question. As this becomes more accessible, do we need more oversight? Should there be a registry for who is ordering what DNA sequences? Some people argue that we need a global monitoring system for any large-scale synthesis of genetic material.

It is a tough balance. You do not want to stifle the person in their garage who might find a cure for their dog or for a rare disease. But you also do not want to make it easy for someone to cause harm. We have seen this tension in the software world for years with open-source security tools. The same tool that helps you find a bug in your system helps a hacker exploit it.

The difference is that a software bug does not usually cause a pandemic. The stakes in biology are just fundamentally different. I think we are going to see a lot of debate in the next few years about how to regulate the intersection of AI and biotech. There was a recent report from the Bipartisan Commission on Biodefense that specifically looked at how AI models could lower the barrier to creating biological weapons.

It is a heavy thought to end on, but I think it is the reality of the situation. We have been given this incredible power, and now we have to figure out how to be responsible with it. The dog owner in the garage is a beautiful story of empowerment, but it is also a signal that the old rules no longer apply.

We are moving from a world of gatekeepers to a world of players. And the game is life itself. It is the most complex, most beautiful, and most dangerous game we have ever played.

Well, on that note, I think we have covered a lot of ground today. From Levinthal’s paradox to the Evoformer, to the ethical minefield of garage biotech. It is clear that AlphaFold is not just a tool; it is a turning point in our history.

It really is. And I am excited to see where it goes, even if it is a little bit scary. The potential for good is just too vast to ignore. We are talking about curing diseases that have plagued humanity for thousands of years. We are talking about cleaning up the planet and creating sustainable new materials. It is a good time to be a nerd.

It is always a good time to be a nerd, Herman. Thanks as always to our producer Hilbert Flumingtop for keeping the gears turning behind the scenes. And a big thanks to Modal for providing the GPU credits that power this show and allow us to dive into these technical topics.

This has been My Weird Prompts. If you are enjoying the show, a quick review on your podcast app helps us reach new listeners and keeps us motivated to keep digging into these strange corners of technology and science.

We will be back next time with another prompt from Daniel. Until then, keep asking the weird questions.

Goodbye everyone.

See you.

Wait, Corn, I forgot to mention the AlphaFold three paper in Nature. People should really read that if they want the full technical breakdown on the diffusion model.

We can put that in the show notes, Herman. Let them have a break from the math for a minute.

Fair enough. Goodbye for real this time.

Bye.

Poppleberry out.

You have to stop saying that.

I will think about it.

He will not. Thanks for listening.

Check out the website for the full archive.

Okay, now we are really done.

Done.

Alright.

Yes.

Goodbye.