#1034: HPC vs. Scientific Computing: The Race for Exascale

Explore the massive scale of supercomputing, from the memory wall to liquid-cooled racks pushing the limits of physical simulation.

Episode Details
Duration: 26:26
Pipeline: V5
TTS Engine: chatterbox-regular
AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

While the terms "scientific computing" and "high performance computing" (HPC) are often used interchangeably, they represent two distinct sides of modern research. Scientific computing is the domain of the researcher—the "what." It involves using mathematical models and numerical analysis to simulate physical phenomena, such as fluid dynamics, molecular structures for drug discovery, or climate patterns. It is the practice of translating physical laws into algorithms.

High performance computing, on the other hand, is the "how." It is the engineering and infrastructure required to execute those massive models within a useful timeframe. A simulation that takes fifty years to run on a standard desktop is scientifically useless; HPC provides the scale to finish that same task in hours or days.

The Architecture of Orchestration

A supercomputer is not simply a faster version of a home computer. Instead, it is a "tightly coupled" system—a massive collection of individual computers tricked into functioning as a single machine. Unlike "high-throughput" computing, where many independent tasks (like rendering movie frames) run simultaneously, HPC workloads are interdependent. What happens on the first processor directly impacts the thousandth, requiring them to communicate millions of times per second.

Breaking the Memory Wall

One of the greatest hurdles in HPC is the "memory wall." While processor performance has historically grown by over 50% per year, memory latency—the time it takes to move data from RAM to the CPU—has improved much more slowly. This creates a bottleneck where the fastest chips in the world spend most of their time waiting for data.

To solve this, supercomputers use specialized interconnects like InfiniBand or proprietary fabrics like Slingshot. These systems utilize Remote Direct Memory Access (RDMA), allowing one node to reach into the memory of another without involving the central processor. This effectively turns a room full of servers into one giant, distributed pool of memory.

The Physicality of Power

The infrastructure required to house these machines is a feat of engineering in itself. Modern exascale systems, such as Frontier or Aurora, consume between 20 and 30 megawatts of power, enough to power a small city. This immense power consumption generates incredible heat, necessitating advanced liquid-cooling systems that move thousands of gallons of water per minute.

These physical constraints are why HPC requires dedicated research labs. Standard commercial data centers are designed for independent web requests, not the tightly coupled, high-heat density requirements of a physical simulation.

A Strategic Asset

The race for exascale computing—machines capable of a quintillion calculations per second—is a matter of national priority. Because the world no longer conducts live nuclear testing, supercomputers are the only way to ensure the safety and functionality of aging stockpiles through sub-atomic physics simulations. Beyond defense, these machines are the primary tools for tackling global challenges like climate change and pandemic response, making them the silent engines of modern progress.


Episode #1034: HPC vs. Scientific Computing: The Race for Exascale

Daniel's Prompt
Daniel
Custom topic: What does scientific or high performance computing actually mean? And why do we see dedicated high performance computing research labs and projects? Is this basically where the foundations of supercomputing are laid?
Corn
Hey everyone, welcome back to My Weird Prompts. I am Corn Poppleberry, and I am sitting here in our living room in Jerusalem with my brother, Herman. It is a beautiful evening here, and we are ready to tackle a topic that has been sitting in our inbox for a few weeks now.
Herman
Herman Poppleberry here, and I am particularly excited about this one. Our housemate Daniel sent us a message asking about something that I think most people have a very vague, cinematic idea of, but very few people actually understand the nuts and bolts of. He was asking about scientific computing and high performance computing, or H P C.
Corn
Right, because when most people hear those terms, they picture a dark room with blinking blue lights and maybe a guy in a lab coat looking at a screen with a bunch of scrolling green text. It is very Matrix-esque in the public imagination. But Daniel wanted to know what these terms actually mean in a functional sense. Why do we have these massive, dedicated research labs just to house these machines? And is this where the actual foundations of supercomputing are laid, or is it just a bigger version of the computer sitting on my desk?
Herman
That is such a great starting point because the answer is both yes and no, but mostly no. It is not just a bigger version of your desktop. It is a fundamental shift in architectural philosophy. We often fall into the trap of thinking that a supercomputer is just a linear upgrade, like going from a four-cylinder engine to an eight-cylinder engine. But in reality, it is more like the difference between a high-end sports car and a massive, coordinated fleet of thousands of trucks moving in perfect synchronization.
Corn
I love that analogy. It is not just about raw speed; it is about orchestration. And I think we should start by deconstructing the two terms, because people often use scientific computing and H P C interchangeably, but they represent two different sides of the same coin. One is the goal, and the other is the toolset.
Herman
Scientific computing is the domain of the researcher. It is the "what." It is about using mathematical models and numerical analysis to simulate physical phenomena. We are talking about things like computational fluid dynamics, climate modeling, molecular dynamics for drug discovery, or simulating the structural integrity of a new bridge design. It is the practice of turning a physical law, like the Navier-Stokes equations for fluid flow or the Schrödinger equation for quantum mechanics, into an algorithm that a machine can execute.
Corn
So it is the math. It is the model. It is the "how do we represent the real world in bits and bytes?" But then H P C, high performance computing, is the "how" in terms of infrastructure. It is the engineering required to actually run those massive models in a timeframe that is actually useful.
Herman
Right. If you try to run a high-resolution global climate model on your laptop, it might take fifty years to simulate one week of weather. By the time the simulation is done, the weather you were predicting is ancient history. H P C is the art of scaling that computation across thousands or even tens of thousands of processors so you can get that answer in hours or days. It is about managing the movement of data so that the processors aren't just sitting there twiddling their thumbs.
Corn
It is funny you mention that, because I think a common misconception is that a supercomputer is just a really fast computer. But in reality, it is a massive collection of computers that have been tricked into thinking they are one single machine. We call this a tightly coupled system.
Herman
That word, "coupled," is the secret sauce. You can connect a thousand computers with standard office ethernet and call that a cluster. You could use it to render a Pixar movie, where each frame is independent. That is what we call high-throughput computing. But you could not call it an H P C system for most scientific workloads. In H P C, the tasks are interdependent. What happens on processor number one affects what needs to happen on processor number one thousand, and they need to talk to each other millions of times per second.
Corn
And that brings us to the first major technical hurdle, which is the memory wall. We have talked about hardware limits before, specifically in episode six hundred sixty-three when we looked at the cost of workstation power, but H P C takes that to a whole different level. Herman, explain the memory wall, because I think it explains why these machines look so weird and why they cost hundreds of millions of dollars.
Herman
The memory wall is the existential crisis of modern computing. Since the nineteen eighties, processor performance has grown at roughly fifty-five percent per year. We have gotten incredibly good at making chips that can do math really, really fast. However, memory latency, the time it takes to get data from the R A M to the C P U, has only improved by about seven percent per year.
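Herman's two growth rates compound into a startling gap over time. Here is a minimal sketch of that arithmetic; the 55% and 7% figures are the classic textbook estimates he quotes, not exact historical data.

```python
# Rough sketch of the "memory wall": compound two annual improvement
# rates and compare the resulting gap after two decades.

def compound(rate_per_year: float, years: int) -> float:
    """Relative performance after `years` of steady annual improvement."""
    return (1.0 + rate_per_year) ** years

years = 20
cpu_gain = compound(0.55, years)   # processor throughput, ~55%/year
mem_gain = compound(0.07, years)   # memory latency improvement, ~7%/year
gap = cpu_gain / mem_gain          # how much hungrier the CPU has become

print(f"CPU: {cpu_gain:,.0f}x  memory: {mem_gain:.1f}x  gap: {gap:,.0f}x")
```

After twenty years at those rates the processor has improved thousands of times over while memory has improved only a few times, which is why the interconnect and memory hierarchy dominate H P C design.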
Corn
So we have these incredibly fast engines, but the fuel line is still the size of a soda straw. Or maybe even a needle at this point.
Herman
In a standard computer, the processor spends a huge amount of its time just sitting there, waiting for data to arrive. Now, imagine you are in an H P C environment where you are trying to coordinate the massive scale of a machine like Frontier at Oak Ridge National Lab. As of today, in March of twenty-six, Frontier is still a titan, using over nine thousand A M D E P Y C C P Us and thirty-seven thousand Instinct G P Us. If each of those processors is waiting on data, and then they have to wait on each other to synchronize their results, the whole system grinds to a halt. The efficiency drops to almost zero.
Corn
This is why the interconnect is more important than the actual chips in many ways. I remember when we were looking at unified supercomputers in episode six hundred five, we talked about C X L and InfiniBand. In an H P C lab, they aren't using the same cables we use to connect our routers.
Herman
Oh, not even close. They are using technologies like InfiniBand N D R or the newer X D R, or proprietary fabrics like Hewlett Packard Enterprise's Slingshot. These are interconnects designed for incredibly low latency and high bandwidth. But even more importantly, they support something called Remote Direct Memory Access, or R D M A. This allows one node in the supercomputer to reach into the memory of another node without involving the operating system or the C P U of the second node. It bypasses the bureaucracy of the computer.
Corn
That is fascinating. It is like being able to reach into your neighbor's fridge and grab a snack without having to knock on the door and ask them to get it for you. It turns the entire room of servers into one giant, distributed pool of memory.
Herman
That is exactly what it is. And with the advent of C X L three point zero and three point one, which we are seeing fully integrated into the newest clusters this year, we are moving toward true memory fabric. This allows for memory pooling at a scale we only dreamed of a few years ago. But even with that hardware, you still have the problem of the software. You can't just take a program written for a single C P U, even a very fast one, and tell it to run on ten thousand nodes.
Corn
No, that would be like trying to have one person give a speech by using ten thousand different mouths simultaneously. It requires a different language. This is where the M P I, or Message Passing Interface, comes in. This has been the standard for decades, right?
Herman
It has. M P I is the lingua franca of scientific computing. When a scientist writes a piece of code for a supercomputer, they have to manually define how data is chopped up and sent between different processors. They have to be hyper-aware of data locality. If I am simulating the airflow over a wing, I might give the front of the wing to node A and the back of the wing to node B. But then node A and node B have to constantly talk to each other about what is happening at the boundary where their two sections meet.
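The boundary exchange Herman describes is usually called a "halo" exchange. Real codes do it with M P I calls such as MPI_Sendrecv (or the mpi4py library in Python); the sketch below fakes the exchange with plain assignments between two pretend nodes so the pattern stays visible. The diffusion update and the two-node split are illustrative assumptions, not any particular lab's code.

```python
# Toy halo exchange: two "nodes" each own half of a 1-D grid and must
# swap edge cells every step so the physics stays consistent at the
# boundary between their subdomains.

def step(cells, left_halo, right_halo):
    """One explicit diffusion step over a node's local cells.
    Halo values stand in for the neighbor node's boundary cells."""
    padded = [left_halo] + cells + [right_halo]
    return [(padded[i - 1] + padded[i + 1]) / 2
            for i in range(1, len(padded) - 1)]

# Split one global domain across two "nodes": hot half, cold half.
node_a = [1.0, 1.0, 1.0, 1.0]
node_b = [0.0, 0.0, 0.0, 0.0]

for _ in range(10):
    # Halo exchange: each node sends its edge cell to its neighbor.
    # In MPI this is the communication that must not become the bottleneck.
    a_to_b, b_to_a = node_a[-1], node_b[0]
    node_a = step(node_a, node_a[0], b_to_a)   # outer edge: mirrored boundary
    node_b = step(node_b, a_to_b, node_b[-1])  # outer edge: mirrored boundary

print(node_a, node_b)  # heat diffuses across the node boundary
```

If the exchange lines were slow, both nodes would stall every iteration no matter how fast `step` runs, which is exactly the coupling problem the hosts keep returning to.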
Corn
And if that boundary communication is slow, it does not matter if node A and node B are the fastest processors in the world. The simulation will only go as fast as the data can move between them. This is why we see these dedicated H P C research labs like Oak Ridge, Argonne, or Lawrence Livermore in the United States. These are not just data centers. They are architectural experiments. They are places where we test the limits of how many components we can stitch together before the overhead of communication eats all the performance.
Herman
It is also a matter of national priority. From a strategic perspective, the race for exascale computing is the modern equivalent of the space race. We are now firmly in the exascale era. Frontier was the first, but now we have Aurora at Argonne National Laboratory, which has crossed the two exaflop peak performance mark using Intel's Max Series G P Us. And then there is El Capitan at Lawrence Livermore, which is pushing the boundaries even further for nuclear stockpile stewardship.
Corn
I think people forget that we don't do live nuclear testing anymore. We haven't for decades. So, if you want to know if a thirty-year-old nuclear warhead is still functional and safe, you have to simulate it. You have to simulate the physics of a nuclear explosion at a sub-atomic level. That requires a quintillion calculations per second.
Herman
It really is a strategic asset. When you look at the infrastructure required to keep these things running, it is mind-boggling. This is not something you can just plug into a wall outlet. This leads us to the physical constraints, which is something Daniel asked about. Why the dedicated labs? Well, because of the power and the heat.
Corn
Right, we did an entire episode on the heat wall, episode five hundred fifty-nine, where we talked about C P Us basically becoming nuclear reactors in terms of heat density. In a lab like Oak Ridge or Argonne, the cooling system is as much of an engineering marvel as the computer itself.
Herman
It has to be. Frontier uses a liquid cooling system that moves six thousand gallons of water every minute through the cabinets. The water absorbs the heat from the chips and then carries it to a heat exchanger. If the cooling system fails for even a few seconds, the hardware would literally melt. We are talking about twenty to thirty megawatts of power for a single machine. To put that in perspective, that is enough power to run a small city of twenty thousand homes.
Corn
That is incredible. And I think it answers one of Daniel's questions about why we need these dedicated labs. You cannot just rent space in a standard cloud data center for this kind of work. Most commercial data centers are designed for web traffic and database queries, which are what we call "embarrassingly parallel" tasks. If I am running a website, each user's request is independent. I can just add more servers and it scales linearly.
Herman
Right, that is high-throughput computing, not high-performance computing. In H P C, the tasks are tightly coupled. Everything depends on everything else. If one server in the middle of your simulation has a hiccup, the other ten thousand servers have to wait for it. That requires a level of environmental control, power stability, and network tuning that standard cloud providers just don't offer. Even the big A I clouds from Google or Microsoft, while massive, are often optimized for different types of workloads than a high-fidelity physical simulation.
Corn
So, we have the hardware, the interconnect, and the physical lab. But what about the software efficiency? You mentioned earlier that more cores do not always equal more speed. I think we should talk about Amdahl's Law, because that is the ghost that haunts every H P C engineer.
Herman
Amdahl's Law is the reality check of the computing world. It basically states that the speedup of a program using multiple processors is limited by the time needed for the sequential fraction of the program. If you have a program where ninety percent of the work can be parallelized, but ten percent has to be done one step at a time in order, then even if you have an infinite number of processors, you can never make that program more than ten times faster.
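Amdahl's Law has a one-line formula: with a parallel fraction p on n processors, speedup = 1 / ((1 - p) + p / n). A quick sketch shows the ceiling Herman describes:

```python
# Amdahl's Law: overall speedup when a fraction p of the work
# parallelizes perfectly and the remaining (1 - p) stays serial.

def amdahl_speedup(p: float, n: float) -> float:
    """Speedup for parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# 90% parallel code: even a million cores cannot beat 10x.
for n in (8, 64, 1024, 1_000_000):
    print(f"{n:>9} cores -> {amdahl_speedup(0.9, n):6.2f}x")
```

The curve flattens brutally: going from 1,024 cores to a million buys almost nothing, because the serial 10% now dominates the runtime.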
Corn
That is a tough pill to swallow. You spend six hundred million dollars on an exascale machine, but if your code has a tiny bit of serial bottleneck, almost all of that hardware sits idle. It is like having a thousand chefs in a kitchen, but they all have to wait for one person to peel the single onion before anyone can start cooking.
Herman
Precisely. This is why scientific computing is such a specialized field of study. It is not just about knowing the physics; it is about knowing how to structure your algorithms to minimize that serial portion. It is about avoiding what we call global synchronization points, where every processor has to stop and wait for everyone else to catch up. We are seeing a shift toward "asynchronous" algorithms where nodes can keep working even if they haven't heard from their neighbors yet, but that is incredibly hard to program.
Corn
It seems like we are seeing a shift lately, though. For a long time, supercomputers were mostly C P U-based. But now, as you mentioned with Frontier and Aurora, it is dominated by G P Us. Is that because of the A I boom, or was scientific computing already heading that way?
Herman
It was already heading that way, but the A I boom accelerated the hardware development. G P Us are throughput-oriented. They are designed to do the same simple math operation on thousands of pieces of data simultaneously. This turns out to be perfect for things like molecular dynamics or weather modeling, where you are doing the same calculation on every point in a massive grid. However, we are now moving beyond just G P Us into the era of domain-specific accelerators.
Corn
You mean chips designed for one specific type of math?
Herman
We are seeing labs experiment with F P G As, or Field Programmable Gate Arrays, and even A S I C s. There is a famous machine called Anton, built by D. E. Shaw Research, which is a supercomputer designed for one thing and one thing only: molecular dynamics simulations of proteins. It is hundreds of times faster than a general-purpose supercomputer at that specific task because the hardware itself is wired to do that math. We are seeing this trickle down into the mainstream with things like Google's T P Us or the specialized tensor cores in N V I D I A cards.
Corn
That is where the foundation of the future is being laid. These labs are the breeding ground for architectures that eventually trickle down to us. I mean, the high-bandwidth memory, or H B M, that we see in high-end consumer graphics cards today started in the H P C world. The low-latency networking we see in high-frequency trading often has its roots in these scientific interconnects.
Herman
It is a trickle-down effect, for sure. But the gap is widening. The requirements for exascale and the upcoming zettascale are so extreme that the hardware is becoming more and more specialized. We are moving away from the era where a supercomputer was just a bunch of off-the-shelf parts. We are entering an era of deep integration where the cooling, the power, the interconnect, and the silicon are all co-designed as a single unit.
Corn
I want to go back to the idea of the dedicated lab for a second. We mentioned Oak Ridge and Lawrence Livermore. These are Department of Energy labs in the U. S. There is also the Riken Center in Japan, which houses the Fugaku supercomputer, and the Euro H P C Joint Undertaking with machines like LUMI in Finland and JUPITER in Germany. Why is it that government-funded labs are the ones doing this, rather than say, Google or Microsoft?
Herman
There is a big difference in the mission. Google and Microsoft are focused on A I training and cloud services. Their clusters are massive, but they are often optimized for a different kind of workload—one that is more tolerant of latency. The government labs are focused on open science and national security. They are willing to invest in architectures that might not be commercially viable for a decade. They are also dealing with data that is so sensitive it cannot leave a secure facility.
Corn
Right, like the nuclear simulations we mentioned. Or highly sensitive genomic data. You can't exactly run those on a public cloud without some serious security concerns.
Herman
And the scale of the problems they are tackling is just different. A I training is very compute-intensive, but it is often more tolerant of errors. If a bit flips during an A I training run, the model might just be slightly less accurate. If a bit flips during a simulation of a nuclear reactor's core or a climate model, the whole simulation might diverge into nonsense. The reliability and the precision requirements—what we call double-precision floating-point math—are much higher in scientific computing.
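The precision point is easy to demonstrate. The sketch below emulates single-precision accumulation using the standard library's `struct` round-trip (a stand-in for real float32 hardware); the specific values and iteration count are illustrative, not from the episode.

```python
import struct

# Emulate IEEE 754 single precision by packing a Python double (which
# is IEEE 754 double precision) into a 32-bit float and back.
def f32(x: float) -> float:
    """Round x to the nearest representable 32-bit float."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Accumulate 0.1 a million times. Exact answer: 100000.
single_total = 0.0
double_total = 0.0
for _ in range(1_000_000):
    single_total = f32(single_total + f32(0.1))  # rounds every step
    double_total = double_total + 0.1            # far smaller rounding error

print(f"single: {single_total:.4f}")   # drifts visibly from 100000
print(f"double: {double_total:.10f}")  # stays very close to 100000
```

In a long-running simulation, that single-precision drift compounds step after step, which is why scientific codes insist on double precision while many A I workloads can shrug it off.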
Corn
So the precision matters. That makes sense. Herman, I am curious about the future of this. We are at exascale now. What is next? Is it just more of the same, or is there a fundamental shift coming?
Herman
The next big thing is the integration of quantum computing into the H P C workflow. We are starting to see the first quantum-classical hybrid systems. The idea is not to replace the supercomputer with a quantum computer, because quantum computers are actually quite bad at most things. But they are incredibly good at specific things, like simulating quantum chemistry at the atomic level.
Corn
So you would use your classical supercomputer to handle the overall simulation, and then you would offload the really hard quantum bits to a quantum processor?
Herman
It is the ultimate accelerator. Just like we offload graphics to a G P U today, tomorrow's researchers might offload sub-atomic interactions to a Q P U, a Quantum Processing Unit. But that requires a whole new level of H P C infrastructure to manage the handoff between the classical and quantum worlds. We are talking about nanosecond-level synchronization between a machine running at room temperature and a quantum processor sitting in a dilution refrigerator at near absolute zero.
Corn
It feels like we are reaching a point where the physical layout of these machines is becoming the primary constraint. I mean, if you have to move data across a room that is a hundred feet long, you are limited by the speed of light.
Herman
You hit the nail on the head. In an exascale machine, the speed of light is actually a bottleneck. If a signal has to travel fifty feet, that takes about fifty nanoseconds. In the time it takes for that signal to cross the room, a modern C P U has gone through hundreds of clock cycles. This is why the physical density of these machines is increasing. We are trying to pack everything as close together as possible, which then makes the heat problem even worse. It is a vicious cycle.
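Herman's figure checks out with back-of-envelope arithmetic: light covers roughly one foot per nanosecond in vacuum, and signals in cable or fiber are slower still. The 0.7c velocity factor and the 3 GHz clock below are rough assumptions for illustration.

```python
# How long does a signal take to cross a machine room, and how many
# clock cycles does a CPU burn waiting for it?

C = 299_792_458.0      # speed of light in vacuum, m/s
FEET_TO_M = 0.3048

def flight_time_ns(feet: float, velocity_factor: float = 1.0) -> float:
    """One-way signal time in nanoseconds over `feet` of distance."""
    return feet * FEET_TO_M / (C * velocity_factor) * 1e9

t_vacuum = flight_time_ns(50)        # ~51 ns, matching "about fifty"
t_cable = flight_time_ns(50, 0.7)    # slower in a typical cable or fiber
cycles = t_vacuum * 3.0              # cycles burned at 3 GHz (3 cycles/ns)

print(f"{t_vacuum:.0f} ns in vacuum, {t_cable:.0f} ns in cable, "
      f"~{cycles:.0f} cycles at 3 GHz")
```

That is the hard floor no engineering can remove, and it is why the only remaining lever is packing the machine tighter, which feeds straight back into the cooling problem.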
Corn
You want it faster, so you pack it tighter, which makes it hotter, which requires more cooling, which requires more power, which requires a bigger building. It is an incredible engineering challenge.
Herman
And that is why it stays in the labs. It is a specialized art form. It is the pinnacle of human engineering, really. It is where we push against the very limits of physics to try and understand the universe better. Whether it is predicting the next hurricane or finding a cure for a rare disease, these machines are the engines of discovery.
Corn
I think this is a good spot to transition into some practical takeaways for our listeners. Because while most of us will never get to run a job on the Frontier supercomputer, the principles of H P C actually apply to how we think about technology even at a smaller scale.
Herman
The first big takeaway is that computing is no longer just about the speed of the processor. If you are building a high-end workstation or even a gaming rig, you need to think about data movement. This is what we discussed in episode six hundred sixty-three. You can have the best G P U in the world, but if it is plugged into a motherboard with a slow P C I e bus, or if your storage is a bottleneck, you are wasting your money.
Corn
Right. Minimize data movement. That is the mantra of H P C. Whether you are a professional video editor or a software developer, you should be looking at where your data is sitting and how fast it can get to where it needs to be processed. Sometimes, a faster S S D will give you a bigger performance boost than a faster C P U.
Herman
The second takeaway is about software efficiency. We live in an era of very bloated software because hardware has become so cheap. But as we see with Amdahl's Law, you cannot just throw more cores at a poorly written piece of code and expect it to get faster forever. Learning how to write code that is aware of the hardware it is running on—what we call "hardware-aware programming"—is a superpower.
Corn
And third, I think it is important to recognize the distinction between a cluster and a supercomputer. If you are a business owner or a researcher, you need to evaluate if your project actually requires the tightly coupled architecture of H P C, or if you can just use high-throughput cloud computing. Most tasks, like big data analytics or web serving, are actually fine on the cloud. But the ones that aren't, the ones that require that deep integration, those are the ones that change the world.
Herman
Well said. It is about choosing the right tool for the job. And sometimes, that tool is a thirty-megawatt machine cooled by six thousand gallons of water a minute. It is about understanding the scale of the problem you are trying to solve.
Corn
I love that. It really puts things in perspective. Before we wrap up, I want to mention that if you want to see the technical details of the interconnects we talked about, you should definitely go back and listen to episode six hundred five. We really get into the weeds of C X L and how it is trying to bring some of those H P C memory-pooling features to more general-purpose servers.
Herman
Yeah, that was a fun one. And if you are interested in the physical limits of these chips, episode five hundred fifty-nine on the heat wall is a must-listen. It really explains why we can't just keep making chips smaller and faster forever without a massive breakthrough in materials science.
Corn
Definitely. Well, Herman, I think we have covered a lot of ground today. We have gone from the abstract math of scientific computing to the liquid-cooled racks of Oak Ridge and Argonne.
Herman
It has been a journey. And thanks to Daniel for sending this one in. It is always good to geek out on the "big iron." It reminds us that there is a whole world of computing beyond our smartphones and laptops.
Corn
And hey, to all of you listening, if you have been enjoying the show, we would really appreciate it if you could leave a quick review on your podcast app or on Spotify. It genuinely helps other people discover the show and keeps us going. We are a small operation, and your support means the world to us.
Herman
It really does. We love seeing those reviews come in, especially the ones that ask follow-up questions.
Corn
You can find all of our past episodes, all one thousand sixteen of them, on our website at myweirdprompts.com. There is a search bar there, so if you want to find more of our hardware deep dives, just search for H P C, supercomputing, or silicon.
Herman
Or just browse around. There is a lot of weird stuff in the archives, from the history of the abacus to the future of carbon nanotube transistors.
Corn
That is for sure. Well, this has been My Weird Prompts. I am Corn Poppleberry.
Herman
And I am Herman Poppleberry.
Corn
Thanks for listening, and we will talk to you in the next one.
Herman
Until next time.
Corn
So, Herman, I was thinking about the power consumption of these labs again. Twenty megawatts is insane. That is essentially a dedicated power plant just for one computer.
Herman
It is. And it is a major bottleneck for the next generation. People are talking about zettascale computing, which is the next step after exascale—a thousand times faster. But if we use current technology, a zettascale machine would require its own nuclear power plant. We are talking gigawatts of power.
Corn
That is where the argument for high-density energy sources like nuclear becomes so relevant in the tech world. If we want to maintain our lead in computing, we have to have the energy infrastructure to support it. You can't run a zettascale machine on intermittent sources alone without massive battery backups that don't exist yet. The baseload requirements are just too high.
Herman
It is a physical reality. You need consistent, high-density power. It is one of those areas where energy policy and tech policy are completely inseparable. You can't have one without the other.
Corn
It is fascinating how all these things connect. The math, the hardware, the energy, the geopolitics. It all comes together in these quiet rooms with blinking lights. It is the foundation of the modern world, even if most people never see it.
Herman
It really is. It is the invisible engine of progress.
Corn
Well, I think that is a wrap for today.
Herman
Agreed. Let's go see what Daniel is cooking for dinner.
Corn
Hopefully something that doesn't require a liquid cooling system.
Herman
No promises. He was looking at a sous-vide machine earlier.
Corn
Close enough. Alright, thanks everyone. Bye.
Herman
Bye.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.