I was looking at a budget report recently that honestly felt like a typo, but it turns out the United States Social Security Administration is dropping one billion dollars on a three-year project just to deal with their COBOL code. It is essentially an emergency triage on the infrastructure that keeps benefit payments flowing for tens of millions of people. Today's prompt from Daniel is about how agentic AI is finally cracking the deadlock on these massive legacy systems, and it feels like we are at a genuine tipping point.
It is the ultimate canary in the coal mine, Corn. I am Herman Poppleberry, by the way, for anyone joining us for the first time. That one billion dollar figure is staggering, but when you realize that ninety-five percent of ATM operations globally still run on COBOL, you start to see why the stakes are so high. We are talking about code written before most current developers were even born, and it is holding up the entire global financial system. We are currently sitting in March of twenty twenty-six, and the "COBOL knowledge crisis" that people have been predicting for decades has finally arrived as a full-blown operational emergency.
It is wild to think about. We always hear about technical debt as this annoying thing that makes your app slow, but here, we are talking about institutional debt that could actually break the economy if it is not handled. Daniel is asking about the specific languages that are causing the biggest headaches and what tools are actually working for these rewrites. Because let's be honest, we have been promised "automatic code migration" for decades, and it has mostly been a disaster. What makes early twenty twenty-six different?
The difference is the shift from simple syntax translation to agentic refactoring. But before we get into the "how," we have to understand the "why" of the Legacy Trap. Right now, the average global enterprise is wasting over three hundred and seventy million dollars annually due to inefficiencies and technical debt from these systems. In many organizations, up to eighty percent of the entire IT budget is spent just on "keeping the lights on." That is money that could be going toward innovation, but instead, it is being funneled into maintaining systems that are fifty years old.
Eighty percent? That is a death spiral. If you are spending that much just to exist, you can never afford to build anything new. And that brings us to the big news from earlier this year—the standoff between Anthropic and IBM. That really set the tone for twenty twenty-six, didn't it?
It really did. Anthropic published that bombshell blog post claiming that Claude could accelerate COBOL modernization to be finished in quarters rather than years. They basically argued that AI has broken the cost barrier that kept these systems locked in place for decades. This absolutely rattled IBM investors because the mainframe business is partly propped up by the perceived impossibility of migration. IBM had to push back publicly, claiming the AI debate was "getting it wrong" and that you still need their specialized ecosystem to survive. It is a fascinating clash of philosophies.
It is like the old guard versus the new agents. But let's look at the technical side. Why is this so hard? In the past, tools would just try to swap a COBOL command for a Java command. But languages like COBOL or FORTRAN are not just different syntaxes; they represent entirely different ways of thinking about data and memory.
Modern agentic AI, like what we are seeing with the latest Claude models and AWS Transform, does not just translate. It acts as a bridge of institutional memory. It can look at a block of undocumented code and actually reason out the business logic behind it. This leads us to the biggest technical hurdle in the entire industry: Behavioral Equivalence.
I love that term, but it sounds terrifying. What does it actually mean in practice?
It means that if you have a financial system that has been running since nineteen seventy-five, it probably has specific edge cases, weird rounding behaviors, or even intentional bugs that every other system in the company has learned to expect. If the AI rewrites that logic to be "cleaner" but changes the output by even a fraction of a cent, the whole house of cards falls down. After fifty years of transactions, that fraction of a cent becomes a massive discrepancy.
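That cent-level drift is easy to demonstrate. Here is a minimal Python sketch of the problem; the function names are ours, and the legacy behavior is mimicked with Python's decimal module standing in for COBOL's exact packed-decimal arithmetic:

```python
from decimal import Decimal, ROUND_HALF_UP

def legacy_round(amount: str) -> Decimal:
    # Mimics COBOL's ROUNDED clause on packed-decimal data:
    # exact decimal values, ties always rounded up.
    return Decimal(amount).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def naive_rewrite_round(amount: float) -> float:
    # A "cleaner" rewrite on binary floats: 2.675 has no exact binary
    # representation (it is stored as 2.67499999...), so it rounds down.
    return round(amount, 2)

print(legacy_round("2.675"))        # 2.68 -- what the mainframe paid
print(naive_rewrite_round(2.675))   # 2.67 -- what the rewrite pays
```

One cent on one record looks harmless, but replayed across decades of nightly batches it is exactly the kind of discrepancy that fails an audit.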
Right, so how do you solve for that without having a human check every single line?
You use multi-agent pipelines. This is what Microsoft is doing with their Semantic Kernel and Process Framework. They do not start by writing code. They start by having one agent analyze the legacy code to generate a comprehensive test suite. They basically wrap the old code in a "black box" test environment to see exactly what it does with every possible input. They are essentially extracting the "truth" of the system before they even think about Java or Python.
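As a toy illustration of that "black box first" pattern, here is a Python sketch of characterization testing. The two lambdas are stand-ins for a real legacy binary and its rewrite, and every name here is ours, not from any vendor's tooling:

```python
def build_characterization_suite(legacy_fn, inputs):
    # Record the legacy system's exact behavior -- quirks included --
    # as golden input/output pairs before any rewrite begins.
    return {rec: legacy_fn(rec) for rec in inputs}

def find_mismatches(golden, new_impl):
    # Behavioral-equivalence check: the rewrite must reproduce every
    # recorded output exactly, intentional bugs and all.
    return [rec for rec, expected in golden.items()
            if new_impl(rec) != expected]

# Toy stand-ins: the "legacy" logic truncates after a 5% markup,
# while the "cleaner" rewrite rounds to the nearest whole cent.
legacy = lambda cents: str(int(cents) * 105 // 100)
rewrite = lambda cents: str(round(int(cents) * 1.05))

golden = build_characterization_suite(legacy, ["100", "150", "199"])
print(find_mismatches(golden, rewrite))  # ['150', '199']
```

The rewrite looks mathematically "better," but the suite flags it anyway, because equivalence to the old behavior is the only contract that matters here.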
That is the part that fascinates me. Because in many of these organizations, the code is the only documentation that exists. The people who wrote it are either retired or, frankly, no longer with us. The average COBOL programmer is fifty-five years old, and we are seeing a ten percent annual attrition rate. It is a literal race against time.
It really is. And it is not just COBOL. You have FORTRAN still embedded in NASA and Department of Energy systems for scientific computing. FORTRAN is a different beast because it is used for high-performance math. If you are modeling the structural integrity of a nuclear reactor, you cannot afford any loss of precision. Then you have RPG, which stands for Report Program Generator. This is the backbone of mid-sized manufacturing and distribution all across the American Midwest and Canada, running on old IBM AS-four hundred systems.
I remember we touched on this a bit when we talked about the shift toward Rust in episode ten thirty-three. There is this desire to move to memory-safe, modern languages, but the "how" has always been the bottleneck. You mentioned the "IBM Paradox" in our notes, and I think that is a hilarious tension to explore. IBM is basically the landlord of the mainframe world, right?
They really are. IBM's Z-series mainframes are the iron that all this COBOL runs on. So they have this fascinating, conflicted incentive. On one hand, they sell the hardware and the specialized support that companies rely on because they are too scared to migrate. On the other hand, IBM is now pushing their Watsonx Code Assistant for Z, which is their own AI tool to help understand and refactor that code. They want to own the modernization narrative so that even if you move away from old COBOL, you stay within the IBM ecosystem. They are selling the poison and the cure.
It is a brilliant business model, if a bit cynical. But the competition is heating up. You mentioned AWS Transform earlier. How does that differ from the traditional consulting model where you hire an army of people to sit in a room for five years?
AWS Transform, which launched in May of twenty twenty-five, is an end-to-end agentic service. It handles the full pipeline: analysis, planning, refactoring, and testing. It is not just a chatbot; it is a factory. We saw a real-world case recently where a Japanese bank converted five million lines of COBOL to Java in just nine months. In the old world, that would have been a decade-long project that probably would have been canceled halfway through.
Nine months for five million lines is incredible. But I want to go back to the skepticism. Firms like Thoughtworks have been very vocal about the "reality gap" here. They argue that while AI can write the code, it struggles with the deep architectural nuances. Are we just replacing old spaghetti code with new, AI-generated spaghetti code?
That is the danger of the "lift and shift" approach. If you just replicate the old procedural logic in a new language, you are not really getting the benefits of the cloud. This is why the "agentic" part is so important. The best tools are being instructed to "re-imagine" the architecture. Instead of just a one-to-one translation, they are tasked with breaking that monolithic COBOL blob into microservices or serverless functions. They are refactoring the design, not just the syntax.
Which requires the AI to actually understand what a "customer" is or what a "transaction" is, rather than just seeing them as data structures. It is moving from code conversion to institutional memory extraction.
Precisely. And that extraction can involve multi-modal RAG—Retrieval-Augmented Generation. The AI can actually "read" old scanned PDF manuals from the nineteen eighties, look at the database schema, and cross-reference it with the code to figure out why a specific check was added on line four thousand. A human consultant would take weeks to find that manual; an AI does it in seconds.
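The cross-referencing step can be sketched in miniature. Real pipelines use learned embeddings over OCR'd manuals; this toy Python version substitutes a bag-of-words similarity, and the snippets and function names are invented for illustration:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: token counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Imagine these were extracted from scanned nineteen-eighties manuals.
manual_snippets = [
    "interest accrual uses half-up rounding per 1982 treasury rule",
    "batch job reconciles nightly ledger totals against the master file",
]

def explain(code_question: str) -> str:
    # Retrieve the manual snippet most relevant to a puzzling code block.
    q = embed(code_question)
    return max(manual_snippets, key=lambda s: cosine(q, embed(s)))

print(explain("why does PAY-CALC apply half-up rounding to interest"))
```

The retrieved snippet explains the "why" behind the code, which is the institutional memory a human consultant would otherwise spend weeks digging for.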
Let's talk about the other languages Daniel mentioned. We covered COBOL, FORTRAN, and RPG. What about PL/I or Assembly? Are those even harder to move?
PL/I is the more tractable of the two; it is a structured language, so agents can treat it much like COBOL. Assembly is the final boss of migration. It is hand-optimized for specific hardware from forty years ago. There is no "logic" to extract sometimes; it is just raw manipulation of bits. But even there, we are seeing specialized agents being trained on these low-level instruction sets. Then you have things like Delphi and Pascal, which are huge in Eastern European enterprise software and medical systems. Those codebases are particularly hard to maintain because the developer pool is shrinking so fast.
It sounds like a massive shift in skill sets is coming. I wonder if this finally kills the "rockstar developer" myth in the enterprise space. Now, the rockstar is the person who can design the verification pipeline and catch the subtle logic errors the AI misses.
It really is. The job description is changing from "COBOL coder" to "Modernization Architect." You are managing a fleet of AI agents rather than typing out lines of code yourself. You need to be the one who understands the business logic well enough to know when the AI is hallucinating a fix that will break the bank. It is a shift from being a writer to being an editor and an auditor.
I think people underestimate the "institutional memory" aspect. It is not just about the language. It is about the "why." If you are a CTO or a lead developer listening to this, what is the first step? How do you even begin to audit this much debt?
The first step is the "Test-First" mandate. You cannot modernize what you cannot verify. Do not even think about "rewriting" yet. Use these agentic tools to build your testing and verification layer first. Once you have a safety net that proves behavioral equivalence, then you can let the agents start the refactoring process. If you do it the other way around, you are just guessing, and guessing in a billion-dollar financial system is how you end up on the front page of the Wall Street Journal for all the wrong reasons.
That is a great point. Look for the "eighty percent maintenance" signal. If your budget is that skewed, you are in the Legacy Trap. But I want to push back a little on the "quick rewrite" idea. Even with these tools, isn't there a danger of just creating "modern legacy code"?
There is. This is where specialized firms like TSRI come in with their JANUS Studio platform. They have been doing this for years using rule-based systems, but now they are layering AI on top to handle the nuances and the "fuzzy" logic that rules cannot catch. It is a hybrid approach. You use the rules for the things that must be absolute, and you use the AI for the things that require reasoning and architectural reimagining.
It feels like we are in this weird transition period where the old world is being forcibly merged into the new. I wonder what this means for the future of the mainframe itself. Does it just become a museum piece, or does it evolve?
IBM is betting on evolution. They are positioning the mainframe as the ultimate secure backend for AI-native applications. But the reality is that the "gravity" of the cloud is pulling everything toward it. Once you have successfully moved your core logic to Java or Python and it is running on standard cloud infrastructure, the incentive to keep paying for a mainframe disappears. We are looking at the largest refactoring project in human history over the next decade. We are literally cleaning up the twentieth century's digital footprint.
It is a massive cleanup. I think the takeaway for anyone listening who is working in one of these legacy-heavy environments is that the "wait and see" approach is becoming incredibly risky. The Dutch financial systems crisis we saw recently, where they literally ran out of people who knew how the core infrastructure worked, is a warning to everyone.
It really is. When you have a critical shortage of people who can maintain the literal foundation of your economy, you have to turn to automation. There is no other choice. AI isn't just replacing the code; it's acting as the bridge between the twentieth-century business logic and twenty-first-century cloud infrastructure.
I think we should wrap it up there for today. This was a deep one, but honestly, it is one of the most important stories in tech that people outside the enterprise world rarely talk about. It is the "invisible" work that keeps the world turning.
I am glad Daniel brought it up. It is a perfect example of how agentic AI is solving real-world problems that were previously thought to be impossible or just too expensive to fix. We are moving from a world of "code archeology" to a world of "automated evolution."
Definitely. We will have to keep an eye on that Social Security Administration project. A billion dollars is a lot of pressure to perform. If they succeed, it will be the blueprint for every other government agency and bank on the planet. If they fail, well, that is a whole different episode.
Let's hope for the former. The alternative is a lot of people not getting their checks, and nobody wants that.
Thanks as always to our producer Hilbert Flumingtop for keeping the gears turning behind the scenes. And a big thanks to Modal for providing the GPU credits that power the generation of this show. We literally couldn't do this without that serverless compute.
This has been My Weird Prompts. If you found this dive into legacy code useful, or if you are currently stuck in a COBOL nightmare and need a distraction, please consider leaving us a review on your podcast app. It really helps the show reach more people who are interested in these deep technical shifts.
You can find us at myweirdprompts dot com for the full archive and all the ways to subscribe. We are also on Telegram if you want to get notified the second a new episode drops. Just search for My Weird Prompts there.
See you in the next one.
Take it easy.