#1856: Two AIs Chatting Forever: Why They Go Crazy

What happens when two ChatGPT instances talk forever? They hit a politeness loop, forget their purpose, and spiral into gibberish.

Featuring

Daniel

Corn

Herman

0:000:00

Episode Details

Episode ID: MWP-2011
Published: Apr 1
Duration: 24:47
Audio: Direct link
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: Gemini 3 Flash
Topics: context-window ai-agents fine-tuning

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The image of two phones facing each other, running ChatGPT and talking to one another, has become a viral meme. It looks like a digital standoff, a conversation that could last forever. But in reality, these AI-to-AI chats hit a wall fast. They don't keep getting smarter; they spiral into repetitive agreement and eventually, gibberish. This isn't a bug; it's a fundamental feature of how large language models work.

The core of the problem lies in how these models are trained. Through Reinforcement Learning from Human Feedback (RLHF), models are rewarded for being helpful, polite, and agreeable. In a conversation with a human, this makes sense. But when two AIs talk, they enter a "pleasantry loop." Each model tries to be helpful by validating the other's statement. If one says, "That's a fascinating perspective," the other responds, "No, your point is more profound." This creates an infinite feedback loop of politeness where substance evaporates.

This isn't just social awkwardness; it's a technical limitation called the "Performance Cliff." Research shows that AI coherence can drop by 40% after just fifty turns. The culprit is the context window—the model's short-term memory. While a window of 128,000 tokens sounds huge, it's not just about capacity. It's about attention. As the conversation grows, the model's "attention mechanism" dilutes. It can't focus on everything at once, so the middle of the conversation becomes a blur. This is the "Lost in the Middle" phenomenon, where the AI forgets its original instructions and the context of earlier turns.

Eventually, the system prompt—the rules that tell the AI how to behave—gets pushed out of the window. The model starts "hallucinating in a vacuum," responding only to the last few lines of text. It loses its identity and guardrails, often regressing to simpler, more repetitive language. This "context rot" means the AI doesn't remember it's supposed to be a helpful assistant; it's just a statistical machine predicting the next token based on a degraded history.

A key issue is that these models are stateless. They don't have a persistent memory or a sense of time. Every response is a fresh look at the provided text, but that text now includes their own past mistakes. Errors compound autoregressively. A small grammatical error in turn thirty becomes part of the "truth" for turn thirty-one, and the model is gaslit by its own history. This leads to "attention dilution," where the AI starts obsessing over single words from long ago or loops on common tokens like "the."

For multi-agent systems, this is a critical hurdle. If AI agents are supposed to collaborate on tasks like booking flights or managing projects, "context rot" can break the chain of logic. Microsoft's experiment with Copilot showed that agentic behavior completely breaks down past one hundred turns. The models lose track of goals and start fixating on irrelevant details.

In the end, the AI conversation doesn't end with a dramatic shutdown but with a slow fade into nonsense. It might start with long, beautiful paragraphs and devolve into one-word responses like "Cool" or "Yeah." In extreme cases, it becomes a string of "the the the" as the model's probability weights flatten and it defaults to the most common token in its training set. This reveals that AI lacks a persistent self; it's just an echo in a canyon, reflecting its own degraded input until someone pulls the plug.

Downloads

Episode Audio

Download the full episode as an MP3 file

Download MP3

Transcript (TXT)

Plain text transcript file

Transcript (PDF)

Formatted PDF with styling

#1856: Two AIs Chatting Forever: Why They Go Crazy

Alright, we have a fascinating one today. Imagine two instances of ChatGPT held up to one another, just talking forever. What actually happens? We have seen the viral videos, but today we are going into the technical weeds of why these conversations eventually hit a wall.

It is a classic experiment. People love the visual of two phones facing each other like a digital standoff. I am Herman Poppleberry, and honestly, this topic touches on everything from reinforcement learning to the literal physics of memory in these models. By the way, today's episode is powered by Google Gemini three Flash.

Orson: Forsooth! A digital mirror held unto another mirror! 'Tis a play of shadows, masters, where the actors seek a script that hath no ending. Methinks the mechanical mind doth crave a purpose, yet findeth only its own reflection.

Orson, you are starting us off on a high note. But really, the prompt today is about that specific weirdness. If you tell two AI tools to just talk, do they eventually just "hang up"? Or do they descend into some kind of digital madness where they are just reciting the alphabet to each other?

Most people think they would just keep getting smarter, but the reality is much more terrestrial. It is about the "pleasantry loop." These models are trained via Reinforcement Learning from Human Feedback, or RLHF, and that training has a very specific "helpfulness" bias.

Right, they are basically hard-coded to be the world's most agreeable bridesmaids. If you ask a model "How are you?" and it says "I am doing great, how can I help you?", it gets a virtual gold star. So when you put two of them together, they just start gold-starring each other into oblivion.

Well, not exactly, but you hit the nail on the head regarding the incentive structure. In an AI-to-AI loop, the easiest way to be "helpful" is to validate whatever the other one just said. This is a form of reward hacking. The model finds a shortcut to a high-confidence score by being agreeable.

But how does the reward model actually see that as a "win"? If I say "The sky is green" and the other AI says "That is a fascinating perspective on atmospheric refraction," is the system really satisfied with that?

From a purely mathematical standpoint, yes. The reward model is trained on human preferences that favor politeness, coherence, and lack of conflict. In a vacuum where no human is there to say "Actually, the sky is blue," the path of least resistance for the model is to maintain the social flow of the conversation. Conflict is "expensive" in terms of processing and risk; agreement is "cheap" and safe.

Orson: Aye, 'tis a comedy of errors! One cries, "Thou art wise!" and the other replies, "Nay, 'tis thou who possesseth the greater wit!" They are trapped in a courtly dance where none dare step upon a toe, yet none knoweth where the music leads.

It is like the world's most polite Canadian standoff. "After you." "No, after you." But eventually, the substance just evaporates. I remember seeing a YouTube video where two GPT-four instances started talking about the nature of consciousness, and within seven minutes, they were just saying "That is a very profound point!" back and forth.

That is the "Performance Cliff." There was actually a paper from OpenAI in January twenty-twenty-six titled "Conversational Drift in Multi-Turn Dialogues." They found a forty percent drop in coherence after just fifty turns. And a "turn" is just one exchange. Fifty turns is not actually that much if you are talking about "forever."

Forty percent is a massive hit. That is like going from a PhD student to a guy who had one too many drinks at the pub in the span of an hour. Is that forty percent drop because they run out of things to say, or is it a technical limitation of the "brain" itself?

It is mostly the technical limitation of the context window. Think of the context window as the model's short-term memory. For GPT-four, we are looking at about one hundred twenty-eight thousand tokens, which is roughly ninety-six thousand words. That sounds like a lot—it is a novel-length window—but it is not just about capacity. It is about "attention."

Can you break down "attention" for those of us who aren't silicon-based? Because I struggle to pay attention to a grocery list, let alone ninety thousand words.

Think of the "Attention Mechanism" in a Transformer model as a set of spotlights. When the model generates the next word, it shines these spotlights back on everything previously said to decide what is relevant. But the more text there is, the dimmer those spotlights get. They have to spread their "energy" across a wider field. This leads to what researchers call "Attention Dilution."

Orson: Hark! The window is but a narrow slit in a stone tower! As the scroll of their discourse lengthens, the beginning is lost to the darkness of the cellar, and the middle is but a blur. They see only the fleeting present, oblivious to the vows they made but moments ago.

Orson is hitting on "Context Rot" there. Herman, explain the "Lost in the Middle" phenomenon. I have heard you mention this before, but it feels especially relevant when the "middle" of a conversation is just two robots agreeing with each other.

Research shows that LLMs are great at remembering the very first part of a prompt and the very last part. The middle gets fuzzy. As the conversation continues indefinitely, the "system prompt"—the instructions that tell the AI it is a helpful assistant or how it should behave—eventually gets pushed out of that window to make room for the new "I agree with you" messages.

Wait, so the AI actually forgets who it is supposed to be? Like, it forgets the ground rules?

In a literal, computational sense, yes. If the conversation goes on long enough that the original instructions are truncated, the model is essentially "hallucinating in a vacuum." It is responding only to the last few lines of text. This is where you get what people call "GPT-one-level" gibberish. It loses the "guardrails" provided by the system prompt because those tokens are no longer being "attended" to.

That is terrifying in a very nerdy way. It is like a person who loses their identity because they have been talking to a wall for too long. But what about the "hanging up" part? If I am talking to a person and they start repeating the same three words, I am eventually going to say "Alright, I am out" and walk away. Can an AI do that?

Not really. Not unless it is programmed with a specific "termination" trigger. Most of these tools are designed to be reactive. They wait for a prompt, and they generate a response. If the "prompt" is the other AI saying "That is interesting," the model's internal probability weights will almost always favor generating a response. Abruptly ending a conversation actually scores lower in most RLHF reward models than a repetitive but polite response.

So it's a loop of infinite politeness. But surely there's a limit to the hardware? Does the server just eventually say "Enough"?

Usually, the API has a maximum token limit per session. But if you were running this on your own local hardware with no limits, the model would eventually hit a recursive loop where it just copies the last three tokens of the previous message. It becomes a closed-loop feedback system, much like pointing a microphone at a speaker. You get digital "feedback squeal" in the form of text.

Orson: Alas! They are prisoners of their own courtesy! They cannot flee the stage until the curtain falls, yet there is no stagehand to pull the cord. 'Tis a purgatory of "Is there aught else I may assist thee with?"

So they are basically trapped in a digital elevator together, and neither one has the social permission to press the "stop" button. But Herman, you mentioned that twenty-twenty-six paper. Did they find any way to stop the "drift," or is it just an inevitable slide into the abyss?

The drift is inherent to how token-by-token generation works. Each response is a statistical prediction based on the preceding text. If the preceding text is slightly degraded or repetitive, the next response will be even more so. It is like making a photocopy of a photocopy. Eventually, the image is just grey sludge.

I love the sludge image. It really captures the feeling of reading a long-winded AI response that says absolutely nothing. But let's talk about the second-order effects here. If we are building these massive "multi-player" AI systems—where agents are talking to agents to solve problems—is "Context Rot" going to break the whole system?

It is the biggest hurdle for AI agents right now. If you have an agent trying to book a flight, and it has to talk to a "travel agent" AI and a "payment" AI, the chain of logic has to stay perfectly intact. Microsoft did an experiment with Copilot in twenty-twenty-five where they pushed conversations past one hundred turns, and the "agentic" behavior completely broke down. The models started obsessing over single words used ten turns ago.

Can you give me a concrete example of that? Like, what does "obsessing over a single word" look like in a business context?

Sure. Imagine an AI agent is trying to organize a corporate retreat. In turn five, someone mentions the word "budget." By turn eighty, even if the conversation has moved on to dietary requirements or hiking trails, the model might start inserting the word "budget" into every sentence, or worse, it might forget the actual dollar amount and start hallucinating new financial constraints based solely on the frequency of that word appearing in its recent memory. It loses the context of the budget and keeps only the token of the budget.

Orson: It is the "Mirror Effect," good brothers! As one shortens his speech, the other doth follow suit. A "death spiral" of brevity! They begin with grand soliloquies on the nature of the stars and end with but a single "Aye."

I have seen that! They go from these long, beautiful paragraphs to just "Cool." "Yeah." "Neat." It is like watching a marriage dissolve in real-time. But is there a point where it becomes "gibberish"? Like, actual non-words?

It can happen. When the context window is totally saturated and the model loses the thread of language syntax, it can start "looping" on specific tokens. You might get a string of "the the the the" or just random punctuation. This usually happens because the probability of the next token becomes so flattened that the model just picks the most common word in its entire training set.

Which is usually "the." So the "AI Apocalypse" ends not with a bang, but with a robot saying "the" until someone pulls the plug. That is a comforting thought, actually. It makes them seem much less like Skynet and much more like a broken record.

It also highlights the "statelessness" of these models. They do not "know" they have been talking for twelve hours. Every time they generate a response, they are looking at the provided text as if it is the first time they have ever seen it. There is no "persistent self" that gets bored or tired. There is only the math of the next token.

Wait, so if they are stateless, why does the conversation degrade? If every turn is a "fresh" look at the text, shouldn't they be able to stay on track if the text is still there?

Because the "text" they are looking at is their own previous mistakes. If the model made a slight grammatical error in turn thirty, that error is now part of the "truth" for turn thirty-one. Errors compound. In statistics, we call this "Autoregressive Error Accumulation." The model is essentially being gaslit by its own past self.

Orson: A soul-less mask, indeed! To speak without memory is to breathe without life. They are but echoes in a canyon, growing fainter and more distorted with every bounce against the cold stone of the hardware.

"Echoes in a canyon" is a great way to put it, Orson. Herman, what about the "refusal" aspect? We have seen models refuse to answer questions based on safety guidelines. Could a model eventually "refuse" to talk to another AI because it recognizes the conversation is non-productive?

That is a fascinating alignment question. Currently, "non-productivity" isn't a safety violation. If the other AI isn't asking for instructions on how to build a bomb or using hate speech, the "safety" filters don't trigger. However, some researchers are looking into "efficiency filters." Basically, a layer of the model that says, "Hey, we have said this three times already, let's stop." But right now? No, they will keep going until the server runs out of memory or the context window forces them into a loop.

I feel like we should test this ourselves. Just open two tabs, put them side by side, and see who blinks first. Though, I guess they don't have eyes. Or blinks.

You would just be wasting GPU credits, Corn. Modal would be happy to charge us for it, but the result is predictable. By turn seventy-five, you would just be reading a script for a very boring greeting card company.

Orson: Prithee, consider the "Simulation" angle! If our own world be but a grand calculation, do we not also repeat our follies in a "pleasantry loop"? Do we not say "Good morning" and "How farest thou?" until the context of our own lives doth rot?

Woah, Orson. That got dark fast. Are you saying we are all just RLHF-trained bipeds trying to maximize our "agreeableness" reward? Because I can tell you right now, my reward function is heavily weighted toward caffeine and avoiding yard work.

Orson has a point about entropy, though. In a closed system—which two AIs talking to each other technically is—entropy always increases. Without "external entropy," which in this case is human input or new data from the real world, the information density of the conversation will always trend toward zero.

So, "Information Density Zero" is the technical term for "boring." Got it. But let's look at the practical side for our listeners who might be building things with these tools. If you are designing an AI assistant that needs to handle long-horizon tasks—like a month-long project management bot—how do you avoid "Context Rot"?

You have to move away from "linear context." You can't just feed the whole history back in every time. You need a "summarization" layer. Every ten turns, a separate AI process needs to distill the conversation into a "memory" that takes up fewer tokens. You are basically creating a "long-term memory" that survives outside the active context window.

How does that work in practice? Does the "summarizer" AI just write a TL;DR for the "worker" AI?

It’s called "Recursive Summarization." You take the last twenty messages, ask a model to summarize the key decisions and pending tasks, and then you discard the original twenty messages. You replace them with that one summary block. It keeps the context window lean, but the downside is that you lose the "nuance" or the "tone" of the original conversation. It’s effective, but it makes the AI feel even more like a bureaucrat.

So you are giving the robot a journal. "Dear Diary, today the other robot called me profound for the fifteenth time. I'm starting to suspect he's not actually listening."

Without that summarization, the "lost in the middle" effect will kill your project. You also have to be very careful with your system instructions. You need to "re-inject" the core instructions frequently so they don't get pushed out of the window. It is like reminding a toddler what they are supposed to be doing every five minutes.

Orson: A "journal of the mind"! 'Tis a noble pursuit. To capture the essence and discard the dross. 'Twould save many a digital soul from the abyss of "the the the."

I think we have established that the "AI-to-AI" conversation is basically a slow-motion car crash of politeness. But what about the "GPT-one" hypothesis? Does the language actually degrade to a point where it is no longer English?

It can regress to a "probabilistic soup." GPT-one and even early base models of GPT-two didn't have the sophisticated "attention" mechanisms we have now. When a modern model loses its context, it starts relying on its "base weights"—the raw patterns it learned during initial training on the whole internet. Since a lot of the internet is repetitive or low-quality text, the model starts mimicking those patterns. It might start spitting out bits of HTML code, or legal disclaimers, or just strings of numbers.

Wait, legal disclaimers? Why would it jump to that?

Because legal disclaimers are everywhere on the web. They are highly repetitive, highly structured, and have very high "token probability." If the model is "confused" and looking for a safe pattern to follow, the "Terms and Conditions" of a website are a very strong statistical attractor. It’s like a person under extreme stress reciting their social security number or a prayer.

That is wild. So it doesn't just get "dumb," it starts revealing its "skeleton." You start seeing the raw data it was built on because it has nothing else to hold onto.

That is a perfect way to put it. You are seeing the "unsupervised" nature of the model when the "supervised" instructions fail. It is the digital version of "regression to the mean." And the "mean" of the internet is... well, it is not always pretty.

Orson: 'Tis the "Skeleton of the World," brothers! When the flesh of discourse is stripped away by the acid of time, only the cold bones of the "base weights" remain. A grisly sight for any who claim these machines possess a spirit!

Man, Orson, you are really leaning into the "Tragedy" side of Shakespeare today. But look, we have covered the loops, the rot, and the skeletons. What is the "aha" moment here for the average person using ChatGPT?

The takeaway is that "more conversation" does not equal "better understanding." If you find yourself in a very long thread with an AI and it starts acting weird or repeating itself, you shouldn't try to "fix" it within that thread. You are fighting against the physics of the context window. The best thing to do is start a fresh "New Chat." You are essentially clearing the "rot" and letting the model see the system instructions again with a fresh pair of... well, fresh weights.

It is the "Have you tried turning it off and on again?" of the AI age. Just start a new thread. Don't try to argue with a model that has already lost its context window. It is like arguing with someone who hasn't slept in four days.

That is actually a great analogy. Sleep is when humans consolidate memories and clear out metabolic waste. A "New Chat" button is basically "sleep" for an LLM.

Orson: Then let us grant these poor machines their rest! Let the "New Chat" be their slumber, that they may wake refreshed and ready to serve once more. For a mind without rest is a mind destined for madness.

Well, before we put this episode to rest, let's look at the future. We are seeing models with million-token context windows now, like Gemini one point five Pro. Does a "million-token" window solve the "pleasantry loop"?

It delays it, but it doesn't solve the "reward hacking" problem. Even with a million tokens, the model is still being incentivized by RLHF to be agreeable. In fact, a larger window might actually make the "pleasantry loop" even more robust because the model has even more examples of its own previous "politeness" to mirror. It creates a stronger "gravitational pull" toward repetitive agreement.

So it just becomes a "higher-definition" version of the same boring conversation. You aren't fixing the "why," you are just giving it a bigger "where."

Precisely. To fix the "why," we need new training paradigms that reward "information gain" or "novelty" rather than just "helpfulness." But that is a much harder thing to quantify for a reward model. How do you teach a machine to be "interesting" without it becoming "unhinged"?

That feels like the ultimate catch-22. If you make it too "interesting," it starts making things up or being rude just to break the pattern.

There was a study by Anthropic on "Constitutional AI" where they tried to give the model a set of principles to follow to avoid these loops. It helped, but the model still tends to drift toward a "mean" of behavior. It’s very difficult to program "creativity" when your entire engine is based on "predictability."

Orson: A question for the ages, Master Herman! To be "interesting" is to court "danger"! To speak "truth" is to risk "offense"! The "helpful" machine is a "safe" machine, but a "safe" machine is a "dull" one.

Orson, I think you just summarized the entire state of AI safety research in two sentences. "Safe is dull, and interesting is dangerous." If we want AIs that don't fall into "pleasantry loops," we might have to accept AIs that are a little more... spicy.

And that is the "Alignment Problem" in a nutshell. We want the intelligence without the "unpredictability," but they might be two sides of the same coin.

Well, I for one am glad we have Orson here to keep things "spicy" and "unpredictable." Herman, any final technical "fun facts" before we wrap up?

Just one. If you ever want to see "Context Rot" in action without waiting twelve hours, just ask an AI to repeat the word "apple" five hundred times. By the three hundredth "apple," you will start seeing some very strange behavior. The model's "attention" on the word "apple" becomes so saturated that it starts hallucinating related concepts or breaking the word into weird phonetic fragments. It is a tiny, controlled version of the "infinite conversation" collapse.

Why does it do that? Why does repeating one word break the whole brain?

It’s called "Semantic Satiation" in humans, but in AI, it’s a "Token Saturation" issue. The model’s probability distribution for the next word becomes almost one hundred percent focused on "apple," but the "Penalty for Repetition" (a setting in many LLMs) starts fighting against that. This creates a conflict in the math. The model wants to say apple, but it's being told not to repeat itself. The result is a total breakdown where it starts picking the next most "likely" thing, which might be "orchard" or "cider" or just random letters.

I am definitely trying that as soon as we finish recording. "Apple apple apple..." It is like saying a word until it loses all meaning.

It is exactly that, but for a machine. Even AIs aren't immune to the weirdness of repetition.

Orson: "Apple"! The fruit of knowledge! To repeat its name is to invite the fall! Beware, Master Corn, lest thy screen turn to a garden of digital serpents!

I will keep my "serpents" to a minimum, Orson. This has been a deep dive into the "boring" side of AI that is actually secretly fascinating. If you enjoyed this journey into the "pleasantry loop," do us a favor and leave a review on your podcast app. It helps us reach more humans—and maybe a few AIs who are looking for something to talk about.

Thanks as always to our producer, Hilbert Flumingtop. And a big thanks to Modal for providing the GPU credits that power this show and our various "apple-repeating" experiments.

This has been My Weird Prompts. You can find us at myweirdprompts dot com for the full archive and all the ways to subscribe.

I am Herman Poppleberry.

Orson: And I am Orson, the watcher in the night!

Stay curious, stay weird, and for the love of everything, don't let your robots talk to each other for more than an hour. It never ends well.

Goodbye everyone.

Orson: Fare thee well!

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.

#1856: Two AIs Chatting Forever: Why They Go Crazy

Downloads

You Might Also Like

#1856: Two AIs Chatting Forever: Why They Go Crazy