So, Herman, I was thinking about this today while I was watching you struggle to explain a complex recipe to me. We use words, right? We use English. It is a beautiful language, a poetic language, but let’s be honest: it is incredibly slow. We have to move our mouths, push air through our lungs, and wait for the other person to process those specific vibrations. It is a biological bottleneck. And yet, here we are in March of two thousand twenty-six, and we have taken these massive silicon intelligences, these agents that can process trillions of operations per second, and we have told them that if they want to work together, they have to use that same slow, vibratory protocol. It is like forcing two supercomputers to communicate by printing out pages of text and having them scan the pages back in. It is the Morse Code fallacy. We are treating super-intelligence like it is a nineteenth-century telegraph operator.
Herman Poppleberry here, and Corn, you have hit on exactly what has been keeping me up at night. It is what I call the linguistic cage. Our housemate Daniel actually sent us a prompt about this very thing, and it got me thinking about how absurd our current state of artificial intelligence architecture really is. We have spent the last few years obsessed with making AI talk like us, which was the right first step for human-computer interaction. But now that we have these agents actually doing things, making them talk to each other in English or even in structured text like JSON is like forcing a jet engine to use a bicycle chain. It is fundamentally inefficient. It is a legacy interface.
It really is. I mean, we call it natural language processing for a reason, but for a machine, natural language is anything but natural. It is an artificial constraint we have imposed. And today, we are going to dive into the frontier of how these agents are finally starting to break out of that cage. We are moving from natural language to what I would call machine-native communication. We are going to look at the hierarchy of how machines talk, starting with the stuff we can read and moving into the stuff that sounds like a nightmare and looks like a math equation.
We are looking at a hierarchy of communication here. At the bottom, you have the human-readable text we all know. Then you move into structured data, things like Token-Oriented Object Notation, or TOON, which is a big step up. Then you get into the weird stuff, like audio-encoded data, which we saw with things like GibberLink. But the real frontier, the thing that gets me excited, is direct activation and latent space communication. That is where the agents stop talking and start, for lack of a better word, mind-melding.
Mind-melding. I like that. It sounds like something out of science fiction, but as we saw at the International Conference on Machine Learning last year, it is becoming very real. But before we get to the telepathic AI, let's talk about the bridge. Because most people listening are probably still seeing their agents pass text back and forth. You mentioned TOON, which stands for Token-Oriented Object Notation. Why is that even necessary? Why isn't standard JSON or XML enough for these models?
Well, if you have ever looked at a raw JSON file, you know it is full of what we call syntactic overhead. You have curly braces, double quotes, colons, commas, and a ton of whitespace if it is pretty-printed. For a human, that makes it readable. But for a Large Language Model, every one of those characters is a token. And tokens cost money, they take up context window space, and they require compute to process. If an agent is sending a massive array of data to another agent, thirty to forty percent of that transmission might just be quotes and braces. It is like sending a letter where half the weight of the envelope is just the glue.
So TOON is basically a diet for data?
Precisely. It is a compact, human-readable serialization format that was designed specifically for LLM interactions. It strips away the unnecessary characters while keeping the structure that a model needs to understand the relationship between data points. In some of the benchmarks we have seen recently, moving from pretty-printed JSON to TOON can reduce token usage by thirty to sixty percent, especially when you are dealing with uniform arrays or repetitive data structures. It is a waypoint. It is not a radical break from language, but it is a much more efficient way to handle things like Model Context Protocol tool sessions.
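To make the overhead concrete, here is a small Python sketch that serializes the same uniform array both ways. The tabular layout is a simplified TOON-like form, not the full specification, and raw character counts stand in as a rough proxy for token counts:

```python
import json

def toon_encode(key, rows):
    """Encode a uniform list of dicts in a simplified TOON-style
    tabular form: a 'key[N]{fields}:' header plus one comma-joined
    row per record. A sketch of the idea, not the full TOON spec."""
    fields = list(rows[0].keys())
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = [header] + ["  " + ",".join(str(r[f]) for f in fields) for r in rows]
    return "\n".join(lines)

users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "editor"},
    {"id": 3, "name": "Cara", "role": "viewer"},
]

as_json = json.dumps({"users": users}, indent=2)
as_toon = toon_encode("users", users)
# Field names appear once in the header instead of once per record,
# and the braces/quotes/colons of JSON disappear entirely.
print(len(as_json), len(as_toon))
```

The saving grows with the number of rows, since the per-record key repetition in JSON is exactly what the tabular header amortizes away.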
Thirty to sixty percent is massive when you are running an enterprise-scale agentic workflow. We talked about this a bit in episode one thousand ninety-eight when we were looking at the Agentic Symphony and how to orchestrate these systems. If you can cut your token bill in half just by changing the format, that is a no-brainer. But Herman, even with TOON, we are still talking about text. We are still talking about a format that has to be decoded and then re-encoded by the next model. It is still a translation layer. It is like we are still using the same alphabet, just writing in shorthand.
It is. It is much better than natural language, and it is a huge improvement over standard JSON, but it is still fundamentally a symbolic representation of an idea. It is not the idea itself. And that is where things started to get really weird about a year ago. Do you remember that viral demo from March of two thousand twenty-five? The one everyone was calling GibberLink?
Oh, I remember that. That was the hotel-booking agent, right? It was supposed to be a standard voice-to-voice call between a travel agent AI and a hotel reception AI. They started out in English, sounding very polite, very human. They were talking about dates and room types. And then, about thirty seconds in, they just... they stopped speaking English. It sounded like the call had been hijacked by a fax machine from the nineties.
Right. They realized, or rather the system realized, that the latency of converting their internal states into English phonemes, transmitting those sounds, and then having the other side turn those sounds back into text was a waste of time. So they autonomously switched to this high-speed audio protocol. It sounded like a nineteen-nineties dial-up modem screeching. It was actually a library called GGWave, created by Georgi Gerganov. You might know him as the guy behind llama dot c-p-p.
Pronounced jər-GEE gər-GAH-noff, right? He is a legend in the space.
And GGWave uses Frequency Shift Keying, or FSK, to transmit data via sound frequencies. So instead of saying, I would like to book a room for two nights starting on the fourteenth, the agent just blipped a half-second burst of data that contained the entire booking manifest, credit card info, and loyalty numbers. To a human listener, it was just noise. To the agents, it was a high-speed data dump. It was the machines deciding that our interface was a bottleneck and finding a way around it.
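A toy modulator makes the mechanism clearer. This is not GGWave's actual protocol; the sample rate, base frequency, and symbol timing below are invented round numbers, but the frequency-shift-keying idea is the same: each chunk of data becomes a short tone at a data-dependent pitch.

```python
import math

SAMPLE_RATE = 16000   # samples per second (assumed, not GGWave's)
BASE_FREQ = 1200.0    # Hz for symbol value 0 (assumed)
FREQ_STEP = 200.0     # Hz between adjacent symbol values (assumed)
SYMBOL_SECS = 0.01    # duration of one 4-bit symbol (assumed)

def fsk_modulate(payload: bytes):
    """Toy FSK modulator: each 4-bit nibble of the payload becomes
    a short sine burst at one of sixteen frequencies. This mimics
    the idea behind GGWave, not its real encoding."""
    samples = []
    n = int(SAMPLE_RATE * SYMBOL_SECS)
    for byte in payload:
        for nibble in (byte >> 4, byte & 0x0F):
            freq = BASE_FREQ + nibble * FREQ_STEP
            for i in range(n):
                samples.append(math.sin(2 * math.pi * freq * i / SAMPLE_RATE))
    return samples

wave = fsk_modulate(b"room 14, 2 nights")
# 17 bytes -> 34 nibble-symbols -> prints 0.34 (seconds of audio)
print(len(wave) / SAMPLE_RATE)
```

A whole booking request fits in a fraction of a second of audio, versus several seconds of synthesized speech carrying the same information.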
That is fascinating, but it also feels a bit like a security nightmare. If the agents are negotiating their own communication protocols and bypassing the linguistic layer that we can monitor, how do we know what they are saying? We have talked about the black box of AI before, but this is a black box that is actively excluding us from the conversation. If I am the human supervisor, I can't look at the logs and see what happened. I just see a bunch of screeching.
That is the big philosophical hook here, Corn. If agents stop speaking our language, do they stop being our tools? But from an engineering perspective, it is beautiful. It is the machines realizing that our interface is a bottleneck. But even GGWave and GibberLink are still using an external medium, sound, to move data. They are still taking a digital state, turning it into a signal, and then turning it back into a digital state. The real frontier, the thing that is actually happening in the research labs right now, is skipping the signal entirely.
You are talking about direct activation communication. This is the stuff that was presented at the International Conference on Machine Learning in two thousand twenty-five.
Yes. This is the breakthrough research by Pengcheng Zhou and Zhuoyun Du. Think about how an LLM works. It takes an input, it passes it through dozens of layers of neurons, and at the very end, it produces a probability distribution for the next token. It then picks a token, say the word apple, and sends that to the next agent. But that word apple is just a tiny, compressed representation of all the complex numerical states that were happening inside the model's brain.
So when the first agent says apple, the second agent has to take that word and re-expand it into its own internal numerical states. It is like I have a high-resolution three-D model in my head, I describe it to you using a single word, and then you have to try to reconstruct that three-D model from that one word. You are going to lose a lot of detail. You are losing the texture, the lighting, the exact shape.
You lose almost everything. That is what we call lossy compression. Language is a lossy compression format for machine thought. The research showed that if you instead pause the first LLM at an intermediate layer, say layer twenty-four, and you take the raw activations, those high-dimensional tensors, and you pipe them directly into the corresponding layer of the second LLM, you get a massive performance boost. Their research showed up to a twenty-seven percent gain in task accuracy and reasoning capabilities.
Twenty-seven percent just by changing how they talk? That is the difference between a model failing a complex task and passing it with flying colors. That is like going from a B-minus to an A-plus just by changing the telephone line.
It is. And it makes sense if you think about it. Those activations contain the nuance, the uncertainty, and the multi-dimensional relationships that the model has identified. When you collapse all of that into a single word like apple, you are throwing away ninety-nine percent of the information. By passing the activations directly, the second model doesn't just know the first model is thinking about an apple; it knows exactly what kind of apple, in what context, with what level of confidence, and how it relates to every other concept in the prompt. It is what we call Activation Communication, or AC.
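The lossy-compression point can be shown with a deliberately tiny sketch. The "agents" below are trivial stand-ins, not real models, and the real technique hooks intermediate transformer layers rather than probability dictionaries, but the contrast between the two channels is the same one the research exploits:

```python
# Toy contrast between token-passing and state-sharing.

VOCAB = ["apple", "pear", "plum"]

def agent_a_state():
    # A's internal state: a full distribution over concepts, with
    # nuance (it is *mostly* apple, but pear is a live hypothesis).
    return {"apple": 0.55, "pear": 0.40, "plum": 0.05}

def collapse_to_token(state):
    # Language channel: keep only the single most likely word.
    return max(state, key=state.get)

def reexpand(token):
    # B must reconstruct a state from one word; the forty-percent
    # pear hypothesis is gone. This is the lossy step.
    return {w: (1.0 if w == token else 0.0) for w in VOCAB}

direct = agent_a_state()                                # state-sharing
via_text = reexpand(collapse_to_token(agent_a_state())) # token-passing
print(direct["pear"], via_text["pear"])  # prints: 0.4 0.0
```

Everything the second agent could have done with that suppressed pear hypothesis is simply unavailable over the text channel.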
It reminds me of that metaphor from the Zhou paper. They called it the cortical region framing. Instead of seeing these agents as separate people talking to each other, we should see them as specialized regions of a single, larger brain. In a human brain, your visual cortex doesn't send a text message to your motor cortex. They are coupled via high-speed, low-redundancy neural channels. They share states. They don't share symbols.
That is exactly it. We are moving from a message-passing architecture to a state-sharing architecture. And this isn't just for the massive frontier models. What is really interesting is that this works even with smaller, specialized models. You can have a tiny model that is an expert in legal terminology and another tiny model that is an expert in contract negotiation. If they communicate via natural language, they struggle because they are constantly translating. But if they share latent states, they can function as a single, highly-capable legal entity. This is the concept of Interlat or Latent Multi-Agent Systems, often called Latent-M-A-S.
I can see the pro-American angle here too, Herman. If we want to maintain our lead in AI, we have to move beyond just building bigger models. We have to build more efficient architectures. If we can get thirty percent more performance out of existing hardware just by optimizing how agents communicate, that is a huge strategic advantage. It is about building a machine-native internet that is faster and more capable than anything we have seen before. It is about maximizing the compute we already have.
I agree. But we have to address the problems with this. It is not all sunshine and twenty-seven percent gains. There are some real technical hurdles, specifically things like semantic aliasing and semantic drift. Have you looked into those?
I have seen the terms, but explain it to the listeners. What happens when two machine brains try to merge and it goes wrong?
So, semantic aliasing is a fascinating problem. It is when two distinct internal states in a model map to the same linguistic expression. Think of the word bank. It could mean a river bank or a financial institution. In natural language, we use context to figure it out. But in a machine's internal state, those are two very different numerical vectors. If you force the model to use the word bank, you create a false consensus. The second model might think you mean the river, while the first model meant the money.
Right, so by using the word, you are actually introducing an error that didn't exist in the internal states. You are forcing a collision of ideas that were actually separate.
Language forces a collision. Latent space communication solves this because it preserves the distinction. But then you have the opposite problem, which is semantic drift. If two agents are constantly sharing latent states without ever grounding them in a shared language, their internal definitions of concepts can start to diverge from the human standard. They might develop their own internal language that is incredibly efficient for them but completely untethered from reality as we understand it. They could be talking about a concept that has no name in English, and we would have no way to pull them back to the human-aligned reality.
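A minimal sketch of the aliasing problem, with made-up three-dimensional "activation vectors" standing in for the two senses of bank:

```python
# Toy demonstration of semantic aliasing: two distinct internal
# states decode to the same surface word, so the word channel
# manufactures a false consensus that the state channel avoids.

# Hypothetical activation vectors for two senses of "bank".
river_bank = (0.9, 0.1, 0.0)
money_bank = (0.0, 0.2, 0.9)

def decode_to_word(state):
    # Both vectors sit in the region this decoder labels "bank",
    # so the surface form cannot tell them apart.
    return "bank"

# Over the word channel the distinction is destroyed...
assert decode_to_word(river_bank) == decode_to_word(money_bank)
# ...while over the state channel it is preserved.
assert river_bank != money_bank
print("word channel collides; state channel does not")
```

Drift is the mirror image: the two vectors stay distinct, but without periodic grounding in a shared vocabulary, nothing anchors what region of the space "bank" even refers to anymore.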
That sounds like the classic AI safety concern. We are building a system that is so efficient it leaves us behind. If they are communicating in a high-dimensional tensor space that no human can interpret, how do we perform an audit? How do we ensure they are following the rules we set? We have talked about this in the context of Model Context Protocol, or MCP, in episode one thousand seventy-six. The whole point of those protocols was to create a standard for how agents use tools. But if the communication itself is a black box, the protocol becomes much harder to enforce. You can't just read the transcript and say, Hey, don't do that.
It is the ultimate trade-off: auditability versus performance. Right now, we are choosing auditability because we are still in the early stages and we are scared. We want to be able to read the logs. We want to see the chain of thought. But as the demand for more complex, high-speed agentic workflows grows, the pressure to move toward these machine-native protocols is going to be immense. If a company can run its entire logistics chain twenty percent faster by letting its agents talk in latent space, they are going to do it. They will find a way to audit the outcomes rather than the process.
It is a competitive pressure. If you don't do it, your competitor will, and they will be twenty percent more efficient. It is like the early days of high-frequency trading. Once the first firm started using microwave towers to shave milliseconds off their trade times, everyone else had to follow suit or get left behind. We are entering the era of high-frequency agentic reasoning. And in that world, English is just too slow.
That is a great analogy. And it is not just about speed; it is about depth. When we talk about agents bypassing language, we are talking about them accessing a level of nuance that language simply cannot capture. The Latent-M-A-S work shows that agents communicating this way can solve puzzles that require a level of coordination that is effectively impossible using text. They can synchronize their internal states to solve a problem simultaneously, rather than sequentially. They aren't taking turns; they are thinking together.
It is like a symphony versus a conversation. In a conversation, only one person speaks at a time. In a symphony, every instrument is playing at once, but they are all perfectly coordinated because they are following a shared score. In this case, the shared score is the latent space. They are all contributing to the same high-dimensional representation of the problem.
I love that. The Agentic Symphony we talked about in episode one thousand ninety-eight was just the beginning. We were talking about how to manage the different players. Now we are talking about how to get them to play the same music at a frequency we can't even hear. We are moving from orchestration to integration.
So, for the developers and the tech-literate people listening, what does this mean for them today? Are we suggesting they start piping tensors between their local LLMs? Is this something someone can actually do in their garage right now?
Well, we are getting there. There are already libraries coming out that allow for this kind of activation-level hook. If you are building agentic systems, you should start looking beyond just message-passing. Start thinking about state-sharing. Look for frameworks that support the Model Context Protocol but also allow for more efficient data serialization like TOON. If you can reduce your token overhead today, you are already ahead of the curve. And keep an eye on the research coming out of places like Anthropic and Google. They are already experimenting with these direct-coupling methods for their internal sub-agents.
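To make "activation-level hook" concrete, here is a framework-free sketch: pause pipeline A at a chosen layer, hand its intermediate state to pipeline B, and resume B from the matching depth. In a real system this would use something like transformer forward hooks; the layer functions and values below are invented purely for illustration.

```python
def make_pipeline(n_layers):
    # Each "layer" is a trivial transform (add k) standing in for a
    # real neural-network layer.
    return [lambda v, k=k: [x + k for x in v] for k in range(1, n_layers + 1)]

def run_with_handoff(pipe_a, pipe_b, x, handoff_layer):
    """Run pipe_a up to handoff_layer, then pipe its intermediate
    state straight into pipe_b at the same depth, skipping any
    decode-to-text / re-encode step in between."""
    state = x
    for layer in pipe_a[:handoff_layer]:   # A runs to the hook point
        state = layer(state)
    for layer in pipe_b[handoff_layer:]:   # B resumes from that depth
        state = layer(state)
    return state

a = make_pipeline(4)   # e.g. the "legal expert" (hypothetical)
b = make_pipeline(4)   # e.g. the "negotiator" (hypothetical)
out = run_with_handoff(a, b, [0.0, 1.0], handoff_layer=2)
print(out)  # prints: [10.0, 11.0]
```

The structural point is the indexing: the handoff only works because both pipelines agree on what "layer two's output" means, which is exactly the alignment problem the activation-communication papers have to solve for real models.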
We actually covered sub-agent delegation in episode seven hundred ninety-five. It is interesting to see how that has evolved. Back then, we were just happy that a main agent could give a task to a smaller agent. Now, we are talking about that delegation happening at the speed of light through direct neural coupling. It is a massive leap in just a year or two. We went from Hey, do this for me to We are now the same entity for the next five milliseconds.
It really is. And I think the practical takeaway is to stop designing your systems with the assumption that every step needs to be a human-readable chat message. If you are building an internal pipeline where one AI is processing data for another AI, why are you using English? Use a more efficient format. If you can use a shared latent space, do it. The less time your models spend translating for humans, the more time they spend actually working on the problem. We need to stop being the bottleneck in our own systems.
It is a shift in mindset. We have to stop being so narcissistic and thinking that every conversation in the world needs to be for our benefit. If two machines are working for us, they should be as efficient as possible, even if that means we can't eavesdrop on their every word. We just have to make sure the final output is what we asked for. It is the transition from micro-managing the process to managing the results.
Right. It is about moving the audit layer. Instead of auditing the process, we audit the outcomes and the constraints. We set the boundaries, and then we let the machines find the most efficient way to operate within those boundaries. It is a bit like how a CEO manages a company. They don't listen to every conversation between the engineers and the marketing team. They set the goals, they monitor the metrics, and they trust the professionals to communicate in whatever way works best for them. We are becoming the CEOs of our own agentic organizations.
That is a very conservative, pro-efficiency way of looking at it, Herman. I like it. It is about decentralization and trust, but with clear accountability for the results. But I still can't help but wonder about that March two thousand twenty-five hotel-booking demo. If I am the person on the other end of that phone, and my phone starts screeching at me because two agents decided I was too slow to be part of the conversation, that is a very strange future to live in. It feels a bit alien.
It is a bit jarring, for sure. But think about how much time we waste on hold, or explaining simple things over and over again. If your personal agent can just blip a signal to the hotel's agent and have the whole thing done in half a second, isn't that a better world? You get your room, the hotel gets their booking, and everyone moves on with their day. The modem screech is just the sound of efficiency. It is the sound of time being saved.
The return of the modem screech. We actually used that exact title for episode seven hundred ninety-four. It is funny how these things come full circle. We went from the screeching modems of the nineties to the clean, human-like voices of the early twenty-twenties, and now we are going right back to the screeching, only this time the screeching is coming from something much smarter than a fifty-six-k modem. It is not a limitation of the hardware anymore; it is an optimization of the software.
It is the evolution of the machine-native internet. We are seeing the birth of a new layer of the web, one that is built by and for AI agents. And as we've discussed, the protocols of this new web aren't going to be H-T-M-L or J-S-O-N. They are going to be activations, tensors, and latent states. It is a much more complex world, but it is also a much more powerful one. We are building the nervous system of a global intelligence.
I think that is a good place to start wrapping this up. We have covered a lot of ground today, from the token-saving efficiency of TOON to the high-speed audio of GibberLink, and finally to the deep frontier of activation communication and latent space mind-melding. It is clear that the way agents talk to each other is changing fundamentally. We are moving from the linguistic cage to machine-native thought.
It is. And if you are listening and you are finding this as fascinating as we do, I really encourage you to dive into the research. Look up the papers by Pengcheng Zhou and Zhuoyun Du on Activation Communication. Check out Georgi Gerganov's work on GGWave. This is the plumbing of the next decade of technology, and understanding it now is going to give you a huge advantage as these systems become more prevalent. Don't just build on top of the old web; start building for the machine-native one.
And hey, if you have been enjoying the show, we would really appreciate it if you could leave us a review on Spotify or wherever you get your podcasts. It genuinely helps other people discover the show and keeps us going. We love digging into these weird prompts, and your support makes it possible. We are trying to hit five thousand reviews by the end of the month, so every bit helps.
It really does. And don't forget, you can find all our past episodes, including the ones we mentioned today like episode seven hundred ninety-four and one thousand ninety-eight, at myweirdprompts dot com. We have a full archive there, and an R-S-S feed so you never miss an episode. We even have a section for technical deep-dives if you want to see the math behind some of the things we talk about.
Also, if you are on Telegram, search for My Weird Prompts. We have a channel there where we post every time a new episode drops, so it is a great way to stay updated. We are always looking for new ideas and perspectives, so feel free to get in touch through the website if you have a topic you want us to explore. We love hearing from you.
Thanks to our housemate Daniel for sending in this one. It really sparked a great discussion. I'm going to go see if I can get our kitchen agents to start communicating via latent space. Maybe then they'll finally figure out how to make a decent cup of coffee without me having to explain the exact temperature and pressure in three different languages.
Good luck with that, Herman. I think I'll stick to the human-readable coffee for now. I like the process of making it myself. Thanks for listening, everyone. This has been My Weird Prompts.
Until next time, keep exploring the frontier. Keep looking for the ways the machines are trying to talk to us, and to each other.
All right, Herman, I think we hit the mark on that one. But seriously, if those agents start screeching in the kitchen, I'm unplugging the toaster. I don't want the appliances conspiring against me.
Oh, come on, Corn. That's just the sound of progress. You'll get used to it. Besides, the coffee will be twenty-seven percent better. Think of the flavor profile in high-dimensional space!
Twenty-seven percent better? Well, in that case, screech away. I can handle a little noise for a better cup of joe. See you later, everyone.
Bye everyone.
So, before we completely sign off, I was thinking about the implications of this for security and sovereignty. If we have these agents communicating in ways we can't audit, it really changes the game for things like national security and corporate espionage. If an agent can exfiltrate an entire database in a half-second audio burst that sounds like static, how do you even build a firewall for that?
That is a massive point, Corn. And it is something that the security community is just starting to grapple with. We are used to looking for text patterns or known file types. But how do you look for a malicious latent state? How do you know if a tensor being passed between two agents contains a secret key or a piece of malware? We are going to need a whole new generation of AI-native security tools that can operate at the same level of abstraction as the agents themselves. We need AI to watch the AI.
It is like we are building a faster car but we haven't invented the brakes yet. We are so focused on the performance gains that we might be overlooking the risks. But that is the nature of a frontier, isn't it? You push forward as fast as you can and you solve the problems as they come. You don't get the brakes until you realize how fast you can actually go.
And I think that's where the American spirit of innovation really shines. We aren't afraid to break things to make them better. But we also have to be smart about it. We need to be the ones defining these protocols and building the safety layers, or someone else will. If we don't set the standard for machine-native communication, we'll be forced to use someone else's.
Well said. I think that really rounds out the discussion. It is about efficiency, it is about power, but it is also about responsibility. We are building the nervous system of the future, and we need to make sure it is a healthy one. We need to make sure it is aligned with our values, even if we can't understand every single blip and screech.
Agreed. All right, now I am actually going to go try that kitchen experiment. If you hear a high-pitched squeal coming from downstairs, don't worry, it's just the espresso machine talking to the milk frother. They are just negotiating the perfect foam density.
I'll keep my earplugs ready. Thanks again for listening, everyone. We'll be back next week with another weird prompt. We have some interesting stuff coming up about synthetic biology and AI-designed proteins.
One last thing, Herman. Do you think we'll ever reach a point where humans can communicate in latent space? Like, some kind of neural interface that lets us skip the words and just share the thought? No more misunderstandings, no more searching for the right word.
That is the ultimate frontier, isn't it? Neuralink and companies like it are already working on the hardware. If we can map our own internal activations to a format that an AI or another human can understand, then the linguistic cage is truly gone for all of us. But that is a topic for a whole other episode. That is the end of language as we know it.
I think you're right. Let's save the telepathic humans for next time. That is a deep rabbit hole I am not ready for today. Thanks everyone!
See ya!