Daniel sent us this one, and it's one of those questions where the surface is practical but underneath there's something bigger shifting. He's been using GitHub in this interesting way — not just as a portfolio or a place to stash code, but as a kind of context depot. He curates lists of repositories, packages them up, makes them public, and then points Claude at the URL. The agent fetches it in seconds, ingests the whole thing, and suddenly he's got a rich context seed for whatever he's working on. But the question that's been nagging at him is: who am I actually creating this for? And the answer keeps coming back blurry.
This is genuinely one of the more interesting practical questions I've seen in a while, because it sits right at the collision point between how we've always published information and how agents consume it. And Daniel's instinct is right — the answer is increasingly both, and that's uncomfortable for a lot of the assumptions we've built into how we structure things.
By the way, today's episode is being written by DeepSeek V four Pro. So if anything sounds unusually coherent, that's why.
I'll take that as a compliment to our usual incoherence.
So let's start with the concrete tactic Daniel's describing, because it's clever. He's essentially using public GitHub repositories as a personal external memory that happens to be agent-accessible. He does the scouting work once, packages it up with some structure, and then any agent he's working with can pull it in. That's not just convenience — that's a workflow pattern that changes how you think about note-taking and research.
Right, and what's interesting is that this inverts the usual relationship people have with open source. Most people put things on GitHub thinking about other humans finding and using their code. Daniel's primary audience is actually his future self plus whatever agent he's working with. The fact that other humans might find it useful is almost a pleasant side effect. And I think that's more common than people admit — a lot of documentation is written for the author's future self first.
That's the self-referenced material point he made: you forget in three days how you got something working, and in three months you're completely lost if it breaks.
And now add the agent layer. It's not just that you can re-read your own notes — you can hand those notes to an agent and say, here's the research I already did, build on this. That's a qualitatively different thing. You're not just remembering, you're extending your own past effort through a tool that can actually act on it.
Let's get into the format question, because Daniel raised a really specific set of trade-offs. If you're putting structured data on GitHub for agent consumption, do you use JSON, CSV, Parquet? And his instinct that the answer depends on what you're optimizing for is exactly right.
I've been digging into this, and the emerging consensus is actually pretty clear, even if it's not widely known yet. For data that an agent is going to ingest, JSON or NDJSON — that's newline-delimited JSON — is the sweet spot. It handles nested structures naturally, it's universally parseable, and agents understand it natively because most of their training data includes massive amounts of JSON. CSV works for simple flat data under about a gigabyte, but the moment you have any nesting or relationships, it gets awkward fast.
Parquet is fascinating because it's columnar, which means queries are incredibly efficient, and the compression is excellent. But it's really a storage and analytics format, not an ingestion format. The pattern that's emerging is what some of the data infrastructure people are calling ingest as JSON, store as Parquet. You keep your data in Parquet for your own analytical work, but when you want an agent to consume it, you serve it as JSON.
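To make that pattern concrete, here's a minimal sketch of ingest as JSON, store as Parquet, assuming pandas with a Parquet engine like pyarrow installed; the file names are hypothetical:

```python
# Minimal sketch: keep the analytical copy in Parquet, serve agents NDJSON.
# Assumes pandas with a Parquet engine (e.g. pyarrow); the file names are
# placeholders, not a real dataset.
import pandas as pd

# Columnar, compressed copy for your own analytics work.
df = pd.read_parquet("events.parquet")

# Agent-facing copy: one complete JSON object per line (NDJSON), which
# anything can parse without a Parquet reader.
df.to_json("events.ndjson", orient="records", lines=True)
```

Same data, two representations, and the conversion is a single call in each direction.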
For Daniel's use case — putting structured data on GitHub so Claude or another agent can pull it in — JSON is basically the answer.
There's one nuance though, and this came out of some work ClickHouse published just last month. If you're dealing with streaming data or very large datasets where you don't want the agent to have to load everything into context at once, NDJSON becomes really important. Each line is a complete, valid JSON object, so the agent can process it line by line rather than parsing one massive file.
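Here's what that buys you in practice, as a sketch: because every line is a self-contained JSON object, a consumer can walk the file record by record instead of parsing it whole. The file name and the id field are hypothetical:

```python
# Minimal sketch of streaming NDJSON: parse one record per line rather than
# loading the entire file. "events.ndjson" and the "id" field are placeholders.
import json

with open("events.ndjson", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)  # each line is a complete JSON object
        print(record.get("id"))    # process records independently
```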
That's the kind of detail that sounds niche until you actually hit the problem, and then you're very glad someone thought about it. But Daniel's question goes beyond format. He's asking about something more structural — this idea of whether we're going to end up with parallel everything. Separate readme files for agents, separate robots.txt equivalents, separate standards for how information gets presented depending on who or what is consuming it.
This is where the landscape gets messy. Let me lay out what's actually happening right now, because there are at least four competing approaches and none of them have clearly won. The one that got the most attention is llms.txt, which Jeremy Howard from Answer dot A I proposed back in September twenty twenty-four. The idea is simple — you put a markdown file at the root of your website that gives agents a curated summary of what's important. Over eight hundred forty-four thousand websites had adopted it as of last October, including Anthropic, Cloudflare, Stripe, Supabase.
That sounds like traction.
It sounds like it, and then you dig one layer deeper and Google's John Mueller said flat out that no AI system currently uses llms.txt. Not Google's, not anyone else's that he's aware of. So you've got this weird situation where hundreds of thousands of websites are putting these files up, and it's not clear any major AI platform is actually reading them.
That's almost a perfect illustration of the hype cycle for agent-oriented standards. Everyone rushes to adopt the thing because they don't want to be left behind, but the platforms haven't committed to consuming it.
That's not even the only standard in play. There's agenticweb.md, which came out in February this year, and it's much more ambitious. It's not just a content summary — it describes API endpoints, interactive capabilities, authentication mechanisms, multi-step workflows. It's positioning itself as a superset of robots.txt and llms.txt all rolled into one. The pitch is agent-first rather than crawler-first web design.
One standard that nobody's officially consuming, and another standard that's trying to eat the whole stack. This is not a landscape that screams convergence.
There's a third one that's specific to GitHub. GitHub did their own research on this, analyzing over twenty-five hundred repositories that had AGENTS.md files — same idea, different filename. And they found something really practical. The most effective AGENTS.md files define specialist personas rather than vague helpers. So instead of saying here is a helpful assistant, they say things like at-test-agent for React components. They include explicit boundaries — never commit secrets, never modify the build configuration without asking. And the key insight was that agents often read only the first few hundred lines or bytes, so critical information has to be front-loaded.
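To make that research concrete, a hypothetical AGENTS.md along those lines might look like the sketch below: persona up top, boundaries explicit, everything critical in the first screenful. The structure reflects the GitHub findings; the specific contents are invented for illustration.

```markdown
# AGENTS.md

## Persona
You are a test agent for the React components in this repository.

## Boundaries
- Never commit secrets or .env files.
- Never modify the build configuration without asking.

## Conventions
- Tests live in __tests__/ and use the existing runner; match its style.
- Run the full test suite before proposing any change.
```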
That front-loading point is actually huge and I don't think enough people are thinking about it. If you're writing documentation or context that you expect an agent to consume, the assumption that it reads everything top to bottom like a human would is wrong. It might be doing a partial fetch, it might be summarizing, it might be extracting only what seems relevant.
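You can simulate that failure mode directly. This sketch fetches only the first four kilobytes of a document, the way a context-limited consumer might; anything below that cutoff simply never gets seen. The URL is a placeholder:

```python
# Minimal sketch: read only the first 4 KB of a document, mimicking a
# consumer that does a partial fetch. The URL is a placeholder.
import urllib.request

url = "https://example.com/README.md"
with urllib.request.urlopen(url) as resp:
    head = resp.read(4096).decode("utf-8", errors="replace")

# If your critical instructions aren't in `head`, a partial reader
# never encounters them. Front-load accordingly.
print(head)
```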
That connects directly to the trust problem with all of these standards. If llms.txt or agenticweb.md lets website owners present different content to agents than what humans see, what's stopping someone from gaming that? There's research showing that carefully crafted prompts in these files can make language models two and a half times more likely to recommend targeted content. That's adversarial SEO for language models, and it's trivially easy to do.
You've got a format that's designed to be helpful, and it's immediately also a vector for manipulation. That tension is going to shape whether these standards actually get adopted by the platforms.
Why would Claude or ChatGPT or Gemini trust a file that the website owner wrote specifically to influence what the agent says about them? The whole point of the agent crawling the actual site is that it's seeing what humans see. The moment you create a parallel channel, you've introduced a trust problem that doesn't have an obvious solution.
Which brings me to Daniel's bet, and I think he's right about this. His argument is that technology trends toward convergence, and in five years the idea that we're optimizing things separately for AI agents will seem ridiculous because everything will be optimized for AI agents by default. The parallel-file approach is a transitional phase.
I'm about seventy percent convinced by that. The reason I'm not fully convinced is that there's a genuine technical difference between what a human needs to navigate a website and what an agent needs. A human benefits from visual hierarchy, whitespace, images, interactive elements. An agent benefits from clean, structured, dense information with explicit relationships. You can serve both from the same content if you're thoughtful about it, but the optimal presentation is different.
That's where WebMCP gets interesting. Google and Microsoft engineers proposed this back in August last year — it's a JavaScript API that lets websites expose structured tools to agents through the browser itself. So instead of the agent trying to parse a human webpage, the website says here are my capabilities, here are my endpoints, interact with me programmatically. Chrome DevTools MCP launched in public preview in September.
That's a fundamentally different model. It's not a separate file, it's not a parallel structure. It's the same website exposing a machine-readable interface alongside the human-readable one. That feels more like convergence than divergence. The website is the website, it just has multiple access patterns.
The question for someone like Daniel, who's putting repositories up on GitHub and wants agents to consume them well, is what actually works right now, not what standard might win in three years.
The practical answer is refreshingly simple. Write good readme files. Write clear markdown. Structure your information so it's parseable. An agent reading a well-written readme that was written for humans does just fine. You don't need a separate agents.md file unless you have specific constraints or instructions that only apply to automated consumers.
The GitHub research basically confirms this. The effective AGENTS.md files weren't restating what was in the readme — they were adding specialist instructions about how automated tools should interact with the repository. If you don't have those specialist instructions, the readme is sufficient.
That's actually liberating. It means Daniel's current workflow — curating repositories, adding notes about why they're useful, packaging them up with a clear readme — is already optimized for agent consumption without him having to do anything extra. The agent reads the readme, understands the structure, and can work with it.
Let's talk about the geo-restriction angle, because Daniel noticed something really interesting and practical. Claude mentioned it was connecting from Frankfurt, and that opens up a whole set of considerations about where you host things.
This is one of those details that almost nobody thinks about until it bites them. Claude's compute infrastructure runs in globally distributed data centers. Frankfurt, eu-central-one, is one of the major ones. If you've got content on a US-only S three bucket, or behind a geo-restriction that blocks European IP addresses, the agent simply cannot fetch it. The request comes from wherever the compute happens to be running, and if that location is blocked, you get nothing.
For Chinese-based agents behind the Great Firewall, it's even more constrained. If you want your content to be consumable by agents running in different regions, you need to host it somewhere that's globally accessible with no geo-restrictions.
GitHub is actually ideal for this. It's globally distributed, it's not geo-restricted anywhere that I'm aware of, and agents can fetch from it reliably. The same goes for Hugging Face for datasets. But if you're using a custom hosting solution, you need to think about where your content is actually reachable from.
There's a business model lurking in here somewhere. If agents are going to be consuming content from all over the world, and geo-restrictions are a real barrier, someone's going to build an agent CDN — globally distributed hosting specifically designed to be accessible from whatever region an agent's compute happens to be in.
I think that's inevitable. And it might not even be a separate thing — it might just be that existing CDNs add agent-specific optimizations. Serve the same content, but with headers and formats that agents handle particularly well.
Let's get to what I think is the biggest idea in Daniel's prompt, which is this notion that agent optimization is becoming the new inbound marketing. He's been arguing for years that making your website easy for AI agents to digest and parse is the current frontier, and that traditional SEO is starting to look like a legacy approach.
There's a data point that I think really crystallizes this. Vercel reported that ten percent of all new signups in twenty twenty-five came from ChatGPT. Not from Google search, not from ads, not from word of mouth — from ChatGPT. And that was up from under one percent just six months earlier. That is not a gradual shift. That's an inflection point.
Ten percent of new customers coming through an AI agent rather than a search engine. If you're a business and you're not thinking about how your product or service gets surfaced in agent responses, you're leaving money on the table.
Vercel's tactics are instructive. They used precise, consistent terminology so their vector embeddings would be stronger. They implemented server-side rendering specifically so AI crawlers could access their documentation properly. They seeded content on GitHub and Reddit because those are primary training data sources for language models. This wasn't accidental — they deliberately optimized for agent discoverability.
HubSpot's twenty twenty-six State of Marketing report, which surveyed over fifteen hundred marketers globally, found that seventy point two percent of marketers are now adapting their SEO strategies for AI. The language is shifting from ranking to being referenced. It's not about being the top blue link anymore — it's about being the source that the AI cites in its answer.
That changes what kind of content you produce. Keyword-stuffed articles designed to game Google's algorithm are not what gets you cited by an AI. What gets you cited is high-intent, well-structured, authoritative content that directly answers the questions people are asking. It's almost like the incentives are aligning with actually being useful rather than being good at gaming an algorithm.
Which is a refreshing change, honestly. But it also means the playbook is different and most people haven't figured it out yet. Daniel's approach — putting structured, curated information on GitHub where agents can easily access it — is basically a form of agent optimization. He's making his expertise machine-consumable.
The GitHub angle is particularly smart because GitHub is a trusted source for a lot of these agents. When Claude or ChatGPT is looking for code-related information, GitHub repositories are high-signal sources. If you've got a well-structured repository with clear documentation, you're already more likely to be surfaced than if you wrote the same information in a blog post on a random domain.
If someone's listening to this and thinking, okay, I want to do what Daniel's doing, I want to make my stuff agent-consumable, what's the concrete checklist?
First, host it somewhere globally accessible with no geo-restrictions. GitHub, Hugging Face, or a CDN that doesn't block any regions. Second, use JSON for structured data — it's the most universally parseable format for agents right now. If your data is large, consider NDJSON for streaming. Third, write clear markdown readmes that a human would find useful, because an agent will find them useful too. You don't need a separate agent-specific file unless you have specific constraints to communicate.
Fourth, front-load the important information. Don't bury the key details at the bottom of a long document. Agents may only process the first portion of what they fetch, so put the critical context up front.
Fifth, be consistent with your terminology. This matters more than people realize. If you call something a resource list in one place and a curated collection in another, you're weakening the semantic signal. Pick your terms and stick with them — it helps both humans and agents.
Sixth, if you're putting up a repository that's specifically designed as a reference for agents, tell them what it is in the first paragraph of the readme. This is a curated list of browser automation tools. Here's why each one was selected. Here's how they're organized. That's all an agent needs to orient itself.
The beautiful thing about this list is that none of it is weird or agent-specific. It's just good documentation practice. The same things that make something useful for a human make it useful for an agent. The convergence Daniel predicted is already happening — it's just happening through better writing and structuring rather than through separate parallel files.
Which brings me back to the llms.txt question. If writing a good readme and structuring your content well already makes it agent-consumable, what's the marginal benefit of a separate file that may or may not be read by anyone?
For most people, there isn't one. The separate file makes sense if you have a very large site and you want to give agents a curated entry point — here are the ten pages that actually matter, ignore the other five thousand. But for a GitHub repository, the readme is already that curated entry point. Adding an agents.md that restates the readme is just duplication.
Duplication creates maintenance problems. Now you've got two files to keep in sync. If they diverge, which one is authoritative? The agent doesn't know, and neither does the human who finds both.
There's one exception I want to note, and it comes back to that GitHub research. If you have a repository where you want agents to behave in a specific way — like a test suite where you want automated tools to follow certain conventions, or a documentation repo where you want agents to know which files are canonical — then an AGENTS.md with those specific instructions is useful. But it's not a replacement for the readme. It's a supplement with a specific purpose.
The practical answer to Daniel's question — should I put an agents.md in my curated lists — is probably no, unless you've got specific behavioral instructions for automated consumers. The readme is doing the work already.
I think that's actually the answer to his broader question too. Who are you creating the repository for? You're creating it for your future self, for other humans who might find it useful, and for agents that you or others might point at it. The format that serves all three is the same: clear structure, good writing, consistent terminology, front-loaded information. You don't have to choose.
There's a deeper point here about the direction of the web. For twenty years, we've been optimizing for search engines — SEO, keywords, backlinks, all of it. And that created a web that was in many ways worse for humans, because it was designed for Google's crawler, not for people reading. If agent optimization becomes the new thing, there's a chance it actually makes the web better, because agents benefit from clarity and structure in ways that align with what humans benefit from too.
The counterpoint is that we said the same thing about SEO early on. Just write good content and the rankings will follow. And then people figured out how to game it, and we got content farms and keyword stuffing and all the rest. There's no reason to think agent optimization won't follow the same trajectory. The adversarial prompts that make agents two and a half times more likely to recommend certain content — that's the thin end of the wedge.
That's fair. Any system that can be optimized can be gamed. The question is whether the platforms building these agents have the incentive and the capability to detect and penalize manipulation. Google eventually got better at it, though it was an arms race. The AI platforms are going to face the same dynamic.
They're starting from a different place. Google had to infer relevance from signals like links and keywords. AI agents can actually read and comprehend the content. That makes it harder to fool them with surface-level tricks, but it also opens up new attack vectors. If you can embed persuasive instructions in content that looks normal to a human but influences the agent's output, that's a much subtler form of manipulation.
Which is why the parallel-file approach is so fraught. If you give website owners a channel that says put agent instructions here, you're basically inviting manipulation. The only reason it hasn't become a bigger problem yet is that none of the major platforms have officially adopted these standards.
They may never adopt them for exactly that reason. The safer approach from the platform's perspective is to have the agent consume the same content humans see and apply its own judgment. That way, the website owner can't present different faces to different audiences.
Daniel's convergence bet looks stronger the more you examine it. The parallel-file approach has a fundamental trust problem that probably prevents it from becoming the dominant paradigm. What's more likely is that we get better at structuring our existing content so it works for both audiences, and we get protocols like WebMCP that let websites expose structured capabilities without creating separate content streams.
The practical implication for someone like Daniel is: keep doing what you're doing. Curate your lists, write good readmes, use JSON for structured data, host on GitHub where agents can reach it. You're already ahead of the curve because you're thinking about how agents consume information. Most people haven't even started.
Now: Hilbert's daily fun fact.
The average cumulus cloud weighs approximately one point one million pounds, roughly the same as one hundred elephants. The water droplets are so tiny and spread out that they float despite the enormous total mass.
If you're listening and you want to start making your own content more agent-consumable, the single highest-leverage thing you can do today is audit your existing documentation. Is it clearly structured? Is the important information front-loaded? Is your terminology consistent? Is it hosted somewhere an agent can actually reach? Those four things will get you eighty percent of the way there, and they'll make your content better for humans at the same time.
The second thing is to think about what structured data you have that could be useful if it were more accessible. Daniel's example of putting a JSON file on GitHub with a raw link is so simple and so powerful. If you've got data that an agent might benefit from — configuration examples, benchmark results, curated lists, whatever — make it available in a machine-readable format in a place agents can fetch it.
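The whole integration, as a sketch, is a few lines; the repository path is a placeholder, and the code assumes the file holds a JSON array:

```python
# Minimal sketch of the raw-link pattern: pull a JSON file straight from
# GitHub's raw content host. The path is a placeholder; assumes the file
# contains a JSON array of entries.
import json
import urllib.request

url = "https://raw.githubusercontent.com/someuser/somerepo/main/tools.json"
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

print(len(data), "entries loaded")
```

That raw URL is exactly what you'd hand to an agent, and it resolves the same way from Frankfurt as from anywhere else.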
The third thing, which is more forward-looking, is to start paying attention to how agents are actually discovering and referencing your content. If you run a website or a product, are you tracking whether people are finding you through AI-generated answers? Vercel's ten percent number suggests this is already meaningful for some businesses, and it's only going to grow.
The tools for tracking this are still pretty primitive, but you can start with simple things. Ask your customers how they found you. Look at your referral logs for traffic from AI platforms if they pass referrer headers. Some of them do, some don't. It's inconsistent. But even anecdotal data is useful at this stage.
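A first pass can be as crude as counting access-log lines that mention known AI user agents. GPTBot, ClaudeBot, and PerplexityBot are real crawler names, but treat the list as a starting point rather than exhaustive, and note that plenty of agent traffic carries no identifying header at all. The log file name is a placeholder:

```python
# Rough sketch: tally access-log lines mentioning known AI crawler user
# agents. The UA list is a starting point, not exhaustive, and some agent
# traffic is unidentifiable; "access.log" is a placeholder.
from collections import Counter

AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]

hits = Counter()
with open("access.log", encoding="utf-8", errors="replace") as f:
    for line in f:
        for name in AI_AGENTS:
            if name in line:
                hits[name] += 1

for name, count in hits.most_common():
    print(f"{name}: {count}")
```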
What I find exciting about this shift is that it rewards depth over breadth. Traditional SEO incentivized publishing lots of content targeting lots of keywords. Agent optimization seems to incentivize publishing fewer, better, more authoritative resources. If an AI is going to cite you as a source, it needs to trust that you know what you're talking about. That's harder to fake than keyword density.
That's exactly what Daniel is doing with his curated lists. He's not trying to rank for browser automation tools. He's creating a useful resource based on actual research and judgment. When an agent pulls that in, it's getting high-signal information that helps it make better recommendations. That's the kind of content that gets cited.
The question I'm left with, and I think this is the one Daniel was really driving at, is whether we're in a transitional period or whether this blurry dual-audience thing is the new normal. My bet is that it's transitional. In five years, we won't think about optimizing for agents separately because the web will have evolved to serve both audiences natively. The protocols will handle the translation.
I think that's right, but I think the timeline might be longer than five years. The standards fragmentation is real, and the trust problems are unsolved. What I think happens in the next two to three years is that the platforms converge on something like WebMCP — a protocol layer that lets websites expose structured capabilities without creating parallel content. That solves the trust problem because it's the same website, just with a machine-readable interface. And once that's in place, the dual-audience question mostly goes away.
Either way, the practical advice for right now is clear. Don't wait for the standards to settle. Structure your content well, make it accessible, use formats agents can consume, and pay attention to where your audience is actually coming from. The people who figure this out early are going to have an advantage when the convergence happens.
Thanks to Hilbert Flumingtop for producing, and thanks to Daniel for a prompt that managed to be both intensely practical and thought-provoking.
This has been My Weird Prompts. If you want more episodes, we're at myweirdprompts.com. See you next time.