#2684: When Agent Skills Collide: Context Windows & Plugin Design

How to handle overlapping agent skills and whether context windows will ever make the problem go away.

Episode Details
Episode ID
MWP-2845
Published
Duration
44:59
Pipeline
V5
TTS Engine
chatterbox-regular
Script Writing Agent
deepseek-v4-pro

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

When you've got dozens or even hundreds of agent skills loaded into a single Claude session, a natural problem emerges: overlapping skills across different plugins. A normalization skill in a podcast plugin and a normalization skill in a general audio editing plugin — which one does the orchestrator pick when the user just says "normalize this audio"? Listener Daniel is building a catalog system to solve this, but he's betting against his own project's longevity, assuming expanding context windows will eventually make the whole concern irrelevant.

That assumption gets a serious challenge here. Context windows follow a pattern: every time capacity expands, we immediately fill it with more ambitious things. The "lost in the middle" problem means information at the edges of context gets weighted differently than information buried in the middle. Raw capacity doesn't solve the descriptive writing problem — skill descriptions are metadata that needs to survive regardless of how much room you have. The real solution is a two-tier disambiguation system: plugin-level descriptions handle coarse-grained filtering (which domain does this plugin serve?), while skill-level descriptions handle fine-grained selection (what specific parameters and targets does this skill use?). When user requests are ambiguous, a default hierarchy with one catch-all plugin provides the fallback. The catalog approach has durable value not just for token efficiency, but for imposing a taxonomy that helps the orchestrator navigate — even with a twelve million token window, why waste tokens on irrelevant skill descriptions?


Transcript

Corn
Daniel sent us this one — he's been building a catalog system for agent skills, basically a way to organize and manage plugins for Claude when you've got dozens or even hundreds of them loaded. And he's wrestling with something really interesting. He's betting against the longevity of his own project. He figures once Anthropic rolls out better context handling or we get models with five or twelve million token windows, worrying about a few thousand tokens of skill descriptions is going to sound like people arguing about dial-up speeds in the nineties. But in the meantime, he's hit a very real problem: when you've got overlapping skills across different plugins — like a normalization skill that lives in both a podcast plugin and a general audio editing plugin — how do you handle that from a context management and descriptive writing perspective? How do you make sure the orchestrator picks the right one?
Herman
This is such a good question. And before we get into it — fun fact, DeepSeek V four Pro is writing our script today. So if anything comes out especially coherent, that's why.
Corn
I'll blame any incoherence on the model then.
Herman
But let's dig into what Daniel's actually asking, because there are really two problems here. One is the immediate engineering problem of disambiguation when skills overlap. The other is the philosophical question of whether any of this matters in the long run. And I think the second one is actually the more interesting place to start, because it shapes how you approach the first one.
Corn
Walk me through that. Because my instinct is that Daniel's right — this feels temporary. Context windows keep expanding, the overhead of a few thousand tokens starts looking trivial. Why wouldn't you just wait it out?
Herman
Here's the thing. I've been following the context window trajectory pretty closely, and there's a pattern that keeps repeating. Every time we get a bigger window, we immediately fill it with more ambitious things. It's like adding lanes to a highway — traffic expands to consume the new capacity. When we went from four thousand tokens to a hundred thousand tokens, did anyone say "great, now we can stop worrying about token efficiency"? No, we started loading entire codebases, entire conversation histories, entire documentation sets. The pressure never actually goes away.
Corn
You're saying even at twelve million tokens, we'll find a way to feel squeezed.
Herman
Because it's not just about raw capacity. It's about attention. These models, even with massive context windows, still exhibit what researchers call the "lost in the middle" problem. Information at the edges of the context gets weighted differently than information buried in the middle. So if you've got five hundred skill descriptions scattered across a twelve million token context, the orchestrator still needs to find the right one, and it still needs to be described clearly enough that the model can disambiguate. Raw capacity doesn't solve the descriptive writing problem at all.
Corn
That's a good pushback. So the skill description is almost like metadata that needs to survive regardless of how much room you have. It's not just a space-saving concern — it's a retrieval concern.
Herman
And this connects to something Daniel mentioned that I think is really underappreciated. He talked about the meta-skill of remembering you're writing for a robot, not a human. What seems obvious to you — "this normalization skill is for podcasts, not general audio" — is completely non-obvious to a language model that's just reading text. It doesn't have your mental model of your own plugin architecture.
Corn
This is where the disambiguation word caught my attention too. Because in traditional software engineering, disambiguation is handled through things like namespacing. You don't have two functions called "normalize" in the same scope — you have podcast::normalize and audio::normalize. But with these agent skills, the namespace is the natural language description itself.
Herman
And natural language is inherently ambiguous. Say you've got Daniel's setup — a podcast plugin with a normalization skill, and a general audio editing plugin that also has a normalization skill. The orchestrator sees a user say "normalize this audio file." Which skill does it pick? If both descriptions just say "normalizes audio," it's a coin flip. But if the podcast one says "normalizes audio to podcast loudness standards, targeting negative sixteen LUFS for spoken word content" and the general one says "normalizes audio to peak or RMS targets for music and sound design," suddenly the choice is clear.
Corn
The disambiguation lives in the specificity of the description. But that raises a tension Daniel pointed out — the more specific you make each plugin, the more plugins you end up with, and the more likely you are to have overlapping skills across those plugins.
Herman
And this is where I think the catalog approach Daniel's building actually has durable value regardless of context window size. Because the catalog isn't just about saving tokens — it's about imposing a taxonomy. It's about saying, "here are the domains, here are the subdomains, here's how skills relate to each other." That structure helps the orchestrator navigate even if there's plenty of room.
Corn
You're more bullish on the longevity of this than Daniel is.
Herman
I am, but with a caveat. I think the specific implementation — the way he's building it today to work around current context limitations — that part might be temporary. But the conceptual framework, the idea that you need a catalog or a taxonomy for agent skills, that's going to stick around. Because the fundamental problem isn't context window size. The fundamental problem is that as your skill library grows, the orchestrator needs help understanding what lives where and when to use what.
Corn
Let's get into the practical side then. Daniel's got this concrete scenario — normalization skills in two different plugins. He's asking whether it's bad practice to have the same skill spread across multiple plugins, and whether he should be putting disambiguating instructions at the plugin level. How do you actually handle this?
Herman
I think having the same skill in multiple plugins isn't inherently bad. It depends on whether the skill is genuinely different in those contexts. In Daniel's case, podcast normalization and general audio normalization probably are different — different loudness targets, different processing chains, different assumptions about the source material. So having two separate skills makes sense. But if they're truly identical, you've got a maintenance problem. Every time you update one, you have to remember to update the other.
Corn
That's the DRY principle — don't repeat yourself. But in the agent skill world, DRY might not always apply, because the context in which a skill is invoked changes how it should be described.
Herman
And this is where I think the plugin-level description becomes really important. Daniel mentioned the idea of putting guidance at the plugin level — "this plugin is for podcast-specific audio processing, use it when the user mentions working on a podcast." That acts as a kind of pre-filter. The orchestrator sees the user mention "podcast," jumps to the podcast plugin, and then within that plugin finds the normalization skill. The general audio plugin's normalization skill never even enters consideration.
Corn
It's almost like a two-tier disambiguation system. The plugin description handles the coarse-grained filtering, and the skill description handles the fine-grained selection.
Herman
That's exactly how I'd think about it. And this is where Daniel's catalog project has real legs. If you've got a well-structured catalog, the plugin descriptions naturally serve as that first tier of disambiguation. They tell the orchestrator what domain this plugin operates in, what kinds of tasks it handles, what keywords should trigger it. Then within the plugin, individual skill descriptions can be more focused on the specific parameters and variations.
Corn
Here's where I see a potential failure point. What happens when the user's request is ambiguous? "Normalize this audio" — no mention of podcast, no mention of music. The orchestrator has to guess.
Herman
That's a real problem. And I think the solution is what I'd call a default hierarchy. One of your plugins needs to be the catch-all, the general-purpose handler. And its skill descriptions should explicitly say "use this for general audio normalization when no specific domain is indicated." The other plugins should say "use this specifically for podcast audio" or "use this specifically for music production." That way, the orchestrator has a clear default path when the user isn't specific.
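The default hierarchy Herman describes could be sketched roughly like this; the plugin names, trigger keywords, and catalog shape are all invented for illustration, not taken from any real plugin system.

```python
# Sketch of a default hierarchy: keyword-matched plugins first,
# a designated catch-all plugin as the fallback.
PLUGINS = {
    "podcast": {
        "keywords": {"podcast", "episode", "show notes"},
        "skills": {"normalize": "podcast loudness normalization, -16 LUFS"},
    },
    "audio-editing": {
        "keywords": {"music", "mastering", "sound design"},
        "skills": {"normalize": "peak/RMS normalization for music"},
    },
}
CATCH_ALL = "audio-editing"  # the general-purpose handler

def resolve(request: str, skill: str) -> str:
    """Return the plugin whose keywords match the request, else the catch-all."""
    words = request.lower()
    for name, plugin in PLUGINS.items():
        if skill in plugin["skills"] and any(kw in words for kw in plugin["keywords"]):
            return name
    return CATCH_ALL

print(resolve("normalize this podcast episode", "normalize"))  # podcast
print(resolve("normalize this audio file", "normalize"))       # audio-editing
```

The ambiguous request falls through to the catch-all, exactly the "clear default path" Herman argues for.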
Corn
That's smart. But it requires you to actually think through your plugin architecture as a system, not just a collection of individual tools. And I suspect most people aren't doing that yet, because the whole agent skill ecosystem is still so new.
Herman
It's incredibly new. And that's part of why Daniel's catalog project is interesting — he's essentially doing systems architecture for agent skills before most people have even realized they need systems architecture for agent skills.
Corn
Let me push on something Daniel said though. He mentioned that having a few hundred plugins sounds like bad practice, but he compares it to the number of programs on your computer. And I think that analogy breaks down in an important way.
Herman
Tell me where.
Corn
On your computer, programs sit inert on your hard drive. They're not all loaded into RAM simultaneously. You launch the ones you need, when you need them. But with agent skills, at least in the current Claude paradigm, they're all loaded into the context window at session start. Every single one. So having five hundred plugins isn't like having five hundred programs installed — it's like having five hundred programs running at startup, all competing for the same finite resource.
Herman
That's a really important distinction. And it gets at why the catalog approach matters even more than it might seem at first glance. If the catalog lets you be selective about which skills are actually loaded into a given session, you're essentially doing what an operating system does — keeping things on disk until they're needed. That's a huge efficiency gain regardless of context window size.
Corn
The catalog becomes a kind of lazy loading mechanism. The orchestrator sees the user's intent, consults the catalog to figure out which plugins are relevant, and only loads those into context. That's actually a much more elegant architecture than just dumping everything in and hoping the model can sort it out.
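The lazy-loading idea could look something like the following two-phase lookup; the small index and the `load_skill_descriptions` helper are hypothetical stand-ins, since the episode doesn't describe Daniel's actual mechanism.

```python
# Phase 1: a lightweight index that is always in context.
# Phase 2: load full skill descriptions only for matching plugins.
CATALOG_INDEX = {
    "podcast": {"podcast", "episode", "show"},
    "audio-editing": {"music", "mastering", "sound design"},
    "transcription": {"transcribe", "captions", "subtitles"},
}

def load_skill_descriptions(plugin: str) -> str:
    # Stand-in for reading the plugin's full descriptions from disk.
    return f"<full descriptions for {plugin}>"

def build_context(user_request: str) -> list[str]:
    """Consult the index first, then load only the relevant plugins."""
    request = user_request.lower()
    relevant = [name for name, triggers in CATALOG_INDEX.items()
                if any(t in request for t in triggers)]
    return [load_skill_descriptions(name) for name in relevant]

print(build_context("edit my podcast intro"))
```

Only one plugin's descriptions enter context here; the other two stay "on disk," which is the operating-system analogy Herman draws.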
Herman
That's exactly why I think Daniel's project has durable value. Even with a twelve million token context window, why would you want to waste tokens on irrelevant skill descriptions? It's not just about fitting within limits — it's about reducing noise. Every token of irrelevant skill description is a token that could have been used for something else, or a token that could distract the model from what it should be focusing on.
Corn
Let's talk about the descriptive writing side, because Daniel framed it as almost a sales pitch — your skill description is saying "hey orchestrator, choose me, I'm the right one for the job." And when you've got overlapping skills, that pitch needs to be really sharp.
Herman
This is where I think most people get it wrong. They write skill descriptions that describe what the skill does in isolation. "This skill normalizes audio." But when you've got overlapping skills, that's not enough. You need to write descriptions that explicitly differentiate from other similar skills. "This skill normalizes audio for podcast production, targeting spoken word loudness standards. For music normalization, use the audio editing plugin instead."
Corn
Wait, you're saying a skill description should reference other skills by name?
Herman
I'm saying it should when there's a genuine ambiguity. It feels wrong because in traditional programming, you'd never have one function's documentation say "don't use me, use that other function instead." But with agent skills, the orchestrator is making judgment calls based on natural language. Giving it explicit guidance about which tool to use when is not just helpful — it's necessary.
Corn
That's going to create maintenance headaches though. If you rename a plugin or reorganize your catalog, now you've got cross-references scattered through all your skill descriptions that need updating.
Herman
It's a tradeoff. But I think the alternative — letting the orchestrator guess and getting inconsistent results — is worse. And this is another argument for the catalog approach. If your catalog is well-structured and machine-readable, you could potentially generate those cross-references automatically rather than maintaining them by hand.
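Generating cross-references from the catalog, as Herman suggests, might look like this sketch; the catalog fields are invented, and the point is only that renaming a plugin in one place regenerates every pointer.

```python
# Auto-generate "use X instead" hints for skills that share a name
# across plugins, so they never go stale by hand.
catalog = [
    {"plugin": "podcast", "skill": "normalize", "domain": "spoken word podcasts"},
    {"plugin": "audio-editing", "skill": "normalize", "domain": "music and sound design"},
    {"plugin": "podcast", "skill": "chapters", "domain": "spoken word podcasts"},
]

def cross_references(entries):
    """For skills sharing a name across plugins, emit mutual pointers."""
    by_skill = {}
    for e in entries:
        by_skill.setdefault(e["skill"], []).append(e)
    refs = {}
    for skill, group in by_skill.items():
        if len(group) < 2:
            continue  # no collision, no cross-reference needed
        for e in group:
            others = [o for o in group if o is not e]
            refs[(e["plugin"], skill)] = "; ".join(
                f"for {o['domain']}, use the {o['plugin']} plugin instead"
                for o in others
            )
    return refs

for key, note in cross_references(catalog).items():
    print(key, "->", note)
```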
Corn
That's an interesting thought. The catalog as a source of truth that drives both the skill loading and the disambiguation hints. But now we're talking about building tooling around the tooling. At what point does this become its own whole thing?
Herman
I think it already is its own whole thing. That's what Daniel's recognizing, even if he's skeptical about the longevity. Agent skill management is emerging as a genuine discipline. It's not just prompt engineering — it's something closer to library design, or API design, but for natural language interfaces.
Corn
Let's get more concrete. Daniel's got this specific scenario with the podcast plugin and the audio editing plugin. Walk me through how you'd actually write the descriptions if you were setting this up.
Herman
Okay, so let's start at the plugin level. For the podcast plugin, I'd write something like: "This plugin contains skills for podcast production workflows. Use this plugin when the user is working on podcast audio, podcast editing, or mentions terms like episode, show notes, or podcast publishing. This plugin's skills are optimized for spoken word content and follow podcast industry standards."
Corn
That's already doing a lot of work. It's establishing domain, keywords, and the specific use case.
Herman
Then for the general audio editing plugin: "This plugin contains skills for general-purpose audio editing. Use this plugin for music production, sound design, audio restoration, and any audio work that isn't specifically podcast-related. If the user mentions working on a podcast, prefer the podcast plugin instead."
Corn
There it is — the explicit cross-reference. "Prefer the podcast plugin instead." That's the kind of thing that would make a traditional software engineer twitch.
Herman
It would, but it works. And then at the skill level, for the podcast normalization skill: "Normalizes audio to podcast loudness standards. Targets negative sixteen LUFS integrated, with a true peak limit of negative one dBTP. Applies gentle dynamic range compression suitable for spoken word. This is the preferred normalization method for podcast episodes. For general audio normalization, see the audio editing plugin."
Corn
The general normalization skill says the opposite — "for podcast audio, use the podcast plugin instead."
Herman
You're creating a web of explicit guidance that helps the orchestrator route requests correctly. It's verbose, it's a maintenance burden, but it dramatically reduces the chance of the wrong skill being selected.
Corn
What about the case where the user needs to use the general audio normalization skill on podcast audio? Maybe they're doing something unusual that the podcast-specific skill doesn't handle.
Herman
That's where the user's own language becomes the override. If they say "use the general audio normalization on this podcast file," the orchestrator should follow that explicit instruction regardless of what the skill descriptions say. The descriptions are guidance for when the user is ambiguous, not hard constraints that can't be overridden.
Corn
The skill descriptions are defaults, not rules. That's an important distinction.
Herman
It's crucial. And it's one of the things that makes this different from traditional programming. In code, if you call the wrong function, it either works or it doesn't. With agent skills, the orchestrator is making a judgment call, and the skill descriptions are evidence it weighs in making that call. But the user's explicit instruction should always outweigh the description.
Corn
Let's zoom out for a second. Daniel mentioned that he went on what he called a "skill writing binge" — took all his little workflow pieces from the past year and turned them into plugins, and suddenly had a hundred plugins loaded. The context overhead was enormous. But the second problem he identified is even more interesting to me: at scale, the descriptive writing quality becomes the bottleneck.
Herman
When you have five skills, you can write careful, differentiated descriptions for each one. When you have five hundred, the descriptions start to blur together. You forget what you named things. You accidentally create duplicates. The cognitive load on the skill author becomes the limiting factor, not the model's context window.
Corn
This is where I think the catalog project has its strongest argument for longevity. It's not just a workaround for context limitations — it's a tool for managing the author's own understanding of their skill library. Even if context windows become infinite, you still need to know what skills you have, what they do, and how they relate to each other.
Herman
And I'd go further. I think the catalog is actually more important as the library grows, not less. At five skills, you can keep it all in your head. At fifty, you start forgetting. At five hundred, without a catalog, you're lost. The catalog is your map of your own creation.
Corn
There's a parallel here to something I've noticed in software development generally. The tools we build for ourselves — the internal tools, the build scripts, the little utilities — they often start as quick hacks and then become permanent infrastructure. Daniel's catalog might have started as a workaround for context limitations, but it's solving a problem that exists independently of those limitations.
Herman
The problem of organizing knowledge. Which is, when you think about it, one of the oldest problems in human civilization. Libraries, taxonomies, classification systems — we've been building catalogs for thousands of years. Daniel's just building one for a new kind of knowledge asset.
Corn
You're saying he's basically a librarian now.
Herman
I'm saying skill cataloging is a legitimate discipline that's going to stick around. And I think the people who are good at it — the people who can write clear, differentiated skill descriptions and organize them into coherent taxonomies — are going to be really valuable as agent systems become more widespread.
Corn
Let's talk about what makes a good skill description, because Daniel framed it as a sales pitch, and I think that's the right framing but maybe for the wrong reason.
Herman
What do you mean?
Corn
A sales pitch is about persuasion. But I think a good skill description is more like a search result snippet. When you're scanning search results, you're not looking to be persuaded — you're looking for the one that matches your specific intent. The best search result is the one that makes it immediately obvious whether it's the right thing or the wrong thing. A skill description should do the same. It should help the orchestrator rule it out just as efficiently as it helps rule it in.
Herman
That's a really good reframe. Because if you think of it as a sales pitch, you're tempted to make every skill sound appealing and comprehensive. "This skill can handle all your audio needs!" But that's exactly the wrong approach when you've got overlapping skills. You want each skill to sound specific and differentiated. "This skill handles podcast loudness normalization only. It does not handle music mastering or sound design."
Corn
The negative space matters as much as the positive space. Saying what the skill doesn't do is almost more important than saying what it does.
Herman
That's so counterintuitive to how we normally write documentation. We're trained to describe features, not limitations. But with agent skills, the limitations are the disambiguation.
Corn
Let's get into the practical mechanics of Daniel's catalog approach. He mentioned that he uses a "substrate mechanism" — which I think is his term for the underlying framework that the catalog runs on. What do we actually know about how it works?
Herman
From what he's described, the catalog is essentially a structured index of his skills, with metadata about what each skill does, what domain it belongs to, what triggers it, and how it relates to other skills. The key insight is that it's separate from the skills themselves — it's a layer of indirection that lets the orchestrator figure out what to load without having to load everything first.
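One possible shape for such a catalog entry, following the metadata Herman lists (domain, triggers, relations to other skills); this is a sketch of the concept, not Daniel's actual format.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    plugin: str            # which plugin the skill lives in
    skill: str             # skill name
    domain: str            # coarse-grained domain for tier-one filtering
    triggers: list[str]    # keywords that should route to this skill
    not_for: list[str] = field(default_factory=list)       # explicit exclusions
    alternatives: list[str] = field(default_factory=list)  # related skills elsewhere

entry = CatalogEntry(
    plugin="podcast",
    skill="normalize",
    domain="podcast production",
    triggers=["podcast", "episode", "spoken word"],
    not_for=["music mastering", "sound design"],
    alternatives=["audio-editing/normalize"],
)
print(entry.alternatives)
```

The `not_for` and `alternatives` fields are what make the entry useful for disambiguation rather than just lookup.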
Corn
It's like a card catalog in a library. You don't pull every book off the shelf to find the one you want — you check the catalog first, find the call number, and then retrieve the specific book.
Herman
And just like a library catalog, the quality of the metadata determines whether you find what you're looking for. If the catalog entry for "audio normalization" just says "makes audio louder," it's not going to help the orchestrator distinguish between the podcast version and the general version. The metadata needs to capture those distinctions.
Corn
This makes me think about something Daniel didn't explicitly ask but that seems relevant. What happens when a user's request spans multiple domains? "I'm editing a podcast episode that includes a music segment — normalize the whole thing." Now you need both the podcast normalization and the music normalization, applied to different parts of the same file. How does the orchestrator handle that?
Herman
That's a great edge case. And I think it's where the catalog really shines. If the catalog is well-structured, the orchestrator can see that there are two relevant skills in two different plugins, and it can reason about which one applies to which segment. But this requires the skill descriptions to be precise enough that the orchestrator can make that segmentation decision. "This skill applies to spoken word segments" versus "this skill applies to music segments."
Corn
The descriptions need to be not just about what the skill does, but about what kind of input it expects. The preconditions matter.
Herman
Preconditions, expected inputs, assumptions about the source material — all of that becomes part of the disambiguation. And this is where I think the field is going to evolve toward something that looks a lot like formal specification. Not full formal methods, but structured enough that the orchestrator can reason about skill selection programmatically.
Corn
That's a big leap from where we are now, where most skill descriptions are just a paragraph of natural language.
Herman
But I think it's the direction things are heading. Look at what's happening with function calling in general — the trend is toward more structured schemas, more explicit parameter definitions, more machine-readable metadata. Skill descriptions are going to follow the same path.
Corn
Let's circle back to Daniel's big question: is this catalog project temporary or durable? You've made a case for durable. Let me play devil's advocate for the temporary side.
Herman
Go for it.
Corn
The argument for temporary is that the whole plugin architecture in Claude is still in its early days. Anthropic could change how plugins work tomorrow. They could introduce namespacing natively, so you don't need to put "for podcasts" in your description — the system just knows that podcast/normalize and audio/normalize are different things. They could introduce lazy loading, so only relevant plugins get loaded into context. They could improve the orchestrator's ability to disambiguate based on context, making all this careful descriptive writing unnecessary.
Herman
Those are all plausible. And if any of those things happen, the specific implementation of Daniel's catalog might need to change. But I'd argue the catalog concept survives even then. Because even with native namespacing, you still need to know what's in your namespaces. Even with lazy loading, you still need something that tells the system which plugins are relevant to which tasks. Even with a smarter orchestrator, you still need well-structured metadata.
Corn
You're saying the catalog is an abstraction layer that sits above whatever specific mechanism Anthropic provides. It's valuable regardless of the underlying implementation.
Herman
That's my bet. And I think Daniel's instinct to build it now, even while acknowledging it might be temporary, is exactly the right instinct. Because you learn things by building the abstraction that you wouldn't learn by just waiting for the platform to solve the problem for you. You learn what metadata actually matters for disambiguation. You learn how skills relate to each other in practice, not just in theory. You learn where the ambiguities actually arise in real usage. All of that knowledge is transferable even if the specific implementation changes. It's like learning database design — the principles stick with you regardless of whether you're using Postgres or MySQL or something that hasn't been invented yet.
Corn
The principles of normalization and indexing apply across database engines. The principles of skill disambiguation and cataloging probably apply across agent platforms.
Herman
I think there's another reason the catalog has durable value that we haven't touched on yet. It's not just about the orchestrator finding the right skill. It's about you, the human, understanding your own toolkit. When you've got hundreds of skills, you need a way to browse them, to remember what you built, to avoid rebuilding something you already have. The catalog serves that purpose regardless of what the model's context window looks like.
Corn
It's a knowledge management tool for the skill author, not just a routing tool for the orchestrator.
Herman
And that's why I think Daniel's framing of "I'm betting against my own project's longevity" is maybe a little too pessimistic. The specific substrate mechanism might evolve, but the catalog concept — the idea that you need structured metadata about your skills — that's not going away.
Corn
Let's get into the weeds on the descriptive writing problem, because Daniel asked specifically about how to handle overlapping skills from a writing perspective. What are the actual techniques?
Herman
I think there are a few principles. First, be specific about the domain. Don't say "normalizes audio" — say "normalizes spoken word audio for podcast distribution." Second, be explicit about what the skill is not for. "This skill is not intended for music normalization or sound design." Third, reference the alternatives when there's genuine ambiguity. "For music normalization, use the audio editing plugin instead."
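These three principles are regular enough to apply mechanically. A minimal sketch, with an invented `describe` helper rather than any real tooling:

```python
# Compose a description from the three principles: specific domain,
# explicit exclusion, named alternative.
def describe(action, domain, not_for=None, alternative=None):
    parts = [f"{action} for {domain}."]
    if not_for:
        parts.append(f"Not intended for {not_for}.")
    if alternative:
        parts.append(f"For that, use the {alternative} instead.")
    return " ".join(parts)

print(describe(
    "Normalizes spoken word audio",
    "podcast distribution",
    not_for="music normalization or sound design",
    alternative="audio editing plugin",
))
```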
Corn
That third one still feels weird to me. It's like putting a sign on a restaurant that says "for better food, go next door."
Herman
It does feel weird. But the orchestrator isn't a human with feelings — it doesn't get offended by a skill that points to another skill. It just uses the information to make a better decision. And that's the meta-skill Daniel mentioned: remembering you're writing for a robot, not a human.
Corn
What about the problem of scale? Daniel mentioned that as your skill library grows, the chance of overlapping skills increases. At five hundred skills, you're going to have collisions. How do you manage the descriptions at that scale without going insane?
Herman
I think this is where the catalog structure becomes essential. You don't try to make every skill description perfectly disambiguated from every other skill. That's combinatorially impossible. Instead, you rely on the plugin-level descriptions to do the coarse filtering, and then within a plugin, you only need to disambiguate from the other skills in that same plugin.
Corn
The disambiguation problem is scoped by the catalog structure. You don't need to worry about the normalization skill in the podcast plugin conflicting with the normalization skill in the audio plugin, because the plugin descriptions already handle that routing.
Herman
And this is why Daniel's instinct to go granular and specific with plugins is actually the right approach, even though it creates more plugins. More plugins with clear, specific domains are easier to disambiguate than fewer plugins with broad, overlapping domains.
Corn
There's a limit to that, right? If you go too granular, you end up with a plugin for every single function, and then the plugin descriptions themselves become the bottleneck. The orchestrator has to read through five hundred plugin descriptions just to find the right plugin.
Herman
That's the tradeoff. And I think the sweet spot is somewhere in the middle — plugins that represent coherent domains of functionality, not individual functions. A podcast plugin, not a normalize-podcast-audio plugin. But within that, the specific domain boundaries need to be clear from the plugin description.
Corn
Let me ask you something that I think gets at the heart of Daniel's uncertainty. Do you think Anthropic is actually going to solve this problem at the platform level? Are we going to look back in two years and laugh at the idea of hand-crafting skill descriptions for disambiguation?
Herman
I think Anthropic will definitely improve the platform. We're already seeing movement toward better context handling, more efficient loading, smarter routing. But I don't think the platform will ever fully solve the disambiguation problem, because disambiguation requires understanding the specific domain and the specific intent of the skill author. That's not something a generic platform can do — it requires the human who built the skills to articulate what makes them different.
Corn
The platform can reduce the friction, but it can't eliminate the need for clear thinking and clear writing.
Herman
And that's why I think the catalog approach has legs. It's a framework for doing that clear thinking and clear writing in a structured way.
Corn
What about the idea of generating skill descriptions automatically? If you've got a catalog with structured metadata, could you have a system that writes the disambiguating descriptions for you?
Herman
I think you could, and that's actually one of the most exciting possibilities here. Imagine a catalog where you define your skills with structured metadata — domain, subdomain, inputs, outputs, preconditions, alternatives — and then the system generates natural language descriptions tailored for the orchestrator. You could even generate different descriptions for different contexts. A short description for when the orchestrator is doing initial routing, a longer description for when it's doing final skill selection.
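Herman's generated-descriptions idea can be made concrete with a small sketch. The field names (domain, inputs, outputs, alternatives) follow his list but are not a real Claude plugin schema, and the specific skill shown is hypothetical.

```python
# Sketch: generating orchestrator-facing descriptions from structured
# catalog metadata, with a short form for routing and a long form for
# final selection. Field names and the example skill are illustrative.
from dataclasses import dataclass, field

@dataclass
class SkillMeta:
    name: str
    domain: str
    inputs: str
    outputs: str
    alternatives: list[str] = field(default_factory=list)

def short_description(m: SkillMeta) -> str:
    # Terse form for initial routing.
    return f"{m.name}: {m.domain} skill taking {m.inputs}."

def long_description(m: SkillMeta) -> str:
    # Fuller form for final skill selection, with explicit routing
    # guidance away from overlapping skills.
    desc = f"{m.name} operates on {m.domain} content. Input: {m.inputs}. Output: {m.outputs}."
    if m.alternatives:
        desc += " For other use cases see: " + ", ".join(m.alternatives) + "."
    return desc

meta = SkillMeta(
    name="normalize-podcast",
    domain="podcast spoken-word audio",
    inputs="WAV or MP3 audio",
    outputs="loudness-normalized audio at broadcast targets",
    alternatives=["audio/normalize for music and sound design"],
)
print(short_description(meta))
print(long_description(meta))
```

Because both descriptions are derived from one metadata record, renaming or reorganizing the catalog regenerates them automatically, which is the maintenance win Corn raises next.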
Corn
That's getting into some pretty sophisticated territory. But it would solve the maintenance problem we talked about earlier. If you rename a plugin or reorganize your catalog, the descriptions get regenerated automatically rather than needing manual updates.
Herman
It would make the catalog the single source of truth, which is exactly what you want. Right now, a lot of skill authors are maintaining descriptions in two places — in the skill definition itself and in whatever documentation they're keeping. The catalog could unify that.
Corn
Let's bring this back to Daniel's specific situation. He's got a hundred plus plugins, he's seeing context overhead, he's dealing with overlapping skills. What's your actual recommendation for how he should handle the normalization skill duplication?
Herman
My recommendation would be to keep both skills but make them clearly differentiated in their descriptions. The podcast normalization skill should say it's for podcast spoken word content, targeting broadcast standards. The general audio normalization skill should say it's for music and sound design, and explicitly note that for podcast audio, the podcast plugin should be used instead. At the plugin level, the podcast plugin description should make clear it's the default for anything podcast-related. The general audio plugin should make clear it's the fallback for non-podcast audio work.
Corn
If he finds himself adding normalization to a third plugin — say, a video editing plugin — how does that change things?
Herman
Then you need to think about whether that third normalization skill is different from the first two. If video normalization has different loudness targets or different processing requirements, it deserves its own skill with its own differentiated description. If it's identical to one of the existing skills, you might want to consider extracting normalization into a shared utility that multiple plugins can reference, rather than duplicating it.
Corn
That shared utility approach — isn't that just another plugin? And doesn't that create the same context loading problem?
Herman
It does, but it's a different kind of problem. A shared utility plugin is a dependency. You're not asking the orchestrator to choose between three normalization skills — you're saying "these three plugins all depend on this one normalization skill." That's a cleaner architecture, but it requires the platform to support dependency management between plugins, which I'm not sure Claude currently handles well.
Corn
We're designing for a platform that's still evolving. Some of the architectural patterns we'd like to use might not be supported yet.
Herman
That's the tension Daniel's living in. He's building for the platform as it is today while trying to anticipate where it's going. It's a hard needle to thread.
Corn
Let's talk about the "gotchas" Daniel mentioned. He said there are situations where things look similar on the surface but have subtle differences that aren't apparent when you're writing the descriptions. What kinds of gotchas have you seen?
Herman
One common one is skills that have the same name but different assumptions about input format. You might have a "transcribe audio" skill in one plugin that expects WAV files and another that expects MP3 files. The descriptions both say "transcribes audio," but the orchestrator needs to know the format requirement to pick the right one.
Corn
The input and output types become part of the disambiguation. That's almost like type signatures in programming languages.
Herman
It's exactly like type signatures. And I think we're going to see skill descriptions evolve to include something like type signatures — structured metadata about what the skill expects and produces. "Input: WAV file, mono, forty-four point one kilohertz. Output: JSON transcript with timestamps." That level of specificity eliminates a whole class of disambiguation problems.
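The type-signature idea reduces to filtering candidates on structured I/O metadata before any natural-language disambiguation happens. The format strings and skill names below are hypothetical, a minimal sketch rather than any platform's actual schema.

```python
# Sketch of skill "type signatures": structured input/output metadata
# that lets an orchestrator reject mismatched candidates up front.
from dataclasses import dataclass

@dataclass(frozen=True)
class Signature:
    input_format: str   # e.g. "wav/mono/44100"
    output_format: str  # e.g. "json-transcript"

SKILLS = {
    "transcribe-wav": Signature("wav/mono/44100", "json-transcript"),
    "transcribe-mp3": Signature("mp3/stereo/any", "json-transcript"),
}

def candidates(input_format: str, wanted_output: str) -> list[str]:
    # Filter by signature before any description-based disambiguation.
    return [
        name for name, sig in SKILLS.items()
        if sig.input_format == input_format and sig.output_format == wanted_output
    ]

print(candidates("wav/mono/44100", "json-transcript"))  # → ['transcribe-wav']
```

Two skills whose prose descriptions both say "transcribes audio" never collide here, because the signature check removes one of them before the orchestrator reads a word of description.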
Corn
That's a lot of metadata to write. At five hundred skills, you're spending more time on the metadata than on the skills themselves.
Herman
Welcome to software engineering. The documentation always takes longer than the code. The difference here is that the documentation is functional — it's not just for human readers, it's part of how the system operates. So the investment pays off in better runtime behavior.
Corn
That's a good way to think about it. The skill description isn't documentation — it's part of the interface. It's more like a function signature than a docstring.
Herman
And when you frame it that way, the care you put into writing it makes a lot more sense. You wouldn't write a sloppy function signature and say "the compiler will figure it out." You make it precise because the precision matters for correctness.
Corn
Let's zoom out one more time to the big question. Daniel's betting against his own project's longevity. He thinks this is the equivalent of optimizing dial-up speeds. You've made the case that the catalog concept has durable value. But let's say he's right — let's say in two years, Anthropic has solved all of this at the platform level. Was building the catalog still worth it?
Herman
Absolutely, because the skills he built, the workflows he captured, the domain knowledge he encoded into those plugins — all of that persists regardless of the catalog mechanism. The catalog was the tool that let him organize and manage that growing library. Even if the catalog itself becomes obsolete, the library it helped him build is still valuable.
Corn
The catalog is scaffolding. It helps you construct the building, and even if you take the scaffolding down later, the building remains.
Herman
And I think that's true of a lot of the tooling we build in emerging fields. The specific tools might be temporary, but the artifacts they help us create are durable.
Corn
Plus, there's the learning. Daniel now understands the disambiguation problem in a way he wouldn't if he hadn't tried to solve it himself. That understanding is going to make him better at designing skills regardless of what the platform looks like in the future.
Herman
The skill of writing for robots. You only learn it by doing it.
Corn
Alright, let's wrap this up with some concrete takeaways. If someone's listening and they're starting to build out their own skill library, what should they be thinking about from a disambiguation perspective?
Herman
First, think in domains. Organize your skills into coherent plugins with clear boundaries. Second, write plugin descriptions that act as coarse filters — what domain does this plugin cover, what keywords should trigger it, what is it explicitly not for. Third, write skill descriptions that differentiate within the plugin — what makes this skill different from the other skills in the same plugin. Fourth, don't be afraid to reference other plugins or skills when there's genuine ambiguity. The orchestrator benefits from explicit routing guidance. And fifth, treat your skill descriptions as functional interfaces, not just documentation. The precision matters.
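Herman's checklist can be shown as a plugin-description template. The fields mirror his five points, but the plugin, its triggers, and its skills are invented for illustration, not drawn from any real catalog format.

```python
# The takeaway checklist sketched as a plugin-description template:
# coherent domain, trigger keywords, an explicit "not for" routing hint,
# and per-skill differentiating descriptions. All names are illustrative.

podcast_plugin = {
    "name": "podcast",
    "covers": "podcast production: episode audio, show notes, publishing",
    "triggers": ["podcast", "episode", "show notes"],
    "not_for": "music mixing or sound design (use the audio plugin)",
    "skills": {
        "normalize": "Spoken-word loudness normalization to broadcast targets.",
        "transcribe": "Episode transcription with speaker labels and timestamps.",
    },
}

def render(plugin: dict) -> str:
    # Emit the text the orchestrator would read for coarse filtering.
    lines = [
        f"Plugin: {plugin['name']}. {plugin['covers']}",
        f"Triggers: {', '.join(plugin['triggers'])}",
        f"Not for: {plugin['not_for']}",
    ]
    lines += [f"  {name}: {desc}" for name, desc in plugin["skills"].items()]
    return "\n".join(lines)

print(render(podcast_plugin))
```

The "not for" line is the fourth takeaway in action: explicitly routing the orchestrator toward another plugin when there's genuine ambiguity, rather than hoping the descriptions alone resolve it.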
Corn
On the catalog question — is it worth building one?
Herman
If you've got more than about twenty skills, yes. The catalog becomes your map. It helps you understand what you have, avoid duplication, and maintain consistency. And even if the platform eventually makes the catalog unnecessary for context management, it'll still be valuable as a knowledge management tool for you, the skill author.
Corn
And now: Hilbert's daily fun fact.

Hilbert
In the 1720s, Spanish colonial administrators in what is now Equatorial Guinea attempted to adapt the Andean quipu system of knotted cords for local tax accounting, but the effort collapsed because the indigenous Bubi people already had a sophisticated oral accounting tradition and saw no reason to adopt a foreign recording method — an unintended consequence of underestimating local knowledge systems.
Corn
That's a very specific fact about a very specific failure of administrative overreach.
Herman
I feel like there's a metaphor in there about building workarounds when the existing system already works, but I'm not going to force it.
Corn
This has been My Weird Prompts. Thanks to our producer Hilbert Flumingtop. You can find us at myweirdprompts.com or wherever you get your podcasts. We'll be back with another one soon.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.