#2203: Knowledge Without Tools: Why MCPs Aren't Just for Execution

MCPs can be pure knowledge providers with zero tools. Here's why that matters for agents querying government data and authoritative sources.

Episode Details
Episode ID
MWP-2361
Published
Duration
26:22
Pipeline
V5
TTS Engine
chatterbox-regular
Script Writing Agent
claude-sonnet-4-6

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

Knowledge Without Tools: Why MCPs Aren't Just for Execution

Most discussions of the Model Context Protocol focus on one thing: tools. Give your agent the ability to call functions, write to databases, trigger workflows. But the MCP spec defines three primitives, and two of them have nothing to do with execution.

The Three Primitives

The MCP specification includes:

Tools — model-controlled, with potential side effects. The LLM decides when to invoke them.

Resources — application-controlled, explicitly read-only, managed by the host application.

Prompts — user-controlled, pre-built instruction templates that function like slash commands or guided workflows.

A crucial point: a valid, fully spec-compliant MCP server can expose zero tools. You declare your capabilities during initialization, and if you only declare resources and prompts, that's completely legitimate according to the spec.
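
To make this concrete, here is a hypothetical sketch of what an `initialize` response from such a server might look like. The field names follow MCP's JSON-RPC conventions; the protocol version, server name, and capability options are illustrative placeholders, not a verbatim exchange from any real server.

```python
import json

# Hypothetical MCP initialize response for a server that declares only
# the read-only primitives -- resources and prompts -- and no tools.
initialize_result = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "protocolVersion": "2025-03-26",  # placeholder version string
        "capabilities": {
            "resources": {"subscribe": True, "listChanged": True},
            "prompts": {"listChanged": True},
            # No "tools" key at all: clients can see this server
            # exposes nothing that executes.
        },
        "serverInfo": {"name": "gov-data-knowledge", "version": "0.1.0"},
    },
}

assert "tools" not in initialize_result["result"]["capabilities"]
print(json.dumps(initialize_result["result"]["capabilities"], indent=2))
```

The absence of the `tools` key is the entire signal: any compliant client can conclude, before a single request is made, that this server is read-only.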

This opens up an entirely different use case—building MCPs as pure knowledge providers with no execution capability whatsoever.

Why Not Just Use a REST API or RAG?

The case against traditional APIs is straightforward. REST and GraphQL were designed for human developers reading documentation and writing integration code. Agents need a different model. With M data sources and N AI applications, traditional integrations require M × N total connections. With MCP, each data source builds one server, each application implements one client—M + N integrations. The ecosystem math compounds quickly.
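
The arithmetic is simple but worth seeing with numbers plugged in — even a modest ecosystem makes the gap large:

```python
# Integration counts for M data sources and N AI applications.
def point_to_point(m_sources: int, n_apps: int) -> int:
    # Every source wired to every application individually.
    return m_sources * n_apps

def via_protocol(m_sources: int, n_apps: int) -> int:
    # One server per source, one client per application.
    return m_sources + n_apps

m, n = 50, 20
assert point_to_point(m, n) == 1000   # a thousand custom integrations
assert via_protocol(m, n) == 70       # seventy protocol implementations
```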

There's also a discoverability advantage. With REST, an agent still needs to be told the API exists. With MCP, resources are self-describing through the resources/list endpoint, which returns descriptions alongside URIs. An agent can interrogate the server and discover available knowledge without prior documentation.
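
A sketch of what that self-description looks like in practice. The `resources/list` result shape (URI, name, description, MIME type per entry) follows the MCP spec; the census URIs and descriptions are invented for illustration:

```python
# Hypothetical resources/list response from a knowledge server.
list_response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "resources": [
            {
                "uri": "census://population/2024",
                "name": "US population, 2024",
                "description": "Total resident population estimate for 2024",
                "mimeType": "application/json",
            },
            {
                "uri": "census://households/2024",
                "name": "US households, 2024",
                "description": "Household counts by state for 2024",
                "mimeType": "application/json",
            },
        ]
    },
}

# An agent can build a catalog of available knowledge with no prior
# documentation -- the server explains itself.
catalog = {r["uri"]: r["description"]
           for r in list_response["result"]["resources"]}
assert "census://population/2024" in catalog
```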

The RAG comparison is more nuanced. RAG is probabilistic and implicit—you embed a query, search a vector database, and get semantically similar chunks. This works well for large unstructured document corpora. But retrieval isn't guaranteed, and curation is implicit, determined by indexing and chunking strategies.

MCP Resources are deterministic and explicit. A domain expert curates exactly which datasets, documents, or data points are exposed—without needing to understand embeddings, vector databases, or chunking. This democratizes knowledge curation.

Importantly, these approaches aren't mutually exclusive. An MCP server can wrap a RAG system internally, exposing the results as resources. You get MCP's discoverability and protocol benefits on top of RAG's retrieval power.

The practical split: MCPs work best for authoritative, curated, structured knowledge—regulations, government statistics, reference data, scientific constants. RAG remains appropriate for large unstructured corpora requiring semantic search.

Resources: The Specification

The Resources primitive is more sophisticated than it first appears.

Resources are identified by URIs in two flavors:

Direct resources — fixed URIs pointing to specific data, like census://population/2024.

Resource templates — RFC 6570 URI templates for parameterized access, like legislation://eu/regulation/{id} where agents fill in variables.

Each resource carries MIME type information, telling the client exactly what format the content is in—text, JSON, Markdown, GeoJSON for geospatial data, etc.
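
The template flavor is easy to demonstrate. The sketch below expands simple `{var}` templates — a small subset of RFC 6570, which also defines richer operators this toy version ignores:

```python
import re

def expand(template: str, **params: str) -> str:
    """Expand simple {var} URI template variables (RFC 6570 subset)."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing template variable: {name}")
        return params[name]
    return re.sub(r"\{(\w+)\}", substitute, template)

# An agent fills in the variables to address a specific resource:
uri = expand("legislation://eu/regulation/{id}", id="2016/679")
assert uri == "legislation://eu/regulation/2016/679"
```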

Three annotations enable knowledge curation:

  1. Audience — set to user, assistant, or both, indicating who the content is meant for
  2. Priority — a float between 0 and 1 for context budget management. 1.0 means this must go into the context window; 0.0 means include it if there's room
  3. lastModified — an ISO 8601 timestamp enabling freshness-based filtering

A domain expert can annotate each piece of knowledge with its importance and recency, and the host application uses those signals to decide what enters the LLM's context.
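
The spec deliberately leaves the selection strategy to the host application. One plausible approach — a sketch, not anything the spec mandates — is to pack resources greedily by priority, breaking ties by recency, while always honoring priority 1.0:

```python
from datetime import datetime

def select_for_context(resources, budget_chars):
    """Greedily pack resources into a character budget: highest priority
    first, ties broken by lastModified (newest wins). Priority 1.0
    resources are always included, per the annotation's meaning."""
    def sort_key(r):
        ts = datetime.fromisoformat(r["lastModified"])
        return (-r["priority"], -ts.timestamp())

    chosen, used = [], 0
    for r in sorted(resources, key=sort_key):
        if r["priority"] >= 1.0 or used + r["size"] <= budget_chars:
            chosen.append(r["uri"])
            used += r["size"]
    return chosen

# Illustrative resources with spec-defined annotation fields:
resources = [
    {"uri": "reg://a",   "priority": 1.0, "size": 4000,
     "lastModified": "2025-01-10T00:00:00+00:00"},
    {"uri": "stats://b", "priority": 0.4, "size": 3000,
     "lastModified": "2025-03-01T00:00:00+00:00"},
    {"uri": "notes://c", "priority": 0.1, "size": 5000,
     "lastModified": "2024-06-01T00:00:00+00:00"},
]
assert select_for_context(resources, budget_chars=8000) == ["reg://a", "stats://b"]
```

With a tighter budget, only the priority-1.0 regulation survives; the low-priority notes are dropped first in either case. The point is that the curation signal lives in the data, while the policy lives in the application.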

Open Government Data at Scale

The practical upside becomes compelling when you consider available data.

The US data.gov portal has over 400,000 datasets. The EU data.europa.eu portal has over 1.8 million datasets across 208 catalogues spanning 36 countries. Most of this data is freely licensed and exposed via SPARQL endpoints, REST APIs, and bulk downloads.

Almost none of this is reliably accessible to AI agents today. Models work from training data with a cutoff date and may be wrong about specific statistics or current regulations. This isn't just a hallucination problem—it's that even honest responses are stale by definition.

An MCP knowledge server wrapping government data changes this fundamentally. Resource templates could look like:

  • census://population/{country}/{year}
  • legislation://eu/regulation/{id}
  • environment://epa/air-quality/{location}/{date}

An agent in a regulatory compliance context could fetch the current version of a specific regulation as a resource, complete with a lastModified timestamp, and cite it. This is qualitatively different from hoping training data happens to be accurate.
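
A hypothetical `resources/read` exchange for that compliance scenario. The method name and the `contents` result shape follow the MCP spec; the URI, document text, and timestamp are illustrative placeholders:

```python
# Agent requests the current text of a specific regulation:
read_request = {
    "jsonrpc": "2.0",
    "id": 3,
    "method": "resources/read",
    "params": {"uri": "legislation://eu/regulation/2016/679"},
}

# Server returns the content, declared as Markdown:
read_result = {
    "contents": [
        {
            "uri": "legislation://eu/regulation/2016/679",
            "mimeType": "text/markdown",
            "text": "# Regulation (EU) 2016/679\n...",
        }
    ]
}

# The freshness annotation (from the resource's listing metadata,
# illustrative value here) lets the agent produce an auditable citation:
last_modified = "2024-11-05T00:00:00+00:00"
citation = f"Source: {read_result['contents'][0]['uri']} (as of {last_modified})"
assert citation.startswith("Source: legislation://")
```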

The citability angle matters. Right now, when an agent makes a claim about a regulation, you have no way to audit its source. Resources with source URIs and timestamps create a paper trail.

The SPARQL Opportunity

The most technically ambitious version involves SPARQL, the query language for linked data and RDF graphs. The EU data portal exposes a SPARQL endpoint. SPARQL enables cross-dataset joins, ontological traversal, and federated queries across multiple government data sources simultaneously.

An MCP server translating SPARQL results into resources would give agents access to the entire linked data ecosystem. Resource templates could map to common query patterns, with the server handling SPARQL complexity internally.

This is a significant engineering project, but the pattern is established. The MCP reference server ecosystem already includes servers for PostgreSQL, SQLite, filesystem access, and web fetching. Wrapping a SPARQL endpoint follows the same architecture.
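
A sketch of the core translation step such a server would perform: mapping filled-in template parameters onto a SPARQL query. The `ex:` vocabulary below is a placeholder, not the EU portal's actual ontology; a real server would also validate parameters against a whitelist rather than the light checks shown here.

```python
def population_query(country: str, year: int) -> str:
    """Build a SPARQL query for a census://population/{country}/{year}
    resource request. Vocabulary is illustrative."""
    # Minimal sanitation for the sketch; real servers need stricter
    # validation to prevent query injection.
    if not (country.isalpha() and 1900 <= year <= 2100):
        raise ValueError("invalid template parameters")
    return f"""
    SELECT ?population WHERE {{
      ?obs a ex:PopulationObservation ;
           ex:country "{country}" ;
           ex:year {year} ;
           ex:value ?population .
    }}
    """

q = population_query("FR", 2024)
assert 'ex:country "FR"' in q and "ex:year 2024" in q
```

The agent never sees SPARQL; it fills in a readable URI template, and the server owns the query complexity.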

Building a Knowledge-Only Server

Building a knowledge-only MCP server differs from building a tool-focused one in several ways.

At initialization, you simply don't declare a tools capability. Your server announces resources and optionally prompts. Any MCP client understands the server is read-only—a meaningful signal to the ecosystem.

The security implications are significant. Tools can write to databases, send emails, call external APIs, trigger workflows. Knowledge-only servers have none of that. No write operations means no risk of data corruption. No external calls means no risk of unintended side effects. The attack surface is dramatically smaller.

This is the core insight Daniel's question surfaced: MCPs aren't just about giving agents more execution power. They're about building a protocol layer that treats knowledge provision as a first-class concern, with its own design patterns, security model, and ecosystem benefits.



#2203: Knowledge Without Tools: Why MCPs Aren't Just for Execution

Corn
So Daniel sent us this one, and I have to say it's a genuinely sharp angle on MCP that most coverage completely misses. He's asking: can MCPs be used exclusively to provide curated knowledge, with no tools at all? And if so, why would you choose that over just building an API or using RAG? He also wants to know whether this approach can ground agents in authoritative sources like open government data, and what's actually specific to building an MCP for context provision rather than tool execution. Good one, Daniel.
Herman
This is Herman Poppleberry, by the way, for anyone new to the show. And yeah, this question cuts right to something that bothers me about how MCP gets discussed publicly. Everyone talks about tools. Tools, tools, tools. Give your agent the ability to call functions, write to databases, trigger workflows. But the spec has three primitives, not one, and two of them have nothing to do with execution.
Corn
Right, and before we get into the mechanics, I want to flag something. Today's script is generated by Claude Sonnet four point six, which is a fun layer of recursion given that we're talking about a protocol Anthropic designed. Anyway. So, Herman, walk me through the three primitives, because I think the framing here is important.
Herman
So the MCP spec defines Tools, Resources, and Prompts. Tools are model-controlled, meaning the LLM decides when to invoke them, and they can have side effects. They're the things everyone talks about. Resources are application-controlled, meaning the host application manages when they're surfaced, and they're explicitly read-only. And Prompts are user-controlled, pre-built instruction templates, essentially slash commands or guided workflows. Now here's the thing most people don't realize: a valid, fully spec-compliant MCP server can expose zero tools. You declare your capabilities during initialization in the capabilities object, and if you only declare resources and prompts, that's a completely legitimate server. The spec explicitly supports this.
Corn
So you could have an MCP server that is, in its entirety, a curated library of authoritative data. No execution, no side effects, nothing that can go wrong in the way that tools can go wrong.
Herman
That's the design. And the Resources primitive is actually quite sophisticated when you look at it closely. Resources are identified by URIs, and you have two flavors. Direct resources, which are fixed URIs pointing to specific data, something like census://population/2024. And resource templates, which use RFC 6570 URI templates for parameterized access, so something like legislation://eu/{regulation_id} where the agent fills in the variable. Each resource carries MIME type information, so the server is telling the client exactly what format the content is in. Text, JSON, Markdown, GeoJSON for geospatial data, whatever is appropriate.
Corn
And there's a metadata layer on top of that, right? I was looking at the annotations in the spec and they're doing some interesting work.
Herman
The annotations are doing a lot of work, actually. There are three that matter for knowledge curation. First, audience, which can be set to user, assistant, or both, telling the client who this content is actually meant for. Second, priority, which is a float between zero and one, and this is the mechanism for context budget management. If you set a resource's priority to one point zero, you're saying this must go into the context window. Zero point zero means include it if you have room. And third, lastModified, an ISO 8601 timestamp that lets clients do freshness-based filtering. So you can build a server where a domain expert has explicitly annotated each piece of knowledge with how important it is and how current it is, and the host application uses those signals to decide what actually makes it into the LLM's context.
Corn
That's a remarkably thoughtful design for something that gets zero press coverage. But let me push on the why here. If I have a data source and I want agents to be able to query it, I already have options. I can build a REST API. I can set up RAG over a document corpus. Why would I add MCP to this picture?
Herman
So the API comparison and the RAG comparison are actually quite different arguments, and I want to take them separately. On APIs first. The core problem with REST or GraphQL for AI consumption is that APIs were designed for human developers. A developer reads the OpenAPI spec, understands the endpoints, writes integration code. An agent has to do something analogous, which means either you give the agent the API documentation and hope it figures out the right calls, or you write a custom integration layer for every AI application that needs this data. The MCP framing here is M plus N versus M times N. If you have M data sources and N AI applications, the traditional approach requires M times N integrations. With MCP, each data source builds one server, each AI application implements one client, and you get M plus N total integrations. The ecosystem effect of that math compounds quickly.
Corn
And there's something else about discoverability, right? With a REST API, the agent still needs to be told the API exists.
Herman
With MCP, resources are self-describing through the resources/list endpoint, which is paginated and returns descriptions along with URIs. The agent can interrogate the server and discover what knowledge is available without any prior documentation. That's a genuinely different model. The server is not just a data store, it's a data store that explains itself.
Corn
Okay, so APIs are designed for humans and MCP is designed for agents. That argument makes sense. Now the RAG comparison, because that's where I think it gets more interesting and more contested.
Herman
RAG is the more nuanced comparison, and I want to be honest about the tradeoffs because it's not a clean win for either approach. The fundamental difference is that RAG is probabilistic and implicit, while MCP resources are deterministic and explicit. When you do retrieval-augmented generation, you're taking a query, converting it to an embedding, doing a similarity search over a vector database, and getting back whatever chunks are semantically closest to the query. That's powerful for large unstructured document corpora where you don't know in advance what the agent will need. But the retrieval is probabilistic. You might get the right chunks, you might not. And the curation is implicit, it's determined by what got indexed and how the chunks were cut.
Corn
Whereas with MCP resources, someone has made an explicit decision: these are the things that exist, these are their identifiers, this is their priority.
Herman
A domain expert can curate exactly which datasets, documents, or data points are exposed as resources without needing to understand embeddings or vector databases or chunking strategies. That's actually a significant democratization argument. But here's where it gets interesting: these approaches are composable. The MCP spec explicitly says that applications can implement their own selection logic over resources, including embedding-based selection. So you can build an MCP server that wraps a RAG system. The RAG system handles the semantic retrieval internally, and the MCP server exposes the results as resources. You get the discoverability and protocol benefits of MCP on top of the retrieval power of RAG.
Corn
So MCP resources aren't a replacement for RAG in every case, they're a better fit when you have authoritative, curated, structured knowledge, and they can coexist with RAG for the cases where semantic retrieval is actually what you want.
Herman
For things like regulations, government statistics, reference data, scientific constants, legal definitions, MCP resources are the right tool. For a large corpus of internal company documents where you genuinely need semantic search to find relevant passages, RAG is still appropriate. And you can wrap that RAG system behind an MCP server so agents don't need to know which retrieval method is being used underneath.
Corn
Let's talk about the open government data angle, because I think this is where the practical upside becomes genuinely compelling. The scale of what's freely available is kind of staggering.
Herman
The numbers here are worth stating. The US data.gov portal has over four hundred thousand datasets as of now. The EU data portal, data.europa.eu, has over one point eight million datasets across two hundred and eight catalogues spanning thirty-six countries. Both expose SPARQL endpoints, REST APIs, and bulk download options. Most of this data is freely licensed. And almost none of it is reliably accessible to AI agents today, because agents are working from training data that has a cutoff date and may simply be wrong about specific statistics or current regulations.
Corn
And this is the hallucination problem from a different angle. It's not just that models confabulate, it's that even when they're being honest, their training data about something like current EU emissions regulations or the latest US census figures is stale by definition.
Herman
An MCP knowledge server wrapping government data changes that completely. You could have resource templates like census://population/{country}/{year}, or legislation://eu/regulation/{id}, or environment://epa/air-quality/{location}/{date}. An agent operating in, say, a regulatory compliance context could fetch the current version of a specific regulation as a resource, with a lastModified timestamp, and cite it. That's a qualitatively different level of reliability than hoping the training data happens to be accurate.
Corn
And the citability point is important. Because right now when an agent makes a claim about a regulation, you have basically no way to audit where that claim came from. With resources that carry source URIs and timestamps, you have a paper trail.
Herman
The SPARQL angle is worth dwelling on here too, because it's the most technically ambitious version of this idea. The EU data portal exposes a SPARQL endpoint. SPARQL is the query language for linked data, RDF graphs, and it's extremely powerful for cross-dataset queries. You can do joins across datasets from different government agencies, traverse ontological relationships, run federated queries across multiple data sources simultaneously. An MCP server that translates SPARQL query results into resources would give agents access to that entire linked data ecosystem. You'd have resource templates that map to common query patterns, and the server handles the SPARQL complexity internally.
Corn
That's genuinely powerful but also sounds like a significant engineering project.
Herman
It is, but the point is the pattern is established. The MCP reference server ecosystem already has servers for PostgreSQL, SQLite, filesystem access, and web content fetching. Wrapping a SPARQL endpoint follows the same architecture. Someone needs to build it, but the protocol infrastructure is there.
Corn
Alright, let's get into the building side of this. If I'm actually going to build an MCP server that is purely for knowledge provision, what's different about that process compared to building a tool-focused server?
Herman
Several things. First, the capability declaration at initialization. You simply don't declare a tools capability. Your server announces resources and optionally prompts, and that's it. Any MCP client will understand that this server is read-only. That's a meaningful signal to the ecosystem.
Corn
And the security implications of that are pretty significant.
Herman
The attack surface is dramatically smaller. Tools can have side effects. They can write to databases, send emails, call external APIs, trigger workflows. A knowledge-only server has none of that. No write operations means no risk of data corruption. No external calls means no risk of unintended side effects. You still need URI validation, access controls for sensitive resources, and rate limiting, but the threat model is fundamentally simpler. For deployment in sensitive environments, government, healthcare, finance, that simpler threat model is not a minor consideration.
Corn
What about URI scheme design? Because I've seen some MCP servers with pretty opaque resource identifiers and it seems like there's a right and a wrong way to do this.
Herman
URI design matters a lot for a knowledge server because the URIs are effectively the API surface. The principle I'd advocate for is self-documenting URIs. Compare census://q?id=123 to census://us/population/state/california/2024. The second one tells you what it is. When an agent is reasoning about which resource to fetch, a readable URI helps it make better decisions. Use custom URI schemes for domain-specific resources, so legislation://, census://, stats://. Use file:// for filesystem-like resources even if they're not actual files. And design the hierarchy to reflect the domain structure of the data.
Corn
What about the MIME type choices? Because government data in particular comes in a lot of formats.
Herman
Government data is notoriously heterogeneous. You've got text/plain for simple text, application/json for structured data, text/markdown for formatted documentation, application/geo+json for geospatial data. The server should be honest about what format it's returning. If you're returning a structured JSON object with population statistics, declare it as application/json. If you're converting a regulation document to something LLM-friendly, text/markdown is often the right choice because it preserves structure without XML or HTML overhead.
Corn
Let's talk about pagination, because government datasets can be enormous and I imagine the resources/list endpoint needs some care.
Herman
This is one of the places where people building knowledge servers can get into trouble. The resources/list operation is paginated via cursor-based pagination, and if you're wrapping something like data.gov with four hundred thousand datasets, you absolutely cannot return everything in one response. You implement cursor-based pagination, the client requests a page, gets back a cursor, uses the cursor to get the next page. Resource templates are also part of the answer here, because instead of listing every possible resource, you can list templates and let the client construct the specific URIs it needs. So instead of listing four hundred thousand census resources, you list the template census://population/{country}/{year} and the client fills in the parameters.
Corn
That's a much cleaner model for large data sources. Now, the subscription mechanism. I know MCP supports subscriptions on resources, and for government data that gets updated on schedules, that seems like it could be genuinely useful.
Herman
The subscription model is interesting and underappreciated. A client can call resources/subscribe on a specific resource URI, and when that resource changes, the server sends a notifications/resources/updated message. For government statistics that are updated monthly or quarterly, this means an agent or application can maintain a live connection to the knowledge source rather than polling. There's also a list change notification, listChanged, which tells clients when the catalog of available resources itself changes, so new datasets being published can be surfaced automatically. For a domain like regulatory compliance where a new regulation might be published and immediately relevant, that's a meaningful capability.
Corn
I want to come back to the Prompts primitive, because I feel like it's the most underexplained of the three. In a knowledge-only server, what role do prompts actually play?
Herman
In a knowledge-only server, prompts serve as what I'd call guided access patterns. The Prompts primitive lets the server define pre-built instruction templates that the user or application can invoke. So imagine a knowledge server wrapping EU regulatory data. You might define a prompt called analyze_regulation that, when invoked, embeds the relevant regulation resource directly into the conversation, and includes structured guidance for how the model should interpret it. Or a compare_statistics prompt that embeds multiple statistical resources and provides a comparison framework. The key mechanism here is that prompts can embed resources directly as content. A prompt message can include a resource content type that references server-managed content, so the knowledge flows into the conversation structure itself, not just as background context.
Corn
So the server is not just storing knowledge, it's also encoding knowledge of how to use the knowledge.
Herman
That framing is exactly right. And it means a domain expert building this server can encode their interpretive expertise into the prompt templates. They're not just curating data, they're curating the reasoning patterns for working with that data.
Corn
Let's talk about context window management, because this is where the rubber meets the road for knowledge servers in practice. Resources can be large. A full regulation document, a complete dataset. The LLM has a finite context window. How does the system handle that?
Herman
This is the critical challenge and the annotations are the primary mechanism for addressing it. The host application, the MCP client, is responsible for deciding what actually gets included in the LLM's context. The priority annotation gives the server a way to signal what's most important. A resource with priority one point zero should be included even if context is tight. Priority zero point zero means include it only if there's room. The application can implement its own selection logic on top of that: embedding-based selection, keyword search, rule-based filtering. The spec explicitly says applications could implement automatic context inclusion based on heuristics or the model's selection. So the protocol provides the metadata, the application provides the selection strategy.
Corn
And this is actually a more transparent system than RAG, in a sense, because with RAG the selection logic is buried inside the retrieval system and you often can't inspect it easily. Here, the curation decisions are explicit in the annotations, and the selection logic is in the application code.
Herman
The auditability argument is real. If an agent made a decision based on a specific resource, you can trace which resource URI was fetched, what its priority was, what its lastModified timestamp was, and what content it contained. That audit trail is much harder to reconstruct in a RAG system where the retrieval is probabilistic.
Corn
I want to come back to something you said earlier about the philosophical underpinning of MCP, because I think it's relevant here. The Latent Space analysis of why MCP won had a specific claim about the design philosophy.
Herman
The quote that sticks with me is that MCP was designed around the insight that models are only as good as the context provided to them. The Resources primitive is not an afterthought bolted onto a tool-calling protocol. It's a first-class design element that reflects a specific view: that the quality of knowledge available to a model at inference time is as important as the model's weights. That's a meaningful architectural stance. And it's why a knowledge-only MCP server is not a degenerate case or a missing-features server, it's a legitimate and in some ways cleaner use of the protocol than a tool-heavy server.
Corn
What about the practical takeaways for someone who's actually thinking about building this? Whether that's a developer working on agent infrastructure or a government data team thinking about how to make their data AI-accessible.
Herman
For a developer building a knowledge server, the priority list would be: start with the URI scheme design because that's your API surface and it's hard to change later. Get the MIME types right from the start. Implement cursor-based pagination immediately if you're wrapping any large data source, don't wait until it becomes a problem. Use the annotation system deliberately, every resource should have a meaningful priority value and a lastModified timestamp if the data has any temporal dimension. And think about which prompt templates would encode your domain expertise, because that's where a lot of the value of a curated knowledge server lives.
Corn
And for a government data team?
Herman
The opportunity there is significant and largely untapped. Most government data portals have APIs already, but those APIs weren't designed for LLM consumption. Building an MCP server on top of an existing government data API is not a massive engineering project. You're adding a protocol layer, not rebuilding the data infrastructure. And the payoff is that any MCP-compatible agent, Claude, Cursor, VS Code Copilot, whatever comes next, can access your data without any custom integration. The M plus N math works in your favor. One server, the entire ecosystem of MCP clients can use your data.
Corn
The security argument is also particularly relevant for government contexts. A knowledge-only server with no write operations and no external API calls is a much easier thing to get through a security review than a tool-enabled server.
Herman
That's not a small consideration in government IT environments. The threat model for a read-only resource server is genuinely simple. You're serving data, not executing anything. The access controls are straightforward. There are no consent flows required the way there are for tool invocations. In a healthcare or financial regulatory context where data sensitivity is high but the need for agents to access authoritative information is also high, that combination is quite compelling.
Corn
Let me ask the skeptic's question. Is there a risk that MCP resources become a slightly over-engineered solution for something that a well-designed API already handles? Because the discoverability argument is real but it requires the ecosystem to have enough MCP clients for M plus N to actually beat M times N.
Herman
The ecosystem argument is the right place to apply skepticism. The math only works if MCP client adoption is broad enough. The numbers suggest it's getting there. Ninety-seven million monthly SDK downloads and over ten thousand active servers is not a toy ecosystem. Claude, Cursor, and several other major agent platforms are MCP clients. But you're right that if you're building a one-off integration for a specific application, a well-designed REST API might be simpler. The MCP resources approach pays off most when you want your knowledge to be available across multiple agent systems without custom integration for each one. That's the use case it was designed for.
Corn
And the annotation system is doing work that a REST API genuinely doesn't do. Priority, audience, freshness metadata baked into the protocol itself. That's not something you get from HTTP headers in any standardized way.
Herman
The AI-native design is real. HTTP was designed for browsers and human-driven applications. The priority annotation, the audience annotation, the subscription mechanism, these are all features that make sense specifically in the context of LLM applications managing context budgets and maintaining live knowledge. A REST API could implement all of these things, but it would be custom, non-standard, and every client would need to learn your specific conventions. MCP standardizes the conventions.
Corn
Alright, I think the practical picture is pretty clear. Knowledge-only MCPs are a legitimate and well-designed pattern. The Resources and Prompts primitives are first-class citizens of the spec, not edge cases. For authoritative, curated, structured knowledge, they're a better fit than RAG. For broad AI ecosystem accessibility, they're a better fit than custom APIs. And for sensitive deployment contexts, the read-only security profile is a genuine advantage. The open government data application is probably the most underexplored opportunity in the space right now.
Herman
And I'd add that the subscription mechanism for live data and the SPARQL integration possibility for linked government data are two threads that I think will see real development over the next year or two. The infrastructure is there. Someone just needs to build the servers.
Corn
That someone is probably listening to this podcast, so get on it. Alright, let's wrap up. Thanks as always to our producer Hilbert Flumingtop for keeping this whole operation running. And a big thanks to Modal for the GPU credits that power the show. This has been My Weird Prompts. If you're not following us on Spotify yet, that's probably the easiest way to make sure you catch new episodes when they drop. Take care.
Herman
See you next time.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.