You know, Herman, I was looking at the back catalog for the show the other day—over seventeen hundred episodes now—and I realized if I tried to dump all those transcripts into a single prompt for an AI agent, the context window wouldn't just break, it would probably file for a restraining order. It’s a massive amount of data.
It really is. We’ve been talking into these microphones for a long time, Corn. Herman Poppleberry here, by the way, for anyone keeping track. And you’re right, once you hit that scale, the "standard" way of interacting with an LLM—just pasting what you need—completely falls apart. You can’t just "give it the files" anymore.
And today’s prompt from Daniel is tapping right into that frustration. He’s working on the My Weird Prompts website, trying to get Claude Code to help out, but he can’t just say, "Hey, look at every single episode we’ve ever done," because there are fifteen hundred plus items. It’s too much context for a single "glob," as he calls it.
It’s the classic RAG problem—Retrieval-Augmented Generation—but with a twist. Usually, when people hear RAG, they think of these massive, enterprise-grade cloud setups with Pinecone or Milvus, sitting behind an API, costing a hundred dollars a month just to idle. But for a single git repository or a personal project, that feels like bringing a nuclear reactor to power a toaster.
It’s total overkill. Daniel’s asking about a "vector database as a file"—something in-repo, lightweight, that just "mops up" the context and exposes it to the agent via MCP, the Model Context Protocol. Basically, a miniature retrieval system that lives right where the code lives. By the way, quick shout-out to our script-writer today: we are being powered by Google Gemini Three Flash. It’s handling the heavy lifting behind the scenes while we dive into this "local RAG" concept.
I love that "mop up" phrasing. It captures the modern developer's need perfectly. We don't want to manage infrastructure; we just want the agent to stop being forgetful.
So, let’s get into the guts of this. If I’m a developer and I don’t want a cloud database, but I need my agent to "know" fifteen hundred episodes of a podcast, how does "vector database as a file" actually work? We aren't just talking about a JSON file with some text in it, right?
No, because if you just put it in a JSON file, the agent still has to read the whole file to find anything, which puts us right back at the context window limit. To make it a "vector database," you have to transform that text into embeddings. Essentially, you take a chunk of text—say, the transcript of Episode 402—and run it through an embedding model. That model turns the text into a long list of numbers, a vector, which represents the "semantic meaning" of that chunk.
And the "file" part of this is just storing those lists of numbers on disk?
Precisely. Well, not just the numbers, but a way to search them. If you store them in a format like LanceDB or even a specialized SQLite extension, you can perform a "similarity search." When you ask the agent a question, the agent turns your question into a vector, compares it to the thousands of vectors in your local file, and pulls out the three or four most relevant chunks.
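The mechanics Herman is describing can be sketched in a few lines of plain Python. Everything here is illustrative: the hand-picked three-dimensional vectors stand in for real 768-plus-dimensional embeddings, and a JSON file stands in for a proper format like Lance—but the "store vectors in a file, rank by cosine similarity" loop is the same shape:

```python
import json
import math
import tempfile
from pathlib import Path

def cosine_similarity(a, b):
    """Direction-based similarity between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# A "vector database as a file": each row pairs a text chunk with its embedding.
# Real embeddings would come from a model; these toy vectors are hand-picked.
index_path = Path(tempfile.mkdtemp()) / "index.json"
index_path.write_text(json.dumps([
    {"episode": 402, "text": "Sloth locomotion deep dive.", "vector": [0.9, 0.1, 0.0]},
    {"episode": 891, "text": "Battery chemistry explained.", "vector": [0.0, 0.2, 0.9]},
    {"episode": 1481, "text": "Merkle trees and ASTs.", "vector": [0.1, 0.9, 0.1]},
]))

def search(query_vector, k=1):
    """Load the on-disk index and return the k most similar chunks."""
    rows = json.loads(index_path.read_text())
    rows.sort(key=lambda r: cosine_similarity(query_vector, r["vector"]), reverse=True)
    return rows[:k]

# A query vector "semantically close" to the sloth episode:
best = search([0.8, 0.2, 0.1])[0]
print(best["episode"])  # → 402
```

The agent never sees the whole index—only the top-k rows the search returns.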
Okay, so instead of the agent "reading" fifteen hundred episodes, it’s only reading the four snippets that the local file told it were relevant. But where does the Model Context Protocol, or MCP, fit in? Because Daniel mentioned that specifically as the bridge.
MCP is the game-changer here. Before MCP, if you had a local vector file, you’d have to write custom code to make Claude or any other agent use it. You’d be constantly copy-pasting results. With MCP, you can run a tiny local server—literally just a script running on your machine—that "owns" that vector file. It tells the agent, "Hey, I have a tool called 'search_episodes.' If you give me a query, I’ll look at my local file and give you the best matches."
So the agent doesn't even know it's talking to a database. It just thinks it has a new superpower.
Precisely. It’s a tool-calling loop. The agent says, "I need to know what Corn said about sloths in 2024," calls the MCP tool, the MCP tool queries the local LanceDB file, returns the text, and the agent continues. It stays entirely within the repo's ecosystem. No AWS credentials, no API keys for a database provider, just a dot-file in your folder.
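The loop itself is simple enough to mock in plain Python. This sketch fakes both the model and the MCP transport—real MCP speaks JSON-RPC over stdio, and the tool name `search_episodes` plus the canned corpus are made up for illustration—but the shape of the exchange is exactly what Herman describes:

```python
# A fake "tool" standing in for what a local MCP server would expose.
# The name and the canned results are illustrative, not a real MCP API.
def search_episodes(query: str) -> list[str]:
    corpus = {
        "sloths": "Episode 402: Corn's monologue on sloth locomotion (2024).",
        "batteries": "Episode 891: battery chemistry deep dive.",
    }
    return [text for key, text in corpus.items() if key in query.lower()]

TOOLS = {"search_episodes": search_episodes}

def agent_turn(user_message: str) -> str:
    # Step 1: the model decides it lacks context and emits a tool call.
    tool_call = {"name": "search_episodes", "arguments": {"query": user_message}}
    # Step 2: the host dispatches the call to the local server owning the index.
    results = TOOLS[tool_call["name"]](**tool_call["arguments"])
    # Step 3: the retrieved snippets go back into the model's context
    # for the final answer—no database credentials involved anywhere.
    return f"Based on {len(results)} retrieved snippet(s): {results[0]}"

print(agent_turn("What did Corn say about sloths?"))
```

The whole round trip stays on the developer's machine; only the final, small prompt goes to the LLM.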
I like the sound of that. It feels "cleaner." But does this actually exist? Are there projects doing this right now, or is Daniel dreaming of a future that hasn't arrived?
Oh, it’s happening. LanceDB is probably the poster child for this right now. They describe themselves as "serverless" but in the true sense—it’s an open-source database that stores data in the Lance format, which is a high-performance columnar format. You can run it in-process. There’s no "database server" to start. You just point your code at a folder, and boom, you have a vector store.
And can LanceDB talk to MCP?
There are already community-built MCP servers for LanceDB. You can point an MCP server at a directory of documents, it uses LanceDB to index them locally, and suddenly your Claude Code or Cursor instance has full RAG capabilities over that specific folder. Another one is Chroma, which has an "ephemeral" or "persistent" local mode. It’s very popular in the Python ecosystem. You just initialize the client with a local path, and it handles the rest.
What about the "smallness" of it? If I have a repo with, say, five hundred markdown files of documentation, is the index file going to be bigger than the actual code?
It can be. Embeddings take up space. Each vector might be 768 or 1536 dimensions long. If you're using a high-resolution model, those floating-point numbers add up. But we're talking megabytes, maybe a few hundred megabytes for a very large project. In the age of multi-gigabyte node_modules folders, a two hundred megabyte vector index is a rounding error.
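Herman's "rounding error" claim is easy to sanity-check with back-of-the-envelope arithmetic. The numbers below are illustrative assumptions—1,500 episodes, three chunks each, 1536-dimension vectors stored as 32-bit floats:

```python
episodes = 1500
chunks_per_episode = 3
dimensions = 1536     # e.g. the default for OpenAI's text-embedding-3-small
bytes_per_float = 4   # float32

total_bytes = episodes * chunks_per_episode * dimensions * bytes_per_float
print(f"{total_bytes / 1_000_000:.1f} MB of raw vectors")  # → 27.6 MB
```

Even after metadata and index overhead, that lands comfortably in the tens of megabytes—small next to a typical node_modules folder.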
That’s a fair point. We’ve already surrendered our hard drives to npm; what’s a few more vectors among friends? But Herman, help me understand the tradeoff here. If I’m Daniel, why wouldn't I just use a "long context" model? Gemini 1.5 Pro has a two-million-token window. Why bother with all this local indexing and MCP overhead if I can just shove the fifteen hundred episodes into the prompt?
Because of the "lost in the middle" problem and, more importantly, the cost and latency. Even if a model can take two million tokens, every time you send a new message you re-send that entire history—and unless the provider caches the prefix, it gets re-processed from scratch. It makes the "chat" feel sluggish, and it’s expensive. RAG—especially local, file-based RAG—is near-instantaneous. You’re only sending the relevant five hundred words to the model.
It’s like the difference between carrying the entire Library of Congress in your backpack versus just having a really good index card that tells you exactly which page to look at.
That’s a decent analogy, actually. And with the "vector database as a file" approach, you get something called "agentic repository engineering." It’s a term that’s been floating around lately. The idea is that the repository itself is prepared for an AI to work in it. You don’t just have code; you have an index that makes the code "searchable" for a machine.
I want to dig into that "agentic repository engineering" bit, but first, let’s talk about the actual "job" Daniel is describing. He’s talking to Claude Code. Claude Code is a CLI agent. It lives in the terminal. If Daniel is using a local vector file, how does the "ingestion" happen? Does he have to run a separate script every time he adds an episode?
Ideally, you'd have a git hook or a simple CLI command. You run something like "index-repo," and it scans for new files, generates the embeddings—which you can do using a cheap API like OpenAI's 'text-embedding-3-small' or even a local model like Nomic—and updates the local file. The beauty of something like LanceDB is that it supports fast appends. You don't have to rebuild the whole index every time.
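An ingestion pass like the one Herman sketches might look like this. The `embed` function here is a labeled placeholder—in practice it would call an embedding API or a local model—and a JSON-lines file stands in for a real LanceDB table, but the incremental "only embed what's new" behavior is the point:

```python
import glob
import hashlib
import json
from pathlib import Path

def embed(text: str) -> list[float]:
    """PLACEHOLDER: a real version would call an embedding model.
    This derives a deterministic fake vector from a hash, for demonstration only."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:8]]

def index_repo(source_dir: str, index_path: str) -> int:
    """Scan for markdown files and append rows for any not yet indexed."""
    index_file = Path(index_path)
    seen = set()
    if index_file.exists():
        for line in index_file.read_text().splitlines():
            seen.add(json.loads(line)["path"])
    added = 0
    with index_file.open("a") as out:
        for path in sorted(glob.glob(f"{source_dir}/**/*.md", recursive=True)):
            if path in seen:
                continue  # fast append: only new files get embedded
            text = Path(path).read_text()
            out.write(json.dumps({"path": path, "text": text, "vector": embed(text)}) + "\n")
            added += 1
    return added
```

Run it twice and the second pass appends nothing—the same incremental behavior a git hook plus LanceDB's fast appends would give you.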
So it’s basically an augmented version of 'grep.' Instead of searching for keywords, you’re searching for concepts.
It’s 'grep' with a brain. If you 'grep' for "authentication," you only find files with that exact word. If you use a local vector store, you can search for "how do users log in?" and it will find the files even if they only use the word "signin" or "oauth." For someone like Daniel, managing a complex site with years of history, that's the difference between the agent actually being helpful and the agent just saying, "I can't find that file."
I can see how this changes the workflow. Usually, when I work with an agent, I spend half my time being a "file librarian." I’m saying, "Okay, now read this file. Now read that file." If the agent has an MCP connection to a local vector store, it becomes its own librarian.
And that’s the "agentic" part. The agent can realize, "I don't have enough information to solve this bug. I will call the 'search_index' tool to find relevant context." It’s no longer waiting for you to feed it. It’s hunting for the data it needs.
What’s the catch? There’s always a catch with these "local" solutions. Is it the accuracy?
Accuracy is one. Local embedding models—the ones you might run on your own CPU to avoid API costs—aren't always as sharp as the big ones. Also, managing the "chunking" is a nightmare. Do you break the text at every paragraph? Every five hundred words? Every function? If you chunk it poorly, the vector search returns a snippet that's missing the crucial context from the sentence right before it.
Right, if you cut a "how-to" guide in half, the agent might get the "how" but lose the "what."
Exactly—you’re right on the money there. The "chunking strategy" is where the real engineering happens. But for a project like Daniel’s website, where he has distinct "episodes," the chunking is actually pretty easy. Each episode is a chunk, or maybe each episode is three chunks. It’s manageable.
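A minimal sketch of the simplest chunker that avoids the "cut the how-to in half" failure Corn describes: fixed-size word windows with an overlap, so the words just before a boundary reappear at the start of the next chunk. The window and overlap sizes are arbitrary examples, not recommendations:

```python
def chunk_transcript(text: str, chunk_words: int = 500, overlap_words: int = 50) -> list[str]:
    """Split text into word-count windows that overlap, so context spanning
    a chunk boundary survives intact in at least one chunk."""
    words = text.split()
    if len(words) <= chunk_words:
        return [text]
    chunks = []
    step = chunk_words - overlap_words
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break
    return chunks

# A 1,000-word transcript becomes three overlapping 500-word-max chunks:
print(len(chunk_transcript("word " * 1000)))  # → 3
```

For Daniel's case, "one episode per chunk" may be even simpler—this only matters once individual documents outgrow what a single retrieval result should carry.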
Let’s talk about the specific tools again. You mentioned LanceDB and Chroma. Are there others that fit this "database as a file" mold? I’ve heard people talking about using SQLite for this.
SQLite is actually becoming a very viable vector store. There's an extension called 'sqlite-vss'—Vector Similarity Search. It allows you to store vectors in a standard SQLite table and run queries against them. Since almost every developer already has SQLite on their machine, and it’s literally just a single '.db' file, it fits Daniel’s "in-repo" requirement perfectly.
That sounds like the most "conservative" and reliable approach. Just a SQL file sitting in the '.github' folder or something.
It’s very robust. And because it’s SQL, you can mix and match: search for "episodes about AI" and filter for episodes released in 2025 in the same query. A pure vector database often struggles with that kind of hybrid search, but SQLite handles it naturally.
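Here's what that hybrid query looks like in practice. This sketch uses only the standard `sqlite3` module—the real `sqlite-vss` extension would do the vector math inside SQL itself, so the Python-side cosine step below is a stand-in, and the toy two-dimensional vectors are purely illustrative:

```python
import json
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE episodes (id INTEGER, year INTEGER, text TEXT, vector TEXT)")
conn.executemany("INSERT INTO episodes VALUES (?, ?, ?, ?)", [
    (402, 2024, "sloth locomotion", json.dumps([0.9, 0.1])),
    (1481, 2025, "Merkle trees and ASTs", json.dumps([0.8, 0.4])),
    (1520, 2025, "AI coding agents and context", json.dumps([0.3, 0.95])),
])

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def hybrid_search(query_vector, year):
    # The SQL half: a plain metadata filter ("episodes released in 2025").
    candidates = conn.execute(
        "SELECT id, text, vector FROM episodes WHERE year = ?", (year,)
    ).fetchall()
    # The vector half: rank the survivors by similarity.
    # (sqlite-vss would perform this step inside the query itself.)
    ranked = sorted(candidates,
                    key=lambda r: cosine(query_vector, json.loads(r[2])),
                    reverse=True)
    return ranked[0][0]

print(hybrid_search([0.25, 1.0], year=2025))  # → 1520
```

The metadata filter runs first, so the similarity ranking only ever touches rows that already match the structured condition.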
I love the idea of Daniel’s repo having a 'context.db' file. It makes the project portable. If he shares the repo with a collaborator, and that collaborator has the same MCP server installed, they immediately have the same "AI-ready" environment. They don't have to spend three hours getting up to speed on the project's history because the agent already has the map.
That’s the vision. We’re moving away from "documentation as a PDF" to "documentation as a queryable vector space."
I’m thinking about the second-order effects here. If every repo starts carrying around its own vector index, does that change how we write code? Do we start writing specifically so the embedding models can understand us better?
I think we already do that to some extent with clear variable names and docstrings. But this might push us toward "context-first" development. You might have a "context" folder in your repo that isn't for humans at all—it’s just a collection of "globs" designed to be indexed for the agent.
It’s like SEO but for your local AI agent. "Agent Engine Optimization."
AEO! I like it. But there’s a deeper implication for the "statelessness" of these models. Daniel’s notes mentioned that the "conversation" we have with AI is often an illusion. Every time you send a message, the whole history is rebuilt and sent back. If we move the "memory" into a local file-based vector store, we’re essentially giving the agent a "long-term memory" that doesn't depend on the chat provider's UI.
It uncouples the intelligence from the interface. I could use Claude Code today, Cursor tomorrow, and a custom CLI script the day after, and as long as they all point to that same 'context.db' file via MCP, the agent "knows" the same things. It maintains a consistent state across different tools.
That is a huge win for productivity. One of the most annoying things about AI tools right now is the "fragmentation of context." You have one conversation in the browser, one in your IDE, and they don't talk to each other. "Vector database as a file" solves that by making the repo the "source of truth" for context, not the chat history.
Let’s look at the practical side for Daniel. He’s working on the website. He wants Claude to, say, "Update the CSS for all the episode cards to match the style we used for the special 1500th episode celebration." If he has that local index, Claude can find that 1500th episode, look at the code changes from that date, and apply the style. Without it, Daniel has to go find the commit hash himself.
And that’s where the "agentic loop" pays off. The agent calls 'search,' finds the commit, calls 'git show,' analyzes the CSS, and then writes the new code. It’s reducing the "cognitive load" on Daniel. He doesn't have to be the middleman between the agent and the history.
You know, I can see some of our more privacy-conscious listeners really vibing with this. If everything is in a local file, you aren't uploading your entire proprietary codebase to a third-party vector database like Pinecone.
That’s a massive selling point. Even if you’re using an API for the embeddings—which you can avoid if you use a local model—the actual "database" of your knowledge stays on your hard drive. For a lot of companies, that’s the only way they’ll ever allow these agents to touch their code.
So, we’ve got LanceDB, Chroma, SQLite-VSS. We’ve got MCP as the bridge. Is there a "one-click" solution yet? Or is Daniel still in the "assemble it yourself" phase?
It’s still a bit "IKEA-style" assembly, but the parts are getting better. There’s a project called "MCP-Server-SQLite" that you can adapt easily. There’s also "Rag-on-Edge" and similar experimental projects. But honestly, for someone with Daniel’s technical background, setting up a LanceDB-based MCP server is a weekend project, not a month-long ordeal.
I bet someone is going to turn this into a standard feature of things like Claude Code or Cursor. "Enable Local Indexing" as a toggle.
I would be shocked if they didn't. Cursor already does some version of this with their ".cursorrules" and their local indexing, though it's more of a "black box." What Daniel is asking for is more "transparent." He wants to own the file. He wants to see the "glob" and know exactly what’s in his agent’s head.
I think that transparency is key. I hate it when an agent says, "I know about your project," and then hallucinates a file that doesn't exist. If I can query the vector file myself, I can see why it’s confused. "Oh, I see, the indexer only caught the first ten lines of this file, no wonder it doesn't know about the export at the bottom."
That brings up a good point about "metadata." A good file-based vector store doesn't just store the text; it stores the file path, the line numbers, the last modified date, and maybe even the "importance" of the file. When the agent gets a search result, it’s not just getting a "string," it’s getting a "rich object."
It’s a map. A semantic map of the repository.
And when you think about fifteen hundred episodes of My Weird Prompts, that map is essential. You’ve got episodes about everything from battery chemistry to the geopolitics of the semiconductor industry. If Daniel wants to build a "recommendation engine" for the site, he can use that same local vector file. It’s not just for the agent; it’s a new piece of infrastructure for the website itself.
Wait, that’s a great insight. If he builds this "vector database as a file" for his AI agent to use while he's coding, he can then just ship that same file as part of the website's deployment. Then the website itself can have a "semantic search" feature for the listeners, powered by the exact same data.
Now you’re thinking in "agentic repository engineering" terms! The context becomes a "build artifact." You generate the index at build time, and it serves both the developer and the end-user. It’s a "write once, query everywhere" model.
That’s actually really cool. It justifies the effort. It’s not just a "dev tool" anymore; it’s a feature.
And it’s a very "pro-American," "pro-innovation" way of looking at it, too. We’re not waiting for some giant tech monopoly to give us a "knowledge management" solution. We’re building lightweight, decentralized tools that let us manage our own data. It’s very much in the spirit of open-source development.
I can hear the donkey-sized enthusiasm in your voice, Herman. You’re ready to go index the whole world into a '.db' file, aren't you?
Guilty as charged! But think about the efficiency! If we can reduce the energy cost of these AI interactions by sending smaller, more relevant prompts, that’s a win for everyone.
It is. But let’s play devil’s advocate for a second. If I have a really, really huge repo—like, the Linux kernel scale—does "database as a file" still hold up? Or do you hit a wall where you need a server?
You hit a wall eventually. LanceDB can handle millions of vectors, but at some point, the "similarity search" starts to take hundreds of milliseconds on a standard laptop. If you’re a single developer, that’s fine. If you’re a team of a thousand people all trying to query the same file on a shared drive, it’s going to fall apart. But for Daniel? For a project with a few thousand "items"? It’s perfectly scaled.
It’s the "Goldilocks" zone of RAG. Not too big, not too small.
Precisely. And the "Model Context Protocol" is what makes it feel "modern." It’s that standardized interface that says, "I don't care how you store your data, as long as you can answer my questions." It’s like the "USB-C" of AI context.
I’m still stuck on the "sloth" speed of some of these things, though. If I ask a question, and the agent has to call an MCP tool, which opens a file, which runs a vector search, which returns text, which then gets sent to the LLM... aren't we adding a lot of "hops"?
You’d be surprised. Reading a local SQLite or LanceDB file is incredibly fast. We’re talking five to ten milliseconds. The "bottleneck" is still, and always will be, the time it takes the LLM to generate the tokens. The "retrieval" part is basically free in comparison.
So it’s not going to make the agent feel "slower"?
If anything, it makes it feel "smarter" faster. Instead of the agent flailing around trying to "remember" what you told it ten minutes ago, it just "looks it up" and gives you a coherent answer immediately.
I think Daniel is onto something here. This "miniature retrieval" within a repo feels like the next logical step for AI-assisted coding. We’ve had the "chat with your docs" phase, but that was always external. Bringing it "in-repo" makes it part of the developer’s craft.
It’s about "contextual awareness." An agent that doesn't know the history of a project is just a talented intern who hasn't been onboarded yet. An agent with a local vector index is the senior dev who’s been there since day one.
And who doesn't complain about the coffee.
Though I’m sure if we gave Claude an MCP tool for a smart coffee maker, it would have some very strong opinions on the roast profile.
Oh, don't give it any ideas. So, if Daniel is going to do this, what’s his first step? He’s got fifteen hundred episodes in markdown or JSON or whatever. What does he actually do on Monday morning?
Step one: Choose an embedding model. If he wants to keep it all local, grab the 'nomic-embed-text' model and run it via Ollama. Step two: Pick a storage format. I’d recommend LanceDB for its speed and ease of use in JavaScript and Python. Step three: Write a script to "glob" all those episode files, run them through the embedding model, and save them to a 'vectors/' folder in the repo.
And step four is the MCP part.
Right. He needs to run an MCP server—there are plenty of templates on GitHub—that "exposes" a search function. He links that to his LanceDB folder. Then, he tells Claude Code, "Hey, here is the config for my local MCP server." Suddenly, Claude can "see" the history.
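The wiring-up step Herman mentions is mostly configuration. A hedged example of what a project-level MCP config might look like—the server name, script path, and environment variable are all hypothetical placeholders, and a `.mcp.json` at the repo root is the Claude Code convention at the time of writing:

```json
{
  "mcpServers": {
    "episode-search": {
      "command": "python",
      "args": ["tools/episode_search_server.py"],
      "env": {
        "VECTOR_INDEX_PATH": "vectors/episodes.lance"
      }
    }
  }
}
```

Checked into the repo, this is what makes the context portable: anyone who clones the project and opens Claude Code gets the same search tool, pointed at the same index.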
It sounds like a lot of "plumbing," but once the pipes are laid, the water just flows.
And once it’s done, it’s done. He never has to manually explain the "My Weird Prompts" ethos or history to the agent ever again. It’s "resident knowledge."
I like that. "Resident knowledge." It’s a good term for it. And it fits perfectly with that "agentic repository engineering" concept from Episode 1481—the one where we talked about Merkle trees and ASTs. This is the "semantic" version of that. We’re not just indexing the structure of the code; we’re indexing the meaning of the project’s history.
It’s the "why" behind the "what." If the code is the "what," the episode history is the "why." And giving an agent the "why" is how you get it to actually write code that fits the project’s goals, not just code that "works" but feels out of place.
I’m sold. I want a '.db' file for my whole life, Herman. Can I index my breakfast choices and expose them via MCP?
We might need a very specialized embedding model for your "sloth-like" breakfast habits, Corn. But theoretically, yes. If it can be turned into a "glob" of text, it can be vectorized.
Well, I’ll stick to indexing the podcast for now. I think Daniel’s approach is the right one for anyone dealing with "context bloat." It’s an elegant, local, and modular solution. It doesn't rely on big-tech "memory" features that might change or disappear. It’s just a file in a repo.
It’s "sovereign context." That’s what we’re talking about. Sovereign context for the independent developer.
I love that. "Sovereign Context." That should be the title of the next whitepaper you write, Herman. But wrap it up for us—what are the three things Daniel should keep in mind as he starts "mopping up" his context?
First, focus on the "chunking." Don't just dump whole transcripts in; break them into meaningful segments so the search is more precise. Second, use a robust, file-based format like LanceDB or SQLite-VSS—don't try to roll your own JSON-based search, it won't scale. And third, lean into MCP. It’s the standard that makes all this "plumbing" actually usable by the agents we have today.
Solid advice. And I suspect that as Daniel builds this, he’s going to find all sorts of "weird" connections between episodes that he’d forgotten about. That’s the hidden "aha!" of RAG—it’s a discovery tool for the human, too.
It really is. You start seeing patterns in your own thinking that you didn't notice at the time.
Well, I’ve noticed a pattern in our thinking today, which is that we both really like this concept. It’s practical, it’s technical, and it solves a real problem that’s only going to get worse as our "digital footprints" grow.
It’s the only way forward. We can't just keep making the context windows bigger; we have to make the retrieval smarter.
On that note, I think we’ve given Daniel plenty to chew on. I’m looking forward to seeing the "context-aware" version of the My Weird Prompts site. Maybe it’ll finally be able to explain why you’re a donkey and I’m a sloth, Herman.
I think even the most advanced vector database might struggle with that one, Corn. Some things are just "inherent properties" of the universe.
Fair enough. Well, this has been a great deep dive. I’m feeling much more "indexed" already.
And I’m ready to go optimize some embeddings. Thanks for the great prompt, Daniel. This is exactly the kind of "weird" technical intersection we love exploring.
Before we sign off, big thanks to our producer, Hilbert Flumingtop, for keeping the gears turning. And a huge thank you to Modal for providing the GPU credits that power our script generation and our experimentation. They really are the backbone of this whole "agentic" workflow we’ve got going on here.
They make the "impossible" feel like a weekend project.
They really do. If you’re enjoying "My Weird Prompts" and you want to make sure you never miss an episode, search for us on Telegram. We’ve got a channel there where we post every time a new "glob" of wisdom drops.
"Glob of wisdom." I like that. We should vectorize that.
We’ll put it in the "Herman quotes" file. Find us at myweirdprompts dot com for the full archive—all seventeen hundred plus episodes of it. Maybe you can build your own local vector store and tell us what we’ve been talking about for the last few years!
I’d love to see the results of that search.
Me too. This has been My Weird Prompts. Catch you in the next one.
See ya.