So, Herman, I was looking at some architectural diagrams yesterday, and it felt like I was looking at a map of a city that just kept adding annexes because it forgot to build enough housing in the first place. You have the main database, then a cache, then a search engine, and now everyone is telling us we need a dedicated vector database just to remember what our users said five minutes ago. It feels like we are building a sprawling suburb of microservices when we really just needed a better apartment complex.
It is a bit of a sprawl, isn't it? I am Herman Poppleberry, by the way, and I have actually been obsessed with this specific sprawl lately. It feels like the tech industry has this reflex where every time a new data type becomes popular, we assume we need a brand new piece of infrastructure to handle it. We did it with documents and No-S-Q-L, we did it with graphs, and now we are doing it with vectors. We are effectively paying an architectural tax every time we add a new line to our infrastructure-as-code files.
It is the shiny object syndrome, but on a multi-million dollar infrastructure scale. Today's prompt from Daniel is about exactly this tension. He is asking about the history of Postgres, specifically the pgvector extension, and whether developers actually need a dedicated vector database if they are already running on Postgres. It is a great question because there is a lot of confusion about where the line is between a feature and a whole new product category. Are we witnessing the birth of a new essential tool, or are we just seeing a feature that belongs in the database we already have?
Daniel is hitting on the biggest architectural debate in the AI space right now. To understand why we are even talking about this, we have to go back way further than the current AI hype. Most people think of PostgreSQL as this reliable, maybe even boring, relational database that has been around forever. But the reason we can even have this conversation is because of a decision made in nineteen eighty-six by Michael Stonebraker at the University of California, Berkeley. Stonebraker, who eventually won the Turing Award in twenty-fourteen, wasn't just building a better version of Ingres; he was building a platform.
Nineteen eighty-six. That is basically the stone age for modern software. I mean, the Macintosh was only two years old. What was the specific decision that makes a forty-year-old database relevant to a transformer model today?
Stonebraker started the POSTGRES project with a focus on extensibility. Most databases back then were rigid. If you wanted a new data type or a new way to index information, you had to wait for the core developers to bake it into the engine. Stonebraker wanted a system where users could define their own types, operators, and index methods without touching the core code. He wrote a paper titled "The Design of POSTGRES" where he argued that the database should be a shell that you can fill with custom logic. That philosophy is the only reason pgvector exists today. It is not a hack; it is the database working exactly how it was designed forty years ago. It is the ultimate "I told you so" from the eighties.
It is pretty wild to think that a design choice from the eighties is what is currently holding up the AI revolution. So, let us fast forward. We go from the eighties to twenty-twenty-one, and Andrew Kane releases pgvector. It was a niche extension at first, right? I don't think anyone realized it would become the center of the universe.
Very niche. Andrew Kane is a prolific open-source author—you might know him from things like Groupdate or Searchkick—and he basically added a vector data type and allowed for similarity searches. But then the large language model explosion happened in late twenty-twenty-two, and suddenly everyone needed a place to store embeddings. For those who might be a little fuzzy on the term, an embedding is just a long list of numbers—a vector—that represents the meaning of a piece of text or an image. If you want an AI to find relevant documents, you have to compare these lists of numbers to see which ones are mathematically close to each other.
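That "mathematically close" comparison is usually cosine similarity. A minimal pure-Python sketch, using tiny made-up four-dimensional vectors (real embedding models produce hundreds or thousands of dimensions, but the math is identical):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of the magnitudes.
    # Close to 1.0 means "pointing the same direction", i.e. similar meaning.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" -- the values are invented for illustration.
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.85, 0.15, 0.05, 0.25]
invoice = [0.0, 0.1, 0.95, 0.1]

# "cat" should score closer to "kitten" than to "invoice".
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```

A vector database's whole job is doing this comparison against millions of stored vectors without scanning every one of them.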
And that is where the "vector database" industry came from. Companies like Pinecone, Weaviate, and Qdrant stepped in and said, "Postgres is for rows and columns, but we are built from the ground up for these lists of numbers." They argued that a relational database could never handle the high-dimensional math required for AI. But then pgvector started catching up. It is now March of twenty-twenty-six, pgvector is at version zero point eight point two, and it is a completely different beast than it was even two years ago.
The jump in performance is staggering. In the early days, pgvector used something called an I-V-F-Flat index. It was fine for small datasets, but it was slow and the accuracy wasn't great for high-dimensional data. But the introduction of H-N-S-W changed everything. H-N-S-W stands for Hierarchical Navigable Small Worlds. It is a bit of a mouthful, but it is effectively the gold standard for vector indexing.
I have heard you talk about H-N-S-W before. It is essentially building a multi-layered map where the top layer has just a few "landmark" points, and as you go deeper, the map gets more detailed, right? Like zooming in on Google Maps from a continental view down to a street view.
That is a perfect way to visualize it. It allows the database to skip over millions of irrelevant vectors almost instantly. What is fascinating is the trade-off involved. Building an H-N-S-W index is roughly thirty-two times slower than building the old I-V-F-Flat index. If you have a million vectors, you are going to be sitting there for a while as it builds that map. But once it is built, the query speed is fifteen times faster. We are talking about going from a few queries per second to forty or fifty queries per second at nearly ninety-nine point eight percent recall.
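The core move inside that "map" is a greedy walk over a neighbor graph. This is a heavily simplified single-layer sketch, with a hand-built graph on 2-D points; real H-N-S-W builds the graph itself and stacks multiple layers of it, but the hop-to-the-closest-neighbor step is the same idea:

```python
import math

def greedy_search(graph, points, query, entry):
    # Greedy walk: from the entry point, hop to whichever neighbor is
    # closer to the query; stop when no neighbor improves the distance.
    current = entry
    while True:
        best = min(graph[current], key=lambda n: math.dist(points[n], query))
        if math.dist(points[best], query) >= math.dist(points[current], query):
            return current
        current = best

# Toy 2-D "vectors" with a hand-built neighbor graph (invented data).
points = {
    "a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.1),
    "d": (3.0, 0.0), "e": (2.9, 1.0),
}
graph = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d", "e"],
    "d": ["c", "e"], "e": ["c", "d"],
}

# Finds "e" after visiting only a handful of nodes, never the whole set.
print(greedy_search(graph, points, query=(3.0, 0.9), entry="a"))
```

The hierarchy in real H-N-S-W is what keeps this walk short even with millions of points: the top "landmark" layer gets you to the right neighborhood in a few hops before the detailed layers take over.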
So the "Postgres is too slow" argument is essentially dead for most use cases? If I can get ninety-nine percent accuracy at high speed, why would I leave?
For the vast majority of developers, yes, it is dead. If you are a developer and you have ten thousand, or even a million documents, pgvector is going to be incredibly fast. But there is a much bigger reason to stay on Postgres than just raw speed, and that is what I call the "Killer Feature" of the one-stack approach: hybrid search.
This is where it gets interesting for actual product development. Because in a real application, you are never just doing a vector search in a vacuum. You aren't just saying "find me things like this." You are saying "find me things like this that were written by user forty-two in the last thirty days and are tagged as public."
And that is exactly where dedicated vector databases start to struggle. If you use a separate database like Pinecone, you have to do one of two things. Either you pull all the vector results and then filter them in your application code, which is slow and expensive, or you have to sync all your metadata—like user I-Ds, timestamps, and permissions—into the vector database. Now you have two sources of truth. You have to worry about data consistency, two different backup strategies, and two different sets of security permissions.
It sounds like a nightmare for a small team. If a user deletes their account in Postgres, but the vector database doesn't get the memo for five minutes because the sync worker lagged, you might accidentally serve their private data in a search result to someone else. That is a massive security risk.
It is a huge risk. In Postgres, because it is A-C-I-D compliant—which stands for Atomicity, Consistency, Isolation, and Durability—that filter happens in a single transaction. The database engine looks at your S-Q-L filters and your vector similarity at the same time. It can use a standard B-tree index for the user I-D and the H-N-S-W index for the vector. It is efficient, it is safe, and it is all in one place. You don't have to worry about "eventual consistency" between your users table and your embeddings table. They are the same table!
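In pgvector that hybrid query really is one statement, something shaped like `SELECT … WHERE user_id = 42 ORDER BY embedding <=> $1 LIMIT 10`. As a self-contained stand-in you can actually run, here is the same shape simulated with SQLite and a registered Python distance function; the table, I-Ds, and JSON-encoded vectors are all invented for illustration, and SQLite has no vector index, so this only demonstrates the single-query filter-plus-rank pattern, not the performance:

```python
import json
import math
import sqlite3

def l2(a_json, b_json):
    # Distance between two JSON-encoded vectors, exposed to SQL below.
    return math.dist(json.loads(a_json), json.loads(b_json))

conn = sqlite3.connect(":memory:")
conn.create_function("l2", 2, l2)
conn.execute(
    "CREATE TABLE docs (id INTEGER, user_id INTEGER, is_public INTEGER, embedding TEXT)"
)
conn.executemany("INSERT INTO docs VALUES (?, ?, ?, ?)", [
    (1, 42, 1, json.dumps([0.1, 0.2])),
    (2, 42, 0, json.dumps([0.1, 0.2])),   # private: filtered out
    (3, 7,  1, json.dumps([0.1, 0.2])),   # wrong user: filtered out
    (4, 42, 1, json.dumps([0.9, 0.9])),   # right user, but far away
])

query = json.dumps([0.1, 0.2])
# Metadata filter and similarity ranking in ONE statement -- in Postgres this
# is `WHERE user_id = … ORDER BY embedding <=> $1`, with a B-tree index on
# user_id and an H-N-S-W index on embedding working together.
result = conn.execute(
    "SELECT id FROM docs WHERE user_id = ? AND is_public = 1 "
    "ORDER BY l2(embedding, ?) LIMIT 2",
    (42, query),
).fetchall()
print([r[0] for r in result])
```

The private row and the other user's row never leave the database, so there is no window where stale or unauthorized data can leak through a sync lag.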
Okay, so if pgvector is so good, why does anyone use a dedicated vector database? There has to be a ceiling where Postgres starts to creak under the weight. We can't just say it's perfect for everything, right?
There is definitely a ceiling, though it is much higher than it used to be. For a long time, the "soft limit" for pgvector was around ten to one hundred million vectors. Once you get into that range, the memory requirements for the H-N-S-W index start to become a problem. H-N-S-W needs to keep a lot of that "map" we talked about in R-A-M to stay fast. If your index is larger than your available memory, performance falls off a cliff. You start swapping to disk, and suddenly your fifty-millisecond query takes five seconds.
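You can sanity-check that ceiling with back-of-envelope arithmetic. This sketch assumes four-byte floats, an H-N-S-W connectivity parameter of m equals sixteen, and roughly four bytes per graph link; those constants are rough assumptions for illustration, and real per-index overhead varies:

```python
def hnsw_memory_gb(n_vectors, dims, m=16, bytes_per_float=4, bytes_per_link=4):
    # Raw vector storage: one float per dimension per vector.
    vector_bytes = n_vectors * dims * bytes_per_float
    # Graph storage: roughly 2 * m neighbor links per vector on the base
    # layer (a crude approximation; real overhead is somewhat higher).
    graph_bytes = n_vectors * 2 * m * bytes_per_link
    return (vector_bytes + graph_bytes) / 1024**3

# 10 million 1536-dimensional embeddings: already tens of gigabytes,
# which is exactly where "keep the whole map in R-A-M" starts to hurt.
print(round(hnsw_memory_gb(10_000_000, 1536), 1))
```

Run the same function at a hundred million vectors and you are into hundreds of gigabytes, which is why the conversation shifts to disk-friendly indexes at that scale.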
But even that ceiling is moving, isn't it? I saw some benchmarks from Timescale recently about their pgvectorscale extension. They seem to be picking a fight with the dedicated players.
Timescale is doing some incredible work there. They released pgvectorscale which uses an indexing method called Streaming-Disk-A-N-N. Instead of trying to cram everything into R-A-M, it optimizes how the index is read from the disk. Their benchmarks showed they could handle fifty million vectors and get over four hundred and seventy queries per second at ninety-nine percent recall. That is actually eleven point four times better than Qdrant, which is a dedicated vector database written in Rust.
Wait, let me get that straight. A Postgres extension is actually outperforming a dedicated, specialized database at a scale of fifty million vectors? That feels like it should be impossible.
In that specific benchmark, yes. It really challenges the idea that "specialized" always means "faster." It turns out that the Postgres community has spent decades optimizing how data moves from disk to memory, and that expertise is now being applied to vectors. When you combine that with the fact that A-W-S Aurora recently published results showing vector embedding loads up to sixty-seven times faster with pgvector zero point seven point zero compression, you see that the performance gap isn't just closing—it is disappearing.
It feels like we are seeing a repeat of the No-S-Q-L era. Ten years ago, everyone said you needed MongoDB or Cassandra because Postgres couldn't handle J-S-O-N or high-scale writes. Then Postgres added the J-S-O-N-B type, optimized its indexing, and suddenly most people realized they didn't need a separate document store after all.
It is the exact same pattern. And now we are seeing the same thing with memory efficiency. One of the biggest complaints about pgvector was the storage size of embeddings. If you are using something like OpenAI's text-embedding-three-small model, each vector has fifteen hundred and thirty-six dimensions. That adds up fast. But new developments like VectorChord—which used to be called pgvecto dot r-s—are introducing quantization.
Quantization is basically just compression for vectors, right? Like turning a high-resolution photo into a J-P-E-G?
Essentially. You can take those high-precision floating-point numbers and compress them down to four-bit or eight-bit integers. VectorChord is leading the way on this in the Postgres ecosystem with their RaBit-Q-Four and RaBit-Q-Eight types. You can reduce your storage requirements by eight or sixteen times, and surprisingly, the "recall"—or the accuracy of the search—only drops by a tiny fraction. For most R-A-G pipelines, which is Retrieval-Augmented Generation, you would never even notice the difference in the AI's response, but your cloud bill would be significantly lower because you aren't paying for massive amounts of R-A-M.
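To see why the accuracy barely moves, here is a minimal sketch of plain symmetric eight-bit scalar quantization. This is a simpler scheme than the RaBitQ-style quantization VectorChord uses, but the storage-versus-recall trade-off it demonstrates is the same idea, and the sample vector is invented:

```python
import math

def quantize_int8(vec):
    # Symmetric 8-bit quantization: map [-max_abs, +max_abs] onto [-127, 127].
    scale = max(abs(x) for x in vec) / 127.0
    return [round(x / scale) for x in vec], scale

def dequantize(qvec, scale):
    return [q * scale for q in qvec]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

original = [0.12, -0.53, 0.98, 0.07, -0.31, 0.44]
qvec, scale = quantize_int8(original)
restored = dequantize(qvec, scale)

# Storage drops 4x (float32 to int8), yet the vector's direction --
# which is all cosine similarity looks at -- is almost unchanged.
print(cosine(original, restored))
```

The rounding error per component is bounded by half the scale step, so the reconstructed vector points almost exactly where the original did, which is why ranked search results come back nearly identical.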
So if I am a developer at a startup, and I am building a R-A-G pipeline for our internal documentation or a customer support bot, is there any reason at all for me to look at something other than Postgres?
Honestly, probably not. If you are already on a managed provider like Supabase, Neon, or A-W-S R-D-S, you are literally one command away from having a production-grade vector database. You just run "CREATE EXTENSION vector" and you are done. No new infrastructure to provision, no new A-P-I keys to manage, and no new bill to pay. It is the ultimate "low-friction" entry point.
I think the "when to leave" question is the one people struggle with most. If you are at a billion vectors, like if you are Netflix and you are trying to do real-time recommendations for every single user on the planet simultaneously, that is a different story.
At that scale, the specialized hardware optimizations of something like Pinecone or the distributed architecture of Weaviate might start to make sense. If your core product is the search itself—if you are building the next Google or a massive multi-modal search engine—you probably want a tool where every single line of code is dedicated to that one task. But for ninety-nine percent of S-a-a-S applications where vector search is a feature, not the entire product, Postgres is the rational choice.
It is about the "operational tax." Every new piece of infrastructure you add is a tax on your team's time. You have to monitor it, you have to patch it, you have to understand its weird failure modes. If you can pay that tax once to the Postgres gods and get everything you need, why would you open a second account?
And the Postgres gods are very generous these days. Even the managed providers are making it easier. Neon and Supabase have basically made pgvector a first-class citizen. They have optimized their underlying storage to handle these large vector files, and they provide the tooling to help you choose the right distance metrics.
Let's talk about those distance metrics for a second, because that is another place where developers get tripped up. pgvector supports L-Two, Inner Product, and Cosine distance. Does it actually matter which one you pick?
It matters immensely, but the choice is usually dictated by the model you used to create the embeddings. If you are using OpenAI's models, they recommend Cosine distance. If you use the wrong metric, your "similarity" results will be mathematically accurate but semantically useless. It is like trying to measure the distance between two cities using a thermometer instead of a ruler. You get a number, but it doesn't mean what you think it means. Most developers should just check the documentation for their embedding model and match the operator in their S-Q-L query.
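For reference, pgvector exposes these as operators: `<->` for L-Two distance, `<#>` for negative inner product, and `<=>` for cosine distance. A small pure-Python sketch, with invented vectors chosen so the metrics actually disagree, shows why picking the wrong one can reorder your results:

```python
import math

def l2_distance(a, b):
    # pgvector's `<->` operator.
    return math.dist(a, b)

def inner_product(a, b):
    # pgvector's `<#>` returns the NEGATIVE inner product so that
    # ascending sort still puts the best match first.
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # pgvector's `<=>` operator: 1 minus cosine similarity.
    dot = inner_product(a, b)
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1 - dot / norms

query = [1.0, 0.0]
u = [0.9, 0.1]   # points almost the same direction as the query
v = [5.0, 3.0]   # points elsewhere, but is much longer

# Cosine and L2 both prefer u; raw inner product rewards v's sheer length.
print(cosine_distance(query, u) < cosine_distance(query, v))
print(inner_product(query, v) > inner_product(query, u))
```

With normalized embeddings, like OpenAI's, cosine distance and inner product rank identically, but that is a property of the model, not of the database, which is exactly why you match the operator to the model's documentation.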
So the workflow for a developer today is: generate embeddings using something like an OpenAI or Anthropic model, store them in a "vector" column in Postgres, index them with H-N-S-W, and then query them using the specific distance operator that matches their model.
That is the whole roadmap. And the beauty is that you can do all of this using the libraries you already know. If you are using an O-R-M like Prisma or Drizzle, or just standard S-Q-L, you are just writing queries. You aren't learning a new GraphQL-based A-P-I or some proprietary syntax. You can join your embeddings table with your users table and your comments table in one go.
I love the irony that the "hottest" new technology in AI is basically just a very clever plugin for a database whose design just turned forty. It speaks to the robustness of that original design. Stonebraker didn't know what a transformer model was in nineteen eighty-six, but he knew that he couldn't predict the future, so he built a system that could adapt to it.
It is a profound lesson in software engineering. If you build for extensibility at the core, you don't have to rebuild the world every time the world changes. And we are seeing that play out now. Even things like "long-term memory" for AI agents, which we talked about way back in episode eight hundred and forty-six, are increasingly being built on top of Postgres. It is becoming the "memory" of the AI stack because it is the one place where the data is already sitting.
That is a great point. Why move the data to the memory when you can just teach the data's home how to remember? It is much more efficient. So, to wrap this up for Daniel and anyone else wondering about their stack—what is the heuristic? What is the "if-then" statement for choosing a database in twenty-twenty-six?
The heuristic is simple: Start with Postgres. If you are already using it, there is almost no reason to leave until you hit a very specific wall. If you have fewer than a hundred million vectors, stay on Postgres. If you need to filter your search by user, date, or category—which is almost everyone—stay on Postgres. If you want to keep your architecture simple and your data consistent, stay on Postgres.
And if you do hit that wall? If you are that one percent?
Then you look at extensions like pgvectorscale or VectorChord first. They push that wall so much further back—potentially up to five hundred million vectors—that most companies will never actually hit the real limit. Only when you are at the scale where you have a dedicated "Search Infrastructure Team" do you really need to start looking at the specialized standalone players like Pinecone or Qdrant.
It is a rare case where the "lazy" option—just using what you already have—is actually the technically superior option for most people. I find that very satisfying. It cuts through the hype and gets back to building things that actually work.
It really does. And it is exciting to see the community rally around it. The fact that A-W-S R-D-S and Google Cloud S-Q-L are shipping these updates so quickly shows that the demand is massive. We are moving away from the "fragmented stack" and back toward a more unified, elegant architecture. It is the "One Database to Rule Them All" philosophy we talked about in episode eleven twenty-three.
Well, I think we have thoroughly defended the honor of the elephant today. Postgres continues its reign as the most versatile tool in the shed.
It really is the Swiss Army knife that somehow also manages to be a high-powered chainsaw when you need it to be. I don't see that changing anytime soon, especially as the vector tooling continues to mature. The ecosystem is just too strong.
If you are looking to get started with this, I highly recommend checking out Andrew Kane's GitHub for pgvector. It is a masterclass in how to write a clean, effective extension. And if you are interested in the philosophy of architectural minimalism, definitely go back and listen to our previous deep dives on Postgres.
One more thing for the developers out there: don't sleep on quantization. If you are worried about your storage costs or memory usage, look into four-bit quantization via VectorChord. It is the closest thing to a "free lunch" we have in the vector space right now. You get almost all the accuracy for a fraction of the cost.
A free lunch in tech? I will believe it when I see the cloud bill. But for now, it sounds like Postgres is the safest bet for anyone trying to make their data accessible to AI without losing their mind over infrastructure management.
Safe, fast, and extensible. Just like Stonebraker intended forty years ago.
Thanks as always to our producer, Hilbert Flumingtop, for keeping the gears turning behind the scenes. And a big thanks to Modal for providing the G-P-U credits that power this show. If you are doing heavy lifting with AI models or data processing, Modal is the way to go.
This has been My Weird Prompts. If you are finding these deep dives helpful, we would love for you to leave us a review on your podcast app. It really does help other people find the show and helps us keep the lights on.
You can find us at myweirdprompts dot com for our full archive and all the ways to subscribe. We will be back soon with another prompt from Daniel. Until then, stay curious.
Goodbye everyone.