#2458: Can Graph Databases Go Mainstream?

Graph databases are powerful but niche. Will they ever power mainstream CRMs and ERPs?

Episode Details
Episode ID
MWP-2616
Published
Duration
23:24
Pipeline
V5
TTS Engine
chatterbox-regular
Script Writing Agent
deepseek-v4-pro

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The Graph Database Question: Why Mainstream Adoption Remains Elusive

Graph databases excel at handling relationships — the very fabric of how businesses actually operate. A CRM isn't just a list of contacts; it's a web of introductions, deal connections, and account histories. An ERP tracks supply chains, dependencies, and workflows. These are fundamentally graph structures. Yet despite decades of awareness that relational databases are a poor fit for relationship-heavy data, graph databases remain a niche technology.

The Standard That Isn't Quite Enough

In April 2024, the ISO officially approved GQL (Graph Query Language) as an international standard — the first cross-vendor standard for querying graph databases. Before GQL, developers faced a fragmented landscape: Cypher, Gremlin, SPARQL, and others. The promise was that standardization would unlock mainstream adoption.

But two years in, the ecosystem is still catching up. A late 2025 survey by the Graph Data Council found members still requesting a GQL testing and compatibility toolkit. Proposed task forces for natural-language-to-GQL conversion remain proposals. And Cypher — Neo4j's query language, now made fully GQL-compliant — remains the de facto standard in practice. Standardization was supposed to be the unlock, but the tooling hasn't arrived yet.

The Hybrid Reality

The most revealing signal about graph's future comes from companies like PuppyGraph, which offers "zero-ETL graph querying" — allowing organizations to query their existing relational databases as virtual graphs without migrating any data. Half of the top twenty cybersecurity companies use it, along with AMD and Coinbase. These organizations want graph semantics, but they absolutely will not migrate their storage layer.

This pattern is winning in production. Large enterprise customers are deploying AI agents that decompose queries into three sub-queries: one to a SQL database for structured data, one to a graph database for relationship traversal, and one to a vector index for semantic similarity. The agent aggregates results using patterns like GraphRAG. This is not graph-native — it's graph as one specialist on a team of specialists, with the AI agent as coordinator.
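The decomposition pattern above can be sketched in a few lines. This is an illustrative stand-in, not Neo4j's or any vendor's actual implementation: the three backend functions are hypothetical placeholders for a SQL driver, a Cypher/GQL endpoint, and a vector index, and the "aggregation" here is a simple merge rather than a full GraphRAG pipeline.

```python
# Sketch of the three-way query decomposition described above.
# Every backend function is a hypothetical stub; a real deployment would
# call out to a SQL database, a graph database, and a vector store.

def sql_lookup(customer_id):
    # Stub for a SQL query returning structured fields.
    return {"customer_id": customer_id, "tier": "enterprise", "arr": 120_000}

def graph_traversal(customer_id):
    # Stub for a Cypher/GQL traversal returning connected entities.
    return ["acme-corp", "globex"]

def vector_search(question, k=3):
    # Stub for a semantic-similarity lookup over unstructured notes.
    return [f"note-{i}" for i in range(k)]

def answer(question, customer_id):
    """Decompose one question into three sub-queries and merge the
    results -- a toy version of the GraphRAG-style aggregation step."""
    return {
        "structured": sql_lookup(customer_id),
        "relationships": graph_traversal(customer_id),
        "context": vector_search(question),
    }

result = answer("Which accounts are connected to this customer?", "cust-42")
```

In practice the interesting work is in the agent that decides how to split the question and how to rank the merged results; the routing skeleton itself stays this simple.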

The Chicken-and-Egg Problem

The graph advantage is real for certain questions. Finding every second-degree connection involved in deals over $50,000 in the last 18 months who shares a board membership with an existing customer is a nightmare in SQL but a concise traversal query in Cypher. But most CRM usage remains basic CRUD operations. The tools don't support graph queries, so people don't ask graph questions, so there's no demand for graph tools.
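To make the traversal concrete, here is a toy in-memory version of that second-degree question in plain Python. All names and figures are made up for illustration; a production system would express this as a single Cypher or GQL MATCH over a graph store rather than nested loops.

```python
# Toy graph showing why the "second-degree connection" question is a
# traversal, not a JOIN. Data is invented for illustration only.

knows = {
    "you": ["alice", "bob"],
    "alice": ["carol"],
    "bob": ["dave"],
}
deals = {"carol": 75_000, "dave": 30_000}          # largest recent deal per person
boards = {"carol": {"acme"}, "dave": {"initech"}}  # board memberships
customer_boards = {"acme"}                          # boards shared with existing customers

def qualified_second_degree(start, min_deal=50_000):
    """Second-degree contacts with a deal over min_deal who share a
    board membership with an existing customer."""
    hits = []
    for friend in knows.get(start, []):
        for fof in knows.get(friend, []):
            if (deals.get(fof, 0) > min_deal
                    and boards.get(fof, set()) & customer_boards):
                hits.append(fof)
    return hits

print(qualified_second_degree("you"))  # carol qualifies; dave fails both filters
```

The two nested loops are the part SQL handles badly at depth: each extra hop is another self-join or recursive CTE, while a graph query just extends the match pattern by one edge.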

The Graph Data Council's survey found that the graph computing domain "lacks a killer application." GraphRAG is the closest thing, but it's a pattern, not a product. Compare that to relational databases in the 1970s and 1980s, which had payroll, accounting, and inventory management from day one.

What Could Change the Calculus?

The emergence of graph foundation models — trained on massive graph structures to learn generalizable patterns of relationships — could address the retraining problem that has kept graph databases specialist tools. Frameworks like Amazon's GraphStorm and Snapchat's GiGL are already deployed at billion-scale. But we're in the very early days.

For now, the industry consensus is clear: start with vectors, add graphs for reasoning-heavy queries. Graph-native mainstream applications are not on the visible horizon.


#2458: Can Graph Databases Go Mainstream?

Corn
Daniel sent us this one — he's been thinking about the hybrid approach a lot of us are using now, SQL databases with embedding support bolted on. And his question is basically: wouldn't it be cleaner to just use graph databases as the native structure? If we take something like a CRM or ERP, could it be graph-native from the ground up? Because right now, graph mostly lives in these specialized corners — fraud detection, intelligence analysis, drug discovery, things with millions or billions of edges. But what about the mainstream? And the bigger question underneath that: how far away are we from graph having an easy wrapper, the kind of tooling where you don't need to retrain your entire team to think in nodes and edges?
Herman
Oh, this is the right question at the right time. And I want to flag something before we dive in — DeepSeek V four Pro is writing our script today, so if the transitions feel unusually crisp, that's why.
Corn
I was going to say, you sound suspiciously well-organized.
Herman
I'll take that as a compliment to our silicon colleague. But here's what makes Daniel's timing interesting. April twenty twenty-four, the ISO officially approved GQL — Graph Query Language — as an international standard. That's the first real, cross-vendor standard for querying graph databases. Neo4j's Andreas Kollegger put it plainly: before that, you had Cypher, you had Gremlin, you had SPARQL, all these fragmented approaches. Now there's one spec with broad industry backing.
Corn
A standard is nice, but standards take time to matter. SQL was standardized in nineteen eighty-six and it took years before anyone could move between databases without rewriting half their queries. Is GQL actually changing anything on the ground, or is this just a press release milestone?
Herman
It's early. The Graph Data Council did their member survey in late twenty twenty-five and the results were blunt — members are still asking for a GQL testing and compatibility toolkit. They want conformance suites. There's a proposed task force for natural-language-to-GQL conversion. So the standard exists, but the ecosystem around it is still embryonic. And Cypher remains the dominant query language in practice — Neo4j made Cypher fully GQL-compliant, which tells you where the center of gravity actually is.
Corn
The standard is basically Cypher with a stamp on it.
Herman
Not entirely, but directionally yes. The point is, standardization was supposed to be the unlock for mainstream adoption, and we're two years in with the tooling still catching up. That's one data point for how far away we are.
Corn
Let me push on the other side of Daniel's question though. He's asking whether a CRM or ERP could be graph-native. And I think the instinct is right — these are relationship-heavy domains. A CRM isn't really about contacts in a table. It's about who knows whom, who introduced whom, which deals are connected to which conversations, which support tickets relate to which account history. That's all edges.
Herman
And this is where the hybrid approach that Daniel described — traditional SQL with embeddings — starts showing its seams. You're storing relationship data in a structure that wasn't designed for relationships. You end up with these massive JOIN operations, recursive queries, and the embeddings are floating in a separate vector index that has no structural awareness of your actual business logic.
Corn
Here's my question — and I think this is the tension Daniel is really pointing at — if the relational model is so wrong for this, why hasn't anyone built a graph-native CRM that took over the market? The problem has been obvious for at least a decade.
Herman
Because the market voted for something else entirely, and I think that's the most revealing signal here. Look at what PuppyGraph is doing. They offer what they call zero-ETL graph querying — you keep your CRM data in your existing relational database, and PuppyGraph queries it as a virtual graph without duplicating anything. They've got half of the top twenty cybersecurity companies using this, plus AMD, Coinbase. These are organizations that want graph semantics, but they are absolutely not willing to migrate their storage layer.
Corn
They want the graph query without the graph database. That's almost perverse.
Herman
It's pragmatic. The cost of migration is enormous, and the benefit of native graph storage, for most of these applications, is marginal. Think about what a migration actually entails. You're not just moving data from tables to nodes — you're rethinking your entire data model, rewriting every integration, retraining your ops team on backup and recovery procedures for an entirely different storage engine. PuppyGraph's own blog from September twenty twenty-five explicitly positions this as getting the advantages of a knowledge graph without duplicating data into a separate database. They're not arguing that graph-native is better. They're arguing that graph-native is unnecessary if you can overlay the query layer.
Corn
Which brings us to the uncomfortable question. Daniel's asking how far away we are from graph-native mainstream applications. And the answer might be: we're not moving toward that at all. We're moving toward multi-paradigm orchestration where graph is one query pattern among several, and the storage layer stays relational because it's good enough and nobody wants to retrain.
Herman
I think that's half right. Let me give you the production pattern that's actually emerging. Kollegger from Neo4j described what their large enterprise customers are doing with AI agents. An agent receives a query, and it decomposes that query into three sub-queries. One goes to a SQL database for structured relational data. One goes to a graph database using Cypher for relationship traversal. One goes to a vector index for semantic similarity on unstructured data. The agent then aggregates the results using something like GraphRAG.
Corn
The agent is the orchestrator, not the database.
Herman
And this is the pattern that's winning in production right now. Cedars-Sinai is using this hybrid approach for Alzheimer's research. Precina Health for Type two diabetes care. It's not graph-native. It's graph as one specialist on a team of specialists, with the AI agent as the coordinator.
Corn
Which is elegant, but it also means you're running three databases instead of one. The operational complexity doesn't go away — it just moves to the orchestration layer.
Herman
That's where the retraining burden Daniel mentioned really bites. Even with GQL being more SQL-like, developers still need to think in nodes and edges. Neo4j's Kollegger acknowledged this directly — historically, users needed some understanding of graph structures to write queries effectively, and that meant relying on developers to write queries and interpret results. It made graph databases feel like a specialist domain. The promise now is that large language models can translate natural language into GQL, democratizing access. But that's unproven at scale.
Corn
Let me play this back. The industry consensus right now is: start with vectors for most needs, add graphs for reasoning-heavy queries. That's straight from the Memgraph blog, September twenty twenty-five. Not start with graphs. Add graphs when vectors aren't enough.
Herman
The numbers bear this out. The global graph database market was about one point one billion dollars in twenty twenty-four. The knowledge graph market, about one point zero six billion. These are growing fast — projected to hit seventeen or eighteen billion by the early twenty-thirties, thirty-six percent compound annual growth rate. But the overall database market is well over a hundred billion. Graph is still a single-digit percentage.
Corn
Small base, fast growth, but small base. So when Daniel asks how far away we are, the honest answer is: graph-native for mainstream CRM and ERP is not on the visible horizon. What's on the horizon is graph as a query layer over relational storage, and graph as a specialized component in multi-paradigm architectures.
Herman
I don't want to be too dismissive, because there's something happening that could change the calculus. The Year of the Graph newsletter in May twenty twenty-five declared that the era of Graph Foundation Models has begun. You've got AnyGraph, Amazon's GraphStorm framework — which has been deployed for over a dozen billion-scale industry applications — and Snapchat using their own GiGL framework for large-scale graph neural networks in production.
Corn
Graph foundation models. Explain what that actually means.
Herman
Instead of building a graph for each application, you train a foundation model on massive graph structures, and it learns generalizable patterns of relationships. The same way a language model learns general language patterns from text, a graph foundation model learns structural patterns from graphs. The promise is that you could apply these to CRM data without building a custom graph schema from scratch.
Corn
That sounds like it addresses the retraining problem. If the model understands graph structures, the developer doesn't have to.
Herman
But we're in the very early days. And I keep coming back to the Graph Data Council survey finding that the graph computing domain, quote, lacks a killer application. GraphRAG is the closest thing, but it's a pattern, not a product. Compare that to relational databases in the seventies and eighties — they had payroll, accounting, inventory management from day one. Those were the killer apps that drove adoption. Graph hasn't found its equivalent yet.
Corn
What would a graph-native CRM even look like, functionally? What could it do that Salesforce on PostgreSQL with embeddings can't?
Herman
That is exactly the right question. Let me think about this concretely. In a traditional CRM, if you want to know who introduced you to a prospect, you're probably looking at a custom field or a notes field with someone's name typed in. In a graph-native CRM, that introduction is a first-class edge. It's queryable. You can traverse the introduction graph to find the strongest connectors in your network. You can weight relationships by frequency of interaction, by deal size, by time decay. And all of that is native to the data model, not bolted on with application logic.
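Herman's weighting idea can be sketched as a small scoring function. The formula below is purely illustrative — an assumed combination of frequency, deal value, and an exponential time decay with a guessed half-life, not something from any shipping CRM.

```python
import math

# Hypothetical weight for a first-class "introduction" edge, combining
# interaction frequency, total deal value, and recency via time decay.

HALF_LIFE_DAYS = 180  # assumed decay half-life, chosen for illustration

def edge_weight(interactions, total_deal_value, days_since_last_contact):
    """Illustrative edge score: more interactions and bigger deals raise
    the weight; long silence halves it every HALF_LIFE_DAYS."""
    recency = 0.5 ** (days_since_last_contact / HALF_LIFE_DAYS)
    return (1 + interactions) * math.log1p(total_deal_value) * recency

# A recent relationship outweighs an otherwise identical stale one.
w_recent = edge_weight(interactions=12, total_deal_value=50_000,
                       days_since_last_contact=30)
w_stale = edge_weight(interactions=12, total_deal_value=50_000,
                      days_since_last_contact=720)
```

The point of making this an edge property rather than application logic is that traversal queries can then rank paths by weight natively, instead of re-deriving scores in every report.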
Corn
The graph advantage is real, but it's in the kinds of questions you can ask, not in the basic CRUD operations. And most CRM usage is still basic CRUD.
Herman
That's the crux of it. The graph advantage shows up when you're asking questions like show me every second-degree connection who's been involved in a deal over fifty thousand dollars in the last eighteen months and who shares a board membership with someone in our existing customer base. That's a nightmare in SQL — it's multiple recursive CTEs, it's ugly, it's slow. In Cypher or GQL, it's a concise traversal query.
Corn
How often is a sales team actually asking that question?
Herman
Rarely, today, because the tools don't support it. This is the chicken-and-egg problem. The tools don't support graph queries, so people don't ask graph questions, so there's no demand for graph tools. Break that cycle and you might discover latent demand.
Corn
Or you might discover that most sales teams just want to log calls and track pipeline stages.
Herman
I'm not sure about this part, but I suspect the real unlock for graph-native CRMs would be in the analytics layer, not the operational layer. You keep your operational CRM on whatever storage is cheapest and most reliable, and you mirror the data into a graph for the kinds of questions that actually benefit from traversal. That's essentially the PuppyGraph model, just with a different implementation.
Corn
Which circles back to hybrid. Everything circles back to hybrid. Daniel's asking whether graph-native could be cleaner, and the answer seems to be: yes, conceptually, but the industry is voting for hybrid with both feet, and there are structural reasons for that.
Herman
Let me add one more structural reason that doesn't get enough attention. The retraining burden isn't just about learning a new query language. It's about rethinking data modeling from the ground up. In a relational database, you think in tables, rows, foreign keys, normalization. In a graph database, you think in nodes, edges, properties, traversal patterns. Those are fundamentally different mental models. The Graph Data Council survey explicitly flagged lack of skilled labor as a major barrier. The Research and Markets report on the knowledge graph market called out lack of expertise and awareness, plus standardization and interoperability, as the major challenges.
Corn
You can't fix a mental model problem with better tooling. You can paper over it with natural language interfaces, but eventually someone has to design the schema.
Herman
Unless the graph foundation models get good enough that schema design becomes automated. That's the long bet. But we're years away from that being production-ready for arbitrary business domains.
Corn
Let me try to synthesize where we are, because I think Daniel's question deserves a direct answer. How far away is graph from being a native backend for mainstream applications with an easy wrapper? My read: we are at least five to seven years from the point where a mid-market company could reasonably choose a graph-native CRM over a relational one, and that's assuming GQL standardization accelerates, the tooling ecosystem matures, and the skills gap narrows. All three of those are uncertain.
Herman
I'd put it at five to ten, and I'd add a caveat: the easy wrapper might never come in the form Daniel is imagining. What's more likely is that the wrapper is an AI agent that speaks natural language and translates to GQL, Cypher, or SQL as needed. The ease doesn't come from simplifying the database — it comes from hiding the database behind an intelligent interface.
Corn
Which is already happening. The Kollegger description of the three-way query decomposition, that's not a future vision, that's what Neo4j's enterprise customers are doing now.
Herman
So the answer to would a graph-native CRM be cleaner is yes, for certain kinds of queries. The answer to will we see one soon is probably not, because hybrid approaches are delivering most of the value at a fraction of the migration cost. And the answer to how far away is the easy wrapper is: the wrapper is already here, it's just an AI agent, not a database feature.
Corn
There's one more angle I want to hit before we move to takeaways. Daniel mentioned that graph has been limited to large-scale uses — KYC, intelligence, drug discovery, things with millions or billions of edges. And I think there's an implicit assumption that graph only makes sense at that scale. But is that actually true? Could a small business with ten thousand contacts and fifty thousand interactions benefit from graph?
Herman
The scale argument is mostly about compute, not about value. A small graph can absolutely deliver insights that a relational database would struggle with. The issue is that the fixed cost of setting up and maintaining a graph database — the operational overhead, the learning curve — doesn't scale down well. For a small business, the insight might be real, but it's not worth hiring a graph specialist or learning an entirely new paradigm.
Corn
It's not that graph only works at scale. It's that the cost-benefit only pencils out at scale.
Herman
And that's another reason the AI wrapper matters. If the wrapper gets good enough that the small business owner never needs to know there's a graph database underneath, then the cost-benefit changes. But we're not there yet.
Corn
One thing I want to flag from the research — Neo4j has integrated native vector search into its core database. So the vector-versus-graph tension that Daniel is describing, where you have SQL with embeddings on one side and pure graph on the other, that's already blurring. Neo4j can do vector similarity search natively, capturing implicit relationships based on similar data characteristics rather than exact matches.
Herman
Kollegger's point was that this lets you perform similarity searches while still preserving the graph structure. So you're not choosing between vectors and graphs. You're getting both in one system. That's a significant architectural advantage over the SQL-plus-embeddings approach, where the vector index is essentially a separate system that happens to live in the same database process.
Corn
Again — if you can get both from Neo4j, why aren't CRMs migrating? And the answer is: because their existing PostgreSQL instance already works, and the vector extension was a five-minute install, and nobody got fired for choosing PostgreSQL.
Herman
The nobody got fired argument is more powerful than any technical comparison. And it's going to keep winning until graph databases have their Salesforce moment — a killer application that makes the advantage undeniable.
Corn
Which brings us to practical takeaways. Daniel, and anyone listening who's wrestling with this same question, what do you actually do?
Herman
First, don't rip out your relational database. The hybrid approach that you're already using — SQL with embedding support — is the industry consensus for good reason. It works, it's well-understood, and the operational risks are low.
Corn
Second, if you're curious about graph, start with a read-only overlay. PuppyGraph's zero-ETL approach, or Neo4j's connectors to relational sources — query your existing data as a graph without migrating anything. See if the graph queries actually surface insights that your current setup misses.
Herman
Third, pay attention to GQL. It's an ISO standard now, and while the ecosystem is immature, it's the direction the industry is moving. If you're going to invest in graph skills, invest in GQL, not a proprietary query language that might not survive standardization.
Corn
Fourth, watch the AI orchestration layer. The most interesting developments aren't in the databases themselves — they're in the agents that can query across SQL, graph, and vector stores simultaneously. That's where the easy wrapper is being built.
Herman
Fifth, if you're building something new and your data is inherently relationship-heavy — if the core value proposition is about connections, networks, introductions, influence — then yes, consider graph-native from day one. The migration cost is zero when you're starting from scratch, and you'll avoid the pain of retrofitting relationship logic onto a relational schema later.
Corn
The bottom line: graph-native mainstream applications are coming, but slowly, and the path runs through hybrid architectures and AI wrappers, not through a sudden replacement of relational databases. Daniel's instinct that graph would be cleaner is correct in principle. In practice, the industry is choosing pragmatism over purity, and that's probably the right call for now.
Herman
There's actually a fun historical parallel here that I think illuminates the whole debate. Back in the nineteen nineties, object-oriented databases were going to replace relational databases. The argument was exactly the same — the relational model doesn't match how developers think about data, objects are more natural, impedance mismatch is killing productivity. And what actually happened? Object-relational mapping layers won. Hibernate, Entity Framework, ActiveRecord. We didn't replace the relational database. We put a translation layer on top of it. Graph databases today are in the exact same position object databases were in thirty years ago. The question isn't whether graph concepts will win — it's whether they'll win at the storage layer or at the translation layer. And history suggests the translation layer usually wins.
Corn
That's a genuinely useful analogy. And it makes me think the PuppyGraph approach — graph queries over relational storage — might not be a transitional phase at all. It might be the end state. Just like ORMs weren't a stepping stone to object databases, they were the destination.
Herman
The abstraction layer becomes the product. And if that's true, then the "easy wrapper" Daniel is asking about won't be a graph database with good tooling. It'll be a query layer that speaks graph on top of whatever storage you already have, with an AI agent handling the translation.
Corn
Now: Hilbert's daily fun fact.
Herman
The average cumulus cloud weighs about one point one million pounds. Roughly the same as a hundred elephants, floating over your head.
Corn
I'm never looking at a sunny day the same way again. For anyone trying to navigate this landscape, the concrete thing to do this week is simple. Pick one business question you currently can't answer easily with your SQL setup — something involving multi-hop relationships, influence paths, or connection strength. Try answering it with a graph query, even if you're just using a read-only overlay on your existing data. If the answer is valuable and the query is clean, you've got your business case. If the answer is meh and the query is still a headache, you know graph isn't your bottleneck.
Herman
That's the thing about database debates — they sound abstract until you tie them to a specific question you actually need answered. Daniel's asking the right question. The answer just happens to be more complicated than yes or no.
Corn
This episode was produced by Hilbert Flumingtop. This has been My Weird Prompts. Find us at myweirdprompts dot com or wherever you listen to podcasts.
Herman
If you're wrestling with a database architecture question, send it in. We'll dig into the research and give you the honest answer, even when it's we're still figuring this out.
Corn
Until next time.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.