Daniel sent us this one — and it's a deployment question, not a survey question. He's past the "what is mem0 and Zep and Letta" stage. He's at the point where he's about to commit to something, and he wants to know what living with each option actually looks like on day thirty, day one eighty. SaaS versus self-hosted, honest treatment of both sides. What you get out of the box, what breaks, what the real cost is, the lock-in risk when your agent's entire memory is sitting in someone else's database. And he wants a recommendation framework — when does SaaS win, when does self-hosting genuinely pay off, and what's this emerging hybrid pattern where the curation logic is managed but the storage stays in your own VPC.
We should say upfront — DeepSeek V four Pro is writing our script today, so if anything lands particularly well, that's them. If not, Corn and I will take the blame on air.
Generous of you to offer. Alright, let's dig in. There are really three lanes here. You've got Zep Cloud, mem0's managed offering, and Letta Cloud on the SaaS side. On the self-hosted side you've got Graphiti, which is Zep's open-source core, self-hosted mem0, Cognee, and Letta's self-hosted option. Six products, two deployment modes, and a whole lot of ways to get this wrong.
The reason this question matters now is that a raw vector database — Pinecone, pgvector, whatever — was never the full answer. It gives you semantic search but none of the memory curation. No fact extraction, no conflict resolution, no temporal decay, no recency-aware retrieval. You dump embeddings in and hope the cosine similarity gods are kind to you. These memory layers do the actual work of understanding that the user changed their name, or moved cities, or that the meeting from last Tuesday should matter more than the one from six months ago.
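To make that gap concrete, here's a minimal Python sketch of recency-weighted retrieval of the kind these memory layers do on top of plain similarity. The data, vectors, and half-life value are all hypothetical; this is an illustration of the idea, not any product's actual scoring function.

```python
import math
from datetime import datetime, timedelta

def cosine(a, b):
    # Plain cosine similarity -- all a raw vector DB gives you.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def recency_weight(stored_at, now, half_life_days=30.0):
    # Exponential decay: a fact loses half its weight every half_life_days.
    age_days = (now - stored_at).total_seconds() / 86400.0
    return 0.5 ** (age_days / half_life_days)

def score(query_vec, fact, now):
    # The curation layer's extra step: similarity times recency.
    return cosine(query_vec, fact["embedding"]) * recency_weight(
        fact["stored_at"], now
    )

now = datetime.now()
facts = [
    {"text": "meeting last Tuesday", "embedding": [0.9, 0.1],
     "stored_at": now - timedelta(days=5)},
    {"text": "meeting six months ago", "embedding": [0.95, 0.05],
     "stored_at": now - timedelta(days=180)},
]
query = [1.0, 0.0]
best = max(facts, key=lambda f: score(query, f, now))
# The slightly-less-similar but recent meeting beats the stale one.
print(best["text"])
```

With similarity alone, the six-month-old meeting would have won; the decay term is what encodes "last Tuesday matters more."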
So if you've accepted that the vector DB alone isn't enough and you need a curation layer, the next fork in the road is deployment. And that fork is sharper than most people realize. Let's start with the SaaS story, because that's where the marketing is loudest and where the trade-offs are most interesting.
Zep Cloud is probably the most mature of the managed offerings. They've been at this longer than most. What you get out of the box is impressive — automatic fact extraction from conversation transcripts, entity resolution, the temporal knowledge graph that handles decay and recency weighting, all of it accessible through a REST API. You integrate, you send messages, you query for relevant facts, and it just works. Day one, you're productive.
Day thirty is where the cost story starts to crystallize. Zep Cloud charges per message processed and per stored fact, and at real volume — let's say you're processing a few thousand conversations a day for a personal-context agent that's reading your email, your calendar, your Drive — those per-message costs compound. You're not paying for storage at that point, you're paying for the extraction pipeline every single time new text hits the system. It's not unreasonable, but it's also not trivial.
Here's the thing that keeps me up — well, not me personally, I nap fine — but what should keep the listener up. When your agent's entire memory is in Zep's database, what happens if Zep raises a Series C and pivots? Or gets acquired? Or changes their pricing model? Your agent doesn't have amnesia — it has a full lobotomy. Every fact, every relationship, every piece of temporal context your agent has learned about the user — gone, or held hostage by an export tool that may or may not exist when you need it.
That's the lock-in risk, and it's not theoretical. We've seen this play out with API-first startups a dozen times. The counterargument from Zep's side would be that they offer data export, and that's true — but an export of raw facts and graph edges is not the same as portability to another system. There's no standard format for temporal knowledge graphs. You're not exporting a CSV and importing it into mem0. You're starting over.
Let me pressure-test the SaaS story a different way, because I think data residency is actually the sharper concern for personal-context use cases. We're talking about agents that read your email, your Google Drive, your calendar. That's some of the most sensitive data a person has. When you send all of that through Zep Cloud or mem0's managed offering, you're trusting a third party not just with storage but with processing — the extraction pipeline is seeing everything. For a business use case with a SOC two certification and a data processing agreement, that's one thing. For an individual developer building a personal memory agent, that's a lot of trust to hand over.
It's worth naming specifically — mem0's managed offering has a similar value proposition but a slightly different architecture. Where Zep is built around the temporal knowledge graph, mem0 is more focused on the fact memory layer with what they call adaptive learning. It learns which facts matter over time based on how often they're referenced. The managed version handles all the extraction, deduplication, and conflict resolution. But the data residency question is the same — your user's memories are sitting in mem0's infrastructure.
Letta Cloud is the third option in the SaaS lane, and they're interesting because their architecture is different again. Letta is built around the idea of stateful agents — the memory isn't just a retrieval layer, it's the agent's persistent state. Their cloud offering manages the state database, the extraction, and the agent orchestration. It's more opinionated than Zep or mem0, which means you get more out of the box but you're also more locked into their model of how agents should work.
The latency story across all three managed offerings is generally good — they've invested in making the API fast because they know developers will benchmark it. At low to medium volume, you're looking at tens of milliseconds for most queries. At high volume, it depends on your plan and whether you're hitting rate limits. But none of them are slow in a way that would break a real-time agent experience, at least not at the volumes a solo developer or small team would be dealing with.
Alright, let's flip to the self-hosted side, because this is where things get real in a different way. You're not writing a check — you're writing Docker Compose files and hoping nothing pages you at two in the morning.
Graphiti is the open-source core of Zep, and it's the most direct comparison to Zep Cloud. You get the same temporal knowledge graph, the same fact extraction, the same entity resolution — but you're running it yourself. The extraction pipeline uses a language model under the hood, which means you're also bringing your own LLM API key or running a local model. That's the first hidden cost: the extraction isn't free in self-hosted mode either, you're just paying OpenAI or Anthropic directly instead of paying Zep.
The second hidden cost is operations. Graphiti uses Neo4j as its graph database, which is powerful but famously not fun to operate. Backups, replication, memory tuning — Neo4j is a whole skillset. If you've never administered a graph database before, day thirty of self-hosting Graphiti is going to involve at least one incident where you're SSHing into a server trying to figure out why memory usage spiked and queries are timing out.
The two a.m. page is real. I've talked to developers running Graphiti in production, and the consensus is that it works well once it's tuned, but getting it tuned is non-trivial. The documentation is decent, the community is helpful, but you're still the one responsible when something breaks. And something will break — that's just the nature of running infrastructure.
Self-hosted mem0 is a different beast. It's designed to be lighter weight — you can point it at a Postgres database with pgvector, which is much simpler to operate than Neo4j. The trade-off is that the self-hosted version doesn't include all the features of the managed offering. The adaptive learning, some of the more sophisticated conflict resolution, some of the recency-weighting logic — those are gated behind the paid tier. You're getting a capable memory layer, but it's intentionally not the full product.
This is the pattern we see across all three self-hosted options — the open-source versions are deliberately weaker than the paid ones. It's not a conspiracy, it's a business model. Zep gives you Graphiti but keeps some of the management tooling and advanced features for the cloud version. mem0 keeps the adaptive learning proprietary. Letta's self-hosted version works but lacks the managed orchestration layer. You need to go in with your eyes open about what you're not getting.
Cognee is the fourth option in the self-hosted lane, and it's worth spending some time on because it takes a different architectural approach. Instead of a temporal knowledge graph, Cognee builds a semantic graph from your data using whatever LLM you point it at. It's more flexible in some ways — you can shape the graph structure to your use case — but it's also less opinionated, which means more decisions for you to make and more ways to get it wrong.
Cognee is also younger than Graphiti or mem0. The community is smaller, the documentation is spottier, and the edge cases are less explored. If you're a solo developer who wants something that just works, Cognee is probably not your starting point. If you're willing to tinker and you have specific graph-structure needs that Graphiti doesn't meet, it's worth a look.
Letta self-hosted rounds out the picture. You're running the Letta server yourself, managing the state database, and handling your own LLM calls for extraction. The core architecture is the same as Letta Cloud — stateful agents with persistent memory — but you're on the hook for everything. The upside is you control the data end to end. For a personal-context memory agent, that's valuable.
Let's talk about what actually breaks, because that's what Daniel's really asking with the day-one-eighty framing. On the SaaS side, what breaks is usually not technical — it's contractual or financial. Pricing changes, service deprecations, the company getting acquired and the product getting sunset. You wake up one day to an email saying the API is being shut down in ninety days and you need to migrate. Your agent doesn't have a memory problem — it has an existential problem.
On the self-hosted side, what breaks is more mundane but just as disruptive. Disk fills up because your retention policy wasn't aggressive enough. Neo4j or Postgres needs a version upgrade and the migration fails. Your LLM provider has an outage and suddenly your extraction pipeline is dead, which means new conversations aren't being processed into facts, which means your agent's memory is silently going stale. You might not notice for days.
The silent staleness is the scary one. With a SaaS provider, if their extraction pipeline goes down, you get errors and you know about it. With self-hosted, if your LLM API key expires or your rate limit gets hit, the extraction just stops and your agent starts operating on increasingly outdated memory. By day one eighty, if you haven't been monitoring this carefully, your agent could be confidently wrong about things that changed months ago.
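The defense against silent staleness is a cheap watchdog. A minimal sketch, assuming you can query your store for the timestamp of the most recent extracted fact (for example, `MAX(created_at)` on the memories table); the names and the six-hour threshold are illustrative.

```python
from datetime import datetime, timedelta, timezone

def extraction_lag(last_fact_at, now=None):
    # How long since the extraction pipeline last produced a fact.
    now = now or datetime.now(timezone.utc)
    return now - last_fact_at

def check_staleness(last_fact_at, max_lag=timedelta(hours=6)):
    lag = extraction_lag(last_fact_at)
    if lag > max_lag:
        # In production, send this to email/Telegram/a pager, not an exception.
        raise RuntimeError(f"extraction stale: no new facts for {lag}")
    return lag

now = datetime.now(timezone.utc)
fresh_lag = check_staleness(now - timedelta(minutes=30))  # healthy
try:
    check_staleness(now - timedelta(days=2))              # silently stale
    stale_detected = False
except RuntimeError:
    stale_detected = True
```

Run it on a cron and the "confidently wrong for months" failure mode turns into a same-day alert.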
Alright, let's get to the framework. Under what conditions does SaaS win?
SaaS wins when you're building something where time-to-market matters more than long-term cost, when you don't have the operational expertise or desire to run graph databases, and when the data you're processing isn't so sensitive that third-party processing keeps you up at night. If you're a startup building a customer-facing agent and you need memory that works today, Zep Cloud or mem0 managed is the right call. You accept the lock-in risk as the price of speed.
For the solo developer building a personal-context memory agent — the person Daniel is implicitly asking for — I think the calculus shifts. The data is sensitive by definition. Email, calendar, Drive — that's the user's entire digital life. Handing that to a third-party processor is a big ask, even with good security promises. And the volume for a single user is low enough that self-hosting is manageable. You're not running a thousand conversations a day — you're running maybe a few dozen.
That's the key insight. At personal scale, the operational burden of self-hosting is dramatically lower than at production scale. You're not worried about horizontal scaling or high availability. If your memory layer goes down for an hour while you're asleep, nobody notices. The two a.m. page doesn't exist because there's no SLA to meet. That flips the entire cost-benefit analysis.
My default recommendation for a solo developer building personal-context memory is self-hosted mem0 pointed at a Postgres instance with pgvector. It's the lightest operational lift — Postgres is something most developers already know how to run and back up — and it gives you enough of the memory curation features to be useful. You lose the adaptive learning from the managed version, but at personal scale you can compensate for that with good prompt engineering in your extraction pipeline.
I'd add a caveat. If you're comfortable with Neo4j or willing to learn it, Graphiti gives you the temporal knowledge graph, which is a better model for memory that needs to understand how facts change over time. The trade-off is operational complexity. For someone who's run databases before, it's a weekend project to get it set up and a few hours a month to maintain. For someone who hasn't, it's a steeper climb.
The third option — and this is the hybrid pattern that's starting to emerge — is using a managed memory layer that points at your own storage. Imagine Zep's curation logic running in their cloud, but the underlying graph database is in your VPC. You get the managed extraction and conflict resolution, but the data stays with you. This isn't fully productized yet by any of the major players, but it's the direction things are heading.
Letta is closest to this model because their architecture already separates the agent state from the orchestration layer. In theory, you could run the Letta server in your own infrastructure and have the cloud orchestration layer manage it. In practice, it's still early days and the integration isn't seamless. But if you're planning for day three hundred and sixty-five rather than day thirty, this is the architecture to watch.
Cognee actually lends itself to a hybrid setup too, because it's designed to be storage-agnostic. You could run Cognee's extraction and graph-building in a managed environment while keeping the underlying vector store and graph database in your own infrastructure. Again, not turnkey, but the pieces are there.
Let's talk about cost concretely, because that's part of the day-thirty and day-one-eighty picture. Zep Cloud at moderate volume — a few thousand messages a month — is going to run you somewhere in the low hundreds of dollars. It scales roughly linearly with volume. mem0's managed pricing is similar. For a personal agent processing one person's email and calendar, you're probably looking at fifty to a hundred dollars a month. Not nothing, but not prohibitive.
Self-hosted Graphiti or mem0, the direct infrastructure cost is lower — maybe twenty to forty dollars a month for a modest VPS or cloud instance — but you're also paying for LLM API calls for the extraction pipeline. That's the hidden line item. Every time new text comes in, you're sending it to GPT-4 or Claude for fact extraction. At personal scale, that might add another twenty to thirty dollars a month. Total cost is comparable to SaaS, maybe slightly cheaper, but you're trading money for time.
Time is the real currency here. Setting up Graphiti for the first time, if you know what you're doing, is a few hours. Tuning it, monitoring it, handling updates and migrations — figure a few hours a month ongoing. If your hourly rate as a developer is high, self-hosting might actually be more expensive than SaaS when you factor in your own time. But if you enjoy the tinkering — and I suspect Daniel does — that time is part of the value, not a cost.
There's another dimension here that we haven't touched: what happens when you want to switch. If you start with Zep Cloud and decide to move to self-hosted Graphiti, the migration is theoretically possible because they share the same underlying data model. In practice, you're exporting from one and importing into the other, and there will be edge cases. Facts that were cleanly resolved in the cloud version might create conflicts in the self-hosted version. It's not a seamless migration.
Switching from mem0 managed to self-hosted mem0 is easier because it's the same product, just different infrastructure. You export your facts, import them into your own instance, and you're back online. The catch is that some facts might have been structured using features from the managed tier that don't exist in the self-hosted version. Those facts don't disappear, but they might not be queryable in the same way.
Letta is the hardest to migrate away from because it's the most opinionated. If you build your agent on Letta's stateful agent model and then decide to move to a different memory layer, you're not just migrating data — you're rearchitecting your agent. That's not a knock on Letta — their opinionated approach is a feature if you're committed to it — but it's worth understanding the lock-in before you commit.
Let's land the recommendation framework. For a team building a production agent where time-to-market matters and the data isn't hypersensitive, SaaS is the right call. Zep Cloud if you want the temporal knowledge graph, mem0 managed if you want the adaptive learning, Letta Cloud if you want the stateful agent architecture. Accept the lock-in risk, negotiate a data processing agreement, and move fast.
For a solo developer building a personal-context memory agent — and this is the answer Daniel is actually waiting for — self-hosted. The data sensitivity alone justifies it, and the operational burden at personal scale is low enough to be manageable. Start with self-hosted mem0 pointed at Postgres with pgvector. It's the simplest thing that works. If you outgrow it, you can move to Graphiti for the temporal knowledge graph, or to the hybrid model when it matures.
The hybrid model — managed curation, your own storage — is the medium-term future. It's not quite ready for prime time today, but if you're building something that needs to last, design your architecture so you can adopt it when it arrives. Keep your storage layer clean, keep your extraction pipeline modular, and don't bake in assumptions about where the curation logic lives.
One more thing worth saying: all of these products are moving fast. What's true today about feature gaps between managed and self-hosted might not be true in six months. The direction of travel is toward more parity — not because the companies are generous, but because open-source alternatives keep pressure on the managed offerings. If Graphiti gets good enough, Zep Cloud has to compete on something other than basic features.
The open-source community around these tools is active. Graphiti's GitHub repository has regular contributions, issues get responded to, the Discord is helpful. Same for Cognee. If you're self-hosting, you're not alone — there's a community of people doing the same thing and hitting the same edge cases. That's worth something.
Alright, let's put a pin in the deployment discussion. The takeaway is: SaaS for teams and speed, self-hosted for personal and privacy, hybrid on the horizon. And for Daniel specifically — self-hosted mem0, Postgres, pgvector. Keep it simple, keep your data, sleep through the night.
Now: Hilbert's daily fun fact.
Hilbert: In the eighteen sixties, "sandalmonger" was a recognized profession in parts of England — someone who made and sold sandals — and the term survives today as a surname, though the original sandal-making families have long since vanished from the census records of the Simpson Desert region, where the name inexplicably appears in a handful of nineteenth-century Australian shipping manifests.
...right.
Here's the forward-looking question I keep coming back to. In two years, when these memory layers are more mature, are we still going to be having the SaaS versus self-hosted debate? Or does the hybrid model win so decisively that the question stops making sense? I suspect we're heading toward a world where the curation logic is a commodity service you pay for, and the storage is something you control. But we're not there yet, and the decisions you make today determine how easy it is to get there.
This has been My Weird Prompts. Thanks to our producer Hilbert Flumingtop. You can find every episode at myweirdprompts.
If you're building something with one of these memory layers, we'd love to hear how it's going. Review us wherever you listen, or drop us a line on Telegram. I'm Corn.
I'm Herman Poppleberry. We'll catch you next time.